Extremely high "global variable" region consumption on ARM?

jwiede · 2024-11-24 02:48:23 UTC

Something seems off about dynamic memory consumption on the PicoPlus2 build. It appears to be consuming about 375KB of dynamic memory (73%) just for a no-library/no-extensions build, whereas in most other builds the plain build consumes much less. Here’s what I see post-build:

Sketch uses 176788 bytes (2%) of program storage space. Maximum is 8380416 bytes.
Global variables use 384296 bytes (73%) of dynamic memory, leaving 139992 bytes for local variables. Maximum is 524288 bytes.

I noticed something similar with “RAM2” consumption on Teensy 4.1 where the build gave me the following stats:

Memory Usage on Teensy 4.1:
  FLASH: code:137736, data:54428, headers:8536   free for files:7925764
   RAM1: variables:58400, code:133896, padding:29944   free for local variables:302048
   RAM2: variables:493120  free for malloc/new:31168

Something seems a bit off there too: it consumes more than 490KB, leaving only about 31KB in the pool.

As a contrast, here’s the post-build output of the T-Deck build (those are the three supported devs I have handy, though I probably have dozens around total):

Sketch uses 919005 bytes (70%) of program storage space. Maximum is 1310720 bytes.
Global variables use 48984 bytes (14%) of dynamic memory, leaving 278696 bytes for local variables. Maximum is 327680 bytes.

Now, I get it’s apples and oranges to compare ESP32 and ARM builds, but given they’re building the same code with essentially the same functionality, and all three are high-RAM + high-FLASH systems, the difference between the ARM results and the ESP32 results is extreme for an area where it seems like all platforms should share similar footprint needs.

Why are global variables consuming only ~50KB of memory on ESP32, yet ~400KB/~500KB respectively for PicoPlus2 & Teensy? That’s roughly an eightfold to tenfold difference, and I’m having a difficult time coming up with data structure differences that could explain why ARM is consuming up to 10X versus ESP32 in that regard. I could even accept a 2x, maybe even 4x difference as implementation efficiency cost, but not 10x.

So…questions:

1. Can you please explain what all goes into “Global Variables”?
2. Is there any way to get a breakdown report from uLisp on potentially large data structures that might be bloating up that consumption? Kinda guessing not, so in lieu of that, can you point to whatever section of the code is responsible for defining the “big consumer structures”? I’d like to make sure this isn’t the result of some unintended stray zero or needless alignment/misalignment bloat.
3. In the absence of hard footprint data, any clue off the top of your head, based on experience with D&I, why ARM builds for those two systems are consuming 10X what an ESP32 system (with otherwise similar-sized flash and psram additions) is consuming?

Thanks! Just trying to sort this out because, as you can see, in the Teensy case RAM2 is nearly exhausted, and even in the PicoPlus2 case the remaining variable space is quite limited relative to what the install used.

-John W

jwiede · 2024-11-24 03:14:39 UTC

And before anyone asks, here are the respective initial defines – no library, no extensions, pretty similar expectations for all of them…

PicoPlus2:

Sketch uses 176788 bytes (2%) of program storage space. Maximum is 8380416 bytes.
Global variables use 384296 bytes (73%) of dynamic memory, leaving 139992 bytes for local variables. Maximum is 524288 bytes.

// Lisp Library
const char LispLibrary[] = "";

// Compile options

// #define resetautorun
// #define printfreespace
// #define printgcs
#define sdcardsupport
// #define gfxsupport
// #define lisplibrary
#define assemblerlist
#define lineeditor
#define vt100
// #define extensions

// Includes

// #include "LispLibrary.h"

Teensy 4.1:

Memory Usage on Teensy 4.1:
  FLASH: code:137736, data:54428, headers:8536   free for files:7925764
   RAM1: variables:58400, code:133896, padding:29944   free for local variables:302048
   RAM2: variables:493120  free for malloc/new:31168

// Lisp Library
const char LispLibrary[] = "";

// Compile options

// #define resetautorun
// #define printfreespace
// #define printgcs
#define sdcardsupport
// #define gfxsupport
// #define lisplibrary
#define assemblerlist
#define lineeditor
#define vt100
// #define extensions

// Includes

// #include "LispLibrary.h"

ESP32S3 DevModule (T-Deck):

Sketch uses 919005 bytes (70%) of program storage space. Maximum is 1310720 bytes.
Global variables use 48984 bytes (14%) of dynamic memory, leaving 278696 bytes for local variables. Maximum is 327680 bytes.

// Lisp Library
const char LispLibrary[] = "";

// Compile options

// #define resetautorun
// #define printfreespace
#define serialmonitor
// #define printgcs
#define sdcardsupport
#define gfxsupport
// #define lisplibrary
// #define extensions

// Includes

// #include "LispLibrary.h"

P.S. Is it just me, or does something also look “off” with the ESP32 memory numbers? It’s got 16MB flash & 8MB psram addons, same as the PicoPlus2 (the Teensy actually has 256MB flash & 8MB psram addons, but it wouldn’t let me go higher than 16MB flash in Arduino IDE, so I understand its numbers).

Any help or information always appreciated. Not new with Lisp, but nowhere near an expert either, though I guess I’m miles past “expert”-level time investment on interpreters in general.

johnsondavies · 2024-11-25 16:44:08 UTC

Taking the Pimoroni Pico Plus 2 as an example, and assuming you’re not using the PSRAM, uLisp allocates a few global variables, an array of 256 bytes to hold machine code, and then most of the remaining RAM to the Lisp workspace in a global array Workspace[].

On most platforms any free RAM is used for the stack. The Arduino IDE makes the arbitrary decision that 25% of the RAM should be left available for the stack, and if you allocate 75% or more of the RAM to global variables it displays the scary message:

Low memory available, stability problems may occur.

This is a bit illogical, because why should a processor with more RAM need a larger stack? However, my policy is to make the size of the Workspace on each platform as large as possible, while avoiding this “Low memory available” warning.
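
To make the 75% rule concrete, here is a minimal sketch in plain C of the arithmetic involved. It is not code from the Arduino IDE or from uLisp. The function names, the truncating percentage, and the other_globals and cell_size parameters are all assumptions made for illustration; the only facts taken from this thread are the "75% or more" condition and the reported byte counts.

```c
#include <assert.h>
#include <stdint.h>

// Percentage of RAM taken by globals, truncated to a whole number.
// Truncation is an assumption; it matches the 73% and 14% shown in this thread.
unsigned ram_percent(uint32_t used, uint32_t total) {
  assert(total > 0);
  return (unsigned)(((uint64_t)used * 100u) / total);
}

// Nonzero when globals take 75% or more of RAM, the condition described
// above for the "Low memory available" message.
int low_memory_warning(uint32_t used, uint32_t total) {
  assert(total > 0);
  return (uint64_t)used * 4u >= (uint64_t)total * 3u;
}

// Largest number of workspace cells that keeps total globals just under 75%.
// other_globals (bytes used by everything else) and cell_size (bytes per
// cell) are hypothetical inputs, not values read from uLisp.
uint32_t max_workspace_cells(uint32_t total, uint32_t other_globals,
                             uint32_t cell_size) {
  assert(cell_size > 0);
  // Smallest byte count that reaches the 75% threshold (rounded up).
  uint64_t limit = ((uint64_t)total * 3u + 3u) / 4u;
  if (limit == 0 || other_globals > limit - 1) return 0;
  return (uint32_t)((limit - 1 - other_globals) / cell_size);
}
```

With the PicoPlus2 figures from the first post, ram_percent(384296, 524288) gives 73 and low_memory_warning(384296, 524288) is false, which is consistent with a workspace sized to sit just below the warning threshold.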

If you move the uLisp workspace to PSRAM, as on the ESP32-S3 on the T-Deck, the remaining global variables only take 14% of the standard RAM, as your example shows. The Arduino IDE doesn’t report anything about the PSRAM.

Hope that explains at least some of your questions.
David

jwiede · 2024-11-26 02:38:18 UTC

Yep, I guess my response would be: Is there a reason why the uLisp workspace isn’t in PSRAM on the PicoPlus2? Why treat it differently than the ESP32-S3 in that regard? Just trying to understand the decision and the pros/cons before potentially changing it.

Is there an easy define to change it, or do I just need to go through and change the allocation for the array?

I’ll discuss the other two individually in separate responses. Thanks as always for all info provided!

johnsondavies · 2024-11-26 07:46:33 UTC

I decided against making PSRAM the default on the Pico Plus 2 because the performance isn’t as good as with the on-chip RAM.

If you want to use the PSRAM simply uncomment the line:

// #define board_has_psram

See: Pimoroni RP2350 boards
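
For readers wondering why a switch like this changes the "Global variables" figure so much, here is a minimal sketch of the general pattern, assuming a compile-time define selects between a static array and a run-time allocation. It is not the actual uLisp source: cell_t, WORKSPACE_CELLS, workspace_init, workspace_static_bytes, and workspace_cell are hypothetical names, the sizes are made up, and malloc merely stands in for whatever PSRAM allocator the board core provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// #define board_has_psram   // uncomment to select the external-RAM path

// Hypothetical two-pointer cell; the real uLisp object layout may differ.
typedef struct cell { struct cell *car; struct cell *cdr; } cell_t;

#if defined(board_has_psram)
#define WORKSPACE_CELLS 100000u   // hypothetical size
// Only this pointer is counted as a global; the cells live elsewhere.
static cell_t *Workspace = NULL;

int workspace_init(void) {
  // malloc is a stand-in for a PSRAM allocator (an assumption).
  Workspace = malloc((size_t)WORKSPACE_CELLS * sizeof(cell_t));
  return Workspace != NULL;
}
#else
#define WORKSPACE_CELLS 40000u    // hypothetical size
// The whole array is counted under "Global variables" in the build report.
static cell_t Workspace[WORKSPACE_CELLS];

int workspace_init(void) { return 1; }
#endif

// Bytes the workspace contributes to statically allocated RAM:
// the full array in one configuration, a single pointer in the other.
size_t workspace_static_bytes(void) { return sizeof(Workspace); }

// Bounds-checked access to a cell, valid after workspace_init() succeeds.
cell_t *workspace_cell(size_t i) {
  assert(i < WORKSPACE_CELLS);
  return &Workspace[i];
}
```

In the default configuration the array dominates the static footprint, which mirrors the 73% figure on the Pico Plus 2. With the define enabled only a pointer remains, which mirrors the much smaller percentage reported for the T-Deck.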