Improving GC on PSRAM-enabled boards

dragoncoder047 · 2025-01-12 22:44:48 UTC

All ESP32 boards that have PSRAM also have malloc()… I was thinking that this could be utilized to speed up a full GC if memory usage is low.

Currently the entire workspace of 260000 cells is allocated all at once in a giant ps_malloc() call. However, I was thinking that it could be instead done in chunks.

You would define a struct that holds a reasonable chunk size of objects plus a pointer to walk to the next oldest chunk -

struct chunk {
    object objs[CHUNK_SIZE];
    struct chunk* next;
};

Then in myalloc() if the Freelist is empty, it would try to malloc() off a new chunk, and if that succeeds it will fill the freelist using the chunk’s objects and then return one of them. If the malloc() returns NULL then we’re really out of memory and the error can be thrown.

In garbage collection it will need to delete the chunk when it detect that the chunk is empty. The simplest method I have tried (and can confirm works) is as follows -

Save the head of the Freelist when we start sweeping into a new chunk.
Count the number of objects myfree()'d when sweeping the current chunk.
If the number of objects equals CHUNK_SIZE, then the chunk is unused. Set Freelist back to the head saved in step 1, splice the chunk out of the list of chunks, and free() it.

Of course this doesn’t do anything to alleviate memory fragmentation… if the program runs in such a way that it allocates a new chunk and then all but one of the objects in that chunk get GC’d, the whole chunk hangs around and has to be swept just to keep that one object usable. I’m not sure how to fix that.

Let me know if you think this would be a viable option and I’ll try to cook up something closer to a working patch.

In the meantime, you can look at this code of mine where I implemented the above GC algorithm using a slightly different object system.