Improving GC on PSRAM-enabled boards


#1

All ESP32 boards that have PSRAM also have malloc()… I was thinking that this could be utilized to speed up a full GC if memory usage is low.

Currently the entire workspace of 260000 cells is allocated all at once in a giant ps_malloc() call. However, I was thinking that it could be instead done in chunks.

You would define a struct that holds a reasonable chunk size of objects plus a pointer to walk to the next oldest chunk -

struct chunk {
    object objs[CHUNK_SIZE];
    struct chunk* next;
};

Then in myalloc() if the Freelist is empty, it would try to malloc() off a new chunk, and if that succeeds it will fill the freelist using the chunk’s objects and then return one of them. If the malloc() returns NULL then we’re really out of memory and the error can be thrown.

In garbage collection it will need to delete the chunk when it detect that the chunk is empty. The simplest method I have tried (and can confirm works) is as follows -

  1. Save the head of the Freelist when we start sweeping into a new chunk.
  2. Count the number of objects myfree()'d when sweeping the current chunk.
  3. If the number of objects equals CHUNK_SIZE, then the chunk is unused. Set Freelist back to the head saved in step 1, splice the chunk out of the list of chunks, and free() it.

Of course this doesn’t do anything to alleviate memory fragmentation… if the program runs in such a way that it allocates a new chunk and then all but one of the objects in that chunk get GC’d, the whole chunk hangs around and has to be swept just to keep that one object usable. I’m not sure how to fix that.

Let me know if you think this would be a viable option and I’ll try to cook up something closer to a working patch.

In the meantime, you can look at this code of mine where I implemented the above GC algorithm using a slightly different object system.


#2

I had some more thought on this and even if dynamic memory allocation is not able to be used, the Workspace could still be a statically allocated array of chunks. Each chunk could have a bit to indicate if it is in use, and the garbage collector could do the same sort of thing with the chunks as I mentioned above, except instead of calling free() on the unused chunks, it could just set in_use to false. When more memory is needed checking 1000 chunks for one that is not in use to replenish the Freelist, and then in the GC only sweeping the chunks that are in use, would go a lot faster than sweeping all 1000000 objects regardless of usage patterns. (using the RP2350 as an extreme example here)