Prior to this patch all interned strings lived in their own malloc'd
chunk. On average this wastes N/2 bytes per interned string, where N is
the number of bytes in one quantum of the memory allocator (16 bytes on
32-bit archs).
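To see where the N/2 figure comes from, the sketch below computes the rounding waste when each string gets its own chunk. It assumes an allocator that rounds every request up to a multiple of 16 bytes (ALLOC_QUANTUM and the function names are illustrative). If string lengths are spread evenly, every remainder modulo N is equally likely and the average loss is (N-1)/2, i.e. roughly N/2.

```c
#include <stddef.h>

// Quantum of the (assumed) allocator: every request is rounded up to a multiple of this.
#define ALLOC_QUANTUM 16

// Bytes lost to rounding when a single request of `len` bytes gets its own chunk.
static size_t rounding_waste(size_t len) {
    return (ALLOC_QUANTUM - len % ALLOC_QUANTUM) % ALLOC_QUANTUM;
}

// Average waste over request sizes 1..max_len. max_len should be a multiple of
// the quantum so that every remainder appears equally often.
static double average_waste(size_t max_len) {
    size_t total = 0;
    for (size_t len = 1; len <= max_len; len++) {
        total += rounding_waste(len);
    }
    return (double)total / (double)max_len;
}
```

For example, average_waste(160) evaluates to 7.5 bytes with a 16-byte quantum.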
With this patch interned strings are concatenated into the same malloc'd
chunk when possible. Such chunks are enlarged in place when possible,
and shrunk to fit when a new chunk is needed.
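A minimal sketch of the packing strategy, under stated assumptions: a toy bump arena stands in for the GC heap, and only its topmost block can be resized without moving (a real GC heap is more flexible). Strings are stored NUL-terminated for simplicity; the real qstr storage format differs. arena_alloc, arena_resize_in_place, str_pool_t, pool_intern and CHUNK_MIN are all hypothetical names.

```c
#include <stddef.h>
#include <string.h>

// Toy heap: a bump arena whose topmost block can be resized in place.
#define ARENA_SIZE 4096
static unsigned char arena[ARENA_SIZE];
static size_t arena_top = 0; // index of the first free byte

static void *arena_alloc(size_t n) {
    if (n > ARENA_SIZE - arena_top) return NULL;
    void *p = &arena[arena_top];
    arena_top += n;
    return p;
}

// Resize block p (currently old_n bytes) without moving it. Only the topmost
// block can change size here, so this fails if anything was allocated after p.
static int arena_resize_in_place(void *p, size_t old_n, size_t new_n) {
    size_t start = (size_t)((unsigned char *)p - arena);
    if (start + old_n != arena_top) return 0;
    if (new_n > ARENA_SIZE - start) return 0;
    arena_top = start + new_n;
    return 1;
}

#define CHUNK_MIN 16 // hypothetical minimum chunk size and growth step

typedef struct {
    char *chunk;  // current chunk that new strings are appended to
    size_t alloc; // bytes reserved for the chunk
    size_t used;  // bytes occupied by strings, including NUL terminators
} str_pool_t;

// Copy s into the pool, packing it right after the previous string when possible.
static const char *pool_intern(str_pool_t *pool, const char *s) {
    size_t n = strlen(s) + 1;
    if (pool->chunk == NULL || pool->used + n > pool->alloc) {
        size_t step = n > CHUNK_MIN ? n : CHUNK_MIN;
        if (pool->chunk != NULL &&
            arena_resize_in_place(pool->chunk, pool->alloc, pool->alloc + step)) {
            pool->alloc += step; // enlarged in place: keep appending to this chunk
        } else {
            if (pool->chunk != NULL) {
                // Shrink the finished chunk to fit its strings. Best effort in
                // this toy arena, which can only resize its topmost block.
                (void)arena_resize_in_place(pool->chunk, pool->alloc, pool->used);
            }
            char *fresh = arena_alloc(step);
            if (fresh == NULL) return NULL;
            pool->chunk = fresh;
            pool->alloc = step;
            pool->used = 0;
        }
    }
    char *dst = pool->chunk + pool->used;
    memcpy(dst, s, n);
    pool->used += n;
    return dst;
}
```

Interning "foo" and then "bar" places them back to back in one chunk. Once some other allocation lands after the chunk it can no longer grow, so the next string that does not fit starts a new chunk.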
RAM savings with this patch vary widely, but should always show an
improvement (unless only 3 or 4 strings are interned). The new version
typically uses about 70% of the previous memory for qstr data, and can
reduce the total memory footprint of a running script by around 10%.
This costs about 120 bytes of code size on Thumb2 archs (the exact
figure depends on how many calls to gc_realloc are made).
The GC for unix/windows builds doesn't use the bss section anymore, so
the (sometimes complicated) build features and code related to it are
no longer needed.
This patch consolidates all global variables in py/ core into one place,
in a global structure. Root pointers are all located together to make
GC tracing easier and more efficient.
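A minimal sketch of the layout this describes, with hypothetical field names: one global struct, with every root pointer grouped into a contiguous sub-struct so the collector can scan the roots as a plain array of pointers. Treating the sub-struct as an array assumes it holds only pointers and no padding, which the static assertion checks.

```c
#include <stddef.h>

// One structure for all globals; the field names are illustrative only.
typedef struct {
    struct {
        void *qstr_last_chunk;
        void *loaded_modules;
        void *pending_exception;
    } roots;                // every field here is a GC root pointer
    size_t qstr_last_alloc; // non-pointer state lives outside `roots`
    size_t qstr_last_used;
} global_state_t;

static global_state_t g_state;

// Access macro so the rest of the code never names the struct directly.
#define STATE(field) (g_state.field)

_Static_assert(sizeof(g_state.roots) == 3 * sizeof(void *),
               "roots must be a padding-free run of pointers");

// Because the roots are contiguous pointers, tracing them is a single loop.
static size_t trace_roots(void (*mark)(void *)) {
    void **p = (void **)&g_state.roots;
    size_t n = sizeof(g_state.roots) / sizeof(void *);
    size_t marked = 0;
    for (size_t i = 0; i < n; i++) {
        if (p[i] != NULL) {
            mark(p[i]);
            marked++;
        }
    }
    return marked;
}

// Example marker: just counts how many roots it was handed.
static size_t mark_count = 0;
static void count_mark(void *obj) { (void)obj; mark_count++; }
```

Adding a new root is then a matter of adding a field to `roots`; the tracing loop does not change.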
gc.enable/disable now behave the same as in CPython: they only control
whether automatic garbage collection is enabled. When it is disabled,
you can still allocate heap memory and initiate a manual collection.
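The new semantics can be sketched with a toy heap that only counts bytes. All names are hypothetical, and the "collection" simply pretends everything was garbage. Disabling automatic GC does not stop allocation; it only means a full heap makes the allocation fail instead of triggering a collection, while a manual collection is still allowed.

```c
#include <stdbool.h>
#include <stddef.h>

// Toy heap accounting; not the real py/gc.c data structures.
#define HEAP_CAPACITY 100
static size_t heap_used = 0;
static bool auto_gc_enabled = true;
static unsigned collections = 0;

// Stand-in for a real mark/sweep: pretend every allocation became garbage.
static void toy_gc_collect(void) {
    collections++;
    heap_used = 0;
}

static void toy_gc_enable(void)  { auto_gc_enabled = true; }
static void toy_gc_disable(void) { auto_gc_enabled = false; }

// Allocation works while there is room; a full heap only triggers a
// collection automatically when auto GC is enabled.
static bool toy_gc_alloc(size_t n) {
    if (heap_used + n > HEAP_CAPACITY) {
        if (!auto_gc_enabled) return false; // a Python VM would raise MemoryError here
        toy_gc_collect();
        if (heap_used + n > HEAP_CAPACITY) return false;
    }
    heap_used += n;
    return true;
}
```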
Heap allocation now behaves exactly as it did before the "faster gc
alloc" patch, but is still nearly as fast. This is achieved by always
updating the "last free block" pointer whenever the heap changes (eg
free or realloc).
Tested on the full test suite with EXTENSIVE_HEAP_PROFILING enabled in
py/gc.c: the old and new allocators have exactly the same behaviour,
except that the new one is much faster.
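A minimal sketch of the idea, assuming a toy heap tracked with one bool per block (the real py/gc.c uses a packed allocation table, and all names here are hypothetical). The hint last_free always satisfies "every block below it is in use", so starting the first-fit scan there returns exactly the block a scan from zero would, which is the equivalence the profiling run checks. A realloc that shrinks a block would release its tail blocks and pull the hint down the same way free_blocks does.

```c
#include <stdbool.h>
#include <stddef.h>

#define NBLOCKS 64
static bool block_used[NBLOCKS];
// Invariant: every block below last_free is in use.
static size_t last_free = 0;

// First-fit search for n > 0 free blocks starting at `from`; returns the start index or -1.
static long find_free_run(size_t from, size_t n) {
    for (size_t i = from; i + n <= NBLOCKS; i++) {
        size_t j = 0;
        while (j < n && !block_used[i + j]) j++;
        if (j == n) return (long)i;
        i += j; // block i + j is used, so the next candidate starts after it
    }
    return -1;
}

// Reference behaviour: plain first fit from the start of the heap.
static long first_fit_from_zero(size_t n) { return find_free_run(0, n); }

// Fast allocator: same first-fit result, but the scan starts at last_free.
static long alloc_blocks(size_t n) {
    long start = find_free_run(last_free, n);
    if (start < 0) return -1;
    for (size_t k = 0; k < n; k++) block_used[(size_t)start + k] = true;
    // Keep the invariant by skipping over blocks that are now in use.
    while (last_free < NBLOCKS && block_used[last_free]) last_free++;
    return start;
}

static void free_blocks(long start, size_t n) {
    for (size_t k = 0; k < n; k++) block_used[(size_t)start + k] = false;
    // Freed memory below the hint must pull the hint back down.
    if ((size_t)start < last_free) last_free = (size_t)start;
}
```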
The recent speed-up of GC allocation left the GC with a fragmented heap.
This patch restores the "original fragmentation behaviour" whilst still
retaining relatively fast allocation. It works because there will
always be a single-block allocation now and then, which advances the
gc_last_free_atb_index pointer often enough that the whole heap doesn't
need scanning.
Should address issue #836.
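The message does not say how the faster allocator fragmented the heap. One plausible mechanism, assumed here purely for illustration, is a roving next-fit pointer that always moves past the latest allocation, compared with a hint that tracks the lowest free block. The toy heap below (all names hypothetical) runs the same allocate/free sequence under both policies.

```c
#include <stdbool.h>
#include <stddef.h>

#define HEAP_BLOCKS 16

typedef struct {
    bool used[HEAP_BLOCKS];
    size_t hint;   // where the next search starts
    bool next_fit; // true: hint roves past each allocation; false: hint tracks the lowest free block
} toy_heap_t;

// First-fit search for n > 0 free blocks starting at `from`; returns the start index or -1.
static long find_run(const toy_heap_t *h, size_t from, size_t n) {
    for (size_t i = from; i + n <= HEAP_BLOCKS; i++) {
        size_t j = 0;
        while (j < n && !h->used[i + j]) j++;
        if (j == n) return (long)i;
        i += j; // skip past the used block that ended this run
    }
    return -1;
}

static long toy_alloc(toy_heap_t *h, size_t n) {
    long start = find_run(h, h->hint, n);
    if (start < 0 && h->next_fit) start = find_run(h, 0, n); // next fit wraps around
    if (start < 0) return -1;
    for (size_t k = 0; k < n; k++) h->used[(size_t)start + k] = true;
    if (h->next_fit) {
        h->hint = (size_t)start + n; // rove forward past this allocation
    } else {
        while (h->hint < HEAP_BLOCKS && h->used[h->hint]) h->hint++;
    }
    return start;
}

static void toy_free(toy_heap_t *h, long start, size_t n) {
    for (size_t k = 0; k < n; k++) h->used[(size_t)start + k] = false;
    if (!h->next_fit && (size_t)start < h->hint) h->hint = (size_t)start;
}
```

With next fit, the hole left by a freed 2-block object stays empty while a following 1-block object lands further up the heap; repeated over a long run, this spreads live objects across the heap. With the lowest-free hint, the hole is reused immediately.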
This simple patch gives a very significant speed-up for memory allocation
with the GC.
Eg, on PYBv1.0:
tests/basics/dict_del.py: 3.55 seconds -> 1.19 seconds
tests/misc/rge_sm.py: 15.3 seconds -> 2.48 seconds
This was a nasty bug to track down. It only had consequences when the
heap was just the right size to expose the rounding error in the
calculation of the finaliser table size. In addition, a script had to
allocate a small (1 or 2 cell) object at the very end of the heap, this
object must not have had a finaliser, and the initial state of the heap
must have been all bits set to 1. All these conditions coincide on the
pyboard, but only if you run the script fresh (so unused memory is all
1's) and your script allocates a lot of small objects (eg 2-char
strings that are not interned).
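The message only says "rounding error", so the following is an assumption: a finaliser table sized with floor division instead of rounding up. With 2 bits per block in the allocation table (4 blocks per byte) and 1 bit per block in the finaliser table (8 blocks per byte), an odd allocation-table length would leave the last 4 pool blocks without finaliser bits, so their "bits" would be read from whatever memory follows the table. If that memory is all 1's, a small object at the very end of the heap appears to have a finaliser. Only some heap sizes give an odd table length, which fits the "just the right size" condition. The constant and function names are illustrative.

```c
#include <stddef.h>

// Assumed layout constants: 2 bits per block in the allocation table and
// 1 bit per block in the finaliser table.
#define BLOCKS_PER_ATB 4 // pool blocks described by one allocation-table byte
#define BLOCKS_PER_FTB 8 // pool blocks described by one finaliser-table byte

// Buggy size: floor division silently drops the final partial byte.
static size_t ftb_len_floor(size_t atb_len) {
    return atb_len * BLOCKS_PER_ATB / BLOCKS_PER_FTB;
}

// Fixed size: round up so every pool block gets a finaliser bit.
static size_t ftb_len_ceil(size_t atb_len) {
    return (atb_len * BLOCKS_PER_ATB + BLOCKS_PER_FTB - 1) / BLOCKS_PER_FTB;
}

// Number of pool blocks that actually have a finaliser bit.
static size_t blocks_covered(size_t ftb_len) {
    return ftb_len * BLOCKS_PER_FTB;
}
```

For an allocation table of 5 bytes (20 pool blocks), the floor version yields 2 bytes covering only 16 blocks, while the rounded-up version yields 3 bytes covering 24. With a 6-byte table both give 3 bytes, so the problem only appears for some heap sizes.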