This patch adds the gc_sweep_all() function which does a garbage collection
without tracing any root pointers, so frees all the memory, and most
importantly runs any remaining finalisers.
This helps primarily for soft reset: it will close any open files, any open
sockets, and help to get the system back to a clean state upon soft reset.
Without this, if GC threshold is hit and there is not enough memory left to
satisfy the request, gc_collect() will run a second time and the search for
memory will happen again and will fail again.
Thanks to @adritium for pointing out this issue, see #3786.
This patch moves the start of the root pointer section in mp_state_ctx_t
so that it skips entries that are not pointers and don't need scanning.
Previously, the start of the root pointer section was at the very beginning
of the mp_state_ctx_t struct (which is the beginning of mp_state_thread_t).
This was the original assembler version of the NLR code was hard-coded to
have the nlr_top pointer at the start of this state structure. But now
that the NLR code is partially written in C there is no longer this
restriction on the location of nlr_top (and a comment to this effect has
been removed in this patch).
So now the root pointer section starts part way through the
mp_state_thread_t structure, after the entries which are not root pointers.
This patch also moves the non-pointer entries for MICROPY_ENABLE_SCHEDULER
outside the root pointer section.
Moving non-pointer entries out of the root pointer section helps to make
the GC more precise and should help to prevent some cases of collectable
garbage being kept.
This patch also has a measurable improvement in performance of the
pystone.py benchmark: on unix x86-64 and stm32 there was an improvement of
roughly 0.6% (tested with both gcc 7.3 and gcc 8.1).
This macro is written out explicitly in the two locations that it is used
and then the code is optimised, opening possibilities for further
optimisations and reducing code size:
unix: -48
minimal CROSS=1: -32
stm32: -32
This patch introduces the MICROPY_ENABLE_PYSTACK option (disabled by
default) which enables a "Python stack" that allows to allocate and free
memory in a scoped, or Last-In-First-Out (LIFO) way, similar to alloca().
A new memory allocation API is introduced along with this Py-stack. It
includes both "local" and "nonlocal" LIFO allocation. Local allocation is
intended to be equivalent to using alloca(), whereby the same function must
free the memory. Nonlocal allocation is where another function may free
the memory, so long as it's still LIFO.
Follow-up patches will convert all uses of alloca() and VLA to the new
scoped allocation API. The old behaviour (using alloca()) will still be
available, but when MICROPY_ENABLE_PYSTACK is enabled then alloca() is no
longer required or used.
The benefits of enabling this option are (or will be once subsequent
patches are made to convert alloca()/VLA):
- Toolchains without alloca() can use this feature to obtain correct and
efficient scoped memory allocation (compared to using the heap instead
of alloca(), which is slower).
- Even if alloca() is available, enabling the Py-stack gives slightly more
efficient use of stack space when calling nested Python functions, due to
the way that compilers implement alloca().
- Enabling the Py-stack with the stackless mode allows for even more
efficient stack usage, as well as retaining high performance (because the
heap is no longer used to build and destroy stackless code states).
- With Py-stack and stackless enabled, Python-calling-Python is no longer
recursive in the C mp_execute_bytecode function.
The micropython.pystack_use() function is included to measure usage of the
Python stack.
Accessing them will crash immediately instead still working for some time,
until overwritten by some other data, leading to much less deterministic
crashes.
These checks are assumed to be true in all cases where gc_realloc is
called with a valid pointer, so no need to waste code space and time
checking them in a non-debug build.
Header files that are considered internal to the py core and should not
normally be included directly are:
py/nlr.h - internal nlr configuration and declarations
py/bc0.h - contains bytecode macro definitions
py/runtime0.h - contains basic runtime enums
Instead, the top-level header files to include are one of:
py/obj.h - includes runtime0.h and defines everything to use the
mp_obj_t type
py/runtime.h - includes mpstate.h and hence nlr.h, obj.h, runtime0.h,
and defines everything to use the general runtime support functions
Additional, specific headers (eg py/objlist.h) can be included if needed.
If a finaliser raises an exception then it must not propagate through the
GC sweep function. This patch protects against such a thing by running
finaliser code via the mp_call_function_1_protected call.
This patch also adds scheduler lock/unlock calls around the finaliser
execution to further protect against any possible reentrancy issues: the
memory manager is already locked when doing a collection, but we also don't
want to allow any scheduled code to run, KeyboardInterrupts to interupt the
code, nor threads to switch.
There can be stray pointers in memory blocks that are not properly zero'd
after allocation. This patch adds a new config option to always zero all
allocated memory (via gc_alloc and gc_realloc) and hence help to eliminate
stray pointers.
See issue #2195.
Currently, MicroPython runs GC when it could not allocate a block of memory,
which happens when heap is exhausted. However, that policy can't work well
with "inifinity" heaps, e.g. backed by a virtual memory - there will be a
lot of swap thrashing long before VM will be exhausted. Instead, in such
cases "allocation threshold" policy is used: a GC is run after some number of
allocations have been made. Details vary, for example, number or total amount
of allocations can be used, threshold may be self-adjusting based on GC
outcome, etc.
This change implements a simple variant of such policy for MicroPython. Amount
of allocated memory so far is used for threshold, to make it useful to typical
finite-size, and small, heaps as used with MicroPython ports. And such GC policy
is indeed useful for such types of heaps too, as it allows to better control
fragmentation. For example, if a threshold is set to half size of heap, then
for an application which usually makes big number of small allocations, that
will (try to) keep half of heap memory in a nice defragmented state for an
occasional large allocation.
For an application which doesn't exhibit such behavior, there won't be any
visible effects, except for GC running more frequently, which however may
affect performance. To address this, the GC threshold is configurable, and
by default is off so far. It's configured with gc.threshold(amount_in_bytes)
call (can be queries without an argument).
Previously, if there was chain of allocated blocks ending with the last
block of heap, it wasn't included in number of 1/2-block or max block
size stats.
GC_EXIT() can cause a pending thread (waiting on the mutex) to be
scheduled right away. This other thread may trigger a garbage
collection. If the pointer to the newly-allocated block (allocated by
the original thread) is not computed before the switch (so it's just left
as a block number) then the block will be wrongly reclaimed.
This patch makes sure the pointer is computed before allowing any thread
switch to occur.
By using a single, global mutex, all memory-related functions (alloc,
free, realloc, collect, etc) are made thread safe. This means that only
one thread can be in such a function at any one time.