This commit adds first-class support for yield and yield-from in the native
emitter, including send and throw support, and yields enclosed in exception
handlers (which requires pulling down the NLR stack before yielding, then
rebuilding it when resuming).
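The features listed above are easiest to see in a small generator pair. The sketch below is plain Python, so it also runs under CPython. It exercises yield-from delegation, send(), throw(), and a yield that sits inside a try/except. The names inner/outer are illustrative only. With the support described here, the same generators should also be able to carry the @micropython.native decorator.

```python
def inner():
    received = None
    try:
        received = yield "ready"
        while True:
            # Each send() value arrives here; this yield is inside a try block.
            received = yield "got %s" % received
    except ValueError as exc:
        # throw() lands in this handler, and the generator can yield again.
        yield "handled %s" % exc
    return "finished"


def outer():
    # yield from forwards next(), send() and throw() to inner() and
    # evaluates to inner()'s return value.
    result = yield from inner()
    return result
```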
This has been fully tested and is working on unix x86 and x86-64, and on
stm32. Basic tests have also been done with the esp8266 port. Performance
of existing native code is unchanged.
This commit changes native code to handle constant objects the same way as
bytecode: instead of storing the pointers inside the native code, they are now stored
in a separate constant table (such pointers include objects like bignum,
bytes, and raw code for nested functions). This removes the need for the
GC to scan native code for root pointers, and takes a step towards making
native code independent of the runtime (e.g. so that it can be compiled offline by
mpy-cross).
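To illustrate the idea of a separate constant table, here is a toy model. It is not MicroPython's actual data layout: the class RawCode, the opcode value and gc_roots are hypothetical. The emitted "machine code" holds only small indices, and every object reference lives in a side table. That side table is the only thing a GC would have to scan.

```python
class RawCode:
    """Toy model: 'machine code' is opaque bytes; objects live in a const table."""

    LOAD_CONST_INDEX = 0x01  # hypothetical opcode, for illustration only

    def __init__(self):
        self.code = bytearray()
        self.const_table = []

    def emit_load_const(self, obj):
        # Reuse an existing slot if this exact object is already in the table.
        for idx, existing in enumerate(self.const_table):
            if existing is obj:
                break
        else:
            idx = len(self.const_table)
            self.const_table.append(obj)
        # Only the small index goes into the instruction stream, never a pointer.
        # (A one-byte index limits this toy model to 256 constants.)
        self.code += bytes([self.LOAD_CONST_INDEX, idx])
        return idx


def gc_roots(raw_code):
    # The GC scans the const table for object references; the code bytes
    # themselves never need to be scanned.
    return list(raw_code.const_table)
```

Because the code bytes contain no pointers, they stay the same from one run to the next. That property is what makes offline compilation plausible.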
Note that the changes to the struct scope_t did not increase its size: on a
32-bit architecture it is still 48 bytes, and on a 64-bit architecture it
decreased from 80 to 72 bytes.
This commit makes viper functions have the same signature as native
functions, at the level of the emitter/assembler. This means that viper
functions can now be wrapped in the same uPy object as native functions.
Viper functions are now responsible for parsing their arguments (previously this
was done by the runtime), and this makes calling them more efficient (in
most cases) because the viper entry code can be custom generated to suit
the signature of the function.
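The split between a shared function object and per-function entry code can be modelled roughly as follows. This is a sketch under assumptions. NativeFunObj and make_viper_entry are hypothetical names, and the generated entry here only checks an exact positional count. The wrapper does no argument parsing itself. Each generated entry checks only what its own signature requires.

```python
class NativeFunObj:
    """Toy model of one function-object type shared by native and viper code:
    it passes the raw positional arguments straight to the entry code."""

    def __init__(self, entry):
        self.entry = entry

    def __call__(self, *args):
        return self.entry(args)


def make_viper_entry(n_params, body):
    # Hypothetical code generator: the entry code is produced per function,
    # so it only performs the checks this particular signature needs.
    def entry(args):
        if len(args) != n_params:
            raise TypeError(
                "function takes %d positional arguments but %d were given"
                % (n_params, len(args))
            )
        return body(*args)

    return entry
```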
This change also opens the way forward for viper functions to take
arbitrary numbers of arguments, and for them to handle globals correctly,
among other things.
Now that the compiler can store the resulting viper types in the scope, the
viper parameter annotation compilation stage can be merged with the normal
parameter compilation stage.
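A single-pass parameter stage might look something like this. The toy uses Python's standard ast module and is not the MicroPython compiler. It registers each parameter and records its viper type in the same loop. It assumes that unannotated parameters default to the object type, and it also treats anything other than a bare name annotation as object.

```python
import ast


def compile_params(src):
    """Toy single-pass parameter compiler returning (name, viper_type) pairs."""
    func = ast.parse(src).body[0]
    scope = []
    for arg in func.args.args:
        ann = arg.annotation
        # One loop does both jobs: register the parameter and record its type.
        vtype = ann.id if isinstance(ann, ast.Name) else "object"
        scope.append((arg.arg, vtype))
    return scope
```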
Prior to this commit a function compiled with the native decorator
@micropython.native would not work correctly when accessing global
variables, because the globals dict was not being set upon function entry.
This commit fixes this problem by setting, upon function entry, the current
globals dict to the globals dict of the context in which the function was
defined, as per normal Python semantics and as bytecode does. Upon function
exit the original globals dict is restored.
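The entry/exit behaviour can be modelled in a few lines. This is a sketch, not the emitted code. The module-level current_globals stands in for the runtime's "current globals" pointer, and call_with_globals and read_name are hypothetical. The try/finally plays the role of the nlr_push/nlr_pop guard, so the caller's globals are restored on a normal return and also when an exception propagates.

```python
# Hypothetical stand-in for the runtime's "current globals" pointer.
current_globals = {"__name__": "__main__"}


def call_with_globals(fun_globals, body, *args):
    """Model of native-function entry/exit: switch to the defining module's
    globals on entry and restore the caller's globals on any exit."""
    global current_globals
    saved = current_globals
    current_globals = fun_globals
    try:  # plays the role of the nlr_push/nlr_pop guard
        return body(*args)
    finally:
        current_globals = saved


def read_name():
    # Global lookups go through whatever dict is current at call time.
    return current_globals["__name__"]
```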
In order to restore the globals dict when an exception is raised the native
function must guard its internals with an nlr_push/nlr_pop pair. Because
this push/pop is relatively expensive, in both C stack usage for the
nlr_buf_t and CPU execution time, the implementation here optimises things
as much as possible. First, the compiler keeps track of whether a function
even needs to access global variables. Using this information the native
emitter then generates three different kinds of code:
1. no globals used, no exception handlers: no nlr handling code and no
setting of the globals dict.
2. globals used, no exception handlers: an nlr_buf_t is allocated on the
C stack but it is not used if the globals dict is unchanged, saving
execution time because nlr_push/nlr_pop don't need to run.
3. function has exception handlers, may use globals: an nlr_buf_t is
allocated and nlr_push/nlr_pop are always called.
In the end, native functions that don't access globals and don't have
exception handlers will run more efficiently than those that do.
Fixes issue #1573.
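The three kinds of generated code listed above amount to one compile-time decision and one run-time decision. The model below is a sketch. Prologue, choose_prologue and runs_nlr_push are hypothetical names. "Globals dict unchanged" is read here as "the function's globals are already the current globals".

```python
from enum import Enum


class Prologue(Enum):
    NO_NLR = 1      # kind 1: no nlr code, globals dict not touched
    LAZY_NLR = 2    # kind 2: nlr_buf_t reserved, nlr_push only if globals differ
    ALWAYS_NLR = 3  # kind 3: nlr_push/nlr_pop always executed


def choose_prologue(uses_globals, has_exc_handlers):
    # Decided at compile time from what the compiler recorded about the scope.
    if has_exc_handlers:
        return Prologue.ALWAYS_NLR
    if uses_globals:
        return Prologue.LAZY_NLR
    return Prologue.NO_NLR


def runs_nlr_push(kind, fun_globals, caller_globals):
    # Decided at run time, on each call.
    if kind is Prologue.ALWAYS_NLR:
        return True
    if kind is Prologue.LAZY_NLR:
        return fun_globals is not caller_globals
    return False
```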
This patch adds full support for unwinding jumps to the native emitter.
This means that return/break/continue can be used in try-except,
try-finally and with statements. For code that doesn't use unwinding jumps
there is almost no overhead added to the generated code.
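For reference, this is the kind of Python code that relies on unwinding jumps. The example is plain Python and its output below is CPython behaviour. Ctx and unwinding_jumps are illustrative names. Each continue, break and return must run the pending finally blocks and __exit__ calls before control actually moves.

```python
trace = []


class Ctx:
    def __enter__(self):
        trace.append("enter")
        return self

    def __exit__(self, *exc_info):
        trace.append("exit")
        return False  # do not suppress exceptions


def unwinding_jumps():
    for i in range(3):
        try:
            if i == 0:
                continue  # must run the finally block before the next iteration
            with Ctx():
                if i == 1:
                    break  # must run __exit__ and then the finally block
        finally:
            trace.append("finally %d" % i)
    try:
        return "ret"  # must run the finally block before returning
    finally:
        trace.append("finally ret")
```

Calling unwinding_jumps() returns "ret" and leaves trace as ["finally 0", "enter", "exit", "finally 1", "finally ret"].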
Prior to this patch, native code would use a full nlr_buf_t for each
exception handler (try-except, try-finally, with). For nested exception
handlers this would use a lot of C stack and be rather inefficient.
This patch changes how exceptions are handled in native code by setting up
only a single nlr_buf_t context for the entire function, and then managing a
state machine (using the PC) to work out which exception handler to run
when an exception is raised by an nlr_jump. This keeps the C stack usage
at a constant level regardless of the depth of Python exception blocks.
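The single-context approach can be illustrated with a toy interpreter. This is a model of the idea, not the native emitter's code. The op names and the handler_pcs list are hypothetical. One Python try block stands in for the single nlr_buf_t. Nested handlers only add entries to a list, so the number of host-level exception contexts stays constant however deep the nesting goes.

```python
def run_with_single_handler_context(ops):
    """Toy interpreter: one Python try block (standing in for the single
    nlr_buf_t) serves every nested exception handler in the function."""
    pc = 0
    handler_pcs = []  # active handler entry points, innermost last
    log = []
    while True:
        try:  # the one and only "nlr_push" for this function
            while pc < len(ops):
                kind, arg = ops[pc]
                pc += 1
                if kind == "setup_except":
                    handler_pcs.append(arg)
                elif kind == "pop_except":
                    handler_pcs.pop()
                elif kind == "call":
                    arg()
                elif kind == "log":
                    log.append(arg)
            return log
        except Exception as exc:
            if not handler_pcs:
                raise  # no handler left: propagate to the caller
            # State-machine step: jump to the innermost active handler.
            pc = handler_pcs.pop()
            log.append("caught " + type(exc).__name__)


def raise_value_error():
    raise ValueError


def raise_key_error():
    raise KeyError


NESTED_TRY = [
    ("setup_except", 6),          # 0: outer try, handler at 6
    ("setup_except", 5),          # 1: inner try, handler at 5
    ("call", raise_value_error),  # 2: raises inside the inner try
    ("pop_except", None),         # 3: skipped
    ("log", "unreachable"),       # 4: skipped
    ("call", raise_key_error),    # 5: inner handler raises again
    ("log", "outer handler"),     # 6: outer handler
]
```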
The patch also fixes an existing bug where, if local variables were written
to within an exception handler, their values were incorrectly restored when
an exception was raised (since the nlr_jump would restore register values
back to the point of the nlr_push).
It also gets nested try-finally+with working with the viper emitter.
Broadly speaking, efficiency of executing native code that doesn't use
any exception blocks is unchanged, and emitted code size is only slightly
increased for such functions. C stack usage of all native functions is
either equal to or less than before. Emitted code size for native functions
that use exception blocks is increased by roughly 10% (due in part to
fixing of above-mentioned bugs).
But, most importantly, this patch makes it possible to implement more Python
features in native code, like unwind jumps and yielding from within nested
exception blocks.
Without this patch, on 64-bit architectures the "1 << (small_int_bits - 1)"
is computed using only 32-bit values (the literal 1 is an int, and
small_int_bits, being a uint8_t, is promoted to int) and so will overflow
(and give the wrong result) if small_int_bits is larger than 32.
Before this patch the context manager's __aexit__() method would not be
executed if a return/break/continue statement was used to exit an async
with block. async with now has the same semantics as a normal with.
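The fixed behaviour matches what CPython does in the example below. The example is plain Python run with asyncio, and ACtx and the helper functions are illustrative names. __aexit__ runs whenever control leaves an async with block through continue, break or return.

```python
import asyncio


class ACtx:
    def __init__(self, log):
        self.log = log

    async def __aenter__(self):
        self.log.append("aenter")
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.log.append("aexit")
        return False


async def jumps_out_of_async_with(log):
    for i in range(2):
        async with ACtx(log):
            if i == 0:
                continue  # __aexit__ must still run
            break  # __aexit__ must still run
    async with ACtx(log):
        return "done"  # __aexit__ must still run


def run_async_with_demo():
    log = []
    result = asyncio.run(jumps_out_of_async_with(log))
    return result, log
```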
The fix here applies purely to the compiler, and does not modify the
runtime at all. It might (eventually) be better to define new bytecode(s)
to handle async with (and maybe other async constructs) in a cleaner, more
efficient way.
One minor drawback of addressing this issue purely in the compiler is that
it wasn't possible to get 100% CPython semantics. The difference from
CPython is that the __aexit__ method is not looked up in the context
manager until it is needed, which is after the body of the async with
statement has executed. So if a context manager doesn't have __aexit__
then CPython raises an exception before the body is executed, whereas uPy
raises it afterwards. Note that __aenter__ is looked up at the beginning in
uPy because it needs to be called straight away, so if the context manager
isn't a context manager at all then it'll still raise an exception at the
same location as CPython. The only case where behaviour differs is when the
context manager has the __aenter__ method but not the __aexit__ method. But
this is a very minor, and acceptable, difference.
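The lookup-order difference can be reproduced in CPython by comparing a real async with against a hand-written expansion. The expansion looks up __aexit__ late. It is a hypothetical approximation of the behaviour described above, not the code the uPy compiler emits. CPython raises AttributeError or TypeError here, depending on version, so both are caught.

```python
import asyncio


class OnlyAenter:
    """Has __aenter__ but no __aexit__."""

    async def __aenter__(self):
        return self


async def cpython_async_with(log):
    try:
        async with OnlyAenter():
            log.append("body")
    except (AttributeError, TypeError):  # exception type varies by CPython version
        log.append("error")


async def late_aexit_lookup(mgr, log):
    # Hypothetical hand expansion: await __aenter__ first, run the body,
    # and only look up __aexit__ when it is needed.
    await type(mgr).__aenter__(mgr)
    exited = False
    try:
        log.append("body")
    except BaseException as exc:
        exited = True
        if not await type(mgr).__aexit__(mgr, type(exc), exc, exc.__traceback__):
            raise
    finally:
        # Also reached on return/break/continue out of the body.
        if not exited:
            await type(mgr).__aexit__(mgr, None, None, None)


def compare():
    cpy, late = [], []
    asyncio.run(cpython_async_with(cpy))
    try:
        asyncio.run(late_aexit_lookup(OnlyAenter(), late))
    except AttributeError:
        late.append("error")
    return cpy, late
```

compare() returns (["error"], ["body", "error"]). CPython fails before the body runs, while the late-lookup expansion runs the body first and only then fails.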