By adding __builtin_unreachable() at the end of nlr_push, we're
essentially telling the compiler that this function will never return.
When GCC LTO is in use, this means that any time nlr_push() is called
(which is often), the compiler thinks this function will never return
and thus eliminates all code following the call.
Note: I've added a 'return 0' for older GCC versions like 4.6 which
complain about not returning anything (which doesn't make sense in a
naked function). Newer GCC versions (tested 4.8, 5.4 and some others)
don't complain about this.
This constant exception instance was once used by m_malloc_fail() to raise
a MemoryError without allocating memory, but it was made obsolete long ago
by 3556e45711. The functionality is now
replaced by the use of mp_emergency_exception_obj which lives in the global
uPy state, and which can handle any exception type, not just MemoryError.
This feature is not often used so is guarded by the config option
MICROPY_PY_BUILTINS_RANGE_BINOP which is disabled by default. With this
option disabled MicroPython will always return false when comparing two
range objects for equality (unless they are exactly the same object
instance). This does not match CPython so if (in)equality between range
objects is needed then this option should be enabled.
Enabling this option costs between 100 and 200 bytes of code space
depending on the machine architecture.
This patch provides inline versions of the utf8 helper functions for the
case when unicode is disabled (MICROPY_PY_BUILTINS_STR_UNICODE set to 0).
This saves code size.
The unichar_charlen function is also renamed to utf8_charlen to match the
other utf8 helper functions, and the signature of this function is adjusted
for consistency (const char* -> const byte*, mp_uint_t -> size_t).
Prior to this patch, a float literal that was close to subnormal would
have a loss of precision when parsed. The worst case was something like
float('10000000000000000000e-326') which returned 0.0.
This patch simplifies how sentinel values are stored on the stack when
doing an unwind return or jump. Instead of storing two values on the stack
for an unwind jump it now stores only one: a negative small integer means
unwind-return and a non-negative small integer means unwind-jump with the
value being the number of exceptions to unwind. The savings in code size
are:
bare-arm: -56
minimal x86: -68
unix x64: -80
unix nanbox: -4
stm32: -56
cc3200: -64
esp8266: -76
esp32: -156
The array should be of type unsigned byte because that is the type of the
values being stored. And changing to uint8_t helps to prevent warnings
from some static analysers.
Note that the check for elem!=NULL is removed for the
MP_MAP_LOOKUP_ADD_IF_NOT_FOUND case because mp_map_lookup will always
return non-NULL for such a case.
This patch combines the compiler optimisation code for double and triple
tuple-to-tuple assignment, taking it from two separate if-blocks to one
combined if-block. This can be done because the code for both of these
optimisations has a lot in common. Combining them together reduces code
size for ports that have the triple-tuple optimisation enabled (and doesn't
change code size for ports that have it disabled).
The number of registers used should be 10, not 12, to match the assembly
code in nlrx64.c. With this change the 64bit mingw builds don't need to
use the setjmp implementation, and this fixes miscellaneous crashes and
assertion failures as reported in #1751 for instance.
To avoid mistakes in the future where something gcc-related for Windows
only gets fixed for one particular compiler/environment combination,
make use of a MICROPY_NLR_OS_WINDOWS macro.
To make sure everything nlr-related is now ok when built with gcc this
has been verified with:
- unix port built with gcc on Cygwin (i686-pc-cygwin-gcc and
x86_64-pc-cygwin-gcc, version 6.4.0)
- windows port built with mingw-w64's gcc from Cygwin
(i686-w64-mingw32-gcc and x86_64-w64-mingw32-gcc, version 6.4.0)
and MSYS2 (like the ones on Cygwin but version 7.2.0)
There are two checks that are always false so can be converted to (negated)
assertions to save code space and execution time. They are:
1. The check of the str parameter, which is required to be non-NULL as per
the original comment that it has enough space in it as calculated by
mp_int_format_size. And for all uses of this function str is indeed
non-NULL.
2. The check of the base parameter, which is already required to be between
2 and 16 (inclusive) via the assertion in mp_int_format_size.
The motivation behind this patch is to remove unreachable code in mpn_div.
This unreachable code was added some time ago in
9a21d2e070, when a loop in mpn_div was copied
and adjusted to work when mpz_dig_t was exactly half of the size of
mpz_dbl_dig_t (a common case). The loop was copied correctly but it wasn't
noticed at the time that the final part of the calculation of num-quo*den
could be optimised, and hence unreachable code was left for a case that
never occurred.
The observation for the optimisation is that the initial value of quo in
mpn_div is either exact or too large (never too small), and therefore the
subtraction of quo*den from num may subtract exactly enough or too much
(but never too little). Using this observation the part of the algorithm
that handles the borrow value can be simplified, and most importantly this
eliminates the unreachable code.
The new code has been tested with DIG_SIZE=3 and DIG_SIZE=4 by dividing all
possible combinations of non-negative integers with between 0 and 3
(inclusive) mpz digits.
Empty __VA_ARGS__ are not allowed in the C preprocessor so adjust the rule
arg offset calculation to not use them. Also, some compilers (eg MSVC)
require an extra layer of macro expansion.
This is the sixth and final patch in a series of patches to the parser that
aims to reduce code size by compressing the data corresponding to the rules
of the grammar.
Prior to this set of patches the rules were stored as rule_t structs with
rule_id, act and arg members. And then there was a big table of pointers
which allowed to lookup the address of a rule_t struct given the id of that
rule.
The changes that have been made are:
- Breaking up of the rule_t struct into individual components, with each
component in a separate array.
- Removal of the rule_id part of the struct because it's not needed.
- Put all the rule arg data in a big array.
- Change the table of pointers to rules to a table of offsets within the
array of rule arg data.
The last point is what is done in this patch here and brings about the
biggest decreases in code size, because an array of pointers is now an
array of bytes.
Code size changes for the six patches combined is:
bare-arm: -644
minimal x86: -1856
unix x64: -5408
unix nanbox: -2080
stm32: -720
esp8266: -812
cc3200: -712
For the change in parser performance: it was measured on pyboard that these
six patches combined gave an increase in script parse time of about 0.4%.
This is due to the slightly more complicated way of looking up the data for
a rule (since the 9th bit of the offset into the rule arg data table is
calculated with an if statement). This is an acceptable increase in parse
time considering that parsing is only done once per script (if compiled on
the target).
Instead of each rule being stored in ROM as a struct with rule_id, act and
arg, the act and arg parts are now in separate arrays and the rule_id part
is removed because it's not needed. This reduces code size, by roughly one
byte per grammar rule, around 150 bytes.
The rule name is only used for debugging, and this patch makes things a bit
cleaner by completely separating out the rule name from the rest of the
rule data.
Each NLR implementation (Thumb, x86, x64, xtensa, setjmp) duplicates a lot
of the NLR code, specifically that dealing with pushing and popping the NLR
pointer to maintain the linked-list of NLR buffers. This patch factors all
of that code out of the specific implementations into generic functions in
nlr.c, along with a helper macro in nlr.h. This eliminates duplicated
code.