With unicode enabled, this patch allows reading a fixed number of
characters from text-mode streams; eg file.read(5) will read 5 unicode
chars, which can made of more than 5 bytes.
For an ASCII stream (ie no chars > 127) it only needs to do 1 read. If
there are lots of non-ASCII chars in a stream, then it needs multiple
reads of the underlying object.
Adds a new test for this case. Enables unicode support by default on
unix and stmhal ports.
This script uses expected test results as generated by run-tests --write-exp,
and requires only standard unix shell funtionality (no bash). It is useful
to run testsuite on embedded systems, where there's no CPython and Bash.
Both "bound" (like, length known) and "unbound" (length unknown) are tested.
All of list, tuple, bytes, bytesarray offer approximately the same
performance, with "unbound" case being 30 times slower.
This will allow roughly the same behavior as Python3 for non-ASCII strings,
for example, print("<phrase in non-Latin script>".split()) will print list
of words, not weird hex dump (like Python2 behaves). (Of course, that it
will print list of words, if there're "words" in that phrase at all, separated
by ASCII-compatible whitespace; that surely won't apply to every human
language in existence).