* Speed-up "x in y" where x has more than one character.
The existing code made excessive calls to the expensive memcmp() function.
The new code uses memchr() to rapidly find a start point for memcmp().
In addition to knowing that the first character is a match, the new code
also checks that the last character is a match. This significantly reduces
the incidence of false starts (saving memcmp() calls and making quadratic
behavior less likely).
Improves the timings on:
python -m timeit -r7 -s"x='a'*1000" "'ab' in x"
python -m timeit -r7 -s"x='a'*1000" "'bc' in x"
Once this code has proven itself, then string_find_internal() should refer
to it rather than running its own version. Also, something similar may
apply to unicode objects.
(Patch contributed by Nick Coghlan.)
Now joining string subtypes will always return a string.
Formerly, if there were only one item, it was returned unchanged.
hack: it would resize *interned* strings in-place! This occurred because
their reference counts do not have their expected value -- stringobject.c
hacks them. Mea culpa.
interning were not clear here -- a subclass could be mutable, for
example -- and had bugs. Explicitly interning a subclass of string
via intern() will raise a TypeError. Internal operations that attempt
to intern a string subclass will have no effect.
Added a few tests to test_builtin that includes the old buggy code and
verifies that calls like PyObject_SetAttr() don't fail. Perhaps these
tests should have gone in test_string.
bit by checking the value of UCHAR_MAX in Include/Python.h. There was a
check in Objects/stringobject.c. Remove that. (Note that we don't define
UCHAR_MAX if it's not defined as the old test did.)
and left shifts. (Thanks to Kalle Svensson for SF patch 849227.)
This addresses most of the remaining semantic changes promised by
PEP 237, except for repr() of a long, which still shows the trailing
'L'. The PEP appears to promise warnings for operations that
changed semantics compared to Python 2.3, but this is not
implemented; we've suffered through enough warnings related to
hex/oct literals and I think it's best to be silent now.
* Doc - add doc for when functions were added
* UserString
* string object methods
* string module functions
'chars' is used for the last parameter everywhere.
These changes will be backported, since part of the changes
have already been made, but they were inconsistent.
types. The special handling for these can now be removed from save_newobj().
Add some testing for this.
Also add support for setting the 'fast' flag on the Python Pickler class,
which suppresses use of the memo.