60 Commits

Author SHA1 Message Date
Marc-André Lemburg
040f76b79c Slightly revised version of patch #1538956:
Replace UnicodeDecodeErrors raised during == and !=
compares of Unicode and other objects with a new
UnicodeWarning.

All other comparisons continue to raise exceptions.
Exceptions other than UnicodeDecodeErrors are also left
untouched.
2006-08-14 10:55:19 +00:00
Martin v. Löwis
d825143be1 Patch #1455898: Incremental mode for "mbcs" codec. 2006-06-14 05:21:04 +00:00
Martin v. Löwis
3f767795f6 Patch #1359618: Speed-up charmap encoder. 2006-06-04 19:36:28 +00:00
Fredrik Lundh
80f8e80c15 needforspeed: added Py_MEMCPY macro (currently tuned for Visual C only),
and use it for string copy operations.  this gives a 20% speedup on some
string benchmarks.
2006-05-28 12:06:46 +00:00
Fredrik Lundh
b3167cbcd7 needforspeed: added rpartition implementation 2006-05-26 18:15:38 +00:00
Fredrik Lundh
06a69dd8ff needforspeed: partition implementation, part two.
feel free to improve the documentation and the docstrings.
2006-05-26 08:54:28 +00:00
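The partition method from this commit, and the rpartition method from the commit listed above it, are still available on Python 3 str objects. A minimal sketch of their behaviour, assuming a Python 3 interpreter (the sample strings are made up for illustration):

```python
# partition() splits at the first occurrence of the separator,
# rpartition() at the last one; both always return a 3-tuple.
path = "pkg.module.attr"  # hypothetical sample input

head, sep, tail = path.partition(".")
rhead, rsep, rtail = path.rpartition(".")

# When the separator is missing, partition() puts the whole string first
# and rpartition() puts it last; the other two items are empty strings.
missing = "nodots".partition(".")
rmissing = "nodots".rpartition(".")
```

Because the result always has three items, it can be unpacked without first checking whether the separator occurs.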
Fredrik Lundh
3d885e0195 needforspeed: check first *and* last character before doing a full memcmp 2006-05-23 10:10:57 +00:00
Fredrik Lundh
8a8e05a2b9 needforspeed: use memcpy for "long" strings; use a better algorithm
for long repeats.
2006-05-22 17:12:58 +00:00
Fredrik Lundh
f1d60a5384 needforspeed: speed up unicode repeat, unicode string copy 2006-05-22 16:29:30 +00:00
Martin v. Löwis
18e165558b Merge ssize_t branch. 2006-02-15 17:27:45 +00:00
Tim Peters
2576c97f52 _PyUnicode_IsWhitespace(),
_PyUnicode_IsLinebreak():
Changed the declarations to match the definitions.

Don't know why they differed; MSVC warned about it;
don't know why only these two functions use "const".
Someone who does may want to do something saner ;-).
2005-10-29 02:33:18 +00:00
Walter Dörwald
a47d1c08d0 SF bug #1251300: On UCS-4 builds the "unicode-internal" codec will now complain
about illegal code points. The codec now supports PEP 293 style error handlers.
(This is a variant of Nik Haldimann's patch that detects truncated data.)
2005-08-30 10:23:14 +00:00
Marc-André Lemburg
a9cadcd41b Correct the handling of 0-termination of PyUnicode_AsWideChar()
and its usage in PyLocale_strcoll().

Clarify the documentation on this.

Thanks to Andreas Degert for pointing this out.
2004-11-22 13:02:31 +00:00
Raymond Hettinger
57341c37c9 SF patch #1056231: typo in comment (unicodeobject.h) 2004-10-31 05:46:59 +00:00
Walter Dörwald
69652035bc SF patch #998993: The UTF-8 and the UTF-16 stateful decoders now support
decoding incomplete input (when the input stream is temporarily exhausted).
codecs.StreamReader now implements buffering, which enables proper
readline support for the UTF-16 decoders. codecs.StreamReader.read()
has a new argument chars which specifies the number of characters to
return. codecs.StreamReader.readline() and codecs.StreamReader.readlines()
have a new argument keepends. Trailing "\n"s will be stripped from the lines
if keepends is false. Added C APIs PyUnicode_DecodeUTF8Stateful and
PyUnicode_DecodeUTF16Stateful.
2004-09-07 20:24:22 +00:00
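The behaviour described in this commit can be sketched from the Python side. The example below assumes a Python 3 interpreter and uses codecs.getincrementaldecoder, a later Python-level API, as a stand-in for the stateful C decoders; it is not the code from the patch itself, and the sample strings are invented.

```python
import codecs
import io

# Feed UTF-8 data one byte at a time. The two-byte sequence for "é"
# (0xC3 0xA9) is split across calls; the decoder buffers the first byte
# instead of raising an error on the incomplete input.
decoder = codecs.getincrementaldecoder("utf-8")()
pieces = [decoder.decode(bytes([b])) for b in "é!".encode("utf-8")]

# StreamReader.readline() with keepends=False strips the line ending.
reader = codecs.getreader("utf-8")(io.BytesIO("first\nsecond\n".encode("utf-8")))
line = reader.readline(keepends=False)

# StreamReader.read() with chars limits the number of characters returned.
reader2 = codecs.getreader("utf-8")(io.BytesIO("héllo".encode("utf-8")))
two_chars = reader2.read(chars=2)
```

The first decode call yields an empty string because the character is not yet complete; the second call completes it.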
Hye-Shik Chang
e9ddfbb412 SF #989185: Drop unicode.iswide() and unicode.width() and add
unicodedata.east_asian_width().  You can still implement your own
simple width() function using it like this:
    def width(u):
        w = 0
        for c in unicodedata.normalize('NFC', u):
            cwidth = unicodedata.east_asian_width(c)
            if cwidth in ('W', 'F'): w += 2
            else: w += 1
        return w
2004-08-04 07:38:35 +00:00
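The width() helper quoted in this commit message omits its import and targets Python 2. A self-contained variant, assuming a Python 3 interpreter (the parameter name and docstring are additions made here, not part of the commit):

```python
import unicodedata


def width(text):
    """Count columns for text, treating wide/fullwidth characters as 2."""
    total = 0
    for ch in unicodedata.normalize("NFC", text):
        # 'W' (wide) and 'F' (fullwidth) characters occupy two columns;
        # every other character is counted as one, as in the commit message.
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total
```

Like the original, this counts every remaining character as one column, so combining marks that survive normalization are not treated specially.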
Marc-André Lemburg
d2d4598ec2 Allow string and unicode return types from .encode()/.decode()
methods on string and unicode objects. Added unicode.decode()
which was missing for no apparent reason.
2004-07-08 17:57:32 +00:00
Hye-Shik Chang
974ed7cfa5 - SF #962502: Add two more methods for unicode type: width() and
iswide() for east asian width manipulation. (Inspired by David
Goodger, Reviewed by Martin v. Loewis)
- Move _PyUnicode_TypeRecord.flags to the end of the struct so that
no padding is added for UCS-4 builds. (Suggested by Martin v. Loewis)
2004-06-02 16:49:17 +00:00
Hye-Shik Chang
3ae811b57d Add rsplit method for str and unicode builtin types.
SF feature request #801847.
Original patch was written by Sean Reifschneider.
2003-12-15 18:49:53 +00:00
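A short illustration of how rsplit differs from split when a maximum number of splits is given, assuming a Python 3 interpreter (the sample string is invented):

```python
record = "usr/local/lib/python"  # hypothetical sample input

# rsplit() counts its maxsplit from the right-hand end of the string.
parent, leaf = record.rsplit("/", 1)

# split() with the same limit cuts at the leftmost separator instead.
from_left = record.split("/", 1)

# Without a limit, both methods produce the same list.
no_limit = record.rsplit("/")
```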
Marc-André Lemburg
9c329de47e Add name mangling for new PyUnicode_FromOrdinal() and fix declaration
to use new extern macro.
2002-08-12 08:19:10 +00:00
Mark Hammond
91a681debf Excise DL_EXPORT from Include.
Thanks to Skip Montanaro and Kalle Svensson for the patches.
2002-08-12 07:21:58 +00:00
Marc-André Lemburg
cc8764ca9d Add C API PyUnicode_FromOrdinal() which exposes unichr() at C level.
u'%c' will now raise a ValueError in case the argument is an
integer outside the valid range of Unicode code point ordinals.

Closes SF bug #593581.
2002-08-11 12:23:04 +00:00
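The range check described here can be observed from Python. The sketch below assumes a Python 3 interpreter, where chr() takes the place of unichr(); safe_chr is a hypothetical helper introduced only for this illustration, and the u'%c' formatting behaviour from the commit is not reproduced.

```python
def safe_chr(ordinal):
    """Return the character for ordinal, or None if chr() rejects it."""
    try:
        return chr(ordinal)
    except ValueError:
        # chr() raises ValueError for ordinals outside 0..0x10FFFF.
        return None
```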
Marc-André Lemburg
4da6fd63bc Fix for bug [ 561796 ] string.find causes lazy error 2002-05-29 11:33:13 +00:00
Walter Dörwald
de02bcb265 Apply patch diff.txt from SF feature request
http://www.python.org/sf/444708

This adds the optional argument for str.strip
to unicode.strip too and makes it possible
to call str.strip with a unicode argument
and unicode.strip with a str argument.
2002-04-22 17:42:37 +00:00
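The optional strip argument can be sketched as follows, assuming a Python 3 interpreter. Python 3 has a single text type, so the str/unicode mixing described in the commit message is not shown; the sample strings are invented.

```python
raw = "--==title==--"  # hypothetical sample input

# The argument is a set of characters to remove, not a prefix or suffix.
stripped = raw.strip("-=")

# lstrip() accepts the same argument but only trims the left-hand side.
left_only = raw.lstrip("-=")

# With no argument, surrounding whitespace is removed.
default = "  padded \n".strip()
```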
Guido van Rossum
b8c65bc27f SF patch #470578: Fixes to synchronize unicode() and str()
This patch implements what we have discussed on python-dev late in
    September: str(obj) and unicode(obj) should behave similarly, while
    the old behaviour is retained for unicode(obj, encoding, errors).

    The patch also adds a new feature with which objects can provide
    unicode(obj) with input data: the __unicode__ method. Currently no
    new tp_unicode slot is implemented; this is left as an option for the
    future.

    Note that PyUnicode_FromEncodedObject() no longer accepts Unicode
    objects as input. The API name already suggests that Unicode
    objects do not belong in the list of acceptable objects and the
    functionality was only needed because
    PyUnicode_FromEncodedObject() was being used directly by
    unicode(). The latter was changed in the discussed way:

    * unicode(obj) calls PyObject_Unicode()
    * unicode(obj, encoding, errors) calls PyUnicode_FromEncodedObject()

    One thing left open to discussion is whether to leave the
    PyUnicode_FromObject() API as a thin API extension on top of
    PyUnicode_FromEncodedObject() or to turn it into a (macro) alias
    for PyObject_Unicode() and deprecate it. Doing so would have some
    surprising consequences though, e.g.  u"abc" + 123 would turn out
    as u"abc123"...

[Marc-Andre didn't have time to check this in before the deadline.  I
hope this is OK, Marc-Andre!  You can still make changes and commit
them on the trunk after the branch has been made, but then please mail
Barry a context diff if you want the change to be merged into the
2.2b1 release branch.  GvR]
2001-10-19 02:01:31 +00:00