
87 Commits

Author SHA1 Message Date
Victor Stinner
d1f9942ae3 Issue #18408: Fix cjkcodecs decoders, add a new MBERR_EXCEPTION constant to
notify exceptions raised by the _PyUnicodeWriter API
2013-07-16 21:41:43 +02:00
Victor Stinner
33283ba300 Issue #18408: Fix CJK decoders, raise MemoryError on memory allocation failure 2013-07-15 17:47:39 +02:00
Victor Stinner
064bbdc79b fix indentation 2013-07-08 22:28:27 +02:00
Victor Stinner
8f674ccd64 Close #17694: Add minimum length to _PyUnicodeWriter
 * Also add a min_char attribute to the _PyUnicodeWriter structure (currently unused)
 * _PyUnicodeWriter_Init() no longer takes any argument except the writer itself:
   min_length and overallocate must be set explicitly
 * In error handlers, only enable overallocation if the replacement string
   is longer than 1 character
 * CJK decoders don't use overallocation anymore
 * Set min_length, instead of preallocating memory using
   _PyUnicodeWriter_Prepare(), in many decoders
 * _PyUnicode_DecodeUnicodeInternal() checks for integer overflow
2013-04-17 23:02:17 +02:00
Victor Stinner
322cc7438c Issue #17693: Fix memory/reference leaks 2013-04-14 18:11:41 +02:00
Victor Stinner
d949126995 Issue #17693: CJK encoders now use the new Unicode API (PEP 393) 2013-04-14 02:06:32 +02:00
Victor Stinner
a0dd0213cc Close #17693: Rewrite CJK decoders to use the _PyUnicodeWriter API instead of
the legacy Py_UNICODE API.

Also add a new _PyUnicodeWriter_WriteChar() function.
2013-04-11 22:09:04 +02:00
Benjamin Peterson
26e5335a46 merge 3.3 (#16585) 2012-12-02 11:21:02 -05:00
Benjamin Peterson
47a00f3d1a support encoding error handlers that return bytes (closes #16585) 2012-12-02 11:20:28 -05:00
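Commit 47a00f3d1a lets an encoding error handler return bytes instead of str. A minimal sketch, assuming Python 3.3 or later and, as the surrounding entries suggest, that the change applies to the CJK codecs. The handler function `bytes_replace` and the name `hypothetical.bytes_replace` are illustrative only; the sketch uses `codecs.register_error` and relies on U+AC00 (a Hangul syllable) having no GB2312 mapping.

```python
import codecs


def bytes_replace(exc):
    """Hypothetical error handler: substitute raw bytes for unencodable text."""
    if isinstance(exc, UnicodeEncodeError):
        # Return a bytes replacement and the index where encoding resumes.
        return (b"??", exc.end)
    raise exc


# "hypothetical.bytes_replace" is an arbitrary name chosen for this sketch.
codecs.register_error("hypothetical.bytes_replace", bytes_replace)

# U+AC00 cannot be encoded in GB2312, so the handler is invoked for it and
# its bytes are copied into the output as-is.
encoded = "a\uac00b".encode("gb2312", "hypothetical.bytes_replace")
print(encoded)  # b'a??b'
```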
Benjamin Peterson
3d490d4eff merge 3.3 2012-12-02 10:53:48 -05:00
Benjamin Peterson
aff472394c unicode -> str 2012-12-02 10:53:41 -05:00
Victor Stinner
b9e2d3f884 Issue #16330: Fix compilation on Windows 2012-10-30 02:30:31 +01:00
Victor Stinner
76df43de30 Issue #16330: Use surrogate-related macros
Patch written by Serhiy Storchaka.
2012-10-30 01:42:39 +01:00
Victor Stinner
b37b17423b Replace PyUnicode_FromUnicode(NULL, 0) by PyUnicode_New(0, 0)
Create an empty string with the new Unicode API.
2011-12-01 03:18:59 +01:00
Victor Stinner
08b523a194 MultibyteCodec_Decode() catches PyUnicode_AS_UNICODE() failures 2011-12-01 03:18:30 +01:00
Victor Stinner
4eea849469 CJK codecs check for conversion to Py_UNICODE* failures 2011-11-21 03:01:27 +01:00
Victor Stinner
9a80faba88 MultibyteCodec_Encode() checks if PyUnicode_AS_UNICODE() failed 2011-11-21 02:50:14 +01:00
Martin v. Löwis
bd928fef42 Rename _Py_identifier to _Py_IDENTIFIER. 2011-10-14 10:20:37 +02:00
Martin v. Löwis
afe55bba33 Add API for static strings, primarily good for identifiers.
Thanks to Konrad Schöbel and Jasper Schulz for helping with the mass-editing.
2011-10-09 10:38:36 +02:00
Victor Stinner
2cded9c3f3 Issue #12016: Multibyte CJK decoders now resynchronize faster
They only ignore the first byte of an invalid byte sequence.

For example, b'\xff\n'.decode('gb2312', 'replace') gives '\ufffd\n' instead of
'\ufffd'.
2011-07-08 01:45:13 +02:00
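The resynchronization behaviour described in commit 2cded9c3f3 can be observed from Python. A minimal sketch, assuming a Python 3 release that includes this change; it uses only the built-in `bytes.decode` and `UnicodeDecodeError`.

```python
# 0xFF is not a valid GB2312 lead byte. The decoder skips only that byte
# and resumes at b'\n' instead of discarding both bytes.
decoded = b"\xff\n".decode("gb2312", "replace")
print(ascii(decoded))  # '\ufffd\n'

# With the default "strict" handler, only the first byte is reported as invalid.
try:
    b"\xff\n".decode("gb2312")
except UnicodeDecodeError as exc:
    bad_range = (exc.start, exc.end)
    print(bad_range)  # (0, 1)
```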
Victor Stinner
2bc6c2ec2f (Merge 3.2) Issue #12016: Reindent decoders of HK and JP codecs 2011-06-03 23:34:32 +02:00
Victor Stinner
5dfe3bb2d9 Issue #12016: Reindent decoders of HK and JP codecs 2011-06-03 23:34:09 +02:00
Victor Stinner
e15dce3d18 Close #12171: IncrementalEncoder.reset() of CJK codecs (multibytecodec) calls
encreset() instead of decreset().
2011-05-30 22:56:00 +02:00
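The `reset()` fix in commit e15dce3d18 matters for stateful CJK encodings such as ISO-2022-JP. A minimal sketch, assuming a Python 3 release that includes both this fix and the #12100 change below; it uses `codecs.getincrementalencoder` and relies on the ISO-2022-JP escape sequence ESC $ B, which selects JIS X 0208.

```python
import codecs

encoder = codecs.getincrementalencoder("iso2022_jp")()

# The first call selects JIS X 0208 with ESC $ B and leaves it selected.
first = encoder.encode("\u3042")  # HIRAGANA LETTER A

# reset() now resets the *encoder* state (encreset), so the next call
# has to emit the ESC $ B designation again.
encoder.reset()
second = encoder.encode("\u3042")

print(first == second == b'\x1b$B$"')  # True
```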
Victor Stinner
eb734f77ad (Merge 3.2) Issue #12100: Don't reset incremental encoders of CJK codecs at
each call to their encode() method anymore, but continue to call the reset()
method if the final argument is True.
2011-05-24 22:24:11 +02:00
Victor Stinner
6bcbef7da0 Issue #12100: Don't reset incremental encoders of CJK codecs at each call to
their encode() method anymore, but continue to call the reset() method if the
final argument is True.
2011-05-24 22:17:55 +02:00
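The incremental-encoder behaviour from Issue #12100 can be sketched with ISO-2022-JP, a stateful CJK encoding. A minimal example, assuming a Python 3 release that includes the change; it uses `codecs.getincrementalencoder` and the `final` argument of `IncrementalEncoder.encode()`.

```python
import codecs

encoder = codecs.getincrementalencoder("iso2022_jp")()

# Non-final calls keep the shift state: only the first one emits ESC $ B.
chunks = [encoder.encode("\u3042"), encoder.encode("\u3044")]

# final=True flushes and resets, switching back to ASCII with ESC ( B.
chunks.append(encoder.encode("", final=True))

print(chunks)  # [b'\x1b$B$"', b'$$', b'\x1b(B']
```

Because the escape sequences are emitted only at real state changes, the concatenated chunks match what a single one-shot `"\u3042\u3044".encode("iso2022_jp")` call would produce.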