57 Commits

Author SHA1 Message Date
Georg Brandl
2a5a3027f2 Fix codecs.EncodedFile which did not use file_encoding in 2.5.0, and
fix all codecs file wrappers to work correctly with the "with"
statement (bug #1586513).
 (backport from rev. 52517)
2006-10-29 08:39:27 +00:00
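A minimal sketch of the wrapper this commit touches. It is written against the present-day Python 3 `codecs` module, so the 2.5-era str/unicode details differ. Data written as UTF-8 is re-encoded to the file encoding (Latin-1 here), and leaving the `with` block closes the wrapped stream. The in-memory `io.BytesIO` object stands in for a real file and is purely illustrative.

```python
import codecs
import io

buf = io.BytesIO()

# Callers hand over UTF-8 bytes; the underlying "file" should hold Latin-1.
with codecs.EncodedFile(buf, data_encoding="utf-8", file_encoding="latin-1") as f:
    f.write("café".encode("utf-8"))
    # Grab the raw bytes before the with block closes the stream.
    written = buf.getvalue()

print(written)     # b'caf\xe9'
print(buf.closed)  # True: exiting the with block closed the wrapped stream
```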
Walter Dörwald
78a0be6ab3 Add a BufferedIncrementalEncoder class that can be used for implementing
an incremental encoder that must retain part of the data between calls
to the encode() method.

Fix the incremental encoder and decoder for the IDNA encoding.

This closes SF patch #1453235.
2006-04-14 18:25:39 +00:00
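In the Python 3 `codecs` module, a `BufferedIncrementalEncoder` subclass fills in `_buffer_encode(input, errors, final)`. This hook returns the encoded output together with the number of characters consumed. The base class then keeps the unconsumed tail in `self.buffer` for the next `encode()` call. The sketch below uses a hypothetical `CompleteLineEncoder` that only emits whole lines until `final=True`. It illustrates the buffering pattern and is not the IDNA encoder from this commit.

```python
import codecs


class CompleteLineEncoder(codecs.BufferedIncrementalEncoder):
    """Hypothetical encoder: emits UTF-8 only for complete lines."""

    def _buffer_encode(self, input, errors, final):
        # On the final call consume everything; otherwise stop after the
        # last "\n" so a partial line stays in self.buffer.
        end = len(input) if final else input.rfind("\n") + 1
        return input[:end].encode("utf-8", errors), end


enc = CompleteLineEncoder()
first = enc.encode("ab")           # b'' ("ab" is buffered)
second = enc.encode("c\nd")        # b'abc\n' ("d" is buffered)
last = enc.encode("", final=True)  # b'd'
```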
Walter Dörwald
b17f12bbc6 Fix wrong attribute name. 2006-04-14 15:40:54 +00:00
Walter Dörwald
6a7ec7c3e2 Change raise statement to PEP 8 style. 2006-03-18 16:35:17 +00:00
Neal Norwitz
6bed1c1fab Add some versionadded info to new incremental codec docs and fix doco nits. 2006-03-16 07:49:19 +00:00
Walter Dörwald
abb02e5994 Patch #1436130: codecs.lookup() now returns a CodecInfo object (a subclass
of tuple) that provides incremental decoders and encoders (a way to use
stateful codecs without the stream API). Functions
codecs.getincrementaldecoder() and codecs.getincrementalencoder() have
been added.
2006-03-15 11:35:15 +00:00
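A short sketch of the API this patch introduced, as it still exists in Python 3. `codecs.lookup()` returns a `CodecInfo` that unpacks like the old 4-tuple but also has named attributes. The incremental decoder keeps state between calls, so a multi-byte character split across chunks is decoded correctly.

```python
import codecs

info = codecs.lookup("utf-8")
# CodecInfo is still a tuple, so code that unpacks it keeps working...
encode, decode, reader, writer = info[:4]
# ...while newer code can use the named attributes.
print(isinstance(info, tuple), info.name)  # True utf-8

# The three-byte UTF-8 sequence for "€" arrives split across two chunks.
decoder = codecs.getincrementaldecoder("utf-8")()
part1 = decoder.decode(b"\xe2\x82")          # '' (incomplete bytes are buffered)
part2 = decoder.decode(b"\xac", final=True)  # '€'

encoder = codecs.getincrementalencoder("utf-8")()
encoded = encoder.encode("€", final=True)    # b'\xe2\x82\xac'
```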
Walter Dörwald
ca199432c2 If size is specified, try to read at least size characters.
This is a alternative version of patch #1379332.
2006-03-06 22:39:12 +00:00
Tim Peters
536cf99536 Whitespace normalization. 2005-12-25 23:18:31 +00:00
Martin v. Löwis
4ed673877d Patch #1268314: Cache lines in StreamReader.readlines for performance.
Will backport to Python 2.4.
2005-09-18 08:34:39 +00:00
Walter Dörwald
c5238b8288 SF bug #1235646: codecs.StreamRecoder.next() now reencodes the data it reads
from the input stream, so that the output is a byte string in the correct
encoding instead of a unicode string.
2005-09-01 11:56:53 +00:00
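A sketch of the fixed iteration behaviour in present-day Python 3, assuming a Latin-1 byte stream wrapped with `codecs.EncodedFile` (which builds a `StreamRecoder`). Each line read from the stream is decoded with the file encoding and re-encoded to the data encoding, so iteration yields bytes in UTF-8 rather than text.

```python
import codecs
import io

# The underlying stream holds Latin-1; callers want UTF-8 bytes.
source = io.BytesIO(b"caf\xe9\nna\xefve\n")
recoder = codecs.EncodedFile(source, data_encoding="utf-8", file_encoding="latin-1")

# Iterating yields lines re-encoded to the data encoding (bytes, not str).
lines = list(recoder)
print(lines)  # [b'caf\xc3\xa9\n', b'na\xc3\xafve\n']
```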
Martin v. Löwis
56066d2e55 Return complete lines from codec stream readers
even if there is an exception in later lines, resulting in
correct line numbers for decoding errors in source code. Fixes #1178484.
Will backport to 2.4.
2005-08-24 07:38:12 +00:00
Walter Dörwald
c9878e1b22 Make attributes and local variables in the StreamReader str objects instead
of unicode objects, so that codecs that do a str->str decoding won't promote
the result to unicode. This fixes SF bug #1241507.
2005-07-20 22:15:39 +00:00
Walter Dörwald
a4eb2d56a4 Fix comment. 2005-04-21 21:42:35 +00:00
Walter Dörwald
bc8e642c1b If the data read from the bytestream in readline() ends in a '\r', read one more
byte, even if the user has passed a size parameter. This extra byte shouldn't
cause a buffer overflow in the tokenizer. The original plan was to return a line
ending in '\r', which might be recognizable as a complete line and skip any '\n'
that was read afterwards. Unfortunately this didn't work, as the tokenizer only
recognizes '\n' as line ends, which in turn led to joined lines and
SyntaxErrors, so this special treatment of a split '\r\n' has been dropped. (It
can only happen with a temporarily exhausted bytestream now anyway.)
Fixes parts of SF bugs #1163244 and #1175396.
2005-04-21 21:32:03 +00:00
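A small sketch of this behaviour using the Python 3 `codecs.StreamReader`, with an in-memory stream as an illustrative stand-in for a source file. The `size=3` argument stops the first read at `"ab\r"`. `readline()` then reads one extra character, so the `"\r\n"` pair is not split across two calls.

```python
import codecs
import io

reader = codecs.getreader("utf-8")(io.BytesIO(b"ab\r\ncd"))

# The first read() returns "ab\r"; one more character ("\n") is read
# so the line is returned with its complete "\r\n" ending.
first = reader.readline(size=3)  # 'ab\r\n'
second = reader.readline()       # 'cd'
```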
Walter Dörwald
714f87821f Fix typos. 2005-04-04 21:42:22 +00:00
Walter Dörwald
7a6dc139de Fix for SF bug #1175396: readline() will now read one more character, if
the last character read is "\r" (and size is None, i.e. we're allowed to
call read() multiple times), so that we can return the correct line ending
(this additional character might be a "\n").

If the stream is temporarily exhausted, we might return the wrong line ending
(if the last character read is "\r" and the next one, available after the byte
stream provides more data, is "\n"), but at least the atcr member ensures that we
get the correct number of lines (i.e. this "\n" will not be treated as
another line ending).
2005-04-04 21:38:47 +00:00
Skip Montanaro
9f5f9d943d typo 2005-03-16 03:51:56 +00:00
Walter Dörwald
71fd90da87 Add default value for "whence" argument. 2005-03-14 19:25:41 +00:00
Walter Dörwald
729c31f5c3 Reset internal buffers when seek() is called. This fixes SF bug #1156259. 2005-03-14 19:06:30 +00:00
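A sketch of why the reset matters, using the Python 3 `codecs.StreamReader` (whose `seek()` still defaults `whence` to 0). After the first `readline()`, the reader holds the rest of the decoded data in its own buffer. `reader.seek()` discards that buffer, while seeking the raw stream does not, so stale text comes out first.

```python
import codecs
import io

stream = io.BytesIO(b"abc\ndef\n")
reader = codecs.getreader("utf-8")(stream)

first = reader.readline()  # 'abc\n' ("def\n" stays buffered in the reader)
reader.seek(0)             # whence defaults to 0; the reader's buffers are reset
again = reader.readline()  # 'abc\n' once more

# For contrast: seeking the raw stream bypasses the reset, so the
# previously buffered "def\n" is returned before the re-read data.
stream.seek(0)
stale = reader.readline()  # 'def\n'
```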
Martin v. Löwis
e2713becd8 Build with --disable-unicode again. Fixes #1158607.
Will backport to 2.4.
2005-03-08 15:03:08 +00:00
Walter Dörwald
9fa0946771 Fix and test for SF bug #1098990: codec readline() splits lines apart. 2005-01-10 12:01:39 +00:00
Walter Dörwald
e57d7b179a The changes to the stateful codecs in 2.4 resulted in StreamReader.readline()
trying to return a complete line even if a size parameter was given (see
http://www.python.org/sf/1076985). This leads to buffer overflows with long
source lines under Windows if e.g. cp1252 is used as the source encoding.
This patch reverts the behaviour of readline() to something that behaves more
like Python 2.3: If a size parameter is given, read() is called only once.

As a side effect of this, readline() now supports all types of linebreaks
supported by unicode.splitlines().

Note that the tokenizer is still broken and it's possible to provoke segfaults
(see http://www.python.org/sf/1089395).
2004-12-21 22:24:00 +00:00
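A sketch of both effects in present-day Python 3, with an in-memory stream standing in for a source file. When `size` is given, `readline()` calls `read()` only once and may return a partial line. Line boundaries follow `str.splitlines()`, so U+2028 LINE SEPARATOR ends a line just like `"\n"`.

```python
import codecs
import io

data = "abcdef\nx\u2028y\n".encode("utf-8")
reader = codecs.getreader("utf-8")(io.BytesIO(data))

part = reader.readline(size=2)  # 'ab': with size given, read() runs only once
line = reader.readline()        # 'cdef\n'
uline = reader.readline()       # 'x\u2028': U+2028 counts as a line break
```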
Hye-Shik Chang
af5c7cff56 SF #1048865: Fix a trivial typo that breaks StreamReader.readlines() 2004-10-17 23:51:21 +00:00
Walter Dörwald
69652035bc SF patch #998993: The UTF-8 and the UTF-16 stateful decoders now support
decoding incomplete input (when the input stream is temporarily exhausted).
codecs.StreamReader now implements buffering, which enables proper
readline support for the UTF-16 decoders. codecs.StreamReader.read()
has a new argument chars which specifies the number of characters to
return. codecs.StreamReader.readline() and codecs.StreamReader.readlines()
have a new argument keepends. Trailing "\n"s will be stripped from the lines
if keepends is false. Added C APIs PyUnicode_DecodeUTF8Stateful and
PyUnicode_DecodeUTF16Stateful.
2004-09-07 20:24:22 +00:00
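A sketch of the `chars` and `keepends` arguments on the UTF-16 reader as they exist in Python 3. The BOM written by the `utf-16` encoder is consumed by the reader. `read(chars=3)` returns exactly three characters, and `keepends=False` strips the line ending.

```python
import codecs
import io

data = "first\nsecond\n".encode("utf-16")  # starts with a BOM
reader = codecs.getreader("utf-16")(io.BytesIO(data))

head = reader.read(chars=3)                # 'fir'
tail = reader.readline(keepends=False)     # 'st' (line ending stripped)
remaining = reader.readlines()             # ['second\n'] (keepends defaults to True)
```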
Marc-André Lemburg
d594849c42 Ignore sizehint argument. Fixes SF #844561. 2004-02-26 15:22:17 +00:00
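A minimal check of this behaviour against the Python 3 `codecs.StreamReader`, whose `readlines()` still accepts `sizehint` but ignores it and returns every remaining line.

```python
import codecs
import io

reader = codecs.getreader("utf-8")(io.BytesIO(b"one\ntwo\nthree\n"))
# sizehint=1 is accepted but has no effect: all lines are returned.
lines = reader.readlines(1)
```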