84 Commits

Author SHA1 Message Date
Anthony Baxter
88c646e99e backport tim_one's patch:
[Re-did unicodeobject.c - it's changed a lot since 2.1 :) Pretty confident
that it's correct]

Repair widespread misuse of _PyString_Resize.  Since it's clear people
don't understand how this function works, also beefed up the docs.  The
most common usage error is of this form (often spread out across gotos):

	if (_PyString_Resize(&s, n) < 0) {
		Py_DECREF(s);
		s = NULL;
		goto outtahere;
	}

The error is that if _PyString_Resize runs out of memory, it automatically
decrefs the input string object s (which also deallocates it, since its
refcount must be 1 upon entry), and sets s to NULL.  So if the "if"
branch ever triggers, it's an error to call Py_DECREF(s):  s is already
NULL!  A correct way to write the above is the simpler (and intended)

	if (_PyString_Resize(&s, n) < 0)
		goto outtahere;

Bugfix candidate.

Original patch(es):
python/dist/src/Objects/fileobject.c:2.161
python/dist/src/Objects/stringobject.c:2.161
python/dist/src/Objects/unicodeobject.c:2.147
2002-04-30 03:58:47 +00:00
Thomas Wouters
930874efdd Backport of Tim's checkin 2.88:
A different approach to the problem reported in
Patch #419651: Metrowerks on Mac adds 0x itself
C std says %#x and %#X conversion of 0 do not add the 0x/0X base marker.
Metrowerks apparently does.  Mark Favas reported the same bug under a
Compaq compiler on Tru64 Unix, but no other libc broken in this respect
is known (known to be OK under MSVC and gcc).
So just try the damn thing at runtime and see what the platform does.
Note that we've always had bugs here, but never knew it before because
a relevant test case didn't exist before 2.1.
2001-05-23 12:31:25 +00:00
Tim Peters
fff5325078 Bug 415514 reported that e.g.
"%#x" % 0
blew up, at heart because C sprintf supplies a base marker if and only if
the value is not 0.  I then fixed that, by tolerating C's inconsistency
when it does %#x, and taking away that *Python* produced 0x0 when
formatting 0L (the "long" flavor of 0) under %#x itself.  But after talking
with Guido, we agreed it would be better to supply 0x for the short int
case too, despite that it's inconsistent with C, because C is inconsistent
with itself and with Python's hex(0) (plus, while "%#x" % 0 didn't work
before, "%#x" % 0L *did*, and returned "0x0").  Similarly for %#X conversion.
2001-04-12 18:38:48 +00:00
Tim Peters
711088d9b8 Fix for SF bug #415514: "%#x" % 0 caused assertion failure/abort.
http://sourceforge.net/tracker/index.php?func=detail&aid=415514&group_id=5470&atid=105470
For short ints, Python defers to the platform C library to figure out what
%#x should do.  The code asserted that the platform C returned a string
beginning with "0x".  However, that's not true when-- and only when --the
*value* being formatted is 0.  Changed the code to live with C's inconsistency
here.  In the meantime, the problem does not arise if you format a long 0 (0L)
instead.  However, that's because the code *we* wrote to do %#x conversions on
longs produces a leading "0x" regardless of value.  That's probably wrong too:
we should drop leading "0x", for consistency with C, when (& only when) formatting
0L.  So I changed the long formatting code to do that too.
2001-04-12 00:35:51 +00:00
Fredrik Lundh
ccc7473fc8 reorganized PyUnicode_DecodeUnicodeEscape a bit (in order to make it
less likely that bug #132817 ever appears again)
2001-02-18 22:13:49 +00:00
Marc-André Lemburg
fde66e1bcc Fixed .capitalize() method of Unicode objects to work like the
corresponding string method. Added tests for this too.

Patch written by Marc-Andre Lemburg. Copyright assigned to Guido van Rossum.
2001-01-29 11:14:16 +00:00
Ka-Ping Yee
fa004ad36c Show '\011', '\012', and '\015' as '\t', '\n', '\r' in strings.
Switch from octal escapes to hex escapes for other nonprintable characters.
2001-01-24 17:19:08 +00:00
Fredrik Lundh
06d126803c Move uchhash functionality into unicodedata (after the recent
crop of changes, the files are small enough to do this).  Also
adds "name" and "lookup" functions to unicodedata.
2001-01-24 07:59:11 +00:00
Fredrik Lundh
f60560626c Better error message if ucnhash cannot be found (obscure attribute
errors aren't that helpful), or doesn't contain what's expected from
it.  Also tweaked the test script so it compiles even if ucnhash is
missing.
2001-01-20 11:15:25 +00:00
Fredrik Lundh
0fdb90cafe refactored the unicodeobject/ucnhash interface, to hide the
implementation details inside the ucnhash module.

also cleaned up the unicode copyright blurb a little; Secret Labs'
internal revision history isn't that interesting...
2001-01-19 09:45:02 +00:00
Marc-André Lemburg
ad7c98e264 This patch adds a new builtin unistr() which behaves like str()
except that it always returns Unicode objects.

A new C API PyObject_Unicode() is also provided.

This closes patch #101664.

Written by Marc-Andre Lemburg. Copyright assigned to Guido van Rossum.
2001-01-17 17:09:53 +00:00
Marc-André Lemburg
3a645e4dd4 Added checks to prevent PyUnicode_Count() from dumping core
in case the parameters are out of bounds and fixes error handling
for .count(), .startswith() and .endswith() for the case of
mixed string/Unicode objects.

This patch adds Python style index semantics to PyUnicode_Count()
indices (including the special handling of negative indices).

The patch is an extended version of patch #103249 submitted
by Michael Hudson (mwh) on SF. It also includes new test cases.
2001-01-16 11:54:12 +00:00
Marc-André Lemburg
ec233e5803 This patch adds a new feature to the builtin charmap codec:
The mapping dictionaries can now contain 1-n mappings, meaning
that character ordinals may be mapped to strings or Unicode object,
e.g. 0x0078 ('x') -> u"abc", causing the ordinal to be replaced by
the complete string or Unicode object instead of just one character.

Another feature introduced by the patch is that of mapping oridnals to
the emtpy string. This allows removing characters.

The patch is different from patch #103100 in that it does not cause a
performance hit for the normal use case of 1-1 mappings.

Written by Marc-Andre Lemburg, copyright assigned to Guido van Rossum.
2001-01-06 14:59:58 +00:00
Marc-André Lemburg
a866df806d This patch changes the default behaviour of the builtin charmap
codec to not apply Latin-1 mappings for keys which are not found
in the mapping dictionaries, but instead treat them as undefined
mappings.

The patch was originally written by Martin v. Loewis with some
additional (cosmetic) changes and an updated test script
by Marc-Andre Lemburg.

The standard codecs were recreated from the most current files
available at the Unicode.org site using the Tools/scripts/gencodec.py
tool.

This patch closes the bugs #116285 and #119960.
2001-01-03 21:29:14 +00:00
Andrew M. Kuchling
f947ffe951 Patch #102940: use only printable Unicode chars in reporting
incorrect % characters; characters outside the printable range are
 replaced with '?'
2000-12-19 22:49:06 +00:00
Guido van Rossum
cda4f9a8dc Fix off-by-one error in split_substring(). Fixes SF bug #122162. 2000-12-19 02:23:19 +00:00
Andrew M. Kuchling
6ca8917758 [ Patch #102852 ] Make % error a bit more informative by indicates the
index at which an unknown %-escape was found
2000-12-15 13:07:46 +00:00
Tim Peters
a3a3a030af Fox for SF bug #123859: %[duxXo] long formats inconsistent. 2000-11-30 05:22:44 +00:00
Barry Warsaw
5b4c22806f _PyUnicode_Fini(): Initialize the local freelist walking variable `u'
after unicode_empty has been freed, otherwise it might not point to
the real start of the unicode_freelist.  Final closure for SF bug
#110681, Jitterbug PR#398.
2000-10-03 20:45:26 +00:00
Guido van Rossum
4ae8ef84da In _PyUnicode_Fini(), decref unicode_empty before tearng down the free
list.  Discovered by Barry, fix approved by MAL.
2000-10-03 18:09:04 +00:00
Fred Drake
d5fadf75e4 Rationalize use of limits.h, moving the inclusion to Python.h.
Add definitions of INT_MAX and LONG_MAX to pyport.h.
Remove includes of limits.h and conditional definitions of INT_MAX
and LONG_MAX elsewhere.

This closes SourceForge patch #101659 and bug #115323.
2000-09-26 05:46:01 +00:00
Tim Peters
38fd5b6413 Derived from Martin's SF patch 110609: support unbounded ints in %d,i,u,x,X,o formats.
Note a curious extension to the std C rules:  x, X and o formatting can never produce
a sign character in C, so the '+' and ' ' flags are meaningless for them.  But
unbounded ints *can* produce a sign character under these conversions (no fixed-
width bitstring is wide enough to hold all negative values in 2's-comp form).  So
these flags become meaningful in Python when formatting a Python long which is too
big to fit in a C long.  This required shuffling around existing code, which hacked
x and X conversions to death when both the '#' and '0' flags were specified:  the
hacks weren't strong enough to deal with the simultaneous possibility of the ' ' or
'+' flags too, since signs were always meaningless before for x and X conversions.
Isomorphic shuffling was required in unicodeobject.c.
Also added dozens of non-trivial new unbounded-int test cases to test_format.py.
2000-09-21 05:43:11 +00:00
Tim Peters
8f422461b4 Fix for bug 113934. string*n and unicode*n did no overflow checking at
all, either to see whether the # of chars fit in an int, or that the
amount of memory needed fit in a size_t.  Checking these is expensive, but
the alternative is silently wrong answers (as in the bug report) or
core dumps (which were easy to provoke using Unicode strings).
2000-09-09 06:13:41 +00:00
Fredrik Lundh
df84675f93 changed \x to consume exactly two hex digits, also for unicode
strings.  closes PEP-223.

also added \U escape (eight hex digits).
2000-09-03 11:29:49 +00:00
Barry Warsaw
ce4dc41b1a PyUnicode_AsUTF8String(): /F picks up what I missed: the local var
`str' is no longer necessary.  Gotta turn on -Wall!
2000-08-18 19:30:40 +00:00