Commit Graph

117 Commits

Author SHA1 Message Date
Tim Peters
4862ab7bf4 Sheesh -- repair the dodge around "cast isn't an lvalue" complaints to
restore correct semantics.
2001-05-09 08:43:21 +00:00
Tim Peters
9e897f41db Mark Favas reported that gcc caught me using casts as lvalues. Dodge it. 2001-05-09 07:37:07 +00:00
Tim Peters
b4bbcd76ea Ack! Restore the COUNT_ALLOCS one_strings code. 2001-05-09 00:31:40 +00:00
Tim Peters
cf5ad5d6f6 My change to string_item() left an extra reference to each 1-character
interned string created by "string"[i].  Since they're immortal anyway,
this was hard to notice, but it was still wrong <wink>.
2001-05-09 00:24:55 +00:00
Tim Peters
5b4d477568 Intern 1-character strings as soon as they're created. As-is, they aren't
interned when created, so the cached versions generally aren't ever
interned.  With the patch, the
		Py_INCREF(t);
		*p = t;
		Py_DECREF(s);
		return;
indirection block in PyString_InternInPlace() is never executed during a
full run of the test suite, but was executed very many times before.  So
I'm trading more work when creating one-character strings for doing less
work later.  Note that the "more work" here can happen at most 256 times
per program run, so it's trivial.  The same reasoning accounts for the
patch's simplification of string_item (the new version can call
PyString_FromStringAndSize() no more than 256 times per run, so there's
no point to inlining that stuff -- if we were serious about saving time
here, we'd pre-initialize the characters vector so that no runtime testing
at all was needed!).
2001-05-08 22:33:50 +00:00
Tim Peters
2cfe368283 Make unicode.join() work nice with iterators. This also required a change
to string.join(), so that when the latter figures out in midstream that
it really needs unicode.join() instead, unicode.join() can actually get
all the sequence elements (i.e., there's no guarantee that the sequence
passed to string.join() can be iterated over *again* by unicode.join(),
so string.join() must not pass on the original sequence object anymore).
2001-05-05 05:36:48 +00:00
Marc-André Lemburg
542fe56cb9 Fix for bug #417030: "print '%*s' fails for unicode string" 2001-05-02 14:21:53 +00:00
Guido van Rossum
189f1df301 Add a proper implementation for the tp_str slot (returning self, of
course), so I can get rid of the special case for strings in
PyObject_Str().
2001-05-01 16:51:53 +00:00
Tim Peters
b3d8d1f76c A different approach to the problem reported in
Patch #419651: Metrowerks on Mac adds 0x itself
C std says %#x and %#X conversion of 0 do not add the 0x/0X base marker.
Metrowerks apparently does.  Mark Favas reported the same bug under a
Compaq compiler on Tru64 Unix, but no other libc broken in this respect
is known (known to be OK under MSVC and gcc).
So just try the damn thing at runtime and see what the platform does.
Note that we've always had bugs here, but never knew it before because
a relevant test case didn't exist before 2.1.
2001-04-28 05:38:26 +00:00
Guido van Rossum
59d1d2b434 Iterators phase 1. This comprises:
new slot tp_iter in type object, plus new flag Py_TPFLAGS_HAVE_ITER
new C API PyObject_GetIter(), calls tp_iter
new builtin iter(), with two forms: iter(obj), and iter(function, sentinel)
new internal object types iterobject and calliterobject
new exception StopIteration
new opcodes for "for" loops, GET_ITER and FOR_ITER (also supported by dis.py)
new magic number for .pyc files
new special method for instances: __iter__() returns an iterator
iteration over dictionaries: "for x in dict" iterates over the keys
iteration over files: "for x in file" iterates over lines

TODO:

documentation
test suite
decide whether to use a different way to spell iter(function, sentinal)
decide whether "for key in dict" is a good idea
use iterators in map/filter/reduce, min/max, and elsewhere (in/not in?)
speed tuning (make next() a slot tp_next???)
2001-04-20 19:13:02 +00:00
Tim Peters
fff5325078 Bug 415514 reported that e.g.
"%#x" % 0
blew up, at heart because C sprintf supplies a base marker if and only if
the value is not 0.  I then fixed that, by tolerating C's inconsistency
when it does %#x, and taking away that *Python* produced 0x0 when
formatting 0L (the "long" flavor of 0) under %#x itself.  But after talking
with Guido, we agreed it would be better to supply 0x for the short int
case too, despite that it's inconsistent with C, because C is inconsistent
with itself and with Python's hex(0) (plus, while "%#x" % 0 didn't work
before, "%#x" % 0L *did*, and returned "0x0").  Similarly for %#X conversion.
2001-04-12 18:38:48 +00:00
Tim Peters
711088d9b8 Fix for SF bug #415514: "%#x" % 0 caused assertion failure/abort.
http://sourceforge.net/tracker/index.php?func=detail&aid=415514&group_id=5470&atid=105470
For short ints, Python defers to the platform C library to figure out what
%#x should do.  The code asserted that the platform C returned a string
beginning with "0x".  However, that's not true when-- and only when --the
*value* being formatted is 0.  Changed the code to live with C's inconsistency
here.  In the meantime, the problem does not arise if you format a long 0 (0L)
instead.  However, that's because the code *we* wrote to do %#x conversions on
longs produces a leading "0x" regardless of value.  That's probably wrong too:
we should drop leading "0x", for consistency with C, when (& only when) formatting
0L.  So I changed the long formatting code to do that too.
2001-04-12 00:35:51 +00:00
Barry Warsaw
a903ad9855 _Py_ReleaseInternedStrings(): Private API function to decref and
release the interned string dictionary.  This is useful for memory
use debugging because it eliminates a huge source of noise from the
reports.  Only defined when INTERN_STRINGS is defined.
2001-02-23 16:40:48 +00:00
Ka-Ping Yee
fa004ad36c Show '\011', '\012', and '\015' as '\t', '\n', '\r' in strings.
Switch from octal escapes to hex escapes for other nonprintable characters.
2001-01-24 17:19:08 +00:00
Tim Peters
19fe14e76a Derivative of patch #102549, "simpler, faster(!) implementation of string.join".
Also fixes two long-standing bugs (present in 2.0):
1. .join() didn't check that the result size fit in an int.
2. string.join(s) when len(s)==1 returned s[0] regardless of s[0]'s
   type; e.g., "".join([3]) returned 3 (overly optimistic optimization).
I resisted a keen temptation to make .join() apply str() automagically.
2001-01-19 03:03:47 +00:00
Marc-André Lemburg
3a645e4dd4 Added checks to prevent PyUnicode_Count() from dumping core
in case the parameters are out of bounds and fixes error handling
for .count(), .startswith() and .endswith() for the case of
mixed string/Unicode objects.

This patch adds Python style index semantics to PyUnicode_Count()
indices (including the special handling of negative indices).

The patch is an extended version of patch #103249 submitted
by Michael Hudson (mwh) on SF. It also includes new test cases.
2001-01-16 11:54:12 +00:00
Andrew M. Kuchling
6ca8917758 [ Patch #102852 ] Make % error a bit more informative by indicates the
index at which an unknown %-escape was found
2000-12-15 13:07:46 +00:00
Fred Drake
49312a52ec Jeffrey D. Collins <tokeneater@users.sourceforge.net>:
Fix type of the self parameter to some string object methods.

This closes patch #102670.
2000-12-06 14:27:49 +00:00
Tim Peters
a3a3a030af Fox for SF bug #123859: %[duxXo] long formats inconsistent. 2000-11-30 05:22:44 +00:00
Guido van Rossum
2ccda8a7c4 SF patch #102548, fix for bug #121013, by mwh@users.sourceforge.net.
Fixes a typo that caused "".join(u"this is a test") to dump core.
2000-11-27 18:46:26 +00:00
Fred Drake
661ea26b3d Ka-Ping Yee <ping@lfw.org>:
Changes to error messages to increase consistency & clarity.

This (mostly) closes SourceForge patch #101839.
2000-10-24 19:57:45 +00:00
Marc-André Lemburg
53f3d4ac74 [ Bug #116174 ] using %% in cstrings sometimes fails with unicode paramsFix for the bug reported in Bug #116174: "%% %s" % u"abc" failed due
to the way string formatting delegated work to the Unicode formatting
function.
2000-10-07 08:54:09 +00:00
Fred Drake
d5fadf75e4 Rationalize use of limits.h, moving the inclusion to Python.h.
Add definitions of INT_MAX and LONG_MAX to pyport.h.
Remove includes of limits.h and conditional definitions of INT_MAX
and LONG_MAX elsewhere.

This closes SourceForge patch #101659 and bug #115323.
2000-09-26 05:46:01 +00:00
Tim Peters
38fd5b6413 Derived from Martin's SF patch 110609: support unbounded ints in %d,i,u,x,X,o formats.
Note a curious extension to the std C rules:  x, X and o formatting can never produce
a sign character in C, so the '+' and ' ' flags are meaningless for them.  But
unbounded ints *can* produce a sign character under these conversions (no fixed-
width bitstring is wide enough to hold all negative values in 2's-comp form).  So
these flags become meaningful in Python when formatting a Python long which is too
big to fit in a C long.  This required shuffling around existing code, which hacked
x and X conversions to death when both the '#' and '0' flags were specified:  the
hacks weren't strong enough to deal with the simultaneous possibility of the ' ' or
'+' flags too, since signs were always meaningless before for x and X conversions.
Isomorphic shuffling was required in unicodeobject.c.
Also added dozens of non-trivial new unbounded-int test cases to test_format.py.
2000-09-21 05:43:11 +00:00
Marc-André Lemburg
d1ba443206 This patch adds a new Python C API called PyString_AsStringAndSize()
which implements the automatic conversion from Unicode to a string
object using the default encoding.

The new API is then put to use to have eval() and exec accept
Unicode objects as code parameter. This closes bugs #110924
and #113890.

As side-effect, the traditional C APIs PyString_Size() and
PyString_AsString() will also accept Unicode objects as
parameters.
2000-09-19 21:04:18 +00:00