Florent Xicluna
22b243809e
#7643: Unicode codepoints VT (0x0B) and FF (0x0C) are linebreaks according to Unicode Standard Annex #14.
2010-03-30 08:24:06 +00:00
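A minimal sketch of what the linebreak change above means in practice, assuming a current Python 3 interpreter (the commit itself targeted the Unicode type of its era, so treating this as equivalent behavior is an assumption):

```python
# VT (0x0B) and FF (0x0C) count as line boundaries for str.splitlines().
text = "one\x0btwo\x0cthree"
parts = text.splitlines()
print(parts)
```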
Florent Xicluna
2e0a53fdf6
Issue #8024: Update the Unicode database to 5.2
2010-03-18 21:50:06 +00:00
Florent Xicluna
dc36472472
Remove py3k deprecation warnings from these Unicode tools.
2010-03-15 14:00:58 +00:00
Amaury Forgeot d'Arc
5c92d4301d
#7112: Fix compilation warning in unicodetype_db.h
makeunicodedata now generates double literals
2009-10-13 21:29:34 +00:00
Amaury Forgeot d'Arc
d0052d17b1
#1571184: makeunicodedata.py now generates the functions _PyUnicode_ToNumeric,
_PyUnicode_IsLinebreak and _PyUnicode_IsWhitespace.
It now also parses the Unihan.txt for numeric values.
2009-10-06 19:56:32 +00:00
Antoine Pitrou
e988e286b2
Issue #1734234: Massively speedup unicodedata.normalize() when the
string is already in normalized form, by performing a quick check beforehand.
Original patch by Rauli Ruohonen.
2009-04-27 21:53:26 +00:00
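The quick-check idea in the commit above can be sketched at the Python level as follows. This is an illustration only: `normalize_fast` is a hypothetical helper, and `unicodedata.is_normalized()` (available in Python 3.8 and later) is used here as a stand-in for the internal check the commit added, not as a description of that patch.

```python
import unicodedata

def normalize_fast(form, s):
    """Hypothetical helper: skip normalization when the string is already normalized."""
    # Cheap test first; already-normalized input is returned unchanged.
    if unicodedata.is_normalized(form, s):
        return s
    # Otherwise fall back to the full normalization pass.
    return unicodedata.normalize(form, s)

already = normalize_fast("NFC", "abc")
composed = normalize_fast("NFC", "e\u0301")
```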
Walter Dörwald
5d98ec76bb
Issue #5828 (Invalid behavior of unicode.lower): Fixed bogus logic in
makeunicodedata.py and regenerated the Unicode database (This fixes
u'\u1d79'.lower() == '\x00').
2009-04-25 14:03:16 +00:00
Martin v. Löwis
24329ba176
Issue #3811: The Unicode database was updated to 5.1.
Reviewed by Fredrik Lundh and Marc-Andre Lemburg.
2008-09-10 13:38:12 +00:00
Martin v. Löwis
111c180674
Make more symbols static.
2008-06-13 07:47:47 +00:00
Martin v. Löwis
43179c8e6f
Add changelog entry.
2006-03-11 12:43:44 +00:00
Tim Peters
88ca467ca4
Whitespace normalization.
2006-03-10 23:39:56 +00:00
Martin v. Löwis
480f1bb67b
Update Unicode database to Unicode 4.1.
2006-03-09 23:38:20 +00:00
Hye-Shik Chang
e9ddfbb412
SF #989185: Drop unicode.iswide() and unicode.width() and add
unicodedata.east_asian_width(). You can still implement your own
simple width() function using it like this:
    def width(u):
        w = 0
        for c in unicodedata.normalize('NFC', u):
            cwidth = unicodedata.east_asian_width(c)
            if cwidth in ('W', 'F'): w += 2
            else: w += 1
        return w
2004-08-04 07:38:35 +00:00
Hye-Shik Chang
974ed7cfa5
- SF #962502: Add two more methods for unicode type; width() and
iswide() for east asian width manipulation. (Inspired by David
Goodger, Reviewed by Martin v. Loewis)
- Move _PyUnicode_TypeRecord.flags to the end of the struct so that
no padding is added for UCS-4 builds. (Suggested by Martin v. Loewis)
2004-06-02 16:49:17 +00:00
Martin v. Löwis
b5c980b802
Add unidata_version. Bump generator version number.
2002-11-25 09:13:37 +00:00
Martin v. Löwis
97225da29a
Sort names independent of the Python version. Fix hex constant warning.
Include all First/Last blocks.
2002-11-24 23:05:09 +00:00
Martin v. Löwis
677bde2dd1
Patch #626485: Support Unicode normalization.
2002-11-23 22:08:15 +00:00
Martin v. Löwis
99ac3283e7
Verify that lower-higher case deltas are 16-bit.
2002-10-18 17:34:18 +00:00
Martin v. Löwis
9def6a3a77
Update to Unicode 3.2 database.
2002-10-18 16:11:54 +00:00
Walter Dörwald
aaab30e00c
Apply diff2.txt from SF patch http://www.python.org/sf/572113
(with one small bugfix in bgen/bgen/scantools.py)
This replaces string module functions with string methods
for the stuff in the Tools directory. Several uses of
string.letters etc. are still remaining.
2002-09-11 20:36:02 +00:00
Fredrik Lundh
b2dfd73bdc
Unicode nits: Don't include unicodedatabase.h no more. And make sure
to build *all* tables in makeunicodedata.py.
2001-01-21 23:31:52 +00:00
Fredrik Lundh
7b7dd107b3
compress unicode decomposition tables (this saves another 55k)
2001-01-21 22:41:08 +00:00
Fredrik Lundh
9e9bcda547
forgot to check in the new makeunicodedata.py script
2001-01-21 17:01:31 +00:00
Fredrik Lundh
fad27aee11
Added 38,642 missing characters to the Unicode database (first-last
ranges) -- but thanks to the 2.0 compression scheme, this doesn't add
a single byte to the resulting binaries (!)
Closes bug #117524
2000-11-03 20:24:15 +00:00
Fred Drake
9c6850510c
Remove bogus stdout redirection and use of sys.__stdout__; use
augmented print statement instead.
2000-10-26 03:56:46 +00:00