cpython

mirror of https://github.com/AdaCore/cpython.git synced 2026-02-12 12:57:15 -08:00

Author	SHA1	Message	Date
Erlend Egeberg Aasland	61d26394f9	bpo-41798: Allocate unicodedata CAPI on the heap (GH-24128)	2021-01-20 12:03:53 +01:00
Victor Stinner	32bd68c839	bpo-42519: Replace PyObject_MALLOC() with PyObject_Malloc() (GH-23587) No longer use deprecated aliases to functions: * Replace PyObject_MALLOC() with PyObject_Malloc() * Replace PyObject_REALLOC() with PyObject_Realloc() * Replace PyObject_FREE() with PyObject_Free() * Replace PyObject_Del() with PyObject_Free() * Replace PyObject_DEL() with PyObject_Free()	2020-12-01 10:37:39 +01:00
Victor Stinner	84f7382215	bpo-42157: Rename unicodedata.ucnhash_CAPI (GH-22994) Removed the unicodedata.ucnhash_CAPI attribute which was an internal PyCapsule object. The related private _PyUnicode_Name_CAPI structure was moved to the internal C API. Rename unicodedata.ucnhash_CAPI as unicodedata._ucnhash_CAPI.	2020-10-27 04:36:22 +01:00
Victor Stinner	c8c4200b65	bpo-42157: Convert unicodedata.UCD to heap type (GH-22991) Convert the unicodedata extension module to the multiphase initialization API (PEP 489) and convert the unicodedata.UCD static type to a heap type. Co-Authored-By: Mohamed Koubaa <koubaa.m@gmail.com>	2020-10-26 23:19:22 +01:00
Victor Stinner	920cb647ba	bpo-42157: unicodedata avoids references to UCD_Type (GH-22990) * UCD_Check() uses PyModule_Check() * Simplify the internal _PyUnicode_Name_CAPI structure: * Remove size and state members * Remove state and self parameters of getcode() and getname() functions * Remove global_module_state	2020-10-26 19:19:36 +01:00
Victor Stinner	47e1afd2a1	bpo-1635741: _PyUnicode_Name_CAPI moves to internal C API (GH-22713) The private _PyUnicode_Name_CAPI structure of the PyCapsule API unicodedata.ucnhash_CAPI moves to the internal C API. Moreover, the structure gets a new state member which must be passed to the getcode() and getname() functions. * Move Include/ucnhash.h to Include/internal/pycore_ucnhash.h * unicodedata module is now built with Py_BUILD_CORE_MODULE. * unicodedata: move hashAPI variable into unicodedata_module_state.	2020-10-26 16:43:47 +01:00
Victor Stinner	e6b8c5263a	bpo-1635741: Add a global module state to unicodedata (GH-22712) Prepare unicodedata to add a state per module: start with a global "module" state, pass it to subfunctions which access &UCD_Type. This change also prepares the conversion of the UCD_Type static type to a heap type.	2020-10-15 16:22:19 +02:00
Mohamed Koubaa	ddc0dd001a	bpo-1635741, unicodedata: add ucd_type parameter to UCD_Check() macro (GH-22328) Co-authored-by: Victor Stinner <vstinner@python.org>	2020-09-23 12:38:16 +02:00
Victor Stinner	4a21e57fe5	bpo-40268: Remove unused structmember.h includes (GH-19530) If only offsetof() is needed: include stddef.h instead. When structmember.h is used, add a comment explaining that PyMemberDef is used.	2020-04-15 02:35:41 +02:00
Serhiy Storchaka	cd8295ff75	bpo-39943: Add the const qualifier to pointers on non-mutable PyUnicode data. (GH-19345)	2020-04-11 10:48:40 +03:00
Andy Lester	982307b9cc	bpo-39943: Remove unused self from find_nfc_index() (GH-18973)	2020-03-17 17:38:12 +01:00
Benjamin Peterson	051b9d08d1	closes bpo-39926: Update Unicode to 13.0.0. (GH-18910)	2020-03-10 20:41:34 -07:00
Dong-hee Na	1b55b65638	bpo-39573: Clean up modules and headers to use Py_IS_TYPE() function (GH-18521)	2020-02-17 11:09:15 +01:00
Victor Stinner	d2ec81a8c9	bpo-39573: Add Py_SET_TYPE() function (GH-18394) Add Py_SET_TYPE() function to set the type of an object.	2020-02-07 09:17:07 +01:00
Jordon Xu	2ec7010206	bpo-37752: Delete redundant Py_CHARMASK in normalizestring() (GH-15095)	2019-09-10 17:04:08 +01:00
Greg Price	7669cb8b21	bpo-38043: Use `bool` for boolean flags on is_normalized_quickcheck. (GH-15711)	2019-09-09 02:16:31 -07:00
Greg Price	2f09413947	closes bpo-37966: Fully implement the UAX #15 quick-check algorithm. (GH-15558) The purpose of the `unicodedata.is_normalized` function is to answer the question `str == unicodedata.normalize(form, str)` more efficiently than writing just that, by using the "quick check" optimization described in the Unicode standard in UAX #15. However, it turns out the code doesn't implement the full algorithm from the standard, and as a result we often miss the optimization and end up having to compute the whole normalized string after all. Implement the standard's algorithm. This greatly speeds up `unicodedata.is_normalized` in many cases where our partial variant of quick-check had been returning MAYBE and the standard algorithm returns NO. At a quick test on my desktop, the existing code takes about 4.4 ms/MB (so 4.4 ns per byte) when the partial quick-check returns MAYBE and it has to do the slow normalize-and-compare: $ build.base/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \ -- 'unicodedata.is_normalized("NFD", s)' 50 loops, best of 5: 4.39 msec per loop With this patch, it gets the answer instantly (58 ns) on the same 1 MB string: $ build.dev/python -m timeit -s 'import unicodedata; s = "\uf900"*500000' \ -- 'unicodedata.is_normalized("NFD", s)' 5000000 loops, best of 5: 58.2 nsec per loop This restores a small optimization that the original version of this code had for the `unicodedata.normalize` use case. With this, that case is actually faster than in master! $ build.base/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \ -- 'unicodedata.normalize("NFD", s)' 500 loops, best of 5: 561 usec per loop $ build.dev/python -m timeit -s 'import unicodedata; s = "\u0338"*500000' \ -- 'unicodedata.normalize("NFD", s)' 500 loops, best of 5: 512 usec per loop	2019-09-03 19:45:44 -07:00
Jeroen Demeyer	530f506ac9	bpo-36974: tp_print -> tp_vectorcall_offset and tp_reserved -> tp_as_async (GH-13464) Automatically replace: tp_print -> tp_vectorcall_offset; tp_compare -> tp_as_async; tp_reserved -> tp_as_async	2019-05-30 19:13:39 -07:00
Inada Naoki	6fec905de5	bpo-36642: make unicodedata const (GH-12855)	2019-04-17 08:40:34 +09:00
Max Bélanger	2810dd7be9	closes bpo-32285: Add unicodedata.is_normalized. (GH-4806)	2018-11-04 15:58:24 -08:00
Wonsup Yoon	d134809cd3	bpo-29456: Fix bugs in unicodedata.normalize: u1176, u11a7 and u11c3 (GH-1958) Hangul composition check boundaries are wrong for the second character ([0x1161, 0x1176) instead of [0x1161, 0x1176]) and third character ((0x11A7, 0x11C3) instead of [0x11A7, 0x11C3]).	2018-06-15 20:03:14 +08:00
Benjamin Peterson	7c69c1c0fb	update to Unicode 11.0.0 (closes bpo-33778) (GH-7439) Also, standardize indentation of generated tables.	2018-06-06 20:14:28 -07:00
luzpaz	a5293b4ff2	Fix miscellaneous typos (#4275)	2017-11-05 15:37:50 +02:00
Benjamin Peterson	279a96206f	bpo-30736: upgrade to Unicode 10.0 (#2344) Straightforward. While we're at it, though, strip trailing whitespace from generated tables.	2017-06-22 22:31:08 -07:00
Serhiy Storchaka	f8d7d41507	Issue #28511: Use the "U" format instead of "O!" in PyArg_Parse*.	2016-10-23 15:12:25 +03:00
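
Commit 2f09413947 describes `unicodedata.is_normalized` as a faster way to answer `str == unicodedata.normalize(form, str)`. The following is a minimal sketch of that equivalence, not code from the repository. It assumes Python 3.8 or later (where `is_normalized` is available), and the helper name `is_normalized_slow` is hypothetical, introduced only for illustration.

```python
import unicodedata


def is_normalized_slow(form: str, text: str) -> bool:
    """Reference answer: normalize the whole string, then compare.

    Hypothetical helper (not part of unicodedata); this is the
    "slow normalize-and-compare" path the commit message refers to.
    """
    return text == unicodedata.normalize(form, text)


# Small versions of the strings used in the commit's timeit runs.
samples = ["abc", "\uf900" * 4, "\u0338" * 4, "e\u0301", "\u00e9"]

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    for s in samples:
        # The quick-check based function should agree with the slow path.
        assert unicodedata.is_normalized(form, s) == is_normalized_slow(form, s)
```

U+F900 has a canonical decomposition, so a string made of it is not in NFD; that is the case where, per the commit message, the full quick-check can answer NO without normalizing the string.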

141 Commits
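
The Hangul boundary fix in commit d134809cd3 can be illustrated with a small pure-Python sketch of the algorithmic Hangul composition from the Unicode standard. This is not the C code from the repository: the function name `compose_hangul` and the constant names are assumptions chosen for illustration, and the sketch only handles Hangul jamo pairs, not general NFC composition. The vowel range is half-open, [0x1161, 0x1176), and the trailing-consonant range is open, (0x11A7, 0x11C3), matching the boundaries named in the commit message.

```python
# Constants from the Unicode standard's Hangul composition algorithm.
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
S_COUNT = L_COUNT * V_COUNT * T_COUNT


def compose_hangul(text: str) -> str:
    """Compose leading/vowel/trailing jamo into Hangul syllables.

    Illustrative sketch only (hypothetical helper, not the CPython code).
    """
    out: list[str] = []
    for ch in text:
        cp = ord(ch)
        if out:
            last = ord(out[-1])
            # L + V -> LV: the vowel must lie in [V_BASE, V_BASE + V_COUNT).
            if (L_BASE <= last < L_BASE + L_COUNT
                    and V_BASE <= cp < V_BASE + V_COUNT):
                l_index = last - L_BASE
                v_index = cp - V_BASE
                out[-1] = chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT)
                continue
            # LV + T -> LVT: the trailing consonant must lie strictly
            # between T_BASE and T_BASE + T_COUNT.
            s_index = last - S_BASE
            if (0 <= s_index < S_COUNT and s_index % T_COUNT == 0
                    and T_BASE < cp < T_BASE + T_COUNT):
                out[-1] = chr(last + (cp - T_BASE))
                continue
        out.append(ch)
    return "".join(out)
```

With these bounds, U+1176, U+11A7 and U+11C3 (the three code points named in the commit title) are left uncomposed, while their in-range neighbours U+1175, U+11A8 and U+11C2 still compose.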