108 Commits

Author SHA1 Message Date
Victor Stinner
f6e323ce32 bpo-34523: Fix C locale coercion on FreeBSD CURRENT (GH-10672) (GH-10673)
bpo-34523, bpo-35290: C locale coercion now resets the Python
internal "force ASCII" mode. This change fix the filesystem encoding
on FreeBSD CURRENT, which has a new "C.UTF-8" locale, when
the UTF-8 mode is disabled.

Add _Py_ResetForceASCII(): _Py_SetLocaleFromEnv() now calls it.

(cherry picked from commit 353933e712)
2018-11-23 13:37:42 +01:00
Victor Stinner
6eff6b8eec bpo-28604: Fix localeconv() for different LC_MONETARY (GH-10606) (GH-10619)
locale.localeconv() now sets temporarily the LC_CTYPE locale to the
LC_MONETARY locale if the two locales are different and monetary
strings are non-ASCII. This temporary change affects other threads.

Changes:

* locale.localeconv() can now set LC_CTYPE to LC_MONETARY to decode
  monetary fields.
* Add LocaleInfo.grouping_buffer: copy localeconv() grouping string
  since it can be replaced anytime if a different thread calls
  localeconv().

(cherry picked from commit 02e6bf7f20)
2018-11-20 22:06:21 +01:00
Victor Stinner
7d35f79012 bpo-34403: Always implement _Py_GetForceASCII() (GH-10235)
Compilation fails on macOS because _Py_GetForceASCII() wasn't define:
always implement implement (default implementation: just return 0).
2018-10-30 14:32:01 +01:00
Victor Stinner
21220bbe65 bpo-34403: Fix initfsencoding() for ASCII (GH-10233)
* Add _Py_GetForceASCII(): check if Python forces the usage of ASCII
  in Py_DecodeLocale() and Py_EncodeLocale().
* initfsencoding() now uses ASCII if _Py_GetForceASCII() is true.
2018-10-30 12:59:20 +01:00
Miss Islington (bot)
178d1c0777 bpo-24658: Fix read/write greater than 2 GiB on macOS (GH-1705)
On macOS, fix reading from and writing into a file with a size larger than 2 GiB.
(cherry picked from commit 74a8b6ea7e)

Co-authored-by: Stéphane Wirtel <stephane@wirtel.be>
2018-10-17 23:58:40 -07:00
Victor Stinner
65ef7425a3 bpo-34527: POSIX locale enables the UTF-8 Mode (GH-8972) (GH-8974)
* The UTF-8 Mode is now also enabled by the "POSIX" locale, not only
  by the "C" locale.
* On FreeBSD, Py_DecodeLocale() and Py_EncodeLocale() now also forces
  the ASCII encoding if the LC_CTYPE locale is "POSIX", not only if
  the LC_CTYPE locale is "C".
* test_utf8_mode.test_cmd_line() checks also that the command line
  arguments are decoded from UTF-8 when the the UTF-8 Mode is enabled
  with POSIX locale or C locale.

(cherry picked from commit 5cb258950c)
2018-08-28 13:51:20 +02:00
Miss Islington (bot)
32955299b4 Spelling fixes to docs, docstrings, and comments (GH-6374)
(cherry picked from commit 61f82e0e33)

Co-authored-by: Ville Skyttä <ville.skytta@iki.fi>
2018-04-20 14:00:41 -07:00
Miss Islington (bot)
ca82e3c0ec bpo-32869: Fix incorrect dst buffer size for MultiByteToWideChar (GH-5739)
This function expects the destination buffer size to be given
in wide characters, not bytes.
(cherry picked from commit b3b4a9d300)

Co-authored-by: Alexey Izbyshev <izbyshev@users.noreply.github.com>
2018-02-18 10:40:07 -08:00
Miss Islington (bot)
2bb0bfafb0 bpo-32777: Fix _Py_set_inheritable async-safety in subprocess (GH-5560) (GH-5562)
Fix a rare but potential pre-exec child process deadlock in subprocess on POSIX systems when marking file descriptors inheritable on exec in the child process.  This bug appears to have been introduced in 3.4 with the inheritable file descriptors support.

This also changes Python/fileutils.c `set_inheritable` to use the "slow" two `fcntl` syscall path instead of the "fast" single `ioctl` syscall path when asked to be async signal safe (by way of being asked not to raise exceptions).  `ioctl` is not a POSIX async-signal-safe approved function.

ref: http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html
(cherry picked from commit c1e46e94de)

Co-authored-by: Alexey Izbyshev <izbyshev@users.noreply.github.com>
2018-02-05 22:31:22 -08:00
Victor Stinner
9089a26591 bpo-29240: PyUnicode_DecodeLocale() uses UTF-8 on Android (#5272)
PyUnicode_DecodeLocaleAndSize(), PyUnicode_DecodeLocale() and
PyUnicode_EncodeLocale() now use always use the UTF-8 encoding on
Android, instead of the current locale encoding.

On Android API 19, mbstowcs() and wcstombs() are broken and cannot be
used.
2018-01-22 19:07:32 +01:00
Victor Stinner
cb064fc232 bpo-31900: Fix localeconv() encoding for LC_NUMERIC (#4174)
* Add _Py_GetLocaleconvNumeric() function: decode decimal_point and
  thousands_sep fields of localeconv() from the LC_NUMERIC encoding,
  rather than decoding from the LC_CTYPE encoding.
* Modify locale.localeconv() and "n" formatter of str.format() (for
  int, float and complex to use _Py_GetLocaleconvNumeric()
  internally.
2018-01-15 15:58:02 +01:00
Victor Stinner
7ed7aead95 bpo-29240: Fix locale encodings in UTF-8 Mode (#5170)
Modify locale.localeconv(), time.tzname, os.strerror() and other
functions to ignore the UTF-8 Mode: always use the current locale
encoding.

Changes:

* Add _Py_DecodeLocaleEx() and _Py_EncodeLocaleEx(). On decoding or
  encoding error, they return the position of the error and an error
  message which are used to raise Unicode errors in
  PyUnicode_DecodeLocale() and PyUnicode_EncodeLocale().
* Replace _Py_DecodeCurrentLocale() with _Py_DecodeLocaleEx().
* PyUnicode_DecodeLocale() now uses _Py_DecodeLocaleEx() for all
  cases, especially for the strict error handler.
* Add _Py_DecodeUTF8Ex(): return more information on decoding error
  and supports the strict error handler.
* Rename _Py_EncodeUTF8_surrogateescape() to _Py_EncodeUTF8Ex().
* Replace _Py_EncodeCurrentLocale() with _Py_EncodeLocaleEx().
* Ignore the UTF-8 mode to encode/decode localeconv(), strerror()
  and time zone name.
* Remove PyUnicode_DecodeLocale(), PyUnicode_DecodeLocaleAndSize()
  and PyUnicode_EncodeLocale() now ignore the UTF-8 mode: always use
  the "current" locale.
* Remove _PyUnicode_DecodeCurrentLocale(),
  _PyUnicode_DecodeCurrentLocaleAndSize() and
  _PyUnicode_EncodeCurrentLocale().
2018-01-15 10:45:49 +01:00
Victor Stinner
2cba6b8579 bpo-29240: readline now ignores the UTF-8 Mode (#5145)
Add new fuctions ignoring the UTF-8 mode:

* _Py_DecodeCurrentLocale()
* _Py_EncodeCurrentLocale()
* _PyUnicode_DecodeCurrentLocaleAndSize()
* _PyUnicode_EncodeCurrentLocale()

Modify the readline module to use these functions.

Re-enable test_readline.test_nonascii().
2018-01-10 22:46:15 +01:00
Victor Stinner
9bee329130 bpo-32030: Add _Py_FindEnvConfigValue() (#4963)
Add a new _Py_FindEnvConfigValue() function: code shared between
Windows and Unix implementations of _PyPathConfig_Calculate() to read
the pyenv.cfg file.

_Py_FindEnvConfigValue() now uses _Py_DecodeUTF8_surrogateescape()
instead of using a Python Unicode string, the Python API must not be
used early during Python initialization. Same change in Unix
search_for_exec_prefix(): use _Py_DecodeUTF8_surrogateescape().

Cleanup also encode_current_locale(): PyMem_RawFree/PyMem_Free can be
called with NULL.

Fix also "NUL byte" => "NULL byte" typo.
2017-12-21 16:49:13 +01:00
Victor Stinner
9dd762013f bpo-32030: Add _Py_EncodeLocaleRaw() (#4961)
Replace Py_EncodeLocale() with _Py_EncodeLocaleRaw() in:

* _Py_wfopen()
* _Py_wreadlink()
* _Py_wrealpath()
* _Py_wstat()
* pymain_open_filename()

These functions are called early during Python intialization, only
the RAW memory allocator must be used.
2017-12-21 16:20:32 +01:00
Victor Stinner
e47e698da6 bpo-32030: Add _Py_EncodeUTF8_surrogateescape() (#4960)
Py_EncodeLocale() now uses _Py_EncodeUTF8_surrogateescape(), instead
of using temporary unicode and bytes objects. So Py_EncodeLocale()
doesn't use the Python C API anymore.
2017-12-21 15:45:16 +01:00
Victor Stinner
9454060e84 bpo-29240, bpo-32030: Py_Main() re-reads config if encoding changes (#4899)
bpo-29240, bpo-32030: If the encoding change (C locale coerced or
UTF-8 Mode changed), Py_Main() now reads again the configuration with
the new encoding.

Changes:

* Add _Py_UnixMain() called by main().
* Rename pymain_free_pymain() to pymain_clear_pymain(), it can now be
  called multipled times.
* Rename pymain_parse_cmdline_envvars() to pymain_read_conf().
* Py_Main() now clears orig_argc and orig_argv at exit.
* Remove argv_copy2, Py_Main() doesn't modify argv anymore. There is
  no need anymore to get two copies of the wchar_t** argv.
* _PyCoreConfig: add coerce_c_locale and coerce_c_locale_warn.
* Py_UTF8Mode is now initialized to -1.
* Locale coercion (PEP 538) now respects -I and -E options.
2017-12-16 04:54:22 +01:00
Victor Stinner
d2b02310ac bpo-29240: Don't define decode_locale() on macOS (#4895)
Don't define decode_locale() nor encode_locale() on macOS or Android.
2017-12-15 23:06:17 +01:00
Victor Stinner
91106cd9ff bpo-29240: PEP 540: Add a new UTF-8 Mode (#855)
* Add -X utf8 command line option, PYTHONUTF8 environment variable
  and a new sys.flags.utf8_mode flag.
* If the LC_CTYPE locale is "C" at startup: enable automatically the
  UTF-8 mode.
* Add _winapi.GetACP(). encodings._alias_mbcs() now calls
  _winapi.GetACP() to get the ANSI code page
* locale.getpreferredencoding() now returns 'UTF-8' in the UTF-8
  mode. As a side effect, open() now uses the UTF-8 encoding by
  default in this mode.
* Py_DecodeLocale() and Py_EncodeLocale() now use the UTF-8 encoding
  in the UTF-8 Mode.
* Update subprocess._args_from_interpreter_flags() to handle -X utf8
* Skip some tests relying on the current locale if the UTF-8 mode is
  enabled.
* Add test_utf8mode.py.
* _Py_DecodeUTF8_surrogateescape() gets a new optional parameter to
  return also the length (number of wide characters).
* pymain_get_global_config() and pymain_set_global_config() now
  always copy flag values, rather than only copying if the new value
  is greater than the old value.
2017-12-13 12:29:09 +01:00
Victor Stinner
8c663fd60e Replace KB unit with KiB (#4293)
kB (*kilo* byte) unit means 1000 bytes, whereas KiB ("kibibyte")
means 1024 bytes. KB was misused: replace kB or KB with KiB when
appropriate.

Same change for MB and GB which become MiB and GiB.

Change the output of Tools/iobench/iobench.py.

Round also the size of the documentation from 5.5 MB to 5 MiB.
2017-11-08 14:44:44 -08:00
Antoine Pitrou
a6a4dc816d bpo-31370: Remove support for threads-less builds (#3385)
* Remove Setup.config
* Always define WITH_THREAD for compatibility.
2017-09-07 18:56:24 +02:00
Serhiy Storchaka
f7eae0adfc [security] bpo-13617: Reject embedded null characters in wchar* strings. (#2302)
Based on patch by Victor Stinner.

Add private C API function _PyUnicode_AsUnicode() which is similar to
PyUnicode_AsUnicode(), but checks for null characters.
2017-06-28 08:30:06 +03:00
Victor Stinner
0f6d73343d bpo-29619: Convert st_ino using unsigned integer (#557)
bpo-29619: os.stat() and os.DirEntry.inodeo() now convert inode
(st_ino) using unsigned integers.
2017-03-09 17:34:28 +01:00
Xavier de Gaye
76febd0792 Issue #26919: On Android, operating system data is now always encoded/decoded
to/from UTF-8, instead of the locale encoding to avoid inconsistencies with
os.fsencode() and os.fsdecode() which are already using UTF-8.
2016-12-15 20:59:58 +01:00
Xavier de Gaye
ec5d3cd533 Issue #28746: Fix the set_inheritable() file descriptor method on platforms
that do not have the ioctl FIOCLEX and FIONCLEX commands
2016-11-19 16:19:29 +01:00