271 Commits

Author SHA1 Message Date
Batuhan Taskaya
a698d52c39 bpo-40176: Improve error messages for unclosed string literals (GH-19346)
Automerge-Triggered-By: GH:isidentical
2021-01-20 13:38:47 -08:00
Pablo Galindo
ae7d3cd980 bpo-42864: Fix compiler warning in the tokenizer with the new paren stack for column numbers (GH-24266) 2021-01-20 12:53:52 +00:00
Pablo Galindo
d6d6371447 bpo-42864: Improve error messages regarding unclosed parentheses (GH-24161) 2021-01-19 23:59:33 +00:00
Lysandros Nikolaou
e5fe509054 bpo-42827: Fix crash on SyntaxError in multiline expressions (GH-24140)
When trying to extract the error line for the error message there
are two distinct cases:

1. The input comes from a file, which means that we can extract the
   error line by using `PyErr_ProgramTextObject`, which we already
   do.
2. The input does not come from a file, at which point we need to get
   the source code from the tokenizer:
   * If the tokenizer's current line number is the same as the line
     of the error, we get the line from `tok->buf` and we're done.
   * Else, we can extract the error line from the source code in the
     following two ways:
     * If the input comes from a string we have all the input
       in `tok->str` and we can extract the error line from it.
     * If the input comes from stdin, i.e. the interactive prompt, we
       do not have access to the previous line. That's why a new
       field `tok->stdin_content` is added which holds the whole input for the
       current (multiline) statement or expression. We can then extract the
       error line from `tok->stdin_content` like we do in the string case above.

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2021-01-14 21:36:30 +00:00
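The string case described in the commit above (the whole input is available, so the error line can be looked up by number) can be sketched in Python. This is only an illustration of the idea, not the tokenizer's C code: `get_error_line` is a hypothetical helper, and the reported line number depends on the interpreter that runs the snippet.

```python
def get_error_line(source: str, lineno: int) -> str:
    """Return line `lineno` (1-based) of `source`, or "" if out of range.

    Hypothetical helper: a Python analogue of pulling the error line out of
    a buffer that holds the whole input, not the tokenizer's C code.
    """
    lines = source.split("\n")
    if 1 <= lineno <= len(lines):
        return lines[lineno - 1]
    return ""


# A multiline expression whose error is not on the first line.
source = "total = (1 +\n         2 +\n         * 3)\n"
try:
    compile(source, "<string>", "exec")
except SyntaxError as err:
    # err.lineno is whatever the running interpreter reports.
    print(err.lineno, repr(get_error_line(source, err.lineno or 1)))
```

For interactive input the same lookup would need the accumulated statement text, which is the role the commit gives to `tok->stdin_content`.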
Victor Stinner
00d7abd7ef bpo-42519: Replace PyMem_MALLOC() with PyMem_Malloc() (GH-23586)
No longer use deprecated aliases to functions:

* Replace PyMem_MALLOC() with PyMem_Malloc()
* Replace PyMem_REALLOC() with PyMem_Realloc()
* Replace PyMem_FREE() with PyMem_Free()
* Replace PyMem_Del() with PyMem_Free()
* Replace PyMem_DEL() with PyMem_Free()

Also modify the PyMem_DEL() macro to use PyMem_Free() directly.
2020-12-01 09:56:42 +01:00
Victor Stinner
e822e37946 bpo-36020: Remove snprintf macro in pyerrors.h (GH-20889)
On Windows, #include "pyerrors.h" no longer defines "snprintf" and
"vsnprintf" macros.

PyOS_snprintf() and PyOS_vsnprintf() should be used to get portable
behavior.

Replace snprintf() calls with PyOS_snprintf() and replace vsnprintf()
calls with PyOS_vsnprintf().
2020-06-15 21:59:47 +02:00
Lysandros Nikolaou
896f4cf63f bpo-40847: Consider a line with only a LINECONT a blank line (GH-20769)
A line with only a line continuation character should be considered
a blank line at tokenizer level so that only a single NEWLINE token
gets emitted. The old parser was working around the issue, but the
new parser threw a `SyntaxError` for valid input. For example,
an empty line following a line continuation character was interpreted
as a `SyntaxError`.

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2020-06-11 00:56:08 +01:00
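A quick way to check the behavior described in the commit above is to compile a source string in which a line holds only a line continuation character and is followed by an empty line. The helper name `compiles` is an assumption made for this sketch.

```python
def compiles(source: str) -> bool:
    """Return True if `source` compiles as a module, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# Second line: indentation plus a lone backslash; third line: empty.
LINECONT_THEN_BLANK = "pass\n        \\\n\npass\n"

print(compiles(LINECONT_THEN_BLANK))
```

On an interpreter that includes this fix the input is accepted, because the backslash-only line is treated as blank.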
Ammar Askar
a2bbedc8b1 Fix peg_generator compiler warnings under MSVC (GH-20405) 2020-05-26 05:33:35 +01:00
Serhiy Storchaka
74ea6b5a75 bpo-40593: Improve syntax errors for invalid characters in source code. (GH-20033) 2020-05-12 12:42:04 +03:00
Lysandros Nikolaou
846d8b28ab bpo-40246: Revert reporting of invalid string prefixes (GH-19888)
Due to backwards compatibility concerns, commit 41d5b94af4 has to be reverted:
with it, keywords immediately followed by a string without whitespace between them (as in `bg="#d00" if clear else"#fca"`) fail to parse.
2020-05-04 12:32:18 +01:00
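The construct that motivated the revert above can be checked directly: a keyword followed by a string literal with no whitespace in between is still valid source. A minimal sketch, with the variable names taken from the example in the commit message:

```python
# `else` is immediately followed by a string literal, with no space between.
expression = '"#d00" if clear else"#fca"'

# Evaluate with `clear` bound to False so the `else` branch is taken.
bg = eval(expression, {"clear": False})
print(bg)
```

The tokenizer splits `else"#fca"` into a keyword and a string rather than treating `else` as an invalid string prefix.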
Pablo Galindo
11a7f158ef bpo-40335: Correctly handle multi-line strings in tokenize error scenarios (GH-19619)
Co-authored-by: Guido van Rossum <gvanrossum@gmail.com>
2020-04-21 01:53:04 +01:00
Lysandros Nikolaou
41d5b94af4 bpo-40246: Report a better error message for invalid string prefixes (GH-19476) 2020-04-12 19:21:00 +01:00
Victor Stinner
87d3b9db4a bpo-39882: Add _Py_FatalErrorFormat() function (GH-19157) 2020-03-25 19:27:36 +01:00
Victor Stinner
9e5d30cc99 bpo-39882: Py_FatalError() logs the function name (GH-18819)
The Py_FatalError() function is replaced with a macro which automatically
logs the name of the current function, unless the
Py_LIMITED_API macro is defined.

Changes:

* Add _Py_FatalErrorFunc() function.
* Remove the function name from the message of Py_FatalError() calls
  which included the function name.
* Update tests.
2020-03-07 00:54:20 +01:00
Andy Lester
384f3c536d closes bpo-39721: Fix constness of members of tok_state struct. (GH-18600)
The function PyTokenizer_FromUTF8 from Parser/tokenizer.c had a comment:

    /* XXX: constify members. */

This patch addresses that.

In the tok_state struct:
    * end and start were non-const but could be made const
    * str and input were const but should have been non-const

Changes to support this include:
    * decode_str() now returns a char * since it is allocated.
    * PyTokenizer_FromString() and PyTokenizer_FromUTF8() each create a
        new char * for an allocated string instead of reusing the input
        const char *.
    * PyTokenizer_Get() and tok_get() now take const char ** arguments.
    * Various local vars are const or non-const accordingly.

I was able to remove five casts that cast away constness.
2020-02-27 18:44:52 -08:00
Serhiy Storchaka
0cc6b5e559 bpo-39219: Fix SyntaxError attributes in the tokenizer. (GH-17828)
* Always set the text attribute.
* Correct the offset attribute for non-ascii sources.
2020-02-12 12:17:00 +02:00
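The two attributes mentioned in the commit above can be inspected from Python by catching the `SyntaxError` raised for a tokenizer-level error in a source that contains non-ASCII characters. `describe_syntax_error` is a hypothetical helper for this sketch, and the exact `msg`, `offset`, and `text` values vary between Python versions, so none are shown here.

```python
def describe_syntax_error(source: str):
    """Compile `source` and return the SyntaxError details, or None if it compiles."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as err:
        return {
            "msg": err.msg,
            "lineno": err.lineno,
            "offset": err.offset,  # column of the error within `text`
            "text": err.text,      # the offending source line
        }
    return None


# Non-ASCII identifier followed by an unterminated string literal.
info = describe_syntax_error("größe = 'abc\n")
print(info)
```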
Victor Stinner
f3e7ea5b8c bpo-39500: Document PyUnicode_IsIdentifier() function (GH-18397)
PyUnicode_IsIdentifier() does not call Py_FatalError() anymore if the
string is not ready.
2020-02-11 14:29:33 +01:00
Pablo Galindo
5ec91f78d5 bpo-39209: Manage correctly multi-line tokens in interactive mode (GH-17860) 2020-01-06 15:59:09 +00:00
Batuhan Taşkaya
109fc2792a bpo-38673: don't switch to ps2 if the line starts with comment or whitespace (GH-17421)
https://bugs.python.org/issue38673
2019-12-08 20:36:27 -08:00
Hansraj Das
69f37bcb28 Indent code inside if block. (GH-15284)
Without indentation, the strcpy line looks parallel to the `if` condition.
2019-08-15 09:19:07 -07:00
Anthony Sottile
5b94f3578c Fix SyntaxError indicator printing too many spaces for multi-line strings (GH-14433) 2019-07-29 14:59:13 +01:00
Michael J. Sullivan
d8a82e2897 bpo-36878: Only allow text after # type: ignore if first character ASCII (GH-13504)
This disallows things like `# type: ignoreé`, which it seems wrong to accept.

Also switch to using Py_ISALNUM for the alnum check, for consistency
with other code (and maybe correctness re: locale issues?).


https://bugs.python.org/issue36878
2019-05-22 13:43:36 -07:00
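The rule from the commit above (any text after `# type: ignore` must start with an ASCII, non-alphanumeric character) can be approximated in pure Python. This is a simplified sketch, not the tokenizer's C logic: `looks_like_type_ignore` and its regular expression are assumptions, and the real whitespace handling may differ.

```python
import re

# Simplified prefix match; the real tokenizer's whitespace rules may differ.
_IGNORE_PREFIX = re.compile(r"#[ \t]*type:[ \t]*ignore")


def looks_like_type_ignore(comment: str) -> bool:
    """Approximate the check: prefix, then end of text or an ASCII non-alnum char."""
    match = _IGNORE_PREFIX.match(comment)
    if match is None:
        return False
    rest = comment[match.end():]
    if not rest:
        return True  # a bare "# type: ignore"
    first = rest[0]
    return first.isascii() and not first.isalnum()


for sample in ("# type: ignore", "# type: ignore[E1000]",
               "# type: ignored", "# type: ignoreé"):
    print(sample, looks_like_type_ignore(sample))
```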
Michael J. Sullivan
933e1509ec bpo-36878: Track extra text added to 'type: ignore' in the AST (GH-13479)
GH-13238 made extra text after a # type: ignore accepted by the parser.
This finishes the job and actually plumbs the extra text through the
parser and makes it available in the AST.
2019-05-22 15:54:20 +01:00
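From Python, the extra text described in the commit above is visible through the `ast` module when parsing with `type_comments=True`. A short sketch using the `# type: ignore[E1000]` example from this log:

```python
import ast

source = "x = 1  # type: ignore[E1000]\n"

# type_comments=True makes the parser collect `# type: ignore` comments.
module = ast.parse(source, type_comments=True)

# Each entry is an ast.TypeIgnore with the line number and the trailing text.
ignore = module.type_ignores[0]
print(ignore.lineno, repr(ignore.tag))
```

The `tag` field holds whatever followed `# type: ignore` on that line.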
Anthony Sottile
abea73bf4a bpo-2180: Treat line continuation at EOF as a SyntaxError (GH-13401)
This makes the parser consistent with the tokenize module (already the case
in `pypy`).

sample
------

```python
x = 5\
```

before
------

```console
$ python3 t.py
$ python3 -mtokenize t.py
t.py:2:0: error: EOF in multi-line statement
```

after
-----

```console
$ ./python t.py
  File "t.py", line 3
    x = 5\

         ^
SyntaxError: unexpected EOF while parsing
$ ./python -m tokenize t.py
t.py:2:0: error: EOF in multi-line statement
```



https://bugs.python.org/issue2180
2019-05-18 11:27:16 -07:00
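The behavior after the commit above can also be observed without a file, by compiling a string that ends with a line continuation. `raises_syntax_error` is a hypothetical helper for this sketch, and the error message text differs between Python versions.

```python
def raises_syntax_error(source: str) -> bool:
    """Return True if compiling `source` as a module raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return True
    return False


# Backslash continuation with nothing after it: rejected.
print(raises_syntax_error("x = 5\\\n"))

# Backslash continuation followed by the rest of the expression: accepted.
print(raises_syntax_error("x = 5 + \\\n    1\n"))
```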
Michael J. Sullivan
d8320ecb86 bpo-36878: Allow extra text after # type: ignore comments (GH-13238)
In the parser, when using the type_comments=True option, recognize
a TYPE_IGNORE as anything containing `# type: ignore` followed by
a non-alphanumeric character. This is to allow ignores such as
`# type: ignore[E1000]`.
2019-05-11 19:17:24 +01:00