254 Commits

Author SHA1 Message Date
Miss Islington (bot)
b2e281aaa2 bpo-39209: Manage correctly multi-line tokens in interactive mode (GH-17860)
(cherry picked from commit 5ec91f78d5)

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2020-01-06 08:26:13 -08:00
Miss Islington (bot)
184a3812b8 bpo-38673: dont switch to ps2 if the line starts with comment or whitespace (GH-17421)
https://bugs.python.org/issue38673
(cherry picked from commit 109fc2792a)

Co-authored-by: Batuhan Taşkaya <47358913+isidentical@users.noreply.github.com>
2019-12-08 20:56:19 -08:00
Miss Islington (bot)
64db5aac6b Indent code inside if block. (GH-15284)
Without indendation, seems like strcpy line is parallel to `if` condition.
(cherry picked from commit 69f37bcb28)

Co-authored-by: Hansraj Das <raj.das.136@gmail.com>
2019-08-15 09:38:22 -07:00
Miss Islington (bot)
cf52bd0b9b Fix SyntaxError indicator printing too many spaces for multi-line strings (GH-14433)
(cherry picked from commit 5b94f3578c)

Co-authored-by: Anthony Sottile <asottile@umich.edu>
2019-07-29 07:18:47 -07:00
Michael J. Sullivan
d8a82e2897 bpo-36878: Only allow text after # type: ignore if first character ASCII (GH-13504)
This disallows things like `# type: ignoreé`, which seems wrong.

Also switch to using Py_ISALNUM for the alnum check, for consistency
with other code (and maybe correctness re: locale issues?).


https://bugs.python.org/issue36878
2019-05-22 13:43:36 -07:00
Michael J. Sullivan
933e1509ec bpo-36878: Track extra text added to 'type: ignore' in the AST (GH-13479)
GH-13238 made extra text after a # type: ignore accepted by the parser.
This finishes the job and actually plumbs the extra text through the
parser and makes it available in the AST.
2019-05-22 15:54:20 +01:00
Anthony Sottile
abea73bf4a bpo-2180: Treat line continuation at EOF as a SyntaxError (GH-13401)
This makes the parser consistent with the tokenize module (already the case
in `pypy`).

sample
------

```python
x = 5\
```

before
------

```console
$ python3 t.py
$ python3 -mtokenize t.py
t.py:2:0: error: EOF in multi-line statement
```

after
-----

```console
$ ./python t.py
  File "t.py", line 3
    x = 5\

         ^
SyntaxError: unexpected EOF while parsing
$ ./python -m tokenize t.py
t.py:2:0: error: EOF in multi-line statement
```



https://bugs.python.org/issue2180
2019-05-18 11:27:16 -07:00
Michael J. Sullivan
d8320ecb86 bpo-36878: Allow extra text after # type: ignore comments (GH-13238)
In the parser, when using the type_comments=True option, recognize
a TYPE_IGNORE as anything containing `# type: ignore` followed by
a non-alphanumeric character. This is to allow ignores such as
`# type: ignore[E1000]`.
2019-05-11 19:17:24 +01:00
Pablo Galindo
f2cf1e3e28 bpo-36623: Clean parser headers and include files (GH-12253)
After the removal of pgen, multiple header and function prototypes that lack implementation or are unused are still lying around.
2019-04-13 17:05:14 +01:00
Zackery Spytz
cda139d1de bpo-36459: Fix a possible double PyMem_FREE() due to tokenizer.c's tok_nextc() (12601)
Remove the PyMem_FREE() call added in cb90c89.  The buffer will be
freed when PyTokenizer_Free() is called on the tokenizer state.
2019-03-28 15:53:00 +02:00
Pablo Galindo
cb90c89de1 bpo-36367: Free buffer if realloc fails in tokenize.c (GH-12442) 2019-03-19 17:17:58 +00:00
Guido van Rossum
495da29225 bpo-35975: Support parsing earlier minor versions of Python 3 (GH-12086)
This adds a `feature_version` flag to `ast.parse()` (documented) and `compile()` (hidden) that allow tweaking the parser to support older versions of the grammar. In particular if `feature_version` is 5 or 6, the hacks for the `async` and `await` keyword from PEP 492 are reinstated. (For 7 or higher, these are unconditionally treated as keywords, but they are still special tokens rather than `NAME` tokens that the parser driver recognizes.)



https://bugs.python.org/issue35975
2019-03-07 12:38:08 -08:00
Pablo Galindo
1f24a719e7 bpo-35808: Retire pgen and use pgen2 to generate the parser (GH-11814)
Pgen is the oldest piece of technology in the CPython repository, building it requires various #if[n]def PGEN hacks in other parts of the code and it also depends more and more on CPython internals. This commit removes the old pgen C code and replaces it for a new version implemented in pure Python. This is a modified and adapted version of lib2to3/pgen2 that can generate grammar files compatibles with the current parser.

This commit also eliminates all the #ifdef and code branches related to pgen, simplifying the code and making it more maintainable. The regen-grammar step now uses $(PYTHON_FOR_REGEN) that can be any version of the interpreter, so the new pgen code maintains compatibility with older versions of the interpreter (this also allows regenerating the grammar with the current CI solution that uses Python3.5). The new pgen Python module also makes use of the Grammar/Tokens file that holds the token specification, so is always kept in sync and avoids having to maintain duplicate token definitions.
2019-03-01 15:34:44 -08:00
Guido van Rossum
dcfcd146f8 bpo-35766: Merge typed_ast back into CPython (GH-11645) 2019-01-31 12:40:27 +01:00
Anthony Sottile
995d9b9297 bpo-16806: Fix lineno and col_offset for multi-line string tokens (GH-10021) 2019-01-13 13:05:13 +09:00
Serhiy Storchaka
8ac658114d bpo-30455: Generate all token related code and docs from Grammar/Tokens. (GH-10370)
"Include/token.h", "Lib/token.py" (containing now some data moved from
"Lib/tokenize.py") and new files "Parser/token.c" (containing the code
moved from "Parser/tokenizer.c") and "Doc/library/token-list.inc" (included
in "Doc/library/token.rst") are now generated from "Grammar/Tokens" by
"Tools/scripts/generate_token.py". The script overwrites files only if
needed and can be used on the read-only sources tree.

"Lib/symbol.py" is now generated by "Tools/scripts/generate_symbol_py.py"
instead of been executable itself.

Added new make targets "regen-token" and "regen-symbol" which are now
dependencies of "regen-all".

The documentation contains now strings for operators and punctuation tokens.
2018-12-22 11:18:40 +02:00
Serhiy Storchaka
94cf308ee2 bpo-33306: Improve SyntaxError messages for unbalanced parentheses. (GH-6516) 2018-12-17 17:34:14 +02:00
Zackery Spytz
4c49da0cb7 bpo-35436: Add missing PyErr_NoMemory() calls and other minor bug fixes. (GH-11015)
Set MemoryError when appropriate, add missing failure checks,
and fix some potential leaks.
2018-12-07 12:11:30 +02:00
Zackery Spytz
5061a74a4c Remove unneeded PyUnicode_READY() in tokenizer.c (GH-9114) 2018-09-10 09:27:31 +03:00
Victor Stinner
c884616390 Fix Windows compiler warning in tokenize.c (GH-8359)
Fix the following warning on Windows:

parser\tokenizer.c(1297): warning C4244: 'function': conversion from
'__int64' to 'int', possible loss of data.
2018-07-21 03:36:06 +02:00
Serhiy Storchaka
cf7303ed2a bpo-33305: Improve SyntaxError for invalid numerical literals. (GH-6517) 2018-07-09 15:09:35 +03:00
Victor Stinner
f2ddc6ac93 tokenizer: Remove unused tabs options (#4422)
Remove the following fields from tok_state structure which are now
used unused:

* altwarning: "Issue warning if alternate tabs don't match"
* alterror: "Issue error if alternate tabs don't match"
* alttabsize: "Alternate tab spacing"

Replace alttabsize variable with ALTTABSIZE define.
2017-11-17 01:25:47 -08:00
Jelle Zijlstra
ac317700ce bpo-30406: Make async and await proper keywords (#1669)
Per PEP 492, 'async' and 'await' should become proper keywords in 3.7.
2017-10-05 23:24:46 -04:00
Albert-Jan Nijburg
c9ccacea3f bpo-25324: add missing comma in Parser/tokenizer.c (GH-1910) 2017-06-01 13:51:27 -07:00
Albert-Jan Nijburg
fc354f0785 bpo-25324: copy tok_name before changing it (#1608)
* add test to check if were modifying token

* copy list so import tokenize doesnt have side effects on token

* shorten line

* add tokenize tokens to token.h to get them to show up in token

* move ERRORTOKEN back to its previous location, and fix nitpick

* copy comments from token.h automatically

* fix whitespace and make more pythonic

* change to fix comments from @haypo

* update token.rst and Misc/NEWS

* change wording

* some more wording changes
2017-05-31 16:00:21 +02:00