Commit Graph

26 Commits

Author SHA1 Message Date
Pablo Galindo
58fb156edd bpo-42997: Improve error message for missing : before suites (GH-24292)
* Add to the peg generator a new directive ('&&') that allows to expect
  a token and hard fail the parsing if the token is not found. This
  allows to quickly emmit syntax errors for missing tokens.

* Use the new grammar element to hard-fail if the ':' is missing before
  suites.
2021-02-02 19:54:22 +00:00
Pablo Galindo
4090151816 bpo-42986: Fix parser crash when reporting syntax errors in f-string with newlines (GH-24279) 2021-01-31 22:48:23 +00:00
Batuhan Taskaya
a698d52c39 bpo-40176: Improve error messages for unclosed string literals (GH-19346)
Automerge-Triggered-By: GH:isidentical
2021-01-20 13:38:47 -08:00
Pablo Galindo
c3f167d7b2 bpo-42864: Simplify the tokenizer exceptions after generic SyntaxError (GH-24273)
Automerge-Triggered-By: GH:pablogsal
2021-01-20 11:11:56 -08:00
Pablo Galindo
d6d6371447 bpo-42864: Improve error messages regarding unclosed parentheses (GH-24161) 2021-01-19 23:59:33 +00:00
Lysandros Nikolaou
e5fe509054 bpo-42827: Fix crash on SyntaxError in multiline expressions (GH-24140)
When trying to extract the error line for the error message there
are two distinct cases:

1. The input comes from a file, which means that we can extract the
   error line by using `PyErr_ProgramTextObject` and which we already
   do.
2. The input does not come from a file, at which point we need to get
   the source code from the tokenizer:
   * If the tokenizer's current line number is the same with the line
     of the error, we get the line from `tok->buf` and we're ready.
   * Else, we can extract the error line from the source code in the
     following two ways:
     * If the input comes from a string we have all the input
       in `tok->str` and we can extract the error line from it.
     * If the input comes from stdin, i.e. the interactive prompt, we
       do not have access to the previous line. That's why a new
       field `tok->stdin_content` is added which holds the whole input for the
       current (multiline) statement or expression. We can then extract the
       error line from `tok->stdin_content` like we do in the string case above.

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2021-01-14 21:36:30 +00:00
Pablo Galindo
06f8c3328d bpo-42214: Fix check for NOTEQUAL token in the PEG parser for the barry_as_flufl rule (GH-23048) 2020-10-30 23:48:42 +00:00
Batuhan Taskaya
3af4b58552 bpo-42206: Propagate and raise errors from PyAST_Validate in the parser (GH-23035) 2020-10-30 11:48:41 +00:00
Lysandros Nikolaou
bca7014032 bpo-42123: Run the parser two times and only enable invalid rules on the second run (GH-22111)
* Implement running the parser a second time for the errors messages

The first parser run is only responsible for detecting whether
there is a `SyntaxError` or not. If there isn't the AST gets returned.
Otherwise, the parser is run a second time with all the `invalid_*`
rules enabled so that all the customized error messages get produced.
2020-10-27 00:42:04 +02:00
Pablo Galindo
e68c67805e bpo-42150: Avoid buffer overflow in the new parser (GH-22978) 2020-10-25 23:03:41 +00:00
Batuhan Taskaya
02a1603f91 bpo-42000: Cleanup the AST related C-code (GH-22641)
- Use the proper asdl sequence when creating empty arguments
- Remove reduntant casts (thanks to new typed asdl_sequences)
- Remove MarshalPrototypeVisitor and some utilities from asdl generator
- Fix the header of `Python/ast.c` (kept from pgen times)

Automerge-Triggered-By: @pablogsal
2020-10-10 10:14:59 -07:00
Pablo Galindo
a5634c4067 bpo-41746: Add type information to asdl_seq objects (GH-22223)
* Add new capability to the PEG parser to type variable assignments. For instance:
```
       | a[asdl_stmt_seq*]=';'.small_stmt+ [';'] NEWLINE { a }
```

* Add new sequence types from the asdl definition (automatically generated)
* Make `asdl_seq` type a generic aliasing pointer type.
* Create a new `asdl_generic_seq` for the generic case using `void*`.
* The old `asdl_seq_GET`/`ast_seq_SET` macros now are typed.
* New `asdl_seq_GET_UNTYPED`/`ast_seq_SET_UNTYPED` macros for dealing with generic sequences.
* Changes all possible `asdl_seq` types to use specific versions everywhere.
2020-09-16 19:42:00 +01:00
Pablo Galindo
315a61f7a9 bpo-41697: Correctly handle KeywordOrStarred when parsing arguments in the parser (GH-22077) 2020-09-03 15:29:32 +01:00
Pablo Galindo
4a97b1517a bpo-41690: Use a loop to collect args in the parser instead of recursion (GH-22053)
This program can segfault the parser by stack overflow:

```
import ast

code = "f(" + ",".join(['a' for _ in range(100000)]) + ")"
print("Ready!")
ast.parse(code)
```

the reason is that the rule for arguments has a simple recursion when collecting args:

args[expr_ty]:
    [...]
    | a=named_expression b=[',' c=args { c }] {
        [...] }
2020-09-02 17:44:19 +01:00
Pablo Galindo
1332226b32 Validate the AST produced by the parser in debug mode (GH-21643)
This will improve the debug experience if something fails in the produced AST. Previously, errors in the produced AST can be felt much later like in the garbage collector or the compiler, making debugging them much more difficult.
2020-07-27 23:46:59 +01:00
Lysandros Nikolaou
782f44b8fb bpo-41215: Make assertion in the new parser more strict (GH-21364) 2020-07-07 01:42:21 +03:00
Pablo Galindo
1ac0cbca36 bpo-41215: Don't use NULL by default in the PEG parser keyword list (GH-21355)
Automerge-Triggered-By: @lysnikolaou
2020-07-06 12:31:16 -07:00
Guido van Rossum
9d197c7d48 bpo-35975: Only use cf_feature_version if PyCF_ONLY_AST in cf_flags (#21021) 2020-06-27 17:33:49 -07:00
Lysandros Nikolaou
1f0f4abb11 bpo-41076: Pre-feed the parser with the f-string expression location (GH-21054)
This commit changes the parsing of f-string expressions with the new parser. The parser gets pre-fed with the location of the expression itself (not the f-string, which was what we were doing before). This allows us to completely skip the shifting of the AST nodes after the parsing is completed.
2020-06-28 00:41:48 +01:00
Lysandros Nikolaou
6dcbc2422d bpo-41132: Use pymalloc allocator in the f-string parser (GH-21173) 2020-06-27 18:47:00 +01:00
Lysandros Nikolaou
2e0a920e9e bpo-41084: Adjust message when an f-string expression causes a SyntaxError (GH-21084)
Prefix the error message with `fstring: `, when parsing an f-string expression throws a `SyntaxError`.
2020-06-26 12:24:05 +01:00
Lysandros Nikolaou
861efc6e8f bpo-40958: Avoid 'possible loss of data' warning on Windows (GH-20970) 2020-06-20 05:57:27 -07:00
Lysandros Nikolaou
01ece63d42 bpo-40334: Produce better error messages on invalid targets (GH-20106)
The following error messages get produced:
- `cannot delete ...` for invalid `del` targets
- `... is an illegal 'for' target` for invalid targets in for
  statements
- `... is an illegal 'with' target` for invalid targets in
  with statements

Additionally, a few `cut`s were added in various places before the
invocation of the `invalid_*` rule, in order to speed things
up.

Co-authored-by: Pablo Galindo <Pablogsal@gmail.com>
2020-06-19 00:10:43 +01:00
Pablo Galindo
51c5896b62 bpo-40958: Avoid buffer overflow in the parser when indexing the current line (GH-20875) 2020-06-16 16:49:43 +01:00
Pablo Galindo
fb61c42361 Improve readability and style in parser files (GH-20884) 2020-06-15 14:23:43 +01:00