Parsing a document with an incomplete definition list #48

keewis · 2023-11-09T18:23:33Z

I'm trying to use tree-sitter-rst to parse numpydoc docstrings, which are based on rst but are not a strict subset of it (see Carreau/velin#36).

While parsing, I noticed that this input:

    See Also
    --------
    item : description

    Notes
    -----
    Some text.

produces the following tree:

    (document (section (title)) (ERROR (classifier)))

where the definition list item consumes everything afterwards and dumps it into the classifier.

Instead, I would have expected an error node that contains only the actual term and classifier, with everything afterwards parsed as usual (in other words, I'd like tree-sitter to prefer inserting a missing token over consuming more tokens in this case).

Do you think there is anything that can be changed in this library to get this to work (in other words, is this a bug, either in tree-sitter-rst or in upstream tree-sitter)? Or would you rather recommend a derived grammar that is specific to numpydoc (if that's possible)?


stsewd · 2023-11-09T19:17:08Z

Hi, docutils parses "item : description" as a paragraph. Does numpydoc also expect it to be a paragraph? If so, this is probably the same issue as #20.
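The docutils behavior can be illustrated with a simplified model of the rst rule involved. This is a minimal sketch, assuming that a definition list item needs a term line immediately followed by an indented definition line, and that classifiers are separated from the term by " : ". The function `classify_block` and its return values are hypothetical; they only approximate docutils and are not part of docutils or tree-sitter-rst.

```python
import re

# Assumption: classifiers follow the term, separated by " : " (spaces around the colon).
CLASSIFIER_SEP = re.compile(r" +: +")


def classify_block(lines: list[str]) -> tuple[str, list[str]]:
    """Classify one blank-line-separated block of rst text (simplified rules).

    Returns ("definition_list_item", [term, *classifiers]) when the first line
    is directly followed by an indented, non-empty definition line; otherwise
    returns ("paragraph", lines).
    """
    if (
        len(lines) >= 2
        and lines[0]
        and not lines[0][0].isspace()  # the term starts at the block's indentation
        and lines[1][:1].isspace()  # the definition is indented relative to the term
        and lines[1].strip()
    ):
        term, *classifiers = CLASSIFIER_SEP.split(lines[0].strip())
        return "definition_list_item", [term, *classifiers]
    # No indented definition follows, so fall back to a plain paragraph.
    return "paragraph", lines


print(classify_block(["item : description"]))
# ('paragraph', ['item : description'])
print(classify_block(["item : classifier", "    definition"]))
# ('definition_list_item', ['item', 'classifier'])
```

Under this model, the lone `item : description` line can never open a definition list by itself, so the following `Notes` section would be parsed independently.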

keewis · 2023-11-09T19:29:55Z

~~numpydoc makes use of docutils, so I guess it expects the same behavior (not sure though, I'm no expert on that code base).~~ Edit: It appears that numpydoc splits the document (docstring) into sections and parses the content of each section separately. No docutils or other parsing library is involved, just a set of regular expressions, which means it also does not try to classify content as paragraphs or definition lists.

~~So yes, this can very well be a duplicate of #20.~~ This might still be a duplicate of #20, but I also think that tree-sitter-rst can be a bit stricter than docutils (which to me appears to be very forgiving).
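To illustrate the section-splitting approach described above, here is a hypothetical, simplified sketch. It assumes a section header is a non-blank line followed by a line of dashes at least as long as the header text; `split_sections` and `_trim` are illustrative names, and numpydoc's own implementation (in `numpydoc.docscrape`) differs in its details.

```python
import re

# Assumption: an underline is a line made only of dashes.
UNDERLINE = re.compile(r"^-+\s*$")


def split_sections(docstring: str) -> dict[str, list[str]]:
    """Split a numpydoc-style docstring into {section title: body lines}."""
    lines = docstring.strip("\n").splitlines()
    sections: dict[str, list[str]] = {"": []}  # "" holds text before the first header
    current = ""
    i = 0
    while i < len(lines):
        title = lines[i].strip()
        underline = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if title and UNDERLINE.match(underline) and len(underline) >= len(title):
            current = title
            sections[current] = []
            i += 2  # skip the title and its underline
            continue
        sections[current].append(lines[i])
        i += 1
    # Drop leading and trailing blank lines inside each section body.
    return {name: _trim(body) for name, body in sections.items()}


def _trim(body: list[str]) -> list[str]:
    while body and not body[0].strip():
        body = body[1:]
    while body and not body[-1].strip():
        body = body[:-1]
    return body


doc = """
See Also
--------
item : description

Notes
-----
Some text.
"""
print(split_sections(doc))
# {'': [], 'See Also': ['item : description'], 'Notes': ['Some text.']}
```

Because each section body is handled on its own, the incomplete `item : description` entry cannot spill over into `Notes`, which is roughly the behavior the issue asks tree-sitter-rst to approximate.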

stsewd added the "bug" label ("Something isn't working") on Dec 29, 2023
