parsing a document with an incomplete definition list #48

Open
keewis opened this issue Nov 9, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@keewis

keewis commented Nov 9, 2023

I'm trying to use tree-sitter-rst to parse numpydoc docstrings, which are based on rst but not a strict subset (see Carreau/velin#36).

While parsing, I noticed that this:

See Also
--------
item : description

Notes
-----
Some text.

produces a tree in which everything after the incomplete definition list item is consumed as part of a definition list:

(document (section (title)) (ERROR (classifier)))

where the definition list item consumes everything afterwards and dumps it into the classifier.

Instead, I would have expected an error node, but one that only contains the actual term and classifier, while everything afterwards is parsed as usual (in other words, I'd like tree-sitter to prefer inserting a token over consuming more tokens in this case).

Do you think there is anything that can be changed in this library to get this to work (in other words, is this a bug, either in tree-sitter-rst or in upstream tree-sitter)? Or would you rather recommend a derived grammar that is specific to numpydoc (if that's possible)?

@stsewd
Owner

stsewd commented Nov 9, 2023

Hi, docutils parses `item : description` as a paragraph. Does numpydoc also expect it to be a paragraph? If so, this is probably the same issue as #20.
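For illustration, a simplified sketch of why that line ends up as a paragraph, assuming the usual reST rule that a definition-list item is a term line immediately followed by an indented definition. `split_blocks` and `classify_blocks` are hypothetical helpers, not docutils or tree-sitter-rst APIs, and they ignore many cases docutils handles.

```python
import re

# Hypothetical, simplified rule (not docutils' actual implementation):
# a definition-list item needs a non-indented term line that is
# immediately followed by an indented definition line.
INDENTED = re.compile(r"[ \t]+\S")


def split_blocks(text: str) -> list[list[str]]:
    """Split text into blocks of consecutive non-blank lines."""
    blocks: list[list[str]] = []
    current: list[str] = []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


def classify_blocks(text: str) -> list[str]:
    """Label each block as a definition-list item or a plain paragraph."""
    kinds = []
    for block in split_blocks(text):
        term_ok = not block[0][0].isspace()
        if len(block) >= 2 and term_ok and INDENTED.match(block[1]):
            kinds.append("definition_list_item")
        else:
            kinds.append("paragraph")
    return kinds


# The numpydoc-style line has no indented definition, so it falls back
# to a paragraph under this rule.
print(classify_blocks("item : description\n"))  # ['paragraph']
# A complete item: term (with classifier) plus an indented definition.
print(classify_blocks("item : classifier\n    description\n"))  # ['definition_list_item']
```

Under this rule, the parser only has to look one line ahead to reject the definition-list reading, which is the behavior the issue asks tree-sitter-rst to approximate.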

@keewis
Author

keewis commented Nov 9, 2023

numpydoc makes use of docutils, so I guess it expects the same behavior (not sure though, I'm no expert on that code base). Edit: It appears that numpydoc splits the document (docstring) into sections and parses the content of each one separately. So there is no involvement of docutils or any other parsing library, just a bunch of regular expressions. This means that it also does not try to classify content as paragraphs or definition lists.
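A minimal sketch of this kind of regex-based splitting, assuming a section header is a non-indented line underlined by a line of dashes. `HEADER` and `split_sections` are hypothetical names; this is not numpydoc's actual code.

```python
import re

# Hypothetical header pattern: a non-indented line followed by a line
# consisting only of dashes (optionally with trailing whitespace).
HEADER = re.compile(r"^(\S[^\n]*)\n-+[ \t]*$", re.MULTILINE)


def split_sections(doc: str) -> dict[str, str]:
    """Map each section title to its raw, unparsed body text.

    Text before the first header (e.g. the summary) is ignored here.
    """
    sections: dict[str, str] = {}
    matches = list(HEADER.finditer(doc))
    for i, match in enumerate(matches):
        start = match.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(doc)
        sections[match.group(1).strip()] = doc[start:end].strip()
    return sections


doc = "See Also\n--------\nitem : description\n\nNotes\n-----\nSome text.\n"
print(split_sections(doc))
# {'See Also': 'item : description', 'Notes': 'Some text.'}
```

Because each section body stays raw text, this approach never has to decide whether `item : description` is a paragraph or a definition-list item; the `See Also` body is simply handed to section-specific code.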

So yes, this might well be a duplicate of #20, but I also think that tree-sitter-rst could be a bit stricter than docutils (which appears to me to be very forgiving).

@stsewd stsewd added the bug Something isn't working label Dec 29, 2023