Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cpp grammar: Unable to parse pointer to array types as template argument #4149

Closed
deadlocklogic opened this issue Jun 30, 2024 · 9 comments · Fixed by #4150
Closed

cpp grammar: Unable to parse pointer to array types as template argument #4149

deadlocklogic opened this issue Jun 30, 2024 · 9 comments · Fixed by #4150
Labels

Comments

@deadlocklogic
Copy link

Consider:

Test<char(*)[21]> type;

This statement can't be parsed, notice that a pointer to function parses correctly: Test<void(*)(void)> type;.
Thanks.

@deadlocklogic
Copy link
Author

@kaby76 Never encountered such use case in your scraping algorithm?
For the record, Test<char(&)[21]> type; and Test<char(&&)[21]> type; aren't parsed too.
Note: 21 can be replaced with any other constant value.

@kaby76
Copy link
Contributor

kaby76 commented Jul 1, 2024

The problem is that the set of test inputs is small, only 17 files. Of those, it only tests about 60% of the parser rules as measured by trcover. A proper test suite should have been supplied when the grammar was submitted. At some point when we redo this grammar, a test suite will be added, first choice likely from the GNU project.

@deadlocklogic
Copy link
Author

A practical choice. You can choose from big projects a test suite like GNU, clang, Qt, chrome etc...
The only thing which might stands in the way is the usage of the preprocessor: so maybe a preliminary phase hooked to a compiler which generates processed code before running the tests.

By the way, do you think my issue is trivially fixable? I guess it shouldn't be this difficult.

@kaby76
Copy link
Contributor

kaby76 commented Jul 1, 2024

The only thing which might stands in the way is the usage of the preprocessor: so maybe a preliminary phase hooked to a compiler which generates processed code before running the tests.

Correct. There is no preprocessor parsing and translation currently. There are rules to ignore preprocessor directives in the grammar, but the parse can fail for a number of reasons, e.g., if a #ifdef tries to break a statement across the "then" and "else", using non-ISO extensions, etc.

===

Someone didn't refactor the noptr-abstract-declarator rule from the spec. The spec says:

noptr-abstract-declarator:
    noptr-abstract-declaratoropt parameters-and-qualifiers
    noptr-abstract-declaratoropt [ constant-expressionopt ] attribute-specifier-seqopt
    ( ptr-abstract-declarator )

Instead, we ended up with this, which is wrong.

noPointerAbstractDeclarator
: noPointerAbstractDeclarator (
parametersAndQualifiers
| noPointerAbstractDeclarator LeftBracket constantExpression? RightBracket attributeSpecifierSeq?
)
| parametersAndQualifiers
| LeftBracket constantExpression? RightBracket attributeSpecifierSeq?
| LeftParen pointerAbstractDeclarator RightParen
;

If anything, it should look like this:

noPointerAbstractDeclarator
    : noPointerAbstractDeclarator parametersAndQualifiers
    | noPointerAbstractDeclarator LeftBracket constantExpression? RightBracket attributeSpecifierSeq?
    | LeftParen pointerAbstractDeclarator RightParen
    ;

That seems to "work" but it may break other things.

But even that I wouldn't do. It should be defined using Kleene operators.

@deadlocklogic
Copy link
Author

Following the specs, I was able to fix this issue.

noPointerAbstractDeclarator
    : noPointerAbstractDeclarator parametersAndQualifiers
    | noPointerAbstractDeclarator LeftBracket constantExpression? RightBracket
    | LeftParen pointerAbstractDeclarator RightParen
    ;

abstractPackDeclarator
    : noPointerAbstractPackDeclarator
    | pointerOperator abstractPackDeclarator
    ;

noPointerAbstractPackDeclarator
    : noPointerAbstractPackDeclarator parametersAndQualifiers
    | Ellipsis
    ;

Wonder why the grammar doesn't follow the specs in the first place.
Is there any roadmap for a newer/cleaner grammar?

@kaby76
Copy link
Contributor

kaby76 commented Jul 2, 2024

Wonder why the grammar doesn't follow the specs in the first place.
Is there any roadmap for a newer/cleaner grammar?

This is precisely what I have been complaining about for years.

The proper way forward is to scrape the grammar from scratch. Then, employ known, logical, repeatable, precise transformations to obtain a functioning grammar. I had performed some work on this over here: https://github.com/kaby76/scrape-c-plus-plus-spec. We can try to use some of the transformations in the existing cpp grammar, but there's really no documentation as to why the rules are structured as they are.

I now see that https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/ has a firewall preventing downloading even the drafts. Really just terrible. The only current free download of drafts is https://github.com/cplusplus/draft. Who knows how long this will be up.

@deadlocklogic
Copy link
Author

Maybe the main goal of ANTLR as a project, is to focus on the generator's runtime/tools, grammars are just treated as plugins which anyone can easily create/modify.

This draft looks plausible even though not ISO compliant.
My use case, isn't high demanding in terms of grammar.
I am using ANTLR to parse something like SWIG's typemaps, so just using a compact version of the grammar mainly for parsing expression statement like patterns.

@kaby76
Copy link
Contributor

kaby76 commented Jul 2, 2024

This draft looks plausible even though not ISO compliant.

Why would we use a non-authoritative website source when we have authoritative drafts in PDF format at https://github.com/cplusplus, and official standard doc at https://www.iso.org/standard/79358.html and have code already to read and strip the EBNF from the official drafts and formal spec?


My plan is to make a PR to fix this parsing issue. One step at a time.

@kaby76
Copy link
Contributor

kaby76 commented Jul 2, 2024

I see now what the developer did with noPointerAbstractDeclarator. The issue was The following sets of rules are mutually left-recursive [noPointerAbstractDeclarator].

Spec:

noptr-abstract-declarator:
    noptr-abstract-declaratoropt parameters-and-qualifiers
    noptr-abstract-declaratoropt [ constant-expressionopt ] attribute-specifier-seqopt
    ( ptr-abstract-declarator )

Step 1: Translated to mutually-left recursive Antlr4 format.

noPointerAbstractDeclarator
    : noPointerAbstractDeclarator? parametersAndQualifiers
    | noPointerAbstractDeclarator? LeftBracket constantExpression? RightBracket attributeSpecifierSeq?
    | LeftParen pointerAbstractDeclarator RightParen
    ;

Step 2: Remove ?-operator.

noPointerAbstractDeclarator
    : noPointerAbstractDeclarator parametersAndQualifiers
    | parametersAndQualifiers
    | noPointerAbstractDeclarator LeftBracket constantExpression? RightBracket attributeSpecifierSeq?
    | LeftBracket constantExpression? RightBracket attributeSpecifierSeq?
    | LeftParen pointerAbstractDeclarator RightParen
    ;

This is perfectly fine as is (Antlr4 handles left-recursion), but can be further improved with Kleene operator.

Step 3: Reorder alts.

noPointerAbstractDeclarator
    : noPointerAbstractDeclarator parametersAndQualifiers
    | noPointerAbstractDeclarator LeftBracket constantExpression? RightBracket attributeSpecifierSeq?
    | parametersAndQualifiers
    | LeftParen pointerAbstractDeclarator RightParen
    ;

Step 4: Group alts.

noPointerAbstractDeclarator
    : noPointerAbstractDeclarator (parametersAndQualifiers | LeftBracket constantExpression? RightBracket attributeSpecifierSeq? )
    | (parametersAndQualifiers | LeftParen pointerAbstractDeclarator RightParen)
    ;

Step 5: Transform to *-operator closure.

noPointerAbstractDeclarator
    : (parametersAndQualifiers | LeftParen pointerAbstractDeclarator RightParen) (parametersAndQualifiers | LeftBracket constantExpression? RightBracket attributeSpecifierSeq? )*
    ;

Each of these steps can be handled by the Trash toolkit from the scraped grammar.

kaby76 added a commit to kaby76/grammars-v4 that referenced this issue Jul 2, 2024
@KvanTTT KvanTTT added the cpp label Jul 14, 2024
teverett pushed a commit that referenced this issue Jul 15, 2024
…rom the ISO spec (#4150)

* Fix for #4149

* Fix pointerAbstractDeclarator.

* Fix noPointerAbstractPackDeclarator

* Reformat according to antlr-format.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants