Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keywords token and layout are apparently not compatible #372

Open
MatthewDaggitt opened this issue Jun 28, 2021 · 8 comments
Open

Keywords token and layout are apparently not compatible #372

MatthewDaggitt opened this issue Jun 28, 2021 · 8 comments
Labels
AST Concerning the generated abstract syntax layout Concerning layout (keywords) position Concerning position information in parsed AST token Concerning token categories.

Comments

@MatthewDaggitt
Copy link

When using the parser generated by the grammar fragment

position token TokLet      {"let"};
position token TokIn       {"in"};

ELet.       Expr  ::= TokLet "{" [Expr] "}" TokIn Expr;

layout "let";
layout stop "in";

to parse:

test =
  let y : Real
      y = f x ! 0
  in y

I get syntax error at line 2, column 8 before 'y'.

If instead I use the grammar:

ELet.       Expr  ::= "let" "{" [Expr] "}" "in" Expr;

layout "let";
layout stop "in";

then everything parses fine. The main problem with this is that I think(?) it means that I'm unable to get position information for the "let" and "in" tokens.

@andreasabel
Copy link
Member

The BNFC philosophy is that tokens have some content you are interested in, whereas keywords like "let" have none. So the latter are not represented in the AST, and consequently have no position information.

One could boost the layout mechanism by allowing also declared tokens as layout keywords, i.e.,

layout TokLet;
layout stop TokIn;

Beyond that, one could detect choice-less tokens such as TokLet and represent them as unit types in the AST...

In general, it would be nice to have BNFC produce a concrete syntax tree (CST) and a procedure to extract the AST from the CST. But this is a quite radical redesign.

@andreasabel andreasabel added layout Concerning layout (keywords) token Concerning token categories. labels Jun 28, 2021
@MatthewDaggitt
Copy link
Author

MatthewDaggitt commented Jun 29, 2021

Thanks for the reply @andreasabel. So, without making keywords such as let a token, what is the procedure to extract the location of the let expression in the source code for error reporting purposes whilst adhering to the BNFC philosophy? At the moment my main motivating example is trying to work around #371, i.e. throw a meaningful error with source code locations if the list of let bindings is empty.

However, I think I must be misunderstanding about how to go about extracting positions generally. For example, if one were implementing a language server one would need to get positions for parse errors to highlight the location where the parsing fell over. But the pX functions generated in the Par module return Either String a with the positions already converted to the String. Am I misunderstanding something or is it something that BNFC does not support?

@andreasabel
Copy link
Member

But the pX functions generated in the Par module return Either String a with the positions already converted to the String.

I think the error reporting API also needs a overhaul. The error location shouldn't be mangled into the error message.

In general, BNFC in its present state should be thought of as a wizard generating a parser and printer for prototyping rather than professional use. In the worst case, you can get a happy parser from BNFC that sort of does the thing and then hand-edit it to your satisfaction.

That said, in general I am looking to make BNFC more professional without breaking backwards-compatibility if possible, so a better API to the parser and better position information in the syntax tree are definitely a way to go forward.

@andreasabel
Copy link
Member

One could boost the layout mechanism by allowing also declared tokens as layout keywords, i.e.,

layout TokLet;
layout stop TokIn;

I am getting doubts whether this would be a common use case or just feature bloat. It seems that the OP wants something like that only for getting position information, but providing more position information should rather be addressed more systematically.

@wenkokke
Copy link

Perhaps a BNFC flag which adds the full range of input tokens parsed to create this node to each CST node?

@andreasabel
Copy link
Member

Perhaps a BNFC flag which adds the full range of input tokens parsed to create this node to each CST node?

Atm, we do not produce CSTs. We currently only attach the starting position to each node (with --functor).

I understand your proposal in this way: instead of just attaching a single position to each node, attach a list of start positions of each production item (be it terminal or nonterminal). E.g., a node for

ELet.  Expr ::= "let" Bind "in" Expr;

would look like

ELet :: [Position] -> Bind -> Expr -> Expr

example = ELet [posLet, posBind, posIn, posExpr] bind expr

This could be happening under a flag that addresses the parser.

@andreasabel andreasabel added AST Concerning the generated abstract syntax position Concerning position information in parsed AST labels Oct 19, 2021
@wenkokke
Copy link

More or less, though instead of only providing the first position of each token it'd be better to provide the entire range, either as a pair of Position or using Data.Range.

The reason is that you don't know the length of every token, so this information is difficult to reconstruct.

@MatthewDaggitt
Copy link
Author

That would certainly address our current problems. I can't help but wonder though if it's a bit hacky though, for example you would have to pattern match on the list and there's no static guarantee that it would have the right number of elements. I guess it could be made a tuple of the correct arity rather than a list... I don't know, offering the full CST seems like a much more principled way to go, although I realise it's a lot more effort.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AST Concerning the generated abstract syntax layout Concerning layout (keywords) position Concerning position information in parsed AST token Concerning token categories.
Projects
None yet
Development

No branches or pull requests

3 participants