
Would it be possible to remove the lexer (scanner)? #53

Open
95833 opened this issue Oct 9, 2022 · 2 comments


95833 commented Oct 9, 2022

I am writing a grammar using another parser library, and I find the separate lexer (scanner) unnatural. When we define a token, we usually give it a name with some semantics, such as VARIABLE, STRING, INT, FLOAT, BOOL, etc. This is unnatural because the lexer should not carry any information about semantics. It might be more fitting to use names like LITTLE_CHAR_SET, CHARS_SET_WITH_QUOTES, or DIGIT_SET instead of VARIABLE, STRING, and INT, but these names are obviously too verbose. It may seem unimportant, but when I define a grammar I always have to make a trade-off between a natural but complex grammar and a simple but incoherent one, because places that use the same token often have different semantics.

So I wonder whether we could get a more natural grammar definition by removing the lexer (scanner) and replacing lexer tokens with inline regexes. Your library came to mind, and I feel this approach would suit it because inlining could also help with the problem of lexer priority.

peter-winter (Owner) commented Oct 13, 2022

The problem is that the parser is supposed to be a constexpr object. This is the whole idea behind the library.

Now there are some problems:

  • I need to calculate the size of a finite automaton table to construct a lexer, so...
  • I need all of the sizes of regexes at compile time
  • I would like to allow inline terms, but only if they are expressed as literals, such as "[0-9]"_r

For char_term and string_term it is easy; for the regex term I found a way, but it requires the C++20 standard:

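```cpp
#include <algorithm>  // std::ranges::copy
#include <cstddef>    // std::size_t

// Literal type holding the pattern text; N (including the '\0') is deduced
// from the string literal, so the size is known at compile time.
template<std::size_t N>
struct regex
{
    constexpr regex(const char (&str)[N])
    {
        std::ranges::copy(str, array);
    }

    char array[N]{};
};

// C++20 string literal operator template: the literal becomes a class-type
// template argument, so "..."_r yields a constexpr regex<N>.
template<regex a>
constexpr auto operator""_r()
{
    return a;
}

int main()
{
    constexpr auto expr = "[0-9]"_r;
    static_assert(sizeof(expr.array) == 6);  // 5 characters plus '\0'
    return 0;
}
```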

Of course, I could allow inlining regex terms like this: regex_term("[0-9]"), but that seemed too verbose and made the grammar look ugly.

95833 (Author) commented Oct 13, 2022

The point of inlining is to resolve lexer matching priority as part of the parsing process itself. I don't know whether this can be realized, or how to realize it. The notation is not very important.
