Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

EBNF-style grammar for Souper's text format? #782

Open
fitzgen opened this issue Jul 21, 2020 · 16 comments
Open

EBNF-style grammar for Souper's text format? #782

fitzgen opened this issue Jul 21, 2020 · 16 comments

Comments

@fitzgen
Copy link
Contributor

fitzgen commented Jul 21, 2020

Hello! I am writing a parser and stringifier for Souper's text format, and it would have been super helpful to me if there was a grammar when I started, so I've been sketching a hopefully-mostly-correct grammar as I go. I think it is fairly complete at this point. I was hoping to share it with y'all to see if there are any obvious bugs in the grammar, and ask whether you're interested in merging this into the repo's docs (and improving it to more exactly match Souper's parser / maintaining it over time as the text format changes).

EBNF-esque grammar for Souper's Text Format

(Note that I have not fed this grammar through any mechanical tool like yacc, and it is in a bit of a EBNF pseudocode rather than any particular dialect accepted by any particular tool.)

<replacement> ::= <lhs>
                | <rhs>
                | <full>

<lhs> ::= <statement>* <infer>

<infer> ::= 'infer' <valname> <root-attribute>*

<rhs> ::= <statement>* <result>

<result> ::= 'result' <operand>

<full> ::= <lhs> <rhs>
         | <statement>* <cand>

<cand> ::= 'cand' <valname> <operand> <root-attribute>*

<statement> ::= <assignment>
              | <pc>
              | <blockpc>

<assignment> ::= <assignment-lhs> '=' <assignment-rhs>

<assignment-lhs> ::= <valname> (':' <type>)?

<assignment-rhs> ::= <constant>
                   | 'var' <attribute>*
                   | 'block' <int>
                   | 'phi' <valname> (',' <operand>)*
                   | 'reservedinst' <attribute>*
                   | 'reservedconst' <attribute>*
                   | <instruction> <attribute>*

<instruction> ::= 'add' <operand> ',' <operand>
                | 'addnsw' <operand> ',' <operand>
                | 'addnuw' <operand> ',' <operand>
                | 'addnw' <operand> ',' <operand>
                | 'sub' <operand> ',' <operand>
                | 'subnsw' <operand> ',' <operand>
                | 'subnuw' <operand> ',' <operand>
                | 'subnw' <operand> ',' <operand>
                | 'mul' <operand> ',' <operand>
                | 'mulnsw' <operand> ',' <operand>
                | 'mulnuw' <operand> ',' <operand>
                | 'mulnw' <operand> ',' <operand>
                | 'udiv' <operand> ',' <operand>
                | 'sdiv' <operand> ',' <operand>
                | 'udivexact' <operand> ',' <operand>
                | 'sdivexact' <operand> ',' <operand>
                | 'urem' <operand> ',' <operand>
                | 'srem' <operand> ',' <operand>
                | 'and' <operand> ',' <operand>
                | 'or' <operand> ',' <operand>
                | 'xor' <operand> ',' <operand>
                | 'shl' <operand> ',' <operand>
                | 'shlnsw' <operand> ',' <operand>
                | 'shlnuw' <operand> ',' <operand>
                | 'shlnw' <operand> ',' <operand>
                | 'lshr' <operand> ',' <operand>
                | 'lshrexact' <operand> ',' <operand>
                | 'ashr' <operand> ',' <operand>
                | 'ashrexact' <operand> ',' <operand>
                | 'select' <operand> ',' <operand> ',' <operand>
                | 'zext' <operand>
                | 'sext' <operand>
                | 'trunc' <operand>
                | 'eq' <operand> ',' <operand>
                | 'ne' <operand> ',' <operand>
                | 'ult' <operand> ',' <operand>
                | 'slt' <operand> ',' <operand>
                | 'ule' <operand> ',' <operand>
                | 'sle' <operand> ',' <operand>
                | 'ctpop' <operand>
                | 'bswap' <operand>
                | 'bitreverse' <operand>
                | 'cttz' <operand>
                | 'ctlz' <operand>
                | 'fshl' <operand> ',' <operand> ',' <operand>
                | 'fshr' <operand> ',' <operand> ',' <operand>
                | 'sadd.with.overflow' <operand> ',' <operand>
                | 'uadd.with.overflow' <operand> ',' <operand>
                | 'ssub.with.overflow' <operand> ',' <operand>
                | 'usub.with.overflow' <operand> ',' <operand>
                | 'smul.with.overflow' <operand> ',' <operand>
                | 'umul.with.overflow' <operand> ',' <operand>
                | 'sadd.sat' <operand> ',' <operand>
                | 'uadd.sat' <operand> ',' <operand>
                | 'ssub.sat' <operand> ',' <operand>
                | 'usub.sat' <operand> ',' <operand>
                | 'extractvalue' <operand> ',' <operand>
                | 'hole'
                | 'freeze' <operand>

<operand> ::= <valname>
            | <constant>

<root-attribute> ::= '(' <root-attribute-inner> ')'

<root-attribute-inner> ::= 'demandedBits' '=' regexp([0|1]+)
                         | 'harvestedFromUse'

<attribute> ::= '(' <attribute-inner> ')'

<attribute-inner> ::= 'knownBits' '=' regexp([0|1|x]+)
                    | 'powerOfTwo'
                    | 'negative'
                    | 'nonNegative'
                    | 'nonZero'
                    | 'hasExternalUses'
                    | 'signBits' '=' <int>
                    | 'range' '=' '[' <int> ',' <int> ')'

<pc> ::= 'pc' <operand> <operand>

<blockpc> ::= 'blockpc' <valname> <int> <operand> <operand>

<constant> ::= <int> (':' <type>)?

<int> ::= regexp(-?\d+)

<type> ::= regexp(i\d+)

<valname> ::= regexp(%[a-zA-Z0-9_]+)

<ident> ::= regexp([a-zA-Z][a-zA-Z0-9_]+)
@regehr
Copy link
Collaborator

regehr commented Jul 21, 2020

looks good at a quick scan! yes, we'd be happy to have a text version of this in the repo, but I'd prefer if it was one that could be reasonably checked against the souper parser, for example by some differential testing (that I'd be likely to find a bit of time to work on)

@fitzgen
Copy link
Contributor Author

fitzgen commented Jul 21, 2020

To clarify: you mean differential testing the hand written parser vs. a parser mechanically generated from the grammar with yacc (or similar)?

@regehr
Copy link
Collaborator

regehr commented Jul 21, 2020

ideally, running a yacc-equivalent tool on the grammar and then doing some testing to see if it accepts the same set of strings as the souper parser does

though now that I think about it this might not be easy since the souper parser also does typechecking etc.

@fitzgen
Copy link
Contributor Author

fitzgen commented Jul 21, 2020

though now that I think about it this might not be easy since the souper parser also does typechecking etc.

Yeah but we should at least be able to do it unidirectionally:

  • If the hand-written parser accepts a string, then the mechanically generated parser should accept it too

  • If the mechanically-generated parser rejects a string, then the hand-written parser should reject it too.

@fitzgen
Copy link
Contributor Author

fitzgen commented Jul 25, 2020

FWIW, my library lives at https://github.com/fitzgen/souper-ir and can now deterministically round-trip every .opt file in souper/test/** and has stood up to some basic fuzzing (just checking for no panics and that if we can parse, then we can stringify, re-parse that, and re-stringify to the same string again; doesn't do differential fuzzing against Souper's hand written parser like we talked about above).

I also updated the grammar in the OP based on some changes I found while implementing my library.

@fitzgen
Copy link
Contributor Author

fitzgen commented Jul 25, 2020

Also, my parser is hand-written as well. Not mechanically generated from the grammar..

@fitzgen
Copy link
Contributor Author

fitzgen commented Oct 15, 2020

The grammar I originally wrote here allows lines like this:

%0:i32 = 1234
...

specifically because it has the production

<assignment-rhs> ::= <constant>
                   | ...

but it seems that souper doesn't actually allow lines where you assign a constant value to a variable.

Would this be difficult to support?

@regehr
Copy link
Collaborator

regehr commented Oct 15, 2020

I don't think so, but here we're sort of just following LLVM which also disallows this. Is it hard for you to push these constants into the uses, so we don't have to define values of constant type?

@fitzgen
Copy link
Contributor Author

fitzgen commented Oct 15, 2020

Not too hard, I think I actually have it working already. Will need to publish a breaking change of the souper-ir crate too, but that's no biggie since AFAIK I'm the only user :-p

@fitzgen
Copy link
Contributor Author

fitzgen commented Oct 16, 2020

I've got everything parsing okay now -- how can I verify whether souper is successfully populating redis?

This is the first time I've ever used redis, but I would expect that the keys column printed below would be non-zero after populating it with LHSes:

fitzgen@big-beefy :: (souper-no-assign-constant) :: ~/wasmtime/cranelift
    $  ~/souper/obj/souper-check --souper-external-cache --parse-lhs-only ~/scratch/markdown-lhs-candidates.souper
; parsing successful

fitzgen@big-beefy :: (souper-no-assign-constant) :: ~/wasmtime/cranelift
    $ redis-cli --stat
------- data ------ --------------------- load -------------------- - child -
keys       mem      clients blocked requests            connections          
0          839.02K  1       0       30 (+0)             6           
0          839.02K  1       0       31 (+1)             6           
0          839.02K  1       0       32 (+1)             6           
  C-c C-c

@regehr
Copy link
Collaborator

regehr commented Oct 16, 2020

your command is close to correct-- please remove --parse-lhs-only which tells souper-check to skip the parts of souper that do something

@regehr
Copy link
Collaborator

regehr commented Oct 16, 2020

if you want to populate the cache but not do synthesis I think --souper-no-infer does what you want

@fitzgen
Copy link
Contributor Author

fitzgen commented Oct 16, 2020

Does this not support parsing multiple LHSes from one file?

$  ~/souper/obj/souper-check --souper-no-infer --souper-external-cache ~/scratch/markdown-lhs-candidates.souper
/home/fitzgen/scratch/markdown-lhs-candidates.souper:11:7: Not expecting a second 'infer'

@regehr
Copy link
Collaborator

regehr commented Oct 16, 2020

I'm pretty sure it's supposed to, please drop your file (or at least the first two LHSs in it) here and I'll look at it later

@fitzgen
Copy link
Contributor Author

fitzgen commented Oct 16, 2020

Thanks for your help!

First few LHSes are:

;; Harvested from `v40` in `u0:192`
%3:i1 = var
%4:i32 = var
%5:i32 = select %3, -1:i32, %4
infer %5

;; Harvested from `v45` in `u0:177`
%0:i32 = var
%1:i64 = zext %0
%2:i64 = mul %1, 4:i64
infer %2

;; Harvested from `v31` in `u0:204`
%0:i32 = var
%1:i64 = zext %0
%2:i64 = mul %1, 4:i64
infer %2

Full files:

@regehr
Copy link
Collaborator

regehr commented Oct 16, 2020

I'm guessing you're missing an option that makes souper expect (only) LHSs, but I don't know what it is right now

sorry our command lines are such a mess, use cases for these tools has drifted over time ugh

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants