Certain inputs take an extremely long time to parse #75

Open
tsandall opened this issue Nov 20, 2018 · 1 comment
tsandall commented Nov 20, 2018

Hello!

First of all, thank you very much for maintaining this project!

I'm hoping that someone can provide a bit of guidance. I apologize in advance for not having a minimal test case to reproduce this issue.

The issue

I've been doing some fuzz testing on OPA and I ran into one case where certain inputs would cause the program to hang and then crash. Here's a snippet of the crash:

program hanged (timeout 10 seconds)

SIGABRT: abort
PC=0x45766f m=0 sigcode=0

goroutine 1 [running]:
runtime.aeshashbody()
        /tmp/go-fuzz-build473666132/goroot/src/runtime/asm_amd64.s:917 +0x5f fp=0xc0041a78f8 sp=0xc0041a78f0 pc=0x45766f
runtime.mapassign_faststr(0x764fc0, 0xc0041a7a20, 0xc0005d1fb0, 0x3, 0xc0118f8d68)
        /tmp/go-fuzz-build473666132/goroot/src/runtime/map_faststr.go:202 +0x62 fp=0xc0041a7960 sp=0xc0041a78f8 pc=0x4135d2
github.com/open-policy-agent/opa/ast.(*parser).parse(0xc00049a180, 0xa53e00, 0x0, 0x0, 0x0, 0x0)
        /tmp/go-fuzz-build473666132/gopath/src/github.com/open-policy-agent/opa/ast/parser.go:4362 +0x272 fp=0xc0041a7b50 sp=0xc0041a7960 pc=0x6dc782
github.com/open-policy-agent/opa/ast.Parse(0x0, 0x0, 0xc0001c95e8, 0x8, 0x8, 0xc000527cb8, 0x2, 0x2, 0xc0002f4460, 0x0, ...)
        /tmp/go-fuzz-build473666132/gopath/src/github.com/open-policy-agent/opa/ast/parser.go:3784 +0x98 fp=0xc0041a7ba8 sp=0xc0041a7b50 pc=0x6da558
github.com/open-policy-agent/opa/ast.ParseStatements(0x0, 0x0, 0xc0001c95e0, 0x8, 0xc0001c95e0, 0x8, 0x200000003, 0xc000000300, 0xc000022000, 0xc000527df8, ...)
        /tmp/go-fuzz-build473666132/gopath/src/github.com/open-policy-agent/opa/ast/parser_ext.go:468 +0x173 fp=0xc0041a7d50 sp=0xc0041a7ba8 pc=0x6e62b3
github.com/open-policy-agent/fuzz-opa.Fuzz(0x7f734c798000, 0x8, 0x200000, 0x3)

The crash above occurs here: https://github.com/open-policy-agent/opa/blob/master/ast/parser.go#L4362

I modified the code to print the size of the maxFailExpected slice and found that it grows very large in pathological cases. For example, the input {{{{{{{{ takes ~3.5s to fail parsing, and by then the slice holds around 3,000,000 elements.
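
Something like the following harness reproduces the timings (a sketch; it assumes the ast.ParseStatements signature visible in the stack trace above, so adjust for your revision):

package main

import (
	"fmt"
	"time"

	"github.com/open-policy-agent/opa/ast"
)

func main() {
	// The two pathological inputs from this report: one invalid, one valid.
	for _, input := range []string{"{{{{{{{{", "{{{{{{{{}}}}}}}}"} {
		start := time.Now()
		_, _, err := ast.ParseStatements("test.rego", input)
		fmt.Printf("input=%-18q failed=%v elapsed=%s\n", input, err != nil, time.Since(start))
	}
}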

Expected behaviour

It's not clear whether much can be done about this. In OPA's case, we don't display the expected values (we found them too noisy to be helpful), so disabling the code that generates them is an option. However, I'm not sure that would resolve the problem, because valid inputs with a similar structure also take a very long time to parse (e.g., {{{{{{{{}}}}}}}} takes ~1.5s before succeeding).

The PEG file is here: https://github.com/open-policy-agent/opa/blob/master/ast/rego.peg

The vendored version is bb0192c.

Any suggestions would be appreciated.

breml (Collaborator) commented Nov 20, 2018

Thanks for the well-written bug report.

The line in parser.go you are pointing to (https://github.com/open-policy-agent/opa/blob/master/ast/parser.go#L4362) runs after the parsing of the input has finished; it builds a map of all the possible expected values at the farthest position to which the parser could advance. The purpose of this map is to easily remove duplicates from the list of possible expected values. It makes sense to me that a slice containing 3 million elements could become a problem, especially because the map holding the expected values is pre-allocated to the size of the slice (which makes no sense for slices that big, since the vast majority of entries are duplicates). With so many expected values, I also understand why you disabled them; they are too noisy to be useful.
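
To sketch what I mean (the names below are illustrative, not the exact generated code), the deduplication could cap the map's initial size instead of pre-allocating for the full slice:

// Sketch of the dedup step with a capped pre-allocation; maxFailExpected
// mirrors the slice named in this report, and the cap is a hypothetical
// mitigation, not current pigeon behavior. Assumes "sort" is imported.
func dedupeExpected(maxFailExpected []string) []string {
	const maxHint = 1024 // hypothetical upper bound for the map size hint
	hint := len(maxFailExpected)
	if hint > maxHint {
		hint = maxHint // with millions of mostly-duplicate entries, a huge hint only wastes memory
	}
	seen := make(map[string]struct{}, hint)
	expected := make([]string, 0, hint)
	for _, v := range maxFailExpected {
		if _, ok := seen[v]; !ok {
			seen[v] = struct{}{}
			expected = append(expected, v)
		}
	}
	sort.Strings(expected)
	return expected
}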

I think part of the problem is by design (the backtracking nature of PEG parsers). PEG parsers are known to behave badly on pathological edge cases: without memoization, deeply nested input like {{{{{{{{ forces the parser to retry the same positions across many alternatives, and every failed attempt adds to the expected-value bookkeeping.

As for possible solutions, I can think of adding checks to prevent such large slices and maps, especially for the expected values. Completely disabling this functionality via an option would also be possible. Maybe I can come up with other solutions if I think about this a bit more.
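
For the opt-out, something in the spirit of the generated parsers' functional options could work; the option name and field below are hypothetical, not an existing pigeon API:

// Hypothetical option to disable expected-value collection entirely.
// Modeled on pigeon's Option pattern, where an Option returns the
// previous value wrapped in an Option so callers can restore it.
func CollectExpected(b bool) Option {
	return func(p *parser) Option {
		old := p.collectExpected // hypothetical field
		p.collectExpected = b
		return CollectExpected(old)
	}
}

The sites that record parse failures would then check p.collectExpected before appending to p.maxFailExpected.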
