Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

oss-fuzz: Add raw fuzzer #5185

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

DavidKorczynski
Copy link
Contributor

@DavidKorczynski DavidKorczynski commented Jun 24, 2023

Add fuzzer that passes raw data to cppcheck.check. This fuzzer is heavily dependent on having a good seed corpus, which I have added in the corresponding PR on OSS-Fuzz: google/oss-fuzz#10590

I added this fuzzer because looking at the coverage profile for cppcheck I noticed it's about ~25%, meaning there is room for improvement: https://introspector.oss-fuzz.com/project-profile?project=cppcheck The fuzzer added in this PR together with the corpus generation from the OSS-Fuzz PR locally gets above 50% code coverage on my machine -- likely it will grow much more when run with the CPU power of OSS-Fuzz.

If you're happy with it then once this PR is merged I'll merge google/oss-fuzz#10590

Signed-off-by: David Korczynski <[email protected]>
@danmar
Copy link
Owner

danmar commented Jun 25, 2023

Can you show some results?

I fear that 99% of bugs exposed will be for passing garbage data to cppcheck. Cppcheck is not very robust for garbage input. I think users should use their compiler to ensure the code is compilable first.

There are lots of garbage test cases here: https://github.com/danmar/cppcheck/blob/main/test/testgarbage.cpp and it does not feel like that is anywhere near completed.

I have the feeling we could spend years of development to make cppcheck more robust for bad input. But it will have no actual benefit for normal users that pass compilable code.

In theory I would like a robust product that can handle garbage input properly.

@DavidKorczynski
Copy link
Contributor Author

I ran it for around 10 minutes and no issues were found, so no results as such besides the increased coverage. I'd expect things to change when run on OSS-Fuzz.

Would you like me to run it for longer locally and see if something comes up? Alternatively we can enable it to run on OSS-Fuzz and if it's too noisy we can simply remove it again.

@danmar
Copy link
Owner

danmar commented Jun 25, 2023

Do I understand it correctly that you suspect there will not be garbage code? Only compilable code will be tested?

I ran it for around 10 minutes and no issues were found

ok. hmm I am surprised. I had the feeling that issues would be found quickly..

Would you like me to run it for longer locally and see if something comes up? Alternatively we can enable it to run on OSS-Fuzz and if it's too noisy we can simply remove it again.

Could you run it for an hour or two please?

If nothing comes up then I think merging to OSS-Fuzz could be done to test it out more thoroughly.

@DavidKorczynski
Copy link
Contributor Author

DavidKorczynski commented Jun 26, 2023

Do I understand it correctly that you suspect there will not be garbage code? Only compilable code will be tested?

No, it will produce garbage code. However, the fuzzer will "start" from all of the .cpp code in this repository (building the corpus in the corresponding OSS-Fuzz PR: https://github.com/google/oss-fuzz/pull/10590/files#diff-04f92e08039112ec4b73b0230f89a2b58dc4a00c3d31c414f84bf2863f3f92bdR25-R28) and then continuously mutate over it while exploring more of the cppcheck codebase.

Could you run it for an hour or two please?
If nothing comes up then I think merging to OSS-Fuzz could be done to test it out more thoroughly.

Just ran it for two hours now and no issues have come up.

Edit: it now ran into a timeout issue where the processing took 86 seconds. It didn't run into any crashes/memory corruptions though.

@danmar
Copy link
Owner

danmar commented Jun 27, 2023

No, it will produce garbage code.

so whats the plan when there is a crash for non-compilable code. how do we tell oss-fuzz that we don't care about that testcase.

@danmar
Copy link
Owner

danmar commented Jun 27, 2023

wild idea; could you run gcc -fsyntax-only or some lightweight syntax checker before passing the code to cppcheck?

@DavidKorczynski
Copy link
Contributor Author

so whats the plan when there is a crash for non-compilable code. how do we tell oss-fuzz that we don't care about that testcase.

Hmm, I guess there isn't a plan besides marking them as WontFix? I think if the idea is that anything triggered by garbage input is not something to consider then this fuzzer doesn't really make sense. That said, a bug triggerable by garbage input doesn't imply it's not triggerable by syntax-valid input.

wild idea; could you run gcc -fsyntax-only or some lightweight syntax checker before passing the code to cppcheck?

Hmm, let me think about it. In general I think it's a valid idea of a having a syntax checker up as a precondition before passing it to cppchecker. Let me check the options for it.

@danmar
Copy link
Owner

danmar commented Jun 28, 2023

That said, a bug triggerable by garbage input doesn't imply it's not triggerable by syntax-valid input.

Yes that is true.

@firewave
Copy link
Collaborator

I think slightly improving the fuzzing of garbage code is fine.

wild idea; could you run gcc -fsyntax-only or some lightweight syntax checker before passing the code to cppcheck?

That makes total sense and is not wild at all. I think validating the generated data is actually how you guide a fuzzer towards generating actual useful input.

Maybe one of the LLVM fuzzers already provides such inputs which could be leveraged or used as inspiration.

@DavidKorczynski
Copy link
Contributor Author

That makes total sense and is not wild at all. I think validating the generated data is actually how you guide a fuzzer towards generating actual useful input.

The issues with running gcc -fsyntax-only are:

  • Launching a process in each fuzz iteration is expensive and will likely slow down the fuzzer a lot.
  • gcc will not be instrumented with coverage guided instrumentation. As such, it will be a black-box and the coverage-feedback of the fuzzing will not work, which will likely make the fuzzer unable to come up with valid syntax and never call into cppcheck anyways.

For those reasons I was going to look for a lightweight syntax checker as also suggested, but have not made progress there. I think my personal preference would be to use the fuzzer as is and if it reports bugs that are of no interest then remove or disable it.

@danmar
Copy link
Owner

danmar commented Aug 7, 2023

I think slightly improving the fuzzing of garbage code is fine.

Problem is that in my experience it's too easy to trigger crashes on garbage code. As far as I know, a few minutes of fuzzing with garbage will trigger a crash. I fear we will end up fixing thousands and thousands of garbage cases.. and see very little real code crashes..

We don't need to run such fuzzing on some infrastructure. It can be executed locally.

I want that cppcheck handles non-standard code therefore it would not be ideal to add a strict syntax check in cppcheck.

@danmar
Copy link
Owner

danmar commented Aug 7, 2023

I believe we can have a ready made script in the repo that I can run if I want to run this fuzzing locally.

So feel free to consider creating some standalone script..

@firewave
Copy link
Collaborator

I came across a case of incomplete code which triggered a crash which gave me an idea how we could "fuzz" our project in a more "sensible" way compared to actual fuzzing.

We could tokenize valid code and assemble it one by one and pass that to the analysis. Or remove random tokens. That would be way closer to real-life examples then procedural generated code.

I filed https://trac.cppcheck.net/ticket/12357 about it.

@firewave
Copy link
Collaborator

After finally understanding fuzzing I ended up with a conclusion which matches this in #6119. We do not need an additional fuzzer but simply can apply these changes to the existing one.

As the corpus we should simply use the *.cpp files from the samples folder and not our whole source. That way we have a fixed corpus instead one which changes with each revision.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants