Add --check-level=fast option that reuses much faster valueflow analysis from 1.90 #6097

danmar · 2024-03-07T17:07:17Z

A self check is completed much faster with --check-level=fast compared to --check-level=normal. For me:
linux: 5-6 times faster
windows: 10 times faster

Comparing the warnings from --check-level=fast and --check-level=normal shows few differences.

danmar · 2024-03-07T17:08:22Z

I ran --check-level=fast and --check-level=normal on 100 random packages. Total number of reports for each severity:

error:
1634 my_check_diff_fast.log
1665 my_check_diff_normal.log
warning:
5103 my_check_diff_fast.log
5112 my_check_diff_normal.log
style:
15629 my_check_diff_fast.log
15873 my_check_diff_normal.log
portability:
835 my_check_diff_fast.log
838 my_check_diff_normal.log
performance:
1038 my_check_diff_fast.log
1038 my_check_diff_normal.log

Assuming that all are true positives I think the fast results are pretty good. I assume that the fast analysis should have a good noise ratio but I will check it!
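To put the counts above in proportion, here is a minimal sketch that computes, per severity, what share of the normal-level reports is absent from the fast run. The numbers are copied from the comment; the helper name `missing_share` is only illustrative. It compares totals only, so it cannot tell whether the two runs reported the same individual findings.

```python
# Report counts per severity, copied from the comment above: (fast, normal).
counts = {
    "error": (1634, 1665),
    "warning": (5103, 5112),
    "style": (15629, 15873),
    "portability": (835, 838),
    "performance": (1038, 1038),
}


def missing_share(fast, normal):
    """Percentage of the normal-level report count that the fast run falls short by."""
    return 100.0 * (normal - fast) / normal


for severity, (fast, normal) in counts.items():
    print(f"{severity:12s} fast={fast:6d} normal={normal:6d} "
          f"missing={missing_share(fast, normal):.2f}%")

total_fast = sum(fast for fast, _ in counts.values())
total_normal = sum(normal for _, normal in counts.values())
print(f"{'total':12s} fast={total_fast:6d} normal={total_normal:6d} "
      f"missing={missing_share(total_fast, total_normal):.2f}%")
```

On these figures the fast run falls short by under 2% in every severity and by roughly 1.2% overall.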

danmar · 2024-03-07T17:14:31Z

Refactoring will be needed before I merge this to main. Ideally there should not be lots of copy/pasted code between valueflow.cpp and valueflowfast.cpp. I want to reuse the fast valueFlow.. functions in valueflow.cpp; it's only the slow forward/reverse analysis that should be replaced.

firewave · 2024-03-07T18:29:51Z

I have not looked at the changes yet but regardless here are some of my views on this option: #6025 (comment).

danmar · 2024-03-07T20:18:47Z

OK, --check-level=fast must be used explicitly by the user. It is not "fast" by default. Therefore in my opinion we should not write warnings that the analysis is fast and that there are slower options available.

firewave · 2024-03-13T06:49:51Z

IMO it is utter madness to have two different implementations of something as it means we have to provide performance and quality for both of them. And as we are just about 4 people actively working on this and essentially just a single person working on the ValueFlow I think this is not something that can realistically be achieved.

This also means we need to duplicate all testing (including daca) and that just seems mental. Especially since we might not have the proper testing coverage. We might check that a false positive does not occur but not the negative test that we detect the issue. So if the code is somehow no longer being triggered without feedback we would have no indication that we would never get any output even if it regresses in the future.

If just yesterday is an indicator we should not be adding any major features or code at all as existing parts might not have been working for years (or ever) and we should rather try to have less code and fewer jobs.

I still haven't looked at the code but a note on ProgramMemory.
As the copies of it have a performance impact I looked into this several times and only did things on the level the language allows and not changing the implementation. An idea to potentially improve that was to introduce an overlay for the map so data doesn't need to be copied.
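The overlay idea can be illustrated with a minimal sketch. It is not Cppcheck code (ProgramMemory is C++); it only shows the general technique using Python's standard `collections.ChainMap`. The dictionary contents and the name `branch` are assumptions made for the example.

```python
from collections import ChainMap

# Hypothetical stand-in for a ProgramMemory-like mapping: expression id -> known value.
base = {1: 10, 2: 20, 3: 30}

# new_child() places an empty dict in front of `base`; no entries are copied.
branch = ChainMap(base).new_child()

# Writes land in the overlay only, so `base` stays untouched.
branch[2] = 99
branch[4] = 40

print(branch[1])       # not in the overlay, so the lookup falls through to base
print(branch[2])       # the overlay entry shadows the base entry
print(branch.maps[0])  # only the entries written on this branch
print(base)            # unchanged
```

The trade-off is that a lookup may have to walk every layer, and removing a key that lives only in the base needs extra handling (for example a tombstone marker), so whether this beats copying depends on how deep the chain gets and how often values are read.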

Also offering even more options to the user with less feedback makes the support much harder. It also requires all plugins/integrations to change so people have the possibility to configure it and restore the previous behavior. And one of the main advantages of Cppcheck compared to most static analyzers is the low configuration approach.

firewave · 2024-03-13T07:16:36Z

I wonder if a better approach would be to defer the ValueFlow execution until we actually need the values. But since the various passes depend on each other I doubt that is possible. But as they are usually run on scopes maybe that could be used as the entry point for a different approach. I am not very familiar with it so I am not sure if that would be possible.
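As a rough illustration of deferring work per scope, the sketch below runs a placeholder analysis only when a scope's values are first requested and caches the result. All names (`expensive_valueflow`, `values_for_scope`, the scope names) are hypothetical, and the sketch deliberately ignores the dependency between passes, which is exactly the open question raised above.

```python
from functools import lru_cache

calls = []  # records which scopes were actually analysed


def expensive_valueflow(scope_name):
    """Placeholder for a costly per-scope analysis (hypothetical)."""
    calls.append(scope_name)
    return {"scope": scope_name, "values": len(scope_name)}


@lru_cache(maxsize=None)
def values_for_scope(scope_name):
    """Run the analysis on the first request only; later requests hit the cache."""
    return expensive_valueflow(scope_name)


# Three scopes exist, but the checks below only ask for two of them.
scopes = ["main", "helper", "unused_fn"]
for name in ["main", "helper", "main"]:
    values_for_scope(name)

print(calls)  # scopes that were analysed, in first-request order
```

Here `unused_fn` is never analysed and `main` is analysed once despite two requests. A real design would also have to decide what happens when one scope's values depend on another's.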

danmar · 2024-04-12T17:59:40Z

> IMO it is utter madness to have two different implementations of something as it means we have to provide performance and quality for both of them.

Right now there is plenty of copy/paste we can get rid of easily. When I have finished refactoring, I am guessing there will be something like 500-1000 lines of code in valueflowfast.cpp instead of 5000.

Two different implementations make sense because the goals are very different. We can continue to develop the normal/exhaustive analysis to detect more bugs and that will not have a significant effect on "fast" analysis time. I will not actively improve the fast analysis to detect more bugs; I envision there will mostly be bug fixes in that.

Technically it would be possible to fork Cppcheck repo and provide this "fast" analysis in a separate repo. I am not totally against that option.

> And as we are just about 4 people actively working on this and essentially just a single person working on the ValueFlow I think this is not something that can realistically be achieved.

As I read this, the question is whether there are resources for this. This is important for Cppcheck Solutions AB and we can provide resources. For information, we have paid for several bug fixes in the normal/exhaustive valueflow analysis.

firewave · 2024-04-19T12:59:58Z

I will provide a proper reply later - I have not been feeling too well this week so looking into non-trivial stuff has been challenging.

Just a collection of existing ideas in helping with this:

added (optional) lazy execution of ValueFlow #4521 - having lazy execution might help
ValueFlow: start splitting it into multiple files #4748 - getting rid of the monolithic source file should make it easier to implement fast branches for heavy steps
https://trac.cppcheck.net/ticket/12528 - limiting handling of scopes in heavy valueflow steps might be a fast branch
https://trac.cppcheck.net/ticket/12358 / https://trac.cppcheck.net/ticket/12560 - having better tests for the generated valueflow data might help with evaluating how impactful a fast branch might be

firewave · 2024-04-22T13:55:40Z

See neutrinolabs/xrdp#3037 for the mess we are currently in...

danmar marked this pull request as draft March 7, 2024 17:12

fast valueflow

03d32d8

danmar force-pushed the valueflow-fast branch from d247145 to 03d32d8 Compare April 12, 2024 13:18

refactoring

781f190

firewave mentioned this pull request May 6, 2024

ValueFlow: start splitting it into multiple files #4748

Merged
