-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #11750 (refactor ctu-info to generate fewer artifacts on disk) #6778
Conversation
Please add some Python tests which for those files after an analysis. I wanted to add these to test some local changes but those are already included in this PR. |
/** | ||
* CTU information | ||
*/ | ||
std::string mCtuInfo; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this safe in terms of memory usage? I have not really understood what this is doing yet (maybe add some explanation to the PR) but appears to accumulate the the data in the memory and this could be megabytes in size or even much, much more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should check it but my hypothesis is that the ctu info will not be huge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
about memory usage, here is one small test case: cppcheck/test/cli/whole-program
- the files whole1.c and whole2.c are 94 and 64 bytes.
- the ctu-info that is generated for those are 220 bytes and 227 bytes.
in this test case ~160 bytes source code generates ~450 bytes ctu-info
I don't think that memory usage will be a large issue. If we pretend that scanning a large project with 160MB source code would require 450MB memory for ctu info.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks.
But that has to be constructed as a string and is required to be continuous memory.
And we also need to profile that. That sounds like it might slow down things quite a bit. Maybe we might need both modes?
I 100% agree about adding more testing. I will do it. But I don't even manage to run our original tests yet, those do test this functionality also. |
Possible - as I mentioned I have not looked into yet. |
_, _, stderr = cppcheck(args, cwd=__script_dir) | ||
assert 'misra-c2012-5.8' in stderr | ||
_, _, stderr = cppcheck(args, cwd=__script_dir) | ||
assert 'misra-c2012-5.8' in stderr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My intention was only to refactor. But here is a test case that does not work in cppcheck main branch but does work in this branch.
there is testing for CTU analysis in addons in test/cli/whole-program_test.py the tests in that file broke when I started working on this refactoring. |
Yes - I added those. There are also far from complete and have known issues which need to be fixed. So "breaking" them might actually be fixing those (I might not have not written all tests with XFAIL). I was referring to tests which check what exists on the disk and is left (or not) on it and not just the analysis results. The local changes I have is getting rid of the duplicated cleanup code and some further reduction of redundancies. |
no cppcheck build dir: I don't see the point to write a test that tests no files are saved on the disk. the code and tests should be connected. I feel it would be like adding a test that checks that files.txt is not created in various directories. cppcheck build dir: I add one more test |
I added a test that no artifacts is remaining in the project folder. And locally that test fails. So I need to reconsider this.. |
test/cli/whole-program_test.py
Outdated
@pytest.mark.parametrize("jobs,builddir", ((1,False), (1,True), (2,False), (2,True))) | ||
def test_addon_no_artifacts(tmpdir, jobs, builddir): | ||
"""Test that there are no artifacts left after analysis""" | ||
shutil.copyfile(os.path.join(__script_dir, 'whole-program', 'whole1.c'), os.path.join(tmpdir, 'whole1.c')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to copy the files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because there can be artifacts in the test folder before I start the test. Want to have a clean folder with just the test files. This test is also executed locally and locally you might have artifacts..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems unexpected.
#6787 is about detecting such leftovers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But #6787 only works in the CI. what if I add a file locally one way or another in the test folder. For instance by manually running cppcheck on a testfile and pressing Ctrl+C while there is some temporary dump file or something.
Or run this manually before you execute the test: ./cppcheck --dump test/cli
That is not tested in the CI.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but it is a start (weirdly it doesn't fail). But so we at least know that it works as expected in the default (i.e. no fatal errors/interruption) case. We can move forward from that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but it is a start (weirdly it doesn't fail).
sure. I don't disapprove that PR of course.
I only reviewed the general stuff. I still need to dig into the build dir stuff. Also still needs to be performance tested. |
I didn't expect that performance would be affected much. But according to my measurements this makes cppcheck faster at least in a self-check. I built cppcheck from this branch and main branch using the Makefile using this build command:
And I saw faster analysis with the cppcheck binary built from this branch (cppcheck-11750) than from main branch (cppcheck-HEAD):
|
Thanks for doing those tests. But you cannot analyze the Beside that I would not expect such an improvement from the code changes so that seems extremely suspect and I assume there is some data being omitted from analysis. Best would be to store the |
I compiled cppcheck-11750 and cppcheck-HEAD first and then I ran it all in 1 go from a script:
So it should be the same files. I ate lunch while it was running so I wasn't modifying files in the meantime. |
hmm how would --debug output be useful does that contain any relevant timing info? you mean to ensure that the same corpus is checked?
Frankly I didn't expect it neither. I didn't expect much difference in performance at all. |
oh wait I will have to execute different addons. |
Since that shows the data we analyze - but that would only be actual code and not what we pass as CTU - so that is of no use. Maybe we should add a debug option which displays which CTU information we generate and pass to the analysis. |
I have updated the script.
The output is:
The speedup compared to the previous run can be explained because I ran release builds of premiumaddon this time. |
That result makes way more sense. Thanks for clearing that up. |
I still would like to have a final look but I am out for most of the next two days. |
@firewave friendly ping |
1 similar comment
@firewave friendly ping |
My concerns have been address but I still cannot wrap my head around the actual workflow. My mind just isn't really there. I will give it another spin tomorrow by debugging through it. |
@@ -129,6 +129,10 @@ namespace { | |||
return !mCriticalErrors.empty(); | |||
} | |||
|
|||
const std::string& getCtuInfo() const { | |||
return mCtuInfo; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels a bit strange that this lives in the just remotely related logger. I wonder if all the logic should live in its own object and cleanup should be RAII-based and such.
That would be similar to what I tried in #6634 which comes up short in making it a local object because it needs to intercept the logged error at some point (which also applies to the code here). So adding some kind of hook into the error logging which std, XML and CTU (and even SARIF) could use would make a lot of sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if I understand these comments fully.
I do feel the ctu info belongs in cli and gui rather than lib. it's aggregated info from all threads. For me it's reasonable that CppcheckExecutor owns the data (directly or indirectly) however I can agree that the functionality does not technically belong in StdLogger actually.
We also have the mActiveCheckers and mCriticalErrors. Those do not really fit very well in the StdLogger neither.
I do not see a very elegant way to insert hooks. We could add one more ErrorLogger that owns the StdLogger?
class AllLogger : public ErrorLogger {
private:
void reportErr(const ErrorMessage& errmsg) {
... handle active checkers, critical errors, ctu, ...
// pass remaining errors to stdLogger
stdLogger.reportErr(errmsg);
}
StdLogger stdLogger;
};
Do you have better ideas..?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally I still suggest to put the condition in StdLogger that ensures the output is formatted with sarif/text/xml so it will take a ErrorMessage input and output the proper string..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, again something in this file turns into a grab-bag after things got sorted out at some point.
This is not something we need to solve in this PR. I will see if I am put together a PR with the proposed hook based on the previously linked cleanup.
The changes seem fine to me. That behavior is not changed much and it seems the test coverage is sufficient so if somewhere were wrong that should show up. I will add the build dir injection for all tests soon and also test the before and after. So if there are corner case that should be (hopefully) uncovered by this. |
I forgot the essential in my last comment: Feel free to merge. 👍🙂 |
Thanks! |
No description provided.