Fix #11750 (refactor ctu-info to generate fewer artifacts on disk) #6778

Merged
merged 5 commits into from
Oct 7, 2024

danmar
Owner

@danmar danmar commented Sep 7, 2024

No description provided.

@danmar danmar marked this pull request as draft September 7, 2024 13:05
@firewave
Collaborator

firewave commented Sep 7, 2024

Please add some Python tests which check for those files after an analysis.

I wanted to add these to test some local changes but those are already included in this PR.

/**
* CTU information
*/
std::string mCtuInfo;
Collaborator

Is this safe in terms of memory usage? I have not really understood what this is doing yet (maybe add some explanation to the PR), but it appears to accumulate the data in memory, and this could be megabytes in size or even much, much more.

Owner Author

we should check it but my hypothesis is that the ctu info will not be huge.

Owner Author

about memory usage, here is one small test case: cppcheck/test/cli/whole-program

  • the files whole1.c and whole2.c are 94 and 64 bytes.
  • the ctu-info generated for those is 220 bytes and 227 bytes.

in this test case ~160 bytes source code generates ~450 bytes ctu-info

I don't think that memory usage will be a large issue, even if we pretend that scanning a large project with 160MB of source code would require 450MB of memory for ctu info.
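
The extrapolation above can be written out as a short calculation. This is only a sketch: the byte counts are the ones quoted in this comment, the function names are made up for illustration, and the assumption that the ratio stays constant for large projects is untested.

```python
# Sizes in bytes, as quoted above for test/cli/whole-program.
source_sizes = {"whole1.c": 94, "whole2.c": 64}
ctu_info_sizes = {"whole1.c": 220, "whole2.c": 227}


def ctu_ratio(source, ctu):
    """Return bytes of ctu-info generated per byte of source code."""
    return sum(ctu.values()) / sum(source.values())


def estimate_ctu_bytes(project_source_bytes, ratio):
    """Linearly extrapolate the ctu-info size (assumes a constant ratio)."""
    return project_source_bytes * ratio


ratio = ctu_ratio(source_sizes, ctu_info_sizes)
estimate = estimate_ctu_bytes(160 * 1000 * 1000, ratio)

print(f"ctu-info bytes per source byte: {ratio:.2f}")
print(f"estimated ctu-info for 160MB of source: {estimate / 1e6:.0f}MB")
```

Two tiny files are a weak basis for a ratio, so the result is better read as an order-of-magnitude figure than as a prediction.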

Collaborator

Thanks.

But that has to be constructed as a string and is required to be contiguous memory.

And we also need to profile that. That sounds like it might slow down things quite a bit. Maybe we might need both modes?

@danmar
Owner Author

danmar commented Sep 7, 2024

I 100% agree about adding more testing. I will do it. But I don't even manage to run our original tests yet, those do test this functionality also.

@firewave
Collaborator

firewave commented Sep 7, 2024

I 100% agree about adding more testing. I will do it. But I don't even manage to run our original tests yet, those do test this functionality also.

Possible - as I mentioned I have not looked into it yet.

_, _, stderr = cppcheck(args, cwd=__script_dir)
assert 'misra-c2012-5.8' in stderr
_, _, stderr = cppcheck(args, cwd=__script_dir)
assert 'misra-c2012-5.8' in stderr
Owner Author

My intention was only to refactor. But here is a test case that does not work in cppcheck main branch but does work in this branch.

@danmar danmar marked this pull request as ready for review September 8, 2024 12:45
@danmar
Owner Author

danmar commented Sep 8, 2024

there is testing for CTU analysis in addons in test/cli/whole-program_test.py

the tests in that file broke when I started working on this refactoring.

@firewave
Collaborator

firewave commented Sep 8, 2024

there is testing for CTU analysis in addons in test/cli/whole-program_test.py

the tests in that file broke when I started working on this refactoring.

Yes - I added those. They are also far from complete and have known issues which need to be fixed. So "breaking" them might actually be fixing those (I might not have written all tests with XFAIL).

I was referring to tests which check what exists on the disk and is left (or not) on it and not just the analysis results.

The local changes I have get rid of the duplicated cleanup code and further reduce some redundancies.

@danmar
Owner Author

danmar commented Sep 8, 2024

I was referring to tests which check what exists on the disk and is left (or not) on it and not just the analysis results.

no cppcheck build dir: I don't see the point to write a test that tests no files are saved on the disk. the code and tests should be connected. I feel it would be like adding a test that checks that files.txt is not created in various directories.

cppcheck build dir: I will add one more test

@danmar
Owner Author

danmar commented Sep 9, 2024

no cppcheck build dir: I don't see the point to write a test that tests no files are saved on the disk. the code and tests should be connected. I feel it would be like adding a test that checks that files.txt is not created in various directories.

I added a test that no artifacts remain in the project folder, and locally that test fails. So I need to reconsider this.

@pytest.mark.parametrize("jobs,builddir", ((1,False), (1,True), (2,False), (2,True)))
def test_addon_no_artifacts(tmpdir, jobs, builddir):
    """Test that there are no artifacts left after analysis"""
    shutil.copyfile(os.path.join(__script_dir, 'whole-program', 'whole1.c'), os.path.join(tmpdir, 'whole1.c'))
Collaborator

Why do you need to copy the files?

Owner Author

because there can be artifacts in the test folder before I start the test. Want to have a clean folder with just the test files. This test is also executed locally and locally you might have artifacts..
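
The idea of starting from a clean folder and then checking for leftovers can be sketched as below. This is not the test from the PR: `list_files`, `leftover_artifacts` and `fake_analysis` are hypothetical names, and `fake_analysis` is a stand-in for invoking cppcheck that deliberately leaves a file behind so the check has something to find.

```python
import os
import tempfile


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def leftover_artifacts(workdir, run_analysis):
    """Run the analysis and return files that exist afterwards but not before."""
    before = list_files(workdir)
    run_analysis(workdir)
    return list_files(workdir) - before


def fake_analysis(workdir):
    # Stand-in for running cppcheck; it leaves an artifact on purpose.
    with open(os.path.join(workdir, "whole1.c.dump"), "w") as f:
        f.write("")


with tempfile.TemporaryDirectory() as workdir:
    # The clean directory contains only the test input.
    with open(os.path.join(workdir, "whole1.c"), "w") as f:
        f.write("int x;\n")
    leftovers = leftover_artifacts(workdir, fake_analysis)
    print(sorted(leftovers))
```

Because the directory is freshly created, anything reported must have come from the run itself rather than from an earlier manual invocation.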

Collaborator

That seems unexpected.

#6787 is about detecting such leftovers.

Owner Author

@danmar danmar Sep 27, 2024

But #6787 only works in the CI. What if I add a file locally, one way or another, in the test folder? For instance by manually running cppcheck on a test file and pressing Ctrl+C while there is some temporary dump file or something.
Or run this manually before you execute the test: ./cppcheck --dump test/cli
That is not tested in the CI.

Collaborator

Yes, but it is a start (weirdly it doesn't fail). But so we at least know that it works as expected in the default (i.e. no fatal errors/interruption) case. We can move forward from that.

Owner Author

Yes, but it is a start (weirdly it doesn't fail).

sure. I don't disapprove that PR of course.

@firewave
Collaborator

I only reviewed the general stuff. I still need to dig into the build dir stuff. Also still needs to be performance tested.

@danmar
Owner Author

danmar commented Sep 27, 2024

I didn't expect that performance would be affected much. But according to my measurements this makes cppcheck faster at least in a self-check.

I built cppcheck from this branch and from the main branch with the Makefile, using this build command:

make CXXFLAGS=-O2 MATCHCOMPILER=yes

And I saw faster analysis with the cppcheck binary built from this branch (cppcheck-11750) than from main branch (cppcheck-HEAD):

cppcheck-11750 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib

real    6m37,058s
user    6m17,025s
sys     0m20,182s

cppcheck-11750 -j2 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib

real    3m55,146s
user    6m17,953s
sys     0m24,346s

cppcheck-11750 -j8 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib

real    1m57,996s
user    8m5,076s
sys     0m27,413s

cppcheck-HEAD --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib

real    7m14,262s
user    6m54,306s
sys     0m20,013s

cppcheck-HEAD -j2 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib

real    4m28,635s
user    6m45,882s
sys     0m23,421s

cppcheck-HEAD -j8 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib

real    2m29,994s
user    8m11,637s
sys     0m25,597s

@firewave
Collaborator

And I saw faster analysis with the cppcheck binary built from this branch (cppcheck-11750) than from main branch (cppcheck-HEAD)

Thanks for doing those tests.

But you cannot analyze the lib from the repo because you modified that - so you are comparing apples and slightly-not-apples. You need a fixed corpus for both runs.

Beside that I would not expect such an improvement from the code changes so that seems extremely suspect and I assume there is some data being omitted from analysis. Best would be to store the --debug output from a -j1 run and compare that.

@danmar
Owner Author

danmar commented Sep 27, 2024

But you cannot analyze the lib from the repo because you modified that - so you are comparing apples and slightly-not-apples. You need a fixed corpus for both runs.

I compiled cppcheck-11750 and cppcheck-HEAD first and then I ran it all in 1 go from a script:

$ cat run.sh

echo "cppcheck-11750 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib"
time ./cppcheck-11750 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib -q 2> /dev/null
echo "cppcheck-11750 -j2 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib"
time ./cppcheck-11750 -j2 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib -q 2> /dev/null
echo "cppcheck-11750 -j8 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib"
time ./cppcheck-11750 -j8 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib -q 2> /dev/null

echo "cppcheck-HEAD --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib"
time ./cppcheck-HEAD --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib -q 2> /dev/null
echo "cppcheck-HEAD -j2 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib"
time ./cppcheck-HEAD -j2 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib -q 2> /dev/null
echo "cppcheck-HEAD -j8 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib"
time ./cppcheck-HEAD -j8 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib -q 2> /dev/null

So it should be the same files. I ate lunch while it was running so I wasn't modifying files in the meantime.

@danmar
Owner Author

danmar commented Sep 27, 2024

Best would be to store the --debug output from a -j1 run and compare that.

hmm how would --debug output be useful does that contain any relevant timing info? you mean to ensure that the same corpus is checked?

Beside that I would not expect such an improvement from the code changes so that seems extremely suspect

Frankly I didn't expect it either. I didn't expect much difference in performance at all.

@danmar
Owner Author

danmar commented Sep 27, 2024

Frankly I didn't expect it either. I didn't expect much difference in performance at all.

oh wait I will have to execute different addons.

@firewave
Collaborator

hmm how would --debug output be useful does that contain any relevant timing info?

Since that shows the data we analyze - but that would only be actual code and not what we pass as CTU - so that is of no use. Maybe we should add a debug option which displays which CTU information we generate and pass to the analysis.

@danmar
Owner Author

danmar commented Sep 27, 2024

I have updated the script.

ARGS="--premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib1"

git checkout main
make clean
make -j12 CXXFLAGS=-O2 MATCHCOMPILER=yes
cp cppcheck cppcheck-main

rm -rf lib1
cp -R lib lib1

git checkout fix-11750
make clean
make -j12 CXXFLAGS=-O2 MATCHCOMPILER=yes
cp cppcheck cppcheck-11750

cp premiumaddon-11750 premiumaddon
echo "cppcheck-11750 $ARGS"
time ./cppcheck-11750 $ARGS -q 2> /dev/null

git checkout main
cp premiumaddon-main premiumaddon
echo "cppcheck-main $ARGS"
time ./cppcheck-main $ARGS -q 2> /dev/null

echo DONE

The output is:

cppcheck-11750 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib1

real    5m11,995s
user    4m52,916s
sys     0m19,250s
Switched to branch 'main'
Your branch is up to date with 'origin/main'.
cppcheck-main --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib1

real    5m22,830s
user    5m3,381s
sys     0m19,665s
DONE

The speedup compared to the previous run can be explained because I ran release builds of premiumaddon this time.
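
To put a number on the difference between the two runs, the `real` times can be converted to seconds and compared. The sketch below assumes the `XmY,ZZZs` format printed by bash's `time` in this locale (comma as decimal separator). `parse_time` is a hypothetical helper, and a single run per binary says nothing about run-to-run noise.

```python
import re


def parse_time(text):
    """Convert bash `time` output such as '5m11,995s' to seconds.

    Accepts ',' or '.' as the decimal separator.
    """
    match = re.fullmatch(r"(\d+)m(\d+)[,.](\d+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized time format: {text!r}")
    minutes, seconds, fraction = match.groups()
    return int(minutes) * 60 + float(f"{seconds}.{fraction}")


# "real" times copied from the output above.
branch = parse_time("5m11,995s")  # cppcheck-11750
main = parse_time("5m22,830s")    # cppcheck-main

saving = main - branch
percent = 100 * saving / main

print(f"branch: {branch:.3f}s, main: {main:.3f}s")
print(f"saved {saving:.3f}s ({percent:.1f}% of the main-branch time)")
```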

@firewave
Collaborator

The speedup compared to the previous run can be explained because I ran release builds of premiumaddon this time.

That result makes way more sense. Thanks for clearing that up.

@firewave
Collaborator

I still would like to have a final look but I am out for most of the next two days.

@danmar
Owner Author

danmar commented Oct 1, 2024

@firewave friendly ping

@danmar
Owner Author

danmar commented Oct 6, 2024

@firewave friendly ping

@firewave
Collaborator

firewave commented Oct 6, 2024

My concerns have been addressed but I still cannot wrap my head around the actual workflow. My mind just isn't really there. I will give it another spin tomorrow by debugging through it.

@@ -129,6 +129,10 @@ namespace {
        return !mCriticalErrors.empty();
    }

    const std::string& getCtuInfo() const {
        return mCtuInfo;
Collaborator

It feels a bit strange that this lives in the only remotely related logger. I wonder if all the logic should live in its own object, with RAII-based cleanup and such.

That would be similar to what I tried in #6634 which comes up short in making it a local object because it needs to intercept the logged error at some point (which also applies to the code here). So adding some kind of hook into the error logging which std, XML and CTU (and even SARIF) could use would make a lot of sense.
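
As a rough illustration of the hook idea, here is a language-neutral sketch written in Python only to keep it short. None of these names exist in cppcheck: `HookedLogger`, `CtuCollector`, `TextOutput` and the dictionary-shaped messages are all hypothetical, and the real code would work with `ErrorMessage` objects in C++.

```python
class HookedLogger:
    """Forwards each reported message to every registered hook, in order."""

    def __init__(self):
        self._hooks = []

    def add_hook(self, hook):
        # A hook is any callable that takes one message argument.
        self._hooks.append(hook)

    def report_err(self, message):
        for hook in self._hooks:
            hook(message)


class CtuCollector:
    """Hypothetical hook that accumulates CTU info instead of printing it."""

    def __init__(self):
        self.ctu_info = []

    def __call__(self, message):
        if message["kind"] == "ctuinfo":
            self.ctu_info.append(message["text"])


class TextOutput:
    """Hypothetical hook that keeps ordinary messages as output lines."""

    def __init__(self):
        self.lines = []

    def __call__(self, message):
        if message["kind"] != "ctuinfo":
            self.lines.append(message["text"])


logger = HookedLogger()
ctu = CtuCollector()
out = TextOutput()
logger.add_hook(ctu)
logger.add_hook(out)

logger.report_err({"kind": "ctuinfo", "text": "function-call info"})
logger.report_err({"kind": "error", "text": "null pointer dereference"})

print(ctu.ctu_info)
print(out.lines)
```

Each consumer decides for itself which messages it cares about, so the logger does not need to know about CTU, XML or SARIF specifically.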

Owner Author

I am not sure if I understand these comments fully.

I do feel the ctu info belongs in cli and gui rather than lib. It's aggregated info from all threads. For me it's reasonable that CppcheckExecutor owns the data (directly or indirectly). However, I can agree that the functionality does not technically belong in StdLogger.

We also have the mActiveCheckers and mCriticalErrors. Those do not really fit very well in the StdLogger either.

I do not see a very elegant way to insert hooks. We could add one more ErrorLogger that owns the StdLogger?

class AllLogger : public ErrorLogger {
private:
    void reportErr(const ErrorMessage& errmsg) override {
        // ... handle active checkers, critical errors, ctu, ...

        // pass remaining errors to stdLogger
        stdLogger.reportErr(errmsg);
    }
    StdLogger stdLogger;
};

Do you have better ideas..?

Owner Author

Personally I still suggest putting the condition that ensures the output is formatted as sarif/text/xml in StdLogger, so it will take an ErrorMessage as input and output the proper string.

Collaborator

Yeah, again something in this file turns into a grab-bag after things got sorted out at some point.

This is not something we need to solve in this PR. I will see if I can put together a PR with the proposed hook based on the previously linked cleanup.

@firewave
Collaborator

firewave commented Oct 7, 2024

The changes seem fine to me. The behavior is not changed much and it seems the test coverage is sufficient, so if something were wrong that should show up.

I will add the build dir injection for all tests soon and also test the before and after. So if there are corner cases, those should (hopefully) be uncovered by this.

@firewave
Collaborator

firewave commented Oct 7, 2024

I forgot the essential in my last comment: Feel free to merge. 👍🙂

@danmar
Owner Author

danmar commented Oct 7, 2024

Thanks!

@danmar danmar merged commit d00dbe0 into danmar:main Oct 7, 2024
63 checks passed
ludviggunne pushed a commit to ludviggunne/cppcheck that referenced this pull request Oct 19, 2024