Fix #11750 (refactor ctu-info to generate fewer artifacts on disk) #6778

danmar · 2024-09-07T11:39:23Z

No description provided.

firewave · 2024-09-07T20:10:15Z

Please add some Python tests which check for those files after an analysis.

I wanted to add these to test some local changes but those are already included in this PR.

firewave · 2024-09-07T20:16:25Z

cli/cppcheckexecutor.cpp

+        /**
+         * CTU information
+         */
+        std::string mCtuInfo;


Is this safe in terms of memory usage? I have not really understood what this is doing yet (maybe add some explanation to the PR), but it appears to accumulate the data in memory, and this could be megabytes in size or even much, much more.

We should check it, but my hypothesis is that the ctu info will not be huge.

About memory usage, here is one small test case: cppcheck/test/cli/whole-program

The files whole1.c and whole2.c are 94 and 64 bytes.

The ctu-info generated for them is 220 bytes and 227 bytes.

In this test case, ~160 bytes of source code generate ~450 bytes of ctu-info.

I don't think that memory usage will be a large issue. If that ratio holds, scanning a large project with 160MB of source code would require about 450MB of memory for ctu info.
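The estimate above is a linear extrapolation from one small test case. A minimal Python sketch of that arithmetic, assuming (as the estimate does) that ctu-info size scales proportionally with source size, which this single data point cannot confirm:

```python
# Sizes (in bytes) reported above for the files in test/cli/whole-program.
SOURCE_BYTES = {"whole1.c": 94, "whole2.c": 64}
CTU_INFO_BYTES = {"whole1.c": 220, "whole2.c": 227}


def ctu_ratio(source_bytes, ctu_info_bytes):
    """Bytes of ctu-info generated per byte of source code."""
    return sum(ctu_info_bytes.values()) / sum(source_bytes.values())


def estimate_ctu_info(source_size, ratio):
    """Linear extrapolation: assumes ctu-info grows proportionally to source size."""
    return source_size * ratio


ratio = ctu_ratio(SOURCE_BYTES, CTU_INFO_BYTES)  # 447 / 158
print(f"ctu-info per source byte: {ratio:.2f}")
print(f"160 MB of source -> about {estimate_ctu_info(160, ratio):.0f} MB of ctu-info")
```

The ratio comes out at about 2.8, so the 160 MB example lands near the 450 MB mentioned above. If all of that is accumulated in a single std::string (as mCtuInfo appears to do), it has to be one contiguous allocation.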

Thanks.

But that has to be constructed as a string, which requires contiguous memory.

We also need to profile that. It sounds like it might slow things down quite a bit. Maybe we need both modes?

danmar · 2024-09-07T20:38:56Z

I 100% agree about adding more testing. I will do it. But I can't even get our original tests working yet, and those test this functionality as well.

firewave · 2024-09-07T21:35:04Z

> I 100% agree about adding more testing. I will do it. But I can't even get our original tests working yet, and those test this functionality as well.

Possible. As I mentioned, I have not looked into it yet.

danmar · 2024-09-08T12:44:40Z

test/cli/whole-program_test.py

+    _, _, stderr = cppcheck(args, cwd=__script_dir)
+    assert 'misra-c2012-5.8' in stderr
+    _, _, stderr = cppcheck(args, cwd=__script_dir)
+    assert 'misra-c2012-5.8' in stderr


My intention was only to refactor. But here is a test case that does not work in the cppcheck main branch but does work in this branch.

danmar · 2024-09-08T12:46:38Z

There is testing for CTU analysis in addons in test/cli/whole-program_test.py

The tests in that file broke when I started working on this refactoring.

firewave · 2024-09-08T13:42:22Z

> There is testing for CTU analysis in addons in test/cli/whole-program_test.py
>
> The tests in that file broke when I started working on this refactoring.

Yes, I added those. They are also far from complete and have known issues which need to be fixed. So "breaking" them might actually be fixing them (I might not have written all tests with XFAIL).

I was referring to tests which check what is (or is not) left on disk after an analysis, not just the analysis results.

The local changes I have get rid of the duplicated cleanup code and further reduce some redundancies.

danmar · 2024-09-08T17:46:27Z

> I was referring to tests which check what is (or is not) left on disk after an analysis, not just the analysis results.

Without a cppcheck build dir: I don't see the point in writing a test that checks that no files are saved on disk. The code and tests should be connected. It feels like adding a test that checks that files.txt is not created in various directories.

With a cppcheck build dir: I will add one more test.

danmar · 2024-09-09T10:03:20Z

> Without a cppcheck build dir: I don't see the point in writing a test that checks that no files are saved on disk. The code and tests should be connected. It feels like adding a test that checks that files.txt is not created in various directories.

I added a test that checks that no artifacts remain in the project folder, and locally that test fails. So I need to reconsider this.

firewave · 2024-09-10T15:02:00Z

test/cli/whole-program_test.py

+@pytest.mark.parametrize("jobs,builddir", ((1,False), (1,True), (2,False), (2,True)))
+def test_addon_no_artifacts(tmpdir, jobs, builddir):
+    """Test that there are no artifacts left after analysis"""
+    shutil.copyfile(os.path.join(__script_dir, 'whole-program', 'whole1.c'), os.path.join(tmpdir, 'whole1.c'))


Why do you need to copy the files?

Because there can be artifacts in the test folder before the test starts. I want a clean folder with just the test files. This test is also executed locally, and locally you might have artifacts.

That seems unexpected.

#6787 is about detecting such leftovers.

But #6787 only works in the CI. What if I add a file to the test folder locally, one way or another? For instance, by manually running cppcheck on a test file and pressing Ctrl+C while some temporary dump file exists.
Or by running this manually before you execute the test: ./cppcheck --dump test/cli
That is not tested in the CI.

Yes, but it is a start (weirdly, it doesn't fail). At least we know that it works as expected in the default case (i.e. no fatal errors or interruptions). We can move forward from that.

> Yes, but it is a start (weirdly, it doesn't fail).

Sure. I don't disapprove of that PR, of course.
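The clean-folder approach can be sketched in plain Python: copy only the test sources into a fresh temporary directory, snapshot its contents, run the analysis and report any files that appeared. This is a hedged illustration, not the PR's test; snapshot, find_leftovers and the run_analysis callback are hypothetical names, and in the real test run_analysis would invoke cppcheck (for example through the test suite's cppcheck() helper with the temp dir as cwd).

```python
import os
import shutil
import tempfile


def snapshot(directory):
    """Return the set of file paths (relative to directory) currently under it."""
    found = set()
    for root, _dirs, files in os.walk(directory):
        for name in files:
            found.add(os.path.relpath(os.path.join(root, name), directory))
    return found


def find_leftovers(sources, run_analysis):
    """Copy sources into an empty temp dir, run the analysis there and
    return the sorted list of files that did not exist before the run.

    Copying isolates the check from stale artifacts (e.g. a dump file left
    by an interrupted run) that may already sit next to the original sources.
    """
    with tempfile.TemporaryDirectory() as workdir:
        for src in sources:
            shutil.copy(src, workdir)
        before = snapshot(workdir)
        run_analysis(workdir)  # hypothetical callback, e.g. runs cppcheck with cwd=workdir
        return sorted(snapshot(workdir) - before)
```

A snapshot diff like this only catches new files; files that were modified or deleted during the run would need a content comparison as well.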

firewave · 2024-09-10T15:05:25Z

I only reviewed the general stuff. I still need to dig into the build dir stuff. It also still needs to be performance tested.

danmar · 2024-09-27T11:59:23Z

I didn't expect that performance would be affected much. But according to my measurements, this makes cppcheck faster, at least in a self-check.

I built cppcheck from this branch and from the main branch with the Makefile, using this build command:

make CXXFLAGS=-O2 MATCHCOMPILER=yes

And I saw faster analysis with the cppcheck binary built from this branch (cppcheck-11750) than from main branch (cppcheck-HEAD):

cppcheck-11750 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib

real    6m37,058s
user    6m17,025s
sys     0m20,182s
cppcheck-11750 -j2 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib

real    3m55,146s
user    6m17,953s
sys     0m24,346s
cppcheck-11750 -j8 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib

real    1m57,996s
user    8m5,076s
sys     0m27,413s
cppcheck-HEAD --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib

real    7m14,262s
user    6m54,306s
sys     0m20,013s
cppcheck-HEAD -j2 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib

real    4m28,635s
user    6m45,882s
sys     0m23,421s
cppcheck-HEAD -j8 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib

real    2m29,994s
user    8m11,637s
sys     0m25,597s

firewave · 2024-09-27T12:35:37Z

> And I saw faster analysis with the cppcheck binary built from this branch (cppcheck-11750) than from main branch (cppcheck-HEAD)

Thanks for doing those tests.

But you cannot analyze the lib from the repo because you modified that - so you are comparing apples and slightly-not-apples. You need a fixed corpus for both runs.

Besides that, I would not expect such an improvement from the code changes, so that seems extremely suspect, and I assume some data is being omitted from the analysis. Best would be to store the --debug output from a -j1 run and compare that.

danmar · 2024-09-27T12:51:29Z

> But you cannot analyze the lib from the repo because you modified that - so you are comparing apples and slightly-not-apples. You need a fixed corpus for both runs.

I compiled cppcheck-11750 and cppcheck-HEAD first and then ran everything in one go from a script:

$ cat run.sh

echo "cppcheck-11750 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib"
time ./cppcheck-11750 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib -q 2> /dev/null
echo "cppcheck-11750 -j2 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib"
time ./cppcheck-11750 -j2 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib -q 2> /dev/null
echo "cppcheck-11750 -j8 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib"
time ./cppcheck-11750 -j8 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib -q 2> /dev/null

echo "cppcheck-HEAD --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib"
time ./cppcheck-HEAD --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib -q 2> /dev/null
echo "cppcheck-HEAD -j2 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib"
time ./cppcheck-HEAD -j2 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib -q 2> /dev/null
echo "cppcheck-HEAD -j8 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib"
time ./cppcheck-HEAD -j8 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib -q 2> /dev/null

So both binaries should have analyzed the same files. I ate lunch while it was running, so I wasn't modifying files in the meantime.

danmar · 2024-09-27T12:55:54Z

> Best would be to store the --debug output from a -j1 run and compare that.

Hmm, how would --debug output be useful? Does it contain any relevant timing info? Do you mean to ensure that the same corpus is checked?

> Besides that, I would not expect such an improvement from the code changes, so that seems extremely suspect

Frankly, I didn't expect it either. I didn't expect much difference in performance at all.

danmar · 2024-09-27T12:59:10Z

> Frankly, I didn't expect it either. I didn't expect much difference in performance at all.

Oh wait, I will have to execute different addons.

firewave · 2024-09-27T13:01:51Z

> Hmm, how would --debug output be useful? Does it contain any relevant timing info?

Because it shows the data we analyze. But that would only be the actual code and not what we pass as CTU, so it is of no use here. Maybe we should add a debug option which displays which CTU information we generate and pass to the analysis.

danmar · 2024-09-27T15:45:20Z

I have updated the script.

ARGS="--premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib1"

git checkout main
make clean
make -j12 CXXFLAGS=-O2 MATCHCOMPILER=yes
cp cppcheck cppcheck-main

rm -rf lib1
cp -R lib lib1

git checkout fix-11750
make clean
make -j12 CXXFLAGS=-O2 MATCHCOMPILER=yes
cp cppcheck cppcheck-11750

cp premiumaddon-11750 premiumaddon
echo "cppcheck-11750 $ARGS"
time ./cppcheck-11750 $ARGS -q 2> /dev/null

git checkout main
cp premiumaddon-main premiumaddon
echo "cppcheck-main $ARGS"
time ./cppcheck-main $ARGS -q 2> /dev/null

echo DONE

The output is:

cppcheck-11750 --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib1

real    5m11,995s
user    4m52,916s
sys     0m19,250s
Switched to branch 'main'
Your branch is up to date with 'origin/main'.
cppcheck-main --premium=misra-c++-2023 -D__GNUC__ -D__CPPCHECK__ lib1

real    5m22,830s
user    5m3,381s
sys     0m19,665s
DONE

The speedup compared to the previous run can be explained because I ran release builds of premiumaddon this time.
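To put the second measurement into numbers, a small Python sketch can parse the bash `time` values (which use a comma as decimal separator in these logs, presumably because of the locale) and compute the relative difference. parse_time and time_saved are hypothetical helper names introduced for this illustration.

```python
import re

# Matches bash `time` values such as "5m11,995s"; accept "," or "." as the
# decimal separator since that depends on the locale.
_TIME_RE = re.compile(r"(\d+)m(\d+)[.,](\d+)s")


def parse_time(value):
    """Convert a bash `time` value like '5m22,830s' to seconds (float)."""
    match = _TIME_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    minutes, seconds, fraction = match.groups()
    return int(minutes) * 60 + int(seconds) + int(fraction) / 10 ** len(fraction)


def time_saved(baseline, candidate):
    """Fraction of the baseline time saved by the candidate (positive = faster)."""
    return (baseline - candidate) / baseline


# Numbers from the lib1 measurement above: (cppcheck-main, cppcheck-11750).
runs = {
    "real": ("5m22,830s", "5m11,995s"),
    "user": ("5m3,381s", "4m52,916s"),
}
for kind, (main_value, branch_value) in runs.items():
    main_s, branch_s = parse_time(main_value), parse_time(branch_value)
    print(f"{kind}: {main_s - branch_s:.3f}s saved ({time_saved(main_s, branch_s):.1%})")
```

Both real and user time come out roughly 3% lower for the branch. With a single run per binary this may be within run-to-run noise; repeating each run a few times would make the comparison more robust.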

firewave · 2024-09-27T15:55:47Z

> The speedup compared to the previous run can be explained because I ran release builds of premiumaddon this time.

That result makes way more sense. Thanks for clearing that up.

firewave · 2024-09-27T23:48:06Z

I still would like to have a final look but I am out for most of the next two days.

danmar · 2024-10-01T14:21:06Z

@firewave friendly ping

danmar · 2024-10-06T02:55:55Z

@firewave friendly ping

firewave · 2024-10-06T17:56:15Z

My concerns have been addressed, but I still cannot wrap my head around the actual workflow. My mind just isn't really there. I will give it another spin tomorrow by debugging through it.

firewave · 2024-10-07T16:04:43Z

cli/cppcheckexecutor.cpp

@@ -129,6 +129,10 @@ namespace {
            return !mCriticalErrors.empty();
        }

+        const std::string& getCtuInfo() const {
+            return mCtuInfo;


It feels a bit strange that this lives in the only remotely related logger. I wonder if all the logic should live in its own object, with RAII-based cleanup and such.

That would be similar to what I tried in #6634, which falls short of making it a local object because it needs to intercept the logged errors at some point (which also applies to the code here). So adding some kind of hook into the error logging, which std, XML and CTU (and even SARIF) output could use, would make a lot of sense.

I am not sure if I understand these comments fully.

I do feel the ctu info belongs in cli and gui rather than lib. It's aggregated info from all threads. For me it's reasonable that CppcheckExecutor owns the data (directly or indirectly); however, I agree that the functionality does not technically belong in StdLogger.

We also have mActiveCheckers and mCriticalErrors. Those do not really fit very well in StdLogger either.

I do not see a very elegant way to insert hooks. We could add one more ErrorLogger that owns the StdLogger?

class AllLogger : public ErrorLogger {
private:
    void reportErr(const ErrorMessage& errmsg) override {
        // ... handle active checkers, critical errors, ctu, ...
        // pass remaining errors to stdLogger
        stdLogger.reportErr(errmsg);
    }
    StdLogger stdLogger;
};

Do you have better ideas..?

Personally, I still suggest putting the condition that selects SARIF/text/XML formatting in StdLogger, so it takes an ErrorMessage as input and outputs the proper string.

Yeah, again something in this file turns into a grab-bag after things got sorted out at some point.

This is not something we need to solve in this PR. I will see if I can put together a PR with the proposed hook based on the previously linked cleanup.
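The hook idea discussed above can be illustrated with a small, language-neutral sketch. It is written in Python for brevity, whereas cppcheck's real ErrorLogger, StdLogger and ErrorMessage are C++ classes; DispatchingLogger, CtuCollector, add_hook and the "ctu-info" message id are hypothetical. Each hook (CTU collection, active checkers, critical errors, ...) gets a chance to consume a message before the remaining ones reach the output sink (text, XML or SARIF).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    """Stand-in for an error message; only the fields this sketch needs."""
    msg_id: str
    text: str


# A hook inspects a message and returns True if it consumed it.
Hook = Callable[[Message], bool]


class DispatchingLogger:
    """Passes each message to the registered hooks first, then to the output sink."""

    def __init__(self, sink: Callable[[Message], None]):
        self._hooks: List[Hook] = []
        self._sink = sink

    def add_hook(self, hook: Hook) -> None:
        self._hooks.append(hook)

    def report_err(self, msg: Message) -> None:
        for hook in self._hooks:
            if hook(msg):
                return  # consumed, e.g. CTU data that must not be printed
        self._sink(msg)


class CtuCollector:
    """Hypothetical hook that aggregates CTU data instead of printing it."""

    def __init__(self, ctu_id: str = "ctu-info"):  # message id is an assumption
        self.ctu_id = ctu_id
        self.chunks: List[str] = []  # kept as separate chunks, not one big string

    def __call__(self, msg: Message) -> bool:
        if msg.msg_id != self.ctu_id:
            return False
        self.chunks.append(msg.text)
        return True


# Usage: CTU data is collected, everything else reaches the output sink.
printed: List[Message] = []
collector = CtuCollector()
logger = DispatchingLogger(printed.append)
logger.add_hook(collector)
logger.report_err(Message("ctu-info", "<ctu data for whole1.c>"))
logger.report_err(Message("nullPointer", "Null pointer dereference"))
```

Because hooks run before the sink, the formatting code stays unaware of CTU handling, which is the separation discussed above. Keeping CTU data as separate chunks is one way to avoid growing a single contiguous buffer, as long as the chunks can be written out one by one.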

firewave · 2024-10-07T18:58:48Z

The changes seem fine to me. The behavior has not changed much and the test coverage seems sufficient, so if something were wrong it should show up.

I will soon add the build dir injection for all tests and also test the before and after. If there are corner cases, that should (hopefully) uncover them.

firewave · 2024-10-07T18:59:49Z

I forgot the essential part in my last comment: feel free to merge. 👍🙂

danmar · 2024-10-07T19:17:02Z

Thanks!

danmar marked this pull request as draft September 7, 2024 13:05

firewave reviewed Sep 7, 2024

danmar force-pushed the fix-11750 branch from f4756d0 to a4469e7 September 8, 2024 11:05

danmar commented Sep 8, 2024

danmar marked this pull request as ready for review September 8, 2024 12:45

danmar force-pushed the fix-11750 branch from 0933c78 to 45363a8 September 8, 2024 18:19