Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent/wrong line coverage #185

Open
AndrzejKurek opened this issue Jan 20, 2023 · 28 comments
Open

Inconsistent/wrong line coverage #185

AndrzejKurek opened this issue Jan 20, 2023 · 28 comments

Comments

@AndrzejKurek
Copy link

AndrzejKurek commented Jan 20, 2023

Hi there, I'm investigating line coverage for a new feature that I'm introducing to Mbed TLS, and I'm seeing inconsistencies when it comes to line coverage (branch information seems fine):
image
line 1857 is hit, even though nothing executed it.

Reproduction steps:
Starting from here:

make CFLAGS='--coverage -g3 -O0' LDFLAGS='--coverage' -C tests test_suite_x509parse
(cd tests && ./test_suite_x509parse)
make lcov #runs [scripts/lcov.sh](https://github.com/Mbed-TLS/mbedtls/blob/development/scripts/lcov.sh)
firefox Coverage/index.html

Check library/x509_crt.c coverage, line 1856 & 1857. The insides of the condition should not be hit.
I reduced the number of tests run to one that highlights the problem.
If you're not able to run this test or you'd like me to produce a possibly minimal test case - let me know, I'll try harder :)

Tool versions:
lcov 1.14 and 1.16 tested;
gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

@henry2cox
Copy link
Collaborator

lcov is merely presenting coverage data generated by gcov.
Could you take a look at the gcov output you get from your testcase, and check whether gcov claims that your line 1857 is executed or not?
If gcov says 'not' and lcov says "yes" - then there is an lcov bug. Otherwise, gcc/gcov bug.

Another way that this can happen is if your source code version is not aligned with your build - so the instrumented executable refers to different line numbers than correspond to your current source code.

The other thing you probably want to do is to try the current lcov TOT, as there have been some extensive changes and quite a few fixes since 1.16.

Hope this helps
Henry

@henry2cox
Copy link
Collaborator

The other thing to try is to remove the "-O0" flag and see if that fixes your coverage report.
This would not be the first time that optimization caused a compiler to lose track of some information.

@xaizek
Copy link

xaizek commented Jan 21, 2023

-O0 doesn't enable any optimizations, it disables them:

Most optimizations are completely disabled at -O0 or if an -O level is not set on the command line, even if
individual optimization flags are specified.
[...]
-O0 Reduce compilation time and make debugging produce the expected results.  This is the default.

@henry2cox
Copy link
Collaborator

Bummer.
That flag change likely won't help, then.

Could also try with LLVM and see if that gives a different result. Different gcc version might also be interesting.
(As an aside: it can be quite interesting to compile and run your tests with two different compilers and/or versions - then look at a differential coverage report. I might suggest that here, too.)

@AndrzejKurek
Copy link
Author

AndrzejKurek commented Jan 23, 2023

-Og doesn't change anything either. Here's the gcov output with -a -c flags:

        3: 1856:            if (ret != 0) {
       1*: 1857:                return ret;
    %%%%%: 1857-block  0
        1: 1857-block  1
        -: 1858:            }

@henry2cox
Copy link
Collaborator

henry2cox commented Jan 23, 2023

the gcov man page says:

Executed basic blocks
having a statement with zero execution_count end with * character
and are colored with magenta color with the -k option.

.. which appears to be what we see here.
I think we see the statement which has a zero execution count - but I don't think I understand why it is instrumented this way, or what I could do in 'geninfo' to better handle the case.
Maybe the answer is to look for following "zero count' blocks - and then mark the corresponding statement as not executed?

Could you run the same thing with "gcov -i" (json format output) - and show me what that looks like as well?

Thanks
Henry

@AndrzejKurek
Copy link
Author

Could you run the same thing with "gcov -i" (json format output) - and show me what that looks like as well?

Sure, I'll upload the whole file though, since I can see that relevant data might be scattered.
The function this issue concerns is mbedtls_x509_parse_subject_alt_name, lines 1856 - 1858.
Grepping for 1857: {"branches": [], "count": 1, "line_number": 1857, "unexecuted_block": true, "function_name": "mbedtls_x509_parse_subject_alt_name"},
x509_crt_gcov.json.txt

@henry2cox
Copy link
Collaborator

henry2cox commented Jan 23, 2023

Could you take a quick look at some of the other lines which are marked 'hit' but which have unexecuted blocks.
If I blindly tell lcov that all of them should be marked 'not hit' - then I am not quite sure whether the result will be better than today, or worse. We might need a better fix than the simplistic workaround that first came to mind.

I replaced '{' with '\n{' in the JSON file, then ran

grep '"unexecuted_block": true' | grep -v '"count": 0'

to find a bunch of lines in your file - say, lline 995, 1001, 1331, 1372, ... and so on (23 lines in total)

@AndrzejKurek
Copy link
Author

AndrzejKurek commented Jan 23, 2023

If I blindly tell lcov that all of them should be marked 'not hit' - then I am not quite sure whether the result will be better than today, or worse.

It seems that for complex statements it's probably correct (there's a part of the condition on a given line that's not executed, but it's hard to parse this .json data manually), for example:

  • while (crt->version != 0 && crt->next != NULL) { <- crt->version equals 0, so the second part is not checked.

There are however also instances that are a bit puzzling to me:

  • line 995:
    if (cb != NULL) { <- this if is called but evaluates to false. I would expect that this doesn't show up as an unexecuted_block in the if line, but maybe you can correct me;
  • line 1457:
    if (buflen != 0 && buf[buflen - 1] == '\0' && <- buflen is not equal to 0, so the second part of the if is evaluated, but found false. I would also expect it to not show up as an unexecuted_block.

I'm not sure how this intersects with other sources of data, so I'll leave the interpretation up to you.

@henry2cox
Copy link
Collaborator

Sorry for all the requests...but your testcase seems to easily generate some conditions that my simple cases do not.
(I have not checked results from any of the actual applications.)

Could you run gcov again, this time adding flags "-c -a -i" (turn on branch data).
Might also be good to try with "-c -i" (without "all blocks").
I'm most interested to see if we find lines which are marked with "unexecuted block" which do not contain "branches".
If we did not ask for branches...then we appear to see a lot of them :-) If we ask...then that will be interesting.

Ultimately, once we have resolved this issue, it will be interesting to look at a differential coverage report comparing 'before' and 'after' - to see what the differences are. This could help us figure out if there are more issues we have to fix.

Thanks

@AndrzejKurek
Copy link
Author

Each of these calls resulted in the same file (same checksum and size). Commands ran:

make CFLAGS='--coverage -g3 -O0' LDFLAGS='--coverage' -C tests test_suite_x509parse
(cd tests && ./test_suite_x509parse)
cd ./library/
gcov -c -a -i ./x509_crt.c
mv ./x509_crt.gcov.json.gz ./x509_crt.gcov-cai.json.gz
gcov -c -i ./x509_crt.c
mv ./x509_crt.gcov.json.gz ./x509_crt.gcov-ci.json.gz 
gcov -i ./x509_crt.c
mv ./x509_crt.gcov.json.gz ./x509_crt.gcov-i.json.gz 

Let me know if I should run them in any other way.
x509_crt_gcov-cai.json.txt

@henry2cox
Copy link
Collaborator

Yeah...can you try yet again, with "-b -x -i".
(Misread the code...)

@AndrzejKurek
Copy link
Author

AndrzejKurek commented Jan 23, 2023

As an update in the meantime - running this with clang yielded better results:

 make CC=clang-14 CFLAGS='--coverage -g3 -O0' LDFLAGS='--coverage' -C tests test_suite_x509parse
(cd tests && ./test_suite_x509parse)
make lcov GCOV_TOOL="llvm-cov-14 gcov"

image

Local patch added:

diff --git a/scripts/gcov_tool.sh b/scripts/gcov_tool.sh
new file mode 100755
index 000000000..7c663d8da
--- /dev/null
+++ b/scripts/gcov_tool.sh
@@ -0,0 +1,2 @@
+#!/bin/sh
+exec ${GCOV_TOOL:-gcov} "$@"
diff --git a/scripts/lcov.sh b/scripts/lcov.sh
index 8d141eedf..02e9c25ec 100755
--- a/scripts/lcov.sh
+++ b/scripts/lcov.sh
@@ -46,10 +46,10 @@ set -eu
 lcov_library_report () {
     rm -rf Coverage
     mkdir Coverage Coverage/tmp
-    lcov --capture --initial --directory library -o Coverage/tmp/files.info
-    lcov --rc lcov_branch_coverage=1 --capture --directory library -o Coverage/tmp/tests.info
-    lcov --rc lcov_branch_coverage=1 --add-tracefile Coverage/tmp/files.info --add-tracefile Coverage/tmp/tests.info -o Coverage/tmp/all.info
-    lcov --rc lcov_branch_coverage=1 --remove Coverage/tmp/all.info -o Coverage/tmp/final.info '*.h'
+    lcov --gcov-tool "$PWD/scripts/gcov_tool.sh" --capture --initial --directory library -o Coverage/tmp/files.info
+    lcov --gcov-tool "$PWD/scripts/gcov_tool.sh" --rc lcov_branch_coverage=1 --capture --directory library -o Coverage/tmp/tests.info
+    lcov --gcov-tool "$PWD/scripts/gcov_tool.sh" --rc lcov_branch_coverage=1 --add-tracefile Coverage/tmp/files.info --add-tracefile Coverage/tmp/tests.info -o Coverage/tmp/all.info
+    lcov --gcov-tool "$PWD/scripts/gcov_tool.sh" --rc lcov_branch_coverage=1 --remove Coverage/tmp/all.info -o Coverage/tmp/final.info '*.h'
     gendesc tests/Descriptions.txt -o Coverage/tmp/descriptions
     genhtml --title "mbed TLS" --description-file Coverage/tmp/descriptions --keep-descriptions --legend --branch-coverage -o Coverage Coverage/tmp/final.info
     rm -f Coverage/tmp/*.info Coverage/tmp/descriptions

@AndrzejKurek
Copy link
Author

make CFLAGS='--coverage -g3 -O0' LDFLAGS='--coverage' -C tests test_suite_x509parse
(cd tests && ./test_suite_x509parse)
cd library
gcov -b -x -i ./x509_crt.c

x509_crt##78325959f5aaa72897d25968fdf44ccd.gcov.json.txt

@henry2cox
Copy link
Collaborator

Cool.
Would be interesting to generate differential report to see what else differs between LLVM and gcc:

genhtml -o differential --baseline-file gccData.info clangData.info --branch-coverage --function-coverage -filter line,branch,function
myFavoriteBrowswer differential/index.html

Then look for lines, branches, function in categories other than "UBC" and "CBC".
(Note that this treats hit counts as boolean data - simply comparing hit/not hit/not present between the data sets. It does not compare the counts themselves and so won't detect differences if one result says some point was executed 10 times whereas the other result says 1 time.

@henry2cox
Copy link
Collaborator

The simple hack triggers on lines 1857 and 1001 - but it looks like line 1001 is probably not actually hit in your data either.
At least, line 1000 seems not hit - but 1001 is marked as 'hit' without the hack...
The upshot is that this simple approach seems to work, at least for this compiler version and this testcase.

@AndrzejKurek
Copy link
Author

it looks like line 1001 is probably not actually hit in your data either.

Yep. Another faulty result, just not one that I noticed.

@henry2cox
Copy link
Collaborator

SHA 7a538ee contains the above hack.
Please try "lcov --capture --rc geninfo_unexecuted_blocks=1 ...." in your command line.
Please extract both with and without the new option - then create a differential report so we can see the differences.
This may or may not be what is needed, and may or may not require fixes or enhancements.

@AndrzejKurek
Copy link
Author

AndrzejKurek commented Jan 24, 2023

Unfortunately, running any genhtml diff as requested above results in an error. Command:

genhtml -o upd_gcov_opt_diff --baseline-file ./Coverage_upd_gcov_no_opt/tmp/final.info ./Coverage_upd_gcov_opt_on/tmp/final.info --branch-coverage --function-coverage -filter line,branch,function

result:
genhtml: Can't locate object method "nextTlaGroup" via package "SummaryInfo" at /usr/local/bin/genhtml line 7755.
lcov --version LCOV version v1.16-28-g7a538ee

@henry2cox
Copy link
Collaborator

Sorry about that. Reversed the parameter order by mistake. Please update to sha e5c10ef and try again.
In one of my other projects, the 'unexecuted blocks' hack turns out to change state (hit to not hit) of 33 lines.
Of those changes, 30 appear valid and 3 seem questionable. It appears that the unexecuted status check needs to be slightly more discriminating.

@AndrzejKurek
Copy link
Author

AndrzejKurek commented Jan 26, 2023

Results look better, lines 1001 and 1857 are now consistent with real behaviour. The diff yields some strange results though, for example coverage in the line range around 1977 is the same between gcov (with the new option on) vs llvm_cov, but the diff shows otherwise:
gcov
llvm-cov
diff
All results and two diffs (gcov with no geninfo_unexecuted_blocks vs gcov with this option on, gcov with this option on vs llvm-cov):
coverage_tests.tar.gz

Side-note: the llvm-cov coverage isn't without its flaws either, for example it shows coverage for line 789 of x509.c, while the line isn't executed. gcov coverage works fine in this particular case.
gcov-crt
lcov-crt

@henry2cox
Copy link
Collaborator

Getting there...but not arrived yet, it seems.
Sorry to be asking for all these experiments...but thank you for the help!

For one or two of the lines that are still categorized incorrectly by the new filtering option (claimed hit when not actually hit, or claimed not hit when actually hit): could you send me the gcov output data - both gcc and llvm if you have it.
I want to see if there is anything in the data that I can use to do a better job of discrimination. The other (also bad) hacky solution is to add some source-level filtering as well...if we can figure out some structural similarity for the cases that are going south.

We may not be able to get to perfection...but we can try. Correctness is important, if you are using this tool to drive verification.
Worst case: you may want or need to exclude the problematic code - or 'sign off' on it, and then ignore it in differential reports. (My internal users tend to add exclusions - on the theory that they have much bigger fish to fry and the number of exclusions is very small compared to the size of the code.)

@henry2cox
Copy link
Collaborator

Results look better, lines 1001 and 1857 are now consistent with real behaviour. The diff yields some strange results though, for example coverage in the line range around 1977 is the same between gcov (with the new option on) vs llvm_cov, but the diff shows otherwise:

Sorry..misread your comment/question regarding line 1977.
That is actually expected/correct behaviour. Category "UBC" indicates "Uncovered Baseline Code" - meaning that this line is unchanged from baseline to current version (source text is identical) and is not covered in the baseline coverage data and not covered in the current coverage data.

Why you might want that is for ongoing development:

  • you analyze your coverage report and decide that you don't really care about this particular uncovered piece of code (you don't try to add tests for it) - and instead you 'sign off'.
  • Time passes and you have done more development and have a new coverage report. Now you want to compare that new data to what you had signed off on previously...and you discover that line 1977 is still not covered. Not good - but not different. In particular: if you had decided to sign off previously - then we expect that your criteria has not changed and you still don't care about that line. In fact - you don't care about any of the points in the "UBC" category - because you know that you had signed off on those previously.
  • What you do care about are coverpoints in the "UNC", "LBC" and "UIC" categories. Those represent unexercised new or changed code.

The paper mentioned in the project README has a fairly detailed description about the topic of coverage criteria and automation.

The other thing I noticed in your result is that gcc and llvm are showing a different number of branches on line 787 (6 and 4, respectively, I think). This is very likely due to differences in how they generated exception handlers - and is yet another issue that makes branch coverage a bit painful to use :-(
When users really care, then we have some more features to tell the tool to ignore certain branches (because they are not real) and/or to mark that some are not reachable. (You probably don't want to ignore unreachable branches because you do want the tool to complain if you happen to reach one of them: something is wrong with your code, some conditions changed, or your thinking was incorrect.)

@GitMensch
Copy link

It looks to me that the initial issue is fixed now, isn't it?
In this case I highly suggest to close it and possibly create a new one (or a discussion post) about "more stuff".

@henry2cox
Copy link
Collaborator

Hi @AndrzejKurek (or @GitMensch ..or anyone) -
If you are still interested in this issue, could you try LLVM (version 14 or newer), to collect coverage data via LLVM profile procedure (rather than via the LLVM gcov procedure) - see #234 for somewhat detailed directions.

I'm curious if the LLVM profile data shows similar artifacts or not.
(So we would want to generate a differential coverage report - to compare gcc/gcov with llvm-profdata coverage data for unchanged source code and identical tests.)
Regardless of what we see - identical or not - it will be very interesting.

Thanks
Henry

@gilles-peskine-arm
Copy link

Hi @henry2cox Thanks for the update. In Mbed TLS, we're still interested in accuracy improvements, but I'm afraid we don't really have time to dig into this at the moment. Andrzej is no longer working on Mbed TLS. You should be able to compare measurements on Mbed TLS easily: just grab a release and run make lcov, this only requires a C toolchain plus gcov and lcov. (Or grab the development branch but you'll also need a few python modules.)

@henry2cox
Copy link
Collaborator

Hi Gilles -
Thanks for letting me know.
It turns out that I have a set of internal projects to experiment on - so I probably won't find time to try the Mbed TLS code base any time soon.
If someone else gets to it: I still think that the comparison we see will be interesting - all the more, because I can't share the internal results..whereas opensource projects are fair game.
Note that I haven't looked at the Mbed TLS build procedure - but I suspect that the rules for 'make lcov' will need to be modified (to use -fprofile-instr-generate -fcoverage-mapping instead of --coverage and llvm-perfdata/llvm-cov rather than lcov -capture - as well as possibly needing to change options used to redirect the raw coverage data files).
(Any undergrad or other interested students who want to give this a shot? After investigating any anomalies and tracking them down into the compielrs...I think you would have had some very relevant experience, as well as the makings of a good conference paper...possibly even a thesis.)
Henry

@GitMensch
Copy link

I'm in a similar boar as Gilles, while I'd find the result interesting I won't do that myself any time soon.
As another free software project for comparisons - you can do that in GnuCOBOL with configure --enable-code-coverage then make check-code-coverage, for more details see one of the GC discussion posts about that.
In this case you can also override the options and tools used --with-gcov="tool arg1 arg2" and can also override the necessary options, the defaults are

	CODE_COVERAGE_CPPFLAGS="-DNDEBUG"
	CODE_COVERAGE_CFLAGS="-O0 -g -fprofile-arcs -ftest-coverage"
	CODE_COVERAGE_CXXFLAGS="-O0 -g -fprofile-arcs -ftest-coverage"
	CODE_COVERAGE_LIBS="-lgcov"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants