Fix #12266 (Performance: valueFlowCondition slowdown) #5837

danmar · 2024-01-05T11:32:23Z

I have executed test-my-pr.py on 532 packages

number of "main" warnings:

egrep -v valueFlow my_check_diff.log | egrep "^main " | wc -l
277

number of "your" warnings (genomicepidemiology-kma timed out without these changes):

egrep -v valueFlow my_check_diff.log | egrep "^your " | egrep -v genomicepidemiology-kma | wc -l
90

=> There are some false negatives. But it's rougly ~ 190 in 532 packages.. so it's not a major problem imho.

I can see a speedup. For the packages that took more than 1 second to analyze the timing factor was:

$ egrep -v " 0.[0-9][0-9] " my_check_diff_timing.log | sed 's/.* //' | egrep -v Factor | sort
0.04
0.08
0.10
0.10
0.13
0.18
0.19
0.20
0.21
0.21
0.23
0.24
0.25
0.26
0.26
0.27
0.28
0.28
0.29
0.31
0.31
0.31
0.31
0.31
0.32
0.33
0.34
0.35
0.35
0.35
0.37
0.37
0.38
0.39
0.39
0.40
0.42
0.42
0.43
0.43
0.44
0.44
0.44
0.45
0.46
0.46
0.46
0.46
0.46
0.47
0.48
0.49
0.49
0.50
0.50
0.50
0.50
0.50
0.51
0.51
0.51
0.52
0.52
0.52
0.52
0.54
0.54
0.54
0.54
0.54
0.55
0.56
0.56
0.56
0.56
0.56
0.57
0.57
0.57
0.57
0.58
0.58
0.58
0.59
0.59
0.59
0.60
0.61
0.61
0.61
0.62
0.62
0.62
0.62
0.63
0.63
0.63
0.63
0.63
0.64
0.65
0.65
0.65
0.65
0.65
0.65
0.65
0.65
0.65
0.65
0.66
0.66
0.66
0.67
0.67
0.67
0.67
0.68
0.68
0.69
0.69
0.69
0.69
0.69
0.70
0.70
0.70
0.70
0.70
0.71
0.71
0.71
0.72
0.73
0.73
0.73
0.73
0.73
0.73
0.73
0.74
0.75
0.75
0.75
0.75
0.75
0.75
0.75
0.75
0.75
0.75
0.76
0.76
0.76
0.76
0.76
0.77
0.77
0.77
0.77
0.78
0.78
0.78
0.78
0.79
0.79
0.79
0.80
0.80
0.80
0.80
0.80
0.80
0.80
0.80
0.81
0.81
0.82
0.82
0.82
0.82
0.82
0.83
0.83
0.83
0.84
0.84
0.84
0.84
0.85
0.85
0.85
0.85
0.86
0.86
0.86
0.86
0.86
0.86
0.86
0.86
0.87
0.88
0.88
0.88
0.89
0.89
0.89
0.90
0.90
0.90
0.90
0.90
0.90
0.90
0.90
0.91
0.91
0.92
0.92
0.92
0.93
0.93
0.93
0.93
0.94
0.94
0.95
0.95
0.96
0.96
0.96
0.96
0.97
0.97
0.97
0.97
0.99
0.99
0.99
1.00
1.00
1.00
1.01

firewave · 2024-01-05T13:18:01Z

lib/forwardanalyzer.cpp

@@ -646,6 +646,9 @@ namespace {
                    }
                } else if (tok->isControlFlowKeyword() && Token::Match(tok, "if|while|for (") &&
                           Token::simpleMatch(tok->next()->link(), ") {")) {
+                    const int branchCount = analyzer->countBranch();
+                    if (settings.checkLevel == Settings::CheckLevel::normal && branchCount > 4)


We should generate a message that informs the user of this so it is clear that the exhaustive mode needs to be used for the full analysis.

We should also make the threshold configurable.

firewave · 2024-01-05T14:46:02Z

I wonder if this is the same issue I saw with 4d9e69e.

pfultz2 · 2024-01-05T16:09:47Z

lib/analyzer.h

    virtual ~Analyzer() = default;
    Analyzer(const Analyzer&) = default;
 protected:
    Analyzer() = default;
+private:
+    int mBranchCount = 0;


This should go into the ForwardTraversal class instead.

okay. Fixed by c517458

pfultz2 · 2024-01-05T16:17:08Z

I do think 190 FNs seem like a lot, especially if these are real bugs. Thats a 35% chance of getting a FN in a package which seems kind of high.

danmar · 2024-01-05T18:52:40Z

I do think 190 FNs seem like a lot, especially if these are real bugs. Thats a 35% chance of getting a FN in a package which seems kind of high.

Another measure I think would be interesting is how many extra warnings you get with the extra analysis.. imagine there was 100 thousand warnings in total in those 537 packages then the exhaustive analysis will in general only give you 0.2% extra warnings for quite a substantial cpu penalty..

I don't have the exact numbers but I am thinking it would be interesting to write and run a script that creates such stats..
it can create separate stats for each severity..

danmar · 2024-01-06T00:12:56Z

I don't have the exact numbers but I am thinking it would be interesting to write and run a script that creates such stats..

I modified the test-my-pr.py script and here are some preliminary stats:

normal error:
4303
exhaustive error:
4311
normal warning:
7362
exhaustive warning:
7400
normal style:
57967
exhaustive style:
58031
normal performance:
5040
exhaustive performance:
5040
normal portability:
2679
exhaustive portability:
2690

I will publish the modified script later also. I have all the results, both normal and exhaustive, in case anybody wants to dig deeper and make some more stats..

pfultz2 · 2024-01-06T00:43:56Z

Thats 5% for warnings and 2% for errors.

Another measurement is the distribution across packages. If 90% of the FNs are in say 5% of the packages.

Also what about tweaking the threshold? Would 5 still have good performance while reducing the FNs?

danmar · 2024-01-06T09:29:41Z

Thats 5% for warnings and 2% for errors.

No wait.. it is 0.5% for warnings and 0.2% for errors.

Also what about tweaking the threshold? Would 5 still have good performance while reducing the FNs?

I have only tried 4.. I will try 5 and see what happens..

danmar · 2024-01-06T16:11:24Z

I have only tried 4.. I will try 5 and see what happens..

I am running the comparison script right now..

Current stats. Changed the "4" to "5":

normal error:
3684
exhaustive error:
3687
normal warning:
5528
exhaustive warning:
5540
normal style:
44419
exhaustive style:
44462
normal performance:
4066
exhaustive performance:
4066
normal portability:
2081
exhaustive portability:
2083

It's not the same packages as yesterday unfortunately so the stats can't be compared directly.

Here are some timing measurements for some files:

lib/tokenize.cpp: exhaustive=2m29s normal4=10.96s normal5=13.11s
p1.c: exhaustive=1m50s normal4=18s normal5=28s
p2.c: exhaustive=52s normal4=22s normal5=25s
p3.c: exhaustive=18s normal4=3s normal5=3s
p4.c: exhaustive=19s normal4=9s normal5=11s
p5.c: exhaustive=26s normal4=16s normal5=20s
p6.c: exhaustive=50s normal4=7s normal5=9s

I got p1.c-p6.c from a customer.

I am thinking that increasing the value from 4 to 5 seems to increase the analysis time with ~20% in these files. It's still considerably faster than the exhaustive mode. But still the payoff is not that huge imho it should be possible to detect more bugs (but not these particular bugs) with less cpu penalty.

pfultz2 · 2024-01-07T01:50:44Z

So thinking about this more, this could cause FPs because we rely on the lifetime being propagated to do better alias analysis. If the lifetimes bails at 4 conditions it could lead to us not recognizing that a variable was modified through an alias. Something like this:

int f(int a, bool b, bool c, bool d, int e) {
    int* g = &a;
    if(b)
        return 0;
    if(c)
        return 0;
    if(d)
        return 0;
    if(e < 1)
        return 0;
    if(a != 0)
        return 0;
    *g = 1;
    return d / a; // FP divide by zero
}

But I havent tested this yet. Are there FPs in the run? If there are FPs then the FNs would actually be higher.

danmar · 2024-01-08T12:18:20Z

ok so here are stats for each ID:

daniel@laptop:~/cppcheck4$ egrep '^exhaustive ' my_check_diff.log | sed 's/.*\[//' | sort | uniq -c
      4 arrayIndexOutOfBounds]
      4 arrayIndexOutOfBoundsCond]
      4 autovarInvalidDeallocation]
      6 containerOutOfBounds]
      3 ctunullpointer]
      1 danglingLifetime]
      5 invalidContainer]
      1 invalidLifetime]
      1 invalidPrintfArgType_sint]
     24 knownArgument]
    441 knownConditionTrueFalse]
      1 knownEmptyContainer]
      1 moduloofone]
      3 negativeIndex]
     22 nullPointer]
      8 nullPointerArithmeticRedundantCheck]
     35 nullPointerRedundantCheck]
      6 objectIndex]
     11 shiftNegativeLHS]
      1 shiftTooManyBitsSigned]
      6 uninitStructMember]
     72 uninitvar]
daniel@laptop:~/cppcheck4$ egrep '^normal ' my_check_diff.log | sed 's/.*\[//' | sort | uniq -c
      2 arrayIndexOutOfBounds]
      1 constParameterCallback]
     12 constParameterPointer]
      1 constParameterReference]
      9 constVariablePointer]
      1 cstyleCast]
      2 ctunullpointer]
      1 duplicateCondition]
      1 identicalInnerCondition]
      1 invalidContainer]
    108 knownConditionTrueFalse]
      6 legacyUninitvar]
      5 nullPointer]
      1 nullPointerArithmeticRedundantCheck]
     16 nullPointerRedundantCheck]
      1 shiftTooManyBits]
      7 uninitvar]
      1 unreadVariable]

There seems to be at least extra const* , cstyleCast, duplicateCondition, identicalInnerCondition, legacyUninitVar, unreadVariable warnings in the "normal" checking.

I will look into these..

chrchr-github · 2024-01-08T13:49:47Z

I suspect there are some effects of if (diag(tok)) checks, since e.g. cStyleCast has nothing to do with ValueFlow.

danmar · 2024-01-26T17:34:26Z

I see a number of such diffs:

normal smalt-0.7.6/src/diffstr.c:770:7: style: Condition '!dfsp->dstrp' is always false [knownConditionTrueFalse]
exhaustive smalt-0.7.6/src/diffstr.c:770:7: style: Condition '!dfsp->dstrp' is always false [knownConditionTrueFalse]
     smalt-0.7.6/src/diffstr.c:768:7: note: Assuming condition '!dfsp->dstrp' is false
     smalt-0.7.6/src/diffstr.c:770:7: note: Condition '!dfsp->dstrp' is always false

meaning there is no FN/FP but less details..

danmar · 2024-01-26T17:42:47Z

For information here is an example code where we get a constPointerParameter true positive with limited valueflow analysis. There is no warning with exhaustive checking. The parameter dbin can be const.

void Open_DB(char *dbin, DataBase **_DB, int info)
{    int i,j,v,nu; DataBase *DB; char *dbname,*fx,ext[4];
     if(*dbin==0){*_DB=NULL;return;}
     dbname = (char *) malloc(1+strlen(dbin)+File_Ext_NCmax);
     DB = (DataBase *) malloc(sizeof(DataBase)); assert(DB != NULL); *_DB=DB;
     strcpy(dbname,dbin); strcat(dbname,".info"); fx=&dbname[strlen(dbin)+1];
     if(info) {printf("Reading %s: ",dbname); fflush(0);}
     DB->Finfo=fopen(dbname,"r"); assert(DB->Finfo!=NULL);
     fscanf(DB->Finfo, "%d  %d %d %d  %d  %lld %d %lld %lld  %d %d %d %d",
	&DB->d,&DB->nV,&DB->nVmax,&DB->NUCmax,&DB->list_num, &DB->nNF,&DB->nSM,
	&DB->nNM, &DB->NB, &DB->sl_nNF, &DB->sl_SM, &DB->sl_NM, &DB->sl_NB);
     if(info) printf("%lldp (%dsl) %lldnf %lldb\n",
         2*(DB->nNF)-DB->nSM-DB->nNM+2*DB->sl_nNF-DB->sl_SM-DB->sl_NM, 2*
         DB->sl_nNF-DB->sl_SM-DB->sl_NM, DB->nNF+DB->sl_nNF,DB->NB+DB->sl_NB);
     for (v=1;v<VERT_Nmax;v++){ DB->nNUC[v]=0;
	for(nu=0; nu<NUC_Nmax; nu++) DB->NFnum[v][nu]=0;}
     for(i=0; i<DB->nV; i++){
	fscanf(DB->Finfo,"%d",&v); fscanf(DB->Finfo,"%d",&(DB->nNUC[v]));
    	for(j=0;j<DB->nNUC[v];j++){
      	    fscanf(DB->Finfo,"%d", &nu);
      	    fscanf(DB->Finfo,"%d", &(DB->NFnum[v][nu])); } }
     if(ferror(DB->Finfo)) {printf("File error in %s\n",dbname); exit(0);}
     fclose(DB->Finfo); ext[0]='v'; ext[3]=0; 		DB->v=DB->p=DB->nu=0;
     for(v=DB->d+1; v<=DB->nVmax; v++) if(DB->nNUC[v])
     {	ext[1]='0' + v / 10; ext[2]='0' + v % 10; strcpy(fx,ext);
	DB->Fv[v]=fopen(dbname,"rb"); assert(DB->Fv[v]!=NULL);
	if(0==DB->v) {DB->v=v; while(0==DB->NFnum[v][DB->nu]) (DB->nu)++;}
     }
}

danmar · 2024-01-29T15:04:45Z

The results looks acceptable and give a significant speedup.. I apply this.

firewave · 2024-02-23T18:15:16Z

I published #6025 to add the missing message about the disabled part of the valueflow.

firewave · 2024-03-20T13:05:16Z

The tests regarding false negatives/positives are currently not representative because the analysis pass is broken as soon as an incomplete variable is being detected - see https://trac.cppcheck.net/ticket/12526.

They need to be conducted again after #6153 has been merged.

They might also change with each addition to the library configurations. So the generated statistics are just a current snapshot and the results could be quite different especially if we fix some of the top ranking valueFlowBailoutIncompleteVar. In that case the speed would be impacted even more as we have more ValueFlow data.

firewave · 2024-03-21T23:32:16Z

The current diff values from daca:

MessageID                           2.13.0    Head
knownConditionTrueFalse             5388    1995

Unless there were false positives fixed we have over 5000 false negatives introduced by this.

firewave reviewed Jan 5, 2024

View reviewed changes

pfultz2 reviewed Jan 5, 2024

View reviewed changes

danmar force-pushed the fix-12266 branch from e978949 to c517458 Compare January 25, 2024 14:30

danmar force-pushed the fix-12266 branch 4 times, most recently from 2823f17 to 348cdac Compare January 27, 2024 09:02

danmar mentioned this pull request Jan 27, 2024

windows.cfg: Added constants from <winreg.h>. #5917

Merged

danmar force-pushed the fix-12266 branch from 8a4cd68 to 17137d0 Compare January 28, 2024 19:45

Partial fix for #12266 (Performance: valueFlowCondition slowdown)

2253189

danmar force-pushed the fix-12266 branch from 17137d0 to 2253189 Compare January 28, 2024 21:27

danmar changed the title ~~Partial fix for #12266 (Performance: valueFlowCondition slowdown)~~ Fix #12266 (Performance: valueFlowCondition slowdown) Jan 29, 2024

danmar merged commit 485df98 into danmar:main Jan 29, 2024
64 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix #12266 (Performance: valueFlowCondition slowdown) #5837

Fix #12266 (Performance: valueFlowCondition slowdown) #5837

danmar commented Jan 5, 2024 •

edited

Loading

firewave Jan 5, 2024

firewave Jan 5, 2024

danmar Jan 5, 2024

firewave commented Jan 5, 2024

pfultz2 Jan 5, 2024

danmar Jan 25, 2024

pfultz2 commented Jan 5, 2024

danmar commented Jan 5, 2024

danmar commented Jan 6, 2024

pfultz2 commented Jan 6, 2024

danmar commented Jan 6, 2024

danmar commented Jan 6, 2024

pfultz2 commented Jan 7, 2024

danmar commented Jan 8, 2024

chrchr-github commented Jan 8, 2024

danmar commented Jan 26, 2024

danmar commented Jan 26, 2024

danmar commented Jan 29, 2024

firewave commented Feb 23, 2024

firewave commented Mar 20, 2024

firewave commented Mar 21, 2024

Fix #12266 (Performance: valueFlowCondition slowdown) #5837

Fix #12266 (Performance: valueFlowCondition slowdown) #5837

Conversation

danmar commented Jan 5, 2024 • edited Loading

firewave Jan 5, 2024

Choose a reason for hiding this comment

firewave Jan 5, 2024

Choose a reason for hiding this comment

danmar Jan 5, 2024

Choose a reason for hiding this comment

firewave commented Jan 5, 2024

pfultz2 Jan 5, 2024

Choose a reason for hiding this comment

danmar Jan 25, 2024

Choose a reason for hiding this comment

pfultz2 commented Jan 5, 2024

danmar commented Jan 5, 2024

danmar commented Jan 6, 2024

pfultz2 commented Jan 6, 2024

danmar commented Jan 6, 2024

danmar commented Jan 6, 2024

pfultz2 commented Jan 7, 2024

danmar commented Jan 8, 2024

chrchr-github commented Jan 8, 2024

danmar commented Jan 26, 2024

danmar commented Jan 26, 2024

danmar commented Jan 29, 2024

firewave commented Feb 23, 2024

firewave commented Mar 20, 2024

firewave commented Mar 21, 2024

danmar commented Jan 5, 2024 •

edited

Loading