Pruning of estimating the point value count in BooleanScorerSupplier #13988

kkewwei · 2024-11-12T01:00:13Z

Description

This PR aims to reduce the time spent computing clause costs in BooleanScorerSupplier by using the leadCost: if there is a lead query whose cost is small, computing the cost of the remaining clauses in the boolean query can be sped up.
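To make the idea concrete, here is a toy model of bounded cost estimation. It is only a sketch under stated assumptions: `BoundedCostSketch`, `Block`, `Estimate`, and `estimate` are hypothetical names invented for illustration, not Lucene APIs and not the code in this PR. It assumes that a clause's cost is estimated by summing per-block point counts, and that once the running total exceeds the lead cost the exact value no longer matters, so the walk can stop.

```java
import java.util.List;

/** Illustrative sketch only; every name here is hypothetical, not a Lucene API. */
final class BoundedCostSketch {

    /** Stand-in for one leaf block of indexed point values. */
    static final class Block {
        final long min;
        final long max;
        final int count;

        Block(long min, long max, int count) {
            this.min = min;
            this.max = max;
            this.count = count;
        }
    }

    /** Estimated point count plus the number of blocks inspected to get it. */
    static final class Estimate {
        final long count;
        final int blocksVisited;

        Estimate(long count, int blocksVisited) {
            this.count = count;
            this.blocksVisited = blocksVisited;
        }
    }

    /**
     * Sums the counts of blocks overlapping [lo, hi], stopping as soon as the
     * running total exceeds upperBound (playing the role of the lead cost).
     */
    static Estimate estimate(List<Block> blocks, long lo, long hi, long upperBound) {
        long total = 0;
        int visited = 0;
        for (Block b : blocks) {
            visited++;
            if (b.max < lo || b.min > hi) {
                continue; // block does not overlap the queried range
            }
            total += b.count;
            if (total > upperBound) {
                break; // already costlier than the lead clause; stop early
            }
        }
        return new Estimate(total, visited);
    }

    public static void main(String[] args) {
        List<Block> blocks = List.of(
            new Block(0, 9, 512),
            new Block(10, 19, 512),
            new Block(20, 29, 512),
            new Block(30, 39, 512));

        Estimate full = estimate(blocks, 0, 39, Long.MAX_VALUE);
        Estimate pruned = estimate(blocks, 0, 39, 100);

        System.out.println("unbounded: " + full.count + " points, " + full.blocksVisited + " blocks visited");
        System.out.println("bounded:   " + pruned.count + " points, " + pruned.blocksVisited + " blocks visited");
    }
}
```

In this model the bounded call returns a value that is only known to exceed the bound, which is enough to decide that the clause will not be the lead, while visiting fewer blocks than the unbounded call.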

Lucene benchmark: python3 src/python/localrun.py wikimedium10m
Hardware used: linux ecs.t2-c1m2dev.8xlarge | 32 cores | 64G

Report after iter 19:
                            Task    QPS baseline      StdDev    QPS my_modified_version      StdDev                Pct diff p-value
                        Wildcard      204.70      (4.1%)      195.95      (4.6%)   -4.3% ( -12% -    4%) 0.002
                           range     3028.29      (9.7%)     2917.73     (10.3%)   -3.7% ( -21% -   18%) 0.249
                      AndHighLow      433.07      (3.7%)      422.23      (4.6%)   -2.5% ( -10% -    6%) 0.058
                      TermDTSort       84.40      (7.9%)       82.49      (6.2%)   -2.3% ( -15% -   12%) 0.312
                         Prefix3       76.79      (3.7%)       75.54      (5.1%)   -1.6% ( -10% -    7%) 0.245
                      HighPhrase       46.03      (4.0%)       45.52      (5.8%)   -1.1% ( -10% -    9%) 0.487
                       MedPhrase       18.85      (4.6%)       18.66      (4.9%)   -1.0% ( -10% -    8%) 0.490
               HighTermTitleSort       98.46      (4.6%)       97.70      (3.2%)   -0.8% (  -8% -    7%) 0.537
           HighTermDayOfYearSort      239.08      (6.8%)      237.24      (6.0%)   -0.8% ( -12% -   12%) 0.703
                        PKLookup      131.53      (3.9%)      130.56      (4.6%)   -0.7% (  -8% -    8%) 0.581
                       LowPhrase       21.51      (5.4%)       21.36      (4.8%)   -0.7% ( -10% -   10%) 0.682
       BrowseDayOfYearSSDVFacets       14.12     (13.0%)       14.03     (12.4%)   -0.6% ( -22% -   28%) 0.882
            MedTermDayTaxoFacets       35.01      (3.4%)       34.81      (2.8%)   -0.6% (  -6% -    5%) 0.571
                 MedSloppyPhrase       21.86      (3.0%)       21.75      (3.6%)   -0.5% (  -6% -    6%) 0.609
                      AndHighMed      117.34      (4.0%)      116.78      (4.1%)   -0.5% (  -8% -    7%) 0.710
                HighSloppyPhrase       22.99      (3.3%)       22.90      (3.8%)   -0.4% (  -7% -    6%) 0.712
     BrowseRandomLabelSSDVFacets        8.84      (4.5%)        8.81      (4.0%)   -0.4% (  -8% -    8%) 0.790
            HighIntervalsOrdered        7.43      (4.4%)        7.40      (4.1%)   -0.3% (  -8% -    8%) 0.814
                     AndHighHigh       48.15      (4.6%)       48.02      (4.6%)   -0.3% (  -9% -    9%) 0.848
                     MedSpanNear       94.70      (2.9%)       94.49      (3.1%)   -0.2% (  -6% -    6%) 0.821
                       OrHighMed       71.20      (7.8%)       71.10      (6.3%)   -0.1% ( -13% -   15%) 0.949
           BrowseMonthSSDVFacets       14.53      (5.2%)       14.55      (4.8%)    0.1% (  -9% -   10%) 0.937
                    HighSpanNear        1.92      (1.8%)        1.93      (1.6%)    0.2% (  -3% -    3%) 0.752
         AndHighMedDayTaxoFacets       32.00      (2.3%)       32.06      (2.7%)    0.2% (  -4% -    5%) 0.816
                     LowSpanNear        6.24      (2.1%)        6.26      (2.2%)    0.2% (  -4% -    4%) 0.776
        AndHighHighDayTaxoFacets        7.97      (2.8%)        7.99      (4.1%)    0.2% (  -6% -    7%) 0.840
            BrowseDateSSDVFacets        2.46     (20.7%)        2.46     (22.5%)    0.2% ( -35% -   54%) 0.974
          OrHighMedDayTaxoFacets        9.09      (2.6%)        9.11      (4.0%)    0.3% (  -6% -    7%) 0.770
            HighTermTitleBDVSort       10.86      (6.7%)       10.90      (4.9%)    0.3% ( -10% -   12%) 0.857
                          Fuzzy1       35.48      (2.6%)       35.63      (3.3%)    0.4% (  -5% -    6%) 0.659
             LowIntervalsOrdered       63.75      (3.4%)       64.05      (3.4%)    0.5% (  -6% -    7%) 0.669
             MedIntervalsOrdered       24.79      (6.0%)       24.92      (5.8%)    0.5% ( -10% -   13%) 0.777
                 LowSloppyPhrase      133.33      (6.1%)      134.05      (4.0%)    0.5% (  -9% -   11%) 0.739
                         Respell       41.42      (3.5%)       41.70      (3.3%)    0.7% (  -5% -    7%) 0.540
                          IntNRQ       44.62     (28.9%)       44.97     (27.1%)    0.8% ( -42% -   79%) 0.929
                      OrHighHigh       30.04      (7.4%)       30.30      (7.8%)    0.9% ( -13% -   17%) 0.716
               HighTermMonthSort     1217.65      (7.2%)     1231.77      (7.5%)    1.2% ( -12% -   17%) 0.617
                       OrHighLow      438.87      (3.6%)      444.22      (3.7%)    1.2% (  -5% -    8%) 0.290
                         LowTerm      411.15      (6.4%)      416.33      (5.4%)    1.3% (  -9% -   13%) 0.502
                          Fuzzy2       14.47      (2.6%)       14.66      (2.9%)    1.3% (  -4% -    7%) 0.127
     BrowseRandomLabelTaxoFacets       11.43     (24.5%)       11.66     (28.1%)    2.1% ( -40% -   72%) 0.805
                         MedTerm      489.43      (4.8%)      502.71      (6.4%)    2.7% (  -8% -   14%) 0.130
                   OrNotHighHigh      207.00      (6.1%)      212.81      (6.5%)    2.8% (  -9% -   16%) 0.158
                        HighTerm      267.15      (5.8%)      275.35      (7.7%)    3.1% (  -9% -   17%) 0.153
                    OrHighNotMed      320.80      (6.4%)      332.60      (6.1%)    3.7% (  -8% -   17%) 0.063
            BrowseDateTaxoFacets       15.25     (38.9%)       15.81     (43.6%)    3.7% ( -56% -  140%) 0.777
       BrowseDayOfYearTaxoFacets       15.59     (40.2%)       16.18     (43.9%)    3.8% ( -57% -  146%) 0.776
                    OrNotHighMed      168.53      (4.7%)      174.93      (4.9%)    3.8% (  -5% -   14%) 0.013
                    OrHighNotLow      291.68      (6.6%)      303.42      (8.0%)    4.0% (  -9% -   19%) 0.083
                    OrNotHighLow      555.79      (5.8%)      579.93      (5.8%)    4.3% (  -6% -   16%) 0.018
                   OrHighNotHigh      209.89      (6.2%)      219.36      (7.5%)    4.5% (  -8% -   19%) 0.039
           BrowseMonthTaxoFacets       15.01     (38.1%)       16.61     (47.4%)   10.7% ( -54% -  155%) 0.433

Closes #13554

kkewwei · 2024-11-12T01:04:17Z

@jpountz please have a look when you are free. I will add tests and changelog if it makes sense.

github-actions · 2024-11-27T00:24:07Z

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!

Pruning of estimating the point value count since BooleanScorerSupplier

9398932

github-actions bot added the Stale label Nov 27, 2024
