Pruning of estimating the point value count in BooleanScorerSupplier #13988

Open
wants to merge 1 commit into
base: main

Contributor

@kkewwei kkewwei commented Nov 12, 2024

Description

This PR aims to speed up cost computation in BooleanScorerSupplier by using the leadCost: if the boolean query has a lead clause whose cost is small, that cost can be used to speed up computing the cost of the remaining clauses.
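
To illustrate the idea, here is a minimal, self-contained sketch. It does not use Lucene's API: `Clause`, `BlockCountingClause`, `fixed`, and `conjunctionCost` are hypothetical names invented for this example. It assumes that the cost of a conjunction is the minimum of its clause costs and that an expensive clause accumulates its estimate incrementally, so it can stop as soon as the running total exceeds the smallest cost seen so far.

```java
import java.util.List;

public class LeadCostSketch {

  /** Hypothetical clause: estimates its cost and may stop early once upperBound is exceeded. */
  interface Clause {
    long estimateCost(long upperBound);
  }

  /** A cheap clause whose cost is already known, so the bound is ignored. */
  static Clause fixed(long cost) {
    return upperBound -> cost;
  }

  /** An expensive clause whose estimate is the sum of per-block counts (assumed model). */
  static final class BlockCountingClause implements Clause {
    private final long[] blockCounts;
    int blocksVisited = 0; // exposed only to show how much work was done

    BlockCountingClause(long... blockCounts) {
      this.blockCounts = blockCounts;
    }

    @Override
    public long estimateCost(long upperBound) {
      long total = 0;
      for (long count : blockCounts) {
        blocksVisited++;
        total += count;
        if (total > upperBound) {
          // Already more expensive than the lead clause, so the exact value
          // cannot change the minimum; stop counting.
          return total;
        }
      }
      return total;
    }
  }

  /** Conjunction cost as the minimum clause cost, passing the running minimum as the bound. */
  static long conjunctionCost(List<Clause> clauses) {
    long leadCost = Long.MAX_VALUE;
    for (Clause clause : clauses) {
      leadCost = Math.min(leadCost, clause.estimateCost(leadCost));
    }
    return leadCost;
  }

  public static void main(String[] args) {
    BlockCountingClause range = new BlockCountingClause(400, 300, 500, 200);
    long cost = conjunctionCost(List.of(fixed(10), range));
    // The range clause stops after its first block because 400 > 10.
    System.out.println(cost + " " + range.blocksVisited); // prints "10 1"
  }
}
```

In this sketch the benefit depends on clause order: the pruning only helps when a cheap clause is evaluated before the expensive one, which is why a small lead cost matters.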

Lucene benchmark: python3 src/python/localrun.py wikimedium10m
Hardware used: linux ecs.t2-c1m2dev.8xlarge | 32 cores | 64G

Report after iter 19:
                            Task QPS baseline      StdDev QPS my_modified_version      StdDev                Pct diff p-value
                        Wildcard      204.70      (4.1%)      195.95      (4.6%)   -4.3% ( -12% -    4%) 0.002
                           range     3028.29      (9.7%)     2917.73     (10.3%)   -3.7% ( -21% -   18%) 0.249
                      AndHighLow      433.07      (3.7%)      422.23      (4.6%)   -2.5% ( -10% -    6%) 0.058
                      TermDTSort       84.40      (7.9%)       82.49      (6.2%)   -2.3% ( -15% -   12%) 0.312
                         Prefix3       76.79      (3.7%)       75.54      (5.1%)   -1.6% ( -10% -    7%) 0.245
                      HighPhrase       46.03      (4.0%)       45.52      (5.8%)   -1.1% ( -10% -    9%) 0.487
                       MedPhrase       18.85      (4.6%)       18.66      (4.9%)   -1.0% ( -10% -    8%) 0.490
               HighTermTitleSort       98.46      (4.6%)       97.70      (3.2%)   -0.8% (  -8% -    7%) 0.537
           HighTermDayOfYearSort      239.08      (6.8%)      237.24      (6.0%)   -0.8% ( -12% -   12%) 0.703
                        PKLookup      131.53      (3.9%)      130.56      (4.6%)   -0.7% (  -8% -    8%) 0.581
                       LowPhrase       21.51      (5.4%)       21.36      (4.8%)   -0.7% ( -10% -   10%) 0.682
       BrowseDayOfYearSSDVFacets       14.12     (13.0%)       14.03     (12.4%)   -0.6% ( -22% -   28%) 0.882
            MedTermDayTaxoFacets       35.01      (3.4%)       34.81      (2.8%)   -0.6% (  -6% -    5%) 0.571
                 MedSloppyPhrase       21.86      (3.0%)       21.75      (3.6%)   -0.5% (  -6% -    6%) 0.609
                      AndHighMed      117.34      (4.0%)      116.78      (4.1%)   -0.5% (  -8% -    7%) 0.710
                HighSloppyPhrase       22.99      (3.3%)       22.90      (3.8%)   -0.4% (  -7% -    6%) 0.712
     BrowseRandomLabelSSDVFacets        8.84      (4.5%)        8.81      (4.0%)   -0.4% (  -8% -    8%) 0.790
            HighIntervalsOrdered        7.43      (4.4%)        7.40      (4.1%)   -0.3% (  -8% -    8%) 0.814
                     AndHighHigh       48.15      (4.6%)       48.02      (4.6%)   -0.3% (  -9% -    9%) 0.848
                     MedSpanNear       94.70      (2.9%)       94.49      (3.1%)   -0.2% (  -6% -    6%) 0.821
                       OrHighMed       71.20      (7.8%)       71.10      (6.3%)   -0.1% ( -13% -   15%) 0.949
           BrowseMonthSSDVFacets       14.53      (5.2%)       14.55      (4.8%)    0.1% (  -9% -   10%) 0.937
                    HighSpanNear        1.92      (1.8%)        1.93      (1.6%)    0.2% (  -3% -    3%) 0.752
         AndHighMedDayTaxoFacets       32.00      (2.3%)       32.06      (2.7%)    0.2% (  -4% -    5%) 0.816
                     LowSpanNear        6.24      (2.1%)        6.26      (2.2%)    0.2% (  -4% -    4%) 0.776
        AndHighHighDayTaxoFacets        7.97      (2.8%)        7.99      (4.1%)    0.2% (  -6% -    7%) 0.840
            BrowseDateSSDVFacets        2.46     (20.7%)        2.46     (22.5%)    0.2% ( -35% -   54%) 0.974
          OrHighMedDayTaxoFacets        9.09      (2.6%)        9.11      (4.0%)    0.3% (  -6% -    7%) 0.770
            HighTermTitleBDVSort       10.86      (6.7%)       10.90      (4.9%)    0.3% ( -10% -   12%) 0.857
                          Fuzzy1       35.48      (2.6%)       35.63      (3.3%)    0.4% (  -5% -    6%) 0.659
             LowIntervalsOrdered       63.75      (3.4%)       64.05      (3.4%)    0.5% (  -6% -    7%) 0.669
             MedIntervalsOrdered       24.79      (6.0%)       24.92      (5.8%)    0.5% ( -10% -   13%) 0.777
                 LowSloppyPhrase      133.33      (6.1%)      134.05      (4.0%)    0.5% (  -9% -   11%) 0.739
                         Respell       41.42      (3.5%)       41.70      (3.3%)    0.7% (  -5% -    7%) 0.540
                          IntNRQ       44.62     (28.9%)       44.97     (27.1%)    0.8% ( -42% -   79%) 0.929
                      OrHighHigh       30.04      (7.4%)       30.30      (7.8%)    0.9% ( -13% -   17%) 0.716
               HighTermMonthSort     1217.65      (7.2%)     1231.77      (7.5%)    1.2% ( -12% -   17%) 0.617
                       OrHighLow      438.87      (3.6%)      444.22      (3.7%)    1.2% (  -5% -    8%) 0.290
                         LowTerm      411.15      (6.4%)      416.33      (5.4%)    1.3% (  -9% -   13%) 0.502
                          Fuzzy2       14.47      (2.6%)       14.66      (2.9%)    1.3% (  -4% -    7%) 0.127
     BrowseRandomLabelTaxoFacets       11.43     (24.5%)       11.66     (28.1%)    2.1% ( -40% -   72%) 0.805
                         MedTerm      489.43      (4.8%)      502.71      (6.4%)    2.7% (  -8% -   14%) 0.130
                   OrNotHighHigh      207.00      (6.1%)      212.81      (6.5%)    2.8% (  -9% -   16%) 0.158
                        HighTerm      267.15      (5.8%)      275.35      (7.7%)    3.1% (  -9% -   17%) 0.153
                    OrHighNotMed      320.80      (6.4%)      332.60      (6.1%)    3.7% (  -8% -   17%) 0.063
            BrowseDateTaxoFacets       15.25     (38.9%)       15.81     (43.6%)    3.7% ( -56% -  140%) 0.777
       BrowseDayOfYearTaxoFacets       15.59     (40.2%)       16.18     (43.9%)    3.8% ( -57% -  146%) 0.776
                    OrNotHighMed      168.53      (4.7%)      174.93      (4.9%)    3.8% (  -5% -   14%) 0.013
                    OrHighNotLow      291.68      (6.6%)      303.42      (8.0%)    4.0% (  -9% -   19%) 0.083
                    OrNotHighLow      555.79      (5.8%)      579.93      (5.8%)    4.3% (  -6% -   16%) 0.018
                   OrHighNotHigh      209.89      (6.2%)      219.36      (7.5%)    4.5% (  -8% -   19%) 0.039
           BrowseMonthTaxoFacets       15.01     (38.1%)       16.61     (47.4%)   10.7% ( -54% -  155%) 0.433

Closes #13554

@kkewwei
Contributor Author

kkewwei commented Nov 12, 2024

@jpountz please have a look when you are free. I will add tests and changelog if it makes sense.

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions github-actions bot added the Stale label Nov 27, 2024
Successfully merging this pull request may close these issues.

Pruning of estimating the point value count since BooleanScorerSupplier