
Re-implementation of some of the ATLAS Collider DY Datasets #1866

Open · wants to merge 22 commits into master

Conversation

@Radonirinaunimi (Member) commented Nov 26, 2023:

The following PR implements some of the ATLAS collider DY datasets in the new commondata format. The status is summarized in the table below.

☑️ in Comparison vs. Old means that the results are fully identical, while 🔴 means that comparisons are available but noticeable differences are observed.

| Dataset Name | Comparison vs. Old | General Comments | Status |
|---|---|---|---|
| ATLAS_DY_7TEV_EMUON_Y | ☑️ | The old implementation used a luminosity uncertainty of 3.5% while in HepData it is 3.4% | ☑️ |
| ATLAS_DY_7TEV_DILEPTON_Y-CRAP | ☑️ | The new implementation (from HepData) is missing one source of uncorrelated systematics | ☑️ |
| ATLAS_DY_7TEV_DILEPTON_Y-FRAP | ☑️ | The new implementation (from HepData) is missing one source of uncorrelated systematics | ☑️ |
| ATLAS_WPWM_8TEV_MUON_Y | | FK tables are missing but old commondata exists | ☑️ |
| ATLAS_Z0_8TEV_LOWMASS_2D | ☑️ | | ☑️ |
| ATLAS_Z0_8TEV_HIGHMASS_2D | 🔴 | Slight differences in the treatment of asymmetric correlated systematic uncertainties | ☑️ |
| ATLAS_Z0_8TEV_3D_CRAP | | Could not find dataset and FK tables to compare the implementation to | ☑️ |
| ATLAS_Z0_8TEV_3D_FRAP | | Could not find dataset and FK tables to compare the implementation to | ☑️ |
| ATLAS_DY_13TEV_FID | ☑️ | Just needs the plotting fixed to be as a function of the gauge bosons | ☑️ |
| ATLAS_Z0_8TEV_20FB_PT-INVDIST | | | ☑️ |
| ATLAS_Z0_8TEV_20FB_PT-RAPDIST | | | ☑️ |

Remaining TODO:

  • Define the plotting entries to be exactly the same as before

@Radonirinaunimi (Member, Author) commented:

@scarlehoff, is something maybe wrong with the loading when the treatment of the systematics is set to MULT?

Comparing the old:

```python
In [25]: ds = API.dataset(dataset_input={"dataset": "ATLASWZRAP11CC", "cfac": ["QCD"]}, theoryid=600, use_cuts="internal")

In [26]: ds.load_commondata().systematics_table
Out[26]:
               ADD      MULT         ADD   MULT         ADD   MULT         ADD   MULT        ADD   MULT  ...         ADD   MULT         ADD   MULT         ADD   MULT          ADD  MULT          ADD      MULT
entry                                                                                                    ...
1       760.914560  0.131840 -126.973000 -0.022    5.771500  0.001    5.771500  0.001   0.000000  0.000  ...   23.086000  0.004   23.086000  0.004  409.776500  0.071  10388.70000   1.8   715.666000  0.124000
2       845.576046  0.146580  -92.299200 -0.016   11.537400  0.002   11.537400  0.002   5.768700  0.001  ...   40.380900  0.007  -63.455700 -0.011  184.598400  0.032  10383.66000   1.8   663.400500  0.115000
3       719.275700  0.123640 -215.247500 -0.037   23.270000  0.004   17.452500  0.003  -5.817500 -0.001  ...   -5.817500 -0.001  319.962500  0.055  459.582500  0.079  10471.50000   1.8   605.020000  0.104000
4       673.101395  0.114850  -41.024900 -0.007   52.746300  0.009   23.442800  0.004   5.860700  0.001  ...   35.164200  0.006   46.885600  0.008  298.895700  0.051  10549.26000   1.8   750.169600  0.128000
5       847.481382  0.144540   23.453200  0.004   76.222900  0.013   29.316500  0.005   5.863300  0.001  ...  205.215500  0.035 -134.855900 -0.023 -222.805400 -0.038  10553.94000   1.8   738.775800  0.126000
6       766.929414  0.128020  113.823300  0.019  137.786100  0.023   53.916300  0.009   5.990700  0.001  ...  -41.934900 -0.007  281.562900  0.047 -299.535000 -0.050  10783.26000   1.8   623.032800  0.104000
7      1951.014450  0.326940 -143.220000 -0.024  310.310000  0.052  113.382500  0.019 -53.707500 -0.009  ...   83.545000  0.014   95.480000  0.016  -41.772500 -0.007  10741.50000   1.8   853.352500  0.143000
8       784.152243  0.129790  -36.250200 -0.006   84.583800  0.014   24.166800  0.004  -6.041700 -0.001  ...  132.917400  0.022   36.250200  0.006  -48.333600 -0.008  10875.06000   1.8   815.629500  0.135000
9      1071.656301  0.176570   -6.069300 -0.001   66.762300  0.011   18.207900  0.003   6.069300  0.001  ...  121.386000  0.020  182.079000  0.030   78.900900  0.013  10924.74000   1.8  1019.642400  0.168000
10      854.792700  0.144050  -23.736000 -0.004   35.604000  0.006    0.000000  0.000  -5.934000 -0.001  ...  124.614000  0.021  -29.670000 -0.005  183.954000  0.031  10681.20000   1.8   884.166000  0.149000
...
```

with the new implementation:

```python
In [13]: ds_new = API.dataset(dataset_input={"dataset": "ATLAS_DY_7TEV_DILEPTON_Y", "cfac": ["QCD"]}, theoryid=600, use_cuts="internal")

In [14]: ds_new.load_commondata().systematics_table
Out[14]:
               MULT          MULT          MULT          MULT          MULT          MULT          MULT      MULT  ...          MULT          MULT      MULT          MULT          MULT      MULT      MULT      MULT
entry                                                                                                              ...
1     -3.811834e-06  1.732652e-07  1.732652e-07  0.000000e+00 -1.732652e-07  8.663259e-07 -2.079182e-06  0.000025  ...  6.930607e-07 -3.465304e-07 -0.000003  6.930607e-07  6.930607e-07  0.000012  0.000023  0.000312
2     -2.773589e-06  3.466986e-07  3.466986e-07  1.733493e-07 -0.000000e+00  8.667464e-07 -2.080191e-06  0.000020  ...  1.386794e-06  0.000000e+00 -0.000005  1.213445e-06 -1.906842e-06  0.000006  0.000026  0.000312
3     -6.360120e-06  6.875806e-07  5.156854e-07 -1.718951e-07  3.437903e-07  2.234637e-06 -3.781693e-06  0.000023  ...  5.328749e-06  1.718951e-07  0.000005 -1.718951e-07  9.454233e-06  0.000014  0.000021  0.000309
4     -1.194397e-06  1.535653e-06  6.825123e-07  1.706281e-07  1.706281e-07  1.706281e-06 -2.559421e-06  0.000022  ...  4.095074e-06  3.412562e-07  0.000010  1.023768e-06  1.365025e-06  0.000009  0.000019  0.000307
5      6.822097e-07  2.217181e-06  8.527621e-07  1.705524e-07  1.705524e-07  2.046629e-06 -2.728839e-06  0.000026  ...  1.364419e-05  3.411048e-07 -0.000016  5.969335e-06 -3.922706e-06 -0.000006  0.000024  0.000307
6      3.171583e-06  3.839284e-06  1.502329e-06  1.669254e-07 -1.669254e-07  3.839284e-06 -3.505433e-06  0.000020  ...  4.506986e-06  8.346270e-07  0.000018 -1.168478e-06  7.845494e-06 -0.000008  0.000022  0.000300
7     -4.021785e-06  8.713867e-06  3.183913e-06 -1.508169e-06 -8.378718e-07  1.139506e-05 -9.719313e-06  0.000021  ...  5.194805e-06  1.005446e-06  0.000009  2.346041e-06  2.681190e-06 -0.000001  0.000055  0.000302
8     -9.930980e-07  2.317229e-06  6.620653e-07 -1.655163e-07 -3.310327e-07  2.979294e-06 -2.648261e-06  0.000021  ...  3.475843e-06  9.930980e-07  0.000005  3.641359e-06  9.930980e-07 -0.000001  0.000022  0.000298
9     -1.647636e-07  1.812400e-06  4.942909e-07  1.647636e-07  3.295273e-07  1.647636e-06 -1.482873e-06  0.000021  ...  6.425782e-06  1.153346e-06  0.000010  3.295273e-06  4.942909e-06  0.000002  0.000030  0.000297
10    -6.740816e-07  1.011122e-06  0.000000e+00 -1.685204e-07 -5.055612e-07  8.426020e-07 -8.426020e-07  0.000022  ...  1.196495e-05  1.516684e-06  0.000013  3.538928e-06 -8.426020e-07  0.000005  0.000024  0.000303

```

while you can see that the dumped values are exactly the same (modulo the first ADD/MULT column in the old):

```python
In [15]: ds_new.load_commondata().systematic_errors()
Out[15]:
       ATLASWZRAP11_1001  ATLASWZRAP11_1002  ATLASWZRAP11_1003  ATLASWZRAP11_1004  ATLASWZRAP11_1005  ...  ATLASWZRAP11_1128  ATLASWZRAP11_1129  ATLASWZRAP11_1130  UNCORR  ATLASLUMI11
entry                                                                                                 ...
1                 -0.022              0.001              0.001              0.000             -0.001  ...              0.004              0.004              0.071    0.13          1.8
2                 -0.016              0.002              0.002              0.001             -0.000  ...              0.007             -0.011              0.032    0.15          1.8
3                 -0.037              0.004              0.003             -0.001              0.002  ...             -0.001              0.055              0.079    0.12          1.8
4                 -0.007              0.009              0.004              0.001              0.001  ...              0.006              0.008              0.051    0.11          1.8
5                  0.004              0.013              0.005              0.001              0.001  ...              0.035             -0.023             -0.038    0.14          1.8
6                  0.019              0.023              0.009              0.001             -0.001  ...             -0.007              0.047             -0.050    0.13          1.8
7                 -0.024              0.052              0.019             -0.009             -0.005  ...              0.014              0.016             -0.007    0.33          1.8
8                 -0.006              0.014              0.004             -0.001             -0.002  ...              0.022              0.006             -0.008    0.13          1.8
9                 -0.001              0.011              0.003              0.001              0.002  ...              0.020              0.030              0.013    0.18          1.8
10                -0.004              0.006              0.000             -0.001             -0.003  ...              0.021             -0.005              0.031    0.14          1.8

```

@Radonirinaunimi linked an issue Nov 26, 2023 that may be closed by this pull request
@scarlehoff (Member) commented:

It does look wrong, especially since there's nothing that would justify a 10^-7, is there? The data is all > 1 so it cannot be an add vs mult problem, let me have a look.

@Radonirinaunimi (Member, Author) commented:

> It does look wrong, especially since there's nothing that would justify a 10^-7, is there? The data is all > 1 so it cannot be an add vs mult problem, let me have a look.

By just converting the percentage (MULT) into the absolute value (ADD), that is, representing the systematics as additive instead, the entries are exactly the same (omitting the first column of ds).

```python
In [3]: ds_new = API.dataset(dataset_input={"dataset": "ATLAS_DY_7TEV_DILEPTON_Y", "cfac": ["QCD"]}, theoryid=600, use_cuts="internal")

In [4]: ds_new.load_commondata().systematics_table
Out[4]:
              ADD         ADD         ADD        ADD        ADD         ADD         ADD  ...        ADD         ADD         ADD        ADD         ADD         ADD         ADD
entry                                                                                    ...
1     -126.973000    5.771500    5.771500   0.000000  -5.771500   28.857500  -69.258000  ... -11.543000  -86.572500   23.086000   23.08600  409.776500   750.29500  10388.7000
2      -92.299200   11.537400   11.537400   5.768700  -0.000000   28.843500  -69.224400  ...   0.000000 -155.754900   40.380900  -63.45570  184.598400   865.30500  10383.6600
3     -215.247500   23.270000   17.452500  -5.817500  11.635000   75.627500 -127.985000  ...   5.817500  168.707500   -5.817500  319.96250  459.582500   698.10000  10471.5000
4      -41.024900   52.746300   23.442800   5.860700   5.860700   58.607000  -87.910500  ...  11.721400  357.502700   35.164200   46.88560  298.895700   644.67700  10549.2600
5       23.453200   76.222900   29.316500   5.863300   5.863300   70.359600  -93.812800  ...  11.726600 -551.150200  205.215500 -134.85590 -222.805400   820.86200  10553.9400
6      113.823300  137.786100   53.916300   5.990700  -5.990700  137.786100 -125.804700  ...  29.953500  641.004900  -41.934900  281.56290 -299.535000   778.79100  10783.2600
7     -143.220000  310.310000  113.382500 -53.707500 -29.837500  405.790000 -346.115000  ...  35.805000  316.277500   83.545000   95.48000  -41.772500  1969.27500  10741.5000
8      -36.250200   84.583800   24.166800  -6.041700 -12.083400  108.750600  -96.667200  ...  36.250200  181.251000  132.917400   36.25020  -48.333600   785.42100  10875.0600
9       -6.069300   66.762300   18.207900   6.069300  12.138600   60.693000  -54.623700  ...  42.485100  382.365900  121.386000  182.07900   78.900900  1092.47400  10924.7400
10     -23.736000   35.604000    0.000000  -5.934000 -17.802000   29.670000  -29.670000  ...  53.406000  445.050000  124.614000  -29.67000  183.954000   830.76000  10681.2000

```

So I think it is really a difference between how the systematics are represented (unless I am doing something stupid here).
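For concreteness, a minimal sketch of the conversion being described, with made-up numbers of the same order as the tables above; it assumes the old convention that the MULT columns are quoted in percent:

```python
# Sketch of the MULT (percent) -> ADD (absolute) conversion discussed above.
# The numbers are illustrative, not taken from the actual commondata files.
central_value = 577150.0  # hypothetical central value of one data point
mult_percent = -0.022     # hypothetical MULT systematic, quoted in percent

add_absolute = mult_percent / 100.0 * central_value
print(add_absolute)  # -126.973, the corresponding absolute (ADD) systematic
```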

@scarlehoff (Member) commented:

The values returned by the systematic_errors method are all absolute.

I think the difference might be that you are implementing the multiplicative uncertainties as % or relative, while in the new commondata format they should always be implemented as absolute.

#1679 (comment)

I thought that we had added this to the documentation but it seems we didn't. Let me update it!
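As a rough sketch of this convention, an uncertainty definition in the new format might look like the following (the layout of definitions plus per-bin values follows the new commondata files; the names and numbers here are illustrative):

```yaml
# Illustrative sketch, not a real uncertainty file: the treatment flag
# records how the uncertainty is handled downstream, but the stored
# values are always absolute, never percentages.
definitions:
  sys_corr_1:
    description: hypothetical correlated systematic
    treatment: MULT   # handled multiplicatively...
    type: CORR
bins:
- sys_corr_1: 126.973  # ...but written as an absolute value
```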

@Radonirinaunimi (Member, Author) commented Nov 27, 2023:

> The values returned by the systematic_errors method are all absolute.

This definitely explains why using absolute $\oplus$ ADD works.

> I think the difference might be that you are implementing the multiplicative uncertainties as % or relative, while in the new commondata format they should always be implemented as absolute.
>
> #1679 (comment)
>
> I thought that we had added this to the documentation but it seems we didn't. Let me update it!

Is this actually correct (I don't think so!)? If the values are quoted as absolute then their treatment has to be ADD, and conversely, if the values are quoted as percentages then their treatment has to be MULT. I don't think one can have absolute values treated as MULT, or percentages treated as ADD.

@scarlehoff (Member) commented:

Regardless of how they are given in HepData (they could tell you it's a relative value but give you a table with the absolute values), you can convert them to absolute.

I honestly don't remember why we went for everything absolute, I guess it is more consistent this way.

@Radonirinaunimi (Member, Author) commented:

> Regardless of how they are given in HepData (they could tell you it's a relative value but give you a table with the absolute values), you can convert them to absolute.
>
> I honestly don't remember why we went for everything absolute, I guess it is more consistent this way.

Right. I just want to emphasize that if everything now is given as absolute, then only the treatment ADD is allowed (not MULT).

Re everything absolute, we might want to keep in mind the following sentence from the docs:

> While it may seem at first that the multiplicative error is spurious given the presence of the additive error and data central value, this may not be the case. For example, in a closure test scenario, the data central values may have been replaced in the CommonData file by theoretical predictions. Therefore if you wish to use a covariance matrix generated with the original multiplicative uncertainties via the method, you must also store the original multiplicative (percentage) error. For flexibility and ease of I/O this is therefore done in the CommonData file itself.

@scarlehoff (Member) commented Nov 27, 2023:

> I just want to emphasize that if everything now is given as absolute, then only the treatment ADD is allowed (not MULT)

Why? The first thing the parser does is to make it relative to the central values. The way it is written in the actual file doesn't really matter that much.

(that said... it makes it unreliable in closure tests? we need @enocera here!)

@Radonirinaunimi (Member, Author) commented:

> Why? The first thing the parser does is to make it relative to the central values. The way it is written in the actual file doesn't really matter that much.

But such an extra operation is not needed at all if everything is defined as absolute $\oplus$ ADD. At the end of the day (modulo the closure-test business), the treatments ADD and MULT (and the representations thereof) carry exactly the same information.

@scarlehoff (Member) commented:

Not for the t0 covmat.
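Schematically (written from memory of the standard t0 prescription, not from the code): the multiplicative uncertainties are rescaled by the t0 predictions $t^0_i$ rather than by the data, which is why the parser must know which systematics are MULT even when all stored values are absolute,

$$C^{t_0}_{ij} = \delta_{ij}\,\big(\sigma^{(\mathrm{uncorr})}_i\big)^2 + \sum_{\alpha} \sigma^{(\mathrm{add})}_{i\alpha}\,\sigma^{(\mathrm{add})}_{j\alpha} + \Big(\sum_{\beta} \hat{\sigma}^{(\mathrm{mult})}_{i\beta}\,\hat{\sigma}^{(\mathrm{mult})}_{j\beta}\Big)\, t^0_i\, t^0_j$$

where $\hat{\sigma}^{(\mathrm{mult})}$ are the fractional multiplicative uncertainties.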

@Radonirinaunimi marked this pull request as ready for review December 4, 2023 08:02
@Radonirinaunimi (Member, Author) commented:

@scarlehoff, @enocera, this is also ready for review. Here is the report: https://vp.nnpdf.science/kuWT56KBSlai3_-XqLZxeA==/

For some datasets, I couldn't find the commondata and/or FK tables to compare to.

@scarlehoff (Member) commented:

The ones you didn't find the commondata for are the ones that have no corresponding old dataset, right?

@enocera (Contributor) commented Dec 4, 2023:

I understand that these are the 3D ATLAS distributions, of which we implemented only the 2D version.

@enocera (Contributor) commented Dec 4, 2023:

So let's forget about the 3D distributions, for the moment.

@scarlehoff (Member) commented:

Ok! Thanks. First comments, then I'll start going through all the old-new datasets one by one:

What about these ones? Did you forget about them or are they part of another set (or maybe they have a different name in your list?)

  • ATLASZHIGHMASS49FB
  • ATLASLOMASSDY11EXT
  • ATLASWZRAP11CF (I see you do have the CC version so this might actually be forgotten!)
  • ATLAS_WZ_TOT_13TEV (maybe this one is the one you call ATLASWZTOT13TEV81PB??)

And these four I think I already asked you about, so I know you were not taking care of them, but just for completeness:

  • ATLAS_WP_JET_8TEV_PT
  • ATLAS_WM_JET_8TEV_PT
  • ATLASZPT8TEVMDIST
  • ATLASZPT8TEVYDIST

@Radonirinaunimi (Member, Author) commented Dec 4, 2023:

So the status is then the following:

  • ATLASZHIGHMASS49FB, ATLASLOMASSDY11EXT: these datasets I haven't touched on purpose because as far as I understood @cschwan was/has been looking into them (?).
  • ATLASWZRAP11CF, ATLASZPT8TEVMDIST, ATLASZPT8TEVYDIST: I genuinely missed these datasets. I will implement them in this PR.
  • ATLAS_WZ_TOT_13TEV: this is indeed an updated version of ATLASWZTOT13TEV81PB (I implemented the outdated one), in that the correct one should include the experimental correlation coefficients. I will fix the currently implemented one. This is now done.
  • As for the _JET_: If no one is looking into them yet, I can also implement them in this PR.

All in all, still a few to be done before this PR is complete 😅

Resolved review threads (outdated) on:

  • buildmaster/ATLAS_DY_7TEV_DILEPTON/metadata.yaml
  • buildmaster/ATLAS_WPWM_8TEV_MUON/metadata.yaml (two threads)
  • buildmaster/ATLAS_Z0_8TEV_HIGHMASS/metadata.yaml
  • buildmaster/ATLAS_Z0_8TEV_3D/metadata.yaml (two threads)
  • buildmaster/ATLAS_DY_13TEV/metadata.yaml
Review thread on the following metadata snippet:

```yaml
kinematic_coverage: [_zero, mu2, sqrt_s]
kinematics:
  variables:
    _zero: {description: "", label: "", units: ""}
```
@scarlehoff (Member) commented:

Suggested change:

```diff
-_zero: {description: "", label: "", units: ""}
+_Zero: {description: "", label: "", units: ""}
```

(maybe I should automatically fill a column with zero when one is missing... the constraint of having k1, k2, k3 is silly in both directions...)

@Radonirinaunimi (Member, Author) commented:

This would be best indeed! In this case we don't overcrowd the implementation with spurious variables.

@Radonirinaunimi (Member, Author) commented:

Since you are at it, how can one plot the results as a function of the gauge bosons? Right now, it seems that plot_x can only accept one of the kinematic variables. Here is an example using the corrected ATLAS_WZ_TOT_13TEV: https://vp.nnpdf.science/B4_E9eBKRjadW8gBoZZZRw==/

@scarlehoff (Member) commented Dec 4, 2023:

I'm not doing this yet! (but I will)

For the plot_x, just do whatever was done in the previous plotting file. If things are not working it just means I had not encountered that situation before and I need to fix it.

(which might mean you have to use some specific kinematic override or transformation)
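For reference, a minimal sketch of a plotting entry with an explicit plot_x (the variable names and the figure_by grouping are illustrative, not taken from any actual dataset):

```yaml
plotting:
  dataset_label: "hypothetical dataset"
  kinematics_override: ewk_rap_sqrt_scale
  plot_x: y            # plot as a function of the rapidity-like variable
  figure_by: [sqrts]   # one figure per value of sqrt(s)
```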

@Radonirinaunimi (Member, Author) commented:

In some of the previous files, plot_x is not defined, as is the case for the particular example above: https://github.com/NNPDF/nnpdf/blob/master/nnpdfcpp/data/commondata/PLOTTING_ATLAS_WZ_TOT_13TEV.yaml (I am not sure how exactly it works in such a case).

@scarlehoff (Member) commented:

Yes, for now it will complain!

@Radonirinaunimi (Member, Author) commented:

Ok, so for the benefit of producing report comparisons, I set its value to some reasonable kinematic variable while waiting for the parser to accommodate this. I added a TODO in the description so we do not forget about this.

@scarlehoff (Member) commented Dec 6, 2023:

Ok, but don't worry about the comparisons for now, I'll try to fix it by today! (I hope it's nothing super complicated)

@scarlehoff (Member) commented:

(let me know when I should look at this again btw)

@Radonirinaunimi (Member, Author) commented:

You can look into this now; this is the only ATLAS dataset that not only has one kinematic variable that is zero but also should not have plot_x (in the same way as before), as it should be plotted as a function of the gauge bosons.

In 1976617, I have removed both the _zero from the kinematics and the plot_x from the plotting.

PS: As for the remaining 4 datasets (including the JETs), I will finish them by early next week.

@scarlehoff (Member) left a comment:

I've updated the parser so that it automatically repeats a column if one is missing.
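This is not the actual parser change, just a sketch of the idea, assuming the kinematics are collected in a pandas DataFrame with three expected columns:

```python
import pandas as pd

def complete_kinematics(kin: pd.DataFrame, nkin: int = 3) -> pd.DataFrame:
    """Sketch: if fewer than `nkin` kinematic columns are present,
    repeat the last available column until there are exactly `nkin`."""
    kin = kin.copy()
    while len(kin.columns) < nkin:
        last = kin.columns[-1]
        kin[f"{last}_rep{len(kin.columns)}"] = kin[last]
    return kin

# usage with a made-up two-column kinematics table
kin = pd.DataFrame({"m_ll2": [8317.0, 8317.0], "sqrts": [13000.0, 13000.0]})
print(complete_kinematics(kin))  # gains a third column repeating sqrts
```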

Here's the report for the one with the weird plot_x option: https://vp.nnpdf.science/fkjMDKmrSC6XpBXv7-mhHA==

I've repeated the tests and now I have:

```
old: ATLASWZRAP36PB vs new: ATLAS_DY_7TEV_EMUON_Y
# Differences in the computation of chi2 32.119931215298024 vs 32.21855155561812
    The covmats are different
    even the diagonal

old: ATLAS_DY_2D_8TEV_LOWMASS vs new: ATLAS_Z0_8TEV_LOWMASS_2D
 > Everything ok

 > old: ATLAS_WZ_TOT_13TEV vs new: ATLAS_DY_13TEV_FID
The t0 chi2 is different: 10934.218358793676 vs 81138.4387008005

> old: ATLASDY2D8TEV vs new: ATLAS_Z0_8TEV_HIGHMASS_2D
% difference in the data 
 Differences in the computation of chi2  80.2445870631963 vs 76.30021056788038
    The covmats are different
    even the diagonal

```

In the last one I've noticed that the data itself is different at the level of a few per mille, which could be driving the difference (since a difference in the data will also modify the covmat through the multiplicative uncertainties).

For the one where only the t0 chi2 is very different (but nothing else), I guess the MULT and ADD uncertainties are wrong? Or I've done something else wrong...

Resolved review threads (outdated) on buildmaster/ATLAS_DY_13TEV/metadata.yaml (two threads).
@Radonirinaunimi (Member, Author) commented:

> I've updated the parser so that it automatically repeats a column if one is missing. […]

As usual, thanks a lot for the detailed checks! For the one with a different t0, I am a bit surprised that this is the case; I thought that I had checked that the treatment of the systematics was the same as before. I will check again.

As for the rest, the differences are exactly understood. Before implementing the legacy versions, maybe I am just missing something from the new HepData (?), for which we'd need @enocera.

PS: I will also check the boson plotting now.

@scarlehoff (Member) commented:

Thanks @Radonirinaunimi, your last commit fixes the t0 issue.

@Radonirinaunimi (Member, Author) commented:

This is also now ready for review.

All of the datasets (except one) have been implemented in the same way as in the old commondata (for legacy purposes), and comments are left in the table above describing what I found to be different wrt HepData. Nevertheless, the numerical values of the correlated systematics (and even the central values) are sometimes not exactly equal, because the values quoted in the HepData tables can be slightly different from the rawdata used in the old commondata.

PS: only the ATLAS_Z0_8TEV_20FB_PT-* datasets raise some weird indexing errors when computing data vs theory comparisons, although the data can be loaded properly and the entries of the tables are exactly the same.

@scarlehoff (Member) commented:

When the results are different you can implement the HepData one and then a legacy variant with the different version (that is compatible with the old one). This is preferred.

Btw, did you check that when loading the entire set of datasets the associated covariance matrix is the same as the old (same for the datasets in the other PRs)?
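A sketch of how such a legacy variant could be declared in the metadata (the variant mechanism is inferred from this discussion; the file names are illustrative):

```yaml
# Illustrative sketch of one observable entry with a legacy variant.
data_uncertainties:
- uncertainties.yaml              # default: values taken from HepData
variants:
  legacy:
    data_uncertainties:
    - uncertainties_legacy.yaml   # reproduces the old implementation
```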

@Radonirinaunimi (Member, Author) commented Jan 9, 2024:

> When the results are different you can implement the HepData one and then a legacy variant with the different version (that is compatible with the old one). This is preferred.

The issue I am struggling with at the moment is that I am not sure whether it makes sense to have legacy versions for some particular datasets or not. And this is really one of the things we should discuss (cc @enocera). Let me provide two explicit examples:

  • Take CMS_WP_7TEV_MUON_ASY for example: when one downloads the full thing from HepData there are two different types of files, the usual HepData table (as shown on the HepData interface) and the rawdata (usually in txt or dat format, not following any convention/structure). In most of the old implementations, the rawdata were used. However, the numerical values in the two are not always the same and thus the covariance matrices slightly differ. If we resort to always using the rawdata, then some of the entries in the metadata (such as tables) will be deprecated.
  • Then, there are the cases in which maybe some conscious decisions were made (?), such as the example of ATLAS_DY_7TEV_EMUON_Y. In the paper, it is mentioned that the luminosity uncertainty is about $3.5$% (and this was the value used in the old implementation) but in the HepData entries the value is $3.4$%.

> Btw, did you check that when loading the entire set of datasets the associated covariance matrix is the same as the old (same for the datasets in the other PRs)?

Yes, for the datasets listed here that have a checkmark in the Comparison vs. Old column. For some of the CMS datasets in #1869, it is a bit tricky because of the numerical differences mentioned in the first point, as I tried to use the HepData files as much as possible instead.

Base automatically changed from new_commondata_collected to master February 16, 2024 09:57
@scarlehoff (Member) commented:

I'm going to rebase these datasets on top of the ones currently in master.

@Radonirinaunimi I'll leave this as a PR and not merge immediately, in case you want to roll back the changes that you did for legacy purposes. Now we have the legacy version for reproduction as the copy from the old one, but I think it is better in general to have the proper HepData one as well.

@scarlehoff (Member) left a comment:

I think I need some feedback here: some of the keys in the plotting dictionary refer to the wrong datasets. I guess you copied part of it since it is shared, but could you have a second look to make sure the rest is ok?

(if it is only the labels I can change those)

Review thread on the following metadata snippet:

```yaml
npoints: [48]
plotting:
  kinematics_override: ewk_rap_sqrt_scale
  dataset_label: "ATLAS DY 2D 8 TeV low mass"
```
@scarlehoff (Member) commented:

This is the high mass dataset, is the rest of the plotting data equal between the two?

Review thread on the following metadata snippet:

```yaml
npoints: [8, 11, 11]
plotting:
  kinematics_override: ewk_rap_sqrt_scale
  dataset_label: "LHCb $W,Z \\to \\mu$ 8 TeV"
```
@scarlehoff (Member) commented:

This one also contains data from another ds.

Review thread on the following metadata snippet:

```yaml
npoints: [9, 6]
plotting:
  kinematics_override: ewk_rap_sqrt_scale
  dataset_label: "LHCb $W,Z \\to \\mu$ 7 TeV"
```
@scarlehoff (Member) commented:

and this one

@Radonirinaunimi (Member, Author) commented:

> I'm going to rebase these datasets on top of the ones currently in master.
>
> @Radonirinaunimi I'll leave this as a PR and not merge immediately, in case you want to roll back the changes that you did for legacy purposes. Now we have the legacy version for reproduction as the copy from the old one, but I think it is better in general to have the proper HepData one as well.

That sounds good! I will revert to the state before it produced the legacy versions. I guess in doing so, I will need to call the uncertainty files something else?

Thanks for the comments on the plotting metadata; I will have a second look at them and make sure they are fully correct.

@scarlehoff (Member) left a comment:

Since there are quite a few changes to be made to the metadata of these files, I will move them to the right folder, with the right names, etc., and I'll leave you to review the labels / plotting options / etc. @Radonirinaunimi

data/kinematics/uncertainties should be ok. I've used your data when it was compatible with the legacy data up to 10^-3, and the legacy data when the difference was between 10^-2 and 10^-3 (every difference was sub-% anyway).
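A sketch of the kind of comparison described, with made-up arrays:

```python
import numpy as np

new_values = np.array([1.00000, 2.00050, 3.00100])     # hypothetical new (HepData) values
legacy_values = np.array([1.00010, 2.00030, 3.00080])  # hypothetical legacy values

# keep the new data when it agrees with the legacy one to better than 1e-3 (relative)
use_new = np.allclose(new_values, legacy_values, rtol=1e-3)
print(use_new)  # True for this made-up example
```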

Review thread on the following metadata snippet:

```yaml
ndata: 64
npoints: [8, 8, 8, 20, 20]
plotting:
  kinematics_override: ewk_ptrap_sqrt_scale
```
@scarlehoff (Member) commented:

This has also changed with respect to what's in the old commondata files (which use jet_sqrt_scale).

Successfully merging this pull request may close these issues:

  • Complete porting old Collider DY into the new format