Treating arbitrarily correlated parameters with pyhf #2463

lorenzennio · 2024-03-27T15:28:21Z

Summary

pyhf currently has no option to introduce arbitrary correlations between modifier parameters. I have seen multiple workarounds implemented in Belle II analyses (and possibly elsewhere).

The Belle II community asked me to create a small extension package for pyhf that lets you include arbitrary correlations between parameters:

https://github.com/lorenzennio/pyhfcorr

I have also added an example notebook here:

https://github.com/lorenzennio/pyhfcorr/blob/main/examples.ipynb

Would there be interest in merging this into either pyhf or cabinetry (@alexander-held), or should it be kept as a standalone package?


kratsg · 2024-03-27T19:36:30Z

I think it would be useful to add this under contrib. We'll need to figure out a spec for this that is as close to HS3 as possible. As an aside, a more technical question. The example notebook contains:

hist1 = np.array([2. , 3., 6.,  8.,  7., 7., 6., 2., 3., 1.])
hist2 = np.array([3. , 6., 9., 12., 15., 9., 6., 3., 3., 2.])

How do I know what the resulting fully correlated histogram should look like under "corr": [[1., 1.], [1., 1.]]? Looking at the code you provided, this is not clear to me. It seems you use PCA to determine the shift needed for each entry in the correlation matrix, but I don't see how you actually shift or compute the new histograms/modifiers: https://github.com/lorenzennio/pyhfcorr/blob/32a006651e9410e6107ae5077021feddb17b7b50/src/pyhfcorr/decorrelate.py#L126-L154

Also, from a naive look through, my guess is that we want to use TensorViewer, or something similar that we already have in pyhf, for this, but that might complicate the logic. We'll also need to add a few tests for it.

alexander-held · 2024-03-27T19:42:41Z

I need to have a more detailed look, but am I understanding correctly that this effectively defines new uncorrelated parameters? When reading this issue, I at first expected multi-dimensional Gaussian constraint terms to model covariances, but I think this goes in the opposite direction and defines pyhf-compatible uncorrelated parameters from correlated input.
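For reference, the two options can be related by a short derivation. This is only a sketch under stated assumptions: Gaussian constraints, a positive-definite correlation matrix $C$, and notation ($\theta$, $\phi$, $U$, $\Lambda$) that is introduced here for illustration and is not taken from pyhf or pyhfcorr.

```latex
% Multivariate Gaussian constraint on the correlated parameters \theta:
-2 \ln p(\theta) = \theta^{T} C^{-1} \theta + \text{const},
\qquad C = U \Lambda U^{T}, \quad U^{T} U = I .

% Change of variables to new parameters \phi:
\theta = U \Lambda^{1/2} \phi
\;\Longrightarrow\;
\theta^{T} C^{-1} \theta
  = \phi^{T} \Lambda^{1/2} U^{T} \, U \Lambda^{-1} U^{T} \, U \Lambda^{1/2} \phi
  = \phi^{T} \phi
  = \sum_{j} \phi_{j}^{2} .
```

Under these assumptions, one multivariate constraint on $\theta$ is equivalent to independent unit-Gaussian constraints on the $\phi_j$. If $C$ is singular, as in the fully correlated case, the directions with zero eigenvalue carry no variance and can be dropped.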

lorenzennio · 2024-03-28T09:34:44Z

@kratsg As you say, I am using PCA to transform the correlated variables into an equal number of uncorrelated ones. For each uncorrelated variable, I determine the corresponding shift and add it as a new modifier. For two fully correlated variables, only one new modifier is added, with the sum of the up/down variations (in the histosys case) as its data; the second modifier would be redundant in this case. This is the first example in https://github.com/lorenzennio/pyhfcorr/blob/main/examples.ipynb.

The shifts are calculated here:
https://github.com/lorenzennio/pyhfcorr/blob/main/src/pyhfcorr/modifiers.py
which are called from this line:
https://github.com/lorenzennio/pyhfcorr/blob/32a006651e9410e6107ae5077021feddb17b7b50/src/pyhfcorr/decorrelate.py#L138
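As a rough illustration of the idea (not the actual pyhfcorr code), the transformation can be sketched with an eigendecomposition of the correlation matrix. The function name `decorrelate_shifts` and the toy numbers are hypothetical, and the sketch assumes symmetric, linear +1 sigma shifts (hi_data minus nominal); asymmetric up/down variations would need separate handling.

```python
import numpy as np


def decorrelate_shifts(corr, shifts):
    """Map shifts of correlated parameters onto uncorrelated ones.

    Hypothetical helper for illustration only.
    corr   : (n, n) correlation matrix of the original parameters
    shifts : (n, n_bins) per-bin +1 sigma shift of each original parameter
    Returns a (k, n_bins) array, one row per retained uncorrelated parameter.
    """
    corr = np.asarray(corr, dtype=float)
    shifts = np.asarray(shifts, dtype=float)
    # Symmetric eigendecomposition: corr = U @ diag(eigvals) @ U.T
    eigvals, eigvecs = np.linalg.eigh(corr)
    # Directions with (numerically) zero variance are redundant and dropped.
    keep = eigvals > 1e-12 * eigvals.max()
    # theta = U sqrt(Lambda) phi with phi ~ N(0, 1), so the shift tied to
    # phi_j is sum_i U[i, j] * sqrt(lambda_j) * shifts[i].
    transform = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    new_shifts = transform.T @ shifts
    # Eigenvector signs are arbitrary; fix them so each row sums to >= 0.
    signs = np.where(new_shifts.sum(axis=1) < 0, -1.0, 1.0)
    return new_shifts * signs[:, None]


# Toy +1 sigma shifts for two histosys-like modifiers (made-up numbers).
shift_a = np.array([1.0, 2.0, 0.5])
shift_b = np.array([0.5, 1.0, 1.5])

fully_correlated = decorrelate_shifts([[1.0, 1.0], [1.0, 1.0]], [shift_a, shift_b])
uncorrelated = decorrelate_shifts([[1.0, 0.0], [0.0, 1.0]], [shift_a, shift_b])
```

In the fully correlated case, the only surviving eigenvector is (1, 1)/sqrt(2) with eigenvalue 2, so the single new shift is `shift_a + shift_b`, matching the "sum of the variations" described above. In general, the new shifts reproduce the same per-bin covariance as the original correlated parameters.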

Does that answer your question?

> We'll need to figure out a spec for this that is as close to HS3 as possible.

Good point, we should align it with how correlations are treated in HS3. The other, simpler solution would be to just take the decorrelated spec, which is completely pyhf-compatible, so no changes would be needed. I guess the former approach would be more desirable, though.

> Also, from a naive look through, my guess is that we want to use TensorViewer, or something similar that we already have in pyhf, for this, but that might complicate the logic.

Do you mean replacing the numpy dependency that I introduced with the general tensor backend used in pyhf? I can have a go at this if you want.

> We'll also need to add a few tests for it.

I have added some tests here:
https://github.com/lorenzennio/pyhfcorr/tree/main/test
Of course, these could be extended as needed.

@alexander-held Exactly, I am transforming the correlated parameters and adding new uncorrelated ones. I opted for this because it is a simple pre-processing step and is non-invasive to the actual pyhf/HistFactory model. What would be the benefit of multivariate constraint terms, apart from the interpretability of the resulting modifier parameters (which is partially lost in my approach)?

lorenzennio added feat/enhancement New feature or request needs-triage Needs a maintainer to categorize and assign labels Mar 27, 2024
