Treating arbitrarily correlated parameters with pyhf #2463

Open · lorenzennio opened this issue on Mar 27, 2024 · 3 comments
Labels: feat/enhancement (New feature or request), needs-triage (Needs a maintainer to categorize and assign)

lorenzennio (Contributor) commented Mar 27, 2024

Summary

pyhf currently has no option to introduce arbitrary correlations between modifier parameters. I have seen multiple workarounds implemented for Belle II analyses (and possibly elsewhere).

The Belle II community asked me to create a small extension package for pyhf that lets you include arbitrary correlations between parameters:

https://github.com/lorenzennio/pyhfcorr

I have also added an example notebook here:

https://github.com/lorenzennio/pyhfcorr/blob/main/examples.ipynb

Would there be interest in merging this into either pyhf or cabinetry (@alexander-held), or should it remain a standalone package?

Additional Information

Code of Conduct

  • I agree to follow the Code of Conduct
kratsg (Contributor) commented Mar 27, 2024

I think it would be useful to add this under contrib. We'll need to figure out the spec for this that is as close to HS3 as possible. As an aside, a more technical question about the example notebook you provided:

```python
import numpy as np

hist1 = np.array([2., 3., 6., 8., 7., 7., 6., 2., 3., 1.])
hist2 = np.array([3., 6., 9., 12., 15., 9., 6., 3., 3., 2.])
```

How do I know what the resulting fully correlated histogram should look like with `"corr": [[1., 1.], [1., 1.]]`? Looking at the code you provided, it is not at all clear to me. It seems you use PCA to determine the appropriate shift for each entry in the correlation matrix, but I don't see where you actually shift or compute the new histograms/modifiers: https://github.com/lorenzennio/pyhfcorr/blob/32a006651e9410e6107ae5077021feddb17b7b50/src/pyhfcorr/decorrelate.py#L126-L154

Also, from a naive look through, my guess is that we want to incorporate TensorViewer (or something similar that we have in pyhf) into this, but that might complicate the logic. We'll also need to add a few tests for it.

alexander-held (Member) commented

I need to take a more detailed look, but am I understanding correctly that this effectively defines new uncorrelated parameters? When reading this issue, I initially expected multi-dimensional Gaussian constraint terms to model covariances, but I think this goes in the opposite direction and defines pyhf-compatible uncorrelated parameters from correlated input.

lorenzennio (Contributor, Author) commented Mar 28, 2024

@kratsg As you say, I am using PCA to transform the correlated variables into an equal number of uncorrelated ones. For each uncorrelated variable, I determine the corresponding shift and add it as a new modifier. For two fully correlated variables, only one new modifier is added, with the sum of the up/down variations (in the histosys case) as its data; the second modifier would be redundant in this case (see the first example in https://github.com/lorenzennio/pyhfcorr/blob/main/examples.ipynb).

The shifts are calculated in
https://github.com/lorenzennio/pyhfcorr/blob/main/src/pyhfcorr/modifiers.py
which is called from this line:
https://github.com/lorenzennio/pyhfcorr/blob/32a006651e9410e6107ae5077021feddb17b7b50/src/pyhfcorr/decorrelate.py#L138
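A minimal sketch of this kind of decorrelation, assuming each correlated parameter acts linearly (a fixed per-bin shift for a +1 sigma pull) and has a unit-variance Gaussian constraint. The helper `decorrelate_shifts` is hypothetical and not the pyhfcorr implementation: it diagonalizes the correlation matrix as rho = V diag(lambda) V^T with `numpy.linalg.eigh`, writes the correlated parameters as theta = V sqrt(diag(lambda)) z with independent z, and collects the histogram shift attached to each z_j. In the fully correlated case the zero-eigenvalue direction drops out, and the single remaining shift is the sum of the individual shifts, matching the behaviour described above.

```python
import numpy as np


def decorrelate_shifts(deltas, corr, tol=1e-10):
    """Turn shifts of correlated parameters into shifts of uncorrelated ones.

    Hypothetical helper (not the pyhfcorr API). Assumes each parameter acts
    linearly: deltas[i] is the per-bin shift for a +1 sigma pull of parameter i.
    Returns one row per uncorrelated parameter; directions whose eigenvalue
    is (numerically) zero carry no variance and are dropped.
    """
    deltas = np.asarray(deltas, dtype=float)  # shape (n_params, n_bins)
    corr = np.asarray(corr, dtype=float)      # shape (n_params, n_params)
    # Symmetric eigendecomposition: eigenvalues ascending, eigenvectors as columns.
    eigvals, eigvecs = np.linalg.eigh(corr)
    shifts = []
    for lam, vec in zip(eigvals[::-1], eigvecs.T[::-1]):  # largest variance first
        if lam <= tol:
            continue  # redundant direction, e.g. the second mode of a fully correlated pair
        if vec[np.argmax(np.abs(vec))] < 0:
            vec = -vec  # eigenvector sign is arbitrary; fix it for reproducibility
        # theta = V sqrt(Lambda) z, so z_j moves the bins by sqrt(lam_j) * sum_i V_ij * deltas[i]
        shifts.append(np.sqrt(lam) * (vec @ deltas))
    return np.array(shifts)


# Hypothetical per-bin shifts of two parameters
d1 = np.array([1.0, 2.0, 0.5])
d2 = np.array([0.5, 1.0, 1.0])
full = decorrelate_shifts([d1, d2], [[1.0, 1.0], [1.0, 1.0]])
print(full.shape)                     # (1, 3)
print(np.allclose(full[0], d1 + d2))  # True
```

A convenient sanity check is that the new shifts reproduce the intended bin-to-bin covariance, `shifts.T @ shifts == deltas.T @ corr @ deltas`. With asymmetric up/down variations or non-linear interpolation, this linear picture is only an approximation.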

Does that answer your question?

> We'll need to figure out the spec for this that is as close to HS3 as possible.

Good point; we should align it with how correlations are treated in HS3. The other, simpler solution would be to just use the decorrelated spec, which is fully pyhf-compatible, so no changes would be needed. I guess the former approach would be more desirable, though.
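To illustrate why the decorrelated spec needs nothing beyond standard pyhf, here is a sketch that turns per-bin shifts of the uncorrelated parameters into ordinary `histosys` entries of a sample's `modifiers` list in the pyhf JSON spec (`hi_data`/`lo_data`). It assumes symmetric variations (nominal plus or minus the shift); the helper name and the `corr_pc` naming scheme are hypothetical.

```python
import json

import numpy as np


def to_histosys_modifiers(nominal, shifts, prefix="corr_pc"):
    """Build pyhf-spec histosys modifiers, one per uncorrelated parameter.

    Hypothetical helper: assumes each shift is symmetric around the nominal.
    """
    nominal = np.asarray(nominal, dtype=float)
    modifiers = []
    for j, shift in enumerate(np.asarray(shifts, dtype=float)):
        modifiers.append(
            {
                "name": f"{prefix}{j}",
                "type": "histosys",
                "data": {
                    "hi_data": (nominal + shift).tolist(),
                    "lo_data": (nominal - shift).tolist(),
                },
            }
        )
    return modifiers


nominal = [10.0, 20.0, 15.0]
# Hypothetical single combined shift, e.g. for two fully correlated parameters
combined = [[1.5, 3.0, 1.5]]
print(json.dumps(to_histosys_modifiers(nominal, combined), indent=2))
```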

> Also, from a naive look through, my guess is that we want to incorporate TensorViewer (or something similar that we have in pyhf) into this, but that might complicate the logic.

Do you mean replacing the numpy dependency that I introduced with the general tensor backend used in pyhf? I can have a go at this if you want.

> We'll also need to add a few tests for it.

I have added some tests here:
https://github.com/lorenzennio/pyhfcorr/tree/main/test
Of course, these could be extended as needed.

@alexander-held Exactly: I am transforming the correlated parameters and adding new, uncorrelated ones. I opted for this because it is a simple pre-processing step and is non-invasive to the actual pyhf/HistFactory model. What would be the benefit of multivariate constraint terms, apart from the interpretability of the resulting modifier parameters (which is partially lost in my approach)?
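One way to see how the two approaches relate: for a Gaussian constraint with correlation matrix rho on the original parameters theta, substituting theta = A z with A A^T = rho turns the multivariate term theta^T rho^{-1} theta into z^T z, i.e. a product of independent unit Gaussians. The sketch below checks this numerically for an arbitrary, hypothetical rho. It assumes rho is full rank; a fully correlated rho is singular, so a multivariate Gaussian constraint cannot be written with it directly, whereas the decorrelation simply drops the zero-variance direction. The constraint transforms exactly, but replacing the original modifiers with new per-z modifiers is exact only if each modifier's effect is linear in its parameter.

```python
import numpy as np


def gaussian_chi2(theta, corr):
    """theta^T corr^{-1} theta, i.e. -2 log of a multivariate normal constraint up to a constant."""
    return float(theta @ np.linalg.solve(corr, theta))


corr = np.array([[1.0, 0.3], [0.3, 1.0]])  # hypothetical, full-rank correlation
eigvals, eigvecs = np.linalg.eigh(corr)
A = eigvecs * np.sqrt(eigvals)  # scale column j by sqrt(lambda_j), so A @ A.T == corr

z = np.array([0.7, -1.2])  # uncorrelated parameters, each with a N(0, 1) constraint
theta = A @ z              # the correlated parameters they imply

print(np.allclose(A @ A.T, corr))                     # True
print(np.isclose(gaussian_chi2(theta, corr), z @ z))  # True
```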
