
class: middle, center, title-slide
count: false

pyhf: pure-Python implementation of HistFactory


(for the dev team)
Matthew Feickert
[email protected]

PyHEP 2019

October 18th, 2019


pyhf team



.grid[ .kol-1-4.center[ .circle.width-80[Lukas]

Lukas Heinrich

CERN ] .kol-1-4.center[ .circle.width-80[Matthew]

Matthew Feickert

Illinois ] .kol-1-4.center[ .circle.width-80[Giordon]

Giordon Stark

UCSC SCIPP ] .kol-1-4.center[ .circle.width-70[Kyle]

Kyle Cranmer

NYU ] ]

.kol-3-4.center.bold[Core Developers] .kol-1-4.center.bold[Advising]


Why is the likelihood important?


.kol-1-2.width-90[

  • High information-density summary of analysis
  • Almost everything we do in the analysis ultimately affects the likelihood and is encapsulated in it
    • Trigger
    • Detector
    • Systematic Uncertainties
    • Event Selection
  • Unique representation of the analysis to preserve ] .kol-1-2.width-90[


    likelihood_connections ]

Likelihood serialization...

.center[...making good on a 19-year-old agreement to publish likelihoods]


.center.width-100[ likelihood_publishing_agreement ]

.center[(1st Workshop on Confidence Limits, CERN, 2000)]

.bold[This hadn't been done in HEP until now]

  • In an "open world" of statistics this is a difficult problem to solve
  • What to preserve and how? All of ROOT?
  • Idea: Focus on a single more tractable binned model first

Enter HistFactory


  • A flexible p.d.f. template to build statistical models from binned distributions and data
  • Developed by Cranmer, Lewis, Moneta, Shibata, and Verkerke [1]
  • Widely used by the HEP community for standard model measurements and BSM searches

.kol-1-1.center[ .width-100[HistFactory_uses] ]


HistFactory Template

$$ f\left(\vec{n}, \vec{a}\middle|\vec{\eta}, \vec{\chi}\right) = \color{blue}{\prod_{c \,\in\, \textrm{channels}} \prod_{b \,\in\, \textrm{bins}_c} \textrm{Pois} \left(n_{cb} \middle| \nu_{cb}\left(\vec{\eta}, \vec{\chi}\right)\right)} \,\color{red}{\prod_{\chi \,\in\, \vec{\chi}} c_{\chi} \left(a_{\chi}\middle|\chi\right)} $$

$$ \nu_{cb}(\vec{\eta}, \vec{\chi}) = \sum_{s \,\in\, \textrm{samples}} \underbrace{\left(\sum_{\kappa \,\in\, \vec{\kappa}} \kappa_{scb}(\vec{\eta}, \vec{\chi})\right)}_{\textrm{multiplicative}} \Bigg(\nu_{scb}^{0}(\vec{\eta}, \vec{\chi}) + \underbrace{\sum_{\Delta \,\in\, \vec{\Delta}} \Delta_{scb}(\vec{\eta}, \vec{\chi})}_{\textrm{additive}}\Bigg) $$

.bold[Use:] Multiple disjoint channels (or regions) of binned distributions, with multiple samples contributing to each and additional (possibly shared) systematics between sample estimates

.bold[Main pieces:]

  • .blue[Main Poisson p.d.f. for simultaneous measurement of multiple channels]
  • .katex[Event rates] $\nu_{cb}$ from nominal rate $\nu_{scb}^{0}$ and rate modifiers $\kappa$ and $\Delta$
  • .red[Constraint p.d.f. (+ data) for "auxiliary measurements"]
    • encoding systematic uncertainties (normalization, shape, etc)
  • $\vec{n}$: events, $\vec{a}$: auxiliary data, $\vec{\eta}$: unconstrained pars, $\vec{\chi}$: constrained pars
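To make the template concrete, here is a hand-rolled toy evaluation of this likelihood (not the pyhf API) for a single channel with two bins, an unconstrained signal strength $\mu$, and one Gaussian-constrained systematic. All rates, counts, and parameter names are illustrative.

```python
import math

def poisson(n, nu):
    """Pois(n | nu) for an integer observed count n and expected rate nu."""
    return nu**n * math.exp(-nu) / math.factorial(n)

def gaussian(a, chi, sigma=1.0):
    """Constraint term c(a | chi): auxiliary measurement a of parameter chi."""
    return math.exp(-0.5 * ((a - chi) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def likelihood(n, mu, chi, signal, background, delta, aux=0.0):
    """f(n, a | eta, chi): product of per-bin Poissons times the constraint.

    Expected rate per bin: nu_b = mu * s_b + (b_b + chi * delta_b),
    i.e. mu acts as a multiplicative (kappa-type) modifier on the signal
    and the systematic shift chi * delta_b as an additive (Delta-type) one.
    """
    L = gaussian(aux, chi)  # constraint p.d.f. with auxiliary data
    for n_b, s_b, b_b, d_b in zip(n, signal, background, delta):
        nu_b = mu * s_b + (b_b + chi * d_b)
        L *= poisson(n_b, nu_b)  # main Poisson p.d.f., bin by bin
    return L

observed = [53, 55]        # n_cb: observed counts per bin
signal = [12.0, 11.0]      # nominal signal rates
background = [50.0, 52.0]  # nominal background rates nu^0
delta = [3.0, 7.0]         # additive shift per bin at chi = +1 sigma

L_bkg = likelihood(observed, mu=0.0, chi=0.0, signal=signal,
                   background=background, delta=delta)
L_sig = likelihood(observed, mu=1.0, chi=0.0, signal=signal,
                   background=background, delta=delta)
```

With these made-up counts the background-only hypothesis fits the data better than $\mu = 1$, so `L_bkg > L_sig`, which is exactly the comparison hypothesis tests on this model formalize.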

HistFactory Template

$$ f\left(\vec{n}, \vec{a}\middle|\vec{\eta}, \vec{\chi}\right) = \prod_{c \,\in\, \textrm{channels}} \prod_{b \,\in\, \textrm{bins}_c} \textrm{Pois} \left(n_{cb} \middle| \nu_{cb}\left(\vec{\eta}, \vec{\chi}\right)\right) \prod_{\chi \,\in\, \vec{\chi}} c_{\chi} \left(a_{\chi}\middle|\chi\right) $$

.bold[This is a mathematical representation!] Nowhere is any software spec defined

Until now, the only implementation of HistFactory has been in RooStats+RooFit


  • Preservation: Likelihood stored in the binary ROOT format
    • Challenge for long-term preservation (i.e. HEPData)
    • Why is a histogram needed for an array of numbers?
  • To start using HistFactory p.d.f.s one first has to learn ROOT, RooFit, and RooStats
    • Problem for our theory colleagues (generally don't want to)
  • Difficult to use for reinterpretation

pyhf: HistFactory in pure Python

.kol-1-2.width-95[

  • First non-ROOT implementation of the HistFactory p.d.f. template
    • DOI
  • pure-Python library as second implementation of HistFactory

.kol-1-1[

  • Has a JSON spec that .blue[fully] describes the HistFactory model
    • JSON: Industry standard, parsable by every language, human & machine readable, versionable and easily preserved (HEPData is JSON)
  • Open source tool for all of HEP
    • Originated from a DIANA/HEP project fellowship and now an IRIS-HEP supported project
    • Used for reinterpretation in phenomenology paper [2]
    • Used internally in ATLAS for pMSSM SUSY large-scale reinterpretation ]

Example pyhf JSON spec

JSON defining a single channel, two bin counting experiment with systematics

.center.width-100[demo_JSON]
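The demo_JSON image is not reproduced in this export; a single-channel, two-bin pyhf workspace of that shape looks like the following (the numbers are illustrative, modeled on the standard pyhf documentation example):

```json
{
  "channels": [
    {
      "name": "singlechannel",
      "samples": [
        {
          "name": "signal",
          "data": [12.0, 11.0],
          "modifiers": [
            {"name": "mu", "type": "normfactor", "data": null}
          ]
        },
        {
          "name": "background",
          "data": [50.0, 52.0],
          "modifiers": [
            {"name": "uncorr_bkguncrt", "type": "shapesys", "data": [3.0, 7.0]}
          ]
        }
      ]
    }
  ],
  "observations": [
    {"name": "singlechannel", "data": [51.0, 48.0]}
  ],
  "measurements": [
    {"name": "Measurement", "config": {"poi": "mu", "parameters": []}}
  ],
  "version": "1.0.0"
}
```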


$CL_{s}$ Example using pyhf CLI

.center.width-80[demo_CLI]


JSON Patch for new signal models

.kol-1-2[
.center.width-100[demo_JSON] .center[Original model] ] .kol-1-2[

.center.width-100[patch_file] .center[New Signal (JSON Patch file)] ] .kol-1-1[ .center.width-80[demo_JSON] .center[Reinterpretation] ]
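To make the patching step concrete, here is a minimal pure-Python sketch of the RFC 6902 `replace` operation used above. Real workflows use `pyhf cls --patch` or a full JSON Patch library; this hand-rolled `apply_patch` supports only `replace`, and the workspace fragment is illustrative.

```python
import copy
import json

# A stripped-down workspace fragment (illustrative values only)
workspace = {
    "channels": [
        {"name": "singlechannel",
         "samples": [
             {"name": "signal", "data": [12.0, 11.0], "modifiers": []},
             {"name": "background", "data": [50.0, 52.0], "modifiers": []},
         ]}
    ]
}

# A JSON Patch swapping in a new signal model
patch = json.loads("""[
  {"op": "replace", "path": "/channels/0/samples/0/data", "value": [5.0, 6.0]}
]""")

def apply_patch(doc, ops):
    """Apply a list of JSON Patch 'replace' ops to a copy of doc."""
    doc = copy.deepcopy(doc)  # leave the original (background-only) model intact
    for op in ops:
        assert op["op"] == "replace", "sketch handles only 'replace'"
        *parents, leaf = op["path"].lstrip("/").split("/")
        target = doc
        for key in parents:
            # Numeric path segments index into lists, others into dicts
            target = target[int(key)] if isinstance(target, list) else target[key]
        if isinstance(target, list):
            target[int(leaf)] = op["value"]
        else:
            target[leaf] = op["value"]
    return doc

new_workspace = apply_patch(workspace, patch)
```

Because the patch touches only the signal sample's `data`, the background-only model plus one small patch file per signal point is enough to reconstruct every model in a reinterpretation.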


JSON Patch for new signal models

.center.width-80[signal_reinterpretation] .kol-1-2[ .center.width-70[measurement_cartoon] .center[Original analysis (model A)] ] .kol-1-2[ .center.width-70[reinterpretation_cartoon] .center[Recast analysis (model B)] ]


Likelihoods preserved on HEPData

  • Background-only model JSON stored
  • Signal models stored as JSON Patch files
  • Together they fully preserve the model

.center.width-90[HEPData_likelihoods]

.footnote[Updated on 2019-10-21]


...can be streamed from HEPData

  • Background-only model JSON stored
  • Signal models stored as JSON Patch files
  • Together they fully preserve the model

.center.width-100[HEPData_streamed_likelihoods]

.footnote[Updated on 2019-10-21]


ROOT + XML to JSON and back

.center.width-100[flowchart]


Likelihood serialization and reproduction

  • ATLAS PUB note on the JSON schema for serialization and reproduction of results (ATL-PHYS-PUB-2019-029)
    • Contours: .root[█] original ROOT+XML, .pyhf[█] pyhf JSON, .roundtrip[█] JSON converted back to ROOT+XML
      • Overlay of the contours gives a nice visualization of the near-perfect agreement
    • Serialized likelihood and reproduced results of ATLAS Run-2 search for sbottom quarks (CERN-EP-2019-142) and published to HEPData
    • Shown to reproduce results, but faster! .bold[ROOT:] 10+ hours .bold[pyhf:] < 30 minutes

.kol-1-2.center.width-100[ overlay_multiplex_contour ] .kol-1-2.right.width-75[ discrepancy ]


Live demo time!



.center.bold[Just click the button!]


.center.width-70[Binder]


Summary

Through pyhf we are able to provide:

  • .bold[JSON specification] of likelihoods
    • human/machine readable, versionable, HEPData friendly, orders of magnitude smaller
  • .bold[Bidirectional translation] of likelihood specifications
    • ROOT workspaces ↔ JSON
  • Independent .bold[pure-Python implementation] of HistFactory + hypothesis testing
  • Publication for the first time of the .bold[full likelihood] of a search for new physics

.kol-1-2.center.width-100[ likelihood_publishing_agreement (1st Workshop on Confidence Limits, CERN, 2000) ] .kol-1-2.center.width-95[ PUB_note_cover (ATLAS, 2019) ]


class: end-slide, center

Backup


Best-fit parameter values

.center.width-90[fit_results]


JSON Patch files for new signal models


```bash
$ pyhf cls example.json | jq .CLs_obs
0.3599845631401913
$ cat new_signal.json
[{
    "op": "replace",
    "path": "/channels/0/samples/0/data",
    "value": [5.0, 6.0]
}]
$ pyhf cls example.json --patch new_signal.json | jq .CLs_obs
0.4764263982925686
```

...which can be streamed from HEPData


```bash
# One signal model
$ curl -sL https://bit.ly/33TVZ5p | \
  tar -O -xzv RegionA/BkgOnly.json | \
  pyhf cls --patch <(curl -sL https://bit.ly/33TVZ5p | \
      tar -O -xzv RegionA/patch.sbottom_1300_205_60.json) | \
  jq .CLs_obs
0.24443635754482018
# A different signal model
$ curl -sL https://bit.ly/33TVZ5p | \
  tar -O -xzv RegionA/BkgOnly.json | \
  pyhf cls --patch <(curl -sL https://bit.ly/33TVZ5p | \
      tar -O -xzv RegionA/patch.sbottom_1300_230_100.json) | \
  jq .CLs_obs
0.040766025813435774
```

References

  1. ROOT collaboration, K. Cranmer, G. Lewis, L. Moneta, A. Shibata and W. Verkerke, .italic[HistFactory: A tool for creating statistical models for use with RooFit and RooStats], 2012.
  2. L. Heinrich, H. Schulz, J. Turner and Y. Zhou, .italic[Constraining $A_{4}$ Leptonic Flavour Model Parameters at Colliders and Beyond], 2018.

class: end-slide, center
count: false

The end.