Generative Networks for LHC

Generative Networks for LHC
Tilman Plehn, Universität Heidelberg
CMS, 1/2021
Outline: GAN basics; 1- Events; 2- Inverting

Simulations for future LHC runs
Unique: fundamental understanding of lots of data
– precision theory predictions
– precision simulations
– precision measurements
⇒ What's needed to keep the edge?
[plot: ATLAS Preliminary 2020 Computing Model, projected CPU fractions for 2030 with aggressive R&D, split into Data Proc, MC-Full(Sim), MC-Full(Rec), MC-Fast(Sim), MC-Fast(Rec), EvGen, Heavy Ions, Data Deriv, MC Deriv, Analysis]

Event generation towards HL-LHC
– simulated event numbers scaling with the expected events [factor 25]
– general move to NLO/NNLO as standard [5% error]
– higher relevant final-state multiplicities [jet recoil, extra jets, WBF, etc.]
– additional low-rate high-multiplicity backgrounds
– specific precision predictions not available in standard generators [N3LO in MC?]
– interpretation of measurements with general signal hypotheses [jets+MET]

Three ways to use ML
– improve current tools: iSherpa, ML-MadGraph, etc.
– new ideas, like fast ML generator networks
– conceptual ideas in theory simulations and analyses

GAN algorithm
Generating events
– training: true events {xT}
  output: generated events {r} → {xG}
– discriminator constructing D(x) by minimizing [classifier D(x) = 1, 0 true/generator]
  LD = ⟨−log D(x)⟩_{xT} + ⟨−log(1 − D(x))⟩_{xG}
– generator constructing r → xG by minimizing [D needed; code sketch below]
  LG = ⟨−log D(x)⟩_{xG}
– equilibrium D = 0.5 ⇒ LD = LG = 1
⇒ statistically independent copy of training events
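
A minimal sketch of these two losses in PyTorch (assumed architecture and dimensions for illustration, not the code behind the references):

import torch
import torch.nn as nn

dim = 18                                                  # event dimensionality (assumption)
G = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, dim))
D = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, 1), nn.Sigmoid())
opt_D = torch.optim.Adam(D.parameters(), lr=1e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=1e-4)

def train_step(x_true):
    # x_true: batch of training events {xT}
    r = torch.randn_like(x_true)                          # random numbers {r}
    x_gen = G(r)                                          # generated events {xG}
    # discriminator loss: LD = <-log D(x)>_xT + <-log(1-D(x))>_xG
    loss_D = -(torch.log(D(x_true)).mean() + torch.log(1 - D(x_gen.detach())).mean())
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()
    # generator loss (non-saturating form): LG = <-log D(x)>_xG
    loss_G = -torch.log(D(G(r))).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
    return loss_D.item(), loss_G.item()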

Generative network studies [review 2008.08558]
– Jets [de Oliveira (2017), Carrazza-Dreyer (2019)]
– Detector simulations [Paganini (2017), Musella (2018), Erdmann (2018), Ghosh (2018), Buhmann (2020)]
– Events [Otten (2019), Hashemi, DiSipio, Butter (2019), Martinez (2019), Alanazi (2020), Chen (2020), Kansal (2020)]
– Unfolding [Datta (2018), Omnifold (2019), Bellagente (2019), Bellagente (2020)]
– Templates for QCD factorization [Lin (2019)]
– EFT models [Erbin (2018)]
– Event subtraction [Butter (2019)]
– Sherpa [Bothmann (2020), Gao (2020)]
– Basics [GANplification (2020), DCTR (2020)]
– Unweighting [Verheyen (2020), Backes (2020)]
– Superresolution [DiBello (2020), Blecher (2020)]

GANplification
Gain beyond training data [Butter, Diefenbacher, Kasieczka, Nachman, TP]
– true function known: compare GAN vs sampling vs fit
– GAN trained on 100 data points
– quantiles with χ²-values
[plot: p(x) vs x in 10 quantiles, truth vs fit vs sample vs GAN]

– fit like 500-1000 sampled points
– GAN like 500 sampled points [amplification factor 5], requiring 10,000 GANned events
[plot: quantile MSE (50 quantiles) vs number of GANed events for a GAN trained on 100 data points, compared to samples and to the fit]

– 5-dimensional Gaussian shell, sparsely populated: amplification vs quantiles [quantile-test sketch below]
– fit-like additional information
– interpolation and resolution the key [NNPDF]
⇒ GANs enhance training data
[plot: GAN amplification factor vs quantiles per dimension]
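
A rough sketch of the quantile test behind these plots (my reading of the procedure, details assumed): bin phase space into quantiles of the true density and compare each sample's per-quantile fractions to the exact value 1/N.

import numpy as np

def quantile_mse(events, truth_sample, n_quantiles=10):
    # quantile edges of the true density, estimated from a large truth sample
    edges = np.quantile(truth_sample, np.linspace(0.0, 1.0, n_quantiles + 1))
    # assign each event to a quantile bin via the inner edges
    idx = np.searchsorted(edges[1:-1], events)
    frac = np.bincount(idx, minlength=n_quantiles) / len(events)
    # each quantile should hold exactly 1/n_quantiles of the probability
    return np.mean((frac - 1.0 / n_quantiles) ** 2)

# usage sketch: quantile_mse(ganned_events, truth, 50) vs quantile_mse(sampled_events, truth, 50)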

How to GAN LHC events
Idea: replace ME for hard process [Butter, TP, Winterhalder]
– medium-complex final state t t̄ → 6 jets
  t/t̄ and W± on-shell with BW
  6 × 4 = 24 dof, on-shell external states → 18 dof [constants hard to learn]
– flat observables flat [phase space coverage okay]
– direct observables with tails [statistical error indicated]
– constructed observables similar
[plot: 1/σ dσ/dpT,t, True vs GAN, with GAN/True ratio panel and tail region marked]

– improved resolution [1M training events]
[plot: 2D correlation φj1 vs φj2 for 1M true events]

– improved resolution [10M generated events]
[plot: 2D correlation φj1 vs φj2 for 10M generated events]

– improved resolution [50M generated events]
– proof of concept
[plot: 2D correlation φj1 vs φj2 for 50M generated events]

Chemistry of loss functions
GAN version of adaptive sampling
– generally 1D features:
  phase space boundaries
  kinematic cuts
  invariant masses [top, W]
– batch-wise comparison of distributions, MMD loss with kernel k [code sketch below]:
  MMD² = ⟨k(x, x′)⟩_{xT, x′T} + ⟨k(y, y′)⟩_{yG, y′G} − 2 ⟨k(x, y)⟩_{xT, yG}
  LG → LG + λG MMD²

[plots: 1/σ dσ/dmt around the top mass; left: Breit-Wigner kernel vs Gauss kernel vs no MMD, right: kernel widths ΓSM, ΓSM/4, 4 ΓSM, each compared to truth]
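
A compact sketch of the batch-wise MMD² above with a Gaussian kernel (kernel choice and width are assumptions for illustration):

import torch

def mmd2(x, y, sigma=1.0):
    # x: 1D feature of true events (e.g. m_t), y: same feature of generated events
    def k(a, b):
        return torch.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))
    # biased estimator (diagonal terms included), good enough for a sketch
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

# combined generator loss: LG -> LG + lambda_G * MMD^2
# loss_G = loss_G + lam * mmd2(m_top_true, m_top_gen)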

Unweighting
Gaining beyond GANplification [Butter, TP, Winterhalder; Clausius' talk]
– phase space sampling: weighted events [PS weight × |M|²]
  events: constant weights
– probabilistic unweighting a weak spot of standard MC
– learn phase space patterns [density estimation]
  generate unweighted events [through loss function]
– compare training, GAN, classic unweighting [hit-or-miss sketch below]
[plot: 1/σ dσ/dEµ− for Train, Unweighted, and uwGAN events, with ratio to truth]
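
For reference, the classic hit-or-miss unweighting step that the GAN approach competes with (standard procedure, not the paper's code):

import numpy as np

def unweight(events, weights, seed=0):
    # accept event i with probability w_i / w_max; accepted events carry unit weight
    rng = np.random.default_rng(seed)
    w = np.asarray(weights, dtype=float)
    keep = rng.random(len(w)) < w / w.max()
    return events[keep]

# the unweighting efficiency <w>/w_max is the bottleneck for steeply falling weight distributions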

How to GAN away detector effects
Goal: invert Markov processes [Bellagente, Butter, Kasieczka, TP, Winterhalder]
– detector simulation a typical Markov process
– inversion possible, in principle [entangled convolutions]
– GAN task: partons −(DELPHES)→ detector −(GAN)→ partons
⇒ Full phase space unfolded
Conditional GAN
– map random numbers to parton level, hadron level as condition [matched event pairs; setup sketched below]
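
A minimal sketch of that conditional setup (architecture and dimensions are assumptions): the generator maps random numbers plus the detector-level event to a parton-level event, the discriminator judges parton-level candidates given the same condition.

import torch
import torch.nn as nn

dim_parton, dim_detector, dim_noise = 12, 16, 12          # illustrative dimensions
G = nn.Sequential(nn.Linear(dim_noise + dim_detector, 256), nn.ReLU(),
                  nn.Linear(256, dim_parton))
D = nn.Sequential(nn.Linear(dim_parton + dim_detector, 256), nn.ReLU(),
                  nn.Linear(256, 1), nn.Sigmoid())

def generate(x_detector):
    # x_detector: batch of detector-level events used as condition
    r = torch.randn(len(x_detector), dim_noise)
    return G(torch.cat([r, x_detector], dim=1))

# training uses matched (parton, detector) event pairs:
# real:  D(torch.cat([x_parton, x_detector], dim=1))
# fake:  D(torch.cat([generate(x_detector), x_detector], dim=1))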

Detector unfolding
Reference process pp → ZW → (ℓℓ)(jj)
– broad jj mass peak
  narrow ℓℓ mass peak
  modified 2 → 2 kinematics
  fun phase space boundaries
– GAN same as event generation [with MMD]
Simple application
[plots: 1/σ dσ/dpT,j1 and 1/σ dσ/dmjj for Truth, FCGAN, and Delphes, with FCGAN/Truth ratio panels]

– detector-level cuts [14%, 39% of events; no interpolation; MMD not conditional]
  pT,j1 = 30 ... 50 GeV, pT,j2 = 30 ... 40 GeV, pT,ℓ− = 20 ... 50 GeV (12)
  pT,j1 > 60 GeV (13)
[plots: 1/σ dσ/dpT,j1 and 1/σ dσ/dmjj for Truth and FCGAN trained with the cuts of Eq.(12) and Eq.(13)]

– model dependence of unfolding
  train: SM events, test: 10% of events with a W′ in the s-channel
⇒ Working fine, but ill-defined
[plot: 1/σ dσ/dmℓℓjj for Truth (W′), FCGAN, and Truth (SM)]

Unfolding as inverting
Invertible networks [Bellagente, Butter, Kasieczka, TP, Rousselot, Winterhalder, Ardizzone, Köthe]
– network as bijective transformation: normalizing flow
– Jacobian tractable: normalizing flow
– evaluation in both directions: INN [Ardizzone, Rother, Köthe]
– building block: coupling layer [sketch below]
– conditional: parton-level events from {r}
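
A self-contained sketch of an affine coupling layer, the INN building block (a generic textbook version, not the paper's implementation):

import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    # invertible in closed form, with a tractable Jacobian
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.d = dim // 2
        self.net = nn.Sequential(nn.Linear(self.d, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 2 * (dim - self.d)))

    def forward(self, x):
        x1, x2 = x[:, :self.d], x[:, self.d:]
        s, t = self.net(x1).chunk(2, dim=1)
        s = torch.tanh(s)                       # keep the scaling well-behaved
        y2 = x2 * torch.exp(s) + t
        log_det = s.sum(dim=1)                  # log |det J|, cheap to evaluate
        return torch.cat([x1, y2], dim=1), log_det

    def inverse(self, y):
        y1, y2 = y[:, :self.d], y[:, self.d:]
        s, t = self.net(y1).chunk(2, dim=1)
        s = torch.tanh(s)
        x2 = (y2 - t) * torch.exp(-s)
        return torch.cat([y1, x2], dim=1)

# a conditional version concatenates the detector-level event to x1 inside self.net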

Properly defined unfolding [again pp → ZW → (ℓℓ)(jj)]
– performance on distributions like FCGAN
– parton-level probability distribution for a single detector event
⇒ Proper statistical unfolding
[plots: left: pT,q1 distribution from many unfoldings of a single detector event, FCGAN vs cINN vs eINN vs parton truth; right: calibration, fraction of events vs pT,q1 quantile for 3200 unfoldings]

Unfolding initial-state radiation
– detector-level process pp → ZW + jets [variable number of objects]
– parton-level hard process chosen 2 → 2 [whatever you want]
– ME vs PS jets decided by network [including momentum conservation]
[plots: 1/σ dσ/dpT,q1 and 1/σ dσ/d(Σpx / Σ|px|) for 2-jet, 3-jet, and 4-jet events]

⇒ How systematically can we invert?

Outlook
Machine learning for LHC theory
– goal: data-to-data with fundamental physics input
– MC challenges:
  higher-order precision in the bulk
  coverage of tails
  unfolding to access fundamental QCD
– neural network benefits:
  best available interpolation
  training on MC and/or data, anything goes
  lightning speed, once trained
– GANs the cool kid:
  generator trying to produce the best events
  discriminator trying to catch the generator
– INNs the theory hope:
  flow networks to control spaces
  invertible networks the new tool
Any ideas?

Backup: How to GAN event subtraction
Idea: subtract samples without binning [Butter, TP, Winterhalder]
– statistical uncertainty [numerical illustration below]
  ∆(B−S) = √(∆B² + ∆S²) > max(∆B, ∆S)
– applications in LHC physics:
  soft-collinear subtraction, multi-jet merging
  on-shell subtraction
  background/signal subtraction
– GAN setup:
  1. differential, steep class label
  2. sample normalization
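
A quick numerical illustration of why binned subtraction hurts (toy numbers, not the GAN method itself): the Poisson errors of the two samples add in quadrature, so the error of the difference exceeds either input error.

import numpy as np

# per-bin counts of a background-like and a signal-like sample (made-up numbers)
B, S = 1000.0, 900.0
dB, dS = np.sqrt(B), np.sqrt(S)          # Poisson errors of the two samples
diff = B - S                             # small difference of large numbers
d_diff = np.sqrt(dB**2 + dS**2)          # errors add in quadrature
print(diff, d_diff)                      # 100 +- ~43.6, larger than dB or dS alone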

Subtracted events
How to beat statistics by subtracting
1– 1D toy example
  PB(x) = 1/x + 0.1, PS(x) = 1/x ⇒ PB−S = 0.1
– statistical fluctuations reduced (sic!)
[plots: P(x) for B, S, and B−S on a log scale, and (B−S)GAN vs (B−S)Truth ± 1σ]

2– event-based background subtraction [weird notation, sorry]
  pp → e+e− (B), pp → γ → e+e− (S) ⇒ pp → Z → e+e− (B−S)
[plot: dσ/dEe for B, S, and B−S, GAN vs Truth]

3– collinear subtraction [assumed non-local]
  pp → Zg (B: matrix element, S: collinear approximation)
⇒ Applications in theory and analysis
[plot: dσ/dpT,g for B, S, and B−S, GAN vs Truth]