SBN FCRSG 2021: Computing Model - Wesley Ketchum and Joseph Zennamo - Fermilab

SBN FCRSG 2021: Computing Model
Wesley Ketchum and Joseph Zennamo
SBN Program

• Three LArTPCs (SBND, MicroBooNE, and ICARUS) on the BNB
 – ICARUS: 22M data events per year, 600k neutrinos detected per year
 – SBND: 75M data events per year, 2M neutrinos detected per year
• Physics goals
 – Sterile neutrino searches via νe appearance and νμ disappearance
 measurements
 – ν-Ar cross section measurements (from BNB and NuMI beams)
 – Beyond Standard Model particle searches
 – LArTPC detector R&D and detector properties measurements

2 29 March 2021 SBN FCRSG 2021
SBN Collaboration
• For our purposes here, SBN refers to ICARUS (far
 detector) and SBND (near detector)
 – Focus is on completing the construction and commissioning of
 these detectors, and preparing for joint physics results
 – MicroBooNE is at a very different, steadier stage in
 executing its physics program
 • Future joint analyses with MicroBooNE data are envisaged
• ICARUS and SBND collaboration structures exist, but there is
 also a joint SBN collaboration, and many common SBN
 groups
 – SBN Analysis Group
 – SBN DAQ and Data pre-processing group (“Common Online”)
 – SBN Analysis Infrastructure Group

Organization for Offline Computing

 ICARUS
 – Sim and Reco Software: Tracy Usher and Daniele Gibin
 – Release Management: Tracy Usher
 – Production Management: Maya Wospakrik
 SBND
 – Physics and Analysis Tools: Costas Andreopoulos, Bill Louis, and Andrzej Szelc
 – Release Management: Patrick Green
 – Production Management: Mateus Carneiro

• Each experiment has an existing software and computing
 structure
• Common SBN Analysis Infrastructure group to organize
 common needs and efforts

Data, Computing, and Software Coordination
• SBN program has agreed on a “Statement of Principles for
 Data Sharing, Analyzing, and Publication”
• This includes (and I paraphrase…)
 – A common, agreed-upon strategy for data-taking
 – Prompt availability of data with equal access for all SBN
 members
 – Software tools for data analysis to be fully shared within SBN

• SBN Institutional Board has recently commissioned a
 committee to develop formal policy and define responsibilities
 on dataset and computing coordination

Overview Timeline

 [Timeline chart spanning 2021 Q1 to 2023 Q4]
 – ICARUS: Commissioning/First Neutrinos, then ICARUS Full Operations
 – SBND: Commissioning, then SBND Full Operations
 – ICARUS keep-up processing; SBND data processing
 – Software Release Preparation, then ICARUS + SBN Joint Analysis Production Run
 – Software Release Preparation, then ICARUS + SBND Production Runs

• ICARUS commissioning is in its last stages, with first neutrino data arriving now and the full detector
 ready for physics in fall 2021
• SBND timeline is offset by ~1 year from ICARUS → first physics data in 2022-2023
• Major productions anticipated from fall → spring
 – 2021-2022: supporting ICARUS first data/analyses and large-scale analysis combination tests with
 SBND simulation
 – 2022-2023: in support of first SBND data/analyses, ICARUS 2nd generation analyses, and first joint
 analyses

Computing Model: General Notes
• Baseline assumptions are that computing and data storage
 are centered at Fermilab
 – We do use common job submission resources and
 opportunistically use OSG/additional resources
 – Actively developing and testing workflows for use of HPC
 resources
 • ICARUS working with SciDAC projects on algorithm optimization
 and raw data processing and calibration workflows
 • SBND demonstrated large-scale production of simulation and
 reconstruction on ALCF (Theta) (250K events!)
 – …but these HPC workflows are not yet incorporated into a full
 computing model

Computing Model Projections: Data-Taking
• Raw data
 – Assume ICARUS and SBND each have a commissioning period with trigger rates of ~5 Hz for
 1 month, 3 Hz for 1 month, and 2 Hz for 1 month, before settling into “steady”
 physics operations
 • ICARUS: 0.8 Hz on-beam trigger rate, 0.5 Hz off beam
 • SBND: 2.6 Hz on-beam trigger rate, 1.8 Hz off beam
 – Half of ‘commissioning’ data eventually retired, otherwise all raw data stored
 • ICARUS: 170 MB/ev
 • SBND: 32.5 MB/ev
• “Keep-up” processing
 – Assume we run reconstruction on all events as they are taken
 • ICARUS: 120 s/ev for first stage, 100 s/ev for second, 4 GB mem
 – ~435 slots during normal operations
 • SBND: 60 s/ev for first stage, 30 s/ev for second, 2 GB mem
 – ~215 slots during normal operations
 – Assume we put only final-stage reco files to tape, and delete them after a year
• Data transfers
 – Assume all ICARUS raw BNB data is sent to CNAF
 • Total expected volume ~25-50 MB/s
 – Identifying a location for backup of SBN data
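The data-taking assumptions above can be turned into back-of-envelope numbers. The sketch below is a hedged illustration, not the model behind the resource request: it assumes 30-day months for commissioning, uniform data-taking over a 365.25-day year, and decimal units (1 PB = 10^9 MB), and it uses SBND's inputs (75M events per year, 32.5 MB/ev, 60 + 30 s/ev of reconstruction). The function names are made up for the example.

```python
# Back-of-envelope raw-data and keep-up projections (illustrative sketch).
# Assumptions: 30-day months, uniform data-taking over a 365.25-day year,
# decimal units (1 PB = 1e9 MB). Function names are hypothetical.

SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def commissioning_events(rates_hz=(5.0, 3.0, 2.0), days_per_step=30):
    """Events collected over consecutive one-month commissioning steps."""
    return sum(rate * days_per_step * SECONDS_PER_DAY for rate in rates_hz)


def raw_volume_pb(n_events, mb_per_event):
    """Raw-data volume in petabytes."""
    return n_events * mb_per_event / 1e9


def keepup_slots(events_per_year, cpu_s_per_event):
    """Average number of continuously busy slots needed to keep pace with data-taking."""
    return events_per_year * cpu_s_per_event / SECONDS_PER_YEAR


# SBND inputs from the slides
sbnd_commissioning = commissioning_events()
# Half of the commissioning data is eventually retired
sbnd_commissioning_kept_pb = 0.5 * raw_volume_pb(sbnd_commissioning, 32.5)
sbnd_raw_per_year_pb = raw_volume_pb(75e6, 32.5)
sbnd_slots = keepup_slots(75e6, 60 + 30)

print(f"SBND commissioning events:       {sbnd_commissioning:.3g}")
print(f"SBND commissioning raw kept:     {sbnd_commissioning_kept_pb:.2f} PB")
print(f"SBND steady raw data per year:   {sbnd_raw_per_year_pb:.2f} PB")
print(f"SBND keep-up slots:              {sbnd_slots:.0f}")
```

With these assumptions the SBND keep-up estimate comes out at ~214 slots, close to the ~215 quoted. Applying the same simple formula to ICARUS (22M events per year at 220 s/ev) gives roughly 150 slots, well below the ~435 quoted, so the ICARUS figure presumably folds in factors not stated here.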

Detector data diagram (ICARUS Example)

 (courtesy M. Wospakrik)

Computing Model Projections: Production Campaigns
• We assume one major campaign in 2021, and two per year after that
• We assume simulation datasets with 10x the collected data statistics for BNB, and 5x for
 ICARUS NuMI
 – Assume additional samples for systematics variations, BSM physics signals,
 and cosmics simulation
• We assume we will move to ‘data overlay’ in simulation to have data-
 driven model of cosmic-ray backgrounds and noise
 – Use ‘off-beam’ events as underlying event, and ‘overlay’ neutrino simulation
 on top of them
 – Converts a high-memory/high-CPU problem into a data I/O problem, which has
 implications for processing data elsewhere
 • Plan to investigate ways in which we can improve efficiency of overlay procedures
• We assume we reprocess all available physics data from raw
• Keep only final reco output to tape
 – Keep half of produced datasets for one year, and the other half for two
 – Produce and keep slim analysis-level tuples (more) permanently
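The overlay idea can be sketched in a few lines. This is a hypothetical illustration, not the SBN implementation. It assumes that each event is a mapping from channel number to a list of ADC samples, that the simulated waveforms contain only pedestal-subtracted neutrino signal, and that the digitizer has a 12-bit range. The names `overlay`, `ADC_MIN` and `ADC_MAX` are made up for the example.

```python
# Minimal sketch of "data overlay": simulated neutrino signal is added on top of
# an off-beam data event, so cosmic-ray backgrounds and noise come from real data.
# Assumptions (hypothetical): per-channel lists of ADC counts, simulated waveforms
# are pedestal-subtracted signal only, 12-bit ADC range.

from typing import Dict, List

Waveforms = Dict[int, List[int]]  # channel id -> ADC samples

ADC_MIN, ADC_MAX = 0, 4095  # hypothetical 12-bit digitizer range


def overlay(data_event: Waveforms, sim_signal: Waveforms) -> Waveforms:
    """Return a new event: data waveforms plus simulated signal, clipped to the ADC range.

    Channels present only in the simulation are ignored in this sketch.
    """
    merged: Waveforms = {}
    for channel, data_wf in data_event.items():
        sim_wf = sim_signal.get(channel)
        if sim_wf is None:
            # No simulated activity on this channel: copy the data unchanged
            merged[channel] = list(data_wf)
            continue
        if len(sim_wf) != len(data_wf):
            raise ValueError(f"channel {channel}: waveform length mismatch")
        # Sample-by-sample sum, clipped to what the digitizer could record
        merged[channel] = [
            min(ADC_MAX, max(ADC_MIN, d + s)) for d, s in zip(data_wf, sim_wf)
        ]
    return merged


off_beam = {0: [2048, 2050, 2047], 1: [400, 401, 399]}
neutrino = {0: [0, 120, 30]}
print(overlay(off_beam, neutrino))  # {0: [2048, 2170, 2077], 1: [400, 401, 399]}
```

Because every simulated event needs one off-beam data event read back from storage, the cost shifts from CPU and memory toward I/O, which is the trade-off noted above.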

Computing Model Projections: Input Assumptions
• ICARUS
 – “Stage 0” (up to hit-finding): 120 s/ev, 50 MB/ev, 4 GB mem
 • Drop raw data
 – “Stage 1” (up to 3D reco): 100 s/ev, 45 MB/ev, 4 GB mem
 • Drop wires
 – Sim: 240 s/ev for events with cosmics (6 GB mem), 50 s/ev for
 nu-only/overlay (3 GB mem)
 • Final simulation files: 60 MB/event
• SBND
 – Up to hit-finding: 60 s/ev, 2.7 MB/ev, 2 GB mem
 – Up to 3D reco: 30 s/ev, 4.2 MB/ev, 2 GB mem
 – Sim: 240 s/ev for events with cosmics (3 GB mem), 20 s/ev for
 nu-only/overlay (2 GB mem)
 • Final simulation files: 6.2 MB/event
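The per-event inputs above translate into campaign-level CPU, slot, and output-volume estimates. The sketch below assumes that simulation and reconstruction stage times simply add, and it ignores job overheads and failures. It uses a hypothetical SBND overlay sample of 10M events. The 90-day window matches the campaign length currently assumed in the model.

```python
# Rough sizing of a production campaign from per-event costs (illustrative sketch).
# Assumptions: stage times add linearly, no job overheads or failures,
# hypothetical sample size; 90 days matches the assumed campaign length.

SECONDS_PER_DAY = 86_400


def campaign_cpu_hours(n_events: float, s_per_event: float) -> float:
    """Total CPU hours for the campaign."""
    return n_events * s_per_event / 3600


def slots_needed(n_events: float, s_per_event: float, days: float = 90) -> float:
    """Average number of busy slots to finish the campaign within `days`."""
    return n_events * s_per_event / (days * SECONDS_PER_DAY)


def output_tb(n_events: float, mb_per_event: float) -> float:
    """Final output volume in TB (1 TB = 1e6 MB)."""
    return n_events * mb_per_event / 1e6


# SBND overlay chain: 20 s/ev sim + 60 s/ev to hit-finding + 30 s/ev to 3D reco
sbnd_s_per_event = 20 + 60 + 30
n_events = 10e6  # hypothetical sample size

print(f"CPU hours:   {campaign_cpu_hours(n_events, sbnd_s_per_event):,.0f}")
print(f"Slots (90d): {slots_needed(n_events, sbnd_s_per_event):.0f}")
print(f"Output:      {output_tb(n_events, 6.2):.0f} TB")
```

For this hypothetical sample the sketch gives about 306,000 CPU hours, roughly 141 continuously busy slots over 90 days, and about 62 TB of final simulation output.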

Big data still a big (biggest?) problem
• Tape usage estimates are dramatically lower than in previous years due to work on data-size
 reductions
 – BIG gains on SBND, which matter a lot due to the number of simulation events
 – Order-of-magnitude gains are harder for ICARUS, but we are working on it
 – Will require careful tracking to stick to these numbers, and further data management
 and workflow optimizations
• We calculate peak data read+write from tape at ~8.3 TB/hr → it is not known whether this is truly
 achievable from tape, and it will represent a significant challenge
 – Driven by large raw data sizes, and use of raw data in simulation
 – We currently assume 90-day campaigns → a lower available I/O rate would
 immediately imply longer campaigns
 – We are definitely looking at ways to improve
 • Significant gains if we can ‘freeze’ up to hit-finding, and re-run productions from
 there
 • Write less-permanent files (e.g. production output to be retired within a year) to disk
 rather than tape
 • Improvements in workflows and algorithms to reduce data I/O
• We are planning to use Rucio for automated data management to set rules for data transfers
 and help enforce lifetimes
 – Critically need SCD support for this
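Because campaign length is tied to tape throughput, a lower achievable I/O rate stretches campaigns proportionally. The sketch below treats the ~8.3 TB/hr peak as if it were sustained over a 90-day campaign, which is an upper-bound simplification. It compares that rate with a hypothetical 5 TB/hr ceiling.

```python
# How campaign length scales with achievable tape throughput (illustrative sketch).
# Assumptions: the peak rate is treated as sustained for 90 days to size the
# total volume (an upper bound); 5 TB/hr is a hypothetical lower ceiling.


def volume_tb(rate_tb_per_hr: float, days: float) -> float:
    """Total volume moved at a constant rate over `days`."""
    return rate_tb_per_hr * 24 * days


def campaign_days(total_tb: float, rate_tb_per_hr: float) -> float:
    """Days needed to move `total_tb` at a constant rate."""
    return total_tb / (rate_tb_per_hr * 24)


def tb_per_hr_to_gb_per_s(rate_tb_per_hr: float) -> float:
    """Unit conversion using decimal units (1 TB = 1000 GB)."""
    return rate_tb_per_hr * 1000 / 3600


total = volume_tb(8.3, 90)
print(f"Volume moved in 90 days at 8.3 TB/hr: {total:,.0f} TB")
print(f"8.3 TB/hr = {tb_per_hr_to_gb_per_s(8.3):.2f} GB/s")
for rate in (8.3, 5.0):
    print(f"{rate} TB/hr -> {campaign_days(total, rate):.0f} days")
```

Under these assumptions 8.3 TB/hr corresponds to about 2.3 GB/s, and the same volume at 5 TB/hr would take about 149 days instead of 90.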
Note on data lifetimes
• Our computing model takes into account data lifetimes
 – Keep all raw data forever
 • Working out offsite replication plans now
 – Keep data from production campaigns for ~2 years
 – Final analysis data (‘skims’) used directly in published
 results are kept forever, but will be much smaller
 • Still in development, but expect
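The lifetime rules can be captured as a small policy table, for example as input to automated data management. The category names below are hypothetical. The lifetimes follow the assumptions stated in this model: raw data and published skims are kept forever, keep-up reco is deleted after a year, and production output is split between one-year and two-year retention. The sketch returns lifetimes in seconds, with `None` meaning no expiry, because Rucio replication-rule lifetimes are specified in seconds. How SBN will actually configure Rucio is not specified here.

```python
# Sketch of the data-lifetime policy as a lookup table (illustrative only).
# Category names are hypothetical; None means "keep forever".
# Years are taken as 365 days for simplicity.

from datetime import datetime, timedelta
from typing import Optional

DAYS_PER_YEAR = 365

LIFETIME_YEARS = {
    "raw": None,              # keep all raw data forever
    "keepup_reco": 1,         # final-stage keep-up reco, deleted after a year
    "production_short": 1,    # half of produced datasets: one year
    "production_long": 2,     # other half: two years
    "analysis_skim": None,    # skims used in published results: forever
}


def lifetime_seconds(category: str) -> Optional[int]:
    """Lifetime in seconds, or None for data kept forever."""
    years = LIFETIME_YEARS[category]
    if years is None:
        return None
    return int(timedelta(days=years * DAYS_PER_YEAR).total_seconds())


def expiry(category: str, created: datetime) -> Optional[datetime]:
    """Date after which a dataset of this category may be deleted."""
    seconds = lifetime_seconds(category)
    return None if seconds is None else created + timedelta(seconds=seconds)


created = datetime(2021, 10, 1)
for cat in LIFETIME_YEARS:
    print(cat, lifetime_seconds(cat), expiry(cat, created))
```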
General notes
• SBN currently at a critical time with commissioning and transition to
 operations of the first detector (ICARUS)

• Developments with real data will very likely lead to increased sophistication
 of algorithms, which we hope will be balanced by continued improvements
 in algorithm optimization

• A major new focus is optimizing production workflows, particularly
 for the use of HPC in production
 – Will need to work with collaboration and labs on access to HPC for
 production

• Active effort is underway to update and improve our computing model, to give
 more visibility into likely bottlenecks/problems as we develop new
 workflows

Final note on looking forward
• Our computing model (and the resource requests/predictions that
 come from it) is rather static and operates within the constraints of
 what we (SBN) think is possible for large-scale production
 – Limited by our knowledge of what is/isn’t possible with current systems
 – Limited by our knowledge of what would/wouldn’t be possible with
 strategic improvements in computing infrastructure
• We hope that updating our computing model will make it easier
 to …
 – Clarify bottlenecks and identify impactful changes in CPU, memory,
 and data efficiency
 – Add and evaluate impact of new workflows
• Well before next year's FCRSG presentation, we suggest a pre-review of
 our model and resource usage, so that it better reflects strategic
 planning for computing at FNAL and in broader HEP
 – Likely requires a detailed back-and-forth between SBN and SCD on
 how to approach optimization with respect to available resources

Backup
