SBN FCRSG 2021: Computing Model - Wesley Ketchum and Joseph Zennamo - Fermilab
SBN Program
• Three LArTPCs (SBND, MicroBooNE, and ICARUS) on the BNB
  – ICARUS: 22M data events per year, 600k neutrinos detected per year
  – SBND: 75M data events per year, 2M neutrinos detected per year
• Physics goals
  – Sterile neutrino searches via νe appearance and νμ disappearance measurements
  – ν-Ar cross section measurements (from BNB and NuMI beams)
  – Beyond Standard Model particle searches
  – LArTPC detector R&D and detector properties measurements
29 March 2021 SBN FCRSG 2021
SBN Collaboration
• For purposes here, SBN really refers to ICARUS (far detector) and SBND (near detector)
  – Focus is on completing the construction and commissioning of these detectors, and preparing for joint physics results
  – MicroBooNE is at a very different, steadier stage in executing its physics program
    • Future joint analyses with MicroBooNE data are envisaged
• ICARUS and SBND collaboration structures exist, but there is also a joint SBN collaboration, and many common SBN groups
  – SBN Analysis Group
  – SBN DAQ and Data pre-processing group (“Common Online”)
  – SBN Analysis Infrastructure Group
Organization for Offline Computing
• ICARUS
  – Sim and Reco Software: Tracy Usher and Daniele Gibin
  – Release Management: Tracy Usher
  – Production Management: Maya Wospakrik
• SBND
  – Physics and Analysis Tools: Costas Andreopoulos, Bill Louis, and Andrzej Szelc
  – Release Management: Patrick Green
  – Production Management: Mateus Carneiro
• Each experiment has an existing software and computing structure
• Common SBN Analysis Infrastructure group to organize common needs and efforts
Data, Computing, and Software Coordination
• The SBN program has agreed on a “Statement of Principles for Data Sharing, Analyzing, and Publication”
• This includes (and I paraphrase…)
  – A common, agreed-upon strategy for data-taking
  – Prompt availability of data with equal access for all SBN members
  – Software tools for data analysis to be fully shared within SBN
• The SBN Institutional Board has recently commissioned a committee to develop a formal policy and define responsibilities for dataset and computing coordination
Overview Timeline
[Timeline chart by quarter, 2021 to 2023. Rows: ICARUS Commissioning/First Neutrinos; ICARUS Full Operations; SBND Commissioning; SBND Full Operations; ICARUS Keep-up processing; SBND data processing; ICARUS + SBN Joint Analysis; Software Release Preparation; Production Run; Software Release Preparation; ICARUS + SBND Production Runs]
• ICARUS commissioning is in its last stages, with first neutrino data arriving ~now and the full detector ready for physics in fall 2021
• SBND timelines are offset by ~1 year from ICARUS → first physics data in 2022-2023
• Major productions are anticipated to run from fall through spring
  – 2021-2022: supporting ICARUS first data/analyses, and large-scale analysis combination tests with SBND simulation
  – 2022-2023: supporting first SBND data/analyses, ICARUS 2nd-generation analyses, and first joint analyses
Computing Model: General Notes
• The baseline assumption is that computing and data storage are centered at Fermilab
  – We do use common job submission resources, and opportunistically use OSG/additional resources
  – We are actively developing and testing workflows for use of HPC resources
    • ICARUS is working with SciDAC projects on algorithm optimization and on raw data processing and calibration workflows
    • SBND demonstrated large-scale production of simulation and reconstruction on ALCF (Theta) (250K events!)
  – …but these HPC workflows are not yet incorporated into a full computing model
Computing Model Projections: Data-Taking
• Raw data
  – Assume ICARUS and SBND have commissioning periods of ~5 Hz for 1 month, 3 Hz for 1 month, and 2 Hz trigger rate for 1 month, before settling into “steady” physics operations
    • ICARUS: 0.8 Hz on-beam trigger rate, 0.5 Hz off-beam
    • SBND: 2.6 Hz on-beam trigger rate, 1.8 Hz off-beam
  – Half of the ‘commissioning’ data is eventually retired; all other raw data are stored
    • ICARUS: 170 MB/ev
    • SBND: 32.5 MB/ev
• “Keep-up” processing
  – Assume we want to run reco on all events as they are taken
    • ICARUS: 120 s/ev for the first stage, 100 s/ev for the second, 4 GB mem
      – ~435 slots during normal operations
    • SBND: 60 s/ev for the first stage, 30 s/ev for the second, 2 GB mem
      – ~215 slots during normal operations
  – Assume we put only final-stage reco files to tape, and delete them after a year
• Data transfers
  – Assume all ICARUS raw BNB data are sent to CNAF
    • Total expected volume ~25-50 MB/s
  – We are identifying a location for backup of SBN data
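The keep-up numbers above are back-of-the-envelope products: by Little's law, the number of busy slots is the event rate times the CPU time per event, and the yearly raw volume is the event count times the per-event size. The sketch below encodes those two products. The helper names and the `efficiency` factor are hypothetical; the quoted ~435 and ~215 slots fold in assumptions the slide does not list (for example live time or CPU efficiency), so this sketch does not try to reproduce them.

```python
"""Back-of-the-envelope keep-up and raw-storage estimates (illustrative sketch)."""

# Per-event CPU for the two keep-up stages, taken from the slide.
ICARUS_CPU_S = 120 + 100  # first + second stage, seconds per event
SBND_CPU_S = 60 + 30


def keepup_slots(rate_hz: float, cpu_s_per_event: float, efficiency: float = 1.0) -> float:
    """Steady-state job slots needed to keep pace with data-taking.

    Little's law: concurrent work = arrival rate x time per item.
    `efficiency` is a hypothetical parameter (not from the slides): the fraction
    of wall time a slot spends on useful CPU work. Values < 1 inflate the count.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return rate_hz * cpu_s_per_event / efficiency


def raw_volume_pb(events_per_year: float, mb_per_event: float) -> float:
    """Raw data written per year in petabytes (decimal units: 1 PB = 1e9 MB)."""
    return events_per_year * mb_per_event / 1e9


if __name__ == "__main__":
    # Illustrative input: a hypothetical 1 Hz of triggers with ICARUS-like CPU cost.
    print(f"slots at 1 Hz, ICARUS CPU: {keepup_slots(1.0, ICARUS_CPU_S):.0f}")
    # Yearly raw volume from the events/year and MB/event figures in the talk.
    print(f"ICARUS raw: {raw_volume_pb(22e6, 170):.2f} PB/yr")
    print(f"SBND raw:   {raw_volume_pb(75e6, 32.5):.2f} PB/yr")
```

With the yearly event counts quoted for each detector, the simple product gives roughly 3.7 PB of ICARUS raw data and 2.4 PB of SBND raw data per year, ignoring the different commissioning-period rates.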
[Figure: Detector data diagram (ICARUS example), courtesy M. Wospakrik]
Computing Model Projections: Production Campaigns
• We assume one major campaign in 2021, and two per year after that
• We assume simulation datasets at 10x the collected data statistics for BNB, and 5x for ICARUS NuMI
  – Assume additional samples for systematics variations, BSM physics signals, and cosmics simulation
• We assume we will move to ‘data overlay’ in simulation, to have a data-driven model of cosmic-ray backgrounds and noise
  – Use ‘off-beam’ events as the underlying event, and ‘overlay’ neutrino simulation on top of them
  – This turns a high-memory/high-CPU problem into a data I/O problem, which has implications for processing data elsewhere
    • Plan to investigate ways to improve the efficiency of overlay procedures
• We assume we reprocess all available physics data from raw
• Keep only final reco output to tape
  – Keep half of produced datasets for one year, and the other half for two
  – Produce and keep slim analysis-level tuples (more) permanently
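The overlay idea above can be illustrated with a toy version: the recorded off-beam waveforms already carry real noise and cosmic-ray activity, so simulation only has to supply the neutrino signal, which is added tick by tick. This is a minimal sketch, assuming the simulated signal is already pedestal-subtracted and in ADC counts, and assuming a hypothetical 12-bit digitizer range; the function name and data layout are illustrative, and the production implementation in the experiments' software is more involved. It also shows why the problem becomes I/O-bound: every simulated event needs one recorded raw event read back from storage.

```python
from typing import List

# Hypothetical layout: waveforms[channel][tick] in ADC counts.
Waveforms = List[List[int]]


def overlay_event(data_adc: Waveforms, sim_signal_adc: Waveforms,
                  adc_max: int = 4095) -> Waveforms:
    """Add a simulated neutrino signal onto a recorded off-beam event.

    data_adc:       recorded raw waveforms (real noise + cosmics included).
    sim_signal_adc: noiseless simulated signal, pedestal-subtracted, same shape.
    adc_max:        assumed digitizer saturation value (12-bit, hypothetical).
    """
    if len(data_adc) != len(sim_signal_adc) or any(
            len(d) != len(s) for d, s in zip(data_adc, sim_signal_adc)):
        raise ValueError("data and simulation must cover the same channels and ticks")
    # Sum per tick, then clamp to the digitizer range as real electronics would.
    return [
        [min(max(d + s, 0), adc_max) for d, s in zip(data_ch, sim_ch)]
        for data_ch, sim_ch in zip(data_adc, sim_signal_adc)
    ]
```

Because noise and cosmics come from data, only the neutrino interaction is simulated. The trade-off is that each overlay job must stream a raw off-beam event, which is the data I/O cost noted on the slide.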
Computing Model Projections: Input Assumptions
• ICARUS
  – “Stage 0” (up to hit-finding): 120 s/ev, 50 MB/ev, 4 GB mem
    • Drops raw data
  – “Stage 1” (up to 3D reco): 100 s/ev, 45 MB/ev, 4 GB mem
    • Drops wire waveforms
  – Sim: 240 s/ev for events with cosmics (6 GB mem), 50 s/ev for nu-only/overlay (3 GB mem)
    • Final simulation files: 60 MB/event
• SBND
  – Up to hit-finding: 60 s/ev, 2.7 MB/ev, 2 GB mem
  – Up to 3D reco: 30 s/ev, 4.2 MB/ev, 2 GB mem
  – Sim: 240 s/ev for events with cosmics (3 GB mem), 20 s/ev for nu-only/overlay (2 GB mem)
    • Final simulation files: 6.2 MB/event
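These per-stage inputs turn into campaign-level requests by multiplying through by the number of events. The sketch below does that for the reconstruction stages listed above. The event count is a hypothetical input, the 90-day default mirrors the campaign length assumed elsewhere in the talk, and the average-slot figure assumes perfectly packed jobs over the whole campaign, so treat the results as lower bounds rather than a reproduction of the collaboration's model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Stage:
    """Per-event cost of one processing stage (numbers from the slide)."""
    name: str
    cpu_s: float   # CPU seconds per event
    out_mb: float  # output size per event, MB
    mem_gb: float  # memory per job, GB


ICARUS_STAGES = [
    Stage("stage 0 (up to hit-finding)", 120, 50, 4),
    Stage("stage 1 (up to 3D reco)", 100, 45, 4),
]
SBND_STAGES = [
    Stage("up to hit-finding", 60, 2.7, 2),
    Stage("up to 3D reco", 30, 4.2, 2),
]


def campaign_cost(n_events: float, stages: List[Stage],
                  days: float = 90.0) -> Dict[str, float]:
    """Core-hours, average concurrent slots, final output (TB), peak memory (GB).

    Only the last stage's output is counted, since only final reco output
    goes to tape; avg_slots assumes jobs are spread evenly over `days`.
    """
    if not stages:
        raise ValueError("need at least one stage")
    cpu_s = n_events * sum(s.cpu_s for s in stages)
    return {
        "core_hours": cpu_s / 3600,
        "avg_slots": cpu_s / (days * 86400),
        "final_output_tb": n_events * stages[-1].out_mb / 1e6,  # 1 TB = 1e6 MB
        "peak_mem_gb": max(s.mem_gb for s in stages),
    }
```

For example, `campaign_cost(1e6, ICARUS_STAGES)` costs a hypothetical one-million-event ICARUS reprocessing. Simulation can be costed the same way by adding a stage built from the Sim line (e.g. 240 s/ev and 6 GB for ICARUS events with cosmics).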
Big data still a big (biggest?) problem
• Tape usage estimates are dramatically lower than in previous years, thanks to work on data-size reductions
  – BIG gains on SBND, which matter a lot given the number of simulation events
  – Order-of-magnitude gains are harder for ICARUS, but we are working on it
  – Will require careful tracking to stick to these numbers, plus further data-management and workflow optimizations
• We calculate peak data read+write from tape at ~8.3 TB/hr → it is not known whether this is truly achievable from tape, and it will represent a significant challenge
  – Driven by large raw data sizes, and by the use of raw data in simulation
  – We currently assume 90-day campaigns → a lower available I/O rate would immediately imply longer campaigns
  – We are definitely looking at ways to improve
    • Significant gains if we can ‘freeze’ processing up to hit-finding, and re-run productions from there
    • Put less permanent files (e.g. production output to be retired within a year) on disk rather than tape
    • Improvements in workflows and algorithms to reduce data I/O
• We are planning to use Rucio for automated data management, to set rules for data transfers and help enforce lifetimes
  – We critically need SCD (Fermilab Scientific Computing Division) support for this
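The link between tape bandwidth and campaign length can be made explicit. The sketch below converts the ~8.3 TB/hr estimate into a sustained rate and scales the 90-day campaign by the ratio of required to available bandwidth. It assumes the campaign is I/O-bound, so that the same total volume must move regardless of rate; a higher available rate is assumed not to shorten the campaign below its baseline because other resources then become limiting. The function names are hypothetical.

```python
BASELINE_DAYS = 90.0       # assumed campaign length (from the slide)
REQUIRED_TB_PER_HR = 8.3   # estimated peak tape read+write (from the slide)


def tb_per_hr_to_gb_per_s(rate_tb_per_hr: float) -> float:
    """Convert TB/hr to GB/s using decimal units (1 TB = 1000 GB)."""
    return rate_tb_per_hr * 1000 / 3600


def campaign_days(available_tb_per_hr: float,
                  required_tb_per_hr: float = REQUIRED_TB_PER_HR,
                  baseline_days: float = BASELINE_DAYS) -> float:
    """Campaign length if tape I/O is the bottleneck.

    Duration scales as required/available rate when bandwidth is short;
    extra bandwidth is assumed not to shorten the campaign below baseline.
    """
    if available_tb_per_hr <= 0:
        raise ValueError("available rate must be positive")
    return baseline_days * max(1.0, required_tb_per_hr / available_tb_per_hr)
```

In these terms, 8.3 TB/hr is about 2.3 GB/s sustained, and delivering only half of it (4.15 TB/hr) would stretch a 90-day campaign to about 180 days.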
Note on data lifetimes
• Our computing model takes into account data lifetimes
  – Keep all raw data forever
    • Working out offsite replication plans now
  – Keep data from production campaigns for ~2 years
  – Keep the final analysis data (‘skims’) used directly in published results forever; these will be much smaller
    • Still in development, but expect
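The lifetime rules on this slide amount to a lookup from data tier to retention period. The sketch below writes that lookup in plain Python so expiry dates can be checked mechanically. The tier names are hypothetical labels, "~2 years" is approximated as 730 days, and none of this uses the Rucio API; in a Rucio deployment the same policy would instead be expressed through replication rules with lifetimes.

```python
from datetime import date, timedelta
from typing import Dict, Optional

# Retention policy sketched from the slide; tier names are hypothetical labels.
# None means "keep forever".
LIFETIMES: Dict[str, Optional[timedelta]] = {
    "raw": None,                            # all raw data kept forever
    "production": timedelta(days=2 * 365),  # campaign output, "~2 years"
    "analysis_skim": None,                  # skims used in published results
}


def expiry_date(tier: str, created: date) -> Optional[date]:
    """Date from which a file of this tier may be deleted, or None if never.

    Unknown tiers raise KeyError on purpose, so unclassified data is never
    silently given a lifetime.
    """
    lifetime = LIFETIMES[tier]
    return None if lifetime is None else created + lifetime


def is_expired(tier: str, created: date, today: date) -> bool:
    """True once a file has reached the end of its retention period."""
    expiry = expiry_date(tier, created)
    return expiry is not None and today >= expiry
```

For example, production output written on 29 March 2021 would become eligible for deletion on 29 March 2023, while raw data and published skims never expire.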
General notes
• SBN is currently at a critical time, with commissioning and the transition to operations of the first detector (ICARUS)
• Developments with real data will very likely lead to more sophisticated algorithms, which we hope will be balanced by continued improvements in algorithm optimization
• A major new focus is the optimization of production workflows, particularly the use of HPC in production
  – We will need to work with the collaboration and the labs on access to HPC for production
• Active effort towards updating and improving our computing model, to give more visibility into likely bottlenecks/problems as we develop new workflows
Final note on looking forward
• Our computing model (and the resource requests/predictions that come from it) is rather static, and operates within the constraints of what we (SBN) think is possible for large-scale production
  – Limited by our knowledge of what is/isn’t possible with current systems
  – Limited by our knowledge of what would/wouldn’t be possible with strategic improvements in computing infrastructure
• We hope that updating our computing model will make it easier to …
  – Clarify bottlenecks and identify impactful changes in CPU, memory, and data efficiency
  – Add new workflows and evaluate their impact
• Well before next year’s FCRSG presentation, we suggest a pre-review of our model and resource usage, so that it better reflects strategic planning for computing at FNAL and in broader HEP
  – This likely requires a detailed back-and-forth between SBN and SCD on how to approach optimization with respect to available resources