DUNE FCRSG 2021 - INDICO-FNAL
Fermilab Computing Resource Scrutiny Group
• Modeled on CERN CRSG
• Semi-external committee that looks at the computing requests from experiments, but also the Fermilab plans to address those requests
• Goal: evaluate usage requests, plan for future of computing at Fermilab
  – Not limited to M&S – SWF is the largest portion of our budget
  – Not limited to Computing and Detector Operations
    • Note, however, that CMS computing is separately reviewed
  – Not including scientific software development
• Time frame: next year + experiment-specific horizon
• Scrutiny should focus on incremental costs (SWF + M&S)
  – Including custom solutions in software, etc.
Outcome
• Several questions from Brian Bockelman about network resource needs
  – Currently have coverage for the SURF -> FNAL plans
  – Asked to expand the planning of network resources to include production processing and data distribution
• Should expand planning to include tape/disk read/write rates to understand potential needs for drives/"spindles"
• Initial response from the committee was very positive
• Expect a close-out report this week
• Projections also presented to the RRB by Peter Clarke
• Want to document the assumptions and computing model – updated annually
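The drive/"spindle" sizing the committee asked for comes down to dividing an aggregate read/write rate by realistic per-drive throughput. A minimal sketch of that estimate; the per-drive rate, duty-cycle efficiency, and the example aggregate rate are all illustrative assumptions, not DUNE figures:

```python
import math

def drives_needed(aggregate_mb_s, drive_mb_s=300, efficiency=0.7):
    """Tape drives needed to sustain an aggregate transfer rate.

    drive_mb_s: assumed per-drive streaming rate (illustrative).
    efficiency: assumed duty cycle covering mounts/seeks (illustrative).
    """
    if aggregate_mb_s <= 0:
        return 0
    return math.ceil(aggregate_mb_s / (drive_mb_s * efficiency))

# Illustrative: a 2 GB/s aggregate write rate under these assumptions
print(drives_needed(2000))  # -> 10
```

The same shape of estimate applies to disk "spindles" for production read rates, with the per-device rate swapped accordingly.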
Important Dates to Remember (2021 Q1 – 2023 Q4 timeline)
• ProtoDUNE SP Prod 4
• ProtoDUNE SP Prod 5
• DUNE Computing CDR
• ProtoDUNE HD Prod 1
• ProtoDUNE II Horizontal Drift operations (likely peak)
• ProtoDUNE II Vertical Drift operations (likely peak)
CPU and Memory Efficiency Over the Last Year
• CPU efficiency: getting better
• Memory efficiency: slight improvement
Memory Footprint & Efficiency Over the Last Year
• Efficiency: getting a little better
• Envelope: increasing slightly
Memory Efficiency, Production vs. Analysis, Over the Last Year (plots: Production, Analysis)
Memory Footprint, Production vs. Analysis, Over the Last Year (plots: Production, Analysis)
CPU - Prediction Going Forward and Accuracy of Your Predictions
[units of million (1 CPU, 2 GB) wall hours per CY]

Requested (2018 → 2023): 25 (FNAL); 29 (FNAL); 30 (FNAL); 30 FNAL (36 Total); 40 Total; 64 Total; 73 Total
Actual used: 2018: 33.29; 2019: 29.15 (GPGrid); 2020: 42.7 (WLCG+GPGrid) + 3.85 (NERSC); 2021–2023: N/A
Efficiency: 91% (2019); other years N/A
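One way to read the "accuracy" comparison above is the ratio of actual to requested wall hours. A minimal sketch using the 2018 figures from the table (25 M hours requested at FNAL, 33.29 M used); the helper name is ours, not from the slide:

```python
def usage_ratio(requested_mhrs, used_mhrs):
    """Ratio of actual to requested wall hours (both in millions)."""
    return used_mhrs / requested_mhrs

# 2018: 25 M hours requested (FNAL), 33.29 M actually used
ratio_2018 = usage_ratio(25, 33.29)
print(f"2018: used {ratio_2018:.0%} of the requested hours")  # -> 133%
```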
CPU Adaptations Going Forward
• How can the experiment use OSG/HPC/Cloud/HEPCloud going forward?
• For cloud resources, please describe funding mechanisms.
• For HPC, please describe plans for acquiring allocations.
• Do you anticipate using heterogeneous computing resources, e.g., GPUs?
Disk: dCache Usage and Predictions (in TB)

                   Dedicated (persistent)   Analysis/other (write)
Current (actual)   396 TB (max)             2430 TB, plus 585 TB scratch
2021               600 TB                   5300 TB
2022               800 TB                   9800 TB
2023               800 TB                   9200 TB

Totals: r/w (tape-backed) 6264 TB; scratch 2333 TB; persistent 2576 TB; other 2131 TB
Will not track cache usage, but need to know of unusual requests.
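The four pool totals quoted on the slide can be cross-checked by summing them into the overall dCache footprint; a quick sketch:

```python
# dCache pool categories and their quoted totals (TB), from the slide
pools = {
    "r/w (tape-backed)": 6264,
    "scratch": 2333,
    "persistent": 2576,
    "other": 2131,
}

total_tb = sum(pools.values())
print(f"Total dCache footprint: {total_tb} TB ({total_tb / 1000:.1f} PB)")
# -> Total dCache footprint: 13304 TB (13.3 PB)
```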
Tape - Usage and Predictions (in TB)

Total added by end of year:
• At end 2020: 12060 TB total (2900 TB added)
• 2021: 3000 TB added
• 2022: 9200 TB added
• 2023: 6600 TB added

All data needs to be migrated except for 95 TB of LBNE data.
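The per-year additions imply a cumulative tape footprint; a small sketch that rolls the slide's numbers forward from the end-2020 total:

```python
# End-2020 total and per-year additions (TB), from the table above
total = 12060
added = {2021: 3000, 2022: 9200, 2023: 6600}

for year in sorted(added):
    total += added[year]
    print(f"End of {year}: {total} TB")
# -> reaches 30860 TB (~31 PB) at the end of 2023
```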
Disk: NAS Usage and Predictions (in TB)

       App   Data
2021   15    70
2022   17    80
2023   19    90
Age of files in NAS (plot)
Data Lifetimes
• Keep two copies of raw data on tape
• 1 copy of "test" data for 6 months
• 1+ copy of reco/sim on tape
  – Currently assume 1 reco and 1 sim pass per year, but reco passes reprocess previous years' data for each reconstruction data set
• Keep 2 disk copies of reco and sim
  – Assume reco/sim resident on disk for 2 years
  – Impose shorter lifetimes on tests and intermediate sim steps
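The disk-lifetime assumptions above (2 disk copies of reco/sim, 2-year residency, 1 pass per year) translate into a steady-state disk volume. A sketch of that arithmetic; the pass size used in the example is an illustrative placeholder, not a number from the slide:

```python
DISK_COPIES = 2       # 2 disk copies of reco and sim (from the slide)
RESIDENCY_YEARS = 2   # reco/sim resident on disk for 2 years (from the slide)
PASSES_PER_YEAR = 1   # 1 reco and 1 sim pass per year (from the slide)

def steady_state_disk_tb(pass_size_tb):
    """Disk resident at steady state: copies x passes kept resident at once."""
    return DISK_COPIES * RESIDENCY_YEARS * PASSES_PER_YEAR * pass_size_tb

# Illustrative: a 500 TB reco pass implies 2000 TB resident on disk
print(steady_state_disk_tb(500))  # -> 2000
```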
What Do You Want to Achieve in Computing Over the Next Three Years
• Demonstration of computing model and data/event model with ProtoDUNE II
  – Experiment: engagement with international sites and WLCG, accounting and integration (CRIC, EGI tools, etc.)
  – SCD: Global Pool, Auth efforts, GlideInWMS, HEPCloud, networking, etc.
• Fully integrated Data Management Infrastructure
  – Experiment: integration of RSEs, requirements for metadata, discovery
  – SCD: Rucio development, data workflow mgmt., storage R&D
• Transition analyzers to POMS
  – Joint effort (experiment and SCD)
• Unified Framework
  – Experiment: complete task force doc, organize
  – SCD: SCD personnel
Anything else?