DUNE FCRSG 2021 - INDICO-FNAL

Page created by Ivan Hodges
 
DUNE FCRSG 2021

Fermilab Computing Resource Scrutiny Group

    •   Modeled on CERN CRSG
    •   Semi-external committee that looks at the computing requests from
        experiments, and also at the Fermilab plans to address those requests

    •   Goal: evaluate usage requests, plan for future of computing at Fermilab
         – Not limited to M&S – SWF is the largest portion of our budget
         – Not limited to Computing and Detector Operations
    •   Note, however, that CMS computing is separately reviewed
         – Not including Scientific Software development
    •   Time frame: next year + experiment-specific horizon
    •   Scrutiny should focus on incremental costs (SWF + M&S)
         – Including custom solutions in software, etc.
DUNE Computing Model

        [Figure: Heidi’s slide]
DUNE Resource Request in tabular form

Outcome

    •   Several questions from Brian Bockelman about network resource needs
         – Currently have coverage for the SURF -> FNAL plans
         – Asked to expand the planning of network resources to include production
              processing and data distribution
    •   Should expand planning to include tape/disk read/write rates to understand
        potential needs for drives/"spindles"
    •   Initial response from the committee was very positive
    •   Expect a close-out report this week
    •   Projections also presented to the RRB by Peter Clarke
    •   Want to document the assumptions and computing model – updated annually

DUNE Organization Chart for Offline Computing

Important Dates to Remember

    [Timeline figure, 2021 Q1 through 2023 Q4, showing:]
    •   pDUNE SP Prod4
    •   ProtoDUNE SP Prod 5
    •   DUNE Computing CDR
    •   ProtoDUNE HD Prod 1
    •   ProtoDUNE II Horizontal Drift Operations
    •   ProtoDUNE II Vertical Drift Operations
    •   Two periods marked "Likely peak"

CPU - Experiment Usage Over the Last Year

    Equipped to exploit HEPCloud elasticity? YES!

CPU and Memory Efficiency Over the Last Year

    •   CPU efficiency getting better
    •   Memory efficiency: slight improvement

CPU Efficiency Comb vs Ana Over the Last Year

       Combined

       Analysis

Memory Footprint & Efficiency Over the Last Year

    •   Efficiency getting a little better
    •   Envelope increasing slightly

Memory Efficiency Prod vs Ana Over the Last Year

       Production

        Analysis

Memory Footprint Prod vs Ana Over the Last Year

       Production

        Analysis

CPU - Prediction Going Forward and Accuracy of Your Predictions
[units of million (1 CPU, 2 GB) wall hours per CY]

                2018    2019    2020                  2021       2022       2023

  Requested                     25 (FNAL)             29 FNAL    30 FNAL    30 FNAL
                                (36 Total)            40 Total   64 Total   73 Total

  Actual Used           33.29   29.15 (GPGrid)        N/A        N/A        N/A
                                42.7 (WLCG+GPGrid)
                                3.85 (NERSC)

  Efficiency    %       91%     %                     N/A        N/A        N/A
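
As a rough check on request accuracy, the 2020 column of the table above can be turned into used/requested ratios. The sketch below is illustrative only. The slide does not say how the usage rows map onto the request rows, so the pairing of GPGrid usage with the FNAL request, and of WLCG+GPGrid usage with the Total request, is an assumption. The names `PAIRING` and `used_over_requested` are hypothetical.

```python
# Values copied from the 2020 column of the CPU table,
# in million (1 CPU, 2 GB) wall hours.
requested_2020 = {"FNAL": 25.0, "Total": 36.0}
used_2020 = {"GPGrid": 29.15, "WLCG+GPGrid": 42.7, "NERSC": 3.85}

# ASSUMPTION (not stated on the slide): which usage figure is
# compared against which request figure.
PAIRING = {"FNAL": "GPGrid", "Total": "WLCG+GPGrid"}


def used_over_requested(requested, used, pairing):
    """Return used/requested for each request category."""
    return {req: used[pairing[req]] / requested[req] for req in requested}


ratios = used_over_requested(requested_2020, used_2020, PAIRING)
for name, ratio in ratios.items():
    print(f"{name}: used/requested = {ratio:.2f}")
```

Under that pairing, 2020 usage exceeded the request in both categories, by roughly 17% for FNAL and 19% for the total.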

CPU Adaptations Going Forward
How can the experiment use OSG/HPC/Cloud/HEPCloud going forward?

        For cloud resources, please describe funding mechanisms.

        For HPC, please describe plans for acquiring allocations.

Do you anticipate using heterogeneous computing resources, e.g., GPUs?

Disk: dCache Usage and Predictions (in TB)

                 Analysis        Other Dedicated
                 (Persistent)    (Write)

  Current        396 TB          2430 TB (max)
                 (actual)        585 TB (scratch)

  2021           600 TB          5300 TB

  2022           800 TB          9800 TB

  2023           800 TB          9200 TB

 Total r/w (tape backed): 6264 TB
 Total scratch: 2333 TB
 Total persistent: 2576 TB
 Total other: 2131 TB

 Will not track cache usage, but need to know of unusual requests
Tape - Usage and Predictions (in TB)

                    Total Added By End of Year

  At end 2020       12060 TB (total)
                    2900 TB (added)

  2021              3000 TB (added)

  2022              9200 TB (added)

  2023              6600 TB (added)

  All needs to be migrated except for 95 TB of LBNE data
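
The yearly additions above imply a running total on tape. A minimal sketch of that arithmetic follows. It assumes the 12060 TB total at the end of 2020 already includes the 2900 TB added that year and that nothing is deleted afterwards; neither is stated on the slide. The function name `cumulative_tape_tb` is hypothetical.

```python
# Figures copied from the tape table (TB).
TOTAL_END_2020_TB = 12060
ADDED_TB = {2021: 3000, 2022: 9200, 2023: 6600}


def cumulative_tape_tb(start_tb, added_by_year):
    """Running end-of-year tape total, assuming no deletions."""
    totals = {}
    running = start_tb
    for year in sorted(added_by_year):
        running += added_by_year[year]
        totals[year] = running
    return totals


projection = cumulative_tape_tb(TOTAL_END_2020_TB, ADDED_TB)
for year, total in projection.items():
    print(f"End of {year}: {total} TB")
```

Under these assumptions the total reaches 15060 TB at the end of 2021, 24260 TB at the end of 2022, and 30860 TB at the end of 2023.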

Disk: NAS Usage and Predictions (in TB Units)

                                          App   Data

                                   2021   15    70

                                   2022   17    80

                                   2023   19    90

Age of files in NAS

Data Lifetimes
Keep two copies of raw data on tape

1 copy of "test" data for 6 months
1+ copy of reco/sim on tape
         currently assume 1 reco and 1 sim pass per year, but each reco pass also
         goes over the data sets from previous years

Keep 2 disk copies of reco and sim
        assume reco/sim resident on disk for 2 years
        impose shorter lifetimes on tests and intermediate sim steps
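
The retention rules above can be sketched as a small storage estimate. This is one possible reading, not the DUNE model itself: it takes the minimum of one tape copy for reco/sim, treats each yearly reco pass as rewriting the reco output for the current and all previous years, and ignores "test" data and intermediate sim steps. All volumes in `plan` are hypothetical placeholders, as are the class and function names.

```python
from dataclasses import dataclass


@dataclass
class YearlyVolume:
    raw_tb: float   # raw data recorded in the year
    reco_tb: float  # output of one reco pass over that year's raw data
    sim_tb: float   # simulation output produced in the year


RAW_TAPE_COPIES = 2        # "Keep two copies of raw data on tape"
DERIVED_TAPE_COPIES = 1    # "1+ copy of reco/sim on tape" (minimum taken)
DISK_COPIES = 2            # "Keep 2 disk copies of reco and sim"
DISK_RESIDENCY_YEARS = 2   # "resident on disk for 2 years"


def reco_written_tb(years, i):
    """Reco output written in year i: the pass covers years 0..i."""
    return sum(y.reco_tb for y in years[: i + 1])


def tape_added_tb(years, i):
    """Tape volume added in year i under the rules above."""
    derived = reco_written_tb(years, i) + years[i].sim_tb
    return RAW_TAPE_COPIES * years[i].raw_tb + DERIVED_TAPE_COPIES * derived


def disk_resident_tb(years, i):
    """Disk volume held at the end of year i (reco and sim only)."""
    first = max(0, i - DISK_RESIDENCY_YEARS + 1)
    derived = sum(
        reco_written_tb(years, j) + years[j].sim_tb for j in range(first, i + 1)
    )
    return DISK_COPIES * derived


# Hypothetical volumes in TB, for illustration only.
plan = [YearlyVolume(100, 20, 30), YearlyVolume(200, 40, 50)]
for i in range(len(plan)):
    print(i, tape_added_tb(plan, i), disk_resident_tb(plan, i))
```

Because each reco pass revisits earlier years, the reco volume written per year grows with the accumulated data set even when the yearly raw volume is flat.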

What Do You Want to Achieve in Computing Over the Next Three Years

    •   Goal: Demonstration of computing model and data event model with ProtoDUNE II
         – Where does the experiment need to contribute: engagement with Inter. Sites
              and WLCG, accounting and int (CRIC, EGI tools, etc.)
         – Where does SCD need to contribute: Global Pool, Auth efforts, GlideInWMS,
              HEPCloud, networking, etc.
    •   Goal: Fully integrated Data Management Infrastructure
         – Where does the experiment need to contribute: Integration of RSEs,
              req. for metadata
         – Where does SCD need to contribute: Rucio development, data discovery,
              workflow mgmt., storage R&D
    •   Goal: Transition analyzers to POMS
         – Where does the experiment need to contribute: Joint effort
         – Where does SCD need to contribute: Joint effort
    •   Goal: Unified Framework
         – Where does the experiment need to contribute: Complete task force doc,
              organize
         – Where does SCD need to contribute: SCD personnel
Anything else?
