KHUK - DIGITALISATION PRIORITIES - ERUM-DATA COMMUNITY MEETING JAN 18/19 - ONLINE KILIAN SCHWARZ (GSI) - DESY INDICO

 
Komitee für Hadronen- und Kernphysik

 KHuK – digitalisation priorities

ErUM-Data Community meeting
Jan 18/19 - Online
Kilian Schwarz (GSI)
table of contents

 motivation
 KHuK digitalisation requirements
 ongoing and upcoming projects
 summary

Kilian Schwarz | ErUM-Data Community meeting | Online | 18-19 Jan 2021
ALICE online computing/CERN

increased requirements for ALICE@Run3

• a continuous, untriggered stream of data has to be distributed from about 250 First Level Processor nodes (FLPs) to about 1500 Event Processing Nodes (EPNs)
• TPC clusters not belonging to tracks are suppressed
• in the end about 90 GB/s are written to disk

[Figure: O2 data rates in GB/s for Pb-Pb @ 50 kHz; graphic from P. Buncic]
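The fan-out from FLPs to EPNs described on this slide can be illustrated with a small sketch. It assumes that the stream is cut into numbered time frames, that each FLP contributes one piece per time frame, and that all pieces of one time frame go to the same EPN, chosen round-robin. The function names and the round-robin rule are illustrative assumptions, not the actual O2 scheduling.

```python
from collections import defaultdict

N_FLP = 250    # approximate number of First Level Processor nodes (from the slide)
N_EPN = 1500   # approximate number of Event Processing Nodes (from the slide)


def epn_for_timeframe(tf_id: int, n_epn: int = N_EPN) -> int:
    """Pick the EPN that assembles time frame tf_id (round-robin, an assumption)."""
    return tf_id % n_epn


def dispatch(sub_frames, n_epn: int = N_EPN):
    """Group (flp_id, tf_id, payload) tuples by the EPN that should receive them."""
    per_epn = defaultdict(list)
    for flp_id, tf_id, payload in sub_frames:
        per_epn[epn_for_timeframe(tf_id, n_epn)].append((flp_id, tf_id, payload))
    return dict(per_epn)


if __name__ == "__main__":
    # Three consecutive time frames, each cut into one piece per FLP.
    frames = [(flp, tf, b"") for tf in range(3) for flp in range(N_FLP)]
    routed = dispatch(frames)
    print({epn: len(parts) for epn, parts in routed.items()})
```

With this rule every EPN sees complete time frames, so reconstruction on one node does not need data held by another node. A production system would additionally have to balance load and handle node failures.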
CBM online computing/FAIR Tier0

CBM DAQ and online event selection

novel readout system:
• free-running data acquisition without hardware triggers
• continuous stream of time-stamped detector data
• full track and event reconstruction in real time
• online data reduction (factor >100) by a software trigger on events
• about 20 PB collected annually for offline analysis

[Figure: First-level Event Selector; hit and track time distribution for Au+Au 10A GeV collisions at 10 MHz (UrQMD)]

GSI Green IT Cube
• high rack storage, 100,000 cores
• only 5% of total energy consumption needed for cooling
PANDA online processing/FAIR Tier0

• software trigger with full event reconstruction
• no fixed time between events → the event time has to be reconstructed
• all data from sub-detectors carry time stamps with varying resolution
• higher efficiency by dynamically allocating resources

Processing chain (left to right in the original diagram):
• 200 GByte/s input data
• FPGA-based pre-processing, data reduction by factor 10
• full online event reconstruction, event filtering by additional factor 10
• 200 MByte/s for offline analysis
Theory – Lattice QCD

[Diagram: layered data pyramid, from small data volumes (top) to large data volumes (bottom), in three columns]

Infrastructure level (top to bottom):
• collaboration-specific data analysis
• Data Lake
• Lattice Data Grid
• HPC hardware access

Data Reduction and Transformation (top to bottom, with data volume):
• MB: published results, open access, arXiv
• GB: physical quantities and relation to experimental data; lattice data analysis workflow
• TB: data reduction: derived data sets, e.g. correlators, cumulants
• PB: stored ensembles (big data sets)
• MC simulations to generate "raw" noisy data

Software and Algorithmic Tools (top to bottom):
• analysis-specific codes and tools
• analysis frameworks & statistical tools
• analysis- and HW-optimized codes, pre-selection of uncorrelated data sets
• access to hardware at European HPC centers & university clusters
and many smaller communities which need to be included

[Figure: collage of smaller HuK communities. Recoverable labels: S-DALINAC; HADES; FAIR Phase 0 in JINR, JLAB, BNL, BES III, RIKEN; and Theory]
online computing

HuK experiments perform real-time reconstruction and event selection on large online farms. The following topics therefore have high priority:
• continuous readout of (all) sub-detectors at very high interaction rates
• real-time systems require algorithmic performance
    − better and faster algorithms need to be developed that enable parallel processing of new data structures and allow parallel data streams; joint efforts are needed
• highly efficient usage of hardware resources, and low latency
• online alignment and calibration
• online reconstruction
    − event reconstruction and selection w.r.t. signatures of rare observables in real time
    − online 4D tracking
    − fast time-based event building
• online NN/ML/DL
    − new techniques need to be applied for real-time decisions based on reliable analysis, and be made available on dedicated hardware such as FPGAs
    − required for PID, trigger channel selection, ...
• data irreversibility issue → online data reduction has to be done with care
• preferably the same algorithms should be used online and offline
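The idea behind time-based event building in a trigger-less readout can be sketched as follows: time-stamped hits are sorted and split into event candidates wherever the gap to the previous hit exceeds a threshold. This is a minimal illustration only. The function name, the nanosecond unit and the fixed gap threshold are assumptions, and at very high interaction rates events overlap in time, so a real system has to combine timing with spatial information (4D tracking).

```python
from typing import List, Sequence


def build_events(timestamps_ns: Sequence[float], max_gap_ns: float) -> List[List[float]]:
    """Split time-stamped hits into event candidates at gaps larger than max_gap_ns."""
    events: List[List[float]] = []
    for t in sorted(timestamps_ns):
        if events and t - events[-1][-1] <= max_gap_ns:
            events[-1].append(t)   # close to the previous hit: same candidate
        else:
            events.append([t])     # first hit or gap too large: new candidate
    return events


hits = [3.0, 1.0, 2.5, 50.0, 51.0, 120.0]   # made-up hit times in ns
candidates = build_events(hits, max_gap_ns=10.0)
print(candidates)
```

For the made-up hit times above this yields three candidates, because the gaps between 3.0 and 50.0 and between 51.0 and 120.0 are larger than the 10 ns threshold.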
Big Data Analytics

• especially for the smaller HuK communities, common simulation & analysis tools are needed which run on new generations of high-performance computers
    − a common code base, also for the use of novel resources such as GPUs, would be helpful
• new algorithms and methods need to be applied to the experiment frameworks, including ML, NN, DL, MVA, Bayesian tools for statistical analysis, and quantum computing
    − for particle identification
    − for calibration/reconstruction
        • e.g. for calibration of ALICE TPC space charge distortions and fluctuations. In Run 3, Pb-Pb @ 50 kHz, a calibration interval of 5 ms is required. Method: fast prediction via CNN and supervised learning.
        • fast reconstruction to close the resource challenge gap
    − for analysis
        • of rare probes, secondary decay vertices, rejection of large combinatorial background
• employing ML and DL for efficient, fast, accurate event generation and detector simulation, including correlated and uncorrelated background via generative models
• FPGA programming (e.g. as time-to-digital converter with high time precision)
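To put the ALICE TPC calibration numbers quoted above into perspective, the short calculation below estimates how many Pb-Pb collisions fall into one calibration interval. The interaction rate and the interval are taken from the slide. Treating the collision count as Poisson-distributed is an added assumption, used here only to indicate the size of the statistical fluctuation.

```python
import math

interaction_rate_hz = 50_000      # Pb-Pb interaction rate quoted on the slide (50 kHz)
calibration_interval_s = 5e-3     # calibration interval quoted on the slide (5 ms)

# Mean number of collisions contributing to one calibration interval.
mean_collisions = interaction_rate_hz * calibration_interval_s

# Assumption: Poisson-distributed counts, so the 1-sigma spread is sqrt(mean).
poisson_sigma = math.sqrt(mean_collisions)

print(f"mean collisions per interval: {mean_collisions:.0f}")
print(f"Poisson fluctuation (1 sigma): {poisson_sigma:.1f}")
```

Under these assumptions each 5 ms interval integrates about 250 collisions, with a relative fluctuation of roughly 6%.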
Federated Infrastructures

Data volumes of hundreds of PB/year need to be stored and distributed, and complex information has to be extracted from them. This requires new approaches:
• development of Data Lakes as new and more centralised storage concepts
    − including intelligent Big Data Management
    − efficient data access from anywhere
• efficient usage and integration of new heterogeneous resources such as HPC centres, supercomputers and Cloud systems in federated computing and storage systems (e.g. WLCG)
    − development of improved dynamic data caches
    − workflow and framework optimisation
    − application of virtualisation techniques in standard workflows
    − usage of new architectures such as GPU clusters via architecture-overarching data processing
• more efficiency and flexibility through common usage of federated resources → ErUM Science Cloud
• increased network bandwidth between centres
• development of a federated infrastructure for FAIR

[Figure taken from PUNCH4NFDI]
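As a rough illustration of what a dynamic data cache in front of a remote Data Lake does, the sketch below keeps recently used files locally and evicts the least recently used one when the cache is full. The class name `LRUDataCache`, the `fetch` callback and the item-count capacity are hypothetical choices for this example. Real caches in federated infrastructures would typically work with byte sizes, access patterns and remote transfer protocols.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class LRUDataCache:
    """Toy least-recently-used cache; capacity counts items, not bytes."""

    def __init__(self, capacity: int, fetch: Callable[[Hashable], bytes]):
        self.capacity = capacity
        self.fetch = fetch            # hypothetical function that reads from remote storage
        self._store: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> bytes:
        if key in self._store:
            self.hits += 1
            self._store.move_to_end(key)      # mark as most recently used
            return self._store[key]
        self.misses += 1
        data = self.fetch(key)                # cache miss: go to the remote site
        self._store[key] = data
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)   # evict the least recently used entry
        return data


cache = LRUDataCache(capacity=2, fetch=lambda name: str(name).encode())
for name in ["a", "b", "a", "c", "b"]:
    cache.get(name)
print(cache.hits, cache.misses)
```

The hit and miss counters make it easy to compare eviction policies or cache sizes against a recorded access pattern before deploying anything.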
Research Data Management

• open data → a German infrastructure is desirable for this service
• open science
    − everything used to obtain published results must be open
    − reference guides for publishing data and software, partnerships with publishers
        • long-term software support needs to be guaranteed, repositories need to be curated
• sustainability of software development and frameworks
• reproducibility of analysis workflows
    − application of modern analysis techniques such as Jupyter notebooks
• research data management following the FAIR principles
    − data must be Findable, Accessible, Interoperable, Reusable
    − new metadata schemes including DOIs
    − standardised protocols and formats for data access
    − interfaces for experiment-overarching data analysis
• smaller HuK communities also need to be included
• the full data life cycle needs to be supported

[Figure taken from DMA@MT; recoverable label: publication]
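The FAIR requirements listed above can be made concrete with a small metadata record. The sketch below is purely illustrative: the class `DatasetRecord`, its field names and the mapping of fields to the four FAIR letters are assumptions for this example, not an existing metadata scheme, and the DOI and URL are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata record; field names are illustrative only."""
    doi: str = ""            # persistent identifier (Findable)
    title: str = ""
    access_url: str = ""     # where the data can be retrieved (Accessible)
    data_format: str = ""    # standardised format name (Interoperable)
    license: str = ""        # reuse conditions (Reusable)
    keywords: List[str] = field(default_factory=list)


def fair_checklist(rec: DatasetRecord) -> Dict[str, bool]:
    """Very rough check: is the field backing each FAIR letter filled in?"""
    return {
        "findable": bool(rec.doi and rec.title),
        "accessible": bool(rec.access_url),
        "interoperable": bool(rec.data_format),
        "reusable": bool(rec.license),
    }


record = DatasetRecord(
    doi="10.0000/example",                    # placeholder, not a real DOI
    title="Example data set",
    access_url="https://example.org/data",    # placeholder URL
    data_format="HDF5",
)
print(fair_checklist(record))
```

In this example the record has no license, so the checklist flags it as not yet reusable. A check of this kind could run automatically before a data set is published.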
person power considerations

• funding is needed for hiring software and computing experts
    − keeping expert person power
    − long-term preservation of acquired competences
• education and training of users and developers in new technologies
• dissertations and publications in the area of computing need to be supported
• new curricula including more computing courses
• interdisciplinary work with other sciences as well as industry should be encouraged
• increase IT awareness in the HuK community
    − e.g. introduce computing tracks in DPG (ongoing)
selection of ongoing and upcoming projects

These projects are a good starting point, but by far not everything is covered yet. Crucial projects are coming to an end. It is not yet clear whether and when the upcoming projects will start. The interaction between these projects is also not always clear.

[Figure: project overview; project logos not recoverable. Recoverable labels: 2019-22; 2018-21; start 2021?; start 2022?; POF IV (2021-27), new topic: DMA; ErUM-Data Hub, starting now]
summary

• large-scale computing challenges lie ahead of the KHuK communities and need to be solved now
• a wide range of computing demands has to be satisfied: from large-scale online computing installations at the experiments to HPC systems for theory and analysis, as well as support for the many smaller experiments and groups in the HuK community
• KHuK participates in many existing and upcoming projects, which is a good start, but by far not everything is covered and it is still unclear whether all projects will be funded
• KHuK computing demands stretch over all areas of ErUM-Data: Big Data Analytics, Federated Infrastructures, Research Data Management
• HuK communities would benefit tremendously from increased training/education and better career paths in computing-related activities
• common projects with ErUM communities and common usage of federated IT resources are part of the strategy