KHUK - DIGITALISATION PRIORITIES - ERUM-DATA COMMUNITY MEETING JAN 18/19 - ONLINE KILIAN SCHWARZ (GSI) - DESY INDICO
Komitee für Hadronen- und Kernphysik KHuK – digitalisation priorities ErUM-Data Community meeting Jan 18/19 - Online Kilian Schwarz (GSI)
table of contents
motivation
KHuK digitalisation requirements
ongoing and upcoming projects
summary
Kilian Schwarz | ErUM-Data Community meeting | Online | 18-19 Jan 2021
ALICE online computing/CERN
increased requirements for ALICE@Run3
O2 data rates in GB/s for Pb-Pb @ 50 kHz (graphic from P. Buncic)
• continuous untriggered stream of data, which has to be distributed from about 250 First Level Processor nodes (FLPs) to about 1500 Event Processing Nodes (EPNs)
• TPC clusters not belonging to tracks are suppressed
• in the end about 90 GB/s are written to disk
CBM online computing/FAIR Tier0
CBM DAQ and online event selection
novel readout system:
• free-running data acquisition without hardware triggers
• continuous stream of time-stamped detector data
• full track and event reconstruction in real time
• online data reduction (factor > 100) by a software trigger on events
• about 20 PB collected annually for offline analysis
(figure labels: First-level Event Selector; hit and track time distribution for Au+Au 10A GeV collisions at 10 MHz (UrQMD))
GSI Green IT Cube
• high rack storage, 100,000 cores
• only 5% of total energy consumption needed for cooling
PANDA online processing/FAIR Tier0
• software trigger with full event reconstruction
• no fixed time between events; reconstruction of the event time
• all data from sub-detectors with time stamps and varying resolution
• higher efficiency by dynamically allocating resources
(diagram labels: full online event reconstruction; event filtering by additional factor 10; 200 GByte/s; FPGA based pre-processing; 200 MByte/s for offline analysis; input data; data reduction by factor 10)
Theory – Lattice QCD
lattice data analysis workflow (diagram with the columns "Software and Algorithmic Tools" and "Data Reduction and Transformation"), ordered from the largest to the smallest data volume:
• PB: Lattice Data Grid, big data sets (stored ensembles); MC simulations to generate "raw" noisy data; HPC hardware access: access to hardware at European HPC centers & university clusters
• TB: Data Lake; derived data sets, e.g. correlators, cumulants; data reduction: analysis- and HW-optimized codes, pre-selection of uncorrelated data sets
• GB: physical quantities, analysis and relation to experimental data; analysis-specific collaboration codes and tools, specific data analysis frameworks & statistical tools
• MB: published results, open access, arXiv
and many smaller communities which need to be included
S-DALINAC, HADES, FAIR Phase 0 in JINR, JLAB, BNL, BES III, RIKEN and Theory
online computing
HuK experiments do real-time reconstruction and event selection on large online farms. The following topics therefore have a high priority:
continuous readout of (all) sub-detectors at very high interaction rates
real-time systems require algorithmic performance
  − better and faster algorithms need to be developed
enabling parallel processing of new data structures allowing parallel data streams, joint efforts are needed
highly efficient usage of hardware resources, and low latency
online alignment and calibration
online reconstruction
  − event reconstruction and selection w.r.t. signatures of rare observables in real time
  − online 4D tracking
  − fast time-based event building
online NN/ML/DL
  − new techniques need to be applied for real-time decisions based on reliable analysis and be made available on dedicated hardware like FPGAs
  − required for PID, trigger channel selection, ...
data irreversibility issue
  − online data reduction has to be done with care
same algorithms needed preferably for online and offline
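To make "fast time-based event building" on a trigger-less, time-stamped data stream more concrete, here is a minimal sketch. It assumes the simplest possible approach, grouping hits that are separated by less than a fixed time gap; it is not the algorithm used by CBM, PANDA or ALICE. The names `build_events`, `max_gap_ns` and `min_hits`, as well as the (timestamp, channel) hit layout, are hypothetical.

```python
from typing import Iterable, List, Tuple

# A hit is (timestamp in ns, channel id); this layout is an assumption.
Hit = Tuple[float, int]


def build_events(hits: Iterable[Hit], max_gap_ns: float,
                 min_hits: int = 1) -> List[List[Hit]]:
    """Group time-stamped hits into event candidates.

    A new candidate starts whenever the gap to the previous hit exceeds
    max_gap_ns. Candidates with fewer than min_hits hits are dropped,
    which acts as a crude software selection.
    """
    events: List[List[Hit]] = []
    current: List[Hit] = []
    for hit in sorted(hits, key=lambda h: h[0]):  # order by timestamp
        if current and hit[0] - current[-1][0] > max_gap_ns:
            if len(current) >= min_hits:
                events.append(current)
            current = []
        current.append(hit)
    if len(current) >= min_hits:  # flush the last candidate
        events.append(current)
    return events


stream = [(51.0, 4), (0.0, 1), (3.0, 3), (2.0, 2), (50.0, 1), (200.0, 2)]
events = build_events(stream, max_gap_ns=10.0, min_hits=2)
```

With this toy stream the sketch yields two event candidates (three hits near 0 ns and two hits near 50 ns), and the isolated hit at 200 ns is discarded. Dropped hits are gone for good, which is the "data irreversibility issue" named above.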
Big Data Analytics
especially for the smaller HuK communities, common simulation & analysis tools are needed which run on new generations of high performance computers
  − a common code base, also for the use of novel resources like GPUs, would be helpful
new algorithms and methods need to be applied to the experiment frameworks, including ML, NN, DL, MVA, Bayesian tools for statistical analysis, quantum computing
  − for particle identification
  − for calibration/reconstruction
    • e.g. for calibration of ALICE TPC space charge distortions and fluctuations: in Run 3, Pb-Pb @ 50 kHz, a calibration interval of 5 ms is required; method: CNN and supervised learning for fast prediction
    • fast reconstruction to close the resource challenge gap
  − for analysis
    • of rare probes, secondary decay vertices, rejection of large combinatorial background
employing ML and DL for efficient, fast, accurate event generation and detector simulation, including correlated and uncorrelated background, via generative models
FPGA programming (e.g. as time-to-digital converter with high time precision)
Federated Infrastructures
gigantic data volumes of 100s of PB/year, which need to be stored and distributed and from which complex information has to be extracted, require new approaches
development of Data Lakes as new and more centralised storage concepts
  − including intelligent Big Data Management
  − efficient data access from anywhere
efficient usage and integration of new heterogeneous resources such as HPC centres, supercomputers and Cloud systems in federated computing and storage systems (e.g. WLCG)
  − development of improved dynamic data caches
  − workflow and framework optimisation
  − application of virtualisation techniques in standard workflows
  − usage of new architectures such as GPU clusters via architecture-overarching data processing
more efficiency and flexibility through common usage of federated resources
ErUM Science Cloud
increased network bandwidth between centres
development of a federated infrastructure for FAIR
(figure taken from PUNCH4NFDI)
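The "dynamic data caches" item can be pictured with a minimal sketch. It assumes a site-local cache with a fixed byte budget and least-recently-used (LRU) eviction, which is only one of many possible policies. The class `DataCache`, the `fetch` callback and the file names are hypothetical and are not taken from WLCG or any project named on this slide.

```python
from collections import OrderedDict
from typing import Callable


class DataCache:
    """Toy site-local cache that keeps recently used files within a byte budget."""

    def __init__(self, capacity_bytes: int, fetch: Callable[[str], bytes]):
        self.capacity_bytes = capacity_bytes
        self.fetch = fetch  # hypothetical remote read, e.g. from a data lake
        self.store: "OrderedDict[str, bytes]" = OrderedDict()
        self.used_bytes = 0
        self.hits = 0
        self.misses = 0

    def get(self, name: str) -> bytes:
        if name in self.store:
            self.hits += 1
            self.store.move_to_end(name)  # mark as most recently used
            return self.store[name]
        self.misses += 1
        data = self.fetch(name)  # cache miss: read from remote storage
        self._insert(name, data)
        return data

    def _insert(self, name: str, data: bytes) -> None:
        if len(data) > self.capacity_bytes:
            return  # larger than the whole cache: serve it uncached
        while self.used_bytes + len(data) > self.capacity_bytes:
            _, evicted = self.store.popitem(last=False)  # least recently used
            self.used_bytes -= len(evicted)
        self.store[name] = data
        self.used_bytes += len(data)


# Stand-in for remote storage: three 4-byte "files".
remote = {"a": b"x" * 4, "b": b"y" * 4, "c": b"z" * 4}
cache = DataCache(capacity_bytes=8, fetch=remote.__getitem__)
for name in ["a", "b", "a", "c"]:
    cache.get(name)
```

In this toy run the second read of "a" is served locally, and reading "c" evicts "b", the least recently used file. A production cache would additionally deal with concurrency, partial reads and consistency with the remote copy.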
Research Data Management
open data: German infrastructure desirable for this service
open science
  − everything used to obtain published results must be open
  − reference guides for publishing data and software, partnerships with publishers
    • long-term software support needs to be guaranteed, repositories need to be curated
sustainability of software development and frameworks
reproducibility of analysis workflows
  − application of modern analysis techniques such as Jupyter notebooks
research data management following the FAIR principles
  − data must be Findable, Accessible, Interoperable, Reusable
  − new metadata schemes including DOIs
  − standardised protocols and formats for data access
  − interfaces for experiment-overarching data analysis publication
smaller HuK communities also need to be included
full data life cycle needs to be supported
(figure taken from DMA@MT)
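As an illustration of what a metadata scheme "including DOIs" might carry, the following minimal sketch defines a dataset record and serialises it to JSON. The class `DatasetRecord` and all of its field names are hypothetical and do not follow any particular standard. The mapping of fields to the four FAIR letters is a loose assumption, and the DOI and URL are placeholders, not registered identifiers.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DatasetRecord:
    """Hypothetical minimal metadata record for a published dataset."""
    doi: str          # persistent identifier (Findable)
    title: str
    creators: List[str]
    access_url: str   # where the data can be retrieved (Accessible)
    data_format: str  # documented, standard format (Interoperable)
    license: str      # terms of reuse (Reusable)
    keywords: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Names of mandatory fields that are empty; keywords are optional."""
        return [name for name, value in asdict(self).items()
                if name != "keywords" and not value]

    def to_json(self) -> str:
        """Serialise the record so that other tools can index it."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


record = DatasetRecord(
    doi="10.0000/placeholder-dataset",                   # placeholder
    title="Example correlator data set",
    creators=["Example Collaboration"],
    access_url="https://example.org/data/placeholder",   # placeholder
    data_format="HDF5",
    license="CC-BY-4.0",
    keywords=["lattice QCD", "example"],
)
```

A repository could refuse an upload while `record.missing_fields()` is non-empty and publish the output of `record.to_json()` alongside the data.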
person power considerations
funding is needed for hiring software and computing experts
  − keeping expert person power
  − long-term preservation of acquired competences
education and training of users and developers in new technologies
dissertations and publications in the area of computing need to be supported
new curricula including more computing courses
interdisciplinary work with other sciences as well as industry should be encouraged
increase IT awareness in the HuK community
  − e.g. introduce computing tracks in DPG (ongoing)
selection of ongoing and upcoming projects
These projects are a good starting point, but by far not everything is covered yet. Crucial projects are coming to an end. It is not yet clear if and when the upcoming projects will start. The interaction between these projects is also not always clear.
(figure labels: 2019-22; POF IV (2021-27), new topic: DMA; ErUM-Data Hub; 2018-21; starting now; start 2021?; start 2022?)
summary
large-scale computing challenges lie ahead of the KHuK communities and need to be solved now
a wide range of computing demands has to be satisfied: from large-scale online computing installations at the experiments to HPC systems for theory and analysis, as well as support for the many smaller experiments and groups in the HuK community
KHuK participates in many existing and upcoming projects, which is a good start, but by far not everything is covered and it is still unclear if all projects will be funded
KHuK computing demands stretch over all areas of ErUM-Data: Big Data Analytics, Federated Infrastructures, Research Data Management
HuK communities would benefit tremendously from increased training/education and better career paths in computing-related activities
common projects with ErUM communities and common usage of federated IT resources are part of the strategy