Computing Challenges in the Years to Come
Andreas Petzold, Achim Streit (Steinbuch Centre for Computing, GridKa) + many contributors
KIT – The Research University in the Helmholtz Association
www.kit.edu
HL-LHC Computing Challenges
Source: S. Campana, Status of WLCG, 38th WLCG RRB, 27.10.2020, https://indico.cern.ch/event/957310/timetable/
Addressing the HL-LHC Computing Challenges with a Multi-Pronged Approach
Challenge: higher data rates require more storage and CPU.
Experiments have to:
- Improve software design & algorithms
- Adapt to new methods (AI) & hardware (GPUs)
- Improve computing models (fewer copies & special data formats)
Innovation and R&D at GridKa:
- Federated storage (exascale data lakes)
- Optimizations for increasing data accesses
- Integration of special resources (GPUs)
- Access to opportunistic resources (HPC, cloud)
Still, additional hardware is needed.
Source: ATLAS Experiment – Public Results, https://twiki.cern.ch/twiki/bin/view/AtlasPublic/ComputingandSoftwarePublicResults
Performance Development
[Chart: TOP500 performance development over time. Sum: 2.4 EFlop/s; #1: 442 PFlop/s (Fugaku @ RIKEN: 7.6 M cores, 5.1 PB memory, 30 MW); #500: 1.3 PFlop/s. Earlier #1 systems: Summit, TaihuLight, Tianhe-2, K Computer, Earth Simulator.]
Source: https://top500.org/statistics/perfdevel/
Chip Statistics – System Share
Source: Erich Strohmaier, slide 30, https://www.top500.org/media/filer_public/54/77/5477d858-1f1e-410b-994b-b7122cfd1d57/top500_2020_06_v2_web.pdf
Chip Statistics – Performance Share
[Chart: performance share; Fugaku labelled.]
Source: Erich Strohmaier, slide 31, https://www.top500.org/media/filer_public/54/77/5477d858-1f1e-410b-994b-b7122cfd1d57/top500_2020_06_v2_web.pdf
Personal View – A Paradigm Change Is Needed
- Adapt to present and future hardware architectures (SIMD instructions, multi-/many-core, accelerators/GPUs, distributed memory); see the sketch after this list
- Apply computer science principles and algorithms
- Apply continuous integration/development/testing (CI/CD/CT)
- Consider software as a research infrastructure in itself
- Do Research Software Engineering (RSE): engineering software at the intersection of algorithms/numerics, software engineering, and community codes
- More physics computing professorships
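To make the first point concrete, here is a minimal sketch (not from the talk; the calibration function and its parameters are purely illustrative) of a loop written so that OpenMP can spread it across cores and the compiler can map its body to SIMD instructions:

```cpp
// Minimal sketch: one loop adapted to multi-core ("parallel for") and
// SIMD ("simd"); the branch-free loop body lets the compiler emit
// vector instructions for each thread's chunk of the data.
#include <cstdio>
#include <vector>

void calibrate(std::vector<float>& e, float gain, float offset) {
    const long n = static_cast<long>(e.size());
    #pragma omp parallel for simd
    for (long i = 0; i < n; ++i)
        e[i] = gain * e[i] + offset;
}

int main() {
    std::vector<float> energies(1 << 20, 1.0f);
    calibrate(energies, 1.02f, -0.5f);
    std::printf("e[0] = %.3f\n", energies[0]);  // 1.02 * 1.0 - 0.5 = 0.52
}
```

Built with e.g. g++ -O3 -fopenmp -march=native; without -fopenmp the pragma is ignored and the same code runs serially, which is part of what makes such incremental adaptation practical.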
Personal View – How to Achieve This?
- Many parts of the software have to be rewritten to profit from modern hardware
- Use abstraction layers like Alpaka, OpenMP, etc. to ensure portability (GPUs/CPUs/...) and easy access to future architectures (a sketch follows below); probably not 100% performance, but a good investment in sustainable code
- "The optimization/transformation process of software for GPUs also results in much more efficient code on CPUs" (V. Lindenstruth, 9th CERN SCF, slide 11)
- Computing centralization vs. distributed resources? Likely both: more centralized and at the same time more distributed. Likely model: centralized base resources (storage-heavy & at the network) and services, plus interfaces for a) permanent resources accessible through the central site and b) integration of opportunistic/dynamic (computing) resources
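As an illustration of what such a portability layer buys, here is a sketch using OpenMP target offload (an assumption for illustration; Alpaka, Kokkos or SYCL would fill the same role, and the SAXPY kernel is generic, not code from any experiment):

```cpp
// Minimal sketch: a single kernel source that is offloaded to an
// accelerator when one is available and falls back to the host CPU
// otherwise, keeping one maintainable code path.
#include <cstdio>
#include <vector>

void saxpy(float a, const float* x, float* y, int n) {
    // "target" offloads to a device if present; "map" describes which
    // data moves to (x) and back from (y) the device's memory.
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> x(n, 1.0f), y(n, 2.0f);
    saxpy(3.0f, x.data(), y.data(), n);
    std::printf("y[0] = %.1f\n", y[0]);  // expect 5.0
}
```

With an offload-capable compiler the loop runs on the GPU; compiled without offload support, the identical source runs on the CPU, which is exactly the "not 100% performance, but sustainable" trade-off argued above.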
Personal View – Future Computing Resources
Compute:
- Heterogeneous architectures
- Be able to run jobs almost everywhere
- Some software profits from adaptation to specific resources for peak performance
Storage:
- Permanent storage no longer at all sites
- Large, fast storage at a few sites with very good network connectivity
- Increasing tape usage with very well orchestrated data access
- Cache storage with low operational requirements, especially important at sites without direct internet access from the worker nodes (see the sketch below)
Source: WLCG Data Lake, Simone Campana, https://indico.cern.ch/event/738796/contributions/3174573/attachments/1755785/2846671/DataLake-ATCF.pdf
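The cache idea fits in a few lines. This is a minimal sketch of the read-through pattern, not any site's actual implementation (all paths and names are hypothetical, and the stub stands in for a real transfer tool such as an XRootD client): on a miss the file is fetched once from the remote data lake, every later access is local, and because the cache holds no primary copy it can be operated, and even lost, cheaply.

```cpp
// Minimal sketch of a read-through site cache in a data-lake model.
#include <filesystem>
#include <fstream>
#include <iostream>
#include <string>

namespace fs = std::filesystem;

// Stand-in for the real transfer (in practice e.g. an XRootD copy
// from the remote lake endpoint); here it just writes a dummy file.
void fetch_from_lake(const std::string& lfn, const fs::path& dest) {
    std::ofstream(dest) << "payload of " << lfn << '\n';
}

// Miss -> one remote fetch; hit -> purely local read afterwards.
fs::path open_cached(const std::string& lfn, const fs::path& cache_root) {
    fs::path local = cache_root / lfn;
    if (!fs::exists(local)) {
        fs::create_directories(local.parent_path());
        fetch_from_lake(lfn, local);
    }
    return local;
}

int main() {
    fs::path p = open_cached("run3/AOD/file001.root", "/tmp/cache-demo");
    std::cout << "reading from " << p << '\n';  // a second call is a hit
}
```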
Innovation and R&D at KIT – Examples
Scalable online storage technology: GridKa as an island in the data lake
- Throughput, IOPS and capacity requirements demand massive scalability of the storage and network infrastructure
- Software-defined online storage to address less predictable, more remote, more diverse data access
- Powerful networks (internal, external)
- Reliable offline storage
- Excellent performance and reliability
Upgrades 2017 → 2018 → 2019/20 → 2021/22: capacity 23 PB → 35 PB → 43 PB → 60 PB; throughput 70 GB/s → 100 GB/s → 120 GB/s
https://s.kit.edu/gridka-monitoring
https://s.kit.edu/gridka-numbers
Recent GridKa Storage Extension (+ 36h)
- 40 Seagate 5U84 enclosures, each with 84 × 16 TB HDDs and dual controllers
- 12 protocol servers
- 200 Gb/s InfiniBand switches
- 400/100/40G Ethernet
Innovation and R&D at KIT – Examples
Deep integration of GPU nodes in the GridKa farm, accessible via the GridKa CEs:
- 3 nodes, each with dual AMD EPYC 7662 (64 cores), 1 TB RAM, 8 Nvidia V100S 32 GB
COBalD/TARDIS enabling a "regional resource pool", accessed through Grid CEs:
- HPC systems ForHLR II & HoreKa (~17 PFlop/s) at KIT, and systems in Bonn, Freiburg, Munich, ...
- Tier-2/3 WLCG systems in Aachen, Bonn, Karlsruhe, ...
The PUNCH4NFDI Consortium
Spokesperson: Thomas Schörner (thomas.schoerner@desy.de), DESY, Notkestr. 85, D-22607 Hamburg
Contact: punch4nfdi@desy.de
Web: www.punch4nfdi.de
Twitter: #punch4nfdi
PUNCH4NFDI* in One Slide (*Particles, Universe, NuClei & Hadrons for the NFDI)
A consortium for the NFDI:
- PUNCH4NFDI represents (astro)particle, astro, hadron & nuclear physics in the NFDI
- Broad community representation: > 40 partner institutes
- Specific strengths: big data and open data; ready to take a leading role in the NFDI
Task areas:
- Data management
- Data transformations
- Data portal
- Data irreversibility
- Synergies & services, teaching & outreach
Our offer:
- A layered model of data management with scalability that allows for easy FAIRification
- Numerous services to develop community-specific approaches in this direction
- The PUNCH science data platform, evolving around advanced research products
Services: evolving around research products and their dynamic life cycle; connecting to the entire NFDI; a cornerstone of research data management in Germany
Timeline and general situation:
- 9 consortia (out of max. 30 in the NFDI) funded in the first NFDI round, none from physics; now competing e.g. with FAIRmat, DAPHNE4NFDI
- Submission of proposal 30 Sep 2020; evaluation by review panel 10 Dec 2020; grants by July 2021; funding start 1 Oct 2021
Slide author: Gregor Kasieczka, U Hamburg
Slide author: Alexander Schmidt, RWTH Aachen
New Alliance for Run-3 Computing at ATLAS and CMS
Goal: reliable operation of the (to-be-expanded) computing infrastructure in Run 3 for the ATLAS and CMS experiments in Germany; essential for the overall success of the research done by physicists at German institutes.
- LHC Run 3 (2022 to 2024): the pp dataset grows by a factor of ~2.5
- Provision and operation of additional resources: requirements in 2024 ~ 1.5 × the hardware of 2021 (plus significant replacement of old hardware)
- Further development and optimization of operations (data management, job scheduling, monitoring, ...) for evolving computing models
Slide author: Markus Schumacher, U Freiburg
New Alliance for Run-3 Computing at ATLAS and CMS (continued)
- Run 3 (2022-2024): essentially the proven model from Run 1 and Run 2
- HL-LHC (from 2028): the transition must already be initiated in 2024; hardware procured in 2023 is the last that will not be used in Run 4; new computing models must be in efficient routine operation by 2028 at the latest
- Use of new resources, e.g. HPC clusters, GPUs, opportunistic resources, ...
- New concept for data storage: few sites, mainly HGF centres, within the data-lake concept, plus fast caches at analysis centres (universities)
- A lead time of 4 years is needed for testing operations, building up hardware, securing funding, ...
- Stronger role of the HGF centres: additional funding beyond the normal support lines
- New cross-experiment alliance: even stronger networking (beyond GridKa-OB, GridKa-TAB, Terascale Computing Board); joint discussion and strategy towards a homogeneous operating model for the HL-LHC
Conclusion
- The challenges ahead cannot be solved with hardware investments alone (but neither can they be solved without hardware investments)
- HPC, cloud, data lakes, ... are not enough
- Aggressive R&D = better experiment software: created with computer science expertise, using modern programming paradigms, efficient and scalable; a chance for university groups?
Source: ATLAS Experiment – Public Results, https://twiki.cern.ch/twiki/bin/view/AtlasPublic/ComputingandSoftwarePublicResults