HEPiX Fall 2018 Workshop Report - IT-Seminar: The Participants Report the Highlights
HEPiX Fall 2018 Workshop Report. IT-Seminar: the participants report the highlights. DESY-IT: Thomas Finnern, Dirk Jahnke-Zumbusch, Peter van der Reest, Martin Gasthuber (Helge Meinhard)
HEPiX Mission (https://www.hepix.org)
• From our Web site: "The HEPiX forum brings together worldwide Information Technology staff, including system administrators, system engineers, and managers from the High Energy Physics and Nuclear Physics laboratories and institutes, to foster a learning and sharing experience between sites facing scientific computing and data challenges."
• Emphasis is on site services as opposed to experiment software and middleware
• Originating in HEP (particle physics), but open to other sciences
  • Recent participation from life sciences, photon/material sciences, ...
  • See "Report on the Workshop on Central Computing Support for Photon Sciences"
• Twice per year, one week each, in Europe, North America and Asia
• Typically 100...140 participants
• Autumn 2018 in Barcelona was the 58th workshop since 1991
Barcelona Workshop (https://indico.cern.ch/e/hepix-autumn2018)
• Held 08 – 12 October in downtown Barcelona, hosted by Port d'Informació Científica (PIC)
• PIC: Tier-1 for WLCG (ATLAS, CMS, LHCb), support for particle physics, astrophysics, cosmology, earth sciences
• 137 registered attendees – a record for HEPiX!
  • Many first-timers, many old friends
  • 105 from Europe (including 12 from PIC), 14 from North America, 9 from Asia, 9 from companies
• 54 different affiliations
  • 37 from Europe, 7 from North America, 5 from Asia, 5 companies
• 69 contributions plus some extra sessions
Tracks and Trends (1) Miscellaneous, Basic IT, Grids, Clouds and Virtualisation
• Miscellaneous: 4 Contributions, 1h10'
  • CSBS is OUR journal. Let's use it and publish our work! Many interesting things this week – consider writing an article
• Basic IT Services: 4 Contributions, 1h40'
  • Using messaging services to improve scalability and reliability of systems management (see the sketch below)
  • New approach in the CERN Authentication and Authorisation management
• Grids, Clouds, Virtualisation: 3 Contributions, 1h10'
  • Data lake proposal: to optimise costs, will need to switch to QoS (performance, reliability etc.) for storage – a major change (technical, cultural, social)
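The messaging idea is easiest to picture with a small example. Below is a minimal sketch, not the CERN implementation, of how a systems-management agent could publish a status event to a message broker (RabbitMQ via the pika library is assumed here; host, exchange and field names are illustrative placeholders):

```python
import json
import pika

# Connect to a message broker (host name is a placeholder, not a real service).
connection = pika.BlockingConnection(pika.ConnectionParameters(host="mq.example.org"))
channel = connection.channel()

# Durable fan-out exchange: every interested consumer gets a copy of the event.
channel.exchange_declare(exchange="sysmgmt.events", exchange_type="fanout", durable=True)

event = {
    "node": "worker042.example.org",
    "action": "package_update",
    "package": "openssl",
    "status": "done",
}

channel.basic_publish(
    exchange="sysmgmt.events",
    routing_key="",                                     # ignored by fan-out exchanges
    body=json.dumps(event),
    properties=pika.BasicProperties(delivery_mode=2),   # persist the message on the broker
)
connection.close()
```

Consumers (a dashboard, an inventory updater, an auto-remediation agent) subscribe to the same exchange; the broker decouples producers from consumers, which is what buys the scalability and reliability mentioned above.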
Tracks and Trends (1) AAI: The new CERN Authentication and Authorisation Infrastructure
Further Reading:
• The Road to the new CERN Authentication
• CERN Authentication and Authorization Infrastructure Design
Tracks and Trends (2) Computing and Batch Services
• Computing and Batch Services: 9 Contributions, 3h35'
• Benchmarks: 2 main activities
  • Fast benchmark for estimating the job slot CPU power: LHCb DB12, adopted by LHCb and ALICE but not appropriate for procurements (see the sketch after this list)
  • Next-generation benchmark for estimating the installed capacity and for procurements: work still in progress; SPEC CPU 2017, set of HEP apps?
• AMD EPYC architecture:
  • Promising evaluation by BNL, will restore some competition on the server CPU market
  • Performance and price competitive with Intel; need to see adoption by server vendors
  • Next generation next year will bring many more cores
• Improving OpenMP scaling using OpenSSL: using "openssl speed" commands to optimize OpenMP performance
• Photon science support: several HEP sites involved with big projects for the coming decade
  • A first workshop at BNL discussing the specific issues, in particular those caused by the loose links between users and the photon facility
  • Idea of organising a workshop focused on computing for photon science co-located with HEPiX, like we did for LHCOPN/LHCONE
• Talk on remote analysis efforts at ALBA
• Commissioning CERN Tier-0 reconstruction workloads on Piz Daint at CSCS: configuring and optimizing Piz Daint at CSCS for running ATLAS and CMS Tier-0 workloads
• PDSF – current status and migration to Cori
• Spare compute resources of EOS storage nodes at CERN have been enabled to run user jobs in containers
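For illustration, here is a minimal sketch of what a DB12-style fast benchmark does: time a short, fixed arithmetic loop inside the job slot and turn the elapsed CPU time into a score. This is not the actual LHCb DB12 code and uses no official normalisation; it only shows the idea of a seconds-long, slot-local CPU probe.

```python
import time

def fast_cpu_benchmark(iterations=10_000_000):
    """Time a fixed arithmetic workload and return an iterations-per-second
    style score, in the spirit of fast job-slot benchmarks such as DB12.
    Illustrative only: no calibration against real HEP workloads."""
    start = time.process_time()
    total = 0.0
    for i in range(1, iterations):
        total += 1.0 / i              # simple floating-point workload
    elapsed = time.process_time() - start
    return iterations / elapsed       # higher score = faster job slot

if __name__ == "__main__":
    print(f"benchmark score: {fast_cpu_benchmark():.3e} iterations/s")
```

A pilot job can run such a probe at start-up to scale its expected event throughput, which is exactly why it suits job-slot estimation but not procurement-grade capacity measurements.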
Tracks and Trends (3) End-User Services and Operating Systems
• End-User Services and Operating Systems: 7 Contributions, 2h55'
• CERN service management: focus on the Service Catalogue, bringing services on board, the user experience, and tool configuration
• CERN Linux services: update on CC7, SLC6, RHEL support distributions and services; software collections, virtualization, OpenStack SIGs; anaconda plugin, lockup; community work on alternative architectures; Koji and GitLab; future support for lightweight containers, CC8, FreeIPA, S3 for static content
• Indico: upgrade to 2.1! Roadmap for next releases: new room booking, internationalisation, paper reviewing, CalDAV support, ...
• Reducing dependencies on commercial software (providers): investigating open-source products to provide services and avoid vendor lock-in
• Container orchestration: enable hosting a large range of Web applications within CERN; opportunity to consolidate all web hosting on a common infrastructure
• Jupyter-based analysis portal at BNL: supports containerized applications started via a seamless integration into the local batch with authentication token delegation (see next slide; sketch below)
• Rust: worth a serious look by Python developers, a good compromise between C++ and Python
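As a rough illustration of how a portal can start containerized applications through the local batch system, the sketch below submits a container job to an HTCondor pool via the HTCondor Python bindings. The image name, script and resource values are placeholders, and the authentication-token delegation used by the BNL portal is omitted here.

```python
import htcondor  # HTCondor Python bindings

# Describe a containerised job; image, script and resources are placeholders.
job = htcondor.Submit({
    "universe":       "docker",
    "docker_image":   "jupyter/scipy-notebook:latest",
    "executable":     "run_analysis.sh",
    "arguments":      "--input data.root",
    "request_cpus":   "2",
    "request_memory": "4GB",
    "output":         "job.out",
    "error":          "job.err",
    "log":            "job.log",
})

schedd = htcondor.Schedd()     # local schedd of the batch pool
result = schedd.submit(job)    # hand the job description to HTCondor
print("submitted cluster", result.cluster())
```

A real portal back-end would additionally pass the user's delegated credentials into the job sandbox so the notebook can reach storage and other services on the user's behalf.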
Tracks and Trends (3) End-User Services and Operating Systems
User Consulting & Tickets (3) What CERN learned
• Light-weight, easy-to-use forms for input help
• Qualify input during ticket creation
• Offer KB articles (and ease KB article publication in parallel)
• User feedback is valuable
  • Easy-to-access feedback forms generate more feedback
• Transparency: all services
Tracks and Trends (4) IT Facilities
• IT Facilities: 4 Contributions, 1h40'
• Technology watch working group kicked off with 58 people subscribed. A number of subgroups cover individual domains, e.g. processors, memory, etc. Further volunteers to do the actual work are needed
• Cost model WG report: better understanding of the workloads via defined metrics; define a common framework for estimating resources, to then look at scenarios to make improvements
• Latest CERN procurements, changes in the team, impact of recent issues and technology changes
• Superfacility at NERSC: introduce a common workflow and API for access from multiple sciences and users to the facility (rather than dedicated access per science/user)
What could possibly go wrong? (4) Miscellaneous – NDGF – 1
• Construction works interrupted network access for the University of Linköping
• There should have been two separate tracks
• Those were clearly shown in the provider's papers, but ...
What could possibly go wrong? (4) Miscellaneous – NDGF – 2
• Battery-driven UPS
• Electric current had been monitored, but ...
• ... one day a colleague smelled acid and the rack was hot (76 degC)
  • The sustained electric current had risen from 1 A to 5 A
  • Unnoticed, as 5 A peaks are common
• Should have been monitored:
  • Voltage
  • Resistance
  • Temperature
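The lesson is that peak readings alone are not enough; it is the sustained average that should raise an alarm. A minimal monitoring sketch along those lines (threshold, window size and the print-based alarm are illustrative placeholders, not NDGF's actual setup):

```python
from collections import deque

NOMINAL_VOLTAGE_V = 230.0   # phase-to-neutral
SUSTAINED_LIMIT_A = 3.0     # alarm threshold on the rolling average (placeholder)
WINDOW = 60                 # number of samples in the rolling average

recent = deque(maxlen=WINDOW)

def sustained_current_alarm(current_a):
    """Return True when the *sustained* current is too high.
    Short peaks (common and harmless) do not trip the alarm."""
    recent.append(current_a)
    if len(recent) < WINDOW:
        return False
    return sum(recent) / WINDOW > SUSTAINED_LIMIT_A

# A single 5 A peak among 1 A samples does not trigger ...
samples = [1.0] * 59 + [5.0]
# ... but a sustained rise to 5 A eventually does.
samples += [5.0] * 60

for i, amps in enumerate(samples):
    if sustained_current_alarm(amps):
        watts = NOMINAL_VOLTAGE_V * amps   # rough dissipated-power estimate
        print(f"sample {i}: sustained overcurrent, ~{watts:.0f} W - check UPS/rack")
        break
```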
What could possibly go wrong? (4) Miscellaneous – NDGF – 3
• Control cabinet got hot
• Light arc
• A loose bolt on the neutral kept the phase-to-neutral voltage fluctuating between 210 and 250 V (nominal 230 V P-N)
Tracks and Trends (5) Networking and Security
• Networking and Security: 11 Contributions, 4h30'
Computer Security Update
Liviu Vâlsan, for the CERN Computer Security Team
HEPiX Autumn 2018, Barcelona
Google NOT disclosing user data breach
• In March 2018 Google found a bug that allowed third-party app developers to access user data for which they didn't have permission
• Google officials in a leaked memo: disclosure would likely result "in us coming into the spotlight alongside or even instead of Facebook despite having stayed under the radar throughout the Cambridge Analytica scandal"
• The disclosure would also invite "immediate regulatory interest"
• No way to know who was affected, logs kept for two weeks only
• Sources: The Guardian and The Wall Street Journal
Tracks and Trends (6) Site Reports
• Site Reports: 16 Contributions, 4h00'
• HPC sites / clusters more and more used by HEP, and discussed in HEPiX
• Computing farms extended to clouds or HPC resources (unified pool); HTCondor + ARC CE a common choice
• More improvements / upgrades on networks, enabling IPv6 everywhere now
• Sites need to cope with new and substantial resource needs of their users
• SURFsara, the Dutch supercomputer centre, for the first time at HEPiX. Very active on Grid activities for non-WLCG communities. Interesting among the several activities: WebDAV security, and RCauth, proxies without certificates
• KISTI: notable data centre relocation; KISTI Grid CA system based on a Hardware Security Module
• LAL+GRIF: expansion to the new data centre slower than expected, mainly because of administrative problems
• NERSC: move from PDSF to SLURM, move to Cori, CVMFS on the Cray
Tracks and Trends (7) Storage and File Systems
• Storage and File Systems: 11 Contributions, 4h25', plus two BoF sessions
• Tape performance and accompanying infrastructure: centres have been busy with the ATLAS Data Carousel; CERN field-testing CTA
• Seven specialist presentations in the Birds-of-a-Feather session leaving little time for discussion, but (hopefully) a continued theme at HEPiX
• Impact of vendor release changes on storage systems and experiment timelines: example of Oracle Databases at CERN
• Large deployment of a CEPH storage cluster (OSiRIS) around the Great Lakes: multi-site data placement challenging, caching very rewarding
• Introducing CEPHfs + Manila-based file storage as a replacement for classic NFS file services
• Three industry talks focussing on tape media and tape drive tech developments, and on optimising flash storage
• Overview of large backup infrastructures: 'new' themes both in infrastructure and in user requirements
• Tools for data management being renewed and adapted (also touched upon in a BoF)
• Interesting R&D on tape drive characterisation (as essential information needed is not provided by vendors) to improve tape operations efficiency
• Optimizing EOS-based storage: splitting up storage metadata servers into multiple instances to remedy performance and operational issues
Tape technology (7) Storage
• Fujifilm (sponsored talk)
• 1 EB/month of tape sold in Europe (~30% of the market; US ~40%)
• Metal particle media are only ~20%, Barium Ferrite (BaFe) already ~80%
• IBM 3592 drives 30% faster than LTO-8
  • and 3x faster than newer HDDs in writing speed
• LTO-8 = 10 TB, 3592 JE = 18 TB per tape in 2019
• 4 sqm = 10 PB
• Bit error rate (BER) 1000x better in comparison to LTO-6 and SSDs
• Strontium Ferrite in the future
  • Up to 400 TB per tape capacity possible
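As a quick back-of-the-envelope check of these figures (decimal units and a rough packing reading of the slide, not vendor data):

```python
# Cartridge capacities and the "4 sqm = 10 PB" footprint come from the slide;
# 1 PB = 1000 TB (decimal units) is an assumption.

TB_PER_PB = 1000
capacities_tb = {"LTO-8": 10, "IBM 3592 JE": 18}

for name, per_cartridge in capacities_tb.items():
    cartridges = 10 * TB_PER_PB / per_cartridge
    print(f"10 PB on {name}: about {cartridges:.0f} cartridges")

# Density implied by the slide's footprint figure:
print(f"about {10 * TB_PER_PB / 4:.0f} TB per square metre of floor space")
```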
Board Meeting
• Current and Next Meetings
  • Confirmed 2019:
    • 25 – 29 March: UCSD / SDSC, San Diego, CA, USA
    • 14 – 18 October: Nikhef, Amsterdam, The Netherlands
  • Quite firm ideas about 2020 and 2021
  • Some expressions of interest even beyond 2021
  • Expressions of interest and proposals still very welcome
• Working Groups
  • Most WGs reported during the week
  • Batch monitoring now started well
• Infrastructure
  • Little hiccup with the Web site's certificate, solved
  • Logos to be made available on the Web site
  • Discussion about protecting the HEPiX name / logo
  • Pepe Flix invited to become a board member
Ad-hoc BoF on AAI at Sites
• Wednesday 13:30 h
• Well attended
• Multiple sites planning to review (and partially redo) their stack
• Interest in digging into the issue further
• Paolo Tedesco and Dave Kelsey volunteered to take things in hand
• Series of remote meetings has started; volunteers still welcome
• When enough interest is present and long-term goals are agreed upon, could turn into a Working Group
Special Thanks to Local Organisers
Miscellaneous – Not to forget
• IoT
  • Any concepts available?
  • Security: access
  • Security: patching
  • Which kinds of devices
  • Protocols
  • ...
• Computing and Software for Big Science (CSBS)
  • Online journal
  • One paper edition per year
  • Springer is involved for publishing
  • Input / articles are welcome
Conclusions (more pics here)
● Thomas' two cents:
  ● Update Indico
  ● Observer in the AAI group
  ● IoT is coming
  ● Be secure
● Peter's two cents:
  ● Many CERN contributions focus on infrastructure measures that we have already taken (or are in the process of taking)
    ● Networks, WLAN, configuration & management tools, storage management, performance evaluations
  ● In terms of compute, storage integration (EOS, CERNBox, Jupyter & SWAN), virtualisation and self-service they are further ahead in some places
  ● More enterprise-scale projects: use of messaging infrastructure for technical workflows, monitoring, automated reaction (CERN MegaBus)
    ● None of this is new; we heard the first approaches under Quattor & Lemon more than 10 years ago, yet it is still relevant!
See you in San Diego / CA (USA)! 25 – 29 March 2019