CERNVM PROGRAM OF WORK 2021 - JAKOB BLOMER FOR THE CERNVM TEAM SFT MEETING 22 FEBRUARY 2021 - CERN INDICO
Infrastructure [diagram: Stratum 0/1, WLCG squid caches, WLCG agents]
• Content: software, containers, auxiliary data for HEP, LIGO, EUCLID, LSST, EESSI, and many others
• CERN Stratum 0s fully on Ceph S3
• Available in the default configuration: ∼1.4 B files in ∼125 repositories
Code Works
• Among the top 1.5 % of active open source projects
• Steady 50–100 commits per month
• ∼30 000 LOC changed in 2020
Review of 2020 Highlights
• Scaling up the CernVM-FS container hub: 800+ images on /cvmfs/unpacked.cern.ch
• Container runtime support: containerd/k8s, podman [GSoC contribution]
• Major improvements to the container conversion service (DUCC)
• New and improved publishing workflows
• Template transactions: ultra-fast, metadata-only publishing
• Ephemeral writable shell: technical foundation to publish from anywhere
• Fine-grained publisher monitoring
• Performance improvements: parallelized garbage collection and storage gateway services → now fast enough to publish all LHCb nightlies (1.5 k packages, >10 M files) before the start of the working day
• Commissioning of the gateway services for LHCb nightlies
• Experimental support for Microsoft Azure blob storage [Microsoft contribution]
• CernVM 5 prototype, EL8 based
• Infrastructure modernization: web presence, CI pipeline, VM & storage replacement
• Dissemination: pre-GDB, EGI webinar, EGI Clinic, IPDPS’20 (with U Notre Dame), CernVM virtual workshop 1–2 February 2021 with 99 registered participants
Review of 2020 Highlights: Unfinished Tasks
• Container conversion status REST API & dashboard
• Shared, external cache manager for multi-container hosts
• Client pre-caching (due to the reduced summer student programme)
• In progress:
  • Transition of publishing code to the new libcvmfs_server
  • Connecting the ephemeral writable shell to the gateway services
Platform Support Commissioned in 2020
A platforms:
• EL 7–8 (new)
• Ubuntu 16.04, 18.04, 20.04
B platforms:
• macOS 10.15, 11 Big Sur (M1 + Intel) (new)
• SLES 11–12
• Fedora, latest two versions
• Debian 8–10
• EL7 AArch64
• IA32 architecture (new)
• Linux on Windows via WSL-2 (new)
• Client packaged as a container (for container-only Linux distros such as Atomic Host)
Highlights: New Web Site
https://cernvm.cern.ch
• Moved from Drupal to Jekyll
• Modern look, responsive design
Highlights: New Monitoring Site
Repository monitor
• Hosted on CERN OpenShift
• Based on libcvmfs and JavaScript
• Easy to add your repository: submit a pull request with the repository metadata
• JSON API
Highlights: CernVM Forum
https://cernvm-forum.cern.ch
• Discourse forum for CernVM-FS and the CernVM appliance
• Many nice features: searchable, questions can be marked as resolved, ...
• Supposed to eventually replace the mailing lists
Highlights: JSROOT-Powered Fine-Grained Publish Monitoring
Server parameter: CVMFS_UPLOAD_STATS_DB=true (demo)
• Statistics are generated with ROOT
• Uploaded as static files to Stratum 0 storage
• Interactive plots (JavaScript / JSROOT)
Highlights: Template Transactions
$ cvmfs_server transaction \
    -T /amd64-gcc9/4.2=/amd64-gcc9/4.2-patches \
    sw.cvmfs.io
• As part of opening the transaction, “4.2” is cloned to “4.2-patches”
• Metadata-only copy, thus extremely fast: observed 50 kHz file publish rate
• Only the changes on top need to be published
Used in fast container image ingestion.
[Diagram: /cvmfs/sw.cvmfs.io/amd64-gcc9 with the template directory 4.2 (ChangeLog, ...) and its clone 4.2-patches (ChangeLog, ...)]
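Template transactions are fast because they copy metadata rather than file contents. The Python sketch below models that idea with a hypothetical in-memory catalog; `Entry` and `clone_template` are illustrative names, not CernVM-FS APIs, and the real implementation works on CernVM-FS file catalogs, which this sketch does not reproduce.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Entry:
    """Hypothetical catalog entry: metadata that points at content by hash."""
    content_hash: str  # key of the file content in content-addressed storage
    size: int          # file size in bytes (metadata only)


def clone_template(catalog: Dict[str, Entry], src: str, dst: str) -> Dict[str, Entry]:
    """Return a new catalog in which every entry under `src` also appears under `dst`.

    Only metadata is copied: cloned entries reference the same content hashes,
    so no file data is read or written, whatever the size of the tree.
    """
    src, dst = src.rstrip("/"), dst.rstrip("/")
    if any(p == dst or p.startswith(dst + "/") for p in catalog):
        raise ValueError(f"destination {dst} already exists")
    result = dict(catalog)
    for path, entry in catalog.items():
        if path == src or path.startswith(src + "/"):
            result[dst + path[len(src):]] = entry
    return result
```

After the clone, a publisher would only write the files that differ from the template, which is why publish time scales with the size of the change rather than the size of the tree.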
Highlights: Ephemeral Publish Shell
• A new command, cvmfs_server enter, creates a sub-shell with a writable /cvmfs
• Internally uses user namespaces and fuse-overlayfs
• Works unprivileged on any modern Linux (e.g. EL8) that can mount the client
• Could eventually be used to publish directly from any node to a gateway; the 2.8 release, however, provides only the ephemeral writable shell as a first step
$ cvmfs_server enter hsf.cvmfs.io
  ... opens a shell with write access to /cvmfs/hsf.cvmfs.io
$ cvmfs_server diff --worktree
  ... close the shell, back to read-only mode
Solves the main technology challenge of moving away from a dedicated publisher node, i.e. publish from anywhere!
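The user-namespace plus fuse-overlayfs combination can be illustrated with a command line that overlays a writable scratch layer on a read-only tree. This Python sketch only builds the argument vector and runs nothing; it is a hypothetical illustration of the general technique, not the command sequence that `cvmfs_server enter` executes. The `unshare` options and the fuse-overlayfs `lowerdir/upperdir/workdir` mount options are the standard util-linux and fuse-overlayfs ones; the directory layout is an assumption.

```python
import shlex
from typing import List


def ephemeral_shell_argv(lower: str, scratch: str, merged: str) -> List[str]:
    """Build, but do not run, an unprivileged command that overlays a writable
    scratch layer on a read-only tree and then starts a shell on top of it.

    Hypothetical sketch; not the implementation of `cvmfs_server enter`.
    """
    upper = f"{scratch}/upper"  # every write made in the shell ends up here
    work = f"{scratch}/work"    # scratch space that overlayfs requires
    steps = [
        ["mkdir", "-p", upper, work, merged],
        ["fuse-overlayfs", "-o",
         f"lowerdir={lower},upperdir={upper},workdir={work}", merged],
        ["cd", merged],
        ["exec", "bash"],
    ]
    script = " && ".join(shlex.join(step) for step in steps)
    # A new user namespace mapped to root plus a private mount namespace lets an
    # ordinary user mount the FUSE overlay; the mount is not visible outside it.
    return ["unshare", "--user", "--map-root-user", "--mount", "sh", "-c", script]
```

On a host with fuse-overlayfs installed, the result could be passed to `subprocess.run`. The upper directory then holds exactly the changes made in the shell, which is the kind of change set a later publish step needs.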
Highlights: Infrastructure Modernization
• Migration of 26 OpenStack VMs (builders, web services, etc.); campaign triggered by hypervisor decommissioning (we would prefer automatic migrations in the future)
• Migration of ∼1 TB project storage from NFS to CephFS and Ceph S3
• Migration of ∼15 build & test jobs to a new Jenkins server
• Commissioning of a GitHub pull request builder: allows us to fully test changes before merging
There has been an exceptional amount of infrastructure work in 2020. We expect this work to be amortized over the coming years.
CernVM / CernVM-FS Program of Work 2021
Developer Power
Name               Role    2020    2021
Jakob Blomer       Staff   50 %    50 %
TBS                Staff   -       50 %
Simone Mosciatti   Fellow  100 %   25 %
Jan Priessnitz     Tech    60 %    -
Andrea Valenzuela  Tech    33 %    66 %
TBS                Tech    -       33 %
FTE                        ∼2.4    ∼2.25
Significant contributors: Mohit Tyagi (GSoC student), Enrico Bocchi (IT-ST), Dave Dykstra (FNAL)
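The FTE totals in the table can be checked by summing each column. The sketch below assumes that "33 %" and "66 %" stand for one and two thirds, which makes the 2021 total come out at exactly 2.25; the dictionary keys are just labels taken from the table.

```python
from fractions import Fraction
from typing import Dict, Optional, Tuple

# (2020, 2021) allocations as fractions of one FTE, taken from the table.
# Assumption: "33 %" and "66 %" are read as one and two thirds.
# None marks a year without allocation.
ALLOCATIONS: Dict[str, Tuple[Optional[Fraction], Optional[Fraction]]] = {
    "Jakob Blomer (Staff)": (Fraction(1, 2), Fraction(1, 2)),
    "TBS (Staff)": (None, Fraction(1, 2)),
    "Simone Mosciatti (Fellow)": (Fraction(1), Fraction(1, 4)),
    "Jan Priessnitz (Tech)": (Fraction(3, 5), None),
    "Andrea Valenzuela (Tech)": (Fraction(1, 3), Fraction(2, 3)),
    "TBS (Tech)": (None, Fraction(1, 3)),
}


def total_fte(year_index: int) -> Fraction:
    """Sum all allocations for one year (0 = 2020, 1 = 2021)."""
    return sum((a[year_index] or Fraction(0) for a in ALLOCATIONS.values()), Fraction(0))


print(f"2020: {float(total_fte(0)):.2f} FTE")  # 2020: 2.43 FTE
print(f"2021: {float(total_fte(1)):.2f} FTE")  # 2021: 2.25 FTE
```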
CernVM Calendar
[Timeline figure, 12/2019 to Q2/2022: CernVM-FS 2.7.x bugfix releases, v2.8 and subsequent releases, CernVM workshops ’21 and ’22 (NIKHEF)]
Consolidation & Improvements: ongoing effort to consolidate the CernVM-FS developments in a single repository; e.g. the gateway services and the containerd plugin are scheduled for merging.
CernVM Appliance Plan of Work for 2021
1. Ready-to-use platform for LHC experiment production and development
  • 10 000+ booted VMs / day
  • 45 % of all ATLAS simulation jobs in 2020 ran at Point 1 on CernVM!
2. Reference platform for long-term data preservation
  • CernVM bootloader + reference containers covering EL 4–7
  • Interactive support: cernvm-launch and cernvm-online.cern.ch
2021 Plan of Work
• Maintenance updates for CernVM 4 [est 1 FTW]
• Migration of cernvm-online.cern.ch to the new single sign-on system [est 1 FTW]
• Stretch goal: CernVM 5 pre-production release [est 1 FTM]
CernVM-FS Plan of Work for 2021
1. Maintenance and support
2. Consolidation tasks
3. Seamless container image ingestion
4. Kubernetes-native publisher (in collaboration with SPI)
5. Client performance improvements for very large applications (e.g. Tensorflow)
Maintenance and Support
Significant maintenance and support load. Key figures from 2020:
• 450 mails on support mailing lists
• 40 bug fixes merged
Consolidation Tasks
• Addressing open issues: bugfix sprint [est 1 FTM]
• Addressing known shortcomings of the gateway services:
  • Trigger garbage collection from remote publishers [est 1 FTW]
  • Use template transactions from remote publishers [est 1 FTW]
  • Transaction wait queue to prevent starvation of concurrent publishers [est 1 FTW]
  • Full repository tagging support [est 1 FTW]
  • Rebase the gateway receiver on the new libcvmfs_server [est 1 FTW]
• Source code repository consolidation [est 1 FTW]
• New platforms: SLES15 (for HPC), Debian 11 [est 1 FTW]
• macOS binary signatures [est 1 FTW]
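A transaction wait queue prevents starvation by serving waiting publishers in arrival order instead of letting them race for the lease on every retry. The following Python sketch shows a first-come, first-served queue for a single repository path; `LeaseQueue` and its methods are hypothetical and say nothing about how the gateway services will implement this.

```python
from collections import deque
from typing import Deque, Optional


class LeaseQueue:
    """First-come, first-served queue for the publish lease on one repository path.

    Hypothetical sketch: without a queue, a publisher that keeps retrying at the
    wrong moment can lose the race indefinitely (starvation). Handing the lease
    to the longest-waiting publisher bounds every publisher's wait.
    """

    def __init__(self) -> None:
        self.holder: Optional[str] = None
        self.waiting: Deque[str] = deque()

    def request(self, publisher: str) -> bool:
        """Return True if `publisher` holds the lease; otherwise queue it once."""
        if self.holder is None and not self.waiting:
            self.holder = publisher
            return True
        if publisher != self.holder and publisher not in self.waiting:
            self.waiting.append(publisher)
        return self.holder == publisher

    def release(self, publisher: str) -> Optional[str]:
        """Drop the lease and pass it to the longest-waiting publisher, if any."""
        if publisher != self.holder:
            raise ValueError(f"{publisher} does not hold the lease")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder
```

Repeated requests from a publisher that is already waiting do not move it in the queue, so polling clients cannot overtake earlier ones.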
Future-proofing: Next-Generation Server Code
Legacy code: a set of tools targeted at a dedicated release manager machine and the interactive workflow (open transaction + copy + commit)
New architecture: a common base library, libcvmfs_server, providing repository transformation primitives (commit changeset, GC, tag management, ... on top of a PUT/GET storage abstraction), on which higher-level publish abstractions (CLI, GW receiver, REST API, ...) can be built
• Initial CLI commands ported to libcvmfs_server: info, diff, transaction, enter
• Foundation for future maintainability and other consolidation tasks (e.g. gateway services)
Plan for 2021: port the complete publish workflow to libcvmfs_server, including transaction abort & commit, tagging, garbage collection [est 2 FTM]
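The layering described for libcvmfs_server can be pictured as three tiers: a PUT/GET storage abstraction, a library of repository transformation primitives, and thin front ends on top. The Python sketch below mirrors that shape only; every class and function name is hypothetical, and the choice of SHA-1 as content hash is an assumption made for the example.

```python
import hashlib
from typing import Dict, Protocol


class Storage(Protocol):
    """PUT/GET storage abstraction (local disk, S3, gateway, ...)."""
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryStorage:
    """Toy backend that satisfies the Storage protocol."""
    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]


class Repository:
    """Base-library tier: transformation primitives only, no user interaction."""
    def __init__(self, storage: Storage) -> None:
        self.storage = storage
        self.files: Dict[str, str] = {}             # path -> content hash
        self.tags: Dict[str, Dict[str, str]] = {}   # tag name -> file snapshot

    def commit_changeset(self, changes: Dict[str, bytes]) -> None:
        for path, data in changes.items():
            digest = hashlib.sha1(data).hexdigest()
            self.storage.put(f"data/{digest}", data)  # content-addressed object
            self.files[path] = digest

    def tag(self, name: str) -> None:
        self.tags[name] = dict(self.files)  # snapshot of the current file list


def publish(repo: Repository, changes: Dict[str, bytes], tag_name: str) -> None:
    """Front-end tier (e.g. a CLI command or gateway receiver) built on the primitives."""
    repo.commit_changeset(changes)
    repo.tag(tag_name)
```

Because the CLI and a gateway receiver would both call the same primitives, a fix in the library tier benefits every front end, which is the maintainability argument made on the slide.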
Seamless Container Image Ingestion
Approach: users develop containers with the standard tools and services (GitLab, Docker Hub, etc.). For their large-scale deployment, we want to ingest them automatically into /cvmfs/unpacked.cern.ch.
Container Publishing
• Based on the working prototype, commission the web-hook connection from a standard registry to CernVM-FS [est 2 FTW]
• Based on the working prototype, merge the fast merging of image layers [est 2 FTW]
• Dashboard and status API: display current activity, list of hosted images, etc. [est 1 FTM]
• Develop a standard benchmark for publish throughput to assess the supported scale of user container ingestion [est 1 FTM, summer student]
Container Engine Integration
Engine       Type    CernVM-FS support
singularity  flat    native
docker       layers  graph driver¹
containerd   layers  remote snapshotter
podman       layers  extra image store
¹ Expected to be replaced by the containerd remote snapshotter
• Review and improve documentation, examples, integration tests for different deployment options [est 2 FTW]
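Merging image layers into one tree, as needed for the "flat" singularity case, means applying the layers bottom to top and honoring deletions recorded by upper layers. The sketch below follows the whiteout convention of the OCI image-layer specification (`.wh.<name>` deletes a lower entry, `.wh..wh..opq` hides a directory's lower contents). It models each layer as a path-to-bytes dictionary, which is an assumption made for brevity, and it is not the DUCC implementation.

```python
import posixpath
from typing import Dict, List

WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"


def flatten_layers(layers: List[Dict[str, bytes]]) -> Dict[str, bytes]:
    """Apply image layers bottom to top into a single flat file tree.

    Each layer maps absolute file paths to contents. '.wh.<name>' removes <name>
    (and anything below it) from lower layers; '.wh..wh..opq' removes everything
    below its directory that came from lower layers.
    """
    tree: Dict[str, bytes] = {}
    for layer in layers:
        # Whiteouts first, so they only affect lower layers, never this one.
        for path in layer:
            directory, name = posixpath.split(path)
            if name == OPAQUE_MARKER:
                prefix = directory.rstrip("/") + "/"
                for existing in [p for p in tree if p.startswith(prefix)]:
                    del tree[existing]
            elif name.startswith(WHITEOUT_PREFIX):
                target = posixpath.join(directory, name[len(WHITEOUT_PREFIX):])
                tree.pop(target, None)
                for existing in [p for p in tree if p.startswith(target + "/")]:
                    del tree[existing]
        # Then add or overwrite the regular files of this layer.
        for path, content in layer.items():
            if not posixpath.basename(path).startswith(WHITEOUT_PREFIX):
                tree[path] = content
    return tree
```

Processing whiteouts before additions within each layer matters: a layer may delete a directory's old contents and repopulate it in the same step.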
Usability Milestones
• Implement publishing to the gateway services from the ephemeral writable shell; relies on the libcvmfs_server consolidation tasks [est 1 FTM]
• Based on the ephemeral publish container (see above), demonstrate a Kubernetes-native publish workflow in collaboration with SPI [est 1 FTM]
• Implement a client pre-caching mechanism to improve the cold-cache start-up performance of very large applications (e.g. Tensorflow); design ready, planned as a GSoC project [est 2 FTM]
• Stretch goal: shared, external cache manager for multi-container hosts [est 1 FTM]
• Stretch goal: restart activity on CernVM-FS Conveyor (see backup slides)
Community Interaction
• Developers and operators meet in a monthly coordination call (no changes for 2021)
• Weekly operations coffee with IT-SM (no changes for 2021)
• The new CernVM forum is supposed to take over from the mailing lists
• Mattermost is becoming an important channel of information exchange between developers and power users
• Two publications in preparation for vCHEP 2021:
  • A CernVM-FS powered container hub (with IT-ST)
  • Performance engineering for publishing the LHCb nightly builds (with LHCb, IT-ST)
• Frontiers in Big Data publication in preparation on containerised analysis workflows with Kubernetes (with CMS)
• Conferences and workshops on the radar: experiment computing weeks, GDB, HEPiX, ACAT
• Stretch goal: repository content manager training course for software librarians [est 2 FTW]
Summary
Outlook and Goals for 2021
Main priorities for 2021:
1. Consolidation and exploitation of the new CernVM-FS services and features
2. Improve usability and scale of CernVM-FS based container deployments
3. Demonstration of a Kubernetes-native publishing workflow (with SPI)
The team successfully addressed a number of technology challenges in the last 12–18 months, in particular CernVM-FS integration with the container ecosystem, unprivileged client deployments (crucial for HPC access) and containerized publishing. In 2021, the new developments will undergo a phase of consolidation and hardening.
Backup Slides
Stretch Goal: CernVM-FS Conveyor
A high-level abstraction of writing based on interdependent publication jobs.
Current approach:
$ ssh cvmfs-sft.cern.ch
$ cvmfs_server transaction sft.cern.ch/lcg/ROOT
$ tar -xf ROOT-6.18.tar.gz
$ post-install.sh
$ cvmfs_server publish
With Conveyor:
• Send jobs to the Conveyor API:
{
  "repository": "sft.cern.ch",
  "path": "/lcg/ROOT",
  "payload": "https://root.cern.ch/ROOT-6.18.tar.gz",
  "script": "https://spi.cern.ch/post-install.sh",
  "uuid": "e7b67a2...",
  "dependencies": ["f61d...", "a00e...", "..."]
}
• Conveyor distributes the work to multiple publisher nodes
Goal: liberate the CI pipeline from handling cvmfs_server intrinsics.
Prototype available; an estimated 1–2 months to develop it into a first usable version in collaboration with SPI.
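Interdependent publication jobs, as in the JSON job description above, can be ordered with a topological sort: every job waits for the jobs listed in its `dependencies`. The Python sketch below groups jobs into batches whose members could run in parallel on different publisher nodes. It uses only the `uuid` and `dependencies` fields from the example job; the batching strategy itself is an assumption, not the Conveyor prototype's scheduler.

```python
from graphlib import TopologicalSorter
from typing import Dict, List


def publication_batches(jobs: List[dict]) -> List[List[str]]:
    """Group jobs into batches; each batch depends only on earlier batches.

    A dispatcher could hand one batch at a time to several publisher nodes.
    Raises ValueError for unknown dependencies and graphlib.CycleError for cycles.
    """
    graph: Dict[str, List[str]] = {j["uuid"]: j.get("dependencies", []) for j in jobs}
    for uuid, deps in graph.items():
        missing = set(deps) - set(graph)
        if missing:
            raise ValueError(f"job {uuid} depends on unknown jobs {sorted(missing)}")
    sorter = TopologicalSorter(graph)  # values are the predecessors of each key
    sorter.prepare()
    batches: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

For example, a ROOT job that depends on compiler and Python jobs would land in a later batch than both, while unrelated jobs share the first batch.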