Reusing dependencies across ecosystems: what stands in the way? - FOSDEM 2021
Reusing dependencies across ecosystems: what stands in the way?
Todd Gamblin, Advanced Technology Office, Livermore Computing
FOSDEM 2021, Dependency Management Devroom
LLNL-PRES-811108. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC.
Software complexity is increasing dramatically

§ Applications today are highly integrated
— HPC simulations use many numerical libraries and physics packages
— Increasingly, AI stacks are being integrated into traditional physics workflows
§ Getting performance requires us to use all the available hardware
— GPU, fast network, processor
§ Frequently, we also want to reuse native dependencies
— System libraries
— Vendor-supplied drivers, runtime libraries, compilers

(Figure: program drivers (multi-physics, national security, UQ, cognitive simulation), software complexity (AI/ML stacks, containers, cloud APIs), and hardware diversity (CPU architectures, GPUs, accelerators).)

Integrating at this scale requires reusing code from many software ecosystems.
Some fairly common (but questionable) assumptions

§ 1:1 relationship between source code and binary (per platform)
— Good for reproducibility (e.g., Debian)
— Bad for performance optimization
§ Binaries should be as portable as possible
— What most distributions do
— Again, bad for performance
§ Toolchain is the same across the ecosystem
— One compiler, one set of runtime libraries
— Or, no compiler (for interpreted languages)

Outside these boundaries, users are typically on their own.
High Performance Computing (HPC) violates many of these assumptions

§ Code is typically distributed as source
— With the exception of vendor libraries and compilers
§ Often build many variants of the same package
— Developers' builds may be very different
— Many first-time builds when machines are new
§ Code is optimized for the processor and GPU
— Must make effective use of the hardware
— Can make a 10-100x performance difference
§ Rely heavily on system packages
— Need to use the optimized libraries that come with machines
— Need to use host GPU libraries and network
§ Multi-language
— C, C++, Fortran, Python, others all in the same ecosystem

(Figure: some supercomputers. Current: Summit (Oak Ridge National Lab, Power9 / NVIDIA) and Fugaku (RIKEN, Fujitsu/ARM a64fx). Upcoming: NERSC-9 Perlmutter (Lawrence Berkeley National Lab, AMD Zen / NVIDIA), Aurora (Argonne National Lab, Intel Xeon / Xe), and AMD Zen / Radeon systems at Oak Ridge National Lab and Lawrence Livermore National Lab.)
Supporting accelerated architectures requires more components, and more complexity in the software stack

(Figure: the MFEM finite element library's kernels dispatch through backends (CUDA, HIP, OCCA, RAJA, OpenMP, libCEED) and a redesigned memory layer to target both CPU and GPU hardware on (pre-)exascale machines.)

§ For cross-platform GPU support, MFEM has redesigned memory management and added 6 libraries
— MFEM has not yet started to support the Intel GPUs on Aurora
§ El Capitan will have a new, rapidly changing set of libraries, compilers, and language runtimes
HPC simulations rely on icebergs of dependency libraries

§ MFEM (higher-order finite elements): 31 packages, 69 dependency edges
§ LBANN (neural nets for HPC): 71 packages, 188 dependency edges
§ MuMMI (cancer/drug interaction modeling): 98 packages, 248 dependency edges

(Figure: the full dependency graphs for mfem, lbann, and mummi, reaching down through libraries such as hypre, petsc, openmpi, openblas, python, and cmake to low-level packages like zlib, ncurses, and pkgconf.)
Spack enables software distribution for HPC

§ Spack automates the build and installation of scientific software
§ Packages are templated, so that users can easily tune for the host environment

No installation required: clone and go
$ git clone https://github.com/spack/spack
$ spack install hdf5

Simple syntax enables complex installs
$ spack install hdf5@1.10.5
$ spack install hdf5@1.10.5 cppflags="-O3 -g3"
$ spack install hdf5@1.10.5 %clang@6.0
$ spack install hdf5@1.10.5 target=haswell
$ spack install hdf5@1.10.5 +threadsafe
$ spack install hdf5@1.10.5 +mpi ^mpich@3.2

github.com/spack/spack

§ Ease of use of mainstream tools, with the flexibility needed for HPC tuning
§ Major victories:
— Fugaku OSS software stack deployment
— Deployment time for a 1,300-package stack on the Summit supercomputer reduced from 2 weeks to a 12-hour overnight build
— Used by teams across the U.S. Exascale Computing Project to accelerate development
Spack environments enable users to build customized stacks from an abstract description

§ spack.yaml describes project requirements (names of required dependencies)
§ spack.lock (generated) describes exactly what versions/configurations were installed, and allows them to be reproduced
§ Frequently used to maintain configuration along with Spack packages
— Versioning a local software stack with consistent compilers/MPI implementations
§ Allows users to specify external packages to integrate

(Figure: a simple spack.yaml file is concretized into a spack.lock file with exact versions, and the dependency packages are then built and installed for the project.)
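As a rough illustration, a minimal spack.yaml and the commands around it might look like the sketch below. The package names, versions, and the external MPI prefix are illustrative, and the exact schema and flags vary between Spack releases:

$ cat spack.yaml
spack:
  specs:
  - hdf5@1.10.5 +mpi
  - py-numpy
  packages:
    openmpi:
      externals:
      - spec: openmpi@4.0.3
        prefix: /usr/tce      # reuse the system MPI rather than building one
      buildable: false
$ spack env activate -d .     # activate the environment in this directory
$ spack concretize            # solve for exact versions; writes spack.lock
$ spack install               # build/install everything in the lockfile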
Build integration complexity has caused large delays in the MuMMI developer workflow

§ LLNL's MuMMI code is used to model drugs, including cancers and, more recently, COVID-19
§ When standing up Sierra, the team was at the mercy of constant system updates
— An OS update breaks a system library; the team determines configurations based on errors, fixes application code bugs/configs, and rebuilds before getting back to productivity
§ We use system dependencies (MPI, compilers) to:
— get the best performance from the machine
— avoid long build times

(Figure: first page of the SC '19 paper, which simulated protein-lipid dynamics for a 1 µm x 1 µm membrane subsection at near-atomistic resolution on up to 4,000 nodes of Sierra.)

Di Natale et al. A Massively Parallel Infrastructure for Adaptive Multiscale Simulations: Modeling RAS Initiation Pathway for Cancer. In Supercomputing 2019 (SC '19). Best paper.
Build integration complexity has caused massive delays in the MuMMI developer workflow

§ Same slide as above, annotated: MuMMI's host configuration is roughly 100 lines, with 20 pinned system dependencies configured per machine
Transitive dependency requirements can cause cascading issues

§ The team "just" needed a new version of Keras
— which needed a new version of Theano
— which needed a new version of Numpy
— which needed a newer version of OpenBLAS
— But the team was using the system OpenBLAS, which was too old
— The team had to build several versions of OpenBLAS before they found one compatible with all other packages in the DAG
— Then they had to rebuild the entire stack for ABI compatibility
§ This particular issue consumed 36 person-hours
§ Frequent OS updates causing ABI incompatibilities between Flux, PMI, and the system MPI cost hundreds of person-hours

(Figure: the > 70-package MuMMI dependency graph, with the chain mummi → py-keras → py-theano → py-numpy → openblas highlighted.)
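For context, one way to escape an old system OpenBLAS in Spack today is to request a newer one explicitly in the spec rather than declaring the system copy as an external. The version below is only an example; whether it is compatible with the rest of the DAG still has to be discovered by building:

# Hypothetical sketch: force a newer OpenBLAS for the Keras chain; everything
# that links BLAS gets rebuilt against it.
$ spack install py-keras ^openblas@0.3.10
$ spack spec -I py-keras ^openblas@0.3.10   # show which dependencies would be (re)built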
Other teams (Axom, Serac, xSDK, E4S) have similar issues with managing configuration complexity

§ Teams really like to lock versions down for testing:
— The Axom team tests on several systems and pins about 20 package versions to consistent (at the time) values
— The Serac team pins more system versions than this
— The xSDK and E4S stacks from ECP pin specific versions for each package
§ Incompatibilities arise and builds fail in one or both of two ways:
1. A Spack upgrade leads to failure because new, incompatible versions and options enter the Spack repository
2. OS upgrades at a local site change local versions underneath a package
§ Inevitably, this version-locking effort is spent over and over again for subsequent releases

(Figure: pinned versions and options from xSDK: roughly 21 core libraries underneath a package, about 70 total packages.)
Even outside of HPC, dependencies are the most frequent cause of build errors and software release delays

(Figure: two charts. Left: types of errors across 26.6 million builds at Google, for both C++ and Java, with dependency-related errors the largest category. Right: factors perceived to cause release delays among 491 developers at ING, with dependencies among the top-mentioned delay factors.)

§ Developers avoid updates to avoid problems with dependencies, leading to:
— Security vulnerabilities
— Lack of performance
— Stagnation as upgrades become harder and harder over time

"our study clearly shows that better tools to resolve dependency errors have the greatest potential payoff"

Seo et al. Programmers' Build Errors: A Case Study (at Google). ICSE 2014. Survey of 26.6M builds by 18K developers at Google.
Kula et al. Releasing fast and slow: an exploratory case study at ING. ESEC/SIGSOFT FSE 2019. Survey of 491 individuals from 691 teams at ING.
Three main ways to deal with dependencies have emerged in the past 10-20 years

Bundled Distribution
— Examples: Linux distributions (Red Hat, Debian); E4S, xSDK, Anaconda
— Idea: curate a large set of mutually compatible dependencies
— Pros: stability (if software is included)
— Cons: infrequent updates; high packaging/curation effort; lack of flexibility

Semantic Versioning
— Examples: Spack; NPM, Cargo, Go; most language dependency managers
— Idea: use a uniform version convention and solve for a compatible set
— Pros: frequent updates; only relies on local information; works in theory
— Cons: versions are coarse; developers over-constrain/over-promise; errors start to dominate at scale

Live at Head
— Examples: Google, Facebook, Twitter; Spack with locked versions
— Idea: everything in one repository; developers test changes with all dependents
— Pros: frequent updates; stability and consistency; all changes tested
— Cons: doesn't scale beyond a single organization; high computational cost of testing; lack of flexibility (typically just one target environment)

§ All of the approaches have serious drawbacks
§ We need a way to guarantee stability, frequent updates, and version/config flexibility

Taxonomy c/o T. Winters et al. Software Engineering at Google. 2020.
The fundamental problem with integrating native dependencies is lack of compatibility information

In the workflows we've seen so far:
1. There is no information on how system libraries were built
2. Teams build repeatedly to find compatible libraries
3. Hard constraints (pinned versions, etc.) hide information and limit choice

(Figure: MuMMI, E4S, Serac, Axom, Conduit, xSDK, and others all drawing from a community package repository.)

Each team ends up curating its own configuration with baked-in, incompatible assumptions. If all projects add restricted versions, conflicts will eventually arise that prevent all packages from building.

We need a way to reuse package builds among different communities of developers.
We'd really like to be able to reuse binary packages as we find them

1. When OS updates happen underneath a stack:
— Know what changed by examining the binaries' ABI
— Identify what in the stack is no longer compatible
— Rebuild compatible configurations
2. When user requirements change (e.g., due to a new version):
— Know which packages need to change to meet the new requirements
— Identify existing binaries (system or packaged) that satisfy the requirements
— Install binaries or rebuild as necessary
3. When information is not available:
— Extrapolate what and how to build based on past, similar builds

We will enable binary code reuse to reduce iteration in developer workflows.
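Spack's build caches are one existing mechanism for this kind of reuse: relocatable binaries are pulled from a mirror when a spec's hash matches, and rebuilt otherwise. A rough sketch (the mirror name and URL are placeholders, and flags differ across Spack versions):

$ spack mirror add mymirror https://cache.example.org   # add a binary mirror (placeholder URL)
$ spack buildcache keys --install --trust               # trust the mirror's signing keys
$ spack install --use-cache hdf5@1.10.5 +mpi            # reuse a matching binary if one exists, else build

Today this reuse only triggers on an exact match of the fully concretized configuration; the goal described above is to relax that to matches that are merely ABI-compatible.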
We're kicking off the BUILD project at LLNL to address some of these issues

Models: what makes software packages compatible?
— What determines binary compatibility of functions and data structures?
— How can we model changes over time?
— How do changes affect other packages in a graph?
— Which changes are breaking?

Binary Analysis: how to assess compatibility automatically?
— How to reconstitute interface information from binaries?
— How to know what data types are used by other binaries?
— How to efficiently manage debug information?
— How to associate ABI information with versions?

Dependency Solvers: how to find valid configurations?
— Which NP-complete problem solver should we use? ASP? SMT?
— How can we encode large problems for efficient solves?
— What heuristics can be used to accelerate the solve?
— Do we need to develop new solver algorithms?

BUILD: Binary Understanding and Integration Logic for Dependencies
What's missing from current package ecosystems? The current model is coarse

§ Humans define rules like these:
— A version 1.0 depends on B version 2.0
— B version 2.0 depends on C version 3.0 or 4.0
§ Humans update the rules each time there is a new release
§ The specification is incomplete
— Runtime libraries, compilers, etc. are not modeled (e.g., the C++ runtime)
— It is not clear whether updates to C require us to rebuild A: there is no ABI information
§ There is no place for global constraints in the model, e.g.:
— "must link with the C++ compiler if any dependency uses C++"
— "gcc and clang are ABI incompatible when …"

(Figure: A version v1 depends on B version v2, which depends on C version v3; the C++ runtime version v4 is not modeled.)
Lack of ABI information about runtime libraries has bitten our code teams over and over again

§ C++11 brought binary-incompatible changes to the C++ runtime library
— All C++ codes had to be rebuilt with new compilers to be compatible
— New builds could not be linked with old ones
§ The C++ library version was not bumped, so strange runtime errors would happen
§ ABI models would show us the differences in data type definitions between old and new
§ This type of error happens frequently when mixing Fortran compilers, as well

(Figure: builds with icpc@16 and g++@4.4.7 use the old libstdc++, while builds with icpc@17 and g++@7.3.0 use the new one; linking them against an already-installed dependency is incompatible. The compiler imposes an implicit dependency.)
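A common low-tech diagnostic for this class of problem is to compare the versioned symbols a binary needs against those the runtime provides; the library paths below are typical but system-dependent:

# What GLIBCXX versions does the system libstdc++ provide?
$ strings /usr/lib64/libstdc++.so.6 | grep -E 'GLIBCXX_[0-9.]+$' | sort -V | tail -3
# Which versions does an already-installed dependency require?
$ objdump -T /path/to/libdependency.so | grep -o 'GLIBCXX_[0-9.]*' | sort -u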
Semantic versioning is the de-facto standard for conveying compatibility information

A version number has three parts, e.g. 1.2.3:
— Major version: increment only when changes break compatibility
— Minor version: increment for backward-compatible added functionality
— Patch version: increment for backward-compatible bugfixes

§ Pitfalls:
— Applies to the whole package, but packages may only depend on a subset of functions
— May over-promise: packages may break despite developers' intentions
— May over-constrain: pinned dependency versions are common but lead to false unsatisfiable cases
§ Relies on developers to specify versions correctly
— Relies on broad community participation

https://semver.org/
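In Spack specs the same tension shows up as ranges versus pins. As a sketch (the package names foo and bar and the versions are purely illustrative), a range leaves room for the solver while a pin can foreclose it:

$ spack install foo ^bar@2.1:        # permissive: any bar at 2.1 or newer may be chosen
$ spack install foo ^bar@2.1:2.3     # bounded range
$ spack install foo ^bar@2.1.4       # pinned: can make the whole DAG unsatisfiable later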
Humans do not accurately specify dependency constraint information

§ The plots show version restrictions across 4 package ecosystems: Rust (Cargo), NPM (JavaScript), PHP (Packagist), and Ruby (Gems)
— Permissive constraints leave room for lots of incorrect builds
— Restrictive constraints rule out builds that would work
— Non-specialized (white, no constraints) leaves everything open
§ The HPC ecosystem is most like Ruby (right): dated, with many permissive constraints
— Most projects don't use semantic versioning and won't switch
— The likelihood of build errors is very high

(Figure: constraint-type plots from Decan et al., showing the proportion of dependencies with compliant, permissive, or restrictive constraints over monthly snapshots of each ecosystem's package dependency network.)

Decan et al. What Do Package Dependencies Tell Us About Semantic Versioning? IEEE Trans. Software Eng. May 2019.
What if the package manager could model more aspects of ABI?

The current model is coarse; a complete model represents how changes affect code.

§ Libraries at call granularity:
— Entry calls
— Exit calls
— Data type definitions & usage
§ Runtime libraries behind compilers:
— C++, OpenMP, glibc
— GPU runtimes
— We will model the semantics of interfaces
§ Changes in the graph:
— "If C changes, what needs to be rebuilt?"
— "If h(t3) changes, is B still correct?"

(Figure: the coarse graph A v1 → B v2 → C v3 beside a fine-grained model in which calls such as f(t1), g(t1, t2), h(t3), and i(t1, t3) cross package boundaries, B defines type t2, C defines type t3, and the C++ runtime version v4 is modeled explicitly instead of being ignored.)

This model allows us to reason about compatibility.
A common ABI issue: containers with hardware dependencies

§ Containers in industry are used in an isolated way
— No need (until recently) to use special hardware
— The container has its own OS image: a RHEL6 container (glibc 2.12) with its own MPI can run, ABI-compatibly, on a RHEL7 host (glibc 2.17)
§ HPC containers assume tighter coupling between containers and the host
— Choose MPI or another layer as the interface
— This omits transitive library considerations
— It works until it doesn't: there is no way to check or warn
§ Example: a glibc compatibility issue at NERSC
— The host MPI is bind-mounted into the container to use the fast network, but it expects glibc 2.17 while the container only provides glibc 2.12: incompatible
§ We will build models to check configurations and ensure binary compatibility in the full stack
— Here we could fix the problem by including glibc 2.17 in the container
— Our models will tell us that this is what is needed
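One way to spot this particular mismatch before a run is to compare the glibc symbol versions the bind-mounted host MPI requires with what the container image ships; the library path below is illustrative:

# Highest GLIBC_* version required by the host MPI library (run on the host):
$ objdump -T /usr/lib64/libmpi.so | grep -o 'GLIBC_[0-9.]*' | sort -uV | tail -1
# glibc version available inside the container image:
$ ldd --version | head -1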
To reason about ABI, we need better binary tools

§ Binaries are the ground truth for compatibility
— They contain the functions and data types we must link against
— We can't tell compatibility directly from source (it depends on the build)
— Binaries are the final output
§ We often want to make use of existing binaries (OS-provided libraries, stripped proprietary binaries, prebuilt packages), but it's difficult to do in a cross-platform way
§ There aren't tools that can tell us when the thing we'll build or link is actually compatible with an existing binary
— Metadata and OS conventions vary by system
— Debug information may not be available
— Other communities' builds may be very different from ours

We need better debug information to rely on prebuilt packages.
With the right tools, we can extract more compatibility information from binaries

(Figure: binary analysis turns OS-provided libraries, stripped proprietary binaries, and prebuilt packages into compatibility models: entry points, exit points, and the data structures and semantics each library defines and uses.)

§ Libraries like libabigail and Dyninst can do parts of this with debug information (DWARF)
— They typically require accurate debug information, which is not always shipped with the system
§ Tools like debuginfod may allow us to look up debug information without having it installed
— Not available for all distros
— Needs to be on the critical path for package managers for broader availability
§ There are still some gaps even with debug information:
— Dynamically loaded libraries: we can't tell statically which ones are used
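Concretely, libabigail's command-line tools already expose part of this when DWARF is available, and debuginfod can fetch debug info on demand; the library names and server URL below are placeholders:

# Compare the ABIs of two builds of the same library (needs debug info):
$ abidiff old/libfoo.so.1 new/libfoo.so.1
# Dump a library's ABI (types, functions, ELF symbols) to XML for later comparison:
$ abidw --out-file libfoo.abi new/libfoo.so.1
# Fetch debug info for a stripped system library from a debuginfod server:
$ export DEBUGINFOD_URLS="https://debuginfod.example.org"
$ debuginfod-find debuginfo /usr/lib64/libfoo.so.1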
With compatibility information, we can improve the way we do dependency solving

§ Package managers today produce valid but not sound graphs
— ABI information gives us what we need for soundness
§ Solvers could combine human-generated constraints with models generated by binary analysis (ground truth)
§ The past 10-20 years have brought enormous improvements in solver technology
— CDCL algorithms
— Optimizing SMT and ASP solvers
§ The time is right to attack packaging with better solving

(Figure: human-generated constraints and compatibility models feed a solver, which produces a resolved graph, e.g., mpileaks v1, callpath v3, mpi v2, dyninst v4, libdwarf v6, libelf v5.)

We aim to integrate binary compatibility checks into dependency solvers.
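Spack's newer concretizer already casts the version/variant part of this as an ASP optimization problem (via clingo), though it does not yet consume ABI facts derived from binaries. As an illustration (availability of the command depends on the Spack version and on clingo being installed):

$ spack solve hdf5 +mpi ^mpich@3.2   # run the ASP-based concretizer and print the chosen graph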
Work on BUILD can be used across many package ecosystems

§ Currently, software curation effort is duplicated across packaging ecosystems: Linux distributions, monolithic live-at-head repositories (e.g., Pants), language-specific package managers (e.g., Cargo, Rubygems), and the scientific software ecosystem (Spack, xSDK)
— Each has different assumptions that can affect compatibility
— Each must be maintained in a different, closed world
— Humans maintain metadata about packages for each
§ The C++ ecosystem suffers from many of the same issues as HPC
§ The BUILD project aims to unify these domains
— Build a model for binary dependency solving that can be used by many package managers
— Initially implement it in Spack
— Export tools and solvers for other packaging ecosystems
§ Ideally, not every system will need its own strategy for managing native dependencies
Disclaimer This document was prepared as an account of work sponsored by an agency of the United States government. Neither the United States government nor Lawrence Livermore National Security, LLC, nor any of their employees makes any warranty, expressed or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States government or Lawrence Livermore National Security, LLC. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States government or Lawrence Livermore National Security, LLC, and shall not be used for advertising or product endorsement purposes.