Workshop: Building reproducible workflows for earth sciences

Report of Contributions

https://events.ecmwf.int/e/116

Contribution ID: 1                                                 Type: Oral presentation

          Using Cloud to Streamline R&D Workflow
                                                Wednesday, 16 October 2019 11:00 (40 minutes)

  The Finnish Meteorological Institute (FMI) uses cloud services in several ways. First, FMI has
  piloted providing its services in the cloud. Second, FMI has joined the AWS Public Data Sets
  program in order to provide its open data. Users who need the whole grid have found the service
  very convenient, and for that particular use case the popularity of AWS is increasing rapidly
  while usage of FMI’s own data portal is decreasing slowly. Third, FMI has piloted Google Cloud
  for machine learning in impact analysis studies. Additional services such as BigQuery and Data
  Studio have been very useful for conducting the studies.

Primary author: Mr TERVO, Roope (Finnish Meteorological Institute)
Presenter: Mr TERVO, Roope (Finnish Meteorological Institute)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 2                                                 Type: Oral presentation

         Reproducible workflows - Setting the scene
                                                  Monday, 14 October 2019 10:10 (20 minutes)

  ECMWF has always been a hub of activity around earth science data. Scientists and analysts
  continue to develop new ways of analysing and presenting data. For ECMWF it is crucial that this
  work can be shared and reproduced at any time.
  But ECMWF is also a place constantly changing to make the best use of new technologies. In 2020
  ECMWF’s whole computer centre will be moved from Reading to Bologna. Combined with the
  popularity of public and private cloud environments, this gives even more urgency to ensuring
  that scientific work can be reproduced in a robust way.

  This presentation will give an overview of ECMWF’s perspective on reproducible workflows, the
  challenges encountered and solutions found.

Primary authors: SIEMEN, Stephan (ECMWF); VITOLO, Claudia (ECMWF)
Presenter: SIEMEN, Stephan (ECMWF)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 3                                                    Type: Oral presentation

    Scaling Machine Learning with the help of Cloud
                     Computing
                                                  Wednesday, 16 October 2019 14:50 (40 minutes)

  One of the most common hurdles in developing data science and machine learning models is
  designing end-to-end pipelines that can operate at scale and in real time. Data scientists and
  engineers are often expected to learn, develop and maintain the infrastructure for their
  experiments. This process takes time away from focussing on training and developing the models.
  What if there was a way of abstracting away the tasks not related to machine learning while
  still retaining control? This talk will discuss the merits of using Kubeflow, an open-source
  Kubernetes-based platform. With the help of Kubeflow, users can:

      • Develop machine learning models easily and make repeatable, portable deployments on
        diverse infrastructure, e.g. from a laptop to a production cluster.
      • Scale infrastructure based on demand.

  This talk will also present the current use cases of Kubeflow and how teams from other industries
  have been utilising the cloud to scale their machine learning operations.
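
  As a hedged sketch of what such a pipeline can look like, the snippet below defines a two-step
  Kubeflow pipeline, assuming the v1 kfp SDK (pip install kfp); the container images and the
  species parameter are invented for illustration, not taken from the talk.

      import kfp
      from kfp import dsl

      @dsl.pipeline(name="train-and-evaluate",
                    description="Two-step sketch: train a model, then evaluate it.")
      def pipeline(species: str = "no2"):
          # Each step runs in its own container image (hypothetical images).
          train = dsl.ContainerOp(
              name="train",
              image="registry.example.com/ml/train:0.1",
              arguments=["--species", species],
          )
          evaluate = dsl.ContainerOp(
              name="evaluate",
              image="registry.example.com/ml/evaluate:0.1",
          )
          evaluate.after(train)  # only run once training has finished

      if __name__ == "__main__":
          # Compile to a definition that can be deployed on any Kubernetes cluster.
          kfp.compiler.Compiler().compile(pipeline, "pipeline.yaml")

  The same compiled definition can run on a laptop cluster or a production cluster, which is the
  portability point made above.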

Primary author: Mr IQBAL, Salman (ONS / Learnk8s)
Presenter: Mr IQBAL, Salman (ONS / Learnk8s)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 4                                                  Type: Oral presentation

                A journey into the long white Cloud
                                                Wednesday, 16 October 2019 11:40 (20 minutes)

  In 2012 the New Zealand (NZ) MetService started on a journey to move its NWP modelling and
  other automated processing to Amazon cloud computing. Sitting on multiple earthquake faults,
  with a data centre that had limited ability to increase capacity and resilience against external
  outages, the organisation made a revolutionary change compared with the past: not to own a data
  centre any more for weather modelling. Although mainly driven by the requirement of a resilient
  computing environment, the preparation for the Cloud made many more benefits apparent,
  provided the Cloud infrastructure solution was designed appropriately.
  The main benefits we aimed for were:
  - high resilience of infrastructure by combining multiple AWS regions in a seamless environment;
  - a change towards scientists adopting professional software development practices and
    establishing an extremely robust release process;
  - cost effectiveness by using spot market instance prices for operations and research;
  - “self-healing” workflows that can guarantee automatic completion despite any hardware failures;
  - scientists no longer waiting for data, but analysing data;
  - much clearer cost attribution to applications and services.
  This presentation will touch on a number of aspects encountered on this journey of change,
  which impacted people, financials, accountabilities, and the mindset of solving science and
  infrastructure problems in the Cloud. It has been an interesting time with a few surprises along
  the way, but the general consensus is: let’s not go back to the way it was.

  Having said all this, not all systems and data can be moved to the Cloud, leaving the NZ
  MetService to operate in a hybrid environment. The requirement to deliver data to NZ aviation
  with an availability beyond what the Southern Cross cable can provide, as well as what NZ
  privacy law asks for, means that some data needs to be hosted redundantly within NZ. Some of
  the challenges that this poses will be presented as well.

Primary author: Dr ZIEGLER, Andy (Meteorological Service NZ Ltd.)
Presenter: Dr ZIEGLER, Andy (Meteorological Service NZ Ltd.)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 5                                                    Type: Oral presentation

 Challenges and needs of reproducible workflows of
        Open Big Weather and Climate data
                                                     Monday, 14 October 2019 14:50 (20 minutes)

  ECMWF offers, also as operator of the two Copernicus services on Climate Change (C3S) and
  Atmosphere Monitoring (CAMS), a range of open environmental data sets on climate, air quality,
  fire and floods.
  Through Copernicus, a wealth of open data is being made available free and open, and a new
  range of users, not necessarily ‘expert’ users, are interested in exploiting the data.
  This makes the reproducibility of workflows particularly important. A full, free and open data
  policy is vital for reproducible workflows and an important prerequisite. Reproducibility,
  however, has to be reflected in all aspects of the data processing chain. The biggest challenge is
  currently limited data ‘accessibility’, where ‘accessibility’ means more than just improving data
  access. Accessibility aspects are strongly linked with being reproducible and require
  improvements and developments along the entire data processing chain, including the
  development of example workflows and reproducible training materials, the need for data
  standards and interoperability, as well as developing or improving the right open-source
  software tools.

  The presentation will go through each step of some example workflows for open meteorological
  and climate data and will discuss reproducibility and ‘accessibility’ challenges, as well as the
  future developments required to make open meteorological and climate data fully accessible and
  reproducible.

Primary author: WAGEMANN, Julia (ECMWF)
Presenter: WAGEMANN, Julia (ECMWF)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 6                                                  Type: Oral presentation

  The Role of Containers in Reproducible Workflows
                                                Wednesday, 16 October 2019 12:20 (20 minutes)

  A key challenge in supporting reproducible workflows in science is ensuring that the software
  environment for any simulation or analysis is sufficiently captured and re-runnable. This is
  compounded by the growing complexity of scientific software and the systems it executes on.
  Containers offer a potential approach to addressing some of these challenges. This presentation
  will describe how containers can be used for scientific use cases, with an emphasis on
  reproducibility. It will also cover some of the aspects of reproducibility that aren’t easily
  addressed by containers.
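
  As a minimal illustration of the capture-and-rerun idea, the snippet below runs a command in a
  container image using the Docker SDK for Python (pip install docker); the image tag is an
  example, not one from the presentation.

      import docker

      client = docker.from_env()
      # A floating tag such as "python:3.10-slim" can be rebuilt over time; for
      # strict reproducibility, pin the image by its sha256 digest instead.
      output = client.containers.run("python:3.10-slim",
                                     ["python", "--version"], remove=True)
      print(output.decode().strip())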

Primary author: CANON, Shane (Lawrence Berkeley National Lab)
Presenter: CANON, Shane (Lawrence Berkeley National Lab)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 7                                                    Type: Oral presentation

       Publishing Reproducible Geoscientific Papers:
          Status quo, benefits, and opportunities
                                                  Wednesday, 16 October 2019 09:10 (40 minutes)

  Open reproducible research (ORR) is the practice of publishing the source code and the datasets
  needed to produce the computational results reported in a paper. Since many geoscientific
  articles include geostatistical analyses and spatiotemporal data, reproducibility should be a
  cornerstone of the computational geosciences, but it is rarely realized. Furthermore, publishing
  scientific outcomes in static PDFs does not adequately report on computational aspects. Thus,
  readers cannot fully understand how the authors came to the conclusions and how robust these
  are to changes in the analysis. Consequently, it is difficult for reviewers to follow the analysis
  steps, and for other researchers to reuse existing materials. This talk starts with the obstacles
  that have prevented geoscientists from publishing ORR. To overcome these barriers, the talk
  suggests concrete strategies. One strategy is the executable research compendium (ERC), which
  encapsulates the paper, code, data, and the entire software environment needed to produce the
  computational results. Such concepts can assist authors in adhering to ORR principles to ensure
  high scientific standards. However, ORR is not only about reproducing results; it brings a
  number of additional benefits. An ERC-based workflow allows authors to convey their
  computational methods and results by also providing interactive access to code and data, and
  readers to deeply investigate the computational analysis while reading the actual article, e.g. by
  changing the parameters of the analysis. Finally, the presentation introduces the concept of a
  binding: a binding connects the code lines and data subsets that produce a specific result, e.g. a
  figure or number. By also considering user interface widgets (e.g. a slider), this approach allows
  readers to interactively manipulate the parameters of the analysis to see how the results change.

Primary author: Mr KONKOL, Markus (University of Münster, Institute for Geoinformatics)
Presenter: Mr KONKOL, Markus (University of Münster, Institute for Geoinformatics)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 8                                              Type: Oral presentation

                             Workflow in CESM2
                                                Tuesday, 15 October 2019 12:00 (20 minutes)

  The Community Earth System Model (CESM) version 2.x includes a case control system (CCS)
  developed in object-oriented Python. In this talk I will present the CCS with an emphasis on
  workflow control using tools developed at NCAR as well as third-party tools such as Cylc.

Primary author: Mr EDWARDS, Jim (National Center for Atmospheric Research USA)
Presenter: Mr EDWARDS, Jim (National Center for Atmospheric Research USA)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 9                                                Type: Oral presentation

   Building robust and reproducible workflows with
                    Cylc and Rose
                                                  Tuesday, 15 October 2019 11:20 (40 minutes)

  Cylc is an open-source workflow tool used by a number of national meteorological services,
  including the Met Office in the UK, to control the workflow of their software. We talk about how
  the Met Office uses Cylc and the related software configuration tool Rose to ensure that our
  workflows are reproducible, and discuss best practice when designing workflows. We also
  discuss the features of Cylc which improve robustness, enabling workflows to endure hardware
  outages and other interruptions to service.

Primary author: WHITEHOUSE, Stuart (Met Office)
Presenter: WHITEHOUSE, Stuart (Met Office)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 10                                                   Type: Oral presentation

 Scaling Reproducible Research with Project Jupyter
                                                     Tuesday, 15 October 2019 09:10 (40 minutes)

  Jupyter notebooks have become the de-facto standard as a scientific and data science tool for
  producing computational narratives. Over five million Jupyter notebooks exist on GitHub today.
  Beyond the classic Jupyter notebook, Project Jupyter’s tools have evolved to provide end-to-end
  workflows for research that enable scientists to prototype, collaborate, and scale with ease.
  JupyterLab, a web-based, extensible, next-generation interactive development environment,
  enables researchers to combine Jupyter notebooks, code and data to form computational
  narratives. JupyterHub brings the power of notebooks to groups of users. It gives users access to
  computational environments and resources without burdening them with installation and
  maintenance tasks. Binder builds upon JupyterHub and provides free, sharable, interactive
  computing environments to people all around the world.

Primary author: Ms WILLING, Carol (Project Jupyter)
Presenter: Ms WILLING, Carol (Project Jupyter)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 11                                                 Type: Oral presentation

  DARE: Integrating solutions for Data-Intensive and
               Reproducible Science
                                                Wednesday, 16 October 2019 09:50 (20 minutes)

  The DARE (Delivering Agile Research Excellence on European e-Infrastructures) project is
  implementing solutions to enable user-driven reproducible computations that involve complex
  and data-intensive methods. Technology developed in DARE enables Domain Experts,
  Computational Scientists and Research Developers to compose, use and validate methods that
  are expressed in abstract terms. Scientists’ workflows translate to concrete applications that are
  deployed and executed on cloud resources offered by European and international
  e-infrastructures, as well as in-house institutional platforms and commercial providers. The
  platform’s core services enable researchers to visualise the provenance data collected from runs
  of their methods for detailed diagnostics and validation, in support of long-running research
  campaigns involving multiple runs. Use cases are presented by two scientific communities in the
  framework of EPOS and IS-ENES, conducting research in computational seismology and
  climate-impact studies respectively. DARE enables users to develop their methods within
  generic environments, such as Jupyter notebooks, associated with conceptual and evolving
  workspaces, or via the invocation of OGC WPS services interfacing with institutional data
  archives. We will show how DARE exploits computational facilities adopting software
  containerisation and infrastructure orchestration technologies (Kubernetes). These are
  transparently managed via the DARE API, in combination with registries describing data, data
  sources and methods. Ultimately, the extensive adoption of workflows (dispel4py, CWL),
  method abstraction and containerisation allows DARE to dedicate special attention to the
  portability and reproducibility of scientific progress in different computational contexts. We
  will show how the choices of research developers as well as the effects of the execution of their
  workflows are captured and managed, which enables validation, monitoring and reproducibility.
  We will discuss the implementation of the provenance mechanisms, which adopt workflow
  provenance types, lineage services (S-ProvFlow) and PROV-Templates to record and
  interactively use context-rich provenance information in W3C PROV compliant formats.

Primary authors: Dr SPINUSO, Alessandro (KNMI); Dr KLAMPANOS, Iraklis (NCSR Demokritos);
Dr PAGÉ, Christian (CERFACS); Dr ATKINSON, Malcolm (University of Edinburgh)

Presenter: Dr SPINUSO, Alessandro (KNMI)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 12                                                  Type: Oral presentation

     Automated production of high value air quality
     forecasts with Pangeo, Papermill and Krontab
                                                    Tuesday, 15 October 2019 09:50 (20 minutes)

  In many ways, a Jupyter notebook describes a data processing pipeline: you select some data
  at the top of the notebook, define reduction and analysis algorithms as the core of the notebook’s
  content, and generate value – often in the form of plots or new insight – at the end of the
  notebook by applying the algorithms to the data. Value can be added to analysis and insight by
  including textual metadata throughout the notebook that describes the analysis applied and the
  interpretation of the insight generated in the notebook.
  It is a common requirement to want to apply the same processing pipeline, described by a Jupyter
  notebook, to multiple datasets. In the case of air quality forecasts, this might mean executing the
  same processing pipeline for all chemical species implicated in a particular air quality study.

  In this talk we will present Pangeo as an open-source, highly customisable, scalable, cloud-first
  data processing platform. We will demonstrate using Pangeo to run a defined data processing
  pipeline in a Jupyter notebook, and move on to explore running this notebook multiple times on
  a range of input datasets using papermill. Finally we will demonstrate running the processing
  pipeline automatically on a schedule defined with krontab, a crontab-like job scheduling system
  for Kubernetes.
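
  As a sketch of the parameterised execution described above, the loop below runs one notebook
  once per chemical species with papermill; the notebook name and species list are invented for
  illustration.

      import papermill as pm

      # The input notebook needs a cell tagged "parameters" that papermill
      # overrides on each run.
      for species in ["o3", "no2", "so2"]:
          pm.execute_notebook(
              "aq_pipeline.ipynb",                    # hypothetical input notebook
              f"output/aq_pipeline_{species}.ipynb",  # one executed copy per species
              parameters={"species": species},
          )

  Running this on a schedule is then delegated to krontab.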

Primary authors: KILLICK, Peter (Met Office Informatics Lab); ROBINSON, Niall (Met Office
Informatics Lab); DONKERS, Kevin (Met Office)

Presenter: KILLICK, Peter (Met Office Informatics Lab)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 13                                                 Type: Oral presentation

        Space Situational Awareness - Virtual Search
                       Environment
                                                    Tuesday, 15 October 2019 12:40 (20 minutes)

  EVERSIS created a prototype web platform called the SSA-VRE (Space Situational Awareness -
  Virtual Search Environment), which aims to provide domain specialists with a space where
  science solutions and tools can be made available to scientific audiences, supported by ideas
  exchange and community building.
  The SSA-VRE grows organically and expands the possibilities of using the data, fostering
  innovation and resulting in world-leading collaborative solutions. It supports cross-segment
  cooperation of dozens of scientists and service providers. The platform is packed not only with
  data and tools but also with inspiring remarks, knowledge resources to browse and share, and
  projects everyone can join and develop.

  The Eversis team would like to discuss the concept of science community building and
  information flow in open-source digital environments. The team will try to answer why such an
  idea is successful in the SSA domain and how it could benefit other science domains.

Primary author: KUBEL-GRABAU, Marek (Eversis)
Presenter: KUBEL-GRABAU, Marek (Eversis)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 14                                               Type: Oral presentation

       Remote Presentation: CROW - Python-based
       Configuration Toolbox for Operational and
               Development Workflows
  The increasing complexity of EMC workflows leads to several challenges for researchers and
  users. One of the major issues is the absence of a modernized and generalized front-end. The
  Configurator of Research and Operational Workflow (CROW) has been developed to fill the gap
  between developers and users, through an object-oriented programming approach using Python.
  The goal of CROW is to drastically automate the most time-consuming and error-prone stages of
  executing a workflow, such as platform adaptation, resource allocation and model configuration.
  This means more creative work can be done with the given resources, in terms of both user hours
  and computing resources. Highly human-readable YAML definition files are taken as input to
  CROW, and Rocoto or ecFlow definition files are generated automatically at the end. The
  introduction of CROW will greatly increase the efficiency of collaboration, documentation and
  R2O transition, and benefit users and developers from both EMC and the community.
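
  As a hedged sketch of the YAML-to-workflow idea (not CROW’s actual code or schema), the
  snippet below reads a human-readable YAML experiment description and prints a
  Rocoto-flavoured task list; all keys and element names are invented for illustration.

      import textwrap
      import yaml

      experiment = yaml.safe_load(textwrap.dedent("""\
          experiment: retro_test
          tasks:
            - name: prep
              command: run_prep.sh
              walltime: "00:30:00"
            - name: forecast
              command: run_fcst.sh
              walltime: "02:00:00"
              depends_on: prep
          """))

      # Emit one <task> element per YAML entry, with an optional dependency.
      for task in experiment["tasks"]:
          dep = f' dependency="{task["depends_on"]}"' if "depends_on" in task else ""
          print(f'<task name="{task["name"]}"{dep}>')
          print(f'  <command>{task["command"]}</command>')
          print(f'  <walltime>{task["walltime"]}</walltime>')
          print("</task>")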

Primary author: Mrs FRIEDMAN, Kate (NOAA)
Co-author: Mr KUANG, Jian (IMSG@NOAA)
Presenter: Mrs FRIEDMAN, Kate (NOAA)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 15                                                    Type: Oral presentation

         Reproducible science at large scale within a
        continuous delivery pipeline: the BSC vision
                                                      Tuesday, 15 October 2019 12:20 (20 minutes)

  Numerical models of the climate system are an essential pillar of modern climate research. Over
  the years, these models have become more and more complex, and today’s Earth System Models
  (ESMs) consist of several components of the climate system coupled together, running on high
  performance computing (HPC) facilities. Moreover, climate experiments often entail running
  different instances of these models from different starting conditions and for an indeterminate
  number of steps. This workflow usually involves other tasks needed to perform a complete
  experiment, such as data pre-processing, post-processing, transferring or archiving.
  As a result, reproducing a climate experiment is far from a trivial task and requires the
  orchestration of different methodologies and tools in order to guarantee, if not bit-to-bit
  reproducibility, at least the statistical reproducibility of the research.
  In this work we show the methodology and software tools employed to achieve scientific
  reproducibility in the Earth Sciences department of the Barcelona Supercomputing Center.
  Version control systems (VCS), test-oriented development, continuous integration with
  automatic and periodic tests, a well-established reproducibility methodology, and data
  repositories and data federation services are all orchestrated by a fault-tolerant workflow
  management system.

  Additionally, we show our experience in providing an operational service in the context of a
  research environment, with the set-up of a highly fault-tolerant system. In this case the option of
  providing redundancy by using cloud services to execute computational models was considered
  and studied in comparison with the capabilities provided by HPC systems.

Primary authors: CASTRILLO, Miguel (BSC-CNS); Dr ACOSTA COBOS, Mario (BSC); SERRADELL
MARONDA, Kim (Barcelona Supercomputing Center)

Presenter: CASTRILLO, Miguel (BSC-CNS)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 16                                               Type: Oral presentation

    Remote presentation: Developing a Unified
 Workflow for Convection Allowing Applications of
                    the FV3
                                               Wednesday, 16 October 2019 14:30 (20 minutes)

  The Environmental Modeling Center (EMC) at the National Centers for Environmental Prediction
  (NCEP) has developed a limited area modeling capability for the Unified Forecast System (UFS),
  which uses the Finite Volume Cubed-Sphere (FV3) dynamical core. The limited area FV3 is a
  deterministic, convection-allowing model (CAM) being run routinely over several domains for
  testing and development purposes as part of a long-term effort toward formulating the Rapid
  Refresh Forecast System (RRFS). The RRFS is a convection-allowing, ensemble-based data
  assimilation and prediction system which is planned to feature an hourly update cadence. While
  current testing resides mostly on NOAA high-performance computing platforms, work is
  underway to perform development using cloud compute resources.
  Two workflows for running the limited area FV3 have been developed: 1) a community workflow,
  primarily focused on engaging NOAA collaborators and research community modeling efforts,
  and 2) an operational workflow, focused on the transition to operations. Both workflows utilize
  the Rocoto workflow manager and shell scripting, and have been ported to multiple
  supercomputing platforms. Unification of the two workflows is underway to foster collaboration
  and accelerated development. In July 2019, a code sprint focusing on developing a workflow for
  operations and research applications took place, featuring membership across multiple
  organizations. Outcomes from this sprint, current efforts, and ongoing challenges will be
  discussed.

Primary author: Mr BLAKE, Benjamin (IMSG and NOAA/NWS/NCEP/EMC)
Co-authors:      Mr KETEFIAN, Gerard (CIRES and NOAA/ESRL/GSD); Mr BECK, Jeff (CIRA and
NOAA/ESRL/GSD); Mr PYLE, Matthew (NOAA/NWS/NCEP/EMC); Mr ROGERS, Eric (NOAA/NWS/NCEP/EMC); Mr
LIU, Bin (IMSG and NOAA/NWS/NCEP/EMC); Dr REAMES, Larissa (CIMMS and NOAA/OAR/NSSL); Mrs
WOLFF, Jamie (NCAR/DTC); Dr CARLEY, Jacob (NOAA/NWS/NCEP/EMC); Mr CHAWLA, Arun
(NOAA/NWS/NCEP/EMC)

Presenter: Mr BLAKE, Benjamin (IMSG and NOAA/NWS/NCEP/EMC)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 17                                                    Type: Oral presentation

      Versioning and tracking changes of vector data
                                                      Monday, 14 October 2019 12:40 (20 minutes)

  Geospatial data are often treated as static datasets. In reality, though, the data represent specific
  features at a specific time.
  Handling changes to the data is usually manual work which does not capture the history and
  source of changes.

  We have started GEODIFF (https://github.com/lutraconsulting/geodiff), a library to version and
  track changes to vector data. The Mergin service (https://public.cloudmergin.com) was developed
  so that users can take advantage of the tool to track their vector data when changes are made
  from various sources.
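
  To make the idea concrete, here is a language-level sketch of changeset-based versioning in
  plain Python; this illustrates the concept only and is not the GEODIFF API.

      # Two snapshots of the same vector layer, keyed by feature id.
      base = {1: {"name": "well A", "depth": 30}, 2: {"name": "well B", "depth": 12}}
      edited = {1: {"name": "well A", "depth": 31}, 3: {"name": "well C", "depth": 7}}

      # Record what changed rather than storing whole copies.
      changeset = {
          "update": {fid: edited[fid] for fid in base.keys() & edited.keys()
                     if base[fid] != edited[fid]},
          "insert": {fid: edited[fid] for fid in edited.keys() - base.keys()},
          "delete": list(base.keys() - edited.keys()),
      }

      def apply_changeset(features, cs):
          # Replay the recorded edits onto another copy of the data.
          merged = {fid: f for fid, f in features.items() if fid not in cs["delete"]}
          merged.update(cs["update"])
          merged.update(cs["insert"])
          return merged

      assert apply_changeset(base, changeset) == edited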

Primary author: RAZMJOOEI, Saber (Lutra Consulting)
Presenter: RAZMJOOEI, Saber (Lutra Consulting)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 18                                                      Type: Oral presentation

      Responding to reproducibility challenges from
               physics to social sciences
                                                       Monday, 14 October 2019 11:00 (40 minutes)

  Facilitating research reproducibility presents a pressing issue across all sciences. However, since
  different challenges arise in natural and social sciences, domain-specific strategies might be the
  best way to promote reproducibility. This talk presents experiences from two different
  disciplines: energy economics and high-energy physics. It discusses and compares potential
  technical and conceptual solutions for facilitating reproducibility and openness in the two fields.
  On the energy economics side, the ubiquitous use of proprietary software and sensitive data is
  encumbering efforts to share research and thus inhibits reproducibility. I present insights around
  these issues based on interviews with faculty and staff at the Energy Policy Institute at the
  University of Chicago. On the high-energy physics side, vast amounts of data and complex
  analysis workflows are among the main barriers to reproducibility. I present domain-tailored
  solutions to these problems, including projects called CERN Open Data, CERN Analysis
  Preservation and REANA - Reusable Analysis. Finally, I discuss the types of tools that can be
  used to facilitate reproducibility and sharing, detailing their ability to address various challenges
  across different disciplines.

Primary author: Dr TRISOVIC, Ana (IQSS, Harvard University)
Presenter: Dr TRISOVIC, Ana (IQSS, Harvard University)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 19                                           Type: Oral presentation

                        Welcome and introduction
                                              Monday, 14 October 2019 09:50 (20 minutes)

Presenters: VITOLO, Claudia (ECMWF); PAPPENBERGER, Florian (ECMWF)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 23                                           Type: Oral presentation

  Leveraging OGC standards to boost reproducibilty
                                              Monday, 14 October 2019 14:30 (20 minutes)

Presenter: SIMONIS, Ingo (OGC)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 31                                           Type: Oral presentation

                      Recap from day 1 and remarks
                                              Tuesday, 15 October 2019 09:00 (10 minutes)

Presenter: VITOLO, Claudia (ECMWF)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 45                                              Type: Oral presentation

                      Recap from day 2 and remarks
                                              Wednesday, 16 October 2019 09:00 (10 minutes)

Presenter: SIEMEN, Stephan (ECMWF)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 53                                                 Type: Oral presentation

    Design of a Generic Workflow Generator for the
            JEDI Data Assimilation System
                                                    Tuesday, 15 October 2019 11:00 (20 minutes)

  The JEDI (Joint Effort in Data assimilation Integration) is a collaborative project that provides a
  generic interface to data assimilation algorithms and observation operators for atmospheric,
  marine and other Earth system models, allowing these components to be easily and dynamically
  composed into complete data-assimilation and forecast-cycling systems. In this work we present
  the design of a generic workflow generation system that allows users to easily configure the
  JEDI components to produce custom data analysis toolchains with full cycling capability. Like
  the JEDI system itself, the workflow component is designed as a dynamically composable system
  of generic applications. An important point is that the JEDI workflow system is a generic
  workflow generation system, designed to programmatically produce workflow descriptions for a
  range of production-quality workflow management engines, including ecFlow, Cylc, and Apache
  Airflow. Configuration of the JEDI executables, the Python applications that control them, and
  the connection of applications into larger workflow specifications is entirely accomplished with
  YAML-syntax configuration files using the Jinja templating engine. The combination of YAML
  and Jinja is simultaneously powerful, simple, and easily editable, allowing the user to quickly
  reconfigure workflow descriptions. A user can change model parameters, DA algorithms,
  covariance models, observation operators, and observation QC filtering algorithms, as well as
  the entire workflow graph structure, all without writing any shell scripts, editing any code, or
  recompiling any packages. Another key focus of the JEDI workflow system is data provenance
  and experiment reproducibility. Execution reproducibility is accomplished through the
  elimination of unportable shell scripting in favor of Python 3; reliance on version control
  systems; universal use of checksum verification of all input products; and archiving of all
  relevant configuration and state as human-readable and editable YAML files.

Primary authors: Dr OLAH, Mark J. (UCAR / JCSDA); Dr TRÉMOLET, Yannick (UCAR / JCSDA)

Presenter: Dr OLAH, Mark J. (UCAR / JCSDA)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 54                                                 Type: Oral presentation

   Reproducing new and old operational systems on
     development workstations using containers
                                                 Wednesday, 16 October 2019 12:40 (20 minutes)

  Linux containers (Singularity and Docker) have significantly assisted with the Bureau of
  Meteorology’s transition of operational weather model statistical post-processing from an old
  mid-range system to a new data-intensive HPC cluster. Containers provided a way to run the
  same software as both old and new systems on development workstations, which made
  development significantly easier and allowed migration work to begin well before full readiness
  of the new HPC cluster system. Containers also provided reproducibility and consistent results
  for scientific verification of post-processing software. This talk describes how containers have
  been used in a team containing a mix of scientists and software developers, what has been learnt
  from the use of containers, and recommendations for other teams adopting containers as part of
  their development process.

Primary author: Dr GALE, Tom (Bureau of Meteorology)
Presenter: Dr GALE, Tom (Bureau of Meteorology)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 56                                                     Type: Oral presentation

     Jupyter for Reproducible Science at Photon and
                   Neutron Facilities
                                                       Tuesday, 15 October 2019 10:10 (20 minutes)

  Modern photon and neutron facilities produce huge amounts of data which can lead to
  interesting and important scientific results. However, the increasing volume of data produced at
  these facilities leads to some fundamental issues with data analysis.
  With data sets in the hundreds of terabytes, it is difficult for scientists to work with their data.
  The large size also means that these huge volumes of data require a lot of computational power
  to be analysed. Lastly, it can be difficult to find out what analysis was performed to arrive at a
  certain result, making reproducibility challenging.
  Jupyter notebooks potentially offer an elegant solution to these problems. They can be run
  remotely, so the data can stay at the facility where it was gathered, and the integrated text and
  plotting functionality allows scientists to explain and record the steps they are taking to analyse
  their data, meaning that others can easily follow along and reproduce their results.
  The PaNOSC (Photon and Neutron Open Science Cloud) project aims to promote remote analysis
  via Jupyter notebooks, with a focus on reproducibility and following FAIR (Findable, Accessible,
  Interoperable, Re-usable) data standards.
  There are many technical challenges which must be addressed before such an approach is
  possible, such as recreating the computational environments, recording the workflows used by
  the scientists, seamlessly moving these environments and workflows to the data, and having this
  work through one common portal which links together several international facilities.
  Some of these challenges have likely also been encountered by other scientific communities, so
  it would be very useful to work together and try to come up with common workflows and
  solutions for creating reproducible notebooks for science.

  This project (PaNOSC) has received funding from the European Union’s Horizon 2020 research
  and innovation programme under grant agreement No 823852.

Primary author: ROSCA, Robert (European XFEL)
Presenter: ROSCA, Robert (European XFEL)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 57                                                  Type: Oral presentation

  CMIP6 post-processing workflow at the Met Office
                                                    Tuesday, 15 October 2019 14:00 (20 minutes)

  The Climate Data Dissemination System is the software system developed in the Met Office
  Hadley Centre to post-process simulation output into the format required for submission to the
  Coupled Model Intercomparison Project phase 6 (CMIP6). The system decouples the production
  of data and metadata in CMIP standard formats from the running of climate simulations. This
  provides the flexibility required by participation in a global community-led project, allowing
  simulations to run before all the specifications of the datasets have been finalised.

  I will describe how we have developed a workflow based on standard Python tools developed in
  the Met Office and the climate research community to build a system that enables traceability
  and reproducibility in the climate data post-processing workflow. I will discuss the advantages
  and disadvantages of the choices we have made for this system. I will then discuss plans for
  future climate data projects, including CMIP7.

Primary author: Mr HADDAD, Stephen (Met Office)
Presenter: Mr HADDAD, Stephen (Met Office)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 58                                                 Type: Oral presentation

      From loose scripts to ad-hoc reproducible
  workflows: a methodology using ECMWF’s ecflow
                                                Wednesday, 16 October 2019 12:00 (20 minutes)

  With a rising need for resource-efficient and fully tractable software stacks, e.g. in operational
  contexts, programmers increasingly face ever-changing, complex software and hardware
  dependencies within their workflows. In this talk, we show how ecflow was used to adaptively
  build up a fully reproducible workflow at runtime for the calibration of the European Flood
  Awareness System’s hydrologic model, although the presented methods can be adapted to any
  workflow needs. Whether we require serial, parallel, or mixed-mode executions on a single or
  multiple computer systems, ecflow can help to easily manage, monitor and adapt any software
  stack without penalising execution performance.
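
  A minimal sketch of building a suite with ecflow’s Python API is shown below, assuming the
  API as documented in the ecFlow tutorials; suite, task and path names are invented for
  illustration.

      import ecflow

      defs = ecflow.Defs()
      suite = defs.add_suite("efas_calibration")
      suite.add_variable("ECF_HOME", "/path/to/task/scripts")  # hypothetical path
      prep = suite.add_task("prepare_forcings")
      calib = suite.add_task("calibrate_model")
      calib.add_trigger("prepare_forcings == complete")  # enforce run order
      defs.save_as_defs("efas_calibration.def")          # write the definition file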

Primary author: DECREMER, Damien (ECMWF)
Co-author: MAZZETTI, Cinzia (ECMWF)
Presenter: DECREMER, Damien (ECMWF)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 59                                                    Type: Oral presentation

   The Copernicus Climate Data Store: ECMWF’s
 approach to providing online access to climate data
                     and tools
                                                     Monday, 14 October 2019 15:10 (20 minutes)

  Data about Earth’s climate is being gathered at an ever-increasing rate. To organise and provide
  access to this data, the Copernicus (European Union’s Earth Observation Program) Climate
  Change Service (C3S) operated by ECMWF released the Climate Data Store (CDS) in mid-2018.
  The aim of the C3S service is to help a diverse set of users, including policy-makers, businesses
  and scientists, to investigate and tackle climate change, and the CDS provides a cloud-based
  platform and freely available data to enable this.
  The CDS provides reliable information about the past, present and future climate, on global,
  continental, and regional scales. It contains a variety of data types, including satellite
  observations, in-situ measurements, climate model projections and seasonal forecasts.
  It is set to give free access to this huge amount of open climate data, presenting new
  opportunities to all those who require authoritative information on climate change. A quick
  ‘click-and-go’ experience via a simple, uniform user interface offers easy online access to a
  wealth of climate data that anyone can freely browse and download after a simple registration
  process.
  As well as discovering and browsing trusted datasets, users can use the CDS Toolbox to analyze
  CDS data online by building their own data processing workflows and their own web-based
  applications.
  Reproducibility is key here, both for the CDS catalogue and the CDS Toolbox. The CDS needs to
  be a reliable and sustainable system that users can trust.

  The CDS is continually being optimised and expanded through interaction with users. It delivers
  more than 60TB a day. It can be accessed at https://cds.climate.copernicus.eu.
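
  For illustration, a minimal retrieval through the CDS API looks like the snippet below (using the
  cdsapi package and a registered CDS account; the dataset and request keys follow the CDS web
  form and are an example, not part of the abstract).

      import cdsapi

      c = cdsapi.Client()  # reads the API key from ~/.cdsapirc
      c.retrieve(
          "reanalysis-era5-single-levels",
          {
              "product_type": "reanalysis",
              "variable": "2m_temperature",
              "year": "2018",
              "month": "01",
              "day": "01",
              "time": "12:00",
              "format": "grib",
          },
          "era5_t2m.grib",  # local target file
      )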

Primary authors: Dr BIAVATI, Gionata (ECMWF); BERGERON, Cedric
Presenter: Dr BIAVATI, Gionata (ECMWF)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 60                                              Type: Oral presentation

  Reproducibility and workflows with the Integrated
                 Forecasting System
                                                 Monday, 14 October 2019 12:00 (40 minutes)

  Development and testing of scientific changes to ECMWF’s Integrated Forecasting System is
  complex and is distinct from non-scientific software workflows at ECMWF. This talk will
  compare these branching models and explain why those differences exist. It will also discuss
  work that has recently been done to take ideas from non-scientific software workflows to
  improve and streamline our scientific workflows where we can.

Primary author: BENNETT, Andrew (ECMWF)
Co-author: Dr VILLAUME, Sebastien (ECMWF)
Presenter: BENNETT, Andrew (ECMWF)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 61                                                 Type: Oral presentation

        Standardised data representation - power of
                 reproducible work-flow
                                                    Monday, 14 October 2019 16:00 (20 minutes)

  ECMWF maintains strong collaboration with WMO, space agencies, conventional data providers
  and Member/Co-operating States on standardized data representation, to accomplish the
  exchange and data assimilation of high-quality observations. That secures the success of efficient
  data workflows, longevity in the data archive and climate reanalysis, ensuring new weather
  observations will continue to improve forecasts for the benefit of society. The power of
  standardized data comes from the power of data usage amongst internal and external
  stakeholders, serving seamless integration in operations while fulfilling needs in research.

Primary author: Ms CREPULJA, Marijana (ECMWF)
Presenter: Ms CREPULJA, Marijana (ECMWF)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 62                                                   Type: Oral presentation

   ESoWC: A Machine Learning Pipeline for Climate
                     Science
                                                  Wednesday, 16 October 2019 16:00 (20 minutes)

  Thomas Lees [1], Gabriel Tseng [2], Simon Dadson [1], Steven Reece [1]
  [1] University of Oxford
  [2] Okra Solar, Phnom Penh
  As part of the ECMWF Summer of Weather Code we have developed a machine learning pipeline
  for working with climate and hydrological data. We had three goals for the project. Firstly, to
  bring machine learning capabilities to scientists working in hydrology, weather and climate. This
  meant we wanted to produce an extensible and reproducible workflow that can be adapted for
  different input and output datasets. Secondly, we wanted the pipeline to use open datasets in
  order to allow for full reproducibility. The ECMWF and Copernicus Climate Data Store provided
  access to all of the ERA5 data. We augmented this with satellite-derived variables from other
  providers. Finally, we had a strong focus on good software engineering practices, including code
  reviews, unit testing and continuous integration. This allowed us to iterate quickly and to
  develop a code-base which can be extended and tested to ensure that core functionality will still
  be provided. Here we present our pipeline, outline some of our key learnings and show some
  exemplary results.

Primary authors: Mr LEES, Thomas (University of Oxford); TSENG (WILL PRESENT REMOTELY),
Gabriel (Okra Solar); Prof. DADSON, Simon (University of Oxford); Dr REECE, Steven (University of
Oxford)

Presenters: Mr LEES, Thomas (University of Oxford); TSENG (WILL PRESENT REMOTELY), Gabriel
(Okra Solar)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 63                                              Type: Oral presentation

           Using containers for reproducible results
                                                 Tuesday, 15 October 2019 14:40 (20 minutes)

  This talk will show how we’re using containers at ECMWF in several web developments, in order
  to achieve a reasonably reproducible workflow from development to production.

Primary authors: VALIENTE, Carlos (ECMWF); MARTINS, Manuel (ECMWF)
Co-author: HERMIDA MERA, Marcos
Presenter: VALIENTE, Carlos (ECMWF)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 64                                                  Type: Oral presentation

   A reproducible flood forecasting case study using
        different machine learning techniques
                                                 Wednesday, 16 October 2019 16:20 (20 minutes)

  Lukas Kugler(1), Sebastian Lehner(1,2)
  (1)University of Vienna, Department of Meteorology and Geophysics, Vienna
  (2)Zentralanstalt für Meteorologie und Geodynamik, Vienna
  Extreme weather events can cause massive economic and human damage due to their inherent
  rarity and the usually large amplitudes associated with them. For this reason, forecasting with a
  focus on extreme events is essential.
  The core of this study is the prediction of extreme flooding events using various machine
  learning methods (Linear Regression, Support Vector Regression, Gradient Boosting Regression,
  Time-Delay Neural Net). These will be compared with each other, with a persistence forecast and
  with forecast reruns from GloFAS (the Global Flood Awareness System). The whole work was
  carried out with Jupyter notebooks using a small sample data set, which is all available on
  GitHub [1] and hence open source and fully reproducible.
  The data basis is the ERA5 reanalysis, of which various meteorological variables are used as
  predictors, and the GloFAS 2.0 reanalysis, from which river discharge is used as predictand. The
  area of interest is the upper Danube catchment. All of the data is available from 1981 to 2016 and
  was divided into 25 years for model training, 6 years for validation and hyperparameter
  optimization, as well as 5 years for an independent testing period.
  Since the focus is on extreme flooding events, times within the test period containing steep
  increases in river discharge are evaluated with 14-day forecasts with varying initialisation times.
  Additionally, a comparison with GloFAS ‘forecast reruns’ is carried out for the 2013 flooding
  event in Central Europe.
  This work was supported through the ESoWC (ECMWF Summer of Weather Code) 2019
  programme by ECMWF (the European Centre for Medium-Range Weather Forecasts), Copernicus
  and the Department of Meteorology and Geophysics of the University of Vienna.

  [1] https://github.com/esowc/ml_flood
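
  As an illustrative sketch of the set-up (not the ml_flood code itself), the snippet below fits one
  of the listed models with a chronological train/validation/test split, using synthetic stand-ins for
  the ERA5 predictors and GloFAS discharge.

      import numpy as np
      from sklearn.ensemble import GradientBoostingRegressor

      rng = np.random.default_rng(0)
      n_days = 36 * 365                   # 1981-2016, ignoring leap days
      X = rng.normal(size=(n_days, 5))    # stand-in for ERA5 predictors
      y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=n_days)  # "discharge"

      # Chronological split: 25 years training, 6 validation, 5 testing.
      train, val, test = np.split(np.arange(n_days), [25 * 365, 31 * 365])
      model = GradientBoostingRegressor().fit(X[train], y[train])
      print("validation R^2:", model.score(X[val], y[val]))
      print("test R^2:", model.score(X[test], y[test]))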

Primary authors: KUGLER, Lukas (University of Vienna); LEHNER, Sebastian (Zentralanstalt für
Meteorologie und Geodynamik)

Presenter: LEHNER, Sebastian (Zentralanstalt für Meteorologie und Geodynamik)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 65                                                 Type: Oral presentation

    Refactoring EFAS product generation - Lessons
 learned on testing, performance and reproducibility
The European Flood Awareness System (EFAS) is composed of many flood forecasting products
(medium-range flood forecasts, seasonal hydrological outlooks, flash flood indicators, etc.)
computed from different modelling chains. Each chain has its own origin, complexity, workflow and
dissemination schedule. In this work, we discuss the challenges associated with integrating these
chains into an operational system. We show how computer scientists, hydrologists and
meteorologists can work together, each leveraging their own areas of expertise, to build efficient
and reproducible operational workflows. The methodology is applied to the refactoring of two major
components of EFAS: the product generation of the medium-range flood forecast model and
ERIC, a flash flood indicator. To support these two chains, a Python framework (danu) has been
developed, which will also be used for the integration of future products. The whole development
process has also been reviewed, adopting a Git workflow, continuous integration and versioning,
leading to a more stable and reproducible system.
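
To illustrate the kind of gatekeeping such a continuous integration setup can provide, the sketch
below shows a minimal pytest-style regression test checking that a product-generation step is
deterministic. The toy generator and the checksum scheme are assumptions for demonstration
only, not the actual danu API.

    import hashlib
    import json

    def generate_product(discharge):
        """Toy stand-in for a product-generation step: warning-level flags."""
        return {"exceeds_warning": [value > 100.0 for value in discharge]}

    def product_checksum(product):
        """Hash a product deterministically (sorted keys, fixed encoding)."""
        blob = json.dumps(product, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()

    def test_product_generation_is_deterministic():
        inputs = [80.0, 120.0, 95.0, 140.0]
        assert (product_checksum(generate_product(inputs))
                == product_checksum(generate_product(inputs)))

Run on every commit (for example with pytest in a CI pipeline), such a test fails any change that
silently alters the product content, which is one concrete way to keep an operational chain
reproducible.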

Primary authors: CARTON DE WIART, Corentin (ECMWF); QUINTINO, Tiago (ECMWF); PRUDHOMME, Christel (ECMWF)

Presenter: CARTON DE WIART, Corentin (ECMWF)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 66                                                 Type: Oral presentation

  Thoughts and ideas for using devops principles and
       best practices to ensure transparency and
   reproducibility in an international partnership of
                meteorological services
The meteorological services of the Netherlands, Ireland, Iceland and Denmark are, via the United
Weather Centre West (UWCW) initiative, embarking on a journey towards common NWP
production using a common HPC facility. Such a cross-site partnership covering both research and
operations places certain expectations on the workflows to be followed, so that transparency and
reproducibility are ensured at all times. The project has therefore recently initiated work on
assessing the value of implementing a set of devops principles and best practices, with the aim of
letting procedures defined in software describe the workflow and do the gatekeeping. This, in turn,
is expected to empower developers to come closer to, and take some responsibility for, operations.
Since this work is very much in its infancy, the presentation will only scratch the surface of the
work to be carried out.

Primary author: LORENZEN, Thomas (Danish Meteorological Institute)
Presenter: LORENZEN, Thomas (Danish Meteorological Institute)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 67                                                  Type: Oral presentation

                          ECMWF Data Governance
                                                    Monday, 14 October 2019 16:20 (20 minutes)

ECMWF hosts the largest meteorological archive in the world, with more than 300 PB stored on
tape. Data created, used, archived and distributed by ECMWF are a valuable and unique asset.
Often, they are generated at considerable expense and effort, and can be almost impossible to
reproduce. As such, they should be properly governed and managed to ensure that the maximum
value and benefit are extracted from them. It is also vital that they remain available for reuse in the
future. Decisions about data often have far-reaching implications which are not immediately
apparent, and the Data Governance process provides the means to ensure that all impacts are
properly considered. The “Data Governance” workflow therefore plays a crucial role and enables
ECMWF to meet both current and future data challenges.
In this talk, I will review the types of data we handle at ECMWF and explain the importance of
having precise, well-defined metadata to describe and index the information. I will illustrate the
Data Governance process through a set of concrete examples covering new parameters and new
types of data.
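
To make the role of precise metadata concrete, the hedged sketch below uses the open-source
ecCodes Python bindings to print some of the standard GRIB/MARS keys that describe and index
each field. The file name is an illustrative assumption.

    import eccodes

    # Iterate over the GRIB messages in a local file and print the keys
    # that identify each field in the archive.
    with open("forecast.grib", "rb") as f:
        while True:
            gid = eccodes.codes_grib_new_from_file(f)
            if gid is None:  # end of file
                break
            keys = ("shortName", "dataDate", "stream", "expver",
                    "typeOfLevel", "level")
            print({key: eccodes.codes_get(gid, key) for key in keys})
            eccodes.codes_release(gid)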

Primary author: VILLAUME, Sebastien (ECMWF)
Presenter: VILLAUME, Sebastien (ECMWF)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 68                                                    Type: Oral presentation

        Remote presentation: Singularity Containers
                                                  Wednesday, 16 October 2019 16:40 (20 minutes)

  Singularity is an open source container platform which is ideally suited to reproducible scientific
  workflows. With Singularity you can package tools or entire workflows into a single container
  image file that can be flexibly deployed on High Performance Computing (HPC), cloud, or local
  compute resources. Starting from a docker image, or building from scratch, your containers can
  be cryptographically signed and verified allowing confidence in the provenance of your analyses.
  GPUs and high performance networking hardware are supported natively for efficient execution
  of large models.

We will give a brief overview of Singularity - what it is, where to get it, and how to use it. As an
example, we’ll show how Singularity can be used to run a workflow in which different tools are
installed on different Linux distributions, providing flexibility by freeing the user from the
constraints of their environment.
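
A minimal sketch of that pattern, assuming Singularity is installed on the host: each workflow step
runs inside its own container image, so tools built for different Linux distributions can coexist in
one pipeline. The image names and commands are illustrative placeholders.

    import subprocess

    # Each step pairs a container image with the command to run inside it.
    # `singularity exec` pulls and converts the image if necessary and
    # bind-mounts the current directory, so steps share files on the host.
    steps = [
        ("docker://ubuntu:18.04", ["./preprocess.sh", "input.grib"]),
        ("docker://centos:7", ["python3", "postprocess.py"]),
    ]

    for image, command in steps:
        subprocess.run(["singularity", "exec", image, *command], check=True)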

Primary authors: Dr TRUDGIAN, David (Sylabs Inc.); ARANGO, Eduardo (Sylabs Inc.); KURTZER,
Gregory (Sylabs Inc.)

Presenter: Dr TRUDGIAN, David (Sylabs Inc.)

Track Classification: Workshop: Building reproducible workflows for earth sciences


Contribution ID: 69                                                    Type: Oral presentation

        Building a reproducible workflow for a new
        statistical post-processing system (ecPoint)
ecPoint is a new statistical post-processing system developed at ECMWF to produce probabilistic
forecasts at point scale from ECMWF’s global ensemble (ENS). After an initial in-house verification,
ecPoint has shown the potential to be a revolutionary system, because it could fill a critical gap that
currently exists between end-users and forecast providers: end-users increasingly ask for
medium/long-range ensemble km-scale predictions, but the scientific community is not yet able to
provide such forecasts due to computational limitations.
Despite these initial positive results, ecPoint has already challenged its developers in many ways.
Firstly, ecPoint is a brand-new system: from the scientific point of view, it still needs to prove its
robustness to find its place within the scientific community, unlike other better-established
statistical post-processing systems. Secondly, ecPoint has a complex workflow: from the
computational point of view, it needs to be maintained efficiently in order to produce forecasts
operationally.
This presentation will focus on the efforts made to provide a comprehensive reproducible
framework in order to:

      • Prove the scientific robustness of the system;
      • Favour an easy exchange of knowledge and scientific progress by limiting time-consuming
        and error-prone steps in the data processing workflow, and providing a good description
        of companion information that would enable subsequent confirmation and/or extension of
        published results;
    • Favour computational reproducibility and replicability by capturing and describing the
      computing environment used to calibrate ecPoint, to produce ecPoint forecasts or to
      reproduce published scientific results (a minimal sketch of such environment capture is
      given below).
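
As a hedged illustration of the last point, the snippet below records the interpreter, platform and
installed package versions alongside a run’s outputs. The output file name is an assumption for
demonstration; ecPoint’s actual provenance capture may differ.

    import json
    import platform
    import sys
    from importlib.metadata import distributions  # Python 3.8+

    record = {
        "python": sys.version,
        "platform": platform.platform(),
        "packages": sorted(
            f"{dist.metadata['Name']}=={dist.version}"
            for dist in distributions()
            if dist.metadata["Name"]
        ),
    }

    # Store the captured environment next to the forecast outputs so that
    # published results can later be re-run under the same software stack.
    with open("environment_capture.json", "w") as out:
        json.dump(record, out, indent=2)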

Primary authors: PILLOSU, Fatima (ECMWF & Reading University); HEWSON, Tim (ECMWF,
Reading, UK); GASCON, Estibaliz (ECMWF, Reading, UK); MONTANI, Andrea (ECMWF, Reading,
UK); BONNET, Axel (ECMWF, Reading, UK); BOSE, Anirudha (Ledger, Paris, France)

Presenter: PILLOSU, Fatima (ECMWF & Reading University)

Track Classification: Workshop: Building reproducible workflows for earth sciences
