EXPLORING THE FRONTIERS OF DEEP LEARNING FOR EARTH SYSTEM OBSERVATION AND PREDICTION - Dr. David M. Hall, Senior Data Scientist, NVIDIA ECMWF-ESA ...

 
EXPLORING THE FRONTIERS OF DEEP
LEARNING FOR EARTH SYSTEM
OBSERVATION AND PREDICTION
Dr. David M. Hall, Senior Data Scientist, NVIDIA
ECMWF-ESA Workshop on Machine Learning, October 2020
THE FRONTIERS OF DEEP LEARNING

Applications        AI Trends   Scientific Challenges   Hardware and Tools

RAPID ADOPTION
                 Deep learning is being rapidly adopted by the Earth System Science community
[Slide figure: collage of page thumbnails from published papers that apply deep learning to Earth system problems. Recoverable sources and captions:]

Ganapathi Subramanian and Crowley, Spatial RL for Fire Dynamics. FIGURE 5 | Results for experiment (B) from all algorithms showing performance on the prediction of the next state directly after the training data. (A) Satellite image of August 11. (B) Thermal image of August 11. (C) Gaussian processes. (D) Value iteration. (E) Policy iteration. (F) Q-learning. (G) MCTS. (H) A3C. Images obtained from USGS/NASA Landsat Program.

S. Chen, et al. Fig. 11. Maps of correlation coefficients at 1-km resolution between Chl (a), Kd (b), SST (c), SSS (d), and surface pCO2, respectively. These correlations were derived from the interannual monthly anomalies.

Remote Sens. 2018, 10, 1690 (classification of image superpixels as crop or weed).

Remote Sens. 2020, 12, 901.

www.nature.com/scientificreports
                                                                                                                                                                                                                                                                                                                                                                    model uses soil.
                                                                                                                                                                                                                                                                                                                                                                                   sa-       though river discharge-introduced SSS changes (i.e., 25–34) on seasonal
                                                                                                                                                        choices using a policy tuned to a particular time of fire spread             requires additional information consisting of firefighting inter-
                                                                                                                                                        test as compared with supervised learning which estimates a                                     tellite
                                                                                                                                                                                                                                     vention (such as fire       SSS
                                                                                                                                                                                                                                                           fighting     as anand
                                                                                                                                                                                                                                                                    strategy     input    to account
                                                                                                                                                                                                                                                                                   time elapsed),  whichfor the effect of freshwater mixing, in                        scale in this region could modulate surface pCO2 variations, SSS may
                                                     Figure 5. The results of the whale counting (step-2) CNN-based model that locates and counts the number
                                                                                                                                                        model ofbased on inputs and outputs only. The second reason                  are not taken into consideration  in this study, as we choseis
                                                                                                                                                                                                                                                                                                  a study
                                                     whales (green bounding boxes) in the grid cells in which step-1 CNN gave high probability for whale presence.                                                                                      the G. Maine,       because      there       no relevant satellite SSS available at 1 km                      not necessarily be an effective predictor for surface pCO2 in the G.
                                                                                                                                                        being RL prepares a policy for the agent that takes actions which            region having very minimal fire fighting. In future work, we aim
                                                     The red bounding box shows a false negative. Map data: Google, DigitalGlobe.
                                                                                                                                                        model the underlying causal fire behavior. The supervised learn-             to incorporate this kind of information as well as enriching the
                                                                                                                                                        ing algorithms do not have such a state-action mapping. Thus                 model by including more land characteristics such as moisture,

                                                                                                                                                           Frontiers in ICT | www.frontiersin.org                               10                                          April 2018 | Volume 5 | Article 6

                                                     Figure 6. The proposed automatic whale-counting procedure with a two-step CNN-based model. (A) The
                                                     first-step CNN scans the sample area (following the yellow line) to search for the presence of whales in each grid
                                                     cell (white squares). Only grid cells in which the first-step CNN gives high probability for whale presence (red
                                                     square) are analyzed by (B) the second-step CNN, which finally locates and counts individuals (the four green
                                                                                                                                                                                                                                                                                                                (a)                                                                                (b)
                                                     bounding boxes indicate correctly detected whales and the red box indicates a false negative). Map data: Google,
                                                     DigitalGlobe.
                                                                                                                                                                                                                                                                                                                                                                                                                                                            3
                                                     between 0 and 360°, randomly flipping half of the training images, randomly cropping, random the scale size of
                                                     the images, and random the brightness level of pixels by a factor of up to 50%.
NSF AI INSTITUTES
   Seven new Artificial Intelligence institutes, including one for Weather and Climate

NSF AI Institute for Research on Trustworthy AI in Weather, Climate and Coastal Oceanography

NSF announcement

NOAA CENTER FOR AI
Official strategy and dedicated center focusing on AI

  https://nrc.noaa.gov/LinkClick.aspx?fileticket=0I2p2-Gu3rA%3D&tabid=91&portalid=0

APPLICATIONS IN
EARTH SYSTEM SCIENCE
ACCELERATED PHYSICS
       Using surrogate models to speed up existing code

       Emulation: samples from expensive function -> fast surrogate model
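The emulation idea on this slide can be sketched in a few lines. The example below is a minimal illustration, not the method used in any of the cited work: `expensive_function` is a hypothetical stand-in for a costly physics routine, and piecewise-linear interpolation stands in for the neural-network surrogate a real application would train on the samples.

```python
import bisect
import math


def expensive_function(x: float) -> float:
    """Hypothetical stand-in for a costly physics routine."""
    return math.exp(-x) * math.sin(4.0 * x)


def build_surrogate(func, lo: float, hi: float, n_samples: int):
    """Sample func on a uniform grid and return a piecewise-linear emulator."""
    xs = [lo + (hi - lo) * i / (n_samples - 1) for i in range(n_samples)]
    ys = [func(x) for x in xs]  # the only calls to the expensive function

    def surrogate(x: float) -> float:
        # Find the sample interval containing x, clamping at the domain edges.
        i = min(max(bisect.bisect_right(xs, x) - 1, 0), n_samples - 2)
        t = (x - xs[i]) / (xs[i + 1] - xs[i])
        return (1.0 - t) * ys[i] + t * ys[i + 1]

    return surrogate


surrogate = build_surrogate(expensive_function, 0.0, 2.0, n_samples=201)

# Compare emulator and original at points that are not on the sample grid.
test_points = [0.0123 * k for k in range(160)]
max_error = max(abs(surrogate(x) - expensive_function(x)) for x in test_points)
print(f"max abs error on test points: {max_error:.2e}")
```

The trade-off is the usual one for emulators: the expensive function is called only while building the surrogate, and every later evaluation is a cheap lookup whose accuracy depends on how well the samples cover the input range.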

ACCELERATED PHYSICS PARAMETERIZATIONS
                    Emulation of E3SM Super-parameterized SW and LW Radiation

8-10x Speedup in SW and LW Radiative Transfer Calculations

IMPROVED PHYSICS PARAMETERIZATIONS
       Building more accurate physics parameterizations

Stable machine-learning parameterization of subgrid processes for climate modeling at a range of resolutions
Janni Yuval & Paul A. O’Gorman
Nature Communications | https://doi.org/10.1038/s41467-020-17142-3

Abstract: Global climate models represent small-scale processes such as convection using subgrid models known as parameterizations, and these parameterizations contribute substantially to uncertainty in climate projections. Machine learning of new parameterizations from high-resolution model output is a promising approach, but such parameterizations have been prone to issues of instability and climate drift, and their performance for different grid spacings has not yet been investigated. Here we use a random forest to learn a parameterization from coarse-grained output of a three-dimensional high-resolution idealized atmospheric model. The parameterization leads to stable simulations at coarse resolution that replicate the climate of the high-resolution simulation. Retraining for different coarse-graining factors shows the parameterization performs best at smaller horizontal grid spacings. Our results yield insights into parameterization performance across length scales, and they also demonstrate the potential for learning parameterizations from global high-resolution simulations that are now emerging.

Fig. 1 Snapshots of column-integrated precipitable water [mm] taken from the statistical equilibrium of simulations. a High-resolution simulation (hi-res), b coarse-resolution simulation (x8), and c coarse-resolution simulation with random forest (RF) parameterization (x8-RF). Insets in a show d a zoomed-in region and e the same region but coarse-grained by a factor of 8 to the same grid spacing as in b. The colorbar is saturated in parts of panel b.

[Figure: mean and extreme precipitation as a function of latitude. a Zonal- and time-mean precipitation and b 99.9th percentile of 3-hourly precipitation for the hi-res, x8-RF, and x8 simulations. For hi-res, the precipitation is coarse-grained to the grid spacing of x8 to give a fair comparison.]

https://www.nature.com/articles/s41467-020-17142-3
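The figure caption describes fields "coarse-grained by a factor of 8". The slide does not say how the paper performs that step, so the sketch below assumes simple block averaging on a regular 2D grid; the function name `coarse_grain` and the toy field are hypothetical.

```python
def coarse_grain(field, factor):
    """Block-average a 2D field (list of equal-length rows) by an integer factor."""
    n_rows, n_cols = len(field), len(field[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    coarse = []
    for r in range(0, n_rows, factor):
        row = []
        for c in range(0, n_cols, factor):
            # Collect the factor x factor block whose top-left corner is (r, c).
            block = [
                field[i][j]
                for i in range(r, r + factor)
                for j in range(c, c + factor)
            ]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


# Hypothetical 16x16 "high-resolution" field; the values are arbitrary.
hi_res = [[float(i + j) for j in range(16)] for i in range(16)]
x8 = coarse_grain(hi_res, factor=8)
print(x8)
```

Applying the same operator to the high-resolution output before comparison is what allows a like-for-like evaluation against the coarse-resolution runs.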
BIAS CORRECTION
Work with ECMWF to remove IFS Model bias

        http://dx.doi.org/10.21957/pl881qs63d
NOWCASTING
MetNet: can outperform physical models for forecasts up to 8 hours ahead

https://ai.googleblog.com/2020/03/a-neural-weather-model-for-eight-hour.html
MEDIUM-RANGE WEATHER FORECASTING
  WeatherBench: A Standardized Benchmark for Data-Driven Forecasts
           WeatherBench: A benchmark dataset for data-driven weather forecasting
Stephan Rasp, Peter D. Dueben, Sebastian Scher, Jonathan A. Weyn, Soukayna Mouatadid, Nils Thuerey

                                      https://arxiv.org/abs/2002.00469
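
WeatherBench scores forecasts with a latitude-weighted RMSE, so that the densely spaced grid points near the poles do not dominate the score. The sketch below shows the idea for a single forecast field on a regular latitude-longitude grid; the function name, the toy grid, and the temperature values are illustrative assumptions, not code from the benchmark.

```python
import math


def lat_weighted_rmse(forecast, truth, lats_deg):
    """RMSE over a lat-lon grid with cos(latitude) weights normalized to mean 1.

    forecast and truth are lists of rows, one row per latitude in lats_deg.
    """
    cos_lat = [math.cos(math.radians(lat)) for lat in lats_deg]
    mean_cos = sum(cos_lat) / len(cos_lat)
    weights = [c / mean_cos for c in cos_lat]

    total, count = 0.0, 0
    for w, f_row, t_row in zip(weights, forecast, truth):
        for f, t in zip(f_row, t_row):
            total += w * (f - t) ** 2
            count += 1
    return math.sqrt(total / count)


# Hypothetical 3x2 grid of temperatures in kelvin.
lats = [-60.0, 0.0, 60.0]
truth = [[280.0, 281.0], [300.0, 301.0], [279.0, 278.0]]
forecast = [[value + 2.0 for value in row] for row in truth]
print(lat_weighted_rmse(forecast, truth, lats))
```

Because the weights average to one, a spatially uniform error gives the same score as an unweighted RMSE; the weighting only changes the result when errors vary with latitude.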

MEDIUM-RANGE WEATHER FORECASTING
Data-driven models match the accuracy of physical models on WeatherBench

EL NIÑO PREDICTION
Deep learning model able to predict El Niño up to 18 months in advance

                  https://www.nature.com/articles/s41586-019-1559-7.pdf
RAPID INTENSIFICATION
Machine learning improves detection by 40-200% over the operational consensus

Geophysical Research Letters, 10.1029/2020GL089102
https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2020GL089102

Figure 1. Composite maps of surface precipitation rate (in mm hr⁻¹) in storm-centered coordinate for tropical cyclones in four intensity and four intensification rate groups (see text for details). The precipitation is from TRMM 3B42 from 1998 to 2014.
FILLING-IN MISSING CLIMATE OBSERVATIONS
   DL transfer learning + inpainting beats Kriging and PCA

Artificial intelligence reconstructs missing climate information
Christopher Kadow, David Matthew Hall and Uwe Ulbrich
Nature Geoscience | https://doi.org/10.1038/s41561-020-0582-5

Abstract: Historical temperature measurements are the basis of global climate datasets like HadCRUT4. This dataset contains many missing values, particularly for periods before the mid-twentieth century, although recent years are also incomplete. Here we demonstrate that artificial intelligence can skilfully fill these observational gaps when combined with numerical climate model data. We show that recently developed image inpainting techniques perform accurate monthly reconstructions via transfer learning using either 20CR (Twentieth-Century Reanalysis) or the CMIP5 (Coupled Model Intercomparison Project Phase 5) experiments. The resulting global annual mean temperature time series exhibit high Pearson correlation coefficients (≥0.9941) and low root mean squared errors (≤0.0547 °C) as compared with the original data. These techniques also provide advantages relative to state-of-the-art kriging interpolation and principal component analysis-based infilling. When applied to HadCRUT4, our method restores a missing spatial pattern of the documented El Niño from July 1877. With respect to the global mean temperature time series, a HadCRUT4 reconstruction by our method points to a cooler nineteenth century, a less apparent hiatus in the twenty-first century, an even warmer 2016 being the warmest year on record and a stronger global trend between 1850 and 2018 relative to previous estimates. We propose image inpainting as an approach to reconstruct missing climate information and thereby reduce uncertainties and biases in climate records.

Fig. 1 | AI models reconstruct two exemplary monthly show cases with many missing values. a–h, Warm (September 1877) (a–d) and cold (August 1893) (e–h) eastern Pacific examples explain the reconstruction pathway of those of the held-out 56th member from 20CR. Shown are temperature anomalies in degrees centigrade with respect to the 1961–1990 climatology. Columns: original (ground truth), masked with missing values (grey), 20crAI reconstruction, cmipAI reconstruction.
SCIENTIFIC CHALLENGES

SCIENTIFIC CHALLENGES
         Problems that arise when applying AI to science

• Data Labelling
• Limited Data
• Enforcing Physical Constraints
• Uncertainty Quantification
• Trustworthiness and Interpretability
• HPC - AI Coupling
• Loss of Dynamic Range
• Data Movement
• Numerical Stability
• Generalization

DATA LABELLING
                              How can we get enough labelled data?

• Data Fusion: using one data source as the label for another
• Self-Supervised Learning: predicting input B from input A
• Reinforcement Learning: obtaining labels directly from the environment or simulation
• Active Learning: using human-machine iteration to make labelling easier
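
As a rough illustration of the active-learning idea, the sketch below uses uncertainty sampling: the model's least confident predictions are sent to a human labeller first. The talk does not name a specific strategy, so the selection rule, the function name `select_for_labelling`, and the probabilities are all assumptions made for this example.

```python
def select_for_labelling(probabilities, budget):
    """Return indices of the `budget` samples whose predicted probability
    for the positive class is closest to 0.5 (the least confident ones)."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:budget]

# Hypothetical model outputs for five unlabelled samples.
probs = [0.98, 0.52, 0.10, 0.45, 0.71]

# Ask the human expert to label only the two most ambiguous samples.
queue = select_for_labelling(probs, budget=2)
print(queue)
```

After the expert labels the queued samples, the model would be retrained and the selection repeated, which is the human-machine loop the slide refers to.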

DATA LABELLING
Online Extreme-Weather Labelling Tool

https://doi.org/10.5194/gmd-2020-72
Preprint. Discussion started: 9 April 2020
© Author(s) 2020. CC BY 4.0 License.

                                              ClimateNet: an expert-labelled open dataset and Deep Learning
                                              architecture for enabling high-precision analyses of extreme weather
                                              Prabhat1,2,*, Karthik Kashinath1,*, Mayur Mudigonda1,10,*, Sol Kim2, Lukas Kapp-Schwoerer3,
                                              Andre Graubner3, Ege Karaismailoglu3, Leo von Kleist3, Thorsten Kurth4, Annette Greiner1,
                                              Kevin Yang2, Colby Lewis2, Jiayi Chen2, Andrew Lou2, Sathyavat Chandran5, Ben Toms6,
                                              Will Chapman7, Katherine Dagon8, Christine A. Shields8, Travis O’Brien9,1, Michael Wehner1, and
                                              William Collins1,2

                                              * Equal contributions
                                              1 Lawrence Berkeley National Laboratory, Berkeley, CA, USA
                                              2 University of California, Berkeley, CA, USA
                                              3 ETH Zurich, Switzerland
                                              4 NVIDIA, Santa Clara, CA, USA
                                              5 Rice University, Houston, TX, USA
                                              6 Colorado State University, Fort Collins, CO, USA
                                              7 Scripps Institution of Oceanography, University of California, San Diego, La Jolla, CA, USA
                                              8 National Center for Atmospheric Research, Boulder, CO, USA
                                              9 Indiana University, Bloomington, IN, USA
                                              10 Terrafuse, Berkeley, CA, USA

                                              Correspondence to: Karthik Kashinath (kkashinath@lbl.gov)
                                              https://gmd.copernicus.org/preprints/gmd-2020-72/gmd-2020-72.pdf

ENFORCING PHYSICAL CONSTRAINTS
                                       Is the solution physically correct?

Constraints: Conservation of Mass, Momentum, Energy, Incompressibility, Turbulent Energy Spectra, Translational Invariance
Methods: Loss Penalization, Hard Constraints, Projective Methods, Differentiable Programs
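
To make two of the listed methods concrete, here is a minimal sketch, not taken from the talk, that contrasts loss penalization with a projective correction for a single linear constraint (conservation of total mass). The function names `mass_penalty` and `project_to_mass` and the toy numbers are illustrative assumptions.

```python
def mass_penalty(pred, target_mass, weight=1.0):
    """Soft constraint: squared violation of total mass.
    This term would be added to the ordinary training loss."""
    violation = sum(pred) - target_mass
    return weight * violation ** 2

def project_to_mass(pred, target_mass):
    """Projective method: return the closest field (in the least-squares
    sense) whose values sum exactly to target_mass."""
    correction = (target_mass - sum(pred)) / len(pred)
    return [p + correction for p in pred]

pred = [1.2, 0.7, 2.4, 0.9]   # hypothetical network output, sums to 5.2
target_mass = 5.0             # hypothetical conserved total

penalty = mass_penalty(pred, target_mass)
projected = project_to_mass(pred, target_mass)
print(penalty)
print(sum(projected))
```

The penalty only discourages violations during training, while the projection restores the constraint after every prediction, up to floating-point rounding.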

ENFORCING PHYSICAL CONSTRAINTS
      Physics Informed Neural Nets

         https://maziarraissi.github.io/PINNs/
ENFORCING PHYSICAL CONSTRAINTS
                NVIDIA SimNet

      https://www.youtube.com/watch?v=Oq2Mpi5pF1w
UNCERTAINTY QUANTIFICATION
                 How certain is the prediction?

https://www.researchgate.net/figure/Illustration-of-Gaussian-process-regression-in-one-dimension-for-the-target-test_fig1_327613136
UNCERTAINTY QUANTIFICATION
      Methods for quantifying uncertainty

https://www.inovex.de/blog/uncertainty-quantification-deep-learning/
UNCERTAINTY QUANTIFICATION
                               Methods for quantifying uncertainty

Gaussian process inference                                           Separation of error types
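
One common reading of "separation of error types" is the split between aleatoric uncertainty (noise in the data) and epistemic uncertainty (disagreement between models). The sketch below assumes that reading, which the slide itself does not spell out, and applies the law of total variance to a hypothetical ensemble whose members each predict a mean and a variance. All numbers are made up.

```python
from statistics import fmean, pvariance

def decompose_uncertainty(member_means, member_variances):
    """Split predictive variance using the law of total variance."""
    aleatoric = fmean(member_variances)   # average noise predicted by the members
    epistemic = pvariance(member_means)   # spread of the members' mean predictions
    return aleatoric, epistemic, aleatoric + epistemic

# Hypothetical predictions of four ensemble members for one input.
member_means = [2.0, 2.4, 1.8, 2.2]
member_variances = [0.30, 0.25, 0.35, 0.30]

aleatoric, epistemic, total = decompose_uncertainty(member_means, member_variances)
print(f"aleatoric={aleatoric:.3f} epistemic={epistemic:.3f} total={total:.3f}")
```

Under this assumption, more training data would be expected to shrink the epistemic term, while the aleatoric term reflects noise that more data cannot remove.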

INTERPRETABILITY
What criteria were used in this prediction?

        Layer-wise Relevance Propagation (LRP)

      https://lrpserver.hhi.fraunhofer.de/image-classification
INTERPRETABILITY
Backwards optimization and Layer-wise Relevance Propagation applied to ENSO

                      https://doi.org/10.1029/2019MS002002

INTERPRETABILITY
LIME: Local interpretable model-agnostic explanations

               https://arxiv.org/pdf/1602.04938.pdf

FORTRAN / AI COUPLING
How can I glue my AI and HPC code together?

[Slide graphic: Fortran + AI framework (Python)]

FORTRAN / AI COUPLING
                                            Many solutions. None ideal.

• Use Julia. Call Fortran
• Use C++ Instead
• Ok, but limited
• Missing: Native API
• Missing: Fortran Bindings

OVERCOMING LOSS OF DYNAMIC RANGE
Ensemble mean always has less variability than the ensemble members
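
The statement can be checked numerically. This is a minimal sketch with a synthetic ensemble (a shared sine signal plus independent Gaussian noise per member, both assumptions chosen for illustration): averaging cancels the member-specific noise, so the temporal variance of the ensemble mean falls below the average variance of the members.

```python
import math
import random
from statistics import fmean, pvariance

random.seed(0)  # fixed seed so the run is repeatable
n_members, n_steps = 20, 500

# Shared signal that every member follows, plus independent noise per member.
signal = [math.sin(2 * math.pi * t / 50) for t in range(n_steps)]
members = [[s + random.gauss(0.0, 1.0) for s in signal] for _ in range(n_members)]

# Ensemble mean at each time step.
ens_mean = [fmean(values) for values in zip(*members)]

# Temporal variance: average over members versus the ensemble mean.
member_var = fmean(pvariance(m) for m in members)
mean_var = pvariance(ens_mean)
print(f"average member variance: {member_var:.2f}")
print(f"ensemble-mean variance:  {mean_var:.2f}")
```

Because variance is a convex function of the series, the variance of the mean cannot exceed the average member variance, so a model trained to predict the mean inherits this reduced dynamic range.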

OVERCOMING LOSS OF DYNAMIC RANGE
       Stochastic Parameterizations

TRENDS AND BREAKTHROUGHS
SELF-SUPERVISION
Babies learn about the world without large labelled datasets. AI can too.

SELF-SUPERVISION
Pretext tasks build up an internal representation

• PREDICT COLOR
• PREDICT SPATIAL RELATIONSHIPS
• PREDICT ORIENTATION
• PREDICT TEMPORAL ORDER
• PREDICT MISSING PIECES

https://arxiv.org/pdf/1806.09594.pdf
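
As an example of how a pretext task produces labels for free, the sketch below builds training pairs for the "predict orientation" task: each image is rotated by a random multiple of 90 degrees, and the number of rotations becomes the label. The helper names and the tiny 2x2 "image" are assumptions for illustration, and the network that would be trained on these pairs is omitted.

```python
import random

def rotate90(img):
    """Rotate a 2D grid (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def make_rotation_example(img, rng):
    """Return (rotated_image, label), where label k in {0, 1, 2, 3}
    is the number of clockwise quarter turns that were applied."""
    k = rng.randrange(4)
    rotated = [list(row) for row in img]  # copy so the input is untouched
    for _ in range(k):
        rotated = rotate90(rotated)
    return rotated, k

rng = random.Random(0)        # seeded generator for repeatability
image = [[1, 2],
         [3, 4]]              # hypothetical unlabelled 2x2 image

rotated, label = make_rotation_example(image, rng)
print(label, rotated)
```

No human annotation is involved: the label is known because the rotation was applied by the data pipeline itself.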

THE TRANSFORMER
Has enabled a series of enormous language models

[Chart: language model size in millions of parameters, with labelled points at 8.3 billion parameters and 175 billion parameters (GPT-3)]
GPT-3
(July 2020) Current king of the language models. Based upon the transformer.

NEW HARDWARE AND SOFTWARE TOOLS
5 MIRACLES OF A100

• NVIDIA Ampere Architecture: World’s Largest 7nm chip; 54B XTORS, HBM2
• 3rd Gen Tensor Cores: Faster, Flexible, Easier to use; 20x AI Perf with TF32; 2.5x HPC Perf
• New Sparsity Acceleration: Harness Sparsity in AI Models; 2x AI Performance
• New Multi-Instance GPU: Optimal utilization with right sized GPU; 7x Simultaneous Instances per GPU
• 3rd Gen NVLINK and NVSWITCH: Efficient Scaling to Enable Super GPU; 2X More Bandwidth
OMNIVERSE
Interactive Raytracing for Data Visualization And Graphical User Interfaces

                      https://developer.nvidia.com/nvidia-omniverse-platform

PYTORCH LIGHTNING
API to standardize and accelerate your PyTorch models

       https://reproducibility-challenge.github.io/neurips2019/resources/
PYTORCH-LIGHTNING BOLTS
Lightning implementations of popular models, optimized for GPUs

Summary

•   AI for science is advancing rapidly!
•   Good progress on scientific challenges
•   Reviewed major trends in AI
•   Looked at powerful new hardware and
    software tools you can use

                  dhall@