EXPLORING THE FRONTIERS OF DEEP LEARNING FOR EARTH SYSTEM OBSERVATION AND PREDICTION. Dr. David M. Hall, Senior Data Scientist, NVIDIA. ECMWF-ESA Workshop on Machine Learning, October 2020
RAPID ADOPTION Deep learning is being rapidly adopted by the Earth System Science community. Examples shown include spatial reinforcement learning for forest-fire dynamics (Ganapathi Subramanian and Crowley), crop/weed classification from UAV imagery (Remote Sens. 2018, 10, 1690), random-forest estimation of surface ocean pCO2 from satellite predictors in the Gulf of Maine and Gulf of Mexico (S. Chen et al.), and whale detection and counting from satellite imagery with a two-step CNN (Remote Sens. 2020, 12, 901). 3
NSF AI INSTITUTES Seven new Artificial Intelligence institutes, including one for Weather and Climate NSF AI Institute for Research on Trustworthy AI in Weather, Climate and Coastal Oceanography NSF announcement 4
NOAA CENTER FOR AI Official strategy and dedicated center focusing on AI https://nrc.noaa.gov/LinkClick.aspx?fileticket=0I2p2-Gu3rA%3D&tabid=91&portalid=0 5
ACCELERATED PHYSICS Using surrogate models to speed up existing code. Emulation: samples drawn from an expensive function are used to train a fast surrogate model. 7
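A minimal sketch of this emulation workflow in PyTorch (not taken from the talk): `expensive_physics` is a hypothetical stand-in for the costly routine being replaced; the steps are sample the expensive function, fit a small network to the samples, then call the cheap surrogate at run time.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for an expensive physics routine (e.g., a radiation call).
def expensive_physics(x):
    return torch.sin(3 * x) * torch.exp(-x ** 2)

# 1. Sample the expensive function to build a training set.
x_train = torch.rand(10_000, 1) * 4 - 2            # inputs in [-2, 2]
y_train = expensive_physics(x_train)

# 2. Fit a small MLP surrogate to the samples.
surrogate = nn.Sequential(nn.Linear(1, 64), nn.ReLU(),
                          nn.Linear(64, 64), nn.ReLU(),
                          nn.Linear(64, 1))
opt = torch.optim.Adam(surrogate.parameters(), lr=1e-3)
for step in range(2000):
    opt.zero_grad()
    loss = nn.functional.mse_loss(surrogate(x_train), y_train)
    loss.backward()
    opt.step()

# 3. At run time, call the cheap surrogate instead of the expensive routine.
with torch.no_grad():
    y_fast = surrogate(torch.tensor([[0.5]]))
```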
ACCELERATED PHYSICS PARAMETERIZATIONS Emulation of E3SM Super-parameterized SW and LW Radiation 8-10x Speedup in SW and LW Radiative Transfer Calculations 8
IMPROVED PHYSICS PARAMETERIZATIONS Building more accurate physics parameterizations. Yuval, J. & O'Gorman, P. A., "Stable machine-learning parameterization of subgrid processes for climate modeling at a range of resolutions", Nature Communications, https://doi.org/10.1038/s41467-020-17142-3. Abstract: Global climate models represent small-scale processes such as convection using subgrid models known as parameterizations, and these parameterizations contribute substantially to uncertainty in climate projections. Machine learning of new parameterizations from high-resolution model output is a promising approach, but such parameterizations have been prone to issues of instability and climate drift, and their performance for different grid spacings has not yet been investigated. Here we use a random forest to learn a parameterization from coarse-grained output of a three-dimensional high-resolution idealized atmospheric model. The parameterization leads to stable simulations at coarse resolution that replicate the climate of the high-resolution simulation. Retraining for different coarse-graining factors shows the parameterization performs best at smaller horizontal grid spacings. Our results yield insights into parameterization performance across length scales, and they also demonstrate the potential for learning parameterizations from global high-resolution simulations that are now emerging. Figures: snapshots of column-integrated precipitable water from the high-resolution simulation (hi-res), the coarse-resolution simulation (x8), and the coarse-resolution simulation with random forest parameterization (x8-RF); zonal- and time-mean precipitation and 99.9th percentile of 3-hourly precipitation as a function of latitude for hi-res, x8-RF, and x8. https://www.nature.com/articles/s41467-020-17142-3 9
NOWCASTING MetNet: can beat physical models at lead times of up to 8 hours https://ai.googleblog.com/2020/03/a-neural-weather-model-for-eight-hour.html 11
MEDIUM-RANGE WEATHER FORECASTING Weather Bench: A Standardized Benchmark for Data Driven Forecasts WeatherBench: A benchmark dataset for data-driven weather forecasting Stephan Rasp, Peter D. Dueben, Sebastian Scher, Jonathan A. Weyn, Soukayna Mouatadid, Nils Thuerey https://arxiv.org/abs/2002.00469 12
MEDIUM-RANGE WEATHER FORECASTING Data-driven models match the accuracy of physical models on WeatherBench 13
EL NIÑO PREDICTION Deep learning model able to predict El Niño up to 18 months in advance https://www.nature.com/articles/s41586-019-1559-7.pdf 14
RAPID INTENSIFICATION Machine learning improves detection by 40-200% over operational consensus. Geophysical Research Letters, 10.1029/2020GL089102 https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2020GL089102 Figure: composite maps of surface precipitation rate (mm hr−1) in storm-centered coordinates for tropical cyclones in four intensity and four intensification-rate groups; precipitation from TRMM 3B42, 1998-2014. 15
FILLING-IN MISSING CLIMATE OBSERVATIONS DL transfer learning + inpainting beats kriging and PCA. Kadow, C., Hall, D. M. & Ulbrich, U., "Artificial intelligence reconstructs missing climate information", Nature Geoscience, https://doi.org/10.1038/s41561-020-0582-5. Abstract: Historical temperature measurements are the basis of global climate datasets like HadCRUT4. This dataset contains many missing values, particularly for periods before the mid-twentieth century, although recent years are also incomplete. Here we demonstrate that artificial intelligence can skilfully fill these observational gaps when combined with numerical climate model data. We show that recently developed image inpainting techniques perform accurate monthly reconstructions via transfer learning using either 20CR (Twentieth-Century Reanalysis) or the CMIP5 (Coupled Model Intercomparison Project Phase 5) experiments. The resulting global annual mean temperature time series exhibit high Pearson correlation coefficients (≥0.9941) and low root mean squared errors (≤0.0547 °C) as compared with the original data. These techniques also provide advantages relative to state-of-the-art kriging interpolation and principal component analysis-based infilling. When applied to HadCRUT4, our method restores a missing spatial pattern of the documented El Niño from July 1877. With respect to the global mean temperature time series, a HadCRUT4 reconstruction by our method points to a cooler nineteenth century, a less apparent hiatus in the twenty-first century, an even warmer 2016 being the warmest year on record and a stronger global trend between 1850 and 2018 relative to previous estimates. We propose image inpainting as an approach to reconstruct missing climate information and thereby reduce uncertainties and biases in climate records. Figure: AI models reconstruct two exemplary monthly cases with many missing values, a warm (September 1877) and a cold (August 1893) eastern Pacific example; shown are the original temperature-anomaly fields (ground truth), the masked fields with missing values, and the 20crAI and cmipAI reconstructions, in degrees centigrade relative to the 1961–1990 climatology. 16
SCIENTIFIC CHALLENGES 17
SCIENTIFIC CHALLENGES Problems that arise when applying AI to science • Data Labelling • Limited Data • Enforcing Physical Constraints • Uncertainty Quantification • Trustworthiness and Interpretability • HPC - AI Coupling • Loss of Dynamic Range • Data Movement • Numerical Stability • Generalization 18
DATA LABELLING How can we get enough labelled data? Data Fusion: using one data source as the label for another. Self-Supervised Learning: predicting input B from input A. Reinforcement Learning: obtaining labels directly from the environment or simulation. Active Learning: using human-machine iteration to make labelling easier. 19
DATA LABELLING Online Extreme-Weather Labelling Tool. "ClimateNet: an expert-labelled open dataset and Deep Learning architecture for enabling high-precision analyses of extreme weather", Prabhat, Karthik Kashinath, Mayur Mudigonda, Sol Kim, Lukas Kapp-Schwoerer, Andre Graubner, Ege Karaismailoglu, Leo von Kleist, Thorsten Kurth, Annette Greiner, Kevin Yang, Colby Lewis, Jiayi Chen, Andrew Lou, Sathyavat Chandran, Ben Toms, Will Chapman, Katherine Dagon, Christine A. Shields, Travis O'Brien, Michael Wehner, and William Collins. https://doi.org/10.5194/gmd-2020-72 https://gmd.copernicus.org/preprints/gmd-2020-72/gmd-2020-72.pdf 20
ENFORCING PHYSICAL CONSTRAINTS Is the solution physically correct? Conservation of mass, momentum, and energy; incompressibility; loss penalization; hard constraints; projective methods; turbulent energy spectra; translational invariance; differentiable programs. 21
ENFORCING PHYSICAL CONSTRAINTS Physics Informed Neural Nets https://maziarraissi.github.io/PINNs/ 22
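The Raissi et al. PINN codes are far more general; the sketch below only illustrates the core idea for a toy ODE (u'' + u = 0 with u(0)=0, u'(0)=1), where the governing equation enters the training loss through automatic differentiation. Everything in it is illustrative rather than taken from those codes.

```python
import math
import torch
import torch.nn as nn

# Toy physics-informed network for u'' + u = 0, u(0)=0, u'(0)=1 (exact solution: sin t).
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(),
                    nn.Linear(32, 32), nn.Tanh(),
                    nn.Linear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

# Collocation points where the ODE residual is penalized.
t_col = torch.linspace(0, 2 * math.pi, 200).reshape(-1, 1).requires_grad_(True)

for step in range(5000):
    opt.zero_grad()
    u = net(t_col)
    du = torch.autograd.grad(u, t_col, torch.ones_like(u), create_graph=True)[0]
    d2u = torch.autograd.grad(du, t_col, torch.ones_like(du), create_graph=True)[0]
    physics_loss = ((d2u + u) ** 2).mean()              # residual of the governing equation
    t0 = torch.zeros(1, 1, requires_grad=True)          # initial conditions as soft penalties
    u0 = net(t0)
    du0 = torch.autograd.grad(u0, t0, torch.ones_like(u0), create_graph=True)[0]
    ic_loss = u0.pow(2).mean() + (du0 - 1.0).pow(2).mean()
    (physics_loss + ic_loss).backward()
    opt.step()
```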
ENFORCING PHYSICAL CONSTRAINTS NVIDIA SimNet https://www.youtube.com/watch?v=Oq2Mpi5pF1w 23
UNCERTAINTY QUANTIFICATION How certain is the prediction? https://www.researchgate.net/figure/Illustration-of-Gaussian-process-regression-in-one-dimension-for-the-target-test_fig1_327613136 24
UNCERTAINTY QUANTIFICATION Methods for quantifying uncertainty https://www.inovex.de/blog/uncertainty-quantification-deep-learning/ 25
UNCERTAINTY QUANTIFICATION Methods for quantifying uncertainty Gaussian process inference Separation of error types 26
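The linked material surveys several techniques; as one concrete, hedged example, the sketch below uses Monte Carlo dropout: keep dropout active at inference and treat the spread of repeated forward passes as a rough uncertainty estimate. The model architecture and data shapes are placeholders.

```python
import torch
import torch.nn as nn

# A regression network with dropout layers (train it as usual before use).
model = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Dropout(0.1),
                      nn.Linear(64, 64), nn.ReLU(), nn.Dropout(0.1),
                      nn.Linear(64, 1))

def mc_dropout_predict(model, x, n_samples=100):
    """Keep dropout stochastic at inference and sample repeated forward passes."""
    model.train()                      # train mode leaves dropout switched on
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)   # predictive mean and spread

x_new = torch.randn(5, 8)
mean, sigma = mc_dropout_predict(model, x_new)
```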
INTERPRETABILITY What criteria were used in this prediction? Layer-wise Relevance Propagation (LRP) https://lrpserver.hhi.fraunhofer.de/image-classification 27
INTERPRETABILITY Backwards optimization and Layerwise Relevance Propagation applied to ENSO https://doi.org/10.1029/2019MS002002 28
INTERPRETABILITY LIME: Local interpretable model-agnostic explanations https://arxiv.org/pdf/1602.04938.pdf 29
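Neither the LRP nor the LIME implementation is reproduced here; as a minimal, model-agnostic illustration in the same perturbation-based spirit, the sketch below computes an occlusion-sensitivity map for a hypothetical trained image classifier `model`.

```python
import torch

def occlusion_map(model, image, target_class, patch=8, stride=8):
    """Slide a blanked-out patch over the image and record how much the target
    class score drops; large drops mark regions the prediction relies on."""
    model.eval()
    _, h, w = image.shape
    with torch.no_grad():
        base = model(image.unsqueeze(0))[0, target_class]
    heat = torch.zeros(h // stride, w // stride)
    for i, y in enumerate(range(0, h - patch + 1, stride)):
        for j, x in enumerate(range(0, w - patch + 1, stride)):
            occluded = image.clone()
            occluded[:, y:y + patch, x:x + patch] = 0.0
            with torch.no_grad():
                score = model(occluded.unsqueeze(0))[0, target_class]
            heat[i, j] = base - score
    return heat
```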
FORTRAN / AI COUPLING How can I glue my AI and HPC code together? (Fortran + Python) 30
FORTRAN / AI COUPLING Many solutions, none ideal: use Julia, call Fortran, or use C++ instead. Each option is OK but limited, missing a native API, or missing Fortran bindings. 31
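One commonly used workaround, sketched below under the assumption that the model lives in PyTorch, is to export it to TorchScript so the C++ libtorch API can load it; that library can then be wrapped for Fortran through a thin C shim and ISO_C_BINDING (the C/Fortran side is not shown, and the model here is illustrative).

```python
import torch
import torch.nn as nn

# Trained model to be embedded in an HPC code (architecture here is a placeholder).
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))
model.eval()

# Trace with a representative input and serialize to a self-contained file.
example = torch.randn(1, 16)
scripted = torch.jit.trace(model, example)
scripted.save("surrogate.pt")
# The file can then be loaded from C++ with torch::jit::load("surrogate.pt")
# and exposed to Fortran via a thin C wrapper and ISO_C_BINDING.
```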
OVERCOMING LOSS OF DYNAMIC RANGE Ensemble mean always has less variability than the ensemble members 32
OVERCOMING LOSS OF DYNAMIC RANGE Stochastic Parameterizations 33
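As a hedged illustration of the idea (not the specific scheme on the slide), the sketch below has the network predict a mean and log-variance for a subgrid tendency and sample from that distribution at run time, so the emulator retains variability instead of collapsing to the ensemble mean.

```python
import torch
import torch.nn as nn

class StochasticTendency(nn.Module):
    """Predict a mean and log-variance for a subgrid tendency and sample from it."""
    def __init__(self, n_in, n_out):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(n_in, 128), nn.ReLU())
        self.mean_head = nn.Linear(128, n_out)
        self.logvar_head = nn.Linear(128, n_out)

    def forward(self, x):
        h = self.body(x)
        return self.mean_head(h), self.logvar_head(h)

    def sample(self, x):
        # Draw a stochastic tendency rather than returning the deterministic mean.
        mean, logvar = self(x)
        return mean + torch.exp(0.5 * logvar) * torch.randn_like(mean)

def nll_loss(mean, logvar, target):
    # Gaussian negative log-likelihood trains both heads jointly.
    return 0.5 * (logvar + (target - mean) ** 2 / torch.exp(logvar)).mean()
```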
TRENDS AND BREAKTHROUGHS 34
SELF-SUPERVISION Babies learn about the world without large labelled datasets. AI can too. 35
SELF-SUPERVISION Pretext tasks build up an internal representation: predict spatial relationships, predict orientation, predict color, predict temporal order, predict missing pieces. https://arxiv.org/pdf/1806.09594.pdf 36
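A minimal version of the "predict orientation" pretext task, with placeholder data: rotate each unlabeled image by a random multiple of 90 degrees and train the network to recover the rotation; the backbone can later be fine-tuned on a small labelled set.

```python
import torch
import torch.nn as nn

# Small CNN backbone plus a 4-way head for the rotation pretext task.
backbone = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                         nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
                         nn.AdaptiveAvgPool2d(1), nn.Flatten())
rotation_head = nn.Linear(64, 4)
opt = torch.optim.Adam(list(backbone.parameters()) + list(rotation_head.parameters()),
                       lr=1e-3)

def rotation_batch(images):
    """Rotate each image by a random multiple of 90 degrees; that multiple is the label."""
    labels = torch.randint(0, 4, (images.shape[0],))
    rotated = torch.stack([torch.rot90(img, int(k), dims=(1, 2))
                           for img, k in zip(images, labels)])
    return rotated, labels

unlabeled = torch.randn(32, 3, 64, 64)        # stand-in for unlabeled imagery
x, y = rotation_batch(unlabeled)
logits = rotation_head(backbone(x))
loss = nn.functional.cross_entropy(logits, y)
loss.backward()
opt.step()
```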
THE TRANSFORMER Has enabled a series of enormous language models. Chart: parameter counts (in millions) of encoder-decoder transformers from the best language papers, growing to 8.3 billion and to GPT-3's 175 billion parameters. 37
GPT-3 (July 2020) Current king of the language models. Based upon the transformer. 38
NEW HARDWARE AND SOFTWARE TOOLS 39
5 MIRACLES OF A100 Faster, flexible, easier to use. NVIDIA Ampere Architecture: world's largest 7nm chip, 54B transistors, HBM2. 3rd Gen Tensor Cores: 20x AI perf with TF32, 2.5x HPC perf. New Sparsity Acceleration: harness sparsity in AI models, 2x AI performance. New Multi-Instance GPU: optimal utilization with right-sized GPU, 7x simultaneous instances per GPU. 3rd Gen NVLINK and NVSWITCH: efficient scaling to enable super GPU, 2x more bandwidth. 40
OMNIVERSE Interactive Raytracing for Data Visualization And Graphical User Interfaces https://developer.nvidia.com/nvidia-omniverse-platform 41
OMNIVERSE Interactive Raytracing for Data Visualization And Graphical User Interfaces 42
PYTORCH LIGHTNING API to standardize and accelerate your PyTorch models https://reproducibility-challenge.github.io/neurips2019/resources/ 43
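A minimal LightningModule/Trainer sketch showing the standardized structure Lightning imposes; the model and data below are placeholders, not from the talk.

```python
import torch
import torch.nn as nn
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

class LitRegressor(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.mse_loss(self.net(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Toy data; Lightning handles the training loop, device placement, and logging.
ds = TensorDataset(torch.randn(1024, 8), torch.randn(1024, 1))
trainer = pl.Trainer(max_epochs=5)
trainer.fit(LitRegressor(), DataLoader(ds, batch_size=64))
```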
PYTORCH-LIGHTNING BOLTS Lightning implementations of popular models, optimized for GPUs 44
Summary • AI for science is advancing rapidly! • Good progress on scientific challenges • Reviewed major trends in AI • Looked at powerful new hardware and software tools you can use dhall@