Learning forecasts of rare stratospheric transitions from short simulations
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Generated using the official AMS LATEX template v5.0 two-column layout. This work has been submitted for publication. Copyright in this work may be transferred without further notice, and this version may no longer be accessible. Learning forecasts of rare stratospheric transitions from short simulations Justin Finkel∗ arXiv:2102.07760v2 [physics.ao-ph] 29 Aug 2021 Committee on Computational and Applied Mathematics, University of Chicago Robert J. Webber Department of Computing and Mathematical Sciences, California Institute of Technology Edwin P. Gerber Courant Institute of Mathematical Sciences, New York University Dorian S. Abbot Department of the Geophysical Sciences, University of Chicago Jonathan Weare Courant Institute of Mathematical Sciences, New York University ABSTRACT Rare events arising in nonlinear atmospheric dynamics remain hard to predict and attribute. We address the problem of forecasting rare events in a prototypical example, Sudden Stratospheric Warmings (SSWs). Approximately once every other winter, the boreal stratospheric polar vortex rapidly breaks down, shifting midlatitude surface weather patterns for months. We focus on two key quantities of interest: the probability of an SSW occurring, and the expected lead time if it does occur, as functions of initial condition. These optimal forecasts concretely measure the event’s progress. Direct numerical simulation can estimate them in principle, but is prohibitively expensive in practice: each rare event requires a long integration to observe, and the cost of each integration grows with model complexity. We describe an alternative approach using integrations that are short compared to the timescale of the warming event. We compute the probability and lead time efficiently by solving equations involving the transition operator, which encodes all information about the dynamics. We relate these optimal forecasts to a small number of interpretable physical variables, suggesting optimal measurements for forecasting. We illustrate the methodology on a prototype SSW model developed by Holton and Mass (1976) and modified by stochastic forcing. While highly idealized, this model captures the essential nonlinear dynamics of SSWs and exhibits the key forecasting challenge: the dramatic separation in timescales between a single event and the return time between successive events. Our methodology is designed to fully exploit high-dimensional data from models and observations, and has the potential to identify detailed predictors of many complex rare events in meteorology. 1. Introduction 2018), rogue waves (e.g., Dematteis et al. 2018), and space weather events (e.g., coronal mass ejections; Ngwira et al. As computing power increases and weather models grow (2013)). These are very difficult to characterize and predict, more intricate and capable of generating a vast wealth of being exceptionally rare and pathological outliers in the realistic data, the goal of extreme weather event predic- tion appears less distant (Vitart and Robertson 2018). To spectrum of weather events. Ensemble forecasting in nu- take full advantage of the increased computing power, we merical weather prediction is best suited to estimate statis- must develop new approaches to efficiently manage and tics of the average or most likely scenarios, and specialized parse the data we generate (or observe) to derive physi- methods are needed to examine the more extreme outlier cally interpretable, actionable insights. Extreme weather scenarios. events are worthy targets for simulation owing to their de- In this study, we advance an alternative computational structive potential to life and property. Rare events have approach to predicting and understanding general rare attracted significant simulation efforts recently, including events without sacrificing model fidelity. Our method re- hurricanes (e.g., Zhang and Sippel 2009; Webber et al. lies on data generated by a high-fidelity model with a state 2019; Plotkin et al. 2019), heat waves (e.g., Ragone et al. space with many degrees of freedom , representing, for example, spatial resolution of the primitive equations. In this way, our method is similar to recently introduced re- ∗ Corresponding author: Justin Finkel, jfinkel@uchicago.edu 1
2 duced order modeling techniques using statistical and ma- and experiments, yet complex enough to present the essen- chine learning (e.g., Kashinath et al. (2021) and references tial computational difficulties of probabilistic forecasting therein). However, in contrast to other data-driven tech- and test our methods for addressing them. In particular, this niques, our approach focuses on directly computing key system captures the key difficulty in sampling rare events. quantities of interest that characterize the essential pre- The vast majority of the time, the system sits in one of two dictability of the rare event, rather than trying to capture the metastable states, characterizing a strong or weak vortex full detailed evolution of the system. In particular, we will respectively. Extreme events are the infrequent jumps from compute estimators of statistically optimal forecasts that one state to another. Our computational framework can ac- are useful for initial conditions somewhere between a “typ- curately characterize these rare transitions using only a data ical” configuration and an “anomalous” configuration set of “short” model simulations, short not only compared that defines the rare event, where typical and anomalous to the long periods the system sits in one state or the other, are user-defined. We focus on two forecasts in particular to but also relative to the timescale of the transition events quantify risk. The committor is the probability that a given themselves. In the future, the same methodology could be initial condition evolves directly into rather than . Given applied to query the properties of more complex models, that it does reach first, the conditional mean first pas- such as GCMs, where less theoretical understanding is sage time, or lead time, is the expected time that it takes available. to get there. The committor appears prominently in the In section 2, we review the dynamical model and de- molecular dynamics literature, with some recent applica- fine the specific rare event of interest. In section 3, we tions in geoscience including Tantet et al. (2015); Lucente formally define the risk metrics introduced above and vi- et al. (2019), and Finkel et al. (2020), which compute the sualize the results for the Holton-Mass model, including a committor for low-dimensional atmospheric models. discussion of physical and practical insights gleaned from Both quantities depend on the initial condition, defin- our approach. In section 4 we identify an optimal set of re- ing functions over -dimensional state space that encode duced coordinates for estimating risk using sparse regres- important information regarding the fundamental causes sion. These results will provide motivation for the compu- and precursors of the rare event. However, “decoding” the tational method, which we present afterward in section 5 physical insights is not automatic. With real-time measure- along with accuracy tests. We then lay out future prospects ment constraints, the risk metrics must be estimated from and conclude in section 6. low-dimensional proxies. Even visualizing them requires projecting down to one or two dimensions. This calls for a 2. Holton-Mass model principled selection of low-dimensional coordinates which are both physically meaningful and statistically informa- Holton and Mass (1976) devised a simple model of the tive for our chosen risk metrics. We address this problem stratosphere aimed at reproducing observed intra-seasonal using sparse regression, a simple but easily extensible so- oscillations of the polar vortex, which they termed “strato- lution with the potential to inform optimal measurement spheric vacillation cycles.” Earlier SSW models, origi- strategies to estimate risk as precisely as possible under nating with that of Matsuno (1971), proposed upward- constraints. propagating planetary waves as the major source of distur- Estimation of the committor and lead time is a chal- bance to the vortex. While Matsuno (1971) used impulsive lenge. We employ a method that uses a large data set forcing from the troposphere as the source of planetary of short-time independent simulations. We represent the waves, Holton and Mass (1976) suggested that even sta- committor and lead time as solutions to Feynman-Kac for- tionary tropospheric forcing could lead to an oscillatory mulae (Oksendal 2003), which relate long-time forecasts response, suggesting that the stratosphere can self-sustain to instantaneous tendencies. These equations are elegant its own oscillations. While the Holton-Mass model is meant and general, but computationally daunting: in the continu- to represent internal stratospheric dynamics, Sjoberg and ous time and space limit, they become partial differential Birner (2014) point out that the stationary boundary condi- equations (PDE) with independent variables—the same tion does not lead to stationary wave activity flux, meaning as the model state space dimension. It is therefore hopeless that even the Holton-Mass model involves some dynamic to solve the equations using any standard spatial discretiza- interaction between the troposphere and stratosphere. Iso- tion. But, as we demonstrate, the equations can be solved lating internal from external dynamics is a subtle modeling with remarkable accuracy by expanding in a basis of func- question, but in the present paper we adhere to the original tions informed by the data set. Holton-Mass framework for simplicity. Our methodology We illustrate our approach on the highly simplified applies equally well to other formulations. Holton-Mass model (Holton and Mass 1976; Christiansen Radiative cooling through the stratosphere and wave per- 2000) with stochastic velocity perturbations in the spirit turbations at the tropopause are the two competing forces of Birner and Williams (2008). The Holton-Mass model is that drive the vortex in the Holton-Mass model. Altitude- well-understood dynamically in light of decades of analysis dependent cooling relaxes the zonal wind toward a strong
3 vortex in thermal wind balance with a radiative equilib- frequency), 0 is the Coriolis parameter, and = 2.5 × rium temperature field. Gradients in potential vorticity 105 m is a horizontal length scale, selected in order to along the vortex, however, can allow the propagation of create a homogeneously shaped data set more suited to Rossby waves. When conditions are just right, a Rossby our analysis. See Holton and Mass (1976); Yoden (1987a); wave emerges from the tropopause and rapidly propagates Christiansen (2000) for details on parameters. Boundary upward, sweeping heat poleward and stalling the vortex by conditions are prescribed at the bottom of the stratosphere, depositing a burst of negative momentum. The vortex is which in this model corresponds to = 0 km, and the top destroyed and begins anew the rebuilding process. of the stratosphere = 70 km. Yoden (1987a) found that for a certain range of param- ℎ eter settings, these two effects balance each other to create Ψ(0, ) = , Ψ( , ) = 0, (4) 0 two distinct stable regimes: a strong vortex with zonal wind close to the radiative equilibrium profile, and a weak vor- (0, ) = (0), ( , ) = ( ). tex with a possibly oscillatory wind profile. We focus our The vortex-stabilizing influence is represented by ( ), the study on this bistable setting as a prototypical model of altitude-dependent cooling coefficient, and the radiative atmospheric regime behavior. The transition from strong wind profile ( ) = (0) + 1000 (with in m), which to weak vortex state captures the essential dynamics of an relaxes the vortex toward radiative equilibrium. Here = SSW. O (1) is the vertical wind shear in m/s/km. The competing The Holton-Mass model takes the linearized quasi- force of wave perturbation is encoded through the lower geostrophic potential vorticity (QGPV) equation for a per- boundary condition Ψ(0, ) = ℎ/ 0 . turbation streamfunction 0 ( , , , ) on top of a zonal Detailed bifurcation analysis of the model by both Yoden mean flow ( , , ), and projects these two fields onto a (1987a) and Christiansen (2000) in ( , ℎ) space revealed single zonal wavenumber = 2/( cos 60◦ ) and a single the bifurcations that lead to bistability, vacillations, and ul- meridional wavenumber ℓ = 3/ , where is the Earth’s timately quasiperiodicity and chaos. Here we will focus on radius. This notation is consistent with Holton and Mass an intermediate parameter setting of = 1.5 m/s/km and (1976) and Christiansen (2000), and we refer the reader to ℎ = 38.5 m, where two stable states coexist: a strong vortex these earlier papers for complete description of the equa- with closely following and an almost barotropic sta- tions and parameters. The resulting ansatz is tionary wave, as well as a weak vortex with dipping close to zero at an intermediate altitude and a stationary wave ( , , ) = ( , ) sin(ℓ ) (1) with strong westward phase tilt. The two stable equilibria, 0 /2 ( , , , ) = Re{Ψ( , ) } sin(ℓ ) which we call a and b, are illustrated in Fig. 1(a,b) by their -dependent zonal wind and perturbation streamfunction which is fully determined by the reduced state space profiles. ( , ), and Ψ( , ), the latter being complex-valued. The two equilibria can be interpreted as two different is a scale height, 7 km. Inserting this into the linearized winter climatologies, one with a strong vortex and one QGPV equations yields the coupled PDE system with a weak vortex susceptible to vacillation cycles. To explore transitions between these two states, we follow 2 Ψ 1 Birner and Williams (2008) and modify the Holton-Mass − G 2 ( 2 + ℓ 2 ) + + 2 (2) 4 equations with small additive noise in the variable to 2 mimic momentum perturbations by smaller scale Rossby = − − G 2 − − 2 Ψ waves, gravity waves, and other unresolved sources. The 4 2 2 form of noise will be specified in Eq. (7). 2Ψ 1 While the details of the additive noise are ad hoc, + 2 G 2 + − + 2 Ψ − 2 4 the general approach can be more rigorously justified through the Mori-Zwanzig formalism (Zwanzig 2001). for Ψ( , ), and Because many hidden degrees of freedom are being pro- jected onto the low-dimensional space of the Holton-Mass 2 2 2 −G ℓ − + 2 = ( − ) − (3) model, the dynamics on small observable subspaces can be considered stochastic. This is the perspective taken in 2 ∗ 2 ℓ 2 Ψ stochastic parameterization of turbulence and other high- − ( − ) + 2 + Im Ψ 2 2 dimensional chaotic systems (Hasselmann 1976; DelSole and Farrell 1995; Franzke and Majda 2006; Majda et al. for ( , ). Here, = 8/(3 ) is a coefficient for projecting 2001; Gottwald et al. 2016). In general, unobserved deter- sin2 (ℓ ) onto sin(ℓ ). We have nondimensionalized the ministic dynamics can make the system non-Markovian, equations with the parameter G 2 = 2 2 /( 02 2 ), where which technically violates the assumptions of our method- 2 = 4 × 10−4 s−2 is a constant stratification (Brunt-Väisälä ology. However, with sufficient separation of timescales the
4 (a) (b) (c) (d) (e) Fig. 1. Illustration of the two stable states of the Holton-Mass model and transitions between them. (a) Zonal wind profiles of the radiatively maintained strong vortex (the fixed point a, blue) which increases linearly with altitude, and the weak vortex (the fixed point b, red) which dips close to zero in the mid-stratosphere. (b) Streamfunction contours are overlaid for the two equilibria a and b. (c) Parametric plot of a control simulation in a 2-dimensional state space projection, including two transitions from to (orange) and to (green). (d) Time series of (30 km) from the same simulation. (e) The steady state density projected onto (30 km). Markovian assumption is not unreasonable. Furthermore, where : R → R × imparts a correlation structure to memory terms can be ameliorated by lifting data back to the vector W( ) ∈ R of independent standard white noise higher-dimensional state space with time-delay embedding processes. As discussed above, we design to be a low- (Berry et al. 2013; Thiede et al. 2019; Lin and Lu 2021). rank, constant matrix that adds spatially smooth stirring We follow Holton and Mass (1976) and discretize the to only the zonal wind (not the streamfunction Ψ) and equations using a finite-difference method in , with 27 which respects boundary conditions at the bottom and top vertical levels (including boundaries). After constraining of the stratosphere. Its structure is defined by the following the boundaries, there are = 3 × (27 − 2) = 75 degrees Euler-Maruyama scheme: in a timesetep = 0.005 days, of freedom in the model. Christiansen (2000) investigated after a deterministic forward Euler step we add the stochas- higher resolution and found negligible differences. The full tic perturbation to zonal wind on large vertical scales discretized state is represented by a long vector √ h ∑︁ 1 X( ) = Re{Ψ}(Δ , ), . . . , Re{Ψ}( − Δ , ), ( ) = sin + (7) =0 2 Im{Ψ}(Δ , ), . . . , Im{Ψ}( − Δ , ), (5) i where ( = 0, 1, 2) are independent unit normal samples, (Δ , ), . . . , ( − Δ , ) ∈ R = R75 = 2, and is a scalar that sets the magnitudes of entries in . In terms of physical units, The deterministic system can be written X( )/ = (X( )) for a vector field : R → R specified by E[( ) 2 ] discretizing (2) and (3). Under deterministic dynamics, 2 = ≈ (1 m/s) 2 /day (8) X( ) → a or X( ) → b as → ∞ depending on initial con- ditions. The addition of white noise changes the system has units of ( / )/ 1/2 , where the square-root of time into an Itô diffusion comes from the quadratic variation of the Wiener process. It is best interpreted in terms of the daily root-mean-square X( ) = (X( )) + (X( )) W( ) (6) velocity perturbation of 1.0 m/s. We have experimented
5 with this value, and found that reducing the noise level be- the streamfunction in the Holton-Mass model: low 0.8 dramatically reduces the frequency of transitions, ∫ 30 km ∫ 30 km while increasing it past 1.5 washes out metastability. We IHF = − / 0 0 ∝ |Ψ| 2 (9) keep constant going forward as a favorable numerical 0 km 0 km regime to demonstrate our approach, while acknowledging where is the phase of Ψ. The → and → tran- that the specifics of stochastic parameterization are impor- sitions are again highlighted in orange and green respec- tant in general to obtain accurate forecasts. The resulting tively, showing geometrical differences between the two matrix is 75 × 3, with nonzero entries only in the last 25 directions. We will refer to the → transition as an rows as forcing only applies to ( ). SSW event, even though it is more accurately a transition A long simulation of the model reveals metastability, between climatologies according to the Holton-Mass in- with the system tending to remain close to one fixed point terpretation. The → transition is a vortex restoration for a long time before switching quickly to the other, as event. Our focus in this paper is on predicting these transi- shown by the time series of (30 km) in panel (d) of Fig. tion events (mainly the → direction) and monitoring 1. Panel (e) shows a projection of the steady state distribu- their progress in a principled way. In the next section we tion, also known as the equilibrium/invariant distribution, explain the formalism for doing so. of as a function of . We call this density (x), which is a function over the full -dimensional state space. We 3. Forecast functions: the committor and lead time focus on the zonal wind at 30 km following Christiansen statistics (2000), because this is where its strength is minimized in a. Defining risk and lead time the weak vortex. While the two regimes are clearly associ- ated with the two fixed points, they are better characterized We will introduce the quantities of interest by way of by extended regions of state space with strong and weak example. First, suppose the stratosphere is observed in vortices. We thus define the two metastable subsets of R an initial state X(0) = x that is neither in nor , so (b)(30 km) < (x)(30 km) < (a)(30 km) and the vor- tex is somewhat weakened, but not completely broken = {X : (X)(30 km) ≥ (a) (30 km) = 53.8 m/s}, down. We call this intermediate zone = ( ∪ ) c (the = {X : (X)(30 km) ≤ (b) (30 km) = 1.75 m/s}. complement of the two metastable sets). Because and are attractive, the system will soon find its way to one or This straightforward definition roughly follows the con- the other at the first-exit time from , denoted vention of Charlton and Polvani (2007), which defines an c = min{ ≥ 0 : X( ) ∈ c } (10) SSW as a reversal of zonal winds at 10 hPa. We use 30 km for consistency with Christiansen (2000); this is technically Here, c emphasizes that the process has left , i.e., gone higher than 10 hPa because = 0 in the Holton-Mass model to or . The first-exit location X( c ) is itself a random represents the tropopause. Our method is equally applica- variable which importantly determines how the system ex- ble to any definition, and the results are not qualitatively its : either X( c ) ∈ , meaning the vortex restores to dependent on this choice. Incidentally, the analysis tools radiative equilibrium, or X( c ) ∈ , meaning the vortex we present may be helpful in distinguishing predictability breaks down into vacillation cycles. A fundamental goal properties between different definitions. In fact, we will of forecasting is to determine the probabilities of these show that the height neighborhood of 20 km is actually two events, which naturally leads to the definition of the more salient for predicting the event than wind at the 30- (forward) committor function km level, even when the event is defined by wind at 30 km! x ∈ = ( ∪ ) c Px {X( ) ∈ } c This emerges from statistical analysis alone, and gives us + (x) = 0 x∈ (11) confidence that essential SSW properties are stable with respect to reasonable changes in definition. 1 x∈ The orange highlights in Fig. 1 (d) begin when the system where the subscript x indicates that the probability is exits the region bound for , and end when the system conditional on a fixed initial condition X(0) = x, i.e., enters . The green highlights start when the system leaves Px {·} = P{·|X(0) = x}. The superscript “+” distinguishes bound for , and end when is reached. Note that → the forward committor from the backward committor, an transitions, SSWs, are much shorter in duration than → analogous quantity for the time-reversed process which we transitions. Fig. 1 (c) shows the same paths, but viewed do not use in this paper. Throughout, we will use capital parametrically in a two-dimensional state space consist- X( ) to denote a stochastic process, and lower-case x to ∫ 30 km ing of integrated heat flux or IHF 0 km − / 0 0 , and represent a specific point in state space, typically an ini- zonal wind (30 km). IHF is an informative number be- tial condition, i.e., X(0) = x. Both are = 75-dimensional cause it captures both magnitude and phase information of vectors.
6 The committor is the probability that the system will be For this more ambitious goal, DNS is prohibitively ex- in state (the disturbed state) next rather than (the strong pensive. By definition, transitions between and are vortex state). Hence + (x) = 0 if you start in , and is 1 if infrequent. Therefore, if starting from x far from , a huge you are already in . In between (i.e., when x ∈ ), + (x) number of sampled trajectories ( ) will be required to ob- tells you the probability that you will first go to rather serve even a small number ending in , and they may take a than to . That is, + (x) tells you the probability that an long time to get there. If instead we could precompute these SSW will happen. functions offline over all of state space, the online forecast- Another important forecasting quantity is the lead time ing problem would reduce to “reading off” the committor to the event of interest. While the forward committor re- and lead time with every new observation. Achieving this veals the probability of experiencing vortex breakdown goal is the key point of our paper, and we achieve this us- before returning to a strong vortex, it does not say how ing the dynamical Galerkin approximation, or DGA, recipe long either event will take. Furthermore, even if the vortex described by Thiede et al. (2019). is restored first, how long will it be until the next SSW does A brute force way to estimate these functions is to inte- occur? The time until the next SSW event is denoted , grate the model for a long time until it reaches statistical again a random variable, whose distribution depends on steady state, meaning it has explored its attractor thor- the initial condition x. We call Ex [ ] the mean first pas- oughly according to the steady state distribution. After sage time (MFPT) to . Conversely, we may ask how long long enough, it will have wandered close to every point x a vortex disturbance will persist before normal conditions sufficiently often to estimate + (x) and + robustly as in return; the answer (on average) is Ex [ ], the mean first DNS. We have performed such a “control simulation” of passage time to . These same quantities have been cal- 5×105 days for validation purposes, but our main contribu- culated previously in other simplified models, e.g. Birner tion in this paper is to compute the forecast functions using and Williams (2008) and Esler and Mester (2019). only short trajectories with DGA, allowing for massive Ex [ ] has an obvious shortcoming: it is an average parallelization. However, we will defer the methodological over all paths starting from x, including those which go details to Section 5, and first justify the effort with some re- straight into (i.e., an orange trajectory in Fig. 1c,d) and sults. We visualize the committor and lead time computed the rest which return to i.e., a green trajectory) and linger from short trajectories and elaborate on their interpretation, there, potentially for a very long time, before eventually utility, and relationship to ensemble forecasting methods. re-crossing back into . It is more relevant for near-term forecasting to condition on the event that an SSW is b. Steady state distribution coming before the strong vortex returns. For this purpose, we introduce the conditional mean first passage time, or Before visualizing the committor and lead time, it will be lead time, to : helpful to have a precise notion of the steady state distribu- tion, denoted (x), a probability density that describes the + (x) := Ex [ | < ] (12) long-term behavior of a stochastic process X( ). Assuming the system is ergodic, averages over time are equivalent to which quantifies the suddenness of SSW. averages over state space with respect to . That is, for any All of these quantities can, in principle, be estimated by well-behaved function : R → R, direct numerical simulation (DNS). For example, suppose ∫ ∫ we observe an initial condition X(0) = x in an operational 1 h i := lim (X( )) = (x) (x) x (13) forecasting setting, and wish to estimate the probability →∞ 0 R and lead time for the event of next hitting . We would initialize an ensemble {X (0) = x, = 1, . . . , } and evolve For example, if (x) = 1 (x) (an indicator function, which each member forward in time until it hits or at the is 1 for x ∈ ⊂ R and 0 for x ∉ ), Eq. (13) says that the random time . In an explicitly stochastic model, random fraction of time spent in can be found by integrating the forcing would drive each member to a different fate, while density over . The density peaks in Fig. 1(d) indicates in a deterministic model their initial conditions would be clearly that the neighborhoods of a and b are two such re- perturbed slightly. To estimate the committor to , we gions with especially large probability under . Note that could calculate the fraction of members that hit first. both sides of (13) are independent of the initial condition, Averaging the arrival times ( ), over only those members which is forgotten eventually. Short-term forecasts are by gives an estimate of the lead time to . For a single initial definition out-of-equilibrium processes, depending criti- condition x reasonably close to , DNS may be the most cally on initial conditions; however, (x) is important to us economical. But how do we systematically compute + (x) here as a “default” distribution for missing information. If over all of state space (here 75 variables, but potentially the initial condition is only partially observed, e.g. in only billions of variables in a GCM or other state-of-the-art one coordinate, we have no information about the other forecast system)? − 1 dimensions, and in many cases the most principled
7 tactic is to assume those other dimensions are distributed Each quantity is projected onto two different one- according to , conditional on the observation. dimensional CVs: (30 km) (first column) and IHF (sec- ond column). In panel (a), for example, we see the com- c. Visualizing committor and lead times mittor is a decreasing function of : the weaker the wind, the more likely a vortex breakdown. Moreover, the curve The forecasts + (x) and + (x) are functions of a high- provides a conversion factor between risk (as measured by dimensional space R . However, these degrees of free- probability) and a physical variable, zonal wind. An ob- dom may not all be “observable” in a practical sense, servation of (30 km) = 38 m/s implies a 50% chance of given the sparsity and resolution limits of weather sensors, vortex breakdown. The variation in slope also tells us that and visualizing them requires projecting onto reduced- a wind reduction from 40 m/s to 30 m/s represents a far coordinate spaces of dimension 1 or 2. We call these greater increase in risk than a reduction from 30 m/s to 20 “collective variables” (CVs) following chemistry litera- m/s. Meanwhile, panel (b) shows the committor to be an ture (e.g., Noé and Clementi 2017), and denote them as increasing function of IHF, since SSW is associated with vector-valued functions from the full state space to a re- large wave amplitude and phase lag. However, IHF seems duced space, : R → R , where = 1 or 2. For in- inferior to zonal wind as a committor proxy, as a small stance, Fig. 1 (c) plots trajectories in the CV space con- sisting of integrated heat flux and zonal change in IHF from ∼ 15000 to ∼ 20000 K·m2 /s corre- ∫ 30 km wind at 30 km: sponds to a sharp increase in committor from nearly zero (x) = 0 km − / 0 0 , (30 km) . The first compo- to nearly one. In other words, knowing only IHF doesn’t nent is a nonlinear function involving products of Re{Ψ} provide much useful information about the threat of SSW and Im{Ψ}, while the second component is a linear func- until it is already virtually certain. The dotted envelope is tion involving only at a certain altitude. For visualization also wider in panel (b) than (a), indicating that projecting in general, we have to approximate a function : R → R, the committor onto IHF removes more information than such as the committor or lead time, as a function of re- projecting onto . While the underlying noise makes it duced coordinates. That is, we wish to find : R → R impossible to divine the outcome with certainty from any such that (x) ≈ ( (x)). Given a fixed CV space , an observation, the projection error clearly privileges some “optimal” is chosen by minimizing some function-space observables over others for their predictive power. metric between ◦ and . In panels (c) and (d), the lead time is seen to have the A natural choice is the mean-squared error weighted by opposite overall trend as the committor: the weaker the the steady state distribution , so the projection problem is wind, or the greater the heat flux, the closer you are on to minimize over functions : R → R the penalty average to a vortex breakdown. + (x) is not defined when wind is strongest, as x ∈ and so + (x) = 0. However, [ ; ] : = k ◦ − k 2 2 ( ) ∫ h an interesting exception to the trend occurs in the range i2 10 m/s ≤ ≤ 40 m/s: the expected lead time stays con- = ( (x)) − (x) (x) x. (14) R stant or slightly decreases as zonal wind increases, and the projection error remains large. This means that while the The optimal for this purpose is the conditional expecta- probability of vortex breakdown increases rapidly from tion 50% to 90%, the time until vortex breakdown remains highly uncertain. To resolve this seeming paradox, we will (y) = EX∼ [ (X)| (X) = y] have to visualize the joint variation of + and + . (x) 1 y ( (x)) (x) x ∫ It is of course better to consider multiple observables = lim (15) at once. Fig. 3 shows the information gained beyond ob- 1 y ( (x)) (x) x ∫ | y|→0 serving (30 km) by incorporating IHF as a second ob- where y is a small neighborhood about y in CV space servable. In the top row we project , + , and + onto . The subscript X ∼ means that the expectation is with the two-dimensional subspace, revealing structure hidden respect to a random variable X distributed according to from view in the one-dimensional projections. Panel (a) (x), i.e., at steady state. Fig. 2 uses this formula to display is a 2-dimensional extension of Fig. 1(d), with density one-dimensional projections of the committor (first row) peaks visible in the neighborhoods of a and b. The white and lead time (second row), as well as the one-standard space surrounding the gray represents physically insignifi- deviation envelope incurred by projecting out the other 74 cant regions of state space that was not sampled by the long degrees of freedoom. This “projection error” is defined as simulation. The same convention holds for the following the square root of the conditional variance two-dimensional figures. The committor is displayed in panel (b) over the same space. It changes from blue at the 2 top (an SSW is unlikely) to red at the bottom (an SSW h i (y) = EX∼ (X) − (y) (X) = y . (16) is likely), bearing out the negative association between
8 (a) (b) ! ! = 0.5 % !!.# = 38 & (c) (d) Fig. 2. One-dimensional projections of the forward committor (first row) and lead time to (second row). These functions depend on all = 75 degrees of freedom in the model, but we have averaged across − 1 = 74 dimensions to visualize them as rough functions of two single degrees of freedom: (30 km) (first column) and integrated heat flux up to 30 km, IHF (second column). Panel (a) additionally marks the + = 12 threshold and the corresponding value of zonal wind. and + . However, there are non-negligible horizontal gra- both would have the same committor of 0.5. According to dients that show that IHF plays a role, too. Likewise, the both and IHF together, i.e., the two-dimensional heat lead time in panel (c) decreases from ∼ 90 days near a to map in Fig. 3(b), they have very different probabilities of 0 days near b, when the transition is complete. Here, IHF + ( 0 ) = 0.31 and + ( 1 ) = 0.73: an SSW is more than appears even more critically important for forecasting how twice as likely to occur from starting point 1 as 0 . the event plays out, as gradients in + are often completely While those committor values come from the DGA horizontal. method to be described in Section 5, we confirm them A horizontal dotted line in Fig. 3(a-c) marks the 50% risk empirically by plotting an ensemble of 100 trajectories level (30 km) = 38 m/s, but the committor varies along it originating from each of the two initial conditions in pan- from low risk at the left to high risk at the right: we show els (d) and (e) below, coloring -bound trajectories blue this concretely by selecting two points 0 and 1 along and -bound trajectories red. Only 28% of the sampled the line. According to alone, i.e., the curve in Fig. 2(a), trajectories through 0 exhibit an SSW, next going to state
9 (a) (b) (c) (d) (e) Fig. 3. The density, committor, and lead time as functions of zonal wind and integrated heat flux. Panel (a) projects the steady state distribution (x) onto the two-dimensional subspace ( ,IHF) at 30 km. The white regions surrounding the gray are unphysical states with negligible probability. Panels (b) and (c) display the committor and lead time in the same space. A horizontal transect marks the level (30 km) = 38.5 m/s, where + according to only is 0.5. Panels (d) and (e) show ensembles initialized from two points 0 and 1 along the transect, verifying that their committor and lead time values differ from their values according to , in a way that is predictable due to considering IHF in addition to . , while 68% of the integrations from 1 end at . In both The lead time prediction is improved similarly by incor- cases, the heatmaps and ensemble sample means roughly porating the second observable. According to alone, Fig. 2 predicts a lead time of 40 days for both 0 and 1 . Con- match. The small differences between the projected com- sidering IHF additionally, the two-dimensional heat map mittor and the empirical “success” rate of trajectories arises in Fig. 3 predicts a lead time of 52 days and 24 days for both from errors in the DGA calculation (which we analyze 0 and 1 , respectively. Referring to the ensemble from 1 in section 5) and the finite size of the ensemble. in panels (d) and (e), the arrival times of red trajectories
10 to provide a discrete sampling of the lead time distribu- spread in + than points at + = 0.5, contrary to the intu- tions of | < . The sample means are 50 and 32 days ition that closeness to in probability means closeness in respectively from 0 and 1 , again roughly matching with time. The color shading shows that + is strongly associ- our predictions. ated with (30 km), while + is more strongly associated These two-dimensional projections still leave out 73 re- with IHF(30 km). In particular the horizontal contours in maining dimensions, which we could incorporate to make panel (c) show that the large spread in lead time near is the forecasts even better. After accounting for all 75 di- due almost completely to variation in IHF. In other words, mensions, we would obtain the full committor function the system can be highly committed to with a low zonal + : R → R. This is still a probability, i.e., an expecta- wind, but if IHF is low, it may take a long time to get there. tion over the unresolved turbulent processes and uncertain We can also see this from the lower-left region of Fig. 3(a) initial condition. Low-dimensional committor projections and (b), where committor is high and lead time is high. simply treat the projected-out dimensions as random vari- There are two complementary explanations for this phe- ables sampled according to . Whether projected to a space nomenon. First, the low- , low-IHF region of state space of 1 or 75 dimensions, the committor is the function of that corresponds to a temporary restoration phase in a vacil- space that is closest, in the mean-square sense, to the bi- lation cycle, which delays the inevitable collapse of zonal nary indicator 1 (X( )); this is the defining characteristic wind below the threshold defining . In fact, the ensemble of conditional expectation (Durrett 2013). In the case that of pathways starting from 0 in Fig. 3(c) has several mem- the system does hit next, the lead time is closest in the bers whose zonal wind either stagnates at medium strength, mean-square sense to . or dips low and partially restores before finally plunging all While high-dimensional systems offer many coordinates the way down. The second explanation is that many of these to choose from, we argue that the committor and lead time partial restoration events are not part of an → transi- are the most important nonlinear coordinates to monitor tion, but rather a → transition. In a highly irreversible for forecasting purposes. We will explore their relationship system such as the Holton-Mass model, these two situa- in the next subsection. Although both encode some version of proximity to SSW, they are independent variables which tions are quite dynamically distinct. To distinguish them deserve separate consideration. using DGA, we would have to account for the past as well as the future, calculating backward-in-time forecasts such as the backward committor − (x) = Px {X( − ) ∈ }, where d. Relationship between risk and lead time − < 0 is the most-recent hitting time. Backward forecasts will be analyzed thoroughly in a forthcoming paper, but A forecast is most useful if it comes sufficiently early (to they are beyond the scope of the present one. leave some buffer time before impact) and is sufficiently In summary, + and + are principled metrics to inform precise to time your response. For example, in June we can preparation for extreme weather. For example, a threatened say with certainty it will snow next winter in Minnesota. community might decide in advance to start taking action To be useful, we want to know the date of the first snow when an event is very likely, + ≥ 0.8, and somewhat im- as early as possible. By relating levels of risk (quantified minent, + ≤ 10 days, or rather, when an event is somewhat by + ) and lead time (quantified by + ), we can now assess likely, + ≥ 0.5, and very imminent, + ≤ 3 days. Because the limits of early prediction. Such a relationship would answer two questions: for an SSW transition, (1) how far of partial restoration events, the committor does not deter- in advance will we be aware of it with some prescribed mine the lead time or vice versa, and so a good real-time confidence, say 80%? (2) given some prescribed lead time, disaster response strategy should take both of them into say 42 days, how aware or ignorant could we be of it? account, defining an “alarm threshold” that is not a single The committor and lead time have an overall negative number, but some function of both the committor and lead relationship, but they do not completely determine each time. This idea is similar in spirit to that of the Torino other, as the contours in Fig. 3(a,b) do not perfectly line scale, which assigns a single risk metric to an asteroid up. We treat them as independent variables in Fig. 4, which or comet impacts based on both probability and severity maps zonal wind and IHF as functions of the coordinates (Binzel 2000). Of course, after many near-SSW events, a + and + in an inversion of Fig. 3. The density (x) pro- lot of material damage may have already occurred, which jected on this space in 4(a) shows again a bimodal struc- may be a reason to define a higher threshold for the defini- ture around a and b, which occupy opposite corners of this tion of , or even a continuum for different severity levels space by construction. Meanwhile, zonal wind and IHF of SSW. We emphasize that the choice of , and alarm are indicated by the shading in panels (b) and (c). The thresholds are more of a community and policy decision bridge between a and b is not a narrow band, but rather than a scientific one. The strength of our approach is that includes a curious high-committor, high-lead time branch it provides a flexible numerical framework to quantify and which seems paradoxical: points at + = 0.9 have a greater optimize the consequences of those decisions.
11 (a) (b) (c) Fig. 4. Committor and lead time as independent coordinates. This figure inverts the functions in Fig. 3, considering the zonal wind and integrated heat flux as functions of committor and lead time. The two-dimensional space they span is the essential goal of forecasting. Panel (a) shows the steady state distribution on this subspace, which is peaked near a and b (darker shading), weaker in the "bridge" region between them, and completely negligible the white regions unexplored by data. Panels (b) and (c) display zonal wind and heat flux in color as functions of the committor and lead time. 4. Sparse representation of the committor input-output pairs. However, to keep the representation in- terpretable, we will restrict ourselves to physics-informed The committor projections showed give only an impres- input features based on the Eliassen-Palm (EP) relation, sion of its high-dimensional structure. While Eq. (15) says which relates wave activity, PV fluxes and gradients, and how to optimally represent the committor over a given CV heating source terms in a conservation equation. From Yo- subspace, optimizing [ ; ] over , it does not say which den (1987b), the EP relation for the Holton-Mass model subspace is optimal. If the committor does admit a sparse takes the form representation, we could specifically target observations on 02 these high-impact signals. In this section we address this + ( ) −1 ∇ · much harder problem of optimizing [ ; ] over subspaces 2 . 02 The set of CV spaces is infinite, as observables can be =− −1 0 ( 0) (17) 2 arbitrarily complex nonlinear functions of the basic state where = (− 0 0)j + ( 0 0)k variables x. Machine learning algorithms such as artifi- cial neural networks are designed exactly for that purpose: The EP flux divergence has two alternative expressions: −1 ∇ · = 0 0 = −1 0 0 0 . If there were no to represent functions nonparametrically from observed
12 dissipation ( = 0) and the background zonal state were flux are also rather unhelpful as early warning signs, de- time-independent ( = 0), dividing both sides by spite their central role in SSW evolution. For example, the would express local conservation of wave activity A = large, positive spikes in heat flux across all altitudes gener- 02 /(2 ). Neither of these is exact in the stochastic ally occur after the committor ≈ 0.5 threshold has already Holton-Mass model, so we use the quantities in Eq. (17) as been crossed. Furthermore, the relationship of 0 0 with diagnostics: enstrophy 02 , PV gradient , PV flux 0 0, the committor is not smooth. The + < 0.5 region at each altitude is a thin band near zero. and heat flux 0 0. Each field is a function of ( , ) and The exhaustive CV search in Fig. 5 is visually com- takes on very different profiles for the states a and b, as pelling in favor of some fields and some altitudes over found by Yoden (1987b). A transition from to , where others, but it is not satisfactory as a rigorous compari- the vortex weakens dramatically, must entail a reduction son. Differences between units and ranges make it difficult in and a burst in positive 0 0 (negative 0 0) as a to objectively compare the 2 projection error. Further- Rossby wave propagates from the tropopause vertically up more, restricting to one variable at a time is limiting. Ac- through the stratosphere and breaks. This is the general cordingly, we also perform a more automated approach physical narrative of a sudden warming event, and these to identify salient variables in the form of a generalized same fields might be expected to be useful observables linear model for the forward committor, using sparsity- to track for qualitative understanding and prediction. For promoting LASSO regression (“Least Absolute Shrink- visualization, we have found (30 ) and IHF(30 km) age and Selection Operator”) due to Tibshirani (1996), ∫ 30 km = 0 km − / 0 0 to be particularly helpful. However, as implemented in the scikit-learn Python package this doesn’t necessarily imply they are optimal predictors (Pedregosa et al. 2011). As input features, we use all of + , and regression is a more principled way to find them. state ∫ variables Re{Ψ}, Im{Ψ}, , the integrated heat flux We start by projecting the committor onto each observ- 0 − / 0 0 , the eddy PV flux 0 0, and the back- able at each altitude separately, in hopes of finding partic- ground PV gradient , at all altitudes simultaneously. ularly salient altitude levels that clarify the role of vertical The advantage of a sparsity-promoting regression is that interactions. The first five rows of Fig. 5 display, for five it isolates a small number of observables that can accu- fields ( , |Ψ|, 02 , , and 0 0) and for a range of altitude rately approximate the committor in linear combination. levels, the mean and standard deviation of the committor Considering that regions close to and have low com- projected onto that field at that altitude. Each altitude has mittor uncertainty, we regress only on data points with a different range of the CV; for example, because has a + ∈ (0.2, 0.8), and of those only a subset weighted by Dirichlet condition at the bottom and a Neumann condi- (x) + (x)(1 − + (x)) to further emphasize the transition region + ≈ 0.5. To constrain committor predictions to the tion at the top, the lower levels have a much smaller range range (0, 1), we regress on the committor after an inverse- of variability than the high levels. We also plot the inte- sigmoid transformation, ln( + /(1 − + )). First we do this grated variance, or 2 projection error, at each level in the at each altitude separately, and in Fig. 6 (a) we plot the co- right-hand column. A low projected committor variance efficients of each component as a function of altitude. The over at altitude 0 means that the committor is mostly bottom row of Fig. 5 also displays the committor projected determined by the single observable ( 0 ), while a high on the height-dependent LASSO predictor. projected variance indicates significant dependence of + The height-dependent regression in 6(a) shows each on variables other than ( 0 ). In order to compare dif- component is salient for some altitude range. In general, ferent altitudes and fields as directly as possible, the 2 and Im{Ψ} dominate as causal variables at low altitudes, projection error at each altitude is an average over discrete while Re{Ψ} dominates at high altitudes. The overall pre- bins of the observable. diction quality, as measured by 2 and plotted in Fig. 6 (b), In selecting good CV’s, we generally look for a simple, is greatest around 21.5 km, consistent with our qualitative hopefully monotonic, and sensitive relationship with the observations of Fig. 5. Note that not all single-altitude committor. Of all the candidate fields, and stand out slices are sufficient for approximating the committor, even the most in this respect, being clearly negatively correlated with LASSO regression; in the altitude band 50 − 60 km, with the forward committor at all altitudes. The associated the LASSO predictor is not monotonic and has a large projection error tends to be greatest in the region + ≈ 0.5, projected variance, as seen in the bottom row of Fig. 5. as observed before, but interestingly there is a small alti- The specific altitude can matter a great deal. But by using tude band around 15 − 25 km where its magnitude is mini- all altitudes at once, the committor approximation may be mized. This suggests an optimal altitude for monitoring the improved further. We thus repeat the LASSO with all alti- committor through zonal wind, giving the most reliable es- tudes simultaneously and find the sparse coefficient struc- timate possible for a single state variable. In contrast, the ture shown in 6 (c), with a few variables contributing the projection of + onto |Ψ|, displays a large variance across most, namely the state variables Ψ and in the altitude all altitudes. The eddy enstrophy and potential vorticity range 15-22 km. The nonlinear CVs failed to make any
13 !! projection Total !! !! projection error projection error Fig. 5. Projection of the forward committor onto a large collection of altitude-dependent physical variables. The top left panel shows heatmaps of + as a function of and ; white regions denote where ( ) is negligibly observed. The top middle panel shows the standard deviation in + as a function of and ; this uncertainty stems√︁from the remaining 74 model dimensions. The right-hand panel displays the total mean-squared error due to the projection for each altitude, i.e., [ ; ] from Eq. (14). A low value indicates that this level is ideal for prediction. The following rows show the same quantities for other physical variables: streamfunction magnitude, eddy enstrophy, background PV gradient, eddy PV flux, and LASSO. nonzero contribution to LASSO, and this remained stub- ward committor, we can make a strong recommendation for bornly true for other nonlinear combinations not shown, targeting observations here. This conclusion applies only such as 0 0. With multiple lines of evidence indicating 21.5 km as an altitude with high predictive value for the for- to the Holton-Mass model under these parameters, but the
14 (a) (b) (c) Fig. 6. Results of LASSO regression of the forward committor with linear and nonlinear input features. Panel (a) shows the coefficients when + is regressed as a function of only the variables at a given altitude, and panel (b) shows the corresponding correlation score. 21.5 km seems the most predictive (where ≡ 0 at the tropopause, not the surface). Panel (c) shows the coefficient structure when all altitudes are considered simultaneously. Most of the nonzero coefficients appear between 15-22 km, distinguishing that range as highly relevant for prediction. methodology explained above can be applied similarly to ensemble forecasting, we anticipate such methods will be models of arbitrary complexity. increasingly favorable with modern trends in computing. We have presented the committor and lead time as “ideal” forecasts, especially the committor, which we have devoted considerable effort to approximating in this sec- 5. The computational method tion. We want to emphasize that + and + are not competi- tors to ensemble forecasting; rather, they are two of its most In this section we describe the methodology, which in- important end results. So far, we have simply advocated in- volves some technical results from stochastic processes cluding + and + as quantities of interest. Going forward, and measure theory. After describing the theoretical moti- however, we do propose an alternative to ensemble fore- vation and the numerical pipeline in turn, we demonstrate casting aimed specifically at the committor, lead time, and the method’s accuracy and discuss its efficiency compared a wider class of forecasting functions, as they are important to straightforward ensemble forecasting. enough in their own right to warrant dedicated computa- tion methods. Our approach uses only short simulations, making it highly parallelizable, and shifts the numerical a. Feynman-Kac formulae burden from online to offline. Figs. 2-6 were all generated using the short-simulation algorithm. While the method is The forecast functions described above—committors not yet optimized and in some cases not competitive with and passage times—can all be derived from general con-
15 ditional expectations of the form In a diffusion process like the stochastic Holton-Mass ∫ model, L is an advection-diffusion partial differential oper- (x; ) = Ex (X( )) exp Γ(X( )) (18) ator which is analogous to a material derivative in fluid me- 0 chanics. The generator encapsulates the properties of the where again the subscript x denotes conditioning on X(0) = stochastic process. In addition to solving boundary value x; , Γ are arbitrary known functions over R ; and is a problems (18), its adjoint L ∗ provides the Fokker-Planck stopping time, specifically a first-exit time like Eq. (10) but equation for the stationary density (x): possibly with replaced by another set. is a variable parameter that turns into a moment-generating function. L ∗ (x) = 0 (25) To see that the forward committor takes on this form, set (x) = 1 (x), = 0 (Γ can be anything), and = ∪ . We can also write equations for moments of , as in (22), Then (x) = Ex 1 (X( )) = Px {X( ) ∈ } = + (x). by differentiating (23) repeatedly and setting = 0: For the + , set = , = 1 , and Γ = 1. Then L (x; 0) = − Γ −1 (26) (x; ) = Ex 1 (X( )) exp( ) (19) This is an application of the Kac Moment Method (Fitzsim- 1 Ex [ 1 (X( ))] (x; 0) = (20) mons and Pitman 1999). Note that we never actually have + (x) Ex [1 (X( ))] to solve (23) with nonzero . Instead we implement the re- = + (x). (21) cursion above. Note that the base case, = 0, with = 1 gives + = + , no matter what the risk function Γ. In this pa- So we must also be able to differentiate with respect to per we compute only up to the first moment, = 1. Further . background regarding stochastic processes and Feynman- More generally, the function is chosen by the user Kac formulae can be found in Karatzas and Shreve (1998); to quantify risk at the terminal time ; in the case of the Oksendal (2003); E et al. (2019). forward committor, that risk is binary, with an SSW repre- senting a positive risk and a radiative vortex no risk at all. The function Γ is chosen to quantify the risk accumulated b. Dynamical Galerkin Approximation up until time , which might be simply an event’s duration, To solve the boundary value problem (23) with = 0, but other integrated risks may be of more interest for the we start by following the standard finite element recipe, application. For example, one could express the total pole- converting to a variational form and projecting onto a fi- ward heat flux by setting Γ = 0 0, or the momentum lost by nite basis. First, we homogenize boundary conditions by the vortex by setting Γ(x) = (a) − (x). Extending (20), writing (x) = ˆ (x) + (x), where ˆ is a guess function one can compute not only means but higher moments of that obeys the boundary condition ˆ | c = , and | c = 0. such integrals by expressing the risk with Γ. Repeated dif- Next, we integrate the equation against any test function , ferentiation of (x; ) gives weighting the integrand by a density (which is arbitrary ∫ for now, but will be specified later): (x; 0) = Ex (X( )) Γ(X( )) (22) ∫ ∫ 0 (x)L (x) (x) x = ˆ (x)( − L )(x) (x) x We choose to focus on expectations of the form (18) R in order to take advantage of the Feynman-Kac formula, ˆ h , L i = h , − L i (27) which represents (x; ) as the solution to a PDE boundary value problem over state space. As PDEs involve local The test function should live in the same space as , i.e., operators, this form is more amenable to solution with with homogeneous boundary conditions (x) = 0 for x ∈ short trajectories which don’t stray far from their source. ∪ . We refer to the inner products in (27) as being “with The boundary value problem associated with (18) is respect to” the measure (with density) .Í We approximate ( by expanding in a finite basis (x) = =1 (x) with (L + Γ) (x; ) = 0 x ∈ unknown coefficients , and enforce that (27) hold for (23) (x; ) = (x) x ∈ c each . This reduces the problem to a system of linear equations, The domain here is some combination of c and c . The operator L is known as the infinitesimal generator of ∑︁ the stochastic process, which acts on functions by pushing ˆ h , L i = h , − L i = 1, . . . , (28) expectations forward in time along trajectories: =1 Ex [ (X(Δ ))] − (x) which can be solved with standard numerical linear algebra L (x) := lim (24) Δ →0 Δ packages.
You can also read