Learning forecasts of rare stratospheric transitions from short simulations

Page created by Steve Maldonado
 
CONTINUE READING
Learning forecasts of rare stratospheric transitions from short simulations
Generated using the official AMS LATEX template v5.0 two-column layout. This work has been submitted for
 publication. Copyright in this work may be transferred without further notice, and this version may no longer be
 accessible.

 Learning forecasts of rare stratospheric transitions from short simulations
 Justin Finkel∗
arXiv:2102.07760v2 [physics.ao-ph] 29 Aug 2021

 Committee on Computational and Applied Mathematics, University of Chicago

 Robert J. Webber
 Department of Computing and Mathematical Sciences, California Institute of Technology

 Edwin P. Gerber
 Courant Institute of Mathematical Sciences, New York University

 Dorian S. Abbot
 Department of the Geophysical Sciences, University of Chicago

 Jonathan Weare
 Courant Institute of Mathematical Sciences, New York University

 ABSTRACT

 Rare events arising in nonlinear atmospheric dynamics remain hard to predict and attribute. We address the problem of forecasting rare
 events in a prototypical example, Sudden Stratospheric Warmings (SSWs). Approximately once every other winter, the boreal stratospheric
 polar vortex rapidly breaks down, shifting midlatitude surface weather patterns for months. We focus on two key quantities of interest:
 the probability of an SSW occurring, and the expected lead time if it does occur, as functions of initial condition. These optimal forecasts
 concretely measure the event’s progress. Direct numerical simulation can estimate them in principle, but is prohibitively expensive in
 practice: each rare event requires a long integration to observe, and the cost of each integration grows with model complexity. We describe
 an alternative approach using integrations that are short compared to the timescale of the warming event. We compute the probability
 and lead time efficiently by solving equations involving the transition operator, which encodes all information about the dynamics. We
 relate these optimal forecasts to a small number of interpretable physical variables, suggesting optimal measurements for forecasting. We
 illustrate the methodology on a prototype SSW model developed by Holton and Mass (1976) and modified by stochastic forcing. While
 highly idealized, this model captures the essential nonlinear dynamics of SSWs and exhibits the key forecasting challenge: the dramatic
 separation in timescales between a single event and the return time between successive events. Our methodology is designed to fully exploit
 high-dimensional data from models and observations, and has the potential to identify detailed predictors of many complex rare events in
 meteorology.

 1. Introduction 2018), rogue waves (e.g., Dematteis et al. 2018), and space
 weather events (e.g., coronal mass ejections; Ngwira et al.
 As computing power increases and weather models grow
 (2013)). These are very difficult to characterize and predict,
 more intricate and capable of generating a vast wealth of
 being exceptionally rare and pathological outliers in the
 realistic data, the goal of extreme weather event predic-
 tion appears less distant (Vitart and Robertson 2018). To spectrum of weather events. Ensemble forecasting in nu-
 take full advantage of the increased computing power, we merical weather prediction is best suited to estimate statis-
 must develop new approaches to efficiently manage and tics of the average or most likely scenarios, and specialized
 parse the data we generate (or observe) to derive physi- methods are needed to examine the more extreme outlier
 cally interpretable, actionable insights. Extreme weather scenarios.
 events are worthy targets for simulation owing to their de- In this study, we advance an alternative computational
 structive potential to life and property. Rare events have approach to predicting and understanding general rare
 attracted significant simulation efforts recently, including events without sacrificing model fidelity. Our method re-
 hurricanes (e.g., Zhang and Sippel 2009; Webber et al. lies on data generated by a high-fidelity model with a state
 2019; Plotkin et al. 2019), heat waves (e.g., Ragone et al. space with many degrees of freedom , representing, for
 example, spatial resolution of the primitive equations. In
 this way, our method is similar to recently introduced re-
 ∗ Corresponding author: Justin Finkel, jfinkel@uchicago.edu

 1
Learning forecasts of rare stratospheric transitions from short simulations
2

duced order modeling techniques using statistical and ma- and experiments, yet complex enough to present the essen-
chine learning (e.g., Kashinath et al. (2021) and references tial computational difficulties of probabilistic forecasting
therein). However, in contrast to other data-driven tech- and test our methods for addressing them. In particular, this
niques, our approach focuses on directly computing key system captures the key difficulty in sampling rare events.
quantities of interest that characterize the essential pre- The vast majority of the time, the system sits in one of two
dictability of the rare event, rather than trying to capture the metastable states, characterizing a strong or weak vortex
full detailed evolution of the system. In particular, we will respectively. Extreme events are the infrequent jumps from
compute estimators of statistically optimal forecasts that one state to another. Our computational framework can ac-
are useful for initial conditions somewhere between a “typ- curately characterize these rare transitions using only a data
ical” configuration and an “anomalous” configuration set of “short” model simulations, short not only compared
that defines the rare event, where typical and anomalous to the long periods the system sits in one state or the other,
are user-defined. We focus on two forecasts in particular to but also relative to the timescale of the transition events
quantify risk. The committor is the probability that a given themselves. In the future, the same methodology could be
initial condition evolves directly into rather than . Given applied to query the properties of more complex models,
that it does reach first, the conditional mean first pas- such as GCMs, where less theoretical understanding is
sage time, or lead time, is the expected time that it takes available.
to get there. The committor appears prominently in the In section 2, we review the dynamical model and de-
molecular dynamics literature, with some recent applica- fine the specific rare event of interest. In section 3, we
tions in geoscience including Tantet et al. (2015); Lucente formally define the risk metrics introduced above and vi-
et al. (2019), and Finkel et al. (2020), which compute the sualize the results for the Holton-Mass model, including a
committor for low-dimensional atmospheric models. discussion of physical and practical insights gleaned from
 Both quantities depend on the initial condition, defin- our approach. In section 4 we identify an optimal set of re-
ing functions over -dimensional state space that encode duced coordinates for estimating risk using sparse regres-
important information regarding the fundamental causes sion. These results will provide motivation for the compu-
and precursors of the rare event. However, “decoding” the tational method, which we present afterward in section 5
physical insights is not automatic. With real-time measure- along with accuracy tests. We then lay out future prospects
ment constraints, the risk metrics must be estimated from and conclude in section 6.
low-dimensional proxies. Even visualizing them requires
projecting down to one or two dimensions. This calls for a
 2. Holton-Mass model
principled selection of low-dimensional coordinates which
are both physically meaningful and statistically informa- Holton and Mass (1976) devised a simple model of the
tive for our chosen risk metrics. We address this problem stratosphere aimed at reproducing observed intra-seasonal
using sparse regression, a simple but easily extensible so- oscillations of the polar vortex, which they termed “strato-
lution with the potential to inform optimal measurement spheric vacillation cycles.” Earlier SSW models, origi-
strategies to estimate risk as precisely as possible under nating with that of Matsuno (1971), proposed upward-
constraints. propagating planetary waves as the major source of distur-
 Estimation of the committor and lead time is a chal- bance to the vortex. While Matsuno (1971) used impulsive
lenge. We employ a method that uses a large data set forcing from the troposphere as the source of planetary
of short-time independent simulations. We represent the waves, Holton and Mass (1976) suggested that even sta-
committor and lead time as solutions to Feynman-Kac for- tionary tropospheric forcing could lead to an oscillatory
mulae (Oksendal 2003), which relate long-time forecasts response, suggesting that the stratosphere can self-sustain
to instantaneous tendencies. These equations are elegant its own oscillations. While the Holton-Mass model is meant
and general, but computationally daunting: in the continu- to represent internal stratospheric dynamics, Sjoberg and
ous time and space limit, they become partial differential Birner (2014) point out that the stationary boundary condi-
equations (PDE) with independent variables—the same tion does not lead to stationary wave activity flux, meaning
as the model state space dimension. It is therefore hopeless that even the Holton-Mass model involves some dynamic
to solve the equations using any standard spatial discretiza- interaction between the troposphere and stratosphere. Iso-
tion. But, as we demonstrate, the equations can be solved lating internal from external dynamics is a subtle modeling
with remarkable accuracy by expanding in a basis of func- question, but in the present paper we adhere to the original
tions informed by the data set. Holton-Mass framework for simplicity. Our methodology
 We illustrate our approach on the highly simplified applies equally well to other formulations.
Holton-Mass model (Holton and Mass 1976; Christiansen Radiative cooling through the stratosphere and wave per-
2000) with stochastic velocity perturbations in the spirit turbations at the tropopause are the two competing forces
of Birner and Williams (2008). The Holton-Mass model is that drive the vortex in the Holton-Mass model. Altitude-
well-understood dynamically in light of decades of analysis dependent cooling relaxes the zonal wind toward a strong
Learning forecasts of rare stratospheric transitions from short simulations
3

vortex in thermal wind balance with a radiative equilib- frequency), 0 is the Coriolis parameter, and = 2.5 ×
rium temperature field. Gradients in potential vorticity 105 m is a horizontal length scale, selected in order to
along the vortex, however, can allow the propagation of create a homogeneously shaped data set more suited to
Rossby waves. When conditions are just right, a Rossby our analysis. See Holton and Mass (1976); Yoden (1987a);
wave emerges from the tropopause and rapidly propagates Christiansen (2000) for details on parameters. Boundary
upward, sweeping heat poleward and stalling the vortex by conditions are prescribed at the bottom of the stratosphere,
depositing a burst of negative momentum. The vortex is which in this model corresponds to = 0 km, and the top
destroyed and begins anew the rebuilding process. of the stratosphere = 70 km.
 Yoden (1987a) found that for a certain range of param- ℎ
eter settings, these two effects balance each other to create Ψ(0, ) = , Ψ( , ) = 0, (4)
 0
two distinct stable regimes: a strong vortex with zonal wind
close to the radiative equilibrium profile, and a weak vor- (0, ) = (0), ( , ) = ( ).
tex with a possibly oscillatory wind profile. We focus our The vortex-stabilizing influence is represented by ( ), the
study on this bistable setting as a prototypical model of altitude-dependent cooling coefficient, and the radiative
atmospheric regime behavior. The transition from strong 
 wind profile ( ) = (0) + 1000 (with in m), which
to weak vortex state captures the essential dynamics of an relaxes the vortex toward radiative equilibrium. Here =
SSW. O (1) is the vertical wind shear in m/s/km. The competing
 The Holton-Mass model takes the linearized quasi- force of wave perturbation is encoded through the lower
geostrophic potential vorticity (QGPV) equation for a per- boundary condition Ψ(0, ) = ℎ/ 0 .
turbation streamfunction 0 ( , , , ) on top of a zonal Detailed bifurcation analysis of the model by both Yoden
mean flow ( , , ), and projects these two fields onto a (1987a) and Christiansen (2000) in ( , ℎ) space revealed
single zonal wavenumber = 2/( cos 60◦ ) and a single the bifurcations that lead to bistability, vacillations, and ul-
meridional wavenumber ℓ = 3/ , where is the Earth’s timately quasiperiodicity and chaos. Here we will focus on
radius. This notation is consistent with Holton and Mass an intermediate parameter setting of = 1.5 m/s/km and
(1976) and Christiansen (2000), and we refer the reader to ℎ = 38.5 m, where two stable states coexist: a strong vortex
these earlier papers for complete description of the equa- with closely following and an almost barotropic sta-
tions and parameters. The resulting ansatz is tionary wave, as well as a weak vortex with dipping close
 to zero at an intermediate altitude and a stationary wave
 ( , , ) = ( , ) sin(ℓ ) (1)
 with strong westward phase tilt. The two stable equilibria,
 0 /2 
 ( , , , ) = Re{Ψ( , ) } sin(ℓ ) which we call a and b, are illustrated in Fig. 1(a,b) by their
 -dependent zonal wind and perturbation streamfunction
which is fully determined by the reduced state space profiles.
 ( , ), and Ψ( , ), the latter being complex-valued. The two equilibria can be interpreted as two different
is a scale height, 7 km. Inserting this into the linearized winter climatologies, one with a strong vortex and one
QGPV equations yields the coupled PDE system with a weak vortex susceptible to vacillation cycles. To
 explore transitions between these two states, we follow
 2 Ψ
    
 1 Birner and Williams (2008) and modify the Holton-Mass
 − G 2 ( 2 + ℓ 2 ) + + 2 (2)
 4 equations with small additive noise in the variable to
 2 mimic momentum perturbations by smaller scale Rossby
   
 
 = − − G 2 − − 2 Ψ waves, gravity waves, and other unresolved sources. The
 4 2 
 2
 form of noise will be specified in Eq. (7).
 2Ψ
     
 1 While the details of the additive noise are ad hoc,
 + 2 G 2 + − + 2 Ψ − 2 
 4 the general approach can be more rigorously justified
 through the Mori-Zwanzig formalism (Zwanzig 2001).
for Ψ( , ), and Because many hidden degrees of freedom are being pro-
 jected onto the low-dimensional space of the Holton-Mass
 2 
  
 2 2 
 −G ℓ − + 2 = ( − ) − 
 
 (3) model, the dynamics on small observable subspaces can
 be considered stochastic. This is the perspective taken in
  2 ∗
 2 ℓ 2 
  
 Ψ stochastic parameterization of turbulence and other high-
 − ( − ) + 2 + Im Ψ 2
 2 dimensional chaotic systems (Hasselmann 1976; DelSole
 and Farrell 1995; Franzke and Majda 2006; Majda et al.
for ( , ). Here, = 8/(3 ) is a coefficient for projecting 2001; Gottwald et al. 2016). In general, unobserved deter-
sin2 (ℓ ) onto sin(ℓ ). We have nondimensionalized the ministic dynamics can make the system non-Markovian,
equations with the parameter G 2 = 2 2 /( 02 2 ), where which technically violates the assumptions of our method-
 2 = 4 × 10−4 s−2 is a constant stratification (Brunt-Väisälä ology. However, with sufficient separation of timescales the
Learning forecasts of rare stratospheric transitions from short simulations
4

 (a) (b) (c)

 (d) (e)

 Fig. 1. Illustration of the two stable states of the Holton-Mass model and transitions between them. (a) Zonal wind profiles of the radiatively
maintained strong vortex (the fixed point a, blue) which increases linearly with altitude, and the weak vortex (the fixed point b, red) which dips close
to zero in the mid-stratosphere. (b) Streamfunction contours are overlaid for the two equilibria a and b. (c) Parametric plot of a control simulation
in a 2-dimensional state space projection, including two transitions from to (orange) and to (green). (d) Time series of (30 km) from
the same simulation. (e) The steady state density projected onto (30 km).

Markovian assumption is not unreasonable. Furthermore, where : R → R × imparts a correlation structure to
memory terms can be ameliorated by lifting data back to the vector W( ) ∈ R of independent standard white noise
higher-dimensional state space with time-delay embedding processes. As discussed above, we design to be a low-
(Berry et al. 2013; Thiede et al. 2019; Lin and Lu 2021). rank, constant matrix that adds spatially smooth stirring
 We follow Holton and Mass (1976) and discretize the to only the zonal wind (not the streamfunction Ψ) and
equations using a finite-difference method in , with 27 which respects boundary conditions at the bottom and top
vertical levels (including boundaries). After constraining of the stratosphere. Its structure is defined by the following
the boundaries, there are = 3 × (27 − 2) = 75 degrees Euler-Maruyama scheme: in a timesetep = 0.005 days,
of freedom in the model. Christiansen (2000) investigated after a deterministic forward Euler step we add the stochas-
higher resolution and found negligible differences. The full tic perturbation to zonal wind on large vertical scales
discretized state is represented by a long vector
 √
   
 h ∑︁ 1
 X( ) = Re{Ψ}(Δ , ), . . . , Re{Ψ}( − Δ , ), ( ) = sin + (7)
 =0
 2 
 Im{Ψ}(Δ , ), . . . , Im{Ψ}( − Δ , ), (5)
 i where ( = 0, 1, 2) are independent unit normal samples,
 (Δ , ), . . . , ( − Δ , ) ∈ R = R75 = 2, and is a scalar that sets the magnitudes of entries
 in . In terms of physical units,
The deterministic system can be written X( )/ =
 (X( )) for a vector field : R → R specified by E[( ) 2 ]
discretizing (2) and (3). Under deterministic dynamics, 2 = ≈ (1 m/s) 2 /day (8)
 
X( ) → a or X( ) → b as → ∞ depending on initial con-
ditions. The addition of white noise changes the system has units of ( / )/ 1/2 , where the square-root of time
into an Itô diffusion comes from the quadratic variation of the Wiener process.
 It is best interpreted in terms of the daily root-mean-square
 X( ) = (X( )) + (X( )) W( ) (6) velocity perturbation of 1.0 m/s. We have experimented
Learning forecasts of rare stratospheric transitions from short simulations
5

with this value, and found that reducing the noise level be- the streamfunction in the Holton-Mass model:
low 0.8 dramatically reduces the frequency of transitions, ∫ 30 km ∫ 30 km
 
while increasing it past 1.5 washes out metastability. We IHF = − / 0 0 ∝ |Ψ| 2 (9)
keep constant going forward as a favorable numerical 0 km 0 km 
regime to demonstrate our approach, while acknowledging where is the phase of Ψ. The → and → tran-
that the specifics of stochastic parameterization are impor- sitions are again highlighted in orange and green respec-
tant in general to obtain accurate forecasts. The resulting tively, showing geometrical differences between the two
matrix is 75 × 3, with nonzero entries only in the last 25 directions. We will refer to the → transition as an
rows as forcing only applies to ( ). SSW event, even though it is more accurately a transition
 A long simulation of the model reveals metastability, between climatologies according to the Holton-Mass in-
with the system tending to remain close to one fixed point terpretation. The → transition is a vortex restoration
for a long time before switching quickly to the other, as event. Our focus in this paper is on predicting these transi-
shown by the time series of (30 km) in panel (d) of Fig. tion events (mainly the → direction) and monitoring
1. Panel (e) shows a projection of the steady state distribu- their progress in a principled way. In the next section we
tion, also known as the equilibrium/invariant distribution, explain the formalism for doing so.
of as a function of . We call this density (x), which
is a function over the full -dimensional state space. We 3. Forecast functions: the committor and lead time
focus on the zonal wind at 30 km following Christiansen statistics
(2000), because this is where its strength is minimized in a. Defining risk and lead time
the weak vortex. While the two regimes are clearly associ-
ated with the two fixed points, they are better characterized We will introduce the quantities of interest by way of
by extended regions of state space with strong and weak example. First, suppose the stratosphere is observed in
vortices. We thus define the two metastable subsets of R an initial state X(0) = x that is neither in nor , so
 (b)(30 km) < (x)(30 km) < (a)(30 km) and the vor-
 tex is somewhat weakened, but not completely broken
 = {X : (X)(30 km) ≥ (a) (30 km) = 53.8 m/s}, down. We call this intermediate zone = ( ∪ ) c (the
 = {X : (X)(30 km) ≤ (b) (30 km) = 1.75 m/s}. complement of the two metastable sets). Because and 
 are attractive, the system will soon find its way to one or
This straightforward definition roughly follows the con- the other at the first-exit time from , denoted
vention of Charlton and Polvani (2007), which defines an c = min{ ≥ 0 : X( ) ∈ c } (10)
SSW as a reversal of zonal winds at 10 hPa. We use 30 km
for consistency with Christiansen (2000); this is technically Here, c emphasizes that the process has left , i.e., gone
higher than 10 hPa because = 0 in the Holton-Mass model to or . The first-exit location X( c ) is itself a random
represents the tropopause. Our method is equally applica- variable which importantly determines how the system ex-
ble to any definition, and the results are not qualitatively its : either X( c ) ∈ , meaning the vortex restores to
dependent on this choice. Incidentally, the analysis tools radiative equilibrium, or X( c ) ∈ , meaning the vortex
we present may be helpful in distinguishing predictability breaks down into vacillation cycles. A fundamental goal
properties between different definitions. In fact, we will of forecasting is to determine the probabilities of these
show that the height neighborhood of 20 km is actually two events, which naturally leads to the definition of the
more salient for predicting the event than wind at the 30- (forward) committor function
km level, even when the event is defined by wind at 30 km! x ∈ = ( ∪ ) c
  Px {X( ) ∈ }
 
  c
This emerges from statistical analysis alone, and gives us
 
 
 +
 (x) = 0 x∈ (11)
confidence that essential SSW properties are stable with 
respect to reasonable changes in definition. 1
 
  x∈ 
 The orange highlights in Fig. 1 (d) begin when the system where the subscript x indicates that the probability is
exits the region bound for , and end when the system conditional on a fixed initial condition X(0) = x, i.e.,
enters . The green highlights start when the system leaves Px {·} = P{·|X(0) = x}. The superscript “+” distinguishes
 bound for , and end when is reached. Note that → the forward committor from the backward committor, an
transitions, SSWs, are much shorter in duration than → analogous quantity for the time-reversed process which we
transitions. Fig. 1 (c) shows the same paths, but viewed do not use in this paper. Throughout, we will use capital
parametrically in a two-dimensional state space consist- X( ) to denote a stochastic process, and lower-case x to
 ∫ 30 km
ing of integrated heat flux or IHF 0 km − / 0 0 , and represent a specific point in state space, typically an ini-
zonal wind (30 km). IHF is an informative number be- tial condition, i.e., X(0) = x. Both are = 75-dimensional
cause it captures both magnitude and phase information of vectors.
Learning forecasts of rare stratospheric transitions from short simulations
6

 The committor is the probability that the system will be For this more ambitious goal, DNS is prohibitively ex-
in state (the disturbed state) next rather than (the strong pensive. By definition, transitions between and are
vortex state). Hence + (x) = 0 if you start in , and is 1 if infrequent. Therefore, if starting from x far from , a huge
you are already in . In between (i.e., when x ∈ ), + (x) number of sampled trajectories ( ) will be required to ob-
tells you the probability that you will first go to rather serve even a small number ending in , and they may take a
than to . That is, + (x) tells you the probability that an long time to get there. If instead we could precompute these
SSW will happen. functions offline over all of state space, the online forecast-
 Another important forecasting quantity is the lead time ing problem would reduce to “reading off” the committor
to the event of interest. While the forward committor re- and lead time with every new observation. Achieving this
veals the probability of experiencing vortex breakdown goal is the key point of our paper, and we achieve this us-
before returning to a strong vortex, it does not say how ing the dynamical Galerkin approximation, or DGA, recipe
long either event will take. Furthermore, even if the vortex described by Thiede et al. (2019).
is restored first, how long will it be until the next SSW does A brute force way to estimate these functions is to inte-
occur? The time until the next SSW event is denoted , grate the model for a long time until it reaches statistical
again a random variable, whose distribution depends on steady state, meaning it has explored its attractor thor-
the initial condition x. We call Ex [ ] the mean first pas- oughly according to the steady state distribution. After
sage time (MFPT) to . Conversely, we may ask how long long enough, it will have wandered close to every point x
a vortex disturbance will persist before normal conditions sufficiently often to estimate + (x) and + robustly as in
return; the answer (on average) is Ex [ ], the mean first DNS. We have performed such a “control simulation” of
passage time to . These same quantities have been cal- 5×105 days for validation purposes, but our main contribu-
culated previously in other simplified models, e.g. Birner tion in this paper is to compute the forecast functions using
and Williams (2008) and Esler and Mester (2019). only short trajectories with DGA, allowing for massive
 Ex [ ] has an obvious shortcoming: it is an average parallelization. However, we will defer the methodological
over all paths starting from x, including those which go details to Section 5, and first justify the effort with some re-
straight into (i.e., an orange trajectory in Fig. 1c,d) and sults. We visualize the committor and lead time computed
the rest which return to i.e., a green trajectory) and linger from short trajectories and elaborate on their interpretation,
there, potentially for a very long time, before eventually utility, and relationship to ensemble forecasting methods.
re-crossing back into . It is more relevant for near-term
forecasting to condition on the event that an SSW is b. Steady state distribution
coming before the strong vortex returns. For this purpose,
we introduce the conditional mean first passage time, or Before visualizing the committor and lead time, it will be
lead time, to : helpful to have a precise notion of the steady state distribu-
 tion, denoted (x), a probability density that describes the
 + (x) := Ex [ | < ] (12) long-term behavior of a stochastic process X( ). Assuming
 the system is ergodic, averages over time are equivalent to
which quantifies the suddenness of SSW. averages over state space with respect to . That is, for any
 All of these quantities can, in principle, be estimated by well-behaved function : R → R,
direct numerical simulation (DNS). For example, suppose ∫ ∫
 
we observe an initial condition X(0) = x in an operational 1
 h i := lim (X( )) = (x) (x) x (13)
forecasting setting, and wish to estimate the probability →∞ 0 R 
and lead time for the event of next hitting . We would
initialize an ensemble {X (0) = x, = 1, . . . , } and evolve For example, if (x) = 1 (x) (an indicator function, which
each member forward in time until it hits or at the is 1 for x ∈ ⊂ R and 0 for x ∉ ), Eq. (13) says that the
random time . In an explicitly stochastic model, random fraction of time spent in can be found by integrating the
forcing would drive each member to a different fate, while density over . The density peaks in Fig. 1(d) indicates
in a deterministic model their initial conditions would be clearly that the neighborhoods of a and b are two such re-
perturbed slightly. To estimate the committor to , we gions with especially large probability under . Note that
could calculate the fraction of members that hit first. both sides of (13) are independent of the initial condition,
Averaging the arrival times ( ), over only those members which is forgotten eventually. Short-term forecasts are by
gives an estimate of the lead time to . For a single initial definition out-of-equilibrium processes, depending criti-
condition x reasonably close to , DNS may be the most cally on initial conditions; however, (x) is important to us
economical. But how do we systematically compute + (x) here as a “default” distribution for missing information. If
over all of state space (here 75 variables, but potentially the initial condition is only partially observed, e.g. in only
billions of variables in a GCM or other state-of-the-art one coordinate, we have no information about the other
forecast system)? − 1 dimensions, and in many cases the most principled
Learning forecasts of rare stratospheric transitions from short simulations
7

tactic is to assume those other dimensions are distributed Each quantity is projected onto two different one-
according to , conditional on the observation. dimensional CVs: (30 km) (first column) and IHF (sec-
 ond column). In panel (a), for example, we see the com-
c. Visualizing committor and lead times mittor is a decreasing function of : the weaker the wind,
 the more likely a vortex breakdown. Moreover, the curve
 The forecasts + (x) and + (x) are functions of a high- provides a conversion factor between risk (as measured by
dimensional space R . However, these degrees of free- probability) and a physical variable, zonal wind. An ob-
dom may not all be “observable” in a practical sense,
 servation of (30 km) = 38 m/s implies a 50% chance of
given the sparsity and resolution limits of weather sensors,
 vortex breakdown. The variation in slope also tells us that
and visualizing them requires projecting onto reduced-
 a wind reduction from 40 m/s to 30 m/s represents a far
coordinate spaces of dimension 1 or 2. We call these
 greater increase in risk than a reduction from 30 m/s to 20
“collective variables” (CVs) following chemistry litera-
 m/s. Meanwhile, panel (b) shows the committor to be an
ture (e.g., Noé and Clementi 2017), and denote them as
 increasing function of IHF, since SSW is associated with
vector-valued functions from the full state space to a re-
 large wave amplitude and phase lag. However, IHF seems
duced space, : R → R , where = 1 or 2. For in-
 inferior to zonal wind as a committor proxy, as a small
stance, Fig. 1 (c) plots trajectories in the CV space con-
sisting of integrated heat flux and zonal change in IHF from ∼ 15000 to ∼ 20000 K·m2 /s corre-
 ∫ 30 km  wind at 30 km: sponds to a sharp increase in committor from nearly zero
 (x) = 0 km − / 0 0
 , (30 km) . The first compo- to nearly one. In other words, knowing only IHF doesn’t
nent is a nonlinear function involving products of Re{Ψ} provide much useful information about the threat of SSW
and Im{Ψ}, while the second component is a linear func- until it is already virtually certain. The dotted envelope is
tion involving only at a certain altitude. For visualization also wider in panel (b) than (a), indicating that projecting
in general, we have to approximate a function : R → R, the committor onto IHF removes more information than
such as the committor or lead time, as a function of re- projecting onto . While the underlying noise makes it
duced coordinates. That is, we wish to find : R → R impossible to divine the outcome with certainty from any
such that (x) ≈ ( (x)). Given a fixed CV space , an observation, the projection error clearly privileges some
“optimal” is chosen by minimizing some function-space observables over others for their predictive power.
metric between ◦ and . In panels (c) and (d), the lead time is seen to have the
 A natural choice is the mean-squared error weighted by opposite overall trend as the committor: the weaker the
the steady state distribution , so the projection problem is wind, or the greater the heat flux, the closer you are on
to minimize over functions : R → R the penalty average to a vortex breakdown. + (x) is not defined when
 wind is strongest, as x ∈ and so + (x) = 0. However,
 [ ; ] : = k ◦ − k 2 2 ( )
 ∫ h an interesting exception to the trend occurs in the range
 i2
 10 m/s ≤ ≤ 40 m/s: the expected lead time stays con-
 = ( (x)) − (x) (x) x. (14)
 R stant or slightly decreases as zonal wind increases, and the
 projection error remains large. This means that while the
The optimal for this purpose is the conditional expecta- probability of vortex breakdown increases rapidly from
tion 50% to 90%, the time until vortex breakdown remains
 highly uncertain. To resolve this seeming paradox, we will
 (y) = EX∼ [ (X)| (X) = y] have to visualize the joint variation of + and + .
 (x) 1 y ( (x)) (x) x
 ∫
 It is of course better to consider multiple observables
 = lim (15) at once. Fig. 3 shows the information gained beyond ob-
 1 y ( (x)) (x) x
 ∫
 | y|→0
 serving (30 km) by incorporating IHF as a second ob-
where y is a small neighborhood about y in CV space servable. In the top row we project , + , and + onto
 . The subscript X ∼ means that the expectation is with the two-dimensional subspace, revealing structure hidden
respect to a random variable X distributed according to from view in the one-dimensional projections. Panel (a)
 (x), i.e., at steady state. Fig. 2 uses this formula to display is a 2-dimensional extension of Fig. 1(d), with density
one-dimensional projections of the committor (first row) peaks visible in the neighborhoods of a and b. The white
and lead time (second row), as well as the one-standard space surrounding the gray represents physically insignifi-
deviation envelope incurred by projecting out the other 74 cant regions of state space that was not sampled by the long
degrees of freedoom. This “projection error” is defined as simulation. The same convention holds for the following
the square root of the conditional variance two-dimensional figures. The committor is displayed in
 panel (b) over the same space. It changes from blue at the
 2
 top (an SSW is unlikely) to red at the bottom (an SSW
 h i
 (y) = EX∼ (X) − (y) (X) = y . (16)
 is likely), bearing out the negative association between 
Learning forecasts of rare stratospheric transitions from short simulations
8

 (a) (b)

 ! ! = 0.5

 %
 !!.# = 38
 &

 (c) (d)

 Fig. 2. One-dimensional projections of the forward committor (first row) and lead time to (second row). These functions depend on all
 = 75 degrees of freedom in the model, but we have averaged across − 1 = 74 dimensions to visualize them as rough functions of two single
degrees of freedom: (30 km) (first column) and integrated heat flux up to 30 km, IHF (second column). Panel (a) additionally marks the + = 12
threshold and the corresponding value of zonal wind.

and + . However, there are non-negligible horizontal gra- both would have the same committor of 0.5. According to
dients that show that IHF plays a role, too. Likewise, the both and IHF together, i.e., the two-dimensional heat
lead time in panel (c) decreases from ∼ 90 days near a to map in Fig. 3(b), they have very different probabilities of
0 days near b, when the transition is complete. Here, IHF + ( 0 ) = 0.31 and + ( 1 ) = 0.73: an SSW is more than
appears even more critically important for forecasting how twice as likely to occur from starting point 1 as 0 .
the event plays out, as gradients in + are often completely While those committor values come from the DGA
horizontal. method to be described in Section 5, we confirm them
 A horizontal dotted line in Fig. 3(a-c) marks the 50% risk empirically by plotting an ensemble of 100 trajectories
level (30 km) = 38 m/s, but the committor varies along it originating from each of the two initial conditions in pan-
from low risk at the left to high risk at the right: we show els (d) and (e) below, coloring -bound trajectories blue
this concretely by selecting two points 0 and 1 along and -bound trajectories red. Only 28% of the sampled
the line. According to alone, i.e., the curve in Fig. 2(a), trajectories through 0 exhibit an SSW, next going to state
Learning forecasts of rare stratospheric transitions from short simulations
9

 (a) (b) (c)

 (d)

 (e)

 Fig. 3. The density, committor, and lead time as functions of zonal wind and integrated heat flux. Panel (a) projects the steady state
distribution (x) onto the two-dimensional subspace ( ,IHF) at 30 km. The white regions surrounding the gray are unphysical states with
negligible probability. Panels (b) and (c) display the committor and lead time in the same space. A horizontal transect marks the level (30 km) =
38.5 m/s, where + according to only is 0.5. Panels (d) and (e) show ensembles initialized from two points 0 and 1 along the transect, verifying
that their committor and lead time values differ from their values according to , in a way that is predictable due to considering IHF in addition to
 .

 , while 68% of the integrations from 1 end at . In both The lead time prediction is improved similarly by incor-
cases, the heatmaps and ensemble sample means roughly porating the second observable. According to alone, Fig.
 2 predicts a lead time of 40 days for both 0 and 1 . Con-
match. The small differences between the projected com-
 sidering IHF additionally, the two-dimensional heat map
mittor and the empirical “success” rate of trajectories arises
 in Fig. 3 predicts a lead time of 52 days and 24 days for
both from errors in the DGA calculation (which we analyze 0 and 1 , respectively. Referring to the ensemble from 1
in section 5) and the finite size of the ensemble. in panels (d) and (e), the arrival times of red trajectories
Learning forecasts of rare stratospheric transitions from short simulations
10

to provide a discrete sampling of the lead time distribu- spread in + than points at + = 0.5, contrary to the intu-
tions of | < . The sample means are 50 and 32 days ition that closeness to in probability means closeness in
respectively from 0 and 1 , again roughly matching with time. The color shading shows that + is strongly associ-
our predictions. ated with (30 km), while + is more strongly associated
 These two-dimensional projections still leave out 73 re- with IHF(30 km). In particular the horizontal contours in
maining dimensions, which we could incorporate to make panel (c) show that the large spread in lead time near is
the forecasts even better. After accounting for all 75 di- due almost completely to variation in IHF. In other words,
mensions, we would obtain the full committor function the system can be highly committed to with a low zonal
 + : R → R. This is still a probability, i.e., an expecta- wind, but if IHF is low, it may take a long time to get there.
tion over the unresolved turbulent processes and uncertain We can also see this from the lower-left region of Fig. 3(a)
initial condition. Low-dimensional committor projections and (b), where committor is high and lead time is high.
simply treat the projected-out dimensions as random vari- There are two complementary explanations for this phe-
ables sampled according to . Whether projected to a space nomenon. First, the low- , low-IHF region of state space
of 1 or 75 dimensions, the committor is the function of that corresponds to a temporary restoration phase in a vacil-
space that is closest, in the mean-square sense, to the bi- lation cycle, which delays the inevitable collapse of zonal
nary indicator 1 (X( )); this is the defining characteristic wind below the threshold defining . In fact, the ensemble
of conditional expectation (Durrett 2013). In the case that of pathways starting from 0 in Fig. 3(c) has several mem-
the system does hit next, the lead time is closest in the
 bers whose zonal wind either stagnates at medium strength,
mean-square sense to .
 or dips low and partially restores before finally plunging all
 While high-dimensional systems offer many coordinates
 the way down. The second explanation is that many of these
to choose from, we argue that the committor and lead time
 partial restoration events are not part of an → transi-
are the most important nonlinear coordinates to monitor
 tion, but rather a → transition. In a highly irreversible
for forecasting purposes. We will explore their relationship
 system such as the Holton-Mass model, these two situa-
in the next subsection. Although both encode some version
of proximity to SSW, they are independent variables which tions are quite dynamically distinct. To distinguish them
deserve separate consideration. using DGA, we would have to account for the past as well
 as the future, calculating backward-in-time forecasts such
 as the backward committor − (x) = Px {X( − ) ∈ }, where
d. Relationship between risk and lead time − < 0 is the most-recent hitting time. Backward forecasts
 will be analyzed thoroughly in a forthcoming paper, but
 A forecast is most useful if it comes sufficiently early (to they are beyond the scope of the present one.
leave some buffer time before impact) and is sufficiently
 In summary, + and + are principled metrics to inform
precise to time your response. For example, in June we can
 preparation for extreme weather. For example, a threatened
say with certainty it will snow next winter in Minnesota.
 community might decide in advance to start taking action
To be useful, we want to know the date of the first snow
 when an event is very likely, + ≥ 0.8, and somewhat im-
as early as possible. By relating levels of risk (quantified
 minent, + ≤ 10 days, or rather, when an event is somewhat
by + ) and lead time (quantified by + ), we can now assess
 likely, + ≥ 0.5, and very imminent, + ≤ 3 days. Because
the limits of early prediction. Such a relationship would
answer two questions: for an SSW transition, (1) how far of partial restoration events, the committor does not deter-
in advance will we be aware of it with some prescribed mine the lead time or vice versa, and so a good real-time
confidence, say 80%? (2) given some prescribed lead time, disaster response strategy should take both of them into
say 42 days, how aware or ignorant could we be of it? account, defining an “alarm threshold” that is not a single
 The committor and lead time have an overall negative number, but some function of both the committor and lead
relationship, but they do not completely determine each time. This idea is similar in spirit to that of the Torino
other, as the contours in Fig. 3(a,b) do not perfectly line scale, which assigns a single risk metric to an asteroid
up. We treat them as independent variables in Fig. 4, which or comet impacts based on both probability and severity
maps zonal wind and IHF as functions of the coordinates (Binzel 2000). Of course, after many near-SSW events, a
 + and + in an inversion of Fig. 3. The density (x) pro- lot of material damage may have already occurred, which
jected on this space in 4(a) shows again a bimodal struc- may be a reason to define a higher threshold for the defini-
ture around a and b, which occupy opposite corners of this tion of , or even a continuum for different severity levels
space by construction. Meanwhile, zonal wind and IHF of SSW. We emphasize that the choice of , and alarm
are indicated by the shading in panels (b) and (c). The thresholds are more of a community and policy decision
bridge between a and b is not a narrow band, but rather than a scientific one. The strength of our approach is that
includes a curious high-committor, high-lead time branch it provides a flexible numerical framework to quantify and
which seems paradoxical: points at + = 0.9 have a greater optimize the consequences of those decisions.
11

 (a)

 (b) (c)

 Fig. 4. Committor and lead time as independent coordinates. This figure inverts the functions in Fig. 3, considering the zonal wind and
integrated heat flux as functions of committor and lead time. The two-dimensional space they span is the essential goal of forecasting. Panel (a)
shows the steady state distribution on this subspace, which is peaked near a and b (darker shading), weaker in the "bridge" region between them,
and completely negligible the white regions unexplored by data. Panels (b) and (c) display zonal wind and heat flux in color as functions of the
committor and lead time.

4. Sparse representation of the committor input-output pairs. However, to keep the representation in-
 terpretable, we will restrict ourselves to physics-informed
 The committor projections showed give only an impres-
 input features based on the Eliassen-Palm (EP) relation,
sion of its high-dimensional structure. While Eq. (15) says which relates wave activity, PV fluxes and gradients, and
how to optimally represent the committor over a given CV heating source terms in a conservation equation. From Yo-
subspace, optimizing [ ; ] over , it does not say which den (1987b), the EP relation for the Holton-Mass model
subspace is optimal. If the committor does admit a sparse takes the form
representation, we could specifically target observations on  02 
these high-impact signals. In this section we address this 
 + ( ) −1 ∇ · 
much harder problem of optimizing [ ; ] over subspaces 2
 . 02
 The set of CV spaces is infinite, as observables can be =− −1 0 ( 0) (17)
 2 
arbitrarily complex nonlinear functions of the basic state
 where = (− 0 0)j + ( 0 0)k
variables x. Machine learning algorithms such as artifi-
cial neural networks are designed exactly for that purpose: The EP flux divergence has two  alternative expressions:
 −1 ∇ · = 0 0 = −1 0 0 0 . If there were no
 
to represent functions nonparametrically from observed
12

dissipation ( = 0) and the background zonal state were flux are also rather unhelpful as early warning signs, de-
time-independent ( = 0), dividing both sides by spite their central role in SSW evolution. For example, the
would express local conservation of wave activity A = large, positive spikes in heat flux across all altitudes gener-
 02 /(2 ). Neither of these is exact in the stochastic ally occur after the committor ≈ 0.5 threshold has already
Holton-Mass model, so we use the quantities in Eq. (17) as been crossed. Furthermore, the relationship of 0 0 with
diagnostics: enstrophy 02 , PV gradient , PV flux 0 0, the committor is not smooth. The + < 0.5 region at each
 altitude is a thin band near zero.
and heat flux 0 0. Each field is a function of ( , ) and
 The exhaustive CV search in Fig. 5 is visually com-
takes on very different profiles for the states a and b, as pelling in favor of some fields and some altitudes over
found by Yoden (1987b). A transition from to , where others, but it is not satisfactory as a rigorous compari-
the vortex weakens dramatically, must entail a reduction son. Differences between units and ranges make it difficult
in and a burst in positive 0 0 (negative 0 0) as a to objectively compare the 2 projection error. Further-
Rossby wave propagates from the tropopause vertically up more, restricting to one variable at a time is limiting. Ac-
through the stratosphere and breaks. This is the general cordingly, we also perform a more automated approach
physical narrative of a sudden warming event, and these to identify salient variables in the form of a generalized
same fields might be expected to be useful observables linear model for the forward committor, using sparsity-
to track for qualitative understanding and prediction. For promoting LASSO regression (“Least Absolute Shrink-
visualization, we have found (30 ) and IHF(30 km) age and Selection Operator”) due to Tibshirani (1996),
 ∫ 30 km
= 0 km − / 0 0 to be particularly helpful. However, as implemented in the scikit-learn Python package
this doesn’t necessarily imply they are optimal predictors (Pedregosa et al. 2011). As input features, we use all
of + , and regression is a more principled way to find them. state
 ∫ variables Re{Ψ}, Im{Ψ}, , the integrated heat flux
 We start by projecting the committor onto each observ- 0
 − / 0 0 , the eddy PV flux 0 0, and the back-
able at each altitude separately, in hopes of finding partic- ground PV gradient , at all altitudes simultaneously.
ularly salient altitude levels that clarify the role of vertical The advantage of a sparsity-promoting regression is that
interactions. The first five rows of Fig. 5 display, for five it isolates a small number of observables that can accu-
fields ( , |Ψ|, 02 , , and 0 0) and for a range of altitude rately approximate the committor in linear combination.
levels, the mean and standard deviation of the committor Considering that regions close to and have low com-
projected onto that field at that altitude. Each altitude has mittor uncertainty, we regress only on data points with
a different range of the CV; for example, because has a + ∈ (0.2, 0.8), and of those only a subset weighted by
Dirichlet condition at the bottom and a Neumann condi- (x) + (x)(1 − + (x)) to further emphasize the transition
 region + ≈ 0.5. To constrain committor predictions to the
tion at the top, the lower levels have a much smaller range
 range (0, 1), we regress on the committor after an inverse-
of variability than the high levels. We also plot the inte-
 sigmoid transformation, ln( + /(1 − + )). First we do this
grated variance, or 2 projection error, at each level in the
 at each altitude separately, and in Fig. 6 (a) we plot the co-
right-hand column. A low projected committor variance
 efficients of each component as a function of altitude. The
over at altitude 0 means that the committor is mostly
 bottom row of Fig. 5 also displays the committor projected
determined by the single observable ( 0 ), while a high
 on the height-dependent LASSO predictor.
projected variance indicates significant dependence of + The height-dependent regression in 6(a) shows each
on variables other than ( 0 ). In order to compare dif- component is salient for some altitude range. In general, 
ferent altitudes and fields as directly as possible, the 2 and Im{Ψ} dominate as causal variables at low altitudes,
projection error at each altitude is an average over discrete while Re{Ψ} dominates at high altitudes. The overall pre-
bins of the observable. diction quality, as measured by 2 and plotted in Fig. 6 (b),
 In selecting good CV’s, we generally look for a simple, is greatest around 21.5 km, consistent with our qualitative
hopefully monotonic, and sensitive relationship with the observations of Fig. 5. Note that not all single-altitude
committor. Of all the candidate fields, and stand out slices are sufficient for approximating the committor, even
the most in this respect, being clearly negatively correlated with LASSO regression; in the altitude band 50 − 60 km,
with the forward committor at all altitudes. The associated the LASSO predictor is not monotonic and has a large
projection error tends to be greatest in the region + ≈ 0.5, projected variance, as seen in the bottom row of Fig. 5.
as observed before, but interestingly there is a small alti- The specific altitude can matter a great deal. But by using
tude band around 15 − 25 km where its magnitude is mini- all altitudes at once, the committor approximation may be
mized. This suggests an optimal altitude for monitoring the improved further. We thus repeat the LASSO with all alti-
committor through zonal wind, giving the most reliable es- tudes simultaneously and find the sparse coefficient struc-
timate possible for a single state variable. In contrast, the ture shown in 6 (c), with a few variables contributing the
projection of + onto |Ψ|, displays a large variance across most, namely the state variables Ψ and in the altitude
all altitudes. The eddy enstrophy and potential vorticity range 15-22 km. The nonlinear CVs failed to make any
13

 !! projection Total !!
 !! projection error projection error

 Fig. 5. Projection of the forward committor onto a large collection of altitude-dependent physical variables. The top left panel shows
heatmaps of + as a function of and ; white regions denote where ( ) is negligibly observed. The top middle panel shows the standard
deviation in + as a function of and ; this uncertainty stems√︁from the remaining 74 model dimensions. The right-hand panel displays the total
mean-squared error due to the projection for each altitude, i.e., [ ; ] from Eq. (14). A low value indicates that this level is ideal for prediction.
The following rows show the same quantities for other physical variables: streamfunction magnitude, eddy enstrophy, background PV gradient,
eddy PV flux, and LASSO.

nonzero contribution to LASSO, and this remained stub- ward committor, we can make a strong recommendation for
bornly true for other nonlinear combinations not shown,
 targeting observations here. This conclusion applies only
such as 0 0. With multiple lines of evidence indicating
21.5 km as an altitude with high predictive value for the for- to the Holton-Mass model under these parameters, but the
14

 (a) (b)

 (c)

 Fig. 6. Results of LASSO regression of the forward committor with linear and nonlinear input features. Panel (a) shows the coefficients
when + is regressed as a function of only the variables at a given altitude, and panel (b) shows the corresponding correlation score. 21.5 km seems
the most predictive (where ≡ 0 at the tropopause, not the surface). Panel (c) shows the coefficient structure when all altitudes are considered
simultaneously. Most of the nonzero coefficients appear between 15-22 km, distinguishing that range as highly relevant for prediction.

methodology explained above can be applied similarly to ensemble forecasting, we anticipate such methods will be
models of arbitrary complexity. increasingly favorable with modern trends in computing.
 We have presented the committor and lead time as
“ideal” forecasts, especially the committor, which we have
devoted considerable effort to approximating in this sec-
 5. The computational method
tion. We want to emphasize that + and + are not competi-
tors to ensemble forecasting; rather, they are two of its most In this section we describe the methodology, which in-
important end results. So far, we have simply advocated in- volves some technical results from stochastic processes
cluding + and + as quantities of interest. Going forward, and measure theory. After describing the theoretical moti-
however, we do propose an alternative to ensemble fore- vation and the numerical pipeline in turn, we demonstrate
casting aimed specifically at the committor, lead time, and the method’s accuracy and discuss its efficiency compared
a wider class of forecasting functions, as they are important to straightforward ensemble forecasting.
enough in their own right to warrant dedicated computa-
tion methods. Our approach uses only short simulations,
making it highly parallelizable, and shifts the numerical
 a. Feynman-Kac formulae
burden from online to offline. Figs. 2-6 were all generated
using the short-simulation algorithm. While the method is The forecast functions described above—committors
not yet optimized and in some cases not competitive with and passage times—can all be derived from general con-
15

ditional expectations of the form In a diffusion process like the stochastic Holton-Mass
   ∫  model, L is an advection-diffusion partial differential oper-
 (x; ) = Ex (X( )) exp Γ(X( )) (18) ator which is analogous to a material derivative in fluid me-
 0 chanics. The generator encapsulates the properties of the
where again the subscript x denotes conditioning on X(0) = stochastic process. In addition to solving boundary value
x; , Γ are arbitrary known functions over R ; and is a problems (18), its adjoint L ∗ provides the Fokker-Planck
stopping time, specifically a first-exit time like Eq. (10) but equation for the stationary density (x):
possibly with replaced by another set. is a variable
parameter that turns into a moment-generating function. L ∗ (x) = 0 (25)
To see that the forward committor takes on this form, set
 (x) = 1 (x),  = 0 (Γ can be anything), and = ∪ . We can also write equations for moments of , as in (22),
Then (x) = Ex 1 (X( )) = Px {X( ) ∈ } = + (x). by differentiating (23) repeatedly and setting = 0:
For the + , set = , = 1 , and Γ = 1. Then  
 L (x; 0) = − Γ −1 (26)
 (x; ) = Ex 1 (X( )) exp( )
  
 (19)
 This is an application of the Kac Moment Method (Fitzsim-
 1 Ex [ 1 (X( ))]
 (x; 0) = (20) mons and Pitman 1999). Note that we never actually have
 + (x) Ex [1 (X( ))] to solve (23) with nonzero . Instead we implement the re-
 = + (x). (21) cursion above. Note that the base case, = 0, with = 1 
 gives + = + , no matter what the risk function Γ. In this pa-
So we must also be able to differentiate with respect to
 per we compute only up to the first moment, = 1. Further
 .
 background regarding stochastic processes and Feynman-
 More generally, the function is chosen by the user
 Kac formulae can be found in Karatzas and Shreve (1998);
to quantify risk at the terminal time ; in the case of the
 Oksendal (2003); E et al. (2019).
forward committor, that risk is binary, with an SSW repre-
senting a positive risk and a radiative vortex no risk at all.
The function Γ is chosen to quantify the risk accumulated b. Dynamical Galerkin Approximation
up until time , which might be simply an event’s duration, To solve the boundary value problem (23) with = 0,
but other integrated risks may be of more interest for the we start by following the standard finite element recipe,
application. For example, one could express the total pole- converting to a variational form and projecting onto a fi-
ward heat flux by setting Γ = 0 0, or the momentum lost by nite basis. First, we homogenize boundary conditions by
the vortex by setting Γ(x) = (a) − (x). Extending (20), writing (x) = ˆ (x) + (x), where ˆ is a guess function
one can compute not only means but higher moments of that obeys the boundary condition ˆ | c = , and | c = 0.
such integrals by expressing the risk with Γ. Repeated dif- Next, we integrate the equation against any test function ,
ferentiation of (x; ) gives weighting the integrand by a density (which is arbitrary
  ∫   for now, but will be specified later):
 (x; 0) = Ex (X( )) Γ(X( )) (22) ∫ ∫
 0
 (x)L (x) (x) x = ˆ
 (x)( − L )(x) (x) x
 We choose to focus on expectations of the form (18) R 
in order to take advantage of the Feynman-Kac formula, ˆ 
 h , L i = h , − L i (27)
which represents (x; ) as the solution to a PDE boundary
value problem over state space. As PDEs involve local The test function should live in the same space as , i.e.,
operators, this form is more amenable to solution with with homogeneous boundary conditions (x) = 0 for x ∈
short trajectories which don’t stray far from their source. ∪ . We refer to the inner products in (27) as being “with
The boundary value problem associated with (18) is respect to” the measure (with density) .Í We approximate
 ( by expanding in a finite basis (x) = =1 (x) with
 (L + Γ) (x; ) = 0 x ∈ unknown coefficients , and enforce that (27) hold for
 (23)
 (x; ) = (x) x ∈ c each . This reduces the problem to a system of linear
 equations,
The domain here is some combination of c and c .
The operator L is known as the infinitesimal generator of 
 ∑︁
the stochastic process, which acts on functions by pushing ˆ 
 h , L i = h , − L i = 1, . . . , (28)
expectations forward in time along trajectories: =1

 Ex [ (X(Δ ))] − (x) which can be solved with standard numerical linear algebra
 L (x) := lim (24)
 Δ →0 Δ packages.
You can also read