Ensemble Riemannian Data Assimilation over the Wasserstein Space
Sagar K. Tamang(1,2), Ardeshir Ebtehaj(1,2), Peter J. Van Leeuwen(3), Dongmian Zou(4), and Gilad Lerman(5)

arXiv:2009.03443v1 [stat.ME] 7 Sep 2020

(1) Saint Anthony Falls Laboratory, University of Minnesota-Twin Cities
(2) Department of Civil, Environmental and Geo-Engineering, University of Minnesota-Twin Cities
(3) Department of Atmospheric Science, Colorado State University
(4) Duke Kunshan University
(5) School of Mathematics, University of Minnesota-Twin Cities

Abstract

In this paper, we present a new ensemble data assimilation paradigm over a Riemannian manifold equipped with the Wasserstein metric. Unlike Eulerian penalization of error in the Euclidean space, the Wasserstein metric can capture the translation and shape differences between square-integrable probability distributions of the background state and observations, enabling one to formally penalize geophysical biases in a non-Gaussian state-space. The new approach is applied to dissipative and chaotic evolutionary dynamics with a wide range of applications in Earth system models. Its advantages over classic variational and particle filter techniques are documented under systematic errors and non-Gaussian state-spaces.
1 Introduction

Extending the forecast skill of Earth System Models (ESM) relies on advancing the science of data assimilation (DA). A large body of current DA methodologies, either filtering (Evensen, 1994, 2003; Kalman, 1960; Reichle et al., 2002a,b) or variational approaches (Andersson et al., 1994; Ebtehaj and Foufoula-Georgiou, 2013; Le Dimet and Talagrand, 1986; Lorenc, 1986; Park and Županski, 2003; Reichle et al., 2001), are derived from basic principles of Bayesian inference under the assumption that the model and observation errors are drawn from zero-mean Gaussian distributions. It is well documented that these assumptions often limit the forecast skills of DA systems (Chen et al., 2019a; Dee, 2005; Ebtehaj et al., 2014; Walker et al., 2001), as natural processes are not necessarily Gaussian (Bocquet et al., 2010; Pires et al., 2010) and systematic errors or biases exist in ESMs and observations – chiefly due to under-representation of the underlying physics and errors in instruments as well as retrieval algorithms (Dee, 2003).

Apart from particle filters (Van Leeuwen, 2010), which are intrinsically designed to deal with non-Gaussian state-spaces, numerous modifications to variational DA (VDA) and ensemble filtering techniques have been made to tackle non-Gaussianity (Anderson, 2010; Han and Li, 2008; Mandel and Beezley, 2009; Pires et al., 1996). In four-dimensional VDA, where a non-Gaussian state-space may lead to local minima, a quasi-static VDA has been proposed to ensure convergence to a global minimum by gradually increasing the assimilation intervals (Pires et al., 1996). For ensemble-based filters, Anderson (2010) proposes a new approach to account for non-Gaussian priors and posteriors
by utilizing rank histograms (Anderson, 1996; Hamill, 2001). A hybrid ensemble Kalman-particle filter has also been developed (Mandel and Beezley, 2009) to utilize both the ensemble Kalman filter (EnKF; Evensen, 1994) and particle filters to cope with non-Gaussian priors in high-dimensional geophysical DA systems.

In recent years, significant progress has been made to treat systematic errors through numerous ad-hoc methods such as field alignment techniques (Ravela et al., 2007) and the morphing EnKF (Beezley and Mandel, 2008), which can tackle position errors between observations and forecasts. The dual state-parameter EnKF (Moradkhani et al., 2005) has also been developed to resolve systematic errors originating from parameter uncertainties. Bias-aware variants of the Kalman filter have been designed (De Lannoy et al., 2007a,b; Drécourt et al., 2006; Kollat et al., 2008) to simultaneously update the state-space and an a priori estimate of the additive biases. In parallel, cumulative distribution function matching (Reichle and Koster, 2004) has garnered widespread attention in land DA to alleviate systematic biases in remote sensing observations.

Despite significant progress in developing versatile DA systems, many existing techniques are ad-hoc and cannot formalize non-Gaussianity and systematic errors rigorously. Even though particle filters can handle non-Gaussian likelihood functions, when observations lie away from the support set of the particles, their ensemble variance collapses over time and this degenerates the filters (Poterjoy and Anderson, 2016). In this paper, we aim to address the shortcomings of current Bayesian DA methodologies by
developing a new ensemble methodology, which borrows ideas from Riemannian geometry and optimal mass transport, that can effectively handle a non-Gaussian state-space and geophysical biases.

In order to emphasize the advantage of this Riemannian perspective over common Bayesian methods for DA, we refer to the following technical property: the set of equal-mean Gaussian distributions, which are common in Bayesian modeling, spans an affine Euclidean subspace (Amari, 2012). However, sufficiently smooth non-Gaussian distributions with finite second-order moments constitute a Riemannian statistical manifold (Lauritzen, 1987). This manifold is equipped with a geodesic distance, which assigns to any two non-Gaussian probability distributions the shortest curve on the Riemannian manifold they belong to. This distance thus encapsulates the dissimilarity between the mean values as well as all higher-order moments of the two distributions (Pennec, 2006). More specifically, the geodesic distance on the Riemannian manifold of square-integrable probability distributions is the Wasserstein distance (Villani, 2003). It stems from the theory of optimal mass transport (Kantorovich, 1942; Monge, 1781). Unlike the Euclidean distance, which is "Eulerian" in the sense that it becomes insensitive to the magnitude of translation between two disjoint densities, the Wasserstein distance acts as a "Lagrangian" metric and penalizes the mismatch based on the difference between the central positions of the densities as well as their shapes (Chen et al., 2019b; Ning et al., 2014). Wasserstein space representations have been established for the following physical models: the Fokker-Planck equation (Jordan et al., 1998), the dissipative porous
media equation (Otto, 2001), and the semi-geostrophic approximation of the compressible Navier-Stokes equations (Cullen et al., 1991).

Several attempts have been made to utilize the Wasserstein metric in geophysical DA. Ning et al. (2014) used the Wasserstein distance to reduce forecast uncertainty due to errors in parameter estimation in dissipative evolutionary equations. Feyeux et al. (2018) suggested replacing the Euclidean distance in VDA with the Wasserstein distance to penalize the position error between state and observations. Tamang et al. (2020) introduced a Wasserstein regularization of the VDA cost function with particular emphasis on bias correction under chaotic dynamics.

All of the aforementioned techniques stem from the variational problem formulation and therefore are unable to rigorously tackle a non-Gaussian state-space. Furthermore, the high computational cost associated with the Wasserstein distance inhibits the application of these methods in high-dimensional geophysical systems. In this work, we attempt to answer some of the key questions that still remain to be explored: How can a Riemannian interpretation of DA using the Wasserstein distance formally account for a non-Gaussian state-space? Why and how can Riemannian DA correct for systematic errors? Is there a way for ensemble implementation in high-dimensional problems, given that the computation of the Wasserstein distance is costly?

We present a novel ensemble DA methodology, namely Ensemble Riemannian DA (EnRDA). As explained, the presented methodology extends the DA formalism over a Riemannian manifold to account for a non-Gaussian
state-space under systematic geophysical biases. It is important to note that since EnRDA operates in the probability domain, it can lead to the estimation of the entire forecast probability distribution. The initial results of the EnRDA are demonstrated for dissipative advection-diffusion and chaotic Lorenz-63 dynamics with a wide range of prospective applications in the forecast of mass, momentum, and heat transport in Earth system models.

The organization of the paper is as follows: Section 2 provides a brief background on Bayesian DA formulations and optimal mass transport. The mathematical formalism of the EnRDA is described in Section 3. Section 4 presents the results and compares them with their Euclidean counterparts. Section 5 discusses the findings and ideas for future research.

2 Background

2.1 Notations

Throughout, small and capital boldface letters represent m-element column vectors x ∈ R^m and m-by-n matrices X ∈ R^{m×n}, respectively, whereas R^m_+ (R^{m×n}_+) denotes that the vectors (matrices) only contain non-negative real numbers. The 1_m denotes an m-element vector of ones, I_m is an m × m identity matrix, δ(x − y) represents the Kronecker delta function with value 1 iff x = y and 0 otherwise, diag(x) ∈ R^{m×m} represents a diagonal matrix with entries given by x ∈ R^m, and E_X(x) is the expectation of x under the probability density function p(x). The notation x ∼ N(µ, Σ) denotes that the random vector x is drawn from a Gaussian distribution with mean µ and covariance Σ.
The square of the weighted ℓ2-norm of x is represented as ‖x‖_{B^{−1}}^2 = x^T B^{−1} x, where B is a positive definite matrix and (·)^T is the transposition operator. The notations x ⊙ y and x ⊘ y represent the element-wise Hadamard product and division between equal-length vectors x and y, respectively; ⟨A, B⟩ = tr(A^T B) denotes the Frobenius inner product between matrices A and B; and tr(·) and det(·) represent the trace and determinant of a square matrix, respectively.

2.2 Data Assimilation on Euclidean Space

In this section, we provide a brief review of the derivation of classic variational DA and particle filters based on Bayes' theorem to set the stage for the presented Ensemble Riemannian DA formalism.

2.2.1 Variational Formulation

Let us consider a discrete-time Markovian state-space and its observations as follows:

    x_t = M(x_{t−1}) + ω_t ,   ω_t ∼ N(0, B)
    y_t = H(x_t) + v_t ,       v_t ∼ N(0, R) ,                        (1)

where x_t ∈ R^m and y_t ∈ R^n represent the state variables and the observations at time t; M : R^m → R^m and H : R^m → R^n are the deterministic forward model and observation operators; and ω_t ∈ R^m and v_t ∈ R^n are the independent and identically distributed model and observation errors, respectively.

Recalling Bayes' theorem and dropping the time subscript without loss of generality, the posterior probability density function (pdf) of the state given
the observation can be obtained as p(x|y) = p(y|x) p(x) / p(y), where p(y|x) is proportional to the likelihood function, p(x) is the prior density, and p(y) denotes the distribution of observations. In the log-space, the theorem can be expressed as follows:

    − log p(x|y) = − log p(x) − log p(y|x) + log p(y) .               (2)

Letting x^b = E_X(x) ∈ R^m represent the background state, ignoring the additive constant terms including log p(y), and assuming Gaussian errors, Equation 2 can be expanded to the well-known three-dimensional VDA (3D-Var) cost function (Lorenc, 1986) as follows:

    − log p(x|y) ∝ (1/2)(x − x^b)^T B^{−1} (x − x^b) + (1/2)(y − H(x))^T R^{−1} (y − H(x))
                ∝ ‖x − x^b‖_{B^{−1}}^2 + ‖y − H(x)‖_{R^{−1}}^2 .      (3)

As a result, the analysis state obtained by minimization of the 3D-Var cost function in Equation 3 is the mode of the posterior distribution, which coincides with the posterior mean when errors are drawn from unbiased Gaussian densities and H is a linear operator. Using the Woodbury matrix inversion lemma (Woodbury and Woodbury, 1950), it can be easily demonstrated that for a linear observation operator, the analysis states in the 3D-Var and the Kalman filter are equivalent. As is evident, the zero-mean Gaussian assumption leads to penalization of the error through the weighted Euclidean norm.
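To make this equivalence concrete, the following minimal Python sketch (our own illustration, not the authors' implementation; the toy dimensions and variable names are assumptions) computes the 3D-Var analysis for a linear observation operator through the algebraically equivalent Kalman-gain form:

    import numpy as np

    def var3d_analysis(xb, y, H, B, R):
        """Minimizer of the 3D-Var cost in Eq. 3 for a linear observation
        operator H; algebraically identical to the Kalman filter update."""
        K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)   # Kalman gain
        return xb + K @ (y - H @ xb)                   # analysis state

    # Toy example: a 3-element state observed at its first two components.
    rng = np.random.default_rng(0)
    xb = np.array([1.0, 2.0, 3.0])                     # background state
    H = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])   # observation operator
    B, R = 0.5 * np.eye(3), 0.1 * np.eye(2)            # error covariances
    y = H @ np.array([1.2, 1.8, 3.1]) + 0.1 * rng.standard_normal(2)
    xa = var3d_analysis(xb, y, H, B, R)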
2.2.2 Particle Filters

Particle filters (Gordon et al., 1993; Van Leeuwen, 2010; Van Leeuwen et al., 2019) were introduced in DA to address the issue of a non-Gaussian state-space by representing the prior and posterior distributions through a weighted ensemble of model outputs referred to as the "particles". In the standard discrete setting, using Monte Carlo simulations, the state-space prior density p(x) is represented by a sum of equal-weight Kronecker delta functions:

    p(x) = (1/N) Σ_{i=1}^{N} δ(x − x_i) ,                             (4)

where x_i ∈ R^m is the state variable represented by the i-th particle and N is the total number of particles. Each of these N particles is then evolved through the nonlinear model in Equation 1. Let us assume that at every assimilation cycle, when observations become available, the probability of observations y ∈ R^n given the state variable x_i follows a Gaussian distribution:

    p(y|x_i) = (2π)^{−n/2} det(R)^{−1/2} exp(−(1/2)(y − H(x_i))^T R^{−1} (y − H(x_i)))   ∀i .   (5)

Using Bayes' theorem, under the Gaussian error assumption, it can be easily shown that the posterior distribution p(x|y) can be approximated using a set of weighted particles:

    p(x|y) = Σ_{i=1}^{N} w_i δ(x − x_i) ,                             (6)
where w_i ∈ R represents the weight of the i-th particle:

    w_i = p(y|x_i) / Σ_{j=1}^{N} p(y|x_j) .                           (7)

The particles are then resampled from the posterior distribution in Equation 6 based on their relative weights and are propagated forward in time according to the model dynamics. As is evident, in particle filters, the weight of each particle is updated using the Gaussian likelihood function under a zero-mean error assumption. However, in the presence of systematic biases, when the support sets of the particles and the observations are disjoint, only the weights of a few particles become significantly large and the weights of all other particles tend to zero. As the underlying dynamical system progresses in time, only those few particles with relatively larger weights are resampled, which gradually makes the filter degenerate in time (Poterjoy and Anderson, 2016). Note that the problem of filter degeneracy exists even at low ensemble sizes, and maintaining a constant effective particle size requires an exponential growth in the number of particles (Doucet and Johansen, 2009).
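As an illustration of the update in Equations 5-7, the sketch below (our own simplified example, assuming a linear observation operator and Gaussian observation error) computes the normalized importance weights in log-space for numerical stability and then performs multinomial resampling:

    import numpy as np

    def pf_update(particles, y, H, R, rng):
        """Importance weighting (Eqs. 5-7) and multinomial resampling for a
        bootstrap particle filter with Gaussian observation error."""
        innov = y - particles @ H.T                # innovations y - H(x_i)
        Rinv = np.linalg.inv(R)
        logw = -0.5 * np.einsum('ij,jk,ik->i', innov, Rinv, innov)
        w = np.exp(logw - logw.max())              # stabilize before exponentiating
        w /= w.sum()                               # normalized weights (Eq. 7)
        idx = rng.choice(len(particles), size=len(particles), p=w)
        return particles[idx]                      # resampled posterior ensemble

    rng = np.random.default_rng(0)
    particles = rng.normal(size=(100, 3))          # 100 particles, 3-element state
    posterior = pf_update(particles, rng.normal(size=3), np.eye(3), 2.0 * np.eye(3), rng)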
2.3 Optimal Mass Transport

The theory of optimal mass transport (OMT), first coined by Gaspard Monge (Monge, 1781) and later extended by Kantorovich (Kantorovich, 1942), was developed to minimize the transportation cost in resource allocation problems with purely practical motivations. Later on, mathematicians discovered that OMT provides a rich ground to compare and morph probability distributions, and they uncovered new connections to partial differential equations (Jordan et al., 1998; Otto, 2001) as well as functional analysis (Benamou and Brenier, 2000; Brenier, 1987; Villani, 2003).

In a discrete setting, for two probability histograms {p_{x^b} ∈ R^m_+ : Σ_i p_{x^b,i} = 1} and {p_y ∈ R^m_+ : Σ_j p_{y,j} = 1} with their respective measures supported on x^b_i and y_j, a "ground" transportation cost matrix C ∈ R^{m×m}_+ is defined such that its elements c_ij = |x^b_i − y_j|^q represent the cost of transporting unit probability masses from location x^b_i to y_j, where q is a positive exponent. Without loss of generality, assuming that p_{x^b} and p_y have the same number of support points, OMT determines an optimal transportation plan U^a ∈ R^{m×m}_+ that can linearly map the two histograms onto each other with the minimum amount of total transportation cost or "work" as follows:

    U^a = argmin_U ⟨C, U⟩   s.t.  U ≥ 0 ,  U 1_m = p_{x^b} ,  U^T 1_m = p_y .   (8)

It is worth noting that the transportation plan can be interpreted as a joint distribution that couples the marginal densities p_{x^b} and p_y. For the ground transportation cost with q = 2, the OMT problem in Equation 8 is convex and defines the square of the 2-Wasserstein distance d_W^2(p_{x^b}, p_y) between the probability histograms.
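For small histograms, the coupling in Equation 8 can be computed directly with an off-the-shelf linear programming solver. The sketch below is our own illustration (the use of scipy's HiGHS backend and the one-dimensional support points are assumptions, not the paper's implementation); the optimal objective with q = 2 is the squared 2-Wasserstein distance:

    import numpy as np
    from scipy.optimize import linprog

    def ot_plan(px, py, x_supp, y_supp, q=2):
        """Optimal coupling of Eq. 8 solved as a linear program over the
        vectorized plan U; returns the plan and the optimal cost."""
        m, n = len(px), len(py)
        C = np.abs(x_supp[:, None] - y_supp[None, :]) ** q   # ground cost c_ij
        # Marginal constraints: U 1 = px (rows) and U^T 1 = py (columns).
        A_eq = np.zeros((m + n, m * n))
        for i in range(m):
            A_eq[i, i * n:(i + 1) * n] = 1.0                 # i-th row sum
        for j in range(n):
            A_eq[m + j, j::n] = 1.0                          # j-th column sum
        res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([px, py]),
                      bounds=(0, None), method='highs')
        return res.x.reshape(m, n), res.fun                  # U^a and d_W^2 (q = 2)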
One key question that might arise here is – what is the advantage of the Wasserstein distance for interpolating probability distributions compared to other measures of proximity, such as the Hellinger distance (Hellinger, 1909) or the Kullback-Leibler (KL) divergence (Kullback and Leibler, 1951)? To elaborate on this question, we confine our consideration to Gaussian densities, over which the Wasserstein distance can be represented in a closed form. In particular, let us interpolate between two Gaussian distributions N(µ_x, Σ_x) and N(µ_y, Σ_y) with an interpolation or displacement parameter η. It can be shown that the interpolated density, with minimum sum of squares of the Wasserstein distance, is a Gaussian distribution N(µ_η, Σ_η), where

    µ_η = (1 − η) µ_x + η µ_y   and
    Σ_η = Σ_x^{−1/2} ((1 − η) Σ_x + η (Σ_x^{1/2} Σ_y Σ_x^{1/2})^{1/2})^2 Σ_x^{−1/2}

(Chen et al., 2019b).

Figure 1: Barycenter of two Gaussian distributions N(1.4, 0.01) and N(−1.1, 0.35) as a function of the displacement parameter η ∈ [0, 1] for the (a) Hellinger distance, (b) Kullback-Leibler divergence, and (c) Wasserstein distance.

Figure 1 shows the spectrum of barycenters between two Gaussian pdfs for a range of the displacement parameter η ∈ [0, 1]. As shown, the interpolated densities using the Hellinger distance, which is Euclidean in the space of probability distributions, are bimodal. Although the Gaussian shape of the interpolated densities using the KL divergence is preserved, the variance of the interpolants is not necessarily bounded by the variances of the input Gaussian densities. Unlike these metrics, as shown, the Wasserstein distance moves the mean and preserves the shape of the interpolants through a natural morphing process.
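A numerical sketch of this closed form is given below (our own illustration; the function names are assumptions). For scalar variances it reduces to σ_η = (1 − η)σ_x + ησ_y, which can be checked by hand:

    import numpy as np
    from scipy.linalg import sqrtm

    def gaussian_w2_barycenter(mu_x, Sx, mu_y, Sy, eta):
        """Wasserstein displacement interpolant N(mu_eta, S_eta) between two
        Gaussians (closed form quoted above from Chen et al., 2019b);
        Sx and Sy are assumed symmetric positive definite."""
        mu = (1 - eta) * mu_x + eta * mu_y
        Sx_h = np.real(sqrtm(Sx))                   # Sx^{1/2}
        Sx_hi = np.linalg.inv(Sx_h)                 # Sx^{-1/2}
        M = (1 - eta) * Sx + eta * np.real(sqrtm(Sx_h @ Sy @ Sx_h))
        return mu, Sx_hi @ M @ M @ Sx_hi            # S_eta = Sx^{-1/2} M^2 Sx^{-1/2}

    # The two Gaussians of Figure 1, interpolated halfway (eta = 0.5):
    mu, S = gaussian_w2_barycenter(np.array([1.4]), np.array([[0.01]]),
                                   np.array([-1.1]), np.array([[0.35]]), 0.5)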
Figure 2: Wasserstein barycenters p_η between two Gaussian mixture models p_{x^b}: 0.5 N(−12, 0.4) + 0.5 N(−8, 0.8) and p_y: 0.55 N(5, 4) + 0.45 N(9.5, 4) as a function of the displacement parameter η ∈ [0, 1]. Also shown is the Wasserstein barycenter for η = 0.5.

As previously noted, this metric is not limited to any Gaussian assumption. Figure 2 shows the Wasserstein barycenter between two bimodal mixtures of Gaussians as a function of the displacement parameter η ∈ [0, 1]. The results clearly show the Lagrangian nature of the Wasserstein metric, which penalizes the mismatch not only based on the shape but also the shift between the two probability density functions – making it suitable to tackle geophysical biases. In a general setting, it can be shown that d_W^2(p_{x^b}, p_y) = d_W^2(p̄_{x^b}, p̄_y) + ‖µ_{x^b} − µ_y‖^2, where p̄_{x^b} and p̄_y are the centered zero-mean probability masses and µ_{x^b} and µ_y are the respective mean values.
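This mean-shift decomposition is easy to verify numerically in one dimension, where the optimal coupling simply sorts both samples (quantile matching). The sketch below is our own illustration with arbitrary toy samples:

    import numpy as np

    def w2_1d(x, y):
        """2-Wasserstein distance between two equal-size 1-D samples: in one
        dimension the optimal coupling pairs the sorted samples."""
        xs, ys = np.sort(x), np.sort(y)
        return np.sqrt(np.mean((xs - ys) ** 2))

    rng = np.random.default_rng(1)
    x = rng.normal(-10.0, 1.0, 4000)                # background-like sample
    y = rng.normal(7.0, 2.0, 4000)                  # observation-like sample
    d2 = w2_1d(x, y) ** 2
    # Check: d_W^2 = d_W^2 of the centered samples + squared mean shift.
    d2_centered = w2_1d(x - x.mean(), y - y.mean()) ** 2
    shift2 = (x.mean() - y.mean()) ** 2
    assert np.isclose(d2, d2_centered + shift2)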
3 Ensemble Riemannian Data Assimilation

3.1 Problem Formulation

First, let us recall that the weighted mean of a cloud of points {x_i}_{i=1}^{N} ∈ R^m in Euclidean space is µ_x = Σ_{i=1}^{N} η_i x_i for a given family of non-negative weights with Σ_i η_i = 1. This expected value is equivalent to solving the following variational problem:

    µ_x = argmin_x Σ_{i=1}^{N} η_i ‖x_i − x‖^2 .                      (9)

Thus, the 3D-Var problem in Equation 3, after Cholesky decomposition (Nash, 1990) of the error covariance matrices and rearrangement of terms, can be interpreted as a "barycenter problem" in the Euclidean space, where the analysis state is the weighted mean of the background state and observation. By changing the distance metric from the Euclidean to the Wasserstein distance (Agueh and Carlier, 2011), a Riemannian barycenter can be defined as the Fréchet mean (Fréchet, 1948) of N probability histograms with finite second-order moments:

    p_η = argmin_p Σ_{k=1}^{N} η_k d_W^2(p, p_k) .                    (10)

Let us assume that the probability histogram of the background state p_{x^b} = (1/N) Σ_{i=1}^{N} δ(x − x^i) can be represented through a sum of equal-weight Kronecker delta functions located at the N ensemble members of the state variable x^i ∈ R^m. We similarly approximate the distribution of the observations p_y = (1/M) Σ_{j=1}^{M} δ(y − y^{j*}) using the replicated "pseudo-observations" {y^{j*}}_{j=1}^{M} ∈ R^n, obtained by perturbing the observation with zero-mean Gaussian noise v* ∼ N(0, R).
The EnRDA defines the probability histogram of the analysis state p_{x^a} ∈ R^m as the Fréchet barycenter over the Wasserstein space as follows:

    p_{x^a} = argmin_{p_x} η d_W^2(p_x, p_{x^b}) + (1 − η) d_W^2(|det(H′(x))|^{−1} p_x, p_y) ,   (11)

where the displacement parameter η > 0 assigns the relative weights to the observation and background terms to capture their respective geodesic distances from the true state, and H′(·) is the Jacobian of the observation operator, assuming that the support sets of the probability histograms of the observations and state variables are the same. The optimal value of η shall be determined empirically using some reference data. It is important to note that the solution of the above DA formalism involves finding the optimal transportation plan (Equation 8), or the joint density, which couples the background and observation marginal distributions.

From a Bayesian point of view, the above problem can be derived assuming that the model and observation errors have finite second-order moments and are explained by generalized Riemannian Gaussian distributions (Said et al., 2017). Specifically, following the Bayesian principle explained in Equation 2, the EnRDA can be derived assuming that the prior p(x) and likelihood function p(y|x) are p(x) ∝ exp(−η d_W^2(p_x, p_{x^b})) and p(y|x) ∝ exp(−(1 − η) d_W^2(|det H′(x)|^{−1} p_x, p_y)). From the joint histogram U^a coupling p_{x^b} and p_y (Equation 8), the histogram of the analysis state can be obtained based on McCann's interpolation (McCann, 1997; Peyré et al., 2019):

    p_{x^a} = Σ_{i=1}^{N} Σ_{j=1}^{M} u^a_{ij} δ((1 − η) x^i + η y^{j*}) .   (12)
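A sketch of this interpolation step follows (our own illustration, assuming one-dimensional support points and a precomputed coupling, e.g., from the linear program in Equation 8): each mass u^a_ij is displaced to the point (1 − η)x^i + ηy^{j*}, and analysis ensemble members can then be drawn from the resulting histogram:

    import numpy as np

    def mccann_interpolation(Ua, x_supp, y_supp, eta):
        """Analysis histogram of Eq. 12: mass u^a_ij is placed at the
        displaced location (1 - eta) * x^i + eta * y^{j*}."""
        i, j = np.nonzero(Ua)                       # support of the coupling
        locations = (1 - eta) * x_supp[i] + eta * y_supp[j]
        masses = Ua[i, j]
        return locations, masses                    # support and weights of p_{x^a}

    def sample_ensemble(locations, masses, n, rng):
        """Draw n analysis ensemble members from the interpolated histogram."""
        idx = rng.choice(len(masses), size=n, p=masses / masses.sum())
        return locations[idx]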
Samples from the analysis state histogram in Equation 12 can be drawn to produce the model ensemble forecast. It should be noted that computation of the joint histogram is an expensive linear programming problem. The widely used interior-point methods (Altman and Gondzio, 1999) and Orlin's algorithm (Orlin, 1993) have super-cubic run times with a computational complexity of O(d^3 log d), where d represents the total number of support points over which the input probability histograms are defined. This is a serious limitation in high-dimensional geophysical DA problems.

3.2 Entropic Regularization of EnRDA

In order to speed up computation of the coupling between p_{x^b} and p_y, the problem in Equation 8 can be regularized (Cuturi, 2013) as follows:

    U^a = argmin_U ⟨C, U⟩ − γ H(U)   s.t.  U ≥ 0 ,  U 1_m = p_{x^b} ,  U^T 1_m = p_y ,   (13)

where γ > 0 is the regularization parameter and H(U) := −⟨U, log U − 1_m 1_m^T⟩ represents the Gibbs-Boltzmann relative entropy function. Note that the relative entropy is a concave function and thus its negative value is convex. The Lagrangian function L of Equation 13 can be obtained by adding two dual variables or Lagrange multipliers q, r ∈ R^m as follows:

    L(U, q, r) = ⟨C, U⟩ − γ H(U) − ⟨q, U 1_m − p_{x^b}⟩ − ⟨r, U^T 1_m − p_y⟩ .   (14)

Setting the derivative of the Lagrangian function to zero, we have

    ∂L(U, q, r)/∂u_ij = c_ij + γ log(u_ij) − q_i − r_j = 0 ,   ∀i, j .   (15)
The convexity of the entropic regularization keeps the problem in Equation 13 strongly convex, and it can be shown (Peyré et al., 2019) that Equation 15 leads to a unique optimal joint density of the following form:

    U^a = diag(v) K diag(w) ,                                         (16)

where v = exp(q ⊘ (γ 1_m)) ∈ R^m and w = exp(r ⊘ (γ 1_m)) ∈ R^m are the unknown scaling variables, and K ∈ R^{m×m}_+ is the Gibbs kernel associated with the cost matrix C, with elements k_ij = exp(−c_ij / γ).

From the mass conservation constraints in Equation 13 and the scaling form of the optimal joint density in Equation 16, we can derive

    diag(v) K diag(w) 1_m = p_{x^b}   and   diag(w) K^T diag(v) 1_m = p_y .   (17)

Using Equation 17, the two unknown scaling variables v and w can be solved for iteratively using Sinkhorn's algorithm (Cuturi, 2013) as follows:

    v^{l+1} = p_{x^b} ⊘ (K w^l)   and   w^{l+1} = p_y ⊘ (K^T v^l) .   (18)
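A compact sketch of these iterations is given below (our own illustration; in practice, small values of γ require log-domain stabilization, which is omitted here for brevity, and the two updates are applied sequentially, as is common in implementations):

    import numpy as np

    def sinkhorn(px, py, C, gamma, n_iter=1000):
        """Entropy-regularized coupling of Eq. 13 via the scaling iterations
        of Eq. 18."""
        K = np.exp(-C / gamma)                      # Gibbs kernel k_ij
        v = np.ones_like(px)
        w = np.ones_like(py)
        for _ in range(n_iter):
            v = px / (K @ w)                        # enforce the row marginal
            w = py / (K.T @ v)                      # enforce the column marginal
        return np.diag(v) @ K @ np.diag(w)          # U^a = diag(v) K diag(w), Eq. 16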
3.3 Selection of the Entropic Regularization Parameter

The entropic regularization parameter γ plays a very important role in the characterization of the joint density; however, there exists no closed-form solution for its optimal selection. Generally speaking, increasing the value of γ will increase the convexity of the cost function and thus the computational efficiency, although at the expense of reduced coupling between the marginal histograms of the background state and observations. The reduced coupling might be interpreted physically through the first and second laws of thermodynamics, which state that the drive of thermodynamic systems toward equilibrium is a result of a tendency toward maximum entropy or disorder. Therefore, we expect that by increasing the entropic regularization, the degree of coupling is reduced and the probability masses of the joint distribution become more spread out.

As an example, the effects of γ on the coupling between the two Gaussian mixture models p_{x^b} and p_y shown in Figure 2 are demonstrated in Figure 3. It can be seen that at a smaller value of γ = 0.001, the probability masses of the joint distribution are sparse and lie compactly along the main diagonal – capturing a strong coupling between the background state and observations. However, as the value of γ increases, the probability masses of the joint distribution become denser and more spread out – reflecting a lower degree of dependency between the marginals. It is important to note that, in the limiting cases, as γ → 0 the solution of Equation 13 converges to the true optimal joint histogram, while as γ → ∞ its entropy increases and the joint histogram tends to the independent coupling p_{x^b} p_y^T.

Figure 3: The effect of the entropic regularization parameter γ on the optimal joint histogram coupling two Gaussian mixture models p_{x^b}: 0.5 N(−12, 0.4) + 0.5 N(−8, 0.8) and p_y: 0.55 N(5, 4) + 0.45 N(9.5, 4) – shown in Figure 2.

4 Numerical Experiments and Results

In order to demonstrate the performance of the EnRDA and assess its effectiveness, we focus on the linear advection-diffusion equation and the chaotic Lorenz-63 model (Lorenz, 1963). The advection-diffusion model explains a wide range of heat, mass, and momentum transport across the land, vegetation, and atmospheric continuum, and has been utilized to evaluate the performance of DA methodologies (Berardi et al., 2016; Ebtehaj et al., 2014;
Hurkmans et al., 2006; Ning et al., 2014; Zhang et al., 1997). Similarly, the Lorenz-63 model, as a chaotic model of atmospheric convection, has been widely used in testing the performance of DA methodologies (Goodliff et al., 2015; Miller et al., 1994; Nakano et al., 2007; Tamang et al., 2020; Tandeo et al., 2015; Van Leeuwen, 2010). Throughout, under controlled experimental settings with foreknown model and observation errors, we run the forward models under systematic errors and compare the results of the EnRDA with 3D-Var for the advection-diffusion dynamics and with a particle filter for the Lorenz-63 system.

4.1 Advection-Diffusion Equation

4.1.1 State-space Characterization

The advection-diffusion equation is a special case of the Navier-Stokes partial differential equation, and its linearized form, with constant diffusivity in an
incompressible fluid flow, is expressed for a conserved physical quantity x(s, t) as follows:

    ∂x(s, t)/∂t + a · ∇x(s, t) = D ∇^2 x(s, t) ,                      (19)

where s ∈ R^k represents a k-dimensional spatial domain and t is time. In the above expression, a = (a_1, . . . , a_k) ∈ R^k is the constant velocity vector and D = diag(D_1, . . . , D_k) ∈ R^{k×k} represents the diffusivity matrix. Given an initial condition x(s, t = 0), owing to its linearity, the solution at time t can be obtained by convolving the initial condition with a Kronecker delta function δ(s − a t), followed by convolution with a zero-mean Gaussian kernel

    G(s, t) = (2π)^{−k/2} det(Σ)^{−1/2} exp(−(1/2) s^T Σ^{−1} s) ,    (20)

where the covariance matrix of the kernel is Σ = 2 D t.

4.1.2 Experimental Setup and Results

In this subsection, we present the results of DA experiments on the 1-D and 2-D advection-diffusion equations. For the 1-D case, the state-space is characterized over a spatial domain s ∈ (0, 60] with a discretization of ∆s = 0.1. The model parameters are chosen to be a = 0.8 [L/T] and D = 0.25 [L^2/T]. The initial state is considered to be a bimodal mixture of Gaussian distributions, obtained by evolving the superposition of two Kronecker delta functions x(s, t = 0) = 300 δ(s) for times 15 and 25 [t], respectively. The ground truth of the state-space is then obtained by evolving the initial state with a time step of ∆t = 0.5 over a time period of T = 0-30 [t] in the absence of any model error.
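The analytical propagation described above can be sketched numerically as follows (our own illustration; the periodic boundary and the rounding of the advective shift to whole grid cells are simplifying assumptions, not part of the paper's setup):

    import numpy as np
    from scipy.ndimage import gaussian_filter1d

    def advect_diffuse(x0, a, D, t, ds):
        """Solution of the 1-D constant-coefficient equation (Eq. 19):
        translate the initial condition by a*t, then convolve with the
        Gaussian kernel of Eq. 20, whose variance is 2*D*t."""
        shift = int(round(a * t / ds))              # advection in grid cells
        x = np.roll(x0, shift)                      # periodic shift (assumption)
        sigma = np.sqrt(2.0 * D * t) / ds           # kernel std in grid cells
        return gaussian_filter1d(x, sigma, mode='wrap')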
The observations are obtained at assimilation intervals of 10∆t by corrupting the ground truth with heteroscedastic Gaussian noise v_t ∼ N(0, diag(ε_y x ⊙ x)), where ε_y = 0.05. We introduce both systematic and random errors in the model simulations to evolve the state-space under model error. For the systematic error, the model velocity and diffusivity coefficient are set to a′ = 0.12 [L/T] and D′ = 0.4 [L^2/T], respectively, and for the random error, a Gaussian noise ω_t ∼ N(0, diag(ε_b x ⊙ x)) is added at every ∆t, where ε_b = 0.02. For the implementation of EnRDA, to construct the distribution of the background state, 100 model ensemble members are generated by corrupting the ground truth at T = 0 with heteroscedastic Gaussian noise ω_0 ∼ N(0, diag(ε_0 x ⊙ x)), where ε_0 = 0.05. The regularization and displacement parameters are set to γ = 3 and η = 0.2 by trial and error. To obtain a robust conclusion about the comparison of the proposed EnRDA methodology with 3D-Var, the experiments are repeated for 50 independent and identical simulation scenarios.

The evolution of the initial state over the time period 0 to 30 [t] and the results comparing the EnRDA and 3D-Var at 5, 15, and 25 [t] are shown in Figure 4. As demonstrated, during all time steps, EnRDA reduces the analysis uncertainty, in terms of both bias and unbiased root mean squared error (ubrmse), and the shape of the entire state-space is properly preserved and remains closer to the ground truth – compared to 3D-Var. As seen, in 3D-Var, although the analysis state follows the true state reasonably well for the initial time steps, as the system propagates over time under systematic errors, the analysis state deviates further away from the ground truth, and its shape is no longer preserved properly.
Figure 4: (a) Temporal evolution of a bimodal initial state under a linear advection-diffusion equation and (b-d) the true state x_tr, observations y, and analysis states x_a by 3D-Var and the EnRDA for three snapshots of time at 5, 15, and 25 [t], respectively. Also provided are the bias and ubrmse of the analysis state by EnRDA (3D-Var) at the respective times.

For the 50 independent simulations, on average, the bias and ubrmse are decreased by 97.9% and 73.1%, respectively, compared to 3D-Var.

Figure 5 shows the results of a 2-D experiment, elaborating further on the ways that the Riemannian DA can tackle systematic errors in comparison to the classic variational approach over the Euclidean space. In particular, the results demonstrate the role of the displacement parameter η in EnRDA versus the ratio of the error covariance matrices in 3D-Var. As is well understood, the error covariance matrices in variational approaches act as weights enabling interpolation between the background state and observations across different dimensions. Recall that, in its present form, EnRDA interpolates
uniformly across multiple dimensions of the problem with a single displacement parameter η. To make a fair comparison, a parameter α ∈ [0, 1] over the Euclidean space is defined based on the model and observation error covariances as α = tr(B^{−1} (H^T R^{−1} H + B^{−1})^{−1}) – assuming that the observation and model errors are uncorrelated and homoscedastic. Thus, when assimilating over the Euclidean space, larger values of α assign larger weights to the background state and vice versa.

The state-space is characterized over a spatial domain s_1 = (0, 10] and s_2 = (0, 10] with a discretization of ∆s_1 = ∆s_2 = 0.1. The advection-diffusion is considered to be an isotropic process with the true model parameter values set as a_1 = a_2 = 0.08 [L/T] and D_1 = D_2 = 0.02 [L^2/T]. Similar to the 1-D setting, the ground truth is considered to be a bivariate Gaussian mixture, obtained after evolving two Kronecker delta functions x(s, t) = 1000 δ(s_1, s_2) and x(s, t) = 4000 δ(s_1, s_2) for 25 and 35 [t], respectively.

To resemble a model with systematic errors, the background state is obtained by isotropically increasing the advective velocity to 0.12 [L/T] while the diffusivity is reduced to 0.01 [L^2/T]. Thus, the previous-time forecast is corrupted not only with systematic biases in position but also in terms of its spread and the size of its support set (Figure 5b). Observations are not considered to have position biases. However, systematic representativeness errors are imposed assuming that the sensing system has a lower resolution than the model. To that end, we evolve two Kronecker delta functions x(s, t) = 800 δ(s_1, s_2) and x(s, t) = 2400 δ(s_1, s_2), which have less mass than the true state, for the same time period of 25 and 35 [t], respectively.
Figure 5: The true state x_tr versus the background states x_b and observations y (a-c) with systematic errors under 2-D advection-diffusion dynamics, as well as the analysis state x_a by 3D-Var (d-f) and the EnRDA (g-i) for different displacement parameters in the Euclidean (α) and Riemannian (η) spaces, where the entropic regularization parameter is set to γ = 0.003. The black plus signs show the location of the modes of the true state.
Then, in order to resemble the potential representativeness error of the sensing system, the observations are up-scaled by a factor of two through box averaging.

As shown in Figure 5, the EnRDA preserves the shape of the state-space very well and gradually moves the mass towards the background state as the value of η increases. However, in 3D-Var, the expected shape of the state is not recovered properly, especially for the smaller values of α = 0.25 and 0.5 – which tend to nudge the analysis state towards the background state. Although the higher value of α = 0.75 improves the preservation of the shape of the state-space in 3D-Var, it does so by nudging the modes of the analysis state further away from their true positions towards the observations, ultimately increasing the error metrics. Specifically, the bias is reduced by more than 30%, from 0.15 to 0.05, as α increases from 0.25 to 0.75 for 3D-Var; however, this occurs at the expense of an almost threefold increase in ubrmse, from 0.3 to 1.1. This is primarily because the background state has position error due to the erroneous velocity and diffusivity parameters. Since bias measures the average of signed errors, the positive differences between the analysis state and the true state due to the position error are compensated by the negative differences in their magnitude, leading to a reduction in bias as α increases. However, ubrmse is quadratic and thus measures the average magnitude of the error irrespective of its sign. In contrast, the bias (0.002–0.002) and ubrmse (0.12–0.95) in EnRDA are consistently lower than in 3D-Var, by almost 99% and 60% respectively, over the reported range of η.
4.2 Lorenz-63

4.2.1 State-space Characterization

The Lorenz system (Lorenz-63; Lorenz, 1963) is derived through truncation of the Fourier series of the Rayleigh-Bénard convection model. This model can be interpreted as a simplistic local weather system only involving the effects of local shear stress and buoyancy forces. The system is expressed using coupled ordinary differential equations that describe the temporal evolution of three coordinates x, y, and z, representing the rate of convective overturn and the horizontal and vertical temperature variations:

    dx/dt = −σ(x − y)
    dy/dt = ρx − y − xz                                               (21)
    dz/dt = xy − βz ,

where σ represents the Prandtl number, ρ is a normalized Rayleigh number proportional to the temperature difference through the depth of the fluid, and β denotes a horizontal wave number of the convective motion. It is well established that for parameter values of σ = 10, ρ = 28, and β = 8/3, the system exhibits chaotic behavior with the phase space revolving around two attractors located at (√(β(ρ − 1)), √(β(ρ − 1)), ρ − 1) and (−√(β(ρ − 1)), −√(β(ρ − 1)), ρ − 1).

4.2.2 Experimental Setup and Results

In this subsection, we present the results from DA experiments conducted on the Lorenz system under systematic error and compare the results of the
EnRDA with a particle filter. Apart from the systematic error component, we utilize the standard experimental setting used in numerous DA studies (Furtado et al., 2008; Miller et al., 1994; Van Leeuwen, 2010). In order to obtain the ground truth of the model trajectory, the system, initialized at x_0 = (1.508870, −1.531271, 25.46091), is integrated with a time step of ∆t = 0.01 over a time period of T = 0-20 [t] using the fourth-order Runge-Kutta approximation (Kutta, 1901; Runge, 1895). The observations are obtained at every assimilation time step of 40∆t by perturbing the ground truth with Gaussian noise v_t ∼ N(0, σ_obs^2 Σ_ρ), where σ_obs^2 = 2 is the observation error variance and Σ_ρ ∈ R^{3×3} denotes the correlation matrix with 1 on the diagonal entries, 0.5 on the first sub- and super-diagonals, and 0.25 on the second sub- and super-diagonals.

In order to characterize the distribution of the background state, 10 ensemble members for the EnRDA and 100 particles for the filter are generated by corrupting the ground truth at T = 0 [t] with zero-mean Gaussian noise ω_0 ∼ N(0, σ_0^2 I_3), where σ_0^2 = 2. For introducing systematic errors, the model parameter values are set to σ′ = 10.5, ρ′ = 27, and β′ = 10/3. The random errors are also introduced as the system evolves in time by adding Gaussian noise ω_t ∼ N(0, σ_b^2 I_3) at every ∆t, where σ_b^2 = 0.02. For EnRDA, the regularization and displacement parameters are set to γ = 10 and η = 0.05. Throughout, to draw a robust statistical conclusion about the error statistics, the DA experiments are repeated for 50 independent and identical simulations.
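For reference, a minimal sketch of this integration follows (our own illustration of Equation 21 with the classical fourth-order Runge-Kutta step; the function names are assumptions):

    import numpy as np

    def lorenz63(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
        """Right-hand side of Eq. 21."""
        return np.array([-sigma * (x[0] - x[1]),
                         rho * x[0] - x[1] - x[0] * x[2],
                         x[0] * x[1] - beta * x[2]])

    def rk4_step(x, dt, **params):
        """One fourth-order Runge-Kutta step, as used for the ground truth."""
        k1 = lorenz63(x, **params)
        k2 = lorenz63(x + 0.5 * dt * k1, **params)
        k3 = lorenz63(x + 0.5 * dt * k2, **params)
        k4 = lorenz63(x + dt * k3, **params)
        return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

    x = np.array([1.508870, -1.531271, 25.46091])   # initial condition above
    for _ in range(2000):                           # integrate to T = 20 [t]
        x = rk4_step(x, 0.01)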
Figure 6: Temporal evolution of the true state x_tr of the Lorenz-63, observations y, and the mean of the 100 particles in the particle filter x_a(PF) (first column) and of the 10 ensemble members in EnRDA x_a(EnRDA) (second column), respectively. The temporal evolution of the ensemble members is shown in gray. Also shown, within dashed rectangles, are the windows of time over which the support set of the observations and the ensemble spread are disjoint.

Figure 6 shows the temporal evolution of the ground truth and the analysis state by the particle filter (left column) and EnRDA (right column) over a time period of 0 to 15 [t] for one simulation. As is evident, the particle filter is well capable of capturing the ground truth when the observations lie within the particle spread. However, when the observations lie far apart from the support set of the particles (Figure 6, dashed box), i.e., when the pdfs of the background state and observations are disjoint, the particle filter becomes degenerate and the analysis state (particle mean) deviates away from the ground truth. In contrast, the EnRDA is capable of capturing the true state remarkably well even when the ensemble spread and observations are far apart
from each other.

Figure 7: Temporal evolution of bias and ubrmse along the three dimensions of the Lorenz-63 for the particle filter with 100 particles (a, c) and EnRDA with 10 ensemble members (b, d). The mean values are computed over 50 independent and identical simulations.

The time evolution of the average bias and ubrmse for the 50 independent and identical simulations is color coded over the phase space in Figure 7. As is evident, the average bias and ubrmse for the EnRDA are much lower than for the particle filter throughout the entire simulation period. The error statistics show a transient behavior; however, for the EnRDA (Figure 7b, d) the error statistics remain fairly stable and much lower than those of the particle filter (Figure 7a, c). In particular, the mean bias and ubrmse are decreased by 36-88% and 47-55% along the three dimensions, respectively. The expected values of the bias and ubrmse are reported in Table 1.
Table 1: Expected values of the bias and ubrmse for the particle filter and EnRDA from 50 independent and identical simulations of Lorenz-63. Values shown inside the parentheses are the percent reductions as a result of EnRDA.

    Methods            bias x        bias y        bias z        ubrmse x      ubrmse y      ubrmse z
    Particle Filter    2.24          2.41          0.59          6.25          7.95          7.88
    EnRDA              0.26 (88.4)   0.32 (86.7)   0.38 (35.6)   2.80 (55.2)   4.22 (46.9)   4.17 (47.1)

5 Discussion and Concluding Remarks

In this study, we introduced an ensemble data assimilation (DA) methodology over a Riemannian manifold, namely Ensemble Riemannian DA (EnRDA), and illustrated its advantages over classic Bayesian DA techniques for dissipative and chaotic dynamics. We demonstrated that the presented methodology is capable of assimilating information in the probability domain – characterized by the families of probability distributions with finite second-order moments. The key message is that when the forecast and observations exhibit non-Gaussian structure and their support sets are disjoint due to the presence of systematic errors, the translation variance of the Wasserstein metric can be leveraged to extend geophysical forecast skills beyond what is currently viable through classic Euclidean DA techniques. Nevertheless, future research on a comprehensive comparison with existing bias correction methodologies is needed to completely characterize its relative pros and cons.

We have explained the roles of the regularization and displacement parameters in the EnRDA and empirically examined their effects on the optimal joint histogram and, consequently, on the analysis state.
Nevertheless, future studies are required to characterize closed-form expressions to better understand their implications for the forecast uncertainty. As explained earlier, unlike the classic Bayesian DA techniques, which assimilate available information using different relative weights across multiple dimensions dictated by the error covariance matrices, a scalar displacement parameter is utilized in the EnRDA that balances the entire analysis state across all dimensions. Future research can be devoted to developing a framework that utilizes a vector representation of the displacement parameters to effectively tackle possible heterogeneity of uncertainty across multiple dimensions.

Although the proposed EnRDA methodology through entropic regularization works properly for the presented low-dimensional problems, future research is needed to test its efficiency in operational applications in high-dimensional geophysical DA systems, with a state-space easily exceeding a million dimensions (Van Leeuwen, 2009). In particular, additional research can be conducted to characterize the effective number of ensemble members required to sufficiently represent such a large state-space. In high-dimensional systems, further constraining the solution of the optimal joint histogram on a submanifold of probability distributions with a Gaussian mixture structure can provide a way forward by significantly lowering the overall computational cost (Chen et al., 2019b).

Nevertheless, the effectiveness of the presented methodology in treating systematic errors shows a great potential to improve precipitation and soil moisture prediction in weather and land surface models, where the position of convective cells (Jones and Macpherson, 1997; Le Coz et al., 2019; Lin et al.,
2017) and the spatial dynamics of soil moisture (Dee and Da Silva, 1998; Reichle and Koster, 2004) are often corrupted with significant biases. The intrinsic ability to assimilate probabilistic information regardless of the shape of the distribution further promises the methodology to be effective in the sea ice DA framework, where the subgrid thickness is represented through a probability distribution of sea ice fractions (Asadi et al., 2019).

Furthermore, a promising area of future research is that of extending the presented formalism in Earth system models to assimilate a priori climatological information obtained from in-situ gauges, ships, buoys, and radiosondes, where currently, due to the existing scale gaps, assimilation of such observations into the model grids is not straightforward. A crude framework may be cast to find the analysis state distribution as the barycenter of three marginals, namely the background state, observations, and a priori climatological data. For such a framework, the Sinkhorn algorithm (Cuturi, 2013) can be easily generalized to solve the problem through iterative projections (Peyré et al., 2019).

Acknowledgements

The first and second authors acknowledge the grants from the National Aeronautics and Space Administration (NASA) Terrestrial Hydrology Program (THP, 80NSSC18K1528) and the New (Early Career) Investigator Program (NIP, 80NSSC18K0742). The fifth author also acknowledges the support from the National Science Foundation (NSF, DMS1830418).
References

Agueh, M., and G. Carlier (2011), Barycenters in the Wasserstein space, SIAM Journal on Mathematical Analysis, 10.1137/100805741.

Altman, A., and J. Gondzio (1999), Regularized symmetric indefinite systems in interior point methods for linear and quadratic optimization, Optimization Methods and Software, 11(1-4), 275-302.

Amari, S.-i. (2012), Differential-geometrical methods in statistics, vol. 28, Springer Science & Business Media.

Anderson, J. L. (1996), A method for producing and evaluating probabilistic forecasts from ensemble model integrations, Journal of Climate, 9(7), 1518-1530.

Anderson, J. L. (2010), A non-Gaussian ensemble filter update for data assimilation, Monthly Weather Review, 138(11), 4186-4198.

Andersson, E., J. Pailleux, J.-N. Thépaut, J. Eyre, A. McNally, G. Kelly, and P. Courtier (1994), Use of cloud-cleared radiances in three/four-dimensional variational data assimilation, Quarterly Journal of the Royal Meteorological Society, 120(517), 627-653.

Asadi, N., K. A. Scott, and D. A. Clausi (2019), Data fusion and data assimilation of ice thickness observations using a regularisation framework, Tellus A: Dynamic Meteorology and Oceanography, pp. 1-20.

Beezley, J. D., and J. Mandel (2008), Morphing ensemble Kalman filters, Tellus A: Dynamic Meteorology and Oceanography, 60(1), 131-140.

Benamou, J.-D., and Y. Brenier (2000), A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem, Numerische Mathematik, 84(3), 375-393.

Berardi, M., A. Andrisani, L. Lopez, and M. Vurro (2016), A new data assimilation technique based on ensemble Kalman filter and Brownian bridges: an application to Richards equation, Computer Physics Communications, 208, 43-53.
Bocquet, M., C. A. Pires, and L. Wu (2010), Beyond Gaussian statistical modeling in geophysical data assimilation, Monthly Weather Review, 138(8), 2997-3023.

Brenier, Y. (1987), Décomposition polaire et réarrangement monotone des champs de vecteurs, CR Acad. Sci. Paris Sér. I Math., 305, 805-808.

Chen, B., L. Dang, Y. Gu, N. Zheng, and J. C. Principe (2019a), Minimum error entropy Kalman filter, IEEE Transactions on Systems, Man, and Cybernetics: Systems, 10.1109/tsmc.2019.2957269.

Chen, Y., T. T. Georgiou, and A. Tannenbaum (2019b), Optimal transport for Gaussian mixture models, IEEE Access, 7, 6269-6278.

Cullen, M., J. Norbury, and R. J. Purser (1991), Generalised Lagrangian solutions for atmospheric and oceanic flows, SIAM Journal on Applied Mathematics, 51(1), 20-31, 10.1137/0151002.

Cuturi, M. (2013), Sinkhorn distances: Lightspeed computation of optimal transport, in Advances in Neural Information Processing Systems, pp. 2292-2300.

De Lannoy, G. J., P. R. Houser, V. R. Pauwels, and N. E. Verhoest (2007a), State and bias estimation for soil moisture profiles by an ensemble Kalman filter: Effect of assimilation depth and frequency, Water Resources Research, 43(6).

De Lannoy, G. J., R. H. Reichle, P. R. Houser, V. Pauwels, and N. E. Verhoest (2007b), Correcting for forecast bias in soil moisture assimilation with the ensemble Kalman filter, Water Resources Research, 43(9).

Dee, D. P. (2003), Detection and correction of model bias during data assimilation, Meteorological Training Course Lecture Series (ECMWF).

Dee, D. P. (2005), Bias and data assimilation, Quarterly Journal of the Royal Meteorological Society, 131(613), 3323-3343.

Dee, D. P., and A. M. Da Silva (1998), Data assimilation in the presence of forecast bias, Quarterly Journal of the Royal Meteorological Society, 124(545), 269-295.
Doucet, A., and A. M. Johansen (2009), A tutorial on particle filtering and smoothing: Fifteen years later, Handbook of Nonlinear Filtering, 12(656-704), 3.

Drécourt, J.-P., H. Madsen, and D. Rosbjerg (2006), Bias aware Kalman filters: Comparison and improvements, Advances in Water Resources, 29(5), 707-718.

Ebtehaj, A. M., and E. Foufoula-Georgiou (2013), On variational downscaling, fusion, and assimilation of hydrometeorological states: A unified framework via regularization, Water Resources Research, 49(9), 5944-5963, 10.1002/wrcr.20424.

Ebtehaj, A. M., M. Zupanski, G. Lerman, and E. Foufoula-Georgiou (2014), Variational data assimilation via sparse regularisation, Tellus A: Dynamic Meteorology and Oceanography, 66(1), 21,789.

Evensen, G. (1994), Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics, Journal of Geophysical Research: Oceans, 99(C5), 10,143-10,162.

Evensen, G. (2003), The ensemble Kalman filter: Theoretical formulation and practical implementation, Ocean Dynamics, 53(4), 343-367.

Feyeux, N., A. Vidard, and M. Nodet (2018), Optimal transport for variational data assimilation, Nonlinear Processes in Geophysics, 25(1), 55-66.

Fréchet, M. (1948), Les éléments aléatoires de nature quelconque dans un espace distancié, in Annales de l'institut Henri Poincaré, vol. 10, pp. 215-310.

Furtado, H. C. M., H. F. de Campos Velho, and E. E. N. Macau (2008), Data assimilation: Particle filter and artificial neural networks, in Journal of Physics: Conference Series, vol. 135, p. 012073, IOP Publishing.

Goodliff, M., J. Amezcua, and P. J. Van Leeuwen (2015), Comparing hybrid data assimilation methods on the Lorenz 1963 model with increasing non-linearity, Tellus A: Dynamic Meteorology and Oceanography, 67(1), 26,928.
Gordon, N. J., D. J. Salmond, and A. F. Smith (1993), Novel approach to nonlinear/non-Gaussian Bayesian state estimation, in IEE Proceedings F (Radar and Signal Processing), vol. 140, pp. 107-113, IET.

Hamill, T. M. (2001), Interpretation of rank histograms for verifying ensemble forecasts, Monthly Weather Review, 129(3), 550-560.

Han, X., and X. Li (2008), An evaluation of the nonlinear/non-Gaussian filters for the sequential data assimilation, Remote Sensing of Environment, 112(4), 1434-1449.

Hellinger, E. (1909), Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen, Journal für die reine und angewandte Mathematik (Crelles Journal), 1909(136), 210-271.

Hurkmans, R., C. Paniconi, and P. A. Troch (2006), Numerical assessment of a dynamical relaxation data assimilation scheme for a catchment hydrological model, Hydrological Processes: An International Journal, 20(3), 549-563.

Jones, C., and B. Macpherson (1997), A latent heat nudging scheme for the assimilation of precipitation data into an operational mesoscale model, Meteorological Applications: A journal of forecasting, practical applications, training techniques and modelling, 4(3), 269-277.

Jordan, R., D. Kinderlehrer, and F. Otto (1998), The variational formulation of the Fokker-Planck equation, SIAM Journal on Mathematical Analysis, 10.1137/S0036141096303359.

Kalman, R. E. (1960), A new approach to linear filtering and prediction problems, Journal of Basic Engineering, 82(1), 35-45.

Kantorovich, L. V. (1942), On the translocation of masses, in Dokl. Akad. Nauk. USSR (NS), vol. 37, pp. 199-201.

Kollat, J., P. Reed, and D. Rizzo (2008), Addressing model bias and uncertainty in three dimensional groundwater transport forecasts for a physical aquifer experiment, Geophysical Research Letters, 35(17).
Kullback, S., and R. A. Leibler (1951), On information and sufficiency, The Annals of Mathematical Statistics, 22(1), 79-86.

Kutta, W. (1901), Beitrag zur näherungsweisen Integration totaler Differentialgleichungen, Z. Math. Phys., 46, 435-453.

Lauritzen, S. L. (1987), Statistical manifolds, Differential Geometry in Statistical Inference, 10, 163-216.

Le Coz, C., A. Heemink, M. Verlaan, M.-C. ten Veldhuis, N. van de Giesen, et al. (2019), Correcting position error in precipitation data using image morphing, Remote Sensing, 11(21), 2557.

Le Dimet, F.-X., and O. Talagrand (1986), Variational algorithms for analysis and assimilation of meteorological observations: theoretical aspects, Tellus A: Dynamic Meteorology and Oceanography, 38(2), 97-110.

Lin, L.-F., A. M. Ebtehaj, A. N. Flores, S. Bastola, and R. L. Bras (2017), Combined assimilation of satellite precipitation and soil moisture: A case study using TRMM and SMOS data, Monthly Weather Review, 145(12), 4997-5014, 10.1175/MWR-D-17-0125.1.

Lorenc, A. C. (1986), Analysis methods for numerical weather prediction, Quarterly Journal of the Royal Meteorological Society, 112(474), 1177-1194.

Lorenz, E. N. (1963), Deterministic nonperiodic flow, Journal of the Atmospheric Sciences, 20(2), 130-141.

Mandel, J., and J. D. Beezley (2009), An ensemble Kalman-particle predictor-corrector filter for non-Gaussian data assimilation, in International Conference on Computational Science, pp. 470-478, Springer.

McCann, R. J. (1997), A convexity principle for interacting gases, Advances in Mathematics, 128(1), 153-179.

Miller, R. N., M. Ghil, and F. Gauthiez (1994), Advanced data assimilation in strongly nonlinear dynamical systems, Journal of the Atmospheric Sciences, 51(8), 1037-1056.
Monge, G. (1781), Mémoire sur la théorie des déblais et des remblais, Histoire de l'Académie Royale des Sciences de Paris.

Moradkhani, H., S. Sorooshian, H. V. Gupta, and P. R. Houser (2005), Dual state-parameter estimation of hydrological models using ensemble Kalman filter, Advances in Water Resources, 28(2), 135-147.

Nakano, S., G. Ueno, and T. Higuchi (2007), Merging particle filter for sequential data assimilation, Nonlinear Processes in Geophysics, 10.5194/npg-14-395-2007.

Nash, J. C. (1990), Compact numerical methods for computers: linear algebra and function minimisation, CRC Press.

Ning, L., F. P. Carli, A. M. Ebtehaj, E. Foufoula-Georgiou, and T. T. Georgiou (2014), Coping with model error in variational data assimilation using optimal mass transport, Water Resources Research, 50(7), 5817-5830.

Orlin, J. B. (1993), A faster strongly polynomial minimum cost flow algorithm, Operations Research, 41(2), 338-350.

Otto, F. (2001), The geometry of dissipative evolution equations: The porous medium equation, Communications in Partial Differential Equations, 10.1081/PDE-100002243.

Park, S. K., and D. Županski (2003), Four-dimensional variational data assimilation for mesoscale and storm-scale applications, Meteorology and Atmospheric Physics, 82(1-4), 173-208.

Pennec, X. (2006), Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements, Journal of Mathematical Imaging and Vision, 10.1007/s10851-006-6228-4.

Peyré, G., M. Cuturi, et al. (2019), Computational optimal transport, Foundations and Trends in Machine Learning, 11(5-6), 355-607.

Pires, C., R. Vautard, and O. Talagrand (1996), On extending the limits of variational assimilation in nonlinear chaotic systems, Tellus A, 48(1), 96-121.
You can also read