ICLASS 1.1, a variational Inverse modelling framework for the Chemistry Land-surface Atmosphere Soil Slab model: description, validation, and ...
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Model description paper Geosci. Model Dev., 16, 47–74, 2023 https://doi.org/10.5194/gmd-16-47-2023 © Author(s) 2023. This work is distributed under the Creative Commons Attribution 4.0 License. ICLASS 1.1, a variational Inverse modelling framework for the Chemistry Land-surface Atmosphere Soil Slab model: description, validation, and application Peter J. M. Bosman1 and Maarten C. Krol1,2 1 Meteorology and Air Quality Group, Wageningen University, Wageningen, the Netherlands 2 Institute for Marine and Atmospheric Research, Utrecht University, Utrecht, the Netherlands Correspondence: Peter J. M. Bosman (peter.bosman.publicaddress@gmail.com) Received: 3 March 2022 – Discussion started: 7 April 2022 Revised: 8 November 2022 – Accepted: 25 November 2022 – Published: 3 January 2023 Abstract. This paper provides a description of ICLASS 1.1, dle non-linearity and provides posterior statistics on the esti- a variational Inverse modelling framework for the Chemistry mated parameters. The paper provides a technical description Land-surface Atmosphere Soil Slab model. This framework of the framework, includes a validation of the adjoint code, can be used to study the atmospheric boundary layer, surface in addition to tests for the full inverse modelling framework, layer, or the exchange of gases, moisture, heat, and momen- and a successful example application for a grassland in the tum between the land surface and the lower atmosphere. The Netherlands. general aim of the framework is to allow the assimilation of various streams of observations (fluxes, mixing ratios at mul- tiple heights, etc.) to estimate model parameters, thereby ob- taining a physical model that is consistent with a diverse set 1 Introduction of observations. The framework allows the retrieval of pa- rameters in an objective manner and enables the estimation Exchanges of heat, mass, and momentum between the land of information that is difficult to obtain directly by observa- surface and the atmosphere play an essential role for weather, tions, for example, free tropospheric mixing ratios or stom- climate, air quality, and biogeochemical cycles. Surface heat- atal conductances. Furthermore, it allows the estimation of ing under sunny daytime conditions usually leads to the possible biases in observations. Modelling the carbon cycle growth of a relatively well-mixed layer close to the land sur- at the ecosystem level is one of the main intended fields of ap- face, i.e. the convective boundary layer (CBL). This layer is plication. The physical model around which the framework is directly impacted by exchange processes with the land sur- constructed is relatively simple yet contains the core physics face and is also a layer wherein often humans live. Modelling to simulate the essentials of a well-mixed boundary layer and the composition and thermodynamic state of the CBL in its of the land–atmosphere exchange. The model includes an ex- interaction with the land surface is the target of the Chem- plicit description of the atmospheric surface layer, a region istry Land-surface Atmosphere Soil Slab model (CLASS; where scalars show relatively large gradients with height. An Vilà-Guerau De Arellano et al., 2015). This model and simi- important challenge is the strong non-linearity of the model, lar models have been applied frequently, e.g. for understand- which complicates the estimation of the best parameter val- ing the daily cycle of evapotranspiration (van Heerwaarden ues. The constructed adjoint of the tangent linear model can et al., 2010), studying the effects of aerosols on boundary be used to mitigate this challenge. The adjoint allows for layer dynamics (Barbaro et al., 2014), studying the effects an analytical gradient of the objective cost function, which of elevated CO2 on boundary layer clouds (Vilà-Guerau De is used for minimisation of this function. An implemented Arellano et al., 2012), or for studying the ammonia budget Monte Carlo way of running ICLASS can further help to han- (Schulte et al., 2021). Next to a representation of the CBL, the CLASS model includes a simple representation for the Published by Copernicus Publications on behalf of the European Geosciences Union.
48 P. J. M. Bosman and M. C. Krol: ICLASS 1.1 exchange of gases, heat, moisture, and momentum between /CBL-level observations may suffer from biases. An exam- the land surface and the lower atmosphere. The model ex- ple is the closure of the surface energy balance, where the plicitly accounts for the surface layer, which is, under sunny available energy is often larger than the sum of the latent and daytime conditions, a layer within the CBL that is close sensible turbulent heat fluxes (Foken, 2008). This energy bal- to the surface, with relatively strong vertical gradients of ance closure problem is a known issue with eddy covariance scalars (e.g. specific humidity and temperature) and momen- observations (Foken, 2008; Oncley et al., 2007; Renner et al., tum (Stull, 1988). The best model performance is during the 2019), and various explanations have been suggested (Foken, convective daytime period. Since the CBL model physics 2008). are relatively simple and only include the essential bound- The above text illustrates the need for an objective opti- ary layer processes, the model performs best on what might misation framework capable of correcting observations for be called golden days. Those are days on which advection is biases. We therefore present a description of ICLASS, an in- either absent or uniform in time and space, deep convection verse modelling framework built around the CLASS model, and precipitation are absent, and sufficient incoming short- including a bias correction scheme for specific bias patterns. wave radiation heats the surface, allowing for the forma- This framework can estimate model parameters by minimis- tion of a prototypical convective boundary layer. When these ing an objective cost function using a variational (Chevallier assumptions are met, the evolution of the budgets of heat, et al., 2005) framework. ICLASS uses a Bayesian approach, moisture, and gases is, to a large extent, determined by local in the sense that it combines information, both from obser- land–atmosphere interactions. The aforementioned assump- vations and from prior knowledge about the parameters, to tions should ideally be valid for the whole modelled period. come to a solution with a reduced uncertainty in the opti- They should ideally hold on a spatial scale large enough that mised parameters. A major strength of this framework is that violations of the assumptions in the region do not influence it allows the incorporation of several streams of observations, the model simulation location. In practice, days are often not for instance, chemical fluxes, mixing ratios, temperatures at ideal, e.g. a time-varying advection can be present. This does multiple heights, and radiosonde observations of the bound- not necessarily mean that the model cannot be applied to that ary layer height. By optimising a number of predefined key day, but the performance is likely to be worse. parameters of the model, we aim to obtain a diurnal simu- To further our understanding of the land–atmosphere ex- lation that is consistent with a diverse set of measurements. change, tall tower observational sites have been established, Additionally, error statistics that are estimated provide infor- for instance at Cabauw, the Netherlands (Bosveld et al., mation about the constraints the measurements place on the 2020; Vermeulen et al., 2011), Hyytiälä, Finland (Vesala model parameters. Modelling the carbon cycle at ecosystem et al., 2005), and Harvard Forest, USA (Commane et al., level is one of the main intended fields of application. As an 2015). These observational sites provide time series of dif- example, with some extensions to the framework, ICLASS ferent types of measurements (observation streams). Even could be applied to ecosystem observations of the coupled so, many studies only use a (small) fraction of the different exchange of CO2 and carbonyl sulfide (a tracer for obtaining streams of observations available for a specific day and lo- stomatal conductance; Whelan et al., 2018). cation (e.g. Vilà-Guerau De Arellano et al., 2012). A model Besides ICLASS, there have been extensive earlier ef- like CLASS, containing both a mixed layer and land surface forts in the literature to estimate the parameters in land– part, can be used to fit an extensive set of observation streams atmosphere exchange models. For example, Bastrikov et al. simultaneously. When model results are consistent with a di- (2018) and Mäkelä et al. (2019) optimised parameters of the verse set of measurements, this gives more confidence that land surface models ORCHIDEE and JSBACH, respectively. the internal physics are robust and that the model has been These models simulate additional processes not included in adequately parameterised to reliably simulate reality. How- the CLASS model, which enables the calculation of addi- ever, an important difficulty in the application of a model tional land surface variables. For example, in contrast to OR- like CLASS concerns parameter tuning to obtain a good fit CHIDEE, CLASS cannot simulate leaf phenology or the al- to observations. Some parameters can be obtained quite di- location of carbon to different biomass pools. However, a dis- rectly from observations (for instance, initial mixed layer hu- tinct advantage of the CLASS model is the coupling of a land midity), but, for example, estimating free tropospheric lapse surface model to a mixed layer model. This facilitates the in- rates or certain land surface parameters is often more chal- clusion of atmospheric observations such as mixing ratios in lenging. When many parameters need to be determined, the the optimisation of land–atmosphere exchange parameters. feasible parameter space becomes vast. If this vast parameter Next to that, simple models like CLASS have the advantage space is not properly explored, the obtained parameters can of requiring less computation time, and the output might be be subjective and sub-optimal. The estimation of parameters more easily understood. Kaminski et al. (2012) and Schür- is further complicated by possible overfitting and the prob- mann et al. (2016) also assimilate both land-surface-related lem of parameter equifinality (Tang and Zhuang, 2008), the and atmosphere-related observations. In those studies, a land latter occurs especially in case not enough types of observa- surface model is coupled to an atmospheric transport model. tions are used. Next to that, some of the available ecosystem- Meteorology is not simulated in those studies. In ICLASS, Geosci. Model Dev., 16, 47–74, 2023 https://doi.org/10.5194/gmd-16-47-2023
P. J. M. Bosman and M. C. Krol: ICLASS 1.1 49 meteorology adds an additional set of observation streams 2 Forward model that can be used to optimise land-surface-related parameters that are linked both to gas fluxes and meteorology. The employed forward model in our inverse modelling An important challenge for the optimisation framework is framework is the (slightly adapted) Chemistry Land-surface the strong non-linearity of the model. As an example, the Atmosphere Soil Slab model (CLASS; Vilà-Guerau De Arel- change in mixed-layer specific humidity (q) with time is a lano et al., 2015). The model code is freely available on function of q itself; a stronger evapotranspiration flux leads GitHub (https://classmodel.github.io/, last access: 9 Decem- to an increased specific humidity in the mixed layer, which ber 2022). We made use of the Python version of CLASS to in turn reduces the evapotranspiration flux again (van Heer- construct our inverse modelling framework. We will briefly waarden et al., 2009). The non-linearity causes numerically describe the essentials of the model which are relevant for calculated cost function gradients to deviate from the true an- the inverse modelling framework. alytical gradients, since the cost function can vary irregularly The model consists of several parts, namely the mixed with a changing model parameter value. This is hampering layer, the surface layer, and the land surface, including the the proper minimisation of the cost function when using nu- soil (Fig. 1). It is a conceptual model that uses a relatively merically calculated gradients. An adjoint has been used in small set of differential equations (Wouters et al., 2019). the past to optimise parameters, e.g. for land surface models The core of the model is a box model representation of an (Raoult et al., 2016; Ziehn et al., 2012). Constructing the ad- atmospheric mixed layer. Therefore, an essential assump- joint of the tangent linear model is a way to obtain more ac- tion of the model is that, during the daytime, turbulence is curate gradient calculations, as the adjoint provides a locally strong enough to maintain well-mixed conditions in this layer exact analytical gradient of the cost function at the locations (Ouwersloot et al., 2012). The mixed layer tendency equation where the function is differentiable. This approach, further- for any scalar (e.g. CO2 , heat) is as follows: more, allows the efficient retrieval of the sensitivity of model output to model parameters. Also, using an analytical gradi- (surface flux + entrainment flux) tendency = ent is generally computationally less expensive compared to mixed layer height using a numerical gradient (Doicu et al., 2010, p. 17). Mar- + advection. (1) gulis and Entekhabi (2001a) constructed an adjoint model framework of a coupled land surface–boundary layer model, The surface flux is the exchange flux with the land surface which they used to study differences in daytime sensitivity of (including vegetation and soil). The entrainment flux is the surface turbulent fluxes for the same model in coupled and exchange flux between the mixed layer and the overlying uncoupled modes (Margulis and Entekhabi, 2001b). How- free troposphere. For moisture and chemical species, a cloud ever, their CBL model neither includes carbon dioxide nor mass flux can also be included in the equation. The mixed does it allow the modelling of scalars at specific heights in layer height is dynamic during the day and evolves under the the surface layer. We expect these to be important for our driving force of the surface heat fluxes and large-scale subsi- framework that aims to make optimal use of several informa- dence. Cloud effects on the boundary layer height and growth tion streams. To do gradient calculations within the ICLASS due to mechanical turbulence can also be accounted for. framework, we constructed the adjoint of CLASS. Above the mixed layer, a discontinuity occurs in the scalar The paper is structured as follows. First, we give some in- quantities, representing an infinitely small inversion layer. formation on the (slightly adapted) forward model CLASS Above the inversion, the scalars are assumed to follow a lin- (Sect. 2). The inverse modelling framework built around ear profile with height in the free troposphere (Fig. 1). The CLASS is described in Sect. 3. Information on how error entrainment fluxes are calculated as follows. First, the buoy- statistics are employed and produced follows in Sect. 4, after ancy entrainment flux is taken as a fixed fraction of the sur- which we provide a description of the model output (Sect. 5) face flux of this quantity (Stull, 1988, p. 478), to which the and technical details of the code (Sect. 6). Afterwards, we entrainment flux driven by shear can optionally be added. present the adjoint and gradient tests that serve as validation From this virtual heat entrainment flux, an entrainment ve- for the constructed adjoint model (Sect. 7). Further informa- locity is calculated. The entrainment flux for a specific scalar tion about the adjoint model is available in the Supplement. (e.g. CO2 ) is then obtained by multiplying the entrainment In Sect. 8, we perform observation system simulation exper- velocity with the value of the (inversion layer) discontinuity iments that validate the full inverse modelling framework. In for the respective scalar. the last section before the concluding discussion, we present The surface layer is defined in the model as the low- an example application for a grassland site in the Nether- est 10 % of the boundary layer. In this (optional) layer, the lands (Cabauw), where a very comprehensive meteorological Monin–Obukhov similarity theory (Monin and Obukhov, dataset is complemented with detailed measurements of CO2 1954; Stull, 1988) is employed. In the original CLASS sur- mixing ratios and surface fluxes. face layer, scalars such as temperature are evaluated at 2 m height. For some scalars, we have extended this to multiple user-specified heights. This allows the comparison of obser- https://doi.org/10.5194/gmd-16-47-2023 Geosci. Model Dev., 16, 47–74, 2023
50 P. J. M. Bosman and M. C. Krol: ICLASS 1.1 vations of chemical mixing ratios and temperatures at differ- ent heights (e.g. along a tower) to the model output. Since the steepness of vertical profiles depends on wind speed and roughness of the surface, these gradients reflect information about these quantities. The (optional) land surface includes a simple soil repre- sentation and an a-gs (Jacobs, 1994; Ronda et al., 2001) mod- ule. This a-gs module (Jacobs, 1994; Ronda et al., 2001) is a big-leaf method (Friend, 2001) for calculating the exchange of CO2 between atmosphere and biosphere and the stom- atal resistance. The latter is used for calculating the H2 O ex- Figure 1. Sketch of the employed forward model (the slightly change. As an alternative to a-gs, a Jarvis–Stewart approach adapted CLASS model). The blue dots in the surface layer represent (Jarvis, 1976; Stewart, 1988) can also be used in the calcula- user-specified heights, where the model calculates scalars, e.g. the tion of the H2 O exchange. The latter approach is more sim- CO2 mixing ratio. The mixed layer is represented by a single bulk value in the model (blue dot in the mixed layer). At the top of the ple; herein, the stomatal conductance consists of a maximum mixed layer, a discontinuity (jump) occurs in the profiles. The free conductance multiplied with a set of factors between 0 and troposphere is not explicitly modelled but is taken into account for 1 (Jacobs, 1994). In CLASS, there are four factors included the exchange with the mixed layer (entrainment). The slope of the which represent limitations due to the amount of incoming free troposphere line is the free tropospheric lapse rate. A constant light, the temperature, vapour pressure deficit, and soil mois- advection can be taken into account as a source/sink. ture. The land surface part is responsible for calculating the exchange fluxes of sensible heat, latent heat, and CO2 be- tween the mixed layer and the land surface. The model has tion scheme (see Sect. 3.2). The forward-model H (CLASS) a module for calculating long- and shortwave radiation dy- projects the vector x m to provide model output that can namically. In this module, shortwave radiation is calculated be compared to observations, e.g. temperatures at different using the date and time, cloud cover, and albedo. For long- heights (full list in the ICLASS manual). This output does wave radiation, surface temperature and the temperature at not only depend on x m but also on model parameters that are the top of the surface layer are used. The resulting net radi- not part of the state. Those are contained in a vector p. The ation is used implicitly in the calculation of the heat fluxes, result is contained in vector H (x m , p). Note that an overview thereby obtaining a closed energy balance in the model (sim- of all inverse-modelling variables defined in this section is plified calculation). Soil temperature and moisture are also given in Table B1, including the dimensions and units. We simulated based on a force–restore model. The soil heat flux initially define a cost function J (–) as follows (Brasseur and to the atmosphere is calculated based on the gradient between Jacob, 2017): soil and surface temperature, and the latter is obtained from a simplified energy balance calculation. More details on the equations in the model can be J (x) = (x − x A )T S−1 A (x − x A ) found in Vilà-Guerau De Arellano et al. (2015). Note that + (y − H (x m , p))T S−1 O (y − H (x m , p)), (2) some relatively small changes with respect to the original CLASS model have been implemented, as documented in the ICLASS manual, which is part of the material that can be where x A represents the a priori estimate of the state vec- downloaded via the Zenodo link in the code and data avail- tor, y is the vector of observations used within the mod- ability section. elled time window, and H (x m , p) is the vector of model results at the times of the observations. The latter vector is the model equivalent of the observation vector y. The su- 3 Inverse modelling framework perscript T means transpose, SA represents the a priori er- ror covariance matrix (defined in the Supplement), and SO 3.1 General represents the matrix of the observational error covariances. This cost function quantifies two aspects, namely the fit be- Inverse modelling is based on using observations and, ide- tween model output and observations, in addition to how well ally, prior information to statistically optimise a set of vari- the posterior state matches with prior information about the ables driving a physical system (Brasseur and Jacob, 2017). state. Regarding the observational error covariance matrix SO The n variables to be optimised are contained in a state vec- (fully defined in Brasseur and Jacob, 2017), we assume, for tor x. In our framework, this vector can be subdivided in two simplicity, the observational errors to be uncorrelated (as in, vectors. Those are x m , containing state variables belonging e.g., McNorton et al., 2018; Chevallier et al., 2010; Ma et al., to the input of CLASS (e.g. CO2 advection and albedo), and 2021). This simplifies the matrix SO to a diagonal matrix, x b , containing state variables belonging to our bias correc- with the observational error variances as diagonal elements. Geosci. Model Dev., 16, 47–74, 2023 https://doi.org/10.5194/gmd-16-47-2023
P. J. M. Bosman and M. C. Krol: ICLASS 1.1 51 In this way, Eq. (2) simplifies to the following: of the scaling factor for the ith observation. The decomposi- tion from Eq. (4) remains valid. m X (H (x m , p)i − yi )2 In the statistical optimisation, we attempt to find the values J (x) = (x −x A )T S−1 A (x −x A )+ 2 . (3) σO,i of the state vector x, such that the function in Eq. (5) reaches i=1 its absolute minimum. This is done by starting from an initial 2 is the ith diagonal element of S , and m is the Here, σO,i O guess (x = x A ), after which the state vector is improved iter- number of observations. It is customary to refer to the first atively. The cost function and the gradient of the cost func- term of this cost function as the background part and to the tion (derivatives with respect to all parameters) are computed second part of this function as the data part. Note that if the for different combinations of parameters in the state vector a priori errors are also uncorrelated, then the first term in (Fig. 2). The framework uses, by default, a truncated New- the equation can be simplified in a similar way as the sec- ton method, the tnc algorithm (The SciPy community, 2022; ond term. The observational error variance linked to the ith Nash, 2000), for the optimisations. Truncated Newton meth- 2 ) can be further split up as follows: observation (σO,i ods are suitable for non-linear optimisation problems (Nash, 2000). The chosen algorithm allows the specification of hard 2 σO,i 2 = σI,i 2 + σM,i 2 + σR,i , (4) bounds for the state vector parameters, thus preventing un- physical parameter values for individual parameters in the where σI,i2 is the instrument (measurement) error variance, state vector. Raoult et al. (2016) used similar constraints in 2 2 is the represen- their inverse modelling system, and Bastrikov et al. (2018) σM,i is the model error variance, and σR,i used bound constraints as well. The analytical cost function tation error variance (see Brasseur and Jacob, 2017). These gradient calculations are described in Sect. 3.4. A basic nu- errors are assumed to be independent of each other and nor- merical derivative option (Sect. 3.5) is available as well, al- mally distributed. though we expect this, in general, to be outperformed by the At this point, we introduce two extra features to the cost analytical derivative. function. First, we allow the user to specify a weight for each individual observation, in case some observations are 3.2 State vector parameters deemed less important than others. In principle, the observa- tional error variances could also be adapted for this purpose, As mentioned before, the state vector can be decomposed but, by using weights, we can keep realistic error estimations into vector x m and vector x b . Vector x m contains state vari- (important for Sect. 4.2). Those weights can also be used to ables related to the input of CLASS, such as the initial condi- manipulate the relative importance of the background term tions (e.g. initial mixed layer potential temperature), bound- and the data term. This is similar to the regularisation factor ary conditions (e.g. CO2 advection), and uncertain model explained in Brasseur and Jacob (2017). Second, we intro- constants (e.g. roughness length for heat). The full list of duce part of our bias correction scheme in the data part of the model parameters that can be optimised is given in the cost function, namely scaling factors for observations. These ICLASS manual. factors can also be optimised. With the additions mentioned Vector x b contains parameters belonging to our bias cor- above, the cost function, as given in Eq. (3), modifies to the rection scheme in the data part of the cost function. There are following: two ways of bias correcting. The first one is by using obser- vation scaling factors for observation streams. These scaling J (x) = (x − x A )T S−1 A (x − x A ) factors (si ) have been introduced in Eq. (5). As an example, m in the state vector to be optimised, one can include a single X (H (x m , p)i − si yi )2 + wi 2 , (5) scaling factor for all CO2 surface flux observations. The sec- i=1 σO,i ond possible method of bias correcting (Sect. 3.3) is imple- where wi (–) is a weight for each individual observation in mented specifically for the energy balance closure problem the cost function. si (–) is a scaling factor for observation yi , (Foken, 2008; Oncley et al., 2007; Renner et al., 2019), and which is identical for each time step but is allowed to differ it involves a parameter FracH (–) to be optimised. between each observation stream. The background term in 3.3 Bias correction for energy balance closure the cost function can be left out if the user desires so. The in- troduction of the scaling factors means we need to adapt the We first define an observational energy balance closure resid- observational error variances as well. Thus, the ith observa- ual as follows (Foken, 2008): tional error variance is now given by the following: 2 {t} {t} εeb = R n − (H orig + LE orig + G), (7) σO,i = var(si yi − H (x m , p)i ), (6) where R n is the time series of net radiation measurements, {t} where x m is the unknown vector of the “true” values of the H orig and LE orig are the measured sensible and latent heat {t} model parameters in the state vector, and si is the true value fluxes, and G is the measured soil heat flux. The difference https://doi.org/10.5194/gmd-16-47-2023 Geosci. Model Dev., 16, 47–74, 2023
52 P. J. M. Bosman and M. C. Krol: ICLASS 1.1 Figure 2. Slightly simplified sketch of the workflow of the inverse modelling framework when using the adjoint model for the derivatives with respect to model parameters. Note that, for the purposes of clarity, direct arrows between the parameters and the cost function and its gradients are not drawn. These arrows arise via the background part of the cost function (see the equations in the text). Everything within the shaded rectangle is part of the iterative cycle of optimisation. Model parameters that are not optimised are in the vector p. This vector is used together with x m in every model simulation. In case ICLASS is run in Monte Carlo mode (Sects. 3.6 and 4.2), this figure applies to the individual ensemble members. between the measured net radiation and the sum of the mea- with respect to this element is computed as follows (similar sured heat fluxes is calculated for every time step, and this to Brasseur and Jacob, 2017, their Eq. 11.105): represents the energy balance closure residual. If desired, ∂J the user can easily specify an expression of their own for = 2(S−1 T A (x − x A ))i + 2((∇ x m H (x m , p)) F )i−l , (10) εeb . Subsequently, the observations for the sensible and la- ∂xi tent heat flux are adapted as follows: where ∇ x m H (x m , p) is the local Jacobian matrix of H . l is the number of non-model-input parameters in the state x y H = H orig + FracH εeb (8) occurring before xi . F is the forcing vector, with elements y LE = LE orig + (1 − FracH )εeb . (9) Fk defined as follows: This implies that the energy balance closure residual is added (H (x m , p)k − sk yk ) Fk = wk . (11) partly to the sensible heat flux and partly to the latent heat 2 σO,k flux, thereby closing the energy balance in the observations. This partitioning is determined by parameter FracH , which There is one forcing element in the vector for every obser- is taken to be constant during the modelled period. This ap- vation that is used, and Fk is the forcing related to the ob- proach of closing the energy balance is similar to Renner servation yk . For calculating the model input part of the an- et al. (2019), but we optimise a parameter, instead of using alytical derivative, we constructed the adjoint of the model, the evaporative fraction, for partitioning εeb . The limitations (∇ x m H (x m , p))T , which is used to obtain a locally exact an- of this approach are that we assume the radiation and soil alytical gradient (in specific cases the gradient is not exact; heat flux measurements to be bias-free and the FracH param- see the section of “if statements” in the Supplement). More eter constant. information on the adjoint is given in the Supplement. In case the ith state vector element is an observation scal- 3.4 Analytical derivative ing factor (Eq. 5) for observation stream j , the gradient of the cost function with respect to the ith state vector element For the optimisations, we do not only compute a cost func- is computed as follows: tion, but we also use the gradient of the cost function with mj respect to the state vector elements. This informs us on the ∂J X direction in which the cost function lowers. How the gradient = 2(S−1 A (x − x A ))i + −2Fk yj,k , (12) ∂xi k=1 with respect to an individual element is calculated depends on which state vector element is considered. In case the ith where mj is the number of observations of the type (stream) state vector element is a model input parameter, the gradient to which the observation scaling factor is applied, e.g. if the Geosci. Model Dev., 16, 47–74, 2023 https://doi.org/10.5194/gmd-16-47-2023
P. J. M. Bosman and M. C. Krol: ICLASS 1.1 53 observation scale is a scaling factor for surface CO2 flux ob- possible to place hard bounds on individual parameters when servations, then mj is the number of surface CO2 flux obser- using the tnc algorithm, but this does not always prevent all vations. yj,k is the kth observation of this observation stream. possible problematic combinations of parameters from being Fk is the forcing related to the observation yj,k . evaluated. Finally, if the ith state vector element is FracH , then the To obtain information about the uncertainty in the poste- gradient is calculated as follows: rior solution, and to deal with the challenges described above, ∂J the framework also allows a Monte Carlo approach (Taran- = 2(S−1 A (x − x A ))i tola, 2005). This entails the framework not starting at a single ∂xi state vector with prior estimates but instead using an ensem- dy H dy LE ble of prior state vectors x A , thus leading to an ensemble −2 FH · + F LE · , (13) dFracH dFracH of posterior parameter estimates. The ensemble of optimi- where F H and F LE are the forcing vectors for the sensi- sations can be executed in parallel on multiple processors, ble and latent heat fluxes, respectively, y H and y LE are the thereby reducing the time it takes to perform the total opti- observation vectors for the sensible and latent heat fluxes, misation. More details on the Monte Carlo mode of ICLASS respectively, and · is the Euclidian inner product. Note that, are given in the Sect. 4.2. when the FracH parameter is included in the state, the ob- As an additional way to improve the posterior solution, servation scaling factors for sensible and latent heat flux ob- we have implemented a restart algorithm. If the optimisa- servations will be set equal to 1 and are not allowed to be tion results in a cost function that is higher than a user- dy dy LE specified number, then the framework will restart the optimi- included in the state. The terms dFracHH and dFrac H represent, respectively, the derivatives of the sensible and latent heat sation from the best state reached so far. This fresh start of the flux observations to the FracH parameter, which follow from tnc algorithm, whereby the algorithm’s memory is cleared, Eqs. (8) and (9) as follows: often leads to a further lowering of the cost function. The maximum number of restarts (>=0) is specified by the user. dy H = εeb (14) If an ensemble is used, then every individual member with dFracH too high a posterior cost function will be restarted. dy LE = −εeb . (15) dFracH Note that Eqs. (10), (12), and (13) all have the same first term 4 Error statistics that originates from the background part of the cost function. 4.1 Prior and observations 3.5 Numerical derivative From a Bayesian point of view, ICLASS can combine in- A simple numerical derivative is available as alternative to formation, both from observations and from prior knowl- the analytical gradient. The derivative of the cost function to edge about the state vector, to come to a solution with a re- the ith state element is numerically calculated as follows: duced uncertainty in the state parameters. In the derivation ∂J J (xi + α) − J (xi − α) of Eq. (2), it is assumed that both prior and observational = , (16) ∂xi 2α errors follow a (multivariate) normal distribution (Tarantola, where α is a very small perturbation to state parameter xi , 2005). However, some prior input parameters are bounded with a default value of 10−6 , and has the units of xi . (Sect. 3.6), e.g. the albedo cannot be negative. Dealing with bounds is a known challenge in inverse modelling, and sev- 3.6 Handling convergence challenges eral approaches can be followed (Miller et al., 2014). As an example, Bergamaschi et al. (2009) had to deal with methane The highly non-linear nature of the optimisation problem emissions in the state vector becoming negative. Their solu- can cause the optimisation to become stuck in a local min- tion was to make the emissions a function of an emission imum of the cost function (Santaren et al., 2014; Bastrikov parameter that is being optimised instead of optimising the et al., 2018; Ziehn et al., 2012). This means that the resulting emissions themselves. Through their choice of function, the posterior state vector can depend on the prior starting point emissions cannot become negative, even though the emission (Raoult et al., 2016), and the resulting posterior state can re- parameter is unbounded. In our case, such an approach is main far from optimal. In the worst case, the non-linearity more difficult, as we have a diverse set of bounded parame- of the model can even lead to a crash of the forward model. ters. This happens with certain combinations of input parameters Our approach is to enforce hard bounds for state values that lead to unphysical situations or undesired numerical be- via the tnc algorithm. It, however, means that the normality haviour. After starting from a user-specified prior state vec- assumption will be violated to some extent, as the normal dis- tor, the tnc algorithm autonomously decides which parame- tribution (for which the user specifies the variance) for prior ter values are tested during the rest of the optimisation. It is parameter values then becomes a truncated normal distribu- https://doi.org/10.5194/gmd-16-47-2023 Geosci. Model Dev., 16, 47–74, 2023
54 P. J. M. Bosman and M. C. Krol: ICLASS 1.1 tion. For a parameter following a truncated normal prior dis- tribution, the prior variance used in the cost function is not {p} T −1 {p} (fully) equal to the variance of the actual prior distribution. J (x) = x − x A SA x − x A The extent to which this is the case depends on the degree of m truncation. X (H (x m , p)i − si yi + εi )2 + wi 2 . (17) Our system also allows for specifying covariances be- i=1 σO,i tween state elements in SA . We assume observational errors to be uncorrelated (see Sect. 3.1). Equation (4) states that the {p} x A is the perturbed prior state vector. The random numbers observational error variance consists of an instrument error, a εi in the data part of the cost function are sampled from nor- model error, and a representation error variance. The instru- mal distributions with mean 0 and standard deviation σO,i . ment and representation error standard deviation are taken This can be seen as either a perturbation of the scaled obser- from user input, while the model error standard deviation can vations or a perturbation of the model–data mismatch. As a either be specified by the user or estimated from a sensitiv- consequence of the change in the cost function equation, the ity analysis. In the latter case, an ensemble of forward-model equation for the forcing vector (Eq. 11) is updated as well: runs is performed. For each of these runs, a set of parameters, not belonging to the state vector, is perturbed (so the param- (H (x m , p)k − sk yk + εk ) Fk = wk 2 . (18) eters are part of the vector p and not from x m ; see Sect. 3.1). σO,k These perturbed parameters are used together with the un- perturbed prior vector x m as model input. The user should Furthermore, in Eqs. (10), (12), and (13), for perturbed en- {p} specify which parameters should be perturbed, and the distri- semble members, x A is used instead of x A . The vector εeb bution from which random numbers will be sampled to add (Eq. 7) is not perturbed and kept identical for all members. to the default (prior) model parameters. For the latter, there For every ensemble member, an optimisation is performed. is the choice between a normal, bounded normal, uniform, or Each optimisation is classified as being either successful or triangular distribution. The model error standard deviation not successful. Here, “successful” is defined as having a final for each observation stream at each observation time is then reduced chi-square (see Sect. 5; Eq. 19) smaller or equal than obtained from the spread in the ensemble of model output for a user-specified number. The posterior state parameters for the observation stream at that time. the successful members are considered a sample of the pos- terior state space and are saved in a matrix. Those parameters 4.2 Ensemble mode to estimate posterior uncertainty can be used to estimate the true posterior probability density functions (pdf’s). A covariance and a correlation matrix are When ICLASS is run in Monte Carlo (ensemble) mode, it then constructed using the matrix of the posterior state pa- allows the estimation of the uncertainty in the posterior state. rameters. These covariance and correlation matrices are used The principle for obtaining posterior error statistics bears a as an estimate of the true posterior state covariance and cor- strong similarity to what has been done in Chevallier et al. relation matrices, respectively. If the user prefers to do so, (2007), and our method will be shortly explained here. We the numeric parameters not belonging to the state can also be start with the prior state specified by the user. Then, for each perturbed in the ensemble, in a similar way to that outlined ensemble member and for every state parameter, a random in Sect. 4.1. number is drawn from a normal distribution with the variance Note that, next to the ensemble, there will also be a run of the respective parameter specified by the user and a mean with an unperturbed prior, just as is the case when no ensem- of 0. When covariances are also specified for the prior param- ble is used. We refer to this run as member 0. This member is, eters, a multivariate normal distribution can be used to sam- however, not included in the calculation of ensemble-based ple from. The sampled numbers are then added to the state error statistics (e.g. correlation matrix), as the choice of prior vector. In case any non-zero covariances are used to perturb, values is not random for this member. and then, if the new state vector falls out of bounds (if spec- ified), it is discarded, and the procedure is repeated for that ensemble member. In case no covariance structure is used, 5 Output then only the parameter which falls out of its bounds has to be replaced. It has to be noted that this effectively leads to ICLASS can write several output files. The output in- sampling from a truncated (multivariate) normal distribution. cludes the obtained parameters, the posterior reduced chi- Similarly, for every ensemble member, a random number square statistic, and the posterior and prior cost function. drawn from a normal distribution (without bounds) is added The reduced chi-square goodness-of-fit statistic is defined in to the (scaled) observations. This modifies the cost function ICLASS as follows: of an individual ensemble member (Eq. 5) as follows: J χr2 = Pm . (19) i=1 (wi ) + n Geosci. Model Dev., 16, 47–74, 2023 https://doi.org/10.5194/gmd-16-47-2023
P. J. M. Bosman and M. C. Krol: ICLASS 1.1 55 In this equation, m is the number of observations, and n is where Jb is the background part of the cost function, i.e. the the number of state parameters (see Sect. 3 for the other sym- first term from Eq. (5). The mean bias error, root mean bols). Note that if all the weights are set to 1, then the denom- squared error, and the ratio of model and observation vari- inator simplifies to m + n. The n will only be included in the ance for every observation stream is also calculated, both for denominator when the user chooses to include a background the prior and posterior state. The mean bias error for the j th part in the cost function (default). In a simple case where the observation stream is defined here as follows: prior errors are uncorrelated, this χr2 statistic for the posterior mj solution should be around 1 when the optimisation converges 1 X εbias,j = (H (x m , p)j,i − sj yj,i ). (23) well and priors and errors are properly specified (e.g. Micha- mj i=1 lak et al., 2005). In a case where all weights are 1, this can be understood as follows. The average value of the ith poste- In this equation, mj is the number of observations of type rior observation residual squared, (H (x m,post , p)i − si yi )2 , (stream) j , H (x m , p)j,i is the model output for observation should be close to σO,i 2 , and the average value of the ith stream j at time index i, yj,i is an observation of stream j at time index i, and sj is an observation scaling factor for obser- posterior data residual squared, (xpost,i − xA,i )2 , should be vations of stream j . Note that, even when the scaled obser- close to the ith diagonal element of the a priori error co- vations are perturbed, we do not include those perturbations variance matrix when the optimisation converges well and here (compare Eq. 20). In case the FracH parameter is used, errors and prior parameters are properly specified. We have the energy-balance-corrected observations (Eqs. 8 and 9) will m observation residuals and n data residuals (if background be used in the equation above. For the root mean squared er- part included). In this example with a diagonal SA matrix, the ror and the ratio of model and observation variance, the ob- residuals are assumed to be independent of each other. Each servation scales and energy-balance-corrected observations squared residual contributes on average a value of approx- are used as well. ICLASS also calculates the normalised de- imately 1 to the cost function, summing to approximately viation of the posterior from the prior for every state param- m + n and, thus, χr2 ≈ 1. Note, however, that the χr2 statis- eter xi as follows: tic can be misleading, in particular when observational errors are correlated (Chevallier, 2007). Furthermore, as mentioned xpost,i − xA,i in Sect. 4.1, prior parameters can follow a truncated normal δnor,i = , (24) σA,i distribution, violating the normality assumption. The impact of this depends on the degree of truncation and also on the where σA,i is the square root of the prior variance of param- number of observations and so on. It can lead to an ideal χr2 eter i, i.e. the square root of diagonal element i from matrix value diverting from 1. SA . Note that, also in case of an ensemble, we use the unper- ICLASS also calculates a prior and posterior partial cost turbed prior, i.e. the prior of member 0. function and a posterior χr2 statistic for individual observa- If run in ensemble mode, we additionally store estimates of tion streams and for the correspondence to prior informa- the posterior error covariance matrix (Sect. 4), the posterior tion (background). The partial cost function for observation error correlation matrix, the ratio of the posterior and prior stream j is calculated as follows: variance for each of the state parameters, the mean posterior state, and the optimised and prior states for every single en- mj X (H (x m , p)i − sj yj,i + εi )2 semble member. When non-state parameters are perturbed in Jj = wi , (20) 2 σO,i the ensemble, these parameters are part of the output. More- i=1 over, the correlation of the posterior state parameters with where mj is the number of observations of stream j , yj,i is these non-state parameters can also be written as output. the ith observation of stream j , and H (x m , p)i is the model When the model error standard deviations are estimated by output corresponding with yj,i . In case scaled observations ICLASS (Sect. 4.1), there is a separate file containing statis- are perturbed in the ensemble (Sect. 4.2), εi is a random num- tics on the estimated error standard deviations. Finally, there ber; otherwise, it is 0. The χr2 statistic for observation stream are additional output files with information about the opti- j is defined here as follows: misation process. For every model simulation (and for every Jj ensemble member) in the iterative optimisation process, one 2 χr,j = Pm j . (21) can find the parameter values used in this simulation and the i=1 (wi ) value of the cost function split up into a data and a back- This equation resembles Eq. (13) in Meirink et al. (2008), but ground part. For the gradient calculations, one can find the 2 parameter values used and the derivatives of the cost function note that the variable χns in their paper is similar to χr,j 2 in our with respect to every state parameter. The derivatives of the paper. The χr2 statistic for the background part is calculated background part are also provided separately. More details as follows (similar to Eq. 26 of Michalak et al., 2005): on the output of ICLASS can be found in a separate section Jb of the manual. An overview of the output variables defined 2 χr,b = , (22) in this section is given in Table B2. n https://doi.org/10.5194/gmd-16-47-2023 Geosci. Model Dev., 16, 47–74, 2023
56 P. J. M. Bosman and M. C. Krol: ICLASS 1.1 6 Technical details of the code Table 1. A simple gradient test example involving the derivative of the 2 m temperature with respect to the roughness length for heat We have written the entire code in Python 3 (https://www. (z0h ). The right column gives, for different values of perturbation α python.org, last access: 9 December 2022). Using the frame- (m), the result of the calculation 1 − RHS divided by LHS, whereby work requires a Python 3 installation, including the NumPy RHS is the right-hand side of Eq. (25), and LHS is the left-hand (van der Walt et al., 2011; https://numpy.org), SciPy (https: side of the same equation. Note that only the RHS of the equation is //scipy.org, last access: 9 December 2022), and Matplotlib influenced by α. In this example, we have only perturbed parameter z0h , but multiple parameters can be perturbed within one test. (Hunter, 2007; https://matplotlib.org, last access: 9 Decem- ber 2022) libraries. The code is operating system indepen- α(m) 1 − ratio RHS and LHS (–) dent and consists of the following three main files: 0.5 4.7 × 10−1 – forwardmodel.py, which contains the adapted CLASS 0.2 3.2 × 10−1 model. 0.1 2.2 × 10−1 – inverse_modelling.py, which contains most of the in- 1 × 10−2 3.4 × 10−2 1 × 10−3 3.6 × 10−3 verse modelling framework, such as the function to be 1 × 10−4 3.7 × 10−4 minimised in the optimisation and the model adjoint. 1 × 10−5 3.7 × 10−5 – optimisation.py, which is the file to be run by the user 1 × 10−6 4.2 × 10−6 for performing an actual optimisation. In this file, the 1 × 10−7 2.6 × 10−6 observations are loaded, the state vector defined, and so 1 × 10−8 −8.2 × 10−6 on. The input paragraphs in this file should be adapted 1 × 10−9 5.9 × 10−4 by the user to the optimisation performed. 1 × 10−12 −8.2 × 10−2 There are three additional files. The first is optimisa- tion_OSSE.py and is similar to optimisation.py but is meant specifically for observation system simulation experiments model is compared to the change in model output when a nu- (also, in this file, the user should adapt the input paragraphs merical finite difference approximation is employed. Math- to the optimisation to be performed). The second file is test- ematically, the test for the ith model output element can be ing.py, which contains tests for the code, as described in written as follows (similar to Honnorat et al., 2007; Elizondo Sect. 7. The file postprocessing.py is a script that can be run, et al., 2000): after the optimisations are finished, for post-processing the dH (x m , p)i H (x m + α1x m , p)i − H (x m , p)i output data, e.g. to plot a coloured matrix of correlations. · 1x m ≈ , (25) Using the Pickle module, the optimisation can store variables dx m α on the disk near the end of the optimisation, and these vari- where 1x m is a vector of ones, with the same length as ables can be read in again in the postprocessing.py script. vector x m . α is a small positive number, H represents the This has the advantage that, for example, the formatting of forward-model operator, H (x m , p)i is the ith model output plots can easily be adapted without redoing the optimisation. element, x m is the vector of the model input variables to be This script should be adapted by the user to the optimisation tested (model parameters part of state), · is the Euclidian in- performed and the output desired. ner product, and vector p is the set of non-state parameters used by the model. Several increasingly smaller values are 7 Adjoint model validation tested for α. When the tangent linear model is correct, then the right-hand side of the equation will converge to the left- When using the adjoint modelling technique, an extensive hand side when α becomes progressively smaller, although testing system is required to make sure that the analytical for too small α values, numerical errors start to arise (Eli- gradient of the cost function computed by the adjoint model zondo et al., 2000). Instead of using the full tangent linear is correct. There are two tests that are essential, which are model, individual model statements can also be checked. In described below. this case, x m and H (x m , p)i can contain intermediate (not The gradient test (for the tangent linear model) is a test to part of model input or output) model variables. The gradient determine whether the derivatives with respect to the model test is considered successful if the ratio of the left- and right- input parameters in the tangent linear model are constructed hand sides of the equation lies in the interval [0.999–1.001]. correctly. The construction of the adjoint is based on the tan- The results of a simple example of a gradient test, involv- gent linear model, so errors in the tangent linear propagate to ing the derivative of the 2 m temperature with respect to the the adjoint model. In the gradient test, model input state vari- roughness length for heat (z0h ), is shown in Table 1. ables are perturbed, leading to a change in model output. The The dot product (adjoint) test checks whether the adjoint change in model output when employing the tangent linear model code is correct for the given tangent linear code. It Geosci. Model Dev., 16, 47–74, 2023 https://doi.org/10.5194/gmd-16-47-2023
P. J. M. Bosman and M. C. Krol: ICLASS 1.1 57 tests whether the identity 8 Inverse modelling validation: observation system simulation experiments hHL x m , yi = hx m , HTL yi (26) holds (Krol et al., 2008; Meirink et al., 2008; Honnorat et al., To test the ability of the system to properly estimate model 2007; Claerbout, 12004). The equation can also be written as parameters, we show, in this chapter, the results of obser- follows: vation system simulation experiments (OSSEs) with artifi- cial data. These types of experiments are classically used to hHL x m , yi − hx m , HTL yi = 0. (27) test the ability of the system to properly estimate model pa- hHL x m , yi rameters. For example, similar tests have been performed by In this equation, HL = ∇ x m H (x m , p) represents the tangent Henze et al. (2007), for their adjoint of a chemical transport linear model operator, i.e. a matrix (the local Jacobian of H ) model. In our experiments, we simulate a growing convective with the element on the ith row and j th column given by boundary layer for a location at mid-latitudes from 10–14 h, dH (x m ,p)i dxm,j . x m is, in this equation, a vector with perturbations including surface layer calculations. In the chosen set-up, the to the model state, HTL represents the adjoint model operator, land surface is coupled to the boundary layer. The land sur- and hi is the Euclidian inner product. A very small deviation face provides heat and moisture and exchanges CO2 with the from 0 is acceptable due to machine rounding errors (Claer- CBL. bout, 12004). Our criterion for passing the test, when using 64 bit floating point calculations, is that the absolute value of 8.1 Parameter estimation the left-hand side of Eq. (27) should be
58 P. J. M. Bosman and M. C. Krol: ICLASS 1.1 Table 2. Parameter values for the main observation system simulation experiments of Sect. 8.1. The observation streams used in the two obs (observation) streams experiments are q and h, for the six obs streams experiment, H , LE, T2 , and FCO2 are added, and for the seven obs streams experiments, CO220 is additionally added. The value of, for example, parameter z0m (only a state element in the 10-parameter- state experiments) is in the 2-parameter-state and 5-parameter-state experiments equal to the true value of z0m in the 10-parameter-state experiments. See Table A1 for a description of the parameters and Table A2 for a description of the observation streams. Note: ppm is parts per million. Parameter True Prior Optimised 2-parameter state Two obs streams h (m) 350.0 650.0 350.0 αrad (–) 0.200 0.450 0.200 5-parameter state Two obs streams Six obs streams h (m) 350.0 650.0 350.0 350.0 αrad (–) 0.200 0.450 0.200 0.200 αsto (–) 1.00 0.50 0.96 1.00 wg (–) 0.270 0.140 0.306 0.270 γθ (K m−1 ) 0.0030 0.0050 0.0030 0.0030 10-parameter state Seven obs streams No noise Noise h (m) 350.0 650.0 350.0 344.4 αrad (–) 0.200 0.450 0.200 0.207 αsto (–) 1.00 0.50 1.00 1.01 wg (–) 0.270 0.140 0.270 0.238 γθ (K m−1 ) 3.0 × 10−3 5.0 × 10−3 3.0 × 10−3 3.1 × 10−3 γq (kg kg−1 m−1 ) −1.0 × 10−6 −3.0 × 10−6 −1.0 × 10−6 −1.1 × 10−6 CO2 (ppm) 422.0 380.0 422.0 423.2 advCO2 (ppm s−1 ) 0.0 6.0 × 10−3 9.8 × 10−8 −1.4 × 10−4 z0m (m) 0.020 0.100 0.020 0.048 z0h (m) 0.020 0.010 0.020 0.013 Table 3. The root mean squared error (RMSE) and the specified measurement error standard deviation (σI ) for the observation streams in the 10-parameter-state OSSE with perturbed observations. In all our OSSEs, the measurement error standard deviations are chosen to be constant with time. Note that the measurement error equals the observational error here, since the model and representation errors were set to 0 for simplicity. The σI are identical for all OSSEs in Table 2, but not all observation streams are used in each OSSE. Observation stream RMSE prior RMSE optimised σI q (kg kg−1 ) 4.5 × 10−4 3.6 × 10−4 4.0 × 10−4 h (m) 365.2 91.9 100.0 H (W m−2 ) 91.7 22.8 25.0 LE (W m−2 ) 128.6 22.7 25.0 T2 (K) 1.50 0.62 0.65 FCO2 (mg CO2 m−2 s−1 ) 42 × 10−2 3.6 × 10−2 4.0 × 10−2 CO220 (ppm) 30.5 1.7 2.0 vation streams we used are specific humidity and boundary tributed. The state parameters are thus expected to have a layer height. The specific humidity is expected to be influ- profound influence on the cost function. enced by the albedo, as the amount of available net radiation The optimised model shows a very good fit to the obser- is relevant for the amount of evapotranspiration, which in- vations (Fig. 3), and the original parameter values are found fluences the specific humidity. The boundary layer height is with a precision of at least five decimal places (Table 2; fewer relevant for specific humidity as well, as it determines the than five decimal places are shown). This result indicates the size of the mixing volume in which evaporated water is dis- optimisations work in such simple situations. Geosci. Model Dev., 16, 47–74, 2023 https://doi.org/10.5194/gmd-16-47-2023
You can also read