Some Statistic and Information-theoretic Results On Arithmetic Average Fusion

Tiancheng Li

arXiv:2110.01440v1 [math.ST] 1 Oct 2021

Abstract—Finite mixture such as the Gaussian mixture is a flexible and powerful probabilistic modeling tool for representing the multimodal distributions widely involved in many estimation and learning problems. Its core is representing the target distribution by the arithmetic average (AA) of a finite number of sub-distributions which constitute the mixture. The AA fusion has demonstrated compelling performance for both single-sensor and multi-sensor estimator design. In this paper, some statistical and information-theoretic results are given on the AA fusion approach, including its covariance consistency, mean square error, mode-preservation capacity, mixture information divergence and principles for fusing/mixing weight design. In particular, based on the concept of conservative fusion, the relationship of the AA fusion with the existing covariance union, covariance intersection and diffusion combination approaches is exposed. Linear Gaussian models are considered for illustration.

Index Terms—Finite mixture, multi-sensor fusion, distributed filter, conservative estimation, average consensus, covariance intersection, diffusion, covariance union

I. INTRODUCTION

The last two decades have witnessed a steady uptick in the application of information fusion technologies to the state estimation problem, which has burgeoned with the vitalization of networked sensors/agents [1]–[4] and has seen substantial interest in both military and commercial realms. One of the most fundamental fusion approaches is the linear opinion pool, which simply mixes all information into one entity in a linear manner. This results in a finite mixture representation of the underlying state probability distribution [5], such as the popular Gaussian mixture (GM). Recently, it has been further shown that the linear opinion pool provides a compelling approach, referred to as arithmetic average (AA) fusion, to multi-sensor random set information fusion [6]–[17], which enjoys high efficiency in computation, resilience to sensor faults (such as misdetection), tolerance to internode correlation of any degree, and closure for mixture fusion.

Two common types of data to be fused are variables and probability distributions/functions, typically the estimated number of targets and the posterior distribution of the target states, hereafter referred to as v-fusion and f-fusion [18], respectively. The averaging operation differs significantly between the two types of fusion: the average of multiple variables is simply a single variable, while the average of distributions of the same family is usually a mixture of distributions of that family. That is, the AA-based f-fusion always leads to a mixture. In the mixture distribution, the components/mixands are properly weighted and correspond to the information gained from different fusing sources. They jointly approximate the target distribution p(X) by their average/AA:

    f_AA(X) = Σ_{i∈I} w_i f_i(X),    (1)

where X stands for the state of a single or multiple target(s), w = [w_1, w_2, ...]^T are positive mixing/fusing weights which are typically normalized, namely w^T 1 = 1, and f_i(X), i ∈ I = {1, 2, ...}, are the probability distributions, e.g., probability density functions (PDFs) or probability hypothesis density functions [6]–[8], [19], [20], regarding the same target(s), yielded by a set of estimators i ∈ I conditioned on different data, models and/or hypotheses.
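As a minimal numerical sketch of the AA f-fusion (1) — in Python/NumPy, with arbitrary, purely illustrative parameters — the average of several Gaussian densities is itself a valid, generally multimodal, mixture density:

```python
import numpy as np
from scipy.stats import norm

means, stds = [0.0, 2.0, 5.0], [1.0, 0.5, 1.5]   # illustrative f_i = N(mean_i, std_i^2)
w = np.array([0.5, 0.3, 0.2])                    # normalized fusing weights, w^T 1 = 1

x = np.linspace(-5.0, 10.0, 2000)
f_i = np.array([norm.pdf(x, m, s) for m, s in zip(means, stds)])
f_AA = w @ f_i                                   # Eq. (1): f_AA(x) = sum_i w_i f_i(x)

# The mixture integrates to one but is not a single Gaussian.
print(f_AA.sum() * (x[1] - x[0]))                # ~1.0
```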
The mixture distribution greatly facilitates the closed-form Markov–Bayesian recursion in two ways: first, a mixture of conjugate priors is also conjugate and can approximate any kind of prior [21], [22]; second, the linear fusion of a finite number of mixtures of the same parametric family remains a mixture of the same family. These properties play a key role in mixture filters such as the GM filter [23], [24], the Student's-t mixture filter [25] and multi-Bernoulli mixture filters of various forms [19], [26], [27]. The AA fusion has demonstrated outstanding performance in many challenging scenarios [6]–[8], [11]–[17]. Nevertheless, a statistical and information-theoretical study of the mixture/AA fusion of probability distributions (of the same family or not) seems still missing in two respects, which motivate this paper.

1) First, while the concept of conservative estimation and fusion has been well accepted, the AA fusion results in an inflated covariance, which seems at variance with the minimum variance estimator/fusion. This unavoidably raises a concern about decreased accuracy if one simply conflates the inflated variance with an increased mean square error (MSE) in the context of fusion. In this paper we clarify their major difference and provide an in-depth analysis of the covariance consistency and "mode-preservation" feature of the AA fusion.

2) Second, we analyze how the AA fusion compares with the fusing estimators and how the fusing weights should be designed in order to maximize the fusion gain. By these theoretical studies, the use of the AA fusion approach for multi-sensor estimator design is better motivated.

Manuscript first submitted on 1st Oct. 2021. This work was partially supported by the National Natural Science Foundation of China under grant 62071389. T. Li is with the Key Laboratory of Information Fusion Technology (Ministry of Education), School of Automation, Northwestern Polytechnical University, Xi'an 710129, China, e-mail: t.c.li@nwpu.edu.cn
The remainder of this paper is organized as follows. In Sec. II, we revisit and compare several classic conservative estimation approaches and clarify the difference between the covariance and the MSE of an estimator. Moreover, as a unique feature of the AA fusion, it adaptively switches between mixing and merging the fusing components according to need and can thereby preserve the mode information. This leads to a fault-tolerant capacity to deal with inconsistent estimators. In Sec. III, we study the exact divergence of the mixture from the true/target distribution, providing an information-theoretic justification for the AA fusion. Then, we discuss principles for fusing weight design. Some of the results are not limited to linear Gaussian models, but we use linear Gaussian models for illustration. We briefly conclude in Sec. IV.

II. CONSERVATIVE FUSION AND STATISTICS OF AA

In the context of time-series state estimation, optimality is usually sought in the sense of minimum mean square error (MMSE), maximum a posteriori (MAP) or minimized Bayes risk [28], resulting in different classes of estimators: the MMSE point estimator and the Bayes-optimal density estimator. There is a key difference between the two optimality criteria: the former relies on the statistics of the estimator such as the mean and square error [29], while the latter relies on the overall quality of the posterior, for which a proper distribution-based metric such as the Kullback-Leibler (KL) divergence is useful.

The very nature of optimal fusion, whether in the MMSE or the Bayes sense, requires knowledge of the cross-correlation or dependence among the information sources [29]–[32]. However, this often turns out to be impractical due to the complicated correlation and unknown common information among sensors/agents. This leads to suboptimal, covariance consistent solutions, including the AA fusion.

A. Notations, Concepts and Definitions

Note that the mean of the multitarget distribution corresponds to the state of no target, while the variance depends not only on the estimation uncertainty but also on the distance among targets. Therefore, statistics such as the mean and variance do not apply to the multi-target estimator. In this section, we limit the statistical analysis to a single target only.

In the following, we use x ∈ R^{n_x} to denote the n_x-dimensional state of the concerned single target, which is the quantity to be estimated, namely the true state. It can be a non-random/deterministic or a random vector. When it is a random vector, we use p(x) to denote the corresponding PDF, namely the true distribution of x. For a given Bayesian posterior f(x), which is an estimate of the real distribution p(x), the two most common state estimate extraction approaches, namely the expected a posteriori (EAP) and MAP estimators, are given as follows:

    x̂_EAP = ∫ x̃ f(x̃) dx̃,    (2)
    x̂_MAP = arg max_{x̃} f(x̃).    (3)

That is, in the EAP estimator the state estimate is given as the mean of the posterior, while in the MAP estimator it is given as the peak of the posterior distribution.

We consider a set of estimate pairs composed of a state estimate x̂_i and an associated positive definite error covariance matrix P_i, i ∈ I = {1, 2, 3, ...}, which are to be fused using positive, normalized weights w := {w_1, w_2, ...}, where w_i > 0, Σ_{i∈I} w_i = 1. Hereafter, x̂_i is given by the EAP estimator unless otherwise stated. Then, each estimate pair corresponds to the first and second moments of a posterior PDF f_i(x) that is an estimate of the true distribution p(x), i.e., x̂_i = ∫ x̃ f_i(x̃) dx̃, P_i = ∫ (x̃ − x̂_i)(·)^T f_i(x̃) dx̃. In particular, a Gaussian PDF can be uniquely determined by an estimate pair. Fusing in terms of only the mean and variance implicitly imposes a linear Gaussian model assumption. We use the shorthand (x − y)(·)^T := (x − y)(x − y)^T. The MSE of x̂_i is denoted MSE_{x̂_i} = E[(x − x̂_i)(·)^T]. The KL divergence of the probability distribution p(x) relative to f(x) is given as D_KL(f||p) = ∫ f(x) log [f(x)/p(x)] dx.

Remark 1. P_i indicates the variance of the state estimate x̂_i, which can be interpreted as the level of uncertainty of estimator i about its state estimate x̂_i. It is, however, not an estimate/approximation of MSE_{x̂_i}.

Definition 1 (conservative). An estimate pair (x̂, P_x̂) regarding the real state x is conservative [33]–[36] when

    P_x̂ ≥ MSE_x̂,

i.e., P_x̂ − E[(x − x̂)(·)^T] is positive semi-definite. The notion is also referred to as pessimistic definite [37] or as covariance consistent. Extended definitions of the conservativeness of PDFs can be found in [38], [39]. A relevant notion of "informative" is given as follows.

Definition 2 (informative). An estimate pair (x̂_1, P_1) is said to be more informative than (x̂_2, P_2) regarding the same state x when P_1 < P_2.
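Definition 1 admits a direct numerical test: (x̂, P) is conservative when the matrix P − MSE has no negative eigenvalue. A small sketch under assumed, hypothetical numbers:

```python
import numpy as np

def is_conservative(P, mse, tol=1e-9):
    """Definition 1: the pair is conservative if P - MSE is positive semi-definite."""
    D = 0.5 * ((P - mse) + (P - mse).T)      # symmetrize for numerical safety
    return np.linalg.eigvalsh(D).min() >= -tol

mse = np.array([[1.0, 0.2],                  # hypothetical true MSE of an estimator
                [0.2, 0.8]])
print(is_conservative(np.array([[1.5, 0.2], [0.2, 1.0]]), mse))  # True: P over-states the error
print(is_conservative(np.array([[0.5, 0.0], [0.0, 0.5]]), mse))  # False: over-confident estimate
```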
With respect to the type of data, there are two forms of AA fusion:

• In a point estimation problem, the AA v-fusion is carried out with regard to the state estimate variables x̂_i, i ∈ I, which yields a new variable as follows:

    x̂_AA = Σ_{i∈I} w_i x̂_i.    (4)

• In the Bayesian formulation, the estimation problem is to find a distribution that best fits p(x). The corresponding AA f-fusion is carried out with regard to f_i(x), i ∈ I, which yields a mixture of the fusing distributions as follows:

    f_AA(x) = Σ_{i∈I} w_i f_i(x).    (5)

Either way, as long as the fused data can be split into two parts depending on whether they are shared by all the fusing parties — a non-common part and a common part — it is obvious that the AA fusion avoids double-counting the common data as long as the fusing weights are normalized. This is the key to dealing with common a-priori information [30], [31] and data incest [40] when a sensor network is employed, where some information can easily be replicated and repeatedly transmitted to the local sensors. Meanwhile, non-common information will be diluted in the result: counted less than a unit [16], which implies an over-conservativeness. In other words, in order to avoid information double-counting [41], the AA fusion treats the unknown correlation as the worst case: all fusing data are identical. In this paper, we refer to this worst-case handling ability loosely as robustness, and so the AA fusion can be deemed a robust fusion approach.
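The two forms agree under the EAP estimator (a fact formalized later in Sec. II-D): the mean of the AA mixture (5) equals the AA of the local means (4). A quick Monte Carlo sanity check, with illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([0.3, 0.7])                 # normalized fusing weights
mu = np.array([1.0, 4.0])                # local EAP estimates x_hat_i
sig = np.array([1.0, 0.5])               # local standard deviations

x_v = w @ mu                             # AA v-fusion, Eq. (4)

comp = rng.choice(2, size=200_000, p=w)  # sample the AA mixture of Eq. (5)
xs = rng.normal(mu[comp], sig[comp])
print(x_v, xs.mean())                    # both ~3.1
```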
B. Conservative Fusion: CU versus AA

Based on the concept of conservativeness, a number of results are given in the following lemmas.

Lemma 1. Suppose that at least one fusing estimate pair is unbiased and conservative. A sufficient condition for the fused estimate pair (x̂_AA, P_CU) to be conservative is that

    P_CU ≥ P_i + (x̂_AA − x̂_i)(·)^T, ∀i ∈ I,    (6)

which is upper bounded by

    P_uCU = max_{i∈I} (P_i + (x̂_AA − x̂_i)(·)^T).    (7)

A proof of Lemma 1 can be found in [7]. It actually provides a conservative fusion method known as covariance union (CU) [34], [42], [43]. It is fault tolerant, as it preserves covariance consistency as long as at least one fusing estimator is consistent. When all fusing estimators are conservative, i.e., P_i ≥ (x − x̂_i)(·)^T, ∀i ∈ I, a more informative estimator can be obtained by the following lemma.

Lemma 2. For a set of conservative estimate pairs (x̂_i, P_i), i ∈ I = {1, 2, ...}, a sufficient condition for the fused estimate pair (x̂_AA, P_lCU) to be conservative is given by

    P_lCU ≥ min_{i∈I} (P_i + (x̂_AA − x̂_i)(·)^T).    (8)

When the local estimation is given in terms of Bayesian posteriors f_i(x), with corresponding state estimates x̂_i and associated positive definite error covariance matrices P_i, i ∈ I, the AA fusion has the following result.

Lemma 3. The AA f-fusion (5) results in a mixture distribution for which the mean and covariance are respectively given by (4) and

    P_AA = Σ_{i∈I} w_i P̃_i,    (9)

where the adjusted covariance matrix is given by

    P̃_i = P_i + (x̂_AA − x̂_i)(·)^T.    (10)

Proof. First, x̂_AA = ∫ x̃ f_AA(x̃) dx̃ = ∫ x̃ Σ_{i∈I} w_i f_i(x̃) dx̃ = Σ_{i∈I} w_i x̂_i. A proof of (9) for fusing two Gaussian distributions can be found in Appendix B of [16], and it extends easily to any finite number of fusing distributions. Appendix B of [16] further showed that the above x̂_AA and P_AA correspond to the moments of the merged single Gaussian distribution obtained by applying the merging approach [44] to all Gaussian distributions in the mixture. ∎

In fact, the Gaussian PDF that best fits the AA mixture has the same first and second moments [45, Theorem 2], i.e.,

    (x̂_AA, P_AA) = arg min_{(µ,P)} D_KL(f_AA || N(µ, P)).    (11)

According to the expressions (7), (8) and (9), we have a conservative fusion chain:

    P_Naive < min_{i∈I}(P_i) ≤ P_lCU ≤ P_AA ≤ P_uCU,    (12)

where min_{i∈I}(P_i) = P_lCU holds if and only if (iff) x̂_AA = x̂_i in (8), P_lCU = P_AA = P_uCU holds iff P̃_i = P̃_j, ∀i ≠ j, and

    P_Naive = (Σ_{i∈I} P_i^{−1})^{−1},    (13)

which corresponds to the covariance of the multiplied Gaussian distributions, i.e., N(µ_Naive, P_Naive) ∝ Π_{i∈I} N_i(µ_i, P_i). The above chain implies that the AA fusion trades off between being conservative and being informative, lying in the middle of the two conditions: (i) all fusing estimators are conservative, and (ii) at least one fusing estimator is conservative.
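A sketch of the quantities in Lemmas 1–3 for Gaussian estimate pairs (illustrative numbers; the min/max in (7)–(8) are compared here by trace, as a scalar summary of the matrix inequalities):

```python
import numpy as np

def aa_and_cu(xs, Ps, w):
    """xs: (n, d) local estimates; Ps: (n, d, d) local covariances; w: normalized weights."""
    x_aa = w @ xs                                        # Eq. (4)
    P_tilde = [P + np.outer(x - x_aa, x - x_aa)          # Eq. (10): adjusted covariances
               for x, P in zip(xs, Ps)]
    P_aa = sum(wi * Pt for wi, Pt in zip(w, P_tilde))    # Eq. (9)
    tr = [np.trace(Pt) for Pt in P_tilde]                # scalar summaries for (7)-(8)
    return x_aa, P_aa, min(tr), max(tr)

xs = np.array([[0.0, 0.0], [2.0, 1.0]])
Ps = np.array([np.eye(2), 2.0 * np.eye(2)])
x_aa, P_aa, tr_lcu, tr_ucu = aa_and_cu(xs, Ps, np.array([0.5, 0.5]))
print(tr_lcu, np.trace(P_aa), tr_ucu)   # tr(P_lCU) <= tr(P_AA) <= tr(P_uCU), cf. (12)
```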
C. Conservative Fusion: More or Less Conservative

In contrast to the AA fusion (5), the geometric average (GA) of the fusing sub-PDFs f_i(x) is given as

    f_GA(x) = C^{−1} Π_{i∈I} f_i^{w_i}(x),    (14)

where C := ∫ Π_{i∈I} f_i^{w_i}(x) dx is the normalization constant. In the linear Gaussian case, with respect to estimate pairs (x̂_i, P_i), i ∈ I, it is given as

    x̂_GA(w) = P_GA Σ_{i∈I} w_i P_i^{−1} x̂_i,    (15)
    P_GA(w) = (Σ_{i∈I} w_i P_i^{−1})^{−1}.    (16)

As a special case of the GA fusion, the covariance intersection (CI) fusion optimizes the fusing weights as follows:

    w_CI = arg min_w Tr(P_GA),    (17)

where Tr(P) calculates the trace (or the determinant) of matrix P. This indicates x̂_CI = x̂_GA(w_CI), P_CI = P_GA(w_CI). A variety of strategies and approaches have been reported for further reducing the error covariance metric, leading to various CI-like fusion approaches such as the so-called split-CI [46] / bounded covariance inflation [47], ellipsoidal intersection [48], and inverse CI (ICI) [49], [50].

It is worth noting that convex combinations were widely considered earlier in the context of adaptive filtering [51]–[54] and of Kalman filters [55], where they are referred to as diffusion. That is, the diffusion combination operation is actually the AA fusion of merely the point state estimates as in (4), without any adjustment of the local error covariance matrices. Instead of using any of the above adjusted/fused covariances, the locally estimated error covariance matrix does not change in the diffusion combination operation at each local node i, i.e., P_diff = P_i ≥ P_CI.

Contrary to this trend, it is our observation that the GA fusion is often not too conservative but rather insufficiently conservative in cluttered scenarios, which may suffer from out-of-sequence measurements [42] or spurious data [43] and model mismatch.
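For two Gaussian estimate pairs, the CI weight (17) reduces to a scalar search; a sketch of (15)–(17) under assumed toy values:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def ga_fuse(x1, P1, x2, P2, w1):
    """Gaussian GA fusion of two estimate pairs, Eqs. (15)-(16)."""
    w2 = 1.0 - w1
    info = w1 * np.linalg.inv(P1) + w2 * np.linalg.inv(P2)
    P_ga = np.linalg.inv(info)
    x_ga = P_ga @ (w1 * np.linalg.inv(P1) @ x1 + w2 * np.linalg.inv(P2) @ x2)
    return x_ga, P_ga

x1, P1 = np.array([0.0, 0.0]), np.diag([1.0, 4.0])
x2, P2 = np.array([2.0, 1.0]), np.diag([3.0, 1.5])

# CI, Eq. (17): pick the weight minimizing the trace of the fused covariance.
res = minimize_scalar(lambda w: np.trace(ga_fuse(x1, P1, x2, P2, w)[1]),
                      bounds=(0.0, 1.0), method='bounded')
x_ci, P_ci = ga_fuse(x1, P1, x2, P2, res.x)
print(res.x, np.trace(P_ci))
```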
The reason is simply that the fused covariance P_CI does not take into account x̂_CI or any fusing state estimate x̂_i, i ∈ I, as both the AA fusion and the CU fusion do, not to mention any higher order moments of the posterior distribution. If any fusing estimator is covariance-inconsistent, the GA fusion will very likely be inconsistent. In this case, a more conservative fusion approach is preferable, such as the AA fusion, or the fast and fault-tolerant convex combination (FFCC) fusion [56] with covariance-inflation parameter δ. Accordingly, we have a second conservative fusion chain:

    P_Naive < P_ICI < P_CI ≤ P_diff ≤ P_AA, P_FFCC,    (18)

where P_diff = P_CI = P_AA holds iff all fusing estimators are identical, and P_CI = P_FFCC holds iff δ = 1.

D. State Estimate MSE of AA f-Fusion

There is a significant difference between EAP and MAP here: only when the EAP estimator is implemented does the AA f-fusion (5) lead to the AA v-fusion (4), i.e.,

    x̂_AA^EAP = ∫ x̃ f_AA(x̃) dx̃ = Σ_{i∈I} w_i ∫ x̃ f_i(x̃) dx̃ = Σ_{i∈I} w_i x̂_i.

Lemma 4. For fusing estimators combined with positive, normalized weights w_i > 0, ∀i ∈ I, Σ_{i∈I} w_i = 1, the MSE of the AA v-fusion (4) is

    MSE_{x̂_AA} = Σ_{i∈I} w_i² MSE_{x̂_i} + 2 Σ_{i<j∈I} w_i w_j Cov(x̂_i, x̂_j),    (24)

where Cov(x̂_i, x̂_j) := E[(x − x̂_i)(x − x̂_j)^T].

Proof. The proof is straightforward by expanding MSE_{x̂_AA} = E[(x − Σ_{i∈I} w_i x̂_i)(·)^T]. ∎

When the fusing estimators are conditionally independent of each other, i.e., Cov(x̂_i, x̂_j) = 0, ∀i ≠ j ∈ I, (24) reduces to

    MSE_{x̂_AA} = Σ_{i∈I} w_i² MSE_{x̂_i}.    (25)

Remark 2. For independent estimators of equivalent covariance P_i = P_j, ∀i ≠ j ∈ I, the optimal fusion is the aforementioned naive fusion, which yields the same fused state estimate x̂_Naive = x̂_AA but a smaller covariance than the AA fusion, except when all fusing estimators are identical. That is, the state estimate of the AA fusion in this case is optimal, but it is not confident and conservatively uses an inflated covariance.

When MSE_{x̂_i} = MSE_{x̂_j} and w_i = w_j, ∀i ≠ j ∈ I, (25) reduces to MSE_{x̂_AA} = (1/|I|) MSE_{x̂_i}, which indicates that the AA fusion can significantly benefit from a lower MSE when all fusing estimators have equivalent MSEs and are conditionally independent of each other, regardless of their respective associated error covariances. In fact, if the fusing estimators are overall negatively correlated, namely Σ_{i<j∈I} w_i w_j Cov(x̂_i, x̂_j) < 0, the fused MSE in (24) is even smaller than that given by (25).
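The behavior described by (24)–(25) is easy to reproduce by simulation; a minimal sketch with two unbiased, independent scalar estimators of equal (assumed) MSE:

```python
import numpy as np

rng = np.random.default_rng(1)
x_true, n = 0.0, 500_000
w = np.array([0.5, 0.5])

e1 = rng.normal(x_true, 1.0, n)          # estimator 1: unbiased, MSE ~ 1
e2 = rng.normal(x_true, 1.0, n)          # estimator 2: independent of estimator 1
x_aa = w[0] * e1 + w[1] * e2             # AA v-fusion, Eq. (4)

mse_i = np.mean((e1 - x_true) ** 2)      # ~1.0
mse_aa = np.mean((x_aa - x_true) ** 2)   # ~0.5 = (1/|I|) MSE_i, cf. (25)
print(mse_i, mse_aa)
```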
E. Mode Preservation

Consider fusing two Gaussian densities, P1 and P2 in Fig. 1, using the naive fusion (13), un-weighted GA fusion (i.e., (15) and (16) using w_1 = w_2 = 0.5), CI fusion (i.e., (15) and (16) using the weight optimized as in (17), which results in w_1 = 0.3764, w_2 = 0.6236 in this case), un-weighted AA fusion (i.e., (4) and (9) using w_1 = w_2 = 0.5) with and without component merging, and two versions of the CU fusion with the fused covariance given as in (7) (referred to as CU max) and in (8) (referred to as CU min), respectively.

Fig. 1. Fusing two Gaussian densities having four different levels of divergence, using naive fusion, GA fusion, AA fusion or CU fusion, all using the same fusing weights. (a): the two densities overlap largely and both estimators are likely to be conservative. (b): the two densities are offset from each other but still overlap somewhat. (c) and (d): the two densities are greatly offset from each other and at most one estimator is conservative. Note: when the fusing weights are optimized as in (17), the GA fusion reduces to the CI fusion, which has a smaller covariance. The naive AA fusion is given by the union of the two re-weighted fusing densities, which may not be merged into one in practice (especially when they are dissimilar/divergent, as in (c)) — this is the unique feature of the AA fusion, which does not always merge pieces of information into one. What is shown in blue is the merged result, which is only reasonable when the components are close/similar, as in (a), but not in (d).

These results confirm the two conservative fusion chains (12) and (18): first, all these conservative fusion approaches lead to obvious covariance inflation compared with the naive fusion; second, the CU and AA fusions show more or less greater inflation than the GA/CI fusion.

In all four scenarios, without knowledge of the true target position, we cannot tell whether any of the seven fusion schemes is better than the others, no matter whether the state estimate is given by the EAP or the MAP. Even in scenario (a) there is no guarantee that the target is localized in the intersection of P1 and P2; if it is not, then both the naive fusion and the GA/CI fusion will likely produce incorrect results. In scenarios (c) and (d), at least one of the local densities to be fused (P1 or P2) is inconsistent in the sense that there is a large offset from the target position, whatever it may be.¹ In these cases, the AA fusion opts not to merge the two components into one, so as to avoid producing incorrect results. This demonstrates a unique feature of the AA fusion: when the fusing distributions diverge significantly from each other (as in (d)), it will not merge them but only re-weight them, preserving the original modes; so is done in existing AA fusion approaches, which outperform their competitors in various scenarios [6]–[17]. We refer to this as the mode-preservation capacity of the AA fusion.

¹This is common in mixture-type filters, since at any specific time-instant the centers of most components do not necessarily conform to the state of any target, even though the mixture filter might be statistically unbiased. This can be easily illustrated with the GM-based random finite set filters, for which Matlab codes are available at http://ba-tuong.vo-au.com/codes.html

Fig. 2. AA fusion with adaptive mixture merging and MAP output performs equivalently to an estimator that adaptively switches between EAP and MAP. [Block diagram: GM/Gaussian inputs 1, ..., n → AA fusion + adaptive mixture merging → MAP state-estimate output.]

There is a theoretical explanation. Recall that the AA and GA fusion rules symmetrically minimize the weighted sum of the directional KL divergences between the fusing probability distributions and the fused result, as follows [16], [58], [59]:

    f_AA(x) = arg min_g Σ_{i∈I} w_i D_KL(f_i || g),    (27)
    f_GA(x) = arg min_{g: ∫_χ g(x)δx = 1} Σ_{i∈I} w_i D_KL(g || f_i).    (28)

It has been pointed out in [60] that the forward KL divergence, as in (28) but not (27), has a tendency towards the merging operation no matter how separated the components are; see the illustrative examples studied therein. Such a tendency may be preferable in applications such as MMSE-based estimation, namely (2), but may lead to a loss of important details of the mixture, e.g., the modes, which is less desirable in applications such as MAP estimation, namely (3).
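The EAP/MAP distinction underlying this argument can be made concrete; a sketch extracting both estimates (2)–(3) from a two-component mixture with illustrative, well-separated parameters — the EAP falls between the peaks, while the MAP stays on the dominant mode:

```python
import numpy as np
from scipy.stats import norm

w, mu, sig = np.array([0.6, 0.4]), np.array([0.0, 6.0]), np.array([1.0, 1.0])

x = np.linspace(-5.0, 11.0, 4001)
f_aa = w[0] * norm.pdf(x, mu[0], sig[0]) + w[1] * norm.pdf(x, mu[1], sig[1])

dx = x[1] - x[0]
x_eap = np.sum(x * f_aa) * dx   # Eq. (2): mixture mean, 0.6*0 + 0.4*6 = 2.4
x_map = x[np.argmax(f_aa)]      # Eq. (3): grid-search mode, near 0 (the heavier peak)
print(x_eap, x_map)
```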
This explains the advantage of the AA fusion in dealing with inconsistent fusing estimators, namely its "fault tolerance". Arguably, when component merging is adaptively applied within the AA fusion and the state estimates are extracted in the MAP mode, they jointly perform equivalently to a scheme that adaptively switches between EAP and MAP for the state estimate output, as illustrated in Fig. 2.

There are other reasons why a local fusing estimator may be grossly incorrect, including supernormal noise, missed detections and false alarms at the respective sensor, while the other fusing estimators may still provide an accurate representation of the target state. At any time-instant, it is also possible that different local estimators provide accurate representations of different targets. In these cases, the AA fusion can be expected to compensate for the effects of local misdetection, false alarms and noise through averaging, and it tends to be more robust and accurate in the sense that its mode-preservation feature makes better use of the correct parts of the information provided by the fusing estimators.
Remark 3. In the sequential filtering problem, new data will help identify which component is correct and which is false and can therefore be pruned. That is, the AA fusion does not forcefully merge/prune conflicting information but leaves the decision to the new data.

III. INFORMATION DIVERGENCE IN A BAYESIAN VIEWPOINT

In the Bayesian formulation, the real state is considered random and the Bayesian posterior is given as an estimate of the true distribution p(x) of x (or p(X) of a multi-target set X). In this section we use the KL divergence to measure the quality of a distribution f(x) with regard to the real state distribution p(x).

A. Mixture Divergence

Lemma 5. For a number of probability distributions f_i(x), i ∈ I, the KL divergence of the target distribution p(x) relative to their average f_AA(x) is given as

    D_KL(f_AA || p) = Σ_{i∈I} w_i (D_KL(f_i || p) − D_KL(f_i || f_AA))    (29)
                    ≤ Σ_{i∈I} w_i D_KL(f_i || p),    (30)

where the equality in (30) holds iff all fusing sub-distributions f_i, i ∈ I, are identical.

Proof. The proof is straightforward and is independently given in [61]. Similar results were given earlier in the textbook [62, Theorem 4.3.2]. Noticing that the KL divergence is a convex function, this can also be proved using Jensen's inequality [63, Ch. 2.6]. ∎

Lemma 5 indicates that the average of the mixture fits the target distribution better than the component sub-distributions do on average. This therefore provides an information-theoretic justification for the AA fusion. Optimized mixing weights will accentuate the benefit of fusion.
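The identity (29) can be verified by Monte Carlo; a sketch for scalar Gaussians with arbitrary, illustrative parameters, estimating each divergence from samples of its first argument:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
w = np.array([0.5, 0.5])
f = [norm(0.0, 1.0), norm(3.0, 2.0)]    # fusing densities f_i
p = norm(1.0, 2.0)                      # a stand-in "target" density p(x)

def log_faa(x):                         # log of the AA mixture
    return np.log(w[0] * f[0].pdf(x) + w[1] * f[1].pdf(x))

n = 400_000
xs = [fi.rvs(n, random_state=rng) for fi in f]          # xs[i] ~ f_i

# RHS of (29): sum_i w_i ( D(f_i||p) - D(f_i||f_AA) ); the log f_i terms cancel.
rhs = sum(w[i] * np.mean(log_faa(xs[i]) - p.logpdf(xs[i])) for i in range(2))

# LHS of (29): D(f_AA||p), sampling the mixture via component indicators.
comp = rng.choice(2, size=n, p=w)
xm = np.where(comp == 0, xs[0], xs[1])
lhs = np.mean(log_faa(xm) - p.logpdf(xm))
print(lhs, rhs)                          # agree up to Monte Carlo error
```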
B. Fusing Weight

The naive weighting solution is the normalized uniform weights [64], [65], namely w = 1/S, with S the number of fusing estimators. That is, all fusing estimators are treated equally, which makes sense for fusing information from homogeneous sources. This is simple but does not distinguish online the information of high quality from that of low quality at any particular time. That being said, it does not necessarily perform worse than online-tuned fusing weights [66].

More convincingly, the optimal solution should minimize D_KL(f_AA || p) in order to best fit the target distribution, i.e.,

    w_opt = arg min_w Σ_{i∈I} w_i (D_KL(f_i || p) − D_KL(f_i || f_AA)).    (31)

As shown above, the component that fits the target distribution better (corresponding to a smaller D_KL(f_i || p)) and diverges more from the average (corresponding to a greater D_KL(f_i || f_AA)) will be assigned a greater fusing weight. However, the true/target distribution p(x) is always unknown, and so is D_KL(f_i || p). It is also obvious that, even if the true distribution p(x) were available, the knowledge D_KL(f_i || p) < D_KL(f_j || p), ∀j ≠ i, does not necessarily result in w_i = 1, w_j = 0, ∀j ≠ i; it also depends on D_KL(f_i || f_j), ∀j ≠ i.

Remark 4. As long as at least two fusing estimators are consistent, the optimal fusion will not be fully dominated by any one component. In other words, when the fusing weights are properly designed, the average of the mixture may fit the target distribution better than the best component.

A simplified alternative is to ignore the former part of (31), which then reduces approximately to the following suboptimal maximization problem:

    w_subopt = arg max_w Σ_{i∈I} w_i D_KL(f_i || f_AA)    (32)
             = arg max_w Σ_{i∈I} w_i (H(f_i, f_AA) − H(f_i)),    (33)

where H(f, g) := −∫ f(x) log g(x) dx is the cross-entropy of distributions f and g, and H(f) := H(f, f) is the differential (Shannon) entropy of distribution f(x).

The suboptimal, practically operable optimization given by (32)/(33) assigns a greater fusing weight to the distribution that diverges more from the others. This can be referred to as a diversity-preference solution. An alternative is to resort to some functionally-similar divergences or metrics to assign higher weights to the components that fit the data better, namely those having a higher likelihood. This likelihood-driven solution is the key idea for weight updating in most mixture models/filters. Nevertheless, one may design the fusing weights for other purposes; e.g., in the context of seeking consensus over a peer-to-peer network [1], [67], they are typically designed to ensure fast convergence [7], [8], [13], [16].

C. Max-Min Optimization

Recall the divergence minimization (27) that the AA fusion admits [13], [59]. Now, combining (27) with (32) yields a joint optimization of the fusing form and the fusing weights as follows:

    f_AA(w_subopt) = arg max_w min_g Σ_{i∈I} w_i D_KL(f_i || g).    (34)

This variational fusion problem (34) resembles that for the geometric average (GA) fusion [68], [69], i.e.,

    f_GA(w_subopt) = arg max_w min_g Σ_{i∈I} w_i D_KL(g || f_i).    (35)

It has actually been pointed out that the suboptimal fusion results of both variational fusion problems have equal KL divergence from/to the fusing sub-distributions [68]. That is, ∀i ≠ j ∈ I,

    D_KL(f_i || f_AA(w_subopt)) = D_KL(f_j || f_AA(w_subopt)),    (36)
    D_KL(f_GA(w_subopt) || f_i) = D_KL(f_GA(w_subopt) || f_j),    (37)

which implies that the suboptimal, diversity-preference fusion tends to revise all fusing estimators equivalently, resulting in a middle distribution, where the AA and GA differ from each other due to the asymmetry of the KL divergence.
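A brute-force sketch of the diversity-preference rule (32) for two scalar Gaussian components (illustrative parameters), scanning the weight simplex and estimating the KL terms by Monte Carlo:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
f = [norm(0.0, 1.0), norm(2.0, 1.0)]                    # fusing densities f_i
xs = [fi.rvs(100_000, random_state=rng) for fi in f]    # fixed samples, xs[i] ~ f_i

def objective(w1):
    """Monte Carlo estimate of sum_i w_i D_KL(f_i || f_AA(w)), Eq. (32)."""
    w = np.array([w1, 1.0 - w1])
    total = 0.0
    for i in range(2):
        log_faa = np.log(w[0] * f[0].pdf(xs[i]) + w[1] * f[1].pdf(xs[i]))
        total += w[i] * np.mean(f[i].logpdf(xs[i]) - log_faa)
    return total

grid = np.linspace(0.01, 0.99, 99)
print(grid[np.argmax([objective(w1) for w1 in grid])])   # ~0.5 here, by symmetry
```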
Remark 5. The above max-min solution is suboptimal: it ignores the minimization over Σ_{i∈I} w_i D_KL(f_i || p) and prefers diversity. A derivation for (37) was given earlier in [35], [70], which is related to the Chernoff information [63], [71]. A weighted middle is suggested in [72], assigning different weights to the two sides of (37).

D. Case Study: Gaussian Fusion

We now consider the case of the Gaussian distribution, denoted by N(x; µ, P) with n_x-dimensional mean vector µ and error covariance matrix P. The probability density function of the n_x-dimensional Gaussian distribution N(x; µ, P) is given by

    N(x; µ, P) = (2π)^{−n_x/2} |P|^{−1/2} exp(−½ (x − µ)^T P^{−1} (x − µ)).

The KL divergence of f_1(x) := N(x; µ_1, P_1) relative to f_2(x) := N(x; µ_2, P_2) is given as

    D_KL(N(x; µ_1, P_1) || N(x; µ_2, P_2)) = ½ (tr(P_2^{−1} P_1) − n_x + log [det(P_2)/det(P_1)] + (µ_1 − µ_2)^T P_2^{−1} (µ_1 − µ_2)).    (38)

The KL divergence between Gaussians follows a relaxed triangle inequality, and a small KL divergence further shows approximate symmetry [73]. However, for the KL divergence between two GMs there is no such closed-form expression. Beyond the Monte Carlo method [74], a number of approximate, exactly-expressed approaches have been investigated [75]. In the following we consider two alternatives.

1) Moment Matching-Based Approximation: The first is merging the mixture into a single Gaussian, that is, fitting the GM by a single Gaussian. Then the divergence between two GMs, or between a Gaussian distribution and a GM, can be approximated by that between their best-fitting single Gaussian distributions via (38). As given in Lemma 3, the moment-fitting Gaussian for a GM consisting of a number of Gaussian distributions f_i(x) = N(x; µ_i, P_i), i ∈ I, is f_AA,merged(x) = N(x; µ_AA, P_AA), where µ_AA = Σ_{i∈I} w_i µ_i and P_AA = Σ_{i∈I} w_i (P_i + (µ_AA − µ_i)(·)^T). Then, one uses f_AA,merged(x) to approximately fit the target Gaussian distribution p(x) = N(x; µ, P), yielding

    w_opt ≈ arg min_w D_KL(N(x; µ_AA, P_AA) || N(x; µ, P))
          = arg min_w (tr(P^{−1} P_AA) − n_x + log [det(P)/det(P_AA)] + (µ_AA − µ)^T P^{−1} (µ_AA − µ)).    (39)

When the target distribution is unknown and the diversity-preference solution (32) is adopted, (39) approximately reduces to

    w_subopt ≈ arg max_w Σ_{i∈I} w_i D_KL(f_i || f_AA,merged)
             = arg max_w Σ_{i∈I} w_i (tr(P_AA^{−1} P_i) − n_x + log [det(P_AA)/det(P_i)] + (µ_i − µ_AA)^T P_AA^{−1} (µ_i − µ_AA)).    (40)

Both (39) and (40) can be solved exactly. However, such an approximation based on fitting the GM by a single Gaussian is poor in accuracy [75]. As explained in Sec. II-E and in [60], the merging may lose important details of the mixture, such as the sub-peaks and modes. A more general result on the KL divergence between multivariate generalized Gaussian distributions can be found in [76]. The KL divergence between two Bernoulli random finite set distributions with Gaussian single-target densities is given in [77].
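A direct implementation of (38); combined with the moment matching of Lemma 3, it gives the single-Gaussian approximation used inside (39)–(40) (toy parameters; the "target" Gaussian is assumed for illustration):

```python
import numpy as np

def kl_gauss(mu1, P1, mu2, P2):
    """Closed-form D_KL( N(mu1,P1) || N(mu2,P2) ), Eq. (38)."""
    d = mu1.shape[0]
    P2inv = np.linalg.inv(P2)
    dm = mu1 - mu2
    return 0.5 * (np.trace(P2inv @ P1) - d
                  + np.log(np.linalg.det(P2) / np.linalg.det(P1))
                  + dm @ P2inv @ dm)

# Moment-match a two-component GM (Lemma 3), then compare with a target Gaussian.
w = np.array([0.5, 0.5])
mus = np.array([[0.0, 0.0], [2.0, 0.0]])
Ps = np.array([np.eye(2), np.eye(2)])
mu_aa = w @ mus
P_aa = sum(wi * (P + np.outer(m - mu_aa, m - mu_aa)) for wi, m, P in zip(w, mus, Ps))

mu_t, P_t = np.array([1.0, 0.0]), 2.0 * np.eye(2)   # assumed target N(mu, P)
print(kl_gauss(mu_aa, P_aa, mu_t, P_t))             # the approximation inside (39)
```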
2 3, the moment fitting Gaussian for a GM consisting of a j∈I number of Gaussian distributions fi (x) = N (x; µi , Pi ), i ∈ (44) P is fAA,merged (x) P= N (x; µAA , PAA ), where I T µAA = This is useful for solving the suboptimal optimization prob- w i∈I i iµ , PAA = w i∈I i Pi + (µ AA − µ i )(·) . lem (32) as maximizing the lower bounds implies maximizing Then, one uses fAA,merged (x) for approximately fitting the the content. That is, target Gaussian distribution p(x) = N (x; µ, P), yielding X wsubopt = arg max wi DKL (fi kfAA ), w i∈I wopt ≈ arg min DKL N (x; µAA , PAA )kN (x; µ, P) " X w X det(P) ≈ arg min wi log wj zi,j w = arg min tr P−1 PAA − nx + log i∈I j∈I w det(PAA ) # log (2πe)nx |Pi | + (µAA − µ)T P−1 (µAA − µ) . (39) + . (45) 2
Bounds for the entropy of, and the KL divergence between, GMs can be found in, e.g., [75], [78]–[82], and can therefore be used in (33) or (32) for approximate sub-optimization. Of course, other divergences may be considered in lieu of the KL divergence, such as the Jensen-Rényi divergence [83], the integral square error [84], [85], and so on; see [82].

IV. CONCLUSION

This paper explains how the AA fusion maintains covariance consistency, improves accuracy and preserves important mode information, how it is information-theoretically sound, and how the fusing weights should be properly designed. Its connection with classic conservative fusion approaches is exposed. These findings are expected to fill the gap in the theoretical study of this fundamental information fusion approach and to promote its further development.

It is interesting, albeit challenging, to extend the AA fusion to information fusion regarding state trajectories of different spatial-temporal dimensions, whether based on discrete labels [26], [86] or continuous-time curves [87], [88].

REFERENCES

[1] R. Olfati-Saber, J. A. Fax, and R. M. Murray, "Consensus and cooperation in networked multi-agent systems," Proc. IEEE, vol. 95, no. 1, pp. 215–233, Jan. 2007.
[2] A. H. Sayed, "Adaptation, learning, and optimization over networks," Found. Trends Mach. Learn., vol. 7, no. 4-5, pp. 311–801, 2014.
[3] S. H. Javadi and A. Farina, "Radar networks: A review of features and challenges," Information Fusion, vol. 61, pp. 48–55, 2020.
[4] K. Da, T. Li, Y. Zhu, H. Fan, and Q. Fu, "Recent advances in multisensor multitarget tracking using random finite set," Front. Inform. Technol. Electron. Eng., vol. 22, no. 1, pp. 5–24, 2021.
[5] G. McLachlan and D. Peel, Finite Mixture Models, ser. Wiley Series in Probability and Statistics. New York, USA: Wiley, 2000.
[6] T. Li, J. Corchado, and S. Sun, "On generalized covariance intersection for distributed PHD filtering and a simple but better alternative," in Proc. FUSION 2017, Xi'an, China, Jul. 2017, pp. 808–815.
[7] ——, "Partial consensus and conservative fusion of Gaussian mixtures for distributed PHD fusion," IEEE Trans. Aerosp. Electron. Syst., vol. 55, no. 5, pp. 2150–2163, Oct. 2019.
[8] T. Li and F. Hlawatsch, "A distributed particle-PHD filter using arithmetic-average fusion of Gaussian mixture parameters," Information Fusion, vol. 73, pp. 111–124, 2021.
[9] T. Li, V. Elvira, H. Fan, and J. M. Corchado, "Local-diffusion-based distributed SMC-PHD filtering using sensors with limited sensing range," IEEE Sensors J., vol. 19, no. 4, pp. 1580–1589, Feb. 2019.
[10] A. K. Gostar, R. Hoseinnezhad, and A. Bab-Hadiashar, "Cauchy-Schwarz divergence-based distributed fusion with Poisson random finite sets," in Proc. ICCAIS 2017, Chiang Mai, Thailand, Oct. 2017, pp. 112–116.
[11] H. Kim, K. Granström, L. Gao, G. Battistelli, S. Kim, and H. Wymeersch, "5G mmWave cooperative positioning and mapping using multi-model PHD filter and map fusion," IEEE Trans. Wireless Commun., vol. 19, no. 6, pp. 3782–3795, 2020.
[12] R. K. Ramachandran, N. Fronda, and G. Sukhatme, "Resilience in multi-robot multi-target tracking with unknown number of targets through reconfiguration," IEEE Trans. Contr. Netw. Syst., 2021, in press.
[13] T. Li, Z. Liu, and Q. Pan, "Distributed Bernoulli filtering for target detection and tracking based on arithmetic average fusion," IEEE Signal Process. Lett., vol. 26, no. 12, pp. 1812–1816, Dec. 2019.
[14] K. Da, T. Li, Y. Zhu, and Q. Fu, "Gaussian mixture particle jump-Markov-CPHD fusion for multitarget tracking using sensors with limited views," IEEE Trans. Signal Inform. Process. Netw., vol. 6, pp. 605–616, Aug. 2020.
[15] L. Gao, G. Battistelli, and L. Chisci, "Multiobject fusion with minimum information loss," IEEE Signal Process. Lett., vol. 27, pp. 201–205, Jan. 2020.
[16] T. Li, X. Wang, Y. Liang, and Q. Pan, "On arithmetic average fusion and its application for distributed multi-Bernoulli multitarget tracking," IEEE Trans. Signal Process., vol. 68, pp. 2883–2896, 2020.
[17] H. V. Nguyen, H. Rezatofighi, B.-N. Vo, and D. C. Ranasinghe, "Distributed multi-object tracking under limited field of view sensors," IEEE Trans. Signal Process., 2021, DOI: 10.1109/TSP.2021.3103125.
[18] T. Li, H. Fan, J. García, and J. M. Corchado, "Second-order statistics analysis and comparison between arithmetic and geometric average fusion: Application to multi-sensor target tracking," Information Fusion, vol. 51, pp. 233–243, 2019.
[19] R. P. S. Mahler, Advances in Statistical Multisource-Multitarget Information Fusion, ser. Electronic Warfare. Artech House, 2014.
[20] B.-N. Vo, M. Mallick, Y. Bar-Shalom, S. Coraluppi, R. Osborne, R. Mahler, and B.-T. Vo, "Multitarget tracking," in Wiley Encyclopedia of Electrical and Electronics Engineering. John Wiley & Sons, 2015.
[21] S. R. Dalal and W. J. Hall, "Approximating priors by mixtures of natural conjugate priors," J. Roy. Statist. Soc.: Ser. B (Methodological), vol. 45, no. 2, pp. 278–286, 1983.
[22] P. Diaconis and D. Ylvisaker, "Quantifying prior opinion," Report Number: EFS NSF207, Oct. 1983.
[23] H. Sorenson and D. Alspach, "Recursive Bayesian estimation using Gaussian sums," Automatica, vol. 7, no. 4, pp. 465–479, 1971.
[24] L. Pishdad and F. Labeau, "Analytic MMSE bounds in linear dynamic systems with Gaussian mixture noise statistics," arXiv:1506.07603, 2015.
[25] D. Peel and G. McLachlan, "Robust mixture modelling using the t distribution," Stat. Comput., vol. 10, no. 4, pp. 339–348, 2000.
[26] B.-T. Vo and B.-N. Vo, "Labeled random finite sets and multi-object conjugate priors," IEEE Trans. Signal Process., vol. 61, no. 13, pp. 3460–3475, Jul. 2013.
[27] A. F. García-Fernández, J. L. Williams, K. Granström, and L. Svensson, "Poisson multi-Bernoulli mixture filter: Direct derivation and implementation," IEEE Trans. Aerosp. Electron. Syst., vol. 54, no. 4, pp. 1883–1901, 2018.
[28] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Upper Saddle River, NJ, USA: Prentice-Hall, 1993.
[29] X. Li, Y. Zhu, J. Wang, and C. Han, "Optimal linear estimation fusion—Part I: Unified fusion rules," IEEE Trans. Inform. Theory, vol. 49, no. 9, pp. 2192–2208, 2003.
[30] Y. Bar-Shalom and L. Campo, "The effect of the common process noise on the two-sensor fused-track covariance," IEEE Trans. Aerosp. Electron. Syst., vol. AES-22, no. 6, pp. 803–805, Nov. 1986.
[31] C. Chong, S. Mori, and K. Chang, "Distributed multitarget multisensor tracking," in Multitarget Multisensor Tracking: Advanced Applications, Y. Bar-Shalom, Ed. Artech House, 1990.
[32] S.-L. Sun, "Distributed optimal linear fusion estimators," Information Fusion, vol. 63, pp. 56–73, 2020.
[33] J. K. Uhlmann, "Dynamic map building and localization: New theoretical foundations," Ph.D. dissertation, University of Oxford, UK, 1995.
[34] ——, "Covariance consistency methods for fault-tolerant distributed data fusion," Inf. Fusion, vol. 4, no. 3, pp. 201–215, 2003.
[35] S. J. Julier, "An empirical study into the use of Chernoff information for robust, distributed fusion of Gaussian mixture models," in Proc. FUSION 2006, Florence, Italy, Jul. 2006.
[36] O. Bochardt, R. Calhoun, J. K. Uhlmann, and S. J. Julier, "Generalized information representation and compression using covariance union," in Proc. FUSION 2006, Florence, Italy, Jul. 2006, pp. 1–7.
[37] X. R. Li and Z. Zhao, "Measuring estimator's credibility: Noncredibility index," in Proc. FUSION 2006, 2006, pp. 1–8.
[38] J. Ajgl and M. Šimandl, "Conservativeness of estimates given by probability density functions: Formulation and aspects," Inf. Fusion, vol. 20, pp. 117–128, 2014.
[39] S. Lubold and C. N. Taylor, "Formal definitions of conservative PDFs," arXiv:1912.06780v2, 2019.
[40] B. Khaleghi, A. Khamis, F. O. Karray, and S. N. Razavi, "Multisensor data fusion: A review of the state-of-the-art," Information Fusion, vol. 14, no. 1, pp. 28–44, 2013.
[41] T. Bailey, S. Julier, and G. Agamennoni, "On conservative fusion of information with unknown non-Gaussian dependence," in Proc. FUSION 2012, Singapore, Jul. 2012, pp. 1876–1883.
[42] S. Julier and J. Uhlmann, "Fusion of time delayed measurements with uncertain time delays," in Proc. ACC 2005, 2005, vol. 6, pp. 4028–4033.
[43] X. Wang, S. Sun, T. Li, and Y. Liu, "Fault tolerant multi-robot cooperative localization based on covariance union," IEEE Robot. Autom. Lett., vol. 6, no. 4, pp. 7799–7806, 2021.
[44] D. J. Salmond, "Mixture reduction algorithms for point and extended object tracking in clutter," IEEE Trans. Aerosp. Electron. Syst., vol. 45, no. 2, pp. 667–686, 2009.
[45] A. R. Runnalls, "Kullback-Leibler approach to Gaussian mixture reduction," IEEE Trans. Aerosp. Electron. Syst., vol. 43, no. 3, pp. 989–999, Jul. 2007.
[46] S. Julier and J. Uhlmann, "General decentralized data fusion with covariance intersection (CI)," in Handbook of Data Fusion, D. Hall and J. Llinas, Eds. Boca Raton, FL, USA: CRC Press, 2001, ch. 12, pp. 1–25.
[47] S. Reece and S. Roberts, "Robust, low-bandwidth, multi-vehicle mapping," in Proc. FUSION 2005, vol. 2, 2005.
[48] J. Sijs and M. Lazar, "State fusion with unknown correlation: Ellipsoidal intersection," Automatica, vol. 48, no. 8, pp. 1874–1878, 2012.
[49] B. Noack, J. Sijs, M. Reinhardt, and U. D. Hanebeck, "Decentralized data fusion with inverse covariance intersection," Automatica, vol. 79, pp. 35–41, 2017.
[50] B. Noack, U. Orguner, and U. D. Hanebeck, "Nonlinear decentralized data fusion with generalized inverse covariance intersection," in Proc. FUSION 2019, 2019, pp. 1–7.
[51] J. Arenas-García, A. Figueiras-Vidal, and A. Sayed, "Mean-square performance of a convex combination of two adaptive filters," IEEE Trans. Signal Process., vol. 54, no. 3, pp. 1078–1090, 2006.
[52] Y. Zhang and J. Chambers, "Convex combination of adaptive filters for a variable tap-length LMS algorithm," IEEE Signal Process. Lett., vol. 13, no. 10, pp. 628–631, 2006.
[53] N. J. Bershad, J. C. M. Bermudez, and J.-Y. Tourneret, "An affine combination of two LMS adaptive filters—Transient mean-square analysis," IEEE Trans. Signal Process., vol. 56, no. 5, pp. 1853–1864, 2008.
[54] N. Takahashi, I. Yamada, and A. H. Sayed, "Diffusion least-mean squares with adaptive combiners: Formulation and performance analysis," IEEE Trans. Signal Process., vol. 58, no. 9, pp. 4795–4810, 2010.
[55] F. S. Cattivelli and A. H. Sayed, "Diffusion strategies for distributed Kalman filtering and smoothing," IEEE Trans. Autom. Control, vol. 55, no. 9, pp. 2069–2084, 2010.
[56] Y. Wang and X. R. Li, "A fast and fault-tolerant convex combination fusion algorithm under unknown cross-correlation," in Proc. FUSION 2009, Seattle, WA, USA, Jul. 2009, pp. 571–578.
[57] R. P. S. Mahler, "The multisensor PHD filter: II. Erroneous solution via Poisson magic," in Proc. SPIE, vol. 7336, 2009, pp. 7336–12.
[58] A. E. Abbas, "A Kullback-Leibler view of linear and log-linear pools," Decision Analysis, vol. 6, no. 1, pp. 25–37, 2009.
[59] K. Da, T. Li, Y. Zhu, H. Fan, and Q. Fu, "Kullback-Leibler averaging for multitarget density fusion," in Proc. DCAI 2019, Ávila, Spain, Jun. 2019, pp. 253–261.
[60] T. Ardeshiri, U. Orguner, and E. Özkan, "Gaussian mixture reduction using reverse Kullback-Leibler divergence," arXiv:1508.05514, 2015.
[61] T. Li, "Why mixture?" techRxiv:15082113, 2021.
[62] R. E. Blahut, Principles and Practice of Information Theory. USA: Addison-Wesley Longman, 1987.
[63] T. M. Cover and J. A. Thomas, Elements of Information Theory. John Wiley & Sons, 2001.
[64] T. Li, V. Elvira, H. Fan, and J. M. Corchado, "Local-diffusion-based distributed SMC-PHD filtering using sensors with limited sensing range," IEEE Sensors J., vol. 19, no. 4, pp. 1580–1589, Feb. 2019.
[65] T. Li, K. Da, Z. Liu, X. Wang, and Y. Liang, "Best fit of mixture for computationally efficient Poisson multi-Bernoulli mixture filtering," DOI: 10.36227/techrxiv.12351710, 2021, under revision in IEEE TSP.
[66] M. Üney, D. E. Clark, and S. J. Julier, "Information measures in distributed multitarget tracking," in Proc. FUSION 2011, Chicago, IL, USA, Jul. 2011.
[67] L. Xiao and S. Boyd, "Fast linear iterations for distributed averaging," Syst. Control Lett., vol. 53, no. 1, pp. 65–78, 2004.
[68] F. Nielsen, "An information-geometric characterization of Chernoff information," IEEE Signal Process. Lett., vol. 20, no. 3, pp. 269–272, 2013.
[69] M. Üney, J. Houssineau, E. Delande, S. J. Julier, and D. E. Clark, "Fusion of finite set distributions: Pointwise consistency and global cardinality," IEEE Trans. Aerosp. Electron. Syst., vol. 55, no. 6, pp. 2759–2773, 2019.
[70] M. B. Hurley, "An information theoretic justification for covariance intersection and its generalization," in Proc. FUSION 2002, Annapolis, MD, USA, Jul. 2002, pp. 505–511.
[71] N. R. Ahmed and M. Campbell, "Fast consistent Chernoff fusion of Gaussian mixtures for ad hoc sensor networks," IEEE Trans. Signal Process., vol. 60, no. 12, pp. 6739–6745, 2012.
[72] N. Ahmed, J. R. Schoenberg, and M. E. Campbell, Fast Weighted Exponential Product Rules for Robust General Multi-Robot Data Fusion. MIT Press, 2013, pp. 9–16.
[73] Y. Zhang, W. Liu, Z. Chen, J. Wang, and K. Li, "On the properties of Kullback-Leibler divergence between Gaussians," arXiv:2102.05485, 2021.
[74] J.-Y. Chen, J. R. Hershey, P. A. Olsen, and E. Yashchin, "Accelerated Monte Carlo for Kullback-Leibler divergence between Gaussian mixture models," in Proc. ICASSP 2008, pp. 4553–4556.
[75] J. R. Hershey and P. A. Olsen, "Approximating the Kullback Leibler divergence between Gaussian mixture models," in Proc. ICASSP 2007, vol. 4, pp. IV-317–IV-320.
[76] N. Bouhlel and A. Dziri, "Kullback-Leibler divergence between multivariate generalized Gaussian distributions," IEEE Signal Process. Lett., vol. 26, no. 7, pp. 1021–1025, 2019.
[77] M. Fontana, A. F. García-Fernández, and S. Maskell, "Bernoulli merging for the Poisson multi-Bernoulli mixture filter," in Proc. FUSION 2020, Rustenburg, South Africa, 2020, pp. 1–8.
[78] M. F. Huber, T. Bailey, H. Durrant-Whyte, and U. D. Hanebeck, "On entropy approximation for Gaussian mixture random vectors," in Proc. IEEE MFI 2008, pp. 181–188.
[79] J. V. Michalowicz, J. M. Nichols, and F. Bucholtz, "Calculation of differential entropy for a mixed Gaussian distribution," Entropy, vol. 10, no. 3, pp. 200–206, 2008.
[80] J.-L. Durrieu, J.-P. Thiran, and F. Kelly, "Lower and upper bounds for approximation of the Kullback-Leibler divergence between Gaussian mixture models," in Proc. ICASSP 2012, pp. 4833–4836.
[81] S. Cui and M. Datcu, "Comparison of Kullback-Leibler divergence approximation methods between Gaussian mixture models for satellite image retrieval," in Proc. IGARSS 2015, pp. 3719–3722.
[82] F. Nielsen and K. Sun, "Guaranteed bounds on the Kullback-Leibler divergence of univariate mixtures," IEEE Signal Process. Lett., vol. 23, no. 11, pp. 1543–1546, 2016.
[83] F. Wang, T. Syeda-Mahmood, B. Vemuri, D. Beymer, and A. Rangarajan, "Closed-form Jensen-Rényi divergence for mixture of Gaussians and applications to group-wise shape registration," Med. Image Comput. Comput. Assist. Interv., vol. 12, no. Pt 1, pp. 648–655, 2009.
[84] D. W. Scott and W. F. Szewczyk, "From kernels to mixtures," Technometrics, vol. 43, no. 3, pp. 323–335, 2001.
[85] J. L. Williams and P. S. Maybeck, "Cost-function-based hypothesis control techniques for multiple hypothesis tracking," Math. Comput. Model., vol. 43, no. 9-10, pp. 976–989, 2006.
[86] T. Kropfreiter, F. Meyer, and F. Hlawatsch, "A fast labeled multi-Bernoulli filter using belief propagation," IEEE Trans. Aerosp. Electron. Syst., vol. 56, no. 3, pp. 2478–2488, Jun. 2020.
[87] T. Li, H. Chen, S. Sun, and J. M. Corchado, "Joint smoothing and tracking based on continuous-time target trajectory function fitting," IEEE Trans. Autom. Sci. Eng., vol. 16, no. 3, pp. 1476–1483, 2019.
[88] T. Li and H. Fan, "A computationally efficient approach to non-cooperative target detection and tracking with almost no a-priori information," arXiv:2104.11122, 2021.