Observational Methods - Lecture 9: Data Analysis WS 2020/21 Prof Joe Mohr, LMU
Observational Methods, Lecture 9: Data Analysis. Prof Joe Mohr, LMU. WS 2020/21, 12. Feb 2021
Summary
• Poisson and Gaussian Noise
• Fitting Data and Confidence Intervals
• Mock Datasets
Poisson Noise
• A source of constant flux has a fixed probability per unit time of a photon arriving
o The Poisson distribution describes the probability of observing x photons when the expectation is µ:
P(x | µ) = (µ^x / x!) e^{−µ}
o At low expectation there is significant asymmetry in the Poisson distribution
o The probability of detecting 0 photons is still significant when the expectation is 3 or 4
• Poisson noise is generally termed "sampling noise"
o Applied broadly to physical processes
[Figure: Poisson distributions for µ = 3, 6, 10.3]
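As a quick check of the claims above, the Poisson probabilities can be evaluated directly from the formula; a minimal sketch in Python (the function name is mine):

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """Probability of observing x events when the expectation is mu."""
    return mu**x * exp(-mu) / factorial(x)

# With an expectation of 3, detecting zero photons is still quite likely:
p0 = poisson_pmf(0, 3.0)   # e^{-3}, about 5% of the time

# Asymmetry at low expectation: P(1|3) and P(5|3) differ,
# even though both are 2 counts away from the expectation of 3
p_low, p_high = poisson_pmf(1, 3.0), poisson_pmf(5, 3.0)
```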
Normal or Gaussian Distribution
• The Gaussian or normal distribution is described as:
P(x | µ, σ) = (1/(σ√(2π))) e^{−(1/2)((x−µ)/σ)²}
o The expectation is µ and the FWHM is 2.354σ
o Note the normalization
• The distribution is symmetric about the mean µ:
o 68% of the integral lies within µ ± 1σ
o 95% within µ ± 2σ
o 99.7% within µ ± 3σ
o 6.3×10⁻⁵ lies outside µ ± 4σ
o 5.7×10⁻⁷ outside µ ± 5σ
o 2.0×10⁻⁹ outside µ ± 6σ
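The enclosed fractions quoted above follow from integrating the Gaussian; they can be reproduced with the standard error function, with no special libraries needed:

```python
from math import erf, sqrt

def enclosed_fraction(k):
    """Fraction of a Gaussian lying within mu +/- k*sigma:
    the integral of the normal distribution equals erf(k/sqrt(2))."""
    return erf(k / sqrt(2.0))

f1 = enclosed_fraction(1)   # ~0.683
f2 = enclosed_fraction(2)   # ~0.954
f3 = enclosed_fraction(3)   # ~0.9973
```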
Poisson Noise in the Gaussian Limit
• In the limit µ > 10 the Poisson distribution can be approximated as a Gaussian or normal distribution with mean µ and σ = √µ
• Out in the tails of the distribution the differences are (much) larger.
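Both statements can be verified numerically: near the mean the two distributions already agree at the percent level for µ = 20, while several σ out the Poisson tail is substantially heavier (a sketch, with µ and the comparison points chosen for illustration):

```python
from math import exp, factorial, sqrt, pi

def poisson(x, mu):
    return mu**x * exp(-mu) / factorial(x)

def gauss(x, mu):
    s = sqrt(mu)  # Gaussian approximation: sigma = sqrt(mu)
    return exp(-0.5 * ((x - mu) / s)**2) / (s * sqrt(2 * pi))

mu = 20.0
core_ratio = poisson(20, mu) / gauss(20, mu)  # close to 1 near the mean
tail_ratio = poisson(40, mu) / gauss(40, mu)  # Poisson tail several times higher
```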
Variance and Standard Deviation
• The width of a distribution indicates the range of values obtained within a set of measurements
• The variance σ² is the mean squared deviation from the mean. For an ensemble of n points drawn from a distribution with known mean µ:
σ² ≡ (1/n) Σ_{i=1}^{n} (x_i − µ)²   (1)
o There is a slight difference when the mean is extracted from the dataset itself:
σ² ≡ (1/(n−1)) Σ_{i=1}^{n} (x_i − µ_meas)²   (2)
o For a probability distribution P(x), rather than a sample drawn from the distribution, one calculates the variance as:
σ² ≡ ∫_{−∞}^{+∞} dx (x − µ)² P(x),  with  µ ≡ ∫_{−∞}^{+∞} dx x P(x)   (3)
• The standard deviation σ is the square root of the variance
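The difference between Eq. 1 (mean known a priori) and Eq. 2 (mean estimated from the same data) is just the 1/n versus 1/(n−1) normalization; a small sketch on toy data:

```python
def variance_known_mean(data, mu):
    # Eq. 1: the true mean mu is known a priori
    return sum((x - mu)**2 for x in data) / len(data)

def sample_variance(data):
    # Eq. 2: the mean is estimated from the same data,
    # hence the n-1 normalization (Bessel's correction)
    n = len(data)
    m = sum(data) / n
    return sum((x - m)**2 for x in data) / (n - 1)

data = [1.0, 2.0, 3.0, 4.0, 5.0]
v1 = variance_known_mean(data, 3.0)   # 10/5 = 2.0
v2 = sample_variance(data)            # 10/4 = 2.5
```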
Gaussian Distribution
• The variance of the Gaussian distribution,
σ² ≡ ∫_{−∞}^{+∞} dx (x − µ)² P(x),
is in fact the σ_w² that appears in the distribution itself:
σ² = ∫_{−∞}^{+∞} dx (x − µ)² (1/(σ_w√(2π))) e^{−(1/2)((x−µ)/σ_w)²} = σ_w²
• The standard deviation of the Gaussian sets the standard for discussing measurement significance
Skewness and Kurtosis
• In addition to the first moment (mean) and second moment (variance) of a distribution P(x), the third moment (skewness) and fourth moment (kurtosis) are also often valuable:
Skewness ≡ ∫_{−∞}^{+∞} dx (x − µ)³ P(x)
Kurtosis ≡ ∫_{−∞}^{+∞} dx (x − µ)⁴ P(x) − 3
• Skewness measures the asymmetry of the distribution about the mean
• Kurtosis measures the extent of a distribution around the mean with respect to the Gaussian distribution
Poisson Distribution
• The variance of the Poisson distribution is σ² = µ
• Consider a source of a particular flux f and two observations of length t₁ and 2t₁:
o The photon numbers are N₁ = f·t₁ and N₂ = 2N₁
o The standard deviations are σ₁ = √N₁ and σ₂ = √N₂ = √2 σ₁
o So the absolute noise is higher for sources that are observed with more photons
o Importantly, the fractional noise σ/N drops the more counts one obtains:
σ₁/N₁ = 1/√N₁  and  σ₂/N₂ = 1/√N₂ = (1/√2)(σ₁/N₁)
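The scaling above can be checked directly (the flux and exposure values are illustrative):

```python
from math import sqrt

flux, t1 = 10.0, 10.0      # hypothetical count rate and exposure time
N1 = flux * t1             # 100 photons in the first observation
N2 = 2 * N1                # doubled exposure: 200 photons

sigma1, sigma2 = sqrt(N1), sqrt(N2)
# Absolute noise grows with counts:      sigma2 = sqrt(2) * sigma1
# Fractional noise drops with counts:    sigma2/N2 = (sigma1/N1) / sqrt(2)
```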
Mapping and Sample Purity
• A common application of Gaussian statistics is source finding in maps whose noise is Gaussian. Gaussian noise is not so uncommon:
o A good example would be mm-wave maps (like those from SPT)
o Optical/NIR images where background count levels are high
• Why on earth might one ever be interested in restricting to 5 sigma sources?
o The probability of the Gaussian distribution delivering such an outlier is ~6×10⁻⁷
o Typically in mapping experiments the solid angle mapped encompasses many PSFs or beams, so even extremely rare events are possible
• Example: the SPT beam is ~1 arcmin², so there are 3600 independent beams per deg². In a survey of 2500 deg² there are then ~5 noise fluctuations above 5σ expected
o Because SZE selected galaxy clusters are quite rare (~500 over the same area), a restriction to 5σ is required to keep contamination at the ~5/500 = 1% level
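The SPT number quoted above can be reproduced from the Gaussian tail probability, here the two-sided ~6×10⁻⁷ figure via the complementary error function:

```python
from math import erfc, sqrt

beams_per_deg2 = 3600                      # ~1 arcmin^2 beam
survey_deg2 = 2500
n_beams = beams_per_deg2 * survey_deg2     # 9 million independent beams

# Two-sided probability of a noise deviate beyond 5 sigma, ~5.7e-7
p_outlier = erfc(5 / sqrt(2.0))

expected_false = n_beams * p_outlier       # ~5 spurious 5 sigma peaks
```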
Fitting to a Model
• Comparison of data to theory can be carried out using a least squares fit:
χ² ≡ Σ_{i=1}^{n} [(y_{i,obs} − y_mod(x_i)) / σ_i]²
• One chooses the model parameters such that χ² is minimized.
• Where does this come from?
o In the limit of Gaussian measurement uncertainties σ_i and n measurements, we can construct the likelihood of the measurements and model as the product of the individual Gaussian probabilities:
L ≡ Π_{i=1}^{n} (1/(σ_i√(2π))) e^{−(1/2)[(y_{i,obs} − y_mod(x_i))/σ_i]²}
−2 ln L = Σ_{i=1}^{n} [(y_{i,obs} − y_mod(x_i))/σ_i]² + 2 Σ_{i=1}^{n} ln(σ_i√(2π))
o The last term does not depend on the model parameters if σ_i is known and so can be dropped; minimizing χ² is then equivalent to maximizing the likelihood
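A minimal worked example of the χ² statistic above, fitting a one-parameter model y = a·x by brute-force grid search (the data here are illustrative and noiseless, so the minimum lands exactly on the true slope):

```python
def chi2(a, xs, ys, sigmas):
    """Sum of squared, sigma-weighted deviations for the model y = a*x."""
    return sum(((y - a * x) / s)**2 for x, y, s in zip(xs, ys, sigmas))

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]          # data drawn from y = 2x with no noise
sigmas = [1.0, 1.0, 1.0]

grid = [1.0 + 0.01 * i for i in range(201)]   # candidate slopes 1.00 .. 3.00
best = min(grid, key=lambda a: chi2(a, xs, ys, sigmas))
```

Real fits would use a proper minimizer rather than a grid, but the principle, choosing the parameter that minimizes χ², is the same.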
Minimizing χ²
• Any model is possible. In general one must have the number of observations n be large compared to the number of model parameters p, or else the parameter values are not well constrained
• There are many tools that have been developed to find the best fit:
o Matrix inversion, including singular value decomposition, for linear systems
o Simplex minimization (e.g. Amoeba)
o Methods that explicitly use the functional form of the gradient
o The Levenberg-Marquardt iterative method
• Least squares fitting has been justified here within the context of Gaussian, independent errors. It is nevertheless useful in a much broader context
Fitting to a Model in the Poisson Limit
• Often in astronomy one is working in the Poisson limit, where the number of detected photons or objects is subject to Poisson noise and the expectation value is low enough that one cannot apply Gaussian statistics
o Consider a study of the galaxy luminosity function of a galaxy cluster, at least on the bright end
• Even in situations where there is plenty of data, one often ends up working in the Poisson limit:
o Examining trends in the behavior of the sample with mass, redshift or some other property often drives the analysis of smaller and smaller subsets of the data
o To study the distribution of galaxy clusters in observable and mass one typically introduces binning such that most bins have zero occupation and a few bins have occupation numbers ≥ 1
• In such a case one turns to the Poisson likelihood directly:
P(x | µ) = (µ^x / x!) e^{−µ}
ln ℒ = Σ_i ln P(x_i | µ_i) = Σ_i [x_i ln µ_i − µ_i − ln(x_i!)] ≈ Σ_i [x_i ln µ_i − µ_i − x_i ln x_i + x_i]
o One must simply evaluate the expectation value of the model µ_i for each subsample of the data (in each bin in luminosity and redshift, for example)
o Note that the likelihood is sensitive even to empty bins; note also the use of Stirling's approximation ln(x!) ≈ x ln x − x in the last step
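A sketch of fitting with the Poisson likelihood directly, for the simplest possible model where the expectation µ is the same in every bin (names are mine; the ln(x!) terms are dropped since they do not depend on the model):

```python
from math import log

def neg_lnL(mu, counts):
    """-ln(likelihood) up to model-independent ln(x!) terms.
    Empty bins (x = 0) still contribute through the -mu term."""
    return -sum(x * log(mu) - mu for x in counts)

counts = [0, 0, 1, 3, 0, 2]    # mostly empty bins, as discussed in the text

grid = [0.1 + 0.01 * i for i in range(300)]
best_mu = min(grid, key=lambda m: neg_lnL(m, counts))
# For a constant model the maximum-likelihood mu is the sample mean: 6/6 = 1
```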
Background in the Poisson Limit
• Typically in astronomy analyses there is a background:
o In the Gaussian limit one can simply subtract the background and adjust the uncertainty:
if S = T − B, then σ_S² = σ_T² + σ_B² (Gaussian error propagation)
o In the Poisson limit one is working with integer data and cannot simply subtract the background
• Forward modeling of the full signal (source plus background) is the typical solution:
o The model becomes the sum of the (observed) background and the adopted model that is being fitted
o The observation becomes the actual number of objects or photons in each bin, regardless of whether those objects/photons are source or background
• This forward modeling approach works just as well in the Gaussian limit and so is typically the best way to model data in astronomy
Robust Statistics
• Real datasets are often not described by simple Gaussian distributions; often there is a small fraction of objects that exhibit much larger deviations than expected
• A χ² is strongly affected by outliers, because it is the sum of the squares of the deviations. In the absence of prior knowledge of the true underlying distribution, a variety of tools exist to deal with this issue:
o One can use the sum of the absolute values of the deviations
o One can use the full distribution of deviations, extracting a characteristic value using the median of the distribution
• MAD: the median absolute deviation, perhaps normalized to act like a Gaussian σ (NMAD)
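A sketch of the NMAD estimator; the conventional factor 1.4826 rescales the MAD so that, for Gaussian data, it estimates the Gaussian σ:

```python
def median(vals):
    s = sorted(vals)
    n = len(s)
    mid = n // 2
    return s[mid] if n % 2 else 0.5 * (s[mid - 1] + s[mid])

def nmad(vals):
    """Normalized median absolute deviation: a robust stand-in
    for the Gaussian sigma, insensitive to outliers."""
    med = median(vals)
    return 1.4826 * median([abs(v - med) for v in vals])

data = [1.0, 2.0, 3.0, 4.0, 100.0]   # one gross outlier
robust_sigma = nmad(data)            # ~1.48, barely affected by the outlier
# The ordinary standard deviation of the same data is ~39,
# completely dominated by the single outlying point
```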
Goodness of Fit
• In general one can choose a minimization algorithm that will find the minimum χ², corresponding to the "best fit". But the "best fit" need not be a "good fit."
• One can use the value of the minimized χ² to evaluate tension between the data and the model:
o The χ² probability function Q(χ² | ν) gives the expected range of χ², assuming that the underlying errors are Gaussian and uncorrelated
o To evaluate the goodness of fit one needs the number of degrees of freedom ν, which is defined to be ν = n − p, the difference between the number of observations and the number of free parameters
• A reduced χ²_red = χ²/ν should be ~1, corresponding to a typical deviation of 1σ for each measurement:
o Values that are too large indicate inconsistency between the data and the model
o Values that are too small suggest flaws in the uncertainties (σ too large, or correlated measurements)
Confidence Intervals
• Within a χ² context one can interpret a change in χ², a Δχ², in a probabilistic sense, and this allows one to define confidence intervals on parameters
• One can use this approach to define single parameter uncertainties or joint parameter uncertainties
• This table shows the Δχ² corresponding to 1, 2 and 3σ parameter intervals in the cases where we have p = 1, 2 and 3 parameters of interest in our model:

          p=1     p=2     p=3
68.3%     1.00    2.30    3.53
95.4%     4.00    6.17    8.02
99.73%    9.00    11.8    14.2
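For p = 1 the Δχ² thresholds are simply k² (1, 4, 9), and for p = 2 the χ² distribution with two degrees of freedom has a closed-form tail, Q(Δχ²) = e^{−Δχ²/2}, so the second column of the table can be checked with nothing beyond the math module:

```python
from math import log

def delta_chi2_two_params(confidence):
    """Delta chi^2 enclosing the given confidence for 2 parameters of
    interest. Chi^2 with 2 dof has P(<d) = 1 - exp(-d/2); invert it."""
    return -2.0 * log(1.0 - confidence)

d68 = delta_chi2_two_params(0.683)    # ~2.30
d95 = delta_chi2_two_params(0.954)    # ~6.17
d99 = delta_chi2_two_params(0.9973)   # ~11.8
```

The p = 1 and p = 3 columns require the general χ² inverse CDF (e.g. from a scientific library) rather than a closed form.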
Error Propagation
• Typically parameters are presented with their uncertainties, and these are often called "errors"
• Within the context of Gaussian, uncorrelated errors it is straightforward to propagate these errors
• Consider a function of x and y, f(x, y), where there is Gaussian scatter with σ_x and σ_y in the two variables:
σ_f² = (∂f/∂x)² σ_x² + 2 (∂f/∂x)(∂f/∂y) σ_xy + (∂f/∂y)² σ_y²
o In the case of independent errors in x and y, the middle (covariance) term vanishes
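For independent errors the formula reduces to σ_f² = (∂f/∂x)²σ_x² + (∂f/∂y)²σ_y²; a sketch for the common case f = x·y, where it becomes the familiar sum of fractional errors in quadrature:

```python
from math import sqrt

def sigma_product(x, sx, y, sy):
    """Uncertainty on f = x*y with independent Gaussian errors:
    df/dx = y and df/dy = x, so sigma_f^2 = y^2 sx^2 + x^2 sy^2."""
    return sqrt((y * sx)**2 + (x * sy)**2)

x, sx = 10.0, 0.1     # 1% fractional error
y, sy = 5.0, 0.05     # 1% fractional error
sf = sigma_product(x, sx, y, sy)
# Fractional error on the product: sqrt(0.01^2 + 0.01^2), about 1.4%
```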
Mock Samples
• A valuable technique is to create mock samples of observations that exhibit the noise properties and other characteristics expected of your dataset:
o One can use these mocks to test fitting routines
o One can use these mocks to establish the significance of deviations of the data from a particular model, or to establish confidence intervals on parameters
• The core underlying tool is the ability to produce a uniform random deviate, which is a random number between 0 and 1
• Operating systems offer such tools, and scientific algorithm packages (e.g. Numerical Recipes) typically offer improved options
Arbitrary Distributions
• Using the uniform random deviate to produce a more generic distribution f(x) is straightforward:
o One transforms to the cumulative distribution F(x) of the generic distribution f(x), selects a uniform random deviate u, and then infers the value x at which the cumulative distribution satisfies F(x) = u
o To evaluate F⁻¹(u) one typically uses a root finding algorithm
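The inverse-transform recipe above, sketched for a case where F⁻¹ is known analytically (an exponential distribution), so no root finder is needed:

```python
import random
from math import log

def draw_exponential(rng):
    """Inverse-transform sampling for f(x) = e^{-x}:
    F(x) = 1 - e^{-x}, so x = F^{-1}(u) = -ln(1 - u)
    for a uniform random deviate u in [0, 1)."""
    u = rng.random()
    return -log(1.0 - u)

rng = random.Random(42)            # fixed seed for reproducibility
samples = [draw_exponential(rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)   # should approach the true mean of 1
```

For a distribution without an analytic inverse, one tabulates or root-finds F(x) = u at each draw instead.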
Cumulative Distributions
• A particularly useful way of probing for differences between two observed distributions, or between an observed distribution and a model, is to examine the cumulative distributions
• The Kolmogorov-Smirnov (K-S) test allows one to characterize the probability associated with the maximum distance between the two cumulative distributions:
o One extracts the probability that the two distributions are drawn from the same parent distribution
• There are many variants of the K-S test:
o The 2D K-S test, and the Anderson-Darling test, which improves sensitivity to changes in the tails of the distribution
• With a model one can draw many random samples and place the observed sample in the context of the large ensemble of randoms, to quantify the probability that the observed distribution is consistent with the modelled distribution
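The K-S statistic itself is easy to compute by hand: it is the maximum vertical distance between the two empirical cumulative distributions (a minimal sketch; a full test would also convert D into a probability, e.g. with scipy.stats.ks_2samp):

```python
def ks_statistic(a, b):
    """Maximum distance D between the empirical CDFs of samples a and b."""
    def ecdf(sample, t):
        return sum(1 for v in sample if v <= t) / len(sample)
    points = sorted(set(a) | set(b))
    return max(abs(ecdf(a, t) - ecdf(b, t)) for t in points)

same = ks_statistic([1, 2, 3, 4], [1, 2, 3, 4])      # identical samples: D = 0
far  = ks_statistic([1, 2, 3, 4], [10, 11, 12, 13])  # disjoint samples: D = 1
```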
Power Law Relations
• Physical parameters of astrophysical objects often span many orders of magnitude, and there often exist relations among them:
L_X(M) = α (M/M₀)^β
• These "scaling relations" are typically fit by power law relations
• Often these relations are fit in log space rather than linear space, but they need not be:
log L_X(M) = log α + β log(M/M₀)
• Linear space: a goodness of fit measure (e.g. χ²) will be heavily influenced by the systems with the largest values (e.g. luminosity or mass), whereas the systems with small values will have little impact.
• Log space: one is effectively using the fractional deviation between the model and the data, since Δ(ln L) = ΔL/L, and so every system (high or low value) has similar impact (reflected by its uncertainty)
Normal and Log-Normal
• When fitting a relation one must be careful to characterize the uncertainties properly
• Often the noise is characterized as normal or log-normal:
o Poisson noise in the Gaussian limit -> normal
o Intrinsic variation -> log-normal
P(x) = (1/(σ√(2π))) e^{−(1/2)((x − ⟨x⟩)/σ)²}
P(log x) = (1/(σ√(2π))) e^{−(1/2)((log x − ⟨log x⟩)/σ)²}
• The implications are dramatically different in the limit of a dataset with values extending over an order of magnitude
o And thus fit results are sensitive to this choice
Pivot Points and Parameter Correlations
• When fitting a power law,
log Y = A (log X − log X_pivot) + B,
one can inadvertently introduce a false correlation between the amplitude and the slope by poorly choosing the pivot point
• Consider a normal distribution of points in log X-log Y space with mean position (⟨log X⟩, ⟨log Y⟩) and some dispersion in log X and log Y:
o Choosing the pivot at ⟨log X⟩ would lead to uncorrelated errors in parameters A and B
o Choosing the pivot away from ⟨log X⟩ introduces a strong correlation between parameters A and B
• In general, when fitting power law relations, one wants to select the pivot point in the independent variable to be the mean of the sample
• Similar thinking should guide parametrization in more general situations
Intrinsic Scatter and Parameter Uncertainties
• Often "measurement uncertainties" are not enough to estimate the true uncertainties on the parameters
o Example: cluster redshift. Measure the redshift of a single galaxy with a measurement uncertainty of 50 km/s. Have you measured the cluster redshift with the same accuracy? Consider the case where the velocity dispersion of the cluster is 1000 km/s
• Often intrinsic scatter must be included to get realistic estimates of the parameter uncertainties
• If a model of the intrinsic scatter exists, then one must adopt the intrinsic scatter as another source of measurement uncertainty (added in quadrature, assuming independence from the measurement noise):
σ_tot² = σ_meas² + σ_int²
• In cases where the intrinsic scatter isn't understood, one can often estimate the scatter by requiring that the best fit model produce a reduced χ² = 1
• In fact, the intrinsic scatter is often at least as important from a science perspective as any other parameter of the model
Malmquist Bias
• Often one studies a sample of flux limited objects
• Such a selection introduces biases: the more luminous an object, the better represented it is in the sample
o Malmquist 1925
o Consider the volume within which an object would exceed the selection threshold
• When fitting power laws to a sample, the results of the Malmquist bias can be dramatic!
o To avoid biases one typically has to include the selection effects in the model
o The intrinsic scatter is critical here
[Figure: log luminosity vs log mass, with and without the flux-limited selection; Mantz et al 2010]
Eddington Bias
• Symmetric scatter (normal or log-normal) can lead to net biases in common astrophysical situations
• Eddington bias occurs when the underlying population is varying rapidly with the observable
• Consider the mass function of objects, P(M) ~ M^n with n < 0
• Following the discussion in Mortonson et al 2011, the net effect is:
⟨ln M⟩ = ln M_obs + (1/2) n σ_M²
o in the presence of log-normal scatter in M_obs with width σ_M
o This result can be derived from a Bayesian framework which expresses the probability of a true mass M given an observed mass M_obs as the product of the measurement probability and the mass function
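A quick numeric illustration of the correction quoted from Mortonson et al 2011 (the values of n and σ here are hypothetical): for a steeply falling mass function the mean true mass sits systematically below the observed one.

```python
from math import exp

def ln_mass_correction(n, sigma_lnM):
    """Shift between mean true and observed log-mass,
    <ln M> - ln M_obs = (1/2) n sigma^2, for a power-law
    population P(M) ~ M^n with log-normal scatter of width sigma."""
    return 0.5 * n * sigma_lnM**2

n = -3.0        # hypothetical steeply falling mass function slope
sigma = 0.2     # hypothetical log-normal scatter in the observed mass
shift = ln_mass_correction(n, sigma)   # -0.06 in ln M
bias_factor = exp(shift)               # mean true mass ~94% of the observed mass
```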
Scaling Relations and the Eddington Bias
• Similarly to the Malmquist bias, the Eddington bias makes it difficult to extract the true underlying model from an observed (selected) dataset
• As previously noted, the likelihood of a particular set of parameters must include the selection effects to enable extraction of an unbiased answer
[Figure: log luminosity vs log mass; Mantz et al 2010]
References
• Astronomy Methods (Bradt)
• Numerical Recipes in C (Press et al., 2nd Edition)
• Mortonson et al 2011
• Mohr et al 1999
• Mantz et al 2010