Observational Methods
Lecture 9: Data Analysis
Prof Joe Mohr, LMU
WS 2020/21
Summary

• Poisson and Gaussian Noise

• Fitting Data and Confidence Intervals

• Mock Datasets
Poisson Noise
• A source of constant flux has a fixed probability per unit time of a photon arriving
   o The Poisson distribution describes the probability of observing x photons given an expectation value µ:

     $P(x \mid \mu) = \frac{\mu^x}{x!} e^{-\mu}$

   o At low expectation values there is significant asymmetry in the Poisson distribution
   o The probability of detecting 0 photons is still significant if the expectation is 3 or 4 (see the sketch below)

• Poisson noise is generally termed "sampling noise"
   o Applied broadly to physical processes

[Figure: Poisson distributions for µ = 3, 6, and 10.3]
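As a minimal numerical check (assuming scipy is available; not part of the original slides):

```python
# Probability of detecting zero photons at low expectation values.
from scipy.stats import poisson

for mu in (3, 6, 10.3):
    print(f"mu = {mu:5.1f}: P(0 photons) = {poisson.pmf(0, mu):.4f}")
# mu = 3 gives P(0) = e^{-3} ~ 0.05, i.e. ~5% of exposures detect nothing
```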

Normal or Gaussian Distribution
• The Gaussian or normal distribution is described as:

     $P(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}$

   o Where the expectation value is µ and the FWHM is 2.354σ
   o 68% of the probability lies between ±1σ
   o Note the normalization

• The distribution is symmetric about the mean µ.
   o 68% of the integral lies between µ±1σ
   o 95% between µ±2σ
   o 99.7% between µ±3σ
   o 6.3×10⁻⁵ outside µ±4σ
   o 5.7×10⁻⁷ outside µ±5σ
   o 2.0×10⁻⁹ outside µ±6σ
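A quick sketch reproducing the quoted fractions with scipy.stats.norm (an illustration, not part of the slides):

```python
# Probability inside and outside +/- n sigma for a unit Gaussian.
from scipy.stats import norm

for n in range(1, 7):
    inside = norm.cdf(n) - norm.cdf(-n)      # probability within +/- n sigma
    print(f"+/-{n} sigma: inside = {inside:.7f}, outside = {1 - inside:.2e}")
```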

Poisson Noise in Gaussian Limit

• In the limit that µ > 10 the Poisson distribution can be approximated as a Gaussian or normal distribution with mean µ and σ = √µ

• Out in the tails of the
  distribution the
  differences are (much)
  larger.
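A minimal sketch of this comparison, assuming scipy (the Gaussian pdf stands in for the probability per unit-width bin):

```python
# Compare the Poisson pmf with its Gaussian approximation (mean mu,
# sigma = sqrt(mu)); the agreement degrades out in the tails.
import numpy as np
from scipy.stats import norm, poisson

mu = 15
for x in (mu, mu + 5, mu + 10, mu + 15):
    p_pois = poisson.pmf(x, mu)
    p_gauss = norm.pdf(x, loc=mu, scale=np.sqrt(mu))
    print(f"x = {x:2d}: Poisson = {p_pois:.2e}, Gaussian = {p_gauss:.2e}")
```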

Variance and Standard Deviation
• The width of a distribution indicates the range of values obtained within a set of measurements

• The variance σ² is the mean squared deviation from the mean. For an ensemble of n points drawn from a distribution the variance can be estimated as in Eq. 1:

     $\sigma^2 \equiv \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \mu\right)^2$   (1)

   o There is a slight difference when the mean is extracted from the dataset itself (see Eq. 2):

     $\sigma^2 \equiv \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \mu_{meas}\right)^2$   (2)

   o For a probability distribution P(x) rather than a sample drawn from the distribution, one calculates the variance as in Eq. 3:

     $\sigma^2 \equiv \int_{-\infty}^{+\infty} dx\,\left(x - \mu\right)^2 P(x)$   (3)

     $\mu \equiv \int_{-\infty}^{+\infty} dx\,x\,P(x)$

• The standard deviation σ is the square root of the variance
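A minimal sketch of Eqs. 1 and 2 with numpy (illustrative values, not from the lecture):

```python
# Population vs sample variance: numpy's ddof switches between the
# 1/n (known mean) and 1/(n-1) (estimated mean) normalizations.
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=10.0, scale=2.0, size=1000)

var_known_mean = np.mean((x - 10.0) ** 2)   # Eq. 1: true mean known
var_sample = np.var(x, ddof=1)              # Eq. 2: mean taken from the data
print(var_known_mean, var_sample, np.sqrt(var_sample))  # sigma ~ 2
```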

Gaussian Distribution
• The variance of the Gaussian distribution is in fact the σ² that appears in the distribution:

     $\sigma^2 \equiv \int_{-\infty}^{+\infty} dx\,\left(x - \mu\right)^2 P(x)$

     $\sigma^2 = \int_{-\infty}^{+\infty} dx\,\left(x - \mu\right)^2 \frac{1}{\sqrt{2\pi}\,\sigma_w}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma_w}\right)^2}$

     $\sigma^2 = \sigma_w^2$
• The standard deviation
  of the Gaussian sets the
  standard for discussing
  measurement
  significance

Skewness and Kurtosis
• In addition to the first moment (mean) and second moment (variance) of a distribution P(x), the third moment (skewness) and fourth moment (kurtosis) are also often valuable:

     $\mathrm{Skewness} \equiv \int_{-\infty}^{+\infty} dx\,\left(x - \mu\right)^3 P(x)$

     $\mathrm{Kurtosis} \equiv \int_{-\infty}^{+\infty} dx\,\left(x - \mu\right)^4 P(x) - 3$

• Skewness measures the asymmetry of the distribution about the mean

• Kurtosis measures the extent of a distribution around the mean with respect to the Gaussian distribution
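A minimal sketch using scipy.stats (note scipy standardizes by powers of σ, which matches the definitions above for unit-variance data, and its kurtosis already subtracts 3):

```python
# Sample skewness and excess kurtosis for a symmetric and an
# asymmetric distribution.
import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(0)
gauss = rng.normal(size=100_000)
pois = rng.poisson(lam=3, size=100_000)     # low-mu Poisson is asymmetric

print("Gaussian:", skew(gauss), kurtosis(gauss))  # both ~ 0
print("Poisson :", skew(pois), kurtosis(pois))    # skewness ~ 1/sqrt(3)
```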

Poisson Distribution
• The variance of the Poisson distribution is σ² = µ

• Consider a source of a particular flux f and two observations of length t₁ and 2t₁
   o The photon numbers are N₁ = f·t₁ and N₂ = 2N₁
   o The standard deviations are σ₁ and σ₂:

     $\sigma_1 = \sqrt{N_1} \quad\text{and}\quad \sigma_2 = \sqrt{N_2} = \sqrt{2}\,\sigma_1$

   o So the noise is higher in sources that are observed with more photons

   o Importantly, the fractional noise σ/N drops the more counts one obtains:

     $\frac{\sigma_1}{N_1} = \frac{1}{\sqrt{N_1}} \quad\text{and}\quad \frac{\sigma_2}{N_2} = \frac{1}{\sqrt{N_2}} = \frac{1}{\sqrt{2}}\frac{\sigma_1}{N_1}$
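A tiny numerical illustration (hypothetical flux and exposure values):

```python
# Fractional Poisson noise falls as 1/sqrt(N) with accumulated counts.
import numpy as np

f, t1 = 50.0, 10.0          # hypothetical flux (counts/s) and exposure (s)
for t in (t1, 2 * t1, 4 * t1):
    n = f * t
    print(f"t = {t:5.1f} s: N = {n:6.0f}, sigma/N = {1 / np.sqrt(n):.4f}")
```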
Mapping and Sample Purity
• A common application of Gaussian statistics is source finding in maps whose noise is Gaussian. Gaussian noise is not so uncommon.
   o A good example would be mm-wave maps (like those from SPT)
   o Optical/NIR images where background count levels are high

• Why on earth might one ever be interested in restricting to 5σ sources?
   o The probability is ~6×10⁻⁷ of the Gaussian distribution delivering such an outlier
   o Typically in mapping experiments the solid angle mapped encompasses many PSFs or beams, so even extremely rare events are possible

• Ex: the SPT beam is ~1 arcmin², so there are 3600 independent beams per deg². In a survey of 2500 deg² there are then ~5 noise fluctuations above 5σ expected (see the sketch below)
   o Because SZE selected galaxy clusters are quite rare (~500 over the same area), a restriction to 5σ is required to keep contamination at the ~5/500 = 1% level
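A minimal sketch of the beam-counting arithmetic (using the slide's two-sided 5σ tail probability):

```python
# Expected number of spurious >5 sigma peaks in an SPT-like survey.
from scipy.stats import norm

n_beams = 3600 * 2500       # ~1 arcmin^2 beams over 2500 deg^2
p_out = 2 * norm.sf(5)      # two-sided P(|x| > 5 sigma) ~ 5.7e-7
print(f"{n_beams:.1e} beams -> {n_beams * p_out:.1f} spurious peaks expected")
```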

Fitting to a Model
• Comparison of data to theory can be carried out using a least squares fit:

     $\chi^2 \equiv \sum_{i=1}^{n}\left(\frac{y_{i,obs} - y_{mod}(x_i)}{\sigma_i}\right)^2$

• One chooses the model parameters such that χ² is minimized.
• Where does this come from?
   o In the limit of Gaussian measurement uncertainties σᵢ and n measurements, we can construct the likelihood of the measurements and model as the product of the individual Gaussian probabilities:

     $L \equiv \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_i}\, e^{-\frac{1}{2}\left(\frac{y_{i,obs} - y_{mod}(x_i)}{\sigma_i}\right)^2}$

     $-2\ln L = \sum_{i=1}^{n}\left(\frac{y_{i,obs} - y_{mod}(x_i)}{\sigma_i}\right)^2 + 2\sum_{i=1}^{n}\ln\left(\sqrt{2\pi}\,\sigma_i\right)$

   o The last term does not depend on the model parameters if σᵢ is known and so can be dropped
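A minimal least-squares sketch with scipy.optimize.curve_fit (mock data and a straight-line model, purely illustrative):

```python
# Weighted least-squares (chi^2) fit of a line to mock Gaussian data.
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * x + b

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 20)
sigma = np.full_like(x, 0.5)
y = model(x, 2.0, 1.0) + rng.normal(0.0, sigma)

popt, pcov = curve_fit(model, x, y, sigma=sigma, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))            # 1 sigma parameter uncertainties
chi2 = np.sum(((y - model(x, *popt)) / sigma) ** 2)
print(popt, perr, chi2 / (len(x) - 2))   # reduced chi^2 should be ~1
```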

Minimizing χ²

• Any model is possible. In general one must have the
  number of observations n be large compared to the
  number of model parameters p or else the
  parameter values are not well constrained

• There are many tools that have been developed to
  find the best fit
   o   Matrix inversion including singular value decomposition for linear systems
   o   Simplex minimization (e.g. Amoeba)
   o   Methods that explicitly use the functional form of the gradient
   o   Levenberg-Marquardt iterative method

• Least squares fitting has been justified within the
  context of Gaussian, independent errors. It is
  nevertheless useful in a much broader context
Fitting to a Model in the Poisson Limit
• Often in astronomy one is working in the Poisson limit, where the number of detected photons or objects is subject to Poisson noise and the expectation value is low enough that one cannot apply Gaussian statistics
   o Consider a study of the galaxy luminosity function of a galaxy cluster, at least on the bright end

• Even in situations where there is plenty of data, one often ends up working in the Poisson limit
   o Examining trends in the behavior of the sample with mass, redshift or some other property often drives the analysis of smaller and smaller subsets of the data
   o To study the distribution of galaxy clusters in observable and mass one typically introduces binning such that most bins have zero occupation and a few bins have occupation numbers of ≥1

• In such a case one turns to the Poisson likelihood directly:

     $P(x \mid \mu) = \frac{\mu^x}{x!} e^{-\mu}, \qquad \ln P = x\ln\mu - \mu - x\ln x + x, \qquad \ln L = \sum_{i}\ln P_i$

   o One must simply evaluate the expectation value of the model µ for each subsample of the data (in each bin in luminosity and redshift, for example)
   o Note that the likelihood is sensitive even to empty bins; note also the use of Stirling's approximation for ln(x!)

Background in Poisson Limit
• Typically in astronomy analyses, there is a background
    o In the Gaussian limit one can simply subtract the background and adjust the uncertainty
        • if S = T − B, then $\sigma_S^2 = \sigma_T^2 + \sigma_B^2$ (Gaussian error propagation; here T is the total and B the background)

    o In Poisson limit one is working with integer data and cannot simply subtract the
      background

• A forward modeling of the full signal (source plus
  background) is the typical solution
    o The model becomes the sum of the (observed) background and the adopted
      model that is being fitted
    o The observation becomes the actual number of objects or photons in each
      bin, regardless of whether those objects/photons are source or background

• This forward modeling approach works just as well in the
  Gaussian limit and so is typically the best way to model
  data in astronomy
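A minimal forward-modeling sketch in the Poisson limit (a hypothetical Gaussian source on a known flat background):

```python
# Fit (background + source) directly to the raw integer counts rather
# than subtracting the background first.
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

rng = np.random.default_rng(3)
x = np.linspace(-5, 5, 50)
bkg = 2.0                                   # known background per bin
counts = rng.poisson(bkg + 10.0 * np.exp(-0.5 * x**2))

def neg_log_like(params):
    amp, width = params
    if amp <= 0 or width <= 0:
        return np.inf
    mu = bkg + amp * np.exp(-0.5 * (x / width) ** 2)   # background + model
    return -np.sum(counts * np.log(mu) - mu - gammaln(counts + 1))

print(minimize(neg_log_like, x0=[5.0, 2.0], method="Nelder-Mead").x)
```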

Robust Statistics
• Real datasets are often not described by simple Gaussian distributions; often there is a small fraction of objects that exhibit much larger deviations than expected

• A χ² is strongly affected by outliers, because it is the sum of the squares of the deviations. In the absence of prior knowledge of the true underlying distribution, a variety of tools exist to deal with this issue:
   o One can use the sum of the absolute values of the deviations
   o One can use the full distribution of deviations, extracting a characteristic value using the median of the distribution
      • MAD: the median absolute deviation, perhaps normalized to act like a Gaussian σ (NMAD)
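A minimal NMAD sketch (the 1.4826 factor is the standard normalization that makes NMAD equal σ for Gaussian data):

```python
# The normalized median absolute deviation is robust to outliers.
import numpy as np

def nmad(x):
    return 1.4826 * np.median(np.abs(x - np.median(x)))

rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, 1000)
x[:20] = 50.0                      # 2% catastrophic outliers
print(np.std(x), nmad(x))          # std is wrecked; NMAD stays ~1
```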

Goodness of Fit
• In general one can choose a minimization algorithm that will find the minimum χ², corresponding to the "best fit". But the "best fit" need not be a "good fit."

• One can use the value of the minimized χ² to evaluate tension between the data and the model
   o The χ² probability function Q(χ²|ν) gives the expected range of χ² assuming that the underlying errors are Gaussian and uncorrelated
   o To evaluate the goodness of fit one needs the number of degrees of freedom ν, which is defined to be ν = n − p, the difference between the number of observations and the number of free parameters

• A reduced χ²_red = χ²/ν should be ~1, corresponding to a typical deviation of 1σ for each measurement
   o Values that are too large indicate inconsistency between the data and the model
   o Values that are too small suggest flaws with the uncertainties (σ too large or correlated measurements)
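A minimal sketch of the goodness-of-fit probability via scipy (hypothetical fit results):

```python
# chi2.sf gives the probability of exceeding the observed chi^2 by
# chance with nu degrees of freedom; small values indicate tension.
from scipy.stats import chi2

chi2_min, n_obs, n_par = 52.3, 30, 2
nu = n_obs - n_par
print("reduced chi^2 =", chi2_min / nu)
print("P(chi^2 > observed) =", chi2.sf(chi2_min, nu))
```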

Confidence Intervals
• Within a χ² context one can interpret a change in χ², or Δχ², in a probabilistic sense, and this allows one to define confidence intervals on parameters

• One can use this approach to define single parameter uncertainties or joint parameter uncertainties

• This table shows the Δχ² corresponding to 1, 2 and 3σ parameter intervals in the case where we have 1, 2 and 3 parameters of interest in our model:

      Confidence    p = 1    p = 2    p = 3
      68.3%          1.00     2.30     3.53
      95.4%          4.00     6.17     8.02
      99.73%         9.00     11.8     14.2
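The table can be reproduced from the χ² quantile function, as a quick check:

```python
# Delta chi^2 thresholds: for p jointly estimated parameters of
# interest, the threshold is the chi^2 quantile with p degrees of freedom.
from scipy.stats import chi2

for conf in (0.683, 0.954, 0.9973):
    row = [chi2.ppf(conf, p) for p in (1, 2, 3)]
    print(f"{conf:.2%}: " + "  ".join(f"{v:5.2f}" for v in row))
```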

Error Propagation
• Typically parameters are presented with their uncertainties, and these are often called "errors"

• Within the context of Gaussian, uncorrelated errors it is straightforward to propagate these errors

• Consider a function of x and y, f(x,y), where there is Gaussian scatter with σₓ and σᵧ for the two variables:

     $\sigma_f^2 = \left(\frac{\partial f}{\partial x}\right)^2 \sigma_x^2 + 2\,\frac{\partial f}{\partial x}\frac{\partial f}{\partial y}\,\sigma_{xy} + \left(\frac{\partial f}{\partial y}\right)^2 \sigma_y^2$

   o In the case of independent errors in x and y, the middle term (containing the covariance σₓᵧ) vanishes
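A minimal sketch checking the propagation formula against a Monte Carlo for the hypothetical case f(x, y) = x·y with independent errors:

```python
# First-order error propagation vs direct Monte Carlo for f = x * y.
import numpy as np

x0, sx = 10.0, 0.5
y0, sy = 5.0, 0.2
# partial derivatives of x*y give sigma_f^2 = y^2 sx^2 + x^2 sy^2
sf_analytic = np.sqrt((y0 * sx) ** 2 + (x0 * sy) ** 2)

rng = np.random.default_rng(5)
f = rng.normal(x0, sx, 100_000) * rng.normal(y0, sy, 100_000)
print(sf_analytic, f.std())        # the two should agree closely
```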

Mock Samples
• A valuable technique is to create mock samples of
  observations that exhibit the noise properties and
  other characteristics expected of your dataset
   o One can use these mocks to test fitting routines
   o One can use these mocks to establish the significance of deviations of data
     from a particular model or to establish confidence intervals on parameters

• The core underlying tool is the ability to produce a
  Uniform Random Deviate, which is a random number
  from 0 to 1

• Operating systems offer such tools, and scientific
  algorithm packages (e.g. Numerical Recipes)
  typically offer improved options
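A minimal sketch of both ingredients, using numpy's generator (illustrative model and values):

```python
# Uniform random deviates, and a mock dataset built from them.
import numpy as np

rng = np.random.default_rng(6)     # a modern, well-tested generator
u = rng.uniform(0.0, 1.0, size=5)  # uniform random deviates on [0, 1)
print(u)

# a mock observation of y = 2x + 1 with Gaussian noise, e.g. to test a fitter
x = np.linspace(0, 10, 20)
y_mock = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, size=x.size)
print(y_mock[:3])
```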

Arbitrary Distributions
• Using the uniform random deviate to produce more generic distributions f(x) is straightforward
   o One transforms to the cumulative distribution F(x) of the generic distribution f(x), selects a URD u, and then infers the value x at which the cumulative distribution F(x) equals u

   o To evaluate F⁻¹(u) one typically uses a root finding algorithm
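A minimal inverse-transform sketch for the case f(x) ∝ e⁻ˣ, where F⁻¹ happens to be analytic (for a general f one would tabulate F and invert numerically):

```python
# Inverse-transform sampling of an exponential distribution.
import numpy as np

rng = np.random.default_rng(7)
u = rng.uniform(size=100_000)      # uniform random deviates
x = -np.log(1.0 - u)               # x = F^{-1}(u) with F(x) = 1 - exp(-x)
print(x.mean())                    # ~1, the mean of the exponential
```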

Cumulative Distributions
•    A particularly useful way of probing for
     differences in two observed
     distributions or an observed distribution
     and a model is to examine the
     cumulative distributions

• The Kolmogorov-Smirnov (K-S) test allows one to characterize the probability of obtaining the observed maximum distance between the two cumulative distributions
   o One extracts the probability that the two distributions are drawn from the same parent distribution

• There are many variants of the K-S test
   o The 2D K-S test, and the Anderson-Darling test, which improves sensitivity to changes in the tails of the distribution

•    With a model one can draw many
     random samples and place the
     observed sample in the context of the
     large ensemble of randoms to quantify
     the probability the observed
     distribution is consistent with the
     modelled distribution
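A minimal two-sample K-S sketch with scipy (illustrative distributions):

```python
# The K-S statistic D is the maximum distance between the two empirical
# CDFs; a small p-value means the samples are unlikely to share a parent.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(8)
a = rng.normal(0.0, 1.0, 500)
b = rng.normal(0.2, 1.0, 500)      # slightly shifted distribution

stat, p = ks_2samp(a, b)
print(f"D = {stat:.3f}, p = {p:.3f}")
```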

Power Law Relations
• Physical parameters of astrophysical objects often span many orders of magnitude, and there often exist relations among them

• These "scaling relations" are typically fit by power law relations:

     $L_X(M) = \alpha\left(\frac{M}{M_0}\right)^{\beta}$

• Often these relations are fit in log space rather than linear space, but they need not be:

     $\log L_X(M) = \log\alpha + \beta\log\left(\frac{M}{M_0}\right)$

• Linear space: a goodness of fit measure (χ²) will be heavily influenced by the systems with the largest values (e.g., luminosity or mass), whereas the systems with small values will have little impact.

• Log space: one is effectively using the fractional deviation between the model and the data ($d\ln L = dL/L$), and so every system (high or low value) has similar impact (as reflected by its uncertainty)
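A minimal log-space fitting sketch (mock relation, illustrative values):

```python
# Fitting a power law as a straight line in log space, where each
# system contributes through its fractional deviation.
import numpy as np

rng = np.random.default_rng(9)
log_m = rng.uniform(-1, 1, 50)                       # two decades in mass
log_l = 1.5 * log_m + 0.3 + rng.normal(0, 0.1, 50)   # beta = 1.5, log alpha = 0.3

beta, log_alpha = np.polyfit(log_m, log_l, 1)        # linear fit in log space
print(beta, log_alpha)
```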

Normal and Log-Normal
•    When fitting a relation one must
     be careful to characterize the
     uncertainties properly

•    Often the noise is characterized as
     log-normal or normal
      o Poisson noise in Gaussian limit-> normal
      o Intrinsic variation-> log-normal

                         " x− x %2                             " log x− log x %2
                                                          − 12 $              '
              1      − 12 $
                         # σ &
                                '                  1           #      σ       &
    P(x) =       e                   P(log x) =       e
             2πσ                                  2πσ

•    Implications are dramatically
     different in limit of dataset with
     values extending over an order of
     magnitude
      o And thus fit results are sensitive to this choice

Pivot Points and Parameter Correlations
• When fitting a power law one can inadvertently introduce a false correlation between the amplitude and the slope by poorly choosing the pivot point:

     $\log Y = A\left(\log X - \log X_{pivot}\right) + B$

• Consider a normal distribution of points in a logX-logY space with mean position (⟨logX⟩, ⟨logY⟩) and dispersion σ_logX-Y

• Choosing the pivot at ⟨logX⟩ would lead to uncorrelated errors in parameters A and B

• Choosing the pivot away from ⟨logX⟩ introduces a strong correlation between parameters A and B

• In general, when fitting power law relations, one wants to select the pivot point in the independent variable to be the mean of the sample (see the sketch below)

• Similar thinking should guide parametrization in more general situations
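A minimal sketch of the pivot effect on the parameter covariance (mock data, illustrative values):

```python
# The amplitude-slope correlation vanishes when the pivot sits at the
# sample mean of the independent variable.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(10)
log_x = rng.normal(1.0, 0.3, 100)
log_y = 2.0 * log_x + 0.5 + rng.normal(0, 0.1, 100)

for pivot in (0.0, log_x.mean()):
    f = lambda lx, a, b: a * (lx - pivot) + b
    _, pcov = curve_fit(f, log_x, log_y)
    corr = pcov[0, 1] / np.sqrt(pcov[0, 0] * pcov[1, 1])
    print(f"pivot = {pivot:.2f}: corr(A, B) = {corr:+.2f}")
```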

Intrinsic Scatter and Parameter Uncertainties

• Often "measurement uncertainties" are not enough to estimate the true uncertainties on the parameters

• Often intrinsic scatter must be included to get realistic estimates of the parameter uncertainties

• If a model of the intrinsic scatter exists, then one must adopt the intrinsic scatter as another source of measurement uncertainty (added in quadrature, assuming independence from the measurement noise):

     $\sigma_{tot}^2 = \sigma_{meas}^2 + \sigma_{int}^2$

• Example: cluster redshift
   o Measure the redshift of a single galaxy with a measurement uncertainty of 50 km/s
   o Have you measured the cluster redshift with the same accuracy?
   o Consider the case that the velocity dispersion of the cluster is 1000 km/s

• In cases where the intrinsic scatter isn't understood, one can often estimate the scatter by requiring that the best fit model produces a reduced χ² = 1 (see the sketch below)

• In fact, often the intrinsic scatter is at least as important from a science perspective as any other parameter of the model
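A minimal sketch of estimating an unknown intrinsic scatter by requiring reduced χ² = 1 (mock linear data, illustrative values):

```python
# Inflate the error bars in quadrature until the best fit gives
# a reduced chi^2 of 1; the required sigma_int estimates the scatter.
import numpy as np
from scipy.optimize import brentq, curve_fit

rng = np.random.default_rng(11)
x = np.linspace(0, 10, 30)
sig_meas = 0.3
y = 2 * x + 1 + rng.normal(0, np.hypot(sig_meas, 1.0), 30)  # true sig_int = 1

def red_chi2_minus_1(sig_int):
    sig = np.hypot(sig_meas, sig_int) * np.ones_like(x)
    popt, _ = curve_fit(lambda x, a, b: a * x + b, x, y, sigma=sig)
    return np.sum(((y - (popt[0] * x + popt[1])) / sig) ** 2) / (30 - 2) - 1

print(brentq(red_chi2_minus_1, 1e-3, 5.0))   # recovered sigma_int ~ 1
```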
Malmquist Bias

• Often one studies a sample of flux limited objects

• Such a selection introduces biases: the more luminous an object, the better represented it is in the sample
   o Malmquist 1925
   o Consider the volume within which an object would exceed the selection threshold

• When fitting power laws to a sample the results of the Malmquist bias can be dramatic!
   o To avoid biases one typically has to include the selection effects in the model
   o The intrinsic scatter is critical here

[Figures: log luminosity vs. log mass for a flux-limited sample, illustrating the Malmquist bias; Mantz et al 2010]
Eddington Bias

• Symmetric scatter (normal or log-normal) can lead to biases in common astrophysical situations

• Eddington bias occurs when the underlying population is varying rapidly with the observable

• Consider the mass function of objects, P(M) ~ Mⁿ, with power-law index n

• Following the discussion in Mortonson et al 2011, the net effect is:

     $\langle\ln M\rangle = \ln M_{obs} + \tfrac{1}{2}\, n\,\sigma_M^2$

   o In the presence of log-normal scatter in M_obs with width σ_M
   o This result can be derived from a Bayesian framework which expresses the probability of a true mass M given an observed mass M_obs as the product of the mass function and the log-normal scatter distribution
Scaling Relations and the Eddington Bias

• Similarly to the Malmquist bias, the Eddington bias makes it difficult to extract the true underlying model from an observed (selected) dataset

• As previously noted, the likelihood of a particular set of parameters must include the selection effects to enable extraction of an unbiased answer

[Figures: log luminosity vs. log mass, illustrating the Eddington bias; Mantz et al 2010]
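A minimal Monte Carlo sketch of the Eddington bias (qualitative only; illustrative population and scatter):

```python
# For a steeply falling population with log-normal scatter, objects
# observed at fixed M_obs were preferentially scattered up from lower
# true masses, so their mean true ln M lies below ln M_obs.
import numpy as np

rng = np.random.default_rng(12)
sigma_m = 0.3
ln_m = rng.exponential(scale=0.5, size=1_000_000)   # falling dN/dlnM
ln_m_obs = ln_m + rng.normal(0.0, sigma_m, ln_m.size)

sel = np.abs(ln_m_obs - 1.0) < 0.02   # objects observed near ln M_obs = 1
print(ln_m[sel].mean())               # < 1: the mean true ln M is biased low
```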

References

•    Astronomy Methods (Bradt)
•    Numerical Recipes in C (Press et al, 2nd Edition)
•    Mortonson et al 2011
•    Mohr et al 1999
•    Mantz et al 2010
