Machine Learning to Decipher the Astrophysical Processes at Cosmic Dawn
MNRAS 000, 1–17 (2021)   Preprint 21 January 2022   Compiled using MNRAS LaTeX style file v3.0

Sudipta Sikder,1★ Rennan Barkana,1,2 Itamar Reis1 and Anastasia Fialkov3
1 School of Physics and Astronomy, Tel-Aviv University, Tel-Aviv, 69978, Israel
2 Institute for Advanced Study, 1 Einstein Drive, Princeton, New Jersey 08540, USA
3 Institute of Astronomy, University of Cambridge, Madingley Road, Cambridge, CB3 0HA, UK

arXiv:2201.08205v1 [astro-ph.CO] 20 Jan 2022

Accepted XXX. Received YYY; in original form ZZZ

ABSTRACT
The cosmic 21-cm line of hydrogen is expected to be measured in detail by the next generation of radio telescopes. The enormous dataset from future 21-cm surveys will revolutionize our understanding of early cosmic times. We present a machine learning approach that uses emulation in order to uncover the astrophysics in the epoch of reionization and cosmic dawn. Using a seven-parameter astrophysical model that covers a very wide range of possible 21-cm signals, over the redshift range 6 to 30 and wavenumber range 0.05 Mpc$^{-1}$ to 1 Mpc$^{-1}$ we emulate the 21-cm power spectrum with a typical accuracy of 10–20%. As a realistic example, we train an emulator using the 21-cm power spectrum with an optimistic model for observational noise as expected for the Square Kilometre Array (SKA). Fitting to mock SKA data results in a typical measurement accuracy of 5% in the optical depth to the CMB, 30% in the star-formation efficiency of galactic halos, and a factor of 3.5 in the X-ray efficiency of galactic halos; the latter two parameters are currently uncertain by orders of magnitude. In addition to standard astrophysical models, we also consider two exotic possibilities of strong excess radio backgrounds at high redshifts. We use a neural network to identify the type of radio background present in the 21-cm power spectrum, with an accuracy of 87% for mock SKA data.
Key words: methods: numerical – methods: statistical – dark ages, reionization, first stars – cosmology: theory

★ E-mail: sudiptas@mail.tau.ac.il
© 2021 The Authors

1 INTRODUCTION

The redshifted 21-cm signal from neutral hydrogen is the most promising probe of the Epoch of Reionization (EoR) and cosmic dawn. This 21-cm emission or absorption originates from the hyperfine splitting of the hydrogen atom. As this signal depends on both cosmological and astrophysical parameters, it should be possible to decipher abundant information about the early universe from the signal once it is observed. The Low Frequency Array (LOFAR, Gehlot et al. 2019), the Precision Array to Probe the Epoch of Reionization (PAPER, Kolopanis et al. 2019), the Murchison Wide-field Array (MWA, Trott et al. 2020), the Owens Valley Radio Observatory Long Wavelength Array (OVRO-LWA, Eastwood et al. 2019), the Large-aperture Experiment to detect the Dark Age (LEDA, Price et al. 2018; Garsden et al. 2021) and the Hydrogen Epoch of Reionization Array (HERA, DeBoer et al. 2017) are experiments that have analyzed data in an attempt to detect the power spectrum from the epoch of reionization. Although the existing upper limits are weak, they already provide interesting constraints on some of the exotic scenarios (e.g. with extra radio background as considered here) (Mondal et al. 2020; The HERA Collaboration et al. 2021). HERA along with the New Extension in Nançay Upgrading LOFAR (NenuFAR, Zarka et al. 2012) and the Square Kilometre Array (SKA, Koopmans et al. 2015) will aim to measure the power spectrum over a wide range of redshifts including cosmic dawn. Thus, we expect a great deal of data from observations in the upcoming decade.

The question arises as to what are the possible ways to infer the astrophysical parameters from the observed 21-cm power spectrum data. Since the characteristic astrophysical parameters at high redshifts are currently almost entirely unconstrained, the 21-cm signal must be calculated for a large number of parameter sets that cover a wide range of possibilities. Given the complexity of the 21-cm signal (see Barkana 2018a; Mesinger 2019) and its highly non-linear dependence on the astrophysical parameters, artificial neural networks (ANNs) are a useful method for emulation and fitting. Shimabukuro & Semelin (2017) used an ANN to estimate the astrophysical parameters from 21-cm observations. They trained the ANN using 70 data sets where each set consists of the 21-cm power spectrum obtained using 21cmfast (Mesinger et al. 2011) as input, with three EoR parameters used in the simulation as output. They applied the trained ANN to 54 data sets to evaluate how the algorithm performs. Kern et al. (2017) used a machine learning algorithm to emulate the 21-cm power spectrum and perform Bayesian analysis for parameter constraints over eleven parameters which included six parameters of the EoR and X-ray heating and five additional cosmological parameters. Schmit & Pritchard (2018) built an emulator using a neural network to emulate the 21-cm power spectrum where they generated the training and test data sets using the 21cmfast simulation and compared their results with 21CMMC. Cohen et al. (2019) introduced the first global 21-cm signal emulator using an ANN. Recently, Hellum Bye
et al. (2021); Bevins et al. (2021) proposed two different approaches for emulating the sky-averaged (global) 21-cm signal. In this paper, we use an emulation method to constrain the 21-cm power spectrum for the seven-parameter astrophysical model. We construct the emulator using a large dataset of models that cover a very wide range of the astrophysical parameter space. Given the seven-parameter astrophysical model, the emulator is able to predict the 21-cm power spectrum over a wide redshift range ($z = 6$ to 30). We also explore a more realistic case of the observational measurements expected for the SKA, as well as extended models that also include an excess early radio background.

This paper is organised as follows: We present in section 2 a description of the theory and methods used to generate the datasets (2.1 – 2.4) and build the ANN (2.5). Section 3 presents our results, for standard astrophysical models (3.1 – 3.4) and ones with an early radio background (3.5 – 3.7). Finally, we summarize our results and discuss our conclusions in section 4.

2 THEORY AND METHODS

2.1 21-cm signals

2.1.1 Astrophysical parameters

We use seven key parameters to parameterize the high redshift astrophysics: the star formation efficiency ($f_*$), the minimum circular velocity of star-forming halos ($V_c$), the X-ray radiation efficiency ($f_X$), the power law slope ($\alpha$) and the low energy cutoff ($E_{\rm min}$) of the X-ray spectral energy distribution (SED), the optical depth ($\tau$) of the cosmic microwave background (CMB) and the mean free path ($R_{\rm mfp}$) of ionizing photons. Here we briefly discuss these astrophysical parameters.

• The star formation efficiency, $f_*$, quantifies the fractional amount of gas in star-forming dark matter halos that is converted into stars (Tegmark et al. 1997). The value of $f_*$ depends on the details of star formation that are unknown at high redshift, so we treat it as a free parameter. We assume a constant star formation efficiency in halos heavier than the atomic cooling mass and a logarithmic cutoff in the efficiency in lower mass halos (Fialkov et al. 2013). We cover a wide range of $f_*$ values, from 0.0001 to 0.5.

• The circular velocity, $V_c$, is another parameter that encodes the information about star formation. Star formation takes place in dark matter halos that are massive enough to radiatively cool the infalling gas (Tegmark et al. 1997). This is the main element in setting the minimum mass of star-forming halos, $M_{\rm min}$. We equivalently use the minimum circular velocity as one of our free parameters. Since the cooling and the internal feedback depend on the depth of the potential and the potential is directly related to $V_c$, it is more physical to use a fixed $V_c$ versus redshift rather than a fixed $M_{\rm min}$. Since complex feedback (e.g., Schauer et al. 2015) of various types can suppress star formation in low-mass halos, we treat $V_c$ as a free parameter. In practice the actual threshold is not spatially homogeneous in our simulation since individual pixels are affected by feedback processes including Lyman-Werner feedback on small halos, photoheating feedback during the EoR and the streaming velocity between dark matter and baryons. The relation between the circular velocity ($V_c$) and the minimum mass of the dark matter halo ($M_{\rm min}$) is given (in the Einstein-de Sitter limit, which is valid at high redshift) by

$$V_c = 16.9 \left(\frac{M_{\rm min}}{10^{8}\,{\rm M}_{\odot}}\right)^{1/3} \left(\frac{1+z}{10}\right)^{1/2} \left(\frac{\Omega}{0.0316}\right)^{1/6} {\rm km\,s^{-1}}. \tag{1}$$

• The X-ray radiation efficiency, $f_X$, is defined by the standard expression of the ratio of the X-ray luminosity to the star formation rate ($L_{\rm X}$–SFR relation) [see Fialkov et al. (2014), Cohen et al. (2017) for more details]

$$\frac{L_{\rm X}}{\rm SFR} = 3 \times 10^{40} f_X \ {\rm erg\,s^{-1}\,M_{\odot}^{-1}\,yr}. \tag{2}$$

In the above expression $L_{\rm X}$ is the bolometric luminosity and $f_X$ is the X-ray efficiency of the source. The normalization is such that $f_X = 1$ corresponds to the typical observed value for low-metallicity galaxies. Given the almost total absence of observational constraints at the relevant redshifts, we vary $f_X$ from 0.0001 to 1000.

• The power law slope $\alpha$ and the low energy cutoff $E_{\rm min}$ determine the shape of the spectral energy distribution (SED). We parameterize the X-ray SED by the power law slope $\alpha$ (where $d\log(E_X)/d\log(\nu) = -\alpha$) and the low energy cutoff ($E_{\rm min}$). These two parameters have significant degeneracy, so we vary $\alpha$ in the narrow range 1 – 1.5 and $E_{\rm min}$ in the broad range of 0.1 – 3.0 keV. The SEDs of the early X-ray sources strongly affect the 21-cm signal from both the EoR and cosmic dawn (Fialkov et al. 2014; Fialkov & Barkana 2014). Soft X-ray sources (emitting mostly below 1 keV) produce strong fluctuations on relatively small scales (up to a few tens of Mpc) whereas hard X-ray sources produce milder fluctuations on larger scales. X-ray binaries (XRB) (Mirabel et al. 2011; Fragos et al. 2013) are major sources that are expected to have a hard X-ray spectral energy distribution.

• The optical depth of the CMB, $\tau$, is one of two parameters that describe the epoch of reionization. For given values of the other parameters, the CMB optical depth has a one-to-one relation with the ionizing efficiency $\zeta$, which is defined by

$$\zeta = f_* \, f_{\rm esc} \, N_{\rm ion} \, \frac{1}{1 + \bar{n}_{\rm rec}}, \tag{3}$$

where $f_*$ is the star formation efficiency, $f_{\rm esc}$ is the fraction of ionizing photons that escape from their host galaxy, $N_{\rm ion}$ is the number of ionizing photons produced per stellar baryon in star-forming halos, and $\bar{n}_{\rm rec}$ is the mean number of recombinations per ionized hydrogen atom. We choose to include the CMB optical depth ($\tau$) in our seven-parameter astrophysical model instead of the ionizing efficiency ($\zeta$) because $\tau$ is directly constrained by CMB observations (Planck Collaboration et al. 2018).

• The mean free path of ionizing photons, $R_{\rm mfp}$, is the other EoR parameter (Alvarez & Abel 2012). $R_{\rm mfp}$ sets the maximum distance travelled by ionizing photons. Due to the process of structure formation, dense regions of neutral hydrogen (Lyman-limit systems) effectively absorb all the ionizing radiation and thus limit the sphere of influence of each ionizing source. The mean free path parameter approximately accounts for the effect of these dense neutral hydrogen pockets during reionization. In our simulations, we vary $R_{\rm mfp}$ from 10 to 70 comoving Mpc.

2.1.2 Power spectrum

It is possible in principle to map the distribution of neutral hydrogen three dimensionally in the early universe by observing the brightness temperature contrast of the 21-cm line. In order to infer the information about the astrophysical processes in the epoch of reionization and cosmic dawn, there are a variety of approaches one can follow to characterize the 21-cm signal. Other than the global signal, the most straightforward approach is to use the statistical description of the 21-cm fluctuations, i.e., the 21-cm power spectrum.

The 21-cm power spectrum encodes a great deal of information
about the underlying physical processes related to reionization and cosmic dawn. We define the power spectrum $P(k)$ of fluctuations of the 21-cm brightness temperature (relative to the radio background, which is the CMB in standard models) by

$$\langle \tilde{\delta}_{T_b}(\mathbf{k}) \, \tilde{\delta}^{*}_{T_b}(\mathbf{k}') \rangle = (2\pi)^3 \, \delta_D(\mathbf{k} - \mathbf{k}') \, P(k), \tag{4}$$

where $\mathbf{k}$ is the comoving wave vector, $\delta_D$ is the Dirac delta function, and the angular brackets denote the ensemble average. $\tilde{\delta}_{T_b}(\mathbf{k})$ is the Fourier transform of $\delta_{T_b}(\mathbf{x})$, which is defined by $\delta_{T_b}(\mathbf{x}) = (T_b(\mathbf{x}) - \bar{T}_b)/\bar{T}_b$. Finally we express the power spectrum in terms of the variance, in mK$^2$ units:

$$\Delta^2 = \langle T_b \rangle^2 \, \frac{k^3 P(k)}{2\pi^2}, \tag{5}$$

where the expression $k^3 P(k)/2\pi^2$ is dimensionless. The 21-cm signal is significantly non-Gaussian because of both large-scale and small-scale processes during reionization and cosmic dawn. Thus, the power spectrum does not reveal all the statistical information that is available. Nevertheless, a wealth of astrophysical information can be extracted from the 21-cm power spectrum and it can be measured relatively easily from observations.

2.2 The Excess radio background

The first observational signature of the HI 21-cm line from cosmic dawn was tentatively detected by the EDGES collaboration (Bowman et al. 2018). The shape and magnitude of this signal are not consistent with the standard astrophysical expectation. The reported 21-cm signal is centered at a frequency of 78.2 MHz with an absorption trough of $-500^{+200}_{-500}$ mK (Bowman et al. 2018). The amplitude of absorption is more than a factor of two larger than that predicted from standard astrophysics based on the ΛCDM cosmology and hierarchical structure formation. The SARAS 3 experiment recently reported an upper limit on the global signal that is inconsistent with the EDGES signal (Singh et al. 2021) at 95%, so it will be some time before we can be confident that the global 21-cm signal has been reliably measured.

If EDGES is confirmed, one possible explanation of this observed signal is that there is an additional cooling mechanism that makes the neutral hydrogen gas colder than expected; a novel dark matter interaction with the cosmic gas (Barkana 2018b) is a viable option, but it likely requires a somewhat elaborate dark matter model (Berlin et al. 2018; Barkana et al. 2018; Muñoz & Loeb 2018; Liu et al. 2019). Another possibility, which we consider in detail in this paper, is an excess radio background above the CMB (Bowman et al. 2018; Feng & Holder 2018; Ewall-Wice et al. 2018; Fialkov & Barkana 2019; Mirocha & Furlanetto 2019; Ewall-Wice et al. 2020; Reis et al. 2020b). This excess radio background increases the contrast between the spin temperature and the background radiation temperature. In this case the basic equation for the observed 21-cm brightness temperature from redshift $z$ relative to the background is

$$T_b = \frac{T_{\rm S} - T_{\rm rad}}{1+z} \left(1 - e^{-\tau_{21}}\right), \tag{6}$$

where $T_{\rm rad} = T_{\rm CMB} + T_{\rm radio}$, with $T_{\rm radio}$ being the brightness temperature of the excess radio background and $T_{\rm CMB} = 2.725(1+z)$ K. We consider two distinct types of extra radio models, which we have considered in previous publications. The external radio model assumes a homogeneous background that is not directly related to astrophysical sources, i.e., may be generated by exotic processes (such as dark matter decay) in the early universe. In this model, we assume that the brightness temperature of the excess radio background at the 21-cm rest frame frequency at redshift $z$ is given by (Fialkov & Barkana 2019)

$$T_{\rm radio} = 2.725\,(1+z)\, A_{\rm r} \left[\frac{1420}{78\,(1+z)}\right]^{\beta} {\rm K}, \tag{7}$$

where the spectral index $\beta = -2.6$ (set to match the slope of the extragalactic radio background observed by ARCADE2 (Fixsen et al. 2011; Seiffert et al. 2011) and confirmed by LWA1 (Dowell & Taylor 2018)) and $A_{\rm r}$ is the amplitude of the radio background. Here 1420 MHz$/(1+z)$ is the observed frequency corresponding to redshift $z$, and $A_{\rm r}$ measures the amplitude (relative to the CMB) at the central frequency of the EDGES feature (78 MHz). Thus, the external radio model has eight free parameters: $f_*$, $V_c$, $f_X$, $\alpha$, $E_{\rm min}$, $\tau$, $R_{\rm mfp}$ and $A_{\rm r}$.

In contrast to this external radio background, astrophysical sources such as supermassive black holes or supernovae could in principle produce such an extra radio background due to synchrotron radiation. In such a case, the radio emission would originate from within high redshift radio galaxies and would thus result in a spatially varying radio background, as computed accurately on large scales within our semi-numerical simulations (Reis et al. 2020b). The galaxy radio luminosity can be written as

$$L_{\rm radio}(\nu, z) = f_R \times 10^{22} \left(\frac{\nu}{150\,{\rm MHz}}\right)^{-\alpha_{\rm radio}} \frac{\rm SFR}{{\rm M}_{\odot}\,{\rm yr}^{-1}} \ {\rm W\,Hz^{-1}}, \tag{8}$$

where $\alpha_{\rm radio}$ is the spectral index in the radio band, SFR is the star formation rate and $f_R$ is the normalization of the radio emissivity. Based on observations of low-redshift galaxies, we set $\alpha_{\rm radio} = 0.7$ and note that $f_R = 1$ roughly corresponds to the expected value (Gürkan et al. 2018; Mirocha & Furlanetto 2019). Since extrapolating low-redshift observations to cosmic dawn may be wildly inaccurate, in our analysis we allow $f_R$ to vary over a wide range. Thus, the galactic radio model is also based on eight parameters: $f_*$, $V_c$, $f_X$, $\alpha$, $E_{\rm min}$, $\tau$, $R_{\rm mfp}$, and $f_R$.

Both types of radio background, if they exist, can affect the 21-cm power spectrum, leading to a strong amplification of the 21-cm signal during cosmic dawn and the EoR in models in which the radio background is significantly brighter than the CMB. However, there are some major differences between the two models. The external radio background is spatially uniform, is present at early cosmic times (prior to the formation of the first stars), and increases with redshift (i.e., it is very strong at cosmic dawn and weakens during the EoR). On the other hand, the galactic radio background is non-uniform, and its intensity generally rises with time as it follows the formation of galaxies (as long as $f_R$ is assumed to be constant with redshift).

2.3 Mock SKA data

To consider a more realistic case study, we create mock SKA data by including several expected observational effects in the 21-cm power spectrum, which we refer to as the case "with SKA noise". To incorporate the SKA noise case within the data, (i) we include the effect of the SKA angular resolution, (ii) we add a pure Gaussian noise smoothed over the SKA resolution as a realization of the SKA thermal noise (following Banet et al. (2021), see also Koopmans et al. (2015)) and (iii) we also add residuals from foreground avoidance, by assuming that part of $k$-space (the "foreground wedge") is removed since it is dominated by foregrounds (following Reis et al. (2020a), see also Datta et al. (2010); Dillon et al. (2014); Pober et al. (2014);
Pober (2015); Jensen et al. (2015)). Each of the three effects is included along with its expected redshift dependence. Regarding the foreground residuals, we note that we assume that the high-resolution maps of the SKA will enable a first step of reasonably accurate foreground subtraction, so that the remaining wedge-like region for avoidance will be limited (corresponding to the "optimistic model" of Pober et al. (2014)). In order to gain some understanding of the separate SKA effects, we also consider a case that we label "with thermal noise". In this case, we add the effect of SKA resolution and thermal noise, i.e., the same as "SKA noise" except without foreground avoidance.

Given the lower accuracy, for cases with mock SKA effects we use coarser binning, namely eight redshift bins and five $k$ bins. The five $k$-bins are spaced evenly in log scale between $k = 0.05$ Mpc$^{-1}$ and $k = 1.0$ Mpc$^{-1}$; we average the 21-cm power spectrum at each redshift over the range of $k$ values within each bin. To fix the redshift bins, we imagine placing our simulation box multiple times along the line of sight, so that our comoving box size fixes the redshift range of each bin. For example, we start with $z = 27.4$, which corresponds to 50 MHz (the limit of the SKA), as the far side of the highest-redshift bin. Then the center of the box is 192 comoving Mpc (half of our 384 Mpc box length) closer to us. The redshift corresponding to the center is taken as the central redshift of the first bin. The next redshift bin is 384 Mpc closer and so on. As the total comoving distance between $z = 27.4$ and $z = 6$ is around 3000 Mpc, we obtain 8 redshift bins that naturally correspond to a line of sight filled with simulation boxes. We then average the 21-cm power spectrum over the redshift range spanned by each box along the line of sight, by using the simulation outputs which we have at finer resolution in redshift. This averaging is part of the effect of observing a light cone; while there is also an associated anisotropy (Barkana & Loeb 2006; Datta et al. 2012), in this paper we only consider the spherically-averaged 21-cm power spectrum.

2.4 Method to generate the dataset

We use our own semi-numerical simulation (Visbal et al. 2012; Fialkov & Barkana 2014) to predict the 21-cm signal for each possible model. The simulation generates realizations of the universe in a large cosmological volume ($384^3$ comoving Mpc$^3$) with a resolution of 3 comoving Mpc over a wide range of redshifts (6 to 50). The simulation follows the hierarchical structure formation and the evolution of the Ly$\alpha$, X-ray, Lyman-Werner (LW), and ionizing ultra-violet radiation. The extended Press-Schechter formalism is used to compute the star formation rate in each cell at each redshift (Barkana & Loeb 2004). The 21-cm brightness temperature cubes are output by the simulation and we use them to calculate the 21-cm power spectrum at each redshift. While this semi-numerical simulation was inspired by 21cmFAST (Mesinger et al. 2011), it is an entirely independent implementation with various differences such as more accurate X-ray heating (including the effect of local reionization on the X-ray absorption) and Ly$\alpha$ fluctuations (including the effect of multiple scattering and Ly$\alpha$ heating). Inhomogeneous processes such as the streaming velocity, LW feedback, and photo-heating feedback are also included in the code. We created a mock 21-cm signal using the code for a large number of astrophysical models and calculated the 21-cm power spectrum for each parameter combination. Considering first standard astrophysical models (without an excess radio background), we generated the 21-cm power spectrum for 3195 models that cover a wide range of possible values of the seven astrophysical parameters. The ranges of the parameters were $f_* = 0.0001 - 0.50$, $V_c = 4.2 - 100$ km s$^{-1}$, $f_X = 0.0001 - 1000$, $\alpha = 1.0 - 1.5$, $E_{\rm min} = 0.1 - 3.0$ keV, $\tau = 0.01 - 0.12$, and $R_{\rm mfp} = 10.0 - 70.0$ Mpc. As explained above, our analysis involved two more datasets (3195 models each) of 21-cm power spectra, with either full SKA noise or SKA thermal noise only, in order to analyze a more realistic situation. In order to investigate the two scenarios of the excess radio background (where the number of free parameters is increased by one), we use two new datasets of models: 10158 models with the galactic radio background and 5077 models with the external radio background.

2.5 Artificial Neural Network

Artificial neural networks (ANN) (often simply called neural networks or NN) are computing systems that mimic in some ways the biological neural networks that constitute the human brain. We briefly summarize their properties. An ANN consists of a collection of artificial neurons. Each artificial neuron has inputs and produces a single output which can be the input of multiple other neurons. In our analysis, we use a Multi-layer Perceptron (MLP) which is a supervised machine learning algorithm in an artificial neural network. To define the neural network architecture we need to specify the number of hidden layers, number of nodes in each layer, the activation function, the solver, and the maximum number of iterations.

A Multi-layer Perceptron (MLP) is a supervised learning algorithm that learns to fit a mapping $f : R^m \rightarrow R^o$ using a training dataset, where $m$ is the input dimension and $o$ is the output dimension. When we apply unknown data as a set of input features $X = x_1, x_2, x_3, ..., x_m$, the neural network uses the mapping to infer the target output ($y$). This Multi-layer Perceptron can be used for both classification and regression problems. The advantage of a Multi-layer Perceptron is that it can learn highly non-linear models. Every neural network has three different types of layers each consisting of a set of nodes or neurons. They are the input layer, hidden layer and output layer. The input layer consists of a set of neurons that represent the input features $X = x_1, x_2, x_3, ..., x_m$. Each neuron in the input layer is connected to all the neurons in the first hidden layer with some weights and each node in the first hidden layer is connected to all the nodes in the next hidden layer and so on. The output layer receives the values from the last hidden layer and transforms them to the output target value. A specific weight ($w$) and a bias ($b$) are applied to every input or feature. Both the weight and the bias are initially chosen randomly. For a particular neuron in the $l$'th hidden layer, if $x_l$ is the input and $x_{l+1}$ is the output of that neuron, then $x_{l+1} = f(w_l x_l + b_l)$, where $f$ is called the activation function. Using linear activation functions would make the entire network linear in the inputs, and thus equivalent to a one layered network. Thus, non-linear activation functions are typically used in order to provide the ability to handle complex, non-linear data. The activation function activates a neuron, i.e., this function takes its input and compares it with a threshold value. If the input is greater than the threshold, it is forwarded to the next layer and if it is less than the threshold, it is turned to zero. Commonly used non-linear activation functions include the logistic sigmoid function, hyperbolic tangent function and the rectified linear unit function. A backpropagation algorithm is usually used to train an artificial neural network. The training procedure for a network involves several steps:

• Initialization: Randomly chosen initial weights and biases are applied to all the nodes or neurons in each layer.

• Forward propagation: The output is computed using the neural network based on the initial choices of the weights and biases given the input from the training dataset. Since the calculation progresses
from the input to the output layer (through the hidden layers), this is known as forward propagation.

• Error estimation: An error function (often called a loss function) is used to compute the difference between the predicted and the true (known) output of the model, given the current weights. MLP uses different loss functions based on the problem type. For regression, a common choice is the mean square error.

• Backpropagation and updating of the weights: A backpropagation algorithm minimizes the error function and finds the optimal weight values, typically by using the gradient descent technique. The outermost weights get updated first and then the updates propagate towards the input layer, hence the term backpropagation.

• Repetition until convergence: In each iteration, the weights get updated by a small amount, so to train a neural network several iterations are required. The number of iterations until convergence depends on the learning rate and the optimization method used in the network.

Once the network has been trained using the training dataset, the trained network can make predictions for arbitrary input data that were not a part of the training set.

2.5.1 Astrophysical parameter predictions

For the purpose of predicting the astrophysical parameters, we used a two layer MLP with 150 neurons in the first hidden layer and 50 neurons in the second hidden layer. The network was expected to be somewhat complex as we want a mapping between the seven astrophysical parameters of the model and, on the other side, the 21-cm power spectrum for 32 values of the wavenumber in the range 0.05 Mpc$^{-1}$ < [...] sigmoid function as the activation function for the hidden layers and the stochastic gradient-based optimizer for the weight optimization. We use 3095 models to train the neural network, and we then apply the trained ANN to a test dataset consisting of 100 models. Throughout this paper, for simplicity we choose test cases that have non-zero power spectra from intergalactic hydrogen, i.e., that have not fully reionized by redshift 6.

2.5.2 Emulation of the 21-cm power spectrum

If the statistical description of the 21-cm signal (here the 21-cm power spectrum) is our main focus, then we hope to avoid the need to run a semi-numerical simulation for each parameter combination. We can instead construct an emulator that provides rapidly-computed output statistics that capture the important information in the signal given a set of astrophysical parameters.

We train the neural network to predict the 21-cm power spectrum based on the seven-parameter astrophysical model specified above. As in the case of the ANN to predict the parameters, here also we standardize the features as part of data pre-processing. To reduce the dimension of the power spectrum data, we apply a PCA transformation to the data; after experimentation we found that here 20 PCA components suffice. As before, we again use a log scale for both the dataset of the parameters and the 21-cm power spectra. Next we need to find the appropriate neural network architecture to construct the emulator. For this, we choose some specified hyperparameters for our multi-layer perceptron estimator and search among all possible combinations to find the best one to use in our MLP regressor. To emulate the 21-cm power spectrum, we use a three layer MLP with 134 neurons in each layer. We use the logistic sigmoid [...]
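As a rough illustration of the emulator architecture described in Section 2.5.2, the forward pass of such a network can be sketched in plain Python. This is only a sketch under stated assumptions, not the code used for the paper: it reads "three layer MLP with 134 neurons in each layer" as three hidden layers, assumes that the network outputs the 20 PCA coefficients through a linear output layer, uses random (untrained) weights, and omits the feature standardization and the inverse PCA step. The helper names (`init_layer`, `dense`, `forward`) and the example parameter values are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Logistic sigmoid activation, f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """Draw random (untrained) weights and zero biases for one dense layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases, activation=None):
    """Apply x -> f(W x + b); with activation=None the layer is linear."""
    z = [sum(w * xi for w, xi in zip(row, x)) + b
         for row, b in zip(weights, biases)]
    return [activation(v) for v in z] if activation else z


def forward(params, layers):
    """Map 7 astrophysical parameters to 20 output coefficients."""
    x = [math.log10(p) for p in params]      # log scale for the inputs
    for weights, biases in layers[:-1]:      # sigmoid hidden layers
        x = dense(x, weights, biases, sigmoid)
    weights, biases = layers[-1]             # assumed linear output layer
    return dense(x, weights, biases)


rng = random.Random(0)                       # fixed seed for reproducibility
sizes = [7, 134, 134, 134, 20]               # inputs, 3 hidden layers, outputs
layers = [init_layer(n_in, n_out, rng)
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

# Hypothetical parameter set inside the ranges quoted in Section 2.4:
# f_*, V_c [km/s], f_X, alpha, E_min [keV], tau, R_mfp [Mpc]
example_params = [0.05, 16.5, 1.0, 1.3, 0.5, 0.06, 30.0]
pca_coeffs = forward(example_params, layers)
print(len(pca_coeffs))  # 20
```

In a trained emulator the weights would come from the backpropagation procedure of Section 2.5 rather than from a random generator, and the 20 outputs would be mapped back to the (log) power spectrum by inverting the PCA transformation.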
we follow a Bayesian analysis for finding the posterior probability distribution of the parameters. We use MCMC methods for sampling the probability distribution functions or probability density functions (pdfs).

The posterior pdf for the parameters $\theta$ given the data $D$, $p(\theta|D)$, is, in general, the likelihood $p(D|\theta)$ (i.e., the pdf for the data given the parameters $\theta$) times the prior pdf $p(\theta)$ for the parameters, divided by the probability of the data $p(D)$:

$$p(\theta|D) = \frac{p(D|\theta)\, p(\theta)}{p(D)} , \tag{9}$$

where the denominator $p(D)$ can be thought of as a normalization factor that makes the posterior distribution function integrate to unity. If we assume that the noise is independent between data points, then the likelihood function is the product of the conditional probabilities

$$\mathcal{L} = \prod_{n=1}^{N} p(D_n|\theta) . \tag{10}$$

Taking the logarithm,

$$\ln \mathcal{L} = -\frac{1}{2} \sum_{n=1}^{N} \left[ \frac{\left[ D_n - m_{n,{\rm model}}(\theta) \right]^2}{s_n^2} + \ln\left(2\pi s_n^2\right) \right] , \tag{11}$$

where we set $s_n^2 = \sigma^2 + m_{n,{\rm model}}^2 f^2$. The likelihood function here is assumed to be a Gaussian, where the variance is modelled, as is common for the MCMC procedure, as the sum of a constant and a multiple of the predicted data (i.e., as a combination of an absolute error and a relative error). While we might in the future try to directly include estimated observational errors and covariances in the data, here we instead adopt a black-box approach where we allow the NN and MCMC procedures to estimate on their own the total effective uncertainties and correlations, including also the effect of the uncertainty in the emulation. In particular, $f$ is a free parameter that gives the MCMC procedure the flexibility to do this, so we include it effectively as an additional model parameter. We apply the procedure to obtain the posterior distribution for all the parameters (seven astrophysical parameters and $f$) and then marginalize over the extra parameter ($f$) to obtain the properly marginalized posterior distribution for the seven astrophysical parameters (Hogg et al. 2010). Here the index $n$ denotes various $k$-bins and $z$-bins, where the data $D_n$ is the mock observation of the 21-cm power spectrum and $m_{n,{\rm model}}$ is the predicted 21-cm power spectrum from the emulator. In this work we adopt an effective constant error of:

$$\sigma = 0.15~{\rm mK}^2 . \tag{12}$$

This ensures that the algorithm does not try to achieve a low relative error when the fluctuation itself is low (below ∼ 0.4 mK) and likely more susceptible to systematic errors in realistic data. What we have described here is a typical setup for MCMC. The final uncertainties are insensitive to the detailed assumptions since in the end the errors are found numerically by the MCMC procedure; furthermore, we have test models that allow us to independently test the reliability of the uncertainty estimates, as described further in the results section below.

We use the emcee sampler (Foreman-Mackey et al. 2013), which implements the affine-invariant ensemble sampler for MCMC (Goodman & Weare 2010). The MCMC sampler only computes the likelihood when the parameters are within the prior bounds. We set the prior bounds for the parameters according to Table 1 and we use flat priors for the parameter values (in log except for $\alpha$ and $E_{\rm min}$).

Parameters              Lower bound   Upper bound
$f_*$                   0.0001        0.50
$V_c$ [km/s]            4.2           100
$f_X$                   0.0001        1000
$\alpha$                0.9           1.6
$E_{\rm min}$ [keV]     0.09          3.1
$\tau$                  0.01          0.14
$R_{\rm mfp}$ [Mpc]     9             74

Table 1. The prior bounds for the astrophysical parameters.

In practice we show below that the MCMC uncertainties significantly underestimate the true errors. Thus, in order to find more accurate error bounds, we use ensemble learning. Instead of using one emulator, we use an ensemble of emulators with the same neural network architecture. Here we use 20 emulators, each of which we train with a randomly drawn subset consisting of 90% of the training dataset (3095 models). We apply each of the trained emulators to the same test dataset and carry out the Bayesian analysis employing the MCMC sampler. Then, for a particular parameter, we take as our best predicted value the mean of the predicted values from the MCMC sampler using all the emulators. For the uncertainties, we take the mean of the distances to the upper and lower uncertainty bounds of the emulators. We label the resulting average the MCMC uncertainty; this is an ensemble-averaged estimate of the internal error of the MCMC procedure using a single emulator. To find the external error of each parameter, we calculate the standard deviation of the predicted best-fit values given by these 20 different emulators, and this we label the Bootstrap uncertainty as it originates from the random sampling of the training dataset.

3 RESULTS

3.1 Performance analysis of the emulators

We show the performance of the emulator of the 21-cm power spectrum in Fig. 1. We compare the emulated power spectrum and the true power spectrum from the semi-numerical simulation for two particular $k$ values. The left panel shows a few random examples of the emulated power spectrum (solid lines) and the true power spectrum (dashed lines). The different colors denote different models. In this figure, we see that the accuracy of the emulator is generally good and tends to improve with the height of the power spectrum, although there is some random variation among different models. A more representative, statistical analysis of the accuracy is shown further below.

The right panel of Fig. 1 shows a few random examples of the comparison between the power spectrum emulated by the emulator with SKA noise and the true power spectrum with the SKA noise. The different colors denote different astrophysical models. Again, the emulation is seen to be reasonably accurate, although in some cases the emulated 21-cm power spectrum significantly deviates from the actual one at low redshift and/or when the power spectrum is low. The variations intrinsic to the different models in the power spectra (left panels in Fig. 1) are heavily suppressed once we include the expected observational effects of the SKA experiment into the power spectra (right panels in Fig. 1). In particular, the thermal noise dominates at high redshift. However, as we find from the results below, when we fit the power spectrum with SKA noise there is still significant information in the data that allows the fitting procedure to reconstruct
the input parameters. An advantage of machine learning is that the algorithm learns directly how to best deal with noisy data, and there is no need to try to explicitly model or fit the observational effects.

[Figure 1: four panels showing emulated and true $\Delta^2$ [mK$^2$] versus $1+z$, without SKA noise (left) and with SKA noise (right).]

Figure 1. A few random examples of the emulated power spectrum without SKA noise (left panel) and with SKA noise (right panel) at $k = 0.11$ Mpc$^{-1}$ (upper panel) and $k \approx 1.0$ Mpc$^{-1}$ (lower panel); note that the $k$-bin values and widths are different in the SKA case, as explained in the text. The dashed line is the true power spectrum from the simulation and the solid line is the emulated power spectrum (for combinations of astrophysical parameters that were not included in the training set). Different colors show different models.

To test statistically the performance of the emulator in predicting the 21-cm power spectrum, we use a test dataset of 100 randomly-chosen models that were not part of the training set. We quantify the performance in detail below, but here, as an optimistic overall estimate, we quantify whether any parts of the power spectrum (in $k$ and $z$ space) are well measured. To this end, we calculate the error in the predicted power spectrum $\Delta^2_{\rm predicted}$ compared to the power spectrum generated from the simulation for the same parameter set, $\Delta^2_{\rm true}$, and normalize relative to the maximum value of the power spectrum. Specifically, we find the r.m.s. value of the difference between $\Delta^2_{\rm predicted}$ and $\Delta^2_{\rm true}$, and divide by the maximum value of the true power spectrum over all $k$ and $z$:

$${\rm Error} = \frac{\sqrt{{\rm Mean}\left[ \left( \Delta^2_{\rm predicted} - \Delta^2_{\rm true} \right)^2 \right]}}{{\rm Max}\left[ \Delta^2_{\rm true} \right]} . \tag{13}$$

Here we take the mean over all $k$ and $z$. When we calculate this for the dataset with SKA noise, we normalize the error using the maximum value of the power spectrum without SKA noise, binned over the SKA $k$ and $z$-bins. For the case of the emulator trained using the 21-cm power spectrum without SKA noise, the median of this relative error over the test dataset is 0.009 whereas for the emulator trained using the 21-cm power spectrum with the SKA noise, the median is 0.002. The SKA binning and the SKA smoothing effects (namely the angular resolution and foreground avoidance) reduce the differences between different inherent power spectra, and this results in a lower median of the relative error compared to the case without SKA noise. This first performance analysis uses an optimistic measure of error as it is normalized to the maximum value of the power spectrum, but it clearly indicates that some portions of $k$ and $z$ space can be emulated accurately, including for the case where the emulator only has access to data with SKA noise. Below we consider more detailed assessments of the performance of the emulator.

3.2 Dependence of the emulation error on the redshift and wavenumber

For a more detailed assessment of the emulator, we calculate how the error varies with redshift and wavenumber. For this we use test datasets of 100 models for each of the cases: without SKA noise, with SKA noise, and with SKA thermal noise only. We first directly test the emulator by comparing the predicted power spectrum (feeding into the emulator the known true parameters) to the true simulated power spectrum (as in the previous subsection, but here divided separately into $k$ and $z$ bins). In addition, we test the complete framework by finding the best-fit astrophysical parameters to mock data using the MCMC sampler; feeding the best-fit parameters to the emulator of the power spectrum; and finding the error of this best-fit predicted power spectrum compared to the true simulated power spectrum. In cases with SKA noise, we are not interested in finding the error in the predicted power spectrum with SKA noise (as the power spectrum is often dominated by noise, especially at high redshifts); instead, we make the more challenging comparison of the best-fit predicted power spectrum to the true power spectrum, both in their "clean" versions (i.e., without SKA noise). To be clear, this means taking
the reconstructed best-fit astrophysical parameters (which were reconstructed from the mock data with SKA noise, based on the NN trained using power spectra with SKA noise), and using them as input to the NN trained using power spectra without SKA noise. Here we use the following definition to quantify the error as a function of redshift and wavenumber:

$${\rm Error}(k, z) = {\rm Median}\left[ \frac{\left| \Delta^2_{\rm predicted\_clean} - \Delta^2_{\rm true\_clean} \right|}{\Delta^2_{\rm true\_clean} + 0.15~{\rm mK}^2} \right] , \tag{14}$$

where we take the median over the test models; in this paper we often take the median in order to measure the typical error and reduce the sensitivity to outliers. This definition of the error measures the absolute value of the relative error, except that the denominator includes a constant in order not to demand a low relative error when the fluctuation itself is low (in agreement with eq. 12). Note that here the errors are much larger than before because they are not normalized to the maximum value of the power spectrum but are measured separately at each bin, including when the power spectrum is low.

[Figure 2: eight panels showing Error($k$, $z$) versus $k$ [Mpc$^{-1}$] for $z$ = 6, 11, 17, 22, 26, 30 (upper panels) and versus $z$ for $k$ = 0.05, 0.09, 0.16, 0.3, 0.54, 0.99 (lower panels).]

Figure 2. Redshift and wavenumber dependence of the relative error in emulating the best-fit power spectrum. The upper panels show the dependence on wavenumber (for fixed redshift) and the lower panels depict the redshift dependence (for fixed wavenumber). For the left-most panels, we emulate the power spectrum using the true parameters from the test dataset. For the panels in the second column from the left, we emulate the power spectrum using the best-fit parameters derived from the network without SKA noise. For the panels in the third column from the left, we use the best-fit parameters derived from the network with SKA noise, but for the error we measure the prediction of the real power spectrum, i.e., we apply the emulator trained without SKA noise. For the panels in the right-most column, we use the best-fit parameters derived from the network with thermal noise and otherwise do the same as for the third column. Note that the plots in this figure show all 25 $z$ values and 32 $k$ values.

In Fig. 2, we show how the error varies with wavenumber (top panels) and redshift (bottom panels), for the cases both without and with SKA noise. For the direct emulation case (left-most panels, where we emulate the power spectrum using the true parameters from the test dataset), the relative error decreases with wavenumber up to $k \sim 0.1 - 0.2$ Mpc$^{-1}$, then plateaus, and again increases above $k \sim 0.6$ Mpc$^{-1}$. The redshift dependence shows a less regular pattern, except that the errors tend to increase both at the low-redshift and high-redshift end. Overall, the typical emulation error of the power spectrum in each bin is 10 − 20% over a broad range of $k$ and $z$, but it rises above 20% at the lowest and highest $k$ values (for most redshifts), and at the lowest redshift for all $k$ values (i.e., at $z = 6$, near the end of reionization, when the power spectrum is highly variable and is sensitive to small changes in the parameters). For some perspective, we note that a 20% error is typically adopted to represent the systematic theoretical modeling error in the 21-cm power spectrum (e.g., Kern et al. 2017). In the panels in the second column from the left, we use the best-fit parameters derived from the network without SKA noise to emulate the power spectrum. From the comparison to the left-most panels, we see that the fitting of the astrophysical parameters (in this case without SKA noise) is nearly perfect, in that the error that it adds is small compared to the error of the emulator itself. In the panels in the third column, the best-fit parameters are derived from the network with SKA noise, but as noted above, the errors are calculated for the ability to predict the real power spectrum, i.e., by comparing the true power spectrum to the prediction of the emulator that was trained using the power spectrum without SKA noise. SKA noise reduces the accuracy of the reconstruction of the astrophysical parameters but not by too much, increasing the typical errors by a fairly uniform factor of ∼ 1.5, to 15 − 30% for most values of $k$ and $z$. For the panels in the last column, we use the best-fit parameters derived from the network trained using the power spectrum with SKA thermal noise only. The errors are nearly identical to the full SKA noise panels, showing that the foreground effects do not add substantial error beyond the angular resolution plus thermal noise, at least for the optimistic foreground avoidance model that we have assumed.

In order to get a better understanding of the span of the models over $k$ and $z$, we show in Fig. 3 characteristic quantities that enter into the above calculation of the relative errors. In the left column, we show the median of the clean power spectrum (without any noise) as a function of the wavenumber (upper panel) and redshift (lower panel). In the other columns, the median of the absolute difference between the true and predicted clean power spectra is shown as a function of wavenumber (upper panels) and redshift (lower panels). For the panels in the middle column, the best-fit parameters are derived from the network without noise, whereas for the panels in the right column we use the best-fit parameters derived from the network with
SKA noise to emulate the clean power spectrum (without noise). This figure shows that the 21-cm power spectrum varies greatly as a function of $k$ and $z$, even when we take out the model-to-model variation by showing the median of the 100 random test cases. The variation is by three and a half orders of magnitude; even if we ignore the parameter space in which the power spectrum is lower than 0.4 mK (see eq. 12), we are left with a range of more than two orders of magnitude. For the considered ranges, the overall variation with redshift at a given wavenumber is much greater than the variation with wavenumber at a given redshift. Over this large range, the relative error in each case (with or without SKA noise) remains relatively constant; this is seen by the panels in Fig. 3 that show the relative error, which overall follows a similar pattern (with $k$ and $z$) as the power spectrum except with a compressed range of values.

3.3 Errors in the fitted astrophysical parameters

Up to now, we have examined the errors in emulating or reconstructing the 21-cm power spectrum. Of greater interest is, of course, the ability to extract astrophysical information from a given power spectrum. In addition to the unavoidable effect on the fitting of the emulation uncertainty, there are also the SKA observational effects. We show results for a random example model in Tables 2 and 3, without or with SKA noise. In this example, for several parameters the true parameter values lie well outside the $1\sigma$ uncertainty bounds calculated from MCMC only, e.g., $\alpha$ and $E_{\rm min}$ are off by nearly $5\sigma$ in the case without SKA noise, while $V_c$ is off by $2.4\sigma$ in the case with SKA noise. Thus, as explained in section 2.6, we also calculate a Bootstrap uncertainty using random sampling of the training dataset. Tables 2 and 3 show that the Bootstrap uncertainty is often significantly larger, especially for the parameters where the MCMC uncertainty severely underestimates the actual error. In the case without SKA noise, the bootstrap uncertainty is larger than the MCMC uncertainty by a factor that ranges from 1.4 to 4.9 in this example. After adding SKA noise, the MCMC uncertainty is significantly larger than in the case without SKA noise, for all the parameters, while for the bootstrap uncertainty this is the case only for a few of the parameters.

Since the MCMC uncertainty (as given by a single emulator) is unreliable, in what follows we do not focus on the MCMC contours. We show them in the Appendix for one other example (i.e., a different astrophysical model than the one used for Tables 2 and 3), in figures A1 and A2. More generally, while the results in this section are interesting, we have only shown a couple of random example models here. In order to understand the general trends, we consider below the overall statistics of the fitting as calculated for a large number of models.

3.4 Statistical analysis of the astrophysical parameter errors

In order to test the overall performance in predicting each of the parameters, we use our test dataset of 100 models. We calculate in each case the MCMC uncertainty $\sigma_1$ and the bootstrap uncertainty $\sigma_2$ as in the previous sub-section, and define a total uncertainty as $\sigma_0 \equiv \sqrt{\sigma_1^2 + \sigma_2^2}$. In order to test whether this total uncertainty is a realistic error estimate, we calculate the normalized error in predicting a parameter ($P_{\rm predicted}$) compared to the true value ($P_{\rm true}$) as:

$${\rm Error}\,(\sigma) = \frac{P_{\rm true} - P_{\rm predicted}}{\sigma_0} . \tag{15}$$

If $\sigma_0$ is an accurate estimate then the actual values of this normalized error (for the test dataset) should have a standard deviation of unity. All these quantities, namely $P_{\rm true}$, $P_{\rm predicted}$ and $\sigma_0$, are measured in log space ($\log_{10}$) for all the parameters except for $\alpha$ and $E_{\rm min}$. We show the histogram of the errors in predicting each of the parameters in Figs. 4 (for the three most important parameters of high-redshift galaxies) and B1 (for the four other parameters, shown in the Appendix). In these figures, the left panels are for the case without SKA noise, the middle panels are for the case with SKA noise, and the right panels are for the case with SKA thermal noise. The black solid line in each panel shows the best-fit Gaussian of the histogram, also listing its mean ($\mu$) and standard deviation ($\sigma$) within the panel. The two grey dashed lines in each panel represent the $3\sigma$ boundary of the respective Gaussian. The standard deviations ($\sigma$) for most of the seven parameters are close to unity (within ∼ 20%), which implies that our procedure generates a reasonable estimate of the uncertainties. Also, the mean (which measures the bias in the prediction) is, for every parameter (without SKA noise), less than 0.3 in size. The datasets with SKA noise or with thermal noise give similar results, consistent with the similar comparison in Fig. 2. With the noisy data, the mean values are as large as ∼ 1 for some of the parameters ($f_X$ and $E_{\rm min}$), which means that adding SKA noise in the dataset increases the bias in the predicted results. We also find that most of the distributions are fairly Gaussian, with only a small fraction of the 100 models yielding best-fit parameter values that fall outside the $3\sigma$ boundary of the respective Gaussian fit.

In Figs. 5 and B2 (the latter in the Appendix), we show the histogram of the total uncertainty ($\sigma_0$) for each of the parameters. In the left panels we compare the histogram of the total uncertainty for the cases without SKA noise and with SKA noise, whereas in the right panels we compare the total uncertainty for the cases without SKA noise and with thermal noise. Again the total uncertainties are measured in log scale ($\log_{10}$) for all the parameters except for $\alpha$ and $E_{\rm min}$. Table 4 shows the corresponding median of the total uncertainty ($\sigma_0$) for the cases without SKA noise, with SKA noise and with thermal noise. In the theoretical case of no observational limitations ("Without SKA"), the emulation errors still allow the parameters $V_c$ and $\tau$ to be reconstructed with a typical accuracy of 3%, and $f_*$ to within 8%. The ionizing mean free path ($R_{\rm mfp}$) is typically uncertain by a factor of 1.5, and $f_X$ by a factor of 3.1. For the linear parameters, the uncertainty is typically ±0.2 in $\alpha$ and ±0.65 in $E_{\rm min}$. The uncertainties with full SKA noise and with only SKA thermal noise are basically the same (except for some random scatter). The uncertainties in $f_X$, $\alpha$, $E_{\rm min}$, and $R_{\rm mfp}$ are only marginally affected by adding mock SKA effects, indicating that the emulation error dominates for these parameters. However, SKA noise substantially increases the errors in the other parameters, to 15% in $V_c$, 5% in $\tau$, and 32% in $f_*$. Of course, currently our knowledge of most of these parameters is uncertain by large factors (orders of magnitude in some cases), so these types of constraints would represent a remarkable advance.

3.5 Classification of the radio backgrounds

As noted in the introduction, the possible observation of the absorption profile of the 21-cm line centered at 78 MHz with an amplitude of −500 mK by the EDGES collaboration is incompatible with the standard astrophysical prediction. One of the possible explanations for this unexpected signal is that an excess radio background above the CMB enhances the contrast between the spin temperature and the background radiation temperature. Fialkov & Barkana (2019) considered a uniform external radio background (not related to the
astrophysical sources directly), with a synchrotron spectrum of spectral index $\beta = -2.6$ and amplitude parameter $A_{\rm r}$ measured relative to the CMB at the reference frequency of 78 MHz. Another potential model for the excess radio background is that it comes from high-redshift radio galaxies. The effect of the inhomogeneous galactic radio background on the 21-cm signal has been explored by Reis et al. (2020b). They used the galactic radio background model to explain the unexpected EDGES low band signal. In our work, we use both the external and galactic radio models and train a neural network to try to infer the type of the radio background given the 21-cm power spectrum. For this purpose, we create a training dataset of 9500 models (where there are ∼ 5000 models with a galactic radio background and ∼ 4500 models with an external radio background), with the astrophysical parameters varying over the following ranges: $f_* = 0.01 - 0.5$, $V_c = 4.2 - 60$ km s$^{-1}$, $f_X = 0.0001 - 1000$, $\alpha = 1.0 - 1.5$, $E_{\rm min} = 0.1 - 3.0$ keV, $\tau = 0.033 - 0.089$, $R_{\rm mfp} = 10.0 - 70.0$ Mpc.

[Figure 3: six panels showing Median($\Delta^2_{\rm true\_clean}$) [mK$^2$] (left column) and Median $|\Delta^2_{\rm true\_clean} - \Delta^2_{\rm predicted\_clean}|$ [mK$^2$] (other columns), versus $k$ [Mpc$^{-1}$] for $z$ = 6, 11, 17, 22, 26, 30 (upper panels) and versus $z$ for $k$ = 0.05, 0.09, 0.16, 0.3, 0.54, 0.99 (lower panels).]

Figure 3. Left column: Median of the true (clean) power spectrum (without SKA noise), $\Delta^2_{\rm true\_clean}$, as a function of wavenumber (upper panel) and redshift (lower panel). Other columns: The median of the absolute value of the difference between the true and predicted clean power spectrum. For the panels in the middle column, we emulate the power spectrum using the best-fit parameters derived from the network without SKA noise. For the panels in the right column, the best-fit parameters are derived from the network with SKA noise, but the error is measured by emulating the clean power spectrum. As in Fig. 2, the plots in this figure show all 25 $z$ values and 32 $k$ values.

Parameters              True value   Predicted value   MCMC uncertainty   Bootstrap uncertainty
$f_*$                   -1.108       -1.101            ±0.007             ±0.015
$V_c$ [km/s]            1.108        1.110             ±0.007             ±0.010
$f_X$                   -0.942       -0.901            ±0.032             ±0.124
$\alpha$                1.5          1.274             ±0.048             ±0.236
$E_{\rm min}$ [keV]     0.7          0.648             ±0.011             ±0.026
$\tau$                  -1.112       -1.112            ±0.002             ±0.004
$R_{\rm mfp}$ [Mpc]     1.447        1.449             ±0.019             ±0.054

Table 2. Predicted parameter values and their respective uncertainties for a model with relatively low-mass halos (the value of the parameter $V_c$ is 12.8 km/s). Here all the parameter values are in $\log_{10}$ except $\alpha$ and $E_{\rm min}$. Here we use the 21-cm power spectrum without SKA noise.

Parameters              True value   Predicted value   MCMC uncertainty   Bootstrap uncertainty
$f_*$                   -1.108       -1.038            ±0.042             ±0.057
$V_c$ [km/s]            1.108        1.170             ±0.026             ±0.042
$f_X$                   -0.942       -0.851            ±0.110             ±0.103
$\alpha$                1.5          1.187             ±0.186             ±0.089
$E_{\rm min}$ [keV]     0.7          0.668             ±0.027             ±0.028
$\tau$                  -1.112       -1.116            ±0.007             ±0.004
$R_{\rm mfp}$ [Mpc]     1.447        1.475             ±0.132             ±0.079

Table 3. Same as Table 2. Here we use the 21-cm power spectrum with SKA noise.
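As a worked illustration of the total uncertainty and the normalized error of eq. 15, the following sketch uses the $\alpha$ row of Table 2 (the case without SKA noise). The function and variable names are hypothetical and chosen only for this example; the numbers are taken directly from the table.

```python
import math

# alpha row of Table 2 (without SKA noise); alpha is a linear parameter.
true_value = 1.5
predicted_value = 1.274
sigma_mcmc = 0.048       # sigma_1, MCMC uncertainty
sigma_bootstrap = 0.236  # sigma_2, Bootstrap uncertainty


def total_uncertainty(sigma_1, sigma_2):
    """Total uncertainty sigma_0 = sqrt(sigma_1^2 + sigma_2^2)."""
    return math.hypot(sigma_1, sigma_2)


def normalized_error(p_true, p_predicted, sigma):
    """Normalized error of eq. 15, in units of the supplied uncertainty."""
    return (p_true - p_predicted) / sigma


sigma_0 = total_uncertainty(sigma_mcmc, sigma_bootstrap)
error_mcmc_only = normalized_error(true_value, predicted_value, sigma_mcmc)
error_total = normalized_error(true_value, predicted_value, sigma_0)

print(f"sigma_0 = {sigma_0:.3f}")                       # sigma_0 = 0.241
print(f"error (MCMC only) = {error_mcmc_only:.2f}")     # 4.71
print(f"error (total uncertainty) = {error_total:.2f}")  # 0.94
```

With the MCMC uncertainty alone the true value of $\alpha$ is off by nearly $5\sigma$, whereas with the total uncertainty the same offset is slightly below $1\sigma$.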
For the models with a galactic radio background, the normalization of the radio emissivity (measured relative to low-redshift galaxies), $f_R$, varies over the range $f_R = 0.01 - 10^7$, and the range for the amplitude of the radio background, $A_{\rm r}$, for the external radio models is $0.0001 - 0.5$.

We apply an EDGES-compatible test dataset to the two trained networks. The models that we refer to as EDGES-compatible satisfy the criteria adopted by Fialkov & Barkana (2019) as representing a rough compatibility with the 99% limits of the detected signal in the EDGES low band experiment, in terms of the overall decline and rise without regard to the precise shape of the absorption (which is much more uncertain). The enhanced radio emission must strictly be a high redshift phenomenon, in order to not over-produce the observed radio background (Fialkov & Barkana 2019), so we assume a cut-off redshift, $z_{\rm cutoff} = 15$ (Reis et al. 2020b) below which $f_R = 1$ as for present-day radio sources. So we only consider here redshifts from 15 to 30 (or the highest SKA redshift in the case with SKA noise). In our training dataset, we treat the radio background parameters $f_R$ or $A_{\rm r}$ on an equal footing and add an extra column of a binary parameter that specifies the type of radio background: 0 for the external radio background and 1 for the galactic radio background. In our EDGES compatible test dataset, we have 530 models and 308 models with an external and a galactic radio background, respectively. We apply this test dataset to the trained NN. In the predicted parameters, we round off the binary parameter either to zero (when it is ≤ 0.5) which is the label for the external radio background, or to unity (when it is > 0.5) which is the label of the galactic radio background. The confusion matrix shown in Fig. 6 indicates the performance of our classification method for identifying the type of radio background. In the case

[Figure 4: nine histogram panels of Counts versus Error for $f_*$, $V_c$ and $f_X$ (rows), without SKA noise, with SKA noise and with thermal noise (columns). Best-fit Gaussian values listed in the panels, in column order: $f_*$: $\sigma$ = 0.81, $\mu$ = 0.01; $\sigma$ = 1.10, $\mu$ = 0.10; $\sigma$ = 1.04, $\mu$ = 0.16. $V_c$: $\sigma$ = 0.99, $\mu$ = -0.18; $\sigma$ = 0.80, $\mu$ = 0.04; $\sigma$ = 0.93, $\mu$ = 0.13. $f_X$: $\sigma$ = 1.04, $\mu$ = 0.29; $\sigma$ = 0.98, $\mu$ = 0.99; $\sigma$ = 1.05, $\mu$ = 1.17.]

Figure 4. Histogram of the errors in the predicted parameters: $f_*$, $V_c$ and $f_X$ as defined in Eq. 15.

Parameters              Without SKA   With SKA   With thermal
$f_*$                   0.032         0.120      0.140
$V_c$ [km/s]            0.013         0.062      0.062
$f_X$                   0.486         0.539      0.527
$\alpha$                0.206         0.216      0.220
$E_{\rm min}$ [keV]     0.647         0.657      0.664
$\tau$                  0.013         0.021      0.021
$R_{\rm mfp}$ [Mpc]     0.164         0.164      0.174

Table 4. The median (over 100 test models) of the total uncertainty ($\sigma_0$) for each parameter. As before, all the parameter values are in $\log_{10}$ except $\alpha$ and $E_{\rm min}$. The columns show the cases without SKA noise, with SKA noise, and with SKA thermal noise.
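The rounding of the predicted binary parameter and the construction of a confusion matrix, as described for the radio background classification, can be sketched as below. This is only an illustration under stated assumptions: the network outputs are made-up numbers rather than results from the paper, and the helper names (`round_to_label`, `confusion_matrix`, `accuracy`) are hypothetical.

```python
def round_to_label(score, threshold=0.5):
    """Return 0 (external background) if score <= threshold, else 1 (galactic)."""
    return 0 if score <= threshold else 1


def confusion_matrix(true_labels, predicted_labels):
    """2x2 matrix of counts: rows are true labels, columns are predicted labels."""
    matrix = [[0, 0], [0, 0]]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[true][predicted] += 1
    return matrix


def accuracy(matrix):
    """Fraction of models on the diagonal of the confusion matrix."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up example: true labels and continuous network outputs (assumption).
true_labels = [0, 0, 0, 1, 1, 1]
network_outputs = [0.10, 0.50, 0.72, 0.93, 0.38, 0.51]

predicted_labels = [round_to_label(score) for score in network_outputs]
matrix = confusion_matrix(true_labels, predicted_labels)
print(matrix)
print(f"accuracy = {accuracy(matrix):.2f}")
```

An output of exactly 0.5 is assigned to the external class, matching the ≤ 0.5 convention stated in the text.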