A method for classification of red, blue and green galaxies using fuzzy set theory
MNRAS 000, 1–6 (2020) Preprint 28 August 2020 Compiled using MNRAS LaTeX style file v3.0

Biswajit Pandey⋆
Department of Physics, Visva-Bharati University, Santiniketan, Birbhum, 731235, India
⋆ E-mail: biswap@visva-bharati.ac.in

arXiv:2005.11678v2 [astro-ph.GA] 27 Aug 2020
© 2020 The Authors

ABSTRACT
Red and blue galaxies are traditionally classified using some specific cuts in colour or other galaxy properties, which are supported by empirical arguments. The vagueness associated with such cuts is likely to introduce significant contamination in these samples. Fuzzy sets are vague boundary sets which can efficiently capture the classification uncertainty in the absence of any precise boundary. We propose a method for classification of galaxies according to their colours using fuzzy set theory. We use data from the SDSS to construct a fuzzy set for red galaxies with its members having different degrees of ‘redness’. We show that the fuzzy sets for the blue and green galaxies can be obtained from it using different fuzzy operations. We also explore the possibility of using fuzzy relations to study the relationship between different galaxy properties and discuss their strengths and limitations.

Key words: methods: statistical - data analysis - galaxies: formation - evolution - cosmology: large scale structure of the Universe.

1 INTRODUCTION

Colour is considered to be one of the fundamental properties of a galaxy. It is defined as the ratio of fluxes in two different bands and characterizes the stellar population of a galaxy. Understanding the distribution of galaxy colours and their evolution can provide important clues to galaxy formation and evolution.

Modern redshift surveys like SDSS (York et al. 2000) have now measured the photometric and spectroscopic information of a large number of galaxies. Using SDSS, Strateva, et al. (2001) first revealed a striking bimodal feature in the distribution of galaxy colours. Subsequent studies with SDSS and other surveys (Blanton, et al. 2003; Bell, et al. 2003; Baldry, et al. 2004) confirmed the colour bimodality and quantified it in greater detail. The observed colour bimodality indicates the existence of two different populations, namely the ‘red sequence’ and the ‘blue sequence’, with a significant overlap between them. The overlapping region is often termed the ‘green valley’. Driver, et al. (2006) argued that galaxy colours are an outcome of the mixture of colours from stars in the disc and bulge of a galaxy, and that the colour bimodality can be related to the bimodal distribution of their bulge to disc mass ratios. A number of studies show that colour bimodality is sensitive to luminosity, stellar mass and environment (Balogh, et al. 2004; Baldry, et al. 2006).

The origin of the observed colour bimodality must be explained by successful models of galaxy formation. Considerable efforts have been directed towards reproducing the observed distribution of galaxy colours using different semi-analytic models of galaxy formation (Menci, et al. 2005; Driver, et al. 2006; Cattaneo, et al. 2006, 2007; Cameron, et al. 2009; Trayford et al. 2016; Nelson, et al. 2018; Correa, Schaye & Trayford 2019). A significant number of studies have also been devoted to understanding the spatial distribution of red and blue galaxies using various clustering measures like the two-point correlation function (Zehavi, et al. 2011), three-point correlation function (Kayo, et al. 2004), genus (Hoyle et al. 2002), filamentarity (Pandey & Bharadwaj 2006) and local dimension (Pandey & Sarkar 2020). The studies of the mass function of red and blue galaxies (Drory, et al. 2009; Taylor, et al. 2015) play a crucial role in guiding the theories of galaxy formation and evolution. Any such analysis requires one to classify the red and blue galaxies using some operational definition. But colour is a subjective judgment and it is difficult to classify galaxies according to their colours in an objective manner. One should also remember that it is the observed bimodal distribution of galaxy colours that motivates us to classify the galaxies according to their colours. In reality, no galaxy can be regarded as truly either ‘red’ or ‘blue’ based on its colour. Strateva, et al. (2001) prescribed that the
red and blue galaxies can be separated by using an optimal colour separator $(u-r) = 2.22$. However, a substantial overlap between the two populations prevents us from objectively defining galaxies as ‘red’ and ‘blue’ based on their colours. The boundary separating the two populations remains arbitrary and currently there is no meaningful theoretical argument favouring any specific cut over others. Applying a hard-cut to define samples of red and blue galaxies is expected to introduce significant contamination in these samples. Previously, Coppa, et al. (2011) used an unsupervised fuzzy partition clustering algorithm applied to the principal components of a PCA analysis to study the bimodality in galaxy colours. Another study by Norman, et al. (2004) used Bayesian statistical methods instead of sharp cuts in parameter space for such classification problems.

Fuzzy set theory was first introduced by Lotfi A. Zadeh (Zadeh 1965) and has been applied in various fields such as industrial automation (Zadeh 1973), control systems (Lugli et al. 2016), image processing (Rosenfeld 1979), pattern recognition (Rosenfeld 1984) and robotics (Wakileh & Gill 1988). It provides a natural language to express the vagueness or uncertainty associated with an imprecise boundary.

In this Letter, we propose a framework to describe galaxy colours using a fuzzy set theoretic approach. We also study the prospects of using fuzzy relations to understand the relationship between different galaxy properties by treating them as fuzzy variables.

The Letter is organized as follows: We describe the SDSS data in Section 2, explain the method in Section 3 and present the results and conclusions in Section 4.

2 SDSS DATA

The Sloan Digital Sky Survey (SDSS) is one of the most successful sky surveys of modern times. The SDSS has so far measured the photometric and spectroscopic information of more than one million galaxies and quasars in five different bands. We use data from SDSS DR16 (Ahumada, et al. 2019) for the present analysis. We retrieve data from the SDSS SkyServer¹ using Structured Query Language. We download the spectroscopic information of all the galaxies with r-band Petrosian magnitude $13.5 \le r_p < 17.77$ and located within $135^\circ \le \alpha \le 225^\circ$, $0^\circ \le \delta \le 60^\circ$, $z < 0.3$. Here $\alpha$, $\delta$ and $z$ are right ascension, declination and redshift respectively. We obtain a total of 376495 galaxies which satisfy these cuts. We then apply a cut to the K-corrected and extinction corrected r-band absolute magnitude, $-21 \ge M_r \ge -23$, which corresponds to a redshift cut of $0.041 \le z \le 0.120$. This produces a volume limited sample with 103984 galaxies. We use a ΛCDM cosmological model with $\Omega_{m0} = 0.315$, $\Omega_{\Lambda 0} = 0.685$ and $h = 0.674$ (Planck Collaboration, et al. 2018).

¹ https://skyserver.sdss.org/casjobs/

3 METHOD OF ANALYSIS

3.1 Fuzzy set

A fuzzy set $A$ in a universal set $X$ is defined as a set of ordered pairs (Zadeh 1965),

$$ A = \{ (x, \mu_A(x)) \mid x \in X \} \tag{1} $$

where $\mu_A(x)$ is the membership function of $x$ in $A$. $\mu_A(x)$ maps elements of the universal crisp set $X$ into real numbers in $[0, 1]$, i.e. $\mu_A(x) : X \to [0, 1]$. It provides the degree of membership of $x$ in $A$. Thus fuzzy sets are vague boundary sets. The maximum value of the membership degree characterizes the height of a fuzzy set. Fuzzy sets with height one are known as normal fuzzy sets. The membership function of a fuzzy set can have different shapes, such as triangular, trapezoidal, Gaussian or sigmoidal, in different contexts.

3.2 Fuzzy relation

A fuzzy relation $S$ between two fuzzy sets $A_1$ and $A_2$ in the product space $X \times Y$ is a mapping from the product space to the interval $[0, 1]$ and is defined as,

$$ S = \{ (x, y) \mid x \in A_1,\ y \in A_2 \} \tag{2} $$

$S$ describes the relations between the elements of the two fuzzy sets $A_1$ and $A_2$. The strength of the relation between the ordered pairs of the two sets is expressed by the membership function of the relation, $\mu_{A_1 \times A_2}(x, y) = \mu_S(x, y) = \min\{\mu_{A_1}(x), \mu_{A_2}(y)\}$. The fuzzy relation $S$ is itself a fuzzy set whose membership function is represented by a matrix known as the relation matrix.

3.3 Definition of fuzzy sets for different colours and luminosity

Application of a hard-cut yields two crisp sets corresponding to red and blue galaxies. This denies a gradual transition between the two populations, which is mathematically correct but unrealistic. In reality, there is an uncertainty involved in the decision of labelling a galaxy as either ‘red’ or ‘blue’ whenever the colour falls in the neighbourhood of the precisely defined border. This uncertainty is completely ignored by the hard-cut separator. In fuzzy set theory, an element may partly belong to multiple fuzzy sets with different degrees of membership, which allows this uncertainty to be captured whenever the boundaries are imprecise.

We do not label galaxies as purely red, blue or green. Rather we attach ‘redness’, ‘blueness’ or ‘greenness’ to all the galaxies. We use SDSS data to define fuzzy sets corresponding to the three colours red, blue and green. We define a fuzzy set $R$ corresponding to the ‘redness’ of galaxies using the $(u-r)$ colour of the 103984 galaxies in the volume limited sample as,

$$ R = \{ (u-r, \mu_R(u-r)) \mid (u-r) \in X \} \tag{3} $$

Here $X$ is the universal set of $(u-r)$ colour of all galaxies. We choose a sigmoidal membership function,

$$ \mu_R(u-r;\, a, c) = \frac{1}{1 + e^{-a[(u-r)-c]}} \tag{4} $$

where $a$ and $c$ are constants. We choose $a = 5.2$ and $c = 2.2$ for our analysis. The choice of the sigmoidal membership function is motivated by the monotonic increase of ‘redness’ with $(u-r)$ colour.
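As a rough illustration of Equation 4, the sigmoidal membership function can be sketched in a few lines of Python. This is only a minimal sketch, not the author's code: the function name `mu_red` and the sample colours are hypothetical, while `a = 5.2` and `c = 2.2` are the values quoted in the text.

```python
import math

A_SLOPE = 5.2      # steepness parameter a quoted in the text
C_CROSSOVER = 2.2  # crossover colour c quoted in the text


def mu_red(u_minus_r, a=A_SLOPE, c=C_CROSSOVER):
    """Sigmoidal 'redness' membership of Equation 4 (hypothetical helper name)."""
    return 1.0 / (1.0 + math.exp(-a * (u_minus_r - c)))


if __name__ == "__main__":
    # Illustrative colours only, not SDSS measurements.
    for colour in (1.0, 1.8, 2.2, 2.6, 4.0):
        print(f"u-r = {colour:.1f}  mu_R = {mu_red(colour):.4f}")
```

At the crossover colour the membership is exactly one half, and it rises monotonically towards one for redder colours, which is the behaviour the membership function is meant to encode.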
Figure 1. The top left panel of this figure shows the pdf of the $(u-r)$ colour of SDSS galaxies. The top right panel shows the membership functions $\mu_R(u-r)$, $\mu_B(u-r)$, $\mu_G(u-r)$ of the fuzzy sets corresponding to red, blue and green galaxies in the SDSS as a function of $(u-r)$ colour. The bottom left panel shows the membership function $\mu_L$ of the fuzzy set for luminosity of the SDSS galaxies as a function of r-band absolute magnitude ($M_r$). The membership function $\mu_L(M_r)$ for luminosity and the membership function $\mu_R(u-r)$ for red galaxies are plotted against each other in the bottom right panel.

Figure 2. This shows the fuzzy relation between the fuzzy sets for luminosity ($L$) and red galaxies ($R$). [Axes: $L$ from −23 to −21 and $R$ from 1 to 4; colour bar: membership function $\mu_{L \times R}$ from 0 to 1.]

Galaxies with larger $(u-r)$ colour are considered to be progressively redder, and those with a very high $(u-r)$ colour can be reliably tagged as ‘mostly red’. One may choose a linear shape for the membership function, but non-linear functions are known to perform better in most practical situations (Kwang 2005). It may be noted that the parameter $c$ represents the crossover point of the fuzzy set $R$, where $\mu_R(u-r) = 0.5$, and the parameter $a$ controls the slope at the crossover point. The crossover point of a fuzzy set is the location where the fuzzy set has maximum uncertainty or vagueness. SDSS observations of the bimodal distribution of $(u-r)$ colour show that the two peaks corresponding to the ‘red’ and ‘blue’ populations merge together at $u-r \sim 2.2$. It is most difficult to classify a galaxy as red or blue around this $(u-r)$ colour. The slope parameter $a$ is somewhat arbitrarily chosen in this analysis, where we have ensured that the galaxies with the highest and lowest $(u-r)$ colour have $\mu_R(u-r) \approx 1$ and $\mu_R(u-r) \approx 0$ respectively. We believe that the choice of the parameter $a$ can be improved by training an artificial neural network with sample data.

A larger ‘redness’ is associated with smaller ‘blueness’ and vice versa. So we can define a fuzzy set $B$ corresponding to the ‘blueness’ of galaxies by simply taking the fuzzy complement of the normal fuzzy set $R$, i.e. $B = R^c$. The membership function $\mu_B(u-r)$ of the fuzzy set $B$ is obtained from $\mu_R(u-r)$ through Equation 5.
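The role of the parameter $a$ and the shape of the complement can be made explicit with a short worked derivation. It uses only the sigmoid of Equation 4 and the definition $B = R^c$; the shorthand $x \equiv u-r$ is introduced here for brevity, and the numerical value assumes $a = 5.2$ as quoted in the text.

```latex
\begin{align*}
\frac{d\mu_R}{dx} &= a\,\mu_R(x)\,\bigl[1-\mu_R(x)\bigr]
  \quad\Rightarrow\quad
  \left.\frac{d\mu_R}{dx}\right|_{x=c} = \frac{a}{4} = 1.3 , \\
\mu_B(x) &= 1-\mu_R(x) = \frac{1}{1+e^{\,a(x-c)}} .
\end{align*}
```

Read this way, the gradient of $\mu_R$ at the crossover is $a/4$ rather than $a$ itself, so $a$ is best thought of as a steepness parameter. The ‘blueness’ membership is the mirror image of the ‘redness’ membership about $x = c$, with the same crossover point.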
The fuzzy complement is defined pointwise as

$$ \mu_B(u-r) = 1 - \mu_R(u-r), \quad \forall (u-r) \in X \tag{5} $$

We define the green galaxies as those which simultaneously belong to both the fuzzy sets $R$ and $B$. The fuzzy set $G$ corresponding to the ‘greenness’ of galaxies is defined from the two fuzzy sets $R$ and $B$ by taking the fuzzy intersection between them, i.e. $G = R \cap B$. The fuzzy set $G$ is pointwise defined by the membership function $\mu_G(u-r)$ as

$$ \mu_G(u-r) = 2 \min\{\mu_R(u-r), \mu_B(u-r)\}, \quad \forall (u-r) \in X \tag{6} $$

where min is the minimum operator and the prefactor of 2 is used for normalization. Without it, the intersection of the fuzzy sets $R$ and $B$ would be a subnormal fuzzy set with a height of 0.5 at the crossover point $(u-r) = 2.2$. However, a galaxy with a $(u-r)$ colour of 2.2 should be maximally green, with a membership function $\mu_G(u-r) = 1$.

It may be noted here that the membership functions do not provide the probability of a galaxy being ‘red’, ‘blue’ or ‘green’. The membership functions show the possibility of being contained in a particular fuzzy set. They are not similar to the likelihood functions in probability theory, where the overall likelihood function needs to be normalized to unity. Fuzzy set theory is based on the idea of possibility rather than probability. Both probability theory and fuzzy theory express uncertainty and have similarities in certain aspects, but there are some important differences between the two. For example, the sum of the probabilities in a probability distribution must be equal to 1. Contrary to this, there is no such limit for the sum in a possibility distribution defined on a universal set (Kwang 2005).

It is worthwhile to mention here that hard-cut samples can be easily obtained from these fuzzy sets by constructing appropriate α-level sets, which are crisp sets. The set of elements that belong to a fuzzy set with at least a degree of membership α is called the α-level set:

$$ A_\alpha = \{ x \in X \mid \mu_A(x) \ge \alpha \} \tag{7} $$

An α-level set selects the objects above a certain degree of possibility. A higher α-cut applied to a fuzzy set implies that the members of the resulting crisp set belong to a particular class with higher possibility.

We also define a fuzzy set for the luminosity of SDSS galaxies as,

$$ L = \{ (M_r, \mu_L(M_r)) \mid M_r \in Y \} \tag{8} $$

where $Y$ is the universal set of r-band absolute magnitude ($M_r$) of all galaxies. The membership function of this fuzzy set is,

$$ \mu_L(M_r) = \frac{1}{(2.512)^{[M_r - (M_r)_{\rm brightest}]}}, \quad \forall M_r \in Y \tag{9} $$

The choice of this membership function is based on the fact that a difference of 1 in absolute magnitude corresponds to a factor of 2.512 in luminosity. Thus the brightest galaxy in the sample has a membership function of 1 and the membership function $\mu_L(M_r)$ for the other galaxies in $L$ is assigned following Equation 9. It should be noted that the fuzzy set $L$ is not defined to categorize galaxies according to their luminosity. We construct the fuzzy set $L$ in order to study the fuzzy relation between it and the ‘red’ fuzzy set.

4 RESULTS AND CONCLUSIONS

In the top left panel of Figure 1, we show the pdf of the $(u-r)$ colour of SDSS galaxies in the volume limited sample analyzed here. The bimodal nature of the $(u-r)$ colour can be clearly seen in this figure. It has been shown that the colour distribution can be well described by a double Gaussian (Balogh, et al. 2004; Baldry, et al. 2006; Taylor, et al. 2015). We observe that the pdf of the $(u-r)$ colour exhibits two distinct peaks which merge together at $(u-r) \sim 2.2$. The $(u-r)$ colour of SDSS galaxies shows a gradual transition on either side of this merging point. Traditionally, galaxies are classified as ‘red’ or ‘blue’ by employing a hard-cut around the merging point of the two peaks. However, any $(u-r)$ colour close to this border should be regarded as equal evidence for both classes. The uncertainty reaches its maximum at the border, which is completely overlooked by such a hard-cut separator. We treat this merging point as the point of maximum confusion and use it as the crossover point ($\mu_R(u-r) = 0.5$) for the membership function (Equation 4) of the fuzzy set of red galaxies ($R$) in SDSS. We obtain the fuzzy set of blue galaxies ($B$) in the SDSS by taking the fuzzy complement of the fuzzy set $R$. The green galaxies belong to an intermediate class which is the most difficult to disentangle. We carry out a fuzzy intersection between the two fuzzy sets $R$ and $B$ to construct the fuzzy set of green galaxies ($G$) in the SDSS. We compute the membership functions of all the SDSS galaxies in these fuzzy sets following Equation 4, Equation 5 and Equation 6. The membership functions $\mu_R(u-r)$, $\mu_B(u-r)$ and $\mu_G(u-r)$ corresponding to red, blue and green galaxies respectively are shown as a function of the $(u-r)$ colour of SDSS galaxies in the top right panel of Figure 1. We can see that galaxies are maximally green at the merging point of the peaks, where classifying a galaxy as red or blue is most uncertain.

We define the fuzzy set $L$ for the luminosity of SDSS galaxies using their r-band absolute magnitudes. The membership function $\mu_L(M_r)$ of the fuzzy set $L$ is defined according to the variation of luminosity with absolute magnitude (Equation 9). We show the membership function $\mu_L(M_r)$ of the fuzzy set $L$ for SDSS galaxies as a function of r-band absolute magnitude ($M_r$) in the bottom left panel of Figure 1. We also plot the membership function $\mu_L(M_r)$ of set $L$ against the membership function $\mu_R(u-r)$ of set $R$ for the SDSS galaxies in the bottom right panel of Figure 1. This panel shows that there are two distinct peaks located near the two extremities of $\mu_R(u-r)$, around which the members of the two fuzzy sets are clustered. It shows that the bright galaxies are preferentially red and then blue, but they have the least possibility of being green. The observed trend is somewhat more conspicuous than that observed in the colour-magnitude diagram. Unlike the colour-magnitude diagram, it shows the relationship between the two membership functions, which themselves express the degree of ‘redness’ and ‘brightness’. A combination of two α-cuts within 0 to 1 on the two membership functions $\mu_R(u-r)$ and $\mu_L(M_r)$ can be used to define a galaxy sample with desired properties. This can be extended to any number of fuzzy sets and hence allows greater flexibility in classifying galaxies with different properties.

We compute the fuzzy relation $S$ between the fuzzy sets $L$ and $R$ using Equation 2. The resulting relation matrix is shown in Figure 2.
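The operations used up to this point (Equations 2, 4 to 7 and 9) are simple enough to sketch in plain Python. The block below is an illustration under stated assumptions, not the author's code: all function names are hypothetical, the colour and magnitude grids are arbitrary rather than SDSS data, and `M_BRIGHTEST = -23.0` is taken from the bright edge of the magnitude cut as a stand-in for the actual brightest galaxy in the sample.

```python
import math

A_SLOPE, C_CROSSOVER = 5.2, 2.2   # a and c quoted for Equation 4
M_BRIGHTEST = -23.0               # assumption: bright edge of the magnitude cut


def mu_red(colour):
    """Equation 4: sigmoidal 'redness' membership."""
    return 1.0 / (1.0 + math.exp(-A_SLOPE * (colour - C_CROSSOVER)))


def mu_blue(colour):
    """Equation 5: fuzzy complement of R."""
    return 1.0 - mu_red(colour)


def mu_green(colour):
    """Equation 6: fuzzy intersection of R and B, rescaled by 2."""
    return 2.0 * min(mu_red(colour), mu_blue(colour))


def mu_lum(abs_mag, brightest=M_BRIGHTEST):
    """Equation 9: luminosity membership relative to the brightest magnitude."""
    return 2.512 ** (-(abs_mag - brightest))


def alpha_cut(values, membership, alpha):
    """Equation 7: crisp list of values whose membership is at least alpha."""
    return [v for v in values if membership(v) >= alpha]


def relation_matrix(mags, colours):
    """Equation 2: mu_S(M_r, u-r) = min(mu_L(M_r), mu_R(u-r)), rows indexed by M_r."""
    return [[min(mu_lum(m), mu_red(c)) for c in colours] for m in mags]


if __name__ == "__main__":
    colours = [1.0, 1.6, 2.2, 2.8, 3.4, 4.0]    # illustrative grid, not SDSS data
    mags = [-23.0, -22.5, -22.0, -21.5, -21.0]  # illustrative grid, not SDSS data
    print("red alpha-cut (alpha = 0.9):", alpha_cut(colours, mu_red, 0.9))
    for mag, row in zip(mags, relation_matrix(mags, colours)):
        print(mag, [round(v, 3) for v in row])
```

Because the relation uses the minimum operator, each matrix entry is limited by whichever of the two memberships is smaller: for very red colours the entry follows $\mu_L$, while for blue colours it is capped by the small value of $\mu_R$.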
Figure 2 shows the strength of association for different possible combinations of r-band absolute magnitude ($M_r$) and colour $(u-r)$. We find that for $(u-r) > 2.2$, the membership function of the fuzzy relation gradually increases with increasing luminosity and the association is strongest near the largest luminosity. Contrary to this, no such gradients are seen for $(u-r) < 2.2$. This implies that brighter galaxies tend to be redder among galaxies with $(u-r)$ colour larger than 2.2. But this is not necessarily the case when we consider the region $(u-r) < 2.2$, which represents less red, i.e. bluer, galaxies. However, one should remember that such a boundary is not precise and a mixed behaviour can be observed in the proximity of $(u-r) = 2.2$. In probability theory, we may be simply interested in calculating the covariance of the two random variables $M_r$ and $(u-r)$, which can be represented by a $2 \times 2$ matrix. The elements corresponding to each variable are in that case treated as crisp, as there is no uncertainty about their memberships. Also, the covariance matrix does not provide the association between all possible elements of the two sets. The fuzzy relation delineates the relationship between two fuzzy sets by measuring the association between all possible elements of the two sets. This allows one to distinguish the degree of association between the two variables across different regions of parameter space.

Training an artificial neural network with sample data to learn the membership function may allow a more efficient construction of the membership function. Further, we are using Type-1 fuzzy sets, whose membership functions are crisp sets. So there are no uncertainties in the membership functions of the fuzzy sets defined here. One can include uncertainties in the membership functions by using Type-2 fuzzy sets. A Type-2 fuzzy set is a fuzzy set whose membership function is a Type-1 fuzzy set. We plan to address these issues in a future work. The analysis presented in this work is to be considered as a first step towards a more complete and rigorous treatment.

Finally, we note that fuzzy set theory has great potential to become a useful tool in the highly data driven fields of astrophysics and cosmology. Imprecise boundaries between different classes of objects and their properties are quite common in astronomical datasets. In cosmology, reproducing different observed galaxy properties using various semi-analytic models helps us to fine-tune our understanding of galaxy formation and evolution. There is a complex interplay between various factors (e.g. density, environment) and galaxy properties, which leads to large scatter in their values and relationships with each other. Fuzzy relations and their compositions may in such situations help us understand these relationships from a different perspective and can serve as a complementary measure to other existing tools.

5 ACKNOWLEDGEMENT

The author thanks Didier Fraix-Burnet for some useful comments and suggestions. I would like to thank the SDSS team for making the data public. I would also like to thank Suman Sarkar for the help with the SDSS data. Financial support from the SERB, DST, Government of India through the project CRG/2019/001110 is duly acknowledged. The author acknowledges IUCAA, Pune for providing support through its associateship programme.

Funding for the SDSS and SDSS-II has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Science Foundation, the U.S. Department of Energy, the National Aeronautics and Space Administration, the Japanese Monbukagakusho, the Max Planck Society, and the Higher Education Funding Council for England. The SDSS Web Site is http://www.sdss.org/.

The SDSS is managed by the Astrophysical Research Consortium for the Participating Institutions. The Participating Institutions are the American Museum of Natural History, Astrophysical Institute Potsdam, University of Basel, University of Cambridge, Case Western Reserve University, University of Chicago, Drexel University, Fermilab, the Institute for Advanced Study, the Japan Participation Group, Johns Hopkins University, the Joint Institute for Nuclear Astrophysics, the Kavli Institute for Particle Astrophysics and Cosmology, the Korean Scientist Group, the Chinese Academy of Sciences (LAMOST), Los Alamos National Laboratory, the Max-Planck-Institute for Astronomy (MPIA), the Max-Planck-Institute for Astrophysics (MPA), New Mexico State University, Ohio State University, University of Pittsburgh, University of Portsmouth, Princeton University, the United States Naval Observatory, and the University of Washington.

6 DATA AVAILABILITY

The data underlying this article are available at https://skyserver.sdss.org/casjobs/. The datasets were derived from sources in the public domain: https://www.sdss.org/

REFERENCES

Ahumada R., et al., 2019, arXiv, arXiv:1912.02905
Balogh M. L., Baldry I. K., Nichol R., Miller C., Bower R., Glazebrook K., 2004, ApJL, 615, L101
Baldry I. K., Glazebrook K., Brinkmann J., Ivezić Ž., Lupton R. H., Nichol R. C., Szalay A. S., 2004, ApJ, 600, 681
Baldry I. K., Balogh M. L., Bower R. G., Glazebrook K., Nichol R. C., Bamford S. P., Budavari T., 2006, MNRAS, 373, 469
Bell E. F., McIntosh D. H., Katz N., Weinberg M. D., 2003, ApJS, 149, 289
Blanton M. R., et al., 2003, ApJ, 594, 186
Cattaneo A., Dekel A., Devriendt J., Guiderdoni B., Blaizot J., 2006, MNRAS, 370, 1651
Cattaneo A., et al., 2007, MNRAS, 377, 63
Cameron E., Driver S. P., Graham A. W., Liske J., 2009, ApJ, 699, 105
Coppa G., et al., 2011, A&A, 535, A10
Correa C. A., Schaye J., Trayford J. W., 2019, MNRAS, 484, 4401
Driver S. P., et al., 2006, MNRAS, 368, 414
Drory N., et al., 2009, ApJ, 707, 1595
Hoyle F., et al., 2002, ApJ, 580, 663
Kayo I., et al., 2004, PASJ, 56, 415
Kwang H. Lee, 2005, First Course on Fuzzy Theory and Applications, Springer
Lugli A. B., Neto E. R., Henriques J. P. C., Hervas M. D. A., Santos M. M. D., Justo J. F., 2016, International Journal of Innovative Computing, Information and Control, 12, 665
Menci N., Fontana A., Giallongo E., Salimbeni S., 2005, ApJ, 632, 49
Nelson D., et al., 2018, MNRAS, 475, 624
Norman C., et al., 2004, ApJ, 607, 721
Pandey B., Bharadwaj S., 2006, MNRAS, 372, 827
Pandey B., Sarkar S., 2020, arXiv, arXiv:2002.08400
Planck Collaboration, et al., 2018, arXiv, arXiv:1807.06209
Rosenfeld A., 1979, Information and Control, 40, 76
Rosenfeld A., 1984, Pattern Recognition Letters, 2, 311
Strateva I., et al., 2001, AJ, 122, 1861
Taylor E. N., et al., 2015, MNRAS, 446, 2144
Trayford J. W., et al., 2016, MNRAS, 460, 3925
Wakileh B. A. M., Gill K. F., 1988, Computers in Industry, 10, 35
York D. G., et al., 2000, AJ, 120, 1579
Zadeh L. A., 1965, Information and Control, 8, 338
Zadeh L. A., 1973, IEEE Transactions on Systems, Man and Cybernetics, 3, 28
Zehavi I., et al., 2011, ApJ, 736, 59

This paper has been typeset from a TeX/LaTeX file prepared by the author.