Big Data Research Infrastructure Collaboration Toward the SKA (BRICSKA) - SciELO
An Acad Bras Cienc (2021) 93(Suppl. 1): e20201027
DOI 10.1590/0001-3765202120201027
Anais da Academia Brasileira de Ciências | Annals of the Brazilian Academy of Sciences
Printed ISSN 0001-3765 | Online ISSN 1678-2690
www.scielo.br/aabc | www.fb.com/aabcjournal

PHYSICAL SCIENCES

Big Data Research Infrastructure Collaboration Toward the SKA (BRICSKA)

RUSS TAYLOR, FABIO PORTO, CHENZHOU CUI, YOGESH WADADEKAR & OLEG MALKOV

Abstract: Astronomy is entering an era of mega-data that will render conventional research methods as well as data and visual analytics tools ineffective. The Square Kilometre Array (SKA) drives one of the most significant big data challenges of the next decades. South Africa, China and India are partners in the global SKA collaboration and host recently completed, next generation radio astronomy facilities. South Africa, Brazil, China and India are involved in the Large Synoptic Survey Telescope (LSST), which represents a complementary mega-data challenge, vastly increasing the current data volume of optical surveys, and providing a critical multi-wavelength data set for SKA analytics. Russian researchers are also engaged in radio astronomy and multi-wavelength, multi-messenger projects driving increasing volumes of observational data. This project brings together teams leading programs in data innovation in each partner country to collaborate on the development of new technologies and systems to meet the big data challenge of SKA pathfinder facilities and the multi-wavelength projects that are critical to the advance of astronomy. In so doing we will prototype and demonstrate scalable big data technologies for the new big data era, and establish a BRICS multinational federated data intensive cloud network for collaborative programs in data intensive astronomy.

Key words: BRICS, SKA, big data, large surveys.
INTRODUCTION

Modern science faces several challenges that are transformative in how science operates and in how innovations and new knowledge are generated and exploited. The challenges are (i) Big Data, how to record, transport and deliver it, (ii) Big Compute, how to process, analyse, and visualise big data, (iii) Big Science; by asking big questions as a global community facilitated by a suite of unique international facilities, science can only progress in leaps through large international collaborations, and (iv) Human Capacity Development; to make sure all people are active participants in the scientific endeavour, and that leadership in science worldwide is inclusively distributed across the globe.

Astronomy is a science at the nexus of these challenges and the BRICS nations have all made the strategic choice to invest in astronomy and in the technologies underpinning its practice; each country with their specific advantages. This project aims to build a strong international partnership to jointly develop technologies and systems that address the Big Data challenge posed by astronomy Big Science projects, of which the participating countries are strong members. Seizing the opportunity
presented by large international astronomy collaborations, the project will significantly contribute to Human Capacity Development in BRICS to ensure participation and leadership in global science, and by linking those science and technology skills to society and industry, contribute to stimulating innovation.

With BRICS countries home to some of the best astronomy facilities in the world, this project aims to facilitate, through development of people and technology, the synergies between these facilities, thereby addressing some of the big challenges facing science today, and helping answer some of the big scientific questions of our time, which have mobilised international resources into large scientific projects like the SKA and the LSST.

The SKA drives one of the most significant big data challenges of the coming decade (An 2019). New technologies under development on the pathway to the SKA are creating commensurate advances in the data capacities of radio telescopes. The data rates from existing radio telescopes to the researcher are 10,000 times larger than what was typical only a few years ago. At the same time the enterprise of observational radio astronomy is changing. Projects undertaken by large global collaborations, in which large amounts of observing time are devoted to major key science programs that create vast data sets, are becoming the new paradigm. This mode of observing combined with the new instrumental capacities is driving an exponential growth in the rate of data confronting researchers (see Figure 1).

To rise to the scientific opportunity opened up by this new generation of instruments requires the research community to develop new infrastructure, algorithms, software systems and platforms to manage, process, analyse and mine these data sets. The enterprise of science in this new data intensive age requires new sets of skills and technologies; skills that combine knowledge of astronomy with computer science, information science and statistics, and technologies that adapt the innovations of the fourth industrial revolution to the service of scientific research. The international scope and scale of these next-generation astronomical endeavours, and the multi-national nature of the collaborations on large science programs, mean that we must respond to these challenges as a global community.

Three of the BRICS partner countries are part of the international SKA project organisation. All partner countries are engaged in data-intensive multi-wavelength and multi-messenger programs providing critical ancillary data to realise the scientific potential of SKA pathfinders and the SKA itself. The LSST is the flagship of upcoming optical facilities that will create vast data sets that must be mined both for transient phenomena and to create catalogues of multi-wavelength data for billions of objects that are interoperable with radio data. Four of the BRICS partner countries are international collaborators of the LSST project. The impact of MeerKAT and the SKA will be dramatically increased by incorporating multi-wavelength data, and the LSST dataset will be the widest, deepest and fastest optical survey ever done. Combining these datasets is however a non-trivial task.

This proposal brings together scientific leads of large programs on SKA pathfinders, the LSST and other multi-wavelength facilities along with computer scientists, data scientists and experts in 4th industrial revolution technologies to build solutions for big data astronomy that will enable the full scientific potential of significant investment in major facilities, leverage investment in HPC systems for data intensive research, and develop the expertise to adapt and innovate new technologies to meet
the exponential growth in astronomical data in the next decade.

Figure 1. Data volumes for large projects on radio astronomy facilities as a function of time. Current and planned BRICS facilities are indicated in red. BRICS researchers are being confronted with a data deluge in the coming decade characterized by exponential growth that is much faster than has been experienced by the rest of the world to this point.

Enabling access to globally competitive research for smaller institutions previously isolated from research; revealing and developing untapped talent in the BRICS countries; embedding young researchers into the global networks of science and industry; increasing direct economic benefits of science through interfaces with the private sector; increasing the awareness of, and the support for science among the general public with outreach; supporting development projects through interdisciplinary projects using the research infrastructure; and developing communities through the development of astro-tourism at astronomical facilities in the BRICS countries are some of the potential development benefits of this flagship project that are detailed below.

Realizing the Human Capacity Development (HCD) and development potential of the flagship naturally calls for a coherent effort across the BRICS nations. The proposal team and their respective national networks have extensive and complementary experience in all the topics mentioned above. Working jointly on those topics prevents duplication of effort in developing such platforms and promotes sustained collaboration between partner countries beyond astronomy research.

SCIENTIFIC RATIONALE

The concept of the Square Kilometre Array (SKA) emerged in the 1990s. Currently, it is one of the largest international science projects in the world, as it is set to answer fundamental open questions about our universe. Using the SKA and other telescopes described in Table I the participants in this project are research leaders
in most of the SKA key science projects (KSP) as detailed below.

Science Goals

Fundamental Physics

The discovery of gravitational waves has opened up a new window to the universe. SKA will be crucial to look for counterparts in radio to the sources of those gravitational waves. For example, the uGMRT has already demonstrated this with meaningful radio follow-up observations of gravitational wave events such as GW170817, which was the merger of two neutron stars (Kim et al. 2017). Combining the SKA with the Five-hundred-meter Aperture Spherical Telescope (FAST) in China (Gibney 2019) may also allow us to study the low-frequency universe using high precision pulsar timing, massive black holes, interacting galaxies in galaxy pairs and in groups of galaxies. Some of the fastest and most energetic phenomena hold the key to our understanding of fundamental laws of physics in conditions of extreme energy. The detection of powerful bursts of gravitational waves, gamma rays, X-rays, or radio waves provides evidence for this; however, we still know very little about their appearance in optical light and therefore combining data from large modern facilities into multi-wavelength data sets is key to understanding what is happening in those conditions of extreme energy. Contributions to this science objective are expected directly from the ThunderKAT project on MeerKAT (Fender et al. 2016), with the MeerLICHT telescope, FAST, transient detection in the era of the Large Synoptic Survey Telescope (LSST), uGMRT follow-up of LIGO events, and more.

Magnetism

Magnetic fields are an important element of astrophysical phenomena that is yet to be well understood. They contain energy and influence physical processes on all scales, from star formation to galaxies and to the largest structures known in the universe. We don’t yet know how these magnetic fields formed and we are not in a position to explain how strong they are. What SKA will be able to do is map magnetic fields in great detail. For this question and others, the MIGHTEE project on MeerKAT (Jarvis et al. 2016) will reach similar depth to that reached by the larger area SKA sky survey, and thus will provide a pilot to the experiments that will be carried out by the SKA on a much larger survey volume. Similarly, when combined with uGMRT data, the SuperMIGHTEE data set (Taylor 2019) will become the premier data on the GHz radio properties of the deep sky thanks to its unique full spectral coverage and ultra-broad band nature.

The Hydrogen Universe

Neutral hydrogen is a powerful probe of the evolution of the universe. Neutral hydrogen is found everywhere and can therefore be used to probe anything from the early distribution of matter to the detailed structure of evolved galaxies themselves. Radio telescopes are the best instruments to observe spontaneous emission in radio waves of neutral hydrogen and therefore to map the distribution of neutral hydrogen in the universe. The SKA precursor and pathfinder telescopes are enabling detailed imaging of neutral hydrogen with unprecedented depth and detail, laying the groundwork for the SKA. From detecting galaxies in the early universe in emission and absorption as proposed by LADUMA (Blyth et al. 2016) and MALS (Gupta et al. 2016) to investigating galaxy formation and
evolution with MIGHTEE, with these projects on the MeerKAT telescope, the evolution of neutral hydrogen in the Universe will be studied. The low-frequency uGMRT receivers will complement these by enabling detections at even higher redshifts, and many new results are already being reported from systematic surveys of HI in emission and absorption, including some of the furthest detections to date of HI in emission. The science programs of FAST also include a large scale neutral hydrogen survey. Matching detected galaxies with objects and signals in other wavelengths will enable new discoveries, and new science. These ambitious research projects will help us understand how this neutral gas eventually turns into stars, and even how supermassive black holes at the centres of galaxies are fueled.

The Transient Universe

One of the key capacities of the new generation of radio telescopes and of the SKA is the ability to measure and detect transient objects. While the universe may appear to evolve only over scales of billions of years, astrophysical phenomena are all dynamic and some can occur over incredibly short timescales. We have only gained an awareness of this as the fields of view of telescopes and the ability to observe on short time scales have improved with advances in technology. Time domain astronomy, where repeated observations can be undertaken over a range of timescales (from sub-second to years), is key to obtaining data of sufficient cadence to unlock the nature of transient objects and understand the dynamics of astrophysical events.

Transient radio emission is associated with essentially all explosive phenomena and high-energy astrophysics in the universe. It acts as a locator for such events, and a measure of their feedback to the local environment. Because of this, radio transients are invaluable probes for subjects as diverse as stellar evolution, relativistic astrophysics and cosmology. ThunderKAT is a MeerKAT large survey program to detect and study such phenomena using the high sensitivity and wide area imaging capability of the MeerKAT. As well as performing targeted programs, ThunderKAT will co-observe with other MeerKAT large survey projects and search their data for transients. The uGMRT also provides for significant capabilities for detections of transient phenomena in the Universe.

In addition, optical observations of transients may lead to important discoveries in other fields of astronomy: Near-Earth Objects, nearby asteroids potentially dangerous for Earth; exoplanets; variability of super massive black holes in galaxies; stellar flares; gravitational lenses; and probably several other yet-unknown types of fast-changing phenomena that have not yet been observed.

To enable such studies, there is a need for Big Data and Big Compute facilities within the BRICS group to fully exploit the data resources following from transient science observations obtained with the facilities within the BRICS countries. The BRICS member countries’ participation as global leaders in astrophysical transient research is expected to accelerate greatly over the next decade with the development of new facilities. A cornerstone of this will be the emergence of the LSST as a generator of transient discoveries. Two of the BRICS countries (Brazil and South Africa) are now actively participating in LSST transient science collaborations and more may still do so in the future.

The Continuum Universe

Radio continuum surveys were identified as a scientific priority for the SKA for several
reasons, including galaxy evolution, magnetism, transients, and more. Among those priorities are the understanding of the star formation history of the universe - to which MIGHTEE and SuperMIGHTEE will contribute by studying the evolution of star-forming galaxies, and of active galactic nuclei (AGN), where supermassive black holes reside, and how this affects the evolution of galaxies and their environment.

Another important aspect is the potential for serendipitous discoveries. Unplanned discoveries often lead to fundamental insights into new physics. A technique called Very Long Baseline Interferometry (VLBI) is key for this last point. The technique combines signals from individual radio observatories to form a single, Earth-sized radio interferometer with an angular resolution several orders of magnitude finer than offered by connected-element interferometers. Many important results have come from the use of the VLBI technique, a recent example being the imaging of a black hole by the Event Horizon Telescope (Kohler 2019). One of the science goals of the FAST telescope is to provide the largest aperture in international cm-wave VLBI networks, enabling unprecedented ultra-sensitive imaging. Similarly, MeerKAT will add an extremely sensitive antenna to global networks, paving the way towards SKA-VLBI (Paragi et al. 2015).

Cosmology

The SKA precursor and pathfinder telescopes are making possible new types of observations; neutral hydrogen intensity mapping surveys that do not detect individual galaxies. Combined with radio continuum surveys that detect radio galaxies out to very high redshift as proposed by LADUMA and to a certain extent MIGHTEE, and some of the ongoing and proposed programs with the uGMRT, and with LSST and optical observations able to measure cosmic distances, cosmology is entering a new era where the large-scale structure in the universe can be mapped in three dimensions. Moreover, science collaboration between FAST and MeerKAT on the intensity mapping of neutral hydrogen will contribute to a specific cosmological measurement; that of the so-called Baryonic Acoustic Oscillation (BAO). This is fundamental, as the large-scale structure of the universe and how it came to be, of which the BAO is a keystone measure, is one of the pillars of our current understanding of cosmology.

Large projects on next-generation BRICS facilities

The scientific journey between now and SKA1 will be charted through large science programs with new SKA pathfinder and precursor facilities. These programs map directly onto the key science goals of the SKA, and prototype not only the technical and scientific capabilities of the SKA but also the growth of big data in astronomy and the new modality of global collaboration around very large projects generating vast amounts of data. At the same time developments and new projects in optical and infrared astronomy such as the LSST are set to create a rapid growth of multi-wavelength data sets.

The proposal team includes leaders of large programs on SKA pathfinder facilities and in multi-wavelength astronomy, most of which involve collaborations among researchers in the BRICS partners. The data solutions to be developed will thus advance key science projects that address the fundamental questions driving multi-national investment in megascience projects in the coming decade.

This flagship program and the complementary flagship program on transients are seen as synergistic, where together they can both advance the science and also the associated development and spin-off benefit goals. This
will leverage existing and planned new facilities within the BRICS countries and will also draw on the opportunities presented by other space- and ground-based facilities that exist within the BRICS group. As shown in Figure 1, the new generation of BRICS facilities that drive the data challenge in the next decade are MeerKAT, the upgraded Giant Metrewave Radio Telescope, the Five-hundred-meter Aperture Spherical Telescope and the SKA1-mid. Parallel developments in wide-field Very Long Baseline Interferometry and multi-wavelength astronomy with the LSST provide complementary data challenges that are critical to our overall science objectives.

Table I. Current and Planned Facilities providing big data challenges for BRICS Astronomy Research.

Telescope    Location       Wavelengths                Data rate
SKA1-mid     South Africa   Radio: 350 MHz - 14 GHz    1000 TB/day
MeerKAT      South Africa   Radio: 580 MHz - 3.5 GHz   10-100 TB/day
FAST         China          Radio: 70 MHz - 3 GHz      690 TB/day
uGMRT        India          Radio: 0.1 - 1.45 GHz      10 TB/day
PRAO         Russia         Radio: 111 MHz             0.1 TB/day
AVN, VLBI    Global         Radio: 1 - 100 GHz         10 TB/day
LSST         Chile          Optical - near Infrared    20 TB/night
MeerLICHT    South Africa   Optical                    0.1 TB/night

MeerKAT

The MeerKAT telescope is the precursor to the SKA mid-frequency array. MeerKAT was inaugurated in July 2018, and science operations have begun. MeerKAT will operate as a stand-alone South African facility for approximately five years until its 64 antennas are merged with 135 SKA antennas to form SKA1-mid. The large number of antennas, high sensitivity, and wide field of view of MeerKAT make it the premier imaging telescope in the world for radio frequencies in the GHz range. The MeerKAT science program includes large survey projects that will occupy about 70% of the observing time. Each program typically requires thousands of hours of observing and will, over the course of 5 years, create Petabyte scale data sets. These data sets must be converted to science products and then visualised, analysed and mined for the answers to the scientific questions. Large programs relevant to this proposal include:

- The MeerKAT International GHz Tiered Extragalactic Exploration (MIGHTEE) project is being undertaken by an international collaboration of researchers. The MIGHTEE observations will provide radio continuum, spectral line and polarisation information. MIGHTEE, along with multi-wavelength data, will allow a range of science to be achieved, as listed above.

- The LADUMA (Looking At the Distant Universe with the MeerKAT Array) project will study evolution of the neutral hydrogen gas in galaxies over cosmic time spanning over two-thirds the age of the Universe. The survey strategy is to observe a single area on the sky rich in existing multi-wavelength data. The primary science goals of LADUMA
include investigating, as a function of environment and look-back time: the distribution of mass of neutral hydrogen of galaxies and how it depends on stars and dark matter in their environment and the evolution of the cosmic neutral gas density. LADUMA will be the deepest HI survey before the SKA comes online, thereby acting as a pilot project for future wider-field deep observations of neutral hydrogen to be carried out with the SKA.

- The MeerKAT Absorption Line Survey (MALS) will carry out the most sensitive search of HI and OH absorption lines at 0 < z < 2, the redshift range over which most of the evolution in the star formation rate density takes place. The key science themes of the survey are: (1) Evolution of atomic and molecular gas in galaxies and relationship with star formation rate density, (2) Fuelling of active galactic nucleus (AGN), AGN feedback and dust-obscured AGNs, (3) Variation of fundamental constants of physics, (4) Evolution of magnetic fields in galaxies, and (5) Physical modeling of the ISM, Astrochemistry and Cosmology. Due to the excellent sensitivity of the MeerKAT telescope, MALS will also deliver an extremely sensitive HI 21-cm emission, radio continuum and polarization survey to address a wide range of issues at the forefront of galaxy evolution research.

- ThunderKAT is a MeerKAT large survey program to detect transient radio signals. ThunderKAT comprises a comprehensive and complementary program of surveying and monitoring both galactic (within the Milky Way) and extragalactic transients such as microquasars, supernovae and possibly yet unknown transient phenomena. ThunderKAT will both survey directly and detect transients in other MeerKAT large survey projects. This commensal use of the other surveys means that the combined MeerKAT large survey projects will produce by far the largest GHz-frequency radio transient program before the SKA.

- The associated MeerLICHT optical telescope will simultaneously take large field optical images for all night-time MeerKAT observations and create a large optical data set for unique and powerful multi-wavelength studies of transient phenomena.

The Upgraded Giant Metrewave Radio Telescope

The National Centre for Radio Astrophysics of the Tata Institute of Fundamental Research operates the Giant Metrewave Radio Telescope (GMRT) on the Deccan plateau north of Pune, India. The GMRT consists of 30 antennas, each of 45-meter diameter, spread over a region of 25 kilometres. It is the world’s largest and most powerful radio interferometer dish array in its frequency range (100 to 1450 MHz). The GMRT has undergone a major upgrade with new receivers, control, data transmission and wide band correlator systems that have substantially improved performance and increased the data rate from the array by over a factor of 10. The data rate can be as high as 100 TB/hour in the raw voltage recording mode. The upgraded GMRT (uGMRT) is an SKA pathfinder and has the capability to demonstrate SKA-type science with unprecedented sensitivity at low radio frequencies.

There are several large programs underway with the uGMRT including search for pulsars and transients (both targeted and general surveys); precision timing of millisecond pulsars with
the aim to contribute to the global effort for the detection of gravitational waves; detailed imaging of selected target deep fields; detailed studies of galaxy clusters; large surveys in the neutral hydrogen line – both in emission and absorption; radio follow-up of supernovae, gamma-ray bursts, and also gravitational wave detections, etc.

With interferometer baselines almost four times larger than those of the MeerKAT, the uGMRT offers similar imaging angular resolution to MeerKAT at longer wavelengths. Moreover, the increased bandwidth and receivers of the uGMRT provide similar sensitivity to MeerKAT. The uGMRT and MeerKAT together thus have tremendous synergy as complementary facilities for the study of the deep radio sky over an ultra broad band, opening a powerful new approach to studies of the distant universe.

The SuperMIGHTEE project forms part of a bilateral Indo-South African Flagship program in Astronomy, for which we have developed a technical and scientific collaboration to expand MIGHTEE to a joint MeerKAT and uGMRT project. The unique full spectral coverage and ultra-broad band nature of the SuperMIGHTEE data set will make it the premier data on the GHz radio properties of the deep radio sky, and will not be surpassed until well into the SKA phase 1 era.

The Hz-MALS project uses uGMRT to extend the redshift coverage of MALS to 2 < z < 5.2. This redshift coverage is unique to uGMRT and the outcomes from the survey make it highly relevant to the design of future surveys with SKA.

The Five-hundred-meter Aperture Spherical Telescope

The Five-hundred-meter Aperture Spherical Telescope (FAST) is a next-generation extremely large single dish radio telescope. It is funded by the National Development and Reform Commission (NDRC) and managed by the National Astronomical Observatories (NAOC) of the Chinese Academy of Sciences (CAS), with the government of Guizhou province as a cooperation partner. As FAST is the largest filled-aperture telescope worldwide and is located at a radio quiet site, its science impact on astronomy will be extraordinary, and it has the potential to revolutionise many other areas of the natural sciences. Compared with its precursor Arecibo, FAST has an advantage of a factor of two in raw sensitivity and a factor of five to ten in surveying speed. FAST will also cover two to three times more sky area thanks to its innovative design of an active primary surface. A science instrument with an order of magnitude improvement in any of its capacities, which is also able to explore new dimensions in parameter space, is likely to generate unexpected discoveries.

FAST produced its first light in September 2016 and, following a commissioning phase, normal operation commenced in late 2019. The science programs of FAST mainly include a large scale neutral hydrogen survey, pulsar observations, leading the international very long baseline interferometry (VLBI) network, detection of interstellar molecules, detecting interstellar communication signals, and pulsar timing arrays. FAST is currently the largest single dish telescope, while MeerKAT is the largest aperture synthesis array telescope.

Science collaboration between the two could probe pulsars, exotic events, the low HI surface brightness, low mass galaxies and the HI intensity mapping for cosmological BAO measurement. The combined surveys may also allow us to study low-frequency gravitational waves using the high precision pulsar timing, massive black holes, interacting galaxies in galaxy pairs and in groups of galaxies. FAST is facing a similar situation of big data processing.
The required data transmission rate of FAST is at least 6 GB/s, which represents a big challenge for data transfer and processing.

Very Long Baseline Interferometry

Among other things, VLBI enables the study of AGNs and black holes, magnetic fields, galaxy formation, or even cosmological dark energy models. In this new survey-driven era, it is vital that global VLBI networks continue to increase their sensitivity and observing capabilities to enable high resolution follow-up, discovery, and statistical analyses at radio wavelengths. This is of particular importance in the realization of SKA1-mid, at which point VLBI is in effect merged into a standard connected-element radio interferometer. SKA1-mid will thus carry out high resolution wide-field surveys by design. The increasing wide-field VLBI capability poses significant big data and big compute challenges.

VLBI has been at the heart of the SKA since its conceptualisation. One of the key science objectives of FAST is to lead the international VLBI network. Shanghai Astronomical Observatory, Chinese Academy of Sciences (SHAO) has been developing VLBI techniques since 1972. The longest baseline of the Chinese VLBI Network is 3249 km. With wide-field VLBI being so closely tied with developments towards the SKA, as well as VLBI itself being inter-continental in nature, it fits very naturally into the goals and aspirations of this BRICS proposal.

Large Synoptic Survey Telescope

The full science goals of the SKA will only be achieved in synergy with other instruments. The LSST will create a data set of multi-wavelength information for a vast number of cosmic radiation sources that can be mined jointly with radio survey data. The LSST will also transform time domain astronomy by detecting millions of transient and variable phenomena each night. LSST will be the next big data generator of optical transients, estimated to be of order several million detections of new transients or variability per night. Analysing this enormous quantity of data to allow rapid follow-up of transients with other telescopes will require significant compute infrastructure, as well as novel algorithms. The parallel proposal for a dedicated BRICS-wide Flagship program on transients shares many of the Big Data and Big Compute challenges discussed in this proposal in relation to the SKA. The parallel proposal aims to develop a network of ground-based optical telescopes for the follow-up of astrophysical multi-wavelength and multi-messenger transient objects. The demands for data handling associated with transient studies amongst the BRICS collaboration will be met by the implementation of this Big Data Flagship program.

PROJECT OBJECTIVES

The objectives of the proposed research are to develop and foster collaborations between researchers among the five BRICS partner countries to address the data challenges of the coming era. Specifically we have the following goals:

- Establish a prototype network of federated cloud-based infrastructure and e-science tools to support the next-generation data intensive radio astronomy to serve our national research communities, and exploit synergies between partners and opportunities for enhanced research through joint big data astronomy science programs.

- Create new technologies and modalities for visualisation and machine learning
enhanced visual analytics of cloud based remote big data, as well as cloud-based tools for collaborative exploration by distributed research teams.

Develop cloud-based provisioning of HPC for automated processing workflows for use by science teams. Provisioning flexible containerised analytics environments such as Jupyter notebook allowing interactive engagement with the data for user applications. Such platforms and systems will empower research teams to work with big data. Such workflow systems will address the challenges of reproducible science with big data.

Develop cloud-enabled systems and user environments for the fusion of multi-wavelength data sets such as the LSST with radio data, and for multi-messenger and transient science, including a cloud-based LSST data capacity in South Africa to host and analyse LSST transient events. An objective here is to develop and incorporate machine learning to analyse data and enable us to characterise, classify (and discover) the unknown and prioritise rapid response programs.

Develop scalable knowledge bases and research data lakes to provide access to integrated and correlated knowledge transforming raw data into scientific knowledge on a global scale.

Data intensive cloud architectures: Teams within IDIA (South Africa) will work closely with researchers based in India (NCRA-TIFR), China (Tianjin University, SHAO, Alibaba Cloud) and Brazil (LNCC) to build open-source based cloud architecture and tools to enable deployment of HPC clusters within cloud systems. Areas being worked on include:

Elastic scheduling and workflow tools for OpenStack and CloudStack.

Distributed role based authorization to resources.

Scheduling of data movement between sites.

Scheduling of compute jobs to sites.

Use of containerisation to support movement of applications and reproducible science.

Teams in South Africa (IDIA) and China (NAOC, SHAO, Alibaba Cloud) are also working on science gateway and portal technologies for user access to data products, to cloud processing and analytics.

Algorithms and Machine Learning for Big Data Processing and Analytics: Perhaps the largest area of collaboration lies in the development of novel algorithms and HPC software for processing and analysis of big (radio) astronomy data, and we include the following subsections for collaborative purposes:

processing pipelines for raw data to science products, including development and integration of workflow systems, data management and reproducible science technologies (South Africa: IDIA, Rhodes University; India: NCRA-TIFR, IUCAA, IISER; Brazil: LNCC, LIneA; China: NAOC, Tianjin University, Guangzhou University; Russia). We aim to leverage the scientific and technical skills from our partners to further develop workflows and recipes for calibration and imaging pipelines. In addition, we will test and develop interfaces to allow for seamless interaction
with radio astronomy software, with the aim to facilitate collaborative development for pipeline design and data inspection.

algorithms for mining big data for information, including machine learning (South Africa: IDIA, SARAO; Brazil: LNCC, LIneA; India: NCRA-TIFR, IUCAA, SPPU; China: NAOC, Tianjin University, Guangzhou University; Russia). We will use the techniques developed by e.g. the Spitzer Data Fusion and the HELP EC-FP7-SPACE projects to merge multi-wavelength surveys and develop machine learning tools to best exploit the wealth of multi-wavelength data available in upcoming radio deep survey fields and enable new discovery space.

data systems and distributed knowledge bases for management for fusion and mining of diverse data sets. This work will be led by LNCC advancing their developments for LSST.

The novel processing challenges and extraordinary image sizes emerging from the field of wide-field VLBI present a new type of problem that foreshadows the SKA (South Africa: IDIA, SARAO; China: SHAO, FAST; India; Russia). We will interface disparate pieces of custom-built software into a more general, multi-purpose VLBI survey toolkit. A component that will require new development, based on an existing pilot project, will be the area of VLBI source-finding. At present, this does not include Bayesian inference and machine-learning that are relatively straightforward extensions of the pilot source-finding project.

Cloud-based Visual Analytics and Exploration of Big Data: Cloud-based visual analytics of remote big data (South Africa: IDIA; China: SHAO, Alibaba Cloud; India: IUCAA). This will expand on an initial framework being developed at IDIA for low-latency cloud server-client visualization of remote big data including 2D and 3D (virtual reality) interfaces. We will expand on a new HDF5 image data set framework that is optimized for big data and develop cloud provisioned HPC server-side rendering technologies and tools that interface to multiple remote clients. We will design interfaces and tool sets for visual analytics of big data sets from the cloud from MeerKAT, uGMRT and FAST in response to use cases that meet the needs of the BRICS data intensive programs.

HCD and Development: Human capital development (HCD) is a key aspect of the project. An integral part of the new approach to science driven by big data, big compute and large international collaborations is to create and embed the platforms within the research communities so that researchers may work collaboratively on these large, diverse data sets. Such platforms are best developed through an extensive, international program such as the one proposed here: it promotes sustained collaboration between partners from the BRICS countries, and it prevents duplication of effort in developing such platforms in individual countries.

The project will develop the required training resources that can run in the cloud and give access to large data sets. The HCD and education platform development and maintenance needs to be supported by staff members embedded in the partner organisations that develop and maintain such infrastructure for research. Material for running workshops will be made openly available, and
referenced extensively through the proposal participants’ global networks.

By developing a dedicated platform to run training workshops, data science hackathons and research schools that operate in the cloud, the HCD and skills development efforts of this project will be greatly enhanced. It will be possible to multiply this model across BRICS countries, in smaller institutions as described above, and beyond. Relevant experience of proposal participants includes the support by IDIA of DARA Big Data Africa research schools and hackathons (https://idia.ac.za/BigDataAfrica-2018/), the Astronomy Data Science Toolkit developed by the IAU OAD (http://datascience.astro4dev.org), and the expertise of our Chinese collaborators (NAOC, Alibaba Cloud).

ALIGNMENT WITH NATIONAL STRATEGIES

South Africa

The South African government White Paper on Science Technology and Innovation (STI) envisions STI enabling inclusive and sustainable South African development in a changing world. This proposal addresses two of the three high-level goals spelled out in the White Paper, namely to take advantage of opportunities presented by megatrends and technological change, and to contribute to a more inclusive economy at all levels. The grand data challenge in astronomy is part of the megatrend characterised as the fourth industrial revolution, and the integrated HCD strategy is specifically aimed at making the development of skills in high demand more inclusive, and at facilitating the translation of highly skilled young scientists into the economy. Recognising that STI can contribute to the Sustainable Development Goals, the White Paper highlights the importance of: exploiting the pivotal role of ICT and harnessing the potential of big data; exploiting the full potential of scientific knowledge and improving the performance of historically disadvantaged universities; focusing on inter-disciplinary research; expanding research infrastructure, including cyber-infrastructure; and exploiting the potential of STI for African development and continental integration.

This research project is closely aligned with the SA National Strategy for Multiwavelength Astronomy, building on the large investment South Africa has made in radio astronomy (through MeerKAT and the SKA), as well as at optical wavelengths (SALT, LSST). This proposal seeks to build capacity in the development of solutions in processing, analysing, mining, visualising and maximising the science coming from increasingly large, rich and complex multi-wavelength data sets. By bringing the LSST data to the SKA-mid at an SA LSST data centre, we will enable multi-wavelength studies and increase the impact of SKA science. In addition, ensuring access to the LSST alert stream will allow the full use of the rich range of telescopes within South Africa for transient follow-up.

Development of the hardware, software and human capacity within the field of VLBI is a key part of SARAO and DST’s strategy to develop the African VLBI Network and the expertise within the SKA African partner countries to become world-class users of this instrument. Furthermore, this is directly aligned with the strategic goal of the AVN to form the backbone of the SKA Phase 2 to be built across the African continent.

This project aligns with the National Strategy for Human Capacity Development for Research, Innovation and Scholarships through the proposed collaborations on globally competitive research and innovation. These strategic international engagements will
enhance the research and technological capacity for innovation within South Africa, as well as ensure the establishment and maintenance of research infrastructure and platforms. In this way, the project is also aligned to the Strategic Research Infrastructure Roadmap.

This project helps advance the SA National Integrated Cyberinfrastructure System (NICIS). It leverages the investment in the DIRISA Tier 2 data intensive research facility to work with international partners on data intensive challenges that will be applicable to cyberinfrastructure solutions in SA for astronomy and other disciplines such as bioinformatics.

Brazil

Brazil is engaged in establishing data centers in support of eScience. The initiative includes HPC infrastructures, notably the Santos Dumont supercomputer, whose services are available to the whole scientific community and which currently hosts more than 100 scientific projects. In a complementary way, the RNP (National Research Network) is fostering the improvement of links within Brazil and with Africa, Europe and the US. In particular the Monet cable has been installed linking Chile to Miami, and passing by São Paulo, enabling the transfer of data produced by the LSST project. Additionally, the SACS and SAIL cables now link Brazil to Africa through Angola and Cameroon, respectively, improving the connection with the countries of the African continent, including South Africa. Brazil is intensively involved in important astronomy surveys. The LIneA laboratory hosts a tertiary site of the Sloan SDSS III collaboration providing a replica of the SDSS portal with access to published data from the Latin America Community. Brazil additionally supports the Dark Energy Survey (DES) project, through the DES-Brazil consortium, making available a scientific portal for the publication of DES data and the execution of scientific pipelines. Brazil is also participating in the LSST project, through the Brazilian Participation group (BPG), led by LIneA and the National Laboratory of Astronomy (LNA). Through these different initiatives, Brazil is contributing to the development of astronomy in South America through the development of Big Data software and the availability of HPC and network infrastructure. Moreover, it is investing in the education of multi-disciplinary human resources founded in basic science and computer science.

China

China is one of the prominent participants in SKA. SKA global collaboration is an important task for building a community of common destiny for mankind, as proposed by the Chinese government. The Chinese government has invested heavily in technology research, data centers and human capital development. Chinese scientists have been working closely with members from other countries preparing for the technology and science using SKA data, and are involved in 13 associated science working groups and focus groups of SKA. The China SKA science team works together with the information, communication and computer industries aiming at tackling the challenges associated with the SKA big data, which will not only promote major original scientific discoveries, but also apply the obtained technological achievements for stimulating the national economy. This project is perfectly in step with the goals of the China SKA team.

Exploring the synergy between FAST and the MeerKAT/SKA projects is part of the research content of this project, and will open up new research areas for scientific users of both FAST and SKA data. With the improvements in sensitivity and resolution by the combination
of FAST and MeerKAT, the survey area of their common sky is still quite large. The science cooperation between the largest single dish telescope, FAST, and the largest aperture synthesis array telescope, MeerKAT/SKA, in the world, will result in detection of hundreds or even thousands of new pulsars and some exotic events, as well as allowing the detailed study of the exotic events and the HI gas in nearby galaxies. This is also highly consistent with the scientific goal of FAST.

The National Astronomical Data Center (NADC) of China was recently established at the National Astronomical Observatories, CAS (NAOC) for the purpose of managing astronomical data at the national level and providing data services and technologies for the whole life cycle of data management, including outreach and education. This project works on cloud-based computing environments and scientific platforms for data users, and on data techniques including machine learning, visualisation and data mining, which is aligned with NADC’s goal.

India

The Giant Metrewave Radio Telescope is a SKA pathfinder facility. The recently completed upgrade of the GMRT has further enhanced the front-line capabilities of the GMRT, and several exciting new discoveries and results are being produced. It has also resulted in an order of magnitude increase in the data rates from the observatory, making it imperative to adopt big data intensive techniques, and improved, efficient data analysis pipelines to process the data. In order to maximise the science returns from the uGMRT, NCRA plans to implement imaging pipelines to create science ready products for the benefit of users who do not have the resources to handle large data. There is significant scope for new developments, and these are on the road-map going forward.

India is a full partner in the SKA, and NCRA-TIFR is the nodal institute leading the Indian participation in the SKA. Indian scientists and engineers led by NCRA, with significant participation from Indian industry, are involved in several work packages of SKA. In particular, NCRA is leading the Telescope Manager (TM) work package, which is the brain and nervous system of the SKA. In addition, researchers from India are involved in significant roles in projects on SKA precursor facilities like MALS on MeerKAT in South Africa and Solar science on MWA in Australia, many of which involve the need to handle large volumes of data. India also plans to set up a SKA regional data centre to cater to the specific needs of storing SKA data and developing specialised analysis pipelines that are of benefit to Indian researchers involved with key science projects of the SKA.

Finally, much of the above activity envisaged in India is being carried out in collaboration with industry partners, many of whom bring new ideas to the table and are willing to (and are well equipped to) take on the challenges of big data analytics that are entailed in these ongoing and planned activities. This includes the exciting new field of machine learning, which is finding an increasing role in the plans of researchers from India.

Given the above, this project is very strongly aligned with both institutional and national level strategies and plans. Indian researchers and institutions will be able to contribute meaningfully to this initiative, and also stand to benefit significantly in their efforts on existing and planned Indian and other BRICS member facilities. It will also allow Indian industry to engage and grow in these high-tech areas with a view to becoming a major player in the global arena. The goals and activities of this collaborative project also align well with India’s plans for effective participation in
the construction phase of the SKA, including expediting the work on setting up of a SKA regional data centre in India, and enhancing meaningful partnership with BRICS members participating in the SKA.

Russia

In the future of multi-wavelength, multi-messenger astrophysics, it is important to develop the ability to organise timely, coordinated observations across multiple facilities with ever increasing volumes of observational data, as well as the technical capabilities and expertise to efficiently sort, search and analyze large sets of heterogeneous observations. Russia operates, or is involved with, many observational facilities, including numerous ground-based optical and radio telescopes, the BDUNT and BNO neutrino telescopes, the Russian-German Spektr-RG space-based X-ray observatory, two international networks of optical telescopes (MASTER and ISON), and the proposed BRICS-based network of optical telescopes. Besides the technical requirements outlined above, the fusion of these many astrophysical data sets also requires intensive training of new specialists and human capital development, with a particular focus on the role of exchange between the BRICS countries. Besides direct application to the field of astrophysics, the knowledge and techniques developed throughout the course of the project are a high priority for other fields faced with similar challenges, including geophysics and climate research, telecommunications, energy efficiency, material sciences and medicine, among others.

BENEFITS BEYOND ASTRONOMY

Economic impact: collaborations with industry

Building and managing a SKA regional centre is a formidable challenge due to its unprecedented size and scale. The development of the research infrastructure and algorithms needed to tackle the big data and big compute challenges addressed in this proposal lends itself to joint research and development projects with industry for the local, regional and global markets. A multi-country collaboration involving industry partners is critically needed for this activity. A successful collaboration will lead to the development of human and technical capacity within academia and industry in all BRICS countries. For example, India has an ongoing engagement with industry in several areas. NCRA is already in collaboration with some industry partners like Thoughtworks India Ltd, Persistent Systems Ltd and TCS Research on handling large data, machine learning and other technical aspects with regard to SKA regional data centers. In Brazil, Petrobras and DELL Brazil have expressed interest in the project. In South Africa, there are ongoing conversations with IBM Research - Africa and SAC, a local Space Engineering company. Underpinning the growth of industrial and economic activity is also the need for trained talent. The inspirational aspect of astronomy attracts young people to science and technology fields but not all will become astronomers. Engaging in those studies, however, leads to the acquisition of skills that are in high demand in industry. This flagship will play a key role in making that talent available for the growth of industry in the BRICS nations.

Revealing and developing untapped talent

BRICS countries are characterised by a growing young population hungry for opportunities. HCD will take place through graduate study support in the form of scholarships and participation in the research and innovation that this flagship will drive among other things. As part of their training, young researchers supported by this network will learn communication skills and the
international aspect of the project will equip them with multicultural experiences critical to being competitive in a global workforce. The joint experience of the proposal partners ensures a successful implementation of this approach to HCD.

The benefits of an associated HCD program will be felt more widely than astronomy and computing. Students and early career researchers have the technical and problem-solving skills that are vital to many sectors of a knowledge-driven economy, and that are essential to using big data to address some of the most pressing development issues. The exponential growth in data science and technology careers internationally means that research and collaboration in the areas of cloud-based and high performance computing have high potential for impact on human capital development. This needs to be proactively nurtured as proposed in this flagship project.

Access to research for smaller institutions

In BRICS countries, many smaller research institutions and universities are not currently competitive on a global scale of scientific research. Many new researchers and students will have access to big data and big compute, through the gateway technologies and cloud-based toolkit we propose to develop in this proposal. For example, researchers based at the IAU/OAD, University of Zululand and IDIA (South Africa) will collaborate with researchers in China (NAOC), India (NCRA-TIFR, IUCAA) and Brazil (LNCC). Access will be established through training workshops at institutions and through the support of participation of students from those institutions in research activities of the project. In addition to this, it offers the opportunity to train IT staff and researchers in new technology.

The inclusive participation of all research institutions in the BRICS countries needs to be promoted. The international nature of this flagship guarantees that such efforts are not made in isolation and offer collaboration opportunities previously not easily accessible for those institutions. In South Africa, for example, this means access for historically disadvantaged institutions with the support of well established institutions. In China, it may mean that smaller universities in that vast country can join the endeavour. The experience of the IAU Office of Astronomy for Development (OAD), the South African Radio Astronomy Observatory (SARAO) and the Inter-University Institute for Data Intensive Astronomy (IDIA) will ensure effective inclusion of smaller institutions.

Embedding young researchers in global networks

Young researchers engaged in this broad collaboration among leading institutions in BRICS partner countries will experience exciting career opportunities. They will enhance the BRICS context as sources of scientific and technical talent, and will strengthen the BRICS countries’ leadership in global science through investment in strategic areas where BRICS displays particular strength, such as radio astronomy. By developing big data and big compute expertise and ensuring participation of early-career researchers in this flagship, this investment is guaranteed to pay off considering the roles those play in the development of economies in the context of the 4th industrial revolution.

Enabling young scientists to work on interdisciplinary projects on research infrastructure is an important element to facilitate the beneficial dissemination of trained big data and big compute scientists across other sectors and raises the participants’ awareness of how they can contribute to development. Such