FROM VOLUME TO VALUE - Associazione Big Data
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
SUMMARY 1. INTRODUCTION TO THE NEW EDITION 5 1.1. SCOPE, VISION, OBJECTIVE: THE ASSOCIAZIONE BIG DATA 7 AS A SOURCE OF COMPETITIVE ADVANTAGE 1.2. ACTIVITIES FROM 2015 9 2. THE BIG DATA ASSOCIATION 11 2.1. PROFILE OF THE ASSOCIATION 12 2.2. THE ASSOCIATION IN FIGURES (2015-2019) 13 2.3. HIGH PERFORMANCE COMPUTING FACILITIES 15 IN EMILIA-ROMAGNA 2.4. ASSOCIATION MEMBERS 16 3. ENABLING INFRASTRUCTURE 17 3.1. HPC, HTC & NETWORKING 18 3.2. OTHER RELEVANT FACILITIES 21 3.3. FUTURE TRENDS AND DEVELOPMENT 23 4. SCIENTIFIC AND APPLICATION DOMAINS 25 4.1. DIGITAL ENABLING TECHNOLOGIES 27 4.2. HEALTH AND LIFE SCIENCES 29 4.3. AGRI-FOOD, BIOBASED INDUSTRY AND BLUE GROWTH 31 4.4. SMART CITIES AND SECURITY 33 4.5. MATERIALS SCIENCE 35 4.6. INDUSTRY 4.0 37 4.7. EARTH SCIENCE AND CLIMATE CHANGE 39 4.8. GLOBAL SOCIAL SCIENCES AND ECONOMICS 41 4.9. HUMANITIES, CULTURAL HERITAGE 43 4.10. PHYSICS, ASTROPHYSICS AND SPACE SCIENCE 45 5. EUROPEAN PROJECTS LIBRARY 47 6. CONTACTS 53
Palma Costi With great pleasure we welcome this Regional Minister for Economy edition of From Volume to Value. and development, energy and green economy, post-earthquake In 2016 we set up the research group on reconstruction big data and supercomputing, and we started by checking the infrastructures Patrizio Bianchi and competences on the territory of the Regional Minister for Coordination of Emilia-Romagna region. European policies for growth, education, Emilia-Romagna has always been a vocational training, university, research crossroads. Here almost a thousand and employment years ago, after dark centuries, the first “students” eager for new knowledge found themselves in Bologna, founding that institution which still today remains the crossroads of many university experiences. Here, over the years, the supercomputing infrastructures of the national scientific system have been located and from here, we started to set up an association named Associazione Big Data that represents a place of convergence of skills and knowledge relating to big data, their applications, their impact on the daily lives of all citizens. We are honored to have contributed to its birth: now the association is open to the whole country, aware of having to contribute to the full integration of our scientific system in the great European and world community of knowledge.
The management of large amounts of the leverage effects of research data via high computing capacity in terms investments already used at of performance and memory available, regional, national and EU levels; provides remarkable opportunities in • Promoting joint actions including various research areas and applicative domains such as, among others, financial the coordination, planning and analysis, environmental monitoring programming of relevant research and geophysical simulations, cultural and innovation activities in the big heritage management, precision farming, data domain; multimedia, etc. • Supporting researcher careers, training and mobility, and Many world-class institutions in developing skills in relevant supercomputing and big data are located sectors to ensure the necessary in Italy and in the Emilia Romagna region. highly qualified workforce needed They are able to store, manage and to underpin an effective and analyze large amounts of data. Therefore, sustainable exploitation of big data the impact of a joint exploitation of the analytics in research and industry. expertise, know-how and facilities of the national and regional public and private The document collects figures, actors is very relevant. expertise, technologies and facilities With the aim to exploit such potential, available in the frame of the association the “Associazione Big Data” was recently in each of the knowledge and innovation launched. Based in Emilia Romagna, but domains of major relevance of the mobilizing the best world class actors of Associazione Big Data. the whole country, it aims at: The president • Sharing and jointly exploiting existing results, knowledge, capacities, and research and innovation initiatives and frameworks; The vice president • Fostering cooperation between national and regional public and private actors to also maximize 6
1.1. SCOPE, VISION, OBJECTIVE: THE ASSOCIAZIONE BIG DATA AS A SOURCE OF COMPETITIVE ADVANTAGE High performance computing, big data European Centre for Medium-Range analysis, deep and machine learning Weather Forecasts (ECMWF), which algorithms, high bandwidth networks are is going to be set up in Bologna, will technological pillars of digital society. bring other prominent computing Thanks to them, a wide range of future and analytics capabilities, with the scientific, technological and societal potential of further exploiting available challenges can be met to pursue growth expertise, infrastructures and facilities, in many sectors of the economy and by generating new local economic and society. The opportunities are many and societal opportunities. yet not all predictable. These challenges will require the development of new skills Since 2015, stakeholders involved in and of new technologies. supercomputing and big data production and management have engaged in Large investments in ICT infrastructures mapping and analyzing their potential and technologies, such as high impact on scientific, social and business performance computing, cloud domains: thus, the Associazione Big Data computing, artificial intelligence and was set up to interconnect and jointly big data, as well as in research and exploit the knowledge, capacities and innovation, inclusiveness and skills are research and innovation potentials of the priorities for Europe in order for it this community to leverage the effects to benefit from the digital economy of actions and investments made so far potential. and to maximize their impacts, locally but also at national, EU and international In Italy, research bodies such as INFN, levels. CNR, ENEA, GARR and CINECA have already implemented big data via The “Associazione Big Data” mission is to high performance computing and facilitate: e-infrastructures to support major • the sharing and exploiting of existing research and academic communities. results, knowledge, capacities, Most of these Italian high performance research and innovation initiatives and high throughput computing and frameworks; resources are concentrated in the North of Italy, especially in Emilia-Romagna, • cooperation between public and where they are well integrated into private entities, maximizing the the local knowledge and business leverage effects of R&I public and systems, consisting of universities and private investments; research centers, large-cap and mid- • joint actions including coordination, cap companies, as well as international planning, facilitating and promoting players that are new entries in the relevant R&I actions, policies and local scenario. The data centre of the international cooperation activities; 7
SCOPE, VISION, OBJECTIVE: THE ASSOCIAZIONE BIG DATA AS A SOURCE OF COMPETITIVE ADVANTAGE • researcher careers, training and mobility and, in general, the development of skills in the big data and AI domains. More precisely, the Association shall operate in the following economic and societal sectors and disciplinary contexts: • the economic system, with special attention to mid-cap companies, in order to facilitate its access to big data processing and related analytic tools, high bandwidth networking and high performance and high throughput computing in order to increase competitiveness and stimulate job creation and economic growth. • the public administrations, in order to support awareness and implementation of well-developed data and computing based infrastructures to provide better services to citizen in the education, health, environment, mobility and security, in order to foster a more widespread, effective and citizen- friendly e-government system at national and local level. • the knowledge community, in order to enhance its ability to analyze large amount of data from experiments, observations, monitoring and sensoring systems and to process more detailed and complete data series to assess or gain insight of the investigated phenomena. 8
1.2. ACTIVITIES FROM 2015 The Big Data Community started its Data Community greatly supported the operations informally in 2015 working preparation of the proposal, sharing on a common framework including data and information that improved the experiences, skills, facilities and attractiveness of the proposal as a whole. opportunities under the auspices of In May 2018 the Italian government Emilia-Romagna Region, and in the announced the selected proposals to context of the regional policy for research create the new Italian Competence Centre and innovation infrastructures. foreseen in the Industry 4.0 national plan. Among them was the proposal BI-REX: Big Since that time, the community enlarged Data Innovation & Research EXcellence, the field of action from regional to national with a cofinancing of 9.2 M€ and a large and international dimension, and a research partnership mostly gathered number of significant achievements have from the Community. been reached by the members of the Big Data Community. All the actors in the big data field that On September 2016 an all-Italian launched the “Big Data Community” partnership between ENEA and formalised their partnership in an CINECA was selected by EUROfusion, association called “Associazione Big the European Consortium for the Data”. The charter of the association was development of fusion energy, to deliver signed by all the participants in June, high-performance computing and data 2018. The Association aims at promoting storage to support European research a community of research centers and on fusion for two years. The 29 member infrastructures of excellence in the field countries of the Consortium are using of supercomputing (High Performance a partition of MARCONI, the Italian Computing) and the treatment of Big Data. supercomputer hosted by the CINECA. Moreover, it provides the possibility of This machine has replaced the Japan- acting at the international level as a single based supercomputer HELIOS, set up in subject. the International Fusion Energy Centre in Furthermore, in November 2018, some of Rokkasho. the members participated in the H2020 call “HPC and Big Data Enabled Large- In December 2016, the regional Scale Test-beds and Applications” with the government of the Emilia-Romagna project IO-Twins - Distributed Digital Twins Region submitted the Italian proposal for for industrial SMEs: a big-data platform the relocation of the ECMWF Data Centre that has been successfully funded with in Bologna. It was then announced as the a budget of 17M euros. The IO-Twins winner in June 2017. The relocation site project will deliver large-scale industrial is located in the area of the Tecnopolo test-beds, leveraging and combining data di Bologna Big Data hub owned by related to the manufacturing and facility the Regione Emilia-Romagna. The Big management optimisation domains, 9
ACTIVITIES FROM 2015 coming from diverse sources, such as data APIs, historical data, embedded sensors, and Open Data sources. In January 2019 the president of the association signed a memorandum of understanding with the Slovenian HPC Sling Association. The aim of the MoU is to collaborate through a number of possible activities, such as the sharing of access to digital infrastructure and establishment of partnerships for developing digital infrastructure; joint research and innovation projects and actions; research exchange programs; support for research groups with common activities or projects; enhancement of the capacity to attract funds and resources from third parties for the respective members at the national and/or international level; training, dissemination and outreach, joint summer schools, and any other activities deemed to be of mutual interest and agreed upon. It is hoped that additional international agreements will be signed in the next few months. In 2019 the SUPER project will start, where SUPER stands for “Supercomputing Unified Platform”. It is funded by the Emilia-Romagna ERDF’s funds and will integrate CINECA and INFN platforms for first, followed by Tier1-CRESCO6-ENEA ENEA and the CMCC Supercomputing Center, with cloud access delivering and enabling value in multiple application domains. Once the infrastructure is available, a number of test beds will be developed in a subsequent “phase II” of the project.
2 THE BIG DATA ASSOCIATION
2.1. PROFILE OF THE ASSOCIATION The Big Data Association operates in a initiatives and projects as described wide ecosystem of initiatives carried out below. It is worth mentioning that at the regional, national and European CINECA and INFN have a key role in level, regarding Big Data and the relevant the Italian participation in the European enabling technologies. At the regional High-Performance Computing Joint level, the Association has an important Undertaking (EuroHPC), that will pool role in the harmonisation of initiatives led European resources to develop top-of- by the stakeholders, e.g. the setup of the the-range exascale supercomputers Tecnopolo di Bologna Big Data hub or for processing big data, based on the participation in BI-REX competence competitive European technology. centre. Another significant contribution is made to the European Open Science Cloud (EOSC), At the national level, all the members with participation in both hub and pilot are involved in projects and initiatives projects. EOSC is the cloud infrastructure related to Big Data, Artificial Intelligence for research data in Europe, responsible and digital innovation, such as the Human for infrastructures and policy-making. Technopole Foundation, leading partner All the initiatives and projects above of the Human Technopole Project, aimed are part of an important commitment at developing personalised medicine by Europe to the fields of big data and and nutrition to tackle cancer and artificial intelligence, as stated in the EU neurodegenerative diseases by means of Declaration on Cooperation on Artificial genomics and big data analysis. Moreover, Intelligence, signed in April 2018, aimed several members have provided support at coordinating national policies and at to industry in the framework of the establishing cooperation between the national plan Industria 4.0 for innovation adhering countries on the exploitation in Italian industry. Some member of AI skills, features and potential representatives are also personally applications. involved in the national policy-making process, taking part in two High Level Expert Groups on Artificial Intelligence set up by the Ministry of Economic Development and by the Ministry of Education, University and Research, aiming at defining a national strategy for the applications of AI and at producing guidelines to inject AI into the educational process. The Association’s members are also involved in European and international 12
2.2. THE ASSOCIATION IN FIGURES (2015-2019) Supercomputing resources for public research in Italy managed by >90% association members H2020, FP7 or CEF projects 1 140 Total Cost of the projects > 2,1 BLN€ Funding received from EC 160 MLN€ 1 Project funded by H2020, FP7 or CEF programs that were on 2015 or starded after and involves at least one member of the association. 13
THE ASSOCIATION IN FIGURES (2015-2019) GARR-X LEPIDA RESEARCH NETWORK THE EMILIA-ROMAGNA REGIONAL NETWORK • Up to 200 gbps OF THE PUBLIC • VPN for WLCG ADMINISTRATIONS high energy physics community • Up to 100 gbps capacity • VPN for PRACE HPC Infrastructure • More than 140,000 km optical fibers and • VPN for Human approx. 3,000 access Brain project nodes FENIX Federating HPC Infrastructure • 43 Pop nodes + 4 integrated regional data centers. • Active in 10 International Public Peering Exchange Points and in 4 Private Peering Facilities. • Current coverage of fast internet availability in Emilia- Romagna: 76% of households • Provision of dark fiber special link between CINECA and INFN- CNAF at 2 terabits /s. 14
2.3. HIGH PERFORMANCE COMPUTING FACILITIES IN EMILIA-ROMAGNA CINECA A high-throughput GARR-X/LEPIDA computing (HTC) facility Tier0: MARCONI system. which hosts the WLCG Fast and effective nation- 3500 low latency server Tier1 at the CNAF-INFN in wide and international nodes + 3500 scale out Bologna, with the following network connection, server nodes; ~10 Petabyte capabilities: mainly provided by GARR of memory RAM; ~20 and Lepida. Petaflops peak perfomance CPU: ~30000 computing cores; Tier0 in PRACE Partnership for Advanced Computing in STORAGE: ~40 PB of online Europe disk space; Eurofusion HPC service facility LIBRARY: ~90 PB of Tier1: GALILEO system. nearline tape space. 1000 low latency server nodes; ~2 petaflops peak perfomance Cloud service: MEUCCI system. 250 low latency server nodes Openstack virtualization middleware; containers, urgent computing AI & ML: DAVIDE system. 45 hybrid nodes x 2 Power8 + 4Tesla P100; ~1 petaflops peak performance STORAGE REPOSITORY: PICO system. 50 server nodes for visualization and data management services; multi tier storage repository; ~20 petabytes of net disk space; ~20 petabytes of tape library 15
2.4. ASSOCIATION MEMBERS CNR Consiglio Nazionale INAF delle Ricerche Istituto Nazionale www.cnr.it di Astrofisica www.inaf.it CINECA INGV Inter-University Consortium Istituto Nazionale for supercomputing di Geofisica e Vulcanologia www.cineca.it www.ingv.it CMCC LEPIDA Centro Euro-Mediterraneo Regional in-house providing per i Cambiamenti Climatici company for ICT service and www.cmcc.it infrastructure www.lepida.it ENEA Agenzia nazionale per le nuove UNIFE tecnologie. L’energia e lo sviluppo Università degli studi economico sostenibile di Ferrara www.enea.it www.unife.it INFN ART-ER Istituto Nazionale di Fisica Emilia Romagna Nucleare innovation agency www.inf.it www.art-er.it UNIPR Università degli Studi IOR di Parma Istituto Ortopedico www.unipr.it Rizzoli www.ior.it UNIMORE Università degli studi di Modena e Reggio Emilia UNIBO www.unimore.it Alma Mater Studiorum Università di Bologna www.unibo.it 16
3 ENABLING INFRASTRUCTURE
3.1. HPC, HTC & NETWORKING CINECA is the national supercomputing tier 0 high end computing system with facility. Thanks to the long-term vision a peak performance on the order of 20 of the Ministry of Education, University petaflops, ranked number 12 in TOP500 at and Research, which has persistently the time of its installation, a tier 1 system supported the progressive development for quality of service, an HPC cloud of the CINECA Interuniversity Consortium, system, and a prototype production over time CINECA introduced in Italy system for artificial intelligence and the most innovative world class HPC machine learning applications. In all, system architectures. The first vector the current computing architecture supercomputer was introduced in integrates more than 9000 server nodes. Italy in the 1980s, the first parallel HPC A large-scale data repository, with a architectures in the 1990s, the first cluster full capacity in excess of 50 petabytes, at the beginning of the 2000s, the large completes and integrates the computing massively parallel architectures, the architecture as well. first petascale system in Italy, ten years ago; and lastly, the modular cluster, The CINECA HPC facility enables a wide the current multipetascale HPC system range of scientific research through actually in production. At the time of open access granted by independent their installation, most of the CINECA international peer reviewed processes supercomputer systems were ranked by mean of the PRACE association at in the top part, from the 7th to the 12th the European level, and the ISCRA at the position, of the TOP500, enabling and national level. supporting national and European excellence in science and technology CINECA provides services to the innovation. The high end tier 0 systems Worldwide Eurofusion community, being have been continuously integrated in a the contractor of the European tender for complex environment that complements that specific service that will run till the the tier 0 supercomputers with a high end of 2023, and also provides operational quality of service tier 1 system, a large computing service for weather forecasting scale data repository, and, from time for National Civic Protection under the to time, innovative prototypes to keep supervision of Emilia Romagna ARPA- pace with the cycles of innovation and SMR. CINECA is part of the European to develop and test new architectures digital infrastructure for many ESFRI RI with the objectives of increasing and facilities and initiatives, among others, improving the effectiveness and the EPOS: European Plate Observing System, efficiency of the next cycle highest led by INGV, ELIXIR: Infrastructure for Life performing computing system. Science, and the HPB European Human Brain Flagship Project. Also, with reference The current computing architecture to those RI initiatives (but not only those), facility hosted by CINECA integrates a CINECA is a core partner of the European 18
HPC, HTC & NETWORKING Open Science Cloud HUB infrastructure the framework of a national INFN HTC and the European Data Infrastructure. infrastructure consisting of the CNAF Tier At the national level, many joint 1 and 10 smaller facilities, called Tier 2 development partnership agreements centres, placed all over Italy. are in force with INFN, ENEA, INAF, SISSA, ICTP and many R&D collaboration The CNAF data centre operates about actions and agreements are in force with 1,000 computing servers providing qualified national research institutes, ~30,000 computing cores, allowing the universities, and public administrations. concurrent run of an equivalent number CINECA received recognition as a ‘Golden of general-purpose processing tasks. All Digital Hub for Innovation’ from the Big the computing resources are centrally Data Value Association, is a Competence managed by a single batch system Centre for supporting innovation towards (LSF by IBM) and dynamically allocated industries, and has led many proofs of through a fair-share mechanism, which concepts in collaboration with industries allows full exploitation of the available and private organisations. CINECA CPU-production with an efficiency of entered formal partnerships for added about 95%. Part of the CNAF computing value services and R&D activities with power is currently hosted at CINECA, and ENI and manages and operates the connected to the main site via dedicated ENI corporate supercomputing facility, high-speed network links traversing the one of the world’s largest industrial city of Bologna, allowing the remote supercomputing infrastructures. computing nodes to access the mass storage located at CNAF with maximum The HTC facility is hosted at CNAF throughput and minimal latency, as if Bologna, which is one of the INFN they were local. National Centres defined in the INFN charter. CNAF has been charged CNAF operates a very large storage with the primary task of setting up infrastructure based on industry and running the so-called Tier-1 data standards, for connections (about 100 centre for the Large Hadron Collider disk servers and disk enclosures are (LHC) experiments at CERN in Geneva. interconnected through a dedicated Nowadays it hosts computing not only Storage Area Network) and for data for LHC but also for many experiments, access (data are hosted on parallel file ranging from high-energy physics to systems, GPFS by IBM, typically one per astroparticles, dark matter searches in major experiment). This solution allows underground laboratories, etc. CNAF the implementation of a completely also participated as a primary contributor redundant data-access system. In total, in the development of grid middleware CNAF hosts about 40 PB of online disk and in the operation of the Italian grid space, with a total I/O bandwidth of infrastructure. This facility operates within about 1.5 Tb/s and 90 PB of nearline tape 19
HPC, HTC & NETWORKING space arranged in a robotic library read out by 20 tape enterprise-level drives. The 10 additional Tier 2 centres distributed across Italy have a similar aggregate of resources, both in terms of computing and disk storage, but not tape. They are connected to the CNAF Tier1 through dedicated high-speed links, and the resources can work in a coordinated way in the global INFN computing ecosystem. The user community of the CNAF Tier1 facility is primarily composed by research groups from INFN and most of the Italian universities (including UNIBO, UNIFE, UNIPR), working on nuclear, subnuclear, astroparticle and theoretical physics. In the coming years, the HTC computing resources of INFN CNAF will be increased at least by an order of magnitude in order to meet the computing and storage requirements of forthcoming experiments and upgrades. In addition, LEPIDA, within the framework of the Digital Agenda of the Emilia- Romagna, has been entrusted with the design, implementation and provision of four data centres, geographically distributed and natively connected by the LEPIDA network, for the use of public administrations and healthcare systems. These data centres offer advanced computing services, storage, data protection, business continuity with native disaster recovery functions and a specific focus on energy management. 20
3.2. OTHER RELEVANT FACILITIES As previously mentioned, the Reusable) principles and guidelines and concentration in Bologna of national to move towards the implementation infrastructures for HPC and big data of the digital single marketplace at the constitutes the interchange and European level provided by the EOSC – reference hub for the national and European Open Science Cloud initiative. European public and private research community. The national panorama also presents a vitality of tier 1-class infrastructures and highly complex cloud computing services. As shown schematically below, with positioning data, which, due to its continuous updating process could suffer from some inaccuracy, but which will represent the national panorama, the distributed presence of digital infrastructures is certainly valuable. Attention is drawn to the distributed infrastructure of the INFN sites, to the infrastructure implemented through the action of ReCaS (calculation network for SuperB and other applications ), and to the distributed system for cloud computing services implemented by GARR. In addition to the distributed infrastructures, there are tier1 systems at the peta scale class at the Portici computer centre of ENEA, the CRESCO systems, at the University of Salento computer centre, which houses the CMCC supercomputing system, and the computing centre of SISSA / ICTP, which hosts the SISSA computing system. By mean of “SUPER – Supercomputing Unified Platform Emilia-Romagna” the HPC and Big Data national ecosystem will be federated to enable good practices of data access and processing based on FAIR (Findable, Accessible, Interoperable, 21
OTHER RELEVANT FACILITIES National HPC and Big Data infrastructure Institution Location Site Name # of cores # of server nods Storage (PB) Tape (PB) Bari NFN-BARI T2 174 12 1,51 Bari RECAS 4.000 250 3,55 Bari GARR 1.152 58 3,00 Bologna CNAF-T1 30.000 1875 40,00 90,00 Bologna CNAF-LHCB-T2 384 12 Catania INFN-CATANIA 390 20 1,20 Catania RECAS 2.562 128 0,32 Cosenza RECAS 3.500 175 0,90 INFN Frascati INFN-FRASCATI 150 8 ,13 Milano INFN-MILANO-ATLAS 180 12 1,31 Napoli INFN-NAPOLI-ATLAS 334 21 1,82 Napoli RECAS 4.956 310 4,96 Padova LNL 452 28 1,76 Pisa INFN-PISA 1.250 78 2,49 Roma INFN-ROMA1 160 10 1,34 Roma INFN-ROMA1-CMS 192 6 Torino INFN-TORINO 78 6 0,06 Bologna MARCONI 406.000 7000 Cineca Bologna GALILEO 72.576 1512 20,00 20,00 Bologna PICO 2.400 50 CMCC Lecce CMCC 13.824 288 4,00 CNR Roma Data Center 1.034 64 1,20 Casaccia ENEA-Casaccia 192 12 0,02 Frascati ENEA-FRASCATI 480 24 0,08 ENEA Portici CRESCO4 4.864 04 Portici CRESCO5 512 32 4,00 5,00 Portici CRESCO6 20.832 434 Catania GARR 1.152 58 3,00 Cosenza RECAS 384 32 1,00 GARR Napoli GARR 384 24 1,00 Palermo GARR 1.152 72 3,00 INAF Trieste OATS 256 16 1,00 SISSA Trieste SISSA 4.000 250 1,50 22
3.3. FUTURE TRENDS AND DEVELOPMENT The digital infrastructures, high science. performance computing architecture, From this point of view, the development hyper scale cluster architecture and roadmap of CINECA foresees, as a high bandwidth network infrastructure baseline target, systems with dozens of will have a tremendous and drastically petaflops in a short time and hundred- innovation in the next few years. petaflop systems in the medium term. The medium term target will be The global race towards exascale definitively increased in case of success computing is imparting a strong in acting as an EuroHPC Hosting Entity. acceleration to the deployment of supercomputing systems with The needs and the objectives of capping a performance in excess of many the consumption of electrical energy hundreds of petaflops and the target of supercomputer systems has led to for the availability of the first exascale the development of microprocessor computing system, a supercomputer with technologies characterised by an a performance of at least one thousands extremely narrow performance profile, petaflops, is more and more close. The such as floating point accelerators, FPGA USA, China, and Japan are investing many components, tensor microprocessors, billions of dollars for supporting R&D with and neuromorphing chips, but the target of having an exascale system an extremely high ratio of energy in production by 2022/23. efficiency to performance if properly exploited at the level of the algorithms The European Union and 28 member and programming models. Such states created a joint undertaking components have to be integrated in with the purpose of mobilising a server nodes that combine general- budget of billions of Euros for R&D for purpose microprocessor sockets and the development of European HPC floating point accelerators in hybrid technologies and for achieving towards- configurations that needs to be co- pre-exascale and exascale systems designed in light of specific application to open production in Europe in the domains. The physics of the materials same time frame. The Italian federal and matter, the environment and climate government, the local ecosystem of change, energy savings and energy universities and research institutes, production from renewable sources, CINECA and other agencies, and regional the health and life sciences, artificial public administrations are collaborating intelligence, as well as cyber security, and working for hosting in Italy one of are and will be the main drivers for the those supercomputers, on the basis of CINECA co-design approach, and co- the excellent track records of CINECA design will be the main driver for the and of the vital persistent development development roadmap of the CINECA in Italy of the methods of computational supercomputing infrastructure towards 23
FUTURE TRENDS AND DEVELOPMENT performances on the order of hundreds scale facilities producing and processing of petaflops and at the exascale. the increasing availability of big data. Regarding the large hyper scale cluster, In the future, the LEPIDA physical fiber the overlap between the low-power optic network integrated in the GARR-X multi-core sockets and the software service will guarantee state of the art technology for virtualisation and high-speed network links, moving from containers pushes the limits of the very the current tens of Gigabits per second large resilient configurations towards to the scale of hundreds of Gigabits per multimillion-core clusters. This has led second, till the next scale of up to 400 the scientific community and industry Gigabits a second backbone foreseen to develop new processing methods for wide availability in the timeframe based on high performance throughput 2021/2022. computing for filtering and gaining insight into the huge amount of data In order to deploy on time such a produced by extreme experiments or breakthrough in all the dimensions of observations, such as the High Energy the digital infrastructure technologies, Lumen LHC experiments at CERN, it will be necessary, starting now, to or the Square Kilometre Array multi- develop progressively the consolidation radio telescope project, or the Internet of the main hubs around which will of Things, in the context of industrial be federated the large resources innovation processes and public distributed in the different core centres administration decision support work of the national and European public flows. and private research systems and the main core centres of the public From this point of view, as the INFN administrations. is one of the world’s largest tier 1 LHC computing grids, the target The aim of the SUPER regional is to maintain and improve this infrastructural project will be to proceed leading position and enabling the in the direction of coordinating the provisioning of services to consolidate resources distributed at the national the scientific competitiveness of the level around the Bologna high national and European community. The performance computing and big data implementation of this strategy foresees world class hub as a model for open a multimillion-core cluster at CNAF, science in Italy and in Europe and as complemented with a data repository an enabling facility to support the capacity at the near-exabyte level. development of the proofs of concepts The sharing of data and the anywhere- for innovation in the industrial sector anytime access model requires and the optimisation of the decision the availability of large bandwidth processes of the public administrations. geographical networks to federate large 24
4 SCIENTIFIC AND APPLICATION DOMAINS
26
4.1. DIGITAL ENABLING TECHNOLOGIES Trends in digital innovation are chains, both for large national, European progressively providing the ability to and international players and, at the level globally extend connectivity to any person of regional ecosystems, for SMEs and and any device, to seamlessly access any innovative industries and creative startups, information and data from anywhere and at with big data modelling and processing any time, with on demand processing and market in hardware, software and ICT analysis of the data for modelling scenarios services a fast growing multibillion-euro and supporting the decisions (ATAWAD: business sector of industry. anytime, anyway, any device). In order to unleash the potential arising The leading edge components of such a from the exploitation of big data value, it pervasive ecosystem of digitally enabling is essential to research new technologies, technologies are the high speed networks including HPC, data processing, data to interconnect large scale facilities; large science and artificial intelligence. volume repositories to preserve and curate the big data, and high end (HPC and HPTC) Multimedia, data storage, data movement computing systems. These technologies and networking, communication and deal with simulation and modeling of visualisation tools are complementary scientific and engineering problems as well enabling technologies for the development as data analysis. The scientific capabilities, of the big-data market. Fairness in the industrial competitiveness and sovereignty access to public data and at the same time depend critically on access to world-leading strong safeguards for privacy and cyber HPC computing and data infrastructures to security are also essential. keep pace with the growing demand and complexity of the problems to be solved. Furthermore, a consistent long-term strategy is essential for educating and As the problems modeled in computer training professional data scientists to favor simulations and decision support systems an holistic and integrated multidisciplinary grow in size and complexity (to enable and interdisciplinary problem solving more detailed predictions, to cope with approach and to manage policies and ever larger amounts of data, or both), so do complexity to support the research the demands on digital resources. In many infrastructures and industries in properly areas, spanning from health, biology and handling and gaining insight from big data. climate change to automotive, aerospace, energy and economics, digital technology provides a practical way to address Major European Projects: AI4EU, AIDA, AIDA-2020, ANTAREX, BD2DECIDE, DeepHealth, EnABLES, complexity, and access to it becomes ENABLE-S3, EOSC-hub, EPEEC, EUDAT2020, essential. EUROfusion, ExaNeSt, EXDCI, XDC, HERCULES, HPC-EUROPA3, ICEI, OPRECOMP, PRACE-4IP, To totally harvest the benefits of digital SeaDataCloud, SECREDAS, Re-Search Alps, technology, it is necessary to support a SEADATANET-II. full ecosystem comprising hardware and software infrastructures, applications, skills, Active members: CMCC, CNR, ENEA, INAF, INFN, services and their interconnections. Digital INGV, LEPIDA, UNIBO, UNIMORE, UNIPR, CINECA. innovation research has a direct and major impact on the whole ICT and HPC value 27
28
4.2. HEALTH AND LIFE SCIENCES Big data are attracting a great deal of time, reducing costs for the national health attention for the potential in healthcare and, services and/or regional health authorities. more widely, in all biomedical scenarios. In the last few years, hospitals, universities and Big data in the life sciences and medical research centres have started fruitful and practice are still a big challenge, but data-productive analyses at the European, they will leverage the existing scientific national and regional levels. For example, knowledge, improving translational omics studies and multidimensional research, develop personalised strategies, imaging scans producing large volumes of and implement innovative technological data and knowledge are helpful in areas products. All these efforts will have an from neuroscience to orthopaedics, from impressive social, economic and industrial cardiovascular diseases to aging. By this impact. In this perspective, a regional means, an increasing number of case infrastructure, able to manage and process studies in healthcare are well suited for a large-scale data, will allow running big data solution. in addition, big data allow prospective population studies that will solving clues in common conditions but also give an insight in treating epidemiologically discovering new developments for treating significant diseases such as cancer, rare diseases, steering healthcare towards neurological and cardiovascular conditions, personalised and “precision” medicine. osteoporosis, muscle-skeletal diseases, Nowadays, there is a pressing need for as well as geriatric and ageing-related fast and preferential connections for data pathologies. transfer to transform potential into reality and instruments for information aggregation and extrapolation are urgent, such as rapid Major European Projects: ADOPT-BBMRI- ERIC, AirPROM, ChiLTERN, COMPARE, CORBEL, and high-performance techniques for DeepHealth, ECDP, EDEN ISS, EJPRD, EU- machine learning and data mining. Despite STAND4PM, HPB SGA1, INforFUTURE, LANGELIN, this strong impulse, the exploitation of big MCDS-Therapy, MYNEWGUT, Neuromics, data information to generate new health ORTHOUNION, PAPA-ARTIS, PLATYPUS, PROPAG- knowledge is much delayed/hindered by AGEING, THALAMOSS, VPH-Share. the inherent heterogeneity of big data and the lack of broadly accepted standards, as Active members: CNR, CINECA, INFN, INGV, IOR, well as by legal issues surrounding the use UNIBO, UNIFE, UNIMORE, UNIPR. of personal data. At the regional level, a proper and coordinated technology framework, taking care of context and metadata, will activate a data-driven improvement for a meaningful use of big data in healthcare development, boosted by top-level HPC infrastructures. A more accurate and defined management of large-scale information and local data exchange and integration will lead physicians, medical doctors and researchers to better treatments capable of answering patients’ issues and, at the same 29
30
4.3. AGRI-FOOD, BIOBASED INDUSTRY AND BLUE GROWTH The relevance of big data to the agri-food analysis of key data of interest for maritime sector, bio-based industry, and blue growth security (including migration phenomena), is pivotal, also for the current blooming of maritime navigation and transportation innovative technologies and omic tools safety and security, sustainable fisheries applied to the most diverse industrial and aquaculture, biotic and abiotic sea sectors (food production and safety, resources exploitation, etc. primary production and animal/plant breeding, industrial biotechnology, enzyme and microbial discovery, etc.). Managing Major European Projects: CIRCLES, CYBELE, INMARE, LEGVALUE, MedAID, PANINI, PerformFISH, these (big) data is a formidable challenge. Prolific, RES URBIS, SMARTCHAIN, TREASURE, Improving consumer health by monitoring VALUMICS. food-related data is one of the areas that may benefit most from a radically Active members: CINECA, CNR, INGV, UNIBO, innovative use of big data to provide UNIPR. more personal recommendations, via various technological platforms, which can improve the quality of life. Beyond the typical use of data analysis for food safety, big data is also related to predictive analytics, with an impact on economy and logistics, as well as to metagenomics for the characterisation of food spoilage. EFSA is interested in future strategic approaches to risk assessment in areas of food and feed safety that benefit from the acquisition, processing, and sharing of large quantities of data and evidence. Therefore, EFSA supports this initiative and is open to supporting solutions that reduce the costs and improve the effectiveness of data acquisition by sharing resources and capabilities. Big data has also an increasing importance in agricultural practices, since the integration of data from sensors, Internet of Things (IoT) applications and genomics can lead farmers to increase their productivity and sustainability. They can also improve efficiency and sustainability as well as flexibility and safety in the food and biobased industry. Finally, IoT, cloud computing and big data and data analytics are required for sharing, advanced processing and 31
32
4.4. SMART CITIES AND SECURITY Big data is a growing area of interest offering transport services. for public policy makers, for cities and urban management: it is related with A smart use of big data supports the enormous stream of data coming governments in optimising multimodal from administrative and social data, transport and managing traffic flows, making from city energy, mobility and transport cities smarter. Real-time transportation infrastructures, from large and increasing planning and safety, environmentally sensor networks, comprehending large sustainable and resource-efficient transport, nodes connected under the Internet-of- socio-economic and behavioural research, Things paradigm, video surveillance and and forward looking activities for policy environmental cameras, and so on. making are fundamental topics which greatly benefit from big data. As well, The analysis of data for purposes of social the huge amount of data coming from innovation and human-centric services GPS systems, real-time traffic monitoring, (e.g. personal safety and physical security, parking availability, electric charge station crowd-sensing/participatory sensing in availability, etc., can provide competitive an urban environment), for smart mobility advantages in order to optimise vehicle and smart logistics services, for critical design, maintenance and energy infrastructure protection, for emergency management, improve on-board automation management, etc. is becoming crucial and and safety, and reduce CO2 emissions. challenging due to the magnitude of the data and the requirement of timeliness. Moreover, the widespread use of Connected with smart city and social big modern monitoring systems in electricity data, cyber security (including privacy, transmission, distribution systems and smart access control management, biometrics, meters monitored by the consumer enables etc.) is one of the most critical areas in big the acquisition of real-time, high-resolution data analytics and management. There’s a data. This coupled with the addition of other growing demand for security information data sources, such as weather data, usage and event management technologies and patterns and market data, dramatically services, which gather and analyse security increases the need for appropriate big data event big data that is used to manage handling techniques and machine learning threats. approaches. Existing grid capacities could be better used, and renewable energy In transport, the volume of data has resources could be better integrated. increased because of the growth in the amount of traffic (all modes) and detectors. Also, travellers, goods and vehicles are Major European Projects: Domino, FLEXMETER, FOCAS, ICARUS, MARISA, OSMOSE, PRYSTINE, generating more data from mobile devices SocialCar, SOGNO, TRAFAIR. and tracking transponders (including trains, ships and aircraft). Infrastructure, Active members: CINECA, CMCC, ENEA, LEPIDA, environmental and meteorological UNIBO, UNIMORE. monitoring also produce data that is related to transport operations and users. New ways to collect, manage and analyse vast quantities of data are important both for governments and private companies 33
34
4.5. MATERIALS SCIENCE Data and metadata for the development is increasingly important, and requires of new materials, their processing, and sharing and analysing the resulting highly application life cycles are considered distributed, heterogeneous data-sets. key assets to accelerate discovery An example comes from the electron and innovation across all design and microscopy community: they produce manufacturing sectors. With its academic multidimensional data that need to be and industrial knowledge base and the stored, transferred, shared, managed related production of data, Emilia Romagna and processed, both online (i.e. during competes with the most advanced regions acquisition) and offline. Efforts in this in Europe and worldwide. Here, the direction are going to involve broader main challenges are often related to the communities working on automated construction and maintenance of well- materials and systems analysis at large annotated and structured repositories, their scale facilities (e.g. synchrotron and neutron accessibility, their specialisation/integration facilities), research laboratory equipment, and interactions with similar European or distributed sensing systems in academic and international operations, as well as and industrial environments. the development and deployment of data analytics strategies. Major European Projects: EMCC-CSA, EoCoE, EXTMOS, GEMMA, GRAPHENE FLAGSHIP, A special feature of the materials domain GRIDABLE, INTERSECT, IQubits, MaX, NEXTOWER, is the wealth of predictive information that SimDOME, NANODOME. can be obtained from simulations, now extending to quantum and multiscale Active members: CINECA, CNR, ENEA, UNIBO, approaches thanks to HPC and HTC. In the UNIMORE. silico design of materials and (bio)molecular systems this is no longer limited to their structures and stability: it includes a huge range of functionalities, e.g. from friction and wear to biocompatibility, from colour and optical appearance to hydrophobicity, from electron transport to thermal properties within devices. Italy hosts an important community of advanced users of such applications, but also leads European efforts of code developers who work at the frontiers of the current and future HPC technologies, invest in software/hardware co-design, and are creating an ecosystem of capabilities, applications, data workflows and analysis, and user-oriented services. Importantly, Italy coordinates the European Centre of Excellence for HPC on materials design at the exascale, MaX. Data acquisition from the advanced spectroscopy and imaging of materials 35
36
4.6. INDUSTRY 4.0 Manufacturing industrial sectors are the synthetic data from digital-twin simulations. most relevant and impactful area in Italian Manufacturing facilities must be capable national research, innovation, and company of meeting the requirements of the activities, in terms of employees, production, exponential increase in data production, as and business. The related applied research well as possess the analytical techniques activity is carried out in a set of coordinated needed to extract valuable knowledge from research & innovation organisations big data. Techniques, services, applications, (university labs as well as interdepartmental, platforms, and infrastructure needs are technopoles, the 8 Competence Centres often shared with other research and established in the framework of the Industry production fields, in terms of cloud/edge 4.0 programme, etc.) and industries working computing and HPC, big data analytics and in manufacturing, mechanics, industrial visualisation, machine learning, and online processing, and in many related areas such stream processing. as mechatronics, automotive, robotics and automation, packaging, ceramics, electronic equipment, textile and garments, machinery Major European Projects: AEOLIX, CARIM, CLASS, CloudiFacturing, ColRobot, DREAM, ETEKINA, and metal production, etc. EXCELLERAT, FIRST, Fortissimo, FORTISSIMO2, I-MECH, IMPROVE, INCLUSIVE, IO-Twins, LINCOLN, The changes in production processes and Plan4Res, ROSSINI, SARAS, SYMBIOPTIMA, their management increasingly require SYMPLEXITY. the exploitation of knowledge extracted from data, acquired (offline and online) in Active members: ART-ER, CINECA, CNR, UNIBO, virtual prototyping, testing, production, use, UNIFE, UNIMORE. and maintenance. Industry 4.0 leverages the new paradigms of the IoT and cyber physical systems, exploiting the enormous amount of data coming from simulations, digital twins, and real machines to realise superior performance and energy-efficient manufacturing. The manufacturing industry is currently in the middle of a data-driven revolution, from traditional manufacturing facilities to highly optimised smart manufacturing facilities to create manufacturing intelligence from machine-to-machine dataflow and to support accurate and timely decision- making. For example, new techniques, instruments, services for data analysis, learning, prediction and statistics enable novel diagnostics and prognostics for predictive maintenance by processing huge amounts of data describing the time-history of working conditions, integrated with open data about the execution environment and 37
38
4.7. EARTH SCIENCE AND CLIMATE CHANGE Big data in climate change is related to services amount of output data both in diagnostic and in for: (i) developing, porting and running Earth- forecast mode. Archives of high resolution (about system models at high spatial resolution; (ii) 1 km) atmospheric surface parameters (wind, developing metrics and diagnostic tools for temperature, solar radiation, etc.) predicted, on evaluating the models and for specific end-users’ a daily basis, up to 48 hours ahead, are available needs; (iii) acquisition, storage and processing for many environmental and renewable energy of data on the order of petabytes from model applications. Computing services for numerical simulations and observations. weather predictions are provided daily and operational ensemble forecasts on the monthly High performance computing, data repository, timescale are performed on a weekly basis. data sharing and staging services are crucial Chemical transport models produce both 5-day to allowing a wide user base to produce and air pollution forecasts for Italy and Europe and have access to a set of climate variables at high annual diagnostic simulations for the country. temporal resolutions and at extremely high Hourly data for air pollutant concentrations and spatial resolutions, and to perform probability meteorological fields are calculated over the predictions from sub-seasonal to multi-annual national domain with horizontal spatial resolution time scales. This is necessary in order to provide ranging from 4 to 1 km for 12 vertical levels from reliable climate information and maintain ground level to an altitude of 12 km. competitiveness at the international level with the major international meteo-climate services. Environmental change is particularly rapid in the coastal zone and offshore under the Users include both climate scientists and pressure of climate change (e.g. relative sea researchers from a wide range of fields, but level rise) having impacts on drainage basins, climate data is of crucial interest for a large coastal plains (e.g. man-induced subsidence), number of public/private actors in the energy, and offshore (e.g. hydrocarbon exploitation, fish agriculture, water, health, tourism, urban trawling, energy plants, maritime traffic). Ocean management and transport sectors. Overall, the observing systems combined with HPC provide output of high-resolution climate simulations is a scenarios of storm surges and associated valuable asset at both the national and regional coastal flooding, sea state, coastal erosion levels. and long term ecological trends. Quantitative 4D geophysical pictures of the seafloor and The term ‘Earth data’ refers both to atmosphere, sub-seafloor after major events (blooms, river ocean and sea state simulations (uncoupled or floods, coastal erosion or anthropic interventions) coupled in earth system models), environmental, help determine habitat modifications, coastal geophysical and oceanographic observing dynamics and subsidence trends. systems, seafloor mapping and forecast services for institutional agencies (Civil Protection, Major European Projects: AtlantOS, BE-OI, Coast Guard and Ministry of the Environment) ChEESE, CLARA, CRESCENDO, ENVRI PLUS, and to large databases of energy uses in HIGHLANDER, IMMERSE, Iscape, LISTEN MED- industry, emission factors, basic data for life GOLD, MEDSCOPE. MISTRAL, MOSES, ODYSSEA, cycle assessment (LCA) and carbon footprint OPERANDUM, PRIMAVERA, ROCK, RURITAGE, calculation and to databases related to the uses SeaDataCloud, SECLI-FIRM, SOCLIMPACT, of the sea under a blue growth strategy and the SYSTEM-RISK. BLUEMED Initiative. Active members: ART-ER, CINECA, CMCC, CNR, ENEA, INGV, UNIBO, UNIMORE. Atmospheric models have increased their spatial and temporal resolution and produced a large 39
40
4.8. GLOBAL SOCIAL SCIENCES AND ECONOMICS Big data in the social sciences and in boundaries, including the fields of economics are increasingly important for economics, actuarial and financial sciences, research, business, finance, policy-making mathematics, statistics, programming and the society at large, creating social and computer science. The emergence innovation, unique opportunities for job of a multidisciplinary big data approach creation and worldwide leadership in R&I can provide a solid evidence-base for and the market. policy-making, as well as a quantitative knowledge base for solving problems in These data (e.g. videos, photos and micro- finance, insurance and risk-management. messages shared on social networks, Legal aspects can also play a big role in the GIS, networks of relations, as well as success of big data in finance and economy, behavioural and psychological data) especially as regards competition law, promote interdisciplinary activities with intellectual property rights, privacy law and regard to economic aspects, legal and consumer law. ethical issues, and privacy-by-design principles. Behavioural data refers to In a digital global marketplace, leveraging multidimensional data-sets on human and big data is an unparalleled opportunity social behaviour, actions, and interactions for companies including retail, finance, that allow understanding individual as banking, advertising and insurance to gain well as community preferences, attitudes a competitive edge. The real-time and fast and habits, while psychological data-sets processing of large volumes and varieties collect the results of psychological tests, of data has a strong potential impact on surveys, attitude exams, and similar sources. key areas such as: customer data analytics; Behavioural data analysis contributes innovative marketing strategies; market to designing focused and personalised trends and expectations; access to credit services in different fields (e.g. transport, and finance for households and enterprises; healthcare, the environment); social e-commerce and e-paying systems for an network and GIS data analysis is focused easier access of consumers and citizens on understanding inner aspects of human to services; risk management; business relations, and it can be applied to many insights, organisational intelligence and fields (e.g. commercial advertising, study of operational efficiency. the propensity to buy, sentiment analysis); psychological data analysis, on the other Major European Projects: CoSIE, MICADO, MIREL. hand, helps researchers in proving or . disproving theories, and/or supporting Active members: UNIBO. conclusions with statistical evidence. The collection and analysis of large amounts of behavioural data has also important consequences for the formulation and application of legal rules. Moreover, the increasing use of large-scale administrative data-sets, private sector data and social media data can contribute to impactful social science research that transcends traditional disciplinary 41
You can also read