FROM VOLUME TO VALUE - Associazione Big Data

Page created by Ricardo Lowe
 
CONTINUE READING
FROM VOLUME TO VALUE - Associazione Big Data
FROM
VOLUME
TO VALUE
FROM VOLUME TO VALUE - Associazione Big Data
June 2019
FROM VOLUME TO VALUE - Associazione Big Data
SUMMARY
1.   INTRODUCTION TO THE NEW EDITION				                          5

       1.1. SCOPE, VISION, OBJECTIVE: THE ASSOCIAZIONE BIG DATA   7
            AS A SOURCE OF COMPETITIVE ADVANTAGE

       1.2. ACTIVITIES FROM 2015					                             9

2.   THE BIG DATA ASSOCIATION					                                11

       2.1. PROFILE OF THE ASSOCIATION				                        12

       2.2. THE ASSOCIATION IN FIGURES (2015-2019)			             13

       2.3. HIGH PERFORMANCE COMPUTING FACILITIES 		              15
            IN EMILIA-ROMAGNA

       2.4. ASSOCIATION MEMBERS				                               16

3.   ENABLING INFRASTRUCTURE					                                 17

       3.1. HPC, HTC & NETWORKING				                             18

       3.2. OTHER RELEVANT FACILITIES				                         21

       3.3. FUTURE TRENDS AND DEVELOPMENT		                       23

4.   SCIENTIFIC AND APPLICATION DOMAINS				                       25

       4.1. DIGITAL ENABLING TECHNOLOGIES			                      27

       4.2. HEALTH AND LIFE SCIENCES				                          29

       4.3. AGRI-FOOD, BIOBASED INDUSTRY AND BLUE GROWTH          31

       4.4. SMART CITIES AND SECURITY				                         33

       4.5. MATERIALS SCIENCE					                                35

       4.6. INDUSTRY 4.0						                                    37

       4.7. EARTH SCIENCE AND CLIMATE CHANGE			                   39

       4.8. GLOBAL SOCIAL SCIENCES AND ECONOMICS		                41

       4.9. HUMANITIES, CULTURAL HERITAGE			                      43

       4.10. PHYSICS, ASTROPHYSICS AND SPACE SCIENCE		            45

5.   EUROPEAN PROJECTS LIBRARY				                                47

6.   CONTACTS							                                              53
FROM VOLUME TO VALUE - Associazione Big Data
Palma Costi
                                            With great pleasure we welcome this
Regional Minister for Economy
                                            edition of From Volume to Value.
and development, energy and
green economy, post-earthquake
                                            In 2016 we set up the research group on
reconstruction
                                            big data and supercomputing, and we
                                            started by checking the infrastructures
Patrizio Bianchi
                                            and competences on the territory of the
Regional Minister for Coordination of       Emilia-Romagna region.
European policies for growth, education,    Emilia-Romagna has always been a
vocational training, university, research   crossroads. Here almost a thousand
and employment                              years ago, after dark centuries, the first
                                            “students” eager for new knowledge
                                            found themselves in Bologna, founding
                                            that institution which still today remains
                                            the crossroads of many university
                                            experiences.

                                            Here, over the years, the supercomputing
                                            infrastructures of the national scientific
                                            system have been located and from
                                            here, we started to set up an association
                                            named Associazione Big Data that
                                            represents a place of convergence of
                                            skills and knowledge relating to big data,
                                            their applications, their impact on the
                                            daily lives of all citizens.

                                            We are honored to have contributed to
                                            its birth: now the association is open to
                                            the whole country, aware of having to
                                            contribute to the full integration of our
                                            scientific system in the great European
                                            and world community of knowledge.
FROM VOLUME TO VALUE - Associazione Big Data
1
INTRODUCTION
TO THE NEW
EDITION
FROM VOLUME TO VALUE - Associazione Big Data
The management of large amounts of               the leverage effects of research
data via high computing capacity in terms        investments already used at
of performance and memory available,             regional, national and EU levels;
provides remarkable opportunities in         •   Promoting joint actions including
various research areas and applicative
domains such as, among others, financial         the coordination, planning and
analysis, environmental monitoring               programming of relevant research
and geophysical simulations, cultural            and innovation activities in the big
heritage management, precision farming,          data domain;
multimedia, etc.                             •   Supporting researcher careers,
                                                 training and mobility, and
Many world-class institutions in
                                                 developing skills in relevant
supercomputing and big data are located
                                                 sectors to ensure the necessary
in Italy and in the Emilia Romagna region.
                                                 highly qualified workforce needed
They are able to store, manage and
                                                 to underpin an effective and
analyze large amounts of data. Therefore,
                                                 sustainable exploitation of big data
the impact of a joint exploitation of the
                                                 analytics in research and industry.
expertise, know-how and facilities of the
national and regional public and private
                                             The document collects figures,
actors is very relevant.
                                             expertise, technologies and facilities
With the aim to exploit such potential,
                                             available in the frame of the association
the “Associazione Big Data” was recently
                                             in each of the knowledge and innovation
launched. Based in Emilia Romagna, but
                                             domains of major relevance of the
mobilizing the best world class actors of
                                             Associazione Big Data.
the whole country, it aims at:
                                              		                 The president

•   Sharing and jointly exploiting
    existing results, knowledge,
    capacities, and research and
    innovation initiatives and
    frameworks;
                                                              The vice president
•   Fostering cooperation between
    national and regional public and
    private actors to also maximize

6
FROM VOLUME TO VALUE - Associazione Big Data
1.1.         SCOPE, VISION, OBJECTIVE:
             THE ASSOCIAZIONE BIG DATA
             AS A SOURCE OF COMPETITIVE
             ADVANTAGE

High performance computing, big data         European Centre for Medium-Range
analysis, deep and machine learning          Weather Forecasts (ECMWF), which
algorithms, high bandwidth networks are      is going to be set up in Bologna, will
technological pillars of digital society.    bring other prominent computing
Thanks to them, a wide range of future       and analytics capabilities, with the
scientific, technological and societal       potential of further exploiting available
challenges can be met to pursue growth       expertise, infrastructures and facilities,
in many sectors of the economy and           by generating new local economic and
society. The opportunities are many and      societal opportunities.
yet not all predictable. These challenges
will require the development of new skills   Since 2015, stakeholders involved in
and of new technologies.                     supercomputing and big data production
                                             and management have engaged in
Large investments in ICT infrastructures     mapping and analyzing their potential
and technologies, such as high               impact on scientific, social and business
performance computing, cloud                 domains: thus, the Associazione Big Data
computing, artificial intelligence and       was set up to interconnect and jointly
big data, as well as in research and         exploit the knowledge, capacities and
innovation, inclusiveness and skills are     research and innovation potentials of
the priorities for Europe in order for it    this community to leverage the effects
to benefit from the digital economy          of actions and investments made so far
potential.                                   and to maximize their impacts, locally
                                             but also at national, EU and international
In Italy, research bodies such as INFN,      levels.
CNR, ENEA, GARR and CINECA have
already implemented big data via             The “Associazione Big Data” mission is to
high performance computing and               facilitate:
e-infrastructures to support major           •    the sharing and exploiting of existing
research and academic communities.                results, knowledge, capacities,
Most of these Italian high performance            research and innovation initiatives
and high throughput computing                     and frameworks;
resources are concentrated in the North
of Italy, especially in Emilia-Romagna,      •    cooperation between public and
where they are well integrated into               private entities, maximizing the
the local knowledge and business                  leverage effects of R&I public and
systems, consisting of universities and           private investments;
research centers, large-cap and mid-         •    joint actions including coordination,
cap companies, as well as international           planning, facilitating and promoting
players that are new entries in the               relevant R&I actions, policies and
local scenario. The data centre of the            international cooperation activities;

                                                                                          7
FROM VOLUME TO VALUE - Associazione Big Data
SCOPE, VISION, OBJECTIVE:
                                               THE ASSOCIAZIONE BIG DATA
                                              AS A SOURCE OF COMPETITIVE
                                                              ADVANTAGE

•    researcher careers, training and
     mobility and, in general, the
     development of skills in the big data
     and AI domains.
More precisely, the Association shall
operate in the following economic and
societal sectors and disciplinary contexts:
•    the economic system, with special
     attention to mid-cap companies, in
     order to facilitate its access to big
     data processing and related analytic
     tools, high bandwidth networking
     and high performance and high
     throughput computing in order
     to increase competitiveness and
     stimulate job creation and economic
     growth.
•    the public administrations, in
     order to support awareness and
     implementation of well-developed
     data and computing based
     infrastructures to provide better
     services to citizen in the education,
     health, environment, mobility and
     security, in order to foster a more
     widespread, effective and citizen-
     friendly e-government system at
     national and local level.
•    the knowledge community, in
     order to enhance its ability to
     analyze large amount of data
     from experiments, observations,
     monitoring and sensoring systems
     and to process more detailed and
     complete data series to assess
     or gain insight of the investigated
     phenomena.

8
FROM VOLUME TO VALUE - Associazione Big Data
1.2.         ACTIVITIES
             FROM 2015

The Big Data Community started its              Data Community greatly supported the
operations informally in 2015 working           preparation of the proposal, sharing
on a common framework including                 data and information that improved the
experiences, skills, facilities and             attractiveness of the proposal as a whole.
opportunities under the auspices of             In May 2018 the Italian government
Emilia-Romagna Region, and in the               announced the selected proposals to
context of the regional policy for research     create the new Italian Competence Centre
and innovation infrastructures.                 foreseen in the Industry 4.0 national plan.
                                                Among them was the proposal BI-REX: Big
Since that time, the community enlarged         Data Innovation & Research EXcellence,
the field of action from regional to national   with a cofinancing of 9.2 M€ and a large
and international dimension, and a              research partnership mostly gathered
number of significant achievements have         from the Community.
been reached by the members of the Big
Data Community.                                 All the actors in the big data field that
On September 2016 an all-Italian                launched the “Big Data Community”
partnership between ENEA and                    formalised their partnership in an
CINECA was selected by EUROfusion,              association called “Associazione Big
the European Consortium for the                 Data”. The charter of the association was
development of fusion energy, to deliver        signed by all the participants in June,
high-performance computing and data             2018. The Association aims at promoting
storage to support European research            a community of research centers and
on fusion for two years. The 29 member          infrastructures of excellence in the field
countries of the Consortium are using           of supercomputing (High Performance
a partition of MARCONI, the Italian             Computing) and the treatment of Big Data.
supercomputer hosted by the CINECA.             Moreover, it provides the possibility of
This machine has replaced the Japan-            acting at the international level as a single
based supercomputer HELIOS, set up in           subject.
the International Fusion Energy Centre in       Furthermore, in November 2018, some of
Rokkasho.                                       the members participated in the H2020
                                                call “HPC and Big Data Enabled Large-
In December 2016, the regional                  Scale Test-beds and Applications” with the
government of the Emilia-Romagna                project IO-Twins - Distributed Digital Twins
Region submitted the Italian proposal for       for industrial SMEs: a big-data platform
the relocation of the ECMWF Data Centre         that has been successfully funded with
in Bologna. It was then announced as the        a budget of 17M euros. The IO-Twins
winner in June 2017. The relocation site        project will deliver large-scale industrial
is located in the area of the Tecnopolo         test-beds, leveraging and combining data
di Bologna Big Data hub owned by                related to the manufacturing and facility
the Regione Emilia-Romagna. The Big             management optimisation domains,

                                                                                            9
FROM VOLUME TO VALUE - Associazione Big Data
ACTIVITIES
                                               FROM 2015

coming from diverse sources, such as
data APIs, historical data, embedded
sensors, and Open Data sources.

In January 2019 the president of the
association signed a memorandum of
understanding with the Slovenian HPC
Sling Association. The aim of the MoU is to
collaborate through a number of possible
activities, such as the sharing of access to
digital infrastructure and establishment
of partnerships for developing digital
infrastructure; joint research and
innovation projects and actions; research
exchange programs; support for research
groups with common activities or projects;
enhancement of the capacity to attract
funds and resources from third parties for
the respective members at the national
and/or international level; training,
dissemination and outreach, joint summer
schools, and any other activities deemed
to be of mutual interest and agreed upon.
It is hoped that additional international
agreements will be signed in the next few
months.

In 2019 the SUPER project will start, where
SUPER stands for “Supercomputing
Unified Platform”. It is funded by the
Emilia-Romagna ERDF’s funds and will
integrate CINECA and INFN platforms for
first, followed by Tier1-CRESCO6-ENEA
ENEA and the CMCC Supercomputing
Center, with cloud access delivering and
enabling value in multiple application
domains. Once the infrastructure is
available, a number of test beds will be
developed in a subsequent “phase II” of
the project.
2
THE BIG DATA
ASSOCIATION
2.1.         PROFILE OF
             THE ASSOCIATION

The Big Data Association operates in a         initiatives and projects as described
wide ecosystem of initiatives carried out      below. It is worth mentioning that
at the regional, national and European         CINECA and INFN have a key role in
level, regarding Big Data and the relevant     the Italian participation in the European
enabling technologies. At the regional         High-Performance Computing Joint
level, the Association has an important        Undertaking (EuroHPC), that will pool
role in the harmonisation of initiatives led   European resources to develop top-of-
by the stakeholders, e.g. the setup of the     the-range exascale supercomputers
Tecnopolo di Bologna Big Data hub or           for processing big data, based on
the participation in BI-REX competence         competitive European technology.
centre.                                        Another significant contribution is made to
                                               the European Open Science Cloud (EOSC),
At the national level, all the members         with participation in both hub and pilot
are involved in projects and initiatives       projects. EOSC is the cloud infrastructure
related to Big Data, Artificial Intelligence   for research data in Europe, responsible
and digital innovation, such as the Human      for infrastructures and policy-making.
Technopole Foundation, leading partner         All the initiatives and projects above
of the Human Technopole Project, aimed         are part of an important commitment
at developing personalised medicine            by Europe to the fields of big data and
and nutrition to tackle cancer and             artificial intelligence, as stated in the EU
neurodegenerative diseases by means of         Declaration on Cooperation on Artificial
genomics and big data analysis. Moreover,      Intelligence, signed in April 2018, aimed
several members have provided support          at coordinating national policies and at
to industry in the framework of the            establishing cooperation between the
national plan Industria 4.0 for innovation     adhering countries on the exploitation
in Italian industry. Some member               of AI skills, features and potential
representatives are also personally            applications.
involved in the national policy-making
process, taking part in two High Level
Expert Groups on Artificial Intelligence
set up by the Ministry of Economic
Development and by the Ministry of
Education, University and Research,
aiming at defining a national strategy for
the applications of AI and at producing
guidelines to inject AI into the educational
process.

The Association’s members are also
involved in European and international

12
2.2.             THE ASSOCIATION IN FIGURES
                 (2015-2019)

                            Supercomputing resources
                                    for public research
                                   in Italy managed by               >90%
                                 association members

                                               H2020, FP7
                                          or CEF projects 1
                                                                     140

                                               Total Cost of
                                               the projects
                                                                     > 2,1 BLN€

                                         Funding received
                                                  from EC
                                                                     160 MLN€

1 Project funded by H2020, FP7 or CEF programs that were on 2015 or starded after and involves at least one member

 of the association.

                                                                                                               13
THE ASSOCIATION IN FIGURES
                                                                   (2015-2019)

     GARR-X                   LEPIDA
     RESEARCH NETWORK         THE EMILIA-ROMAGNA
                              REGIONAL NETWORK
     •   Up to 200 gbps
                              OF THE PUBLIC
     •   VPN for WLCG         ADMINISTRATIONS
         high energy
         physics community    •   Up to 100 gbps
                                  capacity
     •   VPN for PRACE
         HPC Infrastructure   •   More than 140,000
                                  km optical fibers and
     •   VPN for Human            approx. 3,000 access
         Brain project            nodes
         FENIX Federating
         HPC Infrastructure   •   43 Pop nodes + 4
                                  integrated regional
                                  data centers.
                              •   Active in 10
                                  International Public
                                  Peering Exchange
                                  Points and in 4 Private
                                  Peering Facilities.
                              •   Current coverage
                                  of fast internet
                                  availability in Emilia-
                                  Romagna: 76% of
                                  households
                              •   Provision of dark fiber
                                  special link between
                                  CINECA and INFN-
                                  CNAF at 2 terabits /s.

14
2.3.          HIGH PERFORMANCE COMPUTING
              FACILITIES IN EMILIA-ROMAGNA

CINECA                             A high-throughput             GARR-X/LEPIDA
                                   computing (HTC) facility
Tier0: MARCONI system.             which hosts the WLCG          Fast and effective nation-
3500 low latency server            Tier1 at the CNAF-INFN in     wide and international
nodes + 3500 scale out             Bologna, with the following   network connection,
server nodes; ~10 Petabyte         capabilities:                 mainly provided by GARR
of memory RAM; ~20                                               and Lepida.
Petaflops peak perfomance          CPU: ~30000 computing
                                   cores;
Tier0 in PRACE Partnership
for Advanced Computing in          STORAGE: ~40 PB of online
Europe                             disk space;
Eurofusion HPC service
facility                           LIBRARY: ~90 PB of
Tier1: GALILEO system.             nearline tape space.
1000 low latency server
nodes; ~2 petaflops peak
perfomance
Cloud service: MEUCCI
system. 250 low latency
server nodes
Openstack virtualization
middleware; containers,
urgent computing
AI & ML: DAVIDE system.
45 hybrid nodes x 2 Power8
+ 4Tesla P100; ~1 petaflops
peak performance

STORAGE REPOSITORY:
PICO system. 50 server
nodes for visualization
and data management
services; multi tier storage
repository;

~20 petabytes of net disk space;
~20 petabytes of tape library

                                                                                          15
2.4.            ASSOCIATION
                MEMBERS

                                                            CNR
                                                            Consiglio Nazionale
                                        INAF                delle Ricerche
                           Istituto Nazionale               www.cnr.it
                                 di Astrofisica
                                  www.inaf.it
                                                                CINECA
                                   INGV                         Inter-University Consortium
                      Istituto Nazionale                        for supercomputing
            di Geofisica e Vulcanologia                         www.cineca.it
                            www.ingv.it
                                                                     CMCC
                    LEPIDA                                           Centro Euro-Mediterraneo
Regional in-house providing                                          per i Cambiamenti Climatici
company for ICT service and                                          www.cmcc.it
              infrastructure
             www.lepida.it                                                ENEA
                                                                          Agenzia nazionale per le nuove
                    UNIFE                                                 tecnologie. L’energia e lo sviluppo
      Università degli studi                                              economico sostenibile
                  di Ferrara                                              www.enea.it
             www.unife.it
                                                                      INFN
                     ART-ER                                           Istituto Nazionale di Fisica
              Emilia Romagna                                          Nucleare
            innovation agency                                         www.inf.it
                www.art-er.it

                                 UNIPR
                   Università degli Studi                           IOR
                               di Parma                             Istituto Ortopedico
                           www.unipr.it                             Rizzoli
                                                                    www.ior.it

                                            UNIMORE
                                Università degli studi
                           di Modena e Reggio Emilia     UNIBO
                                    www.unimore.it       Alma Mater Studiorum
                                                         Università di Bologna
                                                         www.unibo.it

16
3
ENABLING
INFRASTRUCTURE
3.1.         HPC, HTC
             & NETWORKING

CINECA is the national supercomputing           tier 0 high end computing system with
facility. Thanks to the long-term vision        a peak performance on the order of 20
of the Ministry of Education, University        petaflops, ranked number 12 in TOP500 at
and Research, which has persistently            the time of its installation, a tier 1 system
supported the progressive development           for quality of service, an HPC cloud
of the CINECA Interuniversity Consortium,       system, and a prototype production
over time CINECA introduced in Italy            system for artificial intelligence and
the most innovative world class HPC             machine learning applications. In all,
system architectures. The first vector          the current computing architecture
supercomputer was introduced in                 integrates more than 9000 server nodes.
Italy in the 1980s, the first parallel HPC      A large-scale data repository, with a
architectures in the 1990s, the first cluster   full capacity in excess of 50 petabytes,
at the beginning of the 2000s, the large        completes and integrates the computing
massively parallel architectures, the           architecture as well.
first petascale system in Italy, ten years
ago; and lastly, the modular cluster,           The CINECA HPC facility enables a wide
the current multipetascale HPC system           range of scientific research through
actually in production. At the time of          open access granted by independent
their installation, most of the CINECA          international peer reviewed processes
supercomputer systems were ranked               by mean of the PRACE association at
in the top part, from the 7th to the 12th       the European level, and the ISCRA at the
position, of the TOP500, enabling and           national level.
supporting national and European
excellence in science and technology            CINECA provides services to the
innovation. The high end tier 0 systems         Worldwide Eurofusion community, being
have been continuously integrated in a          the contractor of the European tender for
complex environment that complements            that specific service that will run till the
the tier 0 supercomputers with a high           end of 2023, and also provides operational
quality of service tier 1 system, a large       computing service for weather forecasting
scale data repository, and, from time           for National Civic Protection under the
to time, innovative prototypes to keep          supervision of Emilia Romagna ARPA-
pace with the cycles of innovation and          SMR. CINECA is part of the European
to develop and test new architectures           digital infrastructure for many ESFRI RI
with the objectives of increasing and           facilities and initiatives, among others,
improving the effectiveness and the             EPOS: European Plate Observing System,
efficiency of the next cycle highest            led by INGV, ELIXIR: Infrastructure for Life
performing computing system.                    Science, and the HPB European Human
                                                Brain Flagship Project. Also, with reference
The current computing architecture              to those RI initiatives (but not only those),
facility hosted by CINECA integrates a          CINECA is a core partner of the European

18
HPC, HTC
                                                                    & NETWORKING

Open Science Cloud HUB infrastructure           the framework of a national INFN HTC
and the European Data Infrastructure.           infrastructure consisting of the CNAF Tier
At the national level, many joint               1 and 10 smaller facilities, called Tier 2
development partnership agreements              centres, placed all over Italy.
are in force with INFN, ENEA, INAF,
SISSA, ICTP and many R&D collaboration          The CNAF data centre operates about
actions and agreements are in force with        1,000 computing servers providing
qualified national research institutes,         ~30,000 computing cores, allowing the
universities, and public administrations.       concurrent run of an equivalent number
CINECA received recognition as a ‘Golden        of general-purpose processing tasks. All
Digital Hub for Innovation’ from the Big        the computing resources are centrally
Data Value Association, is a Competence         managed by a single batch system
Centre for supporting innovation towards        (LSF by IBM) and dynamically allocated
industries, and has led many proofs of          through a fair-share mechanism, which
concepts in collaboration with industries       allows full exploitation of the available
and private organisations. CINECA               CPU-production with an efficiency of
entered formal partnerships for added           about 95%. Part of the CNAF computing
value services and R&D activities with          power is currently hosted at CINECA, and
ENI and manages and operates the                connected to the main site via dedicated
ENI corporate supercomputing facility,          high-speed network links traversing the
one of the world’s largest industrial           city of Bologna, allowing the remote
supercomputing infrastructures.                 computing nodes to access the mass
                                                storage located at CNAF with maximum
The HTC facility is hosted at CNAF              throughput and minimal latency, as if
Bologna, which is one of the INFN               they were local.
National Centres defined in the INFN
charter. CNAF has been charged                  CNAF operates a very large storage
with the primary task of setting up             infrastructure based on industry
and running the so-called Tier-1 data           standards, for connections (about 100
centre for the Large Hadron Collider            disk servers and disk enclosures are
(LHC) experiments at CERN in Geneva.            interconnected through a dedicated
Nowadays it hosts computing not only            Storage Area Network) and for data
for LHC but also for many experiments,          access (data are hosted on parallel file
ranging from high-energy physics to             systems, GPFS by IBM, typically one per
astroparticles, dark matter searches in         major experiment). This solution allows
underground laboratories, etc. CNAF             the implementation of a completely
also participated as a primary contributor      redundant data-access system. In total,
in the development of grid middleware           CNAF hosts about 40 PB of online disk
and in the operation of the Italian grid        space, with a total I/O bandwidth of
infrastructure. This facility operates within   about 1.5 Tb/s and 90 PB of nearline tape

                                                                                        19
HPC, HTC
                                             & NETWORKING

space arranged in a robotic library read
out by 20 tape enterprise-level drives.
The 10 additional Tier 2 centres
distributed across Italy have a similar
aggregate of resources, both in terms of
computing and disk storage, but not tape.
They are connected to the CNAF Tier1
through dedicated high-speed links, and
the resources can work in a coordinated
way in the global INFN computing
ecosystem.
The user community of the CNAF Tier1
facility is primarily composed by research
groups from INFN and most of the Italian
universities (including UNIBO, UNIFE,
UNIPR), working on nuclear, subnuclear,
astroparticle and theoretical physics. In
the coming years, the HTC computing
resources of INFN CNAF will be increased
at least by an order of magnitude in
order to meet the computing and
storage requirements of forthcoming
experiments and upgrades.

In addition, LEPIDA, within the framework
of the Digital Agenda of the Emilia-
Romagna, has been entrusted with the
design, implementation and provision
of four data centres, geographically
distributed and natively connected
by the LEPIDA network, for the use of
public administrations and healthcare
systems. These data centres offer
advanced computing services, storage,
data protection, business continuity with
native disaster recovery functions and a
specific focus on energy management.

20
3.2.        OTHER RELEVANT
            FACILITIES

As previously mentioned, the               Reusable) principles and guidelines and
concentration in Bologna of national       to move towards the implementation
infrastructures for HPC and big data       of the digital single marketplace at the
constitutes the interchange and            European level provided by the EOSC –
reference hub for the national and         European Open Science Cloud initiative.
European public and private research
community. The national panorama
also presents a vitality of tier 1-class
infrastructures and highly complex
cloud computing services.
As shown schematically below, with
positioning data, which, due to its
continuous updating process could
suffer from some inaccuracy, but which
will represent the national panorama,
the distributed presence of digital
infrastructures is certainly valuable.
Attention is drawn to the distributed
infrastructure of the INFN sites, to the
infrastructure implemented through the
action of ReCaS (calculation network
for SuperB and other applications ),
and to the distributed system for cloud
computing services implemented by
GARR. In addition to the distributed
infrastructures, there are tier1
systems at the peta scale class at the
Portici computer centre of ENEA, the
CRESCO systems, at the University
of Salento computer centre, which
houses the CMCC supercomputing
system, and the computing centre of
SISSA / ICTP, which hosts the SISSA
computing system. By mean of “SUPER
– Supercomputing Unified Platform
Emilia-Romagna” the HPC and Big Data
national ecosystem will be federated
to enable good practices of data
access and processing based on FAIR
(Findable, Accessible, Interoperable,

                                                                                 21
OTHER RELEVANT
                                                                            FACILITIES

                          National HPC and Big Data infrastructure

 Institution   Location      Site Name           # of cores   # of server nods   Storage (PB) Tape (PB)
               Bari          NFN-BARI T2         174          12                 1,51
               Bari          RECAS               4.000        250                3,55
               Bari          GARR                1.152        58                 3,00
               Bologna       CNAF-T1             30.000       1875
                                                                                 40,00        90,00
               Bologna       CNAF-LHCB-T2        384          12
               Catania       INFN-CATANIA        390          20                 1,20
               Catania       RECAS               2.562        128                0,32
               Cosenza       RECAS               3.500        175                0,90
 INFN          Frascati      INFN-FRASCATI       150          8                  ,13
               Milano        INFN-MILANO-ATLAS   180          12                 1,31
               Napoli        INFN-NAPOLI-ATLAS   334          21                 1,82
               Napoli        RECAS               4.956        310                4,96
               Padova        LNL                 452          28                 1,76
               Pisa          INFN-PISA           1.250        78                 2,49
               Roma          INFN-ROMA1          160          10
                                                                                 1,34
               Roma          INFN-ROMA1-CMS      192          6
               Torino        INFN-TORINO         78           6                  0,06
               Bologna       MARCONI             406.000      7000
 Cineca        Bologna       GALILEO             72.576       1512               20,00        20,00
               Bologna       PICO                2.400        50
 CMCC          Lecce         CMCC                13.824       288                4,00
 CNR           Roma          Data Center         1.034        64                 1,20
               Casaccia       ENEA-Casaccia      192          12                 0,02
               Frascati      ENEA-FRASCATI       480          24                 0,08
 ENEA          Portici       CRESCO4             4.864        04
               Portici       CRESCO5             512          32                 4,00         5,00
               Portici       CRESCO6             20.832       434
               Catania       GARR                1.152        58                 3,00
               Cosenza       RECAS               384          32                 1,00
 GARR
               Napoli        GARR                384          24                 1,00
               Palermo       GARR                1.152        72                 3,00
 INAF          Trieste       OATS                256          16                 1,00
 SISSA         Trieste       SISSA               4.000        250                1,50

22
3.3.         FUTURE TRENDS
             AND DEVELOPMENT

The digital infrastructures, high             science.
performance computing architecture,           From this point of view, the development
hyper scale cluster architecture and          roadmap of CINECA foresees, as a
high bandwidth network infrastructure         baseline target, systems with dozens of
will have a tremendous and drastically        petaflops in a short time and hundred-
innovation in the next few years.             petaflop systems in the medium
                                              term. The medium term target will be
The global race towards exascale              definitively increased in case of success
computing is imparting a strong               in acting as an EuroHPC Hosting Entity.
acceleration to the deployment
of supercomputing systems with                The needs and the objectives of capping
a performance in excess of many               the consumption of electrical energy
hundreds of petaflops and the target          of supercomputer systems has led to
for the availability of the first exascale    the development of microprocessor
computing system, a supercomputer with        technologies characterised by an
a performance of at least one thousands       extremely narrow performance profile,
petaflops, is more and more close. The        such as floating point accelerators, FPGA
USA, China, and Japan are investing many      components, tensor microprocessors,
billions of dollars for supporting R&D with   and neuromorphing chips, but
the target of having an exascale system       an extremely high ratio of energy
in production by 2022/23.                     efficiency to performance if properly
                                              exploited at the level of the algorithms
The European Union and 28 member              and programming models. Such
states created a joint undertaking            components have to be integrated in
with the purpose of mobilising a              server nodes that combine general-
budget of billions of Euros for R&D for       purpose microprocessor sockets and
the development of European HPC               floating point accelerators in hybrid
technologies and for achieving towards-       configurations that needs to be co-
pre-exascale and exascale systems             designed in light of specific application
to open production in Europe in the           domains. The physics of the materials
same time frame. The Italian federal          and matter, the environment and climate
government, the local ecosystem of            change, energy savings and energy
universities and research institutes,         production from renewable sources,
CINECA and other agencies, and regional       the health and life sciences, artificial
public administrations are collaborating      intelligence, as well as cyber security,
and working for hosting in Italy one of       are and will be the main drivers for the
those supercomputers, on the basis of         CINECA co-design approach, and co-
the excellent track records of CINECA         design will be the main driver for the
and of the vital persistent development       development roadmap of the CINECA
in Italy of the methods of computational      supercomputing infrastructure towards

                                                                                     23
FUTURE TRENDS
                                                          AND DEVELOPMENT

performances on the order of hundreds      scale facilities producing and processing
of petaflops and at the exascale.          the increasing availability of big data.
Regarding the large hyper scale cluster,   In the future, the LEPIDA physical fiber
the overlap between the low-power          optic network integrated in the GARR-X
multi-core sockets and the software        service will guarantee state of the art
technology for virtualisation and          high-speed network links, moving from
containers pushes the limits of the very   the current tens of Gigabits per second
large resilient configurations towards     to the scale of hundreds of Gigabits per
multimillion-core clusters. This has led   second, till the next scale of up to 400
the scientific community and industry      Gigabits a second backbone foreseen
to develop new processing methods          for wide availability in the timeframe
based on high performance throughput       2021/2022.
computing for filtering and gaining
insight into the huge amount of data       In order to deploy on time such a
produced by extreme experiments or         breakthrough in all the dimensions of
observations, such as the High Energy      the digital infrastructure technologies,
Lumen LHC experiments at CERN,             it will be necessary, starting now, to
or the Square Kilometre Array multi-       develop progressively the consolidation
radio telescope project, or the Internet   of the main hubs around which will
of Things, in the context of industrial    be federated the large resources
innovation processes and public            distributed in the different core centres
administration decision support work       of the national and European public
flows.                                     and private research systems and
                                           the main core centres of the public
From this point of view, as the INFN       administrations.
is one of the world’s largest tier 1
LHC computing grids, the target            The aim of the SUPER regional
is to maintain and improve this            infrastructural project will be to proceed
leading position and enabling the          in the direction of coordinating the
provisioning of services to consolidate    resources distributed at the national
the scientific competitiveness of the      level around the Bologna high
national and European community. The       performance computing and big data
implementation of this strategy foresees   world class hub as a model for open
a multimillion-core cluster at CNAF,       science in Italy and in Europe and as
complemented with a data repository        an enabling facility to support the
capacity at the near-exabyte level.        development of the proofs of concepts
The sharing of data and the anywhere-      for innovation in the industrial sector
anytime access model requires              and the optimisation of the decision
the availability of large bandwidth        processes of the public administrations.
geographical networks to federate large

24
4
SCIENTIFIC
AND APPLICATION
DOMAINS
26
4.1.          DIGITAL ENABLING
              TECHNOLOGIES

Trends in digital innovation are                  chains, both for large national, European
progressively providing the ability to            and international players and, at the level
globally extend connectivity to any person        of regional ecosystems, for SMEs and
and any device, to seamlessly access any          innovative industries and creative startups,
information and data from anywhere and at         with big data modelling and processing
any time, with on demand processing and           market in hardware, software and ICT
analysis of the data for modelling scenarios      services a fast growing multibillion-euro
and supporting the decisions (ATAWAD:             business sector of industry.
anytime, anyway, any device).
                                                  In order to unleash the potential arising
The leading edge components of such a             from the exploitation of big data value, it
pervasive ecosystem of digitally enabling         is essential to research new technologies,
technologies are the high speed networks          including HPC, data processing, data
to interconnect large scale facilities; large     science and artificial intelligence.
volume repositories to preserve and curate
the big data, and high end (HPC and HPTC)         Multimedia, data storage, data movement
computing systems. These technologies             and networking, communication and
deal with simulation and modeling of              visualisation tools are complementary
scientific and engineering problems as well       enabling technologies for the development
as data analysis. The scientific capabilities,    of the big-data market. Fairness in the
industrial competitiveness and sovereignty        access to public data and at the same time
depend critically on access to world-leading      strong safeguards for privacy and cyber
HPC computing and data infrastructures to         security are also essential.
keep pace with the growing demand and
complexity of the problems to be solved.          Furthermore, a consistent long-term
                                                  strategy is essential for educating and
As the problems modeled in computer               training professional data scientists to favor
simulations and decision support systems          an holistic and integrated multidisciplinary
grow in size and complexity (to enable            and interdisciplinary problem solving
more detailed predictions, to cope with           approach and to manage policies and
ever larger amounts of data, or both), so do      complexity to support the research
the demands on digital resources. In many         infrastructures and industries in properly
areas, spanning from health, biology and          handling and gaining insight from big data.
climate change to automotive, aerospace,
energy and economics, digital technology
provides a practical way to address               Major European Projects: AI4EU, AIDA, AIDA-2020,
                                                  ANTAREX, BD2DECIDE, DeepHealth, EnABLES,
complexity, and access to it becomes
                                                  ENABLE-S3, EOSC-hub, EPEEC, EUDAT2020,
essential.                                        EUROfusion, ExaNeSt, EXDCI, XDC, HERCULES,
                                                  HPC-EUROPA3, ICEI, OPRECOMP, PRACE-4IP,
To totally harvest the benefits of digital        SeaDataCloud, SECREDAS, Re-Search Alps,
technology, it is necessary to support a          SEADATANET-II.
full ecosystem comprising hardware and
software infrastructures, applications, skills,   Active members: CMCC, CNR, ENEA, INAF, INFN,
services and their interconnections. Digital      INGV, LEPIDA, UNIBO, UNIMORE, UNIPR, CINECA.
innovation research has a direct and major
impact on the whole ICT and HPC value

                                                                                                27
28
4.2.          HEALTH AND
              LIFE SCIENCES

Big data are attracting a great deal of           time, reducing costs for the national health
attention for the potential in healthcare and,    services and/or regional health authorities.
more widely, in all biomedical scenarios. In
the last few years, hospitals, universities and   Big data in the life sciences and medical
research centres have started fruitful and        practice are still a big challenge, but
data-productive analyses at the European,         they will leverage the existing scientific
national and regional levels. For example,        knowledge, improving translational
omics studies and multidimensional                research, develop personalised strategies,
imaging scans producing large volumes of          and implement innovative technological
data and knowledge are helpful in areas           products. All these efforts will have an
from neuroscience to orthopaedics, from           impressive social, economic and industrial
cardiovascular diseases to aging. By this         impact. In this perspective, a regional
means, an increasing number of case               infrastructure, able to manage and process
studies in healthcare are well suited for a       large-scale data, will allow running
big data solution. in addition, big data allow    prospective population studies that will
solving clues in common conditions but also       give an insight in treating epidemiologically
discovering new developments for treating         significant diseases such as cancer,
rare diseases, steering healthcare towards        neurological and cardiovascular conditions,
personalised and “precision” medicine.            osteoporosis, muscle-skeletal diseases,
Nowadays, there is a pressing need for            as well as geriatric and ageing-related
fast and preferential connections for data        pathologies.
transfer to transform potential into reality
and instruments for information aggregation
and extrapolation are urgent, such as rapid       Major European Projects: ADOPT-BBMRI-
                                                  ERIC, AirPROM, ChiLTERN, COMPARE, CORBEL,
and high-performance techniques for
                                                  DeepHealth, ECDP, EDEN ISS, EJPRD, EU-
machine learning and data mining. Despite         STAND4PM, HPB SGA1, INforFUTURE, LANGELIN,
this strong impulse, the exploitation of big      MCDS-Therapy, MYNEWGUT, Neuromics,
data information to generate new health           ORTHOUNION, PAPA-ARTIS, PLATYPUS, PROPAG-
knowledge is much delayed/hindered by             AGEING, THALAMOSS, VPH-Share.
the inherent heterogeneity of big data and
the lack of broadly accepted standards, as        Active members: CNR, CINECA, INFN, INGV, IOR,
well as by legal issues surrounding the use       UNIBO, UNIFE, UNIMORE, UNIPR.
of personal data.

At the regional level, a proper and
coordinated technology framework,
taking care of context and metadata, will
activate a data-driven improvement for a
meaningful use of big data in healthcare
development, boosted by top-level HPC
infrastructures. A more accurate and defined
management of large-scale information
and local data exchange and integration
will lead physicians, medical doctors and
researchers to better treatments capable of
answering patients’ issues and, at the same

                                                                                              29
30
4.3.          AGRI-FOOD, BIOBASED INDUSTRY
              AND BLUE GROWTH

The relevance of big data to the agri-food       analysis of key data of interest for maritime
sector, bio-based industry, and blue growth      security (including migration phenomena),
is pivotal, also for the current blooming of     maritime navigation and transportation
innovative technologies and omic tools           safety and security, sustainable fisheries
applied to the most diverse industrial           and aquaculture, biotic and abiotic sea
sectors (food production and safety,             resources exploitation, etc.
primary production and animal/plant
breeding, industrial biotechnology, enzyme
and microbial discovery, etc.). Managing         Major European Projects: CIRCLES, CYBELE,
                                                 INMARE, LEGVALUE, MedAID, PANINI, PerformFISH,
these (big) data is a formidable challenge.
                                                 Prolific, RES URBIS, SMARTCHAIN, TREASURE,
Improving consumer health by monitoring          VALUMICS.
food-related data is one of the areas
that may benefit most from a radically           Active members: CINECA, CNR, INGV, UNIBO,
innovative use of big data to provide            UNIPR.
more personal recommendations, via
various technological platforms, which
can improve the quality of life. Beyond
the typical use of data analysis for food
safety, big data is also related to predictive
analytics, with an impact on economy
and logistics, as well as to metagenomics
for the characterisation of food spoilage.
EFSA is interested in future strategic
approaches to risk assessment in areas
of food and feed safety that benefit from
the acquisition, processing, and sharing
of large quantities of data and evidence.
Therefore, EFSA supports this initiative and
is open to supporting solutions that reduce
the costs and improve the effectiveness of
data acquisition by sharing resources and
capabilities.

Big data has also an increasing importance
in agricultural practices, since the
integration of data from sensors, Internet of
Things (IoT) applications and genomics can
lead farmers to increase their productivity
and sustainability. They can also improve
efficiency and sustainability as well as
flexibility and safety in the food and
biobased industry.

Finally, IoT, cloud computing and big
data and data analytics are required
for sharing, advanced processing and

                                                                                             31
32
4.4.          SMART CITIES
              AND SECURITY

Big data is a growing area of interest           offering transport services.
for public policy makers, for cities and
urban management: it is related with             A smart use of big data supports
the enormous stream of data coming               governments in optimising multimodal
from administrative and social data,             transport and managing traffic flows, making
from city energy, mobility and transport         cities smarter. Real-time transportation
infrastructures, from large and increasing       planning and safety, environmentally
sensor networks, comprehending large             sustainable and resource-efficient transport,
nodes connected under the Internet-of-           socio-economic and behavioural research,
Things paradigm, video surveillance and          and forward looking activities for policy
environmental cameras, and so on.                making are fundamental topics which
                                                 greatly benefit from big data. As well,
The analysis of data for purposes of social      the huge amount of data coming from
innovation and human-centric services            GPS systems, real-time traffic monitoring,
(e.g. personal safety and physical security,     parking availability, electric charge station
crowd-sensing/participatory sensing in           availability, etc., can provide competitive
an urban environment), for smart mobility        advantages in order to optimise vehicle
and smart logistics services, for critical       design, maintenance and energy
infrastructure protection, for emergency         management, improve on-board automation
management, etc. is becoming crucial and         and safety, and reduce CO2 emissions.
challenging due to the magnitude of the
data and the requirement of timeliness.          Moreover, the widespread use of
Connected with smart city and social big         modern monitoring systems in electricity
data, cyber security (including privacy,         transmission, distribution systems and smart
access control management, biometrics,           meters monitored by the consumer enables
etc.) is one of the most critical areas in big   the acquisition of real-time, high-resolution
data analytics and management. There’s a         data. This coupled with the addition of other
growing demand for security information          data sources, such as weather data, usage
and event management technologies and            patterns and market data, dramatically
services, which gather and analyse security      increases the need for appropriate big data
event big data that is used to manage            handling techniques and machine learning
threats.                                         approaches. Existing grid capacities could
                                                 be better used, and renewable energy
In transport, the volume of data has             resources could be better integrated.
increased because of the growth in the
amount of traffic (all modes) and detectors.
Also, travellers, goods and vehicles are         Major European Projects: Domino, FLEXMETER,
                                                 FOCAS, ICARUS, MARISA, OSMOSE, PRYSTINE,
generating more data from mobile devices
                                                 SocialCar, SOGNO, TRAFAIR.
and tracking transponders (including
trains, ships and aircraft). Infrastructure,     Active members: CINECA, CMCC, ENEA, LEPIDA,
environmental and meteorological                 UNIBO, UNIMORE.
monitoring also produce data that is
related to transport operations and users.
New ways to collect, manage and analyse
vast quantities of data are important both
for governments and private companies

                                                                                               33
34
4.5.          MATERIALS
              SCIENCE

Data and metadata for the development             is increasingly important, and requires
of new materials, their processing, and           sharing and analysing the resulting highly
application life cycles are considered            distributed, heterogeneous data-sets.
key assets to accelerate discovery                An example comes from the electron
and innovation across all design and              microscopy community: they produce
manufacturing sectors. With its academic          multidimensional data that need to be
and industrial knowledge base and the             stored, transferred, shared, managed
related production of data, Emilia Romagna        and processed, both online (i.e. during
competes with the most advanced regions           acquisition) and offline. Efforts in this
in Europe and worldwide. Here, the                direction are going to involve broader
main challenges are often related to the          communities working on automated
construction and maintenance of well-             materials and systems analysis at large
annotated and structured repositories, their      scale facilities (e.g. synchrotron and neutron
accessibility, their specialisation/integration   facilities), research laboratory equipment,
and interactions with similar European            or distributed sensing systems in academic
and international operations, as well as          and industrial environments.
the development and deployment of data
analytics strategies.
                                                  Major European Projects: EMCC-CSA, EoCoE,
                                                  EXTMOS, GEMMA, GRAPHENE FLAGSHIP,
A special feature of the materials domain
                                                  GRIDABLE, INTERSECT, IQubits, MaX, NEXTOWER,
is the wealth of predictive information that      SimDOME, NANODOME.
can be obtained from simulations, now
extending to quantum and multiscale               Active members: CINECA, CNR, ENEA, UNIBO,
approaches thanks to HPC and HTC. In the          UNIMORE.
silico design of materials and (bio)molecular
systems this is no longer limited to their
structures and stability: it includes a huge
range of functionalities, e.g. from friction
and wear to biocompatibility, from colour
and optical appearance to hydrophobicity,
from electron transport to thermal
properties within devices. Italy hosts an
important community of advanced users of
such applications, but also leads European
efforts of code developers who work at
the frontiers of the current and future HPC
technologies, invest in software/hardware
co-design, and are creating an ecosystem
of capabilities, applications, data workflows
and analysis, and user-oriented services.
Importantly, Italy coordinates the European
Centre of Excellence for HPC on materials
design at the exascale, MaX.

Data acquisition from the advanced
spectroscopy and imaging of materials

                                                                                              35
36
4.6.          INDUSTRY 4.0

Manufacturing industrial sectors are the          synthetic data from digital-twin simulations.
most relevant and impactful area in Italian       Manufacturing facilities must be capable
national research, innovation, and company        of meeting the requirements of the
activities, in terms of employees, production,    exponential increase in data production, as
and business. The related applied research        well as possess the analytical techniques
activity is carried out in a set of coordinated   needed to extract valuable knowledge from
research & innovation organisations               big data. Techniques, services, applications,
(university labs as well as interdepartmental,    platforms, and infrastructure needs are
technopoles, the 8 Competence Centres             often shared with other research and
established in the framework of the Industry      production fields, in terms of cloud/edge
4.0 programme, etc.) and industries working       computing and HPC, big data analytics and
in manufacturing, mechanics, industrial           visualisation, machine learning, and online
processing, and in many related areas such        stream processing.
as mechatronics, automotive, robotics and
automation, packaging, ceramics, electronic
equipment, textile and garments, machinery        Major European Projects: AEOLIX, CARIM, CLASS,
                                                  CloudiFacturing, ColRobot, DREAM, ETEKINA,
and metal production, etc.
                                                  EXCELLERAT, FIRST, Fortissimo, FORTISSIMO2,
                                                  I-MECH, IMPROVE, INCLUSIVE, IO-Twins, LINCOLN,
The changes in production processes and           Plan4Res, ROSSINI, SARAS, SYMBIOPTIMA,
their management increasingly require             SYMPLEXITY.
the exploitation of knowledge extracted
from data, acquired (offline and online) in       Active members: ART-ER, CINECA, CNR, UNIBO,
virtual prototyping, testing, production, use,    UNIFE, UNIMORE.
and maintenance. Industry 4.0 leverages
the new paradigms of the IoT and cyber
physical systems, exploiting the enormous
amount of data coming from simulations,
digital twins, and real machines to realise
superior performance and energy-efficient
manufacturing.

The manufacturing industry is currently in
the middle of a data-driven revolution, from
traditional manufacturing facilities to highly
optimised smart manufacturing facilities
to create manufacturing intelligence
from machine-to-machine dataflow and
to support accurate and timely decision-
making. For example, new techniques,
instruments, services for data analysis,
learning, prediction and statistics enable
novel diagnostics and prognostics for
predictive maintenance by processing huge
amounts of data describing the time-history
of working conditions, integrated with open
data about the execution environment and

                                                                                                37
38
4.7.           EARTH SCIENCE AND
               CLIMATE CHANGE

Big data in climate change is related to services    amount of output data both in diagnostic and in
for: (i) developing, porting and running Earth-      forecast mode. Archives of high resolution (about
system models at high spatial resolution; (ii)       1 km) atmospheric surface parameters (wind,
developing metrics and diagnostic tools for          temperature, solar radiation, etc.) predicted, on
evaluating the models and for specific end-users’    a daily basis, up to 48 hours ahead, are available
needs; (iii) acquisition, storage and processing     for many environmental and renewable energy
of data on the order of petabytes from model         applications. Computing services for numerical
simulations and observations.                        weather predictions are provided daily and
                                                     operational ensemble forecasts on the monthly
High performance computing, data repository,         timescale are performed on a weekly basis.
data sharing and staging services are crucial        Chemical transport models produce both 5-day
to allowing a wide user base to produce and          air pollution forecasts for Italy and Europe and
have access to a set of climate variables at high    annual diagnostic simulations for the country.
temporal resolutions and at extremely high           Hourly data for air pollutant concentrations and
spatial resolutions, and to perform probability      meteorological fields are calculated over the
predictions from sub-seasonal to multi-annual        national domain with horizontal spatial resolution
time scales. This is necessary in order to provide   ranging from 4 to 1 km for 12 vertical levels from
reliable climate information and maintain            ground level to an altitude of 12 km.
competitiveness at the international level with
the major international meteo-climate services.      Environmental change is particularly rapid
                                                     in the coastal zone and offshore under the
Users include both climate scientists and            pressure of climate change (e.g. relative sea
researchers from a wide range of fields, but         level rise) having impacts on drainage basins,
climate data is of crucial interest for a large      coastal plains (e.g. man-induced subsidence),
number of public/private actors in the energy,       and offshore (e.g. hydrocarbon exploitation, fish
agriculture, water, health, tourism, urban           trawling, energy plants, maritime traffic). Ocean
management and transport sectors. Overall, the       observing systems combined with HPC provide
output of high-resolution climate simulations is a   scenarios of storm surges and associated
valuable asset at both the national and regional     coastal flooding, sea state, coastal erosion
levels.                                              and long term ecological trends. Quantitative
                                                     4D geophysical pictures of the seafloor and
The term ‘Earth data’ refers both to atmosphere,     sub-seafloor after major events (blooms, river
ocean and sea state simulations (uncoupled or        floods, coastal erosion or anthropic interventions)
coupled in earth system models), environmental,      help determine habitat modifications, coastal
geophysical and oceanographic observing              dynamics and subsidence trends.
systems, seafloor mapping and forecast services
for institutional agencies (Civil Protection,
                                                     Major European Projects: AtlantOS, BE-OI,
Coast Guard and Ministry of the Environment)
                                                     ChEESE, CLARA, CRESCENDO, ENVRI PLUS,
and to large databases of energy uses in             HIGHLANDER, IMMERSE, Iscape, LISTEN MED-
industry, emission factors, basic data for life      GOLD, MEDSCOPE. MISTRAL, MOSES, ODYSSEA,
cycle assessment (LCA) and carbon footprint          OPERANDUM, PRIMAVERA, ROCK, RURITAGE,
calculation and to databases related to the uses     SeaDataCloud, SECLI-FIRM, SOCLIMPACT,
of the sea under a blue growth strategy and the      SYSTEM-RISK.
BLUEMED Initiative.
                                                     Active members: ART-ER, CINECA, CMCC, CNR,
                                                     ENEA, INGV, UNIBO, UNIMORE.
Atmospheric models have increased their spatial
and temporal resolution and produced a large
                                                                                                     39
40
4.8.         GLOBAL SOCIAL SCIENCES
             AND ECONOMICS

Big data in the social sciences and in          boundaries, including the fields of
economics are increasingly important for        economics, actuarial and financial sciences,
research, business, finance, policy-making      mathematics, statistics, programming
and the society at large, creating social       and computer science. The emergence
innovation, unique opportunities for job        of a multidisciplinary big data approach
creation and worldwide leadership in R&I        can provide a solid evidence-base for
and the market.                                 policy-making, as well as a quantitative
                                                knowledge base for solving problems in
These data (e.g. videos, photos and micro-      finance, insurance and risk-management.
messages shared on social networks,             Legal aspects can also play a big role in the
GIS, networks of relations, as well as          success of big data in finance and economy,
behavioural and psychological data)             especially as regards competition law,
promote interdisciplinary activities with       intellectual property rights, privacy law and
regard to economic aspects, legal and           consumer law.
ethical issues, and privacy-by-design
principles. Behavioural data refers to          In a digital global marketplace, leveraging
multidimensional data-sets on human and         big data is an unparalleled opportunity
social behaviour, actions, and interactions     for companies including retail, finance,
that allow understanding individual as          banking, advertising and insurance to gain
well as community preferences, attitudes        a competitive edge. The real-time and fast
and habits, while psychological data-sets       processing of large volumes and varieties
collect the results of psychological tests,     of data has a strong potential impact on
surveys, attitude exams, and similar sources.   key areas such as: customer data analytics;
Behavioural data analysis contributes           innovative marketing strategies; market
to designing focused and personalised           trends and expectations; access to credit
services in different fields (e.g. transport,   and finance for households and enterprises;
healthcare, the environment); social            e-commerce and e-paying systems for an
network and GIS data analysis is focused        easier access of consumers and citizens
on understanding inner aspects of human         to services; risk management; business
relations, and it can be applied to many        insights, organisational intelligence and
fields (e.g. commercial advertising, study of   operational efficiency.
the propensity to buy, sentiment analysis);
psychological data analysis, on the other       Major European Projects: CoSIE, MICADO, MIREL.
hand, helps researchers in proving or           .
disproving theories, and/or supporting          Active members: UNIBO.
conclusions with statistical evidence. The
collection and analysis of large amounts
of behavioural data has also important
consequences for the formulation and
application of legal rules.

Moreover, the increasing use of large-scale
administrative data-sets, private sector
data and social media data can contribute
to impactful social science research
that transcends traditional disciplinary

                                                                                            41
You can also read