INDIGO-DATACLOUD DAVIDE SALOMONI, INFN- CNAF - INDIGO-DATACLOUD PROJECT COORDINATOR - EUROPEAN SPACE POLICY INSTITUTE

INDIGO-DataCloud
Better Software for Better Science.

RIA-653549
Davide Salomoni, INFN-CNAF
INDIGO-DataCloud Project Coordinator
davide.salomoni@cnaf.infn.it
ESA-ESPI Workshop
Frascati, 7/7/2017
INDIGO-DataCloud
• An H2020 project approved in January 2015 in the EINFRA-1-2014 call
      • 11.1M€, 30 months (from April 2015 to September 2017)
• Who: 26 European partners in 11 European countries
      • Coordination by the Italian National Institute for Nuclear Physics (INFN)
      • Including developers of distributed software, industrial partners, research institutes, universities, e-infrastructures
• What: develop an open source Cloud platform for computing and data (“DataCloud”) tailored to science but applicable to other domains as well.
• For: multi-disciplinary scientific communities
      • E.g. structural biology, earth science, physics, bioinformatics, cultural heritage, astrophysics, life science, climatology
• Where: deployable on hybrid (public or private) Cloud infrastructures
      • INDIGO = INtegrating Distributed data Infrastructures for Global ExplOitation
• Why: answer to the technological needs of scientists seeking to easily exploit distributed Cloud/Grid compute and data resources.
INDIGO-DataCloud’s Vision
• INDIGO:
      1. Develops open, interoperable solutions for scientific data.
      2. Supports open science organizing the European data space.
      3. Enables collaborations across diverse scientific communities worldwide.
• INDIGO offers its
      • architecture,
      • analysis,
      • expertise
      • and software components
• as a concrete step toward the definition and implementation of a European Open Science Cloud and Data Infrastructure.
[Figure: Scientific Users adopt and use INDIGO Advanced Components and Solutions, deployed on Private or Commercial Clouds (Public, PCP-based, etc.) and Publicly funded e-infrastructures (EGI, EUDAT, GEANT, PRACE, RI, etc.), exploiting Datasets and Resources to produce Scientific Results. Side labels: D1.8, General Architecture; D2.1 and D2.4, community requirements; INDIGO’s 34 deliverables (so far); The INDIGO-DataCloud Service Catalogue.]
The INDIGO Services
• We recently published our second and final major software release, called ElectricIndigo
• Fact sheet (https://www.indigo-datacloud.eu/service-component):
       • 40 modular components, distributed via 170 software packages, 50 ready-to-use Docker containers
       • Supported operating systems: CentOS 7, Ubuntu 16.04
       • Supported cloud frameworks: OpenStack Newton, OpenNebula 5.x (plus connection to Amazon, Azure)
       • Download it from the INDIGO-DataCloud Software Repository: http://repo.indigo-datacloud.eu/index.html
How does this fit in a global context?
• We recognize that value for users (and hence, our main focus) is at the upper layers, not in the bare-bones e-infrastructural services.
       • But we also provide ways to optimize e-infrastructural services for resource providers
• So, we abstract from underlying IaaS technologies and offer flexibility in choosing e-infra providers, resources and capabilities…
• … giving users the possibility to easily express and implement requirements for their applications through enabling services and components.
• This is a movement that goes well beyond the “S” of Science in an EOSC.

INDIGO in support of communities
(some real apps integrating INDIGO components)

• LifeWatch: algae bloom modeling
• RNA sequencing with TRUFA
• Deploying an elastic, complex cluster on the Cloud with INDIGO components
• Cloudified services for molecular dynamics
• A distributed archive system for the Cherenkov Telescope Array (CTA)
• The Large Binocular Telescope (LBT) distributed archive
• INDIGO’s Ophidia for astronomical images calibration
• Launching POWERFIT and DISVIS VMs on the EGI FedCloud using INDIGO tools
• POWERFIT and DISVIS web portals: harnessing GPGPUs on the Grid using INDIGO udocker
• Automated deployment of an Ophidia big data analytics cluster
• INDIGO at the Central Institute for the Union Catalogue of Italian Libraries and Bibliographic Information
• EGI and INDIGO integration
• ELIXIR-ITALY: developing a Galaxy instance provider platform
• Multidisciplinary Oceanic Information System
• Deploy a Zenodo-based repository in the cloud using Marathon
• On-demand analysis and big data infrastructures for the CMS LHC experiment
• Theoretical physics on HPC clusters using INDIGO udocker
INDIGO in support of maintenance and evolution
• How can INDIGO be sustained and evolved?

1. Collaboration with commercial providers
2. Collaboration with existing projects and initiatives
3. Submission of new projects

Some collaborations with commercial providers
• See https://developer.ibm.com/opentech/2017/05/18/cloud-computing-better-science-recap-egi-conference-indigo-datacloud-summit-2017/ for a summary of the INDIGO Summit 2017 by Dr Sahdev Zala of IBM
       • With some details about the ongoing collaboration between IBM and INDIGO-DataCloud
• See https://indico.egi.eu/indico/event/3249/session/48/contribution/98/material/slides/0.pdf for info on the integration of INDIGO tools into the Open Telekom Cloud portfolio, the public cloud offering of T-Systems (a Deutsche Telekom unit)
Collaboration with existing projects, infrastructures and initiatives
• There are ongoing discussions and collaborations with several actors, belonging to many areas, for example:
       • European Space Agency & ASI (typically for exploitation / distribution / analysis of Copernicus data)
       • Smart City projects
       • Rationalization of Public Administrations
       • HelixNebula ScienceCloud (Pre-Commercial Procurement)
       • EU-wide HPC-Big Data Integration (IPCEI)
       • EGI (also an INDIGO-DataCloud partner)

Submission of new European projects
• In the last round of the H2020 calls (March-April 2017), at least 5 proposals were submitted that included key INDIGO components or their possible evolutions.
• We still do not know how many of these proposals will be approved, but it is interesting to note that there is very significant interest in, and demand for, solutions that originate from INDIGO. If the results are there, stakeholder engagement is strong, and the ideas, requirements and architectures are valid, this interest will eventually find ways to be supported.

INDIGO & EOSC in production: >= TRL8
• For example, several INDIGO solutions and activities are in the EOSC-hub proposal (a proposal jointly prepared by EGI, EUDAT and INDIGO-DataCloud)
• With INDIGO components such as Identity and Access Management, Token Translation, Virtual filesystems (Onedata), Advanced IaaS Services, the Infrastructure Manager, the INDIGO PaaS and its orchestrator, web front-end services, user-level containers
• And with training, support, technical coordination, external liaison, stakeholder engagement, policy contributions.

INDIGO & EOSC in evolution: < TRL8
• For example, novel features evolving INDIGO components are a key part of several proposals to the EINFRA-21-2017 (eXtreme-DataCloud and DEEP-Hybrid DataCloud) and ICT-16-2017 calls:
       • Intelligent dataset distribution and data lifecycle management
       • Smart caching
       • Orchestrating Computing Workflows based on policy driven or adaptive data movements
       • Flexible metadata management for big data sets
       • Access to bare-metal resources on the Cloud
       • PaaS-Level access to HPC resources
       • Extensions to the INDIGO Orchestrator for hybrid IaaS deployments and scale out to third-party clouds
       • Extensions to the INDIGO Virtual Router Appliance
       • Real-time, streaming-based data ingestion and processing

INDIGO and External Projects: Components and Patches Merged in Upstream Open Source Projects
• OpenStack (https://www.openstack.org)
       • Nova Docker
       • Heat
       • OpenID-Connect for Keystone
       • Pre-emptible instances support (under discussion)
• OpenNebula (http://opennebula.org)
       • OneDock
• Infrastructure Manager (http://www.grycap.upv.es/im/index.php)
• Clues (http://www.grycap.upv.es/clues/eng/index.php)
• Onedata (https://onedata.org)
• TOSCA adaptor for JSAGA (http://software.in2p3.fr/jsaga/dev/)
• OCCI implementation for OpenStack (https://github.com/openstack/ooi)
• Extended AWS support for rOCCI in OpenNebula. Python and Java libraries for OCCI support.
• CDMI and QoS extensions for dCache (https://www.dcache.org)
• Workflow interface extensions for Ophidia (http://ophidia.cmcc.it)
• OpenID Connect Java implementation for dCache (https://www.dcache.org)
• MitreID (https://mitreid.org/) and OpenID Connect (http://openid.net/connect/) libraries

EGI services for EO data exploitation
• The new EO satellites generate large amounts of data not easily integrated into processing chains outside the ground segment.
• EGI services can improve the discovery, retrieval and processing capabilities of EO data:
       ✓ capabilities for big data management
       ✓ virtualised access to geographically distributed data (EGI Data Hub)
       ✓ computing necessary to manage large volumes of different data types
[Figure: EGI service icons: Cloud Compute, Data Hub, Online Storage, Data Transfer]
EGI services for EO data exploitation (continued)
[Figure: the same EGI services diagram, overlaid with the labels Advanced IaaS, Network virtualization, Advanced Orchestration, PaaS, Standard interfaces support, Containers Orchestration]
EGI services for EO data exploitation (continued)
[Figure: the same EGI services diagram, with the added label OneData]
EGI services to accelerate the development of EO exploitation platforms
Well established e-Infrastructure services as a set of reusable components to solve common problems:
• AAI and single sign-on: CheckIn
• Service Monitoring: ARGO
• Accounting infrastructure: APEL
• Configuration Database: GOCDB
• Operational Tools: Operations Portal, Security tools, etc.
• Collaboration tools: Wiki, Doc repo, Agenda mgmt system, etc.
EO Platform developers can focus on their core tasks!
Reflections on some of the suggested themes
• How does infrastructure contribute to making new services and applications possible?
       • → INDIGO contributes by producing enabling technologies directly requested by both providers and users, that can be deployed on ANY infrastructure to produce new, high-value services or applications.
• Exponential technologies: more a software or a hardware race? How will the value chain be affected?
       • → For us it is definitely more software, intended as final artifacts and resource exploitation. The value chain is represented by the inverted triangle shown earlier. The traditional hardware race per se is lost, at least for big initiatives such as HL-LHC.
• How can we best incentivize data sharing between entities?
       • → First, make it doable / easy to do, considering also issues such as data lifecycle, replication, quality of service.
• How can we most effectively integrate modern and legacy data infrastructures?
       • → Through open solutions, the use of de facto or de jure standards, and state-of-the-art but still production-level solutions.
• To what extent are consolidation and integration of existing services necessary to achieve the necessary infrastructure?
       • → They are absolutely necessary, if we want to effectively use all the resources that are there. We assume that we MUST be able to utilize these resources before looking / asking for new infrastructures.
• What would you consider to be the most crucial next steps and milestones in the successful implementation of the programs?
       • → Be pragmatic and do not spend too much time discussing first principles. Get the relevant actors around the same table, start from concrete requirements and use cases, and seek implementers / implementations able to:
                •    Scale out
                •    Run in multiple, heterogeneous, hybrid infrastructures
                •    Clearly show the benefits of the proposed vs. legacy / proprietary / ad-hoc solutions
Conclusions
• In just 24 months, the INDIGO-DataCloud project has realized a comprehensive involvement of many Research Communities and providers for the definition and tracking of requirements.
• We identified technology gaps linked to several concrete use cases in multiple fields, and defined, published and implemented the overall INDIGO architecture.
• After early demonstrations and beta software previews, we produced two major software versions and 9 minor updates, releasing 40 open modular components. We did that exploiting key European know-how, reusing and extending open source software, and contributing to upstream projects. We established software development and management processes, and defined development and pre-production distributed testbeds.
• Production deployment of many applications making use of the INDIGO software is well underway, and INDIGO components have been proposed for production use in big infrastructures, commercial companies, external projects.
• Several opportunities for further exploitation of INDIGO components are being explored and implemented, in the context of the EOSC and beyond.
Thank you

https://www.indigo-datacloud.eu
Better Software for Better Science.

@indigodatacloud    https://www.facebook.com/indigodatacloud/
Backup slides
The INDIGO added value
• INDIGO, driven by scientific communities, has been developing a comprehensive open source Cloud architecture, which provides many new functionalities and services previously unavailable in open source and in some cases also in proprietary Cloud offerings.
• These functionalities abstract from underlying IaaS technologies through the consistent use of both de jure and de facto standards. This allows interoperability with hybrid (public/private) infrastructures.
• After beta testing and demos shown as early as November 2015, we released our first major software release (MidnightBlue) in August 2016, 9 software updates in the following months, and our second and final major release (ElectricIndigo) in April 2017.

Release Timeline

[Figure: Gantt chart from August 2016 to January 2018 showing the Full updates, Standard updates and Security updates periods for INDIGO-1 and INDIGO-2]

Release                    Release Date   End of Full Updates   End of Standard Updates   End of Security Updates & EOL
INDIGO-1 MidnightBlue      08/08/2016     31/01/2017            31/03/2017                31/05/2017
INDIGO-2 ElectricIndigo    14/04/2017     30/09/2017            30/11/2017                31/01/2018
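The dates in the release timeline can be turned into a small lookup that tells which update phase a release is in on a given day. The sketch below is only an illustration. The dates are copied from the table, while the function names and the choice to treat each end date as inclusive are assumptions.

```python
from datetime import date

# Dates copied from the release timeline table (DD/MM/YYYY on the slide).
RELEASES = {
    "INDIGO-1 MidnightBlue": {
        "release": date(2016, 8, 8),
        "end_full": date(2017, 1, 31),
        "end_standard": date(2017, 3, 31),
        "end_security": date(2017, 5, 31),
    },
    "INDIGO-2 ElectricIndigo": {
        "release": date(2017, 4, 14),
        "end_full": date(2017, 9, 30),
        "end_standard": date(2017, 11, 30),
        "end_security": date(2018, 1, 31),
    },
}


def support_phase(name: str, today: date) -> str:
    """Return the update phase of a release on a given day.

    Assumption: each end date is the last day of its phase (inclusive).
    """
    r = RELEASES[name]
    if today < r["release"]:
        return "not released"
    if today <= r["end_full"]:
        return "full updates"
    if today <= r["end_standard"]:
        return "standard updates"
    if today <= r["end_security"]:
        return "security updates"
    return "end of life"


def supported_days(name: str) -> int:
    """Days between the release date and the end of security updates."""
    r = RELEASES[name]
    return (r["end_security"] - r["release"]).days
```

Called with the workshop date, `support_phase("INDIGO-2 ElectricIndigo", date(2017, 7, 7))` falls in the full-updates window, while INDIGO-1 has already passed its end-of-life date.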
ElectricIndigo

Four main “solution blocks”:
• Data Center Solutions
• Data / Storage Solutions
• Automated Solutions
• User-Oriented Solutions
And “common solutions”:
• Authentication and Authorization

ElectricIndigo:
  Application-level Interfaces for Cloud Providers
  and Automated Service Composition

• Easily port applications to public and private Clouds using open
  programmable interfaces, user-level containers, and standards-based
  languages to automate definition, composition and instantiation of complex
  set-ups.

• Typical questions: How can I run my application on Cloud provider X? What if
  I want to use Docker but my provider does not support it (e.g. also on HPC
  systems)? How do I automate the creation and management of dynamic clusters
  running multiple services over public or private Clouds?
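To make "automated composition and instantiation" concrete, the sketch below describes a small set-up as a graph of nodes and their requirements, loosely in the spirit of a TOSCA topology, and derives an order in which the nodes could be created. It is not the INDIGO orchestrator and not valid TOSCA: the node names, the `requires` key and the function `instantiation_order` are hypothetical, chosen only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical set-up: each node lists the nodes it needs before it can start.
TOPOLOGY = {
    "network":  {"type": "Network",   "requires": []},
    "storage":  {"type": "Volume",    "requires": []},
    "database": {"type": "Database",  "requires": ["network", "storage"]},
    "web_app":  {"type": "Container", "requires": ["network", "database"]},
}


def instantiation_order(topology):
    """Return node names so that every node comes after the nodes it requires."""
    graph = {}
    for name, node in topology.items():
        for dep in node["requires"]:
            if dep not in topology:
                raise ValueError(f"{name} requires unknown node {dep!r}")
        graph[name] = set(node["requires"])
    # static_order() yields dependencies first and raises CycleError on loops.
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step, name in enumerate(instantiation_order(TOPOLOGY), start=1):
        print(step, name, TOPOLOGY[name]["type"])
```

A declarative description like this is what lets a tool, rather than the user, work out the deployment steps for a given Cloud provider.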
ElectricIndigo:
  Flexible Identity and Access Management

• Manage access and policies to distributed resources using multiple methods
  such as OpenID Connect, SAML and X.509 digital certificates, through
  programmable interfaces and web front-ends.

• Typical questions: How can I manage access to distributed resources by users
  identified through diverse methods (e.g. Google ID, digital certificates)?
  How should I modify / write my apps to benefit from that?
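One way to picture "users identified through diverse methods" is a registry that links several credentials to a single account and then applies group-based policies. The sketch below is purely illustrative: it is not the INDIGO IAM API, every class and method name is hypothetical, and it assumes each credential has already been verified by the relevant protocol (OpenID Connect, SAML or X.509).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Credential:
    method: str   # "oidc", "saml" or "x509"
    issuer: str   # who vouches for the identity
    subject: str  # identifier assigned by the issuer


@dataclass
class Account:
    username: str
    groups: set = field(default_factory=set)


class IdentityRegistry:
    """Hypothetical registry: many credentials, one account, group policies."""

    def __init__(self):
        self._accounts = {}  # username -> Account
        self._links = {}     # Credential -> username
        self._policies = {}  # resource -> set of groups allowed to use it

    def link(self, credential, username, groups=()):
        account = self._accounts.setdefault(username, Account(username))
        account.groups.update(groups)
        self._links[credential] = username

    def allow(self, resource, group):
        self._policies.setdefault(resource, set()).add(group)

    def is_authorized(self, credential, resource):
        username = self._links.get(credential)
        if username is None:
            return False  # unknown credential
        allowed = self._policies.get(resource, set())
        return bool(self._accounts[username].groups & allowed)


if __name__ == "__main__":
    registry = IdentityRegistry()
    google = Credential("oidc", "https://accounts.google.com", "1234567890")
    cert = Credential("x509", "Example CA", "CN=Jane Doe")
    registry.link(google, "jdoe", groups={"climate"})
    registry.link(cert, "jdoe")
    registry.allow("climate-cluster", "climate")
    print(registry.is_authorized(cert, "climate-cluster"))  # same account, same rights
```

The point of the design is that the application asks one question ("may this user access this resource?") regardless of how the user logged in.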
ElectricIndigo:
  Data Management and Data Analytics Solutions

• Distribute and access data through multiple providers via virtual file
  systems and automated replication and caching, exploiting scalable,
  high-performance data mining and analytics.

• Typical questions: How can I automatically replicate datasets to multiple
  sites? Can I transparently access my distributed datasets from my app? Can I
  cache the most accessed data, so that it’s close to where users need it? How
  do I instantiate clusters and databases for big data analysis?
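The caching question can be illustrated with a small read-through cache placed in front of a remote data source. This is a generic sketch, not how any INDIGO data service is implemented: the class name `ReadThroughCache` and the `fetch_remote` callback are hypothetical, and it uses least-recently-used (LRU) eviction, a common approximation of "keep the most accessed data close".

```python
from collections import OrderedDict


class ReadThroughCache:
    """Keep up to `capacity` recently used items close to the user."""

    def __init__(self, fetch_remote, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._fetch_remote = fetch_remote  # called only on a cache miss
        self._capacity = capacity
        self._items = OrderedDict()        # oldest entry first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)   # mark as most recently used
            self.hits += 1
            return self._items[key]
        self.misses += 1
        value = self._fetch_remote(key)    # slow path: go to the remote site
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used
        return value


if __name__ == "__main__":
    def fetch_remote(name):
        # Stand-in for a transfer from a remote storage site.
        return f"contents of {name}"

    cache = ReadThroughCache(fetch_remote, capacity=2)
    for name in ["a.nc", "b.nc", "a.nc", "c.nc", "a.nc"]:
        cache.get(name)
    print("hits:", cache.hits, "misses:", cache.misses)
```

In this usage example the frequently requested file stays in the cache while the less used ones are evicted, so repeated reads avoid the remote transfer.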
ElectricIndigo:
  Programmable Web Portals, Mobile Applications

• Create and interface web portals or mobile apps, exploiting distributed data
  as well as compute resources located in public and private Cloud
  infrastructures.

• Typical questions: How can I easily provide my app with a pluggable,
  extensible web front-end? Can this front-end interface with all the features
  provided by INDIGO? How can I write an INDIGO-enabled app for Android or
  iOS?
ElectricIndigo:
  Enhanced and Scalable Services for Data Centers and Resource Providers

• Increase the efficiency of existing Cloud infrastructures based on OpenStack
  or OpenNebula through advanced scheduling, flexible cloud / batch
  management, network orchestration and interfacing of high-level Cloud
  services to existing storage systems.

• Typical questions: How can my cloud data centers provide flexible and fair
  scheduling policies for access to resources? How do I balance traditional
  vs. cloud resources in my data center? How do I connect novel INDIGO
  features to my existing systems? How can I manage storage Quality of
  Service?
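As an illustration of what a "fair scheduling policy" can mean, the sketch below picks the next project to serve by comparing how much each project has used against the share it was assigned. It is a textbook fair-share rule under stated assumptions, not the algorithm used by any INDIGO component; the function name `next_project` and the project names are hypothetical.

```python
def next_project(shares, usage, pending):
    """Pick the project with pending requests that is furthest below its share.

    shares:  project -> assigned share (positive number)
    usage:   project -> resources consumed so far (e.g. core-hours)
    pending: project -> number of queued requests
    """
    candidates = [p for p, queued in pending.items() if queued > 0]
    if not candidates:
        return None
    # Lower usage per unit of share means the project is owed more resources;
    # the project name breaks ties so the result is deterministic.
    return min(candidates, key=lambda p: (usage.get(p, 0.0) / shares[p], p))


if __name__ == "__main__":
    shares = {"physics": 2, "biology": 1}
    usage = {"physics": 100.0, "biology": 80.0}
    pending = {"physics": 3, "biology": 5}
    print(next_project(shares, usage, pending))
```

With twice the share, the first project is served even though it has consumed more in absolute terms; once its usage per unit of share overtakes the other project, the choice flips.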
High-level view of the INDIGO architecture

This is the INDIGO-DataCloud General Architecture*

[Figure: architecture diagram. Components shown, by layer:]
• WP6 Services
      • Clients: GUI, Mobile Apps, Other Science Gateways
      • Future Gateway Portal: Admin Portlets, User Portlets, Data Analytics
        Portlets, Workflow Portlets
      • Mobile clients: Open Mobile Toolkit
      • Workflows: Ophidia plugin, LONI plugin, Taverna, Kepler plugin
      • Support services: SG Mon
      • Future Gateway REST API, Future Gateway Engine, JSAGA/JSAGA Adaptors
• WP5 Services
      • Kubernetes Cluster
      • IAM Service (OIDC)
      • PaaS Orchestrator (TOSCA), QoS/SLA, Monitoring, Accounting,
        CloudProvider Ranker
      • Infrastructure Manager (TOSCA)
      • Data Services: Onedata, Dynafed, FTS (REST/CDMI/WebDAV/POSIX/GridFTP)
      • Mesos Cluster, Aut. Scaling Service
• WP4 Services
      • Heat/IM (TOSCA), Smart Scheduling, Spot Instances
      • Native Docker, Local Repository
      • Identity Harmonization
      • Storage Service with QoS Support (S3/CDMI/POSIX/WebDAV/GridFTP)
• Non-INDIGO IaaS (Native IaaS API)

*: see details in http://arxiv.org/abs/1603.09536 or in
https://www.indigo-datacloud.eu/documents-deliverables
INDIGO Software Development Flow

[Figure: flow diagram. Elements shown:]
• Work packages: Development (WP4, WP5, WP6); WP2 Application Use-cases
• Tasks: T3.1 Software quality assurance; T3.2 Software release and
  maintenance; T3.3 Pilot services; T3.4 Exploitation
• Infrastructures: Development infrastructure, Integration infrastructure,
  Preview infrastructure, External Service Providers
• Users: Production work
• Legend: software delivery, software deployment, software use
The INDIGO Development and Integration Infrastructure

[Figure: map of the participating sites and the components each one hosts]
• DESY: dCache
• PSNC: Indigokepler, indigo-omt
• CESNET: rOCCI
• KIT: CDMI-QoS, TTS
• Cyfronet: Onedata
• CERN: Kubernetes, Magnum
• IFCA/CSIC: OOI, OPIE
• CNAF/INFN: IAM, Oneprovider
• LIP/INCD: OpenNebula: ONEDock, Nova-Docker, FutureGateway
• INFN Bari: Kubernetes, Mesos, Chronos
• UPV: IM, CLUES, TOSCA

The INDIGO Pilot Preview Testbed

Demos are performed in the preview testbed.

[Figure: map of the participating sites and the components each one hosts]
• DESY: CDMI-QoS, dCache, OneData
• KIT: CDMI-QoS, OneData
• INFN-Padova: Synergy, OOI
• LIP/INCD: OOI, IAM connector, Nova-Docker, OS Identity Authentication
  library, ONEDock, rOCCI server, TTS, Java-syncrepos, Cloud-info-provider,
  IM, OneData, FG API server, FG Portal, LiferayIAM, Indigo Kepler, Ophidia
• IFCA/CSIC: OOI, IAM connector, Nova-Docker, OS Identity Authentication
  library, Java-syncrepos
• CNAF/INFN: IAM, OneData, CDMI-QoS, Orchestrator, CloudProviderRanker,
  Zabbix-wrapper, SLAManager, CMDB
• INFN-Bari: CDMI-QoS, OneData, Kubernetes: Marathon, Chronos, Mesos
• UPV: IM

Resource requirements for LHC

[Figure: Computing power needs for LHC, Run 1 to Run 4; series: GRID, ATLAS, CMS, LHCb, ALICE]
[Figure: Storage needs for LHC, Run 1 to Run 4; series: CMS, ATLAS, ALICE, LHCb]