INDIGO-DATACLOUD DAVIDE SALOMONI, INFN-CNAF - INDIGO-DATACLOUD PROJECT COORDINATOR - EUROPEAN SPACE POLICY INSTITUTE
INDIGO-DataCloud: Better Software for Better Science. RIA-653549
Davide Salomoni, INFN-CNAF, INDIGO-DataCloud Project Coordinator, davide.salomoni@cnaf.infn.it
ESA-ESPI Workshop, Frascati, 7/7/2017
INDIGO-DataCloud
• An H2020 project approved in January 2015 in the EINFRA-1-2014 call
• 11.1M€, 30 months (from April 2015 to September 2017)
• Who: 26 European partners in 11 European countries
  • Coordination by the Italian National Institute for Nuclear Physics (INFN)
  • Including developers of distributed software, industrial partners, research institutes, universities, e-infrastructures
• What: develop an open source Cloud platform for computing and data (“DataCloud”) tailored to science but applicable to other domains as well.
• For: multi-disciplinary scientific communities
  • E.g. structural biology, earth science, physics, bioinformatics, cultural heritage, astrophysics, life science, climatology
• Where: deployable on hybrid (public or private) Cloud infrastructures
• INDIGO = INtegrating Distributed data Infrastructures for Global ExplOitation
• Why: answer to the technological needs of scientists seeking to easily exploit distributed Cloud/Grid compute and data resources.
Davide Salomoni, ESA-ESPI Workshop, 7/7/2017
INDIGO-DataCloud’s Vision
• INDIGO:
  1. Develops open, interoperable solutions for scientific data.
  2. Supports open science organizing the European data space.
  3. Enables collaborations across diverse scientific communities worldwide.
• INDIGO offers its
  • architecture,
  • analysis,
  • expertise
  • and software components
  • as a concrete step toward the definition and implementation of a European Open Science Cloud and Data Infrastructure.
[Diagram: Scientific Users adopt and use INDIGO Advanced Components and Solutions, deployed on Private or Commercial Clouds (Public, PCP-based, etc.) and Publicly funded e-infrastructures (EGI, EUDAT, GEANT, PRACE, RI, etc.), exploiting Datasets and Resources to produce Scientific Results. Callouts: D1.8, General Architecture; D2.1 and D2.4, community requirements; INDIGO’s 34 deliverables (so far); The INDIGO-DataCloud Service Catalogue.]
The INDIGO Services
• We recently released our second and final major software release, called ElectricIndigo
• Fact sheet (https://www.indigo-datacloud.eu/service-component):
  • 40 modular components, distributed via 170 software packages, 50 ready-to-use Docker containers
  • Supported operating systems: CentOS 7, Ubuntu 16.04
  • Supported cloud frameworks: OpenStack Newton, OpenNebula 5.x (plus connection to Amazon, Azure)
• Download it from the INDIGO-DataCloud Software Repository: http://repo.indigo-datacloud.eu/index.html
How does this fit in a global context?
• We recognize that value for users (and hence, our main focus) is at the upper layers, not in the bare-bones e-infrastructural services.
• But we also provide ways to optimize e-infrastructural services for resource providers.
• So, we abstract from underlying IaaS technologies and offer flexibility in choosing e-infra providers, resources and capabilities…
• … giving users the possibility to easily express and implement requirements for their applications through enabling services and components.
• This is a movement that goes well beyond the “S” of Science in an EOSC.
INDIGO in support of communities (some real apps integrating INDIGO components)
• LifeWatch: algae bloom modeling
• RNA sequencing with TRUFA
• Deploying an elastic, complex cluster on the Cloud with INDIGO components
• Cloudified services for molecular dynamics
• A distributed archive system for the Cherenkov Telescope Array (CTA)
• The Large Binocular Telescope (LBT) distributed archive
• INDIGO’s Ophidia for astronomical images calibration
• Launching POWERFIT and DISVIS VMs on the EGI FedCloud using INDIGO tools
• POWERFIT and DISVIS web portals: harnessing GPGPUs on the Grid using INDIGO udocker
• Automated deployment of an Ophidia big data analytics cluster
• INDIGO at the Central Institute for the Union Catalogue of Italian Libraries and Bibliographic Information
• EGI and INDIGO integration
• ELIXIR-ITALY: developing a Galaxy instance provider platform
• Multidisciplinary Oceanic Information System
• Deploy a Zenodo-based repository in the cloud using Marathon
• On-demand analysis and big data infrastructures for the CMS LHC experiment
• Theoretical physics on HPC clusters using INDIGO udocker
INDIGO in support to maintenance and evolution
• How can INDIGO be sustained and evolved?
  1. Collaboration with commercial providers
  2. Collaboration with existing projects and initiatives
  3. Submission of new projects
Some collaborations with commercial providers
• See https://developer.ibm.com/opentech/2017/05/18/cloud-computing-better-science-recap-egi-conference-indigo-datacloud-summit-2017/ for a summary of the INDIGO Summit 2017 by Dr Sahdev Zala of IBM
  • With some details about the ongoing collaboration between IBM and INDIGO-DataCloud
• See https://indico.egi.eu/indico/event/3249/session/48/contribution/98/material/slides/0.pdf for info on the integration of INDIGO tools into the Open Telekom Cloud portfolio, the public cloud offering of T-Systems (a Deutsche Telekom unit)
Collaboration with existing projects, infrastructures and initiatives
• There are ongoing discussions and collaborations with several actors, belonging to many areas, for example:
  • European Space Agency & ASI (typically for exploitation / distribution / analysis of Copernicus data)
  • Smart City projects
  • Rationalization of Public Administrations
  • HelixNebula ScienceCloud (Pre-Commercial Procurement)
  • EU-wide HPC-Big Data Integration (IPCEI)
  • EGI (also an INDIGO-DataCloud partner)
Submission of new European projects
• In the last round of the H2020 calls (March-April 2017), at least 5 proposals were submitted that included key INDIGO components or their possible evolutions.
• We still do not know how many of these proposals will be approved, but it is interesting to note that there is very significant interest in, and demand for, solutions that originate from INDIGO.
If results are there, stakeholder engagement is strong, and ideas, requirements and architectures are valid, this interest will eventually find ways to be supported.
INDIGO & EOSC in production: >= TRL8
• For example, several INDIGO solutions and activities are in the EOSC-hub proposal (a proposal jointly prepared by EGI, EUDAT and INDIGO-DataCloud)
  • With INDIGO components such as Identity and Access Management, Token Translation, Virtual filesystems (Onedata), Advanced IaaS Services, the Infrastructure Manager, the INDIGO PaaS and its orchestrator, web front-end services, user-level containers
  • And with training, support, technical coordination, external liaison, stakeholder engagement, policy contributions.
INDIGO & EOSC in evolution: < TRL8
• For example, novel features evolving INDIGO components are a key part of several proposals to the EINFRA-21-2017 (eXtreme-DataCloud and DEEP-Hybrid DataCloud) and ICT-16-2017 calls:
  • Intelligent dataset distribution and data lifecycle management
  • Smart caching
  • Orchestrating Computing Workflows based on policy driven or adaptive data movements
  • Flexible metadata management for big data sets
  • Access to bare-metal resources on the Cloud
  • PaaS-Level access to HPC resources
  • Extensions to the INDIGO Orchestrator for hybrid IaaS deployments and scale out to 3rd party clouds
  • Extensions to the INDIGO Virtual Router Appliance
  • Real-time, streaming-based data ingestion and processing
INDIGO and External Projects: Components and Patches Merged in Upstream Open Source Projects
• OpenStack (https://www.openstack.org)
  • Nova Docker
  • Heat
  • OpenID-Connect for Keystone
  • Pre-emptible instances support (under discussion)
• OpenNebula (http://opennebula.org)
  • OneDock
• Infrastructure Manager (http://www.grycap.upv.es/im/index.php)
• Clues (http://www.grycap.upv.es/clues/eng/index.php)
• Onedata (https://onedata.org)
• TOSCA adaptor for JSAGA (http://software.in2p3.fr/jsaga/dev/)
• OCCI implementation for OpenStack (https://github.com/openstack/ooi)
• Extended AWS support for rOCCI in OpenNebula. Python and Java libraries for OCCI support.
• CDMI and QoS extensions for dCache (https://www.dcache.org)
• Workflow interface extensions for Ophidia (http://ophidia.cmcc.it)
• OpenID Connect Java implementation for dCache (https://www.dcache.org)
• MitreID (https://mitreid.org/) and OpenID Connect (http://openid.net/connect/) libraries
EGI services for EO data exploitation
• The new EO satellites generate large amounts of data not easily integrated into processing chains outside the ground segment.
• EGI services can improve the discovery, retrieval and processing capabilities of EO data:
  ✓ capabilities for big data management
  ✓ virtualised access to geographically distributed data (EGI Data Hub)
  ✓ computing necessary to manage large volumes of different data types
[Diagram labels: Cloud Compute, Data Hub, Online Storage, Data Transfer]
7/6/17
EGI services for EO data exploitation (text as on the previous slide; diagram labels added: Advanced IaaS, Network virtualization, Advanced Orchestration, PaaS, Standard interfaces support, Containers Orchestration)
EGI services for EO data exploitation (text as on the previous slide; diagram label added: OneData)
EGI services to accelerate the development of EO exploitation platforms
Well established e-Infrastructure services as a set of reusable components to solve common problems:
• AAI and single sign-on: CheckIn
• Service Monitoring: ARGO
• Accounting infrastructure: APEL
• Configuration Database: GOCDB
• Operational Tools: Operations Portal, Security tools, etc.
• Collaboration tools: Wiki, Doc repo, Agenda mgmt system, etc.
EO Platform developers can focus on their core tasks!
Reflections on some of the suggested themes
• How does infrastructure contribute to making new services and applications possible?
  → INDIGO contributes by producing enabling technologies directly requested by both providers and users, which can be deployed on ANY infrastructure to produce new, high-value services or applications.
• Exponential technologies: more a software or a hardware race? How will the value chain be affected?
  → For us it is definitely more software, intended as final artifacts and resource exploitation. The value chain is represented by the inverted triangle shown earlier. The traditional hardware race per se is lost, at least for big initiatives such as HL-LHC.
• How can we best incentivize data sharing between entities?
  → First, make it doable / easy to do, considering also issues such as data lifecycle, replication, quality of service.
• How can we most effectively integrate modern and legacy data infrastructures?
  → Through open solutions, the use of de facto or de jure standards, and state-of-the-art but still production-level solutions.
• To what extent are consolidation and integration of existing services necessary to achieve the necessary infrastructure?
  → They are absolutely necessary, if we want to effectively use all the resources that are there. We assume that we MUST be able to utilize these resources before looking / asking for new infrastructures.
• What would you consider to be the most crucial next steps and milestones in the successful implementation of the programs?
  → Be pragmatic and do not spend too much time discussing first principles. Get the relevant actors around the same table, start from concrete requirements and use cases, and seek implementers / implementations able to:
    • Scale out
    • Run in multiple, heterogeneous, hybrid infrastructures
    • Clearly show the benefits of the proposed vs. legacy / proprietary / ad-hoc solutions
Conclusions
• In just 24 months, the INDIGO-DataCloud project has comprehensively involved many Research Communities and providers in the definition and tracking of requirements.
• We identified technology gaps linked to several concrete use cases in multiple fields, and defined, published and implemented the overall INDIGO architecture.
• After early demonstrations and beta software previews, we produced two major software versions and 9 minor updates, releasing 40 open modular components. We did that by exploiting key European know-how, reusing and extending open source software, and contributing to upstream projects. We established software development and management processes, and defined development and pre-production distributed testbeds.
• Production deployment of many applications making use of the INDIGO software is well underway, and INDIGO components have been proposed for production use in big infrastructures, commercial companies and external projects.
• Several opportunities for further exploitation of INDIGO components are being explored and implemented, in the context of the EOSC and beyond.
Thank you
https://www.indigo-datacloud.eu
Better Software for Better Science.
@indigodatacloud
https://www.facebook.com/indigodatacloud/
Backup slides
The INDIGO added value
• INDIGO, driven by scientific communities, has been developing a comprehensive open source Cloud architecture, which provides many new functionalities and services previously unavailable in open source and in some cases also in proprietary Cloud offerings.
• These functionalities abstract from underlying IaaS technologies through the consistent use of both de jure and de facto standards. This allows interoperability with hybrid (public/private) infrastructures.
• After beta testing and demos shown as early as November 2015, we released our first major software release (MidnightBlue) in August 2016, 9 software updates in the following months, and our second and final major release (ElectricIndigo) in April 2017.
Release Timeline
[Timeline chart: months from Aug 2016 to Jan 2018; for each of INDIGO-1 and INDIGO-2, consecutive bars for Full updates, Standard updates and Security updates]

Release                   Date        End of Full Updates  End of Standard Updates  End of Security Updates & EOL
INDIGO-1 MidnightBlue     08/08/2016  31/01/2017           31/03/2017               31/05/2017
INDIGO-2 ElectricIndigo   14/04/2017  30/09/2017           30/11/2017               31/01/2018
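The support windows in the release timeline can be expressed as a small date lookup. The Python sketch below is illustrative only: the dates are copied from the timeline, while the function name `support_phase`, the dictionary layout, and the assumptions that each end date is inclusive and that the phases follow one another without gaps are hypothetical choices made for this example, not part of the INDIGO release policy.

```python
from datetime import date

# Dates copied from the release timeline (DD/MM/YYYY in the slides).
# The dictionary keys are hypothetical names chosen for this sketch.
RELEASES = {
    "MidnightBlue": {
        "released": date(2016, 8, 8),
        "full_updates_end": date(2017, 1, 31),
        "standard_updates_end": date(2017, 3, 31),
        "security_updates_end": date(2017, 5, 31),  # also end of life
    },
    "ElectricIndigo": {
        "released": date(2017, 4, 14),
        "full_updates_end": date(2017, 9, 30),
        "standard_updates_end": date(2017, 11, 30),
        "security_updates_end": date(2018, 1, 31),  # also end of life
    },
}


def support_phase(release: str, day: date) -> str:
    """Return the support phase of a release on a given day.

    Assumption: end dates are inclusive and phases are contiguous.
    """
    r = RELEASES[release]
    if day < r["released"]:
        return "not released"
    if day <= r["full_updates_end"]:
        return "full updates"
    if day <= r["standard_updates_end"]:
        return "standard updates"
    if day <= r["security_updates_end"]:
        return "security updates"
    return "end of life"


if __name__ == "__main__":
    talk_day = date(2017, 7, 7)  # date of the ESA-ESPI workshop
    for name in RELEASES:
        print(name, "->", support_phase(name, talk_day))
```

Under these assumptions, on the date of the workshop (7/7/2017) ElectricIndigo falls in its full-updates window, while MidnightBlue is already past its end of life.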
ElectricIndigo
Four main “solution blocks”:
• Data Center Solutions
• Data / Storage Solutions
• Automated Solutions
• User-Oriented Solutions
And “common solutions”:
• Authentication and Authorization
ElectricIndigo: Application-level Interfaces for Cloud Providers and Automated Service Composition
• Easily port applications to public and private Clouds using open programmable interfaces, user-level containers, and standards-based languages to automate definition, composition and instantiation of complex set-ups.
• Typical questions: How can I run my application on Cloud provider X? What if I want to use Docker but my provider does not support it (e.g. also on HPC systems)? How do I automate the creation and management over public or private Clouds of dynamic clusters running multiple services?
ElectricIndigo: Flexible Identity and Access Management
• Manage access and policies to distributed resources using multiple methods such as OpenID-Connect, SAML, X.509 digital certificates, through programmable interfaces and web front-ends.
• Typical questions: How can I manage access to distributed resources by users, identified through diverse methods (e.g. Google ID, digital certificates)? How should I modify / write my apps to benefit from that?
ElectricIndigo: Data Management and Data Analytics Solutions
• Distribute and access data through multiple providers via virtual file systems and automated replication and caching, exploiting scalable, high-performance data mining and analytics.
• Typical questions: How can I automatically replicate datasets to multiple sites? Can I transparently access my distributed datasets from my app? Can I cache the most accessed data, so that it’s close to where users need it? How do I instantiate clusters and databases for big data analysis?
ElectricIndigo: Programmable Web Portals, Mobile Applications
• Create and interface web portals or mobile apps, exploiting distributed data as well as compute resources located in public and private Cloud infrastructures.
• Typical questions: How can I easily provide my app with a pluggable, extensible web front-end? Can this front-end interface with all the features provided by INDIGO? How can I write an INDIGO-enabled app for Android or iOS?
ElectricIndigo: Enhanced and Scalable Services for Data Centers and Resource Providers
• Increase the efficiency of existing Cloud infrastructures based on OpenStack or OpenNebula through advanced scheduling, flexible cloud / batch management, network orchestration and interfacing of high-level Cloud services to existing storage systems.
• Typical questions: How can my cloud data centers provide flexible and fair scheduling policies for access to resources? How do I balance traditional vs. cloud resources in my data center? How do I connect novel INDIGO features to my existing systems? How can I manage storage Quality of Service?
High-level view of the INDIGO architecture
This is the INDIGO-DataCloud General Architecture*
[Architecture diagram. Layers: WP6 Services, WP5 Services, WP4 Services, Non-INDIGO IaaS. Component labels: GUI Clients, Admin Portlets, User Portlets, Mobile Apps, Mobile clients, Science Gateways, SG Mon, Data Analytics, Workflow Portlets, Open Mobile Toolkit, Workflows (Ophidia plugin, LONI plugin, Taverna, Kepler plugin), Other, Support services, Future Gateway Portal, Future Gateway REST API, Future Gateway Engine, JSAGA/JSAGA Adaptors, Kubernetes Cluster, Data Services (Onedata, Dynafed, FTS), PaaS Orchestrator, QoS/SLA, Accounting, IAM Service, Monitoring, Infrastructure Manager, CloudProvider Ranker, Mesos Cluster, Aut. Scaling Service, Storage Service, QoS Support, Native IaaS API, Heat/IM, Smart Scheduling, Spot Instances, Identity Harmonization, Native Docker, Local Repository. Interface labels: REST/CDMI/WebDAV/POSIX/GridFTP, OIDC, TOSCA, S3/CDMI/POSIX/WebDAV, GridFTP.]
*: see details in http://arxiv.org/abs/1603.09536 or in https://www.indigo-datacloud.eu/documents-deliverables
INDIGO Software Development Flow
[Flow diagram. Labels: Development (WP4, WP5, WP6); T3.1 Software quality assurance; T3.2 Software release and maintenance; T3.3 Pilot services; T3.4 Exploitation; Development infrastructure, Integration infrastructure, Preview infrastructure, External Service Providers; software delivery, software deployment, software use; WP2 Users, Application Use-cases; Production.]
The INDIGO Development and Integration Infrastructure
[Map of partner sites and the components hosted at each; labels are listed in extraction order, so the site-to-component grouping may not match the figure.]
DESY - dCache; PSNC - Indigokepler; CESNET - indigo-omt - rOCCI; KIT - CDMI-QoS; Cyfronet - TTS - Onedata; CERN; IFCA/CSIC - Kubernetes - OOI - Magnum - OPIE; CNAF/INFN - IAM; LIP/INCD - Oneprovider; INFN Bari - OpenNebula: ONEDock - Kubernetes - Nova-Docker - Mesos - FutureGateway; UPV - Chronos - IM - CLUES - TOSCA
The INDIGO Pilot Preview Testbed
Demos are performed in the preview testbed
[Map of partner sites and the components deployed at each; labels are listed in extraction order, so the site-to-component grouping may not match the figure.]
DESY - CDMI-QoS - dCache - OneData; KIT; INFN-Padova - Synergy - CDMI-QoS - OneData - OOI; LIP/INCD - OOI; IFCA/CSIC - IAM connector - ooi; CNAF/INFN - Nova-Docker - IAM connector - IAM - OS Identity Authentication - nova-docker - OneData library - OS Identity Authentication library - CDMI-QoS - ONEDock - java-syncrepos - Orchestrator - rOCCI server - CloudProviderRanker - TTS - Zabbix-wrapper - Java-syncrepos - SLAManager - Cloud-info-provider - CMDB - IM - OneData - FG API server - FG Portal; INFN-Bari - LiferayIAM - CDMI-QoS - Indigo Kepler; UPV - OneData - Ophidia - IM - Kubernetes: - Marathon - Chronos - Mesos
Resource requirements for LHC
[Two charts, each showing ATLAS, CMS, LHCb and ALICE across Run 1 to Run 4: "Computing power needs for LHC" and "Storage needs for LHC". One chart also carries a GRID label. Axis units are not recoverable.]