CESSDA Expert Seminar 2018 - CESSDA Technical Infrastructure Session 2: Cloud Computing (part 1) - An introduction to the technical foundations of ...
CESSDA Expert Seminar 2018: CESSDA Technical Infrastructure. Session 2: Cloud Computing (part 1) - An introduction to the technical foundations of CESSDA. John Shepherdson, CESSDA Platform Delivery Director. 60 minutes.
CESSDA Technical Framework » A guide for the development of the various (software) tools and services that form part of the CESSDA Research Infrastructure » Promote good practice for software development » Infrastructure for Development, Staging and Production ○ Harmonise development tool chain for Service Providers (SPs) ○ Apply consistent set of tests » Stable, scalable deployment environment
Documents and Forms » Technical Architecture » User Experience Guide » API and Developer Guidelines » Software Maturity Levels form » Contributor’s Agreement form » Software Adoption » Repository Request form
Technical Architecture » Promote good software development practice across the Service Provider community, in respect of the provision of software artefacts for CESSDA Research Infrastructure » Publication of basic standards for source code quality so SPs know what is expected of them » bit.ly/tech_arch3_0
User Experience Guide » Describes the general user experience for CESSDA ERIC tools and search applications » Wireframes and visual examples are provided to illustrate functionality » bit.ly/tool_branding_1_5
‘How to’ Guidelines » API Design Guidelines: https://bitbucket.org/cessda/cessda.guidelines.api/wiki/Home » Developer CIT Guidelines: https://bitbucket.org/cessda/cessda.guidelines.cit/wiki/Developers
Software Maturity Levels » Approach for assessing maturity of software components ○ so CESSDA can mandate minimum levels that SPs and others have to meet ○ prerequisites for supplying software artefacts to CESSDA » bit.ly/sml_doc1
Software Maturity Levels - scoring » 1. Initial usability; software use is not recommended » 2. Use is feasible; the software can be used by skilled personnel but with considerable effort, cost and risk » 3. Use is possible by most users; with some effort, cost, and risk. A risk assessment should be made before use » 4. Software is usable; with little effort, cost, and risk » 5. Demonstrable usability; there is clear evidence that the software is widely used by many users
Software Maturity Levels form » Online mechanism for assessing 11 criteria: ○ Documentation, Intellectual property issues ○ Extensibility, Modularity, Packaging ○ Portability, Standards compliance, Support ○ Verification and testing, Security ○ Internationalisation and Localization » bit.ly/sml_2
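As a minimal sketch of how a mandated minimum level could be checked against the 11 criteria, the Python below flags any criterion scored under the minimum. The per-criterion comparison, the `failing_criteria` name and the example scores are assumptions for illustration; the actual scoring rules are those in the linked document and form.

```python
# Illustrative only: this check is an assumption, not CESSDA's published rule.
LEVELS = {
    1: "Initial usability",
    2: "Use is feasible",
    3: "Use is possible by most users",
    4: "Software is usable",
    5: "Demonstrable usability",
}

# The 11 criteria listed on the form slide.
CRITERIA = (
    "Documentation",
    "Intellectual property issues",
    "Extensibility",
    "Modularity",
    "Packaging",
    "Portability",
    "Standards compliance",
    "Support",
    "Verification and testing",
    "Security",
    "Internationalisation and Localization",
)


def failing_criteria(scores, minimum_level):
    """Return the criteria (in form order) scored below minimum_level."""
    missing = [name for name in CRITERIA if name not in scores]
    if missing:
        raise ValueError(f"unscored criteria: {missing}")
    for name in CRITERIA:
        if scores[name] not in LEVELS:
            raise ValueError(f"{name}: score must be 1-5, got {scores[name]}")
    return [name for name in CRITERIA if scores[name] < minimum_level]


# Hypothetical assessment: every criterion at level 4 except Security at 2.
example = {name: 4 for name in CRITERIA}
example["Security"] = 2
print(failing_criteria(example, minimum_level=3))
```

Checking each criterion separately, rather than averaging, keeps a single weak area (here, Security) from being hidden by strong scores elsewhere.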
Contributor’s Agreement form » For completion, prior to accessing CESSDA’s repositories » http://bit.ly/contrib_req
Software Adoption » Software Adoption Policy ○ bit.ly/sa_pol2 » Software Adoption Procedure ○ bit.ly/sa_proc2
Repository Request form » http://bit.ly/repo_req
Introduction to Cloud Computing » Definitions » Main players » GDPR considerations » Benefits to CESSDA
Definitions » ‘SaaS applications are designed for end-users, delivered over the web » PaaS is the set of tools and services designed to make coding and deploying those applications quick and efficient » IaaS is the hardware and software that powers it all – servers, storage, networks, operating systems’ Source: Rackspace Whitepaper ‘Understanding the Cloud Computing Stack: SaaS, PaaS, IaaS’, content licensed under CC BY-NC-ND 3.0
Main players » Wide choice of cloud provider platforms: ○ AWS ○ Azure ○ Google Cloud ○ IBM Cloud ○ and many more » See e.g. RightScale Cloud Comparison 2018
GDPR: Personal Data » GDPR defines ‘personal data’ as ‘any information relating to an identified or identifiable natural person’ (‘data subject’). » An identifiable natural person is defined as one ‘who can be identified, directly or indirectly, by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person.’ Source: GDPR Article 4(1)
GDPR: Processing Personal Data » Process lawfully, fairly and transparently ○ The data subject is informed of what will be done with the data, and data processing should be done accordingly » Keep to the original purpose ○ Data should be collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes » Minimise data size ○ Personal data that are collected should be adequate, relevant and limited to what is necessary
GDPR: What we collect » Present ○ User registration -> need clear ‘sign up’ statement » Future ○ Usage data -> Pseudonymisation may help » Google is ‘committed to GDPR’ and is Privacy Shield Certified ○ guide to aid compliance
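Pseudonymisation of usage data can be sketched with a keyed hash: the same user always maps to the same token, while someone without the key cannot practically match tokens back to candidate identifiers. This is one common technique, not necessarily the one CESSDA adopted; the function name, key and user IDs below are hypothetical.

```python
import hashlib
import hmac


def pseudonymise(user_id: str, secret_key: bytes) -> str:
    """Replace a user identifier with a stable keyed HMAC-SHA256 token."""
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key; in practice it would be stored separately from the usage data.
KEY = b"example-key-not-for-production"

token_a = pseudonymise("alice@example.org", KEY)
token_b = pseudonymise("bob@example.org", KEY)

# Usage records can now be grouped per token without storing the address itself.
print(token_a != token_b)
```

Pseudonymised data is still personal data under GDPR for as long as re-identification remains possible, so this reduces risk rather than removing obligations.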
Benefits to CESSDA » In the past, CESSDA relied on members (‘Service Providers’) to develop and host standalone products » Cloud is a greenfield site for technical development ○ elasticity ○ pay for what you use ○ establish common standards
Overview of Google Cloud Platform » Obtaining access » GCP dashboard » Main features » Pricing model
Obtaining access » By invitation ○ access restricted to essential users ○ temporary access for one-off activities
Google Cloud Platform Dashboard
Google Cloud Platform Features » Very extensive - see https://cloud.google.com/terms/services » Software networking » Containers - Docker and more » Clusters - Kubernetes ○ Auto scale/upgrade/repair
Google Cloud Platform Pricing » Pay as you go » Per second billing » Custom machine types » Rightsizing recommendations » Pricing calculator
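To see what per-second billing means in practice, the sketch below compares it with billing rounded up to whole hours. The hourly rate is a hypothetical figure, not a GCP price, and the one-minute minimum charge is modelled as a parameter (`minimum_seconds`); treat that value as an assumption to check against current pricing documentation.

```python
import math


def per_second_cost(seconds: int, hourly_rate: float, minimum_seconds: int = 60) -> float:
    """Cost when usage is billed per second, subject to a minimum charge."""
    if seconds <= 0:
        return 0.0
    billable = max(seconds, minimum_seconds)
    return billable * hourly_rate / 3600


def per_hour_cost(seconds: int, hourly_rate: float) -> float:
    """Cost when usage is rounded up to whole hours."""
    return math.ceil(seconds / 3600) * hourly_rate


RATE = 0.10  # hypothetical price per hour, in any currency

for run_seconds in (10, 1800, 3660):
    print(
        run_seconds,
        round(per_second_cost(run_seconds, RATE), 4),
        round(per_hour_cost(run_seconds, RATE), 4),
    )
```

The gap between the two models is largest for short-lived workloads such as build jobs, which is where pay-as-you-go pricing tends to matter most.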
Technical Infrastructure and GCP » Code Repositories » Development, staging and production environments » Containers and clusters » Management and monitoring
Code Repositories Code repositories in Bitbucket ● Organised into projects ○ CESSDA Architectural Guidelines - CAG ○ CESSDA Managed Content - CMC ○ CESSDA Operations - COPS ○ CESSDA Public Helpdesk - CPH ○ CESSDA Research Infrastructure - CRI ● Repository URLs ○ https://bitbucket.org/cessda/
Code Repositories ● Request access via form ● Specify who and what ● Agree to depositors’ conditions ● Admin creates Bitbucket repo(s) and accounts ● Devs check in code and add documentation
Multiple environments » Integration testing, user testing, go live ○ development has various tools ○ staging and production are very similar » Different subnets for each ○ different firewall rules
Multiple environments » Specify basic parameters per product:
    REGION=europe-west1
    ZONE=europe-west1-b
    PROJECT=cessda-development
    NET=jenkins-net
    SUBNET=jenkins-subnet
    PRODUCT=cessda-pasc
    # TO EDIT
    MODULE=certbot
    ENVIRONMENT=dev
    gcloud config set project "$PROJECT"
    gcloud config set compute/region "$REGION"
    gcloud config set compute/zone "$ZONE"
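The same parameters can be kept in one structure and turned into the three `gcloud config set` calls from the slide. A minimal Python sketch follows; the `gcloud_config_commands` helper is hypothetical, it only builds the argument lists, and running them (for example with `subprocess.run`) is left to the caller.

```python
def gcloud_config_commands(params: dict) -> list:
    """Build argv lists for the gcloud config calls; does not execute them."""
    region, zone = params["REGION"], params["ZONE"]
    # GCP zone names are the region name plus a letter suffix, e.g. europe-west1-b.
    if not zone.startswith(region + "-"):
        raise ValueError(f"zone {zone!r} is not in region {region!r}")
    settings = (
        ("project", params["PROJECT"]),
        ("compute/region", region),
        ("compute/zone", zone),
    )
    return [["gcloud", "config", "set", key, value] for key, value in settings]


# Values copied from the development example on the slide.
DEV = {
    "REGION": "europe-west1",
    "ZONE": "europe-west1-b",
    "PROJECT": "cessda-development",
}

for argv in gcloud_config_commands(DEV):
    print(" ".join(argv))
```

Keeping one such dictionary per environment makes the differences between development, staging and production explicit and reviewable.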
Containers » Containers are predictable, repeatable and immutable » Use Docker containers to run components ○ Working to 12 Factor App guidelines ○ Move from ‘monolithic’ to ‘composed’ apps ■ microservices (one app per container) ○ Maintain application environment ○ Version management and ease of reuse
Containers - basic vocabulary ● Container Image - file ● Container Image Format - as defined by Open Container Initiative (OCI) ● Container Engine - typically uses OCI compliant runtime like runc ● Container - runtime instantiation of a Container Image ● Container Host - system that runs the containerized processes ● Container Registry - storage space for Container Images ● Container Orchestration - dynamic scheduling of container workloads within a cluster of computers Source: A Practical Introduction to Container Terminology
Docker at a glance Source: Docker Reference Architecture: Designing Scalable, Portable Docker Container Networks, Mark Church
Containers: Average Start/Stop Times*
    Technology          Start Time   Stop Time
    Docker Containers   < 50 ms      < 50 ms
    Virtual Machines    30-45 sec    5-10 sec
* Source: https://www.slideshare.net/Flux7Labs/performance-of-docker-vs-vms
Docker and 12 Factor App Source: 12 Factor App with docker
Containers and Clusters Use Kubernetes to orchestrate containers » Provisions and manages underlying cloud resources automatically » Routine health checks detect and replace hung/crashed applications » Autoscaling (up and down) » Portable across clouds and on-premises
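One form of Kubernetes autoscaling, the Horizontal Pod Autoscaler, documents its replica calculation as desired = ceil(current replicas * current metric / target metric), clamped to configured bounds. The slide does not say which autoscaler is meant, so the sketch below is an illustration only; it ignores the tolerance band and stabilisation behaviour, and the metric values and bounds are hypothetical.

```python
import math


def desired_replicas(current, current_metric, target_metric, min_replicas=1, max_replicas=10):
    """Simplified Horizontal Pod Autoscaler replica calculation."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current * current_metric / target_metric)
    # Clamp to the configured bounds so scaling never runs away or reaches zero.
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical CPU utilisation figures (percent) against a 60% target.
print(desired_replicas(4, 90, 60))   # load above target: scale up
print(desired_replicas(4, 30, 60))   # load below target: scale down
```

The same ratio drives scaling in both directions, which is why a single target value is enough to describe the policy.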
Clusters - basic vocabulary » Kubernetes Master - collection of 3 processes that run on a single (master) node: ○ kube-apiserver, kube-controller-manager, kube-scheduler » Each non-master node in cluster runs two processes: ○ kubelet, which communicates with the Kubernetes Master ○ kube-proxy, network proxy which reflects Kubernetes networking services on each node Source: https://kubernetes.io/docs/concepts/
Clusters - basic vocabulary » Basic Kubernetes objects: ○ Pod - runs single instance of given application ○ Node - worker machine (virtual or physical) ○ ReplicaSet - create/destroy Pods dynamically (e.g. scaling up or down) ○ Deployment - manages ReplicaSets Source: https://kubernetes.io/docs/concepts/
Clusters - basic vocabulary » Basic Kubernetes objects: ○ Service - defines logical set of Pods plus access policy ○ Volume - file persistence and sharing ○ Label - K/V pair used to organize and to select subsets of objects ○ Namespace - multiple virtual clusters backed by the same physical cluster Source: https://kubernetes.io/docs/concepts/
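How a Service picks its ‘logical set of Pods’ via Labels can be sketched as equality-based selection: a Pod matches when every key/value pair in the selector is present in its labels. The Pod names and label values below are hypothetical, and the code models the matching rule only, not the Kubernetes API.

```python
def matches(labels: dict, selector: dict) -> bool:
    """Equality-based selection: every selector pair must appear in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(pods: dict, selector: dict) -> list:
    """Return the names of the pods whose labels satisfy the selector."""
    return sorted(name for name, labels in pods.items() if matches(labels, selector))


# Hypothetical pods, labelled by application and environment.
PODS = {
    "search-dev-1": {"app": "search", "env": "dev"},
    "search-prod-1": {"app": "search", "env": "prod"},
    "api-prod-1": {"app": "api", "env": "prod"},
}

print(select(PODS, {"app": "search"}))
print(select(PODS, {"app": "search", "env": "prod"}))
```

Because selection depends only on labels, Pods can be replaced or rescheduled without the Service definition changing.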
Kubernetes at a glance Source: Kubernetes in three diagrams, Tsuyoshi Ushio
Build, Test and Deploy Combination of Bitbucket and Jenkins » Commit code to Bitbucket repository » Post commit hook » Jenkins job
Jenkins Jobs » Continuous integration (CI) and continuous delivery (CD) application » Job (or project) is basic unit of work ○ build and test software projects continuously ○ monitor, backup, deploy, notify …. » Jenkins glossary
Jenkins Jobs » Old way - create via Jenkins UI ○ cannot version, need local backup/restore ○ difficult for a team to edit/review/iterate » New way - Jenkinsfile ○ just another source code file ○ manage via SCM system (such as Bitbucket)
Jenkins Jobs - Pipelines Automated expression of process for getting software from version control to users Source: Jenkins Pipeline documentation
Build and Deploy - standard view » Jenkins job - build CDC from ‘develop’ branch
Build and Deploy - Blue Ocean » Jenkins job - build CDC from ‘develop’ branch
Management and monitoring Combination of Jenkins, Stackdriver, UptimeRobot » Jenkins jobs - backups » Stackdriver - error reporting and logging » UptimeRobot - external polling
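External polling of the kind UptimeRobot provides can be approximated with a small availability check. This is a generic sketch using Python's standard library, not UptimeRobot's API; the `is_up` name and the injectable `opener` parameter (used so the check can be exercised without network access) are assumptions.

```python
import urllib.request


def is_up(url: str, opener=urllib.request.urlopen, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a 2xx or 3xx status within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:
        # URLError and HTTPError are OSError subclasses, so this covers DNS
        # failures, refused connections, timeouts and 4xx/5xx responses.
        return False
```

A scheduled job could call `is_up` for each public endpoint and raise an alert after several consecutive failures; polling from outside the cluster also catches network and DNS problems that internal health checks miss.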
Thanks for listening. Any questions?
Additional Slides
Common interoperability characteristics » CESSDA defines 5 Common Interoperability Characteristics (CICs), but how can they be achieved? • REST APIs with API design standards • Architectural standards • Common development environment • Adoption of 12 Factor App principles • Software acceptance criteria
1. Loosely coupled but coordinated Adopt microservices architecture based on RESTful web service APIs • provides a mechanism for reusing and combining software artefacts See also 12 factor app, number 7 (Port binding - Export services via port binding)
2. Sustainable The provision of common standards • Technical Architecture document Common development and test environment • via the technical infrastructure Deployment environment • via extensions to the technical infrastructure Central source-code repository See also 12 factor app, number 1 (Codebase - One codebase tracked in revision control, many deploys)
3. Extensible Service API is key • Integration point for new services • Combination point for building new features Version the API and support two versions simultaneously • Allows services to evolve without breaking the contract provided to consumers See also 12 factor app, number 8 (Concurrency - Scale out via the process model) See also 12 factor app, number 9 (Disposability - Maximize robustness with fast startup and graceful shutdown)
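Supporting two API versions side by side can be sketched as a dispatch table keyed by version, so existing consumers keep their contract while new ones adopt the changed response. The URL layout (`/v1/studies/<id>`), the `studies` resource and the response fields are hypothetical, chosen only to illustrate the idea.

```python
def get_study_v1(study_id: str) -> dict:
    """Original contract: a single title string."""
    return {"id": study_id, "title": "Example study"}


def get_study_v2(study_id: str) -> dict:
    """Newer contract: titles keyed by language code."""
    return {"id": study_id, "titles": {"en": "Example study"}}


# Both versions stay registered until consumers of v1 have migrated.
HANDLERS = {"v1": get_study_v1, "v2": get_study_v2}


def handle(path: str) -> dict:
    """Dispatch paths of the form /<version>/studies/<id>."""
    parts = path.strip("/").split("/")
    if len(parts) != 3 or parts[1] != "studies":
        raise ValueError(f"unrecognised path: {path}")
    version, _, study_id = parts
    handler = HANDLERS.get(version)
    if handler is None:
        raise LookupError(f"unsupported API version: {version}")
    return handler(study_id)


print(handle("/v1/studies/42"))
print(handle("/v2/studies/42"))
```

Retiring a version then amounts to removing one entry from `HANDLERS`, which keeps the change small and easy to review.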
4. Maintainable Again, service API is key • implementation of a service can be changed as required, to take advantage of developments in software technology • location of services can be changed as required, to take advantage of developments in hardware technology See also 12 factor app, number 2 (Dependencies - Explicitly declare and isolate dependencies)
5. Standards Based • Provision of common architectural standards (via Technical Architecture) • A consistent (in both the calling and return structures and formats) and versioned API See also 12 factor app, number 4 (Backing services - Treat backing services as attached resources)