Databases on AWS: How To Choose The Right Database
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Databases on AWS: How To Choose The Right Database Randall Hunt Markus Ostertag Dr. Sebastian Brandt Software Engineer at AWS CEO at Team Internet AG Senior Key Expert - Knowledge @jrhunt @Osterjour Graph at Siemens CT randhunt@amazon SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
DynamoDB DocumentDB Redis Aurora QLDB SQL Server PostgreSQL Elasticsearch Timestream Oracle DB2 MySQL MongoDB Neptune Access Cassandra 1970 1980 1990 2000 2010 SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Modern apps have modern requirements Users: 1 million+ Data volume: TB–PB–EB Locality: Global Performance: Milliseconds–microseconds Request rate: Millions per second Access: Web, Mobile, IoT, Devices Scale: Up-down, Out-in Economics: Pay for what you use Ride hailing Media streaming Social media Dating Developer access: No assembly required SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Common data categories and use cases Relational Key-value Document In-memory Graph Time-series Ledger Referential High Store Query by key Quickly and Collect, store, Complete, integrity, ACID throughput, low- documents and with easily create and process immutable, and transactions, latency reads quickly query microsecond and navigate data sequenced verifiable history schema- and writes, on any latency relationships by time of all changes to on-write endless scale attribute between application data data Lift and shift, ERP, Real-time bidding, Content Leaderboards, Fraud detection, IoT applications, Systems CRM, finance shopping cart, management, real-time analytics, social networking, event tracking of record, supply social, product personalization, caching recommendation chain, health care, catalog, customer mobile engine registrations, preferences financial SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS: Purpose-built databases Relational Key-value Document In-memory Graph Search Time-series Ledger Amazon RDS Amazon Amazon Amazon Amazon Amazon DynamoDB Amazon Amazon DocumentDB ElastiCache Neptune Elasticsearch Timestream Quantum Service Ledger Aurora Community Commercial Redis Memcached Database SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon Relational Database Service (RDS) Managed relational database service with a choice of six popular database engines Easy to administer Available and durable Highly scalable Fast and secure No need for infrastructure Automatic Multi-AZ data Scale database compute SSD storage and guaranteed provisioning, installing, and replication; automated backup, and storage with a few provisioned I/O; data maintaining DB software snapshots, failover clicks with no app encryption at rest and in downtime transit SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Traditional SQL • TCP based wire protocol INSERT INTO table1 • Well Known, lots of uses (id, first_name, last_name) • Common drivers (JDBC) VALUES (1, Randall’, Hunt’); • Frequently used with ORMs • Scale UP individual instances SELECT col1, col2, col3 • Scale OUT with read replicas FROM table1 • Sharding at application level WHERE col4 = 1 AND col5 = 2 • Lots of flavors but very similar GROUP BY col1 language HAVING count(*) > 1 • Joins ORDER BY col2 SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon Aurora MySQL and PostgreSQL-compatible relational database built for the cloud Performance and availability of commercial-grade databases at 1/10th the cost Performance Availability Highly secure Fully managed and scalability and durability 5x throughput of standard Fault-tolerant, self-healing Network isolation, Managed by RDS: MySQL and 3x of standard storage; six copies of data encryption at No hardware provisioning, PostgreSQL; scale-out up to across three Availability Zones; rest/transit software patching, setup, 15 read replicas continuous backup to Amazon S3 configuration, or backups SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
SQL vs NoSQL SQL NoSQL Optimized for storage Optimized for compute Normalized/relational Denormalized/hierarchical Ad hoc queries Instantiated views Scale vertically Scale horizontally Good for OLAP Built for OLTP at scale SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon DynamoDB Fast and flexible key value database service for any scale Comprehensive Global database for Performance at scale Serverless security global users and apps Consistent, single-digit No server provisioning, Encrypts all data by Build global applications with millisecond response times at software patching, or default and fully integrates fast access to local data by easily any scale; build applications with upgrades; scales up or down with AWS Identity and replicating tables across multiple virtually unlimited throughput automatically; continuously Access Management for AWS Regions backs up your data robust security SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon DynamoDB Data Structure Table Items Attributes Partition Sort All items for key Key Key ==, , >=,
DynamoDB Schema and Queries • Connects over HTTP import boto3 • Global Secondary Indexes and Local votes_table = Secondary Indexes boto3.resource('dynamodb').Table('votes') • Speed up queries with DAX resp = votes_table.update_item( • Global tables (mutli-region-multi- Key={'name': editor}, master) UpdateExpression="ADD votes :incr", • Transactions across multiple tables ExpressionAttributeValues={":incr": 1}, • Change Streams ReturnValues="ALL_NEW" • Rich query language with expressions ) • Provision read and write capacity units separately with or without autoscaling • Also supports pay per request model SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
DynamoDB Advancements over the last 22 months February 2017 April 2017 April 2017 June 2017 Time To VPC DynamoDB Auto Live (TTL) endpoints Accelerator (DAX) scaling November 2017 November 2017 November 2017 March 2018 On-demand Point-in-time Global tables Encryption at rest backup recovery June 2018 August 2018 November 2018 November 2018 Adaptive 99.999% SLA Transactions On-demand SLA capacity ACID SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon DocumentDB: Modern cloud-native architecture What would you do to improve scalability and availability? 1 2 3 Decouple Distribute data in Increase the compute and smaller partitions replication of storage data (6x)
Amazon DocumentDB Fast, scalable, and fully managed MongoDB-compatible database service Fast Scalable Fully managed MongoDB compatible Millions of requests per second Separation of compute and Managed by AWS: Compatible with MongoDB 3.6; with millisecond latency; twice storage enables both layers no hardware provisioning; use the same SDKs, tools, and the throughput of MongoDB to scale independently; auto patching, quick setup, applications with Amazon scale out to 15 read replicas secure, and automatic DocumentDB in minutes backups SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Document databases • Data is stored in JSON-like documents JSON documents are first-class objects • Documents map naturally of the database { to how humans model data id: 1, • Flexible schema and name: "sue", age: 26, indexing email: "sue@example.com", • Expressive query language promotions: ["new user", "5%", "dog lover"], memberDate: 2018-2-22, built for documents (ad shoppingCart: [ {product:"abc", quantity:2, cost:19.99}, hoc queries and {product:"edf", quantity:3, cost: 2.99} aggregations) ] } SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon ElastiCache Redis and Memcached compatible, in-memory data store and cache Redis & Memcached Extreme Secure and Easily scalable compatible performance reliable Fully compatible with open In-memory data store and Network isolation, Scale writes and reads with source Redis and Memcached cache for microsecond encryption at rest/transit, sharding and replicas response times HIPAA, PCI, FedRAMP, multi AZ, and automatic failover SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Redis • Redis Serialization Protocol over TCP (RESP) • Supports Strings, Hashes, Lists, Sets, and Sorted Sets • Simple commands for manipulating in memory data structures • Pub/Sub features • Supports clustering via partitions SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Memcached • Text and Binary protocols over TCP • Small number of commands: set, add, replace, append, prepend, cas, get, gets, delete, incr,decr • Client/Application based partitioning SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Relationships enable new applications Social networks Restaurant recommendations Retail fraud detection SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Use cases for highly connected data Social networking Recommendations Knowledge graphs Fraud detection Life Sciences Network & IT operations SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Different approaches for highly connected data Purpose-built to answer questions about Purpose-built for a business process relationships SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon Neptune Fully managed graph database Easy Open Fast Reliable Query billions of relationships Six replicas of your data across Build powerful queries Supports Apache TinkerPop & with millisecond latency three AZs with full backup and easily with Gremlin and W3C RDF graph models restore SPARQL SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
LEADING GRAPH MODELS AND FRAMEWORKS PROPERTY GRAPH RESOURCE DESCRIPTION FRAMEWORK (RDF) Open Source Apache TinkerPop™ W3C Standard Gremlin Traversal Language SPARQL Query Language SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon Neptune high level architecture Bulk load from S3 Database Mgmt. SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Gremlin g.addV('person').property(id, 1).property('name', 'randall') g.V('1').property(single, 'age', 27) g.addV('person').property(id, 2).property('name', 'markus') g.addE('knows').from(g.V('1')).to(g.V('2')).property('weight', 1.0) g.V().hasLabel('person') g.V().has('name', 'randall').out('knows').valueMap() http://tinkerpop.apache.org/docs/current/reference/#graph-traversal-steps SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
SPARQL and RDF https://www.w3.org/TR/sparql11-query/ SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
THE BENEF IT OF URIs: LINKED DATA Linking across datasets by referencing globally unique URIs Wikidata GeoNames PermID Example: PermID (re)uses as a global Identifier for the USA, which is an identifier rooted in GeoNames. SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon Timestream (sign up for the preview) Fast, scalable, fully managed time-series database 1,000x faster and 1/10th the Trillions of Time-series analytics Serverless cost of relational databases daily events Collect data at the rate of Adaptive query processing Built-in functions for Automated setup, millions of inserts per engine maintains steady, interpolation, smoothing, configuration, server second (10M/second) predictable performance and approximation provisioning, software patching SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon Quantum Ledger Database (QLDB) (Preview) Fully managed ledger database Track and verify history of all changes made to your application’s data Cryptographically Immutable verifiable Highly scalable Easy to use Maintains a sequenced record of Uses cryptography to Executes 2–3X as many Easy to use, letting you all changes to your data, which generate a secure output transactions than ledgers use familiar database cannot be deleted or modified; file of your data’s history in common blockchain capabilities like SQL APIs you have the ability to query and frameworks for querying the data analyze the full history SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS Database Migration Service (AWS DMS) Migrate between on-premises and AWS MIGRATING DATABASES Migrate between databases TO AWS Automated schema conversion 100,000+ databases migrated Data replication for zero-downtime migration SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
AWS DMS—Logical replication Customer Aurora Premises, VPN/Network PostgreSQL EC2, RDS Start a replication instance Let the AWS Database Migration Service create tables and load data Connect to source and target databases Uses change data capture to keep Application Users them in sync Select tables, schemas, or databases Switch applications over to the target at your convenience SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Customers are moving to AWS Databases Verizon is migrating over 1,000 business-critical applications and database backend systems to AWS, several of which also include the migration of production databases to Amazon Aurora. By December 2018, Amazon.com will have migrated 88% of their Oracle DBs (and 97% of critical system DBs) moved to Amazon Aurora and Amazon DynamoDB. They also migrated their 50 PB Oracle Data Warehouse to AWS (Amazon S3, Amazon Redshift, and Amazon EMR). Trimble migrated their Oracle databases to Amazon RDS and project they will pay about 1/4th of what they paid when managing their private infrastructure. Wappa migrated from their Oracle database to Amazon Aurora and improved their reporting time per user by 75 percent. Samsung Electronics migrated their Cassandra clusters to Amazon DynamoDB for their Samsung Cloud workload with 70% cost savings. Intuit migrated from Microsoft SQL Server to Amazon Redshift to reduce data-processing timelines and get insights to decision makers faster and more frequently. Equinox Fitness migrated its Teradata on-premises data warehouse to Amazon Redshift. They went from static reports to a modern data lake that delivers dynamic reports. Eventbrite moved from Cloudera to Amazon EMR and were able to cut costs dramatically, spinning clusters up/down on-demand and using Spot (saving > 80%) and Reserved Instances. SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Most enterprise database & analytics cloud customers SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Most startup database & analytics cloud customers SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Problem: Matching IPs to Ranges in Real-Time Bidding Real-Time Bidding Requirements: • High throughput (>25000 req/s) • Response time is critical • Scalability SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Possible choices Relational Key-value In-memory Well known, High throughput, Query by key Flexible Indices, low-latency with Query reads, microsecond language, endless scale latency Data scheme obvious Latency? Data scheme? Data scheme? Query volume? Indices? Sharding? Future scale? Sharding? Future scale? SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Possible choices Relational Key-value In-memory Well known, High throughput, Query by key Flexible Indices, low-latency with Query reads, microsecond language, endless scale latency Data scheme obvious Latency? Data scheme? Data scheme? Query volume? Indices? Sharding? Future scale? Sharding? Future scale? SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
DynamoDB? But how? • Sharding and Index downsides addressed by transforming data during ingest • Accept multiple entries per range -> sharding & query possible/optimized SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
DynamoDB it is! SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
DynamoDB it is! SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
DynamoDB it is! SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Learnings • Focus on “core requirements“ • Be creative about data schemes • Accept minor downsides – very often there is no perfect solution • Use upsides of chosen database to optimize (i.e. adding DAX for caching) SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Knowledge Graphs @ Siemens Sebastian Brandt Steffen Lamparter Semantics and Reasoning Group Siemens Corporate Technology CT RDA BAM SMR-DE February 2019 siemens.com Unrestricted © Siemens AG 2019
Knowledge graphs become especially powerful for managing complex queries and heterogeneous data Why Knowledge Graphs? Game-changing data integration • Graphs are a natural way to represent entities and their relationships Robust data quality assurance • Graphs can capture a broad spectrum of data (structured / unstructured) Intuitive domain modelling • Graphs can be managed efficiently Flexibility & performance Low up-front investment Dr. Sebastian Brandt, Dr. Steffen Lamparter, Siemens Corporate Technology Siemens AG 2019
What are Graphs? Knowledge representation formalism semantic descriptions of entities and their relationships June 23, 1957 Werner v. Siemens and Objects Friedrich Halske Real-world objects (things, places, Is born on Roesemarie Arnbruck people) and abstract concepts (genres, Was founded by Kaeser religions, professions) Is born in Is married to Person Is working at Is a Relationships Is a Joe Kaeser Logical connection between two objects Was promoted Company e.g. Joe Kaeser is born in Arnbruck Has pictures August 1, 2013 Is headquartered in Munich, Berlin Semantic descriptions The semantic description indicates the meaning of an object or relation, e.g. Joe Rules make it possible to add further expert knowledge, e.g. Kaeser is a person "Siemens has to be a company, as a person is working there" Dr. Sebastian Brandt, Dr. Steffen Lamparter, Siemens Corporate Technology Siemens AG 2019
Knowledge graphs become a powerful addition to traditional data warehouses for managing heterogeneous data with complex relations Knowledge graphs Traditional data warehouses low high Dimensions Limited relations across data, High number & complex of data relations e.g. time series data relations, e.g. social networks Variety of "We have a clear scope of low high "We do not yet know all the questions to be user questions" fixed set of different questions, users will answered dashboard needed ask" Chatbot Only few different data types low high Analysis needs to combine he- Data heterogeneity and data sources e.g. terogeneous data, e.g. sensor invoicing database in ERP data, text data, product data low high Need to handle imperfect, Ensured consistency of stored Quality of data incomplete or inconsistent data data Dr. Sebastian Brandt, Dr. Steffen Lamparter, Siemens Corporate Technology Siemens AG 2019
Use cases for knowledge graphs can be clustered into five categories – overview and use case examples Degree of complexity Data access & Recommender Constraints & Data quality Digital companion dashboarding system planning Improving data availability Maintaining up-to-date Enhancing features of Providing users high Enabling autonomous and quality by combining meta-data, creating existing products or quality recommendations systems and comparing data from transparency on all services with digital by identifying similarities to understand data and its various sources to fill in available data and companions that are able in historical data dependencies and take missing data sets or making them accessible to understand and own decisions, such as identify potentially wrong to users via queries process user questions autonomous planning of data and data duplicates and providing the needed production proces-ses data insights Dr. Sebastian Brandt, Dr. Steffen Lamparter, Siemens Corporate Technology Siemens AG 2019
Example: gas-turbine maintenance planning • Cross-life-cycle data integration at PS DO: Fleet browsing and search Maintenance planning Analytics for R&D gas-turbines, including maintenance, repair, monitoring, and configuration data • Holistic engineering-centric domain model • Intuitive graph queries independent of source data schemata • Virtual integration of time-series data Siemens Knowledge Graph DB DB Docs CSV, XML, Time-series Excel, … Power turbine Maintenance Repair Configuration Dr. Sebastian Brandt, Dr. Steffen Lamparter, Siemens Corporate Technology Siemens AG 2019
A semantic data model enables flexible linking and an integrated, intuitive API for applications Planning Simulation Building Security aaS Building Service Portal Location Ba- Drive tools tool Operations Performance sed Services applications BT Knowledge Semantic Data Model Graph Building API Product API Data API www Digital Lifecycle Platform Customer Data Building Structure Product Data MindSphere Weather Data BIM Public Energy Data Dr. Sebastian Brandt, Dr. Steffen Lamparter, Siemens Corporate Technology Siemens AG 2019
Creating perfect places based on Services – a user-centric holistic approach to the modern workplace … Customer Interest Relevant KPI’s Optimizing CAPEX and OPEX Energy and CO2 emissions asset efficiency Asset Performance/Useful Life Cost per space unit Workplace Utilization Space efficiency Revenue per space unit Vacancy Rate Employee productivity Individual efficiency and comfort Employee satisfaction Dr. Sebastian Brandt, Dr. Steffen Lamparter, Siemens Corporate Technology Siemens AG 2019
Industrializing Knowledge Graphs Industrial Knowledge Graph Relevant technologies R&D Areas ML for Decision Making Decision Making Decision Graphs • Reasoning and Constraint Solving • Explanation of AI decisions Making • Machine/Deep Learning • Data access: Semantic Search • Question Answering • Machine Learning on Graphs for Knowledge Storage and recommendations, quality, etc. Automation Integration Storage and Integration Storage and Integration Ontology • Graph/NoSQL databases • Reusable Semantic Modelling Library • Constraints and Rules and Knowledge Graphs Generation • Probabilistic programming • Data integration and cleaning (e.g. • Ontologies entity reconciliation) Simple RDF + Generation Generation PGM • NLP/Text understanding • Extraction from unstructured data • Machine/Deep Learning (inclusive text, audio, image) • Computer vision • Automatic semantic annotation • Sound recognition of structured data ML for • Virtual data Integration • Learning of domain-specific automated rules/patterns • Information retrieval annotation • … Machines Humans Dr. Sebastian Brandt, Dr. Steffen Lamparter, Siemens Corporate Technology Siemens AG 2019
Get in touch! Dr. Sebastian Brandt Senior Key Expert – Knowledge Graph and Data Management CT RDA BAM SMR-DE Dr. Steffen Lamparter Head of Research Group Semantics & Reasoning CT RDA BAM SMR-DE Siemens AG Corporate Technology Otto-Hahn-Ring 6 81739 München Germany E-mail steffen.lamparter@siemens.com Intranet intranet.siemens.com/ct Dr. Sebastian Brandt, Dr. Steffen Lamparter, Siemens Corporate Technology Siemens AG 2019
Thank you! SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
SUMMIT © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
You can also read