Dynamo: Amazon's Highly Available Key-value Store
Computing Platforms FS2021, ETH Zürich
Dynamo: Amazon's Highly Available Key-value Store
G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, W. Vogels (2007)
Presented by Jie Lou and Yanick Zengaffinen, D-INFK, 29.03.2021
Introduction – Problem
Effect of High Latency (Akamai Technologies Inc., 2017)
• «A 100-millisecond delay in website load time can hurt conversion rates by 7 percent»
• «A two-second delay in web page load time increases bounce rates by 103 percent»
https://www.akamai.com/uk/en/about/news/press/2017-press/akamai-releases-spring-2017-state-of-online-retail-performance-report.jsp
Image: https://yoursmallbusinessgrowth.com/are-your-sales-down-because-of-poor-decisions-made-months-ago/
Introduction – Situation & Goal
Amazon Ecosystem [2007]
• Decentralized
• Loosely coupled
• Service oriented (>100 services)
Requirements
• Low and predictable latency
• Always writeable
• Partition tolerance
• Scalability
• Minimal administration
https://dl.acm.org/action/downloadSupplement?doi=10.1145%2F1323293.1294281&file=p205-slides.zip&download=true
Introduction – Situation & Goal (continued)
• The same requirements appear outside Amazon as well, e.g. microservices at Netflix (image)
https://www.infoq.com/presentations/netflix-chaos-microservices/
Introduction – Alternatives
RDBMSs [2007]
• Overhead
  − Latency (the C in ACID)
  − Cost
  − Administration
• Don't scale out / poor load balancing
Simple Storage Service (S3)
• Overhead / high latency
• Designed for large objects
• No tight control over the trade-offs between
  − Availability
  − Consistency
  − Cost-effectiveness
  − Performance
Image: https://hazelcast.com/glossary/cap-theorem/
ClustrixDB paper: https://mariadb.com/wp-content/uploads/2018/10/Whitepaper-ANewApproachtoScaleOutRDBMS.pdf
Implementation – Overview
What is Dynamo?
• Key-value store
• Decentralized system of nodes
• Eventual consistency
• Others
  − Incremental scalability (add nodes)
  − Symmetry (all nodes the same)
  − Heterogeneity (servers differ)
Techniques
▪ Partitioning and replication – consistent hashing
▪ Consistency – object versioning
▪ Consistency among replicas – quorum and decentralized synchronization protocol
▪ Failure detection & membership – gossip based
Implementation – Interface
Interface
• get(key) -> (object, context) or (conflicting objects, context)
• put(key, context, value)
• context carries the versioning information (vector clocks)
Request routing
• The hash of the key determines the responsible nodes
• All nodes can accept any request
  − Requests arrive through a load balancer (adds latency!)
  − The receiving node forwards the request to the nodes responsible for the data
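A minimal sketch of that interface in Python. `DynamoClient`, the `store` object and its `lookup()`/`write()` methods are illustrative placeholders, not Amazon's actual client library; the point is only the shape of get/put and the role of the context.

```python
class DynamoClient:
    def __init__(self, store):
        self.store = store                      # stand-in for the ring of storage nodes

    def get(self, key):
        """Return all conflicting versions stored under `key`, plus an opaque context
        (the versions' vector clocks) that must be passed back on the next put()."""
        versions = self.store.lookup(key)       # list of (value, vector_clock) pairs
        context = [clock for _, clock in versions]
        return [value for value, _ in versions], context

    def put(self, key, context, value):
        """Write `value`; `context` tells the store which versions this write supersedes."""
        self.store.write(key, value, supersedes=context)
```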
Implementation – Partition and Replication
Partitioning – Consistent Hashing
Goal
• Dynamically partition the data over the set of nodes
Idea
• The output range of the hash function is mapped onto a ring
• Each node
  − gets a random position on the ring
  − is responsible for the region of the ring between it and its predecessor
• Departure or arrival of a node only affects its immediate neighbours
Problem
• Non-uniform data and load distribution
Figures: Consistent Hashing; Consistent Hashing – New Node
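A small sketch of the basic ring, assuming one token per node; node names and the choice of MD5 are illustrative, the paper does not prescribe them.

```python
import bisect, hashlib

def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        # Each node gets one (pseudo-)random position on the ring, derived from its name here.
        self.ring = sorted((ring_hash(n), n) for n in nodes)
        self.positions = [pos for pos, _ in self.ring]

    def owner(self, key: str) -> str:
        """Walk clockwise from the key's hash to the next node position; that node owns the key."""
        idx = bisect.bisect_right(self.positions, ring_hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-A", "node-B", "node-C"])
print(ring.owner("cart:1234"))   # adding or removing a node only remaps keys of its neighbours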
Implementation – Partition and Replication
1. Virtual Nodes
• Each physical node is assigned to multiple points on the ring (virtual nodes)
Advantages
• Better load balancing
  − Load is evenly dispersed on node failure
  − A new node takes load from all others
• The number of virtual nodes per host allows for heterogeneity
Problem
• Slow repartitioning
Figures: Virtual Nodes; Virtual Nodes – New Node
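The virtual-node variant of the previous sketch: every physical node places several tokens on the ring, and the token count per host is a knob for heterogeneity. The token counts below are made up.

```python
import bisect, hashlib

def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class VirtualNodeRing:
    def __init__(self, tokens_per_node):
        # tokens_per_node: {"A": 32, "B": 32, ...} -- stronger hosts get more tokens.
        self.ring = sorted(
            (ring_hash(f"{node}#{i}"), node)
            for node, tokens in tokens_per_node.items()
            for i in range(tokens)
        )
        self.positions = [pos for pos, _ in self.ring]

    def owner(self, key: str) -> str:
        idx = bisect.bisect_right(self.positions, ring_hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = VirtualNodeRing({"A": 32, "B": 32, "C": 64})   # C is twice as powerful
```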
Implementation – Partition and Replication
2. Fixed Arcs
Strategy
• Divide the ring into a fixed number of equally sized segments
• A new node adopts whole segments (virtual nodes) from existing nodes
Advantages
• Fast repartitioning
• Simple archival
• Segment boundaries are known in advance
  ➢ Less metadata
  ➢ Less overhead
Problem
• Limited scaling
Figures: Fixed Arcs; Fixed Arcs – New Node
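A sketch of the fixed-arc idea: the hash space is split into Q equal segments up front and whole segments are assigned to nodes. Q = 8, the 128-bit hash space and the assignment table are assumptions for illustration.

```python
Q = 8
SEGMENT_SIZE = 2**128 // Q           # assuming a 128-bit hash space, as with MD5

def segment_of(key_hash: int) -> int:
    return key_hash // SEGMENT_SIZE

# Segment -> node assignment; a joining node simply takes over whole segments.
assignment = {0: "A", 1: "B", 2: "C", 3: "A", 4: "B", 5: "C", 6: "A", 7: "B"}

def owner(key_hash: int) -> str:
    return assignment[segment_of(key_hash)]
```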
Implementation – Partition and Replication
Replication
• Data is replicated on the first N healthy hosts on the ring
• Preference list
  − Computable by every node
  − Contains more than N entries (to tolerate node failures)
  − Contains only distinct physical nodes
Figure: Replication
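A sketch of building such a preference list, assuming a ring object shaped like the `VirtualNodeRing` above and at least `n` distinct physical hosts; N = 3 is only an example value.

```python
import bisect

def preference_list(ring, key_hash: int, n: int = 3):
    """Walk clockwise from the key's position and collect tokens until `n` distinct physical
    nodes are found, skipping further tokens of hosts that are already in the list."""
    idx = bisect.bisect_right(ring.positions, key_hash) % len(ring.ring)
    nodes, seen = [], set()
    while len(nodes) < n:
        _, node = ring.ring[idx % len(ring.ring)]
        if node not in seen:              # keep only distinct physical nodes
            seen.add(node)
            nodes.append(node)
        idx += 1
    return nodes
```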
Implementation – Consistency
Consistency – Data Versioning
Example: shopping cart
• The most recent state is unavailable and the user makes a change => the change is still meaningful
• Old and divergent versions are reconciled later (e.g. carts are merged)
• Eventual consistency
• Vector clocks: lists of (node, counter, timestamp) entries
• Vector clocks are truncated in FIFO order when they grow too large (oldest entry removed first)
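A minimal sketch of vector clocks for data versioning, with a clock represented as a dict mapping node id to counter; the timestamps Dynamo keeps for truncation are omitted here for brevity.

```python
def descends(a: dict, b: dict) -> bool:
    """True if the version with clock `a` is a descendant of (supersedes) the one with clock `b`."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())

def increment(clock: dict, node: str) -> dict:
    """Clock of a new version written at `node` on top of an existing version's clock."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new

def conflict(a: dict, b: dict) -> bool:
    """Neither clock descends from the other: the versions diverged and need reconciliation."""
    return not descends(a, b) and not descends(b, a)
```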
Implementation – Consistency
Vector Clock Example (Figure 3, Dynamo paper)
1. Client writes new object D1 [(Sx, 1)]
2. Client updates D1 to D2 [(Sx, 2)]
3. Sx goes down
4. Client updates D2 to D3, handled by Sy: [(Sx, 2), (Sy, 1)]
5. Sy goes down
6. Sx comes back online
7. Another client reads D2
8. Sx goes down
9. The other client updates D2 to D4, handled by Sz: [(Sx, 2), (Sz, 1)]
10. Sx, Sy come back online
11. Client reads the conflicting D3 and D4, reconciles them and writes D5 [(Sx, 3), (Sy, 1), (Sz, 1)]
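The same example replayed with plain dicts, so the conflict between D3 and D4 (and its resolution in D5) can be checked mechanically; the `descends` helper from the previous sketch is repeated so the snippet runs on its own.

```python
def descends(a: dict, b: dict) -> bool:
    return all(a.get(node, 0) >= counter for node, counter in b.items())

D1 = {"Sx": 1}
D2 = {"Sx": 2}                        # second update, again handled by Sx
D3 = {"Sx": 2, "Sy": 1}               # update of D2 handled by Sy
D4 = {"Sx": 2, "Sz": 1}               # concurrent update of D2 handled by Sz

assert descends(D2, D1)                                  # D2 supersedes D1
assert not descends(D3, D4) and not descends(D4, D3)     # D3 and D4 conflict

D5 = {"Sx": 3, "Sy": 1, "Sz": 1}      # reconciled version, written via Sx
assert descends(D5, D3) and descends(D5, D4)
```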
Implementation – Consistency Among Replicas
Consistency Among Replicas – Sloppy Quorum
• Replication over the first N healthy nodes of the preference list
• Write is successful if W nodes participate
  1. Coordinator generates the new version and writes it locally
  2. Sends it to the first N healthy nodes
  3. Success once W-1 of them respond (the coordinator's own write is the W-th)
• Read is successful if R nodes participate
  1. Coordinator gathers versions from the first N healthy nodes
  2. Success once R-1 of them respond (the coordinator's own copy is the R-th)
Problem
• Data spreads onto substitute nodes during failures
  => Hinted handoff: the substitute hands the data back to the intended node once it is online again
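A minimal sketch of the write-side quorum check; N, R, W and the `send_write` callback are assumptions for illustration. Choosing R + W > N (e.g. N=3, R=2, W=2) gives the quorum-like overlap between reads and writes described in the paper.

```python
N, R, W = 3, 2, 2

def coordinate_write(key, value, clock, healthy_nodes, send_write):
    """healthy_nodes: the first N reachable nodes of the preference list, coordinator first.
    send_write(node, key, value, clock) returns True on acknowledgement (any transport)."""
    acks = 1                                   # the coordinator's own local write counts
    for node in healthy_nodes[1:N]:
        if send_write(node, key, value, clock):
            acks += 1
    return acks >= W                           # enough replicas acknowledged -> report success
```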
Implementation – Consistency Among Replicas
Handling Long-term Failures
• Replicas need to be synchronized in the background
  => hash trees (a.k.a. Merkle trees), one per key range
Advantages
• Each branch can be checked individually
• Reduces the amount of data that has to be transferred
Disadvantages
• Key ranges change when a node joins or leaves
  => expensive recalculation of the trees
Image: https://de.wikipedia.org/wiki/Hash-Baum#/media/Datei:Hash_Tree.svg
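A rough sketch of such a hash tree: each leaf hashes one key's value, inner nodes hash their children, and two replicas only need to exchange subtrees whose hashes differ (the full recursive descent is omitted; only the cheap root comparison is shown). Assumes a non-empty key range.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(items):
    """items: sorted, non-empty list of (key, value) pairs for one key range."""
    level = [h(f"{k}={v}".encode()) for k, v in items]
    tree = [level]
    while len(level) > 1:
        level = [h(level[i] + (level[i + 1] if i + 1 < len(level) else b""))
                 for i in range(0, len(level), 2)]
        tree.append(level)
    return tree                       # tree[-1][0] is the root hash

def ranges_differ(tree_a, tree_b) -> bool:
    """If the roots match, the whole key range is already in sync and no data is transferred."""
    return tree_a[-1][0] != tree_b[-1][0]
```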
Implementation – Membership and Failure Detection
Ring Membership
• A node outage ≠ permanent departure
  => explicit mechanism for adding/removing nodes
• Purely local failure detection (a node considers a peer failed if it does not respond)
• Nodes communicate changes in membership and status through a gossip-based protocol
Problem
• Temporary logical partitions of the ring
  => seed nodes, known to all members, with which every node eventually reconciles its view
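A toy sketch of one gossip round for membership, assuming each node keeps a version-stamped view and a `fetch_view` callback to ask a peer for its view; this is only the general shape of such a protocol, not Dynamo's exact message format.

```python
import random

def gossip_round(my_view: dict, peers: list, fetch_view):
    """my_view: {node_id: (status, version)}; fetch_view(peer) returns that peer's view.
    The entry with the higher version wins, so both sides converge on the newest status."""
    peer = random.choice(peers)                 # pick a random peer (occasionally a seed node)
    their_view = fetch_view(peer)
    for node, (status, version) in their_view.items():
        if node not in my_view or my_view[node][1] < version:
            my_view[node] = (status, version)
    return my_view
```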
Implementation – Summary
Summary (Table 1, Dynamo paper)
• Partitioning – consistent hashing – incremental scalability
• High availability for writes – vector clocks with reconciliation during reads – version size decoupled from update rate
• Temporary failures – sloppy quorum and hinted handoff – availability and durability while some replicas are unavailable
• Permanent failures – anti-entropy with Merkle trees – divergent replicas synchronized in the background
• Membership and failure detection – gossip-based protocol – preserves symmetry, no centralized registry
Results – Latency
Figure 4 (Dynamo paper): Average and 99.9th percentile latencies of read and write requests during the peak request season of December 2006. The intervals between consecutive ticks on the x-axis correspond to 12 hours. Latencies follow a diurnal pattern similar to the request rate, and the 99.9th percentile latencies are an order of magnitude higher than the averages.
Results – Load Balance
Figure 6 (Dynamo paper): Fraction of nodes that are out-of-balance (i.e., nodes whose request load is above a certain threshold from the average system load) and their corresponding request load. The interval between ticks on the x-axis corresponds to a time period of 30 minutes.
Results – Divergence
• Number of different versions returned per request, over a 24-hour period (data from the Dynamo paper)
  − 1 version:  99.94 % of requests
  − 2 versions: 0.00057 %
  − 3 versions: 0.00047 %
  − 4 versions: 0.00009 %
Critical Analysis – Paper
Paper
❖ Lots of aggregated data
❖ Imprecise description of the method behind some data (e.g. divergence)
❖ A few things lack detail (e.g. seed nodes)
❖ Lots of references to upcoming sections
✓ Still a very good paper ☺
Methodology
❖ Conflict of interest
✓ Considered lots of alternatives (2007)
✓ Iterative approach (e.g. partitioning)
✓ Smart combination of technologies that gives the system as a whole properties the individual nodes don't provide
Critical Analysis – Dynamo
Dynamo
✓ Meets the latency requirements
✓ (N, W, R) can be chosen to customize the trade-offs
✓ Incrementally scalable
✓ A single Dynamo instance might not scale indefinitely, but the concept does
❖ Does not support transactional semantics
❖ The programmer needs to do the reconciliation
❖ New model (different from ACID)
❖ Cannot iterate over data
❖ No data hierarchy
❖ No partial updates => not suited for large objects
❖ High operational complexity
=> Good for a very specific set of services
Dynamo's Legacy – Adoption
• Dynamo itself was not widely adopted (high operational complexity)
  − S3 and SimpleDB were popular (run as managed web services)
  − Engineers chose operational ease over technical fit
«Dynamo might have been the best technology in the world at the time but it was still software you had to run yourself. And nobody wanted to learn how to do that if they didn't have to. Ultimately, developers wanted a service.» – Werner Vogels, CTO Amazon
Interview: https://www.allthingsdistributed.com/2012/01/amazon-dynamodb.html
Dynamo's Legacy – Ancestors
• SimpleDB
  − Multi-data-center replication
  − High availability
  − Durable
  − No setting up, configuring or patching
  − High latency for big data
  − Limited scaling (10 GB per container) => workarounds
• DynamoDB
  − Combines the advantages of Dynamo and SimpleDB
  − Still in use, gaining popularity
https://insights.stackoverflow.com/survey/2020#most-popular-technologies
Discussion
Are there any questions?
Image: https://cdn.pixabay.com/photo/2015/11/03/08/56/question-mark-1019820_1280.jpg