Dynamo: Amazon's Highly Available Key-value Store
Computing Platforms FS2021, ETH Zürich
Dynamo: Amazon's Highly Available Key-value Store
G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, W. Vogels (2007)
Presented by Jie Lou and Yanick Zengaffinen, D-INFK, 29.03.2021
Introduction – Problem
Effect of High Latency (Akamai Technologies Inc., 2017)
• «A 100-millisecond delay in website load time can hurt conversion rates by 7 percent»
• «A two-second delay in web page load time increases bounce rates by 103 percent»
https://www.akamai.com/uk/en/about/news/press/2017-press/akamai-releases-spring-2017-state-of-online-retail-performance-report.jsp
Image: https://yoursmallbusinessgrowth.com/are-your-sales-down-because-of-poor-decisions-made-months-ago/
Introduction – Situation & Goal
Amazon Ecosystem [2007]
• Decentralized
• Loosely coupled
• Service oriented (>100 services)
Requirements
• Low and predictable latency
• Always writeable
• Partition tolerance
• Scalability
• Minimal administration
https://dl.acm.org/action/downloadSupplement?doi=10.1145%2F1323293.1294281&file=p205-slides.zip&download=true
Introduction – Situation & Goal (continued)
• The same requirements appear outside Amazon as well, e.g. microservices at Netflix (image)
https://www.infoq.com/presentations/netflix-chaos-microservices/
Introduction – Alternatives
RDBMSs [2007]
• Overhead
  − Latency (the C in ACID)
  − Cost
  − Administration
• Don't scale out / poor load balancing
Simple Storage Service (S3)
• Overhead / high latency
• Designed for large objects
• No tight control over the trade-offs between
  − Availability
  − Consistency
  − Cost-effectiveness
  − Performance
Image: https://hazelcast.com/glossary/cap-theorem/
ClustrixDB paper: https://mariadb.com/wp-content/uploads/2018/10/Whitepaper-ANewApproachtoScaleOutRDBMS.pdf
Implementation – Overview
What is Dynamo?
• Key-value store
• Decentralized system of nodes
• Eventual consistency
• Others
  − Incremental scalability (add nodes)
  − Symmetry (all nodes the same)
  − Heterogeneity (servers differ)
Techniques
▪ Partitioning and replication – consistent hashing
▪ Consistency – object versioning
▪ Consistency among replicas – quorum and decentralized synchronization protocol
▪ Failure detection & membership – gossip based
Implementation – Interface
Interface
• get(key) -> (object, context) or (conflicting objects, context)
• put(key, context, value)
• context carries the versioning information (vector clocks)
Request routing
• The hash of the key determines the responsible nodes
• All nodes can accept any request
  − Requests arrive through a load balancer (adds latency!)
  − The receiving node forwards the request to the nodes responsible for the data
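A minimal sketch of that interface in Python. `DynamoClient`, the `store` object and its `lookup()`/`write()` methods are illustrative placeholders, not Amazon's actual client library; the point is only the shape of get/put and the role of the context.

```python
class DynamoClient:
    def __init__(self, store):
        self.store = store                      # stand-in for the ring of storage nodes

    def get(self, key):
        """Return all conflicting versions stored under `key`, plus an opaque context
        (the versions' vector clocks) that must be passed back on the next put()."""
        versions = self.store.lookup(key)       # list of (value, vector_clock) pairs
        context = [clock for _, clock in versions]
        return [value for value, _ in versions], context

    def put(self, key, context, value):
        """Write `value`; `context` tells the store which versions this write supersedes."""
        self.store.write(key, value, supersedes=context)
```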
Implementation – Partition and Replication
Partitioning – Consistent Hashing
Goal
• Dynamically partition the data over the set of nodes
Idea
• The output range of the hash function is mapped onto a ring
• Each node
  − gets a random position on the ring
  − is responsible for the region of the ring between it and its predecessor
• Departure or arrival of a node only affects its immediate neighbours
Problem
• Non-uniform data and load distribution
Figures: Consistent Hashing; Consistent Hashing – New Node
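A small sketch of the basic ring, assuming one token per node; node names and the choice of MD5 are illustrative, the paper does not prescribe them.

```python
import bisect, hashlib

def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        # Each node gets one (pseudo-)random position on the ring, derived from its name here.
        self.ring = sorted((ring_hash(n), n) for n in nodes)
        self.positions = [pos for pos, _ in self.ring]

    def owner(self, key: str) -> str:
        """Walk clockwise from the key's hash to the next node position; that node owns the key."""
        idx = bisect.bisect_right(self.positions, ring_hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["node-A", "node-B", "node-C"])
print(ring.owner("cart:1234"))   # adding or removing a node only remaps keys of its neighbours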
Implementation – Partition and Replication
1. Virtual Nodes
• Each physical node is assigned to multiple points on the ring (virtual nodes)
Advantages
• Better load balancing
  − Load is evenly dispersed on node failure
  − A new node takes load from all others
• The number of virtual nodes per host allows for heterogeneity
Problem
• Slow repartitioning
Figures: Virtual Nodes; Virtual Nodes – New Node
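The virtual-node variant of the previous sketch: every physical node places several tokens on the ring, and the token count per host is a knob for heterogeneity. The token counts below are made up.

```python
import bisect, hashlib

def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class VirtualNodeRing:
    def __init__(self, tokens_per_node):
        # tokens_per_node: {"A": 32, "B": 32, ...} -- stronger hosts get more tokens.
        self.ring = sorted(
            (ring_hash(f"{node}#{i}"), node)
            for node, tokens in tokens_per_node.items()
            for i in range(tokens)
        )
        self.positions = [pos for pos, _ in self.ring]

    def owner(self, key: str) -> str:
        idx = bisect.bisect_right(self.positions, ring_hash(key)) % len(self.ring)
        return self.ring[idx][1]

ring = VirtualNodeRing({"A": 32, "B": 32, "C": 64})   # C is twice as powerful
```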
Implementation – Partition and Replication
2. Fixed Arcs
Strategy
• Divide the ring into a fixed number of equally sized segments
• A new node adopts whole segments (virtual nodes) from existing nodes
Advantages
• Fast repartitioning
• Simple archival
• Segment boundaries are known in advance
  ➢ Less metadata
  ➢ Less overhead
Problem
• Limited scaling
Figures: Fixed Arcs; Fixed Arcs – New Node
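A sketch of the fixed-arc idea: the hash space is split into Q equal segments up front and whole segments are assigned to nodes. Q = 8, the 128-bit hash space and the assignment table are assumptions for illustration.

```python
Q = 8
SEGMENT_SIZE = 2**128 // Q           # assuming a 128-bit hash space, as with MD5

def segment_of(key_hash: int) -> int:
    return key_hash // SEGMENT_SIZE

# Segment -> node assignment; a joining node simply takes over whole segments.
assignment = {0: "A", 1: "B", 2: "C", 3: "A", 4: "B", 5: "C", 6: "A", 7: "B"}

def owner(key_hash: int) -> str:
    return assignment[segment_of(key_hash)]
```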
Implementation – Partition and Replication
Replication
• Data is replicated on the first N healthy hosts on the ring
• Preference list
  − Computable by every node
  − Contains more than N entries (to tolerate node failures)
  − Contains only distinct physical nodes
Figure: Replication
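A sketch of building such a preference list, assuming a ring object shaped like the `VirtualNodeRing` above and at least `n` distinct physical hosts; N = 3 is only an example value.

```python
import bisect

def preference_list(ring, key_hash: int, n: int = 3):
    """Walk clockwise from the key's position and collect tokens until `n` distinct physical
    nodes are found, skipping further tokens of hosts that are already in the list."""
    idx = bisect.bisect_right(ring.positions, key_hash) % len(ring.ring)
    nodes, seen = [], set()
    while len(nodes) < n:
        _, node = ring.ring[idx % len(ring.ring)]
        if node not in seen:              # keep only distinct physical nodes
            seen.add(node)
            nodes.append(node)
        idx += 1
    return nodes
```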
Implementation – Consistency
Consistency – Data Versioning
Example: shopping cart
• The most recent state is unavailable and the user makes a change => the change is still meaningful
• Old and divergent versions are reconciled later (e.g. carts are merged)
• Eventual consistency
• Vector clocks: lists of (node, counter, timestamp) entries
• Vector clocks are truncated in FIFO order when they grow too large (oldest entry removed first)
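A minimal sketch of vector clocks for data versioning, with a clock represented as a dict mapping node id to counter; the timestamps Dynamo keeps for truncation are omitted here for brevity.

```python
def descends(a: dict, b: dict) -> bool:
    """True if the version with clock `a` is a descendant of (supersedes) the one with clock `b`."""
    return all(a.get(node, 0) >= counter for node, counter in b.items())

def increment(clock: dict, node: str) -> dict:
    """Clock of a new version written at `node` on top of an existing version's clock."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new

def conflict(a: dict, b: dict) -> bool:
    """Neither clock descends from the other: the versions diverged and need reconciliation."""
    return not descends(a, b) and not descends(b, a)
```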
Implementation – Consistency
Vector Clock Example (Figure 3, Dynamo paper)
1. Client writes new object D1 [(Sx, 1)]
2. Client updates D1 to D2 [(Sx, 2)]
3. Sx goes down
4. Client updates D2 to D3, handled by Sy: [(Sx, 2), (Sy, 1)]
5. Sy goes down
6. Sx comes back online
7. Another client reads D2
8. Sx goes down
9. The other client updates D2 to D4, handled by Sz: [(Sx, 2), (Sz, 1)]
10. Sx, Sy come back online
11. Client reads the conflicting D3 and D4, reconciles them and writes D5 [(Sx, 3), (Sy, 1), (Sz, 1)]
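The same example replayed with plain dicts, so the conflict between D3 and D4 (and its resolution in D5) can be checked mechanically; the `descends` helper from the previous sketch is repeated so the snippet runs on its own.

```python
def descends(a: dict, b: dict) -> bool:
    return all(a.get(node, 0) >= counter for node, counter in b.items())

D1 = {"Sx": 1}
D2 = {"Sx": 2}                        # second update, again handled by Sx
D3 = {"Sx": 2, "Sy": 1}               # update of D2 handled by Sy
D4 = {"Sx": 2, "Sz": 1}               # concurrent update of D2 handled by Sz

assert descends(D2, D1)                                  # D2 supersedes D1
assert not descends(D3, D4) and not descends(D4, D3)     # D3 and D4 conflict

D5 = {"Sx": 3, "Sy": 1, "Sz": 1}      # reconciled version, written via Sx
assert descends(D5, D3) and descends(D5, D4)
```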
Implementation – Consistency Among Replicas
Consistency Among Replicas – Sloppy Quorum
• Replication over the first N healthy nodes of the preference list
• Write is successful if W nodes participate
  1. Coordinator generates the new version and writes it locally
  2. Sends it to the first N healthy nodes
  3. Success once W-1 of them respond (the coordinator's own write is the W-th)
• Read is successful if R nodes participate
  1. Coordinator gathers versions from the first N healthy nodes
  2. Success once R-1 of them respond (the coordinator's own copy is the R-th)
Problem
• Data spreads onto substitute nodes during failures
  => Hinted handoff: the substitute hands the data back to the intended node once it is online again
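A minimal sketch of the write-side quorum check; N, R, W and the `send_write` callback are assumptions for illustration. Choosing R + W > N (e.g. N=3, R=2, W=2) gives the quorum-like overlap between reads and writes described in the paper.

```python
N, R, W = 3, 2, 2

def coordinate_write(key, value, clock, healthy_nodes, send_write):
    """healthy_nodes: the first N reachable nodes of the preference list, coordinator first.
    send_write(node, key, value, clock) returns True on acknowledgement (any transport)."""
    acks = 1                                   # the coordinator's own local write counts
    for node in healthy_nodes[1:N]:
        if send_write(node, key, value, clock):
            acks += 1
    return acks >= W                           # enough replicas acknowledged -> report success
```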
Implementation – Consistency Among Replicas
Handling Long-term Failures
• Replicas need to be synchronized in the background
  => hash trees (a.k.a. Merkle trees), one per key range
Advantages
• Each branch can be checked individually
• Reduces the amount of data that has to be transferred
Disadvantages
• Key ranges change when a node joins or leaves
  => expensive recalculation of the trees
Image: https://de.wikipedia.org/wiki/Hash-Baum#/media/Datei:Hash_Tree.svg
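A rough sketch of such a hash tree: each leaf hashes one key's value, inner nodes hash their children, and two replicas only need to exchange subtrees whose hashes differ (the full recursive descent is omitted; only the cheap root comparison is shown). Assumes a non-empty key range.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_tree(items):
    """items: sorted, non-empty list of (key, value) pairs for one key range."""
    level = [h(f"{k}={v}".encode()) for k, v in items]
    tree = [level]
    while len(level) > 1:
        level = [h(level[i] + (level[i + 1] if i + 1 < len(level) else b""))
                 for i in range(0, len(level), 2)]
        tree.append(level)
    return tree                       # tree[-1][0] is the root hash

def ranges_differ(tree_a, tree_b) -> bool:
    """If the roots match, the whole key range is already in sync and no data is transferred."""
    return tree_a[-1][0] != tree_b[-1][0]
```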
Implementation – Membership and Failure Detection
Ring Membership
• A node outage ≠ permanent departure
  => explicit mechanism for adding/removing nodes
• Purely local failure detection (a node considers a peer failed if it does not respond)
• Nodes communicate changes in membership and status through a gossip-based protocol
Problem
• Temporary logical partitions of the ring
  => seed nodes, known to all members, with which every node eventually reconciles its view
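A toy sketch of one gossip round for membership, assuming each node keeps a version-stamped view and a `fetch_view` callback to ask a peer for its view; this is only the general shape of such a protocol, not Dynamo's exact message format.

```python
import random

def gossip_round(my_view: dict, peers: list, fetch_view):
    """my_view: {node_id: (status, version)}; fetch_view(peer) returns that peer's view.
    The entry with the higher version wins, so both sides converge on the newest status."""
    peer = random.choice(peers)                 # pick a random peer (occasionally a seed node)
    their_view = fetch_view(peer)
    for node, (status, version) in their_view.items():
        if node not in my_view or my_view[node][1] < version:
            my_view[node] = (status, version)
    return my_view
```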
Implementation – Summary
Summary (Table 1, Dynamo paper)
• Partitioning – consistent hashing – incremental scalability
• High availability for writes – vector clocks with reconciliation during reads – version size decoupled from update rate
• Temporary failures – sloppy quorum and hinted handoff – availability and durability while some replicas are unavailable
• Permanent failures – anti-entropy with Merkle trees – divergent replicas synchronized in the background
• Membership and failure detection – gossip-based protocol – preserves symmetry, no centralized registry
Results – Latency
Figure 4 (Dynamo paper): Average and 99.9th percentile latencies of read and write requests during the peak request season of December 2006. The intervals between consecutive ticks on the x-axis correspond to 12 hours. Latencies follow a diurnal pattern similar to the request rate, and the 99.9th percentile latencies are an order of magnitude higher than the averages.
Results – Load Balance
Figure 6 (Dynamo paper): Fraction of nodes that are out-of-balance (i.e., nodes whose request load is above a certain threshold from the average system load) and their corresponding request load. The interval between ticks on the x-axis corresponds to a time period of 30 minutes.
Results – Divergence
• Number of different versions returned per request, over a 24-hour period (data from the Dynamo paper)
  − 1 version:  99.94 % of requests
  − 2 versions: 0.00057 %
  − 3 versions: 0.00047 %
  − 4 versions: 0.00009 %
Critical Analysis – Paper
Paper
❖ Lots of aggregated data
❖ Imprecise description of the method behind some data (e.g. divergence)
❖ A few things lack detail (e.g. seed nodes)
❖ Lots of references to upcoming sections
✓ Still a very good paper ☺
Methodology
❖ Conflict of interest
✓ Considered lots of alternatives (2007)
✓ Iterative approach (e.g. partitioning)
✓ Smart combination of technologies that gives the system as a whole properties the individual nodes don't provide
Critical Analysis – Dynamo
Dynamo
✓ Meets the latency requirements
✓ (N, W, R) can be chosen to customize the trade-offs
✓ Incrementally scalable
✓ A single Dynamo instance might not scale indefinitely, but the concept does
❖ Does not support transactional semantics
❖ The programmer needs to do the reconciliation
❖ New model (different from ACID)
❖ Cannot iterate over data
❖ No data hierarchy
❖ No partial updates => not suited for large objects
❖ High operational complexity
=> Good for a very specific set of services
Dynamo's Legacy – Adoption
• Dynamo itself was not widely adopted (high operational complexity)
  − S3 and SimpleDB were popular (run as managed web services)
  − Engineers chose operational ease over technical fit
«Dynamo might have been the best technology in the world at the time but it was still software you had to run yourself. And nobody wanted to learn how to do that if they didn't have to. Ultimately, developers wanted a service.» – Werner Vogels, CTO Amazon
Interview: https://www.allthingsdistributed.com/2012/01/amazon-dynamodb.html
Dynamo's Legacy – Ancestors
• SimpleDB
  − Multi-data-center replication
  − High availability
  − Durable
  − No setting up, configuring or patching
  − High latency for big data
  − Limited scaling (10 GB per container) => workarounds
• DynamoDB
  − Combines the advantages of Dynamo and SimpleDB
  − Still in use, gaining popularity
https://insights.stackoverflow.com/survey/2020#most-popular-technologies
Discussion
Are there any questions?
Image: https://cdn.pixabay.com/photo/2015/11/03/08/56/question-mark-1019820_1280.jpg