PNUTS: Yahoo!'s Hosted Data Serving Platform - Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon ...

PNUTS: Yahoo!’s Hosted Data Serving Platform
Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava,
Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen,
Nick Puz, Daniel Weaver and Ramana Yerneni
What do web applications need from a DBMS?

v Web applications need:
   v Scalability
   v Geographic scope
   v High availability

v Web applications typically have:
   v Simplified query needs
        v No joins or aggregations

   v Relaxed consistency needs
        v Applications can tolerate stale or reordered data
PNUTS
v PNUTS, a massively parallel and geographically
   distributed database system for Yahoo!’s web
   applications
Outline
v Architecture

v Experiment

v Future work

v Critique
Architecture
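[Figure: PNUTS architecture within a region: clients reach the storage units through the REST API and the routers (the data-path components); the tablet controller and the Yahoo! Message Broker sit beside the data path]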
Ordered table
Rows sorted by primary key (Name); tablet boundaries at A, H, P and Z

 Tablet    Name         Price
 [A, H)    Apple        1.2
           Avocado      2.0
           Banana       1.0
           Grape        2.5
 [H, P)    Kiwi         2.0
           Orange       0.8
 [P, Z]    Strawberry   1.2
           Watermelon   1.0
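A minimal Python sketch of lookups in an ordered table, assuming each tablet is identified by its lower boundary key (A, H and P in the example) and that keys compare as plain strings. The tablet names are hypothetical labels, not PNUTS identifiers. Because rows are kept in key order, a range scan only has to read a contiguous run of rows.

```python
import bisect

# Hypothetical tablet layout from the example: tablet i covers the keys from
# TABLET_STARTS[i] up to (but not including) TABLET_STARTS[i + 1].
TABLET_STARTS = ["A", "H", "P"]
TABLET_NAMES = ["tablet-A-H", "tablet-H-P", "tablet-P-Z"]


def tablet_for_key(key: str) -> str:
    """Binary-search the boundaries for the tablet whose interval holds key."""
    i = bisect.bisect_right(TABLET_STARTS, key) - 1
    if i < 0:
        raise KeyError(f"{key!r} sorts before the first tablet boundary")
    return TABLET_NAMES[i]


def range_scan(rows, lo, hi):
    """Return rows with lo <= key < hi; rows must be sorted by key."""
    keys = [k for k, _ in rows]
    return rows[bisect.bisect_left(keys, lo):bisect.bisect_left(keys, hi)]


rows = [("Apple", 1.2), ("Avocado", 2.0), ("Banana", 1.0), ("Grape", 2.5),
        ("Kiwi", 2.0), ("Orange", 0.8), ("Strawberry", 1.2), ("Watermelon", 1.0)]
print(tablet_for_key("Kiwi"))       # tablet-H-P
print(range_scan(rows, "B", "L"))   # the Banana, Grape and Kiwi rows
```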
Hash table
Each Name is passed through a hash function; tablet boundaries in hash space at 0x0000, 0x132A and 0x2C3F

 Hash range          Name         Price
 [0x0000, 0x132A)    Grape        2.5
                     Apple        1.2
                     Banana       1.0
 [0x132A, 0x2C3F)    Watermelon   1.0
                     Kiwi         2.0
                     Orange       0.8
 from 0x2C3F         Avocado      2.0
                     Strawberry   1.2
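A sketch of hash partitioning in Python, using the slide's tablet boundaries (0x0000, 0x132A, 0x2C3F). The slide does not name the hash function, so MD5 truncated to 16 bits is an assumption made here, as is the 16-bit hash space in which the last tablet runs up to 0xFFFF.

```python
import bisect
import hashlib

# Tablet boundaries in hash space, from the example. Assumption: hashes are
# 16-bit, so the last tablet covers 0x2C3F up to 0xFFFF.
HASH_STARTS = [0x0000, 0x132A, 0x2C3F]


def key_hash(key: str) -> int:
    """Hypothetical 16-bit hash (first two bytes of MD5), not PNUTS's function."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big")


def tablet_for_hash(h: int) -> int:
    """Index of the tablet whose hash interval contains h."""
    return bisect.bisect_right(HASH_STARTS, h) - 1


def tablet_for_key(key: str) -> int:
    return tablet_for_hash(key_hash(key))


for name in ["Apple", "Kiwi", "Watermelon"]:
    print(name, hex(key_hash(name)), tablet_for_key(name))
```

With a different hash function the example rows land in different tablets than on the slide. Hashing spreads keys evenly, but neighboring keys no longer share a tablet, so a range query has to visit every tablet.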
Flexible schema
v Arbitrary structures are allowed

v New attributes can be added at any time
   v Records are not required to have values for all attributes

[Figure: the schema growing over time: (Name, Price), then (Name, Price, Description), then (Name, Price, Country, Description)]
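To illustrate the flexible schema, here is a minimal Python sketch that stores each record as a dictionary keyed by attribute name. It models the idea only, not PNUTS's storage format, and the Description and Country values are made up.

```python
# One table, keyed by Name. Each record carries only the attributes it has.
fruit = {
    "Apple": {"Price": 1.2},
    "Kiwi": {"Price": 2.0, "Description": "green flesh"},  # made-up value
}


def set_attribute(table, key, attribute, value):
    """Add or overwrite one attribute of one record; no schema change needed."""
    table.setdefault(key, {})[attribute] = value


def get_attribute(table, key, attribute):
    """A record that lacks the attribute simply yields None."""
    return table.get(key, {}).get(attribute)


set_attribute(fruit, "Apple", "Country", "Chile")   # new attribute, made-up value
print(get_attribute(fruit, "Apple", "Country"))     # Chile
print(get_attribute(fruit, "Kiwi", "Country"))      # None
```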
Architecture
v Routers
   v Which table -> which tablet -> which SU
   v Contains a cached copy of the interval mapping

v Tablet controller
   v Owns the mapping
   v Decides when to move tablet, split tablet
   v Not a bottleneck
Tablet Splitting & Balancing
v Tables are horizontally partitioned into tablets
v A storage unit may become a hotspot
v Tablets may grow over time
v Shed load by moving tablets to other servers
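A toy Python sketch of the two maintenance actions on this slide: splitting a tablet that has grown too large, and moving a tablet off an overloaded storage unit. MAX_TABLET_SIZE, the median split point, and the "move the tablet that best evens out the two units" rule are assumptions for illustration, not the tablet controller's actual policy.

```python
MAX_TABLET_SIZE = 4   # hypothetical limit on records per tablet


def split_if_large(tablet):
    """Split a sorted list of keys at its median once it exceeds the limit."""
    if len(tablet) <= MAX_TABLET_SIZE:
        return [tablet]
    mid = len(tablet) // 2
    return [tablet[:mid], tablet[mid:]]


def rebalance(assignment, load):
    """Move one tablet from the busiest to the idlest storage unit.

    assignment: storage unit -> list of tablet ids (modified in place)
    load:       tablet id -> request rate
    Returns the moved tablet id, or None if no move lowers the peak load.
    """
    def unit_load(unit):
        return sum(load[t] for t in assignment[unit])

    hot = max(assignment, key=unit_load)
    cold = min(assignment, key=unit_load)
    if hot == cold or not assignment[hot]:
        return None
    hot_load, cold_load = unit_load(hot), unit_load(cold)

    def peak_after(t):
        return max(hot_load - load[t], cold_load + load[t])

    best = min(assignment[hot], key=peak_after)
    if peak_after(best) >= hot_load:
        return None
    assignment[hot].remove(best)
    assignment[cold].append(best)
    return best


print(split_if_large(["Apple", "Avocado", "Banana", "Grape", "Kiwi"]))
units = {"su1": ["t1", "t2"], "su2": ["t3"]}
print(rebalance(units, {"t1": 50, "t2": 30, "t3": 10}), units)
```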
Architecture - YMB
v Yahoo! Message Broker (YMB)
   v Topic-based pub/sub system
   v Data is considered ‘committed’ when they have been
      published to YMB.

   v Only partial ordering of published messages
Consistency model
v Per-record timeline consistency

v Per-record mastering
   v Each record is assigned a “master region”
   v May differ between records
   v Updates to the record forwarded to the master region
   v Ensures consistent ordering of updates
Consistency model - API
v Read-any

v Read-critical(required_version)

v Read-latest

v Write

v Test-and-set-write(required_version)
Consistency model
[Figure: timeline of one record, inserted at v.1 and updated through v.8, all within Generation 1; earlier versions are stale and v.8 is the current version. Read-any may return a stale version.]

In general, reads are served using a local copy.
Consistency model
[Figure: timeline of one record, v.1 to v.8 within Generation 1. Read-critical(≥ v.6) returns v.6 or any newer version.]
Consistency model
[Figure: timeline of one record, v.1 to v.8 within Generation 1. Read-latest returns the current version, v.8.]

But an application can request, and get, the current version.
Consistency model
[Figure: timeline of one record, v.1 to v.8 within Generation 1. A write creates the next version of the record.]
Consistency model
[Figure: timeline of one record, v.1 to v.8 within Generation 1. "Write if = v.7" returns ERROR because the current version is v.8.]

Test-and-set writes facilitate per-record transactions.
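One way an application might build a per-record read-modify-write on test-and-set, sketched in Python: read the latest version, compute the new value, and write only if the version is unchanged, retrying otherwise. The Record class, the atomic_update helper, and the retry limit are hypothetical.

```python
class VersionMismatch(Exception):
    pass


class Record:
    """Minimal versioned record, just enough to show test-and-set."""

    def __init__(self, value):
        self.value, self.version = value, 1

    def read_latest(self):
        return self.version, self.value

    def test_and_set_write(self, required_version, value):
        if required_version != self.version:
            raise VersionMismatch(f"expected v.{required_version}, found v.{self.version}")
        self.value, self.version = value, self.version + 1
        return self.version


def atomic_update(record, fn, max_retries=5):
    """Apply fn to the record's value as one per-record transaction."""
    for _ in range(max_retries):
        version, value = record.read_latest()
        try:
            return record.test_and_set_write(version, fn(value))
        except VersionMismatch:
            continue          # someone else wrote first: re-read and retry
    raise RuntimeError("gave up after repeated conflicts")


counter = Record(0)
print(atomic_update(counter, lambda v: v + 1), counter.value)   # 2 1
```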
Record Timeline Consistency
Transactions:
   v Alice changes status from “Sleeping” to “Awake”
   v Alice changes location from “Home” to “Work”

[Figure: Region 1 applies the updates in order, (Alice, Home, Sleeping) -> (Alice, Home, Awake) -> (Alice, Work, Awake); Region 2 starts at (Alice, Home, Sleeping) and ends at (Alice, Work, Awake)]

No replica should see the record as (Alice, Work, Sleeping).
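A toy Python replica that shows why timeline consistency rules out (Alice, Work, Sleeping): the master numbers each update to the record, and a replica applies them strictly in that order, holding back any update that arrives early. The sequence numbers and the buffering are assumptions made for the sketch.

```python
class Replica:
    """Applies one record's updates in the master's sequence order."""

    def __init__(self, record):
        self.record = dict(record)
        self.applied = 0                 # highest sequence number applied so far
        self.pending = {}                # early arrivals: seq -> field changes
        self.states = [self.snapshot()]  # every state this replica has exposed

    def snapshot(self):
        return (self.record["name"], self.record["location"], self.record["status"])

    def receive(self, seq, changes):
        self.pending[seq] = changes
        while self.applied + 1 in self.pending:
            self.applied += 1
            self.record.update(self.pending.pop(self.applied))
            self.states.append(self.snapshot())


start = {"name": "Alice", "location": "Home", "status": "Sleeping"}
region2 = Replica(start)
region2.receive(2, {"location": "Work"})    # arrives before update 1: held back
region2.receive(1, {"status": "Awake"})
print(region2.states)
```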
  
Eventual Consistency
v Timeline consistency comes at a price
   v Writes not originating in record master region forward
       to master and have longer latency
   v When master region down, record is unavailable for
       write

v We added eventual consistency mode
   v On conflict, latest write per field wins
   v Target customers
       v Those that externally guarantee no conflicts
       v Those that understand/can cope
Experimental setup
v   Performance metric
     v    Average request latency

v   Three PNUTS regions
 Region           Machine                 Servers/region
 West 1,          2.8 GHz Xeon, 4GB RAM   5 SU
 West 2                                   2 YMB
 East                                     1 Router
                  Quad 2.13 GHz Xeon
                                          1 Tablet controller

v   Workload
     v    1200-3600 requests/second
     v    0-50% writes
     v    80% locality

                                                                22
Inserts
v Inserts

Region                Time (hash table)   Time (ordered table)
West 1 (master)       75.6 75.6 ms        33 ms
West 2 (non-master)   131.5 ms            105.8 ms
East (non-master)     315.5 ms            324.5 ms

                                                                 23
Write
[Figure: average latency (ms) versus requests per second (1000 to 3000), hash table and ordered table]
Scalability
[Figure: average latency (ms) versus number of storage units, hash table and ordered table]
Request skew
[Figure: average latency (ms) versus Zipf parameter (0 to 1), hash table and ordered table]
Future work
v Indexes
   v Efficient query processing

v Bundled updates

v Batch-query processing
Critique
v Yahoo! Message Broker
   v Multiple YMBs in one region, how to coordinate ?
   v The mechanism is rather complicated, scalability ?
   v All writes go through it, bottleneck ?
Critique
v Limited scope of the experiment
   v 5 storage units for each region
   v Testing scalability: ranging SU from 2 to 5
   v Only test latency, how about throughput ?
   v Comparison with other data storage systems, like
      Cassandra
      v Seems High latency
Thanks.

Questions?