PNUTS: Yahoo!'s Hosted Data Serving Platform
Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni
What is needed from a current DBMS?
v Web applications need:
  v Scalability
  v Geographic scope
  v High availability
v Web applications typically have:
  v Simplified query needs
    v No joins, aggregations
  v Relaxed consistency needs
    v Applications can tolerate stale or reordered data
PNUTS
v PNUTS, a massively parallel and geographically distributed database system for Yahoo!'s web applications
Outline
v Architecture
v Experiments
v Future work
v Critique
Architecture
[Diagram: clients issue requests through a REST API to routers (the data-path components); the other components are the Yahoo! Message Broker, the tablet controller, and the storage units]
Ordered table
[Diagram: records sorted by primary key and range-partitioned into tablets at boundaries A, H, P, Z - e.g. A-H holds Apple 1.2, Avocado 2.0, Banana 1.0, Grape 2.5; H-P holds Kiwi 2.0, Orange 0.8; P-Z holds Strawberry 1.2, Watermelon 1.0]
Hash table
[Diagram: primary keys (Apple, Avocado, Banana, Grape, Kiwi, Orange, Strawberry, Watermelon) are mapped by a hash function into a hash space (boundaries 0x0000, 0x132A, 0x2C3F), which is range-partitioned into tablets; records are therefore stored in hash order, e.g. Grape 2.5, Apple 1.2, Banana 1.0, Watermelon 1.0, Kiwi 2.0, Orange 0.8, Avocado 2.0, Strawberry 1.2]
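A minimal sketch of hash-based tablet assignment as in the diagram, assuming a range-partitioned hash space. The hash function, hash-space width, and tablet boundaries here are illustrative, not PNUTS's actual ones:

```python
import hashlib

# Illustrative tablet boundaries over a 16-bit hash space (not PNUTS's real layout).
TABLET_BOUNDARIES = [0x0000, 0x5555, 0xAAAA, 0x10000]  # three tablets

def key_hash(key: str) -> int:
    """Map a primary key into the 16-bit hash space."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:2], "big")

def tablet_for_key(key: str) -> int:
    """Return the index of the tablet whose hash interval contains the key."""
    h = key_hash(key)
    for i in range(len(TABLET_BOUNDARIES) - 1):
        if TABLET_BOUNDARIES[i] <= h < TABLET_BOUNDARIES[i + 1]:
            return i
    raise ValueError("hash out of range")
```

Because keys are placed by hash rather than by sort order, load spreads evenly across tablets, at the cost of losing efficient range scans.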
Flexible schema
v Arbitrary structures are allowed
v New attributes can be added at any time
v Records are not required to have values for all attributes
[Diagram: three records with different attribute sets - (Name, Price), (Name, Price, Description), (Name, Price, Country, Description)]
Architecture
v Routers
  v Map which table -> which tablet -> which storage unit (SU)
  v Contain a cached copy of the interval mapping
v Tablet controller
  v Owns the mapping
  v Decides when to move or split tablets
  v Not a bottleneck
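The router's cached interval mapping amounts to a sorted list of tablet boundary keys that can be binary-searched. A minimal sketch, with illustrative boundary keys and storage-unit names:

```python
import bisect

# A router's cached interval mapping for one ordered table: tablet boundary
# keys and the storage unit serving each interval (names are illustrative).
boundaries = ["A", "H", "P"]          # tablet i covers [boundaries[i], boundaries[i+1])
storage_units = ["su-west-1", "su-west-2", "su-east-1"]

def route(key: str) -> str:
    """Binary-search the cached mapping for the storage unit holding `key`."""
    i = bisect.bisect_right(boundaries, key) - 1
    return storage_units[i]
```

If the cached copy is stale (a tablet has moved or split), the contacted storage unit rejects the request and the router refreshes its mapping from the tablet controller, which is why the controller stays off the data path.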
Tablet Splitting & Balancing
v Tables are horizontally partitioned into tablets
v A storage unit may become a hotspot
v Tablets may grow over time
v Shed load by moving tablets to other servers
Architecture - YMB
v Yahoo! Message Broker (YMB)
  v Topic-based pub/sub system
  v Data is considered "committed" once it has been published to YMB
  v Only partial ordering of published messages
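A toy model of the topic-based pub/sub pattern described above, where a publish returning means the update is committed and subscribers (replicas in other regions) consume asynchronously. This is only a sketch of the interface; YMB's real guarantees (e.g. logging messages to disk before acknowledging) are not modeled:

```python
from collections import defaultdict

class MessageBroker:
    """Toy topic-based pub/sub broker (illustrative, not the YMB API)."""
    def __init__(self):
        self.subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """Register a consumer (e.g. a remote-region replica) for a topic."""
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Once this returns, the update is considered committed;
        delivery to each subscriber happens independently."""
        for callback in self.subscribers[topic]:
            callback(message)
```

Ordering is guaranteed only per topic (here, per tablet's update stream), which matches the slide's "only partial ordering" caveat.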
Consistency model
v Per-record timeline consistency
v Per-record mastering
  v Each record is assigned a "master region"
  v May differ between records
  v Updates to the record are forwarded to the master region
  v Ensures a consistent ordering of updates
Consistency model - API
v Read-any
v Read-critical(required_version)
v Read-latest
v Write
v Test-and-set-write(required_version)
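The five calls can be sketched against a single record's version timeline. This is a toy in-memory model (class and method names are illustrative, not the PNUTS client API), with replica staleness simulated by a lag parameter:

```python
class Record:
    """Toy per-record version timeline: versions[i] holds version i+1."""
    def __init__(self):
        self.versions = []            # committed values, oldest first

    def write(self, value):
        """Append a new version; returns the new version number."""
        self.versions.append(value)
        return len(self.versions)

    def read_any(self, replica_lag=0):
        """May return a stale (but valid past) version of the record."""
        return self.versions[-1 - replica_lag]

    def read_latest(self):
        """Always returns the current version."""
        return self.versions[-1]

    def read_critical(self, required_version):
        """Return a version at least as new as required_version."""
        if len(self.versions) < required_version:
            raise RuntimeError("replica does not yet have that version")
        return self.versions[-1]

    def test_and_set_write(self, required_version, value):
        """Write only if the record is still at required_version."""
        if len(self.versions) != required_version:
            raise RuntimeError("version mismatch")
        return self.write(value)
```

Read-any is the cheapest call (any local replica will do), while read-latest and test-and-set-write may have to reach the record's master region.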
Consistency model - Read-any
[Timeline diagram: one record evolves through versions v.1-v.8 (generation 1) via an insertion and updates; Read-any may return a stale or the current version]
In general, reads are served using a local copy
Consistency model - Read-critical(≥ v.6)
[Timeline diagram: Read-critical returns a version at least as new as v.6]
Consistency model - Read-latest
[Timeline diagram: Read-latest returns the current version, v.8]
But the application can request and get the current version
Consistency model - Write
[Timeline diagram: a write appends a new version to the end of the record's timeline]
Consistency model - Test-and-set-write(v.7)
[Timeline diagram: a write conditioned on version v.7 fails with an error because the record is already at v.8]
Test-and-set writes facilitate per-record transactions
Record Timeline Consistency
Transactions:
v Alice changes status from "Sleeping" to "Awake"
v Alice changes location from "Home" to "Work"
[Diagram: both regions apply the updates in the same order, (Alice, Home, Sleeping) -> (Alice, Home, Awake) -> (Alice, Work, Awake)]
No replica should see the record as (Alice, Work, Sleeping)
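A minimal sketch of why per-record mastering yields this guarantee: the master serializes the record's updates with sequence numbers, and every replica applies them strictly in that order, so no replica can observe the reordered state. All class names here are illustrative:

```python
class MasterRegion:
    """All updates for one record go through its master, which assigns
    sequence numbers (publication order, as via YMB)."""
    def __init__(self):
        self.seq = 0
        self.log = []                 # (seq, update) pairs, in order

    def apply(self, update):
        self.seq += 1
        self.log.append((self.seq, update))

class Replica:
    """A replica applies updates strictly in sequence order, so every
    replica walks the same per-record timeline."""
    def __init__(self):
        self.state = {}
        self.applied = 0

    def deliver(self, seq, update):
        assert seq == self.applied + 1, "out-of-order delivery"
        self.state.update(update)
        self.applied = seq

master = MasterRegion()
master.apply({"Name": "Alice", "Location": "Home", "Status": "Sleeping"})
master.apply({"Status": "Awake"})      # Alice wakes up
master.apply({"Location": "Work"})     # then goes to work

replica = Replica()
for seq, update in master.log:
    replica.deliver(seq, update)
```

After replay, the replica's state passes through (Home, Awake) but never (Work, Sleeping), because the two updates cannot be applied out of order.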
Eventual Consistency
v Timeline consistency comes at a price
  v Writes not originating in the record's master region are forwarded to the master and have longer latency
  v When the master region is down, the record is unavailable for writes
v We added an eventual consistency mode
  v On conflict, the latest write per field wins
v Target customers
  v Those that externally guarantee no conflicts
  v Those that understand/can cope
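The "latest write per field wins" rule can be sketched by tagging each field with a timestamp and merging field by field. The (timestamp, value) representation is illustrative; the slides do not describe PNUTS's internal format:

```python
def merge(replica_a, replica_b):
    """Last-writer-wins per field: each field carries a (timestamp, value)
    pair, and the merge keeps the newer value independently per field."""
    merged = dict(replica_a)
    for field, (ts, value) in replica_b.items():
        if field not in merged or ts > merged[field][0]:
            merged[field] = (ts, value)
    return merged

# Two replicas that accepted conflicting writes while partitioned:
a = {"Location": (1, "Home"), "Status": (3, "Awake")}
b = {"Location": (2, "Work"), "Status": (1, "Sleeping")}
# merge keeps Location from b (newer) and Status from a (newer)
```

Because the rule is deterministic and symmetric, all replicas converge to the same record once they have exchanged updates, which is the "eventual" part; the cost is that the merged record may combine fields no single client ever wrote together.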
Experimental setup
v Performance metric: average request latency
v Three PNUTS regions (West 1, West 2, East)

  Machines                              Servers/region
  2.8 GHz Xeon, 4 GB RAM;               5 storage units
  quad 2.13 GHz Xeon                    2 YMB servers
                                        1 router
                                        1 tablet controller

v Workload
  v 1200-3600 requests/second
  v 0-50% writes
  v 80% locality
Inserts

  Region               Time (hash table)   Time (ordered table)
  West 1 (master)      75.6 ms             33 ms
  West 2 (non-master)  131.5 ms            105.8 ms
  East (non-master)    315.5 ms            324.5 ms
Write
[Chart: average latency (ms, 0-140) vs. requests per second (1000-3000), for the hash table and the ordered table]
Scalability
[Chart: average latency (ms, 0-160) vs. number of storage units (1-6), for the hash table and the ordered table]
Request skew
[Chart: average latency (ms, 0-100) vs. Zipf parameter (0-1), for the hash table and the ordered table]
Future work
v Indexes
v Efficient query processing
v Bundled updates
v Batch-query processing
Critique
v Yahoo! Message Broker
  v With multiple YMBs in one region, how do they coordinate?
  v The mechanism is rather complicated - does it scale?
  v All writes go through it - is it a bottleneck?
Critique
v Limited scope of the experiments
  v Only 5 storage units per region
  v Scalability tested only from 2 to 5 storage units
  v Only latency is measured - what about throughput?
  v No comparison with other data storage systems, such as Cassandra
v Latency seems high
Thanks. Questions?