Amazon Aurora: A Fast, Affordable and Powerful RDBMS - White Paper - Apps Associates

Page created by Keith Bell


White Paper

Amazon Aurora
A Fast, Affordable and Powerful RDBMS

© Copyright 2018, Apps Associates LLC. All rights reserved.

www.appsassociates.com

TABLE OF CONTENTS

Introduction

Multi-Tenant Logging and Storage Layer with Service-Oriented Architecture

High Availability with Self-Healing and Fault Tolerance

The adoption of Log Structured Storage in Aurora

Lock Free Concurrency with MVCC (Multi Version Concurrency Control)

Survivable Caches for Faster Instance Crash Recovery


Introduction
Amazon Aurora is a relational database engine that combines the speed and reliability of high-end
commercial databases with the simplicity and cost-effectiveness of open source databases. This white
paper describes the features that make Aurora an efficient, scalable, reliable and self-healing
relational database, and explains why Aurora is faster than MySQL.

Multi-Tenant Logging and Storage Layer with Service-Oriented Architecture
Amazon has applied service-oriented architecture to the database itself. It has built a storage service
similar to the Elastic Block Store (EBS) service used by EC2 (Elastic Compute Cloud) server instances,
and has moved Aurora's logging and storage layer into this multi-tenant, scale-out, database-optimized
storage service.

[Figure: Aurora architecture. Data plane (VPC): SQL, Transactions, Caching, Logging + Storage, Amazon S3. Control plane: Amazon DynamoDB, Amazon SWF, Amazon Route 53.]

Amazon has integrated Aurora with other AWS services such as Amazon EC2, Amazon VPC (Virtual Private
Cloud), Amazon DynamoDB, Amazon SWF and Amazon Route 53 for control plane operations. The control
plane is the part of the service that manages the database itself: it provisions the system and holds
the metadata that Aurora keeps about it. The endpoint information of the RDS instance is controlled by
DNS via Route 53.

The storage is integrated with Amazon S3 for continuous incremental backups with 99.999999999%
durability.


High Availability with Self-Healing and Fault Tolerance

[Figure: Aurora copies spread across three Availability Zones (AZ 1, AZ 2, AZ 3), each with SQL, Transactions and Caching layers. Two panels: read availability; read and write availability.]

High availability in Aurora is provided automatically, at no additional cost to the customer. By default,
Aurora replicates data across three Availability Zones (AZs) with two copies in each AZ, making a total
of six copies. To keep the data consistent, AWS uses a quorum model: a write succeeds only when at least
four of the six copies complete it, and a read is considered consistent when at least three of the six
copies agree. Because of these thresholds, the loss of an entire AZ (two copies) still leaves four
copies, so writes and reads do not fail; if one further copy is lost, the remaining three copies still
satisfy the read quorum. Recovery is fast because the log-structured storage runs on SSDs (solid state
drives), which keep latencies very low and avoid disk seek time. The most significant advantage of
Amazon Aurora is that the system detects and repairs errors automatically.
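
The quorum arithmetic described above can be checked with a minimal sketch. The thresholds (six copies, four for a write, three for a read) come from the text; the function names and the failure scenarios are illustrative assumptions, not part of any AWS API.

```python
# Quorum thresholds as stated in the text; all names here are hypothetical.
AZ_COUNT = 3
COPIES_PER_AZ = 2
WRITE_QUORUM = 4
READ_QUORUM = 3


def healthy_copies(failed_azs=0, extra_failed_copies=0):
    """Number of copies still reachable after the given failures."""
    total = AZ_COUNT * COPIES_PER_AZ
    lost = failed_azs * COPIES_PER_AZ + extra_failed_copies
    return max(total - lost, 0)


def can_write(healthy):
    """A write needs acknowledgements from at least WRITE_QUORUM copies."""
    return healthy >= WRITE_QUORUM


def can_read(healthy):
    """A read needs agreement from at least READ_QUORUM copies."""
    return healthy >= READ_QUORUM


if __name__ == "__main__":
    scenarios = {
        "all healthy": healthy_copies(),
        "one AZ down": healthy_copies(failed_azs=1),
        "one AZ down plus one copy": healthy_copies(failed_azs=1, extra_failed_copies=1),
    }
    for label, healthy in scenarios.items():
        print(label, "| copies:", healthy,
              "| write:", can_write(healthy), "| read:", can_read(healthy))
```

Since 4 + 3 is greater than 6, every read quorum overlaps every write quorum in at least one copy, which is what lets a read observe the latest acknowledged write.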

The adoption of Log Structured Storage in Aurora
One of the features that sets Aurora apart in the RDBMS space is its log-structured storage of data. A
log structure delivers high write throughput and simplifies hot backups, lock-free concurrency, recovery
and snapshots, which helps Aurora speed up file writes as well as crash recovery. Log-structured storage
is a well-researched concept in computer science and is familiar to anyone who has worked with NoSQL
products such as HBase and Cassandra. With log-structured storage, new records are appended to the
storage and existing records are never updated. B-tree indexes hold pointers to the latest version of
each record, and stale records are removed periodically by a garbage collection process.

[Figure: on-disk layout of a log-structured file system compared with conventional file systems, showing data, inode, directory and metadata blocks on disk.]

Log-structured storage avoids reading before writing. Read-before-write, especially in a large distributed
system, can become a performance bottleneck. During an update the system does not spend time seeking the
record on disk to find where the original value was stored and writing back to that page. Instead, it
appends the new value to the end of the file and uses B-tree indexes to hold pointers to the latest version
of the record: whenever a new block or segment of storage is written, the index is updated to point to the
most recent version of that data. In this way an update generates no read activity on the database.
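
The append-and-repoint behaviour described above can be illustrated with a minimal in-memory sketch. The class and method names are hypothetical, a Python list stands in for the on-disk log, and a dictionary stands in for the B-tree index; this is not Aurora's implementation.

```python
class AppendOnlyStore:
    """Toy log-structured store: records are appended, never overwritten."""

    def __init__(self):
        self.log = []    # append-only sequence of (key, value) records
        self.index = {}  # key -> offset of the latest version (stand-in for a B-tree)

    def put(self, key, value):
        """Append a new version and repoint the index; no existing record is read."""
        offset = len(self.log)
        self.log.append((key, value))
        self.index[key] = offset
        return offset

    def get(self, key):
        """Follow the index pointer to the latest version of the record."""
        _, value = self.log[self.index[key]]
        return value


if __name__ == "__main__":
    store = AppendOnlyStore()
    store.put("customer:1", "Jane")
    store.put("customer:1", "Jane Doe")  # an update is just another append
    print(store.get("customer:1"))
    print(len(store.log), "records in the log, including one stale version")
```

The old version stays in the log until garbage collection removes it, which is why an update never has to locate or rewrite the original page.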

A traditional RDBMS uses the log only for temporary storage; the permanent home for information is a
traditional random-access storage structure on disk. In contrast, a log-structured file system stores data
permanently in the log: there is no other structure on disk. With log-structured storage, a database
therefore performs one write where a traditional RDBMS performs two (one to the log and one to the data
file). The diagram below compares recovery in a traditional RDBMS, which synchronously replays the log from
the last checkpoint, with recovery in Aurora, which is asynchronous and parallel and applies changes only
at the segment level.

[Figure: crash recovery compared. Traditional database (checkpointed data plus redo log): a crash at T0 requires a re-application of the SQL in the redo log since the last checkpoint. Amazon Aurora: a crash at T0 results in redo logs being applied to each segment on demand, in parallel, asynchronously.]

During garbage collection the system removes stale records, reclaims the space and merges the latest
versions of records into a smaller number of files.
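
As a rough illustration of this garbage collection step, the sketch below merges several log segments into one and drops superseded versions. The segment representation (lists of key/value pairs, oldest segment first) and the function name are assumptions made for the example.

```python
def compact(segments):
    """Merge segments (oldest first) into one, keeping only the latest version per key.

    Returns the merged segment and the number of stale records reclaimed.
    """
    latest = {}
    total_records = 0
    for segment in segments:
        for key, value in segment:
            latest[key] = value  # later records supersede earlier ones
            total_records += 1
    merged = list(latest.items())
    return merged, total_records - len(merged)


if __name__ == "__main__":
    segments = [
        [("a", 1), ("b", 2)],
        [("a", 3)],
        [("c", 4), ("b", 5)],
    ]
    merged, reclaimed = compact(segments)
    print("merged segment:", merged)
    print("stale records reclaimed:", reclaimed)
```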

Recovery from failure in databases based on log-structured storage is almost instantaneous, because the
database file is itself a write-ahead log and I/O is greatly reduced. After a failure event you can simply
restart the database, update the pointers and be up and running.


The diagram below shows why recovery is fast with log-structured storage. On recovery, the engine restarts
from the previous checkpoint, scans forward in the log and recovers any updates written since that
checkpoint.

[Figure: log with the checkpoint position marked.]

Writes are cached in memory to provide consistent performance and are flushed to disk asynchronously, which
improves performance significantly. Backups are continuous and incremental: each new log segment is copied
to backup storage as it is written.
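
The checkpoint-based recovery described above can be sketched as follows. The example assumes a checkpoint consists of a saved copy of the index plus the log offset at which it was taken; these structures and the function name are hypothetical.

```python
def recover(log, checkpoint_index, checkpoint_offset):
    """Rebuild the index by scanning the log forward from the checkpoint.

    log               -- list of (key, value) records, append-only
    checkpoint_index  -- key -> offset mapping saved at checkpoint time
    checkpoint_offset -- number of log records covered by the checkpoint
    """
    index = dict(checkpoint_index)
    for offset in range(checkpoint_offset, len(log)):
        key, _ = log[offset]
        index[key] = offset  # repoint to the newer version found in the log
    return index


if __name__ == "__main__":
    log = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]
    # Checkpoint taken after the first two records were written.
    saved_index = {"a": 0, "b": 1}
    print(recover(log, saved_index, checkpoint_offset=2))
```

Only the records written after the checkpoint are scanned, so recovery time depends on the amount of recent activity rather than on the size of the database.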

Lock Free Concurrency with MVCC (Multi Version Concurrency Control)
Aurora adopts multi-version concurrency control: data is never updated, only appended, and a read returns a
copy of the data in the exact state in which it existed when the transaction started. In an MVCC system with
ongoing transactions there may be several copies of a given item of data, representing its state at
different points in time. A read simply ignores updates made by others after its transaction began. With
this scheme, readers never block each other, nor do they block writers. This is known as optimistic
concurrency.

MVCC provides point-in-time consistent views. A read transaction under MVCC typically uses a timestamp or
transaction ID to determine which state of the database to read, and then reads that version of the data.
Read and write transactions are thus isolated from each other without any need for locking: writes create a
newer version, while concurrent reads access the older version.
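
A minimal sketch of snapshot reads under MVCC is shown below. It assumes a simple integer counter as the transaction timestamp and keeps every version of each key in memory; the class and method names are hypothetical and do not reflect Aurora's internals.

```python
class MVCCStore:
    """Toy multi-version store: writes add versions, reads pick one by timestamp."""

    def __init__(self):
        self.versions = {}  # key -> list of (commit_ts, value), oldest first
        self.clock = 0      # logical timestamp, incremented on every write

    def write(self, key, value):
        """Create a newer version; older versions are left untouched."""
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def begin(self):
        """Start a read transaction and return its snapshot timestamp."""
        return self.clock

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before snapshot_ts."""
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None


if __name__ == "__main__":
    store = MVCCStore()
    store.write("price", 100)
    snapshot = store.begin()        # reader starts here
    store.write("price", 120)       # concurrent writer adds a newer version
    print(store.read("price", snapshot))       # reader still sees its snapshot
    print(store.read("price", store.begin()))  # a new reader sees the latest version
```

Neither the reader nor the writer waits on a lock: the writer appends a version, and the reader selects the version that matches its snapshot.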

For example, Jane requests a piece of data at 10:10. The system looks up the data using the pointers in the
index and gives her the most current copy of the data at the time the read operation was executed. While
this read is happening, John changes the data at 10:11. This is fine for Jane, as she got the most
up-to-date version of the data that was available when she requested it. If Jane now wants to modify a
value, the system checks whether the pointer has changed since her read. If it has, there is a concurrency
conflict, and the system rereads the latest data before it writes Jane's update. This avoids many of the
scaling issues caused by relational databases' heavy use of locks.
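
The Jane and John scenario can be sketched as an optimistic pointer check. The example assumes the "pointer" is the log offset stored in the index and that an update is expressed as a function of the current value; all names are hypothetical.

```python
class OptimisticStore:
    """Toy store that detects conflicts by comparing index pointers, without locks."""

    def __init__(self):
        self.log = []    # append-only list of values
        self.index = {}  # key -> offset of the latest version (the "pointer")

    def put(self, key, value):
        self.log.append(value)
        self.index[key] = len(self.log) - 1

    def read(self, key):
        """Return the pointer seen by the reader together with the value."""
        pointer = self.index[key]
        return pointer, self.log[pointer]

    def update(self, key, seen_pointer, modify):
        """Apply modify() to the latest value, rereading it if the pointer moved.

        Returns True when a concurrent change was detected.
        """
        conflict = self.index[key] != seen_pointer
        _, current = self.read(key)  # on conflict this is the reread of the latest data
        self.put(key, modify(current))
        return conflict


if __name__ == "__main__":
    store = OptimisticStore()
    store.put("balance", 100)
    jane_pointer, jane_value = store.read("balance")  # Jane reads at 10:10
    store.put("balance", 150)                         # John writes at 10:11
    conflict = store.update("balance", jane_pointer, lambda v: v + 10)
    print("conflict detected:", conflict)
    print("final value:", store.read("balance")[1])
```

Here Jane's change is applied on top of John's value rather than on the stale copy she read, so neither update is lost and no lock is held while she works.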


Survivable Caches for Faster Instance Crash Recovery
AWS has moved the cache out of the database process, so it remains warm when the database restarts. This
allows the instance to resume fully loaded operations much faster, especially during instance crash
recovery.

[Figure: Instant crash recovery + survivable cache = quick and easy recovery from DB failures. Panels show the SQL, Transactions and Caching layers.]

In summary, the features described above, built on a MySQL engine with the power of Amazon RDS, make
Aurora a fast, affordable and powerful RDBMS.

References:

http://web.stanford.edu/~ouster/cgi-bin/papers/lfs.pdf

AWS Aurora Documentation.

Author Profiles
Satyendra Kumar Pasalapudi is an Associate Practice Director in the Infrastructure Managed Services
Team at Apps Associates. He is an Oracle ACE Director and speaker at various conferences like OAUG,
IOUG, and Oracle Open World. He has worked with various clients on Infrastructure, Cloud and AWS
related engagements. He can be contacted at satyendra.pasalapudi@appsassociates.com

ABOUT APPS ASSOCIATES
Apps Associates is a global IT services company that enables organizations to implement, adopt and manage Oracle Applications,
Cloud Infrastructure, Salesforce, Analytics and Integration solutions. We partner with our customers to enable their journey to the
cloud and deliver continuous business improvement with our managed services. We differentiate through our relentless attention
to delivery excellence and customer care. Apps Associates is an Oracle Platinum Partner, a Premier Consulting Partner with Amazon
Web Services and a Salesforce Silver Consulting Partner.
