Amazon Aurora A Fast, Aordable and Powerful RDBMS - White Paper - Apps Associates
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
White Paper Amazon Aurora A Fast, Affordable and Powerful RDBMS © Copyright 2018, Apps Associates LLC. All rights reserved.
www.appsassociates.com TABLE OF CONTENTS Introduction 3 Multi-Tenant Logging and Storage Layer with Service-Oriented Architecture 3 High Availability with Self-Healing and Fault Tolerance 4 The adoption of Log Structured Storage in Aurora 4 Lock Free Concurrency with MVCC (Multi Version Concurrency Control) 6 Survivable Caches for Faster Instance Crash Recovery 7 © Copyright 2018, Apps Associates LLC. All rights reserved.
www.appsassociates.com A Fast, Affordable and Powerful RDBMS Introduction Amazon Aurora is a relational database engine that combines the speed and reliability of high-end commer- cial databases with the simplicity and cost-effectiveness of open source databases. This whitepaper will describe the unique features of Aurora that make it a highly efficient, scalable, reliable and self-healing relational database. It will also explain how Aurora is faster than MySQL. Multi-tenant Logging and Storage Layer with Service Oriented Architecture Amazon has done a great job in applying service oriented architecture to the database. It has built a storage service similar to that of the Elastic Block Storage (EBS) service used by EC2 (Elastic Cloud Compute) server instances. AWS has created a service-oriented architecture for accessing storage from the Aurora Database and has moved the logging and storage layer into a multi-tenant, scale-out database-optimized storage service. Data Plane Control Plane VPC SQL Transactions Caching Amazon DynamoDB VPC Logging + Storage Amazon SWF Amazon S3 Amazon Route 53 Amazon has Integrated Aurora with other AWS services like Amazon EC2, Amazon VPC (Virtual Private Cloud), Amazon DynamoDB, Amazon SWF, and Amazon Route 53 for control plane operations. Control plane is the part of the service that controls the database itself. Its capabilities include provisioning of the system and any metadata that Aurora has to keep about the system. The end point information of the RDS instance is controlled by DNS via Route53. The storage is integrated with Amazon S3 for continuous incremental backups with 99.999999999% dura- bility Page | 3 © Copyright 2018, Apps Associates LLC. All rights reserved.
www.appsassociates.com High Availability with Self-healing and fault tolerant AZ 1 AZ 2 AZ 3 AZ 1 AZ 2 AZ 3 SQL SQL Transactions Transactions Caching Caching Read availability Read and write availability High Availability in Aurora is automatically provided without additional cost to the customer. By default, Aurora scales across three availability zones with two copies in each region, making a total of six copies. To ensure data consistency AWS uses something called the four quorum model. This requires that at least four of the six copies successfully complete a write command when a change is made to your data. When reading data, Aurora will consider a read to be consistent if there is agreement from at least three of the six copies. Also, some really nice logic has been implemented; if an Availability Zone(AZ) goes down Aurora will automatically roll over to the three or four quorum model which means your write and reads don't completely fail if a single AZ fails. AWS is ensuring that recovery will be fast due to log structured storage running on SSD’s (solid state drives) that keep latencies very low, reducing disk seek time. The most signifi- cant advantage of Amazon RDS Aurora is that the system does automatic error detection, and repair. The adoption of Log Structured Storage in Aurora One of the major breakthrough features of Aurora which makes it unique in the RDBMS space is the log structured storage of data. Log structure delivers high write throughput and simplifies hot backups, lock free concurrency, recovery and snapshots. This helps Aurora to log-structured file system speed up file writing operations as well as crash recovery. Log structured storage is a well-researched concept in computer science and something very familiar for the folks working with NoSQL products like HBASE and Cassandra. With Log Struc- tured Storage, new records are append to the storage, and exist- ing records are never updated. B-Tree indexes hold pointers to the latest version of a record. On a periodic basis the stale records are removed through a garbage collection process. conventional file systems Page | 4 © Copyright 2018, Apps Associates LLC. All rights reserved.
www.appsassociates.com Data Inode Dir Meta- Data Disk The log structured storage avoids reading before writing. Read-before-write, especially in a large distributed system, can produce bottlenecks in read performance. During updates the system doesn’t waste time seeking the record on the disk to find where the original value was stored and write back to that page. Instead, it simply appends it to the end of the file and uses b-tree indexes to hold pointers to the latest version of the record. So when you write a new block of storage or a new segment of storage it updates the index to point to the most recent version of that data. In this way there is zero read activity on the database during an update. Traditional RDBMS use the log only for temporary storage and the permanent home for information is in a tradi- tional random-access storage structure on disk. In contrast, a log-structured file system stores data permanently in the log: there is no other structure on disk. In the log structured storage a database actually performs only one write against two writes in a traditional RDBMS. The diagram below compares a traditional RDBMS recovery, which synchronously replays logs from the last checkpoint with an Aurora recovery where recovery is asynchro- nous and in parallel, applying changes only at the segment level. Crash at T0 requires Crash at T0 will result in redo a re-application of the logs being applied to each segment SQL in the redo log since on demand, in parallel, asynchronously last checkpoint Checkpointed Data Redo Log T0 T0 During the garbage collection process the system cleans the stale records, reclaims the space and merges the latest versions of records into a smaller number of files. Recovery from failure in log structured storage based databases happens almost instantaneously because the data- base file is a write ahead log and I/O is greatly reduced. When there's a failure event you can just restart the data- base, update the pointers and you will be up and running. Page | 5 © Copyright 2018, Apps Associates LLC. All rights reserved.
www.appsassociates.com The diagram below represents clearly how recovery is fast in log structured storage. On recovery, simply restart from the previous checkpoint, and the engine will scan forward in the log and recover any updates written since the previous checkpoint. Checkpoint Writes are cached in memory to provide consistent performance and flushed to the disk asynchronously, improv- ing performance significantly. Backups are continuous and incremental where each new log segment appended is copied to backup storage as it is written. Lock Free Concurrency with MVCC (Multi Version Concurrency Control) Multi version concurrency is adopted in Aurora where data is never updated, only appended; reads return a copy of the data in the exact state it existed when the transaction started. In an MVCC system with ongoing transactions there may be several copies of a given item of data, representing its state at different points in time. The read simply ignores updates from others that occur after its transaction began. Using this system, readers never block each other, nor do they block writers. This is known as optimistic concurrency. MVCC provides point in time consistent views. Read transactions under MVCC typically use a timestamp or transaction ID to determine which state of the database to read, and then reads that version of the data. Read and Write transactions are thus isolated from each other without any need for locking. Writes create a newer version, while concurrent reads access the older version. For example Jane requests a piece of data at 10.10, the system is going to look up the data using the pointers in the index and it is going to give him the most current copy of the data at the time the read operation was executed. While this read is happening, John has changed the data at 10.11, this is fine for Jane as she got the most up to date version of the data that was available when she requested it. If Jane now wants to modify a value, the system will look to see if the pointer has changed since the read. If the answer is yes, the pointer has changed, then there is a concurrency problem. The system will now go and reread the latest data before it writes Jane’s update. This solves lot of scaling issues with relational databases’ heavy use of locks. Page | 6 © Copyright 2018, Apps Associates LLC. All rights reserved.
www.appsassociates.com Survivable Caches for faster instance crash recovery AWS has moved the cache out of the database process. It remains warm in the event of a database restart. This will help to resume fully loaded operations much faster especially during the instance crash recovery. Instant crash recovery + survivable cache = quick and easy recovery from DB failures SQL SQL Transactions Transactions Caching Caching Caching In Summary, the above features of Aurora built on a MySQL engine with the power of Amazon RDS resulted in making Aurora a fast, affordable and powerful RDBMS. References: http://web.stanford.edu/~ouster/cgi-bin/papers/lfs.pdf AWS Aurora Documentation. Author Profiles Satyendra Kumar Pasalapudi is an Associate Practice Director in the Infrastructure Managed Services Team at Apps Associates. He is an Oracle ACE Director and speaker at various conferences like OAUG, IOUG, and Oracle Open World. He has worked with various clients on Infrastructure, Cloud and AWS related engagements. He can be contacted at satyendra.pasalapudi@appsassociates.com ABOUT APPS ASSOCIATES Apps Associates is a global IT services company that enables organizations to implement, adopt and manage Oracle Applications, Cloud Infrastructure, Salesforce, Analytics and Integration solutions. We partner with our customers to enable their journey to the cloud and deliver continuous business improvement with our managed services. We differentiate through our relentless attention to delivery excellence and customer care. Apps Associates is an Oracle Platinum Partner, a Premier Consulting Partner with Amazon Web Services and a Salesforce Silver Consulting Partner. OUR STRATEGIC PARTNERS North America (HQ) Europe Asia www.appsassociates.com © Copyright 2017, V4.0-1115, Apps Associates LLC. All rights reserved www.shipconsole.com
You can also read