Hardware Based Compression in Ceph OSD with BTRFS - SNIA
Hardware Based Compression in Ceph OSD with BTRFS

Weigang Li (weigang.li@intel.com)
Brian Will (brian.will@intel.com)
Praveen Mosur (praveen.mosur@intel.com)
Edward Pullin (edward.j.pullin@intel.com)
Intel Corporation

2016 Storage Developer Conference. © Intel Corp. All Rights Reserved.
Agenda

- Motivation
- Data compression: value vs. cost
- Hardware based compression in Ceph OSD
- Benefit of hardware acceleration
- Compression algorithms
- Compression in BTRFS & Ceph
- PoC implementation
- Benchmark configuration & results
Compression: Why

- More data: 10x data growth from 2013 to 2020(1); the digital universe is doubling in size every two years.
- Data from the Internet of Things will grow 5x.
- The percentage of data that is analyzed grows from 22% to 37%.
- Compression can save storage capacity.

(1) Source: April 2014, EMC* Digital Universe with Research & Analysis by IDC*
Compression: Cost

Compressing a 1 GB Calgary Corpus* file on one CPU core (with HT). Compression ratio: less is better; cRatio = compressed size / original size. Compression is CPU intensive: a better compression ratio requires more CPU time.

Compression tool   lzo     gzip-1   gzip-6   bzip2
real (s)           6.37    22.75    55.15    83.74
user (s)           4.07    22.09    54.51    83.18
sys (s)            0.79    0.64     0.59     0.52
cRatio             51%     38%      33%      28%

* The Calgary Corpus is a collection of text and binary data files, commonly used for comparing data compression algorithms.

Source as of August 2016: Intel internal measurements with dual E5-2699 v3 (18C, 2.3GHz, 145W), HT & Turbo Enabled, Fedora 22 64-bit, DDR4 128GB. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. Any difference in system hardware or software design or configuration may affect actual performance. Results have been estimated based on internal Intel analysis and are provided for informational purposes only. For more information go to http://www.intel.com/performance
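The speed/ratio trade-off in the table can be reproduced in miniature with Python's zlib module (the same DEFLATE codec gzip uses). This is an illustrative sketch on a repetitive stand-in for the Calgary Corpus, so the absolute numbers will differ from the slide's measurements:

```python
import time
import zlib

# Repetitive stand-in for the Calgary Corpus (~1 MB); synthetic input,
# so ratios are more favorable than on real mixed text/binary data.
data = b"The Calgary Corpus mixes text and binary files. " * 22000

for level in (1, 6, 9):
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    # cRatio = compressed size / original size; lower is better
    cratio = len(compressed) / len(data)
    print(f"level {level}: cRatio {cratio:.2%}, {elapsed * 1000:.1f} ms")
```

Higher levels spend more CPU time searching for matches, which is exactly the cost the slide quantifies for gzip-1 vs. gzip-6 vs. bzip2.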
Hardware Acceleration - Intel® QuickAssist Technology

Intel® QuickAssist Technology integrates hardware acceleration of compute-intensive workloads (specifically, cryptography and compression) on Intel® Architecture platforms.

- Bulk Cryptography: security (symmetric encryption and authentication) for data in flight and at rest.
- Public Key Cryptography: secure key establishment (asymmetric encryption, digital signatures, key exchange).
- Compression: lossless data compression for data in flight and at rest.

Form factors:
- Chipset: connects to CPU via on-chip interconnect.
- PCI Express* plug-in card: connects to CPU via off-board PCI Express* lanes (slot).
- SoC: connects to CPU via on-board PCI Express* lanes.
Benefit of Hardware Acceleration

Compressing a 1 GB Calgary Corpus file: hardware offload gives less CPU load and a better compression ratio.

Compression tool   lzo     accel-1*   accel-6**   gzip-1   gzip-6   bzip2
real (s)           6.37    4.01       8.01        22.75    55.15    83.74
user (s)           4.07    0.49       0.45        22.09    54.51    83.18
sys (s)            0.79    1.31       1.22        0.64     0.59     0.52
cRatio             51%     40%        38%         38%      33%      28%

* Intel® QuickAssist Technology DH8955 level-1
** Intel® QuickAssist Technology DH8955 level-6

Source as of August 2016: Intel internal measurements with dual E5-2699 v3 (18C, 2.3GHz, 145W), HT & Turbo Enabled, Fedora 22 64-bit, 1 x DH8955 adaptor, DDR4 128GB.
BTRFS Introduction

- Copy-on-write (CoW) filesystem for Linux.
- "Has the correct feature set and roadmap to serve Ceph in the long-term, and is recommended for testing, development, and any non-critical deployments… This compelling list of features makes btrfs the ideal choice for Ceph clusters"*
- Native compression support:
  - Mount with "compress" or "compress-force".
  - ZLIB / LZO supported.
  - Compresses up to 128 KB at a time.

* http://docs.ceph.com/docs/hammer/rados/configuration/filesystem-recommendations/
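The mount options above are documented BTRFS behavior; a minimal sketch of how they might be used for an OSD data directory follows (device path and mount point are placeholders, not from the slides):

```shell
# compress: compress extents, but skip data that does not shrink well
mount -o compress=lzo /dev/sdb /var/lib/ceph/osd/ceph-0

# compress-force: attempt compression on every write, regardless
mount -o compress-force=zlib /dev/sdb /var/lib/ceph/osd/ceph-0
```

With "compress", BTRFS gives up on files it judges incompressible; "compress-force" removes that heuristic, trading CPU for ratio.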
Compression Algorithm

BTRFS currently supports LZO and ZLIB:
- Lempel-Ziv-Oberhumer (LZO): a portable, lossless compression library that focuses on compression speed rather than compression ratio.
- ZLIB: lossless data compression based on the DEFLATE algorithm (LZ77 + Huffman coding); good compression ratio, but slow.

Intel® QuickAssist Technology supports:
- DEFLATE: LZ77 compression followed by Huffman coding, with a GZIP or ZLIB header.
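The header difference is just framing around the same DEFLATE bit stream, which is easy to see with Python's zlib module, where the wbits parameter selects the wrapper:

```python
import zlib

payload = b"LZ77 plus Huffman coding, framed three different ways. " * 100

def deflate(data, wbits):
    # wbits 9..15: ZLIB framing; wbits+16: GZIP framing; negative: raw DEFLATE
    comp = zlib.compressobj(6, zlib.DEFLATED, wbits)
    return comp.compress(data) + comp.flush()

zlib_stream = deflate(payload, 15)    # 2-byte zlib header + Adler-32 trailer
gzip_stream = deflate(payload, 31)    # gzip header + CRC-32 trailer
raw_stream  = deflate(payload, -15)   # bare DEFLATE bit stream, no header

print(zlib_stream[0] == 0x78)          # zlib CMF byte for a 32K window: True
print(gzip_stream[:2] == b"\x1f\x8b")  # gzip magic number: True
```

This is why data compressed with one DEFLATE implementation (hardware or software) can be decompressed by another, as long as both agree on the framing.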
Hardware Compression in BTRFS

(The slide shows the stack: application → VFS / page cache → BTRFS (LZO, ZLIB) → Linux Kernel Crypto API → Intel® QuickAssist Technology driver → Intel® QuickAssist Technology hardware, with compressed data flushed to the storage media when the async job is done.)

- BTRFS compresses the page buffers before writing them to the storage media.
- The Linux Kernel Crypto Framework (LKCF) selects the hardware engine for compression.
- Data compressed by hardware can be decompressed by a software library, and vice versa.
Hardware Compression in BTRFS (Cont.)

- BTRFS (btrfs_compress_pages → zlib_compress_pages_async) submits an "async" compression job with a scatter-gather list containing up to 32 x 4 KB pages: an input buffer of up to 128 KB of uncompressed data, plus a pre-allocated output buffer for the compressed data.
- The BTRFS compression thread is put to sleep when the "async" compression API is called.
- The BTRFS compression thread is woken up when the hardware completes the compression job (DMA input/output, interrupt, then a callback through the Linux Kernel Crypto API).
- The hardware can be fully utilized when multiple BTRFS compression threads run in parallel.
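The submit/sleep/wake flow can be sketched in user space, with a worker thread standing in for the accelerator and its completion interrupt. This is a toy model, not kernel code; the class and names are illustrative:

```python
import threading
import zlib

class AsyncCompressor:
    """Toy model of the async offload flow: the submitting thread
    sleeps until the 'hardware' completion callback wakes it."""

    def submit(self, pages, level=1):
        data = b"".join(pages)             # sg-list of up to 32 x 4 KB pages
        done = threading.Event()
        result = {}

        def hw_job():                      # stands in for the accelerator
            result["out"] = zlib.compress(data, level)
            done.set()                     # completion "interrupt" -> callback

        threading.Thread(target=hw_job).start()
        done.wait()                        # submitter sleeps until woken
        return result["out"]

pages = [bytes([i % 7]) * 4096 for i in range(32)]   # 32 x 4 KB = 128 KB input
out = AsyncCompressor().submit(pages)
assert zlib.decompress(out) == b"".join(pages)
```

Because each submitting thread blocks on its own job, keeping the hardware busy requires many such threads in flight, which matches the slide's point about running multiple BTRFS compression threads in parallel.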
Compression in Ceph OSD

- Ceph is a distributed object store and file system designed to provide excellent performance, reliability and scalability, but it currently has no native compression support.
- A Ceph OSD on BTRFS can use the filesystem's built-in compression:
  - Transparent, real-time compression at the filesystem level.
  - Reduces the amount of data written to the local disk, and reduces disk I/O.
  - A hardware accelerator (Intel® DH8955) can be plugged in to free up the OSDs' CPUs.
  - No benefit to network bandwidth: compute nodes still send uncompressed data to the OSDs.
Benchmark - Hardware Setup

- Client: Linux; 2 x Intel® Xeon® CPU E5-2699 v3 (Haswell) @ 2.30GHz; 64GB DDR4; PCIe 40Gb NIC; runs FIO.
- Server (Ceph cluster): Linux; 2 x Intel® Xeon® CPU E5-2699 v3 (Haswell) @ 2.30GHz; 128GB DDR4; PCIe 40Gb NIC; HBA LSI00300 attached to a JBOD with 12 x SSD; 2 x NVMe; 2 x Intel® DH8955 plug-in cards.
- Client and server are connected over the 40Gb network.
Benchmark - Ceph Configuration

- Deploy Ceph OSDs on top of BTRFS as the backend filesystem.
- Deploy 2 OSDs per SSD (SSD-1 … SSD-12): 24 OSDs in total, plus a MON.
- 2 x NVMe devices hold the journals.
- Data written to the Ceph OSDs is compressed by Intel® QuickAssist Technology (Intel® DH8955 plug-in cards).
Test Methodology

- Start 64 FIO threads on the client; each writes / reads a 2GB file to / from the Ceph cluster (CephFS / RBD client over LIBRADOS and RADOS) through the network.
- Drop caches before tests.
- For write tests, all files are synchronized to the OSDs' disks before the test completes.
- The average CPU load and disk utilization on the Ceph OSD nodes, and the FIO throughput, are measured.
Benchmark Configuration Details

Client
  CPU          2 x Intel® Xeon® CPU E5-2699 v3 (Haswell) @ 2.30GHz (36 cores / 72 threads)
  Memory       64GB
  Network      40GbE, jumbo frame: MTU=8000
  Test tool    FIO 2.1.2, engine=libaio, bs=64KB, 64 threads

Ceph Cluster
  CPU          2 x Intel® Xeon® CPU E5-2699 v3 (Haswell) @ 2.30GHz (36 cores / 72 threads)
  Memory       128GB
  Network      40GbE, jumbo frame: MTU=8000
  HBA          LSI00300
  OS           Fedora 22 (kernel 4.1.3)
  OSD          24 x OSD, 2 per SSD (S3700), no replica; 2 x NVMe (P3700) for journal; 2400 PGs
  Accelerator  Intel® QuickAssist Technology, 2 x Intel® QuickAssist Adapter 8955, dynamic compression, level-1
  BTRFS ZLIB   S/W ZLIB level-3
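A FIO invocation matching the table's parameters might look like the following; the job name, target directory, and the direct/group_reporting flags are assumptions, since the slides do not show the exact command line:

```shell
# Sequential write: 64 threads, 64 KB blocks, 2 GB per file, libaio engine
fio --name=seqwrite --directory=/mnt/cephfs --rw=write \
    --ioengine=libaio --direct=1 --bs=64k --size=2g \
    --numjobs=64 --group_reporting
```

Swapping --rw=write for --rw=read gives the sequential-read case measured below.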
Sequential Write

60% disk saving with minimal CPU overhead (accel vs. off).

                   off      accel*   lzo      zlib-3
CPU util (%)       13.62%   15.25%   28.30%   90.95%
cRatio (%)         100%     40%      50%      36%
Bandwidth (MB/s)   2910     2960     3003     1157

(The slide also plots CPU utilization over roughly 105 seconds for each configuration.)

* Intel® QuickAssist Technology DH8955 level-1
Dataset is random data generated by FIO.

Source as of August 2016: Intel internal measurements with dual E5-2699 v3 (18C, 2.3GHz, 145W), HT & Turbo Enabled, Fedora 22 64-bit, kernel 4.1.3, 2 x DH8955 adaptor, DDR4 128GB.
Sequential Read

Minimal CPU overhead for decompression.

                   off      accel*   lzo      zlib-3
CPU util (%)       7.33%    8.76%    11.81%   26.20%
Bandwidth (MB/s)   2557     2915     3042     2913

(The slide also plots CPU utilization over roughly 40 seconds for each configuration.)

* Intel® QuickAssist Technology DH8955 level-1

Source as of August 2016: Intel internal measurements with dual E5-2699 v3 (18C, 2.3GHz, 145W), HT & Turbo Enabled, Fedora 22 64-bit, kernel 4.1.3, 2 x DH8955 adaptor, DDR4 128GB.
Key Takeaways

- Data compression can save disk I/O and disk utilization.
- Data compression is CPU intensive: a better compression ratio requires more CPU cost.
- Hardware offloading can greatly reduce CPU cost and optimize disk utilization and I/O in the storage infrastructure.
- Filesystem-level compression in the OSD is transparent to the Ceph software stack.
References

- DEFLATE Compressed Data Format Specification version 1.3: http://tools.ietf.org/html/rfc1951
- BTRFS: https://btrfs.wiki.kernel.org
- Ceph: http://ceph.com/
- Intel® QuickAssist Technology software package and engine, available at 01.org: Intel QuickAssist Technology | 01.org
- For more details on Intel® QuickAssist Technology visit: http://www.intel.com/quickassist
- Intel Network Builders: https://networkbuilders.intel.com/ecosystem
- Intel® QuickAssist Technology storage testimonials:
  - IBM v7000Z w/ QuickAssist: http://www-03.ibm.com/systems/storage/disk/storwize_v7000/overview.html
  - https://builders.intel.com/docs/networkbuilders/Accelerating-data-economics-IBM-flashSystem-and-Intel-quick-assist-technology.pdf
- Intel® QuickAssist Adapter for Servers: http://ark.intel.com/products/79483/Intel-QuickAssist-Adapter-8950
Legal Disclaimer

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request. Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: http://www.intel.com/design/literature.htm

Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance.
Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit Intel Performance Benchmark Limitations.

All products, computer systems, dates and figures specified are preliminary based on current expectations, and are subject to change without notice. Intel, Intel logo, Intel Core, Intel Inside, Intel Inside logo, Intel Ethernet, Intel QuickAssist Technology, Intel Flow Director, Intel Solid State Drives, Intel Intelligent Storage Acceleration Library, Itanium, Xeon, and Xeon Inside are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

64-bit computing on Intel architecture requires a computer system with a processor, chipset, BIOS, operating system, device drivers and applications enabled for Intel® 64 architecture. Performance will vary depending on your hardware and software configurations. Consult with your system vendor for more information.

No computer system can provide absolute security under all conditions. Intel® Trusted Execution Technology is a security technology under development by Intel and requires for operation a computer system with Intel® Virtualization Technology, an Intel Trusted Execution Technology-enabled processor, chipset, BIOS, Authenticated Code Modules, and an Intel or other compatible measured virtual machine monitor. In addition, Intel Trusted Execution Technology requires the system to contain a TPMv1.2 as defined by the Trusted Computing Group and specific software for some uses. See http://www.intel.com/technology/security/ for more information.

Intel® Virtualization Technology requires a computer system with an enabled Intel® processor, BIOS, virtual machine monitor (VMM) and, for some uses, certain platform software enabled for it.
Functionality, performance or other benefits will vary depending on hardware and software configurations and may require a BIOS update. Software applications may not be compatible with all operating systems. Please check with your application vendor.

* Other names and brands may be claimed as the property of others. Other vendors are listed by Intel as a convenience to Intel's general customer base, but Intel does not make any representations or warranties whatsoever regarding quality, reliability, functionality, or compatibility of these devices. This list and/or these devices may be subject to change without notice.

Copyright © 2016, Intel Corporation. All rights reserved.