The carbon footprint of a distributed cloud storage
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
The carbon footprint of a distributed cloud storage Lorenzo Posania,b,c , Alessio Paccoiab , Marco Moschettinib a Laboratoire de Physique Statistique, École Normale Supérieure, Paris (FR) b Research and tech development, Cubbit srl, Bologna (IT) c to whom correspondence should be addressed, email: lorenzo.posani@cubbit.io Abstract arXiv:1803.06973v2 [cs.DC] 10 Mar 2019 The ICT (Information Communication Technologies) ecosystem is estimated to be responsible, as of today, for 10% of the total worldwide energy demand - equivalent to the combined energy production of Germany and Japan. Cloud storage, mainly operated through large and densely-packed data centers, constitutes a non-negligible part of it. However, since the cloud is a fast-inflating market and the energy-efficiency of data centers is mostly an insensitive issue for the collectivity, its carbon footprint shows no signs of slowing down. In this paper, we analyze a novel paradigm for cloud storage (implemented by cubbit.io) [1, 2], in which data are stored and distributed over a network of p2p-interacting ARM-based single-board devices. We compare Cubbit’ distributed cloud to the traditional centralized solution in terms of environmental footprint and power-usage efficiency. We demonstrate how a distributed architecture is beneficial as regards impact reduction on both data transfers and data storage. Compared to the centralized cloud, the distributed architecture of Cubbit achieves a ∼ 87% reduction of the carbon footprint for data storage and a ∼ 50% reduction for data transfers, providing an example of how a radical paradigm shift can benefit both the final consumer and society as a whole. Keywords: Carbon footprint, Cloud Storage, Distributed, Peer-to-peer 1. Introduction charging costs, an equivalent energy consumption of an additional operating household fridge [5]. Con- Over the last decades, the acknowledgment of cli- trary to the electronics market, however, the foot- mate change by the general audience has driven the print caused by our online life is much less tangible law regulations of most western countries towards and, as a consequence, much less opposed. That, an increasing awareness of consumptions and effi- combined with the fast-increasing trend of online ciency. As a result of this increasingly-tight poli- presence and internet-accessing devices per capita, cies, the average per-device consumption of house- results in an increasing and mostly uncharged en- hold appliances (fridge, cooling, etc.) has consis- vironmental impact that shows no signs of slowing tently decreased in the last 20 years [3]. However, down [6]. To gain an intuition of the impact of there is a mostly underestimated factor that sen- the digital life, one has to consider that any time a sibly contributes to the environmental impact of video is streamed from Youtube servers to an iPad, our daily lives: our use of the Information Commu- or a photo is accessed on Google Photos or Drop- nication Technology (ICT) ecosystem or, in other box, the whole infrastructure that separates the fi- words, our digital life. An estimation of the to- nal user to the corporate data center, and the data tal impact of the ICT ecosystem approaches 1500 center itself, has to be powered to reliably transmit TWh of annual consumption [4, 5], which roughly information in both directions. amounts for 10% of the world energy consumption, more than the total energy production of Germany The process of transmitting information can be and Japan combined. orders of magnitude more demanding, in terms of The computation of the per-capita consumption energy consumption, than storage itself, depending shows that the sole fact of owning and using a on the relative location of the exchanging nodes. In smartphone constitutes, without considering the this document, we analyze, using an adaptation of Preprint submitted to - March 12, 2019
the model of Baliga et al. [7], the energy consump- model of [7]. The analysis relies on the definition of tion of cloud storage services, and compare it to the consumption per bit, which is computed by di- an alternative setup where data is stored on peer- viding the operating power (W) for the total trans- to-peer low-consumption devices located in users’ fer capacity (Gb/s), resulting in a Joule/bit mea- houses and implemented by Cubbit, a technologi- sure, then converted in J/GB. These units are taken cal startup focused on distributed cloud services. from the manufacturer specs sheet, shown in table 1. These quantities are combined with a set of co- efficients that reflect the redundancy of the packet 2. Analysis of centralized cloud consump- transmission, the under-operating regime of the in- tions frastructure and the cooling energy, as well as the multiplicity of some devices in a single transmis- The energy consumption of a cloud storage ser- sion (e.g. two ethernet switches at entry points vice can be divided into two main factors: plus another one inside the data center). The av- 1. the cost of storing the data, i.e. powering and erage distance between core routers on the network cooling the data center (Storage consumption) is estimated of c.a. 800Km. For full description of 2. the cost of sending the data from the user to coefficients and estimations we refer to [7] the server and back (Transfer consumption) While the first can be estimated from techni- Etransfer dcenter = (2) cal specifications of storage equipment, the second Pes Pbg Pg Ppe Pc Pw needs a more detailed analysis that takes into ac- = 6× 3 + + +2 + 18 +4 Ces Cbg Cg Cpe Cc Cw count the public internet infrastructure and the ge- ographical distance between the user and the server. kJ ' 23.9 , For both these estimations we refer to the model of GB Baliga et al. [7], where energy consumption is com- where the prefactor of 6 accounts for redundancy puted accounting for several factors, including the (× 2), cooling and other overheads (× 1.5), and the multiplicity of involved devices, redundancy, cool- fact that todays network typically operate at under ing, overbooking (see below). 50% utilization (× 2); the addends represent, in or- To delineate the calculation, we start from the der, the ethernet switch, the broadband gateway, storage consumption, i.e. the average power, ex- the data center gateway, the provider edge router, pressed in W/TB, necessary to store the payload the core network, and the relay optical fiber trans- in hot storage. We updated the technical specifica- mission. The detailed analysis of pre-factors can be tions with respect to [7], as hard-disk storage capc- found in [7]. Briefly, the factor 3 in the ethernet ity has dramatically improved in the last years. As switch accounts for the two routers involved in the a model for data center rack we consider the HP access to the public internet plus the router located StoreOnce rack [8]. We take the full-operating es- inside the data center; the factor 18 in the core net- timation since it is the one available in the manu- work accounts for an average of 9 hops (2 baseline facturer specification sheet, and we consider full ca- + 7 for the 800km distance between core nodes) of pacity (no under-usage overhead) for the 12 HDD internet packets from source to destination, times in the rack. As an average storage per disk we con- 2 for the redundancy. sider 8 TB, estimated from the mean of HDD di- mensions in blackblaze report [9] (' 7.2 TB/Disk). The consumption per TB is therfore estimated from 3. Analysis of Distributed Cloud consump- the specs resumed in table 1, considering a factor tions 2× for cooling [10] and a factor 2 for redundancy [7]. This estimation gives The distributed architecture of the Cubbit net- work relies on the same public internet infrastruc- ture delineated in the previous chapter. In the dis- 607 W W tributed paradigm, there are two key differences Pstorage dcenter = 2 × 2 × ' 25.3 (1) 12 × 8 TB TB with the server-based cloud storage: Similarly, we compute the transfer energy, ex- 1. The low energy consumption of ARM devices pressed in J/GB, following the public internet (Marvell ESPRESSObin) 2
Equipment Capacity (mean) Consumption Storage rack powering HP SO 3620 12 Disks x 8 TB 607 W (peak) Cubbit Cell Marvell ESPRESSObin / 1 W (peak) Cubbit Cell WD Blue HDD 1.5 TB 1.55 W (peak) Table 1: Equipments - power and capacity of storage equipments Equipment Capacity Consumption Data Center gateway router Juniper MX-960 660 Gb/s 5.1 kW Ethernet Switch Cisco 6509 160 Gb/s 3.8 kW BNG Juniper E320 60 Gb/s 3.3 kW Provider Edge Cisco 12816 160 Gb/s 4.21 kW Core router Cisco CRS-1 640 Gb/s 10.9 kW WDM (800 km) Fujitsu 7700 40 Gb/s 136 W/channel Table 2: Equipments - power and capacity of routing equipments. Data from [7] 2. The geographical proximity between the user can assume an average number of 2 packet hops in and his/her stored data core network routers. This lowers the correspond- We consider a network of Cubbit Cells [11], each ing factor 18 in Eq. 4 to a factor 4, accounting for composed by an ARM-based SBC and a HDD two core hops and the redundancy of the packets (Western Digital Blue) of 1TB or 2TB (estimated on the network (factor 2). For the same reason, average 1.5 TB/disk). Each Cell is located in a the 800km-relay consumption Pw is not taken into user’s house and connected by ISP internet connec- account. With respect to Eq. 4 we also ignore the tion. Files on the cloud are stored with a redun- data-center-specific terms: one ethernet switch and dancy factor of 1.5 (Reed Solomon erasure coding the data center gateway. However, we need to con- with 24+12 redundancy shards [1, 12]). As done for sider an additional BNG, since transfers are per- the centralized cloud, we analyze the consumption formed through p2p connections between endpoints of both storage (W/GB), and transfer (J/GB). located within an ISP network. The transfer energy The Marvell ESPRESSObin has a single-core per GB is therefore computed as peak consumption of ∼ 1W [13], while the embed- ded WD Blue HDD has a peak consumption of 1.4 Pes Pbg Ppe Pc W (1 TB) and 1.7 W (2 TB). We here assume that Etransfer cubbit = 6 × 2 + 2 + 2 + 4 half of the network is composed by 1TB devices and Ces Cbg Cpe Cc half by 2TB devices, giving an average storage of (4) 1.5TB for an average peak consumption of 1.55W. kJ ' 11.9 . The storage energy consumption of the Cubbit net- GB work is therefore computed as 1 + 1.55 W W 4. Comparison between centralized cloud Pstorage cubbit = 1.5 × ' 2.55 . (3) and Cubbit distributed cloud 1.5 TB TB In Cubbit, shards of the distributed payloads are preferably distributed in Cubbit Cells that are lo- The reduction of carbon footprint of Cubbit com- cated in geographical proximity of the user, since pared to centralized solutions can be computed by the distribution of the shards is controlled by the comparing the storage power and the transfer en- AI optimization routines of a coordinator server[1]. ergy for typical use case, such as backup plans and We therefore consider the scenario where data is frequent access of, for example, a web-hosted video. stored in nodes at an average distance of 80 km By comparing the power needed to store 1 TB on from the user’s access point. In this scenario, we the centralized cloud with the corresponding value 3
for the Cubbit cloud we find 105 DCenter emissions W 104 Cubbit emissions storage storage Saved emissions CO2 emissions (kg/year) ∆P storage = Pdcenter − Pcubbit ' 22.75 , (5) TB 103 which roughly corresponds to a 87% reduction of 102 overall storage consumption: 101 ∆P storage storage =' 0.90 . Pserver (6) 100 Similarly, the difference in terms of transfer en- 10 1 ergy per GB is 10 2 10 3 10 2 10 1 100 101 102 103 ∆E transfer = transfer − Edcenter transfer Ecubbit (7) Stored cloud space (TB) kJ kWh ' 12.0 = 3.33 , GB TB Figure 1: Comparison between centralized (DCenter) and distributed (Cubbit) clouds in terms of annual carbon emis- which corresponds to a 50% reduction of the en- sions per TB of stored data. ergy needed to transfer data from the cloud to the user, and back. average 10 TB of data per day (e.g. 10,000 visual- 4.1. Backup izations of 100 MB each) the saved energy per year would be A backup service hosted on the cloud is char- acterized by large volumes that are not frequently accessed. In the context of the carbon footprint, ∆E(10 TB streaming) = the consumption of a backup plan will, therefore, be dominated by the storage term. If we consider 10 TB × ∆P storage × 365 × 24h a storage plan for a professional backup of 10 TB, + 365 × ∆E transfer × 10 TB with very small daily access, we find that the total ' 14, 100 kWh , (9) energy saved in a year is which roughly corresponds to -7,000 kgCO2 emit- ted per year. Note that these computations assume ∆E(10 TB backup) = (8) that streaming are broadcast to a local audience. storage 10 TB × ∆P × 365 × 24h ' 1992 kWh . While this might be the case for university data, local news, or targeted marketing, it has a limited By considering a rough factor of 0.5 KgCO2 for range of applicability that has to be taken into ac- each kWh of consumed energy[14], choosing a dis- count. tributed cloud over a centralized one would corre- spond, for such a backup plan, to a reduced carbon 4.3. Large scale emission of c.a. -1000 kgCO2/year. On a data cen- Finally, if we speculate about the overall data ter scale, the reduction of kgCO2 emitted per year volume of a global consumer cloud storage service for a PB (1000 TB) of cloud storage would therefore like, for example, Dropbox or Google, values rise correspond to c.a. -100,000 kg/year. dramatically. Such interpolations have to be taken with due caution, since estimations are based on 4.2. Streaming undisclosed values. For the sake of speculation, we The reduction in consumed energy and, conse- consider an use base of c.a. 600 millions users. The quently, in carbon emission, is significantly larger last disclosed conversion rate from free (2 GB) to when considering large volumes of data transfers. premium (2 TB) is around 3%. This results in a For example, if we consider a regional newspaper theoretical data volume of ca. 37.2 · 106 TB of stor- service hosting 10 TB of data and streaming, on age. Considering a factor 5 due to overbooking, it 4
[4] G. International, How Clean is Your Cloud? Catalysing 106 Dcenter emissions an energy revolution, Technical Report, 2012. Cubbit emissions [5] M. P. Mills, The cloud begins with coal, Digital Power Group (2013). Saved emissions CO2 emissions (kg/year) 105 [6] Cisco vni global fixed and mobile internet traffic fore- casts, https://www.cisco.com/c/en/us/solutions/ 104 service-provider/visual-networking-index-vni/ index.html, web page. [7] J. Baliga, R. W. Ayre, K. Hinton, R. S. Tucker, Green 103 cloud computing: Balancing energy in processing, stor- age, and transport, Proceedings of the IEEE 99 (2011) 102 149–167. [8] Hpe storeonce data sheet, https://www.hpe.com/, web 101 page. [9] Blackblaze report, https://www.backblaze.com/blog/ 100 2018-hard-drive-failure-rates/, web page. [10] Report on cooling costs in data centers, 10 3 10 2 10 1 100 101 102 103 https://www.electronics-cooling.com/2007/02/ Transferred data (TB/day) in-the-data-center-power-and-cooling-costs-more-than-the-it-eq web page. [11] Cubbit website, https://www.cubbit.io/technology, web page. Figure 2: Comparison between centralized (DCenter) and [12] M. Monti, S. Rasmussen, M. Moschettini, L. Posani, An distributed (Cubbit) clouds in terms of annual carbon emis- alternative information plan, Technical Report, Work- sions per TB of daily streamed data. ing paper Santa Fe Institute, 2017. [13] Marvell espressobin specs sheet, https://www.http:// espressobin.net/, web page. gives an estimation of 7.4 106 TB of effective cloud [14] Carbon trust energy and carbon conversions, http:// storage. We can make a conservative estimation www.knowlton.org.uk/wp-content/filesEnergy, web that each user transfers, on average, 50 MB of files page. from/to the cloud, which implies a daily transfer volume of c.a. 190 TB. If we plug these estimations in our model, we obtain a total saved annual en- ergy, using a distributed architecture rather than a centralized one, of ∼ 1.5 · 109 kWh, equivalent to saving carbon emissions in the order of 700 millions kgCO2 per year. Acknowledgements The authors are grateful to S. Baldi (Cynny Space srl), A. Rovai and A. Albani for valuable comments and discussion. Bibliography References [1] L. Posani, M. Moschettini, A. Paccoia, Cubbit: a dis- tributed, crowd-sourced cloud storage service, Invited Talk at CERN Workshop on Cloud Storage Services (CS3 2018) (2018). [2] L. Posani, M. Moschettini, A. Paccoia, Cubbit, the dis- tributed cloud, Invited Talk at CERN, CNR Workshop on Cloud Storage Services (CS3 2019) (2019). [3] Energy efficiency and energy consump- tion in the household sector, https://www. eea.europa.eu/data-and-maps/indicators/ energy-efficiency-and-energy-consumption-5/ assessment, web page. 5
You can also read