Energy Analysis and Optimization for Distributed Data Centers under Electrical Load Shedding
Energy Analysis and Optimization for Distributed Data Centers under Electrical Load Shedding by Linfeng Shen B.Eng., Beijing University of Posts and Telecommunications, 2019 Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in the School of Computing Science Faculty of Applied Sciences © Linfeng Shen 2021 SIMON FRASER UNIVERSITY Summer 2021 Copyright in this work is held by the author. Please ensure that any reproduction or re-use is done in accordance with the relevant national copyright legislation.
Declaration of Committee Name: Linfeng Shen Degree: Master of Science Thesis title: Energy Analysis and Optimization for Distributed Data Centers under Electrical Load Shedding Committee: Chair: Ouldooz Baghban Karimi Lecturer, School of Computing Science Jiangchuan Liu Senior Supervisor Professor, School of Computing Science Qianping Gu Supervisor Professor, School of Computing Science Jie Liang Examiner Professor, School of Engineering Science ii
Abstract The number and scale of data centers have increased significantly in today's digital world. Distributed data centers stand out as a promising solution, driven by modern applications that demand massive computation resources and impose strict response requirements. Their reliability and availability depend heavily on the electrical power supply. Most data centers are equipped with battery groups as backup power in case of electrical load shedding or power outages caused by severe weather or human-driven factors. The limited number and gradual degradation of batteries, however, can hardly support the servers in finishing all jobs on time. In this thesis, we divide the workload of data centers into web jobs and batch jobs. We develop a battery allocation and workload migration framework that guarantees web jobs are never interrupted while simultaneously minimizing the waiting time of batch jobs. Our extensive evaluations show that, compared to the current practical allocation, our battery allocation and workload migration results keep all web jobs uninterrupted and minimize the average waiting time of batch jobs within a limited overall cost. Keywords: Distributed data center; Load shedding; Backup battery; Workload migration iii
Dedication This thesis is for my family and all the people who gave me support during my master’s degree study. iv
Acknowledgements First and foremost, I would like to express my deepest gratitude to my senior supervisor, Dr. Jiangchuan Liu, for his constant support and insightful direction throughout my master's study. He taught me how to grow from an inexperienced beginner into a researcher. His enlightenment, guidance and encouragement have been invaluable to the success of my study over the past two years. He also sets a role model for me, as a researcher and a professor, with his excellent academic contributions and humble personality, which inspires me to continue pursuing a Ph.D. degree. I also thank Dr. Qianping Gu and Dr. Jie Liang for serving on my thesis examining committee, and Dr. Ouldooz Baghban Karimi for chairing my thesis defense. I must also express my gratitude to Dr. Feng Wang, Dr. Yifei Zhu, Dr. Xiaoyi Fan and Dr. Fangxin Wang for helping me with my research as well as many other problems during my study. I also feel lucky and grateful to have had the experience of studying together with Mr. Yutao Huang, Mr. Yuchi Chen, Mr. Jia Zhao, Mr. You Luo, Mr. Xiangxiang Wang, Ms. Miao Zhang and other friends at Simon Fraser University. Finally, I would like to thank my family and my girlfriend for their constant support. Words are powerless to express my gratitude. v
Table of Contents Declaration of Committee ii Abstract iii Dedication iv Acknowledgements v Table of Contents vi List of Tables viii List of Figures ix 1 Introduction 1 1.1 Related Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 1.1.1 Energy Analysis and Energy Efficient Approach . . . . . . . . . . . . 3 1.1.2 Workload Management . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 1.3 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2 Workload Migration under Electrical Load Shedding 6 2.1 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.1.1 Workload, Power and Heat . . . . . . . . . . . . . . . . . . . . . . . 7 2.1.2 Task-Arriving and Server-Service Rates . . . . . . . . . . . . . . . . 10 2.1.3 Workload Migration . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.3 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.3.1 Approximation of Average Waiting Time . . . . . . . . . . . . . . . . 12 2.3.2 Problem Transformation . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.3.3 Algorithm Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 3 Battery Allocation against Electrical Load Shedding 18 3.1 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 vi
3.1.1 Web Jobs and Batch Jobs . . . . . . . . . . . . . . . . . . . . . . . . 20 3.1.2 Backup Batteries and Servers . . . . . . . . . . . . . . . . . . . . . . 21 3.2 Problem Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 3.3 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.3.1 Battery Allocation Framework . . . . . . . . . . . . . . . . . . . . . 23 4 Performance Evaluation 27 4.1 Evaluation of Workload Migration Algorithm . . . . . . . . . . . . . . . . . 27 4.1.1 Simulation Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 4.1.2 Evaluation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 4.2 Evaluation of Battery Allocation Framework . . . . . . . . . . . . . . . . . 31 4.2.1 Experiment Setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 4.2.2 Evaluation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 5 Conclusion and Discussion 34 Bibliography 35 vii
List of Tables Table 2.1 Notations 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 Table 3.1 Notations 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 Table 4.1 Server configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 viii
List of Figures
Figure 1.1 Various deployment types of distributed data centers . . . . . . . . 2
Figure 2.1 Overall system model of distributed data centers . . . . . . . . . . 7
Figure 2.2 Typical air cooling system in distributed DCs . . . . . . . . . . . . 9
Figure 2.3 Data flow in the model . . . . . . . . . . . . . . . . . . . . . . . . . 10
Figure 3.1 The backup batteries and the monitor system . . . . . . . . . . . . 18
Figure 3.2 Typical discharge voltage versus time characteristics . . . . . . . . 19
Figure 3.3 Framework of our system model . . . . . . . . . . . . . . . . . . . . 20
Figure 3.4 Framework of our solution . . . . . . . . . . . . . . . . . . . . . . . 23
Figure 4.1 Relationship between utilization and different power constraint situations . . . . . . 28
Figure 4.2 Response time with different tolerance parameters . . . . . . . . . 29
Figure 4.3 Runtime with different tolerance parameters . . . . . . . . . . . . 29
Figure 4.4 Comparison of overall response time . . . . . . . . . . . . . . . . . 30
Figure 4.5 Comparison of overall migration cost . . . . . . . . . . . . . . . . . 31
Figure 4.6 Various metrics for a typical data center when equipped with different numbers of battery groups . . . . . . 32
Figure 4.7 The comparison of cost and average waiting time in the original allocation and the re-allocation based on our method . . . . . . 32
ix
Chapter 1 Introduction Recent years have witnessed the rapid development of cloud computing. It brings tremendous, pay-as-you-go computation resources to fill the gap between the computing demands of ever-increasing mobile devices and their limited onboard computation power [20]. Data centers (DCs), as the core component of cloud computing, have become the brain of today's digital world, providing storage, computation, and management for interconnected devices over the high-speed Internet. On the other hand, many emerging modern applications, such as autonomous driving, VR/AR, and video analytics, require not only a massive amount of computation resources but also ultra-low delay to guarantee fast response, rendering conventional centralized DC-based task offloading ineffective. To this end, distributed data centers stand out as a promising solution. As shown in Fig. 1.1, distributed DCs are usually server clusters deployed at the edge of the Internet on different scales, which handle various tasks from numerous nearby sources and provide instant feedback with lower operation costs and delay compared to centralized data centers. Different from centralized DCs, which are usually well attended with sophisticated water cooling systems [22], distributed DCs often use much less backup power supply and cheaper air cooling systems to reduce operational costs and allow dense deployment across large regions. These factors create a key challenge at the confluence of the ever-changing workload demand at each distributed DC and the unreliable power supply therein, particularly due to load shedding (where utilities cut back the supply voltage when electrical generation and transmission systems cannot meet demand) as well as planned or accidental power outages [35], which tends to cause severe service delays or even service interruptions.
For example, the government of China has limited power supply in several provinces due to the impact of weather and green energy policy [26] [3]. Power supplies are being cut to some industrial and commercial customers in Hunan and Jiangxi provinces, where demand has jumped by at least 18% over the previous year. A severe winter storm recently caused power outages for about 220,000 utility customers in Texas [27]. Load shedding in South Africa has crippled many data centers and incurred substantial extra costs [29]. It is estimated that four and a half hours of load shedding could cost a data center 100,000
(a) Monolithic (b) Edge data center (c) Mega data center
Figure 1.1: Various deployment types of distributed data centers
rands. As such, when the power supply becomes constrained, the server utilization of a DC is affected accordingly, usually scaling down its task processing capacity. If a crowd of tasks happens to arrive during this period, it can easily lead to ineffective resource usage and delayed task responses. To avoid service interruptions, most data centers are equipped with energy-storage battery groups as backup power, which support the servers when electrical load shedding or a power outage occurs. On the other hand, workload migration has been proposed as an effective way to maximize resource efficiency by scheduling tasks from congested DCs to lightly loaded ones. As such, many efforts have been devoted to analyzing resource and workload efficiency in data centers [2] [39] [31] [32], and pioneering works have explored solutions that consider different data center capacities, migration cost, resource efficiency and electricity price [17] [37] [40] [28] [25]. Task offloading to the remote cloud has also been examined in academia [6] [12] [5], and recently deep learning has been applied to predict the task failure rate therein [15]. In this thesis, we carefully consider the impacts of electrical load shedding on distributed DCs, especially on their server utilization and the resulting ineffective resource usage and degraded QoS, and we further propose workload migration to effectively tackle the issues caused by such power constraints. To achieve this, we first examine the relationship between power constraints and the corresponding server utilization, fully considering the cooling requirements of real-world distributed data centers.
We then develop a queuing model for the task service and formulate an optimization problem for workload migration, aiming to minimize both the service response time and the migration cost. We further propose an effective algorithm to solve the problem whose result, in the worst case, is no more than 3D/t greater than the optimal, where t is the parameter controlling the approximation accuracy and D is the number of distributed DCs. We have conducted extensive simulations to evaluate our solution. The results demonstrate that when the power supply becomes constrained, our method can significantly improve the overall response time, with over 9% reduction
with minimal cost. We then consider distributed DCs with backup batteries. We divide the workload of distributed data centers into web jobs, which cannot be interrupted, and batch jobs, which can be delayed. To improve the Quality of Service (QoS) of data centers under electrical load shedding or power outages, we propose a new battery allocation framework that first guarantees web jobs are not interrupted and then minimizes the average waiting time of batch jobs within a limited overall cost, using the workload migration algorithm to solve the problem. We have conducted extensive experiments based on real-world data to evaluate our framework. The results demonstrate that our battery allocation framework and workload migration algorithm can significantly improve the QoS of distributed data centers against electrical load shedding within a limited overall cost. 1.1 Related Work In this section, we introduce recent works related to our research, including energy system analysis and workload management of data centers. 1.1.1 Energy Analysis and Energy Efficient Approach There are many works on energy system analysis, and many approaches have been proposed to save energy in data centers. Dayarathna et al. [9] surveyed the state-of-the-art techniques used for energy consumption modeling and prediction for data centers and their components. They performed a layer-wise decomposition of a data center system's power hierarchy, first dividing the components into two types, hardware and software, and then analyzing current power models at different layers of the data center system in a bottom-up fashion. Chu et al.
[7] investigated a typical small container data center with an overhead air supply system, studying how the air supply flow rate of the computer room air handler (CRAH), the intake flow rate of the rack cooling fans, the air supply grilles and the heat load distribution affect cooling performance, with the goal of improving the power usage effectiveness (PUE) of data centers. Thompson et al. [34] presented a methodology for optimizing investment in data center battery storage capacity. Haywood et al. [18] addressed the growth in the number of computer servers and the subsequent electricity expenditure by proposing, analyzing and testing the idea of recycling the highest-quality waste heat generated by data center servers. Li et al. [24] addressed the problem of improving the utilization of renewable energy for a single data center using two approaches: opportunistic scheduling and energy storage. They found an intermediate solution mixing both approaches to balance the energy losses due to different causes, such as battery efficiency and VM migrations triggered by consolidation algorithms.
1.1.2 Workload Management Workload management is also a long-standing topic in data center research and has attracted many efforts. Zapater et al. [38] presented a model for close-coupled data centers with free cooling, and proposed a technique that jointly allocates workload and controls cooling in a power-efficient way based on this model. Guo et al. [17] made a thorough analysis of the newly released dataset from Alibaba's production cluster, breaking it down into resource allocation and resource adjustment to better understand the resource efficiency and workload characteristics of data centers. Forestiero et al. [14] presented a hierarchical approach for workload management in geographically distributed data centers, based on a function that defines the cost of running some workload on the various sites of the distributed data center. Iqbal et al. [21] proposed a secure fog computing paradigm where roadside units (RSUs) offload tasks to nearby fog vehicles based on repute scores maintained in a distributed blockchain ledger. Their experimental results demonstrate a significant performance gain in terms of queuing time, end-to-end delay, and task completion rate compared to a baseline queuing-based task offloading scheme. Aljaedi et al. [1] introduced the design and implementation of OTMEN, a scalable traffic monitoring system for SDN-enabled data center networks. OTMEN decouples the forwarding and monitoring configurations in the data plane to relieve the controller while allowing fine-grained flow-level monitoring at the edge switches. The evaluation results show that OTMEN provides significant improvements and monitoring overhead reduction compared to existing solutions. 1.2 Motivation The previous works have constructed comprehensive models of data centers and proposed many approaches to save energy or improve efficiency. However, all of them are based on the assumption that the power supply is always sufficient.
To the best of our knowledge, there is still no work on energy analysis and workload management against electrical load shedding for distributed data centers. We closely investigate the influence of electrical load shedding on distributed data centers and construct a physical model to estimate the relationship among power, heat and workload. Based on this information profiling of distributed data centers against electrical load shedding, we then propose a framework that allocates an appropriate number of battery groups against load shedding and migrates some workload in this situation to improve the QoS of data centers with minimal cost. 1.3 Thesis Organization The rest of this thesis is organized as follows:
In Chapter 2, we present our physical model of a data center and its operating profile under electrical load shedding. We then describe our system model without backup batteries and propose our workload migration algorithm to improve the QoS of data centers. In Chapter 3, we consider a model with battery groups in data centers as backup power. In this model, we divide the workloads of data centers into two types: web jobs and batch jobs. We formulate an optimization problem that aims to keep web jobs uninterrupted while minimizing the waiting time of batch jobs, and we propose a new battery allocation framework combined with the workload migration algorithm to achieve this. In Chapter 4, we evaluate the performance of our proposed workload migration algorithm and battery allocation framework. Through real-world experiments and simulations, our methods are confirmed to be efficient. In Chapter 5, we conclude this thesis and discuss potential improvements as future work.
Chapter 2 Workload Migration under Electrical Load Shedding In this chapter, we first introduce the overall system model without backup batteries and formulate the workload migration problem under a constrained power supply. We then use queuing theory to approximate the tasks' average response time. Finally, we transform the optimization problem into two subproblems and design an algorithm to solve them efficiently. 2.1 System Model Fig. 2.1 shows the overall system model of the distributed data centers considered in this chapter. In particular, a number of distributed data centers are deployed in a region with different scales and different power supply conditions. Normally, users send tasks to their closest data center, and these tasks are usually scheduled and executed with the first-come-first-serve (FCFS) policy [16]. Each data center has a single-channel queue, and all the servers in a data center are homogeneous. However, due to the instability of the power supply, tasks may suffer considerable delays under electrical load shedding, and a long response time severely affects the quality of service (QoS). As each data center has a different scale, and server utilization is heterogeneous under different power supply situations, we first model the relationship between the power supply and server utilization. Since the cooling system is essential in a data center and consumes a large amount of energy, we consider both the heat flow and the power flow in this relationship. We then present our model for the task arrival and server service rates to approximate the average response time. Finally, we introduce the workload migration policy in our model and state the two objectives we care about in the problem formulation. Table 2.1 shows the notations used in this chapter.
Figure 2.1: Overall system model of distributed data centers
2.1.1 Workload, Power and Heat To estimate server utilization under power constraints, we first construct a physical model of the data center. A data center is an energy-intensive system: its many devices consume a great deal of energy and, at the same time, generate a large amount of heat. To protect these devices, a cooling system is therefore essential. Fig. 2.2 shows a typical air cooling system used in distributed data centers. The air cooling system needs to remove the heat generated by the servers and other heat sources, and it has to be reliable and available 24/7 with built-in redundancy. A traditional data center cooling system uses infrastructure called a computer room air conditioner (CRAC) or computer room air handler (CRAH). The CRAC/CRAH units pressurize the space below the raised floor and push cold air through the perforated tiles into the server intakes, cooling the servers in the process; all of this requires significant electrical power. Similar to [30], we construct a model with several intermediary data flows and relationships between sub-components in a data center, as further illustrated in Fig. 2.3. In general, there are three flows in this model: workload, heat and power. The amount of workload determines the power consumption of the servers. The heat generated by the servers and the outside temperature jointly affect the power consumption of the cooling system. The combination of server, cooler and other parts in the data
Table 2.1: Notations 1
P       power consumption of a server
P'      power consumption of a server under electrical load shedding
C       power consumption of the cooling system
C'      power consumption of the cooling system under electrical load shedding
O       power consumption of other components
S       total power consumption
Q       the amount of generated heat
α       the percentage of power constraints
T       temperature
u       percentage of utilization
A       the area of the server surface
h       the heat transfer coefficient
V       the velocity of airflow
D       the set of all the data centers
s_i     the number of servers in data center i
λ_i     the task-arriving rate in data center i
λ'_i    the task-arriving rate in data center i after workload migration
μ_i     the server-service rate in data center i
μ'_i    the server-service rate in data center i under electrical load shedding
P_{0i}  the probability that no task is in the queue in data center i
W_i     the average waiting time in data center i
W'_i    the average waiting time in data center i after workload migration
U_i     the maximum upload speed in data center i
L_i     the maximum download speed in data center i
c_ij    the migration cost per unit from data center i to j
X_ij    the volume of workload migrated from data center i to j
ρ_i     the utilization of the servers in data center i
t       the accuracy of approximation in the punish function
center constitutes the total power consumption. The concrete relationships among the parts are as follows. The work in [11] shows that a server's power consumption has a linear relationship with its utilization:
P = P_idle + (P_peak − P_idle) u  (2.1)
where P is the power consumption of the server and u is the percentage of its utilization. P_idle and P_peak are the idle and peak-load power consumption of the server. We use Q to represent the heat generated by the server, which is also roughly the heat that should be removed by the cooling system. Then we have the following convective heat transfer equation:
Figure 2.2: Typical air cooling system in distributed DCs
Q = hA(T_outside − T_inside)  (2.2)
where A is the area of the server surface, h is the heat transfer coefficient, and T_inside and T_outside are the temperatures inside and outside the data center, respectively. From [10], we know h follows a power law in the velocity V of the flow, where the exponent depends on the type of flow. The forced convection in the server is typically turbulent, which makes the exponent 4/5. We thus have:
h ∝ V^(4/5)  (2.3)
To remove more heat generated by the server, the velocity of the airflow should increase at an even greater rate. The fan laws [23] tell us that this increase is first order, but power consumption increases to the 3rd power:
V ∝ C^3  (2.4)
where C is the power consumption of the cooling system. Combining Eqs. (2.2), (2.3) and (2.4), we get the relationship between the power consumption of the cooling system C and the generated heat Q:
C ∝ Q^(5/12)  (2.5)
Next we consider the data center's server utilization under power constraints, where the power supply relationship can be represented as:
P' + C' + O ≤ αS  (2.6)
Figure 2.3: Data flow in the model
where α is the percentage of power constraints and S is the total power consumption. We can easily get the total server power consumption P' from Eq. (2.1), and the total power consumption of the other components O does not change under power constraints. The only challenge is then to get the total power consumption of the cooling system C'. From Eq. (2.5) we know that the power consumption of the cooling system is determined by the generated heat; based on this, we can estimate C' if we can predict the heat in a specific situation. Let Q_total and Q'_total be the total amount of generated heat without and with power constraints, respectively, and let β denote the ratio between them:
Q'_total = β Q_total  (2.7)
Combining with Eq. (2.5), we get the relationship between C'_total and C_total:
C'_total = β^(5/12) C_total  (2.8)
It is easy to obtain C_total and β in the general situation, so we can easily get C'_total under power constraints and calculate the server utilization based on Eqs. (2.1) and (2.6). 2.1.2 Task-Arriving and Server-Service Rates Let λ_i denote the task-arriving rate in data center i, and let μ_i denote the mean rate at which tasks are finished by a single server in data center i, which is directly proportional to server utilization. The number of servers in data center i is denoted s_i.
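The power model of Section 2.1.1, Eqs. (2.1), (2.6) and (2.8), can be chained into a simple recipe for estimating server utilization under a power cap. A minimal numerical sketch follows; all constants (server count, wattages, nominal utilization, and the heat ratio β) are purely illustrative and not taken from the thesis:

```python
# Illustrative sketch of Eqs. (2.1), (2.6) and (2.8): estimate server
# utilization when the supply is cut to a fraction alpha of normal.
P_IDLE, P_PEAK = 100.0, 200.0   # per-server idle / peak power (W), assumed
N_SERVERS = 100                 # assumed fleet size
U_NOMINAL = 0.8                 # assumed utilization under normal operation
C_NORMAL = 5000.0               # assumed cooling power under normal operation (W)
O_OTHER = 2000.0                # other components (W), unaffected by shedding

def utilization_under_cap(alpha, beta):
    """alpha: available fraction of the normal total supply S (Eq. 2.6).
    beta: ratio of generated heat with / without the constraint (Eq. 2.7)."""
    # Total normal consumption S: servers + cooling + other components.
    s_total = N_SERVERS * (P_IDLE + (P_PEAK - P_IDLE) * U_NOMINAL) \
              + C_NORMAL + O_OTHER
    c_capped = beta ** (5.0 / 12.0) * C_NORMAL       # Eq. (2.8)
    p_budget = alpha * s_total - c_capped - O_OTHER  # power left for servers
    # Invert the linear per-server power model of Eq. (2.1).
    u = (p_budget / N_SERVERS - P_IDLE) / (P_PEAK - P_IDLE)
    return max(0.0, min(1.0, u))
```

With alpha = 0.9 and beta = 0.85, for instance, this sketch yields a utilization of roughly 0.58, down from the nominal 0.8, illustrating how a modest supply cut scales down processing capacity.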
Let W_i denote the average waiting time of tasks in data center i, which is determined by the task-arriving rate λ_i, the server-service rate μ_i, and the number of servers s_i. Moreover, if μ_i and s_i do not change, the average waiting time decreases when λ_i decreases. This is why power constraints can affect the response time of tasks. In the next section, we propose an approach based on queuing theory to approximate this average waiting time. 2.1.3 Workload Migration Our model assumes that all data centers can connect to each other and migrate some workload to others. When workload migration happens between two data centers, both of their task-arriving rates change accordingly. In particular, if data center i migrates some workload to data center j, its task-arriving rate decreases from λ_i to (λ_i − Δλ_ij), and the task-arriving rate of data center j increases from λ_j to (λ_j + Δλ_ij). Due to the limits of device capacity and network conditions, the migration speed is constrained in all data centers, which means one data center can only receive or send a limited amount of workload at the same time. We use U_i and L_i to denote the maximum upload and download speeds allowed for workload migration in data center i. Migration also incurs some cost, and we use c_ij to denote the cost of migrating one unit of workload from data center i to j. 2.2 Problem Formulation If there are power outages or constraints in some distributed data centers, the original task schedule will suffer from low QoS: tasks in some data centers will have excessively long response times while other data centers may still have spare computing resources. To obtain better QoS and lower response times, the original workloads may need to be rescheduled, with some migrated to other distributed data centers. Let X_ij denote the volume of workload migrated from data center i to j. If X_ij is negative, data center i receives workload from j.
Let D denote the number of distributed data centers considered in a region and λ_i denote the original task-arriving rate. Then the new task-arriving rate λ'_i in data center i is:
λ'_i = λ_i − Σ_{j=1}^{D} X_ij  (2.9)
Our objective is to minimize the response time of all the tasks and the migration cost simultaneously. Let W'_i denote the new average waiting time after workload migration. The problem can be formulated as
Min: Response = Σ_{i=1}^{D} λ'_i W'_i  (2.10)
Min: Cost = Σ_{i=1}^{D} Σ_{j=1}^{D} max{X_ij, 0} c_ij  (2.11)
s.t.
X_ij = −X_ji, ∀i, j  (2.12)
λ'_i / (s_i μ_i) < 1, ∀i  (2.13)
−L_i ≤ Σ_{j=1}^{D} X_ij ≤ U_i, ∀i  (2.14)
Here, constraint (2.13) ensures the model is stable, which means the queue of tasks will not grow infinitely. Constraint (2.14) ensures the migration limits are not exceeded. In the next section, we introduce our solution to approximate the average waiting time and obtain the best migration schedule. 2.3 Solutions In this section, we first use queuing theory to approximate the tasks' average response time. Then we transform the optimization problem into two subproblems and design an algorithm to solve them efficiently. 2.3.1 Approximation of Average Waiting Time Assume the task inter-arrival times and service times are both exponentially distributed [37] [36]. All the tasks in a data center can then be modeled as an M/M/c queue, and the utilization of the servers in data center i is:
ρ_i = λ_i / (s_i μ_i)  (2.15)
where ρ_i also represents the probability that a server is busy, or the proportion of time that a server is busy. According to the theory of the M/M/c queue, the probability that no task is in the queue is:
P_{0i} = [ Σ_{n=0}^{s_i − 1} (s_i ρ_i)^n / n! + (s_i ρ_i)^{s_i} / (s_i! (1 − ρ_i)) ]^{−1}  (2.16)
Therefore, the average waiting time of tasks in data center i can be calculated as
W_i = [ P_{0i} ρ_i λ_i^{s_i − 1} + s_i! (1 − ρ_i)^2 μ_i^{s_i − 1} ] / [ s_i! (1 − ρ_i)^2 μ_i^{s_i} ]  (2.17)
2.3.2 Problem Transformation Recall that our objectives are to minimize both the response time, Eq. (2.10), and the overall migration cost, Eq. (2.11). This is a multi-objective programming problem in which the migration variables X_ij must lie in a specific range determined by the constraints, and there is no known efficient solution for it. We therefore transform it into two subproblems and design an algorithm to solve them efficiently. The two objectives in our formulated problem are contradictory and thus cannot be optimized simultaneously. We first optimize the response time because, for providers, guaranteeing the QoS of users is usually the most important. Since the overall response time is determined only by the new task-arriving rates λ' and does not depend on the process used to migrate the workload, all the constraints can be transformed into new forms without the variables X_ij. Letting λ'_i become the new variables instead of X_ij, with the same objective (2.10), the new constraints can be formulated as:
Σ_{i=1}^{D} λ'_i = Σ_{i=1}^{D} λ_i  (2.18)
λ_i − U_i ≤ λ'_i ≤ λ_i + L_i, ∀i  (2.19)
λ'_i < s_i μ_i, ∀i  (2.20)
To solve this new problem, we start from the barrier method [4]. We first introduce a punish function I(u) = −(1/t) log(−u) to remove the inequality constraints, where t > 0 is a parameter that sets the accuracy of the approximation. I(u) is convex, nondecreasing and differentiable, which makes it amenable to Newton's method. We replace u in the punish function with the constraints of Eq. (2.19) and Eq. (2.20) and add the resulting terms to the objective. The basic idea is that the punish function is close to zero when a constraint is satisfied with ample slack and grows very large as the constraint boundary is approached, so the solution of the new objective approaches that of the original problem as t increases.
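The behavior of the punish function is easy to check numerically. A small sketch, where the slack values passed in are arbitrary illustrative numbers:

```python
import math

def punish(u, t):
    # Barrier term I(u) = -(1/t) * log(-u), defined only for u < 0,
    # where u is a constraint written as g(x) <= 0 and -u is its slack.
    return -(1.0 / t) * math.log(-u)

# Well inside the feasible region the penalty is small (even negative)...
inside = punish(-100.0, t=10.0)
# ...but it grows without bound as the slack shrinks toward zero.
near_boundary = punish(-1e-6, t=10.0)
# Increasing t shrinks the penalty at any fixed feasible point, so the
# barrier objective approaches the original objective as t grows.
small_t = punish(-0.5, t=10.0)
large_t = punish(-0.5, t=1000.0)
```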
Now the problem becomes

Min: f(\vec{\lambda}') = \sum_{i=1}^{D} \big\{ t \lambda_i' W_i - \log(\lambda_i' - \lambda_i + U_i) - \log(\lambda_i + L_i - \lambda_i') - \log(s_i \mu_i - \lambda_i') \big\}    (2.21)

s.t. Eq. (2.18).
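As a concrete illustration, the M/M/c quantities in Eqs. (2.15)-(2.17) can be computed directly. The sketch below follows the standard Erlang-C form of the queue; the parameter values in the usage note are illustrative:

```python
import math

def avg_waiting_time(lam, mu, s):
    """Mean queueing delay W of an M/M/c queue, per Eqs. (2.15)-(2.17)."""
    rho = lam / (s * mu)                      # utilization, Eq. (2.15)
    assert rho < 1, "stability constraint (2.13) violated"
    a = s * rho                               # offered load lambda/mu
    p0 = 1.0 / (sum(a ** n / math.factorial(n) for n in range(s))
                + a ** s / (math.factorial(s) * (1 - rho)))   # Eq. (2.16)
    # Eq. (2.17), written with a^s = (lambda/mu)^s for numerical clarity
    return p0 * a ** s / (math.factorial(s) * (1 - rho) ** 2 * s * mu)
```

For s = 1 this reduces to the familiar M/M/1 result W = λ/(µ(µ − λ)); for example, λ = 2, µ = 3 gives W = 2/3.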
2.3.3 Algorithm Design

We have transformed the original problem into an equality-constrained minimization problem, and we now develop a Newton's-method-based algorithm [4] to solve it. Unlike classical Newton's method, the start point must be feasible for this problem, and the calculation of the Newton step must account for the equality constraint. We can use the original task-arriving rates \vec{\lambda} = \{\lambda_1, \lambda_2, ..., \lambda_D\} as the start point, since they are always feasible. At this feasible point \vec{\lambda}, we replace the objective with its second-order Taylor approximation near \vec{\lambda}:

Min: \hat{f}(\vec{\lambda} + \Delta\vec{\lambda}) = f(\vec{\lambda}) + f'(\vec{\lambda})^T \Delta\vec{\lambda} + \frac{1}{2}\, \Delta\vec{\lambda}^T f''(\vec{\lambda})\, \Delta\vec{\lambda}    (2.22)

where \Delta\vec{\lambda} denotes the Newton step. This is now a quadratic minimization problem with an equality constraint and can be solved analytically. The Newton step \Delta\vec{\lambda} is what must be added to \vec{\lambda} to solve the problem, and it can be calculated from the KKT system of the equality-constrained quadratic optimization problem [4]. The optimality conditions are

\sum_{i=1}^{D} \Delta\lambda_i = 0, \qquad f''(\vec{\lambda})\,\Delta\vec{\lambda} + f'(\vec{\lambda}) + I^T \hat{\lambda} = 0    (2.23)

where \hat{\lambda} is the associated optimal dual variable of the quadratic problem. Rewriting Eq. (2.23) in matrix form:

\begin{bmatrix} f''(\vec{\lambda}) & I^T \\ I & 0 \end{bmatrix} \begin{bmatrix} \Delta\vec{\lambda} \\ \hat{\lambda} \end{bmatrix} = \begin{bmatrix} -f'(\vec{\lambda}) \\ 0 \end{bmatrix}    (2.24)

where I is the constraint matrix of the equality constraint (2.18), i.e., the all-ones row vector, so that I\,\Delta\vec{\lambda} = \sum_i \Delta\lambda_i. From Eq. (2.21) we can easily get f''(\vec{\lambda}) = \mathrm{diag}(1/\lambda_1^2, 1/\lambda_2^2, ..., 1/\lambda_D^2) and f'(\vec{\lambda}) = -(1/\lambda_1, 1/\lambda_2, ..., 1/\lambda_D). The Newton step is defined only at points for which the KKT matrix is nonsingular. When the KKT matrix is nonsingular, there is a unique optimal primal-dual pair (\Delta\vec{\lambda}, \hat{\lambda}). If the KKT matrix is singular but the KKT system is solvable, then any solution yields an optimal pair. If the KKT system is not solvable, the quadratic problem is unbounded below or infeasible.
If the KKT system is solvable, we can solve it efficiently by elimination, i.e., by solving

I f''(\vec{\lambda})^{-1} I^T \hat{\lambda} = -I f''(\vec{\lambda})^{-1} f'(\vec{\lambda})    (2.25)

and setting \Delta\vec{\lambda} = -f''(\vec{\lambda})^{-1} (I^T \hat{\lambda} + f'(\vec{\lambda})), or in other words,

\Delta\vec{\lambda} = -\mathrm{diag}(\vec{\lambda})^2 I^T \hat{\lambda} + \vec{\lambda}    (2.26)
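With the stated simplified derivatives f'' = diag(1/λ_i²) and f' = −(1/λ_i), the elimination above admits a closed form, because the single equality constraint makes the dual variable a scalar. A small sketch under those assumptions:

```python
def newton_step(lam):
    """Newton step via KKT elimination, following Eqs. (2.25)-(2.26).

    Assumes f'' = diag(1/lam_i^2), f' = -(1/lam_i), and an all-ones
    constraint row, so the dual variable lam_hat is a scalar.
    """
    lam_hat = sum(lam) / sum(x * x for x in lam)   # eliminated dual variable
    delta = [x - lam_hat * x * x for x in lam]     # Eq. (2.26), componentwise
    return delta, lam_hat
```

By construction the step satisfies Σ_i Δλ_i = 0, so feasibility of the equality constraint (2.18) is preserved at every Newton iteration.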
where \hat{\lambda} is the solution of

I\, \mathrm{diag}(\vec{\lambda})^2\, I^T \hat{\lambda} = I \vec{\lambda}    (2.27)

After we get \Delta\vec{\lambda}, we can use \vec{\lambda} + \Delta\vec{\lambda} to approximate the optimal solution for the current t. When the objective f is exactly quadratic, the update \vec{\lambda} + \Delta\vec{\lambda} exactly solves the minimization problem; when f is nearly quadratic, \vec{\lambda} + \Delta\vec{\lambda} is a very good estimate of the optimal solution. For each t, we calculate the Newton step repeatedly until the gap to the optimal solution is smaller than the tolerance parameter \epsilon. As t increases, \vec{\lambda} + \Delta\vec{\lambda} gets closer and closer to the optimal solution. Let \vec{\lambda}^*(t) denote the solution \vec{\lambda}' we obtain with parameter t, and \vec{p} denote the optimal solution of Objective (2.10). We now show the performance bound of our algorithm.

Theorem 1. For a specific t, the maximum gap to the optimal solution is 3D/t, i.e., f(\vec{\lambda}^*(t)) - f(\vec{p}) \le 3D/t.

Proof: Let f_0 denote the original Objective (2.10). For Objective (2.21), the optimality condition for each component i is

t f_0'(\vec{\lambda}') + \frac{1}{s_i \mu_i - \lambda_i'} - \frac{1}{\lambda_i' - \lambda_i + U_i} + \frac{1}{\lambda_i + L_i - \lambda_i'} + \hat{\lambda} = 0

Divide the whole equation by t and let

\alpha_i(t) = \frac{1}{t(s_i \mu_i - \lambda_i')} - \frac{1}{t(\lambda_i' - \lambda_i + U_i)} + \frac{1}{t(\lambda_i + L_i - \lambda_i')}, \qquad \beta(t) = \frac{\hat{\lambda}}{t}

According to the duality theory of optimization [4], \alpha(t) and \beta(t) form a dual feasible pair. The duality gap associated with \vec{\lambda}' and this dual feasible pair is m/t, where m = 3D is the number of inequality constraints; i.e., \vec{\lambda}' is no more than 3D/t-suboptimal.

Then, to minimize the migration cost based on the solution \vec{\lambda}^*(t), we transform the problem into a minimum cost flow problem according to the following transformation rules: 1) For each data center i that needs to send workload, add a source node v_s^i with demand (\lambda_i' - \lambda_i). 2) For each data center j that needs to receive workload, add a sink node v_r^j with demand (\lambda_j' - \lambda_j). 3) For each pair of v_s^i and v_r^j, add an edge (v_s^i, v_r^j); set its cost to c_{ij} and its capacity to \min(U_i, L_j).
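The reduction above yields a standard min-cost-flow instance. As an illustration, the sketch below builds such an instance from sender/receiver demands and solves it with a successive-shortest-path routine; the capacity-scaling algorithm cited in the text is a faster relative of this approach, and the node indices here are purely illustrative:

```python
def min_cost_flow(n, edges, supplies):
    """Min-cost flow via successive shortest paths (Bellman-Ford).

    edges: (u, v, capacity, cost) per rule 3; supplies[i] > 0 for
    senders (rule 1), < 0 for receivers (rule 2). Returns (flow, cost).
    """
    INF = float("inf")
    graph = [[] for _ in range(n + 2)]        # +2 for super source/sink

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for u, v, cap, cost in edges:
        add_edge(u, v, cap, cost)
    s, t = n, n + 1
    need = 0
    for v, d in enumerate(supplies):
        if d > 0:
            add_edge(s, v, d, 0)
            need += d
        elif d < 0:
            add_edge(v, t, -d, 0)
    flow = cost = 0
    while flow < need:
        dist = [INF] * (n + 2)
        parent = [None] * (n + 2)
        dist[s] = 0
        for _ in range(n + 2):                # Bellman-Ford relaxations
            for u in range(n + 2):
                if dist[u] == INF:
                    continue
                for i, (v, cap, c, _) in enumerate(graph[u]):
                    if cap > 0 and dist[u] + c < dist[v]:
                        dist[v] = dist[u] + c
                        parent[v] = (u, i)
        if dist[t] == INF:
            break                             # remaining demand infeasible
        push, v = INF, t
        while v != s:                         # bottleneck on the path
            u, i = parent[v]
            push = min(push, graph[u][i][1])
            v = u
        v = t
        while v != s:                         # augment along the path
            u, i = parent[v]
            graph[u][i][1] -= push
            graph[v][graph[u][i][3]][1] += push
            v = u
        flow += push
        cost += push * dist[t]
    return flow, cost
```

For example, two senders with demands 3 and 2 and two receivers with demands 4 and 1, connected by unit-cost edges per rule 3, route all 5 units at minimum total cost.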
After applying the above rules, Objective (2.11) is transformed into a minimum cost flow problem, which can be solved by a capacity scaling algorithm [13] in polynomial time.

The pseudo-code of the whole algorithm is shown in Algorithm 1. The algorithm takes the power constraint situation and the server utilization of all data centers as input, and returns the migration solution, the overall response time, and the total migration cost. Line 1 gets the new service rate \mu' based on the power constraints. Line 2 introduces the punish function I(u) and transforms Objective (2.10) into (2.21). Line 3 initializes a feasible start point \vec{\lambda}, an accuracy parameter t for the punish function, an increase parameter \alpha for t, and a tolerance parameter \epsilon. Lines 4-8 repeatedly calculate the Newton step and update the accuracy parameter t until the gap is smaller than the tolerance parameter \epsilon. Line 9 creates a graph G based on the solution \vec{\lambda}^*(t) and the transformation rules. Line 10 calculates the minimum cost flow on G using the capacity scaling algorithm and obtains the migration solution X.

Algorithm 1: Workload Migration
Input: Power constraints and server utilization of all data centers.
Output: Migration solution X, response time, and total migration cost.
1  Get the new service rate \mu' based on the power constraint condition;
2  Introduce the punish function I(u) and transform Objective (2.10) into (2.21);
3  Initialize a feasible \vec{\lambda}, t > 0, \alpha > 1, tolerance \epsilon > 0;
4  while 3D/t > \epsilon do
5      Starting at \vec{\lambda}, get \Delta\vec{\lambda} by solving the KKT system;
6      Update \vec{\lambda} := \vec{\lambda} + \Delta\vec{\lambda};
7      Increase t := \alpha t;
8  end
9  Create a graph G based on the solution \vec{\lambda}^*(t) and the transformation rules;
10 Get the minimum cost flow on G using the capacity scaling algorithm;
11 return the migration solution X, Response, and Cost;

The choices of the parameters in Algorithm 1 are not trivial.
First, the choice of the parameter \alpha involves a trade-off in the number of inner and outer iterations required. If \alpha is too small (near 1), t increases by only a small factor at each outer iteration. For small t we then expect a small number of Newton steps per outer iteration, and the algorithm converges along a path close to the optimal one. Of course, more outer iterations are needed, because the gap to the optimal solution is reduced by only a small amount each time. On the other hand, we have the opposite situation when \alpha is large: during each outer iteration t increases a lot, so the current solution is probably not a good approximation for the next iteration, and we need more inner iterations to calculate the Newton step. This 'aggressive' updating of t, however, decreases the number of outer iterations, since the gap to the optimal solution shrinks more in each step. Confirmation both in practice and in theory [4] indicates that values of \alpha from around 10 to 20 work well.

Another very important parameter is the tolerance \epsilon. A small \epsilon gives accurate results but significantly increases the runtime of the algorithm, since more iterations are needed to converge to the required gap. On the other hand, a very large \epsilon can lead to inaccurate results and seriously damage the QoS in our problem. Thus we need to select \epsilon carefully based on the real situation and the priority of our demands.

Finally, a good start point \vec{\lambda} would certainly save runtime. In this problem, however, we only have one feasible start point to choose from: the initial situation of the data centers.
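The \alpha trade-off can be made concrete: since Theorem 1 bounds the gap by 3D/t and t grows geometrically, the number of outer iterations needed to reach a tolerance \epsilon is roughly logarithmic in 3D/(\epsilon t_0). A small sketch with illustrative parameter values:

```python
import math

def outer_iterations(m, t0, eps, alpha):
    """Outer iterations until the duality gap m/t drops below eps,
    with t starting at t0 and multiplied by alpha each iteration.
    Here m = 3D, the number of inequality constraints (Theorem 1)."""
    return max(0, math.ceil(math.log(m / (eps * t0)) / math.log(alpha)))
```

For instance, with D = 10 data centers (m = 30), t_0 = 1, and \epsilon = 10^{-3}, choosing \alpha = 10 needs 5 outer iterations while \alpha = 2 needs 15, illustrating how larger \alpha trades fewer outer iterations for harder inner Newton phases.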
Chapter 3

Battery Allocation against Electrical Load Shedding

In this chapter, we first introduce the overall system model with backup batteries. Fig. 3.1 shows typical lead-acid battery groups and the monitoring system. When there is electrical load shedding or a power outage, they support the servers in a data center as backup power. Yet the capacity of a battery group is limited, and a deep discharge of an energy-storage battery group (typically lead-acid) severely affects its internal structure, reducing its capacity and lifetime. Fig. 3.2 shows a typical discharge curve for a lead-acid battery. Most of its power is released during the linear region. In the final hyperbolic region, the voltage falls very fast and the battery can only release a very small fraction of its power, so this region should be avoided. We then formulate battery allocation and workload migration against load shedding as a multi-objective optimization problem. Table 3.1 lists the notations used in this chapter.

(a) Battery groups (b) Monitor system
Figure 3.1: The backup batteries and the monitor system.
Figure 3.2: Typical discharge voltage versus time characteristics (a short coup de fouet region, a long linear region, and a final hyperbolic region)

Table 3.1: Notations

D              the set of all the data centers
s_i            the number of open servers in data center i
n_i            the number of battery groups in data center i
P_b            the amount of power supplied by one backup battery
P_s            the amount of power required by one server
P_g            the amount of power supplied by the power grid
\lambda_{w,i}  the arriving rate of web jobs in data center i
\lambda_{b,i}  the arriving rate of batch jobs in data center i
\mu_i          the server-service rate in data center i
P_0^i          the probability that no task is in the queue in data center i
W_i            the average waiting time in data center i
r_{i,n_s}^t    the reserve time for data center i with n_s battery groups at time t
c_b            the replacement cost of a single battery group of a data center
T_{i,n_i}      the lifetime of batteries in data center i
C_r            the normalized battery replacement cost
C_m            the migration cost of all batch jobs
C_all          the normalized overall cost
\Psi           the average waiting time of batch jobs
U_i            the maximum upload speed in data center i
L_i            the maximum download speed in data center i
c_{ij}         the migration cost per unit from data center i to j
X_{ij}         the volume of workload migrated from data center i to j
n_L            the lower limit of the battery group number in a data center
n_U            the upper limit of the battery group number in a data center
B              the budget limit
\rho_i         the utilization of the servers in data center i
t              the accuracy of approximation in the punish function
Figure 3.3: Framework of our system model

3.1 System Model

Fig. 3.3 shows the overall system model of the distributed data centers considered in this chapter. In particular, we use D to denote the set of all distributed data centers in a region. Each data center is powered by both the power grid and backup batteries under electrical load shedding. When there is a power outage and the power grid is completely unavailable, the data centers are powered by backup batteries only. Similar to [24], we divide all the workload of the distributed data centers into two kinds of computing jobs: web jobs, which must run continuously, and batch jobs, which can be delayed and interrupted. Web jobs cannot be interrupted, so the backup batteries must guarantee that web jobs are not affected under electrical load shedding. Batch jobs, in contrast, only have a deadline constraint specified by the users; this part of the workload can be migrated to other distributed data centers that have spare computing resources to meet the deadline constraint. Next, we introduce the details and notations of each part.

3.1.1 Web Jobs and Batch Jobs

Workload in each distributed data center can be divided into web jobs and batch jobs. Let \lambda_{w,i}^t and \lambda_{b,i}^t denote the arriving rates of web jobs and batch jobs in data center i at time t. Web jobs must run continuously and cannot be interrupted (like web servers); they must be finished at the current data center and cannot be migrated to other data centers. Distributed data centers have to close some servers under electrical load shedding, so we need to allocate more backup batteries to support more open servers if the web jobs have
the potential to be interrupted under electrical load shedding. Let s_i^t denote the number of open servers in data center i at time t.

All the other workload is defined as batch jobs, which can be delayed and interrupted (like monthly payroll computation). Unlike web jobs, if the current data center cannot open enough servers to finish the batch jobs under electrical load shedding, we can migrate some of the batch jobs to other data centers to be finished later. Let W_i denote the average waiting time of batch jobs in data center i, which is determined by the arriving rate \lambda_i, the server-service rate \mu_i, and the number of servers s_i.

3.1.2 Backup Batteries and Servers

Let n_i denote the number of battery groups in data center i and P_b denote the amount of power supplied by one battery group. Assume all the servers in each data center are homogeneous and each server needs P_s power to run. Under electrical load shedding, assume the power grid can only supply P_g^t power at time t. The relationship between the number of open servers and backup batteries in data center i can then be represented as:

s_i^t P_s \le n_i P_b + P_g^t, \quad \forall t    (3.1)

Recall that the battery suffers severely from deep discharge in the hyperbolic region. To protect the battery, we disconnect it when it discharges to the end of the linear region (as illustrated in Fig. 3.2). To this end, we use r_{i,n_s}^t to denote the reserve time for data center i with n_s battery groups at time t. When the reserve time decreases to zero, the data center can only be supplied by the power grid under load shedding. The batteries also degrade in this process and need to be replaced when their lifetime is exceeded. Let c_b denote the replacement cost of a single battery group of a data center. For simplification, we assume an average shipment and labor cost across different data centers.
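Constraint (3.1) directly gives the maximum number of servers a data center can keep open for a given battery allocation; a one-line sketch, with illustrative wattage values in the usage note:

```python
def max_open_servers(n_batteries, p_battery, p_grid, p_server):
    """Largest s satisfying Eq. (3.1): s * Ps <= n * Pb + Pg."""
    return int((n_batteries * p_battery + p_grid) // p_server)
```

For example, assuming Pb = 500 W per battery group, Ps = 300 W per server, and 1000 W of remaining grid power under shedding, 4 battery groups keep 10 servers open, while no batteries at all keep only 3.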
Then the overall battery replacement cost C_r (including purchase and installation) can be represented by:

C_r = \sum_{i \in D} \frac{n_i c_b}{T_{i,n_i}}    (3.2)

where T_{i,n_i} is the lifetime of the batteries in data center i. We use T_{i,n_i} as the denominator for normalization (representing the annual replacement cost).

3.2 Problem Formulation

Our objective is to allocate the appropriate number of battery groups in each data center against electrical load shedding. We want to guarantee that the web jobs are never interrupted, which can be represented as:
\lambda_{w,i}^t \le N s_i^t, \quad \forall t, \forall i    (3.3)

where N represents the computing capacity of one server. We also want to minimize the waiting time of batch jobs at the same time, so our first objective can be represented as:

Min: \Psi = \sum_{t \in T} \sum_{i \in D} \lambda_{b,i}^t W_i^t    (3.4)

where T is the time-based index range of the considered period and \Psi represents the average waiting time of batch jobs. Besides minimizing the waiting time of all the batch jobs, the operator of the data centers also wants to reduce the overall cost C_all, which includes the battery replacement cost C_r and the workload migration cost C_m of batch jobs. Let X_{ij} denote the volume of workload migrated from data center i to j; if X_{ij} is negative, data center i receives workload from j. Our other objective can then be represented as follows:

Min: C_all = C_r + C_m = \sum_{i \in D} \frac{n_i c_b}{T_{i,n_i}} + \sum_{i \in D} \sum_{j \in D} \max\{X_{ij}, 0\}\, c_{ij}    (3.5)

s.t.  X_{ij} = -X_{ji}, \quad \forall i, \forall j    (3.6)

      \frac{\lambda_i}{s_i \mu_i} < 1, \quad \forall i    (3.7)

      -L_i \le \sum_{j \in D} X_{ij} \le U_i, \quad \forall i    (3.8)

Here, constraint (3.7) ensures the model is stable, i.e., the queue of tasks will not grow infinitely. Constraint (3.8) ensures the migration limit is not exceeded. In practice, there may be other factors that limit the number of battery groups equipped in a data center. We thus use n_L and n_U to denote the lower and upper limits of the number of battery groups, respectively, giving the constraint:

n_L \le n_i \le n_U, \quad \forall i    (3.9)

Besides, the operators of data centers usually want to keep the overall cost within a given upper bound B, so we also have the constraint:

C_all \le B    (3.10)
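The two cost terms of Eq. (3.5) can be evaluated directly from an allocation and a migration matrix; a sketch with hypothetical input values:

```python
def overall_cost(n, c_b, lifetime, X, c):
    """Eq. (3.5): annualized replacement cost plus migration cost.

    n[i]: battery groups in data center i; lifetime[i]: T_{i,n_i} in years;
    X[i][j]: migrated volume (only positive entries pay, per Eq. (3.6));
    c[i][j]: per-unit migration cost.
    """
    D = range(len(n))
    cr = sum(n[i] * c_b / lifetime[i] for i in D)                    # C_r
    cm = sum(max(X[i][j], 0.0) * c[i][j] for i in D for j in D)      # C_m
    return cr + cm
```

The antisymmetry constraint (3.6) is why only positive entries of X are charged: each transfer appears once as +X_ij at the sender and once as -X_ij at the receiver.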
Figure 3.4: Framework of our solution

3.3 Solutions

In this section, we introduce our solutions to the optimization problem formulated in the last section. The overall design of our solution framework is shown in Fig. 3.4. Given a set of distributed data centers in a region, we first estimate the working condition of each data center under different power supply conditions. We have explored the working conditions of distributed data centers under electrical load shedding in Chapter 2; the working conditions of servers with different numbers of battery groups can be calculated accordingly. Our previous work [35] proposed a deep-learning-based battery profiling method to predict battery degradation from historical battery activities, and the reserve time and lifetime of battery groups can be calculated easily from the prediction results. Given all of these results, we propose new solutions to the multi-objective optimization problem. First, we use the Workload Migration algorithm introduced in the last chapter to obtain the optimal migration schedule of batch jobs for a specific number of battery groups in each data center within a given bound. Based on this algorithm, we propose a new algorithm, Battery Allocation, to allocate the appropriate number of battery groups in each data center so as to optimize the objectives under all the constraints. Next, we introduce our new battery allocation framework.

3.3.1 Battery Allocation Framework

Given the profiling results of the distributed data centers and the battery features under electrical load shedding, we next solve the battery allocation optimization problem. Recall that our objective is first to guarantee that the web jobs are not interrupted (Eq. (3.3)).
We then try to migrate some of the batch jobs to minimize the average waiting time and migration cost by
Algorithm 1. If we continue to allocate more battery groups after the web jobs are guaranteed, then according to Eq. (3.1), more servers can be kept open to support more batch jobs, resulting in lower migration cost. The replacement cost of the backup batteries may increase, however. Thus the allocation of backup batteries to minimize the overall cost is not a trivial problem. Indeed, this multi-objective optimization problem is a multi-objective integer programming problem, where the battery group number n_i must be an integer between a lower bound n_L and an upper bound n_U, and this kind of integer linear programming (ILP) is known to be NP-complete. We therefore design a heuristic algorithm to solve it efficiently.

Before jumping to the algorithm design, we first briefly analyze the characteristics of this optimization problem. Intuitively, under the same external incidents, a data center can support more servers during electrical load shedding when it is equipped with more battery groups, so the average waiting time of batch jobs and the migration cost both decrease. The battery replacement cost behaves differently, and the process can be divided into two stages. In the first stage, as the number of battery groups increases, the additional backup power helps the data center sustain a longer period against load shedding and reduces deep discharging of the batteries, so the lifetime of the batteries is prolonged, achieving a lower battery replacement cost. In the second stage, if we continue to increase the number of battery groups, the extra backup power becomes redundant and the battery replacement cost increases due to the unavoidable battery degradation process.
Therefore, there are three possible situations for the battery replacement cost depending on the conditions of the corresponding data center: the cost first decreases and then increases (both stage 1 and stage 2), the cost keeps decreasing (only stage 1), or the cost keeps increasing (only stage 2). As the sum of the battery replacement cost and the migration cost, the overall cost C_all can also exhibit this characteristic, which is further verified by our real-data-driven experiments in Chapter 4.

Based on the above analysis, our heuristic algorithm is shown in Algorithm 2, where we divide the whole process into two stages. In the first stage (lines 1-7), for each data center, we allocate enough battery groups that the servers can support all the web jobs. In the second stage (lines 8-17), we first allocate as many battery groups as possible for each data center while the battery replacement cost decreases; since the migration cost never increases with more backup power, the overall cost also decreases in this stage. We then continue to add battery groups as long as the overall cost does not exceed the budget limit and the average waiting time still decreases. To better balance the tradeoff between average waiting time and cost, we define Gain as the ratio of the weighted average-waiting-time decrease to the overall cost increase:

Gain = \frac{\Psi_i(n_i) - \Psi_i(n_i + 1)}{C_{i,all}(n_i + 1) - C_{i,all}(n_i)}    (3.11)
Algorithm 2: Battery Allocation
Input: Results of data center and battery feature profiling.
Output: The allocation result n_i for every data center i and the average waiting time \Psi of batch jobs.
1  foreach i in D do
2      Set the initial battery assignment as n_i = n_L;
3      while n_i \le n_U do
4          Increase n_i until Eq. (3.3) is satisfied;
5          Calculate C_r and move to the next data center;
6      end
7  end
8  while C_all \le B and \Psi still decreases do
9      Pre-allocate one more battery group for each data center i and calculate \Psi, C_r and C_all;
10     if C_r begins to increase then
11         Choose the data center i that leads to the maximum Gain and add one battery group to it;
12     end
13     else
14         Add one battery group for data center i;
15     end
16     Update \Psi and C_all;
17 end
18 return n_i for all data centers i, \Psi and C_all;
At each step, we select the data center with the maximum Gain and add one battery group to it, until we reach the budget. This greedy approach guarantees that each step achieves the largest reduction in average waiting time for the smallest cost increase. We next analyze the complexity of our heuristic algorithm to show its efficiency. In the first stage, we access each data center only once, so the complexity is O(n). In the second stage, we calculate \Psi, C_r and C_all for each data center and iteratively select the data center with the maximum Gain, which contributes a complexity of O(n log n). The total complexity of our algorithm is therefore O(n log n).
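The second stage of Algorithm 2 can be sketched as a small greedy loop. The interface below — per-data-center psi and cost callables supplied by the profiling step — is hypothetical, chosen only to make the selection rule of Eq. (3.11) concrete:

```python
def greedy_allocation(n, psi, cost, n_upper, budget):
    """Second stage of Algorithm 2 (sketch, hypothetical interface).

    n: dict dc -> battery groups after the web-job stage;
    psi(i, k) / cost(i, k): waiting time / overall cost of data center i
    with k battery groups (assumed available from profiling).
    """
    while True:
        best, best_gain = None, 0.0
        for i, k in n.items():
            if k >= n_upper:                        # Eq. (3.9) upper limit
                continue
            d_psi = psi(i, k) - psi(i, k + 1)
            d_cost = cost(i, k + 1) - cost(i, k)
            if d_cost <= 0 and d_psi >= 0:          # cost drops: always take
                best, best_gain = i, float("inf")
                break
            if d_psi > 0 and d_cost > 0:
                g = d_psi / d_cost                  # Gain, Eq. (3.11)
                if g > best_gain:
                    best, best_gain = i, g
        if best is None:
            return n                                # no improving move left
        trial = dict(n)
        trial[best] += 1
        if sum(cost(i, k) for i, k in trial.items()) > budget:
            return n                                # budget B (Eq. (3.10))
        n = trial
```

Each iteration adds exactly one battery group where it buys the most waiting-time reduction per unit of cost, matching lines 8-17 of Algorithm 2.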
Chapter 4

Performance Evaluation

In this chapter, based on real-world data and simulations, we evaluate the performance of the proposed Workload Migration algorithm and Battery Allocation framework, respectively.

4.1 Evaluation of Workload Migration Algorithm

In this section, we introduce our experiment setup and the performance evaluation of our proposed Workload Migration algorithm.

4.1.1 Simulation Setup

We conduct extensive trace-driven simulations with diverse network, computing, and energy configurations to evaluate the performance of our method, and compare it with several baseline methods. The number of servers in a single distributed data center varies from 2 to 20. The average task-arriving and server-service rates are generated by uniform distributions, where the former varies from 10 to 15 MB/s and the latter from 15 to 20 MB/s.

Table 4.1: Server configuration

Item                          Quantity  Configuration
Processor                     4         Intel 8158 3.0 GHz 150W 12C/24.75MB Cache/DDR4 2666MHz
Memory                        48        128GB DDR4-2666-MHz TSV-RDIMM/PC4-23100/octal rank/x4/1.2v
Power Supply                  4         1600W PSU1
Dedicated Storage Controller  1         RAID-Cisco 12G SAS Modular Controller 4GB FBWC
Storage                       16        480GB 6Gb SATA 2.5-inch SSD Enterprise Value
Adapter                       12        GPU-NVIDIA V100 32GB