Performance-controlled server consolidation for virtualized data centers with multi-tier applications

Sustainable Computing: Informatics and Systems 4 (2014) 52–65. Contents lists available at ScienceDirect; journal homepage: www.elsevier.com/locate/suscom

Yefu Wang (University of Tennessee, Knoxville, United States) and Xiaorui Wang (The Ohio State University, United States)

Article history: received 8 August 2012; received in revised form 24 August 2013; accepted 19 February 2014.

Keywords: Power management; Data centers; Server consolidation; Multi-tier web applications; Virtualization; Control theory

Abstract

Modern data centers must provide performance assurance for complex system software such as multi-tier web applications. In addition, the power consumption of data centers needs to be minimized to reduce operating costs and avoid system overheating. Various power-efficient performance management strategies have been proposed based on dynamic voltage and frequency scaling (DVFS). Virtualization technologies have also made it possible to consolidate multiple virtual machines (VMs) onto a smaller number of active physical servers for even greater power savings, but at the cost of a higher overhead. This paper proposes a performance-controlled power optimization solution for virtualized server clusters with multi-tier applications. While most existing work relies on either DVFS or server consolidation in a separate manner, our solution utilizes both strategies for maximized power savings by integrating feedback control with optimization strategies. At the application level, a novel multi-input-multi-output controller is designed to achieve the desired performance for applications spanning multiple VMs, on a short time scale, by reallocating the CPU resources and conducting DVFS.
At the cluster level, a power optimizer is proposed to incrementally consolidate VMs onto the most power-efficient servers on a longer time scale. Empirical results on a hardware testbed demonstrate that our solution outperforms pMapper, a state-of-the-art server consolidation algorithm, by achieving greater power savings and smaller consolidation overheads while delivering the required application performance. Extensive simulation results, based on a trace file of 5415 real servers, demonstrate the efficacy of our solution in large-scale data centers. © 2014 Elsevier Inc. All rights reserved.

(This paper is a significantly extended version of a workshop paper [1]. Corresponding author: X. Wang, tel.: +1 614 247 1977.)

1. Introduction

In recent years, power has become one of the most important concerns for enterprise data centers hosting thousands of high-density servers and providing outsourced business-critical IT services. A well-known approach to reducing server power consumption is to transition the hardware components from high-power states to low-power states whenever performance allows. For example, a widely used power-efficient server design is to have run-time measurement and control of the desired application performance by adjusting the CPU power states using dynamic voltage and frequency scaling (DVFS). However, while this approach can effectively reduce the dynamic power of the system with a small overhead (e.g., under 100 μs in AMD Athlon [2] and under 50 μs in IBM POWER7 [3]), it cannot minimize the system leakage power for maximized power savings.

Recently, many data centers have begun to adopt server virtualization strategies for resource sharing.
Virtualization technologies such as VMware and Xen can consolidate applications previously running on multiple physical servers onto a smaller number of physical servers, effectively reducing the power consumption of a server cluster by shutting down unused servers. More importantly, live migration [4] allows the movement of a virtual machine (VM) from one physical host to another with a reasonably short downtime [5]. This function makes it possible to use server consolidation as an online management approach, i.e., having run-time estimation of the resource requirements of every VM and dynamically re-mapping VMs to physical servers using live migration. When a more power-efficient VM-server mapping is found, unused servers can be put into the sleep mode for reduced power consumption. A recent study [6] shows that server consolidation can achieve a power saving of 34%, which is significantly more than that achieved by DVFS (17%). However, the downside of consolidation is that it
migrates VMs from one host to another, which may incur a high overhead in terms of both time and server resources. Therefore, server consolidation can only be conducted on a long time scale to amortize the migration overheads.

While power consumption must be minimized, an important goal of data center operators is to meet the service-level agreements (SLAs) required by customers, such as response time and throughput. SLAs are important to operators of data centers because they are the key performance metrics for customer service and are part of customer commitments. Therefore, it is important to guarantee the SLAs of the applications while minimizing the power consumption of a data center.

Guaranteeing the desired SLAs with minimized power consumption introduces several major challenges. First, complex system software, such as web applications, commonly has multi-tier installations. Thus, an application may span multiple VMs. This characteristic calls for advanced multi-input-multi-output (MIMO) control solutions to manipulate multiple VMs simultaneously. Second, web applications often face significant, unpredictable workload variations. To guarantee SLAs, a control solution must respond to a workload variation quickly by adjusting system resource allocation (e.g., CPU time on each server). This cannot be achieved by migrating some VMs among the servers, because VM migration typically requires seconds, or even minutes, to finish. Finally, servers in a cluster may be manufactured by different hardware vendors and have different power efficiencies, i.e., some servers are more power-efficient than others. A power management solution must be able to utilize this kind of heterogeneity to further reduce the total power consumption.
While existing work on data center power management relies on either DVFS or server consolidation in a separate manner, it is important to integrate application-level performance control and cluster-level server consolidation for maximized power savings, for two reasons. First, most existing work on server consolidation assumes that the resource requirements of the VMs are known a priori or can be estimated by measuring the resource utilization. However, resource utilization commonly differs from resource demand, especially when servers are overloaded. Thus, it is preferable to have an application-level performance controller that can dynamically allocate system resources and conduct DVFS in response to short-term application resource demand variations. Second, a performance controller based on CPU resource allocation and DVFS may fail to guarantee the application performance when servers are overloaded due to significant long-term workload increases. Therefore, it is preferable to have a cluster-level consolidation algorithm that dynamically migrates VMs among physical servers to resolve the overload problem.

The potentially large size of virtualized data centers requires that the integration of SLA guarantees, DVFS, and server consolidation be implemented in a scalable manner. In this paper, we propose a scalable integrated management solution to minimize power consumption for virtualized data centers while providing application-level performance assurance. Each application-level performance controller adopts a MIMO control strategy to maintain the desired performance and reduce power consumption through DVFS and dynamic CPU resource reallocation. The cluster-level power optimizer then consolidates the VMs onto the most power-efficient servers and places unused servers into the sleep mode for power savings, on a much longer time scale, to amortize the migration overhead.
Specifically, the contributions of this paper are four-fold:

• We formulate the performance assurance problem for multi-tier applications running in multiple VMs as a MIMO control problem in a distributed manner at the application level. We then design a MIMO performance controller based on advanced Model Predictive Control (MPC) theory.
• We design a novel cluster-level power optimizer that conducts server consolidation based on the application-level resource demands profiled by the MIMO performance controllers. Compared with existing solutions, our solution can achieve more power savings with smaller consolidation overheads. We present a detailed analysis of the computational overhead of our consolidation algorithm and demonstrate that the overhead is sufficiently small for deployment in real data centers.
• We integrate server consolidation with DVFS and CPU resource allocation to handle both long-term and short-term workload variations. The key benefit of such an integration is to achieve maximized power savings with application-level performance assurance. We introduce the system architecture of our integrated power management solution and the implementation details of each component.
• We present both empirical results on a hardware testbed and simulation results (based on a trace file of 5415 real servers) to demonstrate that our solution can effectively reduce cluster power consumption while achieving the desired performance for multi-tier applications in virtualized data centers. In addition, we show that our solution outperforms a state-of-the-art server consolidation solution, pMapper [7].

The rest of the paper is organized as follows. Section 2 introduces the overall architecture of our power management solution. Section 3 presents the modeling, design, and analysis of the performance controller. Section 4 discusses our power optimizer.
Section 5 describes the implementation details of our testbed and simulator. Section 6 presents the results. Section 7 highlights the distinction of our work by discussing the related work, and Section 8 concludes the paper.

2. System architecture

In this section, we provide a high-level description of our system architecture, which includes an application-level response time controller and a cluster-level power optimizer.

As shown in Fig. 1, our power management solution includes two levels. At the application level, for every application running in the cluster, there is a performance controller that dynamically controls the performance of the application by adjusting the CPU resource (i.e., the fraction of CPU time) allocated to the virtual machines running the application. We choose to control the 90-percentile response time of each multi-tier web application as an example SLA metric, but our management solution can be extended to control other SLAs such as average or maximum response times. We assume that the response time of a web application is independent of that of another web application. This is usually a reasonable assumption because the applications may belong to different customers. Hence, we choose to have a response time controller for each multi-tier application. A server-level CPU resource arbitrator then collects the CPU resource demands of all VMs hosted on the server, allocates the CPU resource to the VMs, and uses DVFS to save power if the server has more CPU resources than the VMs require.

At the cluster level, a power optimizer collects the CPU resource demands of all the VMs running in the cluster, then uses a server consolidation algorithm to find the most power-efficient VM-server mapping while satisfying the CPU resource requirements of all the VMs. The power optimizer then sends VM migration commands to the VM migration interfaces on every server, if necessary. It also puts selected servers into the sleep/active mode.
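The server-level arbitration described above can be sketched as follows. This is an illustrative simplification (the function and parameter names are ours; the 0.5 GHz hypervisor reservation is the example value used later in Section 3.3), not the authors' implementation:

```python
def arbitrate(vm_demands_ghz, f_max_ghz, hypervisor_reserve_ghz=0.5):
    """Pick the lowest CPU frequency (GHz) that covers the aggregate VM
    demand plus a hypervisor reservation, capped at the hardware maximum.
    Returns (frequency_ghz, overloaded)."""
    needed = sum(vm_demands_ghz) + hypervisor_reserve_ghz
    return min(needed, f_max_ghz), needed > f_max_ghz

# Two VMs demanding 1 GHz and 0.5 GHz on a 3 GHz server, with 0.5 GHz
# reserved for the hypervisor:
f, overloaded = arbitrate([1.0, 0.5], 3.0)   # -> (2.0, False)
```

On the worked example of Section 3.3 (1 GHz + 0.5 GHz demands plus the 0.5 GHz reservation), this yields a 2 GHz setting, matching Eq. (5); any shortfall at the maximum frequency flags the server as overloaded.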
Server consolidation makes it possible to put unused servers into the sleep mode, which can typically save more power than
DVFS. However, server consolidation may incur a higher overhead. Due to variations in workload, the power optimizer may require the migration of a VM from one server to another, or require a server to awaken. These operations are time consuming, especially when the processors of the related servers are busy or when the network bandwidth is limited. Thus, the optimizer should not be invoked too frequently. On the other hand, the response time controller at the application level uses CPU resource allocation and DVFS as its actuators, both of which have much smaller overheads than VM migration. As a result, the response time controller is invoked on a small time scale (several seconds) to deal with short-term variations in workload, while the power optimizer is invoked on a longer time scale (hours to days).

Fig. 1. System architecture, including a response time controller for every multi-tier application and a cluster-level power optimizer to conduct dynamic server consolidation.

Between two consecutive invocations of the cluster-level optimizer, it is possible that an unexpected increase of the workload causes a severe overload on a server. To deal with this problem, the solution in this paper can be integrated with algorithms that move VMs from the overloaded servers to idle servers in an on-demand manner. An example of such algorithms can be found in our previous work [8].

We introduce the design and analysis of the response time controller and the cluster-level power optimizer in the next two sections, respectively. The implementation details of the other components are provided in Section 5.

3. MIMO response time controller

In this section, we present the problem formulation, modeling, and design of the MIMO response time controller for multi-tier applications.

3.1. Problem formulation

Response time control can be formulated as a dynamic optimization problem. We first introduce some notation. A cluster with N physical servers (S1, S2, ..., SN) hosts L applications (App1, App2, ..., AppL). Some applications can be multi-tier applications running in multiple VMs. The VM running the jth tier of Appi is named VMij, j = 1, 2, ..., ri. The CPU allocation of VMij is defined as cij. Note that this number is expressed as the absolute number of CPU cycles allocated to VMij, in GHz. For example, if VM11 is allocated 20% of a 3 GHz CPU, we say c11 = 20% × 3 GHz = 0.6 GHz. ci = [ci1, ci2, ..., ciri]^T is a column vector of the CPU allocations of all the VMs running application Appi. The 90-percentile response time of Appi is ti seconds. The absolute CPU frequency of server Si is fi. The control period of the response time controller is T seconds. Note that x(k) denotes the value of x in the kth control period, e.g., ci(k) means the value of ci at time kT. In particular, Δci(k) is the difference between ci(k + 1) and ci(k), i.e., Δci(k) = ci(k + 1) − ci(k). At the end of every control period, the response time controller decides Δci(k) to achieve the desired response time for application Appi.

3.2. Response time controller design

In order to have an effective controller design, it is important to model the dynamics of the controlled system. The response time model of application Appi is the relationship between ti and ci. A well-established theoretical equation is usually unavailable for computer systems; therefore, we use a standard approach to this problem called system identification [9]. Rather than building a physical equation between the manipulated variables and the controlled variable, we infer their relationship by collecting data in experiments and then establish a statistical model based on the measured data.
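Concretely, such an identification can be sketched as an ordinary least-squares fit of an autoregressive model on logged allocation/response-time data. The sketch below is a hypothetical single-tier, scalar simplification with synthetic data and made-up parameters, not the authors' actual procedure:

```python
import numpy as np

def identify(t, c):
    """Least-squares fit of t(k) = a*t(k-1) + b1*c(k-1) + b2*c(k-2) + e.

    t: measured response times, c: measured CPU allocations (1-D arrays).
    Returns the fitted coefficients [a, b1, b2, e]."""
    # Regression matrix of lagged samples; the constant column absorbs
    # the static part of the response time.
    X = np.column_stack([t[1:-1], c[1:-1], c[:-2], np.ones(len(t) - 2)])
    y = t[2:]
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

# Synthetic data generated from known parameters, to check recovery.
rng = np.random.default_rng(0)
c = rng.uniform(0.5, 3.0, 500)
t = np.zeros(500)
for k in range(2, 500):
    t[k] = 0.6 * t[k - 1] - 0.4 * c[k - 1] - 0.2 * c[k - 2] + 3.0
a, b1, b2, e = identify(t, c)
```

With noiseless synthetic data the fit recovers the generating parameters exactly; with real measurements, the constant term plays the role of the static part of the response time.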
For example, by conducting system identification on an application (App1) in the prototype system introduced in Section 5, we obtain the system model:

t1(k) = α11 t1(k − 1) + β11 c1(k − 1) + β12 c1(k − 2) + ε1(k − 1)    (1)

where α11 is a constant parameter while β11 and β12 are constant parameter vectors. The values of α11, β11, and β12 can be determined by system identification. ε1(k) represents the static part of the response time that does not change with CPU allocation variations, plus the error of our model in practice; it is inferred online by comparing the predicted response time with the measured response time.

We apply Model Predictive Control (MPC) theory [10] to design the controller based on system model (1). MPC is an advanced control technique that deals with coupled MIMO control problems. This characteristic makes MPC well suited for response time control in multi-tier web applications.

A Model Predictive Controller optimizes a cost function defined over a time interval in the future. The controller uses the system model to predict the control behavior over P control periods, called the prediction horizon. The control objective is to select an
input trajectory that minimizes the cost function. An input trajectory includes the control inputs in the following M control periods, Δc(k), Δc(k + 1|k), ..., Δc(k + M − 1|k), where M is called the control horizon. The notation x(k + i|k) means that the value of variable x at time (k + i)T depends on the conditions at time kT. Once the input trajectory is computed, only the first element Δc(k) is applied as the control input to the system. At the end of the next control period, the prediction horizon slides one control period forward and the input is computed again based on the feedback t(k) from the response time monitor. Note that it is important to recompute the control input because the original prediction may be incorrect due to uncertainties and inaccuracies in the system model used by the controller. MPC enables us to combine performance prediction, optimization, constraint satisfaction, and feedback control into a single algorithm.

The controller includes a least-squares solver, a cost function, a reference trajectory, and a system model. At the end of every control period, the controller computes the control input Δc(k) that minimizes the following cost function:

J(k) = Σ_{i=1..P} ||t(k + i|k) − ref(k + i|k)||²_Q + Σ_{i=0..M−1} ||Δc(k + i|k)||²_{R(i)}    (2)

where P is the prediction horizon and M is the control horizon. Q is the tracking error weight, and R(i) is the control penalty weight vector. The first term in the cost function represents the tracking error, i.e., the difference between the response time t(k + i|k) and a reference trajectory ref(k + i|k). The reference trajectory defines an ideal trajectory along which the response time t(k + i|k) should change from the current value t(k) to the set point Ts (i.e., the desired response time). Our controller is designed to track the following exponential reference trajectory so that the closed-loop system behaves like a linear system:

ref(k + i|k) = Ts − e^{−(T/Tref)·i} (Ts − t(k))    (3)

where Tref is the time constant that specifies the system response speed. A smaller Tref causes the system to converge faster to the set point but may lead to a larger overshoot.

By minimizing the tracking error, the closed-loop system will converge to the response time set point Ts if the system is stable. The second term in the cost function (2) represents the control penalty, which causes the controller to decrease the change of the control input, i.e., the CPU allocation. The control weight vector, R(i), can be tuned to represent a preference among the VMs. For example, a higher weight may be assigned to a VM if the process running in it has a larger CPU demand, so that the controller can give preference to increasing its CPU allocation.

The prediction horizon P, the control horizon M, and the weight vectors Q and R(i) are tuning parameters, which can be adjusted to achieve satisfactory dynamic performance such as settling time and robustness. In general, the prediction horizon P should be large enough that the reference trajectory converges to the set point and the system converges to the reference trajectory. However, the larger P is, the heavier the computational load. The control horizon M determines the freedom the least-squares solver has to change the control input (i.e., the CPU time allocation). The more freedom the solver has, the better the solution it can generate; however, as M increases, the dimension of the solution space grows and searching for the optimal solution takes more time. The weight vectors Q and R(i) determine the trade-off between the control error and the control gain. A general rule is that a larger Q leads to a faster response to workload variations, while a larger R makes the system less sensitive to system noise.
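For illustration, the reference trajectory of Eq. (3) can be computed directly; the parameter values below are ours, chosen only to show the exponential decay toward the set point:

```python
import math

def reference_trajectory(t_now, t_set, T, T_ref, P):
    """ref(k+i|k) = Ts - exp(-(T/Tref)*i) * (Ts - t(k)), for i = 1..P."""
    return [t_set - math.exp(-(T / T_ref) * i) * (t_set - t_now)
            for i in range(1, P + 1)]

# Current 90-percentile response time 4 s, set point Ts = 2 s,
# control period T = 4 s, time constant Tref = 8 s, horizon P = 5:
ref = reference_trajectory(4.0, 2.0, T=4.0, T_ref=8.0, P=5)
# ref decays monotonically from t(k) = 4 toward Ts = 2.
```

A smaller T_ref makes the trajectory, and hence the demanded closed-loop response, decay faster, which is exactly the speed/overshoot trade-off discussed above.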
More detailed discussions of these tuning parameters can be found in [10].

A fundamental benefit of the control-theoretic approach is the confidence it provides in system stability. We say that a system is stable if the response time t(k) converges to the desired set point Ts, that is, lim_{k→∞} t(k) = Ts. However, an MPC controller without additional constraints can be unstable. In optimal control theory [10,11], the stability of an MPC controller can be ensured by adding a terminal constraint, which forces the state to take a particular value at the end of the prediction horizon. Therefore, we add a terminal constraint to our optimization problem, requiring the response time of the application to converge to the set point at the end of the prediction horizon:

t(k + M|k) = Ts    (4)

Based on the above analysis, response time control has been modeled as a MIMO optimal control problem: the controller must minimize the cost function (2) under the terminal constraint (4). This constrained optimization problem can be transformed into a standard least-squares problem [10], which the controller solves online with a standard least-squares solver. In our prototype system, we implement the controller based on the solver presented in [10]. The overhead of the solver is polynomial in the number of tiers hosting the application and in the control and prediction horizons. Because an application typically has three or fewer tiers, the computational overhead of the controller is negligible. For example, in our prototype system, the controller takes 176 s to make 1 million control decisions on an Intel Xeon 5365 processor; in other words, each control decision takes only about 0.17 ms.

3.3. CPU resource arbitrator

As introduced in Section 2, the response time controller determines the CPU resource demand of every VM.
A server-level CPU resource arbitrator then allocates the CPU resource to the VMs hosted on the server based on their demands. Specifically, the arbitrator on each server collects the CPU resource demand of every VM hosted on the server, in terms of CPU cycles per second (GHz), decides what CPU frequency the server should run at in order to satisfy the aggregated demands, and then throttles the processor of the server to the desired CPU frequency.

For example, suppose a server S1 (with a maximum CPU frequency of 3 GHz) hosts two VMs, VM11 and VM32. VM11 requires a CPU frequency of 1 GHz and VM32 requires a CPU frequency of 0.5 GHz. Another 0.5 GHz is reserved for the VM hypervisor. In this case, the arbitrator on S1 decides that the CPU frequency of S1 should be 1 GHz + 0.5 GHz + 0.5 GHz = 2 GHz. Thus, the CPU frequency of Si is dynamically set to:

fi(k) = min{ Σ_{VMjl∈Si} cjl(k) + c0, Fimax }    (5)

where c0 is the constant portion of the CPU resource reserved for the virtual machine hypervisor. If the arbitrator cannot satisfy the demands of all the hosted VMs even when the processor is running at the highest CPU frequency, the server is considered overloaded. To deal with this case, the solution in this paper can be integrated with algorithms that move VMs from the overloaded servers to other servers in an on-demand manner; an example of such algorithms can be found in our previous work [8].

Please note that we mainly focus on CPU-intensive workloads (e.g., dynamic web content creation) in this paper. However, our design can be extended to IO-intensive workloads by adding IO requirements as constraints in the arbitrator, which is our future work. Please also note that if a VM is migrated to a server with different hardware, the same CPU frequency may yield very different performance. In this case, we assume that the CPU frequency value
is already scaled based on an offline performance benchmark such as the Reference Performance Estimate 2 (RPE2) value [12].

Today's processors support only several discrete frequency levels. However, the new CPU frequency level periodically decided by the CPU resource arbitrator could be a value that is not one of the supported levels. Our arbitrator therefore includes a CPU frequency modulator, which resolves the output value of the controller into a series of supported frequency levels that approximate the desired value. For example, to approximate 2.89 GHz during a control period, the modulator would output a sequence of supported levels, 2.67, 3, 3, 2.67, 3, 3, etc., on a smaller timescale. We implement the first-order delta-sigma modulator presented in [13] to perform this function.

4. Server consolidation for power optimization

As discussed in Section 2, a cluster-level power optimizer is used to find the most power-efficient VM-server mapping and to reconfigure the cluster by VM migration. The optimizer then dynamically places selected servers into the sleep mode, or wakes up selected servers, for maximum power savings and guaranteed application performance.

In this section, we first formulate the power optimization problem. We then build a power model to capture the impact caused by the migration of a VM to a server. Finally, we propose Incremental Power Aware Consolidation (IPAC), a heuristics-based optimization algorithm, to find the most power-efficient VM-server mapping.

4.1. Problem formulation

We first introduce the following notation. The processor of server Sl has a maximum CPU frequency of F(l)max and a minimum frequency of F(l)min, i.e., fl ∈ [F(l)min, F(l)max]. VMij ∈ Sl means VMij is hosted by server Sl. Sl consumes pl Watts of power. When Sl is placed into the sleep mode, we assume pl = p0, which is a small constant value.

The power optimizer is invoked every To hours. As discussed in Section 2, the invocation period of the optimizer is much longer than the control period of the response time controller, i.e., To ≫ T. Therefore, during one invocation period of the optimizer, the response time controller makes a series of decisions regarding the CPU resource demands of every virtual machine VMij. We place the series of CPU resource demands for VMij into a vector c̄ij. We adopt the workload analysis method in [14] to estimate the CPU resource requirements in the next period based on the requirements in the previous control periods. The optimizer minimizes the total power consumption of the cluster:

min Σ_{l=1..N} pl(k)    (6)

The optimization problem is subject to the following constraints. First, every VM should be hosted by one server. Second, the CPU resource of each server should be enough to satisfy the CPU requirements of all the VMs hosted on it during the entire invocation period. Third, the memory resource of each server should be enough to satisfy the memory requirements of all the VMs hosted on it. The constraints are modeled as:

∀VMij, ∃ 1 ≤ l ≤ N, s.t. VMij ∈ Sl    (7)

max{ Σ_{VMij∈Sl} c̄ij } ≤ F(l)max    (1 ≤ l ≤ N)    (8)

Σ_{VMij∈Sl} MEM(VMij) ≤ MEM(Sl)    (1 ≤ l ≤ N)    (9)

We include constraints (7)-(9) in the experiments in Section 6. Other constraints defined by the system administrators can be added to the optimization algorithm. Several possible constraints are:

1. The available IO or network bandwidth and other computing resources.
2. The power distribution infrastructure and power capacity.
3. Thermal densities and heat dissipation.

Please note that here we adopt the simple CPU resource constraint (8) to focus on the power optimization algorithm itself.
More complex formulations of the CPU resource constraint, e.g., the 90-percentile CPU resource constraint [14], can be easily integrated into our power optimization algorithm.

4.2. Power model

To have an effective power optimizer design, it is important to model the power consumption of the servers. It is well known that DVFS allows cubic reductions in power density relative to performance loss for the processor in a server. Recent studies (e.g., [15]) present a model relating the power consumption of a server to the frequency of its processor:

pi(k) = ai fi³(k) + bi    (10)

where ai and bi are generalized parameters that may vary for different servers. Please note that ai fi³(k) represents the power consumption contributed by the processor of Server i, while bi represents the power consumed by all other server components, such as memory, hard disk, and fan. We approximate the power consumption of those components as the constant bi because we do not adapt their power states in this paper. Such a power model has also been used in related work (e.g., [16,17]). For servers with other components that also have tunable power states, our power model can be easily extended to include those knobs. For example, active/sleep states for the main memory have been explored in [18], and the DVFS technique has also been applied to memory in [19]. For a server with such memory, our model can be extended as pi(k) = ai fi³(k) + bi mi(k) + ci, where bi mi(k) represents the power consumption of the main memory and ci is the power of the other components (other than the processor and memory) in Server i. We plan to explore this in our future work.

As described in Section 3.3, the CPU resource arbitrator enforces (5) by dynamically changing the frequency of the physical processor. As a result, the power consumption of a server is modeled as:

pi(k) = ai ( min{ Σ_{VMjl∈Si} cjl(k) + c0, Fimax } )³ + bi    (11)

The parameters ai and bi are then estimated by an offline analysis using curve fitting [17].
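Once f is cubed, the offline estimation of ai and bi is a linear least-squares fit. The sketch below uses synthetic measurements from a known model (all values are ours); a real fit would use wall-power readings at each supported frequency:

```python
import numpy as np

def fit_power_model(freqs_ghz, powers_w):
    """Fit p = a*f^3 + b (Eq. (10)): linear least squares on f^3."""
    A = np.column_stack([np.asarray(freqs_ghz) ** 3,
                         np.ones(len(freqs_ghz))])
    (a, b), *_ = np.linalg.lstsq(A, np.asarray(powers_w), rcond=None)
    return a, b

def server_power(demands_ghz, f_max, a, b, c0=0.5):
    """Eq. (11): power at the frequency the arbitrator would set."""
    freq = min(sum(demands_ghz) + c0, f_max)
    return a * freq ** 3 + b

# Synthetic measurements from a known model p = 2*f^3 + 150:
f = np.array([2.0, 2.33, 2.67, 3.0])
a, b = fit_power_model(f, 2.0 * f ** 3 + 150.0)
```

On this synthetic data the fit recovers a = 2 and b = 150, and server_power then evaluates Eq. (11) for any set of hosted demands.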
Using the power models (10) and (11), it is easy to ascertain (1) the power efficiency of a server, i.e., the ratio between its maximum CPU frequency and its maximum power consumption, and (2) how the total power consumption changes if a VM is moved from one server to another. This information is used to design the optimization algorithm.
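Both quantities can be read off the fitted model. The helpers below are an illustrative sketch (names and values are ours); note that this migration delta ignores the additional savings from putting an emptied server to sleep:

```python
def power_efficiency(f_max, a, b):
    """Max CPU frequency per Watt at full speed, from Eq. (10)."""
    return f_max / (a * f_max ** 3 + b)

def migration_power_delta(src_load, dst_load, vm_demand, src, dst, c0=0.5):
    """Change in total power (Eq. (11)) if a VM with demand `vm_demand`
    (GHz) moves between servers. Each server is a tuple (f_max, a, b);
    loads are the other VMs' aggregate demands in GHz.
    A negative result means the migration saves power."""
    def p(load, server):
        f_max, a, b = server
        return a * min(load + c0, f_max) ** 3 + b
    before = p(src_load + vm_demand, src) + p(dst_load, dst)
    after = p(src_load, src) + p(dst_load + vm_demand, dst)
    return after - before
```

For example, with an inefficient source server (f_max = 3, a = 3, b = 200) and a more efficient destination (f_max = 3, a = 2, b = 150), moving a 1 GHz VM between two lightly loaded servers yields a negative delta, i.e., a power saving.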
  • ing: In 4 t N t p p u t c b M a A i i t p a a t h 2 r c n s p i m t A l A b t t s t o A l s V a c i i r d o f c b t e V a [ t p Y. Wang, X. Wang / Sustainable Comput .3. Optimization algorithm The optimization problem formulated in Section 4.1 falls in he category of vector-packing problems which are known to be P-hard [20]. Therefore, we propose a heuristics-based optimiza- ion algorithm to find a polynomial time approximate solution. Minimum slack problem for a single server: We begin with a sub- roblem named the minimum slack problem. The problem can be resented as: given a server (not necessarily empty) and a list of nallocated VMs, select several VMs from the list, and allocate them o the server, such that the server has the least amount of unallo- ated CPU resource. This problem is a special case of the minimum in slack (MBS) problem. Although it is an NP-hard problem, the BS problem can be solved in pseudo-polynomial time [21]. The lgorithm to solve the minimum slack problem is summarized in lgorithm 1. This algorithm is extended from the MBS algorithm n [21] by evaluating a more general constraint (8) in each step, nstead of checking if the total size of the items exceeds the size of he bin. Power aware consolidation for a list of servers: Another sub- roblem, the power aware consolidation problem, can be presented s: given a list of servers (some servers are possibly not empty) nd a list of VMs, consolidate the VMs to the servers, such that the otal power consumption of the servers is minimized. Our proposed euristic algorithm to solve this problem is described in Algorithm . In the first step, the servers are sorted by power efficiency, i.e., the atio between the maximum CPU frequency and maximum power onsumption of the server, which can be derived from (10). 
Beginning with the most power-efficient server, we use Algorithm 1 to select several VMs from the remaining unallocated VMs, and then pack these VMs onto this server such that the unused CPU resource on this server is minimized. We repeat this process with the next most power-efficient server until every VM in the list is allocated to a server.

Incremental Power Aware Consolidation (IPAC) algorithm: Using Algorithm 2 to directly consolidate all the VMs may incur a very large number of VM migrations with a large overhead. Therefore, Algorithm 2 is invoked incrementally such that only a small number of VMs in a migration list are considered for consolidation each time. In each invocation period, some servers may be unable to host their VMs due to possible workload increases. The algorithm first selects some VMs from these overloaded servers and adds them to the migration list to resolve the overload problem. Then, the VMs on the least power-efficient server are added to the migration list. Algorithm 2 is invoked to consolidate the VMs in the migration list onto the servers. After the consolidation, if the number of active servers is reduced, Algorithm 2 is invoked again to consolidate the VMs on the next least power-efficient server, until the number of active servers no longer decreases.

Cost-aware VM migration: The cost of a VM migration can be considerable. For example, if the network bandwidth is a bottleneck in a data center, a VM migration with high bandwidth consumption is the least preferred method. As a result, when the IPAC algorithm requests a migration, benefits and costs should be compared to decide whether the migration should be allowed or rejected. The benefit of the migration is the power saving, which can easily be inferred from the model in (11). The cost of the migration depends highly on the conditions of the data center, such as the network architecture, the bandwidth usage, the application itself, and the memory usage of the VMs.
Thus, the cost function can be highly different for different data centers. Also, during VM migration, the performance of the VMs on the source server and the destination server will be negatively affected. An example of this impact is shown in our previous work [8]. As a result, we provide an interface for data center administrators to define their own cost functions based on their various policies.

Algorithm 1. Minimum slack.
q: a list of unallocated VMs. NVM: the total number of unallocated VMs. S: the server in consideration. ε: allowed slack, initialized to 0. s*: minimum slack. A*: the collection of VMs that best fits S.

Minimum-Slack(q) begin
1: Sort the VMs in q by CPU requirement.
2: for all VM VMi in q do
3: Pack VMi into S.
4: if S and the VMs allocated to it meet constraint (8) then
5: if the remaining CPU resource in S
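The search of Algorithm 1 can be illustrated with the following branch-and-bound sketch. Since the listing above is truncated in this excerpt and constraint (8) is not reproduced, a plain CPU-capacity check stands in for (8), and the allowed-slack mechanism (ε, Δε) is omitted for brevity; names are ours.

```python
# Hedged sketch of the minimum-slack search in the spirit of Algorithm 1
# and the MBS algorithm [21]. A CPU-capacity check stands in for the
# paper's more general constraint (8).

def minimum_slack(vm_demands, capacity):
    """Choose VMs (by CPU demand) that leave the least unallocated CPU
    resource on a server of the given capacity.
    Returns (best_slack, best_subset)."""
    vms = sorted(vm_demands, reverse=True)   # Line 1: sort by CPU requirement
    best = {"slack": capacity, "subset": []}

    def search(i, used, chosen):
        slack = capacity - used
        if slack < best["slack"]:            # smaller slack found
            best["slack"] = slack
            best["subset"] = list(chosen)
        for j in range(i, len(vms)):
            if used + vms[j] <= capacity:    # stand-in for constraint (8)
                chosen.append(vms[j])
                search(j + 1, used + vms[j], chosen)
                chosen.pop()
                if best["slack"] == 0:       # zero slack cannot be improved
                    return

    search(0, 0, [])
    return best["slack"], best["subset"]

# Example with demands in percent of one server's CPU capacity:
slack, subset = minimum_slack([50, 40, 30, 20], capacity=100)
# 50 + 30 + 20 fills the server exactly, so the slack is 0.
```

Exploring combinations beyond the greedy first-fit choice is what lets this search beat FFD in solution quality, at the price of the exponential worst case analyzed next.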
The complexity of this case can be analyzed in a similar way as in [21], which gives the result of O(n^u), where u is the maximum number of VMs that can be hosted in a server and n is the number of VMs in the migration list. This complexity is acceptable only when u is small, which can be true when the number of VMs that can be hosted in a server is limited by the available resources of the server.

Another extreme case is to set Δε = 100%. In this case, because ε is initialized to 0 at the beginning, the algorithm ends after the recursive function has been called NVM times. From Lines 5–8 of the algorithm, the condition for the function being called NVM times is that NVM combinations have been tested to meet the constraints in Line 4. These combinations cover the result of the first fit decreasing (FFD) algorithm and thus give better solution quality than FFD. This case gives the best computational time, with complexity O(NVM), but provides the worst solution quality. In other words, the worst-case solution quality of Algorithm 1 is still better than that of the FFD algorithm.

Now we analyze the worst-case computation time for a choice of 0 ≤ Δε ≤ 1. Because the remaining CPU resource of the server has to be less than or equal to 100% of the total CPU resource, the algorithm will exit at Line 6 when the value of ε exceeds 100%. Therefore, in the worst case, the computation time of this algorithm is bounded by the time for ε to grow to 100%, so the worst-case execution time Tworst is bounded by:

Tworst ≤ T0 / Δε    (12)

where T0 is the time taken when the recursive function in Algorithm 1 is entered NVM times. The value of T0 can be measured online.
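The bound (12) directly yields a rule for choosing the design parameter: given an execution time budget, the smallest admissible Δε is T0 divided by the budget. A minimal sketch (function and variable names are ours):

```python
def min_delta_eps(t0_seconds, t_budget_seconds):
    """Smallest slack increment (Delta-epsilon) that keeps the
    worst-case execution time T_worst <= T0 / Delta-epsilon within
    the budget, following the bound (12)."""
    return t0_seconds / t_budget_seconds

# Example: with T0 = 0.2 s measured online and a 10 s budget, any
# Delta-epsilon >= 0.2 / 10 = 0.02 keeps T_worst within the budget.
delta_eps = min_delta_eps(0.2, 10.0)
```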
Therefore, given an expected execution time, a minimum value of Δε can be selected based on (12) to achieve better solution quality within the expected execution time.

The latency caused by switching servers between sleep and active modes adds additional overhead to our algorithm. However, this latency can be eliminated by always keeping one or two idle servers active. In that way, the optimizer can immediately use an idle server when it needs to wake up a server. After the idle server is taken by the optimizer, another server is woken up and prepared for the optimizer. As a result, the optimizer never actually needs to wait for a server to wake up.

5. System implementation

In this section, we first introduce our testbed and the implementation details of each component. We then present the simulation environment used to test our IPAC algorithm in various cluster configurations.

5.1. Testbed

Our testbed includes a cluster of four physical computers named S1 to S4. A fifth computer named Storage is used as the storage server for the Network File System (NFS) and is not part of the cluster. All the computers run Fedora Core 8 with Linux kernel 2.6.21. S1 and S3 are each equipped with 4 GB of RAM and an AMD Opteron 2222 SE processor, which supports 8 frequency levels from 1 GHz to 3 GHz. S2 and S4 are each equipped with 4 GB of RAM and an Intel Xeon X5160 processor, which has 4 frequency levels from 2 GHz to 3 GHz. All the servers are connected via a gigabit Ethernet switch. Every server is connected to a power meter to measure and log its power consumption.

Xen 3.3 is used as the virtual machine monitor on all four servers in the cluster. We use a PHP implementation of the RUBBoS benchmark [22], a standard bulletin board benchmark, as our server-side workload. Each instance of RUBBoS is configured as a two-tier application running in two VMs.

Fig. 2. The application and VM configuration on the 4-server hardware testbed.

The first tier has an Apache server
installed and works as a web server running the application scripts. The second tier has a MySQL database installed and acts as a database server. We run 8 applications in our cluster, each an instance of RUBBoS. Therefore, we use 16 VMs on the four servers. The configuration is shown in Fig. 2. In our server consolidation experiment, to demonstrate that the power optimizer consolidates the servers when the cluster has light workloads, we enable only 6 VMs to run 3 applications (App3–App5).

The client-side workload generator is the Apache HTTP server benchmarking tool (ab). This tool allows users to manually define the concurrency level, i.e., the number of requests to perform in a short time, to emulate multiple clients. A concurrency level of 40 is used for system identification and for most experiments, unless otherwise indicated. The workload generator runs on the Storage computer.

5.2. Control components

We now introduce the implementation details of each component in our response time control loop.

Response time monitor: To eliminate the impact of network delays, we focus on controlling the server-side response time in this paper. The response time monitor is implemented as an Apache module in each VM running Apache. For every request to the application, two time stamps are taken: when the request is received and when the corresponding response is made. The difference between them is used as the server-side response time of the request. At the end of each control period, we calculate the 90-percentile response time of all requests finished during the control period. This value is used as the server-side response time and is sent to the corresponding response time controller.

Response time controller: As introduced in Section 2, a response time controller exists for each application. The controller implements the control algorithm presented in Section 3.
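The per-period computation of the response time monitor described above can be sketched as follows; the rank-based percentile definition is our assumption, since the paper does not specify how the 90-percentile is computed.

```python
import math

# Hedged sketch of the response time monitor's per-period computation:
# server-side response times from request/response timestamps, then the
# 90-percentile over all requests finished in the control period.

def server_side_times(requests):
    """requests: list of (t_received, t_responded) timestamps in seconds."""
    return [t_out - t_in for t_in, t_out in requests]

def percentile_90(samples):
    """90-percentile by rank: the ceil(0.9 * n)-th smallest sample
    (a common convention, assumed here)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(0.9 * len(ordered)))
    return ordered[rank - 1]

# Example: ten requests with response times 0.1 s, 0.2 s, ..., 1.0 s;
# the 90-percentile is the 9th smallest value, 0.9 s.
times = server_side_times([(0.0, 0.1 * (i + 1)) for i in range(10)])
p90 = percentile_90(times)
```

Using the 90-percentile rather than the mean makes the controller's set point a tail-latency guarantee, which matches the user-perceived metric the paper targets.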
A cluster-level daemon written in C++ sequentially runs the controller of every application on Storage and computes the total CPU resource requested by all the VMs. The daemon calls the CPU resource arbitrator to enforce the desired CPU allocation, based on the CPU resource requests, and to enforce the CPU frequency. The control period of the response time controller is set to 6 s such that multiple HTTP requests can be processed in one period. The set point of the response time controller (i.e., the desired response time) is set to 1 s. This number can be chosen by the system administrator based on the nature of the application and can be changed online. The system parameters in (1) resulting from the system identification are α11 = 0.2663, β11 = [−0.9116, −1.527], and β12 = [0.2717, −0.6833]. In the controller design, we set Q = 1000 and R = diag{300, 100} in (2) to provide preferential treatment to the Apache VM in each application, since it typically requires more CPU resource. We set Tref = 2T in (3) as a trade-off, because a smaller Tref causes the system to converge faster to the set point but may lead to a larger overshoot. We set M = 6 and P = 8 in (3).

CPU resource arbitrator: On every server, a daemon program written in Perl runs a CPU resource arbitrator as introduced in
Section 3.3. In every control period, the arbitrator collects the CPU resource requirements, allocates the CPU resource, and changes the CPU frequency using the approach introduced in Section 3.3. For the CPU resource allocation, we use Credit Scheduler, Xen's proportional-share scheduler. In Credit Scheduler, each VM is assigned a cap and a weight. The cap represents the upper limit of CPU time (as a percentage) that can be allocated to the VM. The weight is used to express preference among different VMs. In this paper, we use the cap to allocate CPU resource and use the same fixed weight for all VMs.

Power monitor: The power consumption of each server in the cluster is measured with a WattsUp Pro power meter [23] by plugging the server into the power meter, which is then connected to a standard 120-V AC wall outlet. The WattsUp power meter has an accuracy of 1.5% of the measured value.

5.3. Simulator

To evaluate our power optimizer in large-scale data centers, we have developed a C++ simulator that uses a trace file from real-world data centers [24] to simulate the CPU utilization variations. The trace file includes the utilization data of 5415 servers from ten large companies covering the manufacturing, telecommunications, financial, and retail sectors. The trace file records the average CPU utilization of each server every 15 minutes from 00:00 on July 14th (Monday) to 23:45 on July 20th (Sunday) in 2008. These servers run a variety of typical data center workloads such as email, file sharing, database, and backup. Fig. 3 shows the distribution of the workload types in the servers.

Fig. 3. Workloads recorded in part of the trace file.

Note that the mixed workload in this trace file represents an example of how realistic workloads are mixed in a real data center. We treat the utilization data of each server as the CPU demand of a VM.
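The mapping from the utilization trace to simulated VM demands can be sketched as follows. The concrete file layout is an assumption, since the format of the trace from [24] is not specified in this excerpt: one CSV row per server, each column a 15-minute average utilization percentage.

```python
import csv

# Hedged sketch: assumes a hypothetical CSV layout with one row per
# server and one column per 15-minute utilization sample.

def load_vm_demands(trace_path):
    """Read the utilization trace and treat each server's samples as
    the CPU demand trace of one simulated VM."""
    demands = []
    with open(trace_path, newline="") as f:
        for row in csv.reader(f):
            demands.append([float(x) for x in row])  # one VM per server trace
    return demands

# 7 days at 15-minute granularity gives 7 * 24 * 4 = 672 samples per VM.
SAMPLES_PER_WEEK = 7 * 24 * 4
```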
We randomly generate 3000 simulated servers to host these VMs based on the parameters of the server models listed in Table 1. Each server features a power model based on Eq. (10), with the parameters ai and bi measured from our hardware results using real servers and standard web application benchmarks, as reported in [25].

Table 1
Server models used in simulations.

Type                        Active power         Processor
                            Max      Min         Frequency range       Chips/cores
Transport GT24 (B2912-E)    264 W    121 W       [0.8 GHz, 2.7 GHz]    2/8
Supermicro 2021M-UR+        269 W    138 W       [0.8 GHz, 2.5 GHz]    2/8
HP ProLiant DL385 G5p       257 W    147 W       [0.8 GHz, 2.7 GHz]    2/8
Dell PowerEdge 2970         302 W    139 W       [1.15 GHz, 2.3 GHz]   2/8
HP ProLiant DL385 G6        258 W    120 W       [0.8 GHz, 2.6 GHz]    2/12

6. Experimentation

In this section, we present our empirical and simulation results. We first evaluate the response time controller and examine the power optimizer on the hardware testbed. We then present the simulation results of our power optimization algorithm in a data center with 3000 servers and about 5415 VMs.

6.1. Baselines

Because our solution is an integration of DVFS-based technology and server consolidation technology, we evaluate it by comparing with combinations of DVFS-based technologies and a state-of-the-art server consolidation algorithm. The evaluation is performed both on a hardware testbed and on a simulator with traces from real data centers. Here we introduce the baselines we use.

pMapper is a heuristic-based server consolidation algorithm that we use as a baseline [7]. pMapper is an incremental algorithm with two phases. In the first phase, it sorts the servers based on their power efficiency, then consolidates the VMs onto the servers using a first-fit algorithm, beginning with the most power-efficient server. Note that in this phase, the VMs are not actually migrated.
In the second phase, pMapper computes the list of servers that require a higher utilization in the new allocation and labels them as receivers. For each donor (a server with a target utilization lower than its current utilization), it selects the smallest-sized applications and adds them to a VM migration list. It then runs first-fit decreasing (FFD) to migrate the VMs in the migration list to the receivers.

Several key differences exist between pMapper and our IPAC algorithm. First, pMapper is adapted from FFD, while IPAC tests many more combinations and tries to find a better solution. Therefore, IPAC typically provides a better consolidation solution than pMapper in terms of power consumption, especially when facing constraints such as memory and bandwidth constraints. Second, the better consolidation solution of IPAC comes at the cost of a higher computational overhead. However, the computation time can be estimated such that the computation finishes within a desired time frame. Third, IPAC typically considers a smaller number of VMs for migration in each iteration when the number of servers is large. This is due to the different ways IPAC and pMapper construct the migration list: while pMapper selects VMs from servers with a target utilization lower than the current utilization, IPAC selects VMs from two sources: (a) over-utilized servers, and (b) some least power-efficient servers. Part (a) is only a subset of pMapper's migration list, and part (b) is usually not significant when the number of servers is large.

Another baseline we use to evaluate our response time controller is on-demand [26]. We compare our MPC response time controller and CPU resource arbitrator with on-demand, and then compare our IPAC consolidation algorithm with pMapper.
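For illustration, the first phase of pMapper described above can be sketched as a first-fit placement over servers sorted by power efficiency. This is a simplified reading of [7]; the data structures and names are ours.

```python
# Hedged sketch of pMapper's first phase: servers sorted by power
# efficiency, VMs placed first-fit; no VM is actually migrated here.

def pmapper_first_phase(servers, vms):
    """servers: list of (name, power_efficiency, cpu_capacity);
    vms: list of (name, cpu_demand).
    Returns a tentative placement {vm_name: server_name}."""
    order = sorted(servers, key=lambda s: s[1], reverse=True)  # most efficient first
    free = {name: cap for name, _eff, cap in order}
    placement = {}
    for vm, demand in vms:
        for name, _eff, _cap in order:       # first fit over sorted servers
            if free[name] >= demand:
                free[name] -= demand
                placement[vm] = name
                break
    return placement

# Example: two servers with capacities in percent of one CPU.
servers = [("A", 0.011, 100), ("B", 0.009, 100)]
vms = [("vm1", 60), ("vm2", 30), ("vm3", 50)]
plan = pmapper_first_phase(servers, vms)
# vm1 and vm2 fit on the more efficient server A; vm3 spills to B.
```

The second phase, which diffs this tentative placement against the current one to build the migration list, is where the behavior diverges most from IPAC.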
Similar to a recent study [27], on-demand adopts control theory to achieve the desired application response time by adjusting the CPU resource (e.g., CPU time) allocated to the VM in each tier. Since the scheme presented in [27] does not manage power, in order to save power using DVFS, on-demand is integrated with the widely used ondemand governor built into recent releases of Xen to dynamically tune DVFS based on the measured CPU utilization of a server. If the utilization is higher than an administrator-determined value UP_THRESHOLD, the processor DVFS is set to the highest level. If the utilization is lower than another administrator-determined value DOWN_THRESHOLD, the DVFS level of the CPU is decreased one level at a time. Note that on-demand conservatively tunes DVFS between discrete levels, such that the server utilization stays below the threshold with a gap that can sometimes be large. In contrast, our solution adopts the delta-sigma modulator discussed
in Section 3.3 to approximate any fractional DVFS level determined by the CPU resource arbitrator. As a result, our solution tracks the CPU resource requirements more accurately, which results in less power consumption.

We also integrate pMapper with the DVFS-based technology we propose in this paper. By comparing with this configuration, we demonstrate that, even when integrated with the same DVFS-based technology, IPAC still provides more energy savings by producing better consolidation solutions.

6.2. Response time control

In this experiment, we disable the power optimizer to evaluate the response time controllers of the 8 applications running in the cluster as shown in Fig. 2.

Fig. 4. The average of the 90-percentile response times of all the eight applications in the cluster.

Fig. 5. Typical runs of the response time controller and the baseline pMapper under a workload increase from time 600 s to 1200 s on App5: (a) response time of App5 (response time controller); (b) power of the cluster (response time controller); (c) response time of App5 (pMapper); (d) power consumption of the cluster (pMapper).
We first set the response time target for all applications to 1000 ms. Fig. 4 plots the means and the standard deviations of the response times of the applications in the cluster. This figure demonstrates that the response time controller works effectively to achieve the desired response time for all the applications. Fig. 5(a) and (b) show a typical run of the response time controller for a randomly selected application, App5. At the beginning of the run, the controller achieves the desired response time set point, i.e., 1000 ms, after a short settling time. The workload of App5 increases significantly at time 600 s. This is common in many web applications; e.g., breaking news on a major newspaper website may incur a large number of accesses in a short time frame. To stress test the performance of our controller in such a scenario, we increase the concurrency level of App5 from 40 to 80 between time 600 s and time 1200 s to emulate the workload increase. The suddenly increased workload causes App5 to violate its response time limit at time 600 s. The response time controller responds to the violation by allocating more CPU resource to the two VMs in both tiers. As a result, the response time of App5 converges to 1000 ms again, and the power consumption of the cluster increases slightly due to the increased CPU resource usage, as shown in Fig. 5(b).

We use the baseline pMapper for comparison in this experiment. Since this baseline does not manage CPU resource allocations, we assume that CPU resources are equally allocated to all the VMs running on every server. This baseline represents a typical solution that relies only on server consolidation to save power, without applying DVFS and resource allocation to guarantee performance and save power on short time scales.

Fig. 5(c) and (d) show the performance of the baseline in the same scenario. Clearly, during the workload increase, the relative response time of App5 increases significantly.
The inferior performance of the baseline is caused by its policy of allocating CPU resource in a static way. Consequently, the baseline cannot adjust CPU resource among different VMs in response to workload variations. This experiment demonstrates that the response time controller can effectively maintain application-level response times. In addition, a server consolidation algorithm alone may not be able to provide application-level performance assurance without a performance controller. Therefore, it is important to integrate dynamic CPU resource allocation, DVFS, and server consolidation for maximized power savings with performance assurance.

Fig. 6. Average response times of three randomly selected applications under different workloads.

To test the robustness of the response time controller when it is applied to a system different from the one used for system identification, we conduct a set of experiments with wide ranges of concurrency levels. Fig. 6 shows the average response times of three randomly selected applications (with standard deviations) under the proposed performance controller when the concurrency level varies from 30 to 80. The results of the other applications are
similar and not shown due to space limitations. Fig. 7 shows the average response times of the same three applications (with standard deviations) under the performance controller when the response time set point increases from 600 ms to 1300 ms. The controller achieves the desired response times for all the concurrency levels and set points, and for all the applications. These experiments demonstrate that the response time controller works effectively to achieve the desired response time when the actual system is under a workload different from the one used to design the controller.

Fig. 7. Average response times of three randomly selected applications under different set points.

Fig. 8. Response time of an application while the VM running its database service migrates to another server.

Fig. 9. Average response time of our solution and the on-demand baseline.

Figs. 9 and 10 compare the average response time and power consumption of our algorithm and the on-demand baseline. We activate Applications 4, 5, and 7 in this experiment. Although both solutions achieve an average response time that is close to the set point, our solution consumes 529.7 W of power while on-demand consumes 604.9 W. As discussed earlier in this section, the better energy efficiency of our solution is due to the fact that our solution can more accurately track the CPU resource requirements of the servers.

Fig. 10. Average power consumption of our solution and the on-demand baseline.

Fig. 11. Power consumption of the cluster. The power optimizer is activated at time 600 s.

6.3.
Server consolidation for power optimization

In this subsection, we evaluate our power optimizer, designed for dynamic server consolidation, on our testbed.

Fig. 11 shows the power consumption of the cluster measured using a power meter before and after a server consolidation. Initially, all servers are active and the power consumption is approximately 544 W. At time 600 s, the power optimizer is activated and consolidates all 6 VMs onto the most power-efficient server. After 102 s, the three idle servers are put to sleep, reducing the power consumption of the cluster to approximately 165 W. This leads to a power saving of 69.6%.

In a typical server consolidation process, virtual machine live migration is used to shift a VM from one server to another. It is important to understand how migration affects application-level performance. Fig. 8 shows the response time of an application while the VM running its database service migrates from one server to another. Initially, the response time has been successfully controlled by the response time controller. When the optimizer starts the VM migration, the response time temporarily increases, since the migration operation consumes some CPU and bandwidth resources. After two control periods, the application-level response time again achieves its desired set point. This experiment demonstrates that our management solution effectively reduces the power consumption of the cluster, while the application-level response time is affected only for a very short time during the migration.

We conduct a series of experiments to test the efficacy of our solution under different numbers of active applications. We measure the power savings of our solution with the power meter and compare against a baseline in which all servers are kept active and all VMs are deployed among the servers in a balanced way. Figs. 12 and 13 show the power savings and the average performance of the applications in these experiments, respectively.
The power saving changes from 294.0 W to 502.6 W as the number of active applications changes from 8 to 1. The power saving increases significantly as the number of applications changes from 7 to 6 and from 4 to 3. This is because the number of active servers decreases in these experiments.

Fig. 12. Average power savings of the cluster with different numbers of applications.

The response time controller is able
to meet the set point of 1000 ms for all the applications in all these experiments. These experiments show that our solution provides more power savings when the utilization of the cluster decreases, by putting more servers into the sleep mode.

Fig. 13. Response time of the applications in the cluster with different numbers of applications.

Fig. 15. Percentage of energy savings (over pMapper) resulting from DVFS and from the better consolidation of IPAC.

Fig. 16. Number of servers kept active in 7 days under different numbers of VMs.

6.4. Simulation results for large data centers

In this experiment, we test our power optimizer in large-scale data centers using the simulation environment introduced in Section 5.3. We simulate 55 data centers with different numbers of VMs, ranging from 30 and 130 to 5330 and 5415 with a step size of 100. Every data center is assumed to have enough inactive servers, which will be woken up and used if necessary. The parameters of the servers are introduced in Section 5.3. As an example of administrator-defined real-world constraints, in addition to the constraints in (8), we add a restriction to the optimization algorithm such that the memory size of every server should be greater than the total memory allocation of the hosted VMs. Fig.
14 plots the average energy consumption per VM of IPAC and pMapper in 7 days for data centers with different numbers of VMs. In comparison to pMapper, IPAC shows lower energy consumption in all these simulations. On average, IPAC achieves 60.5% more energy savings than pMapper. It is important to note that the energy savings of IPAC are due to two reasons. First, as discussed earlier, pMapper is adapted from FFD, while IPAC has better consolidation quality than FFD. Therefore, IPAC consolidates the VMs onto fewer servers than pMapper does. Second, IPAC is integrated with DVFS for power savings on a short time scale between two consecutive invocations of the optimization algorithm. Thus, IPAC saves more power when a lower CPU frequency is allowed due to short-term variations in the CPU requirements. Note that the testbed experiments in Section 6.2 show a smaller power saving because only DVFS is tested and the CPU requirements of the workloads are relatively stable. To demonstrate the contributions of these two factors, we plot the fraction of energy savings over pMapper resulting from DVFS in Fig. 15. On average, DVFS accounts for 91.7% of the total energy savings over pMapper, and the remaining 8.3% results from the better consolidation quality that IPAC provides. Fig. 16 plots the number of servers kept active under different numbers of VMs deployed in the system. We can see that the difference between IPAC and pMapper becomes greater when the number of VMs increases.

Fig. 14. Energy consumption per VM in 7 days under different numbers of VMs.

Fig. 17. Number of migrations of IPAC and pMapper in 7 days under different numbers of VMs.
This demonstrates that IPAC can lead to more energy savings in large-scale data centers. Fig. 17 shows that IPAC conducts a smaller number of migrations than pMapper, especially for data centers with a large number of VMs. On average, IPAC moves each VM 1.31 times in a week, while pMapper moves each VM 16.13 times. This is because, in each period, IPAC considers only a much smaller list of VMs for migration. The migration list of IPAC contains only selected VMs from the overloaded servers and the VMs on one or more of the least power-efficient servers (see Section 4 for details). When the number of VMs is large, the major part of the migration list is the selected VMs from the overloaded servers.

Fig. 18. Number of migrations of IPAC, pMapper, and pMapper with consideration of migration cost (denoted as pMapper+MigCost) in 7 days under different ratios of migration cost to energy cost.

On the other hand, the migration list of pMapper contains selected VMs from each donor, i.e., each server whose utilization is higher than an ideal utilization target determined by an FFD algorithm. Every overloaded server is a donor, since the utilization of an overloaded
server (≥100%) is higher than any reasonable ideal utilization target (≤100%). However, not every donor is an overloaded server. For example, if the current utilization of a server is 95% and the target utilization determined by FFD is 90%, pMapper will move VMs out of this server while IPAC will not. Therefore, the migration list of pMapper is longer than that of IPAC when the number of servers is large, leading to more migrations for pMapper. Note that pMapper has been extended to consider migration cost [7]. To have a fair comparison, we evaluate IPAC with migration cost considerations (i.e., cost-aware VM migration in Section 4.4) against both pMapper and pMapper with migration cost consideration (i.e., pMapper+MigCost), using a metric defined in [7], the migration cost to energy cost ratio, which converts migrations into energy cost. Fig. 18 shows that the number of migrations of pMapper+MigCost decreases significantly as the ratio increases, i.e., as the price of each migration operation becomes higher. However, pMapper+MigCost always has more migrations than IPAC.

Fig. 19. Execution time of IPAC and pMapper in 7 days under different numbers of VMs.

6.5.
6.5. Execution time measurement

To examine the execution time of our algorithm, we run it in a virtual machine with a single core that is allocated one Amazon EC2 Compute Unit. An Amazon EC2 Compute Unit is a standard measure of CPU capacity that approximately provides the equivalent CPU capacity of a 1.0–1.2 GHz 2007 Opteron or 2007 Xeon processor [28]. The execution time of our algorithm is measured during the execution of our simulation.

As discussed in Section 4, the execution time of IPAC is determined by the selection of a design parameter ε: a smaller ε means a better solution quality but a longer execution time.

We first compare the execution time of IPAC and pMapper when ε = 1%. The algorithm used in IPAC is more complex than the FFD algorithm used in pMapper. However, note that IPAC is only executed on the small number of VMs in the migration list, while pMapper needs to execute the FFD algorithm on all the VMs. Fig. 19 presents the total execution time of the two algorithms in 7 days under different numbers of VMs. It can be observed that IPAC has a longer but still acceptable execution time compared with pMapper. However, IPAC outperforms pMapper significantly in terms of both power efficiency and migration penalty, as discussed before.

In Fig. 20, we present the change in the execution time and the energy consumption of IPAC when the value of ε changes from 1 to 0.01 while the number of VMs is fixed at 5415. The results are consistent with our analysis in Section 4.4, i.e., while a smaller ε provides a better consolidation quality in terms of energy consumption, the execution time of IPAC increases quickly as ε decreases.

Fig. 20. Execution time and energy consumption of IPAC: a greater ε means a shorter execution time. (a) Total execution time in 7 days. (b) Solution quality.

7. Related Work

Power consumption is one of the most important design constraints for high-density servers. The majority of the prior work has attempted to reduce power consumption by improving the energy-efficiency of individual server components [29]. Several research projects have successfully developed power management algorithms at the server level [13], the cluster level [17,30–32], and the data center level [24]. More closely related to this paper, Heo et al. [33] developed a power optimization algorithm based on DVFS for large-scale applications spanning several non-virtualized servers. In contrast to their work, our solution provides even greater power savings by consolidating underutilized servers onto a smaller number of active servers.

Virtualization technology has provided a promising way to manage application performance by dynamically reallocating resources to VMs. Several management algorithms have been proposed to control application performance for virtualized servers [34,27,35–38]. For example, Padala et al. [37] proposed to control throughputs for virtualized servers. In contrast, our algorithm controls response time, a user-perceived performance metric. In addition to performance assurance, we integrate DVFS with server consolidation for maximized power savings. Furthermore, most existing work assumed simple single-tier applications [34,27,35,36], while our solution relies on advanced MIMO control theory to deal with complex web applications that may span multiple VMs. While performance management for multi-tier applications has been studied before [39,40], we focus on a different problem: minimizing the power consumption of multi-tier applications in virtualized data centers.

VM migration is an important tool for resource and power management in virtualized computing environments. Some prior work focused on using migration to satisfy the resource requests of VMs [41,42,20,8]. In contrast, our work takes the total power consumption of the cluster as the design goal and designs a power-efficient algorithm while guaranteeing the desired performance requirements.
Several recent studies proposed server consolidation algorithms for power savings [16,20,7,14]. For example, Raghavendra et al. [16] used a greedy bin-packing algorithm as part of their coordinated control solution to minimize power consumption in virtualized data centers. This paper has two differences compared to their work. First, our solution provides response time guarantees to multi-tier applications, while their work focuses on single-tier applications. Second, our solution dynamically reallocates CPU resources among the VMs, while their performance control relies only on DVFS. Thus, our solution can allocate CPU resources to VMs based on their needs, which saves additional power. Similarly, some
recent papers have proposed heuristics based on variants of bin-packing algorithms such as Best Fit Decreasing (BFD) [43,44] and Mixed Integer Programming (MIP) [45]. Besides the difference that we integrate consolidation with CPU resource allocation and DVFS, our solution has a unique feature in providing an interface, the variable ε, that allows users to trade off solution quality against execution time.

The VM-server mapping problem falls in the category of vector-packing problems (or its special case, the bin-packing problem) in computer science. A detailed survey of this problem is given in [46]. Several recent papers [21,47,48] demonstrated that the Minimum Bin Slack (MBS) algorithm provides better results in terms of solution quality when compared to traditional algorithms. In this paper we extend the basic MBS algorithm by considering power efficiency and a more realistic constraint, in order to reduce the power consumption in a cluster of heterogeneous servers. Jung et al. [49,50] introduced algorithms that provide performance control via open-loop optimizations based on performance gains and adaptation costs, and pack VMs onto physical servers using a bin-packing algorithm similar to worst-fit. Compared with these algorithms, our algorithm provides better consolidation solutions because these algorithms are based on worst-fit. Also, our algorithm is more scalable because, instead of trying to optimize the power and performance of every VM in the entire data center in a single optimization problem, we divide the problem into three levels with much smaller solution spaces, and solve them using a data-center-level optimizer, an application-level response time controller, and a server-level CPU resource arbitrator.
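The contrast drawn above between traditional greedy packing (FFD/BFD) and MBS, and the quality-versus-time role of ε, can be illustrated with a minimal sketch (illustrative Python with integer demands; the actual IPAC algorithm in Section 4 additionally accounts for power efficiency and heterogeneous servers, which this sketch omits):

```python
def ffd_pack(demands, capacity):
    """First-Fit Decreasing: sort demands descending, place each item
    into the first bin (server) that still has room."""
    bins, loads = [], []
    for d in sorted(demands, reverse=True):
        for i, load in enumerate(loads):
            if load + d <= capacity:
                bins[i].append(d)
                loads[i] += d
                break
        else:
            bins.append([d])
            loads.append(d)
    return bins

def mbs_fill(items, capacity, eps):
    """Minimum Bin Slack (one bin): search subsets of items for the
    packing that leaves the least slack; eps lets the search stop
    early, trading solution quality for execution time."""
    best, best_slack = [], capacity

    def search(start, chosen, load):
        nonlocal best, best_slack
        if capacity - load < best_slack:
            best, best_slack = list(chosen), capacity - load
        if best_slack <= eps:
            return True  # slack small enough: accept and stop searching
        for i in range(start, len(items)):
            if load + items[i] <= capacity:
                chosen.append(i)
                if search(i + 1, chosen, load + items[i]):
                    return True
                chosen.pop()
        return False

    search(0, [], 0.0)
    return [items[i] for i in best]

def mbs_pack(demands, capacity, eps=0):
    """Pack bins one at a time, each filled by an MBS subset search."""
    remaining = sorted(demands, reverse=True)
    bins = []
    while remaining:
        # Guard: an empty fill (item larger than capacity, or eps >=
        # capacity) would loop forever, so place one item alone.
        packed = mbs_fill(remaining, capacity, eps) or [remaining[0]]
        for d in packed:
            remaining.remove(d)
        bins.append(packed)
    return bins

demands = [5, 5, 4, 4, 3, 3, 3, 3]
print(ffd_pack(demands, 10))  # [[5, 5], [4, 4], [3, 3, 3], [3]]  -> 4 servers
print(mbs_pack(demands, 10))  # [[5, 5], [4, 3, 3], [4, 3, 3]]   -> 3 servers
print(len(mbs_pack(demands, 10, eps=5)))  # 5: larger eps, faster but worse
```

On this instance the exhaustive MBS search finds a 3-server packing where FFD needs 4, matching the solution-quality advantage of MBS reported in [21,47,48]; raising the early-stop tolerance shortens the search at the cost of extra servers, mirroring the role of ε in Section 6.5.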
8. Conclusions

In this paper, we have presented a performance-controlled server consolidation solution for virtualized data centers to achieve power efficiency and application-level performance assurance. While existing solutions rely on DVFS or server consolidation in a separate manner, our solution integrates feedback control with optimization to utilize both DVFS and server consolidation for maximized energy savings with performance guarantees. At the application level, a novel MIMO controller is designed based on advanced MPC control theory to achieve the desired 90-percentile response time for applications spanning multiple VMs, on a short time scale, by reallocating CPU resources and conducting DVFS. At the cluster level, a power optimizer is proposed to dynamically consolidate VMs onto the most power-efficient servers on a much longer time scale. Empirical results on a hardware testbed demonstrate that our solution outperforms pMapper, a state-of-the-art server consolidation algorithm, by having greater power savings and much smaller migration overheads while achieving the required application performance. Extensive simulation results, based on a trace file of 5415 real servers, demonstrate the efficacy of our solution in large-scale data centers.

Acknowledgments

This work was supported, in part, by NSF under grant CNS-1143607 (NSF CAREER Award) and by ONR under grant N00014-09-1-0750.

References

[1] Y. Wang, X. Wang, Power optimization with performance assurance for multi-tier applications in virtualized data centers, in: The Second International Workshop on Green Computing (GreenCom 2010), held in conjunction with the 39th International Conference on Parallel Processing (ICPP), 2010.
[2] AMD, White Paper Publication 26094: BIOS and Kernel Developer's Guide for AMD Athlon 64 and AMD Opteron Processors, Revision 3.30, Advanced Micro Devices, Inc., February 2006.
[3] M.
Ware, et al., Architecting for power management: the POWER7 approach, in: HPCA, 2010.
[4] C. Clark, K. Fraser, S. Hand, J.G. Hansen, E. Jul, C. Limpach, I. Pratt, A. Warfield, Live migration of virtual machines, in: NSDI'05: Proceedings of the 2nd Conference on Symposium on Networked Systems Design & Implementation, 2005.
[5] M.R. Hines, K. Gopalan, Post-copy based live virtual machine migration using adaptive pre-paging and dynamic self-ballooning, in: VEE'09: Proceedings of the 2009 ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, 2009.
[6] R. Nathuji, K. Schwan, VirtualPower: coordinated power management in virtualized enterprise systems, in: SOSP'07: Proceedings of Twenty-first ACM SIGOPS Symposium on Operating Systems Principles, 2007.
[7] A. Verma, P. Ahuja, A. Neogi, pMapper: power and migration cost aware application placement in virtualized systems, in: Proceedings of the 9th ACM/IFIP/USENIX International Conference on Middleware, 2008.
[8] X. Wang, Y. Wang, Coordinating power control and performance management for virtualized server clusters, IEEE Transactions on Parallel and Distributed Systems 22 (2) (2011) 245–259.
[9] G.F. Franklin, J.D. Powell, M. Workman, Digital Control of Dynamic Systems, 3rd ed., Addison-Wesley, 1997.
[10] J.M. Maciejowski, Predictive Control with Constraints, Prentice Hall, 2002.
[11] F.L. Lewis, V.L. Syrmos, Optimal Control, 2nd ed., John Wiley & Sons, Inc., 1995.
[12] About ideas relative performance estimate 2 (RPE2), http://www.
[13] C. Lefurgy, X. Wang, M. Ware, Server-level power control, in: ICAC, 2007.
[14] A. Verma, G. Dasgupta, T.K. Nayak, P. De, R. Kothari, Server workload analysis for power minimization using consolidation, in: USENIX ATC, 2009.
[15] E.M. Elnozahy, M. Kistler, R. Rajamony, Energy-efficient server clusters, in: PACS'02: Proceedings of the 2nd International Conference on Power-aware Computer Systems, 2002.
[16] R. Raghavendra, P. Ranganathan, V. Talwar, Z. Wang, X.
Zhu, No power struggles: coordinated multi-level power management for the data center, in: ASPLOS XIII: Proceedings of the 13th International Conference on Architectural Support for Programming Languages and Operating Systems, 2008.
[17] X. Wang, M. Chen, Cluster-level feedback power control for performance optimization, in: HPCA'08: The 14th IEEE International Symposium on High-Performance Computer Architecture, 2008.
[18] M. Chen, X. Wang, X. Li, Coordinating processor and main memory for efficient server power control, in: ICS, 2011.
[19] Q. Deng, D. Meisner, A. Bhattacharjee, T.F. Wenisch, R. Bianchini, CoScale: coordinating CPU and memory system DVFS in server systems, in: MICRO, 2012.
[20] G. Khanna, K. Beaty, G. Kar, A. Kochut, Application performance management in virtualized server environments, in: NOMS'06: Proceedings of the 10th IEEE/IFIP Network Operation and Management Symposium, 2006.
[21] K. Fleszar, K.S. Hindi, New heuristics for one-dimensional bin-packing, Computers and Operations Research 29 (7) (2002).
[22] J. Marguerite, E. Cecchet, RUBBoS: Rice University Bulletin Board System, Rice University, DynaServer/RUBBoS/
[23] Electronic Educational Devices Inc., Watts Up Pro Power Meter, http://www.
[24] X. Wang, M. Chen, C. Lefurgy, Scalable hierarchical power control for large-scale data centers, in: PACT'09: The 18th International Conference on Parallel Architectures and Compilation Techniques, 2009.
[25] Standard Performance Evaluation Corporation, All Published SPECpower_ssj2008 Results, ssj2008/results/power ssj2008.html (July 2010).
[26] V. Pallipadi, A. Starikovskiy, The ondemand governor, in: Proceedings of the Linux Symposium, Vol. 2, 2006, pp. 215–230.
[27] P. Padala, K.G. Shin, X. Zhu, M. Uysal, Z. Wang, S. Singhal, A. Merchant, K. Salem, Adaptive control of virtualized resources in utility computing environments, in: EuroSys'07: Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems, 2007.
[28] Amazon Web Services LLC, Amazon EC2 FAQs, faqs/
[29] C. Lefurgy, K. Rajamani, F. Rawson, W. Felter, M. Kistler, T.W. Keller, Energy management for commercial servers, IEEE Computer 36 (12) (2003).
[30] K. Rajamani, C. Lefurgy, On evaluating request-distribution schemes for saving energy in server clusters, in: ISPASS'03: Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software, 2003.
[31] R. Nathuji, P. England, P. Sharma, A. Singh, Feedback driven QoS-aware power budgeting for virtualized servers, in: FeBID'09: The 4th International Workshop on Feedback Control Implementation and Design in Computing Systems and Networks, 2009.
[32] L. Bertini, J. Leite, D. Mosse, SISO PIDF controller in an energy-efficient multi-tier web server cluster for e-commerce, in: FeBID'07: International Workshop on Feedback Control Implementation and Design in Computing Systems and Networks, 2007.
[33] J. Heo, D. Henriksson, X. Liu, T. Abdelzaher, Integrating adaptive components: an emerging challenge in performance-adaptive systems and a server farm case-study, in: RTSS'07: Proceedings of the 2007 Real-Time Systems Symposium, 2007.
[34] Y. Wang, X. Wang, M. Chen, X. Zhu, PARTIC: power-aware response time control for virtualized web servers, IEEE Transactions on Parallel and Distributed Systems 22 (2) (2011) 323–336.
[35] J. Xu, M. Zhao, J. Fortes, R. Carpenter, M. Yousif, On the use of fuzzy modeling in virtualized data center management, in: ICAC'07: Proceedings of the Fourth International Conference on Autonomic Computing, 2007.
[36] Y. Chen, S. Iyer, X. Liu, D. Milojicic, A. Sahai, SLA decomposition: translating service level objectives to system level thresholds, in: ICAC'07: Proceedings of the Fourth International Conference on Autonomic Computing, 2007.
[37] P. Padala, K.-Y. Hou, K.G. Shin, X. Zhu, M. Uysal, Z. Wang, S. Singhal, A. Merchant, Adaptive control of virtualized resources in utility computing environments, in: EuroSys'09: Proceedings of the 4th ACM European Conference on Computer Systems, 2009.
[38] D. Kusic, J.O. Kephart, J.E. Hanson, N. Kandasamy, G. Jiang, Power and performance management of virtualized computing environments via lookahead control, in: ICAC'08: Proceedings of the 5th International Conference on Autonomic Computing, 2008.
[39] Y. Diao, J.L. Hellerstein, S. Parekh, H. Shaikh, M. Surendra, Controlling quality of service in multi-tier web applications, in: ICDCS'06: Proceedings of the 26th IEEE International Conference on Distributed Computing Systems, 2006.
[40] X. Liu, J. Heo, L. Sha, X. Zhu, Adaptive control of multi-tiered web applications using queueing predictor, in: NOMS'06: 10th IEEE/IFIP Network Operations and Management Symposium, 2006.
[41] X. Zhu, D. Young, B.J. Watson, Z. Wang, J. Rolia, S. Singhal, B. McKee, C. Hyser, D. Gmach, R. Gardner, T. Christian, L. Cherkasova, 1000 Islands: integrated capacity and workload management for the next generation data center, in: ICAC'08: Proceedings of the 5th International Conference on Autonomic Computing, 2008.
[42] C. Hyser, B. McKee, R. Gardner, B.J. Watson, Autonomic virtual machine placement in the data center, Tech Report, HP Labs, 2008, techreports/2007/HPL-2007-189.pdf
[43] A. Beloglazov, J. Abawajy, R.
Buyya, Energy-aware resource allocation heuristics for efficient management of data centers for cloud computing, Future Generation Computer Systems 28 (5) (2012) 755–768.
[44] A. Beloglazov, R. Buyya, Optimal online deterministic algorithms and adaptive heuristics for energy and performance efficient dynamic consolidation of virtual machines in cloud data centers, Concurrency and Computation: Practice and Experience 24 (13) (2012) 1397–1420.
[45] V. Petrucci, E. Carrera, O. Loques, J.C.B. Leite, D. Mosse, Optimized management of power and performance for virtualized heterogeneous server clusters, in: 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2011.
[46] T.F. Gonzalez, Handbook of Approximation Algorithms and Metaheuristics, Chapman, 2008.
[47] M. Haouari, M. Serairi, Heuristics for the variable sized bin-packing problem, Computers and Operations Research 36 (10) (2009).
[48] A. Caprara, U. Pferschy, Worst-case analysis of the subset sum algorithm for bin packing, Operations Research Letters 32 (2) (2004).
[49] G. Jung, K.R. Joshi, M.A. Hiltunen, R.D. Schlichting, C. Pu, A cost-sensitive adaptation engine for server consolidation of multitier applications, in: Middleware, 2009, pp. 163–183.
[50] G. Jung, M.A. Hiltunen, K.R. Joshi, R.D. Schlichting, C. Pu, Mistral: dynamically managing power, performance, and adaptation cost in cloud infrastructures, in: ICDCS, 2010, pp. 62–73.

Yefu Wang received the B.S. and M.S. degrees from Harbin Institute of Technology, Harbin, China, in 2003 and 2006. He is currently a PhD candidate in computer engineering supervised by Dr. Xiaorui Wang at the University of Tennessee, Knoxville. His current research focuses on system-level power management of computing systems, with applications to virtualized enterprise server environments.

Xiaorui Wang received the D.Sc. degree in computer science from Washington University in St. Louis in 2006.
He is currently an Associate Professor in the Department of Electrical and Computer Engineering at the Ohio State University. He is the recipient of the US Office of Naval Research (ONR) Young Investigator (YIP) Award in 2011, the US National Science Foundation (NSF) CAREER Award in 2009, the Power-Aware Computing Award from Microsoft Research in 2008, and the IBM Real-Time Innovation Award in 2007. He also received the Best Paper Award from the 29th IEEE Real-Time Systems Symposium (RTSS) in 2008. He is an author or coauthor of more than 90 refereed publications. From 2006 to 2011, he was an assistant professor at the University of Tennessee, Knoxville, where he received the EECS Early Career Development Award, the Chancellor's Award for Professional Promise, and the College of Engineering Research Fellow Award in 2008, 2009, and 2010, respectively. In 2005, he worked at the IBM Austin Research Laboratory, designing power control algorithms for high-density computer servers. From 1998 to 2001, he was a senior software engineer and then a project manager at Huawei Technologies Co. Ltd., China, developing distributed management systems for optical networks. His research interests include computer architecture and systems, data center power management, and cyber-physical systems. He is a member of the IEEE and the IEEE Computer Society.