Contention-Aware Lock Scheduling for Transactional Databases - VLDB Endowment
Boyu Tian, Jiamin Huang, Barzan Mozafari, Grant Schoenebeck
University of Michigan, Ann Arbor, MI, USA
{bytian, jiamin, mozafari, schoeneb}@umich.edu

ABSTRACT

Lock managers are among the most studied components in concurrency control and transactional systems. However, one question seems to have been generally overlooked: "When there are multiple lock requests on the same object, which one(s) should be granted first?"

Nearly all existing systems rely on a FIFO (first in, first out) strategy to decide which transaction(s) to grant the lock to. In this paper, however, we show that lock scheduling choices have significant ramifications on the overall performance of a transactional system. Despite the large body of research on job scheduling outside the database context, lock scheduling presents subtle but challenging requirements that render existing results on scheduling inapt for a transactional database. By carefully studying this problem, we present the concept of contention-aware scheduling, show the hardness of the problem, and propose novel lock scheduling algorithms (LDSF and bLDSF), which guarantee a constant factor approximation of the best scheduling. We conduct extensive experiments using a popular database on both TPC-C and a microbenchmark. Compared to FIFO, the default scheduler in most database systems, our bLDSF algorithm yields up to 300x speedup in overall transaction latency. Alternatively, our LDSF algorithm, which is simpler and achieves comparable performance to bLDSF, has already been adopted by the open-source community, and was chosen as the default scheduling strategy in MySQL 8.0.3+.

PVLDB Reference Format:
Boyu Tian, Jiamin Huang, Barzan Mozafari, Grant Schoenebeck. Contention-Aware Lock Scheduling for Transactional Databases. PVLDB, 11(5): 648-662, 2018.
DOI: https://doi.org/10.1145/3177732.3177740

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Articles from this volume were invited to present their results at The 44th International Conference on Very Large Data Bases, August 2018, Rio de Janeiro, Brazil.
Proceedings of the VLDB Endowment, Vol. 11, No. 5
Copyright 2018 VLDB Endowment 2150-8097/18/1.

1. INTRODUCTION

Lock management forms the backbone of concurrency control in modern software, including many distributed systems and transactional databases. A lock manager guarantees both correctness and efficiency of a concurrent application by solving the data contention problem. For example, before a transaction accesses a database object, it has to acquire the corresponding lock; if the transaction fails to get a lock immediately, it is blocked until the system grants it the lock. This poses a fundamental question: when multiple transactions are waiting for a lock on the same object, which should be granted first when the object becomes available?

This question, which we call lock scheduling, has received surprisingly little attention, despite the large body of work on concurrency control and locking protocols [15, 46, 7, 68, 18, 40, 50, 54, 23]. In fact, almost all existing DBMSs¹ rely on variants of the first-in, first-out (FIFO) strategy, which grants (all) compatible lock requests based on their arrival time in the queue [2, 3, 4, 5, 6]. In this paper, we carefully study the problem of lock scheduling and show that it has significant ramifications on the overall performance of a DBMS.

¹ The only exceptions are MySQL and MariaDB, which have recently adopted our Variance-Aware Transaction Scheduling (VATS) [44] (see Section 7).

Related Work — There is a long history of research on scheduling in a general context [25, 42, 69, 70, 64, 41, 35, 62], whereby a set of jobs is to be scheduled on a set of processors such that a goal function is minimized, e.g., the sum of (weighted) completion times [64, 41, 39] or the variance of the completion or wait times [14, 17, 76, 49, 28]. There is also work on scheduling in a real-time database context [75, 40, 7, 36, 74], where the goal is to minimize the total tardiness or the number of transactions missing their deadlines.

In this paper, we address the problem of lock scheduling in a transactional context, where jobs are transactions and processors are locks, and the scheduling decision is about which locks to grant to which transactions. However, our transactional context makes this problem quite different from the well-studied variants of the scheduling problem. First, unlike generic scheduling problems, where at most one job can be scheduled on each processor, a lock may be held in either exclusive or shared mode. The fact that transactions can sometimes share the same resources (i.e., shared locks) significantly complicates the problem (see Section 2.4). Moreover, once a lock is granted to a transaction, the same transaction may later request another lock (as opposed to jobs requesting all of their needed resources upfront). Finally, in the scheduling literature, the execution time of each job is assumed to be known upon its arrival [52, 70, 14, 76], whereas the execution time of a transaction is often unknown a priori.

Although there are scheduling algorithms designed for real-time databases [55, 71, 77, 13], they are not applicable in a general DBMS context. For example, real-time settings assume that each transaction comes with a deadline, whereas most database workloads do not have explicit deadlines. Instead, most workloads wish to minimize latency or maximize throughput.

Challenges — Several aspects of lock scheduling make it a uniquely challenging problem, particularly under the performance considerations of a real-world DBMS.

1. An online problem. At the time of granting a lock to a transaction, we do not know when the lock will be released, since the transaction's execution time will only be known once it is finished.
2. Dependencies. In a DBMS, there are dependencies among concurrent transactions when one is waiting for a lock held by another. In practice, these dependencies can be quite complex, as each transaction can hold locks on several objects and several transactions can hold shared locks on the same object.
3. Non-uniform access patterns. Not all objects in the database are equally popular. Also, different transaction types might each have a different access pattern.
4. Multiple locking modes. The possibility of granting a lock to one writer exclusively or to multiple readers is a source of great complexity (see Section 2.4).

Contributions — In this paper, to the best of our knowledge, we present the first formal study of the lock scheduling problem with the goal of minimizing transaction latencies in a DBMS context. Furthermore, we propose a contention-aware transaction scheduling algorithm, which captures the contention and the dependencies among concurrent transactions. The key insight is that a transaction blocking many others should be scheduled earlier. We carefully study the difficulty of lock scheduling and the optimality of our algorithm. Most importantly, we show that our results are not merely theoretical, but lead to dramatic speedups in a real-world DBMS. Despite decades of research on all aspects of transaction processing, lock scheduling seems to have gone unnoticed, to the extent that nearly all DBMSs still use FIFO. Our ultimate hope is that our results draw attention to the importance of lock scheduling for the overall performance of a transactional system.

In summary, we make the following contributions:

1. We propose a contention-aware lock scheduling algorithm, called Largest-Dependency-Set-First (LDSF). We prove that, in the absence of shared locks, LDSF is optimal in terms of the expected mean latency (Theorem 2). With shared locks, we prove that LDSF is a constant factor approximation of the optimal scheduling under certain regularity constraints (Theorem 3).
2. We propose the idea of granting only some of the shared lock requests on an object (as opposed to granting them all). We study the difficulty of the scheduling problem under this setting (Theorem 5), and propose another algorithm, called bLDSF (batched Largest-Dependency-Set-First), which improves upon LDSF in this setting. We prove that bLDSF is also a constant factor approximation of the optimal scheduling (Theorem 6).
3. In addition to our theoretical analysis, we use a real-world DBMS and extensive experiments to empirically evaluate our algorithms on the TPC-C benchmark, as well as a microbenchmark. Our results confirm that, compared to the commonly-used FIFO strategy, bLDSF and LDSF reduce mean transaction latencies by up to 300x and 290x, respectively. They also increase throughput by up to 6.5x and 5.5x. As a result, LDSF (which is simpler than bLDSF) has already been adopted as the default scheduling algorithm in MySQL [1] as of 8.0.3+.

2. PROBLEM STATEMENT

In this section, we first describe our problem setting and define dependency graphs. We then formally state the lock scheduling problem.

2.1 Background: Locking Protocols

Locks are the most commonly used mechanism for ensuring consistency when a set of shared objects is concurrently accessed by multiple transactions (or applications). In a locking system, there are two main types of locks: shared locks and exclusive locks. Before a transaction can read an object (e.g., a row), it must first acquire a shared lock (a.k.a. read lock) on that object. Likewise, before a transaction can write to or update an object, it must acquire an exclusive lock (a.k.a. write lock) on that object. A shared lock can be granted on an object as long as no exclusive locks are currently held on that object. However, an exclusive lock on an object can be granted only if there are no other locks currently held on that object. We focus on the strict 2-phase locking (strict 2PL) protocol: once a lock is granted to a transaction, it is held until that transaction ends. Once a transaction finishes execution (i.e., it commits or gets aborted), it releases all of its locks.

2.2 Dependency Graph

Given the set T of transactions currently in the system, and the set O of objects in the database, we define the dependency graph of the system as an edge-labeled graph G = (V, E, L). The vertices of the graph V = T ∪ O consist of the current transactions and objects. The edges of the graph E ⊆ (T × O) ∪ (O × T) describe the locking relationships among the objects and transactions. Specifically, for transaction t ∈ T and object o ∈ O,

• (t, o) ∈ E if t is waiting for a lock on o;
• (o, t) ∈ E if t already holds a lock on o.

The label L : E → {S, X} indicates the lock type:

• L(t, o) = X if t is waiting for an exclusive lock on o;
• L(t, o) = S if t is waiting for a shared lock on o;
• L(o, t) = X if t already holds an exclusive lock on o;
• L(o, t) = S if t already holds a shared lock on o.

We assume that deadlocks are rare and are handled by an external process (e.g., a deadlock detection and resolution module). Thus, for simplicity, we assume that the dependency graph G is always a directed acyclic graph (DAG).

2.3 Lock Scheduling

A lock scheduler makes decisions about which transactions are granted locks upon one or both of the following events:² (i) when a transaction requests a lock, and (ii) when a lock is released by a transaction.

² These are the only situations in which the dependency graph changes. If a scheduler grants locks at other times, the same decision could have been made upon the previous event.

Let 𝒢 be the set of all possible
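The request-time grant rule used by most systems (grant immediately only if nothing is held, or the request is compatible with every held lock and no incompatible request waits ahead of it) can be sketched as follows. This is an illustrative toy model, not the paper's implementation; the class and method names are ours.

```python
from dataclasses import dataclass, field

S, X = "S", "X"

def compatible(a: str, b: str) -> bool:
    # Only two shared locks are compatible.
    return a == S and b == S

@dataclass
class LockQueue:
    held: list = field(default_factory=list)     # lock modes currently held
    waiting: list = field(default_factory=list)  # (txn_id, mode), FIFO order

    def request(self, txn: str, mode: str) -> bool:
        """A_req: grant immediately only if (i) nothing is held, or
        (ii) the request is compatible with every held lock and no
        incompatible request is already waiting ahead of it."""
        no_conflict_held = all(compatible(mode, h) for h in self.held)
        no_conflict_waiting = all(compatible(mode, m) for _, m in self.waiting)
        if not self.held or (no_conflict_held and no_conflict_waiting):
            self.held.append(mode)
            return True          # granted
        self.waiting.append((txn, mode))
        return False             # blocked

q = LockQueue()
assert q.request("t1", S)       # object free: granted
assert q.request("t2", S)       # shared with shared: granted
assert not q.request("t3", X)   # conflicts with held S locks: blocked
assert not q.request("t4", S)   # compatible with held, but an X waits ahead: blocked
```

Note how the last request illustrates the no-starvation property of this A_req: a shared request arriving behind a pending exclusive request is blocked even though it is compatible with the held locks.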
dependency graphs of the system. A scheduling algorithm A = (A_req, A_rel) is a pair of functions A_req, A_rel : 𝒢 × O × T × {S, X} → 2^T. For example, when transaction t requests an exclusive lock on object o, A_req(G, o, t, X) determines which of the transactions currently waiting for a lock on o (including t itself) should be granted their requested lock on o, given the dependency graph G of the system. (Note that the types of locks requested by transactions other than t are captured in G.) Likewise, when transaction t releases a shared lock on object o, A_rel(G, o, t, S) determines which of the transactions currently waiting for a lock on o should be granted their requested lock, given the dependency graph G.

² (cont.) That is, a transaction was unnecessarily blocked. A lock scheduler is thus an event-driven scheduler.

When all transactions holding a lock on an object o release the lock, we say that o has become available. When the lock request of a transaction t is granted, we say that t is scheduled.

Since the execution time of each transaction is typically unknown in advance, we model their execution time using a random variable with expectation R̄. Given a particular scheduling algorithm A, we define the latency of a transaction t, denoted by l_A(t), as its execution time plus the total time it has been blocked waiting for various locks. Since l_A(t) is a random variable, we denote its expectation as l̄_A(t). We use l̄(A) to denote the expected transaction latency under algorithm A, which is defined as the average of the expected latencies of all transactions in the system, i.e., l̄(A) = (1/|T|) Σ_{t∈T} l̄_A(t).

Notation | Description
T        | the set of transactions in the system
O        | the set of objects in the database
G        | the dependency graph of the system
V        | vertices in the dependency graph
E        | edges in the dependency graph
L        | labels of the edges indicating the lock type
A        | a scheduling algorithm
l_A(t)   | the latency of transaction t under A
l̄_A(t)   | the expectation of l_A(t)
l̄(A)     | the expected transaction latency under A

Table 1: Table of Notations.

Our goal is to find a lock scheduling algorithm under which the expected transaction latency is minimized. To ensure consistency and isolation, in most database systems A_req simply grants a lock to the requesting transaction only when (i) no lock is held on the object, or (ii) the currently held lock and the requested lock are compatible and no transaction in the queue has an incompatible lock request. This choice of A_req also ensures that transactions requesting exclusive locks are not starved. The key challenge in lock scheduling, then, is choosing an A_rel such that the expected transaction latency is minimized.

2.4 NP-Hardness

Minimizing the expected transaction latency under the scheduling algorithm is, in general, an NP-hard problem. Intuitively, the hardness is due to the presence of shared locks, which cause the system's dependency graph to be a DAG, but not necessarily a tree.

Theorem 1. Given a dependency graph G, when a transaction t releases a lock (S or X) on object o, it is NP-hard to determine which pending lock requests to grant, in order to minimize the expected transaction latency. The result holds even if all transactions have the same execution time, and no transaction requests additional locks in the future.³

³ All missing proofs can be found in our technical report [72].

Given the NP-hardness of the problem in general, in the rest of this paper, we propose algorithms that guarantee a constant-factor approximation of the optimal scheduling in terms of the expected transaction latency.

3. CONTENTION-AWARE SCHEDULING

We define contention-aware scheduling as any algorithm that prioritizes transactions based on their impact on the overall contention in the system. First, we study several heuristics for comparing the contribution of different transactions to the overall contention, and illustrate their shortcomings through intuitive examples. We then propose a particular contention-aware scheduling that formally quantifies this contribution, and guarantees a constant-factor approximation of the optimal scheduling when shared locks are not held by too many transactions. (Later, in Section 4, we generalize this algorithm for situations where this assumption does not hold.)

3.1 Capturing Contention

The degree of contention in a database system is directly related to the number of transactions concurrently requesting conflicting locks on the same objects. For example, a transaction holding an exclusive lock on a popular object will naturally block many other transactions requesting a lock on that same object. If such a transaction is itself blocked (e.g., waiting for a lock on a different object), it will negatively affect the latency of many transactions, increasing overall contention in the system. Thus, our goal in contention-aware scheduling is to determine which transactions have a more important role in reducing the overall contention in the system, so that they can be given higher priority when granting a lock. Next, we discuss heuristics for measuring the priority of a transaction in reducing the overall contention.

Number of locks held — The simplest criterion for prioritizing transactions is the number of locks they currently hold. We refer to this heuristic as Most Locks First (MLF). The intuition is that a transaction with more locks is more likely to block other transactions in the system. However, this approach does not account for the popularity of objects in the system. In other words, a transaction might be holding many locks but on unpopular objects, which are unlikely to be requested by other transactions. Prioritizing such a transaction will not necessarily reduce contention in the system. Figure 1 demonstrates an example where transaction t1 holds the greatest number of locks, but on unpopular objects. It is therefore better to keep t1 waiting and instead schedule t2 first, which holds fewer but more popular locks.

Number of locks that block other transactions — An improvement over the previous criterion is to only count those locks that have at least one transaction waiting on them. This approach disregards transactions that hold many
locks, but on these locks no other transactions are waiting. We call this heuristic Most Blocking Locks First (MBLF). The issue with this criterion is that it treats all blocked transactions as the same, even if they contribute unequally to the overall contention. Figure 2 shows an example in which the scheduler must decide between transactions t1 and t2 when the object o1 becomes available. Here, this criterion would choose t2, which currently holds two locks, each blocking at least one other transaction. However, although t1 holds only one blocking lock, it is blocking t3 which itself is blocking three other transactions. Thus, by scheduling t2 first, t3 and its three subsequent transactions will remain blocked in the system for a longer period of time than if t1 had been scheduled first.

Depth of the dependency subgraph — A more sophisticated criterion is the depth of a transaction's dependency subgraph. For a transaction t, this is defined as the subgraph of the dependency graph comprised of all vertices that can reach t (and all edges between such vertices). The depth of t's dependency subgraph is characterized by the number of transactions on the longest path in the subgraph that ends in t. We refer to this heuristic as Deepest Dependency First (DDF). Figure 3 shows an example, where the depth of the dependency subgraph of transaction t1 is 3, while that of transaction t2 is only 2. Thus, based on this criterion, the exclusive lock on object o1 should be granted to t1. The idea behind this heuristic is that a longer path indicates a larger number of transactions sequentially blocked. Thus, to unblock such transactions sooner, the scheduling algorithm must start with a transaction with a deeper dependency subgraph. However, considering only the depth of this subgraph can limit the overall degree of concurrency in the system. For example, in Figure 3, if the exclusive lock on o1 is granted to t1, upon its completion only one transaction in its dependency subgraph will be unblocked. On the other hand, if the lock is granted to t2, upon its completion two other transactions in its dependency subgraph will be unblocked, which can run concurrently.

Figure 1: Transaction t1 holds the greatest number of locks, but many of them on unpopular objects.

Figure 2: Transaction t2 holds two locks that are waited on by other transactions. Although only one of t1's locks is blocking other transactions, the blocked transaction (i.e., t3) is itself blocking three others.

Figure 3: Transaction t1 has a deeper dependency subgraph, but granting the lock to t2 will unblock more transactions which can run concurrently.

Later, in Section 6.4, we empirically evaluate these heuristics. While none of these heuristics alone is able to guarantee an optimal lock scheduling strategy, they offer valuable insight into understanding the relationship between scheduling and overall contention. In particular, the first two heuristics focus on what we call horizontal contention, whereby a transaction holds locks on many objects directly needed by other transactions. In contrast, the third heuristic focuses on reducing vertical contention, whereby a chain of dependencies causes a series of transactions to block each other. Next, we present an algorithm which is capable of resolving both horizontal and vertical aspects of contention.

3.2 Largest-Dependency-Set-First

In this section, we propose an algorithm, called Largest-Dependency-Set-First (LDSF), which provides formal guarantees on the expected mean latency.

Consider two transactions t1 and t2 in the system. If there is a path from t1 to t2 in the dependency graph, we say that t1 is dependent on t2 (i.e., t1 depends on t2's completion/abortion for at least one of its required locks). We define the dependency set of t, denoted by g(t), as the set of all transactions that are dependent on t (i.e., the set of transactions in t's dependency subgraph). Our LDSF algorithm uses the size of the dependency sets of different transactions to decide which one(s) to schedule first. For example, in Figure 4, there are five transactions in the dependency set of transaction t1 (including t1 itself) while there are four transactions in t2's dependency set. Thus, in a situation where both t1 and t2 have requested an exclusive lock on object o1, LDSF grants the lock to t1 (instead of t2) as soon as o1 becomes available.

Now, we can formally present our LDSF algorithm. Suppose an object o becomes available (i.e., all previous locks on o are released), and there are m + n transactions currently waiting for a lock on o: m transactions t^i_1, t^i_2, ..., t^i_m are requesting a shared lock on o, and n transactions t^x_1, t^x_2, ..., t^x_n are requesting an exclusive lock on o. Our LDSF algorithm defines the priority of each transaction t^x_j requesting an exclusive lock as the size of its dependency set, |g(t^x_j)|. However, LDSF treats all transactions requesting a shared lock on o, namely t^i_1, t^i_2, ..., t^i_m, as a single transaction: if LDSF decides to grant a shared lock, it will be granted to all of them. The priority of the shared lock requests is thus defined as the size of the union of their dependency sets, |∪_{j=1}^m g(t^i_j)|. LDSF then finds the transaction t̂^x with the highest priority among t^x_1, t^x_2, ..., t^x_n. If t̂^x's priority is higher than the collective priority of the transactions requesting a shared lock, LDSF grants the exclusive lock to t̂^x. Otherwise, a shared lock is granted to all transactions
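The core LDSF decision can be sketched directly from these definitions: compute g(t) by reverse reachability in the dependency graph, then compare the union of the shared requesters' dependency sets against the single largest exclusive requester. The graph encoding and function names below are ours, not the paper's code, and objects/transactions are distinguished by a name prefix for brevity.

```python
# LDSF decision sketch. redges[v] lists the predecessors of v, i.e.
# u in redges[v] means there is an edge u -> v in the dependency graph.
from collections import defaultdict

def dependency_set(redges, t):
    """g(t): t plus every vertex with a path to t (reverse reachability),
    restricted to transaction vertices (names starting with 't')."""
    seen, stack = {t}, [t]
    while stack:
        v = stack.pop()
        for u in redges[v]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return {v for v in seen if v.startswith("t")}

def ldsf_grant(redges, shared, exclusive):
    batch = set().union(*(dependency_set(redges, t) for t in shared)) if shared else set()
    best_x = max(exclusive, key=lambda t: len(dependency_set(redges, t)), default=None)
    if best_x and len(dependency_set(redges, best_x)) > len(batch):
        return [best_x]              # grant the exclusive lock
    return list(shared)              # grant a shared lock to all requesters

# t1 is waited on (transitively) by t3, t4, t5; t2 only by t6:
redges = defaultdict(list, {"t1": ["o2"], "o2": ["t3", "t4"], "t3": ["o4"],
                            "o4": ["t5"], "t2": ["o3"], "o3": ["t6"]})
assert ldsf_grant(redges, shared=[], exclusive=["t1", "t2"]) == ["t1"]
```

Here |g(t1)| = 4 and |g(t2)| = 2, so t1 wins the exclusive grant, mirroring the Figure 4 discussion.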
t^i_1, t^i_2, ..., t^i_m. The pseudo-code of the LDSF algorithm is provided in Algorithm 1.

Figure 4: Lock scheduling based on the size of the dependency sets.

Algorithm 1: Largest-Dependency-Set-First Algorithm
Input: The dependency graph of the system G = (V, E, L), transaction t, object o, label L ∈ {X, S}
    // meaning t has just released a lock of type L on o
Output: The set of transactions whose requested lock on o should be granted
1  if there are other transactions still holding a lock on o then
2      return ∅;
3  Obtain the set of transactions waiting for a shared lock on o,
   T^i ← {t^i ∈ V : (t^i, o) ∈ E and L(t^i, o) = S} = {t^i_1, t^i_2, ..., t^i_m};
4  Obtain the set of transactions waiting for an exclusive lock on o,
   T^x ← {t^x ∈ V : (t^x, o) ∈ E and L(t^x, o) = X} = {t^x_1, t^x_2, ..., t^x_n};
5  Let τ(T^i) = |∪_{j=1}^m g(t^i_j)|;
6  Find a transaction t̂^x ∈ T^x s.t. |g(t̂^x)| = max_{t^x_j ∈ T^x} |g(t^x_j)|;
7  if τ(T^i) < |g(t̂^x)| then
8      return {t̂^x};
9  else
10     return T^i;

The intuition here is that if a transaction t1 is dependent on t2, any progress in the execution of t2 can also be considered as t1's progress since t1 cannot receive its lock unless t2 finishes execution. Thus, by granting the lock to the transaction with the largest dependency set, LDSF allows the most transactions to make progress toward completion.

Analysis — We do not make any assumptions about the future behavior of a transaction, as they may request various locks throughout their lifetime. Furthermore, since we cannot predict new transactions arriving in the future, in our analysis, we only consider the transactions that are already in the system. Since the system does not know the execution time of a transaction a priori, we model the execution time of each transaction as a memoryless random variable. That is, the time a transaction has already spent in execution does not necessarily reveal any information about the transaction's remaining execution time. We denote the remaining execution time as a random variable R with expectation R̄. We also assume that the execution time of a transaction is not affected by the scheduling.⁴ Transactions whose behavior depends on the actual wall-clock time (e.g., stop if run before 2pm, otherwise run for a long time) are also excluded from our discussion.

⁴ For example, scheduling causes context switches, which may affect performance. For simplicity, in our formal analysis, we assume that their overall effect is not significant.

We first study a simplified scenario in which there are only exclusive locks in the system (we relax this assumption in Theorem 3). The following theorem states that LDSF minimizes the expected latency in this scenario.

Theorem 2. When there are only exclusive locks in the system, the LDSF algorithm is the optimal scheduling algorithm in terms of the expected latency.

However, this does not necessarily hold true with the existence of shared locks. Even if transaction t1 is dependent on t2, the execution of t2 does not necessarily contribute to t1's progress. Specifically, consider the set of all objects that are reachable from t1 in the dependency graph, but are locked (shared or exclusively) by currently running transactions. We call these objects the critical objects of t1, and denote them as C(t1).⁵ For example, in Figure 5, we have C(t1) = {o1, o2}. Note that not all transactions that hold a lock on a critical object of t1 contribute to t1's progress. Rather, only the transaction that releases the last lock on that critical object allows for the progress of t1. In the example of Figure 5, t2's execution does not contribute to t1's progress, unless t3 releases the lock before t2.

⁵ Note that the critical objects of a transaction may change throughout its lifetime.

Figure 5: The critical objects of t1 are o1 and o2, as they are locked by transactions t2 and t3. Note that, although o3 is reachable from t1, it is not a critical object of t1 since it is locked by transactions that are not currently running, i.e., t5 and t6, which themselves are waiting for other locks.

Nonetheless, when the number of transactions waiting for each shared lock is bounded, LDSF is a constant-factor approximation of the optimal scheduler.

Theorem 3. Let the maximum number of critical objects for any transaction in the system be c. Assume that the number of transactions waiting for a shared lock on the same object is bounded by u. The LDSF algorithm is a (c·u)-approximation of the optimal scheduling (among strategies that grant all shared locks simultaneously) in terms of the expected latency.

4. SPLITTING SHARED LOCKS

In the LDSF algorithm, when a shared lock is granted, it is granted to all transactions waiting for it. In Section 4.1, we show why this may not be the best strategy. Then, in Section 4.2, we propose a modification to our LDSF algorithm, called bLDSF, which improves upon LDSF by exploiting the idea of not granting all shared locks simultaneously.

4.1 The Benefits and Challenges

As noted earlier, when the LDSF algorithm grants a shared lock, it grants the lock to all transactions waiting for it. However, this may not be the optimal strategy. In general, granting a larger number of shared locks on the same object increases the probability that at least one of them will take
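The straggler effect that motivates this section is easy to see numerically: the expected maximum of m i.i.d. remaining times grows with m. The Monte-Carlo sketch below uses exponential remaining times as a modeling choice of ours (for Exp(1), the true expected maximum of m draws is the harmonic number H_m).

```python
# Straggler effect: the expected time until the *last* of m shared-lock
# holders finishes grows with m.
import random

random.seed(7)

def est_expected_max(m, trials=20000):
    """Monte-Carlo estimate of E[max of m i.i.d. Exp(1) remaining times]."""
    return sum(max(random.expovariate(1.0) for _ in range(m))
               for _ in range(trials)) / trials

estimates = [est_expected_max(m) for m in (1, 2, 4, 8)]
# True values are H_1 = 1.00, H_2 = 1.50, H_4 ~ 2.08, H_8 ~ 2.72:
# the estimates increase strictly with m.
assert all(a < b for a, b in zip(estimates, estimates[1:]))
```

This is exactly the monotonicity that Lemma 4 (Section 4.1) establishes in general for remaining times with non-zero variance.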
a long time before releasing the lock. Until the last transaction completes and releases its lock, no exclusive locks can be granted on that object. In other words, the expected duration that the slowest transaction holds a shared lock grows with the number of transactions sharing the lock. This is the well-known problem of stragglers [21, 27, 31, 63, 79], which is exacerbated as the number of independent processes grows.

To illustrate this more formally, consider the following example. Suppose that a set of m transactions, t1, ..., tm, are sharing a shared lock. Let R^rem_1, R^rem_2, ..., R^rem_m be a set of random variables representing the remaining times of these transactions. Then, the time needed before an exclusive lock can be granted on the same object is the remaining time of the slowest transaction, denoted as R^rem_max,m = max{R^rem_1, ..., R^rem_m}, which itself is a random variable. Let R̄^rem_max,m be the expectation of R^rem_max,m. As long as the R^rem_j's have non-zero variance⁶ (i.e., σ²_j > 0), R̄^rem_max,m strictly increases with m, as stated next.

⁶ This assumption holds unless all instances of a transaction type take exactly the same time, which is unlikely.

Lemma 4. Suppose that R^rem_1, R^rem_2, ... are random variables with the same range of values. If σ²_{k+1} > 0, then R̄^rem_max,k < R̄^rem_max,k+1.

We call f(k) = R̄^rem_max,k / R̄ the delay factor: the expected time before k transactions sharing a lock all release it is f(k)·R̄, where f(1) = 1 and f is increasing in k. This raises the possibility of granting only a batch of the shared lock requests at a time, and making the remaining ones wait for an exclusive lock, as illustrated in Figure 6. However, lock scheduling in this situation becomes extremely difficult. We have the following negative result.

Figure 6: Assume that f(2) = 1.5 and f(3) = 2. If we first grant a shared lock to all of t1, t2, and t3, all transactions in t4's dependency set will wait for at least 2R̄. The total wait time will be 10R̄. However, if we only grant t1's lock, then t4's lock, and then grant t2's and t3's locks together, the transactions in t4's dependency set will only wait R̄, while those in t2's and t3's dependency sets will wait 2R̄. Thus, the total wait time in this case will be only 9R̄.

Theorem 5. Let A_¬f be the set of scheduling algorithms that do not use the knowledge of the delay factor f(k) in their decisions. For any algorithm A_¬f ∈ A_¬f, there exists an algorithm A such that w̄(A_¬f)/w̄(A) = ω(1) for some delay factor f(k), where w̄(·) denotes the expected transaction wait time.

According to this theorem, any algorithm that does not rely on knowing the delay factor is not competitive: it performs arbitrarily poorly compared to the optimal scheduling. Thus, in the next section, we take the delay factor f(k) as an input, and propose an algorithm that adopts the idea of granting shared locks in batches.

4.2 The bLDSF Algorithm

bLDSF modifies LDSF as follows. When an object o becomes available, bLDSF chooses, among the transactions waiting for a shared lock on o, a batch t̂^i_1, ..., t̂^i_k maximizing the batch priority q = |∪_{j=1}^k g(t̂^i_j)| / f(k), and compares q against the priority p = |g(t̂^x)| of the exclusive-lock requester with the largest dependency set. When q < p, bLDSF grants the exclusive lock to t̂^x. When q > p, the system will make faster progress if the batch of t̂^i_1, t̂^i_2, ..., t̂^i_k is scheduled first, in which case bLDSF will grant shared locks to t̂^i_1, t̂^i_2, ..., t̂^i_k simultaneously. When q = p, the speed of progress in the system will be the same under both scheduling decisions. In this case, bLDSF grants shared locks to the batch, in order to increase the overall degree of concurrency in the system. The pseudocode for bLDSF is provided in Algorithm 2.

We show that, when the number of transactions waiting for shared locks on the same object is bounded, the bLDSF algorithm is a constant factor approximation of the optimal scheduling algorithm in terms of the expected wait time.
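The batch-versus-exclusive comparison can be sketched as follows. For simplicity this sketch considers only prefix batches of the shared requesters in a given order (Algorithm 2 allows any maximizing subset), and the delay factor and dependency-set sizes are illustrative inputs of ours.

```python
# bLDSF batching sketch: pick the batch size k maximizing
# |union of dependency sets| / f(k), then grant the batch only if it
# beats the best exclusive requester, i.e. |g(x)| * f(k) <= |union|.
import math

def delay_factor(k):
    # Stand-in for f(k); e.g. exponential remaining times give O(log k).
    return 1.0 + math.log(k)

def bldsf_grant(shared_gsets, g_best_exclusive):
    """shared_gsets: one dependency set (of txn names) per shared
    requester, considered in the given order. Returns ('S', k) to grant
    the first k shared locks, or ('X', 1) to grant the exclusive lock."""
    best_k, best_union, best_score = 1, set(), -1.0
    union = set()
    for k, g in enumerate(shared_gsets, start=1):
        union |= g
        score = len(union) / delay_factor(k)
        if score > best_score:
            best_score, best_k, best_union = score, k, set(union)
    if g_best_exclusive * delay_factor(best_k) <= len(best_union):
        return ("S", best_k)      # grant shared locks to the best batch
    return ("X", 1)               # grant the exclusive lock instead

# Three shared requesters with overlapping dependency sets vs. an
# exclusive requester whose dependency set has size 5:
shared = [{"t1", "t5", "t6"}, {"t2", "t6"}, {"t3"}]
assert bldsf_grant(shared, g_best_exclusive=5) == ("X", 1)
```

In the example, the best batch is k = 1 with union size 3, and 5 · f(1) = 5 > 3, so the exclusive requester wins, whereas plain LDSF (which ignores f) would have compared 5 against the full union of size 5 and granted the shared locks.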
Input: The dependency graph of the system G = (V, E, L); transaction t; object o; label L ∈ {X, S} (meaning t has just released a lock of type L on o)
Output: The set of transactions whose requested lock on o should be granted
1  if there are other transactions still holding a lock on o then
2      return ∅
3  Obtain the set of transactions waiting for a shared lock on o: T^i ← {t_i ∈ V : (t_i, o) ∈ E and L(t_i, o) = S} = {t^i_1, t^i_2, ..., t^i_m}
4  Obtain the set of transactions waiting for an exclusive lock on o: T^x ← {t^x ∈ V : (t^x, o) ∈ E and L(t^x, o) = X} = {t^x_1, t^x_2, ..., t^x_n}
5  Let {t̂^i_1, t̂^i_2, ..., t̂^i_k} be the subset of T^i for which |∪_{j=1..k} g(t̂^i_j)| / f(k) is maximized
6  Let t̂^x be the transaction in T^x with the largest dependency set
7  if |g(t̂^x)| · f(k) ≤ |∪_{j=1..k} g(t̂^i_j)| then
8      return {t̂^i_1, t̂^i_2, ..., t̂^i_k}
9  else
10     return {t̂^x}
Algorithm 2: The bLDSF Algorithm

[Figure 7 diagram: t1 holds a lock on an object; t2 and t3 wait for it (approximate dependency-set sizes 2 and 2); t4 waits on both t2 and t3 (approximate size 1); t1's approximate size is 5.]

Figure 7: The effective size of t1's dependency set is 5. But its exact size is only 4.

For instance, a transaction performing a table scan will request a large number of locks, and will not make any progress until all of its locks can be granted. Thus, knowing a transaction's lock pattern in advance would also be beneficial. We leave such extensions of our algorithms (e.g., to hybrid workloads [60]) to future work.

5. IMPLEMENTATION

We have implemented our scheduling algorithm in MySQL. Similar to all major DBMSs, the default lock scheduling policy in MySQL was FIFO.^7 Specifically, all pending lock requests on an object are placed in a queue. A lock request is granted immediately upon its arrival only if one of two conditions holds.

Theorem 6. Let the maximum number of critical objects for any transaction in the system be c.
Assume that the number of transactions waiting for shared locks on the same object is bounded by v. Then, given a delay factor of f(k), the bLDSF algorithm is an h-approximation of the optimal scheduling algorithm in terms of the expected wait time, where h = c·v²·f(v).

Unlike the LDSF algorithm, bLDSF requires a delay factor for its analysis. However, since the remaining times of transactions can be modeled as random variables, the exact form of the delay factor f(k) will also depend on the distribution of these random variables. For example, the delay factor for exponential random variables is f(k) = O(log k) [22], for geometric random variables it is f(k) = O(log k) [29], for Gaussian random variables it is f(k) = O(√(log k)) [47], and for power-law random variables with exponent 3 it is f(k) = √k. In Section 6.7, we empirically show that bLDSF's performance is not sensitive to the specific choice of the delay factor, as long as it is a sub-linear function that grows monotonically with k (conditions C1, C2, and C3 from Section 4.1). This is because, when the batch size is small, the difference between all sub-linear functions is also small. For example, when b = 10, √b ≈ 3.16 and log2(1 + b) ≈ 3.46, leading to similar scheduling decisions. Even though log2(1 + √b) ≈ 1.86 is smaller than the other two, it can still capture condition C2 quite well.

The two conditions are: (i) there are no other locks currently held on the object, or (ii) the requested lock type is compatible with all of the locks currently held on the object and there are no incompatible requests ahead of it waiting in the queue. Similarly, whenever a lock is released on an object, MySQL's scheduler scans the entire queue from beginning to end. It grants any waiting request as long as one of these conditions holds. As soon as the scheduler encounters the first lock request that cannot be granted, it stops scanning the rest of the queue.

One challenge in implementing LDSF and bLDSF is keeping track of the sizes of the dependency sets. Exact calculation would require either (i) searching down the reverse edges in the dependency graph in real time, whenever a scheduling decision is to be made, or (ii) storing the dependency sets for all transactions and maintaining them each time a transaction is blocked or a lock is granted. Both options are relatively costly. Therefore, in our implementation, we rely on an approximation of the sizes of the dependency sets, rather than computing their exact values. When a transaction t holds no locks that block other transactions, |g(t)| = 1. Otherwise, let T_t be the set of transactions waiting for an object currently held by transaction t. Then, |g(t)| ≈ Σ_{t'∈T_t} |g(t')| + 1. The reason this method is only an approximation of |g(t)| is that the dependency graph is a DAG (but not necessarily a tree), which means the dependency sets of different transactions may overlap. Figure 7 illustrates an example, where the dependency set of t1 is {t1, t2, t3, t4} and is therefore of size 4. However, its effective size is calculated as one plus the sum of the effective sizes of t2's and t3's dependency sets, resulting in 5. To ensure that transactions appearing on multiple paths will not be updated multiple times, we also keep track of those that have already been updated.

4.3 Discussion

In our analysis, we have assumed no additional information regarding a transaction's remaining execution time, or its lock access pattern. However, with the recent progress on incorporating machine learning models into DBMS technology [58, 11], one might be able to predict transaction latencies [78, 59] in the near future. When such information
is available, a lock scheduling algorithm could take that into account when maximizing the speed of progress: a transaction that will take longer should be given less priority. The priority of a transaction would then be the size of its dependency set divided by its estimated execution time. Likewise, knowledge of a transaction's lock access pattern could be exploited.

^7 Now, our LDSF algorithm is the default (MySQL 8.0.3+).

Another implementation challenge lies in the difficulty of finding the desired batch of transactions in bLDSF. Calculating the size of the union of several dependency sets
requires detailed information about the elements in each dependency set (since the dependency sets may overlap due to shared locks). Therefore, we rely on the following approximation in our implementation. We first sort all transactions waiting for a shared lock in decreasing order of their dependency set sizes. Then, for k = 1, 2, ..., we calculate the q value (see Section 4.2) for the first k transactions. Here, we approximate the size of the union of the dependency sets as the sum of their individual sizes. Let k* be the k value that maximizes q. We then take the first k* transactions as our batch, which we consider for granting a shared lock to.

In Section 6, we show that, despite using these approximations in our implementation, our algorithms remain quite effective in practice.

6. EXPERIMENTS

Our experiments aim to answer several key questions:
• How do our scheduling algorithms (LDSF and bLDSF) affect the overall throughput of the system?
• How do our algorithms compare against FIFO (the default policy in nearly all databases) and VATS (recently adopted by MySQL), in terms of reducing average and tail transaction latencies?
• How do our scheduling algorithms compare against various heuristics?
• How much overhead do our algorithms incur, compared to the latency of a transaction?
• How does the effectiveness of our algorithms vary with different levels of contention?
• What is the impact of the choice of delay factor on the effectiveness of bLDSF?
• What is the impact of approximating the dependency sets (Section 5) on reducing the overhead?

Starvation Avoidance — In MySQL's implementation of FIFO, when there is an exclusive lock request in the queue, it serves as a conceptual barrier: later requests for shared locks cannot be granted, even if they are compatible with the currently held locks on the object. This mechanism prevents starvation when using FIFO.
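The scan-with-barrier behavior just described can be sketched as follows. This is a simplified illustration with only S/X lock modes, not InnoDB's actual lock code: the queue is walked in arrival order, and the first incompatible request stops the scan, so a waiting exclusive request blocks later shared requests even when those would be compatible with the held locks.

```python
def fifo_grant(queue, held):
    """Sketch of MySQL's FIFO queue scan: walk the wait queue in
    arrival order and grant every request compatible with the locks
    granted so far, stopping at the first incompatible one (a waiting
    exclusive request thus acts as a barrier for later shared ones).

    `queue` holds (txn, mode) pairs with mode 'S' or 'X'; `held` is
    the list of lock modes currently granted on the object."""
    def compatible(mode, granted):
        # Shared locks are mutually compatible; an exclusive lock
        # conflicts with every other lock.
        return all(mode == 'S' and g == 'S' for g in granted)

    granted_now = []
    modes = list(held)
    for txn, mode in queue:
        if not compatible(mode, modes):
            break  # barrier: stop at the first request that cannot be granted
        granted_now.append(txn)
        modes.append(mode)
    return granted_now
```

For example, on a free object, a queue S, S, X, S grants only the first two shared requests; the trailing shared request waits behind the exclusive barrier even though it is compatible with the granted locks.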
In our algorithms, starvation is prevented using a similar mechanism. We place a barrier at the end of the current wait queue. Lock requests that arrive later are placed behind this barrier and are not considered for scheduling. In other words, the only requests that are considered are those that are ahead of the barrier. Once all such requests are granted, this barrier is lifted, and a new barrier is added to the end of the current queue, i.e., those requests that were previously behind a barrier are now ahead of one. This mechanism prevents a transaction with a small dependency set from waiting indefinitely behind an infinite stream of newly arrived transactions with larger dependency sets. An alternative strategy to avoid starvation is to simply add a fraction of the transaction's age to its dependency set size when making scheduling decisions. A third strategy is to replace a transaction's dependency set size with a sufficiently large number once its wait time has exceeded a certain timeout threshold.

In summary, our experiments show the following:
1. By resolving contention much more effectively than FIFO and VATS, bLDSF improves throughput by up to 6.5x (by 4.5x on average) over FIFO, and by up to 2x (1.5x on average) over VATS. (Section 6.2)
2. bLDSF can reduce mean transaction latencies by up to 300x and 80x (30x and 3.5x, on average) compared to FIFO and VATS, respectively. It also reduces the 99th percentile latency by up to 190x and 16x, compared to FIFO and VATS, respectively. (Section 6.3)
3. Both bLDSF and LDSF outperform various heuristics by 2.5x in terms of throughput, and by up to 100x (8x on avg.) in terms of transaction latency. (Section 6.4)
4. Our algorithms reduce queue length by reducing contention, and thus incur much less overhead than FIFO. However, their overhead is larger than VATS. (Section 6.5)
5.
As the degree of contention rises in the system, bLDSF's improvement over both FIFO and VATS increases. (Section 6.6)
6. bLDSF is not sensitive to the specific choice of delay factor, as long as it is chosen to be an increasing and sub-linear function. (Section 6.7)
7. Our approximation technique reduces scheduling overhead by up to 80x.

Space Complexity — Given the approximation methods mentioned above, both LDSF and bLDSF only require maintaining the approximate size of the dependency set of each transaction. Therefore, the overall space overhead of our algorithms is only O(|T|).

Time Complexity — In MySQL, all lock requests on an object (whether granted or not) are stored in a linked list. Whenever a transaction releases a lock on the object, the scheduler scans this list for requests that are not yet granted. For each of these requests, the scheduler scans the list again to check compatibility with the granted requests. If the request is found compatible with all existing locks, it is granted, and the scheduler checks the compatibility of the next request. Otherwise, the request is not granted, and the scheduler stops granting further locks. Let N be the number of lock requests on an object (whether granted or not). Then, FIFO takes O(N²) time in the worst case. LDSF and bLDSF both use the same procedure as FIFO to find compatible requests that are not yet granted, which takes O(N²) time. For bLDSF, we also sort all transactions waiting for a shared lock by the size of their dependency sets, which takes O(N log N) time. Thus, the time complexity of LDSF and bLDSF is still O(N²).

6.1 Experimental Setup

Hardware & Software — All experiments were performed using a 5 GB buffer pool on a Linux server with 16 Intel(R) Xeon(R) CPU E5-2450 processors with 2.10 GHz cores. The clients were run on a separate machine, submitting transactions to MySQL 5.7 running on the server.

Methodology — We used the OLTP-Bench tool [24] to run the TPC-C workload. We also modified this tool to run a microbenchmark (explained below). OLTP-Bench generated transactions at a specified rate, and client threads issued these transactions to MySQL. The latency of each transaction was calculated as the time from when it was issued until it finished. In all experiments, we controlled the number of transactions issued per second within a safe range, to prevent MySQL from falling into a thrashing regime. We also noticed that the number of deadlocks was negligible compared
to the total number of transactions, across all experiments and algorithms.

Figure 8: Throughput improvement with bLDSF (TPC-C).
Figure 9: Avg. latency improvement with bLDSF (under the same number of TPC-C transactions per second).
Figure 10: Tail latency improvement with bLDSF (under the same number of TPC-C transactions per second).

TPC-C Workload — We used a 32-warehouse configuration for the TPC-C benchmark. To simulate a system with different levels of contention, we relied on changing the following two parameters: (i) number of clients, and (ii) number of submitted transactions per second (a.k.a. throughput). Each of our client threads issued a new transaction as soon as its previous transaction finished. Thus, by creating a specified number of client threads, we effectively controlled the number of in-flight transactions. To control the system throughput, we created client threads that issued transactions at a specific rate.

Microbenchmark — We created a microbenchmark for a more thorough evaluation of our algorithm under different degrees of contention. Specifically, we created a database with only one table that had 20,000 records in it. The clients would send transactions to the server, each comprised of 5 queries. Each query was randomly chosen to be either a "SELECT" query (acquiring a shared lock) or an "UPDATE" query (acquiring an exclusive lock). The records in the table were accessed by the queries according to a Zipfian distribution. To generate different levels of contention, we varied the following two parameters in our microbenchmark:
1. skew of the access pattern (the parameter θ of the Zipfian distribution)
2. fraction of exclusive locks (probability of "UPDATE" queries)

Baselines — We compared the performance of our bLDSF algorithm (with f(k) = log2(1 + k) as default) against the following baselines:
1. First In First Out (FIFO). FIFO is the default scheduler in MySQL and nearly all other DBMSs. When an object becomes available, FIFO grants the lock to the transaction that has waited the longest.
2. Variance-Aware Transaction Scheduling (VATS). This is the strategy proposed by Huang et al. [44]. When an object becomes available, VATS grants the lock to the eldest transaction in the queue.
3. Largest Dependency Set First (LDSF). This is the strategy described in Algorithm 1, which is equivalent to bLDSF with b = ∞ and f(k) = 1.
4. Most Locks First (MLF). When an object becomes available, grant a lock on it to the transaction that holds the most locks (introduced in Section 3.1).
5. Most Blocking Locks First (MBLF). When an object becomes available, grant a lock on it to the transaction that holds the most locks which block at least one other transaction (introduced in Section 3.1).
6. Deepest Dependency First (DDF). When an object becomes available, grant a lock on it to the transaction with the deepest dependency subgraph (Section 3.1).

For MLF, MBLF, and DDF, if a shared lock is granted, all shared locks on that object are granted. For LDSF and bLDSF, we use the barriers explained in Section 5 to prevent starvation. For FIFO and VATS, if a shared lock is granted, they continue to grant shared locks to other transactions waiting in the queue until they encounter an exclusive lock, at which point they stop granting more locks.

6.2 Throughput

We compared the system throughput when using FIFO and VATS versus bLDSF, given an equal number of clients (i.e., in-flight transactions). We varied the number of clients from 100 to 900. The results of this experiment for TPC-C are presented in Figure 8.

In both cases, the throughput dropped as the number of clients increased. This is expected, as more transactions in the system lead to more objects being locked. Thus, when a transaction requests a lock, it is more likely to be blocked. In other words, the number of transactions that can make progress decreases, which leads to a decrease in throughput.

However, the throughput decreased more rapidly when using FIFO or VATS than bLDSF. For example, when there were only 100 clients, bLDSF outperformed FIFO by only 1.4x and VATS by 1.1x. However, with 900 clients, bLDSF achieved 6.5x higher throughput than FIFO and 2x higher throughput than VATS. As discussed in Section 4.2, bLDSF always schedules transactions that maximize the speed of progress in the system. This is why it allows for more transactions to be processed in a certain amount of time.

6.3 Average and Tail Transaction Latency

We compared transaction latencies of FIFO, VATS, and bLDSF under an equal number of transactions per second (i.e., throughput). We varied the number of clients (and hence, the number of in-flight transactions) from 100 to 900 for FIFO and VATS, and then ran bLDSF at the same throughput as VATS, which is higher than the throughput of FIFO. This means that we compare bLDSF with FIFO at a higher throughput. The result is shown in Figure 9. Our bLDSF algorithm dramatically outperformed FIFO by a factor of up to 300x and VATS by 80x. This outstanding improvement confirms our Theorems 3 and 6, as our algorithm is designed to minimize average transaction latencies.
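The baselines above differ mainly in which waiting transaction they pick when an object becomes available; their selection rules can be summarized in a hypothetical sketch (the request fields below are illustrative, not MySQL's data structures, and the shared-batch and barrier behaviors are omitted):

```python
def fifo_pick(queue):
    """FIFO: grant to the request that has waited the longest."""
    return max(queue, key=lambda r: r['wait_time'])

def vats_pick(queue):
    """VATS: grant to the eldest transaction in the queue."""
    return max(queue, key=lambda r: r['age'])

def mlf_pick(queue):
    """Most Locks First: grant to the transaction holding the most locks."""
    return max(queue, key=lambda r: r['locks_held'])

def ldsf_pick(queue):
    """LDSF: grant to the transaction with the largest dependency set
    (equivalent to bLDSF with b = infinity and f(k) = 1)."""
    return max(queue, key=lambda r: r['dep_set_size'])

# Three waiting requests with illustrative attributes:
wait_queue = [
    {'id': 'a', 'wait_time': 3.0, 'age': 9.0, 'locks_held': 1, 'dep_set_size': 4},
    {'id': 'b', 'wait_time': 7.0, 'age': 2.0, 'locks_held': 5, 'dep_set_size': 1},
    {'id': 'c', 'wait_time': 1.0, 'age': 5.0, 'locks_held': 2, 'dep_set_size': 9},
]
```

On this queue the four policies pick three different transactions (b, a, b, and c, respectively), which is why their throughput and latency profiles diverge under contention.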
Figure 11: Maximum throughput under various algorithms (TPC-C).
Figure 12: Transaction latency under various algorithms (TPC-C).
Figure 13: Scheduling overhead of various algorithms (TPC-C).
Figure 14: Average number of transactions waiting in the queue under various algorithms (TPC-C).
Figure 15: Average transaction latency for different degrees of skewness (microbenchmark).
Figure 16: Average latency for different numbers of exclusive locks (microbenchmark).

We also report the 99th percentile latencies in Figure 10. Here, bLDSF outperformed FIFO by up to 190x. Interestingly, bLDSF outperformed VATS too (by up to 16x), even though the latter is specifically designed to reduce tail latencies. This is because bLDSF causes all transactions to finish faster on average, and thus, those transactions waiting at the end of the queue will also wait less, leading to lower tail latencies.

6.4 Comparison with Other Heuristics

In this section, we report our comparison of both the bLDSF and LDSF algorithms against the heuristic methods introduced in Section 3, i.e., MLF, MBLF, and DDF. Moreover, we compare our algorithms with VATS too.

First, we compared their throughput given an equal number of clients. We varied the number of clients from 100 to 900. The results are shown in Figure 11. LDSF and bLDSF achieve up to 2x and 2.5x improvement over the other heuristics in terms of throughput, respectively.

We also measured transaction latencies under an equal number of transactions per second (i.e., throughput). We varied the number of clients from 100 to 900 for the heuristics, and then ran bLDSF and LDSF at the maximum throughput achieved by any of the heuristics. For those heuristics which were not able to achieve this throughput, we compared our algorithms at a higher throughput than they achieved. The results are shown in Figure 12, indicating that MLF, MBLF, and DDF outperformed FIFO by almost 2.5x in terms of average latency, while our algorithms achieved up to 100x improvement over the best heuristic (MBLF with 900 transactions). Furthermore, bLDSF was better than LDSF by a small margin.

6.5 Scheduling Overhead

We also compared the overhead of our algorithms (LDSF and bLDSF) against both FIFO and VATS: the overhead of a scheduling algorithm is the time needed by the algorithm to decide which lock(s) to grant.

In this experiment, we fixed the number of clients to 100 while varying the throughput from 200 to 1000. The result is shown in Figure 13. We can see that, although all three algorithms have the same time complexity in terms of the queue length (Section 5), ours resulted in much less overhead than FIFO because they led to much shorter queues for the same throughput. This is because our algorithms effectively resolve contention, and thus, reduce the number of waiting transactions in the queue. To illustrate this, we also measured the average number of waiting transactions whenever an object becomes available. As shown in Figure 14, this number was much smaller for LDSF and bLDSF.
However, VATS incurred less overhead than LDSF and bLDSF, despite having longer queues. This is because VATS does not compute the sizes of the dependency sets.

6.6 Studying Different Levels of Contention

In this section, we study the impact of different levels of contention on the effectiveness of our bLDSF algorithm. Contention in a workload is a result of two factors: (i) skew in the data access pattern (e.g., popular tuples), and (ii) a large number of exclusive locks. There is more contention when the pattern is more skewed, as transactions will request a lock on the same records more often. Likewise, exclusive lock requests cause more contention, as they cannot be granted together and result in blocking more transactions. We studied the effectiveness of our algorithm under different degrees of contention by varying these two factors using our microbenchmark:
1.
We fixed the fraction of exclusive locks to be 60% of all lock requests, and varied the θ parameter of the Zipfian distribution of our access pattern between 0.5 and 0.9 (the larger θ, the more skew).
2. We fixed the θ parameter to be 0.8 and varied the probability of an "UPDATE" query in our microbenchmark between 20% and 100%. The larger this probability, the larger the fraction of exclusive locks.

Figure 17: The impact of delay factor on average latency.
Figure 18: Scheduling overhead with and without our approximation heuristic for choosing a batch.
Figure 19: CCDF of the relative error of the approximation of the sizes of the dependency sets.

First, we ran FIFO using 300 clients, and then ran both VATS and bLDSF at the same throughput as FIFO. The results of these experiments are shown in Figures 15 and 16. Figure 15 shows that when there is no skew, there is no contention, and thus most queues are either empty or have only a single transaction waiting. Since there is no scheduling decision to be made in this situation, FIFO, VATS, and bLDSF become equivalent and exhibit similar performance. However, the gap between bLDSF and the other two algorithms widens as skew (and thereby contention) increases. For example, when the data access is highly skewed (θ = 0.9), bLDSF outperforms FIFO by more than 50x and VATS by 38x.

Figure 16 reveals a similar trend: as more exclusive locks are requested, bLDSF achieves greater improvement. Specifically, when 20% of the lock requests are exclusive, bLDSF outperforms FIFO by 20x and VATS by 9x. However, when all the locks are exclusive, the improvement is even more dramatic, i.e., 70x over FIFO and 25x over VATS. Note that, although VATS guarantees optimality when there are only exclusive locks [44], it fails to account for transaction dependencies in its analysis (see Section 7 for a discussion of the assumptions made in VATS versus bLDSF). In summary, when there is no contention in the system, there are no scheduling decisions to be made, and all scheduling algorithms are equivalent. However, as contention rises, so does the need for better scheduling decisions, and so does the gap between bLDSF and other algorithms.

6.7 Choice of Delay Factor

To better understand the impact of delay factors on bLDSF, we experimented with several functions of different growth rates, ranging from the lower bound of all functions that satisfy conditions C1, C2, and C3 (i.e., f(k) = 1) to their upper bound (i.e., f(k) = k). Specifically, we used each of the following delay factors in our bLDSF algorithm, and measured the average transaction latency:
• f1(k) = 1;
• f2(k) = log2(1 + √k);
• f3(k) = log2(1 + k);
• f4(k) = √k;
• f5(k) = 0.5(1 + k);
• f6(k) = k.

The results are shown in Figure 17. We can see that all sub-linear functions (i.e., f2, f3, and f4) performed comparably, and that they performed better than the other functions. Understandably, f1 did not perform well, as it did not satisfy condition C2 from Section 4.1. Functions f5 and f6 did not perform well either, since linear functions overestimate the delay: for example, two transactions running concurrently take less time than if they ran one after another.

6.8 Approximating Sizes of Dependency Sets

We studied the effectiveness of our approximation heuristic from Section 5 for choosing a batch of shared requests. Computing the optimal batch accurately was costly, and significantly lowered the throughput. We therefore measured the scheduling overhead with and without the approximation. We ran TPC-C, and varied the number of clients from 100 to 900. As shown in Figure 18, our approximation reduced the scheduling overhead by up to 80x.

We also measured the error of our approximation technique for estimating the dependency set sizes (i.e., the deviation from the actual sizes of the dependency sets) for varying ratios of shared locks in the workload. Figure 19 shows the complementary cumulative distribution function (CCDF) of the relative error of approximating the dependency set sizes. The error grew with the ratio of shared locks; this was expected, as shared locks are the cause of error in our approximation. However, the errors remained within a reasonable range, e.g., even with 80% shared locks, we observed a 2-approximation of the exact sizes in 99% of the cases.

7. RELATED WORK

In short, the large body of work on traditional job scheduling is unsuitable in a database context due to the unique requirements of the locking protocols deployed in databases. Although there is some work on lock scheduling for real-time databases, it aims at supporting explicit deadlines rather than minimizing the mean latency of transactions.

Job Scheduling — Outside the database community, there has been extensive research on scheduling problems in general. Here, the duration (and sometimes the weight and arrival time) of each task is known a priori, and a typical goal is to minimize (i) the sum of (weighted) completion times (SCT) [64, 41, 39], (ii) the latest completion time [20, 34, 67], (iii) the completion time variance (CTV) [14, 17, 76, 49], or even (iv) the waiting time variance (WTV) [28]. The offline SCT problem can be optimally solved using a Shortest-Weighted-Execution-Time approach, whereby jobs are scheduled in non-decreasing order of their ratio of execution time to weight [70], if they all arrive at the same time. However, when the jobs arrive at different times, the scheduling becomes NP-hard [52].

None of these results are applicable to our setting, mainly because of their assumption that each processor/worker can