Pointer and Escape Analysis for Multithreaded Programs ff
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Pointer and Escape Analysis for Multithreaded Programs ∗ Alexandru Salcianu ® Martin Rinard Laboratory for Computer Science Laboratory for Computer Science Massachusetts Institute of Technology Massachusetts Institute of Technology Cambridge, MA 02139 Cambridge, MA 02139 salcianu@lcs.mit.edu rinard@lcs.mit.edu ABSTRACT threads packages such as POSIX threads is unstructured — This paper presents a new combined pointer and escape child threads execute independently of their parent threads. analysis for multithreaded programs. The algorithm uses The software structuring techniques described above are de- a new abstraction called parallel interaction graphs to an- signed to work with this form of multithreading, as are alyze the interactions between threads and extract precise many recommended design patterns [13]. But because the points-to, escape, and action ordering information for ob- lifetimes of child threads potentially exceed the lifetime of jects accessed by multiple threads. The analysis is compo- their starting procedure, unstructured multithreading sig- sitional, analyzing each method or thread once to extract nificantly complicates the interprocedural analysis of multi- a parameterized analysis result that can be specialized for threaded programs. use in any context. It is also capable of analyzing programs that use the unstructured form of multithreading present in 1.1 Analysis Algorithm languages such as Java and standard threads packages such This paper presents a new combined pointer and escape as POSIX threads. analysis for multithreaded programs, including programs We have implemented the analysis in the MIT Flex com- with unstructured forms of multithreading. The algorithm piler for Java and used the extracted information to 1) ver- is based on a new abstraction, parallel interaction graphs, ify that programs correctly use region-based allocation con- which maintain precise points-to, escape, and action order- structs, 2) eliminate dynamic checks associated with the use ing information for objects accessed by multiple threads. of regions, and 3) eliminate unnecessary synchronization. Unlike previous escape analysis abstractions, parallel inter- Our experimental results show that analyzing the interac- action graphs enable the algorithm to analyze the interac- tions between threads significantly increases the effective- tions between parallel threads. The analysis can therefore ness of the region analysis and region check elimination, but capture objects that are accessed by multiple threads but has little effect for synchronization elimination. do not escape a given multithreaded computation. It can also fully characterize the points-to relationships for objects 1. INTRODUCTION accessed by multiple parallel threads. Because parallel interaction graphs characterize all of the Multithreading is a key structuring technique for modern potential interactions of the analyzed method or thread with software. Programmers use multiple threads of control for its callers and other parallel threads, the resulting analysis many reasons: to build responsive servers that communicate is compositional at both the method and thread levels — with multiple parallel clients [15], to exploit the parallelism it analyzes each method or thread once to produce a single in shared-memory multiprocessors [5], to produce sophisti- general analysis result that can be specialized for use in any cated user interfaces [16], and to enable a variety of other context.1 Finally, the combination of points-to and escape program structuring approaches [11]. information in the same abstraction enables the algorithm Research in program analysis has traditionally focused to analyze only part of the program, with the analysis result on sequential programs [14]; extensions for multithreaded becoming more precise as more of the program is analyzed. programs have usually assumed a block structured, parbe- gin/parend form of multithreading in which a parent thread 1.2 Application to Region-Based Allocation starts several parallel threads, then immediately blocks wait- ing for them to finish [12, 19]. But the standard form of We have implemented our analysis in the MIT Flex com- multithreading supported by languages such as Java and piler for Java. The information that it produces has many potential applications in compiler optimizations, software ∗ The research was supported in part by DARPA/AFRL Con- engineering, and as a foundation for further program anal- tract F33615-00-C-1692, NSF Grant CCR00-86154, and NSF Grant ysis. This paper presents our experience using the analysis CCR00-63513. to optimize and check safety conditions for programs that Permission to make digital or hard copies of all or part of this work for use region-based allocation constructs instead of relying on personal or classroom use is granted without fee provided that copies are not made or distributed for pro£t or commercial advantage and that copies garbage collection. Region-based allocation allows the pro- bear this notice and the full citation on the £rst page. To copy otherwise, to gram to run (a potentially multithreaded) computation in republish, to post on servers or to redistribute to lists, requires prior speci£c 1 permission and/or a fee. Recursive methods or recursively generated threads may require an PPoPP’01, June 18-20, 2001, Snowbird, Utah, USA. iterative algorithm that may analyze methods or threads in the same Copyright 2001 ACM 1-58113-346-401/0006 ...$5.00. strongly connected component multiple times to reach a fixed point.
the context of a specific allocation region. All objects cre- computation. Each Task stores an Integer object in its ated by the computation are allocated in the region and source field as input and produces a new Integer object in deallocated when the computation finishes. To avoid dan- its target field as output.2 gling references, the implementation must ensure that the This program illustrates several common patterns for mul- objects in the region do not outlive the associated compu- tithreaded programs. First, it uses threads to implement tation. One standard way to achieve this goal is to dynam- parallel computations. Second, when a thread starts its ex- ically check that the program never attempts to create a ecution, it points to objects that hold the input data for reference from one object to another object allocated in a its computation. Finally, when the computation finishes, it region with a shorter lifetime [4]. If the program does at- writes references to its result objects into its thread object tempt to create such a reference, the implementation refuses for the parent computation to read. to create the reference and throws an exception. Unfortu- nately, this approach imposes dynamic checking overhead class main { and introduces a new failure mode for programs that use public static void main(String args[]) { int i = Integer.parseInt(args[0]); region-based allocation. Fib f = new Fib(i); We have used our analysis to statically verify that our Region r = new Region(); multithreaded benchmark programs use region-based allo- r.enter(f); cation correctly. It therefore provides a safety guarantee to } the programmer and enables the compiler to eliminate the } dynamic region reference checks. We also found that in- class Fib implements Runnable { int source; trathread analysis alone is not powerful enough — the algo- rithm must analyze the interactions between parallel threads Fib(int i) { source = i; } to verify the correct use of region-based allocation. We also used our analysis for the more traditional pur- public void run() { pose of synchronization elimination. While our algorithm is Task t = new Task(new Integer(source)); quite effective at enabling this optimization, for our multi- t.start(); try { threaded benchmarks, the interthread analysis provides lit- t.join(); tle additional benefit over the standard intrathread analysis. } catch (Exception e) { System.out.println(e); } System.out.println(t.target.toString()); 1.3 Contributions } This paper makes the following contributions: } class Task extends Thread { • Abstraction: It presents a new abstraction, paral- public Integer source; lel interaction graphs, for the combined pointer and public Integer target; escape analysis of programs with unstructured multi- Task(Integer s) { source = s; } threading. public void run() { • Analysis: It presents a new algorithm for analyz- int v = source.intValue(); ing multithreaded programs. The algorithm is com- if (v
these objects are contained in the lifetime of the Fibonacci 2.4 Analysis in the Example computation, and die when this computation finishes. A We use a generalized escape analysis to determine whether standard memory management system would not exploit any object allocated in a given region escapes the compu- this property. The Task and Integer objects would be allo- tation associated with the region. If none of the objects cated out of the garbage-collected heap, increasing the mem- escape, the program will never attempt to create a dangling ory consumption rate, the garbage collection frequency, and reference and the compiler can eliminate all of the checks. therefore the garbage collection overhead. The algorithm first performs an intrathread, interprocedural Region-based allocation provides an attractive alterna- analysis to derive a parallel interaction graph at the end of tive. Instead of allocating all objects out of a single garbage- each method. Figures 2 and 3 present the analysis results for collected heap, region-based approaches allow the program the run methods in the Fib and Task classes, respectively. to create multiple memory regions, then allocate each object in a specific region. When the program no longer needs any 2.4.1 Points-to Graphs of the objects in the region, it deallocates all of the objects The first component of the parallel interaction graph is the in that region without garbage collection. points-to graph. The nodes in this graph represent objects; Researchers have proposed many different region-based al- the edges represent references between objects. There are location systems. Our example (and our implemented sys- two kinds of edges: inside edges, which represent references tem) uses the approach standardized in the Real-Time Java created within the analyzed part of the program (for Fig- specification [4]. Before the main program invokes the Fi- ure 2, the sequential computation of the Fib.run method), bonacci computation, it creates a new memory region r. and outside edges, which represent references read from ob- The statement r.enter(f) executes the run method of the jects potentially accessed outside the analyzed part of the f object (and all of the methods or threads that it exe- program. In our figures, solid lines denote inside edges and cutes) in the context of the new region r. When one of dashed lines denote outside edges. the threads in this computation creates a new object, the There are also several kinds of nodes. Inside nodes rep- object is allocated in the region r. When the entire multi- resent objects created within the analyzed part of the pro- threaded computation terminates, all of the objects in the gram. There is one inside node for each object creation region are deallocated without garbage collection. The Task site in the program; that node represents all objects created and Integer objects are therefore managed independently of at that site. Parameter nodes represent objects passed as the garbage collected heap and do not increase the garbage parameters to the currently analyzed method; load nodes collection frequency or overhead. Region-based allocation represent objects accessed by reading a reference in an ob- is an attractive alternative to garbage collection because it ject potentially accessed outside the analyzed part of the exploits the correspondence between the lifetimes of objects program. Together, the parameter and load nodes make up and the lifetimes of computations to deliver a more efficient the set of outside nodes. In our figures, solid circles denote memory management mechanism. inside nodes and dashed circles denote outside nodes. In Figure 2, nodes 1 and 4 are outside nodes. Node 1 2.3 Regions and Dangling Reference Checks represents the this parameter of the method, while node 4 One potential problem with region-based allocation is the represents the object whose reference is loaded by the ex- possibility of dangling references. If an object whose lifetime pression t.target at line 2 of the example at the end of the exceeds the region’s lifetime refers to an object allocated in- Fib.run method. Nodes 2 and 3 are inside nodes, and de- side the region, any use of the reference after the region is note the Task and Integer objects created in the statement deallocated will access potentially recycled garbage, violat- Task t = new Task(new Integer(source)) at line 1 of the ing the memory safety of the program. The Real-Time Java example. specification eliminates this possibility as follows. It allows the computation to create a hierarchy of nested regions and 2.4.2 Started Thread Information ensures that no parent region is deallocated before one of its The parallel interaction graph contains information about child regions. Each region is associated with a (potentially which threads were started by the analyzed part of the pro- multithreaded) computation; the objects in the region are gram. In Figure 2, node 2 represents the started Task thread deallocated when its computation terminates and the ob- that implements the entire Fibonacci computation. In Fig- jects in all of its child regions have been deallocated. The ure 3, nodes 8 and 11 represent the two threads that im- implementation dynamically checks all assignments to ob- plement the parallel subtasks in the computation. The in- ject fields to ensure that the program never attempts to terthread analysis uses the started thread information when create a reference that goes down the hierarchy from an ob- it computes the interactions between the current thread and ject in an ancestor region to an object in a child region. If threads that execute in parallel with the current thread. the program does attempt to create such a reference, the check fails. The implementation prevents the assignment 2.4.3 Escape Information from taking place and throws an exception. The parallel interaction graph contains information about While these checks ensure the memory safety of the exe- how objects escape the analyzed part of the program to be cution, they impose additional execution time overhead and accessed by the unanalyzed part. A node escapes if it is a introduce a new failure mode for the software. Our goal is parameter node or represents an unanalyzed thread started to analyze the program and statically verify that the checks within the analyzed part of the program. It also escapes if never fail. Such an analysis would enable the compiler to it is reachable from an escaped node. In Figure 2, node 1 eliminate all of the dynamic region checks. It would also escapes because it is passed as a parameter, while nodes 3 provide the programmer with a guarantee that the program and 4 escape because they are reachable from the unana- would never throw an exception because a check failed. lyzed thread node 2.
Points-to Information Escape Information this 1 Points-to Information 1 is a parameter node Escape Information 3 is an unanalyzed 2 this 1 source started thread node t 2 3 is reachable from 2 1 is a parameter node 3 target source 4 is reachable from 2 4 is an unanalyzed t 2 8 started thread node target 9 is reachable from 8 inside edge inside node 7 10 is reachable from 8 outside edge outside node 9 is an unanalyzed source 11 Figure 2: Analysis Result for Fib.run started thread node 8 12 is reachable from 11 Points-to Information Escape Information target 10 13 is reachable from 11 6 source this 5 5 is a parameter node 12 target source 6 is reachable from 5 7 11 7 is reachable from 5 target 9 is an unanalyzed 13 source 8 started thread node t1 8 9 is reachable from 8 Figure 5: Analysis Result After First Interthread Analysis target 10 is reachable from 8 10 Points-to Information Escape Information is an unanalyzed 11 started thread node this 1 1 is a parameter node 12 source 12 is reachable from 11 3 t2 11 13 is reachable from 11 source target t 2 13 target Figure 3: Analysis Result for Task.run 7 Points-to Information Points-to Information source from Fib.Run from Task.Run 8 9 Mappings this 1 target 3 6 source source source 11 12 t 2 5 target target target 4 7 Figure 6: Final Analysis Result for Fib.run Figure 4: Mappings for Interthread Analysis of Fib.run and Task.run
2.5 Interthread Analysis Note that the matching process models interactions in which Previously proposed escape analyses treat threads very one thread reads references created by the other thread. Be- conservatively — if an object is reachable from a thread ob- cause the threads execute in parallel, the matching is sym- ject, the analyses assume that it has permanently escaped [2, metric. 3, 6, 22]. Our algorithm, however, analyzes the interactions The analysis uses µ1 and µ2 to combine the two parallel between threads to recapture objects accessed by multiple interaction graphs and obtain a new graph that represents threads. The foundation of the interthread analysis is the the combined effect of the two threads. Figure 5 presents construction of two mappings µ1 and µ2 between the nodes this graph, which the analysis computes as follows: of the parallel interaction graphs of the parent and child threads. Each outside node is mapped to another node if • Edge Projections: The analysis projects the edges the two nodes represent the same object during the analy- through the mappings to augment nodes from one par- sis. The mappings are used to combine the parallel interac- allel interaction graph with edges from the other graph. tion graph from the child thread into the parallel interaction In our example, the analysis projects the inside edge graph from the parent thread. The result is a new parallel from node 5 to node 6 through µ2 to generate new interaction graph that summarizes the parallel execution of inside edges from node 2 to nodes 3 and 7. It also the two threads. generates other edges involving outside nodes, but re- Figure 4 presents the mappings from the interthread anal- moves these edges during the simplification step. ysis of Fib.run and the Task.run method for the thread that Fib.run starts. The algorithm computes these mappings as • Graph Combination: The analysis combines the follows: two graphs, omitting the outside node that represents the this parameter of the started thread (node 5 in • Initialization: Inside the Fib.run method, node 2 our example). represents the started Task thread. Inside the Task.run method, node 5 represents the same started thread. • Simplification: The analysis removes all outside edges The algorithm therefore initializes µ2 to map node 5 from captured nodes, all outside nodes that are not to node 2. reachable from a parameter node or unanalyzed started thread node, and all inside nodes that are not reach- • Matching target edges: The analysis of the Task.run able from a live variable, parameter node, or unana- method creates inside edges from node 5 to nodes 6 lyzed started thread node. and 7. These edges have the label target, and rep- In our example, the analysis recaptures the (now analyzed) resent references between the corresponding Task and thread node 2. Nodes 3 and 7 are also captured even though Integer objects during the execution of the Task.run they are reachable from a thread node. The analysis removes method. nodes 4 and 6 in the new graph because they are not reach- The Fib.run method reads these references to obtain able from a parameter node or unanalyzed thread node. the result of the Task.run method. The outside edge Note that because the interactions with the thread nodes from node 2 to node 4 represents these references dur- 8 and 11 have not yet been analyzed, those nodes and all ing the analysis of the Fib.run method. The analysis nodes reachable from them escape. therefore matches the outside edge from the Fib.run Because our example program uses recursively generated method (from node 2 to node 4) against the inside parallelism, the analysis must perform a fixed point compu- edges from the Task.run method to compute that node tation during the interthread analysis. Figure 6 presents the 4 represents the same objects as nodes 6 and 7. The final parallel interaction graph from the end of the Fib.run result is that µ1 maps node 4 to nodes 6 and 7. method, which is the result of this fixed point analysis. The analysis has recaptured all of the inside nodes, including the • Matching source edges: The analysis of the Fib.run task nodes. Because none of the objects represented by these method creates an inside edge from node 2 to node 3. nodes escapes the computation of the Fib.run method, its This edge has the label source, and represents a ref- execution in a new region will not violate the region refer- erence between the corresponding Task and Integer encing constraints. objects during the execution of the Fib.run method. The Task.run method reads this reference to obtain 3. ANALYSIS ABSTRACTION its input. The outside edge from node 5 to node 6 We next formally present the abstraction (parallel interac- represents this reference during the analysis of the tion graphs) that the analysis uses. In addition to the points- Task.run method. The interthread analysis therefore to and escape information discussed in Section 2, parallel matches the outside edge from the Task.run method interaction graphs can also represent ordering information (from node 5 to node 6) against the inside edge from between actions (such as synchronization actions) from par- the Fib.run method (from node 2 to node 3) to com- ent and child threads. This ordering information enables the pute that node 6 represents the same objects as node analysis to determine when thread start events temporally 3. The result is that µ2 maps node 6 to node 3. separate actions of parent and child threads. This infor- mation may, for example, enable the analysis to determine • Transitive Mapping: Because µ1 maps node 4 to that a parent thread performs all of its synchronizations on node 6 and µ2 maps node 6 to node 3, the analysis a given object before a child thread starts its execution and computes that node 4 represents the same object as synchronizes on the object. To simplify the presentation, we node 3. The result is that µ1 maps node 4 to node 3. assume that the program does not use static class variables,
all the methods are analyzable and none of the methods re- – hhsync, n1 , n2 i, ni ∈ π if the synchronization ac- turns a result. Our implemented analysis correctly handles tion hsync, n1 , n2 i may have happened after one all of these aspects [20]. of the threads represented by n started executing. In this case, the actions of a thread represented 3.1 Object Representation by n may conflict with the action. The analysis represents the objects that the program ma- – hhn1 , f, n2 i, ni ∈ π if a reference represented by nipulates using a set n ∈ N of nodes, which is the disjoint the outside edge hn1 , f, n2 i may have been read union of the set NI of inside nodes and the set NO of outside after one of the threads represented by n started nodes. The set of thread nodes NT ⊆ NI represents thread executing. In this case, the outside edge may rep- objects. The set of outside nodes is the disjoint union of the resent a reference written by a thread represented set NL of load nodes and the set NP of parameter nodes. by n. There is also a set f ∈ F of fields in objects, a set v ∈ V of local and parameter variables, and a set l ∈ L ⊆ V of local We use the notation π@n = {a|ha, ni ∈ π} to denote the set variables. of actions and outside edges in π that may occur in parallel with a thread represented by n. 3.2 Points-To Escape Graphs A points-to escape graph is a triple hO, I, ei, where 4. ANALYSIS ALGORITHM • O ⊆ N × F × NL is a set of outside edges. We use the For each program point, the algorithm computes a paral- notation O(n1 , f) = {n2 |hn1 , f, n2 i ∈ O}. lel interaction graph for the current analysis scope at that point. For the intraprocedural analysis, the analysis scope • I ⊆ (N × F × N ) ∪ (V × N ) is a set of inside edges. is the currently analyzed method up to that point. The We use the notation I(v) = {n|hv, ni ∈ I}, I(n1 , f) = interprocedural analysis extends the scope to include the {n2 |hn1 , f, n2 i ∈ I}. (transitively) called methods; the interthread analysis fur- ther extends the scope to include the started threads. • e : N → P(N ) is an escape function that records the We next present the analysis, identifying the program rep- escape information for each node.3 A node escapes if resentation, the different phases, and the key algorithms in it is reachable from a parameter node or from a node the interprocedural and interthread phases. that represents an unanalyzed parallel thread. 4.1 Program Representation The escape function must satisfy the invariant that if n1 The algorithm represents the computation of each method points to n2 , then n2 escapes in at least all of the ways that using a control flow graph. We assume the program has been n1 escapes. When the analysis adds an edge to the points- preprocessed so that all statements relevant to the analy- to escape graph, it updates the escape function so that it sis are either a copy statement l = v, a load statement satisfies this invariant. We define the concepts of escaped l1 = l2 .f, a store statement l1 .f = l2 , a synchroniza- and captured nodes as follows: tion statement l.acquire() or l.release(), an object cre- ation statement l = new cl, a method invocation statement • escaped(hO, I, ei, n) if e(n) 6= ∅ l0 .op(l1 , . . . , lk ), or a thread start statement l.start(). • captured(hO, I, ei, n) if e(n) = ∅ The control flow graph for each method op starts with an enter statement enterop and ends with an exit statement 3.3 Parallel Interaction Graphs exitop . A parallel interaction graph is a tuple hhO, I, ei, τ, α, πi: 4.2 Intraprocedural Analysis • The thread set τ ⊆ N represents the set of unanalyzed The intraprocedural analysis is a forward dataflow analy- thread objects started by the analyzed computation. sis that propagates parallel interaction graphs through the statements of the method’s control flow graph. Each method • The action set α records the set of actions executed is analyzed under the assumption that the parameters are by the analyzed computation. Each synchronization maximally unaliased, i.e., point to different objects. For a action hsync, n1 , n2 i ∈ α has a node n1 that represents method with formal parameters v0 , . . . , vn , the initial par- the object on which the action was performed and a allel interaction graph at the entry point of the method is node n2 that represents the thread that performed the hh∅, {hvi , nvi i}, λn.if n = nvi then {n} else ∅i, ∅, ∅, ∅i, where action. If the action was performed by the current nvi is the parameter node for parameter vi . If the method thread, n2 is the dummy current thread node nCT ∈ is invoked in a context where some of the parameters may NT . Our implementation can also record actions such point to the same object, the interprocedural analysis de- as reading an object, writing an object, or invoking scribed below in Section 4.4 merges parameter nodes to con- a given method on an object. It is straightforward servatively model the effect of the aliasing. to generalize the concept of actions to include actions The transfer function hG0 , τ 0 , α0 , π 0 i = [[st]](hG, τ, α, πi) performed on multiple objects. models the effect of each statement st on the current par- allel interaction graph. Figure 7 graphically presents the • The action order π records ordering information be- rules that determine the new points-to graphs for the differ- tween the actions of the current thread and threads ent basic statements. Each row in this figure contains four that execute in parallel with the current thread. items: a statement, a graphical representation of existing 3 Here P(N ) is the set of all subsets of N , so that e(n) is the set of edges, a graphical representation of the existing edges plus nodes through which n escapes. the new edges that the statement generates, and a set of side
conditions. The interpretation of each row is that whenever the points-to escape graph contains the existing edges and the side conditions are satisfied, the transfer function for the statement generates the new edges. Assignments to a variable kill existing edges from that variable; assignments to fields of objects leave existing edges in place. In addition to updating the outside and inside edge sets, the transfer function also updates the the escape function e to ensure that if n1 points to n2 , then n2 escapes in at least all of the ways that n1 escapes. Except for load statements, the transfer functions leave τ , α, and π unchanged. For a load statement l1 = l2 .f the transfer function updates the action order π to record that any new outside edges may be created in parallel with the threads modeled by the nodes in τ (here nL is the load node for l1 = l2 .f): π 0 = π ∪ {hn1 , f, nL i|n1 ∈ I(l2 ) ∧ escaped(hO, I, ei, n1 )} × τ Figure 8 presents the transfer function for an l.start() statement, which adds the started thread nodes to τ and updates the escape function. Figure 9 presents the trans- where fer function for synchronization statements, which add the corresponding synchronization actions into α and record the actions as executing in parallel with all of the nodes in τ . At control-flow merges, the confluence operation takes the union of the inside and outside edges, thread sets, actions, and action orders. 4.3 Mappings Mappings µ : N → P(N ) implement the substitutions that take place when combining parallel interaction graphs. During the interprocedural analysis, for example, a param- eter node from a callee is mapped to all of the nodes at the call site that may represent the corresponding actual pa- rameter. Given an analysis component ξ, ξ[µ] denotes the component after replacing each node n in ξ with µ(n):4 S where τ [µ]=Sn∈τ µ(n) O[µ]= hn,f,n i∈O µ(n) × {f} × {nL } S L S I[µ]= hn ,f,n i∈I µ(n1 ) × {f} × µ(n2 ) ∪ hv,ni∈I {v} × µ(n) S 1 2 α[µ]= hsync,n ,n i∈α {sync} × µ(n1 ) × µ(n2 ) S 1 2 π[µ]= hhsync,n ,n i,ni∈π ({sync} × µ(n1 ) × µ(n2 )) × µ(n)∪ S 1 2 hhn1 ,f,n2 i,ni∈π (µ(n1 ) × {f} × µ(n2 )) × µ(n) 4.4 Interprocedural Analysis The interprocedural analysis computes a transfer function Figure 7: Generated Edges for Basic Statements for each method invocation statement. We assume a method invocation site of the form l0 .op(l1 , . . . , lk ), a potentially invoked method op with formal parameters v0 , . . . , vk with τ 0 =τ∪ I(l) corresponding parameter nodes nv0 , nv1 , . . . , nvk , a paral- e(n) ∪ {n0 } if n0 ∈ I(l) and lel interaction graph hhO1 , I1 , e1 i, τ1 , α1 , π1 i at the program 0 e (n)= n is reachable in O ∪ I from n0 point before the method invocation site, and a graph e(n) otherwise hhO2 , I2 , e2 i, τ2 , α2 , π2 i from the exit statement of op. The interprocedural analysis has two steps. It first computes a mapping µ for the outside nodes from the callee. It then uses Figure 8: Transfer Function for l.start() µ to combine the two parallel interaction graphs to obtain the parallel interaction graph at the program point immedi- α0 =α ∪ {sync} × I(l) × {nCT } ately after the method invocation. The analysis computes π 0 =π ∪ ({sync} × I(l) × {nCT }) × τ µ as the least fixed point of the following constraints: Figure 9: Transfer Function for l.acquire() and 4 The only exception is in the definition of O[µ] where we do not l.release() substitute the load node nL that constitutes the end point of an outside edge hn, f, nL i.
current implementation obtains this call graph information using a variant of a cartesian product type analysis [1], but I1 (li ) ⊆ µ(nvi ), ∀i ∈ {0, 1, . . . k} (1) it can use any conservative approximation to the dynamic call graph. hn1 , f, n2 i ∈ O2 , hn3 , f, n4 i ∈ I1 , n3 ∈ µ(n1 ) The analysis uses a worklist algorithm to solve the com- (2) n4 ∈ µ(n2 ) bined intraprocedural and interprocedural dataflow equa- tions. A bottom-up analysis of the program yields the full hn1 , f, n2 i ∈ O2 , hn3 , f, n4 i ∈ I2 , result with one analysis per strongly connected component µ(n1 ) ∩ µ(n3 ) 6= ∅, n1 6= n3 (3) of the call graph. Within strongly connected components, µ(n4 ) ∪ {n4 } ⊆ µ(n2 ) the algorithm iterates to a fixed point. The first constraint initializes µ; the next two constraints 4.5 Thread Interaction extend µ. Constraint 1 maps each parameter node from the Interactions between threads take place between a starter callee to the nodes from the caller that represent the actual thread (a thread that starts a parallel thread) and a startee parameters at the call site. Constraint 2 matches outside thread (the thread that is started). The interaction algo- edges read by the callee against corresponding inside edges rithm is given the parallel interaction graph hhO, I, ei, τ, α, πi from the caller. Constraint 3 matches outside edges from the from a program point in the starter thread, a node nT that callee against inside edges from the callee to model aliasing represents the startee thread, and a run method that runs between callee nodes. when the thread object represented by nT starts. The par- The algorithm next extends µ to µ0 to ensure that all allel interaction graph associated with the exit statement nodes from the callee (except the parameter nodes) appear of the run method is hhO2 , I2 , e2 i, τ2 , α2 , π2 i. The result in the new parallel interaction graph: of the thread interaction algorithm is a parallel interaction ½ graph hhO0 , I 0 , e0 i, τ 0 , α0 , π 0 i that models all the interactions 0 µ(n) if n ∈ NP µ (n) = between the execution of the starter thread (up to its corre- µ(n) ∪ {n} otherwise sponding program point) and the entire startee thread. This The algorithm computes the new parallel interaction graph result conservatively models all possible interleavings of the hhO0 , I 0 , e0 i, τ 0 , α0 , π 0 i at the program point after the method two threads. invocation as follows: The algorithm has two steps. It first computes two map- pings µ1 , µ2 , where µ1 maps outside nodes from the starter O0 = O1 ∪ O2 [µ0 ] I 0 = I1 ∪ (I2 − V × N )[µ0 ] 0 0 and µ2 maps outside nodes from the startee. It then uses µ1 τ = τ1 ∪ τ2 [µ ] α0 = α1 ∪ α2 [µ0 ] and µ2 to combine the two parallel interaction into a single π 0 = π1 ∪ π2 [µ0 ] ∪ (O2 [µ0 ] ∪ α2 [µ0 ]) × τ1 parallel interaction graph that reflects the interactions be- It computes the new escape function e0 as the union of the tween the two threads. The algorithm computes µ1 and µ2 escape function e1 before the method invocation and the as the least fixed point of the following constraints: expansion of the escape function e2 from the callee through nT ∈ µ2 (nv0 ), nT ∈ µ2 (nCT ) (4) µ0 . More formally, the following constraints define the new escape function e0 as hn1 , f, n2 i ∈ Oi , hn3 , f, n4 i ∈ Ij , n3 ∈ µi (n1 ) n2 ∈ µ0 (n1 ) (5) 0 e1 (n) ⊆ e (n) n4 ∈ µi (n2 ) (e2 (n1 ) − NP )[µ0 ] ⊆ e0 (n2 ) propagated over the edges from O0 ∪ I 0 . After the interpro- hn1 , f, n2 i ∈ Oi , hn3 , f, n4 i ∈ Ii , cedural analysis, reachability from the parameter nodes of µi (n1 ) ∩ µi (n3 ) 6= ∅, n1 6= n3 (6) the callee is no longer relevant for the escape function, hence µi (n4 ) ∪ {n4 } ⊆ µi (n2 ) the set difference in the second initialization constraint. We have a proof that this interprocedural analysis produces to hn1 , f, n2 i ∈ Ii , hn3 , f, n4 i ∈ Oj , n3 ∈ µi (n1 ) (7) a parallel interaction graph that is at least as conservative n2 ∈ µj (n4 ) as the one that would be obtained by inlining the callee and performing the intraprocedural analysis as in section n2 ∈ µi (n1 ), n3 ∈ µj (n2 ) 4.2 [20]. (8) n3 ∈ µi (n1 ) Finally, we simplify the resulting parallel interaction graph by removing superfluous nodes and edges. We remove all Here nv0 is the parameter node associated with the single load nodes nL such that e0 (nL ) = ∅ from the graph; such parameter of the run method – the this pointer – and nCT load nodes do not represent any concrete object. We also is the dummy current thread node. Also, I1 = I and O1 = remove all all outside edges hn1 , f, n2 i that start from a cap- O ∩ (π@nT ). Note that the algorithm computes interactions tured node n1 (where e0 (n1 ) = ∅); such outside edges do only for outside edges from the starter thread that represent not represent any concrete reference. Finally, we remove all references read after the startee thread starts. nodes that are not reachable from a live variable, parameter Unlike the caller/callee interaction, where the execution node, or unanalyzed started thread node from τ 0 . of the caller is suspended during the execution of the callee, Because of dynamic dispatch, a single method invocation in the starter/startee interaction, both threads execute in site may invoke several different methods. The transfer func- parallel, producing a more complicated set of statement in- tion therefore merges the parallel interaction graphs from all terleavings. The interthread analysis must therefore model a potentially invoked methods to derive the parallel interac- richer set of potential interactions in which each thread can tion graph at the point after the method invocation site. The read edges created by the other thread. The interthread
analysis therefore uses two mappings (one for each thread) not escape via a parameter node, it is captured after the in- instead of just one mapping. It also augments the con- terthread analysis. Finally the algorithm enhances the effi- straints to reflect the potential interactions. ciency and precision of the analysis by removing superfluous In the same style as in the interprocedural analysis, the nodes and edges using the same simplification method as in algorithm first initializes the mappings µ01 , µ02 to extend µ1 the interprocedural analysis. and µ2 , respectively. Each node from the two initial par- As presented, the algorithm assumes that each node n ∈ τ allel interaction graphs (except nv0 ) will appear in the new represents multiple instances of the corresponding thread. parallel interaction graph: Our implementation improves the precision of the analysis by tracking whether each node represents a single thread or µ01 (n) = ½ µ1 (n) ∪ {n} multiple threads. For nodes that represent a single thread, 0 µ2 (n) if n = nv0 µ2 (n) = the algorithm computes the interactions just once, adjusting µ2 (n) ∪ {n} otherwise the new action order π 0 to record that the outside edges and The algorithm uses µ01 and µ02 to compute the resulting par- actions from the startee thread do not occur in parallel with allel interaction graph as follows: the node n that represents the startee thread. For nodes that represent multiple threads, the algorithm repeatedly O0 = O[µ01 ] ∪ O2 [µ02 ] I 0 = I[µ01 ] ∪ (I2 − V × N )[µ02 ] computes the interactions until it reaches a fixed point. 0 0 0 τ = τ [µ1 ] ∪ τ2 [µ2 ] α0 = α[µ01 ] ∪ α2 [µ02 ] π 0 = π[µ01 ] ∪ π2 [µ02 ] ∪ 4.7 Resolving Outside Nodes (O2 [µ02 ] ∪ α2 [µ02 ]) × τ [µ01 ] ∪ π@nT [µ01 ] × τ2 [µ02 ] It is possible to augment the algorithm so that it records, In addition to combining the action orderings from the for each outside node, all of the inside nodes that it rep- starter and startee, the algorithm also updates the new ac- resents during the analysis of the entire program. This in- tion order π 0 to reflect the following ordering relationships: formation allows the algorithm to go back to the analysis results generated at the various program points and resolve • All actions and outside edges from the startee occur in each outside node to the set of inside nodes that it represents parallel with all of the starter’s threads, and during the analysis. In the absence of nodes that escape via unanalyzed threads or methods, this enables the algorithm • All actions and outside edges from the starter thread to obtain complete, precise points-to information even for that occur in parallel with the startee thread also oc- analysis results that contain outside nodes. cur in parallel with all of the threads that the startee starts. 5. ANALYSIS USES The new escape function e0 is the union of the escape func- tion e from the starter and the escape function e2 from the We next discuss how we use the analysis results to perform startee, expanded through µ1 and µ2 , respectively. More two optimizations: region reference check elimination and formally, the escape function e0 is initialized by the follow- synchronization elimination. ing two constraints 5.1 Region Reference Check Elimination n2 ∈ µ1 (n1 ) n2 ∈ µ2 (n1 ) The analysis eliminates region reference checks by verify- e(n1 )[µ1 ] ⊆ e0 (n2 ) (e2 (n1 ) − NP )[µ2 ] ⊆ e0 (n2 ) ing that no object allocated in a given region escapes the computation that executes in the context of that region. In and propagated over the edges from O0 ∪ I 0 . our system, all such computations are invoked via the execu- 4.6 Interthread Analysis tion of a statement of the form r.enter(t). This statement causes the the run method of the thread t to execute in the The interthread analysis uses a fixed-point algorithm to context of the memory region r. The analysis first locates obtain a single parallel interaction graph that reflects the in- all of these run methods. It then analyzes each run method, teractions between all of the parallel threads. The algorithm performing both the intrathread and interthread analysis, repeatedly chooses a node nT ∈ τ , retrieves the analysis re- and checks that none of the inside nodes in the analysis re- sult from the exit node of the corresponding run method,5 sult escape. If none of these inside nodes escape, all of the then uses the thread interaction algorithm presented above objects allocated inside the region are inaccessible when the in Section 4.5 to compute the interactions between the ana- computation terminates. All of the region reference checks lyzed threads and the thread represented by nT and combine will therefore succeed and can be removed. the two parallel interaction graphs into a new graph. Once the algorithm reaches a fixed point, it removes all nodes 5.2 Synchronization Elimination in NT from the escape function — the final graph already models all of the possible interactions that may affect nodes The synchronization elimination algorithm uses the re- that escape only via unanalyzed thread nodes. The analysis sults of the interthread analysis to find captured objects may therefore recapture thread nodes that escaped before whose synchronization operations can be removed. Like the interthread analysis. For example, if a thread node does previous synchronization elimination algorithms, our algo- 5 rithm uses the intrathread analysis results to remove syn- The algorithm uses the type information to determine which class chronizations on objects that do not escape the thread that contains this run method. For inside nodes, this approach is exact. For outside nodes, the algorithm uses class hierarchy analysis to find created them. Unlike previous synchronization elimination a set of classes that may contain the run method. The algorithm algorithms, our algorithm also analyzes the interactions be- computes the interactions with each of the possible run methods, then tween parallel threads. It then uses the action set α and the merges the results. In practice, τ almost always contains inside nodes only — the common coding practice is to create and start threads in action ordering relation π to eliminate synchronizations on the same method. objects with synchronizations from multiple threads.
The analysis proceeds as follows. For each node n that Analysis time [s] is captured after the interthread analysis, it examines π to Bytecode for removing Backend find all threads t that execute in parallel with a synchroniza- Program instructions checks syncs time [s] tion on n. It then examines the action set α to determine Tree 10,970 0.5 15.9 41.1 if t also synchronizes on n. If none of the parallel threads Array 10,896 0.6 16.9 42.2 t synchronize on n, the compiler can remove all synchro- Water 17,675 11.3 56.1 66.0 nizations on the objects that n represents. Even if multiple Barnes 15,945 6.9 94.2 54.8 threads synchronize on these objects, the analysis has deter- Http 14,313 17.1 38.3 73.8 mined that the synchronizations are temporally separated Quote 14,039 16.9 41.4 61.4 by thread start events and therefore redundant. Figure 10: Program Sizes and Analysis Times 6. EXPERIMENTAL RESULTS Original Optimized version We have implemented our combined pointer and escape Program version Interprocedural Interthread analysis algorithm in the MIT Flex compiler system, a static Tree 59 43 43 compiler for Java. We used the analysis information for syn- Array 59 43 43 chronization elimination and elimination of dynamic region Water 2,367,193 919,575 919,575 reference checks. We present experimental results for a set Barnes 2,838,720 678,355 678,355 of multithreaded benchmark programs. In general, these Http 67,268 8,460 7,406 programs fall into two categories: web servers and scien- Quote 268,913 200,650 198,610 tific computations. The web servers include Http, an http Figure 11: Number of Synchronization Operations server, and Quote, a stock quote server. Both of these ap- plications were written by others and posted on the Inter- Program Standard Checks No Checks net. Our scientific programs include Barnes and Water, two complete scientific applications that have appeared in other Tree 6.5 16.8 7.0 benchmark sets, including the SPLASH-2 parallel comput- Array 8.2 43.4 8.3 ing benchmark set [23]. We also present results for two syn- Water 9.6 9.7 8.1 thetic benchmarks, Tree and Array, that use object field as- Barnes 8.4 7.6 6.7 signment heavily. These benchmarks are designed to obtain Http 4.5 5.3 5.2 the maximum possible benefit from region reference check Quote 11.7 11.3 11.3 elimination. Figure 12: Execution Times for Benchmarks 6.1 Methodology Number of Number of We first modified the benchmark programs to use region- Program Objects in Heap Objects in Regions based allocation. The web servers create a new thread to Tree 184 65,534 service each new connection. The modified versions use a Array 183 8 separate region for each connection. The scientific programs Water 20,755 3,110,675 execute a sequence of interleaved serial and parallel phases. Barnes 17,622 2,121,167 The modified versions use a separate region for each paral- Http 12,228 62,062 lel phase. The result is that all of the modified benchmarks Quote 21,785 121,350 allocate long-lived shared objects in the garbage-collected heap and short-lived objects in regions. The modifications Figure 13: Allocation Statistics for Benchmarks were relatively straightforward to perform, but it was diffi- cult to evaluate the correctness of the modifications with- out the static analysis. The web servers were particularly 6.2 Results problematic since they heavily use the Java libraries. With- Figure 10 presents the program sizes and analysis times. out the static analysis it was not clear to us that the li- The synchronization elimination algorithm analyzes the en- braries would work correctly with region-based allocation. tire program, while the region check algorithm analyzes only For Http, Quote, Tree, and Array, the interprocedural anal- the run methods and the methods that they (transitively) ysis alone was able to verify the correct use of region-based invoke. The synchronization elimination analysis therefore allocation and enable the elimination of all dynamic region takes significantly more time than the region analysis. The checks. Barnes and Water required the interthread analysis backend time is the time required to produce an executable to eliminate the checks — interprocedural analysis alone was once the analysis has finished. All times are in seconds. unable to verify the correct use of region-based allocation. Figure 11 presents the number of synchronizations for the We used the MIT Flex compiler to generate a C imple- Original version with no analysis, the Interprocedural ver- mentation of each benchmark, then used gcc to compile the sion with interprocedural analysis only, and the Interthread program to an x86 executable. We ran the Http and Quote version with both interprocedural and interthread analysis. servers on a 400 MHz Pentium II running Linux, with the For this optimization, the interthread analysis produces al- clients running on an 866 MHz Pentium III running Linux. most no additional benefit over the interprocedural analysis. The two machines were connected with their own private Figure 12 presents the execution times of the benchmarks. 100 Mbit/sec Ethernet. We ran Water, Barnes, Tree, and The Standard version allocates all objects in the garbage- Array on an 866 MHz Pentium III running Linux. collected heap and does not use region-based allocation. The Checks version uses region-based allocation with all of the dynamic checks. The No Checks version uses region-based
allocation with the analysis eliminating all dynamic checks. 7.2 Escape Analysis for Multithreaded Programs None of the versions uses the synchronization elimination Published escape analysis algorithms for Java programs optimization. Check elimination produces substantial per- do not analyze interactions between threads [3, 6, 22, 2]. If formance improvements for Tree and Array and modest per- an object escapes via a thread object, it is never recaptured. formance improvements for Water and Barnes. The running These algorithms are therefore best viewed as sequential pro- times of Http and Quote are dominated by thread creation gram analyses that have been extended to execute correctly and operating system overheads, so check elimination pro- but very conservatively in the presence of multithreading. vides basically no performance increase. Figure 13 presents Our analysis takes the next step of analyzing interactions the number of objects allocated in the garbage-collected between threads to recapture objects accessed by multiple heap and the number allocated in regions. The vast ma- threads. jority of the objects are allocated in regions. Ruf’s analysis occupies a point between traditional escape analyses and our multithreaded analysis [18]. His analysis 6.3 Discussion tracks the synchronizations that each thread performs on Our applications use regions in one of two ways. The each object, enabling the compiler to remove synchroniza- servers allocate a new region for each connection. The re- tions for objects accessed by multiple threads if only one gion holds the new objects required to service the connec- thread synchronizes on the object. Our analysis goes a step tion. Examples of such objects include String objects that further to remove synchronizations even if multiple threads hold responses sent to clients and iterator objects used to synchronize on the object. The requirement is that thread find requested data. The scientific programs use regions for start events must temporally separate synchronizations from auxiliary objects that structure the parallel computation. different threads. These objects include the Thread objects required to gen- erate the parallel computation and objects that hold values 7.3 Region-Based Allocation produced by intermediate calculations. Region-based allocation has been used in systems for many In general, eliminating region checks provides modest per- years. Our comparison focuses on safe versions, which en- formance improvements. We therefore view the primary sure that there are no dangling references to deleted regions. value of the analysis in this context as helping the program- Several researchers have developed type-based systems that mer to use regions correctly. We expect the analysis to be support safe region-based allocation [21, 8]. These systems especially useful in situations (such as our web servers) when use a flow-insensitive, context-sensitive analysis to correlate the programmer may not have complete confidence in his or the lifetimes of objects with the lifetimes of computations. her detailed knowledge of the program’s object usage pat- Although these analyses were designed for sequential pro- terns. grams, it should be straightforward to generalize them to 7. RELATED WORK handle multithreaded programs. We discuss several areas of related work: analysis of mul- Gay and Aiken’s system provides an interesting contrast tithreaded programs, escape analysis for multithreaded pro- to ours in its overall approach [9]. They provide a safe, grams, and region-based allocation. flat region-based system that allows arbitrary references be- tween regions. The implementation instruments each store 7.1 Analysis of Multithreaded Programs instruction to count references that go between regions. A The analysis of multithreaded programs is a relatively region can be deleted only when there are no references to unexplored field [17]. There is an awareness that multi- its objects from objects in other regions. This dynamic, ref- threading significantly complicates program analysis but a erence counted approach works equally well for both sequen- full range of standard techniques have yet to emerge. Grun- tial and multithreaded programs. The system also supports wald and Srinivasan present a dataflow analysis framework the explicit assignment of objects to regions and allows the for reaching definitions for explicitly parallel programs [10], programmer to use type annotations to specify that a given and Knoop, Steffen and Vollmer present an efficient dataflow reference must stay within the same region. Violations of analysis framework for bit-vector problems such as liveness, this constraint generate a run-time error; a static analysis reachability and available expressions [12]. Both frameworks reduces but is not designed to eliminate the possibility of are designed for programs with structured, parbegin/parend such an error occurring. concurrency and are intraprocedural. We view the main con- Following the Real-Time Java specification, our imple- tributions of this paper as largely orthogonal to this previous mentation provides a less flexible system of hierarchically research. In particular, the main contributions of this paper organized regions with an implicit assignment of objects to center on abstractions and algorithms for the interprocedu- regions. Because region lifetimes are hierarchically nested, ral and compositional analysis of programs with unstruc- the implementation dynamically counts, for each region, the tured multithreading. We also focus on problems, pointer number of child regions rather than the number of external and escape analysis, that do not fit within either framework. pointers into each region. Instead of performing counter We are aware of two pointer analysis algorithms for multi- manipulations at each store, the unoptimized version of our threaded programs: an algorithm by Rugina and Rinard for system checks each assignment to ensure that the program multithreaded programs with structured parbegin/parend never generates a reference that goes down the hierarchy concurrency [19], and an intraprocedural algorithm by Cor- from an ancestor region to a descendant region. Our static bett [7]. The algorithms are not compositional (they dis- analysis eliminates these checks, with the interthread anal- cover the interactions between threads by repeatedly rean- ysis required to successfully optimize multithreaded pro- alyzing each thread in each new analysis context to reach grams. a fixed point), do not maintain escape information, and do not support the analysis of incomplete programs.
8. CONCLUSION [8] K. Crary, D. Walker, and G. Morrisett. Typed Multithreading is a key program structuring technique, memory management in a calculus of capabilities. In language and system designers have made threads a cen- Proceedings of the 26th Annual ACM Symposium on tral part of widely used languages and systems, and multi- the Principles of Programming Languages, San threaded software is becoming pervasive. This paper presents Antonio, TX, Jan. 1999. an abstraction (parallel interaction graphs) and an algo- [9] D. Gay and A. Aiken. Language support for regions. rithm that uses this abstraction to extract precise points- In Proceedings of the SIGPLAN ’01 Conference on to, escape, and action ordering information for programs Program Language Design and Implementation, that use the standard unstructured form of multithreading Snowbird, UT, June 2001. provided by modern languages and systems. We have im- [10] D. Grunwald and H. Srinivasan. Data flow equations plemented the analysis in the MIT Flex compiler for Java, for explicitly parallel programs. In Proceedings of the and used the extracted information to verify that programs 4th ACM SIGPLAN Symposium on Principles and correctly use region-based allocation constructs, eliminate Practice of Parallel Programming, San Diego, CA, dynamic checks associated with the use of regions, and elimi- May 1993. nate unnecessary synchronization. Our experimental results [11] C. Hauser, C. Jacobi, M. Theimer, B. Welch, and show that analyzing the interactions between threads signif- M. Weiser. Using threads in interactive systems: A icantly increases the effectiveness of the optimizations for case study. In Proceedings of the Fourteenth region-based programs, but has little effect for synchroniza- Symposium on Operating Systems Principles, tion elimination. Asheville, NC, Dec. 1993. [12] J. Knoop, B. Steffen, and J. Vollmer. Parallelism for 9. ACKNOWLEDGEMENTS free: Efficient and optimal bitvector analyses for parallel programs. ACM Transactions on Programming We started this research in collaboration with John Wha- Languages and Systems, 18(3):268–299, May 1996. ley; we thank him for his contributions during this collabo- [13] D. Lea. Concurrent Programming in Java: Design ration. We thank Wes Beebee for his invaluable assistance Principles and Patterns. Addison-Wesley, Reading, with the region-based allocation package and Darko Marinov Mass., 2000. and Viktor Kuncak for many interesting discussions regard- ing pointer and escape analysis. [14] F. Nielson, H. Nielson, and C. Hankin. Principles of Program Analysis. Springer-Verlag, 1999. [15] V. Pai, P. Druschel, and W. Zwaenepol. Flash: An 10. REFERENCES efficient and portable web server. In Proceedings of the [1] O. Agesen, J. Palsberg, and M. Schwartzbach. Type Usenix 1999 Annual Technical Conference, June 1999. inference of SELF: analysis of objects with dynamic [16] J. Reppy. Higher–order Concurrency. PhD thesis, and multiple inheritance. Software—Practice and Dept. of Computer Science, Cornell Univ., Ithaca, Experience, 25(9):975–995, Sept. 1995. N.Y., June 1992. [2] B. Blanchet. Escape analysis for object oriented [17] M. Rinard. Analysis of multithreaded programs. In languages. application to Java. In Proceedings of the Proceedings of 8th Static Analysis Symposium, Paris, 14th Annual Conference on Object-Oriented France, July 2001. Programming Systems, Languages and Applications, [18] E. Ruf. Effective synchronization removal for Java. In Denver, CO, Nov. 1999. Proceedings of the SIGPLAN ’00 Conference on [3] J. Bogda and U. Hoelzle. Removing unnecessary Program Language Design and Implementation, synchronization in Java. In Proceedings of the 14th Vancouver, Canada, June 2000. Annual Conference on Object-Oriented Programming [19] R. Rugina and M. Rinard. Pointer analysis for Systems, Languages and Applications, Denver, CO, multithreaded programs. In Proceedings of the Nov. 1999. SIGPLAN ’99 Conference on Program Language [4] G. Bollella, B. Brosgol, S. Furr, D. Hardin, P. Dibble, Design and Implementation, Atlanta, GA, May 1999. J. Gosling, and M. Turnbull. The Real-Time [20] A. Sălcianu. Pointer analysis and its applications for Specification for Java. Addison-Wesley, Reading, Java programs. Master’s thesis, Dept. of Electrical Mass., 2000. Engineering and Computer Science, Massachusetts [5] R. Chandra, A. Gupta, and J. Hennessy. Data locality Institute of Technology, In preparation. and load balancing in COOL. In Proceedings of the 4th [21] M. Tofte and L. Birkedal. A region inference ACM SIGPLAN Symposium on Principles and algorithm. ACM Transactions on Programming Practice of Parallel Programming, San Diego, CA, Languages and Systems, 20(4), July 1998. May 1993. [22] J. Whaley and M. Rinard. Compositional pointer and [6] J. Choi, M. Gupta, M. Serrano, V. Sreedhar, and escape analysis for Java programs. In Proceedings of S. Midkiff. Escape analysis for Java. In Proceedings of the 14th Annual Conference on Object-Oriented the 14th Annual Conference on Object-Oriented Programming Systems, Languages and Applications, Programming Systems, Languages and Applications, Denver, CO, Nov. 1999. Denver, CO, Nov. 1999. [23] S. Woo, M. Ohara, E. Torrie, J. Singh, and A. Gupta. [7] J. Corbett. Using shape analysis to reduce finite-state The SPLASH-2 programs: Characterization and models of concurrent Java programs. In Proceedings of methodological considerations. In Proceedings of the the International Symposium on Software Testing and 22nd International Symposium on Computer Analysis, Mar. 1998. Architecture. ACM, New York, June 1995.
You can also read