Extensible optimisation framework for .NET virtual machine
BSc Joint Maths and Computing Final Year Individual Project
June 2005

Miloš Puzović
mp1102@doc.ic.ac.uk

Supervisor: Paul Kelly
Second Marker: Susan Eisenbach
Abstract

Component-based programming is a new programming paradigm where independent pieces of code are combined together to form the application program. Most performance improvement opportunities come from adapting components to their context of use, but these improvements tend to reduce re-usability, clarity and long-term software value. The purpose of this work is to change this, so that developers can write programs that follow component-based programming logic, and a program would be automatically and transparently transformed into one that is comparable, or better, in performance to an equivalent application.

This work introduces a domain-specific optimisation component that encodes knowledge specific to a particular application domain. The component has access to the results of a static data-flow analysis that is sharpened by control-flow information obtained at run-time. These results are used by the domain-specific optimisation component to find the points in the application program that are candidates for optimisation. At these points we intercept the execution of the application and apply the optimisation from the domain-specific optimisation component. Initial testing shows that the framework we present here can result in significant performance gains for applications.
Contents

1 Introduction
  1.1 Goals of this project
  1.2 An example of DSOCs
  1.3 Approach
  1.4 Organization of the report

2 Background
  2.1 Literature review
    2.1.1 Microsoft .NET Framework
    2.1.2 Aspect-oriented programming
    2.1.3 Deferred data-flow analysis
  2.2 The Veneer virtual machine
    2.2.1 Veneer run-time behaviour
  2.3 Focus of this work
  2.4 Related work
    2.4.1 Dynamo: automatic runtime program transformation
    2.4.2 Data-flow Pointcut in Aspect-Oriented programming
    2.4.3 RROpt: the Runtime Remoting Optimizer
  2.5 Summary

3 Design
  3.1 Requirement Specification
  3.2 Individual Design Phases
    3.2.1 Assembly Analysis
    3.2.2 Static Analysis
    3.2.3 Interface Construction
  3.3 Design Overview
    3.3.1 The Integrated System
  3.4 Summary

4 Implementation
  4.1 Assembly analysis
    4.1.1 The Adon library
    4.1.2 Instruction blocks
    4.1.3 Control-flow analysis for instruction blocks
  4.2 Data-flow analysis
    4.2.1 Storing data-flow information
    4.2.2 Deferred data-flow analysis
  4.3 The interface
    4.3.1 Rapier-Loom.NET
    4.3.2 Domain-specific optimisation component
  4.4 Summary

5 Evaluation
  5.1 Overview
    5.1.1 Objectives
    5.1.2 Experimental setup
  5.2 Prefetching for distributed shared memory
  5.3 Remote call aggregation
    5.3.1 Optimisation
    5.3.2 Domain specific optimisation component
    5.3.3 A benchmark
    5.3.4 Results
  5.4 SELECT * statements
  5.5 Summary

6 Conclusion and Future Work
  6.1 Conclusion
  6.2 Future work
    6.2.1 Further testing
    6.2.2 Building a new interface
    6.2.3 Metadata creation
List of Figures

1.1 Graph of the execution times for the code optimised using the remote call aggregation optimisation and the unoptimised code
2.1 An overview of the .NET framework
2.2 The control-flow graph after the builder phase of deferred data-flow analysis has been completed
3.1 Assembly analysis
3.2 Static analysis and builder phase
3.3 The process of applying optimisation(s) to the application
3.4 Design overview
4.1 Classes used to represent methods from an assembly
4.2 Class diagram for building instruction blocks
4.3 Class diagram for deferred data-flow analysis
5.1 The pseudo-code used for the bulk-prefetching application benchmark on the left; the shaded areas on the right show the op-nodes and lp-regions identified in the pseudo-code
5.2 The stitcher algorithm when b = 3, c != 0
5.3 Graphs of network traffic for the vector addition benchmark, with and without the remote call aggregation optimisation
5.4 Graph of the execution time for the vector addition benchmark run over the local loopback device
5.5 Graph of the execution time for the vector addition benchmark run over the 100MBit Ethernet network
5.6 Graph of the execution time for the vector addition benchmark run over the modem connection
Chapter 1

Introduction

In the past decade useful software has become far more complex than ever before. Building software is no longer an art practiced by skilled individuals in isolation. Nowadays it is a group activity, based on an increasingly sophisticated underlying infrastructure. A new programming paradigm, component software, has been devised to support this trend, where independent pieces of code are combined together to form application programs [20]. By combining independent components, application programs can be built more quickly. However, this new paradigm introduces new demands on programming tools and the software development process.

There are many abstractions available that make component-based programming easier, but none of them is able to perform optimisations that are specific to the application program or the component. This typically forces application developers to conform to one of several performance-oriented design patterns, which may or may not fit the component-based programming logic.

The goal of this work is to change this, so that developers can write programs that follow component-based programming logic, and a program would be automatically and transparently transformed into one that is comparable, or better, in performance to an equivalent application that conforms to the performance-oriented design patterns.

1.1 Goals of this project

Most performance improvement opportunities come from adapting components to their context of use, but the performance improvement measures tend to reduce re-usability, clarity and long-term software value. Thus, the art of performance programming is to understand how to design and compose components so that this doesn't happen. In this work we have attacked this problem by automating the adaptation of components to their context.

The ultimate goal of this work is to support cross-cutting domain-specific
optimisation components (DSOCs) that are separable and re-usable. To make this work feasible we need a marketplace of DSOCs that encode knowledge specific to a particular application domain, component family or API [8]. We also need to provide a domain-specific optimization plug-in framework for a compiler or virtual machine that adapts components to their context based on DSOCs. For example, we could have a DSOC that encodes knowledge about remote calls and decides whether two or more remote calls can be aggregated and sent together to the remote server in order to reduce the call overheads. This DSOC would then be used by the provided domain-optimization framework to optimize remote calls. Hence, the developer can concentrate more on the re-usability, clarity and long-term software value of her components.

The ultimate goal, described above, gives us the following main goals for this work:

• DSOC plugin API. To provide a proof-of-concept study that shows it is possible to implement a software interface that enables developers to encode knowledge specific to a particular application domain, component family or API. The software interface that encodes this knowledge is called a domain-specific optimization component.

• DSOC plugin framework. To realize the adaptation of components to their context, as specified by the domain-specific optimization components, in such a way that the time overheads introduced by the program transformation are kept to a minimum. In other words, to design an extensible domain-specific optimization framework for a compiler or virtual machine.

1.2 An example of DSOCs

To give a flavor of what a domain-specific optimization component is, this section will introduce one example. The DSOC in this example will encode the knowledge necessary to optimize remote calls by aggregating them.
The software interface that will encode this knowledge will be an aspect because, by definition, aspects allow us to declare additional and modified behaviour of the program during its execution. For concreteness of the discussion, the software interface presented here is loosely based on AspectJ's pointcut-and-advice mechanism [2]. Parallels between this AspectJ mechanism and the mechanism used in this project are shown in [18], while an introduction to aspect-oriented programming is given in Chapter 2. The following is an example DSOC, which optimizes remote calls by aggregating them:
 1  aspect RemoteCallOptimization {
 2      OpDomains opDomains = new DDFA(new Filter(new RemoteMethod()));
 3      ContextList delayedCalls = new ContextList();
 4
 5      pointcut remote() : call(* *.*(..) throws RemoteException);
 6
 7      void around() : remote() {
 8          OpDomain thisOpDomain = opDomains.getOpDomain(thisJointPoint);
 9          OpNode opNode = thisOpDomain.getOpNode();
10          if (thisOpDomain.getUses().intersects(delayedCalls.getDefs()) ||
11              thisOpDomain.getUses().intersects(opNode.getDefs())) {
12              delayedCalls.execute();
13              proceed();
14          }
15          else {
16              delayedCalls.add(opNode.getContext());
17          }
18      }
19  }

In line 2 we divide the application that we want to optimize into code blocks, such that each code block consists of one remote call and all statements reachable from that remote call without passing through another remote call, while in line 3 we create an object that keeps the list of delayed calls together with the information necessary to execute those calls. Each code block has data-flow information associated with each statement. Out of all the well-defined points in the program execution, line 5 selects only those points that are remote calls. In Java, each method that throws the RemoteException exception is a remote method. The implementation of remote call aggregation starts at line 7 and takes the form of around advice, because we want a chance to decide where and when to execute the joint point, i.e. the remote call. Firstly, we find the code block that has the selected joint point as its member (line 8) and secondly, we create an object that holds the data-flow information for the joint point (line 9). At line 10 we check whether the uses of variables in the selected code block intersect with those defined by the delayed calls or by the selected joint point.
If they don't intersect we can delay the call (line 16); otherwise we need to execute the delayed calls (line 12) and the selected joint point (line 13), or we would get a semantically incorrect application. By sending remote calls in groups to the server for execution we reduce the total amount of network traffic and therefore the total execution time of the remote calls. Figure 1.1 shows the execution times for the unoptimised code and the optimised code.

Figure 1.1: Graph of the execution times for the code optimised using the remote call aggregation optimisation and the unoptimised code

1.3 Approach

To support cross-cutting domain-specific optimizations that are separable and re-usable, it was decided to represent them as aspects. These aspects consist of a pointcut and an advice. A pointcut picks a set of joint points based on defined criteria. The selection of the joint points can be performed either at compile-time or at run-time, giving us a static part and a dynamic part of a pointcut. An advice encodes knowledge specific to a particular component family, and it is used by the domain-specific optimisation framework to adapt the component family to its context. Results of data-flow analysis, for example liveness analysis, can be used in pointcuts and in advice. The following approach has been taken to achieve the goal.

• Creation of domain-specific optimization components requires the use of program analysis techniques to identify dependences in components or program segments. We provide deferred data-flow analysis (DDFA) as our dependence analysis tool. This analysis is divided into two phases: the builder, performed at compile-time, and the stitcher, performed at run-time. During the builder phase we identify potential joint points based on pointcuts. Each joint point induces a fragment that consists of all statements reachable from the joint point without passing through another joint point or a program-exit node. We then generate static data-flow summary information for each fragment. At run-time DDFA stitches together the static data-flow summary information based on the actual control flow to enhance accuracy.

• This work extends DDFA's stitcher phase with a dynamic aspect weaver, because for many performance opportunities it is desirable to postpone the decision whether aspect information is to be added or
not to a particular component until program runtime. Hence, during this phase we dynamically interweave the previously defined aspects with the functional component code at the selected joint points. After we have completed the interweaving we get a modified component that is adapted to its context.

• It is essential to have an interface in which the domain-specific optimization components are going to be written. This work implements a simple interface as an extension of the aspect language. In this interface we have extended the pointcut language to provide access to both static and dynamic data-flow analysis. Advice can also query the data-flow analysis. The aim of this extension was to use data-flow analysis to decode knowledge about the context of a component.

• We demonstrate the practical use of domain-specific optimization components, and the automated adaptation of components to their context based on a domain-specific optimization component, using suitable experiments.

The accomplishments of this project are:

• Extending static analysis with dynamic information using deferred data-flow analysis, rescuing data-flow information thrown away by conservative static analysis. This extension minimizes the overheads due to control-flow interception. The interception is derived from the information needed by the particular aspect. If we can predict control flow at the point of use, then run-time data-flow information is maximised and control-flow interceptions are minimized.

• Introducing aspects as domain-specific optimisation components. Performance opportunities usually affect multiple classes and cut across the typical division of responsibility. Therefore, we have used aspects to encapsulate this behaviour, giving us separable and re-usable components.

• Providing a software interface for writing domain-specific optimisation components. This interface uses static and dynamic data-flow information to decode knowledge about the context of the component.
It is then used by the dynamic aspect weaver to adapt the component to its context. Providing pointcuts and advice with data-flow information is the main feature of this interface.

• Demonstrating the usability of the implementation using suitable experiments, such as the remote call aggregation shown in this chapter.
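The delayed-call bookkeeping behind the remote call aggregation example can be sketched in plain Java (the class and method names below are illustrative, not the project's actual implementation):

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch of a delayed-call list: remote calls are queued until a
// data dependence forces them to be flushed to the server in one batch,
// reducing the per-call network overhead.
class DelayedCalls {
    private final Queue<Runnable> pending = new ArrayDeque<>();

    // Delay a call instead of sending it immediately.
    void add(Runnable call) {
        pending.add(call);
    }

    // Flush: execute every queued call before a dependent call proceeds.
    void execute() {
        while (!pending.isEmpty()) {
            pending.poll().run();
        }
    }

    int size() {
        return pending.size();
    }
}
```

An advice would call add when the use and def sets do not intersect, and execute just before proceeding with a dependent joint point.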
1.4 Organization of the report

• Chapter 2 (Background) is an introduction to various aspects of the project and the background research that led to the approach taken by this work.

• Chapter 3 (Design) describes design decisions taken at the early stage of the project and identifies the primary design phases.

• Chapter 4 (Implementation) details the implementation of each design phase.

• Chapter 5 (Experimental evaluation) provides tests and results demonstrating the usability of the implementation.

• Chapter 6 (Conclusion and Future Work) points out the achievements of the implementation, possible improvements to the implementation, and directions for future work.
Chapter 2

Background

This chapter serves as an introduction to the key terms and concepts used throughout this project. Firstly, it provides a summary of the background literature that forms the basis for this work, describing in sufficient depth the work on which this implementation is built. Secondly, it discusses related work in this field and compares it to the work done in this project.

2.1 Literature review

This section reviews the literature related to this project. It includes a general overview of the Microsoft .NET framework and of aspect-oriented programming as a metaprogramming tool. It describes the disadvantages of conservative data-flow analysis and introduces deferred data-flow analysis, which overcomes them.

2.1.1 Microsoft .NET Framework

The Microsoft .NET Framework is a new and exciting integral Windows component for software development and deployment. It has been live since version 1.0, which was released in January 2002. The .NET Framework [14]:

• Supports over 20 different programming languages.

• Manages much of the plumbing involved in developing software, enabling developers to concentrate more on the core business logic code.

• Makes it easier than ever before to build, deploy and administer secure, robust, and high-performing applications.

Therefore the .NET Framework embodies design goals that are both practical and ambitious. They consist of support for components, language integration, application interoperation, simplified development, improved reliability, and greater security. The solid component model played a major
part in the decision to choose this framework. In the .NET world all classes are ready to be reused at the binary level, since they are part of an assembly. Assemblies are the clear units of development, deployment and versioning. Thus our domain-specific optimization component described in Section 1.1 will take the form of an assembly.

Figure 2.1: An overview of the .NET framework

The .NET Framework is composed of the common language runtime and a unified set of class libraries.

Common Language Runtime

The common language runtime (CLR) is the most important component of the framework. It is responsible for run-time services such as object activation, object execution and garbage collection of objects. In addition, the CLR has a role at development time, when features such as life-cycle management, strong type naming, cross-language exception handling, and dynamic binding reduce the amount of code that a developer must write to turn business logic into a reusable component.

A functional subset of the CLR, known as the Common Language Infrastructure (CLI), was accepted in April 2003 by the International Organization for Standardization and the International Electrotechnical Commission (ISO/IEC) as an international standard. This standardization provided third-party developers with a chance to develop their own versions of the CLR for different platforms, such as Mac OS X and Linux.

The CLR supports any language that can be represented in its Common Intermediate Language (CIL). Most of the time this code is run in managed mode, which means that garbage collection and array bounds checking are performed by the CLR. There is also an unmanaged mode where the CIL code is allowed to perform unsafe and insecure operations such as pointer arithmetic. Finally, there is a simple interface for calling native code within DLLs. Native code is by definition unmanaged.
Class Libraries

The basic framework class libraries provide rudimentary input and output functionality, string manipulation, a security manager, network communication, thread management, text management, reflection functionality, collection functionality, user interface design features, as well as other functions. On top of the basic framework classes is a set of classes that extend the base classes to support data and XML management. Among these are the ADO.NET classes, which enable developers to interact with data accessed in the form of XML through the OLE DB, ODBC, Oracle, and SQL Server interfaces. Further to this there are classes that enable XML manipulation, searching and translation.

Finally, classes in three different technologies (Web services, Web Forms and Windows Forms) extend the framework base classes and the data and XML classes. Web services include classes for the development of web-based applications, while Web Forms include classes for the development of web Graphical User Interface (GUI) applications. Windows Forms supply classes that let you develop native Windows GUI applications.

2.1.2 Aspect-oriented programming

Metaprogramming is concerned with writing metaprograms. Metaprograms are programs that operate on other programs. Compilers and interpreters can be regarded as types of metaprograms. Metaprogramming is an umbrella term that encompasses a large and diverse range of subjects, such as reflection and multi-stage programming. One example of metaprogramming is aspect-oriented programming (AOP).

Each application can be regarded as a collection of implementations of ideas, or concerns. Some concerns can be implemented as a programming module or as a collection of programming modules. Other concerns cannot be implemented as a set of programming modules. These concerns are referred to as crosscutting concerns.
In order to implement crosscutting concerns it is necessary to add additional program fragments to the other concerns. Typical crosscutting concerns include logging, context-sensitive error handling, performance optimization and design patterns. For example, when adding logging support to an application it is necessary to insert logging statements at various points throughout the application. This approach is laborious and error-prone. It also lacks modularity, because if we want to change the printouts of the logging statements we have to change each of them.

The need for crosscutting concerns was identified as early as the 1970s in the seminal paper by Fouxman, while AOP originated in the 1990s with the group at the Xerox Palo Alto Research Center led by Gregor Kiczales [9]. They developed the first AOP tool, AspectJ [2], based on Java. There
are also .NET-based AOP tools, such as LOOM.NET [18], which was developed by the Hasso-Plattner-Institute in Germany. The AOP web site contains a variety of other references.

The idea behind AOP is to modularize crosscutting concerns into reusable modules. These reusable modules are known as aspects. Aspects are very similar to classes in their declaration, but in addition aspects contain pointcuts and advice. Each application has joint points. Joint points represent well-defined points in the program's execution. Examples of joint points are method calls, accesses to class members and the execution of exception handler blocks. A pointcut is then a language construct that selects a set of joint points based on defined criteria. For example, we can define a pointcut that selects the execution of all remote procedure calls. An advice is code defined relative to a pointcut that executes before, after, or around a joint point. The first two types of advice are simple: they execute either before the joint point or after the joint point. The third type of advice, around, gives the aspect writer a chance to decide whether and when the joint point is going to be executed. An example of around advice was given in Section 1.2.

For aspects to affect regular class-based code, they need to be woven into the code they modify. Weaving can be performed before the program starts (static weaving), at load-time (load-time weaving) or after the class has been loaded (dynamic weaving). Static weaving has the advantage that it doesn't add any run-time penalties to the program's execution. The aim of this project is to adapt applications to their context. If a program is available as source code and the code is stored locally then we have a fully controlled context, and thus we can use static weaving.
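As an illustration of the pointcut-and-advice idea, a minimal form of dynamic interception can be sketched with the JDK's dynamic proxies (a toy sketch only; the interface and handler names are made up and this is not how the weavers named above work internally):

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Illustrative interface standing in for an intercepted component.
interface Service {
    int work(int x);
}

// An around-style handler: every call through a proxy built from it becomes
// a joint point; the handler decides whether and when the original call
// proceeds, and can run advice on either side of it.
class AroundWeaver implements InvocationHandler {
    final Service target;
    final List<String> trace = new ArrayList<>();

    AroundWeaver(Service target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method m, Object[] args) throws Exception {
        trace.add("before " + m.getName());       // before-advice
        Object result = m.invoke(target, args);   // proceed()
        trace.add("after " + m.getName());        // after-advice
        return result;
    }
}
```

Wrapping a component in such a proxy weaves the advice in at run-time without touching the component's source, which is the essence of dynamic weaving.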
Otherwise parts of the program are generated at run-time, so we have a partly controlled context and dynamic weaving needs to be used. The .NET implementation of aspect-oriented programming used in this project, as well as the choice of weaving technique, is described in Chapter 3 and Chapter 4.

2.1.3 Deferred data-flow analysis

There are different types of optimizations. Optimizations found in mainstream compilers are usually of the static variety, where a block of code is replaced by another that performs better but produces the same result in every possible case. Examples of such optimizations are thoroughly explained in the standard compiler texts [1]. However, many modern optimization techniques are dynamic, because they require some form of runtime support.

To perform more aggressive optimizations it is necessary to work around the limits of static data-flow analysis. Static data-flow analysis loses information at branch points, because it must conservatively combine data
from the various branches. For example, if one branch only uses a variable A, while the other branch only uses a variable B, then both variables A and B will be identified as being used at the branch point, although only one of them will be used during the execution of the program. If the branch that will be taken is known beforehand, it is possible to optimize the program more aggressively by knowing which of A and B is live. Sharma et al. [19] used this observation to introduce deferred data-flow analysis (DDFA).

In this section we give a brief overview of DDFA's terms and concepts, which are needed to follow the rest of this dissertation. Details of the implementation of DDFA are given in Chapter 4.

The DDFA framework is targeted at the optimization of heavy-weight operations, such as remote procedure calls. Let us denote such an operation by op, and the node in the control-flow graph corresponding to the operation by op-node. This node induces an op-domain that consists of all nodes reachable from the op-node without passing through another op-node or a program-exit node. The node that induces this domain is called the entry node, while the exit node is another op-node or a program-exit node.

Each op-domain will have zero or more forks. Forks are well-defined program points from which the control flow of the program can flow in more than one direction. If the incident edges from a fork in the control-flow graph don't have the same data-flow attributes, then the fork is defined to be lossy. If it can be determined at the op-node at runtime which control-flow direction the fork will take, then the fork is defined to be predictable. A fork that is both lossy and predictable is an lp-fork.

In conservative analysis, information is lost at the lp-forks even though we can predict the control-flow direction at runtime. Hence, if we defer the meet operation at the lp-forks until runtime we will not lose any information.
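The loss at a lossy fork can be sketched with live-variable sets (a toy illustration; the set contents are made up):

```java
import java.util.HashSet;
import java.util.Set;

// Toy illustration of the meet at a fork in a backward liveness analysis.
// Conservatively, the live sets of both branch successors must be unioned;
// if the taken branch is known (the fork is predictable), its set alone
// suffices and no information is lost.
class LivenessMeet {
    // Conservative meet: union of the successors' live sets.
    static Set<String> conservative(Set<String> branchA, Set<String> branchB) {
        Set<String> merged = new HashSet<>(branchA);
        merged.addAll(branchB);
        return merged;
    }

    // Deferred meet: once the taken branch is known, keep only its live set.
    static Set<String> deferred(Set<String> takenBranch) {
        return new HashSet<>(takenBranch);
    }
}
```

With branch live sets {A} and {B}, the conservative meet reports both A and B as live at the fork, while the deferred meet reports only the variable of the branch actually taken.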
An lp-fork induces an lp-region, which consists of all nodes reachable from the lp-fork without passing through another lp-fork or any exit of the op-domain. An lp-region may contain zero or more other forks. These forks are neither lossy nor predictable; hence we will not gain any information if we defer data-flow analysis on them until run-time.

Now that the terms and concepts have been introduced, we explain how DDFA works. The first stage of DDFA identifies all op-nodes in the control-flow graph and constructs op-domains. Then, within each op-domain, lp-forks are identified and lp-regions are constructed. In the second stage, we analyze each lp-region and create a summary transfer function φ for each control-flow direction that the lp-fork can take. The summary transfer function summarizes the backward data-flow operations that would occur if control flowed in that direction. If there are no lp-forks in the op-domain we simply perform the conservative data-flow analysis. The identification of op-nodes and the analysis of lp-forks are performed at compile-time.

The third and fourth stages of DDFA are executed during the run-time of the application program, when control reaches the entry to the op-domain. At this point we know the directions of all lp-forks in the op-domain induced by the op-node we want to optimize, and therefore the sequence of lp-regions that are going to be encountered during the execution of the body of the op-domain. This information is used at the join point to stitch together the summary functions from the lp-regions, in reverse order, using a work-list algorithm. In the final stage we use the data-flow information computed by the stitcher to make optimization decisions.

Figure 2.2: The control-flow graph after the builder phase of deferred data-flow analysis has been completed

The advantage of DDFA over conventional data-flow analysis can be seen in the remote procedure call aggregation example. The example is shown in Figure 2.2. Let the remote procedure call that defines variable r be the op-node that induces an op-domain with one lp-fork. One branch of the lp-fork uses variable r, while the other branch executes some other remote procedure call. Using conventional data-flow analysis, variable r will be live at the branch point; hence we will need to force the execution of the remote call at the op-node. With DDFA, on the other hand, at the op-node we will know which branch will be taken. If the branch that uses variable r is not taken, we can delay the execution of the remote call at the op-node and aggregate it with other remote calls.

2.2 The Veneer virtual machine

Veneer [23] is a domain-specific optimization framework for Java developed as part of a Ph.D. dissertation. In simple terms, its author describes Veneer as a drop-in substitute for a standard virtual machine that is transparent from the perspective of the end user. It sits between the client application and a standard virtual machine, intercepting the control flow of methods in the application. These methods are referred to as intercepted methods. The original body of the method is removed and replaced by a stub, while the control-flow graph of the original method is stored in a data structure known as an execution plan.

The user of this framework can write simple interpreters called executors that execute method plans from plan-sets. Each method plan is a representation of the method body, composed of code blocks called fragments, together with related metadata (such as variable liveness information) that can be calculated statically. A plan-set is a structure that collects together all plans associated with methods in the same class. The power of Veneer lies in the fact that by implementing different executors it is possible to deviate from the standard behavior of the virtual machine. For example, an executor may inspect and modify code belonging to the method, modify the control flow of the method, etc.

2.2.1 Veneer run-time behaviour

In this section we explore how the body of a method is transformed into a plan via a fragmentation process, and explain the structure of the executors.

The fragmentation process

The first step is to determine which methods are going to be intercepted by Veneer. This is determined by an interception policy selected by the user. This policy is responsible for determining which classes and methods are going to be intercepted. If a method is to be intercepted, the original body of the method is removed completely and replaced by a stub. Stubs serve as entry-points into Veneer. When a stub is called, it fetches a suitable executor and execution plan for the method and calls the executor, passing in the plan. After the stub has been generated, a new plan is created for the method. This plan consists of code blocks.
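The stub mechanism can be sketched as follows. Veneer itself is a Java framework, but for consistency with the rest of this report the sketch is in C#; all names here (VeneerRuntime, GetExecutor, GetPlan, Foo_Stub) are illustrative assumptions, not Veneer's actual API.

```csharp
using System.Collections.Generic;

// Minimal stand-ins for Veneer's machinery (illustrative only).
class Plan { public List<string> Blocks = new List<string>(); }

class Executor
{
    public virtual object Execute(Plan plan, object[] args)
    {
        // A real executor would interpret the plan block by block.
        return null;
    }
}

static class VeneerRuntime
{
    public static Executor GetExecutor(string method) { return new Executor(); }
    public static Plan GetPlan(string method) { return new Plan(); }
}

static class Stubs
{
    // The original body of Foo has been removed; a stub of this shape replaces it.
    public static object Foo_Stub(object[] args)
    {
        Executor executor = VeneerRuntime.GetExecutor("Foo"); // fetch a suitable executor
        Plan plan = VeneerRuntime.GetPlan("Foo");             // fetch the execution plan
        return executor.Execute(plan, args);                  // interpret the plan
    }
}
```

The point of the shape is that the intercepted method itself no longer contains any of its original code: all behaviour is delegated to whichever executor the runtime supplies.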
Code blocks are encapsulated sections of code that may be executed at any time by the executor. There are two main categories of them: parameterised blocks and fragments. Parameterised blocks usually represent a single statement, such as a method call, an if statement or an assignment statement. These blocks are usually of interest to the executor writer, and their behaviour may be modified at run-time. Hence, they are interpreted by Veneer. Fragments are fixed sections of code that the executor writer is not interested in modifying at run-time. Thus, they are executed directly by the underlying virtual machine and can run at full speed. To determine the fragmentation within a method, Veneer uses another active policy selected by the user, the fragmentation policy. For example, it may determine where parameterised blocks and fragments should occur.
After the plan has been created, it is necessary to determine the control flow between the blocks. This is done by searching for the branch statements within each block. For every branch statement found, the code block that contains the destination statement is located and added to the next-block list associated with the current block.

For every code block created, a fragment class with a single execute method containing the code from the code block is generated. This method is called by its corresponding code block within the plan. The fragment method consists of a prologue and an epilogue. The prologue sets up the stack frame with local and global variables in a form suitable for executing the next fragment. A data structure called the method state is used to hold the local and global variables used by a method between executions of the code blocks in a plan. Once the stack frame has been set up by the prologue, the code from the code block can be executed. If during the execution of the code block there is a branch to a statement that is not within the current code block, the current code block is replaced by the code block that contains the destination statement, followed by a jump to the epilogue. The epilogue is an operation that reverses the effect of the prologue: it stores the current state of the stack frame back to the method state object.

The final step in the fragmentation process is metadata generation. To every code block we can assign additional information that can be calculated statically and later used at run-time to make optimization decisions. Veneer supports information such as line numbers and variable definitions. Also, use and liveness information are provided.

Executors

Veneer provides the user with an opportunity to specify how the execution of the intercepted method should proceed by writing a simple interpreter. This simple interpreter is called an executor.
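The prologue/epilogue mechanism can be sketched as follows. This is a hand-written illustration of the shape of a generated fragment method, not Veneer's actual generated code; the names MethodState and Fragment0 are assumptions.

```csharp
using System.Collections.Generic;

// Holds a method's variables between executions of its code blocks.
class MethodState
{
    public Dictionary<string, object> Locals = new Dictionary<string, object>();
}

static class Fragment0
{
    // Returns the index of the next code block to execute in the plan.
    public static int Execute(MethodState state)
    {
        // Prologue: restore variables from the method state into the frame.
        int i = (int) state.Locals["i"];

        // Body: the original code of this code block runs here.
        i = i + 1;

        // Epilogue: store the (possibly updated) frame back into the method state.
        state.Locals["i"] = i;
        return 1; // control continues at code block 1
    }
}
```

Because every fragment ends by writing the frame back into the method state, the executor is free to pause, inspect or redirect control between any two blocks.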
An executor is built by extending the abstract class Executor. The main task of the developer is to implement the abstract method execute. Before control reaches this method, various attributes of the Executor class are initialized. The most important attributes are:

1. getCurrentPlan - returns the plan representing the body of the method,
2. getCurrentBlock - returns the current block and
3. getCurrentState - returns the current method state.

To make any progress, the execute method of the current code block needs to be called, passing in the executor. By passing in the instance of the executor, the code block will have all the information needed to execute itself. After the code block has been executed, it will set the current block to the block that is next in the control-flow
graph, and will return control to the execute method of the abstract class Executor. Between block executions, the executor is allowed to do anything that is permissible within the Java environment. For example, it may decide to modify the current plan, which will result in changes to the method implementation. An example of a simple executor that executes a plan without any changes is given below (from [22]).

public class BasicExecutor extends Executor {
    public int execute() throws Exception {
        while (!(isFinished() || lockWasReleased())) {
            try {
                // Execute the current block
                getCurrentBlock().execute(this);
            } catch (ExecuteException e) {
                // Pass control to exception handler
                gotoExceptionHandler(e);
                // Propagate exception if no handler
                if (isFinished())
                    throw e.getException();
            }
        }
        return next;
    }
}

2.3 Focus of this work

This work started as an idea to do for .NET what Veneer did for Java, but as a result of background research a different approach has been taken. This work differs in two major respects from Veneer. Firstly, Veneer performs conservative compile-time data-flow analysis; here we perform a limited form of run-time data-flow analysis, called deferred data-flow analysis (DDFA), explained in Section 2.1.3. The loss of precision due to the conservative data-flow analysis increases the number of fragments in Veneer, thus increasing the time overhead of interpreting the method plans. Using DDFA, the data-flow analysis is sharpened with control-flow information available at run-time, resulting in a smaller set of fragments. Secondly, instead of implementing a virtual .NET virtual machine to gain control over the control flow of the application's methods, we will use one of the existing implementations of aspect-oriented programming on .NET.
This approach will decrease the time overheads that Veneer introduced by adding an additional execution layer. Hence, the executor code shown above will be replaced with the code shown in Section 1.2.
2.4 Related work

This section discusses research work that is related to this project and compares it to the work done here.

2.4.1 Dynamo: automatic runtime program transformation

Dynamo [4] is a dynamic optimisation system running on a PA-RISC-based workstation under HP-UX. During the execution of the program, Dynamo builds up an instruction trace. If the instruction trace is identified as 'hot' (for example, the trace is executed often), the trace is compiled into a fragment and run through the optimizer. The next time control reaches the point marked as the beginning of the hot instruction trace, Dynamo passes control to the optimized fragment. If within the optimized fragment there is a jump to another optimized fragment, Dynamo will jump to it immediately without giving control back to the executing program. Thus, the run-time penalty that occurs during control switching is reduced to a minimum.

This work differs from Dynamo only in environment and usage, while the underlying principle is practically identical. Dynamo operates at the machine-code level. In this work we operate on the Common Intermediate Language (CIL), explained in Section 2.1.1. The advantage of using the CIL is that its high-level nature makes program analysis easier to implement. The disadvantage is that once the CIL code has been loaded into the virtual machine it is not possible to truly change the code. To overcome this, new code needs to be generated and everything needs to be redirected to the new code. Dynamo is only concerned with extracting maximum performance from the software. In this work, we give the developer the opportunity to write her own performance enhancers using aspect-oriented programming.
2.4.2 Data-flow Pointcut in Aspect-Oriented programming

Hidehiko Masuhara and Kazunori Kawauchi have proposed a data-flow-based pointcut for aspect-oriented programming languages [12]. Their pointcut specifies where aspects should be applied based on the origins of values, and it is designed to be compatible with the other kinds of pointcuts in existing AOP languages. They have identified security as a primary application field for this extension of AOP, because the flow of information is very important in security.

Two new pointcuts are added to the syntax of the pointcut language. The first pointcut, dflow[x,x'](p), matches a statement if there is a data-flow from x' to x, where variable x is bound to a value in the current join point, while variable x' is bound to a value in a previous join point that matches
p. The second pointcut, returns(x), binds a return value from the join point to x.

In this work we have also extended the pointcut language of aspect-oriented programming languages. A new pointcut proposed in this work specifies where aspects should be applied based on the usage, in the op-domain, of variables that were defined in the op-node. While Masuhara and Kawauchi use forward data-flow to decide which aspect to apply, here we use backward data-flow, as explained in Section 2.1.3.

2.4.3 RROpt: the Runtime Remoting Optimizer

The Runtime Remoting Optimizer was developed as part of an MSc dissertation [6]. This was the first attempt to 'port' the architecture of Veneer to the Microsoft .NET framework, but during the course of the project a different approach was taken. It was decided to implement RROpt within the virtual machine itself, rather than as a virtual virtual machine. RROpt was implemented by modifying the interpreter of Mono, the open-source implementation of the CLI by Ximian. The idea was that rather than passing control back and forth between the fragment methods and the executor, the optimisations could be written in the interpreter, which has control at every bytecode instruction. The implementation was tested on the remote call aggregation problem.

The main disadvantage of RROpt is that it is not extensible. To perform a run-time optimization other than remote call aggregation, you would have to modify the interpreter yourself. In this work, by using aspect-oriented programming, we have made our optimization framework extensible, so that anyone with a knowledge of aspect-oriented programming is able to write an aspect that performs the required run-time optimization. Another shortcoming of RROpt is that it is unable to eliminate unneeded return values, because it does not perform data-flow analysis.
In this work, optimization decisions are based on data-flow analysis, so any unneeded values or shared values will be detected.

2.5 Summary

This chapter,

• presents a survey of the literature related to this work.
• describes the focus of the project.
• lists the related work from different research areas.
Chapter 3

Design

This chapter gives a detailed description of the core design decisions that were considered in the early stages of the project. It describes the overall strategy and the approaches taken for this project, and why they were chosen. We start the chapter by stating the requirement specification needed to achieve the desired goals. Then we break the requirement specification into individual design phases, such that each phase aims at meeting specific requirements. For each individual phase we present the essential design features of the building blocks and illustrate how these building blocks are composed together to form the required system. We leave the explanation of the implementation of the building blocks for Chapter 4.

3.1 Requirement Specification

The basic requirements for a system, in order to achieve the desired goals described in Section 1.1, can be listed as follows:

• Accept the user's application in the form of the Common Intermediate Language (CIL) using one of the existing libraries for analysing assemblies¹. This involves identifying the methods within the assembly that are candidates for the optimisation and preparing the assembly for data-flow and control-flow analysis.

• The capability to find the statements in the application that can be optimised, and to perform on them the conventional data-flow analysis that will later be sharpened using the control-flow information obtained at run-time.

• An interface able to specify at what points to intercept the execution of the application and to use the information obtained from the deferred data-flow analysis to make the optimisation decisions.

¹ From now on, by assembly we refer to the .NET assembly.
3.2 Individual Design Phases

It has been decided to divide the overall design into three individual phases. Each of these phases targets one of the specific requirements described above. In this section we briefly describe each of these individual phases and present a flow diagram for each one of them.

3.2.1 Assembly Analysis

In this phase we prepare assemblies that are written in the Common Intermediate Language (CIL) for the data-flow and the control-flow analysis. The virtual execution system of the .NET framework is a simple stack-based machine consisting of an instruction pointer, a stack pointer, an array of arguments and an array of locals. In order to simplify the data-flow and the control-flow analysis, the instructions from the assembly will be grouped into instruction blocks such that before and after any instruction block the number of operands on the stack is zero. Further, in the assembly analysis phase we also identify the methods that are candidates for the optimisation, using the user's policy. Only the instructions from these methods are grouped into instruction blocks. Figure 3.1 shows the flow diagram of the assembly analysis: using the user's policy, assembly analysis identifies the candidate methods for the optimisations and then creates instruction blocks for each of them. Section 4.1 describes in detail how this phase has been implemented.

Figure 3.1: Assembly analysis
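The zero-depth grouping rule can be sketched as follows. This is a minimal illustration, not the project's actual implementation; the Instr type and its StackDelta field (the net number of operands an instruction pushes onto the evaluation stack) are assumptions made for the sketch.

```csharp
using System.Collections.Generic;

// Illustrative instruction: StackDelta is pushes minus pops.
class Instr
{
    public string Name;
    public int StackDelta;
    public Instr(string name, int delta) { Name = name; StackDelta = delta; }
}

static class Grouping
{
    // Group instructions into blocks at the points where the evaluation
    // stack holds no operands, as required by the assembly analysis phase.
    public static List<List<Instr>> IntoBlocks(IEnumerable<Instr> instructions)
    {
        var blocks = new List<List<Instr>>();
        var current = new List<Instr>();
        int depth = 0;
        foreach (var instr in instructions)
        {
            current.Add(instr);
            depth += instr.StackDelta;
            if (depth == 0)        // block boundary: the stack is empty again
            {
                blocks.Add(current);
                current = new List<Instr>();
            }
        }
        return blocks;
    }
}
```

For example, the CIL sequence ldloc.0, ldloc.1, add, stloc.2 (deltas +1, +1, −1, −1) forms a single block, since the stack only returns to depth zero after the stloc.2.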
Figure 3.2: Static analysis and builder phase

3.2.2 Static Analysis

The static analysis phase performs data-flow analysis and control-flow analysis on the instruction blocks obtained from the previous phase. These analyses must be performed at compile-time or just before the execution of the user's application. In our design we have decided to perform them just before the execution of the user's application. Besides the standard static analysis, in this phase we also perform the builder phase of the deferred data-flow analysis, as explained in Section 2.1.3. The additional information passed to the builder phase is the set of instructions in the program that will act as op-nodes. The results of the static analysis and the builder phase are kept in data structures that can be accessed during the execution of the user's application. Figure 3.2 shows the flow diagram for performing the static analysis and the builder phase of deferred data-flow analysis on the instruction blocks, while Section 4.2 deals with the implementation details corresponding to this design phase.

3.2.3 Interface Construction

The final primary design phase is building the interface. The interface encodes knowledge specific to the possible optimisations that can be performed on the application. This knowledge is obtained from the results retrieved from the previous design phase. Using the interface we should also be able to specify at what points in the application we want to perform these optimisations. This interface is a domain-specific optimisation component (DSOC), described in Section 1.1. The properties of a DSOC are similar to the properties of aspects in aspect-oriented programming. Due to the time constraints imposed on this project, instead of implementing
the interface, we have decided to use an existing aspect-oriented framework for .NET that supports dynamic weaving, since most of today's optimisations are run-time optimisations. A description and the usage of the aspect-oriented framework we have chosen for this project are given in Section 4.3, together with an explanation of how we have incorporated the results from the previous design phases into the framework. Figure 3.3 shows the process of applying a domain-specific optimisation component to the application to produce the optimised version of the application.

Figure 3.3: The process of applying optimisation(s) to the application

3.3 Design Overview

By implementing each of the primary design phases we fulfil the requirements given in Section 3.1. By integrating the individual phases together we get the full picture of the system, as shown in Figure 3.4.

3.3.1 The Integrated System

From Figure 3.4 we can see that a domain-specific optimisation component consists of static pointcuts, dynamic pointcuts and an advice. We use the static pointcuts of the DSOC to pass additional information required by the static analysis and the builder phase of the deferred data-flow analysis. The
purpose of the dynamic pointcuts is to use the results from the data-flow analysis. Based on these results we specify the connection points in the application, using dynamic pointcuts, where we want to intercept the control flow of the application. At those points we execute the advice. The advice encodes knowledge specific to the application (e.g. data-flow information) and knowledge of how to perform the optimisation.

Figure 3.4: Design overview

3.4 Summary

This chapter,

• gives the requirement specification of this project.
• describes how the specification has been broken into individual design phases.
• points out what requirements are met by each design phase.
• presents the integration of these phases into a single system.
Chapter 4

Implementation

In this chapter we explain the implementation of the design described in the previous chapter. The implementation serves as a proof of concept of the ideas stated in the introduction chapter, and it was done in the C# programming language. Although we do include some code listings throughout the report, the full implementation was too large to include.

The chapter starts with a brief introduction to the library used to retrieve Common Intermediate Language (CIL) instructions from assemblies, and we give a description of each aspect of the library required to implement deferred data-flow analysis. From now on, by instruction we refer to a CIL instruction. Then we describe in detail the implementation of deferred data-flow analysis, together with the API used to obtain the results produced by the analysis. Finally, we introduce the aspect-oriented framework used to apply the domain-specific optimisations based on the results from the deferred data-flow analysis.

4.1 Assembly analysis

In this section we give a brief introduction to the library used to retrieve instructions from assemblies. Then we explain how to obtain the instructions from assemblies using the library, and the structure of the classes used to represent those instructions. In order to simplify the deferred data-flow analysis, we show how to group the instructions into instruction blocks. We then demonstrate how to use those instruction blocks to do the control-flow analysis. At the end of this section we will have enough tools to implement the deferred data-flow analysis.

4.1.1 The Adon library

The Adon library was implemented by Kwok Cheung Yeung at Imperial College London. It was designed as part of a project to fully port Veneer
from the Java world to the .NET world. Due to the lack of good open-source tools for analysing assemblies, Yeung decided to implement his own. During the work on this project we were able to obtain a working version of this library, although the library is still not freely available. Since no documentation for the Adon library was available at the time of writing this report, we have decided to describe the parts of the library that were of use in this project.

The library makes heavy use of the reflection classes from the .NET framework, namely System.Reflection and System.Reflection.Emit. To load an assembly in order to be able to operate on it, the CodeDatabase class from the Adon library needs to be used. The static method GetAssembly() from the CodeDatabase class is used to load the assembly whose name is passed as the argument. After the assembly has been loaded, we can use the method GetMethod() to obtain the properties of a specific method (e.g. its return type) together with the method's instructions. Since the user decides which methods are candidates for the optimisation, we only obtain the properties of the methods specified by the user. In the next two subsections we elaborate on the classes used to represent methods and instructions.

Structures

The main class in Adon is the abstract class AnnotatedBase, and all other classes extend it. The purpose of this class is to manually add additional information to methods and instructions. For example, using this class we are able to add to an instruction the list of variables that the instruction defines. Methods are represented by the abstract class ILMethodBase.

Figure 4.1: Classes used to represent methods from an assembly

This class extends the abstract class ILMember, which in turn extends the abstract class AnnotatedBase. There are two different types of methods: a constructor, represented by the ILConstructor class, and a regular method, represented by the ILMethod class.
Both of those classes extend
the abstract class ILMethodBase. The class diagram for the classes described above is given in Figure 4.1. Because we are only going to optimise methods that are specified by the user, the class of main interest is the ILMethod class. From Figure 4.1 we note that we are able to obtain the list of a method's instructions, because this list is stored in the class ILMethodBase, which ILMethod extends. The instruction list of a method is stored in the class InstructionList, which is an improved implementation of ArrayList. Therefore, to access the individual instructions of a method we can use the standard array operations.

Instructions

All representations of instructions in the Adon library extend the base abstract class InstructionBase. The InstructionBase class extends the AnnotatedBase class and implements ICloneable. Therefore it is possible to manually add additional information to each instruction. The information that we are planning to add to each instruction is the set of results from the data-flow analysis. Also, in the Adon library there exists a class CompoundInstructions that extends the InstructionBase class and is used to group instructions. In implementing our instruction blocks we will be extending this class. Since CompoundInstructions indirectly extends the AnnotatedBase class, we can also manually add additional information to the instruction blocks.

The power of Adon lies in the fact that each Common Intermediate Language instruction is represented by a class. For example, the unconditional branch instruction br is represented by the class Br. Thus we can check for instructions in the instruction list simply by testing whether an instruction is an instance of the class of instruction that we are looking for.

4.1.2 Instruction blocks

In the previous section we introduced the Adon library and explained the structure of the classes that are used throughout the implementation of this project.
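The instance-test idiom can be sketched as follows. The Adon classes are stood in for here by minimal local definitions, so only the pattern itself (testing the runtime type with `is`) reflects the library.

```csharp
using System.Collections.Generic;

// Minimal stand-ins for Adon's instruction classes (illustrative only).
abstract class InstructionBase { }
class Br : InstructionBase { }     // unconditional branch
class Ldloc : InstructionBase { }  // load a local variable

static class BranchScan
{
    // Count the branch instructions in a list by testing runtime types,
    // the same way branch statements are located during fragmentation.
    public static int CountBranches(IEnumerable<InstructionBase> instructions)
    {
        int count = 0;
        foreach (var instr in instructions)
            if (instr is Br)
                count++;
        return count;
    }
}
```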
In this section we show how to use these classes to create the instruction blocks.

The virtual execution system of the .NET framework is a simple stack-based machine that consists of an instruction pointer, a stack pointer, an array of arguments and an array of locals. This virtual machine differs from most other virtual machines, which put arguments and locals in the same array. Since the virtual machine for the .NET framework is stack-based, a single statement from a high-level programming language will be represented by several instructions in the Intermediate Language. For example, consider the following simple statement from a high-level language,
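The example that followed this sentence is lost in this transcription; the statement below is a representative reconstruction of the kind of example meant, not the author's original listing, and the CIL shown in the comments is approximate.

```csharp
static class Example
{
    public static int Add(int a, int b)
    {
        // This single statement...
        int c = a + b;
        // ...is compiled to several CIL instructions, roughly:
        //   ldarg.0   // push a
        //   ldarg.1   // push b
        //   add       // pop both operands, push a + b
        //   stloc.0   // pop the result into the local c
        return c;
    }
}
```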