A Generic Approach of Static Analysis for Detecting Runtime Errors in Java Programs
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
A Generic Approach of Static Analysis for Detecting Runtime Errors in Java Programs Xiaoping Jia, Sotiris Skevoulis Division of Software Engineering School of Computer Science, Telecommunication, and Information Systems DePaul University Chicago, Illinois, U.S.A. E-mail: jia@cs.depaul.edu, sskevoul@cs.depaul.edu Abstract language, which is strongly typed with extensive and This paper presents a generic approach to statically sophisticated runtime checking mechanism. analyze Java programs in order to detect potential er- However, even the most sophisticated compilers rors (bugs). We discuss a framework that supports our have a limited ability to detect potential errors. This approach and carries out the static analysis of Java is due to certain time, resources and performance con- code automatically. Our approach can automatically straints that limit the amount of checking that com- detect potential bugs and report them before the pro- pilers can perform. More serious problems, that can gram is executed. For a Java class, invariants related cause runtime exceptions, such as attempts to access to the category of error under examination are au- null pointers, out of bounds array indeces, etc. can not tomatically generated and used to assess the validity be caught by any compiler yet. Most of these kinds of of variable usage in the implementation of this class. errors are usually caught at runtime. Our approach is distinctive in its emphasis to provide We approach the problem of potential error detec- a practical generic mechanism for error detection that tion, using formal methods. We provide a partial so- is capable of addressing error detection for a variety of lution and a generic mechanism capable of detecting error categories via a web of specialized components. a variety of runtime errors. We apply static analysis A research prototype has been developed that demon- and veri cation techniques, to automatically analyse strates the feasibility and eectiveness of our approach. Java programs and detect potential bugs that can not be detected by compilers and other data ow-based analysis techniques. Our approach is based on the use 1. Introduction of the logical concepts of class invariant and weakest precondition 4]. We developed a generic and deter- Detecting errors in programs has always been one ministic detection mechanism capable of addressing of the most active areas of research 4, 1, 11, 7] in the problem of error detection for a variety of run- computer science. Numerous research eorts have re- time errors. The detection mechanism comprises a sulted in techniques that allow us to reason about pro- set of generic and specialized algorirhms to carry out grams. Concerns about program correctness has led analysis of Java code automatically. It consists of the from the Hoare triples 5], to Dijkstra's weakest pre- following components: condition 1] and currently to complete veri cation environments such as Higher Order Logic (HOL) 2] Generic analysis It can be used to provide the and the Prototype Veri cation System (PVS) 11]. A framework for any kind of analysis that we need. large number of programming notations exists that Specialized analysis. It specializes the behavior use a variety of mechanisms to prevent potential er- of some of the steps de ned in the generic mech- rors from occurring or to notify the user in a safe man- anism. ner if errors do occur. Examples of such mechanisms include strong typing, smart compilers, and runtime Prototype. A fully functional prototype imple- checking. Java 3] is an example of such programming ments the algorithms and provides feedback that
is used to enhance and strengthen our approach generic and specialized components in order to deter- mine invariants related to the speci c analysis that is Our approach uses a generic mechanism for carry- performed. The invariant determination process takes ing out the static analysis of Java programs and pro- into consideration the dierent initialization semantics vides a number of specialized components which take for static and instance variables i.e. static variables into account the idiosycratic details of each category are instantiated during the rst active use 6] of the of errors that we are trying to detect. type (class). The generic algorithms are listed below: The detection mechanism of our approach has cer- tain limitations, mainly due to the intractability of DetermineInvariant. This algorithm is responsi- theorem proving process. We can not guarantee abso- ble for generating and verifying a class invariant. lute success in nding all bugs even for the restricted types of analysis discussed earlier. Our goal is not BreakDownCandInv. It splits the candidate in- to nd all errors rather to nd the majority of them variant into predicates involving static variables in a fully automatic and transparent to the developer and predicates involving instance variables. way. The key characteristics and contributions of our approach could be summarized as follows:: IsInvariant. It checks whether a predicate regard- ing class level variables is an invariant. Generic detection mechanism. Our approach is both speci c enough to detect errors determinis- CalculateWP. (CWP) Given a statement and a tically and generic enough to provide an ecient predicate, it calculates the weakest precondition. and eective framework for an extensive variety Prove. It is a call to the thorem prover that at- of error categories. tempts to discharge proof obligations. Complete automation. The entire process of ana- CheckViolation. It analyzes the implementation lyzing the source code and reporting the potential violations is fully automatic. of the class and based on the class invariant, it check for potential errors. No need for formal specications. We do not re- quire formal speci cations. Indeed, we recover Each one of them uses a number of generic and spe- the relevant speci cations from the source code. cialized algorithms. Specialized analysis is achieved with a number of speci c algorithms that carry out Flexibility. Our approach can be easily extended analysis for speci c categories of bugs. Some of these to incorporate any user provided speci cations algorithms are listed below: which will strengthen the capability of the analy- sis component and will extend its set of potential ConstructCandInv. It constructs a potential in- errors that can eectively check for. variant based on the category of error that we In the following section we discuss the generic check for framework and present the algorithms. Section three Mutate. The goal of the mutation algorithm presents an overview of the prototype tool and its com- is to provide an array of weaker predicates and ponents. Section four discusses the experiments that store them in order of strength starting from the have been carried out using the prototype tool. Re- strongest one (the predicate itself unchanged) go- lated work and comparisons with our approach is pre- ing to weaker ones until the predicate can not be sented in section ve. We discuss some future work mutated anymore and its weakest mutated form is section six and conclude in section seven, with a is the predicate true. summary of main ndings and results. CreateTargets It creates the veri cation condi- tions that will ensure the safe use of variables in 2. The Analysis Framework the Java code. An important concept in our approach is the invari- ant of a class. A class invariant is a condition that is Determining a class invariant involves the construc- satis ed by all non-transient states of the instances of tion of a candidate invariant, its mutation and check- the class. A generic algorithm which can determine ing if it satis es the requirements to be an invariant. an invariant has been developed. It uses a number of
2.1. The DetermineInvariant Algorithm expresses a condition about static variables and one We informally de ne the following concepts: about the all variables. Invariant is broken down to two sets in order to examine seperately the properties Static Invariant is a condition regarding only that do not depend on the instantiation of the Java static variables of the Java class, that is ensured class from the ones that do. Each predicate is exam- by all static initialization blocks and is preserved ined seperately and if it ful lls the requirements of an by all constructors and public methods. invariant, is added to the invariant of the class. The Instance Invariant is a condition regarding all output of the algorithm is the invariant for the Java class level variables of the Java class, that is en- class under examination. sured by all constructors and is preserved by all public methods. 2.1.1. Determining the Invariant P Predicate DETERMINE I NVARIANT ( in JavaClass Class) We need to establish a condition (i.e invariant) that 1. /* Initialization */ is true for the relevant variables every time a class is P Predicate Invariant := ? instantiated. The algorithm described below is used P Predicate CandInstInv := ? to determine invariants related to both static and in- P Predicate CandStaticInv := ? stance variables. P Predicate InstanceInv := ? P Predicate StaticInv := ? boolean I S I NVARIANT (in P Predicate Precond, in P Predicate P Predicate PermutCandStaticInv := ? P Predicate PermutCandInstInv := ? Candidate) seq Predicate MutCandStaticInv := hi boolean progress := false seq Predicate MutCandInstInv := hi for each ci 2 InstanceConstructionBlock do 2. /* Constructing candidate static and instance invariants*/ progress := CandInstInv := C ONSTRUCT C PROVE (Precond ) CWP (ci Candidate)) ANDINV (Class CandStaticInv) PermCandStaticInv := BREAKDOWN C if : progress then ANDINV (CandStaticInv) break PermCandInstInv := BREAKDOWN C ANDINV (CandInstInv) end if 3. /* Determine invariant related to static variables */ end for for each predst 2 PermCandStaticInv do if progress then MutCandStaticInv := M UTATE (predst ) for each mutpred in MutCandStaticInv do for each mj 2 Methods do if I S STATIC I NVARIANT (mutpred ) then progress := PROVE (Candidate ) CWP (mj Candidate)) StaticInv :=StaticInv fmutpred g if :progress then break end if break end for end if end for end for 4. /* Determine invariant of instance variables */ return progress for each predinst 2 PermCandInstInv do MutCandInstInv := M UTATE (predinst ) for each mutpred in MutCandInstInv do Due to the dierent instantiation semantics be- if I S I NVARIANT (mutpred ) then InstanceInv :=InstanceInv fmutpred g tween static and instance variables, we actually run break the algorithm with dierent inputs and each time we end if consider dierent instantiation blocks. Speci cally for end for the static invariant determination the static blocks end for 5. /* The invariant found */ must ensure the potential invariant while for instance Invariant := StaticInv InstanceInv invariant we consider init blocks and constructors. return Invariant The precondition that holds before every instantia- tion of a class is the StaticInvariant . In the case that DetermineInvariant algorithm accepts as input a the class has no static variables, the precondition is Java class and returns a predicate that is satis ed by reduced to the predicate true. We use this as a Pre- all non transient instances of this class. It reads in condition input in our algorithm along with the can- the class level variables both static and non static didate invariant, Candidate. An invariant regarding and forms a candidate invariant. It breaks down instance variables has to be ensured by constructors the formed invariant into two predicates: one that and preserved by each public method.
2.2. The Check Violation Algorithms P Predicate Targets := ? P Variable ProblematicVars := ? The algorithms presented in the previous section 2. /* Check Java block of code for bugs with a given precondi- determine an invariant with regard to a speci c prop- tion */ erty that we try to ensure. We can now check the for each statement si 2 Block do usage of each relevant variable to detect any possible ProcessedStatements := ProcessedStatements a hsi i Targets := C REATE TARGETS (si ) violations. if Targets 6= ? then for each pred 2 Targets do void aMethod(...) { correct := PROVE (Precondition ) fInvariant g the pre-condition CWP (ProcessedStatements pred)) // ... other statements if : correct then f:isNull (v )g the assertion we attempt to prove ProblematicVars := ProblematicVars fvar g end if v.m(...) the dereference end for // ... other statements StatementsWithBugs := StatementsWithBugs } fsi ! ProblematicVars g In order to do that we provide an algorithm that end if end for scans though the implementation of the class, locates all the relevent variables and forms the appropriate 3. /* Return potential bugs found in the block of code */ veri cation conditions. return StatementsWithBugs P StatementBugs CHECK CLASS V IOLATION ( in JavaClass Class, in P Predicate Invariant) 3. Prototype Development The main components of the prototype are listed 1. /* Initialization */ below: P StatementBugs StatementsWithBugs := ? 2. /* Check Static Block for potential bugs. Precondition is: true */ Model constructor The modeling activity de- StatementsWithBugs := StatementsWithBugs pends on an extensive set of classes that provide C HECK V IOLATION (StaticBlock , true ) support for every Java programming construct, i.e 3. /* Check Constructors for potential Bugs. Precondition is: class, method, block of code, statements, types, StaticInvariant */ etc. An abstract syntax tree is created and con- for each cj 2 InstanceConstructionBlock do trol ow graph of the program is constructed. StatementsWithBugs := StatementsWithBugs Based on the control ow graph model all the C HECK V IOLATION (cj , StaticInvariant) end for paths of execution are identi ed. 4. /* Check public Methods for potential Bugs. Precondition is: Invariant */ Invariant generator It uses the generic and cus- for each mj 2 Methods do tomizable algorithms to derive an invariant re- StatementsWithBugs := StatementsWithBugs garding the speci c property that is under inves- C HECK V IOLATION (mj , Invariant) tigation. end for 5. /* Return the pstatement with potential violations */ Violation detector Given the invariant from return StatementsWithBugs the invariant generator, it scans through class im- plementation to form the approprate veri cation conditions, and passes them to the prover. In order to identify potential errors, we need to carefully check every statement in the implementation In general, theorem proving that involves unre- of the class, locate the points that these variables are stricted predicates is intractable. Our approach has used, form the appropriate predicate that needs to be focused on the use of such restricted predicates. Ex- true in order for the variables to be safely used by the amples of such restricted predicates are the following: program at the point of execution. This is shown in a general form below: Checking for illegal dereference of variable of ref- P StatementBugs CHECK V IOLATION ( in JavaBlockCode Block, erence type myvar. We can formulate the predi- in P Predicate Precondition) cate :isNull (myvar ). 1. /* Initialization */ Checking for array index falling out of bounds seq Statement ProcessedStatements := hi while accessing the ith element of myarrayi]. We boolean condition := true boolean correct := false formulate the predicate i < myarray :length .
4. Experiments Error null pointer Scenario class level var not initialized We assess the capabilities of our approach by con- array bounds array initialized with size , where size 0 ducting a number of small experiments. The results null pointer class level var becomes null at arbitrary are encouranging and show that the prototype can array bounds array created and accessed in local scope serve as a vehicle for further research in the area. null pointer var is dereferenced under two condition blocks Our rst goal was to demonstrate the feasibility of array bounds class level array initialized with int constant: a) remains constant the approach. Small scale projects have been con- b) may change in various constructors ducted and the results have been evaluated. Our re- null pointer null var dereferenced in an unexecutable path search eort is experimental in nature. Its success can array bounds array access guarded by arr.length be judged through experiments rather than theoreti- null pointer var initialized but randomnly becomes null cal proofs and analyses. Experimentation is used as null pointer var becomes null before end of loop iteration null pointer var not initialized but dereferenced is guarded a feedback mechanism to our theoretical studies and solutions. The goal is to gain and demonstrate eec- Figure 1: Experimental Scenarios tiveness, through experimentation. The experimental prototype allows us to gain insight, discover obstacles and limitations of our techniques. We use the feedback scenarions are shown in Fig. 1. Prior to any testing all given by the tool to re ne and enhance our approach. test cases were successfully compiled to ensure syntac- We conducted a large number of tests and the proto- tic correctness of the code. For each of the classi ed type tool exhibited the following key characteristics: cases we performed complete set of permutations un- der dierent scenarios to measure the eectiveness of Complete Java coverage. We were able to cover the approach. Our prototype was capable of nding the entire Java language. No syntactical or se- over 80% of the errors in each scenario. In summary, mantical restriction on the original language has the early experiments are encouranging. The proto- been imposed. type is still under development and more extensive No guarantee of program correctness. Our ap- case studies are planned. proach does not promise absolute correctness or absence of errors. It rather strives to identify cer- 5. Related Work Flow control based techniques have been used suc- tain types of potential errors cessfully as part of modern compilers such as Java, Application of formal methods Our approach to provide early detection of a various anomalies and promotes the use of formal methods in a way minimize the number of inconsistencies in the pro- completely transparent to the user. We can grams. Unfortunately these techniques are not suc- actually apply formal methods with little cost cessuful in dealing with invariant properties, pre/post but concrete gains. conditions and cannot prevent a number of tedious bugs from occurring. Such bugs include dereferencing Concerete gains. The prototype tool under devel- a null pointer or array index falling out of bounds, opment provides the means for measuring the ef- etc. These kinds of problems are beyond the capabil- fectiveness and capability of our approach. Early ities of compilers and subsequently are transfered to experiments indicate that: the runtime environment where some languages oer exception handling mechanisms. Our approach does not introduce any radi- More recently, research work at DEC laboratories cal changes in the way practitioners develop has resulted in Extended Static Checking (ESC) 8, 10] their code. and checking object invariants 9]. They attempt to It removes the burden of proof from the soft- use formal methods to identify particular kinds of bugs ware practitioners in a programming language and provide also some The analysis is fully automatic kind of feedback to the programmer about those po- tential bugs. ESC translates progrmas into a an ab- We have developed a classi cation scheme for the stract form based on Dijkstra's guarded command. dierent causes of errors in both kinds of specialized Their logical framework is untyped rst order pred- analysis discussed in this paper: null pointer and array icate calculus and the lack of type information for bounds. For each kind of anomaly we analyzed and the proofs is covered by the background predicate 9] enumerated the causes and the scenarios under which which includes type axioms. An unsuccessful proof a violation will be occur in the program. Some of those return information about the reason of its failure.
The work presented in this paper is a generalisa- References tion and extension of the work 12] on the null pointer 1] E. Dijkstra. Guarded commands, nondetermi- detection which carried out by the members of For- nacy and formal derivation of program. Com- mal Methods group at Software Engineering Division munications of ACM, 18(8):453{458, 1975. at DePaul University. 2] M. Gordon and T. Melham. An Introduction to 6. Future Work HOL: A theorem proving environment for higher The work presented in this paper lays out the foun- order logic, 1993. dation of a comprehensive analysis of Java classes. 3] J. Gosling, B. Joy, and G. Steele. The Our work will evolve in three main directions in or- JavaTM Language Speci cation. Addison-Wesley, der to provide: 1996. Stronger analysis techniques which will cover 4] D. Gries. The science of Programming. Springer- more complicated cases within the already de- Verlag, 1981. ned categories of potential bugs 5] A. Hoare. An axiomatic basis for com- An extensive array of algorithms capable of han- puter programming. Communications of ACM, dling dierent cases 12(10):576{580, 1969. Broader coverage of kinds of errors that can be 6] B. J. Gosling and G. Steele. Java speci cation automatically checked. These include but not language. Technical report, 1996. available from http://www.javasoft.com. limited to: 7] D. Jackson. A formal Speci cation Language for { illegal downcasting Detecting Bugs, 1992. PhD Thesis, Massachusetts { string index out of bounds Institute of Technology. 8] D. l. Detlefs. An overview of the extended static 7. Conclusion checking. In Proc. The First Workshop on Formal Usually, checking for errors is not one of the rst Methods inSoftware Practice, pages 1{9, 1996. priorities of a software developer. Sometimes errors ACM-SIGSOFT. even escape detection during testing and products are released to customers with a number of bugs hidden 9] K. R. M. Leino. Ecstatic: An object-oriented pro- in the code. Our approach provides a solution to this gramming language with axiomatic semantics. In problem by focusing on the detection of certain kinds Proc. FOOL4, 1997. Fourth International Work- of bugs. The approach and algorithms described in shop on Foundations of Object-Oriented Lan- this paper are dealing with the concept of class in- guages. variant and based on that, check the implementation 10] K. R. M. Leino and R. Stata. Checking object of the class. The implication is that for an instance invariants. Technical report, Digital Equipment created by any public constructor of the class, any Corporation Research Center, 1997. Palo Alto, public (thread-safe) method can be invoked in any or- CA. der. In our work, we address some of the obstacles that prevent formal methods from being widely used 11] N. S. S. Owre, J. Rushby. PVS: A prototype in software industry. We make the use of formal meth- veri cation system. Lecture Notes in Arti cial ods and theorem proving completeley trasnparent to Intelligence, 607:748{752, 1992. the software practitioner. By sacri cing guarantee of absolute corectness and lack of errors, we provide a 12] S. Sawant. Applying static analysis for detecting mechanism to detect an arguably signi cant number null pointers in java programs. Technical report, of common errors, that are part of the daily debugging Oct. 1998. MS Thesis. routine of every software practitioner. Our approach focused on certain types of anomalies in the source code that can be detected automatically. The key feature which makes our approach practical is its complete automation which entails practicality.
You can also read