Lock-and-key Strategies for Handling Undefined Variables
SOFTWARE—PRACTICE AND EXPERIENCE, VOL. 23(7), 693–710 (JULY 1993)

Richard B. Borie and Allen S. Parrish
Department of Computer Science, University of Alabama, Box 870290, Tuscaloosa, AL 35487-0290, U.S.A.

and

Srinivas Mandyam
BusLogic, Inc., 4151 Burton Drive, Santa Clara, CA 95054, U.S.A.

SUMMARY

Most programming languages ignore the problem of undefined variables and permit compilers to allow the leftover contents of the memory cells belonging to such variables to be referenced. Although efficient, this type of semantics does not support software engineering, because defects can arise in subtle ways and can be difficult to locate. Two other types of semantics better support software engineering: initialize all variables to some default, e.g. zero, or treat all references to undefined variables as errors. However, both are obviously more expensive than simply ignoring undefined variables entirely. In this paper, we propose a simple technique that works equally well for both of these latter two types of semantics, and whose efficiency compares favorably with more traditional implementations for certain realistic programs. Furthermore, we provide a mechanism for using this technique through Ada implementations of two abstract data types whose undefined variables respectively exhibit these two types of semantics. These abstract data types allow our technique to be used selectively, in strictly those situations where its cost is justified. We provide practical examples illustrating these situations.

KEY WORDS: Undefined variables; Programming languages; Software engineering; Algorithms; Language design; Sparse arrays

INTRODUCTION

The method of handling variables that are referenced before being defined represents one of many decisions that have to be made by programming language designers. At the semantic level, there are three approaches:1

I1. Allow values of undefined variables to be referenced. This means that bit patterns that happen to be 'leftover' in memory from previous executions could be used by the program.
I2. Initialize all variables to default values, e.g. zero for integers, blank for characters, etc.
I3. Define rules prohibiting references to undefined variables.

0038–0644/93/070693–18$14.00 © 1993 by John Wiley & Sons, Ltd. Received 10 August 1992; revised 5 January 1993
I1 effectively means that the whole issue is ignored: undefined variables are treated just like any other variable. Among these semantics, I1 can be 'implemented' most efficiently, since no implementation is actually required. Consequently, I1 is the semantics most frequently adopted by language designers.1,2 However, I1 is generally acknowledged to be the most problematic semantics in terms of software engineering.1–5 For example, both Reference 2 and Reference 3 observe that I1 can hide defects during testing. Consider a program containing a counter that was not properly initialized to zero at the beginning of the program. If the counter happens to contain a zero coincidentally during the testing process, testing might not reveal any defects. But if the program later executes in an environment where the counter coincidentally starts with something other than zero, a defect might then appear.

This example seems to offer some support for I2 or I3, despite their additional implementation costs. Between these two approaches, there is some disagreement as to which one better supports software engineering. Some authors, e.g. Meyer1 and Weide, Ogden and Zweben,4 argue in favor of I2, because well-chosen initialization values simplify the programmer's task. On the other hand, Reference 2 argues in favor of I3, because even with well-chosen initialization values, there can be cases where a default initialization value was not really intended to be used. In such cases, the programmer might have made a mistake either in failing to properly assign the variable, or in referencing a variable that should not have been referenced. As an example from Reference 2, consider a program that is supposed to find the minimum value in an array (1..N) of natural numbers that has been filled by the programmer in positions (1..i), where i is allowed to be strictly less than N. If the program is 'off-by-one' and accidentally looks at position i+1 in searching for the minimum, and I2 semantics have been used to initialize the array to zeros, then the program could conceivably report that the 0 in the (i+1)th position is the minimum. Yet the (i+1)th position is irrelevant, because it is not part of the data set. Depending on the context, this error might be difficult to find during testing and debugging.

Regardless of whether I2 or I3 better supports software engineering, it is clear that both offer more support than I1, yet both are more expensive than I1. Consequently, reducing the cost of implementing these semantics is a desirable goal. In this paper, we propose a simple technique that works equally well for both of these latter two types of semantics, and whose efficiency compares favorably with more traditional implementations for certain realistic programs. We offer an analysis that concretely identifies the types of programs for which our approach outperforms existing implementations. Furthermore, we provide a mechanism for using this technique through Ada implementations of two abstract data types whose undefined variables respectively exhibit these two types of semantics. These abstract data types allow the programmer to use our technique selectively, in strictly those situations where its cost is justified.
We provide practical examples illustrating these situations. Although there are advantages to building our technique directly into compilers, implementing it as abstract data types permits it to be used with existing compilers.
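To make the introductory counter example concrete, here is a minimal Ada sketch (our illustration, not from the paper) of the kind of defect that I1 semantics can hide. Under I1, Counter starts with whatever bits are left in its memory cell, so the printed total is correct only when those bits happen to be zero:

   with Ada.Text_IO; use Ada.Text_IO;
   procedure Count_Positives is
      Data    : constant array (1 .. 5) of Integer := (3, -1, 4, -1, 5);
      Counter : Integer;                  -- defect: never initialized to zero
   begin
      for I in Data'Range loop
         if Data (I) > 0 then
            Counter := Counter + 1;       -- references Counter before any definition
         end if;
      end loop;
      Put_Line ("Positives:" & Integer'Image (Counter));  -- correct only by luck
   end Count_Positives;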
EXISTING IMPLEMENTATIONS

In this section, we consider existing, typical implementations of I2 and I3. Our interest is in informally analyzing the run-time requirements of these implementations beyond what is already required by normal program execution, i.e. with I1 semantics. To make our analysis as widely applicable as possible, we assume a typical machine architecture where no special provisions have been made for efficiently implementing these semantics. We also conduct our analysis in the context of local procedure variables in a stack-based language such as Pascal. Thus, although the load image for a program may include a static memory map allowing global variables to be initialized without run-time overhead, our discussion is in the context of local procedure variables, for which run-time initialization overhead is necessary.

For simplicity, we define a variable to be any scalar variable of simple type, such as integer, character, real or boolean, or any elementary component of a structure, such as a single integer element from an array of integers. Also, we use the term program to mean any program unit, such as a procedure or block, for which variables can be separately allocated at initiation and destroyed at termination. We then refer to the collection of all variables in a program as the program's variable space. A variable definition is a statement that assigns a value to the variable, i.e. through assignment or user input. A variable reference is a statement that refers to an existing value of the variable, i.e. through the right side of an assignment statement or through a predicate.

Throughout this work, it is important to classify variable references as follows (a code illustration of the four categories appears below):

1. Variable references for which the compiler can determine whether or not the variable has been defined. These fall into two subcategories:
   (a) references for which no path from the beginning of the program to the reference contains a definition of the variable;
   (b) references for which every path from the beginning of the program to the reference contains a definition of the variable.
2. Variable references for which the compiler cannot determine whether or not the variable has been defined. These fall into two subcategories:
   (a) references for which some, but not all, paths from the beginning of the program to the reference contain definitions of the variable;
   (b) references where the memory location is determined entirely by run-time events. This includes most references to array elements with variable subscripts, and to dereferenced pointers.

The compiler is unable to determine whether references in 2(a) are guaranteed to be defined, because it cannot determine whether the paths containing definitions are executable. A similar determination is impossible for 2(b), because the compiler cannot determine the value of the subscript variable. We note that because of 2(b), variable references falling into categories 1(a) and 1(b) are usually to locations whose relative address can be computed at compile time, such as normal scalar variables.
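The following fragment (our illustration, not taken from the paper; the procedure and all names are invented) shows how each category can arise:

   procedure Categories (Flag : in Boolean) is
      W, X, Y : Integer;
      A : array (1 .. 10) of Integer;
      K : Integer := 7;                -- imagine K actually computed at run time
   begin
      X := W;        -- category 1(a): no path defines W before this reference
      Y := 5;
      X := Y;        -- category 1(b): every path defines Y before this reference
      if Flag then
         W := 1;
      end if;
      X := W;        -- category 2(a): only some paths define W first
      A (3) := 0;
      X := A (K);    -- category 2(b): the referenced location depends on K
   end Categories;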
We first consider I3 semantics, which makes references to undefined variables illegal. Because static analysis can detect references falling into category 1, such references are either guaranteed errors (category 1(a)), which can be flagged at compile time with a fatal error, or guaranteed not to be errors (category 1(b)), which can safely be ignored at run-time. However, since the compiler cannot determine whether variable references in category 2 have been defined, run-time analysis is necessary for variables in this category. In the absence of static analysis to identify variables falling into the first category, such run-time analysis must be performed for all variables.

There are several implementations of run-time analysis for I3.2 Two representative implementations are as follows (the second is sketched in code after this discussion):

1. Undefined values method. There is an 'undefined' value for every type, lying outside the set of legal values for the type. The compiler must generate code to initialize every variable to the undefined value for its type; this code is executed at program start-up time. Then, every time a variable is referenced, if the variable contains the undefined value for its type, an error message is generated. This requires start-up time proportional to the number of variables with references in category 2, plus constant time for every variable reference in category 2.
2. Extra bit method. For every variable, an extra 'defined' bit denotes whether or not the variable has been assigned a value. All of these bits must be set to 'undefined' initially, requiring start-up time proportional to the number of variables with references in category 2. Then constant time is required for every variable reference in category 2 to check the bit, and for every variable definition with a subsequent reference in category 2 to check and/or update the bit. If the bit still has its 'undefined' value when the corresponding variable is referenced, an error message is generated.

Other existing implementations of the run-time analysis component of I3 possess the same general form. Essentially, each performs an initialization, requiring time proportional to the number of variables with references in category 2, to set every variable to an undefined state. Then, as the user code executes, every variable reference is checked to see whether the variable is in its undefined state; if so, an error message is generated.
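As a concrete and deliberately naive sketch of the extra bit method for a single integer array (our illustration, with invented names), note the O(N) pass over the Defined bits at start-up, which is exactly the cost the lock-and-key technique of the next section avoids:

   procedure Extra_Bit_Demo is
      N : constant := 1_000;

      Values  : array (1 .. N) of Integer;
      Defined : array (1 .. N) of Boolean := (others => False);  -- O(N) start-up pass

      Undefined_Error : exception;

      Sum : Integer := 0;

      procedure Set (I : in Integer; V : in Integer) is
      begin
         Defined (I) := True;          -- constant extra work per definition
         Values (I)  := V;
      end Set;

      function Get (I : Integer) return Integer is
      begin
         if not Defined (I) then
            raise Undefined_Error;     -- I3: trap the undefined reference
         end if;
         return Values (I);
      end Get;

   begin
      Set (3, 42);
      Sum := Get (3) + Get (4);        -- Get (4) raises: element 4 was never defined
   exception
      when Undefined_Error => null;    -- expected for this demonstration
   end Extra_Bit_Demo;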
To implement I2 semantics, the compiler must generate additional code to initialize some or all program variables. In the absence of static analysis, all program variables must be initialized. If static analysis is performed to categorize variable references as above, then an optimization is possible: no initialization code need be generated for variables whose every reference falls into category 1(b), since such variables are always guaranteed to be initialized. Either way, this approach requires time proportional to the number of variables being initialized.

An alternative approach to implementing I2 involves generating code to initialize only those variables having references in category 1(a). As pointed out already, it is unnecessary to generate initialization code for variables whose only references are in category 1(b). Rather than initializing variables having references in category 2, one of the strategies for detecting undefined variables at run-time could be employed, except that instead of generating an error, a default value for the type is returned. This approach takes initialization time proportional to the number of variables having references in category 1(a), plus the time required to detect undefined variables in category 2.

All of these implementations of I2 and I3 share a potentially non-trivial initialization cost at program start-up time. Our proposed implementations of I2 and I3, discussed in the next section, eliminate this start-up cost: program start-up can be done in constant time with either semantics. In exchange, our algorithm adds a constant-time overhead to most variable definitions and references. We show that despite this additional overhead, the start-up savings still make our approach worthwhile for certain programs.

NEW IMPLEMENTATIONS FOR I2 AND I3

Our proposed implementations of I2 and I3 both use the same algorithm to detect undefined variables. For I2, a reference to an undefined variable simply returns the default value for variables of that type; for I3, an error is generated whenever an undefined variable is referenced. Our algorithm is very similar to algorithms that have been developed to detect dangling pointers,6 to support constant-time initialization and finalization of arrays,7 and to support a version of dynamic type checking in C.8

The basic idea behind our algorithm is to associate a 'lock' and a 'key' with every variable that could possibly be undefined when referenced, i.e. every variable not eliminated by static analysis. If the lock and key do not match, then the variable is assumed to be undefined, and the appropriate action is performed, i.e. returning a default value for I2 or generating an error for I3. The lock associated with a variable is set to match the key when the variable is defined, thus allowing subsequent access to the variable's actual value.

Although our final implementation eliminates this possibility, we first consider a simplified version of the algorithm that allows undefined variables to possibly go undetected because of a coincidental match between a key and an undefined lock. Specifically, the locks and keys are implemented as follows. The key for a variable is simply its memory address; thus, the key can be computed from the variable without requiring additional space. The lock is an extra memory cell associated with the variable that is large enough to contain its address, i.e. large enough to match its key. The first time a variable is defined, the address of the variable is assigned to its lock, making the lock and key match from that point on. Any subsequent reference to the variable therefore returns the actual value of the variable. By the same token, if the lock and key do not match on a particular variable reference, it can be assumed that the variable is undefined, and either an error message is generated or the default value is returned. More specifically, we have the following steps:

1. Whenever a variable is defined. First, check whether the lock matches the key. If so, do nothing more: the variable has been defined previously. If not, assign the variable's address to the lock, so that the lock and key match for subsequent references.
2. Whenever a variable is referenced. First, check whether the lock matches the key. If so, permit the actual value of the variable to be referenced, as the
variable has been defined previously. If not, then the variable is undefined: for I2, return the default value for variables of this type; for I3, generate an error.

Figure 1 demonstrates the lock-and-key configuration for both undefined and defined variables under this simplified algorithm.

Figure 1. Lock-and-key configurations [figure not reproduced]

As noted above, the problem with this algorithm is that nothing guarantees that a lock and key cannot match coincidentally at the outset, since there is no a priori initialization. Because our objective is constant running time at start-up, we cannot simply take the obvious approach of initializing all of the locks to some undefined state; doing so would also reduce the implementation to a more complicated version of the extra-bit method. Instead, rather than assigning a value directly to the lock as above, we make the lock for variable v (v.Lock) a pointer into an array (LockValues) which contains the actual lock values. Cells of LockValues are allocated uniquely to locks, dynamically, from contiguous locations in the array. A variable, fence, partitions LockValues into its allocated and unallocated portions; fence is initialized to 0 to denote that no locks have been allocated. Then, in order to conclude that variable v has been defined, v.Lock must point at a value in the allocated portion of LockValues, and LockValues[v.Lock] must match the key for v, i.e. its address, v.Key. Figure 2 shows the lock-and-key configuration under this approach for an undefined variable v, and Figure 3 shows the configuration after v has been defined.

Figure 2. Undefined variable v [figure not reproduced]

Figure 3. After the statement v := 25 [figure not reproduced]

Unlike the simpler, but incorrect, version of the algorithm given previously, this implementation requires non-zero start-up overhead, because fence must be initialized to 0. However, this overhead is obviously a very small constant. Figure 4 illustrates the details of the algorithm as an implementation of I3; Figure 5 illustrates the analogous implementation of I2. Note that, in either case, all three operations (Initialize, Define and Reference) should be guaranteed to be performed automatically by the language: Initialize is performed automatically at program start-up, whereas Define and Reference are performed at every variable definition and reference, respectively.

Figure 4. Lock-and-key algorithm pseudocode for I3 [figure not reproduced]

Figure 5. Lock-and-key algorithm pseudocode for I2 [figure not reproduced]
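Because Figures 4 and 5 are not reproduced in this transcription, the following self-contained Ada sketch (our names, not the authors' figure) shows the same three operations for a single protected integer array; the I2 variant differs only in the last branch of Reference. Here Initialize is just the default initialization Fence := 0. Reading an uninitialized Locks cell is formally erroneous in strict Ada, but on typical implementations it merely yields leftover bits, which is precisely what the fence test guards against:

   procedure Lock_And_Key_Demo is
      Max : constant := 100;

      Values      : array (1 .. Max) of Integer;
      Locks       : array (1 .. Max) of Integer;  -- deliberately left uninitialized
      Lock_Values : array (1 .. Max) of Integer;  -- deliberately left uninitialized
      Fence       : Natural := 0;   -- 'Initialize': the only start-up work

      Undefined_Error : exception;

      X : Integer := 0;

      function Is_Defined (I : Integer) return Boolean is
         L : constant Integer := Locks (I);       -- may be leftover garbage
      begin
         -- Defined iff the lock points into the allocated part of Lock_Values
         -- and the cell it points at holds this element's key (its index I).
         return L in 1 .. Fence and then Lock_Values (L) = I;
      end Is_Defined;

      procedure Define (I : in Integer; V : in Integer) is
      begin
         if not Is_Defined (I) then
            Fence := Fence + 1;                   -- allocate a fresh lock cell
            Locks (I) := Fence;
            Lock_Values (Fence) := I;             -- lock now matches key
         end if;
         Values (I) := V;
      end Define;

      function Reference (I : Integer) return Integer is
      begin
         if Is_Defined (I) then
            return Values (I);
         end if;
         raise Undefined_Error;   -- I3; an I2 variant would return a default here
      end Reference;

   begin
      Define (7, 42);
      X := Reference (7);         -- succeeds: 42
      X := Reference (8);         -- raises Undefined_Error: never defined
   exception
      when Undefined_Error => null;               -- expected for element 8
   end Lock_And_Key_Demo;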
COMPARISON WITH OTHER IMPLEMENTATIONS

We now compare the execution time of the lock-and-key algorithm with those of existing implementations, for both I2 and I3. As discussed previously, varying degrees of static analysis can be used with either semantics, and the degree of static analysis affects the details of our execution time analysis. For simplicity, we first conduct the analysis as if there were no static analysis, and consider the impact of static analysis afterwards. We let V refer to the size of the variable space for a given program. For a given program execution, we let D and R be the total numbers of variable definitions and references, respectively. The comparison is summarized in Table I. Note that the values in Table I refer to the additional time requirements of the I2 and I3 strategies, beyond what is required for normal execution of the program with I1 semantics.

Table I. Comparison of execution times

   Strategy   Traditional method   Traditional time   Lock-and-key time
   I2         Obvious              O(V)               O(D+R)
   I3         Undefined values     O(V+R)             O(D+R)
   I3         Extra bit            O(V+D+R)           O(D+R)

It is fair to point out that although the lock-and-key approach eliminates the O(V) term from both I2 and I3, its O(D+R) term hides a relatively large constant, e.g. larger than the analogous term in the extra-bit implementation of I3. If we call this larger constant k, and call the constant associated with the O(V) term in traditional methods c, then the lock-and-key approach is superior to traditional implementations of I2 and I3 whenever k(D+R) < cV. Regardless of the exact values of k and c, however, we maintain that there are realistic situations where the average of D+R
over all executions is small enough relative to V to satisfy this relationship. For instance, if an array contributes V = 100,000 cells to the variable space but a typical run defines and references only a few hundred of them, then k(D+R) < cV holds even when k is tens of times larger than c. In particular, a static array that holds variable-size input data, where the worst-case number of data elements is much larger than the average case, constitutes such a situation. We elaborate on some specific applications involving this situation later in the paper.

Adding static analysis does not substantially affect our execution time analysis. In particular, suppose we interpret V, D and R in Table I to refer to their respective quantities after static analysis has eliminated certain variables, definitions and references from consideration at run-time. Provided that I3 produces only warnings at compile time for references in category 1(a), where no path preceding the reference contains a definition, both I2 and I3 must deal with references in categories 1(a) and 2 at run-time, so the analysis in Table I still holds. Alternatively, if I3 flags the references in category 1(a) as errors, which is certainly possible and perhaps desirable, I3 may ignore such references at run-time, whereas I2 must still deal with them, since they do not constitute errors under I2. This latter interpretation would thus make the V, D and R quantities smaller for some programs under I3 than under I2. For either I2 or I3, however, this does not change our basic conclusion concerning the circumstances under which the lock-and-key approach is more appropriate than conventional methods.

IMPLEMENTATION IN ADA

A combination of software engineering and performance concerns might motivate choosing different initialization semantics, and different implementations of those semantics, for different programs. An ideal approach would provide a compiler switch that permits the programmer to choose the appropriate option from Table I for the program at hand. This would permit static analysis to be integrated with run-time checking, possibly exempting certain variables from run-time checking altogether. A second approach would provide user-defined types that exhibit the desired initialization semantics and performance characteristics. This would allow various implementations of I2 and I3 to be introduced into the context of existing compilers that support only I1 semantics, and would still allow the programmer to select the desired semantics by choosing the appropriate type. Although perhaps not as ideal as building our technique into the compiler, providing user-defined types at least permits the lock-and-key technique to be used where modifying the compiler is not an option.

In this section, we illustrate the latter approach by providing user-defined types whose initialization semantics are supported by our lock-and-key algorithm. Because of its data abstraction capability, we choose Ada as the implementation language. Since the lock-and-key approach is most appropriate when the number of variable references is small relative to the size of the variable space, we feel that this approach has the most potential in the context of large structures that are, on average, only partially used. We have therefore chosen to demonstrate the approach with an array type, which we say is protected from undefined references by the lock-and-key mechanism. In this context, the 'key' for a variable, an array element, is just its index.
Note that we could have chosen to ‘protect’ other types of structures in a similar fashion. We actually provide two types: an array type with I2 initialization semantics,
exported by the package in Figure 6, and an array type with I3 initialization semantics, exported by the package in Figure 7. The package I2 requires generic parameters for the type of the array elements and for the default value of that type; the array elements are then, in effect, initialized to the default in constant time before variables of the protected array type are used. The package I3 requires only the array element type as a formal parameter; in this case, each reference to an array element that has
not been defined causes an exception to be raised. Each of these two packages exports Define and Reference operations that must be invoked in the obvious fashion. For example, Define(a,3,10) is invoked to assign the value 10 to element 3 of protected array a, and Reference(a,i) is invoked to reference element i of protected array a. Examples using these operations are given in the next section.

Figure 6. Implementation of I2 in Ada [figure not reproduced]

Figure 7. Implementation of I3 in Ada [figure not reproduced]
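Since Figures 6 and 7 are not reproduced in this transcription, the following generic package is our own compilable approximation of what the I3 package might look like; all names (Protected_I3, Protected_Array, Undefined_Error, and so on) are invented, and the body simply generalizes the sketch of the previous section. An I2 analogue would add a generic formal Default : Element and have Reference return Default instead of raising the exception. Later examples in this section assume these two hypothetical packages:

   generic
      type Element is private;
   package Protected_I3 is
      type Protected_Array (Max : Positive) is limited private;
      Undefined_Error : exception;
      procedure Define    (A : in out Protected_Array; I : in Positive; V : in Element);
      function  Reference (A : Protected_Array; I : Positive) return Element;
   private
      type Int_Vector  is array (Positive range <>) of Integer;
      type Elem_Vector is array (Positive range <>) of Element;
      type Protected_Array (Max : Positive) is record
         Fence       : Natural := 0;           -- the only start-up initialization
         Locks       : Int_Vector (1 .. Max);  -- deliberately left uninitialized
         Lock_Values : Int_Vector (1 .. Max);  -- deliberately left uninitialized
         Values      : Elem_Vector (1 .. Max);
      end record;
   end Protected_I3;

   package body Protected_I3 is

      function Is_Defined (A : Protected_Array; I : Positive) return Boolean is
         L : constant Integer := A.Locks (I);  -- possibly leftover garbage
      begin
         return L in 1 .. A.Fence and then A.Lock_Values (L) = I;
      end Is_Defined;

      procedure Define (A : in out Protected_Array; I : in Positive; V : in Element) is
      begin
         if not Is_Defined (A, I) then
            A.Fence := A.Fence + 1;            -- allocate a fresh lock cell
            A.Locks (I) := A.Fence;
            A.Lock_Values (A.Fence) := I;      -- lock now matches key (the index)
         end if;
         A.Values (I) := V;
      end Define;

      function Reference (A : Protected_Array; I : Positive) return Element is
      begin
         if Is_Defined (A, I) then
            return A.Values (I);
         end if;
         raise Undefined_Error;                -- I3: undefined reference is an error
      end Reference;

   end Protected_I3;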
Either of the packages I2 and I3 can easily be generalized to support protected multidimensional arrays. Figure 8 represents a two-dimensional generalization of the I3 package. The only feature that needs explanation is the allocation of storage for the field LockValues in the record Protected2dimArray. Ada prohibits declaring the size of a field to be an expression involving discriminants, such as Max1*Max2.9 However, using an access type as shown is permitted. (A sketch of this declaration appears with the Figure 8 caption below.)

PRACTICAL APPLICATIONS OF THE LOCK-AND-KEY TECHNIQUE

We first consider an example of a procedure that locates and prints the smallest integer from a collection of integers stored in an array. The number of integers actually input into the array is determined by the value of a parameter called Count. We assume that Count is normally small relative to the size of the (static) array, which has been sized to handle a worst-case data set. We note that even in a language like Ada that supports dynamic arrays, defining a static array with substantial wasted space, which costs little, may still be desirable to reduce execution time.

As observed earlier, this example was used in Reference 2 to argue in favor of I3. In particular, given the error that Count is one greater than the actual size of the data n, I1 will lead to totally unpredictable results due to the random data in position n+1, whereas I2 may lead to the incorrect conclusion that the 0 in the (n+1)th position is the minimum. Although our Ada compiler supports only I1 semantics, our Ada package I3 solves this problem. Figure 9 contains an example of a minimum procedure in which the I3 package is used to detect the error (a sketch appears at the end of this subsection): because the array is protected, an error is generated whenever the (n+1)th position is referenced.

We note that whenever the size of the array is substantially larger than Count, the relationship k(D+R) < cV is satisfied. The values chosen for the size of the array and for Count in our example at least plausibly satisfy this relationship, making this appear to be a case where the lock-and-key technique is more efficient than the conventional implementation of I3. Even if this were not the case, however, the package I3 at least allows us to replace the default I1 semantics of the Ada compiler with I3 semantics, in a situation where I3 semantics is most useful. Future work will involve providing both conventional and lock-and-key Ada package implementations of both I3 and I2, as well as more precise guidelines for choosing the most efficient implementation for a particular situation.

As an aside, with our lock-and-key algorithm I3 is actually no more expensive than I2. This is unlike the case with conventional approaches, where I3 is more expensive: whereas I2 involves initializing all values, I3 involves initializing all values, or their extra bits, and checking them at run-time to determine whether they are defined. Thus, in situations like the above, where I3 is more desirable than I2 on software engineering grounds, no additional expense over I2 is required to implement I3 using locks and keys. Therefore, in cases where k(D+R) < cV holds, and the lock-and-key approaches are superior to conventional approaches, the decision between I2 and I3 can be motivated entirely by software engineering concerns, and not by efficiency.
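Figure 9 ('Ada client: off-by-one error') is not reproduced in this transcription. A minimal client in its spirit, written against our hypothetical Protected_I3 package above, might look as follows; the loop bound Count + 1 reproduces the off-by-one error under discussion, so the first touch of the undefined position raises Undefined_Error instead of silently returning garbage or a default zero:

   with Ada.Text_IO;  use Ada.Text_IO;
   with Protected_I3;
   procedure Find_Minimum is
      package Nat_Arrays is new Protected_I3 (Element => Natural);
      use Nat_Arrays;

      Data  : Protected_Array (Max => 1_000);  -- worst-case size, mostly unused
      Count : constant Natural := 10;          -- actual data size: Count << 1_000
      Min   : Natural;
   begin
      for I in 1 .. Count loop
         Define (Data, I, 100 - I);            -- fill positions 1 .. Count only
      end loop;

      Min := Reference (Data, 1);
      for I in 2 .. Count + 1 loop             -- off by one: should stop at Count
         if Reference (Data, I) < Min then     -- raises Undefined_Error at I = Count + 1
            Min := Reference (Data, I);
         end if;
      end loop;
      Put_Line ("Minimum =" & Natural'Image (Min));
   exception
      when Undefined_Error =>
         Put_Line ("Error: reference to an undefined array element.");
   end Find_Minimum;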
Aside from its ability to perform more efficient detection of errors associated with unintended uses of undefined variables, the lock-and-key technique provides a means, in general, for using an array of size n without requiring O(n) initialization time. For certain problems, it is therefore possible to develop a more efficient algorithm with our protected array abstraction than with either an unprotected array or a linked list data structure. In the remainder of this section, we briefly consider two such problems: maintaining a sparse array and evaluating a recurrence formula.
Figure 8. I3 generalization of a two-dimensional protected array [figure not reproduced]
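Since Figure 8 itself is not reproduced, the declaration issue it illustrates (discussed at the start of the previous section) can be sketched as follows. The names are ours, Integer stands in for the generic element type, and only the storage layout relevant to the workaround is shown; the Define and Reference bodies would mirror the one-dimensional package, treating the pair (I, J) as the key:

   type Int_Vector     is array (Positive range <>) of Integer;
   type Int_Vector_Ptr is access Int_Vector;
   type Int_Matrix     is array (Positive range <>, Positive range <>) of Integer;

   type Protected_2D_Array (Max1, Max2 : Positive) is record
      Fence : Natural := 0;
      Locks : Int_Matrix (1 .. Max1, 1 .. Max2);   -- discriminants used directly: legal
      -- Lock_Values : Int_Vector (1 .. Max1 * Max2);
      --    illegal: an index bound may not be an expression, such as Max1 * Max2,
      --    that combines discriminants
      Lock_Values : Int_Vector_Ptr := new Int_Vector (1 .. Max1 * Max2);  -- legal
   end record;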
A sparse array is an array in which most of the values are zero. Figure 10 gives an Ada implementation in which at most M of the values in an N-element array are accessed. This implementation, which uses the lock-and-key technique to initialize all array elements to a default of 0, requires only O(M) execution time. Traditional implementations, such as explicitly initializing all N elements, or using the extra bit or undefined values methods to trap undefined accesses, require O(N+M) time. Another implementation, a linked list containing all (index, value) pairs whose value co-ordinate is non-zero, yields an O(M^2) time algorithm, since each access may scan the list.
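Figure 10 ('Ada client: maintaining a sparse array') is not reproduced. A sketch in its spirit, assuming a hypothetical I2 analogue Protected_I2 of the package above (same interface, plus a generic formal Default whose value is returned for undefined elements):

   with Ada.Text_IO;  use Ada.Text_IO;
   with Protected_I2;
   procedure Sparse_Demo is
      -- Undefined elements read as the default 0: I2 semantics, O(1) start-up.
      package Sparse is new Protected_I2 (Element => Integer, Default => 0);
      use Sparse;

      N : constant := 100_000;           -- logical array size
      A : Protected_Array (Max => N);    -- no O(N) initialization pass

      Sum : Integer := 0;
   begin
      -- Touch only M = 3 of the N elements; total work is O(M), not O(N+M).
      Define (A, 10, 5);
      Define (A, 4_711, 7);
      Define (A, 99_999, -2);

      Sum := Reference (A, 10) + Reference (A, 54_321);  -- element 54_321 reads as 0
      Put_Line ("Sum =" & Integer'Image (Sum));          -- prints  5
   end Sparse_Demo;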
Therefore, when M is substantially smaller than N, as can be the case for a sparse array, the lock-and-key technique gives the most efficient implementation.

Finally, consider the problem of computing an exact value F(n) given a recurrence such as

   F(k) = c,                          if k <= 1
   F(k) = a1*F(k/b1) + a2*F(k/b2),    if k > 1

A straightforward dynamic programming solution for computing a value F(n) requires O(n) time. Furthermore, assuming without loss of generality that b1 >= b2, the straightforward recursive solution has an execution time satisfying the recurrence T(n) >= 2*T(n/b1), and thus requires at least n^(log_b1 2) time, up to constant factors. These standard techniques therefore each require time that is polynomial, and hence worse than polylogarithmic, in the magnitude of n. However, by combining the lock-and-key technique with the algorithmic techniques known as top-down dynamic programming and memoization,10 a better than polynomial execution time can be obtained. This execution time is proportional to
the number of distinct values k such that F(k) is present in the recursion tree for computing F(n). Again assuming b1 >= b2, this number is O((log_b2 n)^2), so this method yields an O((log n)^2) execution time, which is polylogarithmic, and hence sublinear, in the magnitude of n. An Ada implementation of this algorithm is given in Figure 11.

Note that this example represents an unusual use of I3 semantics. In evaluating the recurrence formula, we wish to test particular array elements to determine whether or not they are defined. Here, however, an undefined array element does not indicate an error, as our previous discussion has implied; rather, it is a signal to perform the recurrence computation. This is similar to ideas from Reference 5, which suggest 'undefined' as a legitimate value for a given type.

Figure 11. Ada client: evaluating a recurrence formula [figure not reproduced]
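Since Figure 11 is not reproduced, here is a sketch of such a client against our hypothetical Protected_I3 package; the constants c, a1, a2, b1, b2 are given arbitrary concrete values, k/b is taken as integer (floor) division, and the package's exception serves as the 'not yet computed' signal. A package that also exported an Is_Defined test would avoid using exception handling for control flow; the authors' actual client may differ:

   with Ada.Text_IO;  use Ada.Text_IO;
   with Protected_I3;
   procedure Recurrence_Demo is
      package Memo_Tables is new Protected_I3 (Element => Integer);
      use Memo_Tables;

      -- Arbitrary concrete constants; the paper leaves c, a1, a2, b1, b2 symbolic.
      C  : constant Integer := 1;
      A1 : constant Integer := 1;
      A2 : constant Integer := 1;
      B1 : constant := 3;
      B2 : constant := 2;   -- b1 >= b2, as assumed in the analysis

      N    : constant := 100_000;
      Memo : Protected_Array (Max => N);   -- constant-time start-up, no O(N) pass

      function F (K : Natural) return Integer is
      begin
         if K <= 1 then
            return C;
         end if;
         return Reference (Memo, K);       -- already computed?
      exception
         when Undefined_Error =>           -- undefined: the signal to compute F(K)
            declare
               V : constant Integer := A1 * F (K / B1) + A2 * F (K / B2);
            begin
               Define (Memo, K, V);        -- memoize for later lookups
               return V;
            end;
      end F;
   begin
      Put_Line ("F(" & Integer'Image (N) & " ) =" & Integer'Image (F (N)));
   end Recurrence_Demo;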
CONCLUSION

In this paper, we have proposed an algorithm for detecting undefined variables in programs. We have shown that this algorithm can be used to implement two different treatments of undefined variables: assigning a default value (I2) or generating an error (I3) whenever an undefined variable is detected. For programs where the average number of variable definitions and references is small relative to the size of the overall variable space, this algorithm is more efficient than classical approaches to implementing I2 and I3. As discussed previously, the underlying technique is remarkably general, and has been used in several disparate contexts.6–8

Although the universe of programs for which this technique is efficient appears to be relatively small, we have provided a flexible implementation in Ada whereby the technique is built into an abstract data type to handle undefined variables of that type. We provided separate ADTs corresponding to I2 and I3 semantics. Thus, the programmer can apply the technique only to those array variables for which it appears useful. In addition, the programmer can choose between I2 and I3 semantics on a per-type basis, even to the degree of treating different undefined variables in two different ways within the same program. Finally, we showed how the technique can be used to speed up classical applications involving large, sparsely populated structures.

There are several areas where future work appears to be needed. Certainly additional work is needed to validate this technique experimentally, in order to determine the extent of its usefulness in practical situations. In addition, appropriate ways to integrate these approaches into language design and implementation should be considered, e.g. through compiler switches that choose a semantics and an implementation of that semantics. Also, issues surrounding the development of the protected array type should be investigated further. In particular, implementation alternatives should be considered that allow the lock-and-key algorithm to be reused independently of the type being protected, perhaps through inheritance of the basic algorithm across a variety of types. In addition, an implementation supporting a more natural client programming style should be sought, in which assignment statements and variable references can be written in the usual notation. Reimplementing the protected array types in a language supporting assignment overloading and inheritance might offer some insight here.

REFERENCES

1. B. Meyer, Introduction to the Theory of Programming Languages, C.A.R. Hoare Programming Series, Prentice-Hall, 1990.
2. W. Kempton and B. Wichmann, 'Run-time detection of undefined variables considered essential', Software—Practice and Experience, 20, 391–402 (1990).
3. A. Parrish, 'Obtaining maximum confidence from unrevealing tests', Proc. IEEE Southeastcon 1992, April 1992.
4. B. Weide, W. Ogden and S. Zweben, 'Reusable software components', in M. C. Yovits (ed.), Advances in Computers, Vol. 33, Academic Press, 1991, pp. 1–65.
5. R. Winner, 'Unassigned objects', ACM Trans. Programming Languages and Systems, 6(4), 449–467 (1984).
6. C. Fischer and R. LeBlanc, 'Implementation of runtime diagnostics in Pascal', IEEE Trans. Software Engineering, SE-6(4), 313–319 (1980).
7. D. Harms and B. Weide, 'Efficient initialization and finalization of data structures: why and how', Technical Report OSU-CISRC-3/89-TR11, The Ohio State University Computer and Information Science Research Center, February 1989.
8. B. Weide, 'OWL programming in C', Technical Report OSU-CISRC-TR-86-9, The Ohio State University Computer and Information Science Research Center, March 1986.
9. Ada Programming Language Reference Manual, ANSI/MIL-STD-1815A-1983.
10. T. Cormen, C. Leiserson and R. Rivest, Introduction to Algorithms, MIT Press, 1990.