Generating Highly Customizable SQL Parsers
←
→
Page content transcription
If your browser does not render page correctly, please read the page content below
Generating Highly Customizable SQL Parsers Sagar Sunkle, Martin Kuhlemann, Norbert Siegmund, Marko Rosenmüller, Gunter Saake School of Computer Science University of Magdeburg 39106 Magdeburg, Germany {ssunkle,mkuhlema,nsiegmun,rosenmue,saake}@ovgu.de ABSTRACT treatment of database technology in different applica- Database technology and the Structured Query Language tions. Characteristics of SQL like data access, query ex- (SQL) have grown enormously in recent years. Applications ecution speed, and memory requirements differ between from different domains have different requirements for using application domains like data warehouses and embed- database technology and SQL. The major problem of current ded systems. Various researchers have shown the need standards of SQL is complexity and unmanageability. In this for configurability in many areas of DBMS [6, 13, 17]. paper we present an approach based on software product line We therefore need the ability to select only the func- engineering which can be used to create customizable SQL tionality we need in database products in general and parsers and consequently different SQL dialects. We give an SQL in particular. overview of how SQL can be decomposed in terms of fea- Software product line engineering (SPLE ) is a soft- tures and how different features can be composed to create ware engineering approach that considers tailor-made parsers for SQL. aforementioned issues of products of similar kind made for a specific market segment and differing in features. A set of such products is called a software product line General Terms (SPL) [18]. In SPLE, products are identified in terms Design, Languages of features which are user-visible characteristics of a product [12, 9]. Decomposing SQL in terms of fea- Keywords tures, that can be composed to obtain different SQL Tailor-made Data Management, Embedded Systems, dialects, can be beneficial and insightful not only in Feature-oriented Programming managing features of SQL itself but also in database technology of embedded and real time systems. A cus- 1. INTRODUCTION tomizable SQL and a customizable database manage- ment system is a better option than using conventional Since its modest beginnings in the 70’s, database tech- database products for embedded systems. Embedded nology has exploded into every area of computing. It is and real time systems have different characteristics and an integral part of any modern application where dif- performance requirements than the general computing ferent kinds of data need to be stored and manipulated. systems due to restricted hardware and frequent use of All major database technology vendors have adopted a certain kinds of queries such as projection and aggrega- very general approach, combining functionality from di- tion [6]. Embedded systems like various hardware ap- verse areas in one database product [8]. Likewise, Struc- pliances, peer-to-peer, and stream based architectures tured Query Language (SQL), the basis for interaction for embedded devices use shared information resources between database technology and its user, has grown and require declarative query processing for resource enormously in its size and complexity [6, 13]. Start- discovery, caching and archiving, etc. [13]. Query pro- ing with the standard selection-projection-join queries cessing for sensor networks requires different semantics and aggregation, SQL now contains a number of ad- of queries as well as additional features than provided ditional constructs pertaining to new areas of comput- in SQL standards [14]. A feature decomposition of SQL ing to which database technology has been introduced can be used to create a ‘scaled down’ version of SQL ap- [8]. All major database vendors conform to ISO/ANSI propriate for such applications, by establishing a prod- SQL standards at differing levels, maintaining the syn- uct line architecture for SQL variants. tax suitable for their products, thus further increasing In this paper, we propose that SPL research is ca- the complexity of learning and using SQL. Require- pable of providing answers to the problems of manag- ments like performance, tuning, configurability, etc., ing features in database products such as SQL engines. differ from domain to domain, thus needing a different
We present an approach for a customizable SQL parser 2.2 Software Product Line Engineering based on product line concepts and a decomposition of SPLE aims at developing software applications by SQL into features. We show how different features of identifying and building reusable assets in the domain SQL are obtained and how these features can be com- engineering and implementing mass customization in posed to create different SQL parsers. the application engineering [18]. The feature model- ing activity applied to systems in a domain to cap- 2. BACKGROUND ture commonalities and variabilities in terms of features is known as feature-oriented decomposition. Feature- 2.1 Structured Query Language oriented decomposition is carried out in the analysis SQL is a database query language used for formulat- phase of domain engineering. A feature is any end-user- ing statements that are processed by a database man- visible, distinguishable and functional or non-functional agement system to create and maintain a database [21]. characteristic of a concept that is relevant to some stake- The most commonly used SQL query is the SELECT holder [9, 10]. In modeling the features of SQL (specifi- statement which can retrieve data from one or more cally SQL:2003), we take the view of features as the end- tables in a database. This data can be restricted us- user-visible and distinguishable characteristics of SQL. ing conditional statements in the WHERE clause. SE- Feature diagrams [12, 9] are used to model features in LECT can group related data using the GROUP BY a hierarchical manner as a tree. The root of the tree clause and restrict the grouped data with the HAV- represents a concept and the child nodes represent fea- ING clause. It can order or sort the data based on tures. The hierarchical structure of a feature diagram different columns using the ORDER BY clause [15]. indicates that there is a parent child relationship be- SQL contains many statements to create and manip- tween the feature nodes. A feature diagram contains ulate database objects. Since its first standardization various types of features such as mandatory, optional, in 1986, more and more functionality is being included alternative features, AND features, and OR features. in SQL in each subsequent standard. The latest edi- A feature instance is a description of different feature tion of the SQL standard, referred to as SQL:2003, sup- combinations obtained by including the concept node ports diverse functionality such as call level interfacing, of the feature diagram and traversing the diagram from foreign-data wrappers, embedding SQL in Java, busi- the concept. Depending on the type of the feature node, ness intelligence and data warehousing functions, sup- the node becomes part of the instance description [9]. port for XML, new data types, etc. The vast scope of SQL’s functionality has led many 2.3 Feature-Oriented Programming researchers to advocate the usage of a ‘scaled down’ Feature-oriented programming (FOP) [19] is the study version of SQL, especially for embedded systems [6, 13, of feature modularity and how to use it in program syn- 8]. Embedded systems have many hardware limitations thesis [2]. It treats features as first-class entities in de- such as small RAM, small stable storage, and high data sign and implementation of a product line. A complex read/write ratio. Also the applications where embedded program is obtained by adding features as incremental systems are used, e.g., healthcare and bank cash cards, details to a simple program. The basic ideas of FOP need only a small set of queries like select, project, were first implemented in GenVoca [4] and Algebraic Hi- views, and aggregations. A standard called Structured erarchical Equations for Application Design (AHEAD) Card Query Language (SCQL) by ISO considers inter- [5]. AHEAD uses the Jak language, which is a superset industry commands for use in smart cards [11] with re- of the Java language, for feature-oriented programming. stricted functionality of SQL. Some database systems In AHEAD, a programming language and language ex- and SQL engines, distinguished as ‘tiny’, have been pro- tensions are defined in terms of a base grammar and posed to address this issue, e.g., the TinyDB1 database extension grammars. Grammars specific to both the management system for sensor networks. TinyDB con- language and the language extensions are described us- tains TinySQL language for querying sensor networks. ing the Bali grammar specification language. Bali can TinySQL has a limited functionality compared to SQL be used to specify sub-grammars that can be shared such as single table in FROM clause, no column alias and reused. As a grammar specification language, Bali in SELECT clause, and sensor networks specific query allows specifying extra constructs for BNF specifica- constructs such as epoch duration and sample period tion such as labels to name the production rules in a clause. grammar. Grammars written in Bali can be composed While the standardization process shows how SQL to specify language extensions and language combina- has increased in size and complexity in terms of features tions2 . A Bali grammar can import definitions for non- provided, efforts for ‘scaled down’ versions indicate a terminals from other grammars. Syntax extension and need to control and manipulate features of SQL. 1 2 http://telegraph.cs.berkeley.edu/tinydb/ http://www.cs.utexas.edu/users/schwartz/
the corresponding semantic actions are implemented sep- imum requirements of the SQL. Other parts define ex- arately using the Javacc3 parser generator and the Jak tensions. language respectively. Features are treated as collab- SQL statements are generally classified by their func- orations among classes and composed using Jampack tionality as data definition language statements, data and Mixin tools [5]. manipulation language statements, data control lan- guage statements, etc. [15]. Therefore, we arranged 3. CUSTOMIZABILITY FOR SQL the top level features for SQL:2003 at different levels of granularity with the basic decomposition guided by the Customizability for SQL means that only the needed classification of SQL statements by function as found in functionality, such as only some specific SQL statement [15]. The feature diagram for features of SQL:2003 is types, is present in the SQL engine. The concepts of fea- based on the BNF grammar specification of SQL:2003 tures and product lines can be applied to the SQL lan- and other information given in SQL Foundation. guage so that composition of different features leads to We use the BNF grammar of SQL for constructing the different parsers for SQL. We base our approach for cre- feature diagrams based on the following assumptions: ating a customizable parser for • The complete SQL:2003 BNF grammar represents SQL:2003 on the idea of composing sub-grammars to a product line, in which various sub-grammars rep- add extra functionality to a language from the Bali ap- resent features. Composing these features creates proach as stated above. For a customizable SQL parser, products of this product line, namely different vari- we consider LL(k) grammars which are a type of con- ants of SQL:2003. text free grammars. Terminal symbols are the elemen- • A nonterminal may be considered as a feature only tary symbols of the language defined by such a gram- if the nonterminal clearly expresses an SQL con- mar while the nonterminal symbols are set of strings struct. Mandatory nonterminals are represented of terminals also known as syntactic variables [1]. Ter- as mandatory features. Optional nonterminals are minals and nonterminals are used in production rules represented as optional features. of the grammar to define substitution sequences. From • The choices in the production rule are represented a features and feature diagrams perspective, the pro- as OR features. cess of feature-oriented decomposition and feature im- • A terminal symbol is considered as a feature only if plementation is realized in two broad stages [20]. In the it represents a distinguishable characteristic of the first stage, the specification of SQL:2003 is modeled in feature under consideration apart from the syntax terms of feature diagrams. Given the feature diagram (e.g., DISTINCT and ALL keywords in a SELECT of a particular construct of SQL:2003 that needs to be statement signify different features of the SELECT customized, we need to create the LL(k) grammar that statement). captures the feature instance description. This ensures The grammar given in SQL Foundation is useful in that the specific feature which we want to add to the understanding the overall structure of an SQL construct, original specification is included as well. In the second or what different SQL constructs constitute a larger stage, having obtained LL(k) grammars for the base SQL construct. This approach may also be useful in and extension features separately, we compose them to general to carry out a feature decomposition of any obtain the LL(k) grammar which contains the syntax programming language as the grammar establishes the for both the base and extension features. With a parser basic building blocks of any programming language. generator, we obtain a parser which can effectively parse The most coarse-grained decomposition is the decom- the base as well as extension specification for a given position of SQL:2003 into various constituent packages. SQL construct. We have chosen to further decompose SQL Foundation, We add semantic actions to the parser code thus gen- since it contains the core of SQL:2003. Overall 40 fea- erated using Jak and other feature-oriented program- ture diagrams are obtained for SQL Foundation with ming tools, effectively creating a SQL:2003 preproces- more than 500 features. Other extension packages of sor. In the following sections we explain decomposition SQL:2003 can be similarly decomposed. Figures 1 and 2 of SQL:2003 into features and composition of these fea- show the features Query Specification(representing SE- tures. LECT statement) and the feature Table Expression re- spectively. 3.1 Decomposing SQL:2003 To obtain the sub-grammars corresponding to fea- For the feature-oriented decomposition of SQL:2003, tures, we refer to the feature diagram for the given fea- we use various SQL:2003 standards ISO/IEC 9075 - ture. Based on the feature diagram, we create LL(k) (n):2003 which define the SQL language. SQL Frame- grammars for each feature in the feature instance de- work [16] and SQL Foundation [15] encompass the min- scription using the original SQL:2003 BNF specifica- 3 tion for Query Specification (cf. Section 7.12 in SQL https://javacc.dev.java.net/
Query position to create a feature model from which different Specification features constituting the parser may be selected. Such a parser for SQL:2003 can selectively parse precisely those SQL:2003 statements which are represented as features in feature diagrams under consideration. We explain Set Table Select List this with an example. Quantifier Expression Suppose that we want to create a parser for the SE- LECT statement in SQL:2003 represented by the Query Specification feature (cf. Figure 1). Specifically we [1..*] want to implement a feature instance description of ALL DISTINCT Select Sublist Asterisk {Query Specification, Select List, Select Sublist (with cardinality 1), Table Expression} with the Table Ex- pression feature instance description, {Table Expres- Derived sion, From Clause, Table Reference (with cardinality Column 1)}. We would proceed as follows: 1. A feature tree of the SELECT statement presents AS Clause various features of the statement to the user. Se- lection of different subfeatures of the SELECT statement is equivalent to creating a feature in- Figure 1: Query Specification Feature Diagram. stance description. Table 2. To create a parser for these features, we make use Expression of the sub-grammars and token files created dur- ing the decomposition. We compose these sub- grammars to one LL(k) grammar. Similarly, cor- responding token files are composed to a single Where Group By Having Window token file. From Clause Clause Clause Clause Clause 3. Using the ANTLR parser generator, we create the Figure 2: Table Expression Feature Diagram. parser with the composed grammar. The parser code generated is specific to the features we se- lected in the first step. That is, it is capable of Foundation). These sub-grammars are used later dur- parsing precisely the features in the feature in- ing composition to obtain a grammar that can parse stance description for which we created LL(k) gram- various SQL constructs represented by corresponding mars. features. We represent a grammar and the tokens sepa- rately. Accordingly, for each sub-grammar we also cre- Thus composing the sub-grammars for the Query Spec- ate a file containing various tokens used in the grammar. ification (cf. Figure 1) feature which represents an SQL SELECT statement, the optional Set Quantifier feature 3.2 Composing SQL:2003 Features of Query Specification and the optional Where Clause feature of the Table Expression (cf. Figure 2) feature During our work on a customizable parser for SQL:2003, which itself is a mandatory feature of Query Specifi- we found that the tool Balicomposer, which is used cation, gives a grammar which can essentially parse a for composing the Bali grammar rules, is restrictive SELECT statement with a single column from a single in expressing the complex structure of SQL grammar table with optional set quantifier (DISTINCT or ALL) rules. Bali uses its own notation for describing lan- and optional where clause. This procedure can be ex- guage and extension grammars which are converted to tended to other statements of SQL:2003, first mapping LL(k) grammars as required by the Javacc parser gen- features to sub-grammars and then composing them to erator. We instead use the LL(k) grammars with ad- obtain a customizable parser. ditional options used by the ANTLR4 parser generator In composing LL(k) grammars we must consider the in our prototype. In order to create an SQL engine, we treatment of nonterminals and tokens. The treatment also need the feature-oriented programming capability of nonterminals is most involved. We have provided already present in the form of Jak language and Bali composition mechanisms for various production rules related tools Mixin and Jampack. with the same nonterminal, for composition of optional We use the feature diagrams obtained in the decom- nonterminals, and for composing complex sequences of 4 http://www.antlr.org/ nonterminals. Production rules in the grammar may or
may not contain choices for the same nonterminal, e.g., be combined producing various ambiguities. Since we A: B | C | D and A: B. The composition of production are decomposing only SQL, a single language, we can rules labeled with the same nonterminal is carried as use the common approach of employing a separate scan- follows: ner which would be insufficient for their goal. They use Syntax Definition Formalism (SDF) to achieve modu- • If the new production contains the old one, then larity, although the modularity of SDF grammars can the old production is replaced with the new pro- be attributed to scanner-less generalized LR (SGLR) duction, e.g., in composing A: BC with A: B, the parsing mechanism used in MetaBorg [7]. The no- production B is replaced with BC. tion of inheritable grammar modules in SDF also oc- • If the new production is contained in the old one, curs in Bali. We provide the mechanisms of adding, then the old production is left unmodified, e.g., in removing and modifying the production rules in gram- composing A: B with A: BC, the production BC mar but not inheritable grammars at this point. Dis- is retained. ambiguation mechanism of the Metaborg approach consists of priority between productions, reject/prefer • If the new and old production rules defer, then mechanisms for derivations, enforcing associativity for they are appended as choices, e.g., in composing operators. Similar disambiguation constructs are also A: B with A: C, productions B and C are appended present in ANTLR in terms of syntactic and semantic to obtain A : B | C. predicates. We compose any optional specification within a produc- tion after the corresponding non optional specification. A: B and A : B[C] or A : B and A : [C]B can be composed 5. CONCLUSIONS in that order only. Some production rules in the gram- Features of SQL:2003 are user-visible and distinguish- mar may contain complex lists as productions. Com- able characteristics of SQL:2003. The complete gram- plex lists are of the form ‘ [ mar of SQL:2003 can be considered as a product line, ... ]’. In the composer for the prototype, if features where sub-grammars denote features that can be com- to be composed contain a sublist and a complex list, posed to obtain customized parsability. In addition to e.g., A: B and A: B [“,” B] respectively, then these are decomposing SQL by statement classes, it is possible composed sequentially with the sublist being composed to classify SQL constructs in different ways, e.g., by ahead of the complex list. the schema element they operate on. We propose that A feature may require other features for correct com- different classifications of features lead to the same ad- position. Such features constraints are expressed as ‘re- vantages of using the feature concept. We have created quires’ or ‘excludes’ conditions on features. We use the 40 feature diagrams for SQL Foundation representing notion of composition sequence that indicates how var- more than 500 features. Other extension packages of ious features are included or excluded. SQL can be similarly decomposed into features. We have created different prototype parsers by composing 4. RELATED WORK different features. Currently we are creating an imple- Extensibility of grammars is also subject to current mentation model and a user interface presenting vari- research. Batory et al. provide an extensible Java gram- ous SQL statements and their features. When a user mar based on an application of FOP to BNF gram- selects different features, the required parser is created mars [3]. Our work on decomposing the SQL grammar by composing these features. Although SPLE and FOP was inspired by their work. Initially, we tried to use concepts have been applied to programming language the Bali language and accompanied tools to decompose extension as in Bali, our approach is unique in that the SQL grammar. The language extension approach of these concepts have been applied to SQL:2003 for the Bali clearly separates the syntax extension and the im- first time to obtain customizable SQL parsers. Modu- plementation of semantic actions. However, given the larity and disambiguation constructs for given type of complexity of SQL:2003 grammar, we cannot achieve grammars depend largely on the parsers and parser gen- the customizability that we need by using Bali [20]. We erators used. Similarly implementation of semantics for therefore created a new compositional approach based given language depends on the generated parser. Some on ANTLR grammars. parsers represent production rules in the grammar as Bravenboer et al. provide with MetaBorg an ap- methods whereas some other parsers represent these as proach for extensible grammars [7]. Their aim is to classes. We surmise that the best approach will depend extend an existing language with new syntax from a on ease of implementation of semantics. One of our different language. For that reason they need a parsing future aims is to find out what kind of modularity of approach that can handle various context free gram- grammars and what kind of parsing mechanism is most mars simultaneously because multiple languages may suitable for feature-oriented extension of SQL.
Acknowledgments [10] K. Czarnecki, S. Helsen, and U. W. Eisenecker. Norbert Siegmund and Marko Rosenmüller are funded Formalizing cardinality-based feature models and by German Research Foundation (DFG), Project their specialization. Software Process: SA 465/32-1. The presented work is part of the FAME- Improvement and Practice, 10(1):7–29, 2005. DBMS project5 , a cooperation of Universities of Dort- [11] International Organization for Standardization mund, Erlangen-Nuremberg, Magdeburg, and Passau, (ISO). Part 7: Interindustry Commands for funded by DFG. Structured Card Query Language (SCQL). In Identification Cards – Integrated Circuit(s) Cards with Contacts, ISO/IEC 7816-7, 1999. 6. REFERENCES [12] K. Kang, S. Cohen, J. Hess, W. Novak, and [1] A. V. Aho, M. S. Lam, R. Sethi, and J. D. A. Peterson. Feature-Oriented Domain Analysis Ullman. Compilers: Principles, Techniques, and (FODA) Feasibility Study. Technical Report Tools (2nd Edition). Addison Wesley, 2006. CMU/SEI-90-TR-21, Software Engineering [2] D. Batory. A Tutorial on Feature-oriented Institute, Carnegie Mellon University, 1990. Programming and Product-lines. In Proceedings [13] M. L. Kersten, G. Weikum, M. J. Franklin, D. A. of the 25th International Conference on Software Keim, A. P. Buchmann, and S. Chaudhuri. A Engineering, pages 753–754, Washington, DC, Database Striptease or How to Manage Your USA, 2003. IEEE Computer Society. Personal Databases. In Proceedings of the [3] D. Batory, B. Lofaso, and Y. Smaragdakis. JTS: International Conference on Very Large Data Tools for Implementing Domain-Specific Bases, pages 1043–1044. Morgan Kaufmann, 2003. Languages. In Proceedings of the International [14] S. R. Madden, M. J. Franklin, J. M. Hellerstein, Conference on Software Reuse, pages 143–153. and W. Hong. TinyDB: An Acquisitional Query IEEE Computer Society, 1998. Processing System for Sensor Networks. ACM [4] D. Batory and S. O’Malley. The Design and Transactions on Database Systems, 30(1):122–173, Implementation of Hierarchical Software Systems March 2005. with Reusable Components. ACM Transactions [15] J. Melton. Working Draft : SQL Foundation . on Software Engineering and Methodology, ISO/IEC 9075-2:2003 (E) 1(4):355–398, 1992. 5WD-02-Foundation-2003-09, ISO/ANSI, 2003. [5] D. Batory, J. N. Sarvela, and A. Rauschmayer. [16] J. Melton. Working Draft : SQL Framework . Scaling Step-Wise Refinement. IEEE Transactions ISO/IEC 9075-1:2003 (E) on Software Engineering, 30(6):355–371, 2004. 5WD-01-Framework-2003-09, ISO/ANSI, 2003. [6] C. Bobineau, L. Bouganim, P. Pucheral, and [17] D. Nyström, A. Tešanović, M. Nolin, P. Valduriez. PicoDMBS: Scaling Down Database C. Norström, and J. Hansson. COMET: A Techniques for the Smartcard. In Proceedings of Component-Based Real-Time Database for the International Conference on Very Large Data Automotive Systems. In Proceedings of the Bases, pages 11–20. Morgan Kaufmann, 2000. Workshop on Software Engineering for [7] M. Bravenboer and E. Visser. Concrete Syntax Automotive Systems, pages 1–8. IEEE Computer for Objects: Domain-specific Language Society, 2004. Embedding and Assimilation Without [18] K. Pohl, G. Böckle, and F. van der Linden. Restrictions. In Proceedings of the International Software Product Line Engineering : Foundations, Conference on Object-Oriented Programming, Principles, and Techniques . Springer, 1998. Systems, Languages, and Applications, pages [19] C. Prehofer. Feature-Oriented Programming: A 365–383. ACM Press, 2004. Fresh Look at Objects. In Proceedings of the [8] S. Chaudhuri and G. Weikum. Rethinking European Conference on Object-Oriented Database System Architecture: Towards a Programming, volume 1241 of Lecture Notes in Self-Tuning RISC-Style Database System. In Computer Science, pages 419–443. Springer, 1997. A. E. Abbadi, M. L. Brodie, S. Chakravarthy, [20] S. Sunkle. Feature-oriented Decomposition of U. Dayal, N. Kamel, G. Schlageter, and K.-Y. SQL:2003. Master’s thesis, Department of Whang, editors, Proceedings of 26th International Computer Science, University of Magdeburg, Conference on Very Large Data Bases Germany, 2007. (VLDB’00), pages 1–10. Morgan Kaufmann, 2000. [21] R. F. van der Lans. Introduction to SQL: [9] K. Czarnecki and U. Eisenecker. Generative Mastering the Relational Database Language, Programming: Methods, Tools, and Applications. Fourth Edition/20th Anniversary Edition. Addison-Wesley, 2000. Addison-Wesley Professional, 2006. 5 http://fame-dbms.org
You can also read