How to Build an RDF Based Wiki
Institute of Computer Science
Teaching and Research Unit for Programming and Software Engineering
Advanced Practical Course (Fortgeschrittenenpraktikum)
Walter Christian Kammergruber
Supervising professor: Professor Dr. Martin Wirsing
Advisor: Axel Rausmayer
Submission date: May 2006
Abstract
It is a common fallacy to see a Wiki as just a collection of text documents; it is a network of information. A wiki page is more than just ASCII text. On the one hand, there is a lot of implicit data tangled with descriptive text, and this data often duplicates data stored elsewhere. This duplication leads to consistency problems. On the other hand, there is metadata about Wiki pages (such as their name or author) that currently cannot be managed properly. We show an approach in which this data and metadata are stored and managed by an RDF database. This prevents duplication and allows us to publish different views of the same data. Additionally, because RDF is a widely supported standard, data from external sources can also be added to the database. We also show a new approach to Wiki syntax, a language called WITL. WITL does not use a search-and-replace mechanism for rendering text written in Wiki style; instead, it builds an abstract syntax tree defined by an LL(k) grammar and walks this tree to generate the desired output format.
Contents
1 Introduction
  1.1 What Is Yet Another Wiki Good For?
  1.2 About RDF
    1.2.1 RDF as a Graph
    1.2.2 RDF as Triples
    1.2.3 Further Advantages
2 WITL: A Syntax for the Wiki
  2.1 Wiki Markup
  2.2 WITL Syntax
    2.2.1 WITL Syntax explained by Illustrative Examples
  2.3 Using ANTLR for Parsing WITL Code
    2.3.1 The Lexer
    2.3.2 The Parser
  2.4 Rendering with WITL
    2.4.1 Evaluating the Abstract Syntax Tree
    2.4.2 Defining Functions
3 Wikked Architecture
  3.1 Three Layer Architecture
  3.2 Components
4 Scenario: A Bookmark Collection Web Site
  4.1 Motivation
  4.2 The Wikked Three Layer Model in Practice
    4.2.1 Daniel the Data Manager
    4.2.2 Alice the Author
    4.2.3 Wayland the Web Designer
5 Related Work
6 Future Research
7 Conclusion
1 Introduction
1.1 What Is Yet Another Wiki Good For?
When you write an entry in a Wiki, you just type some lines of text in some Wiki syntax. You may include links to other entries, books, web pages, images, articles, newspapers and so on. But you have no convenient way to access general data like your bookmark collection or your favorite songs. You can simply quote them, but what happens when your taste in music changes? You have to update all your entries. It would therefore be nice to keep the information separate from the text. As an example, we show how you can include your bookmark collection in your Wiki.
In general, you can see our Wiki as a kind of RDF¹ browser. Everything is encoded in RDF, and the Wiki is just a tool to make the information visible. Because RDF is a popular format for managing data structures, you can also include or import RDF-based data from external sources. Such sources could be, for example, the Open Directory Project of the Netscape Communications Corporation [9] and its subprojects for music or restaurants. There is also an increasing number of RDF tools, e.g. for extracting information from web pages. RDF, which forms the bottom layer of the Semantic Web, is a very promising approach for modeling data structures, with an increasing number of applications. The primary goal of our wiki, called Wikked, is to be able to display any data encoded in RDF.
The point of using RDF for managing information is that relations between resources can be stated in a practical way. For example, when an entry links to another one, you can formulate this fact with a statement such as “entry A links to entry B”. Fig. 1 shows a graph representation of such statements for an address book entry. You can access these pieces of information easily by formulating queries, e.g. in a query language like RDQL [15]. Such queries are an elegant way to get the information you want.
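To make the idea of statements and queries concrete, here is a minimal Java sketch. It is not RDQL and not Wikked's query code: statements are stored as plain string triples, and a query is a single triple pattern in which terms starting with "?" are variables. The URIs under http://example.org/ and the "links" predicate are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One statement such as "entry A links to entry B" (hypothetical representation). */
record Statement(String subject, String predicate, String object) {}

/** Matches a single triple pattern; terms starting with '?' are variables. */
class TinyQuery {
    static boolean isVar(String term) {
        return term.startsWith("?");
    }

    /** Binds a pattern term to a value; fails if it contradicts an earlier binding. */
    static boolean bind(String term, String value, Map<String, String> bindings) {
        if (!isVar(term)) {
            return term.equals(value);
        }
        String previous = bindings.putIfAbsent(term, value);
        return previous == null || previous.equals(value);
    }

    /** Returns one map of variable bindings per matching statement, in store order. */
    static List<Map<String, String>> match(List<Statement> store, Statement pattern) {
        List<Map<String, String>> results = new ArrayList<>();
        for (Statement s : store) {
            Map<String, String> b = new HashMap<>();
            if (bind(pattern.subject(), s.subject(), b)
                    && bind(pattern.predicate(), s.predicate(), b)
                    && bind(pattern.object(), s.object(), b)) {
                results.add(b);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        String links = "http://example.org/wiki#links"; // hypothetical predicate URI
        List<Statement> store = List.of(
                new Statement("http://example.org/entryA", links, "http://example.org/entryB"),
                new Statement("http://example.org/entryC", links, "http://example.org/entryB"),
                new Statement("http://example.org/entryA", links, "http://example.org/entryC"));
        // Which entries link to entry B?
        Statement pattern = new Statement("?x", links, "http://example.org/entryB");
        for (Map<String, String> row : match(store, pattern)) {
            System.out.println(row.get("?x"));
        }
    }
}
```

Running main prints http://example.org/entryA and http://example.org/entryC. A real store would index its triples instead of scanning them, and a query language such as RDQL combines several such patterns in one query.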
Storing and querying metadata can become quite complex. There are some hand-made solutions for attaching metadata to Wiki entries, but all of them seem to be mere makeshifts. RDF is highly developed, and many tools are available for it. By expressing the information in RDF, we try to see everything as a node in a network: you can connect or “label” anything with anything. An entry has a creator, and the creator has linked nodes or properties like names, addresses, phone numbers and so on².
Wiki syntax is also a big issue for Wikis. Most Wikis use regular expressions to search and replace patterns, e.g. *bold* is rendered as bold text, and they offer blocks with macros for extra functionality. In WITL, we only distinguish between functions and text. A Wiki document then consists of a sequence of text and functions. A function may in turn contain nested functions and nested text, so the result of parsing a document is an abstract syntax tree. To keep the idea behind traditional Wiki syntax, namely fast-paced editing, we integrated syntactic sugar, which we call wiki markup, e.g. **bold** for bold text. This syntactic sugar is replaced by WITL constructs before the actual parsing of the Wiki document is done. Thanks to the wiki markup, users with no experience in programming languages can use Wikked as well: they can pick out the few functions they want to use and ignore the additional functionality.
On the one hand, WITL is used as a language in addition to the wiki markup. On the other hand, WITL can also be used as a template language for designing the web pages of the wiki itself. It offers dynamic and static ways to use constructs found in ‘normal’ programming languages, but it is designed for working with heterogeneous data, especially text. Because of this approach, Wikked does not depend on template engines such as Velocity from the Apache Jakarta project [1], JavaServer Pages [16] or StringTemplate by Terence Parr [12].
The templates for the pages in Wikked can also be edited in Wikked, because both the templates and the entries are Wikked pages. Because wiki markup is preprocessed, templates can be created quickly without writing much HTML.

¹ RDF stands for Resource Description Framework. For details see Section 1.2.
² See the vCard MIME Directory Profile of the IETF Network Working Group [4] for more examples.

Figure 1: This small RDF graph encodes part of an address book entry. The entry itself is the blank node in the top left corner; it has an edge labeled http://www.w3.org/2000/10/swap/pim/contact#fullName to the literal “John Doe” and an edge labeled http://www.w3.org/1999/02/22-rdf-syntax-ns#type to the resource http://www.w3.org/2000/10/swap/pim/contact#Person. Both edges are drawn with a URI label; strictly speaking, an edge is labeled with a node, which in standard RDF must be identified by a URI (it cannot be a blank node).

1.2 About RDF
For our purposes, we view RDF as a universal data structure that stores graph-based information. Its main virtues are its simplicity and its ability to integrate external data.
1.2.1 RDF as a Graph
Similar to the usual graph definitions, an RDF graph consists of
• Nodes. There are two kinds of nodes:
  – Resources: these are the main building blocks of the graph. If a node should be publicly addressable, it gets a URI as an identifier. If not, it remains anonymous and is called a blank node.
  – Literals: a resource holds at most a URI. In order to add arbitrary data to RDF, one attaches a literal to a resource. Literals can only exist in that role, as the target of an edge. They contain a text string.
• Edges: edges are binary and directed, connecting a source and a target node. Every edge has a node as its type (a label, if you will).
Fig. 1 shows how these basic constructs are used to build an RDF graph. Note how URIs are used as a convenient public namespace for identifying nodes. But they can also be viewed as references to entities on the Web (including the local file system).
By constructing URIs that stand for non-Web entities, one can also put references to real-world “objects” into an RDF graph. For example, creating a resource with the URI mailto:john@doe.com and attaching other nodes to it can mean³ that we are describing the real-world person John Doe.

³ Naturally, this is a matter of semantics, that is, of how a group of people agrees to interpret a particular RDF graph.
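The node and edge kinds of Section 1.2.1 can be sketched as a small Java data model. This is an illustration under our own naming: RdfNode, Resource, UriResource, BlankNode, Literal, Edge and RdfGraph are hypothetical and not Hyena's or Wikked's API. The types enforce the rules stated above: a literal can only be the target of an edge, and a blank node has no URI, so it is distinguished by object identity. Edge labels are typed as URI resources, as standard RDF requires for predicates. The factory method rebuilds the graph of Fig. 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical type names, for illustration only.

/** Any node of an RDF graph. */
interface RdfNode {}

/** Resources can be edge sources; literals cannot. */
interface Resource extends RdfNode {}

/** A publicly addressable resource, identified by its URI. */
record UriResource(String uri) implements Resource {}

/** An anonymous resource: two blank nodes are equal only if they are the same object. */
final class BlankNode implements Resource {}

/** A literal holds a text string and can only be the target of an edge. */
record Literal(String text) implements RdfNode {}

/** A binary, directed edge whose label (type) is a URI resource. */
record Edge(Resource source, UriResource label, RdfNode target) {
    Edge {
        Objects.requireNonNull(source);
        Objects.requireNonNull(label);
        Objects.requireNonNull(target);
    }
}

class RdfGraph {
    final List<Edge> edges = new ArrayList<>();

    void add(Resource source, UriResource label, RdfNode target) {
        edges.add(new Edge(source, label, target));
    }

    /** Rebuilds the address book entry of Fig. 1. */
    static RdfGraph figure1() {
        String contact = "http://www.w3.org/2000/10/swap/pim/contact#";
        String rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
        Resource entry = new BlankNode(); // the entry itself is anonymous
        RdfGraph g = new RdfGraph();
        g.add(entry, new UriResource(rdf + "type"), new UriResource(contact + "Person"));
        g.add(entry, new UriResource(contact + "fullName"), new Literal("John Doe"));
        return g;
    }
}
```

Letting the type system rule out literals as sources and labels keeps the checks out of the graph code; a call such as g.add(new Literal("x"), …) simply does not compile.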
1.2.2 RDF as Triples
Internally, RDF is usually stored as a set of triples. Each triple is an edge and has the components subject, predicate and object. The names of the components already hint at the fact that each triple makes an assertion; that is why triples are also sometimes called statements. The subject is the edge source and must be a resource. The predicate is the edge label and also has to be a resource. The object is the edge target and can be either a resource or a literal. A consequence of RDF knowing only about edges is that there can be no isolated nodes. In practice, this is not a problem, because RDF is so fine-grained that a single resource is rarely enough to express anything meaningful.
The following shows how the RDF triples of the example in Fig. 1 are expressed as plain text. Note that in this format, blank nodes also need an identifier (which is internal only).

    _:1 http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://www.w3.org/2000/10/swap/pim/contact#Person .
    _:1 http://www.w3.org/2000/10/swap/pim/contact#fullName "John Doe" .

1.2.3 Further Advantages
Fortunately, the basics of RDF are simple. Further possibilities such as reification⁴ of statements, schema languages and inferencing are outlined in the RDF Primer [8]. Many aspects of RDF, such as the merging of graphs and multi-dimensionality, make it very useful for software engineering applications [14].

⁴ Reifying a statement means making it available as a resource. Without reification, one can only annotate edge types, but not the instances of a type.
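The plain-text triple format of Section 1.2.2 can be generated mechanically. The following sketch uses hypothetical class and method names and mimics the two example lines rather than implementing a standardized serialization: it writes one “subject predicate object .” line per triple, quotes literals, and assigns each blank node an internal identifier _:1, _:2 and so on, in order of first appearance.

```java
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names; mirrors the plain-text example, not a standard format.
class TripleWriter {
    /** A term of a triple: a URI, a blank node or a literal. */
    interface Term {}
    record Uri(String value) implements Term {}
    /** Blank nodes are compared by identity, so each object is a distinct node. */
    static final class Blank implements Term {}
    record Lit(String text) implements Term {}

    /** The predicate must be a URI and the subject must not be a literal. */
    record Triple(Term subject, Uri predicate, Term object) {
        Triple {
            if (subject instanceof Lit) {
                throw new IllegalArgumentException("a literal cannot be a subject");
            }
        }
    }

    /** Writes one "subject predicate object ." line per triple. */
    static String write(List<Triple> triples) {
        Map<Blank, Integer> ids = new IdentityHashMap<>(); // internal ids _:1, _:2, ...
        StringBuilder out = new StringBuilder();
        for (Triple t : triples) {
            out.append(term(t.subject(), ids)).append(' ')
               .append(term(t.predicate(), ids)).append(' ')
               .append(term(t.object(), ids)).append(" .\n");
        }
        return out.toString();
    }

    private static String term(Term t, Map<Blank, Integer> ids) {
        if (t instanceof Uri u) {
            return u.value();
        }
        if (t instanceof Lit l) {
            // Quote the literal, escaping backslashes and double quotes.
            return "\"" + l.text().replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        }
        Blank b = (Blank) t;
        Integer id = ids.get(b);
        if (id == null) { // first appearance: assign the next number
            id = ids.size() + 1;
            ids.put(b, id);
        }
        return "_:" + id;
    }
}
```

Feeding it the two triples of Fig. 1, with the same Blank object as subject of both, reproduces the two lines shown above.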
2 WITL: A Syntax for the Wiki
Finding the right syntax for a Wiki is very important, because users of the wiki have to use it intensively. You can either resort to an existing rendering engine or create a new one. We tried to use Radeox [5] as a rendering engine, but we quickly reached its limitations. Adding functionality beyond the wiki markup and macros led to awkward constructs and an unclean code design. For example, in Radeox links to other Wiki pages are written with brackets, e.g. [someTitle] creates a link to a Wiki page with the title “someTitle”. We wanted to identify Wiki pages by means other than their title⁵: identifiers should be unique and independent of the title. An identifier that is independent of the title also has the advantage that it stays untouched when the title changes. We wanted to distinguish between two sorts of links: wiki links to unique IDs, and links to titles, which refer either to a set of entries or to one particular entry. In Radeox, that differentiation would only have been possible with workarounds. Because we were unhappy with existing solutions, we finally decided to create a new and simple syntax with a formal semantics. Two syntactic concepts are involved: we differentiate between wiki markup and our wiki language called WITL. The syntax of both languages is described in the following sections.
2.1 Wiki Markup
“A Wiki website is a hypertext on steroids.” (Lars Aronsson [2])
Making wiki entries as easy as possible to write is a key goal of wiki markup, especially because typical HTML source code makes the actual text content very hard to read and edit for most users⁶. Promoting plain-text editing with a few simple conventions for structure and style is therefore advisable. With that in mind, we introduced a simple wiki markup. Wiki markup parsing is a separate text-to-text transformation that is performed before the actual WITL parsing.
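This separate text-to-text step can be pictured as a handful of regular-expression rewrites. The Java sketch below is an assumption, not Wikked's preprocessor: it covers only headings, bold, italic and teletype, and maps them to WITL function applications named h1 to h4, b, i and tt. Only h1 appears in this report as a predefined function (and b as a user-defined one), so these target names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical preprocessor; target function names b, i, tt, h1..h4 are assumptions.
class MarkupPreprocessor {
    // A heading line such as "==== Title ====": 1 to 4 '=' on both sides.
    private static final Pattern HEADING =
            Pattern.compile("^(={1,4})[ \\t]*(.*?)[ \\t]*\\1[ \\t]*$", Pattern.MULTILINE);
    private static final Pattern BOLD = Pattern.compile("\\*\\*(.+?)\\*\\*");
    private static final Pattern ITALIC = Pattern.compile("~~(.+?)~~");
    private static final Pattern TELETYPE = Pattern.compile("''(.+?)''");

    /** Rewrites a subset of the wiki markup into WITL function applications. */
    static String toWitl(String markup) {
        Matcher m = HEADING.matcher(markup);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // "====" is level 1 and "=" is level 4.
            int level = 5 - m.group(1).length();
            String witl = "{h" + level + ":" + m.group(2) + "}";
            m.appendReplacement(sb, Matcher.quoteReplacement(witl));
        }
        m.appendTail(sb);
        String text = sb.toString();
        // Inline markup.
        text = BOLD.matcher(text).replaceAll("{b:$1}");
        text = ITALIC.matcher(text).replaceAll("{i:$1}");
        text = TELETYPE.matcher(text).replaceAll("{tt:$1}");
        return text;
    }
}
```

Because the output is ordinary WITL text, the WITL parser never needs to know about the markup. Lists, tables and escaping of WITL's special characters are left out of this sketch.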
The wiki markup is compatible with WITL, because WITL has very few special symbols. Fig. 2 shows the wiki markup we use.

Figure 2: The wiki markup used in Wikked (left: markup as typed, right: rendered result).

    Markup                                      Result
    ==== Title Level 1 ====                     Title Level 1 (heading, level 1)
    === Title Level 2 ===                       Title Level 2 (heading, level 2)
    == Title Level 3 ==                         Title Level 3 (heading, level 3)
    = Title Level 4 =                           Title Level 4 (heading, level 4)
    **bold**, ~~italic~~, ''teletype''          bold, italic, teletype (formatted)
    (tab)tabs lead to quoted paragraphs         tabs lead to quoted paragraphs (quoted)
    % comments start with a percent sign        (no output)
    for http:                                   for http:
    {http://www.ifi.lmu.de} creates a link      http://www.ifi.lmu.de creates a link
    to "http://www.ifi.lmu.de"                  to "http://www.ifi.lmu.de"
     - indent opens sublist                     • indent opens sublist
       and this text is in the                    and this text is in the
       same item                                  same item
     + numbered sublist at same level           1. numbered sublist at same level
    Paragraphs are separated by blank lines     Paragraphs are separated by blank lines

    This is a new paragraph.                    This is a new paragraph.
    : Definition list                           Definition list
      A list with terms                           A list with terms
    : Start term with colon                     Start term with colon
    || Table Heading | Table Heading            Table Heading | Table Heading
    | one |two |                                one           | two
    | three |four |                             three         | four
    {wikked.link:http://page1.com^page one}     page one
        creates a link to "http://page1.com" but displays "page one"
    The second argument can be left out:
    {wikked.link:http://page1.com}              a link to "http://page1.com"

2.2 WITL Syntax
Our syntax emerged from aesthetic and practical considerations. We started by putting a link in curly brackets, e.g. {http://www.ifi.lmu.de} or {ftp://ftp.leo.org}. A URI (see RFC 3986 [3] for details) is defined as a scheme name followed by a colon and further characters, so it is straightforward to view the scheme as a function name and the characters after the colon as an argument. You can consider this construct as a function with the signature function(String schemeName, String arg); e.g. {http://www.ifi.lmu.de} stands for function("http", "//www.ifi.lmu.de"). In most cases a function needs several arguments rather than one, so we defined a separator between arguments. We chose “^”, because “^” is not allowed in a URI, is rarely used in text and is not needed for the wiki markup. WITL also has blocks of unparsed text. Unparsed blocks are convenient for writing longer passages in which reserved characters may occur, without having to escape every single one of them. These verbatim text blocks are surrounded by triple square brackets, e.g. [[[ some unparsed text ]]].

⁵ The title of an entry is not a good identifier. Just consider a title like “Hub”: there might be many entries with that same title.
⁶ This fact is nothing new, but worth mentioning.

Abbreviated abstract syntax:

    seq          ::= (node)* .
    node         ::= (plain | verbatim | funapp) .
    funapp       ::= '{' ( (functionName (':' seq ('^' seq)*)?)
                         | ('*' (.)* '*')
                         | ('!' (.)*)
                         | ('?' (.)*) )? '}' .
    functionName ::= ([a-z]|[A-Z]|'_'|'$') ([a-z]|[A-Z]|'_'|[0-9]|'$'|'+'|'-'|'.')* .
    plain        ::= all characters except '{' and '}' .
    verbatim     ::= "[[[" (.)* "]]]" .

2.2.1 WITL Syntax explained by Illustrative Examples
Most of the text is considered “plain text”, except for the following constructs:
• {h1:Heading on level one}: a Wikked function; most HTML commands are defined as such functions
• {http://foo.com}, {ftp:www.leo.org}: standard URLs fit naturally into the syntax scheme as links
• [[[verbatim text, no functions are evaluated]]]: “verbatim” or unescaped text can be used to insert HTML (e.g. in web page templates) or source code (e.g. Java code for documentation)
• {* Comments *} produce no output
Variables:
• {?var} gets a value (unevaluated!)
• {:var} gets an evaluated value
• {!var^value} sets a value
Attributes:
• {?var^attrib} gets a value (unevaluated!)
• {:var^attrib} gets an evaluated value
• {!var^attrib^value} sets an attribute to a value
The {?}, {:} and {!} functions are syntactic sugar and could be expressed by {get}, {getEval} and {set}.

2.3 Using ANTLR for Parsing WITL Code
For parsing WITL code we use ANTLR [11], a parser generator developed primarily by Terence Parr. Its web site describes it as follows:
“ANTLR, ANother Tool for Language Recognition, (formerly PCCTS) is a language tool that provides a framework for constructing recognizers, compilers, and translators from grammatical descriptions containing Java, C#, Python, or C++ actions. ANTLR is popular because it is easy to understand, powerful, flexible, generates human-readable output, and comes with complete source. ANTLR provides excellent support for tree construction, tree walking, and translation.
There are currently over 5000 ANTLR source downloads a month.” (from http://www.antlr.org/about.html, visited on 03/16/2006)
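Before turning to the ANTLR grammars, it may help to see the abbreviated abstract syntax of Section 2.2 parsed by hand. The recursive-descent sketch below is not Wikked's parser and its names are hypothetical: it recognizes plain text, verbatim blocks (folded into plain text), comments such as {* note *} and function applications {name:arg^arg}, but it does not handle the / escapes and does not treat the {?, {: and {! shorthands specially.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written parser for the abbreviated WITL syntax.
class WitlSketchParser {
    /** AST nodes: plain text (verbatim folded in) and function applications. */
    interface Node {}
    record Plain(String text) implements Node {}
    record FunApp(String name, List<List<Node>> args) implements Node {}

    private final String src;
    private int pos = 0;

    private WitlSketchParser(String src) {
        this.src = src;
    }

    /** Parses a whole document: seq ::= (node)* */
    static List<Node> parse(String src) {
        WitlSketchParser p = new WitlSketchParser(src);
        List<Node> result = p.seq(false);
        if (p.pos != src.length()) {
            throw p.error("unexpected '}'");
        }
        return result;
    }

    /** Inside an argument, '^' and '}' end the sequence; at top level only '}' does. */
    private List<Node> seq(boolean inArgument) {
        List<Node> nodes = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        while (pos < src.length()) {
            char c = src.charAt(pos);
            if (src.startsWith("[[[", pos)) {             // verbatim ::= "[[[" (.)* "]]]"
                int end = src.indexOf("]]]", pos + 3);
                if (end < 0) throw error("unterminated verbatim block");
                text.append(src, pos + 3, end);
                pos = end + 3;
            } else if (c == '{') {
                flush(text, nodes);
                Node n = funapp();
                if (n != null) nodes.add(n);               // comments yield no node
            } else if (c == '}' || (inArgument && c == '^')) {
                break;
            } else {
                text.append(c);
                pos++;
            }
        }
        flush(text, nodes);
        return nodes;
    }

    /** funapp ::= '{' (functionName (':' seq ('^' seq)*)? | '*' comment '*')? '}' */
    private Node funapp() {
        pos++;                                             // skip '{'
        if (src.startsWith("*", pos)) {                    // comment {* ... *}
            int end = src.indexOf("*}", pos + 1);
            if (end < 0) throw error("unterminated comment");
            pos = end + 2;
            return null;
        }
        int start = pos;
        while (pos < src.length() && isNameChar(src.charAt(pos), pos == start)) pos++;
        String name = src.substring(start, pos);
        List<List<Node>> args = new ArrayList<>();
        if (pos < src.length() && src.charAt(pos) == ':') {
            do {
                pos++;                                     // skip ':' or '^'
                args.add(seq(true));
            } while (pos < src.length() && src.charAt(pos) == '^');
        }
        if (pos >= src.length() || src.charAt(pos) != '}') throw error("expected '}'");
        pos++;
        return new FunApp(name, args);
    }

    private static boolean isNameChar(char c, boolean first) {
        boolean letter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_' || c == '$';
        if (first) return letter;
        return letter || (c >= '0' && c <= '9') || c == '+' || c == '-' || c == '.';
    }

    private static void flush(StringBuilder text, List<Node> nodes) {
        if (text.length() > 0) {
            nodes.add(new Plain(text.toString()));
            text.setLength(0);
        }
    }

    private IllegalArgumentException error(String message) {
        return new IllegalArgumentException(message + " at offset " + pos);
    }
}
```

For example, {http://www.ifi.lmu.de} yields a FunApp named "http" with the single argument "//www.ifi.lmu.de", which matches the reading of URIs as function applications described in Section 2.2.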
ANTLR creates three main parts for language recognition:
• A lexer that divides the input into token classes.
• A parser that evaluates a token stream according to specified rules.
• A tree walker that evaluates a syntax tree.
For our purpose, we only use the lexer and the parser. We do not define a tree walker, because we have our own specialized implementation of a syntax tree and do not use the default interfaces supplied with ANTLR. Even if we used ANTLR's interfaces for syntax trees, a tree walker would be too inflexible for the evaluation.
2.3.1 The Lexer
In this section we present a shortened version of the lexer grammar we employ. The lexer splits WITL code into token classes that the parser can process at a higher level of abstraction. ANTLR uses a syntax similar to EBNF; elements and characteristics of the target language, here Java, can be included.

Listing 1: Lexer definition

    class WitlLexer extends Lexer;

    /* The opening tag for all functions, such as
     * {name:, {:, {!, {? or the special arities {} and
     * {name} (no colon) */
    OTAG
        : ( options { generateAmbigWarnings = false; }
          : '{' '}' { $setType(EMPTY_FUNCTION); }
          | "{:"!
          | ( // uses a syntactic predicate to distinguish
              // between '{' <functionName> ':'
              // and     '{' <functionName> '}'
              ( '{'! (COMNAME) ':'! ) => '{'! (COMNAME) ':'!
            | '{'! COMNAME '}'! { $setType(ZERO_ARG_FUNCTION); }
            )
          | "{!"!   // for "{set:"
          | "{?"!   // for "{get:"
          )
        ;

    /* Closing tag for functions. */
    CTAG : '}' ;

    /* {* ... *} matches everything in between
     * (non-greedy, so the loop stops at the first "*}") */
    COMMENTTAG : "{*"! ( options { greedy = false; } : . )* "*}"! ;

    /* [[[ ... ]]] matches everything in between
     * triple square brackets (non-greedy) */
    VERBATIM : "[[["! ( options { greedy = false; } : . )* "]]]"! ;

    /* Used to separate arguments in
     * a function, e.g. {fun: arg1 ^ arg2} */
    ARGSEP : "^" ;

    /* everything except special signs */
    TEXT
        : ( { LA(2) != ']' || LA(3) != ']' }? ']'   // LA = look ahead
          | { LA(2) != '[' || LA(3) != '[' }? '['
          | CONTENT
          | ESC
          )+
        ;

    /* Escaped signs {, }, :, ^, [, ] with leading '/' */
    protected ESC : "/"! ( '{' | '}' | ':' | '^' | ']' | '[' ) ;

    /* Allowed plain content */
    protected CONTENT : ~( '{' | '}' | '\n' | '\r' | '^' | ']' | '[' ) ;

    protected COMNAME
        : ( 'a'..'z' | 'A'..'Z' | '_' | '$' )
          ( 'a'..'z' | 'A'..'Z' | '_' | '0'..'9' | '$' | '+' | '-' | '.' )*
        ;

    WS : ( ' ' | '\t' | '\f' ) ;
    NL : ( '\n' | "\r\n" | '\r' ) ;

Listing 1 shows the source code of the lexer. We assume that the comments in the source code describe most token classes sufficiently. Some tricky parts, however, are worth mentioning explicitly:
• Functions are split into an opening tag (OTAG) and a closing tag (CTAG). To distinguish between '{' functionName ':' and '{' functionName '}', ANTLR first has to check whether it finds an opening curly bracket followed by a function name and a colon. If it does not find a colon, it has to fall back and look for a closing curly bracket; i.e. it has to decide whether the function has zero arguments or one or more.
• The TEXT token definition uses a lookahead of two and three characters to find out whether a [ or ] belongs to a VERBATIM token.
2.3.2 The Parser
The parser code in ANTLR is analogous to the lexer code. It is very similar to EBNF, but with the features needed for processing with Java. Listing 2 shows a very simplified excerpt of the parser code.

Listing 2: Parser definition

    class WitlParser extends Parser;

    /* Returns the root of the WITL document.
     */
    witl returns [Sequence seq] : (node)* ;

    /* general node */
    private node returns [WitlExpression node]
        : ( plain | function | getCommand | setCommand | comment ) ;

    /* function node {functionName: witl text},
     * {functionName}, {} */
    private function returns [WitlExpression com]
        : ( OTAG witl (ARGSEP witl)* CTAG
          | EMPTY_FUNCTION
          | ZERO_ARG_FUNCTION
          ) ;
    /* function node {? witl text} for getter */
    private getCommand returns [WitlExpression com]
        : GET_FUNCTION witl (ARGSEP witl)* CTAG ;

    /* function node {! witl text} for setter */
    private setCommand returns [WitlExpression com]
        : SET_FUNCTION witl (ARGSEP witl)* CTAG ;

    /* comment node {* my comment *} */
    private comment returns [WitlExpression node] : COMMENTTAG ;

    /* plain text or verbatim text */
    private plain returns [WitlExpression node] : text | verbatim ;
    private text returns [String text] : TEXT | NL | WS ;
    private verbatim returns [String text] : VERBATIM ;

Some remarks:
• Tokens delivered by the lexer are written in uppercase letters, e.g. VERBATIM.
• Parser rules are written in lowercase or mixed case and may have a return value of any Java class. WitlExpression and Sequence are two examples of such Java classes.
2.4 Rendering with WITL
2.4.1 Evaluating the Abstract Syntax Tree
While building a syntax tree is a nice gadget, what you actually want is to display text in a certain target language: in the case of our Wiki mostly HTML, but LaTeX is also imaginable. A syntax tree by itself is just a structure, independent of the grammar that produced it, so extracting the information expressed by the tree is an important issue. Evaluating markup functions is very easy and can be done by implementing a few hard-coded functions, but WITL has more to offer: you can also define your own functions in the Wiki text, as shown in Fig. 3. The evaluation then becomes more complex, so we define a general evaluation strategy that can handle these cases. A tree is evaluated by evaluating every node of the tree. The result is again a tree, but very often it now consists only of text nodes. The final evaluation step is to flatten the tree via pre-order traversal to produce HTML output. Below, we give a more formal definition of evaluation.
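EBNF names and the corresponding term signatures:

    EBNF name    Term signature
    seq          $\mathit{seq}\langle \mathit{head}, \mathit{tail} \rangle$
    funapp       $\mathit{funapp}\langle \mathit{varname}, \mathit{arg} \rangle$
    plain        $\mathit{plain}\langle \mathit{text} \rangle$

The evaluation function has the signature

$$\mathit{eval} : \mathit{Tree} \times \mathit{Text} \to \mathit{Tree} \times \mathit{Text}$$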
The type Tree is defined as

$$\mathit{Tree} = \mathit{funapp}\langle \mathit{Varname}, \mathit{Tree} \rangle \mid \mathit{plain}\langle \mathit{Text} \rangle \mid \mathit{seq}\langle \mathit{Tree}, \mathit{Sequence} \rangle$$

In the definition of eval, we use the type Text for storing variable bindings. Lookup is done via function application, where the instance bindings is viewed as having the signature

$$\mathit{bindings} : \mathit{Varname} \to \mathit{Tree}$$

Adding new bindings views bindings as a sequence of (varname, tree) pairs, to which the . operator prepends a pair. Lookup always starts with the first element and returns the first match, if one is found. The evaluation of each possible alternative of a tree is defined as follows.
• Sequence of nodes:
$$\mathit{eval}(\mathit{seq}\langle \mathit{head}, \mathit{tail} \rangle, \mathit{bindings}) = \mathit{seq}\langle \mathit{result}, \mathit{eval}(\mathit{tail}, \mathit{bindings}') \rangle \quad \text{where } (\mathit{result}, \mathit{bindings}') = \mathit{eval}(\mathit{head}, \mathit{bindings})$$
$$\mathit{eval}(\mathit{seq}\langle \rangle, \mathit{bindings}) = \mathit{seq}\langle \rangle$$
• Function application: without loss of generality, we define function application for just one argument. Note that the function definition is stored as a function application whose functor part is called fun.
$$\mathit{eval}(\mathit{funapp}\langle \mathit{funname}, \mathit{arg} \rangle, \mathit{bindings}) = \mathit{eval}(\mathit{body}, (\mathit{param}, \mathit{arg}) \mathbin{.} \mathit{bindings}) \quad \text{where } \mathit{funapp}\langle \mathit{fun}, \mathit{param}, \mathit{body} \rangle = \mathit{bindings}(\mathit{funname})$$
• Plain text:
$$\mathit{eval}(\mathit{plain}\langle \mathit{text} \rangle, \mathit{bindings}) = (\mathit{plain}\langle \mathit{text} \rangle, \mathit{bindings})$$
Except for the evaluation of sequences, this evaluation strategy is typical for functional languages with lazy evaluation.
2.4.2 Defining Functions
The primary reason for using WITL as the language for both entries and templates is that you can define your own functions in the text as well as use hard-coded functions. This makes WITL both convenient and flexible. We demonstrate the mechanism with a short example. To define your own bold function, you can use code like this:

    {def:{b:content}^[[[<b>]]]{var:content}[[[</b>]]]}

The function def is a predefined function and has two arguments:
1. The function to define: {b:content}. The function has the name “b” and one argument with the variable name “content”.
2. The body of the function as the second argument: there are HTML tags in triple brackets, and between them there is the predefined function var, which tells the interpreter to fetch and evaluate its argument.
When a function application such as {b:someBold} is evaluated, the function definition of b is looked up first, and the variable content is bound to “someBold”. Then the definition of function b is evaluated. This definition contains an embedded var function, which tells the interpreter to look up its argument “content” in the variable bindings and to insert the result into the text at the position where var itself occurred. As a result, {b:someBold} is replaced by “<b>someBold</b>”.
Fig. 3 shows the syntax tree for the example above. Round nodes are functions such as {b:content}. The only empty round node is a sequence; it is a container for the nodes in the second argument of def. The square nodes are just for plain text. In addition to def, eval and var, WITL also provides further Lisp-inspired functions such as {list:} and {for:}.

Figure 3: This syntax tree is a graphical representation of the definition of a bold function (node labels: def, b, content, var, content). Round nodes are functions, the empty round node is a sequence of nodes and the square nodes are plain text.
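The evaluation rules of Section 2.4.1 and the def/var example can be combined into a small interpreter sketch in Java. It follows the stated semantics where it can: bindings are a list searched from the front, a sequence threads bindings from one node to the next, and a function application evaluates the body with the unevaluated argument prepended to the bindings. The names are hypothetical, trees are built by hand instead of being parsed, and, unlike the formal rule, a function application returns the caller's bindings, so the parameter binding does not leak into the rest of the sequence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical evaluator sketch; not Wikked's implementation.
class WitlEval {
    interface Tree {}
    record Plain(String text) implements Tree {}
    record Seq(List<Tree> items) implements Tree {}
    record FunApp(String name, List<Tree> args) implements Tree {}

    record Binding(String name, Tree value) {}
    /** eval returns the evaluated tree together with the bindings. */
    record Result(Tree tree, List<Binding> bindings) {}

    /** The "." operator: prepend a pair, so newer bindings shadow older ones. */
    static List<Binding> prepend(Binding b, List<Binding> rest) {
        List<Binding> out = new ArrayList<>();
        out.add(b);
        out.addAll(rest);
        return out;
    }

    /** Lookup starts at the first element and returns the first match. */
    static Tree lookup(String name, List<Binding> bindings) {
        for (Binding b : bindings) {
            if (b.name().equals(name)) return b.value();
        }
        throw new IllegalStateException("unbound name: " + name);
    }

    static Result eval(Tree t, List<Binding> bindings) {
        if (t instanceof Plain) {
            return new Result(t, bindings);              // plain text evaluates to itself
        }
        if (t instanceof Seq s) {
            // Thread the bindings through the sequence, so a def is visible to later nodes.
            List<Tree> out = new ArrayList<>();
            List<Binding> current = bindings;
            for (Tree item : s.items()) {
                Result r = eval(item, current);
                out.add(r.tree());
                current = r.bindings();
            }
            return new Result(new Seq(out), current);
        }
        FunApp f = (FunApp) t;
        switch (f.name()) {
            case "def": {
                // {def:{name:param}^body} is stored as funapp<fun, param, body>.
                FunApp signature = (FunApp) f.args().get(0);
                String param = ((Plain) signature.args().get(0)).text().trim();
                Tree definition = new FunApp("fun", List.of(new Plain(param), f.args().get(1)));
                return new Result(new Seq(List.of()),
                        prepend(new Binding(signature.name(), definition), bindings));
            }
            case "var": {
                // {var:name}: fetch the (unevaluated) bound tree and evaluate it now.
                String name = ((Plain) f.args().get(0)).text().trim();
                return new Result(eval(lookup(name, bindings), bindings).tree(), bindings);
            }
            default: {
                // User-defined function with one argument: bind param to the unevaluated arg.
                FunApp fun = (FunApp) lookup(f.name(), bindings);
                String param = ((Plain) fun.args().get(0)).text();
                Tree body = fun.args().get(1);
                Tree result = eval(body, prepend(new Binding(param, f.args().get(0)), bindings)).tree();
                return new Result(result, bindings);    // the parameter does not leak out
            }
        }
    }

    /** Flattens an evaluated tree into text by pre-order traversal. */
    static String flatten(Tree t) {
        if (t instanceof Plain p) return p.text();
        StringBuilder sb = new StringBuilder();
        if (t instanceof Seq s) {
            for (Tree item : s.items()) sb.append(flatten(item));
        }
        return sb.toString();
    }
}
```

Building the tree for {def:{b:content}^[[[<b>]]]{var:content}[[[</b>]]]} followed by {b:someBold}, evaluating it with empty bindings and flattening the result yields “<b>someBold</b>”, as described above.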
3 Wikked Architecture
3.1 Three Layer Architecture
In order to reduce complexity, we created an architecture with three layers, as shown in Fig. 4. This is just an abstraction that gives a bird's-eye view of the system.
The data layer maintains most of the data that the two layers above can refer to. It is thus a kind of service layer, responsible for creating, updating and deleting data where possible.
The presentation layer is the layer in the middle. This is the level where entries are situated. From the author's point of view, an entry corresponds to the traditional Wiki page: it is the actual Wiki text, extended by the possibility of adding other information such as the language used or the date of creation. When writing an entry, an author can access the underlying data by including a query or simply by referring to metadata of this entry in the Wiki text.
The meta presentation layer is the layer on top. It is used for designing the web pages: templates are defined here to create a common look and feel. It is also used to create a particular view of an entry, because one reader might be interested in the text, title and date of creation of an entry and not care about the language or other extra information. Somebody else, on the other hand, might only be interested in the title and creator, because they want to create an overview of the entries a friend has written. This top layer makes it possible to choose which information is displayed. The idea is to apply a template to an entry: by switching to another template, the view changes while the entry stays the same. This also works the other way round: you can keep the template and change the entry. There may also be entries that have no Wiki text but other properties. For example, there might be an entry for a person, and it is desired to display some information about that person.
This can easily be done by using a template that functions as a form, which is filled with the data concerning the person.

Figure 4: A three layer architecture. On top: the meta presentation layer, building the framework for the Wiki, i.e. the web page templates. In the middle: the presentation layer, where authors write their entries. At the bottom: the data layer for managing information.

3.2 Components
This section gives a component-oriented view of Wikked's architecture. Fig. 5 describes the interaction of the components involved in Wikked:
Figure 5: The components involved in the Wikked wiki. At the bottom is the RDF database used by Hyena to maintain the data needed by Wikked, i.e. the wiki pages and other data it uses. Wikked itself runs as a servlet in a servlet container and uses Hyena as a service to retrieve the desired information. The rendering engine parses the WITL source code, including wiki markup, and then generates the HTML output.
• Hyena: Hyena is used as a service to manage RDF data. This includes the wiki pages, which are themselves RDF nodes.
• Wikked: Wikked runs as a servlet in a servlet container. It receives requests from clients and returns the corresponding HTML pages, which the Rendering Engine generates from WITL source code.
• Rendering engine: The Rendering Engine includes a preprocessor that translates wiki markup to WITL. The WITL code is then translated to HTML.

This architecture can be seen as an implementation of the MVC pattern7. The Wikked servlet acts as the controller, which translates interactions with the view into actions to be performed by the model. The model is the RDF graph managed by Hyena. Finally, the rendering engine is the view: it renders the contents of the model, i.e. it generates HTML output by applying a template to a wiki page.

7 Also known as the Model 2 architecture.
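To make the pipeline of the Rendering Engine more concrete, the following Java sketch shows its three steps in miniature: a preprocessor that rewrites wiki markup into WITL, a parser that turns nested {name:argument} functions into a syntax tree, and a renderer that walks the tree to produce HTML. It is a hypothetical illustration, not Wikked's code: the class and method names are made up, a hand-written recursive-descent parser stands in for the grammar-based one, only the ~~italic~~ sugar and the i and b functions are supported, each function takes a single argument (the ^ argument separator is not handled), and text is not HTML-escaped. It needs Java 16 or later for records and pattern matching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of a rendering pipeline: wiki markup -> WITL -> syntax tree -> HTML.
 * A hand-written recursive-descent parser stands in for a grammar-generated one.
 */
public class WitlSketch {

    /** Syntax tree node: plain text or a function call with one argument. */
    public interface Node {}
    public record Text(String value) implements Node {}
    public record Call(String name, List<Node> argument) implements Node {}

    // Preprocessor rule: the wiki markup ~~x~~ becomes the WITL function {i:x}.
    private static final Pattern ITALIC = Pattern.compile("~~(.+?)~~");

    public static String preprocess(String wikiText) {
        return ITALIC.matcher(wikiText).replaceAll("{i:$1}");
    }

    private final String src;
    private int pos = 0;

    private WitlSketch(String src) { this.src = src; }

    /** Parses WITL source into a list of top-level nodes. */
    public static List<Node> parse(String witl) {
        WitlSketch p = new WitlSketch(witl);
        List<Node> nodes = p.sequence();
        if (p.pos != witl.length()) {
            throw new IllegalArgumentException("unmatched '}' at " + p.pos);
        }
        return nodes;
    }

    // Reads text and calls until the end of input or a closing '}'.
    private List<Node> sequence() {
        List<Node> nodes = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        while (pos < src.length() && src.charAt(pos) != '}') {
            if (src.charAt(pos) == '{') {
                if (text.length() > 0) {
                    nodes.add(new Text(text.toString()));
                    text.setLength(0);
                }
                nodes.add(call());
            } else {
                text.append(src.charAt(pos++));
            }
        }
        if (text.length() > 0) nodes.add(new Text(text.toString()));
        return nodes;
    }

    // Reads "{name:argument}"; everything after the first ':' is the argument,
    // which may itself contain nested calls.
    private Call call() {
        int start = ++pos; // skip '{'
        while (pos < src.length() && src.charAt(pos) != ':') {
            if (src.charAt(pos) == '{' || src.charAt(pos) == '}') {
                throw new IllegalArgumentException("expected ':' after function name at " + start);
            }
            pos++;
        }
        if (pos >= src.length()) throw new IllegalArgumentException("missing ':' at " + start);
        String name = src.substring(start, pos).trim();
        pos++; // skip ':'
        List<Node> argument = sequence();
        if (pos >= src.length()) throw new IllegalArgumentException("missing '}' at " + start);
        pos++; // skip '}'
        return new Call(name, argument);
    }

    /** Renders a tree as HTML; only i and b are known here, and text is not escaped. */
    public static String render(List<Node> nodes) {
        StringBuilder out = new StringBuilder();
        for (Node n : nodes) {
            if (n instanceof Text t) {
                out.append(t.value());
            } else if (n instanceof Call c) {
                if (!c.name().equals("i") && !c.name().equals("b")) {
                    throw new IllegalArgumentException("unknown function: " + c.name());
                }
                out.append('<').append(c.name()).append('>')
                   .append(render(c.argument()))
                   .append("</").append(c.name()).append('>');
            }
        }
        return out.toString();
    }

    public static String toHtml(String wikiText) {
        return render(parse(preprocess(wikiText)));
    }

    public static void main(String[] args) {
        System.out.println(toHtml("These are my ~~Java~~ links, {b:really ~~good~~} ones."));
        System.out.println(parse("{queryTable:{book:cat:Java}}"));
    }
}
```

For example, parse("{queryTable:{book:cat:Java}}") yields a single queryTable call whose argument is a book call with the text argument cat:Java. Because the renderer works on that tree instead of repeatedly rewriting strings, nesting such as {b:really ~~good~~} comes out as <b>really <i>good</i></b> without special cases.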
4 Scenario: A Bookmark Collection Web Site

4.1 Motivation

In Wiki documents you often want to refer to a data source instead of simply quoting it. In cases like “list all my Java books” or “list my ten favorite songs”, it would be nice to have a mechanism to query your database for this information. Your database in this case can be a text document, an Excel file or, as the word suggests, an actual relational database. With such a mechanism, you do not have to copy and paste the text, even in the simplest case. This is especially useful when the data are subject to change: you may buy some new Java books, or your taste in music may change. You may also want to apply more advanced query conditions, e.g. “list my top ten Britpop CDs from 1998 until 2006”. With your data managed by the RDF database, you can easily refer to it with query functions.

Why did we choose an example with bookmarks? We suppose that nearly everybody has a huge collection of bookmarks, yet when they have to search for something, they ask Google. When you browse the web, you can also find many web pages with nothing but bookmarks on them. Nowadays there are tools such as del.icio.us 8, but the bookmark example is very simple and easy to understand. Hence we show an application of our concept with a use case in which three friends create a web site for bookmark collections.

4.2 The Wikked Three Layer Model in Practice

The work of implementing the Wikked layer model can be split among three persons. Let us call them Daniel, the data manager, Alice, an author who presents her bookmarks, and Wayland, the designer, who creates nice templates for the web pages. Each of them has a well-defined function: Daniel has to create a database and an API for accessing it, Alice has to write articles, and Wayland has to design templates.

4.2.1 Daniel the Data Manager

Daniel's job is to import some bookmarks into the RDF database.
In our case, a bookmark consists of a title and a URI, and belongs to one or more categories. Fig. 6 shows a visual representation of a bookmark for java.sun.com. The round nodes represent resources, the oblong nodes literal values. There are also two blank nodes, i.e. anonymous resources. The leftmost one stands for the concept of “Sun's Java home page”, the other one for the category with the title “Java”. In this very simple example there are only three categories: News, Java and Wikis. Each category contains ten bookmarks, all with the same structure as the example in Fig. 6.

4.2.2 Alice the Author

Writing readable and informative entries is Alice's job. In our example she writes only three entries: one with her Java links, one with her News links and one with her Wiki links. Because she wants to quote her bookmarks, Daniel has already imported all of them into the database9. Her entry for Java may look like this:

These are my ~~Java~~ links: {queryTable:{book:cat:Java}}

8 http://del.icio.us/
9 She had sent them to Daniel by email.
Figure 6: A graph representation of an entry in a bookmark collection. To make the graph more readable, the namespaces are abbreviated. This bookmark has a title “Java Technology”, a source “http://java.sun.com/”, a type “hg:bookmark” and a category linked to a node with the title “Java” and the type “hg:category”.

This entry may seem a little short, but for demonstration purposes it is sufficient. It shows the main aspects of our Wiki syntax:
• function orientation: A function begins with a curly brace, followed by the function name, a colon, one or more arguments separated by ^ and finally a closing curly brace. In the example there are two functions. The first one has the function name “queryTable” and “{book:cat:Java}” as its single argument. The second one is the argument of the first one; it has the function name “book” and the argument “cat:Java”. “queryTable” takes an RDQL query as its argument and returns a table with the results. Here “book” is a predefined query function: it simply returns an RDQL query according to its argument. In this case it returns a query that lists the title and source of all bookmarks whose category is “Java”.
• nested functions: A function can also occur inside another function, which makes recursive calls possible. When the document is parsed, the structure of the text is translated into a tree, as described in the previous section about syntax.
• syntactic sugar: To keep the advantages of common Wiki syntax, we added some syntactic sugar, e.g. ~~italic~~ is translated to {i:italic} and stands for italic text. For more examples, refer to Section 2.1. This is useful for writing quick notes in plain ASCII text that are nevertheless rendered as HTML.
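The “book” function and the data behind it can be illustrated with a small Java sketch built on the graph of Fig. 6. Here bookQuery builds RDQL-style query text for a category, modelled on the query described above; select evaluates the same graph pattern directly over an in-memory list of triples, standing in for the RDF database; and toHtmlTable renders the rows with the columns title and source that appear in Fig. 8. All names are hypothetical, the hg namespace URI is invented (the document only shows the abbreviated prefix), and the query text follows the style of the RDQL submission [15] but has not been run against an RDQL engine. It needs Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the bookmark data of Fig. 6 and of a "book"-style query
 * function. The in-memory matcher stands in for the RDF database; it is not
 * Hyena's or any RDF library's API.
 */
public class BookmarkSketch {

    public record Triple(String s, String p, String o) {}
    public record Row(String title, String source) {}

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String HG_NS = "http://example.org/hg#"; // invented namespace for the hg prefix

    // The bookmark of Fig. 6; "_:bm" and "_:cat" are the two blank nodes.
    public static final List<Triple> GRAPH = List.of(
            new Triple("_:bm", "rdf:type", "hg:bookmark"),
            new Triple("_:bm", "hg:title", "Java Technology"),
            new Triple("_:bm", "hg:source", "http://java.sun.com/"),
            new Triple("_:bm", "hg:category", "_:cat"),
            new Triple("_:cat", "rdf:type", "hg:category"),
            new Triple("_:cat", "hg:title", "Java"));

    /** Builds RDQL-style query text: title and source of all bookmarks in a category. */
    public static String bookQuery(String category) {
        if (category.contains("\"")) {
            throw new IllegalArgumentException("quotes in category names are not handled");
        }
        return "SELECT ?title, ?source\n"
             + "WHERE (?b, rdf:type, hg:bookmark), (?b, hg:title, ?title),\n"
             + "      (?b, hg:source, ?source), (?b, hg:category, ?c),\n"
             + "      (?c, hg:title, \"" + category + "\")\n"
             + "USING rdf FOR <" + RDF_NS + ">, hg FOR <" + HG_NS + ">";
    }

    // All objects o such that (s, p, o) is in the graph.
    static List<String> objects(List<Triple> graph, String s, String p) {
        List<String> result = new ArrayList<>();
        for (Triple t : graph) {
            if (t.s().equals(s) && t.p().equals(p)) result.add(t.o());
        }
        return result;
    }

    /** Matches the same pattern as bookQuery directly over a list of triples. */
    public static List<Row> select(List<Triple> graph, String category) {
        List<Row> rows = new ArrayList<>();
        for (Triple t : graph) {
            if (!t.p().equals("rdf:type") || !t.o().equals("hg:bookmark")) continue;
            String b = t.s();
            // A bookmark may belong to several categories; one matching title suffices.
            boolean inCategory = objects(graph, b, "hg:category").stream()
                    .anyMatch(c -> objects(graph, c, "hg:title").contains(category));
            if (!inCategory) continue;
            // One row per (title, source) pair of a matching bookmark.
            for (String title : objects(graph, b, "hg:title")) {
                for (String source : objects(graph, b, "hg:source")) {
                    rows.add(new Row(title, source));
                }
            }
        }
        return rows;
    }

    /** Renders rows as a simple HTML table with the columns title and source. */
    public static String toHtmlTable(List<Row> rows) {
        StringBuilder html = new StringBuilder("<table><tr><th>title</th><th>source</th></tr>");
        for (Row r : rows) {
            html.append("<tr><td>").append(r.title())
                .append("</td><td>").append(r.source()).append("</td></tr>");
        }
        return html.append("</table>").toString();
    }

    public static void main(String[] args) {
        System.out.println(bookQuery("Java"));
        System.out.println(toHtmlTable(select(GRAPH, "Java")));
    }
}
```

With the sample graph, select(GRAPH, "Java") returns one row for “Java Technology” and http://java.sun.com/, while select(GRAPH, "News") returns no rows, since the graph holds only the single bookmark of Fig. 6.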
4.2.3 Wayland the Web Designer

Finally, Wayland, an experienced web designer, has to create some templates for the entries Alice wrote. As already mentioned, he can use WITL for this task as well. He therefore writes a master template:

{call:{header}}
{call:{body}}
{call:{footer}}
This template includes several nested templates by means of the function “call”10:
• header: A header template containing some tags with formal definitions, needed for layout reasons and to make the whole page valid HTML.
• body: The definition of how to display a single entry. There is a sidebar on the left that holds the author, the date of creation and the keywords.
• footer: In this example, the footer created by Wayland just prints the closing tag of the HTML document. But when the web site of the three friends becomes more sophisticated, it can include, for example, copyright information.

The code for the header:

[[[ ]]]{:dc:title}[[[<!-- some CSS -->]]]

The code for the body:

[[[<!-- left sidebar, some infos -->]]]
{b:Author: {:dc:creator}}
[[[ ]]]
{b:created on: {:dc:date}}
{b:subjects: {:dc:subject}}
[[[<!-- centered contents -->]]]
{h1:Topic: {:dc:title}}
[[[<!-- the wikiText -->]]]
{:dc:source}
[[[ ]]]

The code for the footer:

[[[<!-- some other text -->]]]

10 In most cases, the key is a URL.
Figure 7: A screen shot of an entry displayed by the Wiki.

Fig. 8 shows a scheme of how the templates are evaluated. At the top is the content of the main page template. Each sub-template is included and evaluated by the call function. Then the functions contained in the “called” templates are evaluated as well, until the desired output, a web page coded in HTML, is obtained. Fig. 7 shows a screen shot of the result.
[Figure 8 panels: Page Template: {call: header} {call: body} {call: footer}. Body Template: {b:someBold} {renderGet:dc:source}. Entry Content: these are my ~~java~~ links: {queryTable:{book:cat:News}}. HTML Output: “Topic: Java”, “these are my Java-links:” and a table with the columns title and source (first row: Radeox :: start, http://radeox.org/).]

Figure 8: An extract showing how a template (Page Template) is evaluated. First, the body (Body Template) is looked up in the template list and then evaluated. The body contains a function renderGet that fetches the property dc:source of the entry and tells the evaluator to render it. The text of this property (Entry Content) contains embedded functions, whose evaluation is simplified here to a single step. The entry is then rendered to HTML (HTML Output).
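The evaluation order of Fig. 8, in which a page template calls sub-templates and renderGet pulls an entry property that is itself evaluated as WITL, can be sketched as a small recursive evaluator in Java. This is a simplified, hypothetical stand-in rather than Wikked's implementation: templates and entry properties are held in plain maps (in Wikked they are maintained in the RDF database, and template keys are usually URLs), function arguments are evaluated before the function is applied, the wiki-markup preprocessor, queryTable and [[[...]]] verbatim blocks are left out, b, i and h1 simply wrap their argument in the HTML tag of the same name, and there is no cycle detection for templates that call each other. It needs Java 14 or later for switch expressions.

```java
import java.util.Map;

/**
 * Hypothetical sketch of template evaluation in the spirit of Fig. 8. Not the Wikked
 * evaluator: no HTML escaping, no [[[...]]] verbatim blocks, no cycle detection.
 */
public class TemplateSketch {

    private final Map<String, String> templates; // template key -> WITL source
    private final Map<String, String> entry;     // entry property (e.g. dc:title) -> value

    public TemplateSketch(Map<String, String> templates, Map<String, String> entry) {
        this.templates = templates;
        this.entry = entry;
    }

    /** Read position within one WITL source string. */
    private static final class Cursor {
        final String s;
        int pos;
        Cursor(String s) { this.s = s; }
    }

    /** Evaluates a WITL source string to HTML. */
    public String evaluate(String witl) {
        Cursor c = new Cursor(witl);
        String out = sequence(c);
        if (c.pos != witl.length()) {
            throw new IllegalArgumentException("unmatched '}' in: " + witl);
        }
        return out;
    }

    // Copies text and evaluates functions until the end of input or a closing '}'.
    private String sequence(Cursor c) {
        StringBuilder out = new StringBuilder();
        while (c.pos < c.s.length() && c.s.charAt(c.pos) != '}') {
            if (c.s.charAt(c.pos) == '{') {
                out.append(function(c));
            } else {
                out.append(c.s.charAt(c.pos++));
            }
        }
        return out.toString();
    }

    // Parses and applies "{name:argument}"; the argument may contain further functions.
    private String function(Cursor c) {
        int start = ++c.pos; // skip '{'
        while (c.pos < c.s.length() && c.s.charAt(c.pos) != ':') {
            char ch = c.s.charAt(c.pos);
            if (ch == '{' || ch == '}') {
                throw new IllegalArgumentException("expected ':' after function name at " + start);
            }
            c.pos++;
        }
        if (c.pos >= c.s.length()) throw new IllegalArgumentException("missing ':' at " + start);
        String name = c.s.substring(start, c.pos).trim();
        c.pos++; // skip ':'
        String arg = sequence(c); // the argument is evaluated first
        if (c.pos >= c.s.length()) throw new IllegalArgumentException("missing '}' at " + start);
        c.pos++; // skip '}'
        return apply(name, arg);
    }

    private String apply(String name, String arg) {
        String key = arg.trim();
        return switch (name) {
            // Include another template by key and evaluate it.
            case "call" -> evaluate(lookup(templates, key, "template"));
            // Fetch a property of the current entry and evaluate its value as WITL.
            case "renderGet" -> evaluate(lookup(entry, key, "property"));
            // Simple formatting functions that wrap their argument in an HTML tag.
            case "b", "i", "h1" -> "<" + name + ">" + arg + "</" + name + ">";
            default -> throw new IllegalArgumentException("unknown function: " + name);
        };
    }

    private static String lookup(Map<String, String> map, String key, String what) {
        String value = map.get(key);
        if (value == null) throw new IllegalArgumentException("no " + what + ": " + key);
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> entry = Map.of(
                "dc:title", "Java",
                "dc:source", "These are my {i:Java} links.");
        // Two template sets applied to the same entry give two different views.
        Map<String, String> fullView = Map.of(
                "page", "{call:header}{call:body}{call:footer}",
                "header", "<html><body>",
                "body", "{h1:Topic: {renderGet:dc:title}}{renderGet:dc:source}",
                "footer", "</body></html>");
        Map<String, String> titleOnly = Map.of("page", "{b:{renderGet:dc:title}}");
        System.out.println(new TemplateSketch(fullView, entry).evaluate("{call:page}"));
        System.out.println(new TemplateSketch(titleOnly, entry).evaluate("{call:page}"));
    }
}
```

The main method applies two template sets to the same entry: the first produces <html><body><h1>Topic: Java</h1>These are my <i>Java</i> links.</body></html>, the second only <b>Java</b>. The entry is unchanged and only the template differs, which is the separation between the presentation layer and the meta presentation layer described in Section 3.1.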
5 Related Work

We are not aware of any other RDF-based Wikis. One example of a Wiki with metadata is SnipSnap [6]. SnipSnap uses so-called labels to attach data to an entry, for instance a label for the creator of the entry, a person. A person can in turn have several labels, such as “name: Smith” of the type “NameLabel”. A label has a type, a name and a value. This approach is rather clever, but its disadvantage is that it does not use an open standard. It is worth mentioning that SnipSnap supports exporting an entry to RDF.

Omnigator [10] provides a web interface for navigating topic maps. It supports importing RDF models and browsing them, but Omnigator is not a Wiki.

SEAL [7] is a framework for developing ontology-based portal applications. It differs from our approach in these main points: it is not a Wiki, it concentrates on semantics, which WITL does not, and it is focused on portal applications. What it has in common with WITL is that both provide views, which SEAL calls lenses, on RDF data.

6 Future Research

• Blogs: Our metadata support makes this a very straightforward enhancement. The first step is to add metadata about the creation date. The second step is to display multiple small wiki pages on one web page and to sort them chronologically. Finally, one can add further convenience functions, such as partitioning the blog entries into pages, a calendar wiki, an RSS feed, etc.
• Integration into the Hyena RDF editing infrastructure: There are two ways in which Hyena [13] will benefit Wikked:
– RDF and Wiki syntax editing with a graphical user interface (GUI): One frontend for Hyena is implemented as a collection of plugins for the integrated development environment Eclipse. Letting this frontend edit Wikked's data will provide a nice alternative to purely Web-based editing.
– Remote publishing: Hyena provides an infrastructure of distributed servers that can publish data to and subscribe to data from each other.
One can therefore start one editor as a Hyena engine (with a GUI frontend) and Wikked as another (without a user interface). Publishing from the editor to Wikked uses a push protocol provided by Hyena.
• Lightweight publishing for software engineers: We are currently extending Hyena with a set of tools that allows software engineers to manage development-related information (such as bug-tracking lists, documentation and source code) with Hyena, in one integrated RDF database. Using Wikked as a presentation layer for that database allows them to publish and edit that information on the Web effectively. For example, one can write a wiki page that documents a certain aspect of the system and includes code samples that are retrieved via RDQL queries.

7 Conclusion

In this paper, we presented an RDF based Wiki. For this purpose, we designed a scripting language called WITL. Templates for Wiki pages use the same language as entries in the Wiki. Both entries and templates are described and maintained with RDF. Furthermore, templates can also be edited
as a normal Wiki page. This ensures maximum flexibility for the look and feel of the Wiki. To keep the advantages of ordinary Wikis, traditional Wiki markup is also supported. We developed a three layer architecture for browsing RDF data and showed how the involved components interact with each other. As a proof of concept, we presented a scenario in which a web site for maintaining bookmark collections is developed.
References

[1] Apache Jakarta Project. Velocity. http://jakarta.apache.org/velocity/.
[2] L. Aronsson. Operation of a large scale, general purpose wiki website. Elpub 2002. Technology Interactions, 2002.
[3] T. Berners-Lee, R. T. Fielding, and L. Masinter. Uniform Resource Identifier (URI): Generic Syntax. http://www.ietf.org/rfc/rfc3986.txt, January 2005.
[4] F. Dawson and T. Howes. vCard MIME directory profile. ftp://ftp.isi.edu/in-notes/rfc2426.txt, September 1998.
[5] Fraunhofer FIRST. Radeox. http://radeox.org/.
[6] M. L. Jugel and S. J. Schmidt. SnipSnap. http://snipsnap.org/.
[7] A. Maedche, S. Staab, N. Stojanovic, R. Studer, and Y. Sure. SEAL: A framework for developing semantic web portals. In BNCOD 18: Proceedings of the 18th British National Conference on Databases, pages 1–22, London, UK, 2001. Springer-Verlag.
[8] F. Manola and E. Miller. RDF Primer, W3C Recommendation. http://www.w3.org/TR/rdf-primer/, 2004.
[9] Netscape Communications Corporation. Open Directory Project. http://www.dmoz.org/.
[10] Ontopia. Omnigator. http://www.ontopia.net/omnigator/.
[11] T. Parr. ANTLR parser generator. http://antlr.org/.
[12] T. Parr. StringTemplate: Java Template Engine. http://www.stringtemplate.org/.
[13] A. Rauschmayer. Hyena: A Semantic Web Enabled Editor for Software Engineers. Submitted for publication.
[14] A. Rauschmayer and P. Renner. Knowledge-Representation-Based Software Engineering. Technical Report 0407, Ludwig-Maximilians-Universität München, Institut für Informatik, May 2004.
[15] A. Seaborne. RDQL: A query language for RDF. http://www.w3.org/Submission/RDQL/.
[16] Sun Microsystems. JSR-000152 JavaServer Pages 2.0 Specification, Final Release. http://jcp.org/aboutJava/communityprocess/final/jsr152/.
List of Figures

1 RDF Example: Address Book Entry
2 Wiki Markup
3 Abstract Syntax Tree for Bold Function
4 Three Layer Architecture for Wikked
5 Wikked Components
6 A graph representation for an entry in a bookmark collection.
7 A screen shot of an entry displayed by the Wiki.
8 Template processing.