Using XML Schema and Namespaces in XML, v1, March 13, 2007
AUTHOR: Peter Lacey, placey@burtongroup.com
TECHNOLOGY THREAD: Data Access Strategies
Publishing Information Burton Group is a research and consulting firm specializing in network and applications infrastructure technologies. Burton works to catalyze change and progress in the network computing industry through interaction with leading vendors and users. Publication headquarters, marketing, and sales offices are located at: Burton Group 7090 Union Park Center, Suite 200 Midvale, Utah USA 84047-4169 Phone: +1.801.566.2880 Fax: +1.801.566.3611 Toll free in the USA: 800.824.9924 Internet: info@burtongroup.com; www.burtongroup.com Copyright 2007 Burton Group. ISSN 1048-4620. All rights reserved. All product, technology and service names are trademarks or service marks of their respective owners. Terms of Use: Burton customers can freely copy and print this document for their internal use. Customers can also excerpt material from this document provided that they label the document as “Proprietary and Confidential” and add the following notice in the document: “Copyright © 2007 Burton Group. Used with the permission of the copyright holder. Contains previously developed intellectual property and methodologies to which Burton Group retains rights. For internal customer use only.” Requests from non-clients of Burton for permission to reprint or distribute should be addressed to the Marketing Department at +1.801.304.8119. Burton Group’s Application Platform Strategies service provides objective analysis of networking technology, market trends, vendor strategies, and related products. The information in Burton’s Application Platform Strategies service is gathered from reliable sources and is prepared by experienced analysts, but it cannot be considered infallible. The opinions expressed are based on judgments made at the time, and are subject to change. Burton offers no warranty, either expressed or implied, on the information in Burton’s Application Platform Strategies service, and accepts no responsibility for errors resulting from its use. 
If you do not have a license to Burton’s Application Platform Strategies service and are interested in receiving information about becoming a subscriber, please contact Burton. 2 BURTON GROUP 7090 Union Park Center · Suite 200 · Midvale, Utah · 84047 · P 801.566.2880 · F 801.566.3611 · www.burtongroup.com
Table of Contents

Foreword
  Audience
  Scope
Introduction
Understanding Namespaces in XML
  The Target Namespace
  Qualifying Elements and Attributes
  Choosing a Name for a Namespace
    Using URNs for Namespace Names
    Using URLs for Namespace Names
  Versioning
  Best Practices: Namespaces and Versioning
    Namespaces
    Versioning
Creating XML Schema Documents
  Structuring an XML Schema Document
    Elements
      Global Elements
      Local Elements
      Element Reuse
      Element Groups
    Attributes
    Types
      Custom Types
        Simple Types
        Complex Types
      Named and Anonymous Types
    Best Practices: Document Structure
      Elements vs. Types
      Elements vs. Attributes
    Best Practices: Naming Conventions
  Modularity
    Include
    Import
    Redefine
    Best Practices: Modularity
  Extensibility
    Extending Constructs Using the “Any” Element
    The anyType Type
    Extensibility via Composite Schemas
    Best Practices: Extensibility
  Character Encoding
  Schema Documentation
    Annotation
    XML-Style Comments
    Notation
    Best Practices: Schema Documentation
  Content Modeling
    Element Cardinality
    Empty, Null, and Missing
    Simple Types
      Facets (Restricted Simple Types)
      List Types
      Union Types
      Best Practices: Simple Types
    Complex Types
      Element Content
      Mixed Content
      Best Practices: Complex Types
    Derived Types
      Derived by Extension
      Derived by Restriction
      Final
      Abstract Types
      Best Practices: Derived Types
    Other Content Modeling Constructs
      Substitution Groups
      Block
      Default or Fixed Values
      Best Practices: Substitution Groups, Block, and Fixed Values
    Identity Constraints
      Unique
      Key/Keyref
      Best Practices: Identity Constraints
Notes
Foreword

Namespaces in Extensible Markup Language (XML) is a specification published by the World Wide Web Consortium (W3C) that addresses the potential for naming collisions between like-named elements in composite XML instance documents. XML Schema is a specification, also from the W3C,[1] that “provide[s] a means for defining the structure, content and semantics of XML documents.” The Namespaces in XML specification (from here on referred to by its more common name, “XML namespaces”) predates, and is thus orthogonal to, the XML Schema specification. The XML Schema specification, on the other hand, is tightly entwined with XML namespaces, and one cannot be fully understood without the other. Together these two specifications underpin most XML-oriented technologies.

XML namespaces is used seemingly everywhere and XML Schema is in broad usage. Of particular interest is that XML Schema is core to document-oriented, SOAP-based web services and to the web services framework (WSF). And, because the WSF underpins nearly all service oriented architecture (SOA) efforts, XML Schema is a critical component of SOA.

But XML Schema is a large and complex specification, so much so that no single piece of software implements it in its entirety. Furthermore, ambiguities in the specification can lead to different interpretations in different implementations. And, of course, outright errors appear in XML Schema implementations, as well. It is therefore all too easy to create an XML Schema document that works perfectly in one implementation but differently or not at all in another.

Problems with the XML Schema specification and its implementations are not the only source of concern. Many areas of XML and XML Schema usage can benefit from consistently applied best practices, including the use of namespaces, extensibility, modularity, versioning, and the use of attributes. The fact is that there are many ways to accomplish a particular goal in XML Schema, not all of which are equally viable. Even the XML namespaces specification, as short and concise as it is, routinely trips up experienced developers.

Audience

The best practices contained herein provide a set of guidelines for designing XML Schema documents that are effective, interoperable, and extensible. The target audiences for this information are enterprise and application architects and software developers. While it contains much explanatory content, this Methodologies and Best Practices (MBP) document is not a tutorial; it assumes at least a passing familiarity with XML namespaces and XML Schema.

Furthermore, this MBP document assumes that, within the reader’s organization, at least one group (and possibly more) has responsibility for creating, maintaining, and retiring XML Schema documents within a particular namespace. And, finally, it assumes that governance and incentives are in place to ensure that users (developers) employ these corporate schemas rather than creating new schemas that model the same business entities. As will be shown, corporate schemas should only model data types. Corporate schemas should not allow for the direct creation of instance documents (i.e., they should have no globally declared elements). A need will always exist for developers to create actionable schemas, but such schemas should be derived from corporate data types.

Scope

This MBP document covers those XML Schema features useful for modeling strongly typed data and describing XML-based messages. It does not provide best practices for using XML Schema to describe “textual” content (such as Extensible Hypertext Markup Language [XHTML] or Atom). In fact, the best practices for doing so are very different from those presented here. (It’s also true that XML Schema is not particularly well suited for representing non-data-oriented documents; consider RELAX NG or Schematron instead.)

Therefore, this MBP document assumes that the primary use case of XML Schema is to describe data and not text, and that the schemas will be used by developer tools to:

• Create code artifacts in a given programming language that map to the schema-described data elements
• Provide the information necessary to serialize native language data types to XML and back again, whether at development time or runtime
• Provide a description of data as it exists “on the wire”
• Provide for the automatic validation of XML instance documents

Many interoperability concerns presented here apply to XML-based messaging systems, such as SOAP web services and the standard artifacts (e.g., Web Services Description Language [WSDL] documents) and processing engines (e.g., web service platforms [WSPs]) that implement them. These interoperability concerns may or may not be applicable to developers who process XML directly using the Document Object Model (DOM) or Simple API for XML (SAX) application programming interfaces (APIs). However, if developers are planning to migrate applications to WSDL-described web services and support a variety of XML binding tools, these recommendations will help to transition those applications with minimal effort.

Introduction

In XML Schema there are many ways of accomplishing a particular goal. However, the variations in approach are not simply a matter of syntax. Instead, choosing one mechanism over another can have significant ramifications regarding interoperability, reusability, maintainability, and how a schema document constrains an instance document. In order to make the most effective use of XML Schema, a schema author should understand the consequences of the choices made in designing that schema, and that’s what this MBP document provides: best practices for using XML Schema.
This MBP document assumes a basic familiarity with XML Schema: that is, that the reader knows how to create a schema document that describes types, elements, and attributes. Even so, this MBP document describes many aspects of XML Schema that the reader is probably already familiar with. The purpose of this review is to explain the many possible ways that data can be described with XML Schema, in order to help the reader choose which, if any, is best. The goals of this MBP document are to steer the XML Schema designer away from the specification’s many potential pitfalls regarding real-world usage and to identify some of the more esoteric components of XML Schema.

Understanding Namespaces in XML

The W3C released the first working draft of XML namespaces in March 1998, just one month after the core XML specification reached recommendation (1.0) status. Although XML namespaces predates the first working draft of XML Schema by more than one year, the XML namespaces working drafts anticipated the need for namespace-aware schemas by noting, “We envision applications of Extensible Markup Language where a document contains markup defined in multiple schemas.” At the time of that writing, the standard schema language for XML was Document Type Definition (DTD), which, being an integral component of the XML specification, was namespace unaware.

The most recent version of the specification is the Namespaces in XML 1.0 (Second Edition), released in August 2006. (A 1.1 version of XML namespaces also exists that corresponds with the 1.1 version of XML. It primarily adds support for internationalized resource identifiers [IRIs], which allow Unicode characters in resource identifiers [e.g., http://société.com].)

Though narrowly scoped and remarkably brief, the Namespaces in XML specification still trips up even experienced developers.
The specification describes a way of qualifying element and attribute names from one schema so that they can be made unique in the presence of like-named elements and attributes in another schema. For example, imagine that a program, “A,” could process an XML document containing a list of media (e.g., books and movies). This program takes all elements named Title and prints out the contents. Thus, when processing the following document:

[XML listing not preserved; surviving text: The Sign of the Four / Sir Arthur Conan Doyle / The Hound of the Baskervilles / Sir Arthur Conan Doyle]
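The behavior of program A can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Media and Book element names and the overall document layout are hypothetical, since only the Title and Author names and the text values come from the passage above.

```python
import xml.etree.ElementTree as ET

# Hypothetical media list. The Media and Book element names are assumptions;
# only Title, Author, and the text values come from the surrounding text.
MEDIA_DOC = """
<Media>
  <Book>
    <Title>The Sign of the Four</Title>
    <Author>Sir Arthur Conan Doyle</Author>
  </Book>
  <Book>
    <Title>The Hound of the Baskervilles</Title>
    <Author>Sir Arthur Conan Doyle</Author>
  </Book>
</Media>
"""


def program_a(xml_text):
    """Return the text of every element named Title, wherever it appears."""
    root = ET.fromstring(xml_text)
    return [title.text for title in root.iter("Title")]


if __name__ == "__main__":
    for line in program_a(MEDIA_DOC):
        print(line)
```

Because the lookup matches on the bare name Title at any depth, it has no way to tell one vocabulary's Title from another's, which is the weakness the rest of this section explores.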
Program A would generate the following output:

The Sign of the Four
The Hound of the Baskervilles

In this case, the Title element can be said to be in program A’s namespace (even though no schema exists). Now imagine a similar schemaless program, “B,” that processes XML documents containing print media (e.g., books and magazines) and prints out all Author elements. However, program B anticipates that Author elements are composed of nested elements as follows:

[XML listing not preserved]

Program B has essentially made the Title element part of its namespace. If a user wanted to use both programs, he or she might format a document as follows:

[XML listing not preserved; surviving text: The Sign of the Four / Sir Arthur Conan Doyle / …]

The result of running this document through program A would be erroneous if the program were looking for any instance of the Title element. The result of running it through program B is undefined. The two distinct instances of the Title element have collided and thrown both programs off the rails.

XML namespaces addresses this issue of naming collisions. It does so by allowing a collection of element and attribute names to be associated with a Uniform Resource Identifier (URI), for that URI to be associated with an arbitrary prefix, and for that prefix to be prepended to all entities from that namespace. If program A were using the namespace URI http://company.com/schemas/allmedia, and program B were using the namespace URI urn:example.com:schemas:printmedia, then the above instance document could be rewritten as follows:
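The collision and its namespace-based resolution can be sketched as follows. The two namespace URIs are the ones named above; the am and pm prefixes, the Book and Name elements, and the shape of the nested Author element are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

ALLMEDIA = "http://company.com/schemas/allmedia"    # program A's namespace
PRINTMEDIA = "urn:example.com:schemas:printmedia"   # program B's namespace

# Hypothetical layout: Book, Name, and the am/pm prefixes are assumptions.
DOC = f"""
<am:Book xmlns:am="{ALLMEDIA}" xmlns:pm="{PRINTMEDIA}">
  <am:Title>The Sign of the Four</am:Title>
  <pm:Author>
    <pm:Title>Sir</pm:Title>
    <pm:Name>Arthur Conan Doyle</pm:Name>
  </pm:Author>
</am:Book>
"""


def titles_ignoring_namespaces(xml_text):
    """Naive lookup: match every Title by local name only (the collision)."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "Title"]


def titles_in_namespace(xml_text, namespace):
    """Namespace-aware lookup: match Title only in the given namespace."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(f"{{{namespace}}}Title")]


if __name__ == "__main__":
    print(titles_ignoring_namespaces(DOC))       # both Title elements
    print(titles_in_namespace(DOC, ALLMEDIA))    # program A's Title only
    print(titles_in_namespace(DOC, PRINTMEDIA))  # program B's Title only
```

The naive lookup still returns both Title elements, while each namespace-aware lookup returns only the one that belongs to its own vocabulary.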
[XML listing not preserved; surviving text: The Sign of the Four / Sir Arthur Conan Doyle / …]

Each program could be rewritten to process only those elements associated with its own namespace, thus solving the problem of naming collisions.

Note that the XML namespaces specification does not define a means for declaring that one owns a particular namespace nor what entities are in it. In fact, someone can just state by fiat (as is done here) that the elements X, Y, and Z are in the namespace http://me.my.mine.com, and that would be just fine as far as the XML namespaces specification is concerned. This, of course, does not lend itself well to automation or ease of use. It would be far better to have a means of explicitly stating namespace names and the entities within them.

The Target Namespace

An XML Schema document describes a vocabulary. For instance, a schema document may describe a book as follows:

[XML listing not preserved]

In this instance, the vocabulary consists of the parent Book element and, within that, the elements Title and Author. The Book element is referred to as a global element because its definition is a direct child of the schema element. Global elements can live on their own in an instance document (a document conforming to and constrained by a particular schema). The other elements, referred to as local elements, cannot exist outside the scope of the Book element.

Other schema documents describe other vocabularies to meet the needs of their audiences. It is entirely likely that some other schema document also defines an element called Author (perhaps for describing magazine articles). Because instance documents can make use of multiple schemas, an XML processor might not be able to disambiguate the fact that one instance of the Author element is derived from vocabulary 1 and another is derived from vocabulary 2.

XML namespaces is the means for preventing these naming collisions. XML namespaces makes sure that names in one XML vocabulary don’t conflict with names in another XML vocabulary. In many ways, XML namespaces are similar to other namespace mechanisms, such as packages in Java or Domain Name System (DNS) for the Internet.

To prevent naming collisions, it is necessary to assign each vocabulary a unique namespace in the form of a URI. To do this in XML Schema, use the targetNamespace attribute of the root schema element, as shown here:

[XML listing not preserved]

Qualifying Elements and Attributes

When a schema declares a target namespace, all globally defined elements, attributes, and types become members of that namespace. Globally defined components are those that are immediate children of the schema element. In the example above, the only member of the urn:example.com:schemas:book namespace is Book.

Should any XML document make use of a component contained in a namespace, schema validation will require that the document qualify the component’s name with a reference to its namespace. This is true not only of instance documents, but also of other XML Schema documents, and even of the same schema document that defines the namespace-qualified component. In other words, a schema processor must be told which namespace a given element, attribute, or definition belongs to. This is accomplished by declaring the namespace using the xmlns (XML namespace) attribute and associating the element in question with the namespace using a qualified name (QName). Indeed, this namespace qualification appears in many of the earlier examples where the XML Schema namespace is declared and associated with all of the components in use from that namespace:

[XML listing not preserved; surviving text: …]
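A minimal sketch of such a schema, together with a small helper that reports its target namespace and separates global from local element declarations, might look like this. The complexType/sequence layout and the xsd:string types are assumptions; the Book, Title, and Author names and the urn:example.com:schemas:book namespace come from the text. The helper only inspects the schema document as XML; it does not perform schema validation.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"

# Reconstruction under assumptions: the complexType/sequence layout and the
# xsd:string types are guesses; Book, Title, Author, and the target namespace
# come from the surrounding text.
BOOK_SCHEMA = """
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            targetNamespace="urn:example.com:schemas:book">
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""


def describe(schema_text):
    """Return (target namespace, global element names, local element names)."""
    schema = ET.fromstring(schema_text)
    element_tag = f"{{{XSD}}}element"
    # Global declarations are the immediate children of the schema element.
    global_decls = [el for el in schema if el.tag == element_tag]
    global_names = [el.get("name") for el in global_decls]
    local_names = [
        el.get("name")
        for el in schema.iter(element_tag)
        if not any(el is g for g in global_decls)
    ]
    return schema.get("targetNamespace"), global_names, local_names


if __name__ == "__main__":
    print(describe(BOOK_SCHEMA))
```

Every xsd-prefixed name in the schema text is itself a QName bound to the XML Schema namespace, which is why the helper can find declarations by their expanded name regardless of the prefix a schema author chooses.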
In this example, the xmlns:xsd="http://www.w3.org/2001/XMLSchema" attribute declares the http://www.w3.org/2001/XMLSchema namespace and associates it with the prefix string xsd. This binds the prefix string to the namespace. Components can then use the prefix string as a shorthand for the namespace URI. The xsd:schema element name is a QName. A QName comprises two parts: the namespace prefix and the component’s local name. The two parts are separated by a colon. A nonqualified component name, containing only the local name, is known as a non-colonized name (i.e., lacking a colon) or NCName.

In the example below, the same technique is used to qualify the Book element.

[XML listing not preserved; surviving text: The Sign of the Four / Sir Arthur Conan Doyle]

So, Book can be said to be “in” the urn:example.com:schemas:book namespace. But what about the other elements, Title and Author? Are they in the same namespace? The answer is no. By default, only global, top-level definitions in a schema are in the target namespace and, therefore, must be qualified. Elements that are not in the target namespace must not be qualified.

However, schema authors, if they so desire, can assert that all elements and/or attributes in the schema by default belong to the target namespace. Why would they choose to do so? Two reasons: one, it gives the schema author the leeway to move elements from a local to a global context in later revisions of the schema without breaking backward compatibility. And two, it allows schema users to be less knowledgeable about the internals of the schema if they know they must simply qualify every element.

To specify that all elements should belong to the target namespace, the schema author uses the elementFormDefault attribute of the schema element and sets it to qualified. Alternatively, it can be set to (the default) unqualified, which means that local elements in instance documents must not be qualified. Using QNames for each element future-proofs instance documents against changes in the schema. (The default setting, either qualified or unqualified, can also be overridden in an individual element definition via the form attribute, but this technique is not commonly used.)

Here’s a schema that demonstrates the use of elementFormDefault:
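The qualification rule can be expressed as a short function. This is a simplified sketch, not a schema processor: it handles only named element declarations, ignores ref, groups, include, and import, and assumes local element names are unique within the schema. The book_schema helper and the schema layout it generates are hypothetical.

```python
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
BOOK_NS = "urn:example.com:schemas:book"


def book_schema(element_form_default=None):
    """Build a hypothetical Book schema, optionally setting elementFormDefault."""
    form_attr = ""
    if element_form_default is not None:
        form_attr = f' elementFormDefault="{element_form_default}"'
    return f"""
<xsd:schema xmlns:xsd="{XSD}" targetNamespace="{BOOK_NS}"{form_attr}>
  <xsd:element name="Book">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Title" type="xsd:string"/>
        <xsd:element name="Author" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
"""


def expected_namespaces(schema_text):
    """Map each declared element name to the namespace an instance must use.

    None means the element must appear unqualified (in no namespace).
    """
    schema = ET.fromstring(schema_text)
    target = schema.get("targetNamespace")
    default_form = schema.get("elementFormDefault", "unqualified")
    top_level = list(schema)
    result = {}
    for decl in schema.iter(f"{{{XSD}}}element"):
        name = decl.get("name")
        if name is None:  # skip ref= particles, which carry no name
            continue
        if any(decl is top for top in top_level):
            result[name] = target  # global: always in the target namespace
        else:
            form = decl.get("form", default_form)  # per-element override
            result[name] = target if form == "qualified" else None
    return result


if __name__ == "__main__":
    print(expected_namespaces(book_schema()))
    print(expected_namespaces(book_schema("qualified")))
```

With no elementFormDefault, only Book maps to the target namespace; with elementFormDefault set to qualified, Title and Author map to it as well.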
The corresponding instance document now looks like this:

[XML listing not preserved; surviving text: The Sign of the Four / Sir Arthur Conan Doyle]

In this instance document, bk is specified as the prefix string, and that string is used to qualify all elements in the document. Note that the prefix string is completely arbitrary. Instance authors are free to use any prefix string they care to. For example, the following instance document is semantically equivalent to the previous one.

[XML listing not preserved; surviving text: The Sign of the Four / Sir Arthur Conan Doyle]

A namespace declaration is available to the element in which it is declared and to all of that element’s children (referred to as the namespace scope). In this instance, the book namespace is declared in the Book element, and it can be referenced by all children (and their children) of the Book element.

From the instance document author’s point of view, a number of equally valid mechanisms exist for qualifying a component’s name. Of particular interest is the default namespace declaration, in which no prefix string is specified in the namespace declaration; for example:

xmlns="urn:example.com:schemas:book"

Notice how the xmlns attribute is followed directly by the equal sign and not by a colon/string pair. As the name implies, a default namespace declaration establishes a default namespace. All elements within the scope of the default namespace declaration that are not explicitly qualified are assigned to the default namespace.

The following examples demonstrate valid mechanisms for qualifying names. For simplicity’s sake, the schema is changed here to have only one local element:
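The claim that the prefix string is arbitrary, and that a default namespace declaration qualifies names just as a prefix does, can be checked with any namespace-aware parser. The sketch below uses Python's standard ElementTree, which reports each element as {namespace-URI}local-name. The prefix "anything" is a deliberately arbitrary choice, and the documents assume a schema in which every element is qualified.

```python
import xml.etree.ElementTree as ET

BOOK_NS = "urn:example.com:schemas:book"

WITH_BK_PREFIX = f"""
<bk:Book xmlns:bk="{BOOK_NS}">
  <bk:Title>The Sign of the Four</bk:Title>
  <bk:Author>Sir Arthur Conan Doyle</bk:Author>
</bk:Book>
"""

# "anything" is an arbitrary prefix chosen purely for illustration.
WITH_OTHER_PREFIX = f"""
<anything:Book xmlns:anything="{BOOK_NS}">
  <anything:Title>The Sign of the Four</anything:Title>
  <anything:Author>Sir Arthur Conan Doyle</anything:Author>
</anything:Book>
"""

# A default namespace declaration: no prefix appears anywhere.
WITH_DEFAULT_NS = f"""
<Book xmlns="{BOOK_NS}">
  <Title>The Sign of the Four</Title>
  <Author>Sir Arthur Conan Doyle</Author>
</Book>
"""


def expanded_names(xml_text):
    """List (expanded name, text) pairs in document order."""
    root = ET.fromstring(xml_text)
    return [(el.tag, (el.text or "").strip()) for el in root.iter()]


if __name__ == "__main__":
    for doc in (WITH_BK_PREFIX, WITH_OTHER_PREFIX, WITH_DEFAULT_NS):
        print(expanded_names(doc))
```

All three documents produce the same list of expanded names, which is what “semantically equivalent” means at the namespace level: only the URI and the local name matter, never the prefix.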
Valid instances of this schema include the following documents:

The Sign of the Four
All elements are qualified using a prefix declared in the root element.

The Sign of the Four
All elements are qualified using the default namespace declared in the root element.

The Sign of the Four
Each element declares a default namespace for itself and its children.

The Sign of the Four
Each element declares an explicit namespace for itself. Even though both elements are in the same namespace, the prefix strings are different.

The Sign of the Four
Each element declares its own namespace. The root element assigns its namespace a prefix, but the child element (and its children, if any) declares a default namespace.

If elementFormDefault is set or defaulted to unqualified, then instance authors must be careful not to qualify elements that don’t belong to a namespace. This mistake is especially easy to make when using default namespaces, which implicitly qualify child elements. For instance, in this schema no local element belongs to a namespace.
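A schema with unqualified local elements could look like the following sketch. It reuses the assumed Book and Title names and simply omits elementFormDefault, so the default of unqualified applies.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- No elementFormDefault attribute: local elements default to unqualified. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example.com:schemas:book">
  <!-- Global element: belongs to the target namespace. -->
  <xs:element name="Book">
    <xs:complexType>
      <xs:sequence>
        <!-- Local element: belongs to no namespace in instance documents. -->
        <xs:element name="Title" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Under this sketch, Book must be namespace-qualified in an instance document and Title must not be.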
The following document is not a valid instance of this schema because the Title element is implicitly qualified by the default namespace, though it should be in no namespace. (The question marks are a notation made to clearly illustrate that the example XML is invalid.)

? ? The Sign of the Four ?

In order to correct this error, the instance author must override the default namespace in the Title element and set it to an empty string (no namespace), like so:

The Sign of the Four

Or not use default namespaces at all:

The Sign of the Four

Schema authors can also force the qualification of attributes by using the attributeFormDefault attribute of the schema element. However, the additional clutter of qualifying every attribute is generally not considered worth the effort unless the attribute is intended to be used in multiple schemas.

With all this in mind, an instance document with two Author (and Title) elements from different schemas would look something like this:

The Sign of the Four Sir Arthur Conan Doyle A Scandal in Bohemia Sir Arthur Conan Doyle

An XML Schema-aware processor, when parsing this document, can now disambiguate the two instances of the Title and Author elements.

Note that the instance document below using default namespaces is semantically equivalent to the instance document above using explicit namespaces.
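As a hedged sketch of the default-namespace form, the document below combines elements from two schemas. The Library wrapper, the Story element, and the urn:example.com:schemas:story namespace are hypothetical names introduced only for illustration; both schemas are assumed to use elementFormDefault="qualified".

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical wrapper element in no namespace. -->
<Library>
  <!-- The default namespace declared here covers Book and its children. -->
  <Book xmlns="urn:example.com:schemas:book">
    <Title>The Sign of the Four</Title>
    <Author>Sir Arthur Conan Doyle</Author>
  </Book>
  <!-- A different default namespace covers Story and its children, so
       these Title and Author elements are distinct from the ones above. -->
  <Story xmlns="urn:example.com:schemas:story">
    <Title>A Scandal in Bohemia</Title>
    <Author>Sir Arthur Conan Doyle</Author>
  </Story>
</Library>
```

The explicit form would instead declare two prefixes on the wrapper element and write every element with its prefix; a namespace-aware parser reports the same namespace name and local name for each element in both forms.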
The Sign of the Four Sir Arthur Conan Doyle A Scandal in Bohemia Sir Arthur Conan Doyle

Choosing a Name for a Namespace

Namespace URIs are just strings. One namespace is distinct from another if the namespace strings differ in any way, including case and trailing slashes. However, a namespace name must be a URI in order to help guarantee uniqueness. URIs can be either Uniform Resource Locators (URLs) or URNs. URLs are familiar, and namespaces are often specified as URLs (e.g., http://www.w3.org/2001/XMLSchema), but that is not a requirement. A URN, in contrast, is a Uniform Resource Name: a unique name for something, but not a pointer to it. Examples of real-world names of this kind include an International Standard Book Number (ISBN), a Committee on Uniform Securities Identification Procedures (CUSIP) number, or a DNS name (without a corresponding Internet Protocol [IP] address). How a schema processor maps URIs to actual XML Schema documents is processor specific.

Given that namespaces in XML Schema can be either URLs or URNs, which should schema authors use? Both are viable, but an enterprise should consistently use one or the other, and not have some schema documents use URLs and others use URNs. The best practices for using URLs and URNs are presented here.

Note: Though either URLs or URNs may be used for namespace names, the bulk of this MBP document has adopted URNs.

Using URNs for Namespace Names

The format of a URN is as follows: urn:global_namespace:local_namespace*:element, where urn is the URI scheme, global_namespace is universally unique, local_namespace is optional and locally unique, and element is a reference to something in this namespace. Schema authors can also use slashes instead of colons, though slashes typically indicate an actual path to a resource.

Because URNs define names, it is common to use URNs as namespace names. Internal URNs can be of the form urn:company.com:schemaname.
However, “Company” may like to use the universally unique company.com namespace for other purposes, so namespace names should be refined even further. Examples include:
• urn:company.com:schemas:products:books:genres
• urn:company.com:schemas:products/books/genres
• urn:schemas.company.com:products/books/genres
• urn:com.company.schemas:products:books:genres

Note how a DNS name is used as the globally unique namespace component (inverted in the last example). This is only a convention, but it’s a good one. Also, notice how these URNs contain taxonomical information; that is, the “genres” schema is classified as being in the “products” category of “schemas” and further classified as being in the “books” category of the “products” category. Because URNs only need to be unique and do not actually point to anything, the choice between colon-separated components and slash-separated paths is arbitrary. However, colons are preferred.

Using URLs for Namespace Names

Schema authors may elect to use URLs for namespace names instead of URNs. While no requirement exists that the URL actually point to a resource on the Web, and many don’t, it is a best practice to have the URL point to something. The question is, what should the URL point to? In the case of XML Schema, the answer is, typically, the schema file itself. However, the answer is not always so obvious. The Namespaces in XML specification is completely distinct from the XML Schema specification, so it is perfectly valid for a namespace URI to point to a DTD, to a RELAX NG or Schematron schema, or even to a text document that describes the namespace in prose. In fact, even schema authors may wish to provide alternative schemas, documentation, style sheets, or source or object code.

The answer, then, to the question of what a namespace URL should point to is everything, but indirectly, via a Resource Directory Description Language (RDDL) document. An RDDL document provides references to any and all information regarding a namespace that the schema author cares to include. RDDL includes a single element, resource, that has a number of descriptive attributes.
For example, an RDDL document that points to an XML Schema file and English documentation might look like this:

Notice that the href attribute of the first resource element is pointing at the actual XML Schema file and the second href attribute is pointing at an HTML file. Each resource element also makes use of a role, arcrole, and title attribute (more allowable attributes exist). The role and arcrole attributes are used to describe the “nature” and “purpose” of the resource, and generally have “well known” values. For instance, the first resource element states that the referenced document is an XML Schema document, and the arcrole attribute states its use for inclusion in other schema documents. The title attribute gives a short, non-normative description of the resource.

Also notice that the resource elements above are not bound up in a root element, nor are the rddl and xlink prefixes declared. This information has been elided solely for the sake of clarity. In actuality, however, the root element of an RDDL document is <html>. That’s right, an RDDL document is really just an HTML document, XHTML to be precise, and RDDL is defined as an extension of XHTML Basic. This means that schema authors can have content in an RDDL document designed to be rendered by a browser, and they can add an rddl:resource element basically wherever they can put a paragraph (<p>) tag. A more accurate (and superior) version of the above, then, might be something like this:

RDDL Document for Book genres
Genre Type Resource
This document describes the complex type representing a book’s genre.
XML Schema
The schema is available at http://schemas.company.com/products/books/genres.xsd
Documentation for this type is available at http://schemas.company.com/products/books/genres.html
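The markup behind a listing like this could be sketched as follows. The headings, sentences, and URLs are taken from the text above; the element layout and the specific role and arcrole values are illustrative assumptions rather than the original listing, and should be checked against the RDDL specification before use.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rddl="http://www.rddl.org/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <head>
    <title>RDDL Document for Book genres</title>
  </head>
  <body>
    <h1>Genre Type Resource</h1>
    <p>This document describes the complex type representing a book's genre.</p>

    <!-- Resource 1: the XML Schema file. The role names the nature of the
         resource; the arcrole names its purpose (assumed value). -->
    <rddl:resource xlink:type="simple"
                   xlink:title="XML Schema"
                   xlink:role="http://www.w3.org/2001/XMLSchema"
                   xlink:arcrole="http://www.rddl.org/purposes#schema-validation"
                   xlink:href="http://schemas.company.com/products/books/genres.xsd">
      <h2>XML Schema</h2>
      <p>The schema is available at
        http://schemas.company.com/products/books/genres.xsd</p>
    </rddl:resource>

    <!-- Resource 2: human-readable documentation (assumed role and arcrole). -->
    <rddl:resource xlink:type="simple"
                   xlink:title="Documentation"
                   xlink:role="http://www.w3.org/TR/html4/"
                   xlink:arcrole="http://www.rddl.org/purposes#reference"
                   xlink:href="http://schemas.company.com/products/books/genres.html">
      <p>Documentation for this type is available at
        http://schemas.company.com/products/books/genres.html</p>
    </rddl:resource>
  </body>
</html>
```

Because the rddl:resource elements wrap ordinary XHTML content, a browser renders the page as readable prose while software can select the resource elements by namespace.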
Now, when schema users enter the namespace URL into a browser, they are greeted with an explanatory HTML page, but when an RDDL-aware piece of software follows the link, it can easily parse out the rddl:resource elements for processing.

Versioning

XML Schema has no inherent notion of versioning. The schema element does have a version attribute, but applications are free to ignore it, and many do. No universal mechanism exists for signaling a version change to an application and having the application take appropriate steps. Even best practices help only those who follow them. However, it’s important to have some repeatable practice around this subject so that those who honor it can deal with the inevitable fact of version change.

It is rare for an XML Schema document to remain static from the moment of its creation. Mistakes and oversights will need to be corrected, and changes in technology and business will need to be accounted for. From a big-picture point of view, however, a schema can change in only two ways: in a way that breaks backward compatibility and in a way that doesn’t.

Every schema should set the version attribute. Three dot-separated numbers should be used (e.g., 1.0.3). Every change to a published schema should be reflected in the schema’s version attribute. If a change breaks backward compatibility (e.g., a new mandatory element is added), then the first number should be incremented. If a change alters the capabilities of a schema without breaking backward compatibility (e.g., an optional element has been added), then the second number should be incremented. If a change does not affect functionality at all (e.g., comments have been added), then the third number should be incremented. The first number should be set to zero if the schema is not yet ready for production use.
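To make the numbering scheme concrete, here is a sketch of a schema element carrying a version attribute, with a hypothetical change history in a comment. The namespace name and the individual version numbers are assumptions chosen for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical history under the three-part scheme:
       0.9.0  draft, not yet ready for production
       1.0.0  first production release
       1.1.0  optional element added (backward compatible)
       1.1.1  comments added (no functional change)
       2.0.0  mandatory element added (breaks backward compatibility) -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:company.com:schemas:products:books:genres"
           version="1.1.1">
  <!-- Schema components would be declared here. -->
</xs:schema>
```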
Examples:

Using just the version attribute, however, is not enough to ensure that applications using this schema don’t break, because applications rarely pay attention to the version attribute. When a schema author makes compatibility-breaking changes, he or she must consider the schema to be entirely new. The schema must therefore occupy its own namespace. The best way for the author to do that is to append to the namespace name the date that the schema was moved into production. Even though the namespace name has changed, do not reset the version field to 1.0.0; rather, increment as normal to show that this schema is closely related to some earlier schema.
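The combination of a dated namespace and a continuing version number might be sketched like this. The 2006-06-15 namespace is the one used in the best-practices list; the later date and both version numbers are hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical successor to a schema whose namespace was
       urn:company.com:schemas:books:textbooks:subjects:2006-06-15
     and whose last version was 1.3.0. The breaking change gets a new
     date component, and the version moves on to 2.0.0 rather than
     being reset to 1.0.0. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:company.com:schemas:books:textbooks:subjects:2007-03-13"
           version="2.0.0">
  <!-- Schema components would be declared here. -->
</xs:schema>
```

Instance documents written against the earlier namespace are unaffected, because to a namespace-aware processor the two namespace names are simply different strings.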
URN examples:

URL examples:

The schema author should not remove earlier schemas from production until he or she is certain that no applications are relying on them. Note, though, that schemas are not always (or often) accessed from the network on each use; therefore, making schemas physically unavailable is not enough to keep them from being used. Frequently a schema is used once (for instance, to generate some static code) or is copied locally for continued use. In short, there really is no way to remove a schema from production.

Best Practices: Namespaces and Versioning

What follows is a succinct list of best practices that pertain to namespace usage and schema versioning.

Namespaces

In all cases, schemas should be assigned a target namespace for the simple reason of avoiding naming collisions. Ensure that the components that make up the namespace are logically related.

Best Practice: Give all schema files their own namespace.

When a schema is used by another schema or an instance document, it is necessary for the schema user to qualify only global elements. However, schema authors should use the elementFormDefault attribute to force users to qualify all elements. This allows the original schema author to move an element from a local context to a global context at some later time without breaking those documents that are dependent on the schema. It also frees the instance author from having to empirically determine whether or not local elements need to be qualified. Finally, if developers are using .NET serialization, then elementFormDefault is mandatory.

Best Practice: Use elementFormDefault to force all elements to be qualified.

However, schema authors should not force users to qualify attributes. Forcing the qualification of attributes just adds clutter.

Best Practice: Do not force attributes to be qualified via attributeFormDefault.

It is possible to set a namespace to be the default namespace for an element and all its children until overridden. Schema authors, though, should not use a default namespace. Fewer errors are introduced if all components are explicitly qualified. Schema authors should assign a prefix string to all declared namespaces. Frequently a schema references components defined within itself, so schema authors must also declare a prefix for the target namespace. It is common to assign the target namespace the prefix “tns,” which is short for “this namespace.” The following contrived example shows this usage:

Best Practice: Don’t use a default namespace. Assign the “tns” prefix to the target namespace.

Namespaces must be URIs. URIs can be either URLs or URNs. Organizations should standardize on only one format and schema authors should conform to this standard.
Best Practice: Standardize on one URI format for namespace names.

Namespace URIs are case sensitive and are also sensitive to trailing slashes (URLs and URNs) and semicolons (URNs).

Best Practice: Namespace URIs should be all lowercase and not contain trailing slashes or semicolons.

How URNs are structured is entirely up to the schema author, but the globally unique component should be a DNS name (e.g., company.com). The locally unique component should be taxonomical. Colons should be used as separators unless the locally unique bit is an actual path to the document. A representative URN, then, looks something like this:

urn:company.com:schemas:products:books:fiction:genres

Best Practice: The globally unique component of a URN should be a DNS name.

Even though schema authors are free to structure a URN as they please, an organization (via an XML data management group) should standardize on the locally unique component of a namespace. Start by collecting all corporate schemas into a single component, thus keeping the root namespace uncluttered:

urn:company.com:schemas

Spend some time creating a shallow taxonomy that represents the enterprise’s information technology (IT) or business needs. For instance, group schemas by business unit:

urn:company.com:schemas:all_company
urn:company.com:schemas:magazines
urn:company.com:schemas:books

Continue to categorize schemas into domain components as necessary, but don’t go too deep.

urn:company.com:schemas:books:textbooks

Finally, tack on the schema name, then follow the best practices given immediately below concerning schema versioning.

urn:company.com:schemas:books:textbooks:subjects

Think hard about namespace names early in the game. Once deployed, they will stick around for a while.
Best Practice: The locally unique component of a URN-valued namespace name should reflect an organized taxonomy. Collect all schemas together, then branch out as many levels deep as necessary.

Best Practice: If URLs are used as a namespace, the domain name should be prefaced by a host name that collects all schemas. The URL path should follow the same taxonomical practices given for URNs. Example:

http://schemas.company.com/books/textbooks/subjects

When a URL is used as a namespace, it is not strictly necessary for the URL to point to anything; it must only be unique among all other namespace names. However, users of a schema will expect to be able to resolve a URL-valued namespace, and will be surprised if they can’t.

Best Practice: When a URL is used as a namespace name, the URL should resolve to some resource.

When a URL is used for a namespace, many different resources can legitimately be put at the URL address, but the address can serve only one of them.

Best Practice: A URL-valued namespace should resolve to an RDDL document. If this is not practical, then it should resolve to the schema itself.

It is presumed that an enterprise has some central schema authority that is responsible for creating and managing corporate schemas and namespaces. However, it is impossible for such a group to create all schemas; local groups and individuals will also need this capability. These schema authors should be assigned a namespace of their own that reflects a group over which the authors can exercise control. That is, user schemas should have namespaces that look something like these:

urn:company.com:schemas:human_resources_schemas:*
http://schemas.company.com/human_resources_schemas/*

The http example above implies that local schema authors are able to publish to the portion of the web server or content management system that controls this address.

Best Practice: Locally created schemas should be collected into a namespace that the developer can control.
Versioning

As described above, the final component of the target namespace should be the date on which the schema was created. Should the schema be updated in a way that breaks backward compatibility, the namespace name should be changed by updating this date component.
urn:company.com:schemas:books:textbooks:subjects:2006-06-15
http://schemas.company.com/books/textbooks/subjects/2006-06-15

Best Practice: Append the schema creation date to the namespace name using the ISO date standard of YYYY-MM-DD. Update this date to reflect incompatible changes to the schema. Don’t update the namespace URI if changes are backward compatible.

Best Practice: Use the version attribute of the schema element. Adopt a three-part, dot-separated version number. Update the first number for compatibility-breaking changes, update the second number for functionality-altering changes, and update the third number for inconsequential changes. Do not reset the version to 1.0.0 when the date component of the namespace changes.

Best Practice: Keep earlier versions of a schema available until all use of them has been eliminated.

Creating XML Schema Documents

XML Schema is made up of two principal components: a means to describe the structure of an XML document and a means to provide type information for elements and attributes. Accordingly, the information provided in this section follows roughly the same organization.

Structuring an XML Schema Document

The principal structural components of an XML Schema document are the element and attribute elements, which determine the appearance of elements and attributes in an instance document (a document conforming to a particular schema). Elements and attributes can be typed, and complex types can have embedded elements or attributes.

Elements

The element element is used to dictate the appearance of elements in an instance document. Elements can be defined globally or local to a specific type.

Global Elements

When the element element is a direct child of the schema element (i.e., not a child of any other element), it is said to be a global element. Global elements may be used as root elements in an instance document, and they can be referenced (via the ref attribute) by other schema elements.
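A minimal sketch of global element declarations, and of a reference to them via the ref attribute, is shown below. The Author and Title names come from the discussion that follows; the namespace, the tns prefix binding, and the BookType complex type are assumptions added only to illustrate ref.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:tns="urn:example.com:schemas:book"
           targetNamespace="urn:example.com:schemas:book">

  <!-- Direct children of xs:schema, so both are global elements. -->
  <xs:element name="Author" type="xs:string"/>
  <xs:element name="Title" type="xs:string"/>

  <!-- Hypothetical type that reuses the global elements by reference.
       The tns prefix is needed because ref takes a qualified name. -->
  <xs:complexType name="BookType">
    <xs:sequence>
      <xs:element ref="tns:Title"/>
      <xs:element ref="tns:Author"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

Either global element may serve as the root of a conforming instance document, provided it is qualified with the target namespace.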
The example below is a complete schema document. Both elements, Author and Title, are global elements.

And below is a conforming instance document. Notice that root elements use the xmlns attribute to declare what namespace the element belongs to. Because the Author element is global, it can be used as a root element in this instance document.

Brian Kernighan

The following is another conforming instance document. In this case, the global element Title is used as the root element.

The C Programming Language

The third example of a conforming instance document, below, shows that an instance document can have as many root elements as it wants; a schema document cannot constrain root elements to a specified number of occurrences. Notice that both root elements must declare their namespace because neither element is within the scope of the other.

Brian Kernighan Dennis Ritchie

This final example is similar to the previous one, except that it shows both root elements in use in a single document.

Brian Kernighan The C Programming Language

Local Elements

When an element element is a child of some other element, it is said to be a local element. In an instance document, a local element can exist only within the scope of its parent. It cannot be a root element. By default, local elements are not part of a schema’s target namespace. As discussed earlier, schema developers can effectively override this default by specifying elementFormDefault="qualified" in the schema element or by specifying form="qualified" on a specific element definition.

In this example, the complexType and sequence elements are introduced in order to maintain correct syntax; these tags are explained in the “Custom Types” section of this MBP document. For