Using XML Schemas


	Questions

Usage
Using Entities and CDATA Sections
XPath 2.0 support for XML Schema 1.1 validation
User-defined error messages for XML Schema 1.1 assertion failures
XML Schema API
Changes to PSVI
Accessing PSVI via XNI
Accessing PSVI via DOM
Accessing PSVI via SAX
Accessing PSVI via the JAXP 1.4 Validation API
Parsing and analyzing an XML schema
Using XSLoader to get an XSModel


	Answers


	How do I validate against XML schema?

XML Schema 1.0 validation has been integrated with the regular SAXParser and DOMParser classes, and also with the JAXP validation API using the XSD 1.0 Schema factory. No special classes are required to parse documents that use a schema.
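As a minimal sketch of the JAXP route for XML Schema 1.0, the following validates in-memory documents with the standard XSD 1.0 schema factory. The class name, the schema and the sample instances are hypothetical and exist only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class Xsd10ValidationSketch {

    // Hypothetical schema: a single <quantity> element holding a positive integer.
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='quantity' type='xs:positiveInteger'/>"
        + "</xs:schema>";

    /** Returns true if the instance text is well-formed and valid against SCHEMA. */
    static boolean isValid(String instanceXml) {
        try {
            // XSD 1.0 schema factory, identified by the W3C XML Schema namespace URI
            SchemaFactory sf =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = sf.newSchema(new StreamSource(new StringReader(SCHEMA)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown as SAXParseException
            validator.validate(new StreamSource(new StringReader(instanceXml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<quantity>3</quantity>"));
        System.out.println(isValid("<quantity>-3</quantity>"));
    }
}
```

A real application would normally install an ErrorHandler on the Validator to collect every error instead of stopping at the first one.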

For XML Schema 1.1 validation, the preferred way is to use the JAXP validation API, using the XSD 1.1 Schema factory. Here's an example:

import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

...

StreamSource[] schemaDocuments = /* created by your application */;
Source instanceDocument = /* created by your application */;

SchemaFactory sf = SchemaFactory.newInstance(
    "http://www.w3.org/XML/XMLSchema/v1.1");
Schema s = sf.newSchema(schemaDocuments);
Validator v = s.newValidator();
v.validate(instanceDocument);

The SAXParser and DOMParser support for XML Schema 1.0 validation that was available in earlier Xerces releases has been extended to support XML Schema 1.1 validation as well. To use it, construct the XSD 1.1 Schema factory with the Java statement SchemaFactory.newInstance("http://www.w3.org/XML/XMLSchema/v1.1") and then perform the validation as usual.
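One way to connect a schema factory to a SAX parser is the standard JAXP SAXParserFactory.setSchema method. The sketch below assumes that this is an acceptable route; the class name, the schema and the error-counting handler are hypothetical. The schema language is a parameter, so the same code can be tried with the XSD 1.1 URI when a Xerces build with XML Schema 1.1 support is on the classpath; with a stock JDK only the XSD 1.0 URI is expected to resolve.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class SaxSchemaValidationSketch {

    static final String XSD_10 = XMLConstants.W3C_XML_SCHEMA_NS_URI;
    // Requires an XSD 1.1 capable Xerces build; not exercised in this sketch
    static final String XSD_11 = "http://www.w3.org/XML/XMLSchema/v1.1";

    // Hypothetical schema: a single <n> element of type xs:int
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='n' type='xs:int'/>"
        + "</xs:schema>";

    /** Parses the instance with a validating SAX parser and counts validation errors. */
    static int countErrors(String schemaLanguage, String instanceXml) {
        final int[] errors = {0};
        try {
            SchemaFactory sf = SchemaFactory.newInstance(schemaLanguage);
            Schema schema = sf.newSchema(new StreamSource(new StringReader(SCHEMA)));

            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);   // schema validation needs namespace support
            spf.setSchema(schema);         // validate every parsed document against schema
            SAXParser parser = spf.newSAXParser();

            parser.parse(new InputSource(new StringReader(instanceXml)),
                new DefaultHandler() {
                    @Override
                    public void error(SAXParseException e) {
                        errors[0]++;       // recoverable (validation) error
                    }
                });
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return errors[0];
    }

    public static void main(String[] args) {
        System.out.println(countErrors(XSD_10, "<n>42</n>"));
        System.out.println(countErrors(XSD_10, "<n>abc</n>"));
    }
}
```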

You can also refer to the JAXP sample, SourceValidator, which validates XML documents against 1.1 schemas when you pass the option "-xsd11" on the command line.

Each document that uses XML Schema grammars must specify the location of the grammars it uses: with an xsi:schemaLocation attribute if the document uses namespaces, and with an xsi:noNamespaceSchemaLocation attribute otherwise. These attributes are usually placed on the root (top-level) element of the document, though they may occur on any element; for more details see XML Schema Part 1 section 4.3.2. Here is an example with no target namespace:


	<document xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:noNamespaceSchemaLocation='document.xsd'> ... </document>

Here is an example with a target namespace. Note that it is an error to specify a different namespace than the target namespace defined in the Schema.


	<document xmlns='http://my.com' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:schemaLocation='http://my.com document.xsd'> ... </document>

Review the sample file, 'data/personal.xsd' for an example of an XML Schema grammar.
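The following sketch shows a parser picking up a schema from an xsi:noNamespaceSchemaLocation hint. It assumes the standard JAXP schemaLanguage property is used to switch on schema validation; the class name, the temporary files and the one-element schema are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class SchemaLocationHintSketch {

    // Hypothetical schema with no target namespace: <document> holds an xs:int
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='document' type='xs:int'/>"
        + "</xs:schema>";

    /** Writes document.xsd and document.xml side by side, parses, and returns the errors. */
    static List<String> validate(String content) {
        List<String> errors = new ArrayList<>();
        try {
            Path dir = Files.createTempDirectory("xsd-hint");
            dir.toFile().deleteOnExit();
            Path xsd = dir.resolve("document.xsd");
            Files.write(xsd, SCHEMA.getBytes(StandardCharsets.UTF_8));
            xsd.toFile().deleteOnExit();

            // The hint is a relative URI, resolved against the instance document's location
            String instance =
                "<document xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'"
                + " xsi:noNamespaceSchemaLocation='document.xsd'>"
                + content + "</document>";
            Path xml = dir.resolve("document.xml");
            Files.write(xml, instance.getBytes(StandardCharsets.UTF_8));
            xml.toFile().deleteOnExit();

            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            dbf.setValidating(true);
            // Standard JAXP property selecting W3C XML Schema as the validation language
            dbf.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                "http://www.w3.org/2001/XMLSchema");
            DocumentBuilder db = dbf.newDocumentBuilder();
            db.setErrorHandler(new DefaultHandler() {
                @Override
                public void error(SAXParseException e) {
                    errors.add(e.getMessage());
                }
            });
            db.parse(xml.toFile());
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validate("12"));
        System.out.println(validate("abc"));
    }
}
```

Environments that restrict external schema access (for example through the javax.xml.accessExternalSchema property) may refuse to load the hinted file.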


	How does the XML Schema processor treat entities and CDATA sections?

According to the XML Infoset, the items contributing to the [character information item] are the characters in the document, whether they appear literally, as a character reference, within a CDATA section, or within an entity reference. The XML Schema specification "requires as a precondition for assessment an information set as defined in [XML-Infoset]" (Appendix D), and thus Xerces might attempt to normalize data in an entity reference or CDATA section. To preserve character data within entity references and CDATA sections, turn off the http://apache.org/xml/features/validation/schema/normalized-value feature.
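To illustrate the infoset point, the sketch below validates an xs:int element whose content is split between a CDATA section and a character reference; the schema processor sees only the resulting characters. It covers CDATA sections and character references only and does not toggle the normalized-value feature. The class name and the schema are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class CdataInfosetSketch {

    // Hypothetical schema: a single <n> element of type xs:int
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='n' type='xs:int'/>"
        + "</xs:schema>";

    /** Returns true if the instance is valid against SCHEMA. */
    static boolean isValid(String instanceXml) {
        try {
            Validator v = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(SCHEMA)))
                .newValidator();
            v.validate(new StreamSource(new StringReader(instanceXml)));
            return true;
        } catch (SAXException e) {
            return false;   // not well-formed or not valid
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "4" comes from a CDATA section, "2" from the character reference &#50;
        System.out.println(isValid("<n><![CDATA[4]]>&#50;</n>"));
        // the CDATA wrapper does not hide invalid content from the validator
        System.out.println(isValid("<n><![CDATA[4x]]></n>"));
    }
}
```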


	How is an XPath 2.0 engine used for XML Schema 1.1 assertions and CTAs?

XML Schema 1.1 assertions and conditional type assignments (CTAs) require an XPath processor during evaluation. For XSD 1.1 assertions, full XPath 2.0 support is required. For XSD 1.1 CTAs, a schema engine can either provide full XPath 2.0 support or implement the smaller XPath subset defined by the XSD 1.1 language.

For CTAs, Xerces uses the XSD 1.1 CTA XPath subset by default, but it can be made to use full XPath 2.0 by setting the Xerces feature http://apache.org/xml/features/validation/cta-full-xpath-checking to 'true'. The native CTA XPath processor in Xerces-J was written for efficiency, so you will likely get better performance if your XPath expressions fall within the minimum subset defined by the XML Schema 1.1 specification. For full XPath 2.0 evaluation (for XSD 1.1 assertions, and optionally for CTAs), Xerces-J uses an XPath 2.0 engine. Xerces-J bundles an XPath 2.0 engine jar for these features; the engine requires JDK 1.4 or later.

We would like to acknowledge the work of the following people on the XPath 2.0 support that comes with Xerces's XML Schema 1.1 processor: Andrea Bittau (responsible for the original design and implementation of the XPath 2.0 processor used by Xerces; Andrea donated this processor to the Eclipse Foundation), Dave Carver (helped to set up the software infrastructure at the Eclipse Foundation's Web Tools Platform project for testing the XPath 2.0 processor against the W3C XPath 2.0 test suite, and helped to improve the processor's compliance with that test suite by providing numerous bug fixes and implementation work), and Jesper Steen Moeller (as an Eclipse Web Tools Platform committer, helped to improve the implementation of the XPath 2.0 processor).


	How do I specify a user-defined error message for when an XML Schema 1.1 assertion returns a 'false' result?

When evaluation of an XSD 1.1 assertion fails, the Xerces XML Schema validator produces a default error message that identifies the element or attribute that failed the assertion and the schema type involved in validation. It is, however, possible to specify a user-defined error message for assertion failures. To do so, add a "message" attribute in the XML namespace 'http://xerces.apache.org' to an xs:assert or xs:assertion element in the schema. The value of the "message" attribute is the text of the error message that the Xerces XML Schema validator reports when the assertion fails.

When an xs:assertion facet is used within a simple type, a user-defined error message can be constructed dynamically by including the value of the XPath 2.0 context variable $value in the message. For example: xerces:message="The number {$value} is not divisible by 2". Here the variable reference {$value}, which is a predefined keyword, is replaced with the value from the XML instance document. If validation fails for the value 3, the error message produced is "The number 3 is not divisible by 2".

User-defined error messages for assertion failures are not a standard feature of the XML Schema 1.1 specification; they are a Xerces extension that lets XML Schema document authors specify error messages specific to their problem domain.
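The sketch below shows where the message attribute sits in a schema. It holds a hypothetical XSD 1.1 schema as a string and reads the attribute back with a namespace-aware DOM parse; it does not run the assertion, because that needs the XSD 1.1 Schema factory from a Xerces build with XML Schema 1.1 support. The expand method only mimics the {$value} substitution described above for illustration and is not the Xerces implementation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class AssertionMessageSketch {

    static final String XS_NS = "http://www.w3.org/2001/XMLSchema";
    static final String XERCES_NS = "http://xerces.apache.org";

    // Hypothetical schema: <n> must be an even integer; the message attribute
    // is bound to the Xerces namespace through the xerces prefix.
    static final String SCHEMA =
        "<xs:schema xmlns:xs='" + XS_NS + "' xmlns:xerces='" + XERCES_NS + "'>\n"
        + "  <xs:element name='n'>\n"
        + "    <xs:simpleType>\n"
        + "      <xs:restriction base='xs:integer'>\n"
        + "        <xs:assertion test='$value mod 2 = 0'\n"
        + "            xerces:message='The number {$value} is not divisible by 2'/>\n"
        + "      </xs:restriction>\n"
        + "    </xs:simpleType>\n"
        + "  </xs:element>\n"
        + "</xs:schema>\n";

    /** Reads the user-defined message declared on the xs:assertion element. */
    static String declaredMessage() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(SCHEMA)));
            Element assertion =
                (Element) doc.getElementsByTagNameNS(XS_NS, "assertion").item(0);
            return assertion.getAttributeNS(XERCES_NS, "message");
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Illustration only: literal replacement of the {$value} reference. */
    static String expand(String template, String value) {
        return template.replace("{$value}", value);
    }

    public static void main(String[] args) {
        System.out.println(SCHEMA);
        System.out.println(expand(declaredMessage(), "3"));
    }
}
```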


	Does Xerces provide access to the post schema validation infoset (PSVI)?

Xerces implements the XML Schema API specification that defines an API for accessing and querying the post schema validation infoset (PSVI) as defined in Contributions to the post-schema-validation infoset (Appendix C.2). This API also defines interfaces for loading XML schema documents.

For more information, please refer to the API documentation for these interfaces.

The Xerces 2.6.2 release fixes a documentation bug in the XML Schema API. In particular, in the XSModel interface the order of the parameters of the getTypeDefinition, getNotationDeclaration, getModelGroupDefinition, getElementDeclaration, getAttributeDeclaration and getAttributeGroup methods has been changed from (String namespace, String name) to (String name, String namespace).


	What happened to the PSVI interfaces in org.apache.xerces.xni.psvi?

The PSVI interfaces which used to be part of the org.apache.xerces.xni.psvi and org.apache.xerces.impl.xs.psvi packages were modified and have been moved into the XML Schema API.


	How do I access PSVI via XNI?

From within an XMLDocumentHandler, one can retrieve PSVI information while in the scope of the document handler start/end element calls:

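import org.apache.xerces.xni.Augmentations;
import org.apache.xerces.xni.QName;
import org.apache.xerces.xni.XMLAttributes;
import org.apache.xerces.xs.*;

...

public void startElement(QName element, XMLAttributes attributes,
    Augmentations augs) {
    // the PSVI for this element is stored in the augmentations
    ElementPSVI elemPSVI = (ElementPSVI)augs.getItem("ELEMENT_PSVI");
    // get PSVI items of this element out of elemPSVI
    short attempted = elemPSVI.getValidationAttempted();
    short validity = elemPSVI.getValidity();
    ...
}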

For more information, please refer to the API documentation for the XML Schema API.

The above code shows how to retrieve PSVI information after elements/attributes are assessed. The other kind of information PSVI offers is the property [schema information]. This property exposes all schema components in the schema that are used for assessment. These components and the schema itself are represented by interfaces in the org.apache.xerces.xs package.

The [schema information] property is only available in the endElement method for the validation root. When this method is called, information about the various components can be retrieved as follows:

import org.apache.xerces.xni.Augmentations;
import org.apache.xerces.xni.QName;
import org.apache.xerces.xs.*;

...

public void endElement(QName element, Augmentations augs) {
    ElementPSVI elemPSVI = (ElementPSVI)augs.getItem("ELEMENT_PSVI");
    XSModel xsModel = elemPSVI.getSchemaInformation();
    // get a list of [namespace schema information information item]s,
    // one for each namespace.
    XSNamespaceItemList nsItems = xsModel.getNamespaceItems();
    ...

    // get an element declaration of the specified name and namespace;
    // note the parameter order: (name, namespace)
    XSElementDeclaration elemDecl = xsModel.getElementDeclaration
        (name, namespace);
    ...
}


	How do I access PSVI via DOM?

Use the http://apache.org/xml/properties/dom/document-class-name property to set org.apache.xerces.dom.PSVIDocumentImpl as the implementation of the org.w3c.dom.Document interface. In the resulting DOM tree, you may cast an org.w3c.dom.Element to org.apache.xerces.xs.ElementPSVI and an org.w3c.dom.Attr to org.apache.xerces.xs.AttributePSVI.

import org.apache.xerces.xs.ElementPSVI;
import org.apache.xerces.xs.XSConstants;
import org.apache.xerces.xs.XSModel;
import org.apache.xerces.xs.XSNamedMap;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

...

// parser is a DOM parser configured with the document-class-name property
Document document = parser.getDocument();
Element root = document.getDocumentElement();

// retrieve PSVI for the root element
ElementPSVI rootPSVI = (ElementPSVI)root;

// retrieve the schema used in validation of this document
XSModel schema = rootPSVI.getSchemaInformation();
XSNamedMap elementDeclarations =
    schema.getComponents(XSConstants.ELEMENT_DECLARATION);

// get schema normalized value
String normalizedValue = rootPSVI.getSchemaNormalizedValue();


	How do I access PSVI via SAX?

The Xerces SAX parser also implements the org.apache.xerces.xs.PSVIProvider interface. Within the scope of the methods handling the start (org.xml.sax.ContentHandler.startElement) and end (org.xml.sax.ContentHandler.endElement) of an element, applications may use the PSVIProvider to retrieve the PSVI related to the element and its attributes.

import org.apache.xerces.xs.PSVIProvider;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

...

SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
XMLReader reader = parser.getXMLReader();
// the Xerces XMLReader also implements PSVIProvider
PSVIProvider psviProvider = (PSVIProvider)reader;


	How do I access PSVI via the JAXP 1.4 Validation API?

Like the Xerces SAX parser, the implementations of javax.xml.validation.Validator and javax.xml.validation.ValidatorHandler also implement the org.apache.xerces.xs.PSVIProvider interface. Within the scope of the methods handling the start (org.xml.sax.ContentHandler.startElement) and end (org.xml.sax.ContentHandler.endElement) of an element, applications may use the PSVIProvider to retrieve the PSVI related to the element and its attributes.
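The scoping rule can be sketched with a ValidatorHandler placed between a SAX parser and the application's ContentHandler. PSVIProvider is Xerces-specific, so this sketch uses the standard JAXP TypeInfoProvider as a stand-in: it is likewise only queried from inside startElement. With Xerces on the classpath, the ValidatorHandler could instead be cast to PSVIProvider as described above. The class name and the schema are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.TypeInfoProvider;
import javax.xml.validation.ValidatorHandler;
import org.w3c.dom.TypeInfo;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class ValidatorHandlerScopeSketch {

    // Hypothetical schema: <person> of named type PersonType with a <name> child
    static final String SCHEMA =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:complexType name='PersonType'><xs:sequence>"
        + "<xs:element name='name' type='xs:string'/>"
        + "</xs:sequence></xs:complexType>"
        + "<xs:element name='person' type='PersonType'/>"
        + "</xs:schema>";

    /** Returns "element:type" for each element, read while startElement is in scope. */
    static List<String> elementTypes(String instanceXml) {
        List<String> seen = new ArrayList<>();
        try {
            ValidatorHandler vh = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(SCHEMA)))
                .newValidatorHandler();
            TypeInfoProvider types = vh.getTypeInfoProvider();

            // The application handler receives events after validation
            vh.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                        String qName, Attributes atts) {
                    // only legal while the start of the element is being handled
                    TypeInfo type = types.getElementTypeInfo();
                    seen.add(localName + ":"
                        + (type == null ? "?" : type.getTypeName()));
                }
            });

            SAXParserFactory spf = SAXParserFactory.newInstance();
            spf.setNamespaceAware(true);
            XMLReader reader = spf.newSAXParser().getXMLReader();
            reader.setContentHandler(vh);   // parser -> validator -> application
            reader.parse(new InputSource(new StringReader(instanceXml)));
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(elementTypes("<person><name>Ann</name></person>"));
    }
}
```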


	How do I parse and analyze an XML schema?

Please refer to the Examining Grammars FAQ.


	Can I parse and query XML Schema components in memory?

Yes, the XML Schema API specification defines an interface called XSLoader, which provides methods for loading XML Schema documents into an XSModel. The XSImplementation interface provides a method to create an XSLoader using the DOM Level 3 Bootstrapping mechanism. An application can get a reference to an XSImplementation using the getDOMImplementation method on the DOMImplementationRegistry object with the feature string "XS-Loader". To create an XSLoader you need to do something like this:

import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.apache.xerces.xs.XSImplementation;
import org.apache.xerces.xs.XSLoader;

...
   
// Get DOM Implementation using DOM Registry
System.setProperty(DOMImplementationRegistry.PROPERTY,
    "org.apache.xerces.dom.DOMXSImplementationSourceImpl");
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();

XSImplementation impl = 
    (XSImplementation) registry.getDOMImplementation("XS-Loader");

XSLoader schemaLoader = impl.createXSLoader(null);

...