Writing Application FAQs


	Questions

Handling Errors
What does "non-validating" mean?
Parsing Several Documents
Pull-parsing documents
Getting More Information for Your Entity Resolver


	Answers


	How do I handle errors?

You should register an error handler with the parser by supplying a class that implements the org.xml.sax.ErrorHandler interface. This applies whether you are using a DOM-based or a SAX-based parser.

You can register an error handler on a DocumentBuilder created using JAXP like this:

import javax.xml.parsers.DocumentBuilder;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

ErrorHandler handler = new ErrorHandler() {
    public void warning(SAXParseException e) throws SAXException {
        System.err.println("[warning] "+e.getMessage());
    }
    public void error(SAXParseException e) throws SAXException {
        System.err.println("[error] "+e.getMessage());
    }
    public void fatalError(SAXParseException e) throws SAXException {
        System.err.println("[fatal error] "+e.getMessage());
        throw e;
    }
};

DocumentBuilder builder = /* builder instance */;
builder.setErrorHandler(handler);

If you are using DOM Level 3, you can register an error handler with the DOMBuilder by supplying a class that implements the org.w3c.dom.DOMErrorHandler interface. For more information, refer to FAQ

You can also register an error handler on a SAXParser using JAXP like this:

import javax.xml.parsers.SAXParser;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

ErrorHandler handler = new ErrorHandler() {
    public void warning(SAXParseException e) throws SAXException {
        System.err.println("[warning] "+e.getMessage());
    }
    public void error(SAXParseException e) throws SAXException {
        System.err.println("[error] "+e.getMessage());
    }
    public void fatalError(SAXParseException e) throws SAXException {
        System.err.println("[fatal error] "+e.getMessage());
        throw e;
    }
};

SAXParser parser = /* parser instance */;
parser.getXMLReader().setErrorHandler(handler);
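
The snippets above leave the parser instance unspecified. As a minimal sketch of how the pieces fit together, the following self-contained example obtains a DocumentBuilder from JAXP and registers a handler that records messages instead of printing them. The class names ErrorHandlerDemo and CollectingHandler and the sample documents are illustrative assumptions, not part of Xerces. It also shows the difference between a recoverable error (parsing continues) and a fatal error (the handler rethrows and parsing stops).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class ErrorHandlerDemo {

    /** Hypothetical handler that records each report in a list. */
    static class CollectingHandler implements ErrorHandler {
        final List<String> messages = new ArrayList<>();

        public void warning(SAXParseException e) {
            messages.add("warning: " + e.getMessage());
        }

        public void error(SAXParseException e) {
            // Recoverable (for example, a validity error): record and continue.
            messages.add("error: " + e.getMessage());
        }

        public void fatalError(SAXParseException e) throws SAXParseException {
            // Not recoverable (for example, a well-formedness error): record and stop.
            messages.add("fatal: " + e.getMessage());
            throw e;
        }
    }

    /** Parses the given XML text and returns everything the handler was told. */
    static List<String> parse(String xml, boolean validating) {
        CollectingHandler handler = new CollectingHandler();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(handler);
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (SAXParseException e) {
            // Rethrown by fatalError; the message is already recorded.
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.messages;
    }

    public static void main(String[] args) {
        // Well-formed but invalid: reported through error().
        String invalid = "<!DOCTYPE root [<!ELEMENT root EMPTY>]><root>text</root>";
        // Not well-formed: reported through fatalError().
        String broken = "<root><unclosed></root>";
        parse(invalid, true).forEach(System.out::println);
        parse(broken, false).forEach(System.out::println);
    }
}
```

Collecting messages rather than printing them is only one possible design; it makes it easier for the application to decide afterwards whether a document should be accepted.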


	Why does "non-validating" not mean "well-formedness checking only"?

Using a "non-validating" parser does not mean that only well-formedness checking is done! There are still many things that the XML specification requires of the parser, including entity substitution, defaulting of attribute values, and attribute normalization.

This table describes what "non-validating" really means for Xerces-J parsers. In this table, "no DTD" means no internal or external DTD subset is present.

                            non-validating parsers      validating parsers
                            DTD present   no DTD        DTD present   no DTD
DTD is read                 Yes           No            Yes           Error
entity substitution         Yes           No            Yes           Error
defaulting of attributes    Yes           No            Yes           Error
attribute normalization     Yes           No            Yes           Error
checking against model      No            No            Yes           Error
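
To make the "non-validating, DTD present" column concrete, here is a small sketch that parses a document with an internal DTD subset using a non-validating JAXP DocumentBuilder. The class name NonValidatingDemo and the sample document are assumptions made for illustration. The root element is never declared, so the document would not be valid, yet the parser still reads the DTD, applies the attribute default, and substitutes the entity.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NonValidatingDemo {

    // Internal subset with one attribute default and one general entity.
    static final String SAMPLE =
        "<!DOCTYPE root [\n"
        + "  <!ATTLIST root kind CDATA \"fallback\">\n"
        + "  <!ENTITY greeting \"hello\">\n"
        + "]>\n"
        + "<root>&greeting;</root>";

    /** Parses the text with a non-validating builder and returns the root element. */
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // already the default; stated for clarity
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element root = parseRoot(SAMPLE);
        // The attribute is absent from the instance; its value comes from the DTD.
        System.out.println("kind = " + root.getAttribute("kind"));
        // The entity reference has been replaced by its replacement text.
        System.out.println("text = " + root.getTextContent());
    }
}
```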


	How do I more efficiently parse several documents sharing a common DTD?

By default, the parser does not cache DTDs. Because the common DTD is referenced in each XML document, it is re-parsed once for each document.

However, there are things that you can do to make the process of reading DTDs more efficient:

First, have a look at the grammar caching/preparsing FAQ.
Keep your DTD and DTD references local.
Use internal DTD subsets, if possible.
Load files from the server to the local client before parsing.
Cache document files in a local client cache. Before accessing a document over the network, issue an HTTP HEAD request to check whether it has changed.
Do not reference an external DTD or internal DTD subset at all. In this case, no DTD will be read.
Use a custom EntityResolver and keep common DTDs in a memory buffer.
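
The last suggestion can be sketched as follows. This is a minimal illustration, not the Xerces grammar-caching mechanism: the DTD is still re-parsed for every document, but its bytes are served from memory instead of being fetched again. The class names InMemoryDtdResolver and SharedDtdDemo, the system identifier, and the DTD text are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical resolver that serves known DTDs from an in-memory buffer. */
class InMemoryDtdResolver implements EntityResolver {
    private final Map<String, byte[]> buffers = new HashMap<>();
    int hits = 0; // how many lookups were answered from memory

    void put(String systemId, String dtdText) {
        buffers.put(systemId, dtdText.getBytes(StandardCharsets.UTF_8));
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        byte[] bytes = buffers.get(systemId);
        if (bytes == null) {
            return null; // unknown entity: let the parser resolve it normally
        }
        hits++;
        InputSource source = new InputSource(new ByteArrayInputStream(bytes));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }
}

class SharedDtdDemo {
    // Illustrative identifier; it is never fetched because the resolver answers first.
    static final String DTD_ID = "http://example.invalid/note.dtd";
    static final String DTD_TEXT =
        "<!ELEMENT note (#PCDATA)>\n<!ATTLIST note lang CDATA \"en\">";

    /** Parses a note document that references the shared DTD; returns its lang attribute. */
    static String parseLang(EntityResolver resolver, String body) {
        String xml = "<!DOCTYPE note SYSTEM \"" + DTD_ID + "\"><note>" + body + "</note>";
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(resolver);
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("lang");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        InMemoryDtdResolver resolver = new InMemoryDtdResolver();
        resolver.put(DTD_ID, DTD_TEXT);
        System.out.println(parseLang(resolver, "first"));
        System.out.println(parseLang(resolver, "second"));
        System.out.println("served from memory: " + resolver.hits);
    }
}
```

Returning null for unknown identifiers keeps the parser's default resolution in place, so only the DTDs you have explicitly buffered are affected.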


	How can I parse documents in a pull-parsing fashion?

Since the pull-parsing API is specific to Xerces, you have to use Xerces-specific methods to create parsers and parse documents.

First, create a parser configuration that implements the XMLPullParserConfiguration interface. Then create a parser from this configuration. To parse documents, call the configuration's parse(boolean) method.

import org.apache.xerces.parsers.XIncludeAwareParserConfiguration;
import org.apache.xerces.parsers.SAXParser;
import org.apache.xerces.xni.parser.XMLInputSource;

  ...

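// Set to false by the application to stop parsing early.
boolean continueParse = true;

void pullParse(XMLInputSource input) throws Exception {
    // The configuration implements XMLPullParserConfiguration.
    XIncludeAwareParserConfiguration config = new XIncludeAwareParserConfiguration();
    SAXParser parser = new SAXParser(config);
    config.setInputSource(input);
    // Reset the parser's state before starting a new parse.
    parser.reset();
    while (continueParse) {
        // parse(false) parses the next piece of the document and
        // returns true while there is more of the document to parse.
        continueParse = continueParse && config.parse(false);
    }
}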

In the above example, a SAXParser is used to pull-parse an XMLInputSource. DOMParser can be used in a similar way. The continueParse flag indicates whether to continue parsing the document. The loop ends when parse(false) returns false because the document is complete, and the application can stop parsing earlier by setting the flag to false.


	I would like to know more about the kind of entity my XMLEntityResolver's been asked to resolve. How can I go about convincing Xerces to tell me more?

XNI only guarantees that you'll receive an XMLResourceIdentifier object during an XMLEntityResolver#resolveEntity callback. Nonetheless, the xni.grammars package has a number of interfaces which extend XMLResourceIdentifier that can provide considerably more information.

To take advantage of this, you'll first need to see whether the object you've been passed is an instance of the org.apache.xerces.xni.grammars.XMLGrammarDescription interface. This interface contains a method called getGrammarType, which can tell you what kind of grammar is involved (for the moment, XML Schema and DTDs are the only grammar types supported). Once you know the type of grammar, you can cast once again to either org.apache.xerces.xni.grammars.XMLDTDDescription or org.apache.xerces.xni.grammars.XMLSchemaDescription, which contain a wealth of information specific to these types of grammars. The javadocs for these interfaces should provide sufficient information for you to know what their various methods return.