http://xml.apache.org/http://www.apache.org/http://www.w3.org/

Home

Readme
Charter
Release Info

Installation
Download

FAQs
Samples
API Docs

Features
Properties

XML Schema
Caveats
Feedback
Y2K Compliance

Source Repository
User Mail Archive
Dev Mail Archive

Questions
 

Answers
 
What should I be aware of when using various DOM parsers?
 

There are a couple of points to note when using the various DOM parsers. This FAQ discusses some of the differences between the Xerces, Oracle and Sun XML parsers:

  1. Parsing methods:
    The Xerces-J and Oracle parsers have a parser object that parses XML files and constructs a DOM tree which is queried after parsing.
    The Sun parser calls a static method on the XmlDocument class to parse and construct a DOM tree.
  2. Specifying the source file:
    All three parsers allow specifying the source of the XML document using the SAX InputSource object as well as parsing from java.io.InputStream and java.io.Reader object.
  3. Error handling:
    The Xerces-J parser uses the SAX ErrorHandler mechanism on all parser types, including DOM.
    The Oracle parser only allows you to specify which output stream or writer you want the error to be written.
    The Sun parser has no way to request error notification when parsing XML files into DOM trees. An exceptions will be thrown if an error occurs during parsing.
  4. Validation:
    The Xerces-J and Oracle DOM parsers use a method to set validation.
    Because of the way that DOM documents are constructed from XML files in the Sun parser, validation is set via a parameter to the static createXmlDocument method.
  5. Standard versus proprietary features:
    If the user has written their programs using the W3C DOM API, then migrating to Xerces-J is easy. If however, the user takes advantage of non-standard, proprietary features of the Oracle and Sun parsers and DOM implementations, migration will be harder. This document does not go into any detail regarding migration of features specific to each parser's implementation that are non-standard.

Samples:

Xerces 1.0.x:

// instantiate parser
org.apache.xerces.parsers.DOMParser parser;
parser = new org.apache.xerces.parsers.DOMParser();

// specifying validation
boolean validate = /* true or false */;
parser.setFeature("http://xml.org/sax/features/validation", validate);

// installing error handler
org.xml.sax.ErrorHandler handler = /* SAX error handler */;
parser.setErrorHandler(handler);

// parsing document from URI string
String uri = /* uri */;
parser.parse(uri);

// parsing document from input stream
java.io.InputStream stream = /* input stream */;
parser.parse(new org.xml.sax.InputSource(stream));

// parsing document from reader
java.io.Reader reader = /* reader */;
parser.parse(new org.xml.sax.InputSource(reader));

// querying document after parse
org.w3c.dom.Document document;
document = parser.getDocument();

Oracle 2.0.2.x:

// instantiate parser
oracle.xml.parser.v2.DOMParser parser;
parser = oracle.xml.parser.v2.DOMParser();

// specifying validation
boolean validate = /* true or false */;
parser.setValidationMode(validate);

// installing error stream to output stream (with and without encoding)
java.io.OutputStream output = /* output stream */;
String encoding = /* Java encoding name */;
parser.setErrorStream(output);
parser.setErrorStream(output, encoding);

// installing error stream to print writer
java.io.PrintWriter printer = /* print writer */;
parser.setErrorStream(printer);

// parsing document from URI string
String uri = /* uri */;
parser.parse(uri);

// parsing document from input stream
java.io.InputStream stream = /* input stream */;
parser.parse(stream);

// parsing document from reader
java.io.Reader reader = /* reader */;
parser.parse(reader);

// querying document after parse
org.w3c.dom.Document document;
document = parser.getDocument();

Sun TR2:

// parsing document from URI string
String uri = /* uri */;
Document document;
document = com.sun.xml.tree.XmlDocument.createXmlDocument(uri);

// parsing document from URI string (with validation)
boolean validate = /* true or false */;
document = com.sun.xml.tree.XmlDocument.createXmlDocument(uri, validate);

// parsing document from input stream
java.io.InputStream stream = /* input stream */;
document = com.sun.xml.tree.XmlDocument.createXmlDocument(stream, validate);

// parsing document from reader
java.io.Reader reader = /* reader */;
document = com.sun.xml.tree.XmlDocument.createXmlDocument
           (new org.xml.sax.InputSource(reader), validate);

What should I be aware of when using various SAX parsers?
 

There are a couple of points to note when using the various SAX parsers:

The SAX API has detailed specifications on how documents are parsed and entities are resolved, so little migration effort required. The only change is the construction of the SAX parser. See the following examples for construction details.

NoteRegarding validation:
If a parser is SAX2 compliant, there is a standard way of turning on validation. The Xerces parser implements the appropriate methods today, even though they haven't been finalized, yet. The parsers downloaded from Oracle and Sun do not yet implement these methods. The Oracle parser has a method to turn validation on and off. The Sun parser requires you to instantiate a separate parser object to perform validation.

Samples:

Xerces 1.0.x:

// instantiate parser
org.apache.xerces.parsers.SAXParser parser;
parser = new org.apache.xerces.parsers.SAXParser();

// specifying validation
boolean validate = /* true or false */;
parser.setFeature("http://xml.org/sax/features/validation", validate);

// installing error handler
org.xml.sax.ErrorHandler errorHandler = /* SAX error handler */;
parser.setErrorHandler(errorHandler);

// installing document handler
org.xml.sax.DocumentHandler documentHandler = /* SAX document handler */;
parser.setDocumentHandler(documentHandler);

// parsing document from URI string
String uri = /* uri */;
parser.parse(uri);

// parsing document from input stream
java.io.InputStream stream = /* input stream */;
parser.parse(new org.xml.sax.InputSource(stream));

// parsing document from reader
java.io.Reader reader = /* reader */;
parser.parse(new org.xml.sax.InputSource(reader));

Oracle 2.0.2.x:

// instantiate parser
oracle.xml.parser.v2.SAXParser parser;
parser = oracle.xml.parser.v2.SAXParser();

// specifying validation
boolean validate = /* true or false */;
parser.setValidationMode(validate);

// ... the rest is the same as Xerces ...

Sun TR2:

// instantiate parser
boolean validate = /* true or false */;
com.sun.xml.parser.Parser parser;
if (validate) {
    parser = new com.sun.xml.parser.ValidatingParser();
}
else {
    parser = new com.sun.xml.parser.Parser();
}

// ... the rest is the same as Xerces ...

How do I migrate my code from XML4J Version 2.0.x?
 

Migrating from the version 2.0.x native SAX and DOM parser classes should be straight forward.

change this XML4J class  to this Xerces-J class 
com.ibm.xml.parsers.SAXParser  org.apache.xerces.parsers.SAXParser 
com.ibm.xml.parsers.ValidatingSAXParser  org.apache.xerces.parsers.SAXParser + switch 
com.ibm.xml.parsers.NonValidatingDOMParser  org.apache.xerces.parsers.DOMParser 
com.ibm.xml.parsers.DOMParser  org.apache.xerces.parsers.DOMParser + switch 

Table entries that say " + switch" mean that you should use the Configurable API to turn validation on. See the answer in Validation.




Copyright © 1999-2005 The Apache Software Foundation. All Rights Reserved.