Don't use XML where it doesn't make sense. XML is not a panacea.
You will not get good performance by transferring and parsing a
lot of XML files.
Using XML is memory, CPU, and network intensive.
Avoid creating a new parser each time you parse; reuse parser
instances. A pool of reusable parser instances might be a good idea
if you have multiple threads parsing at the same time.
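As a rough illustration of the pooling idea, the sketch below keeps a fixed number of SAX parsers in a blocking queue and hands them out to calling threads. It uses only the standard JAXP API; the class name ParserPool and its methods are hypothetical, and it assumes the underlying parser supports SAXParser.reset() (Xerces2 and the JDK's built-in parser do).

```java
import java.io.ByteArrayInputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical fixed-size pool of reusable SAX parsers (illustrative only). */
final class ParserPool {
    private final BlockingQueue<SAXParser> idle;

    ParserPool(int size) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.newSAXParser()); // pay the construction cost once
        }
    }

    /** Borrows a parser, parses the bytes, and always returns the parser to the pool. */
    void parse(byte[] xml, DefaultHandler handler) throws Exception {
        SAXParser parser = idle.take(); // blocks while all parsers are in use
        try {
            parser.parse(new ByteArrayInputStream(xml), handler);
        } finally {
            parser.reset();   // restore the original configuration before reuse
            idle.add(parser); // never exceeds capacity: we only return what we took
        }
    }

    /** Number of parsers currently available to borrow. */
    int idleCount() {
        return idle.size();
    }
}
```

A single-threaded application does not need the queue at all; holding one parser in a field and calling parse repeatedly gives the same benefit.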
The parser configuration may affect the performance of the parser.
For example, if you are interested in evaluating the parser performance
with DTDs you may want to use the
(Note: you can build Xerces with DTD-only support using the
dtdjars build target).
XML Application Performance
When writing an XML application, you can make a number of choices that affect
performance. Some of the things that will affect the performance of your application
are described below.
- Grammar Caching -- if you do need validation, consider
using grammar caching to reduce the cost of validation by allowing the parser
to skip grammar loading and assessment. See this FAQ
on how to perform grammar caching with Xerces.
- Validation -- if you don't need validation (and infoset augmentation)
of XML documents,
don't include validators (DTD or XML Schema) in the pipeline.
Including the validator components in the pipeline will result in a performance
hit for your application: if a validator component is present in the pipeline,
Xerces will try to augment the infoset even if the validation feature is set to false.
If you are only interested in validating against DTDs, don't include
the XML Schema validator in the pipeline.
- DOCTYPE -- if you don't need validation,
avoid using a DOCTYPE line in your XML document.
The current version of the parser will always read the DTD if the DOCTYPE line
is specified, even when the validation feature is set to false.
- Deferred DOM --
by default, the DOM feature defer-node-expansion is true,
causing DOM nodes to be expanded as the tree is traversed.
The performance tests produced by Denis Sosnoski showed that Xerces DOM with
deferred node expansion offers poor performance and a large memory footprint
for small documents (0K-10K). Thus, for best performance when using Xerces DOM
with smaller documents, you should disable the deferred node expansion feature.
For larger documents (~100K and higher), the deferred DOM offers
better performance than the non-deferred DOM but uses a large amount of memory.
- SAX --
if memory usage using DOM is a concern, SAX should be considered;
the SAX parser uses very little memory and notifies the
application as parts of the document are parsed.
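Two of the points above can be sketched with the standard JAXP API: turning off deferred node expansion for a small document, and using SAX when no in-memory tree is needed. This is a minimal sketch, not a recommended configuration. The class and method names are hypothetical; the feature identifier is the one Xerces2 documents for defer-node-expansion, and the code assumes the JAXP implementation in use is Xerces-based (other implementations may reject the feature, which the code tolerates).

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper showing two of the tuning choices (illustrative only). */
final class ParserTuning {
    // Xerces2 feature identifier; an assumption if a non-Xerces parser is in use.
    static final String DEFER_NODE_EXPANSION =
        "http://apache.org/xml/features/dom/defer-node-expansion";

    /** Builds a non-validating DOM with deferred node expansion turned off. */
    static Document parseSmallDocument(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        try {
            factory.setFeature(DEFER_NODE_EXPANSION, false);
        } catch (ParserConfigurationException e) {
            // Not a Xerces-based parser: continue with that parser's default behavior.
        }
        return factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
    }

    /** Counts elements with SAX, so no document tree is held in memory. */
    static int countElements(byte[] xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                count[0]++; // called once per start tag as the document streams by
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(xml), handler);
        return count[0];
    }
}
```

Whether disabling deferred expansion actually helps depends on document size and access pattern, so it is worth measuring with representative input before committing to either setting.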
For more detailed information on best practices for writing XML applications
you may want to read the following series of articles:
- Write XML documents and develop applications using the SAX and DOM APIs
- Reuse parser instances with the Xerces2 SAX and DOM implementations
- XNI, Xerces2 features and properties, and caching schemas
These three articles discuss general performance tips in addition to ones specifically pertaining to Xerces2.