Overview

A parser written to conform to the Xerces Native Interface (XNI) framework is configured as a pipeline of parser components. The document's "streaming" information set flows through this pipeline, and the end of the pipeline produces a programming interface as output. For example, the pipeline could produce a W3C Document Object Model (DOM) or a series of Simple API for XML (SAX) events.

The core XNI interfaces provide a mechanism for document information to flow from component to component. Beyond these basic information interfaces, however, XNI also defines a framework for constructing pipelines and parser configurations. This document gives an overview of that framework and of what a parser written to conform to the Xerces Native Interface looks like. It covers two topics, each described in its own section below: the pipeline and the configuration.

For more detailed information, refer to the Core API and XNI Parser Configuration documents.


Pipeline

The XNI parser pipeline is any combination of components that produce XNI events, consume XNI events, or both. Every pipeline consists of a source, zero or more filters, and a target. The source is typically the XML scanner; common filters are DTD and XML Schema validators and a namespace binder; and the target is the parser that consumes the XNI events and produces a common programming interface such as DOM or SAX. The following diagram illustrates the basic pipeline configuration.

Figure: Basic Pipeline Configuration
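The source/filter/target arrangement can be sketched in plain Java. This is a minimal illustration under stated assumptions: `DocHandler`, `FakeScanner`, `TrimFilter`, `LoggingTarget` and `BasicPipelineDemo` are hypothetical names invented here, and the three-method interface is a drastic simplification, not the real XNI `XMLDocumentHandler`. The point is only the wiring: each stage holds a reference to the next handler and forwards events to it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified event interface (NOT the real XNI XMLDocumentHandler).
interface DocHandler {
    void startElement(String name);
    void characters(String text);
    void endElement(String name);
}

// Source: produces events. A real scanner reads XML input; this one replays
// a hard-coded document so the sketch stays self-contained.
class FakeScanner {
    private DocHandler next;

    void setDocumentHandler(DocHandler next) { this.next = next; }

    void scan() {
        next.startElement("greeting");
        next.characters("  hello  ");
        next.endElement("greeting");
    }
}

// Filter: consumes events and forwards possibly modified events downstream.
// Trimming whitespace stands in for the work a validator or binder would do.
class TrimFilter implements DocHandler {
    private DocHandler next;

    void setDocumentHandler(DocHandler next) { this.next = next; }

    public void startElement(String name) { next.startElement(name); }
    public void characters(String text) { next.characters(text.trim()); }
    public void endElement(String name) { next.endElement(name); }
}

// Target: consumes events and builds the output, here just a log of strings.
class LoggingTarget implements DocHandler {
    final List<String> log = new ArrayList<>();

    public void startElement(String name) { log.add("start " + name); }
    public void characters(String text) { log.add("text " + text); }
    public void endElement(String name) { log.add("end " + name); }
}

class BasicPipelineDemo {
    static List<String> run() {
        FakeScanner source = new FakeScanner();
        TrimFilter filter = new TrimFilter();
        LoggingTarget target = new LoggingTarget();
        // Wire source -> filter -> target.
        source.setDocumentHandler(filter);
        filter.setDocumentHandler(target);
        source.scan();
        return target.log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

`BasicPipelineDemo.run()` returns `[start greeting, text hello, end greeting]`. Adding or removing a filter only changes the `setDocumentHandler` calls; neither the scanner nor the target has to change.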

This, however, is a simplified view of the pipeline configuration. The Xerces Native Interface actually defines two separate pipelines that use three interfaces: one interface for document information and two for DTD information.
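The split between one document interface and two DTD interfaces can be mimicked with three small Java interfaces. In Xerces2 the corresponding interfaces are `XMLDocumentHandler`, `XMLDTDHandler` and `XMLDTDContentModelHandler`, but the methods below are simplified, hypothetical stand-ins rather than their real signatures. A single hypothetical component, `RecordingParser`, implements all three so that it can terminate both pipelines.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical stand-ins for the three XNI handler roles.
interface DocumentHandler {
    void startElement(String name);
    void endElement(String name);
}

interface DTDHandler {
    void attributeDecl(String element, String attribute, String defaultValue);
}

interface DTDContentModelHandler {
    void contentModel(String element, String model);
}

// A target that ends both pipelines and records which interface
// each event arrived on.
class RecordingParser implements DocumentHandler, DTDHandler, DTDContentModelHandler {
    final List<String> events = new ArrayList<>();

    public void startElement(String name) { events.add("doc: start " + name); }
    public void endElement(String name) { events.add("doc: end " + name); }

    public void attributeDecl(String element, String attribute, String defaultValue) {
        events.add("dtd: " + element + "@" + attribute + " default " + defaultValue);
    }

    public void contentModel(String element, String model) {
        events.add("model: " + element + " " + model);
    }
}

class TwoPipelinesDemo {
    static List<String> run() {
        RecordingParser parser = new RecordingParser();
        // Each pipeline only sees the interface it is typed against.
        DTDContentModelHandler models = parser;
        DTDHandler dtd = parser;
        DocumentHandler doc = parser;

        // DTD pipeline: declarations, as if read from an internal subset
        // that precedes the root element.
        models.contentModel("note", "(#PCDATA)");
        dtd.attributeDecl("note", "lang", "en");

        // Document pipeline: the document content itself.
        doc.startElement("note");
        doc.endElement("note");
        return parser.events;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Keeping the interfaces separate means a component can take part in only the pipelines it cares about: a filter that never looks at DTD information simply does not implement the two DTD interfaces.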

The Xerces2 parser, the reference implementation of XNI, contains more components than the basic pipeline diagram shows. The following diagram shows the Xerces2 pipeline configuration. The arrow running from left to right along the top of the image represents the flow of document information, and the arrows along the bottom represent DTD information flowing through the parser pipeline.

Figure: Xerces2 Pipeline Configuration

As the diagram shows, the "Document Scanner" is the source of document information and the "DTD Scanner" is the source of DTD information. Both the document and the DTD information generated by the scanners flow into the "DTD Validator", where structure and content are validated against the DTD grammar, if one is present. From there, the validated document information, possibly augmented with default attribute values and normalized attribute values, flows to the "Namespace Binder", which applies namespace information to elements and attributes. The namespace-bound document information then flows to the "Schema Validator" for validation against an XML Schema, if one is present. Finally, the document and DTD information flow to the "Parser", which generates a programming interface such as DOM or SAX.
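A rough sketch of the document side of this chain, assuming a much-simplified event model in which each element arrives with a mutable attribute map. `DefaultingValidator`, `NamespaceBinder`, `Recorder` and `Xerces2StyleDemo` are hypothetical classes; the schema validator is omitted, and namespace scoping across nested elements is ignored. The sketch also suggests one reason the order matters: if a DTD supplies a default value for a namespace declaration attribute such as `xmlns:h`, the binder can only see it after the DTD validator has added it.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event interface: an element arrives with its qualified name
// and its attributes (a drastic simplification of XNI).
interface ElementHandler {
    void startElement(String qname, Map<String, String> attrs);
}

// Stand-in for the "DTD Validator": augments each element with the default
// attribute values its (hard-coded) DTD declares.
class DefaultingValidator implements ElementHandler {
    private final Map<String, Map<String, String>> defaults;
    private final ElementHandler next;

    DefaultingValidator(Map<String, Map<String, String>> defaults, ElementHandler next) {
        this.defaults = defaults;
        this.next = next;
    }

    public void startElement(String qname, Map<String, String> attrs) {
        Map<String, String> augmented = new LinkedHashMap<>(attrs);
        // Attributes present in the document win; missing ones get defaults.
        for (Map.Entry<String, String> d : defaults.getOrDefault(qname, Map.of()).entrySet()) {
            augmented.putIfAbsent(d.getKey(), d.getValue());
        }
        next.startElement(qname, augmented);
    }
}

// Stand-in for the "Namespace Binder": resolves the element's prefix from
// xmlns attributes on the same element only (no scoping, for brevity).
class NamespaceBinder implements ElementHandler {
    private final ElementHandler next;

    NamespaceBinder(ElementHandler next) { this.next = next; }

    public void startElement(String qname, Map<String, String> attrs) {
        int colon = qname.indexOf(':');
        String prefix = colon < 0 ? "" : qname.substring(0, colon);
        String local = colon < 0 ? qname : qname.substring(colon + 1);
        String declAttr = prefix.isEmpty() ? "xmlns" : "xmlns:" + prefix;
        String uri = attrs.getOrDefault(declAttr, "");
        // Emit the name in {uri}local form.
        next.startElement("{" + uri + "}" + local, attrs);
    }
}

// Stand-in for the "Parser": records the bound element names.
class Recorder implements ElementHandler {
    final List<String> seen = new ArrayList<>();

    public void startElement(String qname, Map<String, String> attrs) { seen.add(qname); }
}

class Xerces2StyleDemo {
    static List<String> run() {
        Recorder target = new Recorder();
        // Order mirrors the diagram: DTD validator -> namespace binder -> parser.
        Map<String, Map<String, String>> dtdDefaults =
                Map.of("h:p", Map.of("xmlns:h", "http://example.org/h"));
        ElementHandler pipeline = new DefaultingValidator(dtdDefaults, new NamespaceBinder(target));
        // The document never declares xmlns:h; the DTD default supplies it.
        pipeline.startElement("h:p", new LinkedHashMap<>());
        return target.seen;
    }
}
```

`Xerces2StyleDemo.run()` returns `[{http://example.org/h}p]`, where `http://example.org/h` is a placeholder URI. Swapping the validator and the binder would leave the prefix unbound.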

XNI defines the document information using a number of core interfaces (described in more detail in the Core API documentation). XNI also defines a set of interfaces for building parser configurations, which assemble the pipelines used to parse documents. The next section gives a general overview of the parser configuration framework provided by XNI.


Configuration

A parser implementation written using the Xerces Native Interface can be seen as a collection of components, some of which are connected together to form the pipelines for document and DTD information. All of the components in the parser are managed by a "Component Manager" that does the following:

  • Keeps track of parser settings and options,
  • Instantiates and configures the various components in the parser, and
  • Assembles the parsing pipeline and initiates parsing of documents.
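The three duties listed above can be sketched as a small manager class. All types here (`Component`, `ComponentManager`, `Validator`, `ManagerDemo`) are hypothetical; real XNI components follow a comparable pattern of re-reading their settings from a component manager when they are reset, but this code does not use the XNI API. Pipeline assembly is reduced to a `Runnable` to keep the sketch short, and the feature ID is the standard SAX validation feature, used here only as a map key.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical component contract: re-read settings before each parse.
interface Component {
    void reset(ComponentManager manager);
}

// Hypothetical component manager covering the three duties listed above.
class ComponentManager {
    private final Map<String, Boolean> features = new HashMap<>();
    private final List<Component> components = new ArrayList<>();

    // 1. Keep track of parser settings and options.
    void setFeature(String id, boolean state) { features.put(id, state); }
    boolean getFeature(String id) { return features.getOrDefault(id, false); }

    // 2. Instantiate a component and keep track of it.
    <T extends Component> T create(Supplier<T> factory) {
        T component = factory.get();
        components.add(component);
        return component;
    }

    // 3. Configure every component from the current settings, then run the
    //    already-assembled pipeline.
    void parse(Runnable pipeline) {
        for (Component c : components) {
            c.reset(this);
        }
        pipeline.run();
    }
}

// A configurable component: validation on or off, depending on a feature.
class Validator implements Component {
    static final String VALIDATION = "http://xml.org/sax/features/validation";
    boolean enabled;

    public void reset(ComponentManager manager) {
        enabled = manager.getFeature(VALIDATION);
    }
}

class ManagerDemo {
    static boolean run(boolean validate) {
        ComponentManager manager = new ComponentManager();
        manager.setFeature(Validator.VALIDATION, validate);
        Validator validator = manager.create(Validator::new);
        manager.parse(() -> { /* scanning would start here */ });
        return validator.enabled;
    }
}
```

Because components read their settings in `reset` rather than at construction time, changing a feature between two parses takes effect on the next call to `parse` without rebuilding the components.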

The following diagram represents a typical parser configuration that has a component manager and various components such as a "Symbol Table", "Scanner", etc.

Figure: Generic Parser Configuration
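Some components, such as the symbol table, are created once by the configuration and shared by several other components. A minimal sketch, assuming the symbol table's job is to hand back one canonical `String` instance per distinct name so that components can compare names with `==` instead of `equals`. `SymbolTable`, `NameScanner`, `NameChecker` and `SymbolTableDemo` are hypothetical stand-ins, not Xerces's own classes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shared symbol table: returns one canonical String instance
// per distinct name, so components can compare names with ==.
class SymbolTable {
    private final Map<String, String> symbols = new HashMap<>();

    String addSymbol(String name) {
        String existing = symbols.putIfAbsent(name, name);
        return existing != null ? existing : name;
    }
}

// A component that passes every scanned name through the shared table.
class NameScanner {
    private final SymbolTable symbols;

    NameScanner(SymbolTable symbols) { this.symbols = symbols; }

    String elementName(String raw) { return symbols.addSymbol(raw); }
}

// A component that checks names by identity, which is only valid when
// both sides were obtained from the same table.
class NameChecker {
    private final String expected;

    NameChecker(SymbolTable symbols, String expected) {
        this.expected = symbols.addSymbol(expected);
    }

    boolean matches(String name) { return name == expected; }
}

class SymbolTableDemo {
    static boolean run(boolean shareTable) {
        SymbolTable first = new SymbolTable();
        SymbolTable second = shareTable ? first : new SymbolTable();
        NameChecker checker = new NameChecker(first, new String("item"));
        NameScanner scanner = new NameScanner(second);
        // A distinct String object with the same characters, as a scanner
        // would produce when reading the name from input.
        String scanned = scanner.elementName(new String("item"));
        return checker.matches(scanned);
    }
}
```

`SymbolTableDemo.run(true)` returns `true`, while `run(false)` returns `false`: the identity shortcut only works when the configuration hands every component the same table instance.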

Some of the components in a configuration are configurable and others are not. The details of component configuration are covered in the XNI Parser Configuration document; for now, this basic overview of parser configurations is sufficient.

The XNI parser configuration framework provides an easy and convenient way to construct different kinds of parser configurations. By separating the configuration from the API generation (in each specific parser object), different parser configurations can be used to build a DOM tree or emit SAX events without re-implementing the DOM or SAX code. The following diagram shows this separation. Notice how the document information flows through the pipeline in the parser configuration and then to the parser object which generates different APIs.

Figure: Configuration and Parser Separation
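The separation can be sketched with one configuration class and two parser objects that consume its events differently. `Events`, `Configuration`, `TreeParser`, `CallbackParser` and `SeparationDemo` are hypothetical names, and the configuration replays a fixed `<a><b/></a>` document instead of scanning real input. Neither parser knows how the events were produced, and the configuration knows nothing about the output API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical event interface shared by the configuration and the parsers.
interface Events {
    void startElement(String name);
    void endElement(String name);
}

// The "configuration": owns the pipeline and knows nothing about DOM or SAX.
class Configuration {
    private Events handler;

    void setDocumentHandler(Events handler) { this.handler = handler; }

    void parse() {
        // Stand-in for scanner + filters: a fixed <a><b/></a> document.
        handler.startElement("a");
        handler.startElement("b");
        handler.endElement("b");
        handler.endElement("a");
    }
}

// Parser object 1: builds a tiny tree (DOM-like), rendered as nested parentheses.
class TreeParser implements Events {
    private final StringBuilder tree = new StringBuilder();

    public void startElement(String name) { tree.append("(").append(name); }
    public void endElement(String name) { tree.append(")"); }

    String parse(Configuration config) {
        config.setDocumentHandler(this);
        config.parse();
        return tree.toString();
    }
}

// Parser object 2: forwards events to a user callback as they arrive (SAX-like).
class CallbackParser implements Events {
    private final Consumer<String> onStart;

    CallbackParser(Consumer<String> onStart) { this.onStart = onStart; }

    public void startElement(String name) { onStart.accept(name); }
    public void endElement(String name) { /* not needed for this example */ }

    void parse(Configuration config) {
        config.setDocumentHandler(this);
        config.parse();
    }
}

class SeparationDemo {
    static String tree() {
        return new TreeParser().parse(new Configuration());
    }

    static List<String> starts() {
        List<String> names = new ArrayList<>();
        new CallbackParser(names::add).parse(new Configuration());
        return names;
    }
}
```

`SeparationDemo.tree()` returns `(a(b))` and `SeparationDemo.starts()` returns `[a, b]`; both are driven by the same `Configuration` class, so a different configuration could be substituted without touching either parser.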



Copyright © 1999-2022 The Apache Software Foundation. All Rights Reserved.