Design Objectives
 

The C++ DOM implementation is based on the Apache Recommended DOM C++ binding.

The design aims to meet the following requirements:

  • Reduced memory footprint.
  • Fast, especially for use in server-style and multi-threaded applications.
  • Good scalability on multiprocessor systems.
  • More C++-like and less Java-like.

DOM Level 3 Support in Xerces-C++
 

Xerces-C++ 2.8.0 contains a partial implementation of the W3C Document Object Model Level 3. This implementation is experimental. See the document DOM Level 3 Support for details.


Using DOM API
 
Accessing API from application code
 
#include <xercesc/dom/DOM.hpp>

The header file <xercesc/dom/DOM.hpp> includes all the individual headers for the DOM API classes.


Class Names
 

The DOM class names are prefixed with "DOM" (if not already), e.g. "DOMNode". The intent is to prevent conflicts between DOM class names and other names that may already be in use by an application or by other libraries that a DOM-based application must link with.

   DOMDocument*   myDocument;
   DOMNode*       aNode;
   DOMText*       someText;
         

Objects Management
 

Applications use normal C++ pointers to directly access the implementation objects for Nodes in the C++ DOM.

Consider the following code snippet:

   DOMNode*       aNode;
   DOMNode*       docRootNode;

   aNode = someDocument->createElement(anElementName);
   docRootNode = someDocument->getDocumentElement();
   docRootNode->appendChild(aNode);
         

Memory Management
 

The C++ DOM implementation provides a release() method for releasing any "orphaned" resources that were created through a createXXXX factory method. Memory for any returned object is owned by the implementation. Please see the Apache Recommended DOM C++ binding for details.

Objects created by DOMImplementation::createXXXX
 

Users must call the release() function when finished using any object that was created by DOMImplementation::createXXXX (e.g. DOMBuilder, DOMWriter, DOMDocument, DOMDocumentType).

Access to a released object will lead to unexpected behaviour.

Note: When a DOMDocument is released, all its associated children and any objects it owns (e.g. DOMRange, DOMTreeWalker, DOMNodeIterator or any orphaned nodes) will also be released.
Note: When a DOMDocument is cloned, the cloned document is independent of the original master document and needs to be released explicitly.
Note: When a DOMDocumentType has been inserted into a DOMDocument and thus has an owner, it will be released automatically when its owner document is released. DOMException::INVALID_ACCESS_ERR will be raised if release() is called on such an owned node.

Objects created by DOMDocument::createXXXX
 

Users can call the release() function to release any orphaned nodes. When an orphaned Node is released, its associated children will also be released. Access to a released Node will lead to unexpected behaviour. Orphaned Nodes that have not been released explicitly will be released when their owner document is released.

Note: DOMException::INVALID_ACCESS_ERR will be raised when releasing a Node that has a parent (has an owner).

Objects created by DOMDocumentRange::createRange or DOMDocumentTraversal::createXXXX
 

Users can call the release() function when finished using a DOMRange, DOMNodeIterator or DOMTreeWalker. Access to a released object will lead to unexpected behaviour. Objects that have not been released explicitly will be released when their owner document is released.


Here is an example:

    //
    //  Create a small document tree
    //

    {
        XMLCh tempStr[100];

        XMLString::transcode("Range", tempStr, 99);
        DOMImplementation* impl = DOMImplementationRegistry::getDOMImplementation(tempStr);

        XMLString::transcode("root", tempStr, 99);
        DOMDocument*   doc = impl->createDocument(0, tempStr, 0);
        DOMElement*   root = doc->getDocumentElement();

        XMLString::transcode("FirstElement", tempStr, 99);
        DOMElement*   e1 = doc->createElement(tempStr);
        root->appendChild(e1);

        XMLString::transcode("SecondElement", tempStr, 99);
        DOMElement*   e2 = doc->createElement(tempStr);
        root->appendChild(e2);

        XMLString::transcode("aTextNode", tempStr, 99);
        DOMText*       textNode = doc->createTextNode(tempStr);
        e1->appendChild(textNode);

        // optionally, call release() to release the resource associated with the range after done
        DOMRange* range = doc->createRange();
        range->release();

        // removedNode is an orphaned node, optionally call release() to release associated resource
        // (removeChild returns a DOMNode*, so no DOMElement* is assumed here)
        DOMNode* removedNode = root->removeChild(e2);
        removedNode->release();

        // no need to release this returned object which is owned by implementation
        XMLString::transcode("*", tempStr, 99);
        DOMNodeList*    nodeList = doc->getElementsByTagName(tempStr);

        // done with the document, must call release() to release the entire document resources
        doc->release();
    };
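
The release() calls in the example above have to be made by hand on every exit path. As a minimal sketch, assuming a C++11 compiler, the same convention can be tied to scope with std::unique_ptr and a custom deleter. The names DomReleaser, DomPtr and adoptDomObject are hypothetical helpers introduced for this illustration; they are not part of Xerces-C++.

```cpp
#include <cassert>
#include <memory>

// Deleter that calls release() instead of delete. It works with any type
// that exposes a release() member, which is the convention described above
// for objects such as DOMDocument or DOMRange.
template <class T>
struct DomReleaser {
    void operator()(T* object) const {
        if (object != nullptr) {
            object->release();
        }
    }
};

// Hypothetical alias: a unique_ptr that calls release() on the wrapped
// object when the pointer goes out of scope.
template <class T>
using DomPtr = std::unique_ptr<T, DomReleaser<T>>;

// Hypothetical helper: take ownership of a raw pointer returned by a
// createXXXX factory method.
template <class T>
DomPtr<T> adoptDomObject(T* object) {
    assert(object != nullptr && "factory method returned a null pointer");
    return DomPtr<T>(object);
}
```

With the example above this would read `DomPtr<DOMDocument> doc = adoptDomObject(impl->createDocument(0, tempStr, 0));`. The rules stated earlier still apply: only wrap objects that may legally be released (a node that has a parent raises DOMException::INVALID_ACCESS_ERR), and do not keep using objects owned by a document after the document itself has been released.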
         

String Type
 

The C++ DOM uses plain, null-terminated UTF-16 strings (XMLCh*) as the String type. The (XMLCh*) UTF-16 string type has low overhead.

   //C++ DOM
   const XMLCh* nodeValue = aNode->getNodeValue();
       

All string data remains in memory until the document object is released, but such string data may be RECYCLED by the implementation if necessary. Users should make their own copy of any returned string that they need to reference safely later.

For example, after a DOMNode has been released, the memory allocated for its node value will be recycled by the implementation.

   XMLCh xfoo[] = {chLatin_f, chLatin_o, chLatin_o, chNull};

   // set the node value of pAttr to "foo"
   pAttr->setNodeValue(xfoo);
   const XMLCh* fNodeValue = pAttr->getNodeValue();

   // fNodeValue has "foo"
   // make a copy of the string for future reference
   XMLCh* oldNodeValue = XMLString::replicate(fNodeValue);

   // release the node pAttr
   pAttr->release();

   // other operations
   // ...

   // implementation may have recycled the memory of the pAttr already
   // so it's not safe to expect fNodeValue to still have "foo"
   if (XMLString::compareString(xfoo, fNodeValue))
       printf("fNodeValue has some other content\n");

   // should use your own safe copy
   if (!XMLString::compareString(xfoo, oldNodeValue))
       printf("Use your own copy of the oldNodeValue if you want to reference the string later\n");

   // delete your own replicated string when done
   XMLString::release(&oldNodeValue);

       

Likewise, if DOMNode::setNodeValue() is called to set a new node value, the implementation simply overwrites the node value memory area, so any previously returned pointers will now see the new value. Users should make their own copy of any previously returned string for safe reference. For example:

   XMLCh xfoo[] = {chLatin_f, chLatin_o, chLatin_o, chNull};
   XMLCh xfee[] = {chLatin_f, chLatin_e, chLatin_e, chNull};

   // pAttr has node value = "foo"
   pAttr->setNodeValue(xfoo);
   const XMLCh* fNodeValue = pAttr->getNodeValue();

   // fNodeValue has "foo"
   // make a copy of the string for future reference
   XMLCh* oldNodeValue = XMLString::replicate(fNodeValue);

   // now set pAttr with a new node value "fee"
   pAttr->setNodeValue(xfee);

   // should not rely on fNodeValue for the old node value, it may not compare
   if (XMLString::compareString(xfoo, fNodeValue))
       printf("Should not rely on fNodeValue for the old node value\n");

   // should use your own safe copy
   if (!XMLString::compareString(xfoo, oldNodeValue))
       printf("Use your own copy of the oldNodeValue if you want to reference the string later\n");

   // delete your own replicated string when done
   XMLString::release(&oldNodeValue);

       

This prevents memory growth when DOMNode::setNodeValue() is called hundreds of times. The design lets users actively select which returned strings should stay in memory by manually copying them to the application's own heap.
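
The examples above copy strings with XMLString::replicate and free them with XMLString::release. As an alternative sketch, the copy can be held in a container that frees itself. The helper below is hypothetical and deliberately independent of the Xerces headers: the template parameter Ch stands in for XMLCh, on the assumption that XMLCh is a plain integral character type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Copy a null-terminated string into storage owned by the caller.
// Precondition: text is not null. Callers should check first, because
// getNodeValue() returns a null pointer for node types without a value.
template <class Ch>
std::vector<Ch> copyNullTerminated(const Ch* text) {
    assert(text != nullptr);
    std::size_t length = 0;
    while (text[length] != Ch(0)) {
        ++length;
    }
    // Copy the characters plus the terminating null, so that data() can be
    // handed to functions that expect a null-terminated string.
    return std::vector<Ch>(text, text + length + 1);
}
```

A call such as `std::vector<XMLCh> saved = copyNullTerminated(pAttr->getNodeValue());` would then keep the old value alive regardless of later setNodeValue() or release() calls, and `saved.data()` can be passed wherever a `const XMLCh*` is expected.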



XercesDOMParser
 
Constructing a XercesDOMParser
 

In order to use Xerces-C++ to parse XML files using DOM, you can create an instance of the XercesDOMParser class. The example below shows the code you need in order to create an instance of the XercesDOMParser.

    #include <xercesc/parsers/XercesDOMParser.hpp>
    #include <xercesc/dom/DOM.hpp>
    #include <xercesc/sax/HandlerBase.hpp>
    #include <xercesc/util/XMLString.hpp>
    #include <xercesc/util/PlatformUtils.hpp>

    #include <iostream>

    XERCES_CPP_NAMESPACE_USE

    int main (int argc, char* args[]) {

        try {
            XMLPlatformUtils::Initialize();
        }
        catch (const XMLException& toCatch) {
            char* message = XMLString::transcode(toCatch.getMessage());
            std::cout << "Error during initialization! :\n"
                      << message << "\n";
            XMLString::release(&message);
            return 1;
        }

        XercesDOMParser* parser = new XercesDOMParser();
        parser->setValidationScheme(XercesDOMParser::Val_Always);
        parser->setDoNamespaces(true);    // optional

        ErrorHandler* errHandler = new HandlerBase();
        parser->setErrorHandler(errHandler);

        const char* xmlFile = "x1.xml";
        int result = 0;    // set to -1 if parsing fails

        try {
            parser->parse(xmlFile);
        }
        catch (const XMLException& toCatch) {
            char* message = XMLString::transcode(toCatch.getMessage());
            std::cout << "Exception message is: \n"
                      << message << "\n";
            XMLString::release(&message);
            result = -1;
        }
        catch (const DOMException& toCatch) {
            char* message = XMLString::transcode(toCatch.msg);
            std::cout << "Exception message is: \n"
                      << message << "\n";
            XMLString::release(&message);
            result = -1;
        }
        catch (...) {
            std::cout << "Unexpected Exception \n" ;
            result = -1;
        }

        // clean up on every path, then shut the library down;
        // all Xerces-C++ objects must be deleted before Terminate()
        delete parser;
        delete errHandler;
        XMLPlatformUtils::Terminate();
        return result;
    }
          

XercesDOMParser Supported Features
 

The behavior of the XercesDOMParser is dependent on the values of the following features. All of the features below are set using the "setter" methods (e.g. setDoNamespaces), and are queried using the corresponding "getter" methods (e.g. getDoNamespaces). The following gives only a quick summary of the supported features. Please refer to the API Documentation for complete details.

void setCreateEntityReferenceNodes(const bool) 
true:  Create EntityReference nodes in the DOM tree. The EntityReference nodes and their child nodes will be read-only.  
false:  Do not create EntityReference nodes in the DOM tree. No EntityReference nodes will be created, only the nodes corresponding to their fully expanded substitution text will be created.  
default:  true  
note:  This feature only affects the appearance of EntityReference nodes in the DOM tree. The document will always contain the entity reference child nodes.  

void setExpandEntityReferences(const bool) (deprecated)
please use setCreateEntityReferenceNodes
 
true:  Do not create EntityReference nodes in the DOM tree. No EntityReference nodes will be created, only the nodes corresponding to their fully expanded substitution text will be created.  
false:  Create EntityReference nodes in the DOM tree. The EntityReference nodes and their child nodes will be read-only.  
default:  false  
see:  setCreateEntityReferenceNodes  

void setIncludeIgnorableWhitespace(const bool) 
true:  Include text nodes that can be considered "ignorable whitespace" in the DOM tree.  
false:  Do not include ignorable whitespace in the DOM tree.  
default:  true  
note:  The only way that the parser can determine if text is ignorable is by reading the associated grammar and having a content model for the document. When ignorable whitespace text nodes are included in the DOM tree, they will be flagged as ignorable; and the method DOMText::isIgnorableWhitespace() will return true for those text nodes.  

void setDoNamespaces(const bool) 
true:  Perform Namespace processing.  
false:  Do not perform Namespace processing.  
default:  false  
note:  If the validation scheme is set to Val_Always or Val_Auto, then the document must contain a grammar that supports the use of namespaces.  
see:  setValidationScheme  

void setDoValidation(const bool) (deprecated)
please use setValidationScheme
 
true:  Report all validation errors.  
false:  Do not report validation errors.  
default:  see the default of setValidationScheme  
see:  setValidationScheme  

void setValidationScheme(const ValSchemes) 
Val_Auto:  The parser will report validation errors only if a grammar is specified. 
Val_Always:  The parser will always report validation errors.  
Val_Never:  Do not report validation errors.  
default:  Val_Never  
note:  If set to Val_Always, the document must specify a grammar. If this feature is set to Val_Never and the document specifies a grammar, that grammar might be parsed but no validation of the document contents will be performed.  
see:  setLoadExternalDTD  

void setDoSchema(const bool) 
true:  Enable the parser's schema support.  
false:  Disable the parser's schema support.  
default:  false  
note:  If set to true, namespace processing must also be turned on.  
see:  setDoNamespaces  

void setValidationSchemaFullChecking(const bool) 
true:  Enable full schema constraint checking, including checking which may be time-consuming or memory intensive. Currently, particle unique attribution constraint checking and particle derivation restriction checking are controlled by this option.  
false:  Disable full schema constraint checking.  
default:  false  
note:  This feature checks the Schema grammar itself for additional errors that are time-consuming or memory intensive. It does not affect the level of checking performed on document instances that use Schema grammars. 
see:  setDoSchema  

void setLoadExternalDTD(const bool) 
true:  Load the External DTD .  
false:  Ignore the external DTD completely.  
default:  true  
note:  This feature is ignored and the DTD is always loaded if the validation scheme is set to Val_Always or Val_Auto.  
see:  setValidationScheme  

void setExitOnFirstFatalError(const bool) 
true:  Stops parse on first fatal error.  
false:  Attempt to continue parsing after a fatal error.  
default:  true  
note:  The behavior of the parser when this feature is set to false is undetermined! Therefore use this feature with extreme caution because the parser may get stuck in an infinite loop or worse. 

void setValidationConstraintFatal(const bool) 
true:  The parser will treat validation errors as fatal and will exit depending on the state of setExitOnFirstFatalError.  
false:  The parser will report the error and continue processing.  
default:  false  
note:  Setting this true does not mean the validation error will be printed with the word "Fatal Error". It is still printed as "Error", but the parser will exit if setExitOnFirstFatalError is set to true. 
see:  setExitOnFirstFatalError  

void useCachedGrammarInParse(const bool) 
true:  Use cached grammar if it exists in the pool. 
false:  Parse the schema grammar. 
default:  false  
note:  The getter function for this method is called isUsingCachedGrammarInParse. 
note:  If the grammar caching option is enabled, this option is set to true automatically and any setting to this option by the user is a no-op. 
see:  cacheGrammarFromParse  

void cacheGrammarFromParse(const bool) 
true:  Cache the grammar in the pool for re-use in subsequent parses. 
false:  Do not cache the grammar in the pool 
default:  false  
note:  The getter function for this method is called isCachingGrammarFromParse 
note:  If set to true, the useCachedGrammarInParse is also set to true automatically. 
see:  useCachedGrammarInParse  

void setStandardUriConformant(const bool) 
true:  Force standard uri conformance.  
false:  Do not force standard uri conformance.  
default:  false  
note:  If set to true, malformed uri will be rejected and fatal error will be issued.  

void setCalculateSrcOfs(const bool) 
true:  Enable source offset calculation.  
false:  Disable source offset calculation.  
default:  false  
note:  If set to true, the user can inquire about the current source offset within the input source. Setting it to false (default) improves the performance. 

void setIdentityConstraintChecking(const bool) 
true:  Enable identity constraint checking.  
false:  Disable identity constraint checking.  
default:  true  

void setGenerateSyntheticAnnotations(const bool) 
true:  Enable generation of synthetic annotations. A synthetic annotation will be generated when a schema component has non-schema attributes but no child annotation.  
false:  Disable generation of synthetic annotations.  
default:  false  

setValidateAnnotation 
true:  Enable validation of annotations.  
false:  Disable validation of annotations.  
default:  false  
note:  Each annotation is validated independently.  

setIgnoreAnnotations 
true:  Do not generate XSAnnotations when traversing a schema. 
false:  Generate XSAnnotations when traversing a schema. 
default:  false  

setDisableDefaultEntityResolution 
true:  The parser will not attempt to resolve the entity when the resolveEntity method returns NULL. 
false:  The parser will attempt to resolve the entity when the resolveEntity method returns NULL. 
default:  false  

setSkipDTDValidation 
true:  When schema validation is on the parser will ignore the DTD, except for entities. 
false:  The parser will not ignore DTDs when validating. 
default:  false  
see:  setDoSchema 

setIgnoreCachedDTD 
true:  Ignore a cached DTD when an XML document contains both an internal and external DTD, and the use cached grammar from parse option is enabled. Currently, we do not allow using cached DTD grammar when an internal subset is present in the document. This option will only affect the behavior of the parser when an internal and external DTD both exist in a document (i.e. no effect if document has no internal subset). 
false:  Don't ignore cached DTD.  
default:  false  
see:  useCachedGrammarInParse 

setCreateSchemaInfo 
true:  Enable storing of PSVI information in element and attribute nodes.  
false:  Disable storing of PSVI information in element and attribute nodes.  
default:  false  

setCreateCommentNodes 
true:  Enable the parser to create comment nodes in the DOM tree being produced. 
false:  Disable comment nodes being produced.  
default:  true  
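
Several of the features above depend on one another: setDoSchema requires namespace processing, and Val_Always requires the document to specify a grammar. One possible combination for schema validation can be sketched as follows. The function enableSchemaValidation is a hypothetical helper, written as a template so that it compiles without the Xerces headers; with Xerces-C++ it would be instantiated with XercesDOMParser, whose setters and ValSchemes values are the ones listed above.

```cpp
#include <cassert>

// Apply one feature combination suitable for XML Schema validation.
// Parser is expected to provide the setters documented above.
template <class Parser>
void enableSchemaValidation(Parser& parser, bool fullChecking) {
    // Schema support relies on namespace processing, so switch it on first.
    parser.setDoNamespaces(true);
    parser.setDoSchema(true);

    // Always report validation errors; the document must then specify
    // a grammar. Val_Auto would be the more lenient choice.
    parser.setValidationScheme(Parser::Val_Always);

    // Optional and potentially expensive: extra checks on the schema itself.
    parser.setValidationSchemaFullChecking(fullChecking);

    // Sanity check of the documented dependency.
    assert(parser.getDoNamespaces());
}
```

Calling `enableSchemaValidation(*parser, false)` before `parser->parse(...)` would replace the individual setter calls in the construction example. Whether full checking is worth its cost depends on how much the schemas themselves are trusted.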


XercesDOMParser Supported Properties
 

The behavior of the XercesDOMParser is dependent on the values of the following properties. All of the properties below are set using the "setter" methods (e.g. setExternalSchemaLocation), and are queried using the corresponding "getter" methods (e.g. getExternalSchemaLocation). The following gives only a quick summary of the supported properties. Please refer to the API Documentation for complete details.

void setExternalSchemaLocation(const XMLCh*) 
Description  The XML Schema Recommendation explicitly states that the inclusion of schemaLocation/noNamespaceSchemaLocation attributes in the instance document is only a hint; it does not mandate that these attributes must be used to locate schemas. The same applies to the <import> element in schema documents. This property allows the user to specify a list of schemas to use. If the targetNamespace of a schema specified using this method matches the targetNamespace of a schema occurring in the instance document in a schemaLocation attribute, or if the targetNamespace matches the namespace attribute of an <import> element, the schema specified by the user using this property will be used (i.e., the schemaLocation attribute in the instance document or on the <import> element will be effectively ignored). 
Value  The syntax is the same as for schemaLocation attributes in instance documents, e.g. "http://www.example.com file_name.xsd". The user can specify more than one XML Schema in the list. 
Value Type  XMLCh*  
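
The value described above is a whitespace-separated list of namespace/location pairs. A minimal sketch of assembling such a list is shown below. The function buildSchemaLocation is a hypothetical helper, not a Xerces-C++ API, and it produces a narrow string that would still need to be transcoded to XMLCh* (for example with XMLString::transcode) before being passed to setExternalSchemaLocation.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Join (namespace URI, schema location) pairs into the form used by
// schemaLocation attributes: "ns1 file1.xsd ns2 file2.xsd".
std::string buildSchemaLocation(
        const std::vector<std::pair<std::string, std::string>>& schemas) {
    std::string value;
    for (const auto& entry : schemas) {
        // Neither part may be empty, otherwise the pairing would shift.
        assert(!entry.first.empty() && !entry.second.empty());
        if (!value.empty()) {
            value += ' ';
        }
        value += entry.first;
        value += ' ';
        value += entry.second;
    }
    return value;
}
```

The sketch does not escape or validate the URIs; a namespace or file name that itself contains whitespace would break the pairing and has to be handled by the caller.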

void setExternalNoNamespaceSchemaLocation(const XMLCh* const) 
Description  The XML Schema Recommendation explicitly states that the inclusion of schemaLocation/ noNamespaceSchemaLocation attributes in the instance document is only a hint; it does not mandate that these attributes must be used to locate schemas. This property allows the user to specify the no target namespace XML Schema Location externally. If specified, the instance document's noNamespaceSchemaLocation attribute will be effectively ignored. 
Value  The syntax is the same as for the noNamespaceSchemaLocation attribute that may occur in an instance document, e.g. "file_name.xsd". 
Value Type  XMLCh*  

void useScanner(const XMLCh* const) 
Description  This property allows the user to specify the name of the XMLScanner to use for scanning XML documents. If not specified, the default scanner "IGXMLScanner" is used. 
Value  The recognized scanner names are:
1. "WFXMLScanner" - scanner that performs well-formedness checking only.
2. "DGXMLScanner" - scanner that handles XML documents with DTD grammar information.
3. "SGXMLScanner" - scanner that handles XML documents with XML schema grammar information.
4. "IGXMLScanner" - scanner that handles XML documents with DTD and/or XML schema grammar information.
Users can use the predefined constants defined in XMLUni directly (fgWFXMLScanner, fgDGXMLScanner, fgSGXMLScanner, or fgIGXMLScanner) or a string that matches the value of one of those constants. 
Value Type  XMLCh*  
note:   See Use Specific Scanner for more programming details.  
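
Choosing among the four scanner names listed above comes down to which kind of grammar information the documents carry. The mapping can be sketched as below. GrammarKind and scannerNameFor are hypothetical names introduced for illustration only. In real code the predefined XMLUni constants mentioned above would normally be passed to useScanner directly, since that method takes an XMLCh* rather than a narrow string.

```cpp
#include <cassert>
#include <string>

// Kind of grammar information the application expects in its documents.
enum class GrammarKind { None, Dtd, Schema, DtdOrSchema };

// Map the expected grammar kind to the scanner name documented above.
std::string scannerNameFor(GrammarKind kind) {
    switch (kind) {
        case GrammarKind::None:        return "WFXMLScanner";  // well-formedness only
        case GrammarKind::Dtd:         return "DGXMLScanner";  // DTD grammars
        case GrammarKind::Schema:      return "SGXMLScanner";  // XML Schema grammars
        case GrammarKind::DtdOrSchema: return "IGXMLScanner";  // DTD and/or schema
    }
    assert(false && "unhandled GrammarKind");
    return "IGXMLScanner";  // the documented default scanner
}
```

Falling back to "IGXMLScanner" mirrors the default stated in the description, so an unexpected input never selects a scanner that handles less than the default one.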

void useImplementation(const XMLCh* const) 
Description  This property allows the user to specify a set of features which the parser will then use to acquire an implementation from which it will create the DOMDocument to use when reading in an XML file. 
Value Type  XMLCh*  

void setSecurityManager(SecurityManager* const) 
Description  Certain valid XML and XML Schema constructs can force a processor to consume more system resources than an application may wish. In fact, certain features could be exploited by malicious document writers to produce a denial-of-service attack. This property allows applications to impose limits on the amount of resources the processor will consume while processing these constructs.  
Value  An instance of the SecurityManager class (see xercesc/util/SecurityManager). This class's documentation describes the particular limits that may be set. Note that, when instantiated, default values for limits that should be appropriate in most settings are provided. The default implementation is not thread-safe; if thread-safety is required, the application should extend this class, overriding methods appropriately. The parser will not adopt the SecurityManager instance; the application is responsible for deleting it when it is finished with it. If no SecurityManager instance has been provided to the parser (the default) then processing strictly conforming to the relevant specifications will be performed.  
Value Type  SecurityManager*  



DOMBuilder
 
Constructing a DOMBuilder
 

DOMBuilder is a new interface introduced by the W3C DOM Level 3.0 Abstract Schemas and Load and Save Specification. DOMBuilder provides the "Load" interface for parsing XML documents and building the corresponding DOM document tree from vari