What should I be using instead of Xerces' XML, HTML or XHTML serializers?
As of the 2.9.0 release, Xerces-J shares a common serialization codebase with Xalan and includes serializer.jar
in its distribution for DOM Level 3 serialization support. The entire org.apache.xml.serialize package was
deprecated in Xerces 2.9.0; the HTML and XHTML serializers had already been deprecated in the
Xerces 2.6.2 release. More details about the rationale for this decision can be found in the
archives.
To achieve interoperability and avoid deprecated APIs, do not use the Xerces serialization code directly.
Instead, use the JAXP Transformer API to serialize HTML, XHTML, and SAX events, and use the DOM Level 3 Load and Save API (or the JAXP Transformer API) to serialize DOM.
Using DOM Level 3 you can serialize XML as follows:
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.Document;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;
import org.w3c.dom.ls.LSOutput;
...
// Look up a DOM implementation that supports the Load and Save ("LS") feature
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
DOMImplementationLS impl =
    (DOMImplementationLS) registry.getDOMImplementation("LS");
...
// Serialize the document to standard output
LSSerializer writer = impl.createLSSerializer();
LSOutput output = impl.createLSOutput();
output.setByteStream(System.out);
writer.write(document, output);
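The fragment above elides how the document is obtained. As a minimal, self-contained sketch of the same DOM Level 3 approach, the following parses a string and serializes it back into a string. The class name `LsSerializeSketch` and the method `roundTrip` are hypothetical names chosen for illustration, and the code assumes that the JAXP and DOM implementations found at run time support the "LS" feature.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
final class LsSerializeSketch {

    /** Parses the given XML text into a DOM and serializes it with the LS API. */
    static String roundTrip(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document document =
                dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            // Ask the registry for an implementation with the "LS" feature.
            DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
            DOMImplementationLS impl =
                (DOMImplementationLS) registry.getDOMImplementation("LS");
            if (impl == null) {
                throw new IllegalStateException("No DOM implementation with LS support found");
            }

            // Write to a character stream instead of System.out.
            LSSerializer writer = impl.createLSSerializer();
            LSOutput output = impl.createLSOutput();
            StringWriter buffer = new StringWriter();
            output.setCharacterStream(buffer);
            output.setEncoding("UTF-8"); // encoding name reported in the XML declaration
            writer.write(document, output);
            return buffer.toString();
        } catch (Exception e) {
            // Checked exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<greeting lang=\"en\">Hello</greeting>"));
    }
}
```

Writing to a character stream keeps the sketch easy to inspect; in an application, `setByteStream` with a file or socket stream is the more likely choice.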
Using JAXP you can serialize HTML and XHTML as follows:
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
...
// Create an "identity" transformer, which copies input to output
Transformer t = TransformerFactory.newInstance().newTransformer();

// For "XHTML" serialization, use the output method "xml"
// and set the DOCTYPE public and system identifiers as shown
t.setOutputProperty(OutputKeys.METHOD, "xml");
t.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC,
    "-//W3C//DTD XHTML 1.0 Transitional//EN");
t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM,
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd");

// For "HTML" serialization, use this instead of the three XHTML settings above
// t.setOutputProperty(OutputKeys.METHOD, "html");

// Serialize the DOM tree
t.transform(new DOMSource(doc), new StreamResult(System.out));
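The answer above also recommends the JAXP Transformer API for serializing SAX. One way to do that, sketched here under the assumption that the configured `TransformerFactory` supports the SAX feature, is to obtain a `TransformerHandler` and feed SAX events to it. The class name `SaxSerializeSketch` and the method `serializeGreeting` are hypothetical; in practice the events would usually come from an `XMLReader` or from application code rather than being written out by hand.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper class, for illustration only.
final class SaxSerializeSketch {

    /** Serializes a small, hand-written stream of SAX events to a string. */
    static String serializeGreeting() {
        try {
            TransformerFactory tf = TransformerFactory.newInstance();
            if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
                throw new IllegalStateException("TransformerFactory lacks SAX support");
            }

            // An identity TransformerHandler is a ContentHandler that serializes
            // the events it receives to the configured Result.
            TransformerHandler handler =
                ((SAXTransformerFactory) tf).newTransformerHandler();
            handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
            handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter buffer = new StringWriter();
            handler.setResult(new StreamResult(buffer));

            // Emit: <greeting lang="en">Hello</greeting>
            handler.startDocument();
            AttributesImpl atts = new AttributesImpl();
            atts.addAttribute("", "lang", "lang", "CDATA", "en");
            handler.startElement("", "greeting", "greeting", atts);
            char[] text = "Hello".toCharArray();
            handler.characters(text, 0, text.length);
            handler.endElement("", "greeting", "greeting");
            handler.endDocument();

            return buffer.toString();
        } catch (Exception e) {
            // Checked exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serializeGreeting());
    }
}
```

To serialize the events of a real parse, the same handler can be registered as the content handler of a SAX `XMLReader` before calling `parse`.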
What international encodings are supported by Xerces-J?
- UTF-8
- UTF-16 Big Endian and Little Endian
- UCS-2 (ISO-10646-UCS-2) Big Endian and Little Endian
- UCS-4 (ISO-10646-UCS-4) Big Endian and Little Endian
- IBM-1208
- ISO Latin-1 (ISO-8859-1)
- ISO Latin-2 (ISO-8859-2) [Bosnian, Croatian, Czech, Hungarian, Polish, Romanian, Serbian (in Latin transcription), Serbocroatian, Slovak, Slovenian, Upper and Lower Sorbian]
- ISO Latin-3 (ISO-8859-3) [Maltese, Esperanto]
- ISO Latin-4 (ISO-8859-4)
- ISO Latin Cyrillic (ISO-8859-5)
- ISO Latin Arabic (ISO-8859-6)
- ISO Latin Greek (ISO-8859-7)
- ISO Latin Hebrew (ISO-8859-8)
- ISO Latin-5 (ISO-8859-9) [Turkish]
- ISO Latin-7 (ISO-8859-13)
- ISO Latin-9 (ISO-8859-15)
- Extended Unix Code, packed for Japanese (euc-jp, eucjis)
- Japanese Shift JIS (shift-jis)
- Chinese (big5)
- Chinese for PRC (mixed 1/2 byte) (gb2312)
- Japanese ISO-2022-JP (iso-2022-jp)
- Cyrillic (koi8-r)
- Extended Unix Code, packed for Korean (euc-kr)
- Russian Unix, Cyrillic (koi8-r)
- Windows Thai (cp874)
- Latin 1 Windows (cp1252) (and all other cp125? encodings recognized by IANA)
- cp858
- EBCDIC encodings:
  - EBCDIC US (ebcdic-cp-us)
  - EBCDIC Canada (ebcdic-cp-ca)
  - EBCDIC Netherlands (ebcdic-cp-nl)
  - EBCDIC Denmark (ebcdic-cp-dk)
  - EBCDIC Norway (ebcdic-cp-no)
  - EBCDIC Finland (ebcdic-cp-fi)
  - EBCDIC Sweden (ebcdic-cp-se)
  - EBCDIC Italy (ebcdic-cp-it)
  - EBCDIC Spain, Latin America (ebcdic-cp-es)
  - EBCDIC Great Britain (ebcdic-cp-gb)
  - EBCDIC France (ebcdic-cp-fr)
  - EBCDIC Hebrew (ebcdic-cp-he)
  - EBCDIC Switzerland (ebcdic-cp-ch)
  - EBCDIC Roece (ebcdic-cp-roece)
  - EBCDIC Yugoslavia (ebcdic-cp-yu)
  - EBCDIC Iceland (ebcdic-cp-is)
  - EBCDIC Urdu (ebcdic-cp-ar2)
  - Latin 0 EBCDIC
  - EBCDIC Arabic (ebcdic-cp-ar1)
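
A document selects one of the encodings listed above through the encoding pseudo-attribute of its XML declaration. The following sketch shows a byte stream encoded in ISO-8859-2 being parsed through JAXP. The class name `EncodingSketch` and the method `parseText` are hypothetical, and the sketch assumes that both the parser found by JAXP and the Java runtime support the named encoding, which can vary between installations.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical helper class, for illustration only.
final class EncodingSketch {

    /**
     * Wraps the text in a one-element document, encodes the document in the
     * given encoding, parses the bytes, and returns the element's text.
     * The text must not contain XML markup characters such as '<' or '&'.
     */
    static String parseText(String text, String encoding) {
        try {
            String xml = "<?xml version=\"1.0\" encoding=\"" + encoding + "\"?>"
                + "<word>" + text + "</word>";
            byte[] bytes = xml.getBytes(Charset.forName(encoding));

            // The parser reads the encoding declaration and decodes accordingly.
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Checked exceptions are wrapped to keep the sketch short.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A Polish place name containing U+0141, U+00F3 and U+017A,
        // all of which are representable in ISO-8859-2.
        String word = "\u0141\u00f3d\u017a";
        System.out.println(word.equals(parseText(word, "ISO-8859-2")));
    }
}
```

Passing the document as a byte stream matters here: when a character stream is supplied, the bytes have already been decoded and the encoding declaration is not used for decoding.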