The XMLmind XML Editor Document Object Model (DOM) is somewhat similar to, though in our opinion simpler than, W3C DOM or JDOM. This chapter describes how to program the XXE DOM using AddTOC.java as an example.
This sample program:
loads an XHTML file,
traverses the loaded document searching for h1, h2, h3 headings,
adds an empty <a name="tocentryNNN"/> to each of these headings,
for each of the traversed headings, adds an indented line containing <a href="#tocentryNNN">text of the heading</a> to the div that will be used as a TOC,
inserts the div used as a TOC as first child of body,
saves the modified document to disk.
import java.io.File;
import java.io.IOException;
import com.xmlmind.xmledit.xmlutil.*;
import com.xmlmind.xmledit.doc.*;
import com.xmlmind.xmledit.doctype.DocumentType;
import com.xmlmind.xmledit.edit.Loader;
import com.xmlmind.xmledit.edit.Formatter;
public class AddTOC {
    // Element and attribute names are looked up once and reused.
    private static final Name BODY = Name.get("body");
    private static final Name DIV = Name.get("div");
    private static final Name H1 = Name.get("h1");
    private static final Name H2 = Name.get("h2");
    private static final Name H3 = Name.get("h3");
    private static final Name A = Name.get("a");
    private static final Name BR = Name.get("br");
    private static final Name CLASS = Name.get("class");
    private static final Name NAME = Name.get("name");
    private static final Name HREF = Name.get("href");

    // State shared between the traversal handler and processHeading().
    private static final class Info {
        public int headingCount;
        public Element toc;
        public Element body;
    }
    public static void processDocument(Document doc) {
        final Info info = new Info();

        // Build the TOC container: a div of class "toc" starting with a title.
        Element b = new Element(Name.get("b"));
        b.putAttribute(CLASS, "toctitle");
        b.appendChild(new Text("Contents"));

        info.toc = new Element(DIV);
        info.toc.putAttribute(CLASS, "toc");
        info.toc.appendChild(b);

        // Visit the document, processing each heading and remembering body.
        Traversal.traverse(doc.getRootElement(), new Traversal.HandlerBase() {
            public Object enterElement(Element element) {
                Name name = element.getName();
                if (name == H1 || name == H2 || name == H3) {
                    processHeading(element, info);
                    return Traversal.LEAVE_ELEMENT;
                } else {
                    if (name == BODY)
                        info.body = element;
                    return null;
                }
            }
        });

        // Close the TOC with <br/><hr/> and add it to body.
        if (info.body != null) {
            info.toc.appendChild(new Element(BR));
            info.toc.appendChild(new Element(Name.get("hr")));
            add(info.body, info.toc);
        }
    }
    private static void processHeading(Element heading, Info info) {
        String id = "tocentry" + info.headingCount++;

        // Anchor added to the heading itself: <a class="tocentry" name="tocentryNNN"/>.
        Element target = new Element(A);
        target.putAttribute(CLASS, "tocentry");
        target.putAttribute(NAME, id);
        add(heading, target);

        // Collect the text of the heading to use it as the text of the link.
        Traversal.TextGrabber grabber = new Traversal.TextGrabber();
        Traversal.traverse(heading, grabber);
        String headingText =
            XMLUtil.collapseWhiteSpace(grabber.grabbed.toString());

        Element link = new Element(A);
        link.putAttribute(HREF, "#" + id);
        link.appendChild(new Text(headingText));

        // Deeper headings are indented further.
        int indentation;
        Name headingName = heading.getName();
        if (headingName == H1)
            indentation = 4;
        else if (headingName == H2)
            indentation = 8;
        else
            indentation = 12;

        StringBuffer indent = new StringBuffer();
        while (indentation > 0) {
            indent.append('\u00A0'); // no-break space (&nbsp;)
            --indentation;
        }

        info.toc.appendChild(new Element(BR));
        info.toc.appendChild(new Text(indent.toString()));
        info.toc.appendChild(link);
    }
    // Replaces the first child element of parent having the same name and
    // class as added, if any; otherwise inserts added as first child.
    private static void add(Element parent, Element added) {
        Name addedName = added.getName();
        String addedClass = added.getAttribute(CLASS);
        boolean replaced = false;

        loop: for (Node child = parent.getFirstChild();
                   child != null;
                   child = child.getNextSibling()) {
            switch (child.getNodeType()) {
            case Node.TEXT:
            case Node.COMMENT:
            case Node.PROCESSING_INSTRUCTION:
                break;
            case Node.ELEMENT:
                {
                    Element element = (Element) child;
                    if (element.getName() == addedName &&
                        addedClass.equals(element.getAttribute(CLASS))) {
                        parent.replaceChild(element, added);
                        replaced = true;
                        break loop;
                    }
                }
                break;
            }
        }

        if (!replaced)
            parent.insertChild(parent.getFirstChild(), added);
    }
Compile AddTOC by executing ant (see build.xml) in the samples/addtoc/ directory.
Run AddTOC by executing ant run in the samples/addtoc/ directory. This adds a TOC to tests/in/sample1.html and saves the modified document to tests/out/sample1.html. You may want to open the generated file in a Web browser to check what has been done.
Element and attribute names are not Strings; they are Name objects. A Name is the aggregation of a Namespace object and a String local part.
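Names and Namespaces are managed as symbols in a symbol table. For example, it is not possible to invoke a Name constructor directly; a Name is obtained with Name.get(), as in the constants declared at the top of AddTOC. Because of this, Names and Namespaces can be compared for equality using ==, as AddTOC does in name == H1.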
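The symbol-table idea can be sketched in plain Java. The class below is a hypothetical stand-in (InternedName, its fields and its key format are all invented for illustration) and is not how XXE implements Name; it only shows why a private constructor plus a lookup method makes == a valid equality test.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for an interned name; not the XXE Name class. */
final class InternedName {
    // The symbol table: one shared instance per (namespace, local part) pair.
    private static final Map<String, InternedName> TABLE =
        new ConcurrentHashMap<>();

    final String namespaceUri;
    final String localPart;

    // Private constructor: callers cannot write "new InternedName(...)".
    private InternedName(String namespaceUri, String localPart) {
        this.namespaceUri = namespaceUri;
        this.localPart = localPart;
    }

    /** Returns the unique instance for a name in no namespace. */
    static InternedName get(String localPart) {
        return get("", localPart);
    }

    /** Returns the unique instance for the given namespace and local part. */
    static InternedName get(String namespaceUri, String localPart) {
        String key = "{" + namespaceUri + "}" + localPart;
        return TABLE.computeIfAbsent(
            key, k -> new InternedName(namespaceUri, localPart));
    }

    public static void main(String[] args) {
        InternedName h1 = InternedName.get("h1");
        System.out.println(h1 == InternedName.get("h1")); // true
        System.out.println(h1 == InternedName.get("h2")); // false
    }
}
```

Because every lookup of the same name returns the same object, identity comparison is enough and no equals() call is needed.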
A document is composed of Nodes: Text, Comment, ProcessingInstruction, Element, DocumentTypeDeclaration, Document. Notice that a document is itself a Node. Document and Element are Trees, that is, Node containers.
Attributes are not Nodes. Attribute is just a simple data structure which groups together the attribute name, the attribute value and the element having the attribute. This simple data structure is mainly used by the …
Element has many convenience functions to access its attributes or child nodes, for example putAttribute, getAttribute, appendChild, insertChild, replaceChild and getFirstChild, all of which are used in AddTOC.
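Traversal is a set of utility functions that can be used to traverse a Tree in both directions.
During the traversal, Traversal functions notify a Traversal.Handler, which must implement a set of notification methods such as enterElement(Element), the one overridden in AddTOC. Traversal.HandlerBase can be used as the base class of a handler if most notification methods are not useful.
Traversal can be controlled by returning a value from notification methods. In the AddTOC example, enterElement returns null to continue the traversal and returns Traversal.LEAVE_ELEMENT once a heading has been processed, which, as its name suggests, tells the traversal to leave that heading without visiting its content.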
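The handler-driven control flow described above can be illustrated with a self-contained miniature. Everything in this sketch (TraversalSketch, Elem, Handler, SKIP_CONTENT) is hypothetical, and the rule that any other non-null return value stops the traversal is an assumption made for the sketch, not a statement about the XXE API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical miniature tree and traversal; not the XXE Traversal class. */
final class TraversalSketch {
    /** Sentinel meaning "do not descend into this element" (assumed semantics). */
    static final Object SKIP_CONTENT = new Object();

    static final class Elem {
        final String name;
        final List<Elem> children = new ArrayList<>();

        Elem(String name, Elem... kids) {
            this.name = name;
            for (Elem kid : kids) children.add(kid);
        }
    }

    interface Handler {
        /** Null continues, SKIP_CONTENT skips children, anything else stops. */
        Object enter(Elem element);
    }

    /** Depth-first traversal; returns the value that stopped it, or null. */
    static Object traverse(Elem element, Handler handler) {
        Object result = handler.enter(element);
        if (result == SKIP_CONTENT) return null; // leave without descending
        if (result != null) return result;       // stop the whole traversal
        for (Elem child : element.children) {
            Object stop = traverse(child, handler);
            if (stop != null) return stop;
        }
        return null;
    }

    /** Lists the visited element names, skipping the content of any h1. */
    static List<String> visitedNames(Elem root) {
        List<String> seen = new ArrayList<>();
        traverse(root, element -> {
            seen.add(element.name);
            return element.name.equals("h1") ? SKIP_CONTENT : null;
        });
        return seen;
    }

    public static void main(String[] args) {
        Elem doc = new Elem("html",
            new Elem("body",
                new Elem("h1", new Elem("em")),
                new Elem("p")));
        System.out.println(visitedNames(doc)); // [html, body, h1, p]
    }
}
```

The em child of h1 is never visited, which mirrors what AddTOC asks for when it returns a special value after processing a heading.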
You do not always need to define your own Traversal.Handler. Class Traversal contains many predefined, ready-to-use Traversal.Handlers for simple tasks. Traversal.TextGrabber, used in the AddTOC example, is one of them. You'll also find Traversal.TextNodeFinder, Traversal.NodeMatcher, etc.
XMLUtil contains a lot of utility functions related to lexical aspects of XML. It defines functions that trim whitespace, that escape and unescape XML text and attribute values, that escape and unescape URIs, etc.
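As an illustration of the kind of lexical helper involved, here is a minimal sketch of whitespace collapsing. It is an approximation written for this chapter, under the assumption that collapsing means trimming both ends and replacing each inner run of XML whitespace (space, tab, line feed, carriage return) with a single space; the class and method names are hypothetical and this is not the source of XMLUtil.collapseWhiteSpace.

```java
/** Hypothetical helper; an approximation, not XMLUtil's actual code. */
final class WhitespaceSketch {
    /** Trims XML whitespace at both ends and collapses inner runs to one space. */
    static String collapse(String s) {
        StringBuilder out = new StringBuilder(s.length());
        boolean pendingSpace = false;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ' ' || c == '\t' || c == '\n' || c == '\r') {
                // Remember the run, unless nothing has been output yet.
                pendingSpace = out.length() > 0;
            } else {
                if (pendingSpace) out.append(' ');
                pendingSpace = false;
                out.append(c);
            }
        }
        // A pending space at the end is dropped, which trims the tail.
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + collapse("  Chapter \n\t One  ") + "]"); // [Chapter One]
    }
}
```

Note that the no-break space U+00A0 used by AddTOC for indentation is not XML whitespace, so a helper like this one leaves it untouched.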
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println(
                "usage: java AddTOC in_xhtml_file out_xhtml_file");
            System.exit(1);
        }
        String inFileName = args[0];
        String outFileName = args[1];

        Loader docLoader = new Loader();
        docLoader.setAddedProperties(0x0);
        Document doc = docLoader.load(inFileName);
        // Retrieve the document type from the properties of the document.
        DocumentType docType =
            (DocumentType) doc.getProperty(StandardProperty.DOCUMENT_TYPE);

        AddTOC.processDocument(doc);

        Formatter docWriter = new Formatter(docType);
        docWriter.writeDocument(doc, outFileName);
    }
}
Compared to the low-level DocumentLoader, Loader has many advantages.
Both loaders are XML catalog aware. Note that in build.xml we use system property …
Because the low-level DocumentWriter is not DocumentType aware, it cannot output indented XML. Therefore, in the AddTOC example, we use Formatter instead.
Trees, that is, Documents and Elements, can have application-level properties. These properties are generally added by document loaders at load time, but nothing prevents a programmer from adding and removing their own properties at any time. What follows is a comparison between Element attributes and properties.
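Properties used to implement XXE have their names defined as constants in StandardProperty. One such key is StandardProperty.DOCUMENT_TYPE, which the main() method of AddTOC uses to retrieve the DocumentType of the loaded document.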
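To make the distinction between attributes and properties concrete, here is a small self-contained model. It is only a sketch: PropertySketch, Elem, HEADING_COUNT and toXml are hypothetical names, and the sketch assumes that attributes are string pairs written out with the element while properties are arbitrary application objects that are not written out.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of a node with attributes and properties; not XXE code. */
final class PropertySketch {
    /** Hypothetical property key, playing the role of a StandardProperty constant. */
    static final String HEADING_COUNT = "headingCount";

    static final class Elem {
        final String name;
        // Attributes: string name/value pairs that belong to the XML content.
        final Map<String, String> attributes = new LinkedHashMap<>();
        // Properties: application data attached to the node, of any Java type.
        private final Map<Object, Object> properties = new HashMap<>();

        Elem(String name) { this.name = name; }

        void putProperty(Object key, Object value) { properties.put(key, value); }
        Object getProperty(Object key) { return properties.get(key); }
        void removeProperty(Object key) { properties.remove(key); }

        /** Serializes the element; only attributes appear (values not escaped here). */
        String toXml() {
            StringBuilder sb = new StringBuilder("<").append(name);
            for (Map.Entry<String, String> a : attributes.entrySet()) {
                sb.append(' ').append(a.getKey())
                  .append("=\"").append(a.getValue()).append('"');
            }
            return sb.append("/>").toString();
        }
    }

    public static void main(String[] args) {
        Elem div = new Elem("div");
        div.attributes.put("class", "toc");
        div.putProperty(HEADING_COUNT, 3);
        System.out.println(div.toXml());                    // <div class="toc"/>
        System.out.println(div.getProperty(HEADING_COUNT)); // 3
    }
}
```

In this model the heading count travels with the node in memory but leaves no trace in the serialized markup, which is the behavior one would expect from application-level data.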