Chapter 9. Writing a documentHook

Table of Contents

1. Implementing the DocumentHook interface

A documentHook is a piece of Java™ code that XXE notifies each time a document of a given document type is created, opened, checked for validity, saved to disk or closed.

This very general mechanism was created to perform semantic validation beyond what a DTD or an XML Schema alone can express, but it can also be used to perform many other tasks.

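A documentHook which:

  • checks that the src attribute of <img> elements does not look like an absolute file path,

  • checks that internal links (<a href="#...">) point to an existing name or id,

  • checks that the same name or id is not defined more than once,

has been written to be used as an example in this tutorial.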

Compile this documentHook by executing ant (see build.xml) in samples/checklinks/. The build creates checklinks.jar. Then test the documentHook by:

  1. Copying checklinks.incl and checklinks.jar to XXE_install_directory/addon/config/xhtml/.

  2. Including checklinks.incl in the XXE configuration file for XHTML, which is XXE_install_directory/addon/config/xhtml/xhtml_strict.xxe.

  3. Restarting XXE.

  4. Loading tests/in/sample2.html into XXE and examining all the problems found by the documentHook for this file (click on the Validity tool tab to display the semantic warnings).

How to deploy a documentHook is detailed in Section 7, “documentHook” in XMLmind XML Editor - Configuration and Deployment.

1. Implementing the DocumentHook interface

The DocumentHook interface is very easy to understand:

Method              Description
------------------  ----------------------------------------------------------
documentCreated     Invoked after a document has been created.
documentOpened      Invoked after a document has been opened.
checkingDocument    Invoked before a document conforming to a DTD or schema is validated.
documentChecked     Invoked after a document conforming to a DTD or schema has been validated.
savingDocument      Invoked before a document is saved to disk.
documentSaved       Invoked after a document has been saved to disk.
savingDocumentAs    Invoked before a document is saved to a different location.
documentSavedAs     Invoked after a document has been saved to a different location.
savingDocumentCopy  Invoked before a copy of the document being edited is saved to disk.
documentCopySaved   Invoked after a copy of the document being edited has been saved to disk.
copyingDocument     Invoked before a document is copied to a temporary file by a process command.
documentCopied      Invoked after a document has been copied to a temporary file by a process command.
closingDocument     Invoked before the document being edited is closed.
documentClosed      Invoked after the document being edited has been closed.

The documentHook used as an example in this tutorial only needs to implement the documentChecked method, therefore it extends the adapter class DocumentHookBase rather than implementing the above interface directly.
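The adapter idea behind DocumentHookBase can be illustrated with a small self-contained sketch. MiniHook, MiniHookBase and HtmlNameChecker are hypothetical names (they are not part of the XXE API, and MiniHook has far fewer methods than DocumentHook): the base class implements every callback as a no-op, so a subclass overrides only the callback it cares about.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, much-reduced stand-in for an interface with many callbacks.
interface MiniHook {
    void documentOpened(String docName);
    List<String> documentChecked(String docName, List<String> diagnostics);
    void documentClosed(String docName);
}

// Adapter class: every callback does nothing or passes its input through,
// so subclasses only override the methods they need.
class MiniHookBase implements MiniHook {
    public void documentOpened(String docName) {}

    public List<String> documentChecked(String docName, List<String> diagnostics) {
        return diagnostics; // leave the diagnostics unchanged
    }

    public void documentClosed(String docName) {}
}

// Overrides documentChecked only, in the same way CheckLinks does.
class HtmlNameChecker extends MiniHookBase {
    @Override
    public List<String> documentChecked(String docName, List<String> diagnostics) {
        List<String> result = new ArrayList<>(diagnostics);
        if (!docName.endsWith(".html"))
            result.add(docName + ": name does not end with .html");
        return result;
    }
}
```

Calling documentOpened or documentClosed on an HtmlNameChecker simply runs the inherited no-op implementations.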

import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
// The XXE API classes used below (Name, Document, Element, Traversal,
// Diagnostic, DiagnosticImpl, DocumentHookBase) must be imported too;
// their packages are listed in the XXE API documentation.

public class CheckLinks extends DocumentHookBase {
    private static final Name SRC = Name.get("src");
    private static final Name NAME = Name.get("name");
    private static final Name ID = Name.get("id");
    private static final Name HREF = Name.get("href");

    public Diagnostic[] documentChecked(Document doc, int status, 
                                        Diagnostic[] diagnostics) {
        if (status != STATUS_SUCCESS) // (1)
            return diagnostics;

        final ArrayList<Diagnostic> warnings = new ArrayList<>();
        final ArrayList<Element> links = new ArrayList<>();
        // Maps each name or id to the list of elements defining it.
        final HashMap<String, ArrayList<Element>> anchors = new HashMap<>();

        Traversal.traverse(doc.getRootElement(), new Traversal.HandlerBase() { // (2)
            public Object enterElement(Element element) {
                String localName = element.getLocalName();

                String anchorName = null;

                if ("img".equals(localName)) { // (3)
                    String src = element.getAttribute(SRC);

                    if (src != null) {
                        // file: URL, Unix absolute path, Windows UNC path
                        // or Windows drive path such as C:\...
                        if (src.startsWith("file:/") ||
                            src.startsWith("/") ||
                            src.startsWith("\\\\") ||
                            (src.length() >= 3 &&
                             Character.isLetter(src.charAt(0)) &&
                             src.regionMatches(1, ":\\", 0, 2))) {
                            warnings.add(new DiagnosticImpl(
                              element, 
                              "src attribute looks like an absolute file path",
                              Diagnostic.SEVERITY_SEMANTIC_WARNING));
                        }
                    }
                } else if ("a".equals(localName)) {
                    String href = element.getAttribute(HREF);
                    if (href != null && href.startsWith("#")) // (4)
                        links.add(element);

                    // An <a> element may have both an href and a name
                    // attribute, so name is checked independently of href.
                    anchorName = element.getAttribute(NAME); // (5)
                    if (anchorName != null) {
                        ArrayList<Element> elements = anchors.get(anchorName);
                        if (elements == null) {
                            elements = new ArrayList<>();
                            anchors.put(anchorName, elements);
                        }

                        elements.add(element);
                    }
                }

                String id = element.getAttribute(ID); // (6)
                if (id != null && !id.equals(anchorName)) {
                    ArrayList<Element> elements = anchors.get(id);
                    if (elements == null) {
                        elements = new ArrayList<>();
                        anchors.put(id, elements);
                    }

                    elements.add(element);
                }

                return null;
            }
        });

        for (Element element : links) { // (7)
            // Strip the leading '#'.
            String id = element.getAttribute(HREF).substring(1);

            if (!anchors.containsKey(id))
                warnings.add(new DiagnosticImpl(
                    element, 
                    "reference to non-existent name or id '" + id + "'", 
                    Diagnostic.SEVERITY_SEMANTIC_WARNING));
        }

        for (Map.Entry<String, ArrayList<Element>> entry : 
                 anchors.entrySet()) { // (8)
            String id = entry.getKey();
            ArrayList<Element> elements = entry.getValue();

            // Every element but the first one redefines an existing anchor.
            for (int i = 1; i < elements.size(); ++i) {
                warnings.add(new DiagnosticImpl(
                    elements.get(i), 
                    "name or id '" + id + "' already defined", 
                    Diagnostic.SEVERITY_SEMANTIC_WARNING));
            }
        }

        int warningCount = warnings.size(); // (9)
        if (warningCount > 0) {
            Diagnostic[] diagnostics2 = 
                new Diagnostic[diagnostics.length + warningCount];

            System.arraycopy(diagnostics, 0, diagnostics2, 0, 
                             diagnostics.length);
            for (int i = 0; i < warningCount; ++i)
                diagnostics2[diagnostics.length + i] = warnings.get(i);

            diagnostics = diagnostics2;
        }

        return diagnostics;
    }
}

1

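If the checkingDocument method has been invoked, the documentChecked method is guaranteed to be invoked too, even though the status argument passed to it may differ from STATUS_SUCCESS. The example therefore tests status first and returns the diagnostics unchanged when validation did not succeed.

The same guarantee applies to every before/after pair of DocumentHook methods: savingDocument/documentSaved, closingDocument/documentClosed, etc. The only exceptions are documentCreated and documentOpened, which have no "before" counterpart.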

Example 1: saving the document fails because of an I/O error. documentSaved is nevertheless invoked with a STATUS_FAILED status.

Example 2: closing the document fails because an OpenedDocumentHook has vetoed this operation (perhaps because this component has detected uncommitted changes that would otherwise be lost). documentClosed is nevertheless invoked with a STATUS_VETOED status.
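The "after is always called once before has been called" contract can be pictured with a try/finally block. The sketch below is only an illustration under stated assumptions: SaveHook, Saver, RecordingHook and the numeric STATUS_* values are hypothetical and do not describe how XXE itself is implemented; the simulated I/O error mirrors Example 1.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical hook with one before/after pair of callbacks.
interface SaveHook {
    int STATUS_SUCCESS = 0;
    int STATUS_FAILED = 1;

    void savingDocument(String docName);
    void documentSaved(String docName, int status);
}

class Saver {
    // Calls savingDocument, attempts the save, then always calls
    // documentSaved, passing STATUS_FAILED if the save did not complete.
    static boolean save(String docName, SaveHook hook, boolean simulateIOError) {
        hook.savingDocument(docName);
        int status = SaveHook.STATUS_FAILED;
        try {
            write(docName, simulateIOError);
            status = SaveHook.STATUS_SUCCESS;
        } catch (IOException e) {
            // The failure is reported to the hook through the status below.
        } finally {
            hook.documentSaved(docName, status);
        }
        return status == SaveHook.STATUS_SUCCESS;
    }

    // Stand-in for writing the document to disk.
    private static void write(String docName, boolean fail) throws IOException {
        if (fail)
            throw new IOException("cannot write " + docName);
    }
}

// Records every callback so that the calling sequence can be inspected.
class RecordingHook implements SaveHook {
    final List<String> calls = new ArrayList<>();

    public void savingDocument(String docName) {
        calls.add("saving " + docName);
    }

    public void documentSaved(String docName, int status) {
        calls.add("saved " + docName + " status=" + status);
    }
}
```

Because documentSaved is called from the finally clause, it also runs if write throws an unchecked exception, in which case status is still STATUS_FAILED.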

2

The document is traversed using the Traversal utility. The anonymous handler passed to Traversal.traverse (a subclass of Traversal.HandlerBase):

  • will check <img src="..."> and will possibly add semantic warnings to ArrayList warnings,

  • will add elements <a href="#..."> to ArrayList links,

  • will add elements <a name="..."> or elements having an id attribute to HashMap anchors.
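For readers without the XXE API at hand, a comparable single-pass traversal can be sketched with the standard W3C DOM API shipped with the JDK (javax.xml.parsers, org.w3c.dom). This is a swapped-in analog, not XXE's Traversal utility: ElementHandler and DomTraversal are hypothetical names, and the walk shown here is a depth-first, pre-order visit. Note that DOM's getAttribute returns "" rather than null for a missing attribute.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical callback invoked once per element, in the spirit of
// Traversal.HandlerBase.enterElement.
interface ElementHandler {
    void enterElement(Element element);
}

class DomTraversal {
    // Depth-first, pre-order walk: an element is visited before its children.
    static void traverse(Element element, ElementHandler handler) {
        handler.enterElement(element);
        for (Node child = element.getFirstChild(); child != null;
             child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE)
                traverse((Element) child, handler);
        }
    }

    // Parses an XML string, wrapping checked exceptions.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("cannot parse XML", e);
        }
    }

    // Collects the href values of the <a href="#..."> elements.
    static List<String> internalLinks(String xml) {
        final List<String> links = new ArrayList<>();
        traverse(parse(xml).getDocumentElement(), new ElementHandler() {
            public void enterElement(Element element) {
                String href = element.getAttribute("href"); // "" if absent
                if ("a".equals(element.getTagName()) && href.startsWith("#"))
                    links.add(href);
            }
        });
        return links;
    }
}
```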

3

<img> elements are checked here.

If the value of the src attribute looks like an absolute file path, a DiagnosticImpl object describing the semantic warning is added to ArrayList warnings.
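The src heuristic can be isolated into a pure function, which makes it easy to unit-test outside XXE. AbsolutePathCheck is a hypothetical name; the condition itself is copied from the example above.

```java
class AbsolutePathCheck {
    // Same condition as the one applied to <img src="..."> in CheckLinks.
    static boolean looksLikeAbsoluteFilePath(String src) {
        return src.startsWith("file:/")            // file: URL
            || src.startsWith("/")                 // Unix absolute path
            || src.startsWith("\\\\")              // Windows UNC path: \\server\share
            || (src.length() >= 3                  // Windows drive path: C:\...
                && Character.isLetter(src.charAt(0))
                && src.regionMatches(1, ":\\", 0, 2));
    }
}
```

As written, the heuristic also flags root-relative URLs such as /images/logo.png, and it does not flag a drive path written with forward slashes such as C:/images/logo.png.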

4

Elements <a href="#..."> are added to ArrayList links here. Verification is done in a subsequent pass because the referenced anchor may appear later in the document.

5

Elements <a name="..."> are added to HashMap anchors here. Verification is done in a subsequent pass.

6

Elements having an id attribute are added to HashMap anchors here. Verification is done in a subsequent pass.

Note that an <a> element often has both a name and an id attribute set to the same value; this must not be reported as a duplicate definition.
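The registration rule of steps 5 and 6 can be sketched on plain strings. AnchorIndex is a hypothetical helper, and elements are represented by string labels instead of XXE Element objects; as in CheckLinks, the caller passes a name only for <a> elements and null otherwise. An id equal to the element's own name is skipped, so <a name="x" id="x"> is counted once.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical anchor index: name or id -> labels of the elements defining it.
class AnchorIndex {
    final Map<String, List<String>> anchors = new LinkedHashMap<>();

    // Registers the name and/or id of one element.
    void register(String element, String name, String id) {
        if (name != null)
            add(name, element);
        if (id != null && !id.equals(name)) // same value: count it once
            add(id, element);
    }

    private void add(String anchor, String element) {
        anchors.computeIfAbsent(anchor, k -> new ArrayList<>()).add(element);
    }
}
```

A LinkedHashMap is used here only so that iteration follows registration order; the example above uses a HashMap, which is enough for lookups.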

7

Elements in ArrayList links that reference an unknown name or id are detected here.
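Step 7 amounts to a set-membership test on the part of each href that follows the '#'. BrokenLinks is a hypothetical helper working on plain strings rather than XXE Element objects; it assumes, as step 4 guarantees, that every href starts with '#'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class BrokenLinks {
    // Returns the targets of the "#target" hrefs that are not known anchors.
    static List<String> find(List<String> hrefs, Set<String> anchors) {
        List<String> broken = new ArrayList<>();
        for (String href : hrefs) {
            String target = href.substring(1); // strip the leading '#'
            if (!anchors.contains(target))
                broken.add(target);
        }
        return broken;
    }
}
```

A bare href="#" yields an empty target, which is reported as broken unless some element defines an empty name or id.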

8

Elements in HashMap anchors whose name or id is already used by another element are detected here: for each name or id, every element except the first one gets a warning.
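The duplicate check of step 8 can be sketched on the same kind of string-based index. DuplicateAnchors is a hypothetical helper; as in the example, the first element registered for a name or id is considered its legitimate definition and every later one is reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class DuplicateAnchors {
    // For every name or id defined by several elements, reports every
    // element but the first one.
    static List<String> find(Map<String, List<String>> anchors) {
        List<String> warnings = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : anchors.entrySet()) {
            List<String> elements = entry.getValue();
            for (int i = 1; i < elements.size(); ++i)
                warnings.add(elements.get(i) + ": name or id '"
                             + entry.getKey() + "' already defined");
        }
        return warnings;
    }
}
```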

9

If semantic warnings have been found for the document, a new array is allocated, holding the Diagnostics passed as an argument followed by the warnings, and this augmented array is returned as the result of the documentChecked method.
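Java arrays cannot grow, which is why step 9 allocates a larger array. The same operation can be written more compactly with java.util.Arrays.copyOf, as in this sketch; ArrayAppend is a hypothetical helper, not part of the XXE API.

```java
import java.util.Arrays;
import java.util.List;

class ArrayAppend {
    // Returns a new array holding the original items followed by the extra
    // ones, or the original array itself when there is nothing to add.
    static <T> T[] append(T[] original, List<? extends T> extra) {
        if (extra.isEmpty())
            return original;
        // copyOf keeps the runtime component type and pads with nulls.
        T[] result = Arrays.copyOf(original, original.length + extra.size());
        for (int i = 0; i < extra.size(); ++i)
            result[original.length + i] = extra.get(i);
        return result;
    }
}
```

Returning the original array when there is nothing to add matches the behavior of the example, which only replaces diagnostics when warningCount is greater than zero.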