Chapter 6. Programming with the Qizx API

Table of Contents

1. What you'll learn
1.1. About the data samples used in this tutorial
1.2. Compiling and running the code samples
2. Creating a Library and populating it with Collections and Documents
2.1. Creating a LibraryManager
2.2. Creating a Library
2.3. Creating Collections and importing Documents
2.4. The dual nature of the Library object: both a database and a transactional session
2.5. Compiling and running the code of this lesson
3. Retrieving Documents stored in a database
3.1. Compiling and running the code of this lesson
4. Querying a database
4.1. Compiling and running the code of this lesson
5. Deleting Documents and Collections
5.1. Compiling and running the code of this lesson
6. Modifying a Document stored in a database
6.1. Compiling and running the code of this lesson
7. Customizing the indexing of XML content
7.1. Re-indexing a Library
7.2. Writing a custom Indexing.NumberSieve
7.3. Compiling and running the code of this lesson
8. Adding metadata to Documents
8.1. Compiling and running the code of this lesson

1. What you'll learn

This edition of Qizx/db does not include a stand-alone server program. It is designed to be embedded in a Java™ application, typically a Servlet. You'll learn in this chapter everything needed to implement a basic application using Qizx/db. For an introduction to using Qizx/db, please see the chapter Getting started.

This chapter targets experienced Java programmers who have a good knowledge of XML and at least a basic knowledge of XQuery.

This chapter is organized in 7 lessons:

  1. First lesson: how to create a database (Library) and populate it with data (Collections and Documents).

    This lesson is by far the longest one because it contains a refresher on the concepts involved in programming with Qizx (LibraryManager, Library, Collection, etc.), as well as sidebars about the XML catalog resolver, multi-threading and authorization, which can be skipped on a first reading.

  2. Second lesson: how to make local copies of Documents stored in a database.

  3. Third lesson: how to query a database.

  4. Fourth lesson: how to delete a Document, a Collection or a whole Library.

  5. Fifth lesson: how to modify a Document stored in a database.

  6. Sixth lesson: how to customize the indexing of the XML content and how to re-index a database.

  7. Seventh lesson: how to add metadata (properties) to a Document.

1.1. About the data samples used in this tutorial

The directory docs/samples/book_data/ contains several kinds of XML documents. These short, simple XML documents (a few dozen) serve no other purpose than teaching how to program with the Qizx API. In real life, Qizx/db can be expected to store and query hundreds of thousands of XML documents of various sizes, ranging from a few hundred bytes to several hundred megabytes.

Books/

Each document found in this directory contains the description of a Science-Fiction book: its title, authors, editions, etc. Example docs/samples/book_data/Books/The_Robots_of_Dawn.xml:

<book xmlns="http://www.qizx.com/namespace/Tutorial">
  <title>The Robots of Dawn</title>
  <author>Isaac Asimov</author>
  <publicationDate>MCMLXXXIII</publicationDate>
  <editions>
    <edition>
      <ISBN>0553299492</ISBN>
      <publisher>Doubleday</publisher>
      <language>English</language>
      <year>1983</year>
    </edition>
  </editions>
</book>

Publishers/

Each document found in this directory contains the description of a publisher: its name, address, etc. Example docs/samples/book_data/Publishers/Doubleday.xml:

<publisher xmlns="http://www.qizx.com/namespace/Tutorial">
  <trademark>Doubleday</trademark>
  <company>Random House, Inc.</company>
  <address xml:space="preserve">1540 Broadway
New York, NY 10036
US</address>
</publisher>

Authors/

Each document found in this directory contains the description of a Science-Fiction author: her/his name, pseudonyms, birth date, etc. Example docs/samples/book_data/Authors/iasimov.xml:

<author xmlns="http://www.qizx.com/namespace/Tutorial"
  nationality="US" gender="male">
  <fullName>Isaac Asimov</fullName>
  <pseudonyms>
    <pseudonym>Paul French</pseudonym>
    <pseudonym>George E. Dale</pseudonym>
  </pseudonyms>
  <birthDate>January 2, 1920</birthDate>
  <birthPlace>
    <city>Petrovichi</city><country>Russian SFSR</country>
  </birthPlace>
  <blurb location="../Author%20Blurbs/Isaac_Asimov.xhtml"/>
</author>

Author Blurbs/

Each document found in this directory is an XHTML page which is a copy of a Wikipedia article describing a Science-Fiction author. Example docs/samples/book_data/Author Blurbs/Isaac_Asimov.xhtml:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" dir="ltr"
lang="en">
<head>
...
<title>Isaac Asimov - Wikipedia, the free encyclopedia</title>
...
</body>
</html>

The XHTML DTD and the corresponding XML Catalog are found in docs/samples/xhtml_dtd/.

1.2. Compiling and running the code samples

All the code samples used to illustrate this chapter are found in the docs/samples/programming/ directory. Files containing XQuery scripts are found in the docs/samples/book_queries/ directory.

You'll need a recent version of ant, a Java-based build tool[2], to compile and run the code samples.

2. Creating a Library and populating it with Collections and Documents

The Put class implements a command-line tool that creates a Library and populates it with Collections and Documents. More precisely, it copies one or more source files or directories to a single destination Collection or Document. If multiple sources are specified, the destination must be an existing Collection. Moreover, the Put class lets you filter what is being copied by means of a simple java.io.FileFilter.
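The filter mentioned above can be any java.io.FileFilter. The following is a minimal sketch, not the filter actually used by Put.java: the class name XmlFileFilter and the choice of accepted extensions are assumptions made for illustration only.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.Locale;

/** Illustrative only: XmlFileFilter is a hypothetical name, not part of Put.java. */
final class XmlFileFilter implements FileFilter {
    @Override
    public boolean accept(File file) {
        // Keep directories so that a recursive copy can descend into them.
        if (file.isDirectory()) {
            return true;
        }
        // Keep files whose name ends with ".xml" or ".xhtml" (assumed extensions).
        String name = file.getName().toLowerCase(Locale.ROOT);
        return name.endsWith(".xml") || name.endsWith(".xhtml");
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        File[] files = dir.listFiles(new XmlFileFilter());
        if (files == null) {
            System.err.println("cannot list directory '" + dir + "'");
            return;
        }
        for (File file : files) {
            System.out.println(file);
        }
    }
}
```

Accepting directories unconditionally matters because File.listFiles applies the filter to sub-directories too; a filter that rejected them would silently stop the recursion.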

The outline of this program is (excerpts of Put.java):

        LibraryManager libManager = getLibraryManager(storageDir); // (1)
        Library lib = getLibrary(libManager, libName); // (2)

        LibraryMember dst = lib.getMember(dstPath);
        boolean dstIsCollection = (dst != null && dst.isCollection());

        if (args.length > l+4 && !dstIsCollection) {
            shutdown(lib, libManager);
            usage("'" + dstPath + "', does not exist or is a document");
        }

        try {
            for (int i = l+2; i < last; ++i) {
                File srcFile = new File(args[i]);

                String dstPath2 = dstPath;
                if (dstIsCollection) {
                    dstPath2 = joinPath(dstPath, srcFile.getName());
                }
                put(lib, srcFile, filter, dstPath2); // (3)
            }

            verbose("Committing changes...");
            lib.commit(); // (4)
        } finally {
            shutdown(lib, libManager); // (5)
        }

1

Get a LibraryManager. "Create it" if it does not exist.

2

Get a Library from the LibraryManager. Create it if it does not exist.

3

For each source directory, create the corresponding Collection in the Library. Assume that each source file is a well-formed XML document and import it in the Library.

4

Commit changes made to the Library.

5

Close the Library. "Close" the LibraryManager.

Objects involved:

LibraryManager

A LibraryManager is similar to a database manager. It lets you open or create Libraries.

Library

A Library is similar to a database. If we use the filesystem analogy, a Library is similar to a disk drive.

A Library has a name[3]. A Library always contains a root Collection, named "/", which cannot be deleted.

Collection

If we use the filesystem analogy, a Collection is similar to a directory. It can contain Documents and/or Collections.

Note that nothing forces you to create a hierarchy of Collections. If you prefer, you can import all your Documents in the root Collection.

Document

If we use the filesystem analogy, a Document is similar to a file. Unlike plain files, the content of a Document is always well-formed XML.

LibraryMember

A common term (super-interface) for both Collection and Document.

Like its filesystem counterpart, a LibraryMember has a path. Path components are separated by a slash character "/". The last component is the name of the LibraryMember. The other path components are the names of the ancestor Collections of the LibraryMember, up to the root Collection "/".

Example: "/foo/bar/gee". The name of this LibraryMember is "gee". Its ancestor Collections are, from direct parent to the root: "bar", "foo", "/".

There is no concept of current working Collection, therefore relative paths are not useful.

Note that the name of a LibraryMember may contain any character supported by Java™ (including whitespace), except the slash character "/".

Unlike its filesystem counterpart, a LibraryMember may have any number of user-defined properties (meta-data) in addition to its content (that is, XML content for a Document, members for a Collection). More on properties in lesson 7.
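The path rules described above can be sketched in plain Java. The class MemberPaths and its methods are hypothetical helpers written for this illustration; they are not part of the Qizx API, and the joinPath shown here is only a guess at what the helper of the same name in Put.java does.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative helpers; MemberPaths and its methods are hypothetical names. */
final class MemberPaths {
    private MemberPaths() {}

    /** Joins a Collection path and a member name: ("/foo", "bar") gives "/foo/bar". */
    static String joinPath(String parentPath, String name) {
        if (name.isEmpty() || name.indexOf('/') >= 0) {
            throw new IllegalArgumentException("invalid member name '" + name + "'");
        }
        return parentPath.endsWith("/") ? parentPath + name
                                        : parentPath + "/" + name;
    }

    /** Returns the last path component; the root Collection is named "/". */
    static String nameOf(String path) {
        requireAbsolute(path);
        if ("/".equals(path)) {
            return "/";
        }
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** Returns the ancestor names, from the direct parent up to the root "/". */
    static List<String> ancestorsOf(String path) {
        requireAbsolute(path);
        List<String> ancestors = new ArrayList<>();
        if ("/".equals(path)) {
            return ancestors; // the root Collection has no ancestor
        }
        String[] parts = path.substring(1).split("/");
        for (int i = parts.length - 2; i >= 0; --i) {
            ancestors.add(parts[i]);
        }
        ancestors.add("/");
        return ancestors;
    }

    /** There is no current working Collection, so only absolute paths are accepted. */
    private static void requireAbsolute(String path) {
        boolean ok = path.startsWith("/")
            && (path.length() == 1 || !path.endsWith("/"));
        if (!ok) {
            throw new IllegalArgumentException("not an absolute path '" + path + "'");
        }
    }

    public static void main(String[] args) {
        System.out.println(nameOf("/foo/bar/gee"));      // gee
        System.out.println(ancestorsOf("/foo/bar/gee")); // [bar, foo, /]
    }
}
```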

2.1. Creating a LibraryManager

    private static LibraryManager getLibraryManager(File storageDir) 
        throws IOException, QizxException {
        LibraryManagerFactory factory = LibraryManagerFactory.getInstance();
        if (storageDir.exists()) { // (1)
            return factory.openLibraryGroup(storageDir); // (2)
        } else {
            if (!storageDir.mkdirs()) { // (3)
                throw new IOException("cannot create directory '" + 
                                      storageDir + "'");
            }

            verbose("Creating library group in '" + storageDir + "'...");
            return factory.createLibraryGroup(storageDir); // (4)
        }
    }

1, 3

A LibraryManager stores all its data (XML content, indexes, etc.) in a single directory of the filesystem. Creating a LibraryManager automatically creates this directory if it does not already exist. In the above code, we have preferred to create the storage directory "by hand" before invoking createLibraryGroup. See also How to delete a LibraryManager.

2, 4

A LibraryManager is obtained by using the openLibraryGroup or createLibraryGroup methods of a LibraryManagerFactory. The LibraryManagerFactory is itself obtained using LibraryManagerFactory.getInstance.

2.2. Creating a Library

    private static Library getLibrary(LibraryManager libManager,
                                      String libName) 
        throws QizxException {
        Library lib = libManager.openLibrary(libName, /*user*/ null); // (1)
        if (lib == null) {
            verbose("Creating library '" + libName + "'...");
            lib = libManager.createLibrary(libName, /*user*/ null); // (2)
        }
        return lib;
    }

1

openLibrary returns the Library having the specified name. It returns null if no such Library exists.

2

createLibrary creates and then returns the Library having the specified name.

2.3. Creating Collections and importing Documents

    private static void put(Library lib, File srcFile, FileFilter filter,
                            String dstPath) 
        throws IOException, QizxException {
        if (srcFile.isDirectory()) {
            Collection collection = lib.getCollection(dstPath); // (1)
            if (collection == null) {
                verbose("Creating collection '" + dstPath + "'...");
                collection = lib.createCollection(dstPath); // (2)
            }

            File[] files = srcFile.listFiles(filter);
            if (files == null) {
                throw new IOException("cannot list directory '" + 
                                      srcFile + "'");
            }

            for (int i = 0; i < files.length; ++i) {
                File file = files[i];
                put(lib, file, filter, joinPath(dstPath, file.getName()));
            }
        } else {
            verbose("Importing '" + srcFile + "' as document '" + 
                    dstPath + "'...");
            lib.importDocument(dstPath, srcFile); // (3)
        }
    }

1

Library has several methods returning a LibraryMember: getCollection, getDocument, getMember. All these methods must be passed absolute paths.

2

A Collection is created by invoking createCollection.

3

A Document is created by invoking one of the several importDocument methods. These methods differ by the types of their source arguments: java.io.File, java.net.URL, org.xml.sax.InputSource, etc. In all cases, the source must contain well-formed XML.

Note that if a Document already exists, importDocument replaces its content.

Now what if your XML source is not a file? Maybe your XML source is a W3C DOM Document or a JDOM Document. Or maybe you want to create a Document dynamically. In such cases, you'll need to use the low-level beginImportDocument and endImportDocument methods.

Example: dynamically create a Document containing <hello xmlns="http://www.acme.com/ns/test">Hello world!</hello>:

XMLPushStream out = lib.beginImportDocument(docPath);
out.putDocumentStart();
QName helloName = lib.getQName("hello", "http://www.acme.com/ns/test");
out.putElementStart(helloName);
out.putText("Hello world!");
out.putElementEnd(helloName);
out.putDocumentEnd();
Document doc = lib.endImportDocument();

The XMLPushStream interface returned by beginImportDocument lets you "push XML content" into a Document. This is a pretty low-level interface, similar to SAX. Fortunately, Qizx comes with two handy adapters:

com.qizx.api.util.DOMToPushStream

Copies a W3C DOM document or element to an XMLPushStream. This utility class is used in lesson 5.

com.qizx.api.util.SAXToPushStream

Implements org.xml.sax.ContentHandler, org.xml.sax.ext.LexicalHandler, etc, to convert SAX events to invocations of the corresponding methods in an XMLPushStream.

2.4. The dual nature of the Library object: both a database and a transactional session

A Library is both a database (or a disk drive, if we use the filesystem analogy) and a transactional session used to modify and/or query this database. As such, a sequence of changes made to a Library must end with commit or rollback.

        ...
            verbose("Committing changes...");
            lib.commit(); // (1)
        } finally {
            shutdown(lib, libManager);
        }
        ...

    private static void shutdown(Library lib, LibraryManager libManager) 
        throws QizxException {
        if (lib.isModified()) { // (2)
            lib.rollback();
        }
        lib.close(); // (3)
        libManager.closeAllLibraries(10000 /*ms*/); // (4)
    }

1

The commit method is invoked to commit the changes made to the Library.

2

The shutdown helper is invoked even when the program fails before committing the changes made to the Library. The isModified method may be used to test this case, because a successful commit clears the modified flag. When this error case happens, you need to invoke the rollback method to restore the state of the Library before the changes.

3

Note that the close method raises a QizxException if the database has been modified and commit or rollback have not been invoked.

4

A LibraryManager has no close method. However, you really need to invoke its closeAllLibraries method to stop worker threads. If you don't do that, your application may not be able to exit.

2.5. Compiling and running the code of this lesson

  • Compile class Put by executing ant (see build.xml) in the docs/samples/programming/put/ directory.

  • Create the "Tutorial" library and populate it with all the documents found in docs/samples/book_data/ by running ant run in the docs/samples/programming/put/ directory.

3. Retrieving Documents stored in a database

The Get class implements a command-line tool that makes local copies of Collections and Documents stored in a Library. This tool can match the names of the Collections and Documents to be copied against a wildcard. For example, it can be used to make local copies of all Documents whose names end with ".xhtml" found in the "/Author Blurbs" Collection (the corresponding command-line argument is "/Author Blurbs/*.xhtml").

Warning

For queries to work properly, document imports and updates should first be completed with a commit. Some operations would work even before the commit (like getting the contents of a just imported document), but many operations rely on indexing, and indexing is completed at the time of the commit.

Excerpts of Get.java:

            ...
            LibraryMember libMember = lib.getMember(path); // (1)
            if (libMember == null) {
                error("cannot find '" + path + "'");
                return;
            }

            get(libMember, dstFile);
            ...

    private static void get(LibraryMember libMember, File dstFile) 
        throws IOException, QizxException {
        File dstFile2;
        if (dstFile.isDirectory()) {
            String baseName = libMember.getName();
            if ("/".equals(baseName))
                baseName = "root";

            dstFile2 = new File(dstFile, baseName);
        } else {
            dstFile2 = dstFile;
        }

        if (libMember.isCollection()) { // (2)
            getCollection((Collection) libMember, dstFile2);
        } else {
            getDocument((Document) libMember, dstFile2);
        }
    }

1

Library.getMember returns the LibraryMember (if any) corresponding to the specified absolute path.

2

LibraryMember.isCollection may be used to test if this member is a Collection or a Document. You'll also find a LibraryMember.isDocument method.

A local copy of a Document is created as follows:

    private static void getDocument(Document doc, File dstFile) 
        throws IOException, QizxException {
        verbose("Copying document '" + doc.getPath() + 
                "' to file '" + dstFile + "'...");

        FileOutputStream out = new FileOutputStream(dstFile);
        try {
            doc.export(new XMLSerializer(out, "UTF-8")); // (1)
        } finally {
            out.close();
        }
    }

1

The Document.export method used in the above code sample has an XMLPushStream parameter. That is, to export itself, a Document "pushes its XML content" (element tags, attributes, text, etc.) to an object implementing the XMLPushStream interface.

Qizx comes with a number of useful implementations of the XMLPushStream interface:

com.qizx.api.util.XMLSerializer

The most useful implementation. It saves XML content to a java.io.OutputStream and thus to a File or a String.

com.qizx.api.util.PushStreamToDOM

With this implementation of XMLPushStream, converting a Qizx Document to org.w3c.dom.Document is as simple as:

PushStreamToDOM toDOM = new PushStreamToDOM();
doc.export(toDOM);
org.w3c.dom.Document w3cDOMDoc = toDOM.getResultDocument();

com.qizx.api.util.PushStreamToSAX

With this implementation of XMLPushStream, feeding a Qizx Document into a SAX org.xml.sax.ContentHandler is as simple as:

PushStreamToSAX toSAX = new PushStreamToSAX(handler);
doc.export(toSAX);

The above export method is useful when you want to save, or simply traverse, a Document stored in a Library. There is another Document.export method, this time having no parameters, which is useful when you want to parse a Document stored in a Library. This alternate export method returns an XMLPullStream, that is, a pull parser[8], similar to a StAX parser.

A local copy of a Collection is created as follows:

    private static void getCollection(Collection col, File dstFile) 
        throws IOException, QizxException {
        verbose("Copying collection '" + col.getPath() + 
                "' to directory '" + dstFile + "'...");

        if (!dstFile.isDirectory()) {
            verbose("Creating directory '" + dstFile + "'...");

            if (!dstFile.mkdirs()) {
                throw new IOException("Cannot create directory '" + 
                                      dstFile + "'");
            }
        }

        LibraryMemberIterator iter = col.getChildren(); // (1)
        while (iter.moveToNextMember()) {
            LibraryMember libMember = iter.getCurrentMember();

            File dstFile2 = new File(dstFile, libMember.getName());
            
            if (libMember.isCollection()) {
                getCollection((Collection) libMember, dstFile2);
            } else {
                getDocument((Document) libMember, dstFile2);
            }
        }
    }

1

Collection.getChildren returns an iterator which iterates over the Collections and Documents directly contained in a Collection.

You'll also find a variant of the getChildren method which has a LibraryMemberFilter parameter. com.qizx.api.util.GlobFilter is a ready-to-use implementation of LibraryMemberFilter which matches the name (not the full path, just the name) of a LibraryMember against a glob-style (Unix shell) pattern.
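To make "matching the name, not the full path, against a glob-style pattern" concrete, here is a small sketch in plain Java. It is not the implementation of com.qizx.api.util.GlobFilter: the class NameGlob is a hypothetical name, and it only supports the * and ? wildcards, which may be fewer than what GlobFilter accepts.

```java
import java.util.regex.Pattern;

/** Illustrative only: NameGlob is a hypothetical class, not part of Qizx. */
final class NameGlob {
    private final Pattern pattern;

    NameGlob(String glob) {
        // Translate the glob to a regular expression, one character at a time.
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < glob.length(); ++i) {
            char c = glob.charAt(i);
            if (c == '*') {
                regex.append("[^/]*");   // any run of characters except '/'
            } else if (c == '?') {
                regex.append("[^/]");    // exactly one character except '/'
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        this.pattern = Pattern.compile(regex.toString());
    }

    /** Tests the last component of an absolute member path against the glob. */
    boolean matchesName(String memberPath) {
        String name = memberPath.substring(memberPath.lastIndexOf('/') + 1);
        return pattern.matcher(name).matches();
    }

    public static void main(String[] args) {
        NameGlob xhtml = new NameGlob("*.xhtml");
        System.out.println(xhtml.matchesName("/Author Blurbs/Isaac_Asimov.xhtml")); // true
        System.out.println(xhtml.matchesName("/Authors/iasimov.xml"));              // false
    }
}
```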

About Qizx iterators

The Qizx API contains a number of iterators which work differently from java.util.Iterator (e.g. hasNext, next).

In the Qizx API, an iterator always has a moveToNextXXX method which moves the position of the cursor by one item and a getCurrentXXX which returns the item found at current cursor position.

Invoking getCurrentXXX several times without invoking moveToNextXXX is possible and will always return the same item. However, the cursor is initially one position before the first item (if any), so you need to invoke moveToNextXXX at least once before invoking getCurrentXXX.
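The cursor protocol described in this sidebar can be mimicked with a few lines of plain Java. ListCursor below is a hypothetical class written for this illustration; it is not a Qizx type, and its behavior when getCurrent is invoked at an invalid position (it throws IllegalStateException) is a choice made for this sketch, not a statement about the Qizx iterators.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical cursor over a list, in the moveToNextXXX/getCurrentXXX style. */
final class ListCursor<T> {
    private final List<T> items;
    private int position = -1; // one position before the first item

    ListCursor(List<T> items) {
        this.items = items;
    }

    /** Advances the cursor by one item; returns false when no item is left. */
    boolean moveToNext() {
        if (position + 1 < items.size()) {
            ++position;
            return true;
        }
        position = items.size(); // past the last item
        return false;
    }

    /** Returns the item at the cursor; may be invoked repeatedly. */
    T getCurrent() {
        if (position < 0 || position >= items.size()) {
            throw new IllegalStateException("cursor is not positioned on an item");
        }
        return items.get(position);
    }

    public static void main(String[] args) {
        ListCursor<String> cursor =
            new ListCursor<>(Arrays.asList("/Authors", "/Books", "/Publishers"));
        // Same loop shape as the LibraryMemberIterator loop used by class Get.
        while (cursor.moveToNext()) {
            System.out.println(cursor.getCurrent());
        }
    }
}
```

Compared to java.util.Iterator, the test for "is there a next item" and the move are a single call, which is why the loops in this chapter are all written as while (iter.moveToNextXXX()).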

3.1. Compiling and running the code of this lesson

  • Compile class Get by executing ant (see build.xml) in the docs/samples/programming/get/ directory.

  • Run ant run in the docs/samples/programming/get/ directory to make local copies of

    • Document "/Authors/pjfarmer.xml",

    • Documents "/Author Blurbs/Philip*",

    • Documents "/Books/The*.xml",

    • Collection "/Publishers".

    in docs/samples/programming/get/tests/out/.

4. Querying a database

Querying a database (that is, a Library) is fairly easy:

Expression expr = lib.compileExpression(script); // (1)
ItemSequence results = expr.evaluate(); // (2)
while (results.moveToNextItem()) { // (3)
    Item result = results.getCurrentItem();

    /*Do something with result.*/
}

1

First compile an XQuery expression using Library.compileExpression. If no compilation errors (CompilationException) are found, this returns an Expression object.

2

Then evaluate the expression using Expression.evaluate. If no evaluation errors (EvaluationException) are found, this returns the results of the evaluation in the form of an ItemSequence.

3

An ItemSequence lets you iterate over a sequence of Items (see About Qizx iterators). An Item is either an atomic value or an XML Node.

Example (1.xq):

(: Compute and return 2 + 3 :)
2 + 3

evaluates to an ItemSequence containing a single atomic value (5).

Example (3.xq):

(: List all books by their titles. :)
declare namespace t = "http://www.qizx.com/namespace/Tutorial";

collection("/Books")//t:book/t:title

evaluates to an ItemSequence containing several t:title element Nodes.

Warning

For queries to work properly, document imports and updates should first be completed with a commit. Some operations would work even before the commit (like getting the contents of a just imported document), but many operations rely on indexing, and indexing is completed at the time of the commit.

The Query class, which implements a command-line tool used to query a Library, is more complicated than the above code sample because it supports somewhat advanced options.

Excerpts of Query.java:

    private static Expression compileExpression(Library lib, 
                                                String script,
                                                LibraryMember queryRoot,
                                                QName[] varNames,
                                                String[] varValues) 
        throws IOException, QizxException {
        Expression expr;
        try {
            expr = lib.compileExpression(script);
        } catch (CompilationException e) {
            Message[] messages = e.getMessages();
            for (int i = 0; i < messages.length; ++i) {
                error(messages[i].toString());
            }

            throw e;
        }

        if (queryRoot != null)
            expr.bindImplicitCollection(queryRoot); // (1)

        if (varNames != null) {
            for (int i = 0; i < varNames.length; ++i) {
                expr.bindVariable(varNames[i], varValues[i], /*type*/ null); // (2)
            }
        }

        return expr;
    }

1

Expression.bindImplicitCollection lets you write queries containing paths which are not prefixed with collection("XXX") or doc("YYY").

Example (100.xq): after using bindImplicitCollection to bind the expression to collection("/Books"), you can write:

(: List all books by their titles. :)
declare namespace t = "http://www.qizx.com/namespace/Tutorial";

//t:book/t:title

instead of (3.xq):

(: List all books by their titles. :)
declare namespace t = "http://www.qizx.com/namespace/Tutorial";

collection("/Books")//t:book/t:title

2

An XQuery expression can be further parametrized by the use of variables. Example (101.xq):

(: List all books containing the value of variable $searched 
   in their titles. :)
declare namespace t = "http://www.qizx.com/namespace/Tutorial";

declare variable $searched external;

collection("/Books")//t:book/t:title[contains(., $searched)]

Expression.bindVariable gives a variable its value prior to evaluating the expression.
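The benefit of binding a value to an external variable, rather than pasting it into the query text, can be tried without Qizx. The sketch below is only an analogy: it uses the XPath 1.0 engine bundled with the JDK (javax.xml.xpath) instead of the Qizx XQuery engine, and the class BoundVariableSketch is a hypothetical name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** JDK-only analogy of 101.xq; it does not use the Qizx API. */
final class BoundVariableSketch {
    static final String NS = "http://www.qizx.com/namespace/Tutorial";

    /** Returns the text of every t:title containing the bound value. */
    static List<String> titlesContaining(String xml, final String searched)
        throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for the t: prefix to match
        Document doc = dbf.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "t".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return NS.equals(uri) ? "t" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                return NS.equals(uri)
                    ? Collections.singletonList("t").iterator()
                    : Collections.<String>emptyList().iterator();
            }
        });
        // The value is supplied out of band, never spliced into the query text.
        xpath.setXPathVariableResolver(
            name -> "searched".equals(name.getLocalPart()) ? searched : null);

        NodeList titles = (NodeList) xpath.evaluate(
            "//t:book/t:title[contains(., $searched)]",
            doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < titles.getLength(); ++i) {
            result.add(titles.item(i).getTextContent());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<books xmlns='" + NS + "'>"
            + "<book><title>The Robots of Dawn</title></book>"
            + "<book><title>Dune</title></book></books>";
        System.out.println(titlesContaining(xml, "Robots"));
    }
}
```

Because the searched text never becomes part of the expression source, a value containing quotes or XPath syntax cannot change the meaning of the query; the same reasoning applies to Expression.bindVariable.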

Some queries may return thousands of results. Therefore, displaying just a range of results (e.g. from result #100 to result #199 inclusive) is a very common need.

    private static void evaluateExpression(Expression expr, 
                                           int from, int limit) 
        throws QizxException {
        ItemSequence results = expr.evaluate();
        if (from > 0) {
            results.skip(from); // (1)
        }

        XMLSerializer serializer = new XMLSerializer();
        serializer.setIndent(2);

        int count = 0;
        while (results.moveToNextItem()) {
            Item result = results.getCurrentItem();

            System.out.print("[" + (from+1+count) + "] ");
            showResult(serializer, result);
            System.out.println();

            ++count;
            if (count >= limit) // (2)
                break;
        }
        System.out.flush();
    }

1

ItemSequence.skip quickly skips the specified number of Items.

2

This being done, you still need to limit the number of Items you are going to display.
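The skip-then-limit logic can be checked in isolation with an ordinary java.util.Iterator. This is a sketch under stated assumptions: ResultWindow is a hypothetical helper, from is a 0-based offset as in evaluateExpression above, and the linear skipping loop merely stands in for ItemSequence.skip.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Illustrative only: ResultWindow is a hypothetical helper, not a Qizx class. */
final class ResultWindow {
    private ResultWindow() {}

    /** Returns at most limit items, starting after the first from items. */
    static <T> List<T> window(Iterator<T> results, int from, int limit) {
        // Step 1: skip the items located before the requested range.
        for (int i = 0; i < from && results.hasNext(); ++i) {
            results.next();
        }
        // Step 2: collect items until the limit is reached or none is left.
        List<T> page = new ArrayList<>();
        while (page.size() < limit && results.hasNext()) {
            page.add(results.next());
        }
        return page;
    }

    public static void main(String[] args) {
        List<Integer> all = new ArrayList<>();
        for (int i = 0; i < 250; ++i) {
            all.add(i);
        }
        List<Integer> page = window(all.iterator(), 100, 100);
        System.out.println(page.get(0) + ".." + page.get(page.size() - 1)); // 100..199
    }
}
```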

In this lesson, we'll just show how to print the string representation of an Item. In lesson 5, we'll go further and explore the data model of Qizx.

    private static void showResult(XMLSerializer serializer,
                                   Item result) 
        throws QizxException {
        if (!result.isNode()) { // (1)
            System.out.println(result.getString()); // (2)
            return;
        }
        Node node = result.getNode(); // (3)

        serializer.reset();
        String xmlForm = serializer.serializeToString(node); // (4)
        System.out.println(xmlForm);
    }

1, 3

Item.isNode returns true for a Node and false for an atomic value. Similarly, Item.getNode returns a Node when the Item actually is a Node and null when the Item is an atomic value.

2

Item.getString returns the string value of an Item (whether Node or atomic value). The precise definition of the string value of an Item is given in the XQuery standard.

4

The XMLSerializer.serializeToString convenience method is used to obtain the string representation of a Node.

4.1. Compiling and running the code of this lesson

  • Compile class Query by executing ant (see build.xml) in the docs/samples/programming/query/ directory.

  • Run ant run in the docs/samples/programming/query/ directory to perform this query:

    (: Find all books written by French authors. :)
    declare namespace t = "http://www.qizx.com/namespace/Tutorial";
    
    for $a in collection("/Authors")//t:author[@nationality = "France"]
        for $b in collection("/Books")//t:book[.//t:author = $a/t:fullName]
        return 
            $b/t:title

    Note that directory docs/samples/book_queries/ contains all the queries needed to illustrate this lesson and also the following ones. You can execute all these queries by running ant run_all in docs/samples/programming/query/.

5. Deleting Documents and Collections

Class Delete implements a command-line tool that deletes one or more Documents or Collections. If no Document or Collection paths are specified as command-line arguments, the tool deletes the whole Library.

Excerpts of Delete.java:

        if (args.length == 2) {
            verbose("Deleting library '" + libName + "'...");
            if (!libManager.deleteLibrary(libName)) { // (1)
                warning("Library '" + libName + "' not found");
            }
            libManager.closeAllLibraries(10000 /*ms*/);
        } else {
            Library lib = libManager.openLibrary(libName, /*user*/ null);

            try {
                for (int i = 2; i < args.length; ++i) {
                    String path = args[i];

                    verbose("Deleting member '" + path + "' of library '" + 
                            libName + "'...");
                    if (!lib.deleteMember(path)) { // (2)
                        warning("Member '" + path + "' of library '" + 
                                libName + "' not found");
                    }
                }

                verbose("Committing changes...");
                lib.commit();
            } finally {
                shutdown(lib, libManager);
            }
        }

1

LibraryManager.deleteLibrary is used to delete a Library. Note that the commit method is not invoked in this case.

2

Library.deleteMember is used to delete a LibraryMember (Document or Collection). Collections are recursively deleted.

How to delete a LibraryManager

Because there is no LibraryManager.delete method, the only way to physically destroy a LibraryManager is to first "close" it using LibraryManager.closeAllLibraries, and then delete its storage directory (obtained using LibraryManager.getStorageDirectory).
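The second half of this procedure, deleting the storage directory, is plain Java. The following sketch assumes that closeAllLibraries has already been invoked; StorageDirs and deleteRecursively are hypothetical names, not part of the Qizx samples.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Illustrative only: StorageDirs is a hypothetical helper, not a Qizx class. */
final class StorageDirs {
    private StorageDirs() {}

    /**
     * Deletes dir and everything below it. Assumption: the LibraryManager
     * owning this directory has been "closed" before this method is invoked.
     */
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return; // nothing to delete
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directory.
            paths = walk.sorted(Comparator.reverseOrder())
                        .collect(Collectors.toList());
        }
        for (Path path : paths) {
            Files.delete(path);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java StorageDirs storage_dir");
            return;
        }
        deleteRecursively(Paths.get(args[0]));
    }
}
```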

5.1. Compiling and running the code of this lesson

  • Compile class Delete by executing ant (see build.xml) in the docs/samples/programming/delete/ directory.

  • Run ant run in the docs/samples/programming/delete/ directory to delete Document "/Authors/ktrout.xml"[9].

6. Modifying a Document stored in a database

In Qizx/db, updating a document basically consists of replacing its contents in its entirety.

Note

Higher-level update operations are not currently available, but support for XQuery Update is planned as a top priority and should be available in upcoming versions. XQuery Update is an extension of XQuery allowing insertions, deletions and updates on selected nodes.

In any case, executing an XQuery Update script will still replace the entire document with a new version. This is a deliberate design choice allowing faster queries.

So the strategy we'll use to do the job is:

  1. Find the Document to be modified by performing a query.

  2. Convert the document found to a W3C DOM Document. This step is needed because the DOM[10] of Qizx is immutable. For example, you'll find a Node.getAttribute method, but no Node.setAttribute method.

  3. Modify the W3C DOM Document.

  4. Replace the content of the Document stored in the Library by the content of the W3C DOM Document.
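Step 3 involves only the standard W3C DOM API, so it can be sketched without Qizx. The code below is an illustration under stated assumptions: PseudonymSketch and addPseudonym are hypothetical names (the real work is done by methods of Edit.java), the document is parsed from a string instead of being converted from a Qizx Node, and a missing t:pseudonyms element is simply appended at the end of t:author.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Illustrative only: PseudonymSketch is not part of Edit.java. */
final class PseudonymSketch {
    static final String NS = "http://www.qizx.com/namespace/Tutorial";

    /** Parses a namespace-aware W3C DOM Document from a string. */
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    /** Adds a t:pseudonym element; returns false if it is already present. */
    static boolean addPseudonym(Document doc, String pseudonym) {
        Element author = doc.getDocumentElement();
        NodeList lists = author.getElementsByTagNameNS(NS, "pseudonyms");
        Element pseudonyms;
        if (lists.getLength() > 0) {
            pseudonyms = (Element) lists.item(0);
        } else {
            // Assumption: the position of t:pseudonyms inside t:author is free.
            pseudonyms = doc.createElementNS(NS, "pseudonyms");
            author.appendChild(pseudonyms);
        }
        NodeList existing = pseudonyms.getElementsByTagNameNS(NS, "pseudonym");
        for (int i = 0; i < existing.getLength(); ++i) {
            if (pseudonym.equals(existing.item(i).getTextContent().trim())) {
                return false;
            }
        }
        Element added = doc.createElementNS(NS, "pseudonym");
        added.setTextContent(pseudonym);
        pseudonyms.appendChild(added);
        return true;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
            "<author xmlns='" + NS + "'><fullName>Isaac Asimov</fullName>"
            + "<pseudonyms><pseudonym>Paul French</pseudonym></pseudonyms>"
            + "</author>");
        System.out.println(addPseudonym(doc, "George E. Dale")); // true
        System.out.println(addPseudonym(doc, "Paul French"));    // false
    }
}
```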

Unlike the Put, Get and Delete classes, which implement generic command-line tools, the Edit class is specific to the dataset used to illustrate this tutorial. The Edit class adds a pseudonym to an author. The author is found by her/his full name, and not by the path of the Document containing her/his record.

Excerpts of Edit.java:

        Node author = findAuthor(lib, collectionPath, authorName); // (1)
        if (author == null)
            return;

        if (hasPseudonym(author, pseudonym)) { // (2)
            warning("'" + authorName + "' already has pseudonym '" + 
                    pseudonym + "'");
            return;
        }

        org.w3c.dom.Document doc = 
            (org.w3c.dom.Document) author.getDocumentNode() /* (3) */ .getObject(); // (4)
        if (!doAddPseudo(doc, pseudonym)) // (5)
            return;

        XMLPushStream out = 
            lib.beginImportDocument(author.getLibraryDocument() /* (6) */ .getPath()); // (7)

        DOMToPushStream helper = new DOMToPushStream(lib, out); // (8)
        helper.putDocument(doc);
        lib.endImportDocument();

1

The findAuthor method finds a t:author element by the content of its t:fullName child element. Lesson 3 explained how to query a database, so there is nothing new here:

    private static Node findAuthor(Library lib, String collectionPath,
                                   String authorName) 
        throws QizxException {
        Collection collection = lib.getCollection(collectionPath);
        if (collection == null) {
            error("'" + collectionPath + "' is not a collection");
            return null;
        }

        String script = 
            "declare namespace t = '" + TUTORIAL_NS_URI + "';\n" +
            "declare variable $name external;\n" +
            "/t:author[t:fullName = $name]";

        Expression expr = lib.compileExpression(script);
        expr.bindImplicitCollection(collection);
        expr.bindVariable(lib.getQName("name"), authorName, /*type*/ null);

        ItemSequence items = expr.evaluate();
        if (!items.moveToNextItem()) {
            error("Don't find author '" + authorName + "'");
            return null;
        }
        Item item = items.getCurrentItem();

        return item.getNode();
    }

2

The hasPseudonym method is detailed below.

3

Method Node.getDocumentNode is used to access the document Node containing the t:author element Node previously found by the findAuthor method.

4

Method Item.getObject converts an Item to an equivalent Java Object. In the case of a com.qizx.api.Node, this equivalent is an org.w3c.dom.Node.

5

The doAddPseudo method adds a t:pseudonym descendant to the t:author element using the org.w3c.dom API, which has been part of standard Java™ since version 1.4.

6

We now need to access the Document, that is, the LibraryMember, containing the t:author element Node. Method Node.getLibraryDocument returns this information. Do not confuse it with Node.getDocumentNode, which returns the outermost ancestor Node of a Node.

7 8

Library.beginImportDocument, Library.endImportDocument and the com.qizx.api.util.DOMToPushStream helper class let you import a W3C DOM Document into a Library. This has already been explained in lesson 1.
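The doAddPseudo method itself is not part of the excerpts. As an illustration of step 3 (modifying the W3C DOM Document), here is a minimal sketch of what such a method could look like. It is not the code of Edit.java: the PseudonymEditor class, the choice to wrap a lone t:pseudonym in a new t:pseudonyms container, and the position of the added elements are all assumptions. It also assumes that the DOM is namespace-aware and that t:author is the document element:

```java
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch of step 3; the real Edit.doAddPseudo is not shown in this chapter.
final class PseudonymEditor {
    // Namespace declared by the tutorial queries.
    static final String NS = "http://www.qizx.com/namespace/Tutorial";

    private PseudonymEditor() {}

    /** Adds a pseudonym to the t:author document element; returns false if there is none. */
    static boolean addPseudonym(Document doc, String pseudonym) {
        Element author = doc.getDocumentElement();
        if (author == null || !NS.equals(author.getNamespaceURI())
            || !"author".equals(author.getLocalName())) {
            return false;
        }

        Element added = doc.createElementNS(NS, qualify(author, "pseudonym"));
        added.setTextContent(pseudonym);

        Element container = firstChild(author, "pseudonyms");
        Element single = firstChild(author, "pseudonym");
        if (container != null) {
            container.appendChild(added);
        } else if (single != null) {
            // Assumed layout: two or more pseudonyms are grouped in t:pseudonyms.
            container = doc.createElementNS(NS, qualify(author, "pseudonyms"));
            author.replaceChild(container, single);
            container.appendChild(single);
            container.appendChild(added);
        } else {
            author.appendChild(added);
        }
        return true;
    }

    // Reuses the prefix of the t:author element for the new element names.
    private static String qualify(Element author, String localName) {
        String prefix = author.getPrefix();
        return (prefix == null) ? localName : prefix + ":" + localName;
    }

    // Returns the first child element of parent having the given local name, or null.
    private static Element firstChild(Element parent, String localName) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                && NS.equals(n.getNamespaceURI())
                && localName.equals(n.getLocalName())) {
                return (Element) n;
            }
        }
        return null;
    }
}
```

Unlike the Qizx DOM, the W3C DOM is mutable, which is why calls such as appendChild and replaceChild are available here.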

The hasPseudonym method is a simple example of using the Qizx DOM. It searches for its pseudonym argument inside a t:author/t:pseudonyms/t:pseudonym element (author having multiple pseudonyms) or inside a t:author/t:pseudonym element (author having a single pseudonym):

    private static boolean hasPseudonym(Node element, String pseudonym) 
        throws QizxException {
        Node child = element.getFirstChild();1
        while (child != null) {
            if (child.isElement()) {2
                String childName = child.getNodeName().getLocalPart();3
                if ("pseudonyms".equals(childName)) {
                    return hasPseudonym(child, pseudonym);
                } else if ("pseudonym".equals(childName)) {
                    if (pseudonym.equals(child.getStringValue())) {
                        return true;
                    }
                }
            }

            child = child.getNextSibling();4
        }

        return false;
    }

1 4

The Node.getFirstChild and Node.getNextSibling methods let you iterate over the children of an element or document Node.

Attributes are represented by Nodes too, but are not considered to be children of element Nodes. Attributes are accessed using the Node.getAttribute, Node.getAttributeCount, Node.getAttributes methods.

2

Nodes are not typed. That is, there are no Element, Attribute, Comment, etc, objects. The same Node object is used to represent an element, an attribute, a comment, a processing instruction, a text node or a document.

Method Node.getNodeNature returns the kind of a Node. Node.isElement is just a convenience method.

Methods such as Node.getNodeName, Node.getAttribute, etc, return values depending on the kind of the subject Node. For example, Node.getAttribute returns null for all kinds of Nodes, except for element Nodes.

3

An element Node has a name which is returned by the Node.getNodeName method, as called in the code above. In Qizx, an XML name is represented by a com.qizx.api.QName[11] object, and not by a String or a pair of Strings like in the W3C DOM.

A new QName object is obtained using ItemFactory.getQName. A Library extends the ItemFactory interface. Therefore, a QName is generally obtained from a Library.

6.1. Compiling and running the code of this lesson

  • Compile class Edit by executing ant (see build.xml) in the docs/samples/programming/edit/ directory.

  • Run ant run in the docs/samples/programming/edit/ directory to add pseudonym "Kilgore Trout" to author "Philip José Farmer" [9].

7. Customizing the indexing of XML content

7.1. Re-indexing a Library

Query 20.xq:

(: Find all authors born after 1945 (e.g. Lois McMaster Bujold). :)
declare namespace t = "http://www.qizx.com/namespace/Tutorial";

collection("/")//t:author[t:birthDate > xs:date("1945-01-01Z")]/t:fullName

gives no result because the t:birthDate element is not indexed as an xs:date[12]. The cause of this problem is that the element contains a date in a locale-specific format (example: November 2, 1949) rather than in the standard xs:date format (example: 1949-11-02).

This is a case where we need to specify a custom indexing: on the t:birthDate element, a specific string-to-date converter based on the predefined class com.qizx.api.util.text.FormatDateSieve has to be used.

In Qizx/db, custom indexing is defined through an "Indexing Specification", which is written in XML. The syntax and semantics of indexing specifications are described in great detail in Chapter 5, Configuring the indexing process.

The indexing specification we will use is in the file indexing.xml:

<indexing xmlns:t="http://www.qizx.com/namespace/Tutorial">
  <!-- Default rules -->1

  <element as="numeric+string"/>
  <element as="date+string" />
  <element as="string" />

  <attribute as="numeric+string" />
  <attribute as="date+string" />
  <attribute as="string" />

  <!-- Custom rules -->

  <element name="t:birthDate" context="t:author" 
           as="date" sieve="com.qizx.api.util.text.FormatDateSieve" 
           format="MMMM d, yyyy" locale="en-US" timezone="GMT" />2

  <element name="t:publicationDate" context="t:book" 
           as="numeric" sieve="RomanNumberSieve" />3
</indexing>

1

Including the default rules before your custom rules is mandatory. If you don't do that, the Library is re-indexed with just the custom rules, which means that many queries will not work.

2

This custom rule specifies that a FormatDateSieve with a US "MMMM d, yyyy" format is to be used to index the content of t:author/t:birthDate elements.

3

More about this other custom rule in Section 7.2, “Writing a custom Indexing.NumberSieve”.
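To see what the "MMMM d, yyyy" pattern of the t:birthDate rule accepts, here is a small stand-alone sketch. It assumes that the format attribute follows the pattern conventions of java.text.SimpleDateFormat, which this chapter does not state explicitly. The BirthDatePatternDemo class is purely illustrative and does not use FormatDateSieve itself:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Illustration only: assumes FormatDateSieve follows SimpleDateFormat pattern conventions.
final class BirthDatePatternDemo {
    private BirthDatePatternDemo() {}

    /** Parses a US-style date such as "November 2, 1949" and returns it as yyyy-MM-dd. */
    static String toIsoDate(String text) throws ParseException {
        TimeZone gmt = TimeZone.getTimeZone("GMT");

        // Same pattern, locale and time zone as in the indexing rule.
        SimpleDateFormat in = new SimpleDateFormat("MMMM d, yyyy", Locale.US);
        in.setTimeZone(gmt);
        in.setLenient(false);
        Date date = in.parse(text.trim());

        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd", Locale.US);
        out.setTimeZone(gmt);
        return out.format(date);
    }
}
```

With this sketch, toIsoDate("November 2, 1949") returns "1949-11-02", a value that can be compared with xs:date("1945-01-01Z") as query 20.xq does. A string that does not match the pattern causes a ParseException.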

The ReIndex class implements a command-line tool that changes the indexing specification of a Library and then re-indexes this Library.

        Library lib = libManager.openLibrary(libName, /*user*/ null);

        try {
            verbose("Loading indexing specifications from '" + 
                    indexingFile + "'...");
            Indexing indexing = loadIndexing(indexingFile);1
            lib.setIndexing(indexing);2

            verbose ("Re-indexing library '" + libName + "'...");
            lib.reIndex();3
        } finally {
            shutdown(lib, libManager);
        }

1

The Indexing specification is simply loaded from an XML file by using the Indexing.parse method:

    private static Indexing loadIndexing(File file) 
        throws IOException, SAXException, QizxException {
        Indexing indexing = new Indexing();

        String systemId = file.toURI().toASCIIString();
        indexing.parse(new InputSource(systemId));

        return indexing;
    }

Alternatively, it is possible to programmatically create an Indexing object by invoking methods such as Indexing.addAttributeRule, Indexing.addElementRule, etc.

2

Library.setIndexing changes the indexing specifications of a Library, but does not automatically re-index the Library.

3

Library.reIndex re-indexes a Library. This may take from a few seconds to several hours depending on the size of the Library.

Note that there is no need to invoke Library.commit after reIndex.

7.2. Writing a custom Indexing.NumberSieve

This time, query 21.xq

(: Find all books published before 1960 (e.g. The Caves of Steel). :)
declare namespace t = "http://www.qizx.com/namespace/Tutorial";

collection("/")//t:book[t:publicationDate < 1960]/t:title

gives no result because the t:publicationDate element is not indexed as a number[12]. The reason for this problem is that the element contains a year written in Roman numerals (example: "MCMLIV" = 1954).

The predefined string-to-number converter, com.qizx.api.util.text.FormatNumberSieve, is very flexible but not to the point of converting Roman numeral year dates to numbers. Therefore the only way to solve the problem is:

  1. To write a custom string-to-number converter (called a sieve in Qizx parlance), that is, to implement interface Indexing.NumberSieve.

  2. To properly declare this custom sieve in indexing.xml, our custom indexing specification:

      <element name="t:publicationDate" context="t:book" 
               as="numeric" sieve="RomanNumberSieve" />

  3. To make sure that the code of our custom sieve is referenced in the CLASSPATH.
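For the third step, a quick way to confirm that a class name is visible on the CLASSPATH is to try to load it by name. The sketch below is a generic Java check. The ClasspathCheck class is a hypothetical helper, and how Qizx itself locates the class named in the sieve attribute is not described here:

```java
// Hypothetical pre-flight check, not part of the Qizx API.
final class ClasspathCheck {
    private ClasspathCheck() {}

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: locate the class without running its static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

For example, calling isLoadable("RomanNumberSieve") before re-indexing tells you whether the class named in indexing.xml can be found by the JVM that runs the check.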

Excerpts of RomanNumberSieve.java:

public final class RomanNumberSieve implements Indexing.NumberSieve {
    ...
    public double convert(String text) {1
        double converted = 0;

        char[] chars = text.trim().toUpperCase().toCharArray();
        int maxSymbolValue = -1;

        for (int j = chars.length-1; j >= 0; --j) {
            char c = chars[j];

            Symbol symbol = null;
            for (int i = 0; i < SYMBOLS.length; ++i) {
                if (SYMBOLS[i].symbol == c) {
                    symbol = SYMBOLS[i];
                    break;
                }
            }
            if (symbol == null) {
                return Double.NaN;
            }

            if (symbol.value >= maxSymbolValue) {
                // Example: second "M" in "MCMXC" (1990).
                maxSymbolValue = symbol.value;
                converted += maxSymbolValue;
            } else {
                // Example: first "C" in "MCMXC" (1990).
                converted -= symbol.value;
            }
        }

        return converted;
    }

    public void setParameters(String[] parameters) {}2
    public String[] getParameters() { return null; }
    ...
}

1

An Indexing.NumberSieve basically converts a String to a double. It should return Double.NaN when the conversion fails.

2

Like all Indexing.Sieves, an Indexing.NumberSieve can be parametrized. This feature is not useful in the case of RomanNumberSieve.
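The excerpt above omits the SYMBOLS table. To experiment with the right-to-left conversion outside of Qizx, the following stand-alone sketch applies the same add-or-subtract rule. The RomanNumeralDemo class and its switch-based symbol lookup are assumptions made for illustration and are not the contents of RomanNumberSieve.java:

```java
import java.util.Locale;

// Stand-alone illustration of the right-to-left rule used by convert() above.
// The symbol lookup is an assumption; the SYMBOLS array of RomanNumberSieve is not shown.
final class RomanNumeralDemo {
    private RomanNumeralDemo() {}

    // Returns the value of a Roman numeral symbol, or -1 if c is not one.
    private static int valueOf(char c) {
        switch (c) {
            case 'I': return 1;
            case 'V': return 5;
            case 'X': return 10;
            case 'L': return 50;
            case 'C': return 100;
            case 'D': return 500;
            case 'M': return 1000;
            default:  return -1;
        }
    }

    /** Converts a Roman numeral to a double, or returns Double.NaN on an unknown symbol. */
    static double convert(String text) {
        double converted = 0;
        int maxSymbolValue = -1;
        char[] chars = text.trim().toUpperCase(Locale.ROOT).toCharArray();

        // Scan from the last symbol to the first.
        for (int j = chars.length - 1; j >= 0; --j) {
            int value = valueOf(chars[j]);
            if (value < 0) {
                return Double.NaN;
            }
            if (value >= maxSymbolValue) {
                // At least as large as anything seen so far: add it.
                maxSymbolValue = value;
                converted += value;
            } else {
                // Smaller than a symbol to its right: subtract it.
                converted -= value;
            }
        }
        return converted;
    }
}
```

With this sketch, convert("MCMLIV") returns 1954.0 and convert("MCMXC") returns 1990.0, while a string such as "1954" yields Double.NaN because '4' is not a Roman numeral symbol.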

7.3. Compiling and running the code of this lesson

  • Compile class ReIndex by executing ant (see build.xml) in the docs/samples/programming/reindex/ directory.

  • Run ant run in the docs/samples/programming/reindex/ directory to re-index the "Tutorial" Library using indexing.xml, our customized indexing specification.

  • Run ant run2 in the docs/samples/programming/query/ directory to check that the 20.xq and 21.xq queries now return the expected results.

8. Adding metadata to Documents

A LibraryMember (a Collection or a Document) has not only content, but also properties. Properties are also explained in the chapter Getting Started.

A property has a name (String) and a value (any Object implementing java.io.Serializable).

Qizx automatically adds a few system properties to all LibraryMembers. The most useful system properties are:

nature

The nature of the LibraryMember: "collection" or "document".

path

The absolute path of the LibraryMember. Example: "/Author Blurbs/Philip_Jose_Farmer.xhtml".

But the real benefit of supporting properties is to allow an application to attach private information to a LibraryMember.
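Because a property value must implement java.io.Serializable, it can be worth checking that an application-specific value really survives Java serialization before attaching it to a LibraryMember. The sketch below is a generic check. The PropertyValueCheck class is hypothetical, and whether Qizx stores property values using standard Java serialization is an assumption:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper, not part of the Qizx API.
final class PropertyValueCheck {
    private PropertyValueCheck() {}

    /**
     * Serializes a candidate property value and reads it back.
     * Throws java.io.NotSerializableException if any part of it is not serializable.
     */
    static Object roundTrip(Serializable value)
        throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        }
    }
}
```

A String or a java.util.Date passes this check. A java.util.ArrayList passes only if all of its elements are themselves serializable, even though ArrayList implements Serializable.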

The AddMeta class implements a very specific command-line tool which adds metadata[13] to Documents stored in the "/Author Blurbs" Collection. Remember that the Documents stored in that Collection are copies of articles found on