1.2. Element copyDocument

<copyDocument
  to = Path
  selection = boolean : false
  preserveInclusions = boolean : false
  filterDuplicateIDs = boolean : true
  saveCharsAsEntityRefs = boolean : false
  indent = boolean : false  
  encoding = (ISO-8859-1|ISO-8859-13|ISO-8859-15|ISO-8859-2|
              ISO-8859-3|ISO-8859-4|ISO-8859-5|ISO-8859-7|
              ISO-8859-9|KOI8-R|MacRoman|US-ASCII|UTF-16|UTF-8|
              Windows-1250|Windows-1251|Windows-1252|Windows-1253|
              Windows-1257) : UTF-8

>
  Content: [ extract ]* [ resources ]*
</copyDocument>

<extract
  xpath = Absolute XPath (subset)
  dataType = anyURI|hexBinary|base64Binary|XML
  toDir = Path
  baseName = File basename without an extension
  extension = file name extension
>
  <processingInstruction
    target = Name
    data = string
  /> |
  <attribute
    name = QName
    value = string
  /> | any element
</extract>

<resources
  include = NMTOKENS
  exclude = NMTOKENS
  match = Regexp pattern
  resolve = boolean : false
  copyTo = Path
  referenceAs = anyURI
/>

Copies the document being edited to the location specified by the required attribute to.

1.2.1. Attributes

to

Specifies the file where the document (or the node selection) is to be copied.

selection

If this attribute is specified with value true and if an element is explicitly selected, this element is saved to the specified location.

If multiple nodes are explicitly selected, their parent element is saved and a special processing-instruction <?select-child-nodes>, specifying which nodes are selected, is added to the root element of the saved document.

Example: the user has selected the paragraphs containing 2, 3 and 4:

<div>
  <?select-child-nodes 3-5?>
  <p>1</p>
  <p>2</p>
  <p>3</p>
  <p>4</p>
</div>

In the above example, 3-5 is a node range intended to be tested using position(), the XPath built-in function. See Section 2.1, “Convert explicitly or implicitly selected para to a formalpara” below to learn how to handle such a multiple node selection in the XSLT style sheet.

Otherwise, the whole document is saved to the specified location.
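
As a rough illustration of how a style sheet might use this node range (a sketch only, not the style sheet described in Section 2.1), the following XSLT 1.0 fragment reads the data of the processing instruction and copies the selected child nodes. It assumes that the range always has the first-last form shown above and that whitespace-only text nodes are not counted, hence the xsl:strip-space declaration. The wrapper element named selected is hypothetical.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"/>

  <!-- Assumption: whitespace-only text nodes must not shift positions. -->
  <xsl:strip-space elements="*"/>

  <!-- Matches the root element of the saved document. -->
  <xsl:template match="/*">
    <!-- Data of the processing instruction, for example "3-5". -->
    <xsl:variable name="range"
      select="normalize-space(processing-instruction('select-child-nodes'))"/>
    <xsl:variable name="first"
      select="number(substring-before($range, '-'))"/>
    <xsl:variable name="last"
      select="number(substring-after($range, '-'))"/>

    <!-- Hypothetical wrapper element holding the selected nodes. -->
    <selected>
      <xsl:copy-of select="node()[position() &gt;= $first and
                                  position() &lt;= $last]"/>
    </selected>
  </xsl:template>
</xsl:stylesheet>
```

With the div example above, the processing instruction itself is child node 1, so positions 3 to 5 correspond to the paragraphs containing 2, 3 and 4.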

preserveInclusions

If this attribute is specified with value true, the generated XML file contains

  • references to external entities,

  • transclusion directives (e.g. XInclude).

Otherwise (default value),

  • references are replaced by the contents of the external entities,

  • transclusion directives (e.g. XInclude) are replaced by transcluded contents.

filterDuplicateIDs

Ignored unless preserveInclusions is set to false, that is, ignored unless transclusion directives have been replaced by the transcluded contents in the generated XML file.

If this attribute is specified with value true (default value), an attempt is made to remove duplicate ID errors resulting from the presence of transcluded contents. This is done by adding a unique automatically generated suffix to these “false” duplicate IDs.

saveCharsAsEntityRefs

If this attribute is specified with value true, the generated XML file contains references to character entities such as &eacute; (if needed and if such entities are defined in the DTD of the document being edited).

Otherwise, the generated XML file contains character references such as &#233; (if needed).
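
For illustration, assume that the document contains the character é, that its DTD defines the entity eacute, and that the chosen encoding (for example US-ASCII) cannot represent this character directly. Reading “if needed” as “when the encoding cannot represent the character” is an assumption made here. Under these assumptions, the two settings would be expected to produce output along these lines:

```xml
<!-- saveCharsAsEntityRefs="true" -->
<para>caf&eacute;</para>

<!-- saveCharsAsEntityRefs="false" (default) -->
<para>caf&#233;</para>
```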

indent

If this attribute is specified with value true, the generated XML file is indented.

Otherwise, the generated XML file is not indented.

encoding

Specifies the encoding of the generated XML file.
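
The attributes described above can be combined freely. The following fragment is only a sketch: the file name __doc.xml and the chosen values are examples, not recommended settings.

```xml
<!-- Saves the explicitly selected element (or the whole document if
     nothing is explicitly selected), replaces inclusions by their
     contents, and writes indented ISO-8859-1 output that uses
     entity references such as &eacute; where needed. -->
<copyDocument to="__doc.xml"
              selection="true"
              preserveInclusions="false"
              saveCharsAsEntityRefs="true"
              indent="true"
              encoding="ISO-8859-1" />
```

Attribute filterDuplicateIDs is left at its default value (true) here, which is meaningful because preserveInclusions is false.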

1.2.2. Element extract

<extract
  xpath = Absolute XPath (subset)
  dataType = anyURI|hexBinary|base64Binary|XML
  toDir = Path
  baseName = File basename without an extension
  extension = File name extension
>
  <processingInstruction
    target = Name
    data = string
  /> |
  <attribute
    name = QName
    value = string
  /> |
  any element
</extract>

The extract element is designed to ease the writing of XSLT style sheets that need to transform XML documents in which binary images (TIFF, PNG, etc.) or XML images (typically SVG) are embedded.

In order to do this, the extract element copies the image data found in the element or the attribute specified by attribute xpath to a file created in the directory specified by attribute toDir.

The name of the image is automatically generated by extract. However, attributes baseName and extension may be used to parametrize to a certain extent the generation of the image file name.

Now the question is: how does the XSLT style sheet know about the “extracted” image files? The extract element offers three options:

  • Replace the element containing image data by the one specified as a child element of extract.

    If xpath selects an attribute instead of an element, the element containing the selected attribute is replaced.

    DocBook example: replace embedded svg:svg (allowed in "-//OASIS//DTD DocBook SVG Module V1.0//EN") by much simpler imagedata:

    <cfg:extract xmlns="" xpath="//imageobject/svg:svg" toDir="raw">
      <imagedata fileref="resources/{$url.rootName}.png" />
    </cfg:extract>

  • OR, replace the element containing image data by the attribute which is specified using the attribute child element of extract. This attribute is added to the parent element of the element containing image data.

    If xpath selects an attribute instead of an element, the element containing the selected attribute is replaced.

    DocBook 5 example: replace embedded db5:imagedata/svg:svg by db5:imagedata/@fileref:

    <cfg:extract xmlns=""
                 xmlns:db5="http://docbook.org/ns/docbook"
                 xmlns:svg="http://www.w3.org/2000/svg"
                 xpath="//db5:imagedata/svg:svg" toDir="raw" >
      <cfg:attribute name="fileref" 
                     value="resources/{$url.rootName}.png" />
    </cfg:extract>

  • OR, as a more general approach, insert a processing instruction (which is specified using the processingInstruction child element of extract) at the beginning of the element from which data has been extracted.

    If xpath selects an attribute instead of an element, the processing instruction is inserted in the element containing the selected attribute.

    Example: insert <?extracted extracted_file_name?> in imgd:image_ab and imgd:image_eb:

    <extract xpath="//imgd:image_ab/@data | //imgd:image_eb" toDir="raw">
      <processingInstruction target="extracted" 
                             data="resources/{$url.rootName}.png" />
    </extract>

The replacement element (attribute values or text nodes in the element or in any of its descendants) and the inserted processing instruction (target and data) can reference the following variables, which are substituted by their values during the extraction step:

Variables:

{$file.path}

Pathname of the extracted image file. Example: "/tmp/xxe1234/book_image_3.svg".

{$file.parent}

Pathname of the directory containing the extracted image file. Example: "/tmp/xxe1234/".

{$file.name}

Name of the extracted image file. Example: "book_image_3.svg".

{$file.rootName}

Name of the extracted image file, but without an extension. Example: "book_image_3".

{$file.extension}

Extension of the extracted image file name. Example: "svg".

{$file.separator}

Native path component separator of the platform. Example: '\' on Windows.

{$url}

URL of the extracted image file. Example: "file:///tmp/xxe1234/book_image_3.svg".

[Note]

Unlike {$file.XXX} variables, the values of {$url.XXX} variables are escaped if needed.

{$url.parent}

URL of the directory containing the extracted image file. Example: "file:///tmp/xxe1234". Note that this URL does not end with a '/'.

{$url.name}

Name of the extracted image file. Example: "book_image_3.svg".

{$url.rootName}

Name of the extracted image file, but without an extension. Example: "book_image_3".

{$url.extension}

Extension of the extracted image file name. Example: "svg".

In fact, any XPath expression (full XPath 1.0, not just the subset used in attribute xpath), not only variable references, can be put between curly braces (example: {./@id}). Such XPath expressions are evaluated as strings in the context of the element selected by attribute xpath. If attribute xpath selects an attribute, its parent element is used as an evaluation context for the XPath expression.
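
As a sketch of such an expression, the following variant of the DocBook example above copies the id attribute of the replaced svg:svg element to the replacement imagedata element. It assumes that the cfg prefix is bound as in the other examples and that the embedded svg:svg elements actually carry an id attribute. If they do not, {./@id} evaluates to the empty string.

```xml
<cfg:extract xmlns=""
             xmlns:svg="http://www.w3.org/2000/svg"
             xpath="//imageobject/svg:svg" toDir="raw">
  <!-- {./@id} is evaluated in the context of the selected svg:svg. -->
  <imagedata id="{./@id}"
             fileref="resources/{$url.rootName}.png" />
</cfg:extract>
```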

Attributes:

xpath

Selects elements and attributes containing the image data to be extracted.

This XPath expression must conform to the XPath subset needed to implement W3C XML Schemas, except that absolute paths are allowed in addition to relative paths.

dataType

Specifies how the image data is “stored” in the elements or the attributes selected by the above XPath expression: anyURI, hexBinary, base64Binary or XML. The data type cannot be guessed for documents conforming to a DTD or for documents not constrained by a grammar, so it should be specified explicitly in these cases.

Default: find the data type using the grammar of the document being processed.

toDir

Specifies the directory where extracted image files are to be created. Relative directories are relative to the temporary directory created during the execution of the process (that is, %W).

Default: use the temporary directory created during the execution of the process (that is, %W).

baseName

Specifies the start of the extracted image file names. An automatically generated part is always added after this user prefix.

Default: the base name of an extracted image file is automatically generated in its entirety.

extension

Specifies which extension to use for extracted image file names. Specifying "svgz" for extracted SVG images makes it possible to create compressed SVG files.

Default: the extension is guessed by XXE for a number of common image formats.
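
The following fragment sketches how these attributes might be combined for DocBook 5 documents with embedded SVG. The base name figure is a hypothetical value, and the cfg prefix is assumed to be bound as in the examples above.

```xml
<cfg:extract xmlns=""
             xmlns:db5="http://docbook.org/ns/docbook"
             xmlns:svg="http://www.w3.org/2000/svg"
             xpath="//db5:imagedata/svg:svg"
             dataType="XML"
             toDir="raw"
             baseName="figure"
             extension="svgz">
  <!-- The extracted file is referenced directly by its URL. -->
  <cfg:attribute name="fileref" value="{$url}" />
</cfg:extract>
```

Here dataType is given explicitly rather than left to the grammar, the extracted file names start with figure followed by an automatically generated part, and the svgz extension requests compressed SVG files.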

1.2.3. Element resources

<resources
  include = NMTOKENS
  exclude = NMTOKENS
  match = Regexp pattern
  resolve = boolean : false
  copyTo = Path
  referenceAs = anyURI
/>

The resources child element specifies what to do with the resources which are logically part of the document.

The resources which are logically part of the document are specified using another configuration element: documentResources (see Section 10, “documentResources” in XMLmind XML Editor - Configuration and Deployment). DocBook example:

<cfg:documentResources xmlns="">
  <cfg:resource kind="image" path="//imagedata/@fileref"/>
  <cfg:resource kind="image" path="//graphic/@fileref"/>
  <cfg:resource kind="image" path="//inlinegraphic/@fileref"/>
  <cfg:resource kind="text" path="//textdata/@fileref"/>
  <cfg:resource kind="audio" path="//audiodata/@fileref"/>
  <cfg:resource kind="video" path="//videodata/@fileref"/>
</cfg:documentResources>

Note that elements replaced during an extraction step specified by the extract element are never scanned for resources.

The default resources child elements are:

<resources match="^[a-zA-Z][a-zA-Z0-9.+-]*:/.+" />
<resources match=".+" copyTo="." />

Attributes of the resources child element specifying how to match a resource:

match

For each resource of the document specified by the documentResources element, its URI is tested to see if it matches the first resources child element. If it does not match the first resources child element, the second resources child element is tried and so on until a matching resources child element is found.

If the matching resources element has no resolve, copyTo or referenceAs attribute, the matched resource is ignored. For example, rule <resources match="^[a-zA-Z][a-zA-Z0-9.+-]*:/.+"/> is designed to ignore resources of any kind having an absolute URL.
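
The order of the resources elements therefore matters. A minimal sketch, in which the images directory is a hypothetical name:

```xml
<!-- 1. Absolute URLs: no resolve, copyTo or referenceAs attribute,
        so matched resources are ignored. -->
<resources match="^[a-zA-Z][a-zA-Z0-9.+-]*:/.+" />

<!-- 2. Remaining URIs ending with .png. -->
<resources match=".+\.png" copyTo="images" />

<!-- 3. Everything else, handled like the second default rule. -->
<resources match=".+" copyTo="." />
```

With these rules, a resource such as http://www.example.com/logo.png is caught by the first rule and ignored, even though it would also match the second one. A relative URI such as figs/chart.png reaches the second rule, and notes/readme.txt falls through to the third.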

include

This attribute contains one or more kinds of resources separated by whitespace. Example related to the above DocBook example: include="image".

The action corresponding to element resources is skipped unless the resource being processed has been given a kind and this kind is referenced in attribute include of element resources.

exclude

This attribute contains one or more kinds of resources separated by whitespace. Example related to the above DocBook example: exclude="text image".

If the resource being processed has been given a kind and if this kind is referenced in attribute exclude of element resources, the action corresponding to element resources is skipped.

Attribute exclude has priority over attribute include.
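
A sketch of how include and exclude interact, using the kinds declared in the DocBook documentResources example above (the resources directory is only an example):

```xml
<!-- Applies only to resources of kind image. -->
<resources include="image" match=".+" copyTo="resources" />

<!-- Skipped for resources of kind text or image. -->
<resources exclude="text image" match=".+" copyTo="resources" />

<!-- exclude has priority: audio resources are skipped although
     audio is also listed in include. -->
<resources include="image audio" exclude="audio"
           match=".+" copyTo="resources" />
```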

Attributes of the resources child element specifying an action on the matched resource:

resolve

If resolve="true", attributes copyTo and referenceAs are ignored. Instead, in the copy of the document, the relative URI of the matched resource is replaced by its equivalent absolute URI.

Example:

<resources include="text" match=".+"
           resolve="true"/>

Let's suppose document file:///docs/doc.xml references text resource examples/sample1.txt. The copy of the document will reference absolute URI file:///docs/examples/sample1.txt.

copyTo

Specifies where to copy the matched resource. This can be a file name or a directory name.

The value of this attribute can contain $1, $2, ..., $9 variables, which are substituted with the substrings matching the parenthesized groups of the match regular expression.

Example:

<resources match="(?:.+/)?(.+)\.jpg"
           copyTo="resources/$1.jpeg"/>

Let's suppose the document references resource images/logo.jpg. File logo.jpg will be copied to resources/logo.jpeg and the copy of the document will reference resources/logo.jpeg.

referenceAs

Specifies the reference to the resource in the document created by the copyDocument configuration element.

As with copyTo, the value of this attribute may contain $1, $2, ..., $9 variables.

Generally, this attribute is not needed because the reference implied by the value of the copyTo attribute is sufficient. But this attribute can be useful if images are to be converted from their original format to a format supported by the target XSL-FO processor.

DocBook example:

<process>
  <mkdir dir="resources"/>
  <mkdir dir="raw"/>

  <copyDocument to="__doc.xml">
    <resources match="^[a-zA-Z][a-zA-Z0-9.+-]*:/.+"/>

    <resources include="text" match=".+" 
               resolve="true"/>

    <resources include="image" match=".+\.(png|jpg|jpeg|gif)" 
               copyTo="resources"/>
    <resources include="image" match="(?:.+/)?(.+)\.(\w+)"
               copyTo="raw" referenceAs="resources/$1.png"/>

    <resources exclude="text image" match=".+" 
               copyTo="resources"/>
  </copyDocument>

  <convertImage from="raw" to="resources" format="png"/>
  ...
</process>