2.2. Custom edit steps

2.2.1. Using property configuration_name.pasteFromWord.parameter.base

Edit steps are implemented by XED scripts (see XMLmind XML Editor - Support of XPath 1.0). The -s engine option specifies which main XED script is to be run to implement phase #2. Therefore custom edit steps can be implemented as follows:

  1. Create your custom main XED script. Let's call this file custom_main.xed. Save this file in the folder containing your DocBook 5 customization.

    include "paste-from-word:xed/main.xed";
    
    ...YOUR CUSTOM CODE HERE...

  2. Use property configuration_name.pasteFromWord.parameter to run it.

    <property name="$c.pasteFromWord.parameter">
      ...
      -s custom_main.xed
      ...
    </property>

  3. As is, option -s custom_main.xed will not work because a relative filename is resolved against the current working directory. For this -s option to work, your DocBook 5 customization file must additionally define property configuration_name.pasteFromWord.parameter.base as follows:

    <property name="$c.pasteFromWord.parameter.base" url="true">.</property>

    When this property is defined, command pasteFromWord uses it to resolve any relative URL found in its parameters.
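The effect of the base property can be illustrated with a short Python sketch. It is only a model, not the code used by command pasteFromWord: it assumes that relative URLs are resolved using the standard rules implemented by urllib.parse.urljoin and that the value "." is resolved against the URL of the customization file. The file location CONFIG_URL is hypothetical.

```python
from urllib.parse import urljoin

# Hypothetical location of the DocBook 5 customization file (an assumption).
CONFIG_URL = "file:///home/me/custom_docbook5/0docbook5.xxe"


def resolve_base(config_url: str, base_value: str) -> str:
    """Resolve the value of the base property (here ".") against the
    URL of the customization file, giving the URL of its folder."""
    return urljoin(config_url, base_value)


def resolve_parameter(base_url: str, relative: str) -> str:
    """Resolve a relative URL found in the parameters against the base."""
    return urljoin(base_url, relative)


base = resolve_base(CONFIG_URL, ".")
script_url = resolve_parameter(base, "custom_main.xed")
print(base)
print(script_url)
```

In this model, custom_main.xed ends up being looked for next to the customization file, whatever the current working directory is.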

2.2.2. Inserting, replacing and removing edit steps in stock main.xed

The procedure explained above has been introduced mainly to explain the use of property configuration_name.pasteFromWord.parameter.base. However, we do not recommend using it: first, there is a more convenient way to customize stock main.xed and second, you'll rarely want to perform your custom editing after stock main.xed finishes its work.

Script addon_install_dir/xed/main.xed looks like this:

script(defined("before.after-parse", ""));    (: 1, 2 :)
script(defined("do.after-parse", "after-parse.xed"));    (: 3 :)
script(defined("after.after-parse", ""));    (: 4 :)

script(defined("before.styles", ""));
invoke(defined("do.styles",
               "com.xmlmind.xmleditext.paste_from_word.engine.SetStyles"));    (: 5 :)
script(defined("after.styles", ""));
...

1

XPath extension function defined() is documented in XMLmind XML Editor - Support of XPath 1.0. Its signature is: object defined(string variable-name, default-value?).

2

Parameter before.after-parse defaults to "". Therefore, by default, there is no script which is run before after-parse.xed.

This also means that if you want to run a custom script before step after-parse, then pass option "-pu before.after-parse my_script.xed" to the engine.

3

If you want to suppress step after-parse, pass option "-p do.after-parse ''" to the engine.

If you want to replace step after-parse, pass option "-pu do.after-parse my_script.xed" to the engine.

4

Parameter after.after-parse defaults to "". Therefore, by default, there is no script which is run after after-parse.xed.

This also means that if you want to run a custom script after step after-parse, then pass option "-pu after.after-parse my_script.xed" to the engine.

5

If you want to suppress compiled step styles, pass option "-p do.styles ''" to the engine.

There is no direct way to replace a compiled step. You must first suppress it and then specify the corresponding after.step_name parameter.

Convenience engine options -i, -a, -d, and -r make what is explained above even easier to use. For example, "-i after-parse my_script.xed" is equivalent to "-pu before.after-parse my_script.xed".
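The before/do/after parameters described above can be modeled with the following Python sketch. It is a simplified model written for illustration, not the engine's implementation: only the two steps shown in the excerpt of main.xed are included, and the names STOCK_STEPS, plan and insert_before are hypothetical. An empty parameter value means that nothing is run, as stated in the notes above.

```python
# Only the two steps visible in the excerpt of stock main.xed are modeled.
STOCK_STEPS = [
    ("after-parse", "script", "after-parse.xed"),
    ("styles", "invoke",
     "com.xmlmind.xmleditext.paste_from_word.engine.SetStyles"),
]


def defined(params, name, default):
    """Model of defined(): the parameter value if set, else the default."""
    return params.get(name, default)


def plan(params):
    """Return the ordered list of (kind, target) actions that would run."""
    actions = []
    for step, kind, stock in STOCK_STEPS:
        candidates = [
            ("script", defined(params, "before." + step, "")),
            (kind, defined(params, "do." + step, stock)),
            ("script", defined(params, "after." + step, "")),
        ]
        # An empty target means "run nothing" for that slot.
        actions.extend((k, target) for k, target in candidates if target)
    return actions


def insert_before(step, script):
    """Model of '-i STEP SCRIPT', equivalent to '-pu before.STEP SCRIPT'."""
    return {"before." + step: script}


print(plan({}))
print(plan(insert_before("after-parse", "my_script.xed")))
# A compiled step is suppressed, then a script is run in its place.
print(plan({"do.styles": "", "after.styles": "my_styles.xed"}))
```

The last call mirrors the indirect way of replacing a compiled step: do.styles is set to the empty string and a script is attached to after.styles. The file name my_styles.xed is made up for this example.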

2.2.3. A real world example

The problem to be solved is converting paragraphs styled using MS-Word user-defined style ProgramListing to an XHTML pre element.

Style ProgramListing has the following characteristics:

  • Font: "Times New Roman", 10pt.

  • Line height: 10pt.

  • Space after: 10pt, but do not add space between contiguous paragraphs having a ProgramListing style.

Pasting a few lines of indented source code into a paragraph having a ProgramListing style causes MS-Word to create several contiguous ProgramListing paragraphs, one for each line of source code. Moreover, when saving the document to non-filtered HTML, MS-Word replaces the leading space characters (that is, the indentation of the source code) by non-breaking space characters (&nbsp;).

In order to solve this problem, it is recommended to proceed as follows:

  1. Use MS-Word to save, as non-filtered HTML, a sample document which makes use of your user-defined styles. Example: samples/custom_docbook5/program_listing.docx saved as samples/custom_docbook5/program_listing.htm.

  2. Browse the non-filtered HTML file to see what MS-Word did with your user-defined styles.

  3. Write your XED script. Example: samples/custom_docbook5/program_listing.xed.

  4. Test it using the toxml command-line utility, and not in XMLmind XML Editor. Example:

    C:\...\custom_docbook5> toxml -i prune program_listing.xed¬
     program_listing.htm program_listing.xml

  5. After your XED script passes all tests, integrate it into your DocBook 5 customization. Example: samples/custom_docbook5/0docbook5.xxe.

XED script samples/custom_docbook5/program_listing.xed contains:

for-each /html/body//p[starts-with(@s:class, "ProgramListing")] {
    set-element-name("span");
    set-attribute("class", "programlisting");

    set-attribute("g:id", "pre");
    set-attribute("g:container", "pre class='programlisting'");
}

group();

for-each /html/body//span[@class='programlisting'] {
     (: Get rid of inner spans :)
     for-each .//span {
         unwrap-element();
     }

     (: Get rid of this span by replacing it by a text node 
        where nbsp, \n, \r characters have been processed. :)
     replace(<g:envelope>{translate(normalize-space(.),
                            "&#xA0;&#xA;&#xD;", "  ")}&#xA;</g:envelope>);
}

This script is inserted before the prune edit step because prune simplifies span elements containing only whitespace and/or non-breaking space characters. Running the script after prune would risk losing the indentation of the source code.

When the above toxml command is applied to samples/custom_docbook5/program_listing.htm, it produces DocBook 5 document samples/custom_docbook5/program_listing.xml.
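The overall effect of program_listing.xed can be approximated in a few lines of Python. This sketch is a rough model of the idea, not a translation of the XED script: paragraphs are represented as hypothetical (style, text) pairs, contiguous ProgramListing paragraphs are merged into a single pre block, and each line is cleaned in the same way as by the translate(normalize-space(.), ...) expression.

```python
import re
from itertools import groupby

NBSP = "\u00a0"


def normalize_space(text):
    """Mimic XPath normalize-space(): collapse runs of space, tab, CR
    and LF to one space and trim both ends. nbsp is left untouched."""
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")


def clean_line(text):
    """Turn nbsp and LF into plain spaces, drop CR, end with a newline."""
    table = str.maketrans({NBSP: " ", "\n": " ", "\r": None})
    return normalize_space(text).translate(table) + "\n"


def is_listing(style):
    return style.startswith("ProgramListing")


def to_blocks(paragraphs):
    """Merge each run of contiguous ProgramListing paragraphs into one
    ("pre", text) block; other paragraphs become ("p", text) blocks."""
    blocks = []
    for listing, run in groupby(paragraphs, key=lambda p: is_listing(p[0])):
        if listing:
            blocks.append(("pre", "".join(clean_line(t) for _, t in run)))
        else:
            blocks.extend(("p", t) for _, t in run)
    return blocks


sample = [
    ("Normal", "Intro"),
    ("ProgramListing", "if (x) {"),
    ("ProgramListing", NBSP * 4 + "return;"),
    ("ProgramListing", "}"),
    ("Normal", "End"),
]
print(to_blocks(sample))
```

Note that the non-breaking spaces survive normalize_space() and are only converted to plain spaces afterwards, which is how the indentation of the source code is preserved in this model.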