A. The toxml command-line utility

The toxml command-line utility makes the "Paste from Word" engine available from the command line. It converts the non-filtered HTML generated by MS-Word 2003+ to XML.

The toxml command-line utility is available as toxml (shell script for Mac OS X and Linux) and as toxml.bat (Windows). Both files are located in the directory where the "Paste from Word" add-on has been installed.

Note

Depending on where the "Paste from Word" add-on has been installed, you may have to open toxml or toxml.bat in a text editor and modify the value of variable xxeHome.

Command-line usage:

toxml [-v|-vv] [Process options] [Format options] in_html_file out_xml_file
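When toxml is run from another program rather than typed in a shell, it helps to build the argument list in the order given by the synopsis above, with the two file names last. The following Python sketch is not part of the add-on: the helper toxml_argv, the file names and the parameter name transform.myParam are hypothetical, and the program argument is only there so that toxml.bat can be substituted on Windows.

```python
def toxml_argv(in_html, out_xml, verbosity=0, params=None, out_format=None,
               program="toxml"):
    """Hypothetical helper: build a toxml argument list in synopsis order,
    toxml [-v|-vv] [Process options] [Format options] in_html_file out_xml_file
    """
    argv = [program]
    # Verbosity flag comes first.
    if verbosity == 1:
        argv.append("-v")
    elif verbosity >= 2:
        argv.append("-vv")
    # Process options: one "-p name value" group per parameter.
    for name, value in (params or {}).items():
        argv += ["-p", name, value]
    # Format option: only needed when the output file extension is not enough.
    if out_format is not None:
        argv += ["-out", out_format]
    # Input HTML file and output XML file always come last.
    argv += [in_html, out_xml]
    return argv


if __name__ == "__main__":
    argv = toxml_argv("report.htm", "report.xml", verbosity=1,
                      params={"transform.myParam": "yes"},
                      out_format="docbook5")
    print(argv)
    # To actually run the conversion:
    # import subprocess; subprocess.run(argv, check=True)
```

Because each option value is a separate list element, the list can be passed to subprocess.run without any shell quoting.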

Process options:

-p name value

Set parameter name to value.

Parameters starting with "transform." are passed to the XSLT stylesheet, if any, after removing the "transform." prefix. All the other parameters are passed as is to the main .xed script, if any.
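The routing rule above can be restated as a small Python sketch. It only models the rule as described (prefix test, prefix removal, everything else passed unchanged); the parameter names are hypothetical and the function is not part of the add-on.

```python
PREFIX = "transform."


def route_parameters(params):
    """Split -p parameters: names starting with "transform." go to the XSLT
    stylesheet with the prefix removed; all others go, unchanged, to the
    main .xed script."""
    xslt_params = {}
    xed_params = {}
    for name, value in params.items():
        if name.startswith(PREFIX):
            xslt_params[name[len(PREFIX):]] = value
        else:
            xed_params[name] = value
    return xslt_params, xed_params


if __name__ == "__main__":
    xslt, xed = route_parameters({"transform.myParam": "1", "myOption": "yes"})
    print(xslt)  # {'myParam': '1'}
    print(xed)   # {'myOption': 'yes'}
```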

-pu name URL_or_file

Same as "-p", except that parameter value URL_or_file is first converted to a URL.

URL_or_file is a URL or an absolute or relative (to the current working directory) filename.
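The conversion performed by "-pu" can be approximated as follows. This Python sketch is an assumption about the behavior, not the add-on's code: it treats any string with a scheme of two or more characters as a URL (a one-letter "scheme" is taken to be a Windows drive letter) and turns everything else into an absolute file: URL resolved against the current working directory.

```python
from pathlib import Path
from urllib.parse import urlparse


def to_url(url_or_file):
    """Approximate "-pu": keep URLs as is, convert filenames to file: URLs."""
    scheme = urlparse(url_or_file).scheme
    # A one-letter scheme is most likely a drive letter such as "C:".
    if len(scheme) > 1:
        return url_or_file
    # Relative filenames are resolved against the current working directory.
    return Path(url_or_file).resolve().as_uri()


if __name__ == "__main__":
    print(to_url("https://www.example.com/custom.xsl"))
    print(to_url("custom.xsl"))  # file:///<current directory>/custom.xsl
```

With this rule, a string such as "paste-from-word:xed/main.xed" is also left untouched, since "paste-from-word" parses as a scheme.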

-s xed_URL_or_file

Specifies which main .xed script to use to modify the document.

Specify an empty string ("") to suppress the edit phase.

Default script: paste-from-word:xed/main.xed.

-t xslt_URL_or_file

Specifies which XSLT 1.0 stylesheet to use to transform the document.

Specify an empty string ("") to suppress the transform phase.
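An empty value is easy to lose: in a shell it must be quoted so that it survives as a separate argument, whereas in an argument list it is simply an empty element. The Python sketch below, with hypothetical file names, suppresses the edit phase, applies only a custom stylesheet, and uses shlex.join to print the equivalent POSIX shell command.

```python
import shlex

# Suppress the edit phase (-s "") and apply only a custom XSLT 1.0
# stylesheet (-t). The file names are hypothetical.
argv = ["toxml", "-s", "", "-t", "custom.xsl", "in.htm", "out.xml"]

# shlex.join quotes the empty string as '' so that the shell keeps it.
print(shlex.join(argv))  # toxml -s '' -t custom.xsl in.htm out.xml
```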

Process options that modify the default script "paste-from-word:xed/main.xed":

-parse

Save the XHTML without fully processing it: processing stops after edit step "styles".

-i step xed_URL_or_file

Insert the .xed script before the specified step.

-a step xed_URL_or_file

Add the .xed script after the specified step.

-r step xed_URL_or_file

Replace the specified step with the .xed script.

Step may be a single step name or a range: "first..last" or "..last" or "first..".

-d step

Delete the specified step.

Step may be a single step name or a range.

The steps of the default script are: after-parse, styles, prune, lang, title, biblio, index, xrefs, inlines, tables, captions, headings, lists, footnotes, sections, ids, finish, before-save.
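The step options can be pictured as edits on an ordered list of step names. The Python sketch below is a toy model, not how the add-on implements them. It assumes that the steps run in the order listed above, that a range includes both of its end steps, and that "-i" and "-a" take a single step name. The script name my-tables.xed is hypothetical.

```python
# Steps of the default script, as listed above.
# ASSUMPTION: listed order is execution order; ranges include both ends.
DEFAULT_STEPS = ["after-parse", "styles", "prune", "lang", "title", "biblio",
                 "index", "xrefs", "inlines", "tables", "captions", "headings",
                 "lists", "footnotes", "sections", "ids", "finish",
                 "before-save"]


def expand(step, steps=DEFAULT_STEPS):
    """Expand "name", "first..last", "..last" or "first.." to step names."""
    if ".." not in step:
        if step not in steps:
            raise ValueError(f"unknown step: {step}")
        return [step]
    first, last = step.split("..", 1)
    start = steps.index(first) if first else 0
    end = steps.index(last) if last else len(steps) - 1
    return steps[start:end + 1]


def delete(step, steps=DEFAULT_STEPS):
    """Model of "-d step": drop the selected step(s)."""
    doomed = set(expand(step, steps))
    return [s for s in steps if s not in doomed]


def replace(step, script, steps=DEFAULT_STEPS):
    """Model of "-r step script": the selected step(s) become one script."""
    selected = expand(step, steps)
    i = steps.index(selected[0])
    return steps[:i] + [script] + steps[i + len(selected):]


def insert_before(step, script, steps=DEFAULT_STEPS):
    """Model of "-i step script"."""
    i = steps.index(step)
    return steps[:i] + [script] + steps[i:]


def add_after(step, script, steps=DEFAULT_STEPS):
    """Model of "-a step script"."""
    i = steps.index(step)
    return steps[:i + 1] + [script] + steps[i + 1:]


if __name__ == "__main__":
    print(expand("..styles"))    # ['after-parse', 'styles']
    print(expand("sections.."))  # ['sections', 'ids', 'finish', 'before-save']
    print(replace("tables..captions", "my-tables.xed"))
```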

Format options:

-out format

Specifies the output format when it cannot be determined from the extension of the output file.

Formats: docbook, docbook5, topic, xhtml_strict, xhtml_loose, xhtml1_1, xhtml5, xhtml (default).

Other options:

-v, -vv

Verbose.