Getting started with w2x

About Evaluation Edition

Note that the Evaluation Edition is useless for any purpose other than evaluating XMLmind Word To XML: it generates output in which random words are replaced by the string "[XMLmind]". (Of course, this does not happen with the Professional Edition!)


We’ll use this manual to explain the basic uses of the w2x command-line utility. This manual is found in DOCX format in w2x_install_dir/doc/manual/ and the w2x command-line utility is found in w2x_install_dir/bin/.

C:\w2x-1_12_0> cd doc\manual

C:\w2x-1_12_0\doc\manual> mkdir out

Convert manual.docx to out\manual.xhtml, containing clean, styled, valid XHTML+CSS, looking very much like manual.docx:

..\..\bin\w2x manual.docx out\manual.xhtml

If you want to generate XHTML which is treated by Web browsers as if it were HTML, simply use a .html file extension for the output file:

..\..\bin\w2x manual.docx out\manual.html

Doing this automatically turns on an option (see note 3) that removes the XML declaration (<?xml version="1.0" encoding="UTF-8"?>) normally found at the top of an XHTML file and inserts a <meta content="text/html; charset=UTF-8" http-equiv="Content-Type"/> element into the html/head element of the output document.
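As a quick sanity check of the difference between the two outputs, the following Python sketch reports whether a piece of markup starts with an XML declaration and whether it contains the Content-Type meta element. The function name describe_output is an illustrative assumption, not part of w2x, and the check is a plain text search rather than a full parse.

```python
import re

# Matches an XML declaration at the very start of the document text.
XML_DECL = re.compile(r'^\s*<\?xml\s[^>]*\?>')
# Matches a <meta> element carrying http-equiv="Content-Type" (any attribute order).
META_CT = re.compile(
    r'<meta\b[^>]*http-equiv\s*=\s*["\']Content-Type["\'][^>]*>',
    re.IGNORECASE,
)


def describe_output(text):
    """Return (has_xml_declaration, has_content_type_meta) for the given markup."""
    return (XML_DECL.search(text) is not None, META_CT.search(text) is not None)
```

To try it on real output, read out\manual.xhtml and out\manual.html with open(path, encoding="utf-8-sig") and pass the text to describe_output.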

Convert manual.docx to out\frameset\manual.xhtml, containing multi-page, clean, styled, valid XHTML+CSS, looking very much like manual.docx:

..\..\bin\w2x -o frameset manual.docx out\frameset\manual.xhtml

The above command generates multiple “.xhtml” files in the out\frameset directory, which is created automatically if needed (see note 4).

Note that out\frameset\manual.xhtml contains a frameset. Although framesets are an obsolete HTML feature, a frameset makes it easy to browse the generated XHTML+CSS pages. Moreover, the table of contents used as the left frame, found in out\frameset\manual-TOC.xhtml, is a convenient way to programmatically list all the generated XHTML+CSS pages.
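Listing the generated pages from the table of contents could look like the Python sketch below. It assumes that the TOC file links to the pages with ordinary a elements carrying href attributes; the exact markup w2x generates is not described here, and the names LinkCollector and list_pages are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the target files of all <a href="..."> links, in document order."""

    def __init__(self):
        super().__init__()
        self.pages = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        page = href.split("#", 1)[0]  # drop the fragment, keep the file name
        if page and page not in self.pages:
            self.pages.append(page)


def list_pages(toc_markup):
    """Return the distinct files referenced by the links found in toc_markup."""
    collector = LinkCollector()
    collector.feed(toc_markup)
    collector.close()
    return collector.pages
```

Reading out\frameset\manual-TOC.xhtml as UTF-8 text and passing it to list_pages would then return the page file names in the order in which they are first linked.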

Convert manual.docx to out\webhelp\manual.html, containing a Web Help looking very much like manual.docx:

..\..\bin\w2x -o webhelp manual.docx out\webhelp\manual.html

The above command generates multiple “.html” files in the out\webhelp directory, which is created automatically if needed.

Convert manual.docx to out\manual.epub, containing an EPUB 2 book looking very much like manual.docx:

..\..\bin\w2x -o epub manual.docx out\manual.epub
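Because an EPUB file is a ZIP archive whose "mimetype" entry contains application/epub+zip, the generated book can be inspected with a few lines of Python. This is only a sketch; the function name epub_entries is an assumption made for illustration.

```python
import zipfile


def epub_entries(epub_file):
    """Return (mimetype, entry names) for an EPUB given as a path or file object."""
    with zipfile.ZipFile(epub_file) as archive:
        names = archive.namelist()
        # The "mimetype" entry is a short ASCII string identifying the container.
        mimetype = archive.read("mimetype").decode("ascii").strip()
    return mimetype, names
```

Calling epub_entries(r"out\manual.epub") after running the command above should return the container's media type together with the list of files packed into the book.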

Convert manual.docx to out\manual.xml, containing DocBook 4.5.

..\..\bin\w2x -o docbook manual.docx out\manual.xml

Convert manual.docx to out\manual.xml, containing DocBook 5.0.

..\..\bin\w2x -o docbook5 manual.docx out\manual.xml
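Since both commands write to the same out\manual.xml, it can be handy to check which flavor a given file contains. DocBook 5 elements live in the http://docbook.org/ns/docbook namespace, whereas DocBook 4.5 elements have no namespace. The Python sketch below relies on that difference; the function name docbook_flavor is hypothetical, and no assumption is made about which root element w2x generates.

```python
import xml.etree.ElementTree as ET

DOCBOOK5_NS = "http://docbook.org/ns/docbook"


def docbook_flavor(xml_bytes):
    """Return (flavor, root element name) for a DocBook document given as bytes."""
    root = ET.fromstring(xml_bytes)
    # ElementTree writes namespaced tags as "{namespace}local-name".
    if root.tag.startswith("{"):
        namespace, local_name = root.tag[1:].split("}", 1)
    else:
        namespace, local_name = "", root.tag
    if namespace == DOCBOOK5_NS:
        flavor = "DocBook 5"
    elif namespace == "":
        flavor = "DocBook 4 (no namespace)"
    else:
        flavor = "unknown"
    return flavor, local_name
```

Read the file in binary mode, for example open(r"out\manual.xml", "rb").read(), so that the parser honors the encoding stated in the XML declaration.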

By default, the generated DocBook files contain HTML tables. If you prefer DocBook to contain CALS tables, please use the following options:

..\..\bin\w2x -o docbook5 -p convert.set-column-number yes -p transform.cals-tables yes manual.docx out\manual.xml
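When w2x is driven from a script, command lines such as the one above can be assembled from an output format and a set of parameters. The Python sketch below uses only the -o and -p options shown in this section; the helper name w2x_command and its argument names are assumptions made for illustration.

```python
def w2x_command(w2x, source, target, output_format=None, params=None):
    """Build the argument list for one w2x invocation.

    w2x is the path to the w2x launcher, output_format is the value passed
    to -o (omitted when None), and params maps parameter names to values,
    each emitted as "-p name value" in insertion order.
    """
    args = [w2x]
    if output_format is not None:
        args += ["-o", output_format]
    for name, value in (params or {}).items():
        args += ["-p", name, value]
    args += [source, target]
    return args
```

The resulting list can be handed to subprocess.run(args, check=True); how the w2x launcher must be named for this to work on a given platform is left to the caller.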

Convert manual.docx to out\manual.xml, containing a DocBook V5.1 assembly.

..\..\bin\w2x -o assembly manual.docx out\manual.xml

Convert manual.docx to out\manual.dita, containing a DITA topic.

..\..\bin\w2x -o topic manual.docx out\manual.dita

Generating a task having “MyTask” as its ID is equally simple:

..\..\bin\w2x -o topic -p transform.topic-type task -p transform.root-topic-id MyTask manual.docx out\manual.dita
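One way to confirm the effect of these two parameters is to look at the root element of the generated file. The Python sketch below assumes that the topic type becomes the root element name and that the topic ID is stored in its id attribute, as is usual for DITA topics; the function name root_topic is hypothetical.

```python
import xml.etree.ElementTree as ET


def root_topic(xml_bytes):
    """Return (root element name, value of its id attribute or None)."""
    root = ET.fromstring(xml_bytes)
    return root.tag, root.get("id")
```

Under those assumptions, root_topic(open(r"out\manual.dita", "rb").read()) would return the pair ("task", "MyTask") for the command above.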

Convert manual.docx to out\manual.ditamap, containing a DITA map.

..\..\bin\w2x -o map manual.docx out\manual.ditamap

Convert manual.docx to out\manual.ditamap, containing a DITA bookmap possibly having chapter topicrefs and nested topicrefs acting as sections and subsections (but no sub-subsections).

..\..\bin\w2x -o bookmap -p transform2.section-depth 3 manual.docx out\manual.ditamap

Convert manual.docx to out\manual.xhtml, containing “semantic”, unstyled XHTML5.

..\..\bin\w2x -o xhtml5 manual.docx out\manual.xhtml

Use the following options to generate other versions of semantic XHTML:

Option             XHTML Version
-o xhtml_strict    XHTML 1.0 Strict
-o xhtml_loose     XHTML 1.0 Transitional
-o xhtml_1         XHTML 1.1
-o xhtml5          XHTML 5.0


3 This option is “-p convert.charset UTF-8”. See charset parameter.

4 But not automatically made empty if the output directory already exists.