Conversion step reference

Convert step

Converts the input DOCX file to a styled, valid XHTML 1.0 Transitional document. This XHTML document is the result of the step.

For clarity, the “convert.” parameter name prefix is omitted here.

However, when you pass any of the following parameters to w2x, do not forget this prefix. Example: -p convert.resource-directory images.
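The prefix rule can be sketched as a small helper that builds the -p arguments from short parameter names. The function name convert_args is hypothetical, and only the "-p name value" form shown in the example above is assumed; the rest of the w2x command line is not covered here.

```python
def convert_args(params):
    """Build w2x "-p" arguments, adding the "convert." prefix when missing.

    Hypothetical helper: only the "-p NAME VALUE" form is taken from the text.
    """
    args = []
    for name, value in params.items():
        if not name.startswith("convert."):
            name = "convert." + name
        args += ["-p", name, str(value)]
    return args


# Prints the argument list for two Convert step parameters.
print(convert_args({"resource-directory": "images", "charset": "UTF-8"}))
```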

Parameters:

Name

Value

Description

automatic-ids

A regular expression pattern.

Default: "(^_?[a-zA-Z]{1,3}\\d+$)|(^(OLE_LINK|_ENREF_))|(^_GoBack$)".

Specifies the names of the bookmarks which are automatically generated by MS-Word. This parameter is used to favor user-specified bookmarks, which are expected to have long and descriptive names, over those automatically generated by MS-Word ("_GoBack", "_Toc123", "BM3", etc.).

If the specified regular expression pattern starts with "|", it is appended to the default one.

If the specified regular expression pattern ends with "|", it is prepended to the default one.
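A minimal sketch of how the default pattern and the "|" rules combine, written in Python rather than in the regular expression engine used by w2x. It assumes that "\\d" in the default above is the escaped form of "\d", and that a bookmark name is tested with a "find" style match (the default pattern carries its own ^ and $ anchors). The function names are hypothetical.

```python
import re

# Default pattern from the table above, with "\\d" read as "\d" (assumption).
DEFAULT_AUTOMATIC_IDS = r"(^_?[a-zA-Z]{1,3}\d+$)|(^(OLE_LINK|_ENREF_))|(^_GoBack$)"


def effective_pattern(user_pattern):
    """Combine a user pattern with the default according to the "|" rules."""
    if user_pattern.startswith("|"):
        # Leading "|": the user pattern is appended to the default.
        return DEFAULT_AUTOMATIC_IDS + user_pattern
    if user_pattern.endswith("|"):
        # Trailing "|": the user pattern is prepended to the default.
        return user_pattern + DEFAULT_AUTOMATIC_IDS
    # Otherwise the user pattern replaces the default.
    return user_pattern


def is_automatic(name, pattern=DEFAULT_AUTOMATIC_IDS):
    """True if the bookmark name looks automatically generated."""
    return re.search(pattern, name) is not None


for bookmark in ("_GoBack", "_Toc123", "BM3", "installation_overview"):
    print(bookmark, is_automatic(bookmark))
```

With effective_pattern("|^tmp_"), a bookmark such as "tmp_1" would also be treated as automatically generated, while the default entries keep working.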

charset

A valid character encoding (e.g. UTF-8, Windows-1252).

Default: no charset, add an XML declaration.

When a charset is specified, a meta element is added to the head element of the generated document:

  • <meta charset="charset"/> if parameter version is “5.0”,
  • <meta content="text/html; charset=charset" http-equiv="Content-Type" /> otherwise.

If the specified charset is “UTF-8”, the XML declaration (<?xml version="1.0" encoding="UTF-8"?>) is not added to the generated document. This lets Web browsers treat the generated document as HTML rather than XHTML.
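The rules above can be summarized as a small decision function. This is only a sketch: the function name is hypothetical, the comparison with "UTF-8" is assumed to ignore case, and it is assumed that a charset other than UTF-8 still produces an XML declaration (the text only states that UTF-8 suppresses it).

```python
def head_charset(charset=None, version="1.0_transitional"):
    """Return (meta_element_or_None, xml_declaration_is_added)."""
    if charset is None:
        # Default: no meta element, an XML declaration is added.
        return None, True
    if version in ("5.0", "5"):
        meta = f'<meta charset="{charset}"/>'
    else:
        meta = (f'<meta content="text/html; charset={charset}" '
                'http-equiv="Content-Type" />')
    # Assumption: only UTF-8 suppresses the XML declaration.
    return meta, charset.upper() != "UTF-8"


print(head_charset("UTF-8", "5.0"))
print(head_charset("Windows-1252"))
```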

converted-image-extensions

A list of image file extensions separated by space characters.

Default: “svg png jpeg”.

When the input DOCX file contains an image whose file extension is not in the converted-image-extensions list, w2x attempts to convert this image to one of the formats in this list.

Each format is considered in turn, which is why w2x attempts to convert a WMF image to SVG first, before considering PNG and JPEG.
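The "in turn" behavior can be pictured as follows. This is an illustrative sketch, not the w2x implementation: the function name is hypothetical and extensions are assumed to be compared without regard to case.

```python
def conversion_targets(image_extension, converted_image_extensions="svg png jpeg"):
    """Return the formats to try, in order, or [] if no conversion is needed."""
    accepted = converted_image_extensions.split()
    if image_extension.lower() in accepted:
        # The image already has one of the listed extensions: keep it as is.
        return []
    # Otherwise each listed format is a candidate, first to last.
    return accepted


print(conversion_targets("wmf"))
print(conversion_targets("png"))
```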

create-mathml-object

“yes” | “no” | “auto”

Default: “auto”.

When converting MS-Word math (that is, OpenXML math) to MathML:

yes
Generate an external file containing the converted MathML element and insert an object element pointing to the generated “.mml” file. Example: <object data="doc_files/math-010.mml" type="application/mathml+xml"/>.
no
Embed the converted MathML element in the XHTML document created by this step.
auto
Embed the converted MathML element in the XHTML document but only if parameter version is set to 5.0[9].
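The three values can be read as the decision below. The sketch assumes that, with “auto” and a version other than 5.0, an external “.mml” file and an object element are generated, as with “yes”; the text implies this but does not state it. The function name and the returned labels are hypothetical.

```python
def mathml_output(create_mathml_object="auto", version="1.0_transitional"):
    """Return "object" (external .mml file) or "embed" (inline MathML)."""
    if create_mathml_object == "yes":
        return "object"
    if create_mathml_object == "no":
        return "embed"
    if create_mathml_object == "auto":
        # Assumption: any version other than 5.0 falls back to an object element.
        return "embed" if version in ("5.0", "5") else "object"
    raise ValueError(f"unsupported value: {create_mathml_object!r}")


print(mathml_output("auto", "5.0"))
print(mathml_output("auto", "1.0_strict"))
```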

default-lang

A valid language code (e.g. en, fr-CA).

No default.

If parameter set-lang is not specified and the main language of the document cannot be determined by examining the contents of the input DOCX file, set the lang attribute of the html element to this value.

About East Asian languages

Due to a limitation, it is recommended to specify, for example, -p convert.set-lang ja-JP or -p convert.default-lang ja-JP when converting a document written mainly in Japanese.

When parameter convert.set-lang or parameter convert.default-lang is set to a language code starting with ja, zh or ko, then it is attribute w:lang/@w:eastAsia which is used to determine the language of a text span and not attribute w:lang/@w:val.

Note that -p convert.default-lang ja-JP is just used as a hint to favor attribute w:lang/@w:eastAsia over attribute w:lang/@w:val. Given the way MS-Word sets these two attributes, specifying -p convert.default-lang ja-JP will not cause grossly incorrect language detection when converting, for example, a German DOCX file.
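The precedence between set-lang, the detected language and default-lang, together with the East Asian hint, can be sketched as follows. Both function names are hypothetical, and the actual language detection performed by w2x is replaced here by a plain detected_lang argument.

```python
EAST_ASIAN_PREFIXES = ("ja", "zh", "ko")


def html_lang(set_lang=None, detected_lang=None, default_lang=None):
    """Value of the lang attribute of the html element (None if unknown)."""
    if set_lang:
        return set_lang          # set-lang always wins
    if detected_lang:
        return detected_lang     # language found in the DOCX file
    return default_lang          # fallback only


def span_lang_attribute(set_lang=None, default_lang=None):
    """Attribute used to determine the language of a text span."""
    for hint in (set_lang, default_lang):
        if hint and hint.lower().startswith(EAST_ASIAN_PREFIXES):
            return "w:lang/@w:eastAsia"
    return "w:lang/@w:val"


# A German document converted with -p convert.default-lang ja-JP.
print(html_lang(detected_lang="de-DE", default_lang="ja-JP"))
print(span_lang_attribute(default_lang="ja-JP"))
```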

lower-case-resource-names

A boolean: true (same as: yes | on | 1) | false (same as: no | off | 0).

Default: false.

Not for general use. Specifying this parameter as true is needed to keep epubcheck quiet on platforms where filenames are case-sensitive (e.g. Linux).

resource-directory

A file path.

Default: if parameter xhtml-file is specified, basename of xhtml-file, without an extension, but followed by “_files”; otherwise the absolute path of an automatically created temporary directory.

Specifies the file path of the directory which is to contain copies of the images referenced in the input DOCX file.

A relative file path is relative to the value of parameter xhtml-file.

Note that, if it already exists, a resource directory specified this way is not automatically made empty by w2x before being used to store resources. Only the “automatic”, default, output_file_basename_files/ folder is automatically made empty by w2x (if this “automatic” folder already exists).
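A sketch of how the resource directory could be derived when parameter xhtml-file is specified. It assumes that "relative to the value of parameter xhtml-file" means relative to the directory containing that file; the function name is hypothetical and the temporary-directory case is left out.

```python
from pathlib import PurePosixPath


def resource_directory(xhtml_file, resource_dir=None):
    """Resolve the resource directory for a given xhtml-file."""
    out = PurePosixPath(xhtml_file)
    if resource_dir is None:
        # Default: basename without extension, followed by "_files".
        return out.with_name(out.stem + "_files")
    # Assumption: a relative path is resolved against the directory of
    # xhtml-file; an absolute path is returned unchanged.
    return out.parent / resource_dir


print(resource_directory("out/manual.html"))
print(resource_directory("out/manual.html", "images"))
```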

resource-prefix

A non-empty string not containing the file separator character (“/” or “\”).

Default: none, no prefix.

Specifies a prefix to be prepended to the names of resource files created by w2x.

This prefix is useful when used in conjunction with parameter resource-directory and when several files generated by w2x share the same resource directory.

set-column-number

A boolean: true (same as: yes | on | 1) | false (same as: no | off | 0).

Default: false.

If specified as true, insert in each table cell a column-number processing-instruction containing the column number of this cell. First column is column #1.

Example:

<?column-number 1?>

This processing-instruction greatly helps in generating CALS tables (DocBook, DITA) containing cells spanning several columns.
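As an illustration, the sketch below reads the column-number processing-instructions of one table row and derives the first and last column covered by each cell, which is the information needed for CALS namest/nameend attributes. The sample row is hand-written, not actual w2x output, and it assumes that the processing-instruction is a direct child of the cell and that spanning cells carry a colspan attribute.

```python
from xml.dom import minidom

# Hand-written sample; real w2x output may differ in detail.
SAMPLE_ROW = (
    "<tr>"
    "<td><?column-number 1?>A</td>"
    '<td colspan="2"><?column-number 2?>B</td>'
    "<td><?column-number 4?>C</td>"
    "</tr>"
)


def cell_spans(row_xml):
    """Return a (first_column, last_column) pair for each cell of the row."""
    row = minidom.parseString(row_xml).documentElement
    spans = []
    for cell in row.getElementsByTagName("td"):
        colspan = int(cell.getAttribute("colspan") or "1")
        for node in cell.childNodes:
            if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                    and node.target == "column-number"):
                first = int(node.data)
                spans.append((first, first + colspan - 1))
    return spans


print(cell_spans(SAMPLE_ROW))
```

Without the processing-instruction, the column of cell C would have to be computed by accumulating the spans of all preceding cells, including cells spanning rows above.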

set-lang

A valid language code (e.g. en, fr-CA).

No default: set the lang attribute of the html element after examining the contents of the input DOCX file.

If specified, set the lang attribute of the html element to this value.

About East Asian languages

Due to a limitation, it is recommended to specify, for example, -p convert.set-lang ja-JP or -p convert.default-lang ja-JP when converting a document written mainly in Japanese.

When parameter convert.set-lang or parameter convert.default-lang is set to a language code starting with ja, zh or ko, then it is attribute w:lang/@w:eastAsia which is used to determine the language of a text span and not attribute w:lang/@w:val.

version

1.0_transitional (same as: 1.0_loose | 1) | 1.0_strict | 1.1 | 5.0 (same as: 5) | “”.

Default: 1.0_transitional.

Specifies which XHTML version to generate, hence which <!DOCTYPE> to add to the generated XHTML document.

Note that XHTML 5.0 has no DTD, hence no <!DOCTYPE> for this version.

The empty string “” means: generate XHTML 1.0 Transitional, but do not add a <!DOCTYPE>.
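The accepted values and their aliases can be sketched as a lookup. The <!DOCTYPE> strings below are the standard W3C declarations for each version; that w2x emits exactly these strings is an assumption, and the function name is hypothetical.

```python
# Standard W3C document type declarations (assumed to match w2x output).
DOCTYPES = {
    "1.0_transitional": '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
                        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">',
    "1.0_strict": '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
                  '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">',
    "1.1": '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" '
           '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">',
}
ALIASES = {"1.0_loose": "1.0_transitional", "1": "1.0_transitional", "5": "5.0"}


def resolve_version(value="1.0_transitional"):
    """Return (canonical_version, doctype_or_None) for a version value."""
    if value == "":
        # Empty string: XHTML 1.0 Transitional without a <!DOCTYPE>.
        return "1.0_transitional", None
    version = ALIASES.get(value, value)
    if version == "5.0":
        # XHTML 5.0 has no DTD, hence no <!DOCTYPE>.
        return version, None
    return version, DOCTYPES[version]


for value in ("1", "1.0_strict", "5", ""):
    print(repr(value), resolve_version(value))
```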

xhtml-file

A file path.

No default.

If the generated XHTML document were saved to disk, this would be the path of its save file.

When specified (which is strongly recommended), this file path is used to give a base URL to the generated XHTML document.


[9] Because only XHTML 5 documents may embed MathML. With any other version of XHTML, this would cause the document to become invalid.