Contents

1 Introduction 4

2 Installing w2x 5

2.1 Contents of the installation directory 6

3 Alternatives to using the w2x command-line utility 9

3.1 The w2x-app graphical application 9

3.2 The “Word To XML” add-on for XMLmind XML Editor 9

3.2.1 Installing the “Word To XML” add-on 10

3.3 The “Word To XML” servlet 10

3.3.1 Contents of the servlet software distribution 10

3.3.2 Installing the servlet 11

3.3.3 Configuring the servlet 11

3.3.4 Using the servlet to convert DOCX files 12

3.3.5 Non-interactive requests 13

4 Getting started with w2x 15

4.1 How to generate useful multi-page HTML 17

5 Going further with w2x 19

5.1 Stock XED scripts 21

6 Customizing the output of w2x 24

6.1 Customizing the XHTML+CSS files generated by w2x 24

6.1.1 Using a XED script to modify the styles embedded in the XHTML+CSS file 24

6.1.2 Appending custom styles to the styles embedded in the XHTML+CSS file 24

6.1.3 Using an external CSS file rather than embedded CSS styles 25

6.1.4 Combining all the above methods 26

6.2 Customizing the semantic XML files generated by w2x 27

6.2.1 Converting custom character styles to semantic tags 27

6.2.2 Converting custom paragraph styles to semantic tags 28

6.2.3 The general case 30

6.3 Generating XML conforming to a custom schema 33

7 The w2x command-line utility 34

7.1 Variables substituted in the parameter values passed to the -p and -pu options 36

7.2 Default conversion steps 37

7.3 Automatic conversion step parameters 37

8 Conversion step reference 38

8.1 Convert step 38

8.2 Delete files step 41

8.3 Edit step 41

8.4 EPUB step 51

8.5 Load step 51

8.6 Save step 52

8.7 Split step 52

8.8 Transform step 54

8.9 Web Help step 58

9 Embedding w2x in a Java™ application 60

9.1 Extension points 61

9.1.1 Custom conversion step 61

9.1.2 Custom image converters 61

9.1.2.1 Specifying an external image converter 62

9.1.2.2 Controlling how image files found in the input DOCX file are converted to standard formats 63

10 Limitations and implementation specificities 65

10.1 About tab stops 67

Index 68