Customizing the semantic XML files generated by w2x

Converting custom character styles to semantic tags

Converting a custom character style to an XHTML element (possibly with specific attributes) is simple and does not require writing a XED script. It suffices to pass parameter inlines.convert to the Edit step.

Example 1: convert text spans having a “Code” character style to XHTML element code:

-p edit.inlines.convert "c-Code code"

Notice that the name of a character style in the generated XHTML+CSS file is always prefixed by “c-”.

The syntax for the value of parameter inlines.convert is:

value → conversion [ S '!' S conversion ]*
conversion → style_spec S XHTML_element_name [ S attribute ]*
style_spec → style_name | style_pattern
style_pattern → '/' pattern '/' | '^' pattern '$'
attribute → attribute_name '=' quoted_attribute_value
quoted_attribute_value → "'" value "'" | '"' value '"'
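To make the grammar concrete, here is a minimal Python sketch that splits such a parameter value into its conversions. It is not how w2x parses the parameter; the function name parse_convert is hypothetical, and the sketch assumes that neither a style pattern nor a quoted attribute value contains a “!” surrounded by whitespace.

```python
import re

# One attribute: name='value' or name="value" (names may carry a prefix such as g:id).
ATTRIBUTE = re.compile(r"""([\w:.-]+)=(?:'([^']*)'|"([^"]*)")""")

def parse_convert(value):
    """Split a convert parameter value into (style_spec, element, attributes) tuples.

    Illustrative only: assumes no '!' surrounded by whitespace occurs inside
    a style pattern or a quoted attribute value.
    """
    conversions = []
    # Conversions are separated by '!' surrounded by whitespace (S '!' S).
    for chunk in re.split(r"\s+!\s+", value.strip()):
        parts = chunk.split(None, 2)
        if len(parts) < 2:
            raise ValueError(f"expected 'style_spec element_name', got {chunk!r}")
        style_spec, element = parts[0], parts[1]
        rest = parts[2] if len(parts) == 3 else ""
        attributes = {}
        for match in ATTRIBUTE.finditer(rest):
            name, single, double = match.groups()
            attributes[name] = single if single is not None else double
        conversions.append((style_spec, element, attributes))
    return conversions

result = parse_convert("c-Code code ! c-Abbrev abbr title='???'")
```

Each tuple holds the style specification, the target XHTML element name and a dictionary of the attributes to add to that element.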

Example 2: in addition to what is done in example 1 above, convert text spans having an “Abbrev” character style to XHTML element abbr having a title="???" attribute:

-p edit.inlines.convert "c-Code code ! c-Abbrev abbr title='???'"

What if the semantic XHTML created by the Edit step is then converted to DITA or DocBook by means of a Transform step?

In the case of XHTML elements code and abbr, there is nothing else to do because the stock XSLT stylesheets already support these elements.

The general case which also requires using custom XSLT stylesheets is explained in section The general case.

Converting custom paragraph styles to semantic tags

Converting a custom paragraph style to an XHTML element (possibly with specific attributes) is simple and does not require writing a XED script. It suffices to pass parameter blocks.convert to the Edit step.

Example 1.a: convert paragraphs having a “ProgramListing” paragraph style to XHTML element pre:

-p edit.blocks.convert "p-ProgramListing pre"

Notice that the name of a paragraph style in the generated XHTML+CSS file is always prefixed by “p-”.

If you use the above blocks.convert specification, it will work fine, except that you’ll end up with several consecutive pre elements (one pre per line of program listing). This is clearly not what you want. You want consecutive pre elements to be merged into a single pre element. Fortunately implementing this too is quite simple.

Example 1.b: convert paragraphs having a “ProgramListing” paragraph style to XHTML element span (having grouping attributes; more about this below):

-p edit.blocks.convert "p-ProgramListing span g:id='pre' g:container='pre'"

When any of the target XHTML elements have grouping attributes (g:id='pre'[7], g:container='pre', in the above example), then w2x_install_dir/xed/blocks.xed automatically invokes the group() command at the end of the conversions. This has the effect of grouping consecutive <span g:id='pre' g:container='pre'> into a common pre parent element.

Given the fact that XED command group() automatically removes grouping attributes when done and that w2x_install_dir/xed/finish.xed discards all useless span elements, this leaves us with clean pre elements containing text[8].
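The grouping behavior described above can be pictured with the following Python sketch. It is not the XED group() implementation, only an approximation under stated assumptions: the function name group_children is hypothetical, only the simple form of g:container (a bare element name such as pre) is handled, whitespace between elements is ignored, and the later removal of useless span elements by finish.xed is not modeled.

```python
import xml.etree.ElementTree as ET

G = "urn:x-mlmind:namespace:group"  # grouping namespace used for g:id and g:container
G_ID = f"{{{G}}}id"
G_CONTAINER = f"{{{G}}}container"

def group_children(parent):
    """Wrap each run of adjacent children sharing a g:id in one container element.

    Illustrative only: g:container is assumed to be a bare element name.
    """
    new_children = []
    current_id = None
    container = None
    for child in list(parent):
        # Grouping attributes are removed once they have been read.
        gid = child.attrib.pop(G_ID, None)
        container_name = child.attrib.pop(G_CONTAINER, None)
        if gid is None or container_name is None:
            # An element without grouping attributes ends the current run.
            current_id, container = None, None
            new_children.append(child)
            continue
        if gid != current_id:
            # A new g:id value starts a new container.
            container = ET.Element(container_name)
            current_id = gid
            new_children.append(container)
        container.append(child)
    parent[:] = new_children

body = ET.fromstring(
    '<body xmlns:g="urn:x-mlmind:namespace:group">'
    '<span g:id="pre" g:container="pre">line 1</span>'
    '<span g:id="pre" g:container="pre">line 2</span>'
    '<p>text</p>'
    '<span g:id="pre" g:container="pre">line 3</span>'
    '</body>'
)
group_children(body)
```

In this sketch the first two span elements end up inside one pre element, while the third one, separated from them by a p element, gets a pre parent of its own.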

The syntax for the value of parameter blocks.convert is:

value → conversion [ S '!' S conversion ]*
conversion → style_spec S XHTML_element_name [ S attribute ]*
style_spec → style_name | style_pattern
style_pattern → '/' pattern '/' | '^' pattern '$'
attribute → attribute_name '=' quoted_attribute_value
quoted_attribute_value → "'" value "'" | '"' value '"'

Example 3: in addition to what is done in example 1.b above, convert paragraphs having a “Term” paragraph style to XHTML element dt, convert paragraphs having a “Definition” paragraph style to XHTML element dd and group consecutive dt and dd elements into a common dl parent:

-p edit.blocks.convert "p-Term dt g:id='dl' g:container='dl' !¬
 p-Definition dd g:id='dl' g:container='dl' !¬
 p-ProgramListing span g:id='pre' g:container='pre'"

What if the semantic XHTML created by the Edit step is then converted to DITA or DocBook by means of a Transform step?

In the case of XHTML elements pre, dt, dd and dl, there is nothing else to do because the stock XSLT stylesheets already support these elements.

The general case which also requires using custom XSLT stylesheets is explained in section The general case.

The general case

In the general case, customizing the semantic XML files generated by w2x requires writing both a XED script and an XSLT stylesheet.

For example, let’s suppose we want to group all the paragraphs having a “Note” paragraph style and to generate for such groups DocBook and DITA note elements.

The following blocks.convert parameter would make it easy to create the desired groups:

-p edit.blocks.convert "p-Note p g:id='note_group_member'¬
 g:container='div class=\"role-note\"'"

However this would leave us with two unsolved problems:

  1. A paragraph having a “Note” paragraph style often starts with bold text “Note:”. We want to eliminate this redundant label.
  2. The stock XSLT stylesheets will not convert XHTML element <div class="role-note"> to a DocBook or DITA note element.

A custom XED script

The first problem is solved by the following w2x_install_dir/doc/manual/customize/notes.xed script:

namespace "http://www.w3.org/1999/xhtml";
namespace html = "http://www.w3.org/1999/xhtml";
namespace g = "urn:x-mlmind:namespace:group";

for-each /html/body//p[get-class("p-Note")] {
    delete-text("note:\s*", "i");
    if content-type() <= 1 and not(@id) {
        delete();
    } else {
        remove-class("p-Note");
        set-attribute("g:id", "note_group_member");
        set-attribute("g:container", "div class='role-note'");
    }
}

group();

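The “Note:” label, if any, is deleted using XED command delete-text; the "i" flag makes the match case-insensitive. If doing so leaves a useless empty paragraph (content-type() <= 1) which has no id attribute, this paragraph is deleted using XED command delete. Otherwise, the p-Note class is removed and the grouping attributes are added to the paragraph. The final group() command then groups the consecutive marked paragraphs into a common <div class='role-note'> parent.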
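The effect of the label pattern can be tried out with ordinary regular expressions. The Python sketch below reuses the pattern passed to delete-text; the function name strip_note_label is hypothetical, and removing only the first match is an assumption, since how many matches delete-text removes is not stated here.

```python
import re

# Same pattern and case-insensitive flag as delete-text("note:\s*", "i").
LABEL = re.compile(r"note:\s*", re.IGNORECASE)

def strip_note_label(text):
    """Return the text without its label and whether any content is left.

    Illustrative only: removes the first match, which is an assumption.
    """
    stripped = LABEL.sub("", text, count=1)
    return stripped, bool(stripped.strip())

with_content = strip_note_label("Note: Save your work.")
label_only = strip_note_label("NOTE:  ")
```

A paragraph containing nothing but the label is left without content, which is the case the script handles by deleting the paragraph.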

The above script is executed after stock script w2x_install_dir/xed/blocks.xed by means of the following w2x command-line option:

-pu edit.after.blocks customize\notes.xed

A custom XSLT stylesheet

The second problem is solved by the following w2x_install_dir/doc/manual/customize/custom_topic.xslt XSLT 1.0 stylesheet:

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:h="http://www.w3.org/1999/xhtml"
  exclude-result-prefixes="h">

<xsl:import href="w2x:xslt/topic.xslt"/>

<xsl:template match="h:div[@class = 'role-note']">
  <note>
    <xsl:call-template name="processCommonAttributes"/>
    <xsl:apply-templates/>
  </note>
</xsl:template>
...
</xsl:stylesheet>

This stylesheet, which imports stock w2x_install_dir/xslt/topic.xslt, is used for the topic, map and bookmap output formats (see -o option). Similar, very simple, stylesheets have been developed for the docbook and docbook5 output formats.

A URL such as “w2x:xslt/topic.xslt” is an absolute URL supported by w2x. “w2x:” is a URL prefix (defined in the automatic XML catalog used by w2x) which specifies the location of the parent directory of both the xed/ and xslt/ subdirectories.

The above stylesheet replaces the stock one by means of the following w2x command-line option:

-o topic -t customize\custom_topic.xslt

Do not forget to specify the -t option after the -o option, because it’s the -o option which implicitly invokes stock w2x_install_dir/xslt/topic.xslt (this has been explained in chapter Going further with w2x) and we want to use -t to override the use of the stock XSLT stylesheet.

Tip: You’ll find a template for custom XED scripts and several templates for custom XSLT stylesheets in w2x_install_dir/doc/manual/templates/.

For example, in order to create w2x_install_dir/doc/manual/customize/custom_topic.xslt, we started by copying template XSLT stylesheet w2x_install_dir/doc/manual/templates/template_topic.xslt.


[7]Any value would do (e.g. g:id="foo" would have worked as well). For consecutive elements to be grouped, it suffices that they all have the same g:id attribute.

[8]Unless you specify:

-p edit.prune.preserve "p-ProgramListing"

script w2x_install_dir/xed/prune.xed will cause open lines to be stripped from the generated pre element.