Wednesday, August 19, 2015

Rudimentary Transclusion in DocBook for Emergencies

One problem I have faced repeatedly with modular DocBook XML documents is wanting to reuse a section where only a chapter is valid, or a chapter where only a section is valid. I'm looking forward to using the new version of my excellent XML editor, from XMLmind, that supports the new topic/assembly structures in DocBook 5.1. DITA has always done this smoothly, but I prefer using DocBook.

For now, I am using an ugly pre-processing step to handle the situation. After I process my documents through an identity transformation to resolve xi:includes, I process them again to resolve the following custom processing instructions. Then I process the resulting files with the DocBook XSLT stylesheets to get my final output.

  <?transclude-element my-element-id target-file=my-file.xml ?>

  <?transclude-element my-other-element-id make-ids-unique ?>

Each processing instruction is replaced by the element whose xml:id matches the first space-separated string in the instruction. If the element is not in the current document, it is read from the file named by the string that starts with "target-file=". If the element is in the current document, including the string "make-ids-unique" appends a unique string to each xml:id in the target element to avoid duplication.
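As a small illustration, here is a made-up fragment before and after the pre-processing step. The file name shared.xml, the ids, and the text are assumptions for the example only; the instruction sits inside a section and points at a para in another file.

```xml
<!-- Hypothetical source fragment, before pre-processing. -->
<section xmlns="http://docbook.org/ns/docbook" xml:id="install">
  <title>Installing</title>
  <?transclude-element license-note target-file=shared.xml ?>
</section>
```

```xml
<!-- The same fragment afterwards, assuming shared.xml contains
     a para with xml:id="license-note". -->
<section xmlns="http://docbook.org/ns/docbook" xml:id="install">
  <title>Installing</title>
  <para xml:id="license-note">Read the license before you install.</para>
</section>
```

Because "make-ids-unique" is not present, the xml:id of the copied para is left unchanged.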

The custom XSLT 2.0 stylesheet that handles the processing instruction (below) figures out whether the transcluded element will be the child of a book element and transforms sections into chapters. If the transcluded element will be the child of a chapter or section, it transforms chapters into sections. Otherwise it just writes the target element in place of the processing instruction.
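A sketch of the section-to-chapter case, using invented names: suppose a book contains the instruction <?transclude-element backup-steps target-file=admin.xml ?> as a direct child, and admin.xml (a hypothetical file) holds this section.

```xml
<!-- Hypothetical target element in admin.xml. -->
<section xmlns="http://docbook.org/ns/docbook" xml:id="backup-steps">
  <title>Backing Up</title>
  <para>Stop the service before copying the data directory.</para>
</section>
```

The book should then receive a chapter in place of the instruction, roughly like this:

```xml
<!-- Expected shape of the result inside the book element. -->
<chapter xmlns="http://docbook.org/ns/docbook" xml:id="backup-steps">
  <title>Backing Up</title>
  <para>Stop the service before copying the data directory.</para>
</chapter>
```

Only the xml:id attribute is carried over to the new chapter; any other attributes on the original section are dropped by the conversion template.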

One reason I prefer this to xi:includes in some situations is that my XSLT processing chain only handles the simplest xpointers. Using this processing instruction allows me to reuse small elements like a table or paragraph that are deeply nested in other content.

Hopefully I will throw this away soon and start using topic and assembly to create modular documents in a more robust way. The custom XSLT that handles my processing instructions is pasted below. It's very specific to my element use (chapters and sections) so it will need to be altered to suit other environments. I use Saxon 9 for my pre-processing steps so that I can use XSLT 2.0.

Warning: I had to add steps to preserve the links between callouts and their callout bugs in programlistings when the stylesheet makes xml:ids unique. There are surely other situations in which this brutal technique will cause problems!
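To show what the callout handling has to preserve, here is a made-up listing as it might appear in a target element, followed by the same markup after transclusion with "make-ids-unique". The suffix d2e15 is a placeholder I invented; the real value comes from generate-id() and varies between processors and runs.

```xml
<!-- Hypothetical target element before transclusion. -->
<section xmlns="http://docbook.org/ns/docbook" xml:id="run-app">
  <title>Running the Application</title>
  <programlisting>java -jar app.jar <co xml:id="co-run" linkends="note-run"/></programlisting>
  <calloutlist>
    <callout xml:id="note-run" arearefs="co-run">
      <para>Starts the application.</para>
    </callout>
  </calloutlist>
</section>
```

```xml
<!-- After transclusion: every xml:id gets the suffix, and so do
     the linkends and arearefs values that point at those ids. -->
<section xmlns="http://docbook.org/ns/docbook" xml:id="run-appd2e15">
  <title>Running the Application</title>
  <programlisting>java -jar app.jar <co xml:id="co-rund2e15" linkends="note-rund2e15"/></programlisting>
  <calloutlist>
    <callout xml:id="note-rund2e15" arearefs="co-rund2e15">
      <para>Starts the application.</para>
    </callout>
  </calloutlist>
</section>
```

Note that the stylesheet appends the suffix once to the whole attribute value, so a linkends or arearefs attribute that lists several ids would only have its last id updated.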

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
  xmlns:fo="http://www.w3.org/1999/XSL/Format"
  xmlns:d="http://docbook.org/ns/docbook"
  xmlns:exsl="http://exslt.org/common"
  extension-element-prefixes="exsl"
  xmlns="http://docbook.org/ns/docbook">
  
  <xsl:output method="xml" indent="no" />
  <xsl:strip-space elements="d:title"/>
  
  <xsl:param name="current.docid" />
  
  <xsl:template match="@*|node()">
    <xsl:copy>
      <!-- If a chapter, section, or part does not have an 
           xml:id attribute, generate one. The parentheses matter:
           "and" binds more tightly than "or" in XPath. -->
      <xsl:if test="not(@xml:id) and (self::d:chapter or self::d:section or self::d:part)">
        <xsl:attribute name="xml:id">
          <xsl:value-of select="concat('storiant', generate-id(.))" />
        </xsl:attribute>
      </xsl:if>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>
  
  <!-- ************************************ -->
  <!-- ***  Transclude elements. -->
  <!-- ************************************ -->
  
  <xsl:template match="processing-instruction('transclude-element')">
    <xsl:variable name="pi.text">
      <xsl:value-of select="normalize-space(.)" />
    </xsl:variable>
    <xsl:variable name="transclusion.target.id">
      <xsl:value-of select="tokenize($pi.text, '\s')[position() = 1]" />
    </xsl:variable>
    <xsl:variable name="transclusion.target.file.path">
      <xsl:choose>
        <xsl:when test="tokenize($pi.text, '\s')[contains(., 'target-file=')]">
          <!-- Write the relative path from the stylesheet directory
               to the source directory. Brittle! -->
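          <!-- Pick the token that carries the file name, wherever
               it appears in the instruction. -->
          <xsl:variable name="current.instruction.token" 
                        select="tokenize($pi.text, '\s')[contains(., 'target-file=')][1]" />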
          <!-- The next line will surely not work anywhere 
               but my file system! -->
          <xsl:text>../../source/</xsl:text>
          <xsl:value-of 
            select="substring-after($current.instruction.token, 'target-file=')" />
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="document-uri(/)" />
        </xsl:otherwise>
      </xsl:choose>
    </xsl:variable>
    <xsl:variable name="generated.transclusion.id">
      <xsl:if test="contains($pi.text, 'make-ids-unique')">
        <xsl:value-of select="generate-id()" />
      </xsl:if>
    </xsl:variable>
    <xsl:variable name="transclusion.context.parent.name">
      <xsl:value-of select="name(parent::*)" />
    </xsl:variable>
    <xsl:choose>

      <!-- Determine the parent element of the processing 
           instruction and handle the target elements differently 
           according to the context. I convert chapters to 
           sections when they are going into a section and 
           convert sections to chapters when they are going 
           into a book. -->
      <xsl:when test="parent::d:book">
        <xsl:apply-templates 
          select="document($transclusion.target.file.path)//*[@xml:id=$transclusion.target.id]" 
          mode="transclude.element.book.parent">  
          <xsl:with-param name="transclusion.id" 
                          select="$generated.transclusion.id" 
                          tunnel="yes" />
        </xsl:apply-templates>
      </xsl:when>
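      <xsl:when test="parent::d:chapter or parent::d:section">
        <!-- Convert a chapter target to a section; copy any 
             other target element unchanged. -->
        <xsl:variable name="transclusion.target"
          select="document($transclusion.target.file.path)//*[@xml:id=$transclusion.target.id]" />
        <xsl:apply-templates 
          select="$transclusion.target[self::d:chapter]" 
          mode="transclude.element.chapter.parent">  
          <xsl:with-param name="transclusion.id" 
            select="$generated.transclusion.id" tunnel="yes" />
        </xsl:apply-templates>
        <xsl:apply-templates 
          select="$transclusion.target[not(self::d:chapter)]" 
          mode="transclude.element">  
          <xsl:with-param name="transclusion.id" 
            select="$generated.transclusion.id" tunnel="yes" />
        </xsl:apply-templates>
      </xsl:when>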
      <xsl:otherwise>
        <xsl:apply-templates 
          select="document($transclusion.target.file.path)//*[@xml:id=$transclusion.target.id]" 
          mode="transclude.element">  
          <xsl:with-param name="transclusion.id" 
            select="$generated.transclusion.id" tunnel="yes" />
        </xsl:apply-templates>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
  
  <xsl:template match="d:section" mode="transclude.element.book.parent">
    <xsl:param name="transclusion.id" required="yes" tunnel="yes" />
    <xsl:element name="chapter">
      <xsl:if test="@xml:id">
        <xsl:attribute name="xml:id">
          <xsl:value-of select="@xml:id" />
          <xsl:value-of select="$transclusion.id" />
        </xsl:attribute>
      </xsl:if>
      <xsl:apply-templates mode="transclude.element">  
        <xsl:with-param name="transclusion.id" 
          select="$transclusion.id" tunnel="yes" />
      </xsl:apply-templates>
    </xsl:element>
  </xsl:template>
  
  <xsl:template match="d:chapter" mode="transclude.element.chapter.parent">
    <xsl:param name="transclusion.id" required="yes" tunnel="yes" />
    <xsl:element name="section">
      <xsl:if test="@xml:id">
        <xsl:attribute name="xml:id">
          <xsl:value-of select="@xml:id" />
          <xsl:value-of select="$transclusion.id" />
        </xsl:attribute>
      </xsl:if>
      <xsl:apply-templates mode="transclude.element">  
        <xsl:with-param name="transclusion.id" 
          select="$transclusion.id" tunnel="yes" />
      </xsl:apply-templates>
    </xsl:element>
  </xsl:template>
  
  
  <xsl:template match="@xml:id" mode="transclude.element">
    <xsl:param name="transclusion.id" required="yes" tunnel="yes" />
    <xsl:attribute name="xml:id">
      <xsl:value-of select="." />
      <xsl:value-of select="$transclusion.id" />
    </xsl:attribute>
  </xsl:template>

  <!-- The next two templates preserve the relationship 
       between callouts and callout bugs. -->
  
  <xsl:template match="d:co/@linkends" mode="transclude.element">
    <xsl:param name="transclusion.id" required="yes" tunnel="yes" />
    <xsl:attribute name="linkends">
      <xsl:value-of select="." />
      <xsl:value-of select="$transclusion.id" />
    </xsl:attribute>
  </xsl:template>
  
  <xsl:template match="d:callout/@arearefs" mode="transclude.element">
    <xsl:param name="transclusion.id" required="yes" tunnel="yes" />
    <xsl:attribute name="arearefs">
      <xsl:value-of select="." />
      <xsl:value-of select="$transclusion.id" />
    </xsl:attribute>
  </xsl:template>

  <!-- I use olinks for cross-references. The next template 
       assumes that if it finds an xref, it must be pointing to 
       something very local and preserves the relationship. If 
       you use xref for cross-references, this will likely 
       cause problems. -->
  
  <xsl:template match="d:xref/@linkend" mode="transclude.element">
    <xsl:param name="transclusion.id" required="yes" tunnel="yes" />
    <xsl:attribute name="linkend">
      <xsl:value-of select="." />
      <xsl:value-of select="$transclusion.id" />
    </xsl:attribute>
  </xsl:template>

  
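  <!-- Fallback for all three modes: copy the node. Without this, 
       a target that is not converted (for example, a chapter 
       going into a book) would fall through to the built-in 
       template rules and lose its markup. -->
  <xsl:template match="@*|node()" 
                mode="transclude.element transclude.element.book.parent transclude.element.chapter.parent">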
    <xsl:param name="transclusion.id" required="yes" tunnel="yes" />
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" mode="transclude.element">  
        <xsl:with-param name="transclusion.id" 
                        select="$transclusion.id" 
                        tunnel="yes" />
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
  
</xsl:stylesheet>

Sunday, March 8, 2015

Transforming Scribus Source File for Text Output

I've been working, very slowly, on a publication for the town I live in. It's a book of trail maps and information about open land in the area. I wanted to use an open tool for the book layout, partly because I don't have a copy of the commercial tools that are normally used for that, and partly because using free, open tools will make it easier for other people to edit the book document in the future.

I found the Scribus project and I am now completing the book with it. It took a little time to learn the functions, but I'm really happy with it. The PDF output is great, and I found all the layout adjustments that I needed. Thanks, Scribus!

One tricky problem I encountered was in sharing the book content for review. I was not able to find an easy way to export the written content as text so I could paste it into a word processing file. Scribus is free, but the people who are reviewing this book are not likely to enjoy it as much as I do. I tried copying the text from the text frames and pasting it. This was tedious, and the copied text did not include any line breaks, so I had to hunt through the pasted text and insert a line break between paragraphs.

Since I am likely to submit more versions of the book for review, I thought it would be useful to have an automated way to extract the text. Scribus uses XML source files, so I wrote XSLT to transform them to plain text.

Here's the XSLT stylesheet that will write the text of a Scribus document to a plain text file. It handles my document, which is not very complex. One challenging aspect of Scribus source XML is that character strings are held in elements that are siblings of the paragraph markers. So the export XSLT has to include logic for recognizing the paragraph structures. I prefer explicit structures in XML documents, like wrapping contents inside a <para> element, but I guess the Scribus team had their reasons for keeping things flat.

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
  xmlns:exsl="http://exslt.org/common"
  extension-element-prefixes="exsl">
  
  <!-- 
  #!/bin/bash
  # Example bash script for transforming a Scribus file.
  # This requires an XSLT 2.0 compatible XSLT processor, 
  # in this case Saxon v9.

  SAXON_JAR_PATH="/path/to/saxon/download/saxon9he.jar"
  INPUT_FILE="my-scribus-file.sla"
  OUTPUT_FILE="my-output-text-file.txt"
  XSLT_FILE="this-xslt-file.xslt"

  java -classpath "$SAXON_JAR_PATH" \
  net.sf.saxon.Transform \
  -s:"$INPUT_FILE" \
  -xsl:"$XSLT_FILE" \
  -o:"$OUTPUT_FILE"
  -->
  
  <xsl:output method="text" />
  
  <!-- Start by explicitly selecting the root of the DOM, and then 
  applying templates to PAGEOBJECT elements. 
  
  Sort the selected elements by the OwnPage attributes, which
  indicate the document order. 
  
  Then sort the selected elements by the YPOS attributes, which 
  indicate the order on the page, roughly, and assuming you're
  reading from top to bottom. Pick a different attribute for 
  the secondary sort if you prefer. -->
  
  <xsl:template match="/">
    <xsl:apply-templates select="//PAGEOBJECT">
      <xsl:sort select="@OwnPage" data-type="number" />
      <xsl:sort select="@YPOS" data-type="number" />
    </xsl:apply-templates>
  </xsl:template>
  
  <!-- Working with Scribus XML is tricky because the para 
  elements are siblings of the ITEXT elements that hold text 
  strings. I would have expected nested elements. But I guess 
  there's a Scribus-related reason. -->
   
  <xsl:template match="ITEXT">
    <xsl:value-of select="@CH" />
  </xsl:template>
  
  <!-- Write a tab character if the ITEXT is followed by a 
  tab element.  -->
  
  <xsl:template match="ITEXT[name(following-sibling::*[1])='tab']">
    <xsl:value-of select="@CH" />
    <xsl:text>&#x9;</xsl:text>
  </xsl:template>
  
  <!-- The newline character in the text element creates a line
  break between the paragraphs of your Scribus document. I believe
  the trail element is equivalent to the end of a paragraph.  -->
  
  <xsl:template match="ITEXT[name(following-sibling::*[1])=('para', 'trail')]">
    <xsl:value-of select="@CH" />
    <xsl:text>&#xa;</xsl:text>
  </xsl:template>
  
  <!-- I don't want lots of newlines and space so text nodes
     must be suppressed. -->
  <xsl:template match="text()" />
  
</xsl:stylesheet>
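To make the flat structure concrete, here is a heavily simplified, made-up fragment in the shape the stylesheet expects. PAGEOBJECT, ITEXT, para, trail, and the CH, OwnPage, and YPOS attributes come from the stylesheet above; the wrapper element names are from my memory of Scribus 1.4 files and should be treated as assumptions. Real files carry many more attributes.

```xml
<!-- Simplified, hypothetical Scribus fragment. The text runs
     (ITEXT) and the paragraph markers (para, trail) are siblings. -->
<SCRIBUSUTF8NEW>
  <DOCUMENT>
    <PAGEOBJECT OwnPage="0" YPOS="100">
      <ITEXT CH="Blue Trail"/>
      <para/>
      <ITEXT CH="An easy loop around the pond."/>
      <trail/>
    </PAGEOBJECT>
  </DOCUMENT>
</SCRIBUSUTF8NEW>
```

Run through the stylesheet, this should produce two lines of plain text, "Blue Trail" and "An easy loop around the pond.", each followed by a newline, because each ITEXT is immediately followed by a para or trail element.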