From DocBook to EPUB
DocBook is a popular choice for developers who need to maintain long-form technical documentation. Unlike the files produced by traditional word-processing programs, DocBook files are plain text, so you can manage them with text-based version-control systems. Because DocBook is XML, you can easily transform it into multiple output formats. Since the summer of 2008, the official DocBook XSL project has natively supported EPUB as an output format.
Running the basic DocBook-to-EPUB pipeline with XSLT
Start with a simple DocBook document, shown in Listing 14. This document is defined as type book and includes a preface, two chapters, and an inline image displayed on the title page. The image is expected to be in the same directory as the DocBook source file. Create this file and the title page image yourself, or download samples from Downloads.
Listing 14. A simple DocBook book
<?xml version="1.0" encoding="utf-8"?>
<book>
  <bookinfo>
    <title>My EPUB book</title>
    <author><firstname>Liza</firstname>
      <surname>Daly</surname></author>
    <volumenum>1234</volumenum>
  </bookinfo>
  <preface id="preface">
    <title>Title page</title>
    <figure id="cover-image">
      <title>Our EPUB cover image icon</title>
      <graphic fileref="cover.png"/>
    </figure>
  </preface>
  <chapter id="chapter1">
    <title>This is a pretty simple DocBook example</title>
    <para>
      Not much to see here.
    </para>
  </chapter>
  <chapter id="end-notes">
    <title>End notes</title>
    <para>
      This space intentionally left blank.
    </para>
  </chapter>
</book>
Next, see Resources to download the latest version of the DocBook XSL stylesheets, and make sure that you have an XSLT processor such as xsltproc or Saxon installed. This example uses xsltproc, which is available on most UNIX-like systems. To convert the DocBook file, just run that file against the EPUB module included in DocBook XSL, as in Listing 15.
Listing 15. Converting DocBook into EPUB
$ xsltproc /path/to/docbook-xsl-1.74.0/epub/docbook.xsl docbook.xml
Writing OEBPS/bk01-toc.html for book
Writing OEBPS/pr01.html for preface(preface)
Writing OEBPS/ch01.html for chapter(chapter1)
Writing OEBPS/ch02.html for chapter(end-notes)
Writing OEBPS/index.html for book
Writing OEBPS/toc.ncx
Writing OEBPS/content.opf
Writing META-INF/container.xml
Next, add the mimetype file and build the epub+zip archive yourself. Listing 16 shows those three quick commands and the result of a pass through the EpubCheck validator.
Listing 16. Creating the EPUB archive from DocBook
$ printf "application/epub+zip" > mimetype
$ zip -0Xq my-book.epub mimetype
$ zip -Xr9D my-book.epub *
$ java -jar epubcheck.jar my-book.epub
No errors or warnings detected
Pretty easy! Figure 3 shows your creation in ADE.
Figure 3. Converted DocBook EPUB in ADE
Automatic DocBook-to-EPUB conversion with Python and lxml
The DocBook XSL goes a long way toward making EPUB generation painless, but you must perform a few steps outside XSLT. This last section demonstrates a sample Python program that completes the creation of a valid EPUB bundle. I show individual methods in the tutorial; you can get the complete docbook2epub.py program from Downloads.
Several Python XSLT libraries are available, but my preference is lxml. It provides not just XSLT 1.0 functionality but also high-performance parsing, full XPath 1.0 support, and special extensions for handling HTML. If you prefer a different library or use a different programming language than Python, these examples should be easy to adapt.
Calling the DocBook XSL with lxml
The most efficient method to call XSLT using lxml is to parse the XSLT in advance, then create a transformer for repeated use. This is useful, as my DocBook-to-EPUB script accepts multiple DocBook files to convert. Listing 17 demonstrates this approach.
Listing 17. Running the DocBook XSL using lxml
import os.path
from lxml import etree

def convert_docbook(docbook_file):
    docbook_xsl = os.path.abspath('docbook-xsl/epub/docbook.xsl')
    # Give the XSLT processor the ability to create new directories
    xslt_ac = etree.XSLTAccessControl(read_file=True,
                                      write_file=True,
                                      create_dir=True,
                                      read_network=True,
                                      write_network=False)
    transform = etree.XSLT(etree.parse(docbook_xsl), access_control=xslt_ac)
    transform(etree.parse(docbook_file))
The EPUB module in DocBook XSL creates the output files itself, so nothing is returned from the evaluation of the transform here. Instead, DocBook XSL creates two folders (META-INF and OEBPS) in the current working directory that contain the results of the conversion.
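Because the transform returns nothing useful, one way to confirm that it worked is to look for the files it should have written. The following is a minimal sketch and is not part of docbook2epub.py: the missing_output helper and the EXPECTED_OUTPUT list are hypothetical names, and the list is simply taken from the file names in the xsltproc output in Listing 15.

```python
import os

# Files that the xsltproc run in Listing 15 reports writing.
# This list is an assumption for illustration, not an exhaustive inventory.
EXPECTED_OUTPUT = [
    os.path.join('META-INF', 'container.xml'),
    os.path.join('OEBPS', 'content.opf'),
    os.path.join('OEBPS', 'toc.ncx'),
]

def missing_output(path='.'):
    '''Return the expected output files that do not exist under path.'''
    return [name for name in EXPECTED_OUTPUT
            if not os.path.isfile(os.path.join(path, name))]
```

An empty list from missing_output() suggests that the conversion ran to completion; any names it returns point to output that was never written.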
Copying the images and other resources into the archive
DocBook XSL does not copy any images that you supply for use with your document; it only creates the metadata files and the rendered XHTML. Because the EPUB specification requires that all resources be listed in the content.opf manifest, you can inspect the manifest to find any images that were referenced in the original DocBook file. Listing 18 shows this technique, which assumes that the path variable contains the path to your EPUB-in-progress, as created by the DocBook XSLT.
Listing 18. Parse the OPF content file to find any missing resources
import os.path, shutil
from lxml import etree

def find_resources(path='/path/to/our/epub/directory'):
    opf = etree.parse(os.path.join(path, 'OEBPS', 'content.opf'))
    # All the opf:item elements are resources
    for item in opf.xpath('//opf:item',
                          namespaces={'opf': 'http://www.idpf.org/2007/opf'}):
        # If the resource was not already created by DocBook XSL itself,
        # copy it into the OEBPS folder
        href = item.attrib['href']
        referenced_file = os.path.join(path, 'OEBPS', href)
        if not os.path.exists(referenced_file):
            shutil.copy(href, os.path.join(path, 'OEBPS'))
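Each manifest item also carries a media-type attribute, which can narrow the search to images alone. The sketch below uses the standard library's xml.etree.ElementTree instead of lxml so that it runs without extra dependencies. SAMPLE_OPF is a hypothetical, trimmed-down manifest rather than real DocBook XSL output, and manifest_hrefs and image_hrefs are illustrative names.

```python
import xml.etree.ElementTree as ET

OPF_NS = {'opf': 'http://www.idpf.org/2007/opf'}

# Hypothetical manifest for illustration; a real content.opf has more entries
SAMPLE_OPF = '''<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="ch01" href="ch01.html" media-type="application/xhtml+xml"/>
    <item id="cover" href="cover.png" media-type="image/png"/>
  </manifest>
</package>'''

def manifest_hrefs(opf_text):
    '''Return the href of every item in the manifest, in document order.'''
    root = ET.fromstring(opf_text)
    return [item.get('href') for item in root.findall('.//opf:item', OPF_NS)]

def image_hrefs(opf_text):
    '''Return only the hrefs whose media-type marks them as images.'''
    root = ET.fromstring(opf_text)
    return [item.get('href')
            for item in root.findall('.//opf:item', OPF_NS)
            if item.get('media-type', '').startswith('image/')]
```

Filtering on media-type is a design choice for this sketch; Listing 18 instead copies any manifest entry that is missing on disk, which also catches non-image resources.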
Creating the mimetype file automatically
DocBook XSL won't create your mimetype file, either, but a quick bit of code from Listing 19 can take care of that.
Listing 19. Create the mimetype file
import os.path

def create_mimetype(path='/path/to/our/epub/directory'):
    # Be careful not to add a newline here
    with open(os.path.join(path, 'mimetype'), 'w') as f:
        f.write('application/epub+zip')
Creating the EPUB bundle with Python
All that's left now is to bundle the files into a valid EPUB ZIP archive. This takes two steps: adding the mimetype file as the first in the archive with no compression, then adding the remaining directories. Listing 20 shows the code for this process.
Listing 20. Using the Python zipfile module to create an EPUB bundle
import zipfile, os

def create_archive(path='/path/to/our/epub/directory'):
    '''Create the ZIP archive. The mimetype must be the first file in the archive
    and it must not be compressed.'''
    epub_name = '%s.epub' % os.path.basename(path)
    # The EPUB must contain the META-INF and mimetype files at the root, so
    # we'll create the archive in the working directory first and move it later
    os.chdir(path)
    # Open a new zipfile for writing
    epub = zipfile.ZipFile(epub_name, 'w')
    # Add the mimetype file first and set it to be uncompressed
    epub.write('mimetype', compress_type=zipfile.ZIP_STORED)
    # For the remaining paths in the EPUB, add all of their files
    # using normal ZIP compression
    for p in os.listdir('.'):
        # Skip plain files such as mimetype and the archive itself
        if not os.path.isdir(p):
            continue
        for f in os.listdir(p):
            epub.write(os.path.join(p, f), compress_type=zipfile.ZIP_DEFLATED)
    epub.close()
That's it! Remember to validate.
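Before handing the file to EpubCheck, a quick structural check can catch the most common packaging mistakes. The following is a minimal sketch using only the zipfile module; check_epub_layout is a hypothetical helper, not part of docbook2epub.py, and it covers only the mimetype and container.xml rules described in this section. It is not a substitute for a full validator.

```python
import zipfile

def check_epub_layout(epub_path):
    '''Return a list of layout problems found in the archive (empty if none).'''
    problems = []
    with zipfile.ZipFile(epub_path) as epub:
        entries = epub.infolist()
        if not entries or entries[0].filename != 'mimetype':
            problems.append('mimetype is not the first entry')
        else:
            # The first entry must be stored, not deflated
            if entries[0].compress_type != zipfile.ZIP_STORED:
                problems.append('mimetype is compressed')
            # The content must match exactly, with no trailing newline
            if epub.read('mimetype') != b'application/epub+zip':
                problems.append('mimetype has unexpected content')
        if 'META-INF/container.xml' not in epub.namelist():
            problems.append('META-INF/container.xml is missing')
    return problems
```

Calling check_epub_layout('my-book.epub') on an archive built as described above should return an empty list; any strings it returns name the rule that was broken.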




