Polish the EPUB

Find and correct problems in EPUB files

From the developerWorks archives

Colin Beckingham

Date archived: December 6, 2016 | First published: August 30, 2011

Normal validation methods cannot detect some problems in EPUB documents. A document that is well-formed XML and conforms to the EPUB standard can appear correct and still read poorly in an e-reader. Examples include broken paragraphs, bad page numbering, and spelling errors introduced by OCR scanning. You can find and correct such errors in two ways: with the EPUB editor Sigil, or with PHP in combination with the SimpleXML and Enchant libraries. Regular expressions provide the key to efficient processing.
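To make the broken-paragraph case concrete, here is a minimal sketch of how a regular expression might rejoin a paragraph that was split mid-sentence. It is written in Python for illustration only and is not the article's PHP code. The heuristic, the pattern name `BROKEN_PARAGRAPH`, and the function `join_broken_paragraphs` are assumptions introduced here. It also assumes bare `<p>` tags without attributes.

```python
import re

# Assumed heuristic (not from the article): a paragraph that ends without
# terminal punctuation, followed by a paragraph that starts with a lowercase
# letter, was probably split in error.
BROKEN_PARAGRAPH = re.compile(r'([^.!?:;"”\s>])\s*</p>\s*<p>\s*(?=[a-z])')


def join_broken_paragraphs(xhtml: str) -> str:
    """Merge paragraphs that appear to have been split mid-sentence."""
    # Keep the last character of the first paragraph and replace the
    # closing/opening tag pair with a single space.
    return BROKEN_PARAGRAPH.sub(r'\1 ', xhtml)


sample = (
    "<p>The scanner split this</p>\n"
    "<p>sentence in two.</p>\n"
    "<p>A new paragraph.</p>"
)
print(join_broken_paragraphs(sample))
```

A heuristic like this can produce false positives, for example a heading followed by a paragraph that begins with a lowercase word, so it is safer to review each proposed change than to apply the substitution blindly.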

This content is no longer being updated or maintained. The full article is provided "as is" in a PDF file. Given the rapid evolution of technology, some steps and illustrations may have changed.
