Manage and convert electronic publications using Calibre

Make documents easier to use

William von Hagen, Systems Administrator, Writer, WordSmiths
William von Hagen has been a writer and UNIX systems administrator for more than 20 years and a Linux advocate since 1993. Bill is the author or co-author of books on subjects such as Ubuntu Linux, Xen Virtualization, the GNU Compiler Collection (GCC), SUSE Linux, Mac OS X, Linux file systems, and SGML. He has also written numerous articles for Linux and Mac OS X publications and Web sites. You can reach Bill at wvh@vonhagen.org.

Summary:  Most product documentation and literature today is distributed in PDF format, and many books are available in a variety of electronic publishing formats. Applications for reading these documents are available for every smart phone, portable computer, and tablet, and there are dedicated eBook readers such as the Kindle, Nook, and Kobo eReader. Unfortunately, these eBook readers support different electronic publishing formats, which limits each publication to the devices that can read its format. This article introduces Calibre, an open source application that enables you to convert books and documents among these formats.

Date:  02 Aug 2011
Level:  Intermediate

Electronic publishing: Green and convenient

Continuous access to information is a basic assumption for today's mobile workforce and always-on society. Web-based content is inherently available from any networked, browser-equipped mobile device, but longer content is most conveniently stored locally on your mobile or portable device as documents in a variety of electronic publishing formats.

Capitalizing on the ubiquity of devices such as tablets, smart phones, and eBook readers, more organizations are delivering product literature and documentation in electronic publishing formats for both internal and customer consumption. Delivering documents electronically gets them into consumer and customer hands quicker, saves money, conserves paper, and simplifies document updates. By giving customers control over where and how they access your publications and by making it easier to access them, electronic publishing provides direct benefits to both consumers and publishers.


Problems in portable publishing

Electronic publishing solves many of the issues in distributing and accessing documents but introduces a new problem: the format in which electronic content is delivered. Table 1 lists the most common electronic publishing formats and the devices that support them, though many other formats are actively in use. As this table illustrates, every electronic document reader, whether an application or a dedicated device, supports only a limited number of these formats. When manually creating electronic content, corporate publishers must therefore select the "right" electronic publishing formats: at least those supported on the devices and applications used internally and by customers.


Table 1. Electronic publishing formats and supported devices
Format | Description | eReader support
TXT | Plain text format | All
HTML | Hypertext Markup Language | Most (exception: Barnes & Noble Nook)
PDF | Adobe® Portable Document Format; not mobile friendly unless text re-flow is enabled during document creation | Most
EPUB | Widely supported open format for electronic publishing (see Resources for more information) | Most (exception: Amazon Kindle)
MOBI | Follows the specifications of the Open eBook (OEB) format; may also have a PRC (Palm) or AZW (Amazon DRM-enabled MOBI) extension | Most (exception: Nook)
LIT | Microsoft's literature format | Microsoft® Reader application only
PDB | Extension refers to different formats on different devices | Sony Reader, open source Plucker application, TealDoc documents on Palm and compatible devices
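To make the format-selection problem concrete, the following Python sketch encodes part of Table 1 and reports which formats remain usable across a set of target devices. It is a simplification made for illustration: "Most" is modelled as "every device except the listed exception", the short device names "Kindle" and "Nook" are shorthand, and LIT and PDB are left out because the table ties them to specific applications.

```python
# Exceptions transcribed from Table 1. Assumption: "Most" means
# "all devices except the ones named as exceptions".
UNSUPPORTED = {
    "TXT": set(),
    "HTML": {"Nook"},
    "PDF": set(),
    "EPUB": {"Kindle"},
    "MOBI": {"Nook"},
}


def formats_for(devices):
    """Return the formats that none of the given devices is listed as rejecting."""
    wanted = set(devices)
    return sorted(fmt for fmt, excluded in UNSUPPORTED.items()
                  if not excluded & wanted)


if __name__ == "__main__":
    print(formats_for(["Kindle", "Nook"]))
```

For a readership that uses both Kindle and Nook devices, only plain text and PDF survive this check, which is the gap that format conversion is meant to close.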


The free and open source Calibre application provides a convenient translator for all of these formats. It converts electronic publications from one format to another, liberating electronic documents from the restrictions of a specific format or device. Calibre can read all common electronic publishing formats and can also import documents in PDF, text, and HTML format. You can then use Calibre's conversion capability to save those imported documents in the particular format you need.

Corporate publishers should keep in mind that converting electronic publications between different formats for corporate use may require purchasing additional copies of those publications. If in doubt, consult your organization's legal staff before converting and redistributing electronic publications that your company does not own. Converting electronic publications between different formats for personal use is generally viewed as acceptable use as long as you do not remove any DRM mechanisms present in those publications and respect any redistribution limitations that the original publisher imposed.


Obtain and install Calibre

Installable versions of Calibre for Windows® and Apple Mac OS X operating systems are available from the Calibre download page (see Resources). Calibre for Linux® is provided as a package in the repositories for most Linux distributions and can be installed using your distribution's package management utilities—typically, apt-get, rpm, Synaptic, or yum. If the repositories for your distribution do not include a prepackaged version of Calibre or you want to install a newer version of Calibre than what is available for your Linux distribution, you can download and install the latest version of Calibre over the Internet using the following command:

python -c "import urllib2; \
  exec urllib2.urlopen('http://status.calibre-ebook.com/linux_installer').read(); main()"

You must execute this installation command as a privileged user by using the su or sudo command. Type the command on a single line, without the backslash character (\), which is used in this example only for formatting purposes.

After entering this command, the latest Calibre package for Linux is downloaded to your system. You are then prompted for the name of the directory in which you want to install Calibre:

Enter the installation directory for calibre [/opt]:

Enter the name of the directory in which the installer should create a calibre sub-directory to hold Calibre and associated applications, data files, and libraries. You can also simply press Return to accept the default value, /opt, which will create the /opt/calibre directory and install Calibre in that location. Installing Calibre over the Internet creates symbolic links in your system's /usr/bin directory that point to the Calibre application and other applications in the Calibre package.
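After installation, you may want to confirm that the installed commands are reachable from your shell. The following Python sketch looks up the commands named in this article on your PATH. The function name find_calibre_tools is hypothetical, and the tool list is limited to the commands this article mentions rather than everything the Calibre package installs.

```python
import shutil

# Command names taken from this article; not the full set Calibre installs.
CALIBRE_TOOLS = ("calibre", "ebook-convert", "ebook-meta")


def find_calibre_tools(tools=CALIBRE_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in tools}


if __name__ == "__main__":
    for name, path in find_calibre_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

A value of None for any entry suggests that the symbolic links were not created or that /usr/bin is not on your PATH.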

Import documents into a Calibre library

Calibre is a graphical application that uses the Qt UI framework (see Resources) to provide a standard, cross-platform GUI. Installing Calibre also provides a number of command-line tools, discussed later.

When using Calibre in graphical mode, you must import your existing documents into your Calibre library before you can convert, annotate, or otherwise modify them. To select the documents that you want to import, click the Add books icon on the Calibre toolbar. In the navigation window that appears, browse to and select those documents, then import them by clicking Open.

Calibre's main window displays summary information about the documents currently in your Calibre library. When you first start Calibre, this listing contains only the Calibre Quick Start Guide, included to help you get started using Calibre by walking you through its basic capabilities. After you have imported documents into your Calibre library, the summary listing also displays entries for those documents, as shown in Figure 1.


Figure 1. The main Calibre window showing imported documents
Screenshot of the main window showing function buttons along the top with a three column area below containing category lists, document lists and a preview of the currently selected document


Selecting any document in your library displays a preview of that document in the right pane. The output formats in which that document is currently available are listed below the preview image.

After you have imported documents into Calibre, you can click icons in its toolbar to perform various tasks, including converting those documents to other formats and adding or augmenting document metadata. See the Calibre documentation or online help for a complete list of toolbar icons and their associated tasks.

Convert documents between electronic formats

Calibre's Convert books toolbar icon makes it easy to convert documents in your Calibre library to other electronic formats. Selecting one or more documents in your library and clicking this icon displays Calibre's primary conversion window, shown in Figure 2.


Figure 2. Calibre's primary conversion window
Screenshot of the conversion window showing the input and output format selections at the top. Below that is a column of option windows with Metadata selected. The rest of the window shows the Metadata options.


The Input format list at the upper left identifies the format of a selected publication, while the Output format list on the right enables you to select the electronic publishing format to which you want to convert that publication. Use the buttons in the left pane to customize various aspects of the conversion process specific to your input and output formats. These options let you customize the default conversion process by, for example, specifying logical expressions that identify the portions of the input publication that should be treated as entries for the table of contents in the converted document.

After you have made your customizations to the conversion process, clicking OK begins the conversion as a background operation and redisplays the main Calibre window. A status indicator in the lower-left corner identifies the number of conversion operations in progress. When the process is complete, Calibre updates the list of output formats in which a document is available to include the newly generated formats.
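The table-of-contents expressions mentioned above are XPath expressions, and the same customizations can be passed to Calibre's ebook-convert command-line tool. The following Python sketch builds such a command. The helper names, file names, and option values are illustrative assumptions; --level1-toc and --output-profile are existing ebook-convert options, but check the tool's help output for the values your version accepts.

```python
import subprocess


def build_convert_command(src, dst, level1_toc=None, output_profile=None):
    """Build an ebook-convert argument list; the output extension picks the format."""
    cmd = ["ebook-convert", src, dst]
    if level1_toc:
        # XPath expression selecting the elements that become top-level TOC entries
        cmd += ["--level1-toc", level1_toc]
    if output_profile:
        # Device profile that tunes the output for a class of reader
        cmd += ["--output-profile", output_profile]
    return cmd


def convert(src, dst, **options):
    """Run the conversion; raises CalledProcessError if ebook-convert fails."""
    subprocess.run(build_convert_command(src, dst, **options), check=True)


if __name__ == "__main__":
    # Hypothetical file names; //h:h1 treats every first-level heading as a TOC entry.
    print(build_convert_command("manual.html", "manual.epub", level1_toc="//h:h1"))
```

Keeping command construction separate from execution lets you inspect or log the exact command before running it.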


Add or modify document metadata with Calibre

Metadata, often referred to as data about data, is information about another piece of information. Documents typically contain metadata such as the author, publisher, and publication date. Calibre's Edit metadata command makes it easy to display the current metadata associated with a selected document, enabling you to add, edit, or remove document metadata. Selecting a document in your library and clicking the Edit metadata icon displays Calibre's Edit Metadata window, as shown in Figure 3.


Figure 3. Calibre's Edit Metadata window
Screenshot of the metadata window showing all of the options that can be changed and a tool to manage formats tied to the selected document


To add new metadata or edit existing entries, select the appropriate field and make your changes or additions. Click OK to save the modified metadata in the document, or click Cancel to discard your changes.
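The same kind of metadata can also be written without the GUI by using Calibre's ebook-meta command-line tool. The following Python sketch builds an ebook-meta command for a few common fields. The helper names and sample values are hypothetical; --title, --authors, and --publisher are existing ebook-meta options, and the tool expects multiple authors to be separated by an ampersand.

```python
import subprocess


def build_meta_command(path, title=None, authors=None, publisher=None):
    """Build an ebook-meta argument list that sets only the fields provided."""
    cmd = ["ebook-meta", path]
    if title:
        cmd += ["--title", title]
    if authors:
        # ebook-meta takes several authors as one "&"-separated string
        cmd += ["--authors", " & ".join(authors)]
    if publisher:
        cmd += ["--publisher", publisher]
    return cmd


def write_metadata(path, **fields):
    """Apply the metadata; raises CalledProcessError if ebook-meta fails."""
    subprocess.run(build_meta_command(path, **fields), check=True)


if __name__ == "__main__":
    # Hypothetical document and values
    print(build_meta_command("guide.epub", title="Install Guide",
                             authors=["A. Writer", "B. Editor"]))
```

With no fields supplied, the command contains only the file name, which makes ebook-meta display the document's current metadata instead of changing it.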


Automate electronic publishing using Calibre's command-line utilities

Installing Calibre also installs command-line tools for working with electronic publications. These tools include the ebook-convert application for converting documents and the ebook-meta application for adding or modifying document metadata. Calibre's command-line utilities eliminate the need to import documents into your Calibre library and are convenient if you are integrating document conversion or automatic metadata insertion into an automated document build or production system.
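As an illustration of build-system integration, the following Python sketch converts every HTML file in a directory to EPUB by calling ebook-convert once per file. The directory layout, function names, and default extensions are assumptions made for this example and are not part of Calibre.

```python
import subprocess
from pathlib import Path


def plan_conversions(source_dir, output_dir, source_ext=".html", target_ext=".epub"):
    """Return (input, output) path pairs for every matching file in source_dir."""
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    return [
        (src, output_dir / (src.stem + target_ext))
        for src in sorted(source_dir.glob(f"*{source_ext}"))
    ]


def run_conversions(pairs):
    """Call ebook-convert for each pair; stops at the first failed conversion."""
    for src, dst in pairs:
        dst.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(["ebook-convert", str(src), str(dst)], check=True)


if __name__ == "__main__":
    # Hypothetical build directories
    for src, dst in plan_conversions("build/html", "build/epub"):
        print(f"{src} -> {dst}")
```

Because planning is separate from execution, a build script can print or review the list of conversions before any of them run.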

Calibre's command-line utilities provide the same capabilities that are available through its GUI; you enable or override specific behavior through command-line options and their arguments. Each utility accepts a -h option, which displays a list of the options available for that command. For example, to see the complete list of options for a particular conversion, run the ebook-convert command with the input document, the output document (whose file extension identifies the target format), and the -h option.
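The following Python sketch shows one way to capture that help text from a script, for example to record the options available in the installed version. The helper names and file names are hypothetical; the only Calibre behavior assumed is the -h option described above.

```python
import subprocess


def help_command(tool, *args):
    """Build the argument list that asks a Calibre tool for its option list."""
    return [tool, *args, "-h"]


def show_help(tool, *args):
    """Return the help text printed by the tool for the given arguments."""
    result = subprocess.run(help_command(tool, *args),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical file names; the extensions select the input and output formats.
    print(help_command("ebook-convert", "book.epub", "book.mobi"))
```

Passing the input and output names matters for ebook-convert because the options it reports depend on the formats involved.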


Tips and other tools for electronic document conversion

Calibre provides usable defaults for its conversion processes, but automatic format conversion is rarely perfect. Converting your documents to other formats usually benefits from iterative experimentation with available conversion options, typically involving an initial conversion, displaying the converted document to identify problems or areas for improvement in the conversion process, and re-converting the document after modifying your conversion settings.
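One way to organize this experimentation is to generate several variants of the same document, each with a different set of options, and compare them side by side. The following Python sketch builds one ebook-convert command per variant. The variant names and option values are illustrative assumptions; --level1-toc, --level2-toc, and --no-chapters-in-toc are existing ebook-convert options.

```python
from pathlib import Path

# Hypothetical option sets to compare; adjust them to your own documents.
VARIANTS = {
    "default": [],
    "no-auto-chapters": ["--no-chapters-in-toc"],
    "two-level-toc": ["--level1-toc", "//h:h1", "--level2-toc", "//h:h2"],
}


def variant_commands(src, target_ext=".epub", variants=VARIANTS):
    """Return one ebook-convert command per variant, each with its own output file."""
    src = Path(src)
    commands = {}
    for name, options in variants.items():
        dst = src.with_name(f"{src.stem}-{name}{target_ext}")
        commands[name] = ["ebook-convert", str(src), str(dst), *options]
    return commands


if __name__ == "__main__":
    for name, cmd in variant_commands("guide.html").items():
        print(name, cmd)
```

Each command can be run with subprocess.run, and the resulting files can then be opened in turn to see which settings produce the best table of contents.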

The area that most often needs tuning is the table of contents: which items in your document are identified as entries, and how links to the associated portions of the document are created. Figure 4 shows a converted document in Calibre's document previewer, which enables you to explore converted documents without copying them to your eReader device.


Figure 4. A converted document in Calibre's eBook previewer
Screenshot of the ebook previewer showing a table of contents for Technical Notes. A column of function buttons is on the left side.


Other ways of improving the conversion process include using other tools to perform an initial conversion to another format that may be easier for Calibre to convert. For example, when converting long technical documents, I sometimes find that the open source pdftohtml utility, provided on Linux systems as part of the poppler-utils package, does a better job of extracting text from documents in PDF format than Calibre's built-in text-to-HTML conversion. After converting documents manually with external tools, you can then use Calibre to generate electronic documents from the output files that the initial conversion process generates.
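The two-step approach described above can be scripted. The following Python sketch builds a pdftohtml command followed by an ebook-convert command for the generated HTML. The function names and file names are hypothetical, and the sketch assumes that pdftohtml with the -noframes option writes a single HTML file at the path given; verify the output naming with the version installed on your system.

```python
import subprocess
from pathlib import Path


def pipeline_commands(pdf_path, target_ext=".epub"):
    """Return the two commands: PDF to HTML, then HTML to the target format."""
    pdf = Path(pdf_path)
    html = pdf.with_suffix(".html")
    out = pdf.with_suffix(target_ext)
    return [
        # -noframes requests one HTML document instead of a frameset
        ["pdftohtml", "-noframes", str(pdf), str(html)],
        ["ebook-convert", str(html), str(out)],
    ]


def run_pipeline(pdf_path, target_ext=".epub"):
    """Run both steps in order; stops if either command fails."""
    for cmd in pipeline_commands(pdf_path, target_ext):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    for cmd in pipeline_commands("manual.pdf"):
        print(cmd)
```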

Many other open source tools exist for pre-conversion processing or for augmenting and improving the documents that Calibre has converted. For example, Sigil is a graphical WYSIWYG editor for documents in EPUB format that simplifies fine-tuning your converted documents, as shown in Figure 5.


Figure 5. Fine-tuning a converted EPUB document in Sigil
Screenshot of an epub document editor with a typical layout for an xml editor


You might feel that editing converted documents reduces the time and cost savings that Calibre's document conversion capabilities provide. However, exploring converted documents in tools such as Sigil can also help identify ways you can improve the conversion process. Similarly, you can often improve the conversion process by using specific techniques in the tool in which your electronic publications were originally created. As an example, the Resources section provides links to presentations that show how to improve the conversion of electronic documents that were created in Adobe InDesign®. Though such tips often focus on internal EPUB generation and conversion capabilities, they can also be useful when using other conversion tools, such as Calibre.


Conclusion

Like traditional publishers, organizations are finding it convenient and cost-effective to deliver documentation in electronic formats. Today's electronic publishing technologies deliver eBooks and other documents in a variety of formats, not all of which are usable on all eReaders and applications.

Calibre, an open source application, makes it easy to convert documents between different electronic publishing formats. Organizations can create documents in one format and use Calibre to quickly convert them to other formats, making those documents portable and easy for both internal users and customers to use.


Resources

Get products and technologies

  • Download the Calibre installer for Windows operating systems from the Calibre Windows download page.

  • Download the latest Calibre installer for Mac OS X from the Mac OS X download page. The latest version of Calibre for Apple PowerPC systems or Intel-based Apple systems running Mac OS X Tiger is version 0.7.28, which is separately available from the Calibre site at SourceForge.

  • LexCycle's free Stanza application enables you to read electronic publications on the Apple iPod Touch, iPhone, and iPad platforms.

  • See the poppler-utils website and package for the pdftohtml utility, which simplifies extracting text from PDF files.

  • Sigil is a free cross-platform WYSIWYG editor for eBooks in EPUB format.
