Building your first EPUB
A minimally conforming EPUB bundle has several required files. The specification can be quite strict about the format, contents, and location of those files within the EPUB archive. This section explains what you must know when you work with the EPUB standard.
The basic structure of a minimal EPUB file follows the pattern in Listing 1. When ready for distribution, this directory structure is bundled together into a ZIP-format file, with a few special requirements discussed in Bundling your EPUB file as a ZIP archive.
Listing 1. Directory and file layout for a simple EPUB archive
mimetype
META-INF/
   container.xml
OEBPS/
   content.opf
   title.html
   content.html
   stylesheet.css
   toc.ncx
   images/
      cover.png
Note: A sample book following this pattern is available from Downloads, but I recommend that you create your own as you follow the tutorial.
To start building your EPUB book, create a directory for the EPUB project. Open a text editor or an IDE such as Eclipse. I recommend using an editor that has an XML mode, in particular one that can validate against the RELAX NG schemas listed in Resources.
The mimetype file is the simplest of the required files. It must be named mimetype, and its contents are always:
application/epub+zip
Note that the mimetype file cannot contain any newlines or carriage returns.
Additionally, the mimetype file must be the first file in the ZIP archive and must not itself be compressed. You'll see how to include it using common ZIP arguments in Bundling your EPUB file as a ZIP archive. For now, just create this file and save it, making sure that it's at the root level of your EPUB project.
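Many text editors silently append a newline when saving, which would break the rule above. As a minimal sketch (the helper name write_mimetype is hypothetical, not part of any EPUB tool), you could write the file from Python so that its bytes are exactly the required string:

```python
from pathlib import Path

# The exact required contents: no newline, no carriage return.
MIMETYPE = b"application/epub+zip"


def write_mimetype(project_dir):
    """Write the mimetype file at the root of the EPUB project directory."""
    path = Path(project_dir) / "mimetype"
    # write_bytes stores exactly these bytes, so nothing is appended.
    path.write_bytes(MIMETYPE)
    return path
```

Calling write_mimetype("my-epub") would create my-epub/mimetype, assuming the my-epub directory already exists.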
At the root level of the EPUB, there must be a META-INF directory, and it must contain a file named container.xml. EPUB reading systems will look for this file first, as it points to the location of the metadata for the digital book.
Create a directory called META-INF. Inside it, open a new file called container.xml for writing. The container file is very small, but its structural requirements are strict. Paste the code in Listing 2 into META-INF/container.xml.
Listing 2. Sample container.xml file
<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
      media-type="application/oebps-package+xml" />
  </rootfiles>
</container>
The value of full-path is the only
part of this file that will ever vary. The directory path must be
relative to the root of the EPUB file itself, not relative to the META-INF
directory.
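To illustrate how a reading system might use this file, here is a rough sketch that reads full-path out of a container.xml string with Python's standard library. The function name find_opf_path is an assumption made for this example, and real reading systems may handle multiple rootfile elements differently.

```python
import xml.etree.ElementTree as ET

# Namespace used by container.xml, as shown in Listing 2.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def find_opf_path(container_xml):
    """Return the full-path of the first rootfile in a container.xml string."""
    root = ET.fromstring(container_xml)
    rootfile = root.find("c:rootfiles/c:rootfile", CONTAINER_NS)
    if rootfile is None:
        raise ValueError("container.xml has no rootfile element")
    # The path is relative to the root of the EPUB, not to META-INF.
    return rootfile.get("full-path")
```

Given the contents of Listing 2, this returns the path of the OPF file relative to the root of the EPUB.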
The mimetype and container files are the only two whose locations in the EPUB archive are strictly controlled. It is recommended (although not required) that you store the remaining files in a subdirectory. (By convention, this is usually called OEBPS, for Open eBook Publication Structure, but it can be whatever you like.)
Next, create the directory named OEBPS in your EPUB project. The following section of this tutorial covers the files that go into OEBPS—the real meat of the digital book: its metadata and its pages.
Open Packaging Format metadata file
Although this file can be named anything, the OPF file is conventionally called content.opf. It specifies the location of all the content of the book, from its text to other media such as images. It also points to another metadata file, the Navigation Control file for XML applications (NCX) table of contents.
The OPF file is the most complex metadata in the EPUB specification. Create OEBPS/content.opf, and paste the contents of Listing 3 into it.
Listing 3. OPF content file with sample metadata
<?xml version='1.0' encoding='utf-8'?>
<package xmlns="http://www.idpf.org/2007/opf"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         unique-identifier="bookid" version="2.0">
  <metadata>
    <dc:title>Hello World: My First EPUB</dc:title>
    <dc:creator>My Name</dc:creator>
    <dc:identifier
        id="bookid">urn:uuid:0cc33cbd-94e2-49c1-909a-72ae16bc2658</dc:identifier>
    <dc:language>en-US</dc:language>
    <meta name="cover" content="cover-image" />
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="cover" href="title.html" media-type="application/xhtml+xml"/>
    <item id="content" href="content.html"
          media-type="application/xhtml+xml"/>
    <item id="cover-image" href="images/cover.png" media-type="image/png"/>
    <item id="css" href="stylesheet.css" media-type="text/css"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="cover" linear="no"/>
    <itemref idref="content"/>
  </spine>
  <guide>
    <reference href="title.html" type="cover" title="Cover"/>
  </guide>
</package>
The OPF document itself must use the namespace http://www.idpf.org/2007/opf, and the metadata will be in the Dublin Core Metadata Initiative (DCMI) namespace, http://purl.org/dc/elements/1.1/.
This would be a good time to add the OPF and DCMI schema to your XML editor. All the schemas used in EPUB are available from Downloads.
Dublin Core defines a set of common metadata terms that you can use to describe a wide variety of digital materials; it's not part of the EPUB specification itself. Any of these terms is allowed in the OPF metadata section. When you build an EPUB for distribution, include as much detail as you can here, although the extract provided in Listing 4 is sufficient to start.
Listing 4. Extract of OPF metadata
...
<metadata>
  <dc:title>Hello World: My First EPUB</dc:title>
  <dc:creator>My Name</dc:creator>
  <dc:identifier
      id="bookid">urn:uuid:0cc33cbd-94e2-49c1-909a-72ae16bc2658</dc:identifier>
  <dc:language>en-US</dc:language>
  <meta name="cover" content="cover-image" />
</metadata>
...
The three required terms are title, identifier, and language. According to the EPUB specification, the identifier must be a unique value, although it's up to the digital book creator to define that unique value. For book publishers, this field will typically contain an ISBN or Library of Congress number. For other EPUB creators, consider using a URL or a large, randomly generated universally unique identifier (UUID). The language is given as dc:language, using a language tag such as en-US. Note that the value of the unique-identifier attribute on the package element must match the id attribute of the dc:identifier element.
Other metadata to consider adding, if it's relevant to your content, includes:
- Publication date (as dc:date).
- Publisher (as dc:publisher). (This can be your company or individual name.)
- Copyright information (as dc:rights). (If releasing the work under a Creative Commons license, put the URL for the license here.)
See Resources for more information on DCMI.
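If you choose a UUID as the book identifier, Python's standard uuid module can generate one in URN form. The sketch below also shows one way to check that the unique-identifier attribute on the package element matches the id of a dc:identifier element. Both function names are hypothetical helpers for this tutorial, not part of any EPUB library.

```python
import uuid
import xml.etree.ElementTree as ET

# Namespaces used in the OPF file (see Listing 3).
NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def new_book_id():
    """Return a random UUID in URN form, starting with 'urn:uuid:'."""
    return uuid.uuid4().urn


def unique_identifier_value(opf_xml):
    """Return the text of the dc:identifier that unique-identifier points to."""
    package = ET.fromstring(opf_xml)
    wanted = package.get("unique-identifier")
    for ident in package.findall("opf:metadata/dc:identifier", NS):
        if ident.get("id") == wanted:
            return ident.text
    raise ValueError("no dc:identifier with id=%r" % wanted)
```

If unique_identifier_value raises ValueError, the two attributes do not match and the OPF file needs correcting.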
Including a meta element with the
name attribute containing
cover is not part of the EPUB specification
directly, but is a recommended way to make cover pages and images
more portable. Some EPUB renderers prefer to use an image file as the
cover, while others will use an XHTML file containing an inlined cover
image. This example shows both forms. The value of the
meta element's content
attribute should be the ID of the book's cover image in the manifest,
which is the next part of the OPF file.
The OPF manifest lists all the resources found in the EPUB that are part of the content (and excluding metadata). This usually means a list of XHTML files that make up the text of the eBook plus some number of related media such as images. EPUB encourages the use of CSS for styling book content, so CSS files are also included in the manifest. Every file that goes into your digital book must be listed in the manifest.
Listing 5 shows the extracted manifest section.
Listing 5. Extract of OPF manifest
...
<manifest>
  <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
  <item id="cover" href="title.html" media-type="application/xhtml+xml"/>
  <item id="content" href="content.html" media-type="application/xhtml+xml"/>
  <item id="cover-image" href="images/cover.png" media-type="image/png"/>
  <item id="css" href="stylesheet.css" media-type="text/css"/>
</manifest>
...
You must include the first item, toc.ncx (discussed in
the next section). Note that all items have
an appropriate media-type value and
that the media type for the XHTML content is
application/xhtml+xml. This exact value
is required and cannot be text/html or
some other type.
EPUB supports four image file formats as core types: Joint Photographic Experts Group (JPEG), Portable Network Graphics (PNG), Graphics Interchange Format (GIF), and Scalable Vector Graphics (SVG). You can include non-supported file types if you provide a fall-back to a core type. See the OPF specification for more information on fall-back items.
The value of each href attribute should be
a Uniform Resource Identifier (URI) that is relative to the OPF file.
(This is easy to confuse with the reference to the OPF file in the
container.xml file, where it must be relative to the EPUB as a whole.) In
this case, the OPF file is in the same OEBPS directory as your content,
so no path information is required here.
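The difference between the two kinds of relative path can be made concrete with a short sketch. Assuming the OPF path comes from container.xml, the hypothetical helper below resolves each manifest href against the directory of the OPF file to get a path from the root of the EPUB. It ignores percent-encoding and fragment identifiers, so treat it as an illustration only.

```python
import posixpath
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def manifest_archive_paths(opf_path, opf_xml):
    """Map each manifest item id to its path from the root of the EPUB.

    opf_path is the full-path value from container.xml,
    for example "OEBPS/content.opf".
    """
    base = posixpath.dirname(opf_path)  # directory that holds the OPF file
    package = ET.fromstring(opf_xml)
    paths = {}
    for item in package.findall("opf:manifest/opf:item", OPF_NS):
        joined = posixpath.join(base, item.get("href"))
        paths[item.get("id")] = posixpath.normpath(joined)
    return paths
```

With the sample book, the item whose href is images/cover.png resolves to OEBPS/images/cover.png inside the archive.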
Although the manifest tells the EPUB reader which files are part of the archive, the spine indicates the order in which they appear, or—in EPUB terms—the linear reading order of the digital book. One way to think of the OPF spine is that it defines the order of the "pages" of the book. The spine is read in document order, from top to bottom. Listing 6 shows an extract from the OPF file.
Listing 6. Extract of OPF spine
...
<spine toc="ncx">
  <itemref idref="cover" linear="no"/>
  <itemref idref="content"/>
</spine>
...
Each itemref element has a required attribute
idref, which must match one of the IDs in
the manifest.
The toc attribute is also required. It must reference the ID of the
manifest item that points to the NCX table of contents.
The linear attribute in the spine indicates whether
the item is considered part of the linear reading order versus being
extraneous front- or end-matter. I recommend that you define any
cover page as linear="no". Conforming EPUB
reading systems will open the book to the first item in the spine that's
not set as linear="no".
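The spine rules described here lend themselves to a quick consistency check. The following sketch, written under the assumption that the OPF is available as a string, reports itemref values that are missing from the manifest, verifies that toc points at an NCX item, and finds the first item in the linear reading order. The function name check_spine is hypothetical.

```python
import xml.etree.ElementTree as ET

OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}
NCX_MEDIA_TYPE = "application/x-dtbncx+xml"


def check_spine(opf_xml):
    """Return (problems, first_linear_id) for the spine of an OPF document."""
    package = ET.fromstring(opf_xml)
    manifest = {
        item.get("id"): item
        for item in package.findall("opf:manifest/opf:item", OPF_NS)
    }
    spine = package.find("opf:spine", OPF_NS)
    if spine is None:
        raise ValueError("OPF has no spine element")

    problems = []
    # The toc attribute must name the manifest item for the NCX file.
    toc_item = manifest.get(spine.get("toc"))
    if toc_item is None or toc_item.get("media-type") != NCX_MEDIA_TYPE:
        problems.append("spine toc does not reference an NCX manifest item")

    first_linear = None
    for ref in spine.findall("opf:itemref", OPF_NS):
        idref = ref.get("idref")
        if idref not in manifest:
            problems.append("itemref %r is not in the manifest" % idref)
            continue
        # Items are linear unless explicitly marked linear="no".
        if first_linear is None and ref.get("linear", "yes") != "no":
            first_linear = idref
    return problems, first_linear
```

For the sample book, the cover is skipped because it is marked linear="no", so the first linear item is the one with the ID content.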
The last part of the OPF content file is the guide. This section is optional but recommended. Listing 7 shows the guide extracted from the OPF file.
Listing 7. Extract of an OPF guide
...
<guide>
  <reference href="title.html" type="cover" title="Cover"/>
</guide>
...
The guide is a way of providing semantic information to an EPUB reading system. While the manifest defines the physical resources in the EPUB and the spine provides information about their order, the guide explains what the sections mean. Here's a partial list of the values that are allowed in the OPF guide:
- cover: The book cover
- title-page: A page with author and publisher information
- toc: The table of contents
For a complete list, see the OPF 2.0 specification, available from Resources.
Although the OPF file is defined as part of EPUB itself, the last major metadata file is borrowed from a different digital book standard. DAISY is a consortium that develops data formats for readers who are unable to use traditional books, often because of visual impairments or the inability to manipulate printed works. EPUB has borrowed DAISY's NCX DTD. The NCX defines the table of contents of the digital book. In complex books, it is typically hierarchical, containing nested parts, chapters, and sections.
Using your XML editor, create OEBPS/toc.ncx, and include the code in Listing 8.
Listing 8. Simple NCX file
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE ncx PUBLIC "-//NISO//DTD ncx 2005-1//EN"
  "http://www.daisy.org/z3986/2005/ncx-2005-1.dtd">
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head>
    <meta name="dtb:uid"
          content="urn:uuid:0cc33cbd-94e2-49c1-909a-72ae16bc2658"/>
    <meta name="dtb:depth" content="1"/>
    <meta name="dtb:totalPageCount" content="0"/>
    <meta name="dtb:maxPageNumber" content="0"/>
  </head>
  <docTitle>
    <text>Hello World: My First EPUB</text>
  </docTitle>
  <navMap>
    <navPoint id="navpoint-1" playOrder="1">
      <navLabel>
        <text>Book cover</text>
      </navLabel>
      <content src="title.html"/>
    </navPoint>
    <navPoint id="navpoint-2" playOrder="2">
      <navLabel>
        <text>Contents</text>
      </navLabel>
      <content src="content.html"/>
    </navPoint>
  </navMap>
</ncx>
The DTD requires four meta elements inside the NCX <head> tag:
- uid: Is the unique ID for the digital book. This element should match the dc:identifier in the OPF file.
- depth: Reflects the level of the hierarchy in the table of contents. This example has only one level, so this value is 1.
- totalPageCount and maxPageNumber: Apply only to paper books and can be left at 0.
The content of docTitle/text is the title
of the work, and matches the value of dc:title in
the OPF.
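Because the NCX and the OPF are maintained separately, it is easy for the two identifiers to drift apart. As a sketch only (all three function names are hypothetical), the following code collects the meta elements from the NCX head, lists any of the four required names that are missing, and compares dtb:uid with an identifier you pass in from the OPF:

```python
import xml.etree.ElementTree as ET

NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}
REQUIRED_META = ("dtb:uid", "dtb:depth", "dtb:totalPageCount", "dtb:maxPageNumber")


def ncx_head_meta(ncx_xml):
    """Return a dict of name -> content for the meta elements in the NCX head."""
    root = ET.fromstring(ncx_xml)
    return {
        meta.get("name"): meta.get("content")
        for meta in root.findall("ncx:head/ncx:meta", NCX_NS)
    }


def missing_ncx_meta(ncx_xml):
    """Return the required meta names that are absent from the NCX head."""
    present = ncx_head_meta(ncx_xml)
    return [name for name in REQUIRED_META if name not in present]


def uid_matches(ncx_xml, opf_identifier):
    """Return True if dtb:uid equals the dc:identifier value from the OPF."""
    return ncx_head_meta(ncx_xml).get("dtb:uid") == opf_identifier
```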
The navMap is the most important part of the NCX file, as it defines the table of contents for the actual book. The navMap contains one or more navPoint elements. Each navPoint has the following parts:
- A playOrder attribute, which reflects the reading order of the document. This follows the same order as the list of itemref elements in the OPF spine.
- A navLabel/text element, which describes the title for this section of the book. This is typically a chapter title or number, such as "Chapter One," or, as in this example, "Book cover."
- A content element whose src attribute points to the physical resource containing the content. This will be a file declared in the OPF manifest. (It is also acceptable to use fragment identifiers here to point to anchors within XHTML content, for example content.html#footnote1.)
- Optionally, one or more child navPoint elements. Nested points are how hierarchical documents are expressed in the NCX.
The structure of the sample book is simple: It has only two pages, and they are
not nested. That means that you'll have two navPoint
elements with ascending playOrder values,
starting at 1. In the NCX, you have the opportunity to name these
sections, allowing readers to jump into different parts of the eBook.
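For books with nested navPoint elements, the depth value and the playOrder sequence become harder to verify by eye. The sketch below, which assumes the NCX is available as a string and uses the hypothetical function name toc_summary, computes the nesting depth of the navMap and lists the playOrder values in document order:

```python
import xml.etree.ElementTree as ET

NCX_URI = "http://www.daisy.org/z3986/2005/ncx/"
NCX_NS = {"ncx": NCX_URI}


def _depth(nav_point):
    """Return the nesting depth of a navPoint, counting the point itself."""
    children = nav_point.findall("ncx:navPoint", NCX_NS)
    return 1 + max((_depth(child) for child in children), default=0)


def toc_summary(ncx_xml):
    """Return (depth, play_orders) for the navMap of an NCX document."""
    root = ET.fromstring(ncx_xml)
    nav_map = root.find("ncx:navMap", NCX_NS)
    if nav_map is None:
        raise ValueError("NCX has no navMap element")
    top_level = nav_map.findall("ncx:navPoint", NCX_NS)
    depth = max((_depth(point) for point in top_level), default=0)
    # iter() walks the tree in document order, parents before children.
    play_orders = [
        int(point.get("playOrder"))
        for point in nav_map.iter("{%s}navPoint" % NCX_URI)
    ]
    return depth, play_orders
```

The depth should equal the dtb:depth value in the NCX head, and the playOrder list should be ascending and start at 1.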
Now you know all the metadata required in EPUB, so it's time to put in the actual book content. You can use the sample content provided in Downloads or create your own, as long as the file names match the metadata.
Next, create these files and folder:
- title.html: This file will be the title page for the book. Create this file and include an img element that references a cover image, with the value of the src attribute as images/cover.png.
- images: Create this folder inside OEBPS, then copy the sample image (or create your own), naming it cover.png.
- content.html: This will be the actual text of the book.
- stylesheet.css: Place this file in the same OEBPS directory as the XHTML files. This file can contain any CSS declarations you like, such as setting the font-face or text color. See Listing 10 for an example of such a CSS file.
Listing 9 contains an example of a valid EPUB content page. Use this sample for your title page (title.html) and a similar one for the main content page (content.html) of your book.
Listing 9. Sample title page (title.html)
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Hello World: My First EPUB</title>
    <link type="text/css" rel="stylesheet" href="stylesheet.css" />
  </head>
  <body>
    <h1>Hello World: My First EPUB</h1>
    <div><img src="images/cover.png" alt="Title page"/></div>
  </body>
</html>
XHTML content in EPUB follows a few rules that might be unfamiliar to you from general Web development:
- The content must validate as XHTML 1.1: The only significant difference between XHTML 1.0 Strict and XHTML 1.1 is that the name attribute has been removed. (Use IDs to refer to anchors within content.)
- img elements can only reference images that are local to the eBook: The elements cannot reference images on the Web.
- script blocks should be avoided: There is no requirement for EPUB readers to support JavaScript code.
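The rule about local images is easy to check mechanically. As a rough sketch, the hypothetical helper below parses an XHTML page and returns any img sources that carry a URL scheme or host, which suggests they point to the Web rather than into the EPUB. It assumes the page is well-formed XML without named entities such as &nbsp;, which Python's standard parser would reject.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Fully qualified name of the img element in the XHTML namespace.
XHTML_IMG = "{http://www.w3.org/1999/xhtml}img"


def remote_image_sources(xhtml):
    """Return the src values of img elements that are not local to the eBook."""
    root = ET.fromstring(xhtml)
    remote = []
    for img in root.iter(XHTML_IMG):
        src = img.get("src", "")
        parts = urlsplit(src)
        # A scheme (http:) or a host (//example.com) means a non-local image.
        if parts.scheme or parts.netloc:
            remote.append(src)
    return remote
```

An empty result for every content page indicates that all images are referenced by relative paths inside the book.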
There are some minor differences in the way EPUB supports CSS, but none that affect common uses of styles (consult the OPS specification for details). Listing 10 demonstrates a simple CSS file that you can apply to the content to set basic font guidelines and to color headings in red.
Listing 10. Sample styles for the eBook (stylesheet.css)
body {
  font-family: sans-serif;
}
h1,h2,h3,h4 {
  font-family: serif;
  color: red;
}
One point of interest is that EPUB specifically supports the CSS 2
@font-face rule, which allows for embedded
fonts. If you create technical documentation, this is probably not
relevant, but developers who build EPUBs in multiple languages or for
specialized domains will appreciate the ability to specify exact font data.
You now have everything you need to create your first EPUB book. In the next section, you'll bundle the book according to the OCF specifications and find out how to validate it.




