Skip to Content

What is ePub (and why does it matter for metadata and application profiles)?

Editorial note: A version of this post was previously published in the Application Profile Support Project blog.

ePub is a standard packaging format designed for ebook readers. Why is ePub of interest from the point of view of metadata and application profiles?

Here is the definition given in the entry in Wikipedia:

[...] ePub [...] is a free and open e-book standard by the International Digital Publishing Forum (IDPF). Files have the extension .epub. EPUB is designed for reflowable content, meaning that the text display can be optimized for the particular display device used by the reader of the EPUB-formatted book. The format is meant to function as a single format that publishers and conversion houses can use in-house, as well as for distribution and sale.

That is to say that ePub contains within it the Open Packaging Format (for convenience, we can ignore the other structural parts for the purposes of this discussion), which defines the structure of both the metadata for the item contained within the file and the presentational (XML, XHMTL, CSS) elements of the standard. It is similar in many ways to a .docx file (MS Word 2007 onwards) in being effectively a specialised type of .zip file.

So why is ePub of interest from the point of view of metadata and application profiles? The IDPF's Open Packaging Format gives this description:

Dublin Core metadata is designed to minimize the cataloging burden on authors and publishers, while providing enough metadata to be useful. This specification supports the set of Dublin Core 1.1 metadata elements (, supplemented with a small set of additional attributes addressing areas where more specific information is useful. For example, the OPF role attribute added to the Dublin Core creator and contributor elements allows for much more detailed specification of contributors to a publication, including their roles expressed via relator codes. Content providers must include a minimum set of metadata elements, defined in Section 2.2, and should incorporate additional metadata to enable readers to discover publications of interest.

In which case, how is the metadata contained within ePub any different to Dublin Core 1.1? This is the interesting part:

Because the Dublin Core metadata fields for creator and contributor do not distinguish roles of specific contributors (such as author, editor, and illustrator), this specification adds an optional role attribute for this purpose. See Section 2.2.6 for a discussion of role. To facilitate machine processing of Dublin Core creator and contributor fields, this specification adds the optional file-as attribute for those elements. This attribute is used to specify a normalized form of the contents. See Section 2.2.2 for a discussion of file-as. This specification also adds a scheme attribute to the Dublin Core identifier element to provide a structural mechanism to separate an identifier value from the system or authority that generated or defined that identifier value. See Section 2.2.10 for a discussion of scheme. This specification also adds an event attribute to the Dublin Core date element to enable content providers to distinguish various publication specific dates (for example, creation, publication, modification). See Section 2.2.7 for a discussion of event.

Using these addition attributes, it is possible to define more accurately what certain fields contain, a standard, normalised format for agent metadata such as personal names, schemes defining the format in which a particular field is expected to appear, identifiers to provide a mechanism to link that metadata to the generating system or authority, and events to describe more accurately the events that have occurred during the life cycle of the item. By applying such constraints that are beyond the scope of DC 1.1, the ePub format effectively contains a de facto application profile, identified by its own namespace. Further, ad hoc metadata can be added using the (X)HTML meta element:

One or more optional instances of a meta element, analogous to the XHTML 1.1 meta element but applicable to the publication as a whole, may be placed within the metadata element [...]. This allows content providers to express arbitrary metadata beyond the data described by the Dublin Core specification. Individual OPS Content Documents may include the meta element directly (as in XHTML 1.1) for document-specific metadata. This specification uses the OPF Package Document alone as the basis for expressing publication-level Dublin Core metadata.

It would seem, however, that this last option suffers from the weakness that such metadata is invented on the fly, and does not have to follow the constraints of any schema or authority. Nonetheless, it would seem overall that the ePub "application profile" does significantly add to the functionality DC 1.1 in a potentially useful way. Different types of agent defined in DC, such as creator, contributor, can be further defined, for example author, editor, illustrator and thesis supervisor for higher degrees. Potentially, this could be leveraged for use with a number of different types of resources and for various purposes, although ePub by it's very nature is designed for reflowable content, which by and large means textual resources such as books, articles, manuals and so on. Illustrations, tables, charts, images and other non-reflowable content can potentially create a problem on the small screens of mobile devices such as ebook readers. The structure of this application profile is very simple and easy to use, unlike for example the classic form of SWAP, whose structure is based directly upon its conceptual data model, a simplified version of FRBR. It would be extremely interesting to compare the two, since they are fundamentally similar, relatively simple solutions that are limited in scope to online publications and similar resources. It would be most revealing to see whether what SWAP seeks to achieve can be done in a simpler way, and whether either SWAP or the ePub application profile have functionality that the other cannot provide. Ultimately, the purpose of this investigation could be to provide online textual content, for example in repositories, via increasingly popular hand-held devices, and to capitalise on the rapid growth of commercial ebooks. It would probably be necessary to provide .epub files in such systems as well as the usual .pdf and .doc(x) formats that are common in publishing, and consequently in institutional repositories. Either this would need to be done by converting the existing content, and likewise new content after it is deposited, or in addition by providing tools to enable the ePub format to be more immediately accessible to service providers and depositors in future.

Dr. Radut | blog_post_tf