The ePub file format can be an intimidating beast. But in essence it’s quite simple – a bunch of XHTML pages plus accompanying metadata in XML files all bundled up in an ordinary zip file.

That’s good news because it means you can unzip the file, hack away at the contents (as we’ll be seeing in the next post, about fixing Adobe InDesign CS3’s ePub shortcomings) and then just zip it up again.

But there is a gotcha. The e-book readers that work with ePub files can be very picky about how the zip file is put together. And the ePub specification itself is highly specific. So here are a couple of quick tips on how to avoid problems.

I use OS X and Linux, but Windows users should be able to adapt these comments to their own environment.

Unzip the file

Clearly, your first step is to unzip the file. Linux users, being the geeks they are, will naturally head straight for the command line to do this, and quite rightly so. Mac users are accustomed to unzipping files by double-clicking on them, but this is unlikely to work with a file that has the .epub extension – you’re more likely to end up opening the file in your favourite e-reader. Ditto for Windows users.

You could use a utility like StuffIt Expander. But given that much of the rest of what I’m discussing here will be done from the command line, you might as well go ahead and open a terminal window.

Let’s assume your ePub file is called MyEbook.epub. For the sake of simplicity, you might want to have this sitting in a folder by itself.

From the command line, cd to the folder holding the file. Then simply:

unzip MyEbook.epub

The zip file will spew its contents into the folder. Hack away to your heart’s content until you’re ready to zip up the file again.

Zip the ePub file

Among the contents of the ePub document you’ll find a file called mimetype. This is the critical one.

E-book readers require the mimetype file to be the first one in the zip document. What’s more, to be fully compliant, it must be stored uncompressed and sit at a very specific point. Every file in a zip archive is preceded by a fixed 30-byte header, so the file name mimetype starts at byte 30 and the mimetype text itself (application/epub+zip) starts at byte 38, provided nothing else is squeezed in between.

If this sounds intimidating, don’t worry. It’s actually quite easy to achieve if you’re careful.

At this point, you might want to move the original MyEbook.epub file out of the way (or delete it, if you’re working with a copy, which is a sensible thing to do). To start creating your ePub file, use the following:

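zip -X0 MyNewEbook.epub mimetype

I’ve given the new ePub file a different name in case you ignored my advice about moving the original out of the way.

The key element here is that -X flag. It tells zip to leave out extra file attributes (on OS X and Linux, things like file timestamps and user and group IDs), which it would otherwise store in an ‘extra field’ between the file name and the file’s contents. If you don’t use this flag, that extra field pushes the contents of the mimetype file to the wrong position in the zip file. E-book readers may complain that the file contains formatting errors. And tools such as epubcheck (more on that in the next post) will tell you that the ePub file has the wrong mimetype, even when it has the correct mimetype, just in a slightly incorrect position. That can lead to all sorts of confusion.

The 0 (zero) tacked onto -X tells zip to store the file without compressing it, which the specification also requires of mimetype. In practice zip will usually decide on its own not to compress a file this small, but it does no harm to make sure.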

You can then go ahead and add the rest of the files to the MyNewEbook.epub zip/epub file you’ve just created. Which files you need to add will depend on how the ePub file was put together in the first place.

I use InDesign CS3 for creating ePub files. These contain the mimetype file plus two directories – META-INF (containing one metadata file) and OEBPS (containing the book files themselves, images, more metadata etc).

So I use the commands:

zip -rg MyNewEbook.epub META-INF -x \*.DS_Store
zip -rg MyNewEbook.epub OEBPS -x \*.DS_Store

Some explanation is necessary here. Each line starts with two flags, followed by an exclusion pattern:

-r    (recursive)

This tells zip to work its way recursively down through any directories/folders, so that everything in the folders specified gets included.

-g    (grow file)

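This tells zip to grow an existing zip file, appending the new entries to it in place. Strictly speaking, Info-ZIP’s zip (the version on OS X and most Linux systems) adds to an existing archive even without this flag, rather than overwriting it, but -g makes the intent explicit and leaves the mimetype entry you created above exactly where it is, at the start of the file.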

-x \*.DS_Store    (exclude)

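This is just for Mac users. It tells zip to skip the hidden .DS_Store files that OS X creates in most folders. The backslash stops the shell from expanding the * wildcard itself, so that zip receives the pattern intact and matches it against every path it adds.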

And that’s it. To make things easier, I’ve put these commands into a shell script which is in my PATH for Bash sessions.

If you’re uncomfortable with the command line, and still prefer to use GUI-based zip utilities, the above should give you enough information on which to make sensible choices about settings.