USING A TEXT EDITOR TO HANDCRAFT AN EBOOK
Nothing like a well-bound EPUB to soothe the soul.
TABLE OF CONTENTS
PREFACE
CRAFTING THE EPUB
COMPILING THE BOOK
ADDING CONTENT TO THE BOOK
FINAL THOUGHTS
PREFACE
What follows is an evening spent figuring out why an EPUB generated by
"pandoc" could not be uploaded to Google Books. The content was a Wikipedia
article.
I contacted Google support (I was impressed that any support existed), and they
were pleasantly helpful. By default I was directed to "Partner Program" support,
presumably because I had signed up for the program just to look at it, not even
to use it. Partners probably get priority, which probably explains my great
experience!
Barish, from support, directed me to an EPUB validator
(https://github.com/w3c/epubcheck) and told me I could get in contact with the
other team if I had anything to report.
So I looked up "EPUB specification" and decided the best place to start was
probably the container format. It turns out to be just the zip archive format.
At this point I began to search for the keyword "require" across the three other
specification documents and quickly learned the minimum requirements. Impressed
that searching for that one keyword yielded such useful information, I began to
craft.
CRAFTING THE EPUB
There are only 4 files required to define an EPUB.
Below is a copy of the file content to visually demonstrate the smallest
strictly-valid EPUB, and for the reader to see how ids and files are tied
together.
::::::::::::::
mimetype
::::::::::::::
application/epub+zip
::::::::::::::
META-INF/container.xml
::::::::::::::
::::::::::::::
EPUB/index.opf
::::::::::::::
urn:uuid:A1B0D67E-2E81-4DF5-9E67-A64CBE366809
Handmade
en
2011-01-01T12:00:00Z
::::::::::::::
EPUB/nav.xhtml
::::::::::::::
0
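The listing shows the text content of each file. As a rough sketch of how the
four files fit together once mark-up is in place, the shell function below
writes out one possible version. The namespaces and media types are the ones
the EPUB 3 specifications define. The ids "uid" and "nav", the function name
write_skeleton, and the use of "0" as both page title and link text are
assumptions made for illustration, not a copy of the files used here.

```shell
# write_skeleton DIR: create the four files of a minimal EPUB under DIR.
# Hypothetical helper; ids "uid" and "nav" are illustrative assumptions.
write_skeleton() {
    dir=$1
    mkdir -p "$dir/META-INF" "$dir/EPUB" || return 1

    # No trailing newline: the file holds the media type and nothing else.
    printf '%s' 'application/epub+zip' > "$dir/mimetype"

    # container.xml points the reading system at the package document.
    cat > "$dir/META-INF/container.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="EPUB/index.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
EOF

    # index.opf: unique-identifier names the id of dc:identifier, and the
    # spine itemref names the id of a manifest item.
    cat > "$dir/EPUB/index.opf" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">urn:uuid:A1B0D67E-2E81-4DF5-9E67-A64CBE366809</dc:identifier>
    <dc:title>Handmade</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2011-01-01T12:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
  </manifest>
  <spine>
    <itemref idref="nav"/>
  </spine>
</package>
EOF

    # nav.xhtml: the navigation document, which doubles as the only page.
    cat > "$dir/EPUB/nav.xhtml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head><title>0</title></head>
  <body>
    <nav epub:type="toc">
      <ol><li><a href="nav.xhtml">0</a></li></ol>
    </nav>
  </body>
</html>
EOF
}
```

Calling write_skeleton handmade would create a "handmade" directory holding the
four files. The ties to notice are unique-identifier="uid" matching the id on
dc:identifier, and idref="nav" in the spine matching the id of the manifest
item.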
I uploaded this new smallest EPUB to Google Books and it was accepted with no
problem! Having my "control" EPUB, I could now begin to reduce its size
further. This involved about an hour of removing parts and re-uploading many,
many times. I'm not going to include those edits, simply because what gets
accepted can change over time and the details would quickly become useless.
Judging by what it still accepted, Google Books is certainly not applying the
strictest epubcheck rules, or is not using epubcheck at all. Google does fund
epubcheck as a big sponsor, so that is something to consider too.
These files can now act as the basis from which books can be easily created and
then uploaded or read on devices.
COMPILING THE BOOK
The mimetype file must be the first file in the archive, stored uncompressed
and with no "zip extra field". zip usually stores a file this small
uncompressed on its own. So your zipping will look like this:
zip -X -r Handmade.epub mimetype *
(-X is for "no extra fields")
Run zip from inside the directory that holds the files, so that paths in the
archive carry no "prefix" directory (the name of the directory which originally
held the files).
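A slightly more defensive way to do the same thing is to add mimetype in its
own step with compression switched off, then add everything else. The sketch
below assumes Info-ZIP's "zip" is installed. The function names build_epub and
check_epub_header are made up for this example. The check only inspects the
first bytes of the archive. It is not a replacement for epubcheck.

```shell
# build_epub SRC_DIR OUT_FILE: zip the contents of SRC_DIR, not SRC_DIR itself.
# OUT_FILE should live outside SRC_DIR so the glob does not pick it up.
build_epub() {
    src=$1
    out=$2
    # Make OUT_FILE absolute, because zip runs from inside SRC_DIR.
    case $out in
        /*) ;;
        *) out=$PWD/$out ;;
    esac
    rm -f "$out"
    (
        cd "$src" || exit 1
        # Step 1: mimetype first, stored (-0), with no extra fields (-X).
        zip -X -0 "$out" mimetype &&
        # Step 2: everything else, recursively, without directory entries (-D).
        zip -X -r -9 -D "$out" * -x mimetype
    )
}

# check_epub_header FILE: succeed if the first entry is named "mimetype" and
# its raw content follows immediately. The first file name in a zip archive
# starts at byte offset 30, so bytes 30-57 must read as shown below. That only
# happens when the entry is uncompressed and has no extra field.
check_epub_header() {
    [ "$(head -c 58 "$1" | tail -c 28)" = "mimetypeapplication/epub+zip" ]
}
```

For example: build_epub handmade Handmade.epub && check_epub_header Handmade.epub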
ADDING CONTENT TO THE BOOK
The book content must be XML (XHTML, to be exact). Wikipedia has lots of nice
content, but the wikitext needs to be converted. Install PHP and Composer, then
download the Parsoid project. Install Parsoid's dependencies with
`composer install` so it can be used to convert the wikitext to XML.
Now you can use the converter:
echo "test" | php bin/parser.php
And you'll get HTML/XML as output.
https://en.wikipedia.org/wiki/MYPAGE?action=raw can be used to get the wikitext
of any Wikipedia page. Combine it with a small script to wrap it in the proper
mark-up, and any article can easily be turned into a book:
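PAGE="Hedonism"
# Write the opening wrapper mark-up, append the converted article,
# then append the closing wrapper mark-up, all to the same file.
echo '' > out.xhtml \
&& curl "https://en.wikipedia.org/wiki/$PAGE?action=raw" \
| php bin/parser.php >> out.xhtml \
&& echo '' >> out.xhtml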
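The wrapping step can also be written as a small filter, which keeps the
mark-up in one place. This is only a sketch: it assumes the converter emits a
body fragment rather than a whole document, the wrapper is the plainest XHTML
skeleton that comes to mind, and the name wrap_xhtml is made up for the
example.

```shell
# wrap_xhtml TITLE: read a body fragment on stdin, write a full XHTML page.
wrap_xhtml() {
    # Escape the characters that are special in XML text.
    title=$(printf '%s' "$1" |
        sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g')
    printf '%s\n' '<?xml version="1.0" encoding="UTF-8"?>'
    printf '%s\n' '<html xmlns="http://www.w3.org/1999/xhtml">'
    printf '<head><title>%s</title></head>\n' "$title"
    printf '%s\n' '<body>'
    cat    # pass the converted article through unchanged
    printf '%s\n' '</body>' '</html>'
}
```

It would slot into the pipeline like this:
curl -s "https://en.wikipedia.org/wiki/$PAGE?action=raw" | php bin/parser.php | wrap_xhtml "$PAGE" > out.xhtml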
Unfortunately this does not fix references to resources that do not exist
inside the EPUB. Whether by automation or by hand, all links to images and
external sources will need to be removed. "sed" can do most of the work, but a
proper HTML/XML parser should be used to remove problematic data reliably.
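As an illustration of the "sed" route, the filter below drops image tags and
unwraps links so that only their text remains. It is a blunt sketch with a
made-up name, strip_links. It assumes each tag sits on a single line, and it
also removes links that point inside the book, which may not be wanted.

```shell
# strip_links: read XHTML on stdin, remove <img> tags and unwrap <a> links.
strip_links() {
    # 1. delete every <img ...> tag
    # 2. delete opening <a ...> tags (the space keeps <abbr> and friends safe)
    # 3. delete closing </a> tags, leaving the link text in place
    sed -e 's/<img[^>]*>//g' \
        -e 's/<a[[:space:]][^>]*>//g' \
        -e 's/<\/a>//g'
}
```

For example, the line
<p>See <a href="http://x.org/">this</a><img src="a.png"/> now</p>
comes out as
<p>See this now</p>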
Afterwards, open EPUB/index.opf and add the new file as an entry in both the
manifest and spine elements. Run "zip" to create the EPUB and upload it.
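The manifest and spine edit can be scripted too. The sketch below uses awk to
print one new line before each closing tag. It assumes the closing manifest and
spine tags each sit on a line of their own in the package file. The function
name add_chapter and the id "ch1" in the usage line are hypothetical.

```shell
# add_chapter OPF ID HREF: register an XHTML file in the manifest and spine.
add_chapter() {
    opf=$1
    id=$2
    href=$3
    awk -v id="$id" -v href="$href" '
        # New manifest item goes just before the closing manifest tag.
        /<\/manifest>/ {
            printf "    <item id=\"%s\" href=\"%s\" media-type=\"application/xhtml+xml\"/>\n", id, href
        }
        # New spine entry goes just before the closing spine tag.
        /<\/spine>/ {
            printf "    <itemref idref=\"%s\"/>\n", id
        }
        { print }
    ' "$opf" > "$opf.tmp" && mv "$opf.tmp" "$opf"
}
```

For example: add_chapter EPUB/index.opf ch1 out.xhtml
The href is relative to the package file, so out.xhtml needs to sit in the EPUB
directory next to index.opf.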
You can now read Wikipedia articles and pick up where you left off on any
device that runs Google Books.
FINAL THOUGHTS
1. It would be nice to have an extension which can automate the whole process.
2. I wonder what Google Books' engine does when it encounters external
references. Potential security vulnerabilities?
3. Writing a basic program to take a Wikipedia URL and output an EPUB should be
trivial for anyone who wants to make real use of this (like myself).
4. EPUB is actually quite a nice format if you ignore that it's all in XML.