USING A TEXT EDITOR TO HANDCRAFT AN EBOOK

Nothing like a well-bound EPUB to soothe the soul.

TABLE OF CONTENTS

PREFACE
CRAFTING THE EPUB
COMPILING THE BOOK
ADDING CONTENT TO THE BOOK
FINAL THOUGHTS

PREFACE

The following is my evening journey to figure out why an EPUB generated by "pandoc" could not be uploaded to Google Books. The content is a Wikipedia article.

I contacted Google support (I was impressed any support existed), and they were pleasantly helpful. By default I was directed to the "Partner Program" support, presumably because I had signed up just to see it, not even to use it. Partners probably get priority, which probably explains my great experience! Barish directed me to an EPUB validator (https://github.com/w3c/epubcheck) and told me I could get in contact with the other team if I had anything to report.

So I looked up "EPUB specification" and decided the best place to start was probably the container format. Turns out it's just the zip archive format. At this point I began to search for the keyword "require" over the three other documents, and I quickly learned the minimum requirements. Impressed that searching for that one keyword yielded such useful information, I began to craft.

CRAFTING THE EPUB

There are only 4 files required to define an EPUB. Below is a copy of the file contents to visually demonstrate the smallest strictly-valid EPUB, and for the reader to see how ids and files are tied together.

::::::::::::::
mimetype
::::::::::::::
application/epub+zip
::::::::::::::
META-INF/container.xml
::::::::::::::
::::::::::::::
EPUB/index.opf
::::::::::::::
urn:uuid:A1B0D67E-2E81-4DF5-9E67-A64CBE366809
Handmade
en
2011-01-01T12:00:00Z
::::::::::::::
EPUB/nav.xhtml
::::::::::::::
0

With the new smallest EPUB in hand, I uploaded it to Google Books, which accepted it with no problem! Having my "control" EPUB, I could now begin to reduce its size further. This involved about an hour of removing parts and reloading many, many times.
I'm not going to include those edits simply because they can change over time and quickly become useless. Google Books is clearly not applying the strictest epubcheck rules, or is not using epubcheck at all. Google does fund epubcheck as a big sponsor, so that is something to consider too.

These files can now act as the base from which books can easily be created and then uploaded or read on devices.

COMPILING THE BOOK

The mimetype file must be the first file in the archive, and specifically with no "zip extra field". So your zipping will look like this:

    zip -X -r Handmade.epub mimetype *

(-X is for "no extra fields")

When zipping, be sure the archive paths carry no "prefix" directory (the name of the directory which originally held the files).

ADDING CONTENT TO THE BOOK

The book content must be in XML. Wikipedia has lots of nice content, but the wikitext needs to be converted. Install PHP and Composer, and download the Parsoid project. Install Parsoid's dependencies with `composer install` so we can use it to convert the wikitext to XML.

Now you can use the converter:

    echo "test" | php bin/parser.php

And you'll get HTML/XML as output.

https://en.wikipedia.org/wiki/MYPAGE?action=raw can be used to get the wikitext of any Wikipedia page. Combine it with a small script to wrap it in the proper mark-up, and any article can easily be turned into a book:

    PAGE="Hedonism"
    echo '' > out.xhtml \
      && curl "https://en.wikipedia.org/wiki/$PAGE?action=raw" \
      | php bin/parser.php >> out.xhtml \
      && echo '' >> out.xhtml

Unfortunately this does not fix references to resources which don't exist inside the EPUB. Either by automation or manually, all links to images and external sources will need to be removed. Using "sed" can do most of the work, but a proper HTML/XML parser should be used to remove problematic data.

After that, open EPUB/index.opf and add the new entry to the manifest and spine elements. Run "zip" to create the EPUB and upload it.
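
The manifest and spine edit can also be scripted rather than done by hand. Below is a minimal Python sketch, assuming the package document uses the standard OPF namespace and that the new chapter is an XHTML file sitting next to index.opf. The function name add_chapter and its parameters are hypothetical, chosen for illustration; they are not part of any tool mentioned here.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Keep the familiar prefixes (default namespace and "dc:") when the tree
# is serialized again, instead of generated ones like "ns0:".
ET.register_namespace("", OPF_NS)
ET.register_namespace("dc", DC_NS)


def add_chapter(opf_xml, item_id, href):
    """Return the package document with one XHTML file added.

    Hypothetical helper: item_id is the id that ties the manifest entry to
    the spine entry, href is the file path relative to the .opf file.
    """
    root = ET.fromstring(opf_xml)
    manifest = root.find(f"{{{OPF_NS}}}manifest")
    spine = root.find(f"{{{OPF_NS}}}spine")
    if manifest is None or spine is None:
        raise ValueError("package document has no manifest or spine")

    # The manifest declares the file; the spine puts it in reading order.
    ET.SubElement(
        manifest,
        f"{{{OPF_NS}}}item",
        {"id": item_id, "href": href, "media-type": "application/xhtml+xml"},
    )
    ET.SubElement(spine, f"{{{OPF_NS}}}itemref", {"idref": item_id})
    return ET.tostring(root, encoding="unicode")
```

Calling add_chapter(text_of_index_opf, "hedonism", "out.xhtml") would append the converted article after whatever the spine already lists. The serialized result has no XML declaration, so one may need to be added back before writing the file.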
You can now read Wikipedia articles and pick up where you left off on any device which runs Google Books.

FINAL THOUGHTS

1. It would be nice to have an extension which can automate the whole process.
2. I wonder what Google Books' engine does when looking at external references. Potential security vulnerabilities?
3. Writing a basic program to take a Wikipedia URL and output an EPUB should be trivial for anyone who wants to make real use of this (like myself).
4. EPUB is actually quite a nice format if you ignore that it's all in XML.
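
As a starting point for item 3, the packaging half of such a program could be sketched in Python roughly as follows. This is only a sketch under stated assumptions: the XML bodies are a reconstruction from the EPUB 3 requirements (container, package document, navigation document) using the identifier, title, language and date shown earlier, not my exact files, and build_epub is a hypothetical name. The mimetype entry is written first and stored uncompressed, which the container specification asks for in addition to the "no extra field" rule.

```python
import zipfile

# Assumed file contents: a reconstruction of a minimal EPUB 3 layout,
# NOT the exact files from the article.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="EPUB/index.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

INDEX_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">urn:uuid:A1B0D67E-2E81-4DF5-9E67-A64CBE366809</dc:identifier>
    <dc:title>Handmade</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2011-01-01T12:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
  </manifest>
  <spine>
    <itemref idref="nav"/>
  </spine>
</package>
"""

NAV_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head><title>Handmade</title></head>
  <body>
    <nav epub:type="toc">
      <ol><li><a href="nav.xhtml">Handmade</a></li></ol>
    </nav>
  </body>
</html>
"""


def build_epub(path):
    """Write the four files into a ZIP archive at path (hypothetical helper)."""
    with zipfile.ZipFile(path, "w") as epub:
        # mimetype goes first and is stored, not deflated. Python's zipfile
        # adds no extra field for small entries like this one.
        epub.writestr("mimetype", "application/epub+zip",
                      compress_type=zipfile.ZIP_STORED)
        # Archive names are given explicitly, so no "prefix" directory
        # can sneak into the paths.
        for name, text in (
            ("META-INF/container.xml", CONTAINER_XML),
            ("EPUB/index.opf", INDEX_OPF),
            ("EPUB/nav.xhtml", NAV_XHTML),
        ):
            epub.writestr(name, text, compress_type=zipfile.ZIP_DEFLATED)


if __name__ == "__main__":
    build_epub("Handmade.epub")
```

Running the result through epubcheck before uploading would be a sensible sanity check, since nothing here guarantees that a given reader or store accepts it.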