Kindlegen Bloat: Strip it or Leave it?

If you’ve worked with many ePubs and converted them with Kindlegen while paying attention to file sizes, you might well have noticed that the Kindle files are much larger than the original ePubs — typically around twice as large. It turns out that Kindlegen stashes a copy of the original ePub sources inside its output file, which explains the doubling of size. Unfortunately, this can lead to super-sized Kindle files, particularly in heavily-illustrated works, since images get duplicated along with everything else.

To see how you can strip out this bloat, head over to the Mobileread forums, in particular to pdurrant’s post about his excellent Kindlestrip Python script and AppleScript wrapper. Note that if you don’t already have Python, you’ll need to download and install it before you can use this script.

For many people (me too!) the immediate instinct will be to run the above script on everything, in order to minimize file sizes, transfer/bandwidth requirements, and space on our customers’ Kindle devices. However … consider why Amazon might have designed their conversion tool to force the inclusion of this source information in its output file; they hardly did so by accident. Are they planning to use the stashed information in future versions of their Kindle hardware and software, to enable future Kindles to get closer to the layout of the original ePub? Will future generations of Kindle support ePub directly, and try to find ePub files inside Kindle files? If so, then clearly there’s a trade-off involved in the decision of whether to strip the file.

Personally, I’d consider heavily illustrated works as clear candidates for stripping, since the duplication of images is so costly. And, if you’ve designed your ePub to be fully “Kindle-friendly” then maybe there’s little benefit to leaving the extra information in place, since the basic Kindle version will already be fine. However, for Kindle files that suffer by comparison to their ePub originals, or where the overhead is negligible, you might want to consider leaving the bloat in place, in the hope that it might come in useful one day.
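
As a rough way to put a number on "heavily illustrated", here is a minimal Python sketch that relies only on the fact that an ePub is a ZIP container. It reports what fraction of the ePub's stored bytes are images, which is roughly the part of the duplicated source that hurts most. The names `image_share` and `should_strip`, the extension list, and the 0.5 threshold are all my own assumptions for illustration; none of them come from Kindlegen or Kindlestrip.

```python
import zipfile

# Assumption: these extensions cover the image types in your books.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".svg")


def image_share(epub):
    """Return the fraction of the ePub's stored (compressed) bytes that are images.

    `epub` may be a file path or a file-like object; an ePub is a ZIP archive.
    """
    with zipfile.ZipFile(epub) as archive:
        entries = archive.infolist()
    total = sum(entry.compress_size for entry in entries)
    images = sum(
        entry.compress_size
        for entry in entries
        if entry.filename.lower().endswith(IMAGE_EXTENSIONS)
    )
    return images / total if total else 0.0


def should_strip(epub, threshold=0.5):
    """Hypothetical rule of thumb: strip when images dominate the ePub.

    The 0.5 default is an arbitrary illustration, not a recommendation.
    """
    return image_share(epub) >= threshold
```

Calling `should_strip("mybook.epub")` (a hypothetical file name) on each title before conversion would flag the image-heavy ones. The decision about whether the Kindle version "suffers by comparison" still has to be made by eye.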

6 thoughts on “Kindlegen Bloat: Strip it or Leave it?”

  1. Those were my thoughts exactly. It seems to me that Amazon needs to abandon the mobipocket format for a variety of reasons, and this is their process for making the transition. I just wish they could give a timeline for when this support will occur.

    Anyway, with some exceptions Mobipocket is a terrible format for presenting images, mainly because it doesn’t support floats.

    1. You can load the e-books to your Kindle using the USB connection to the computer. It is made much easier if you use a program like Calibre (which is freeware) to facilitate the transfer. You can use your PC or laptop to download an e-book from the internet to your computer. Then use Calibre to send it to your Kindle in the proper format.

  2. In most cases, there’s nothing useful Amazon can really do with the source content. Most folks have to hack up their source HTML and CSS so severely to get KindleGen and the Kindle readers to render them acceptably that they barely resemble standards-compliant EPUB content anymore.

    There’s no good way to single-source anything remotely complex between Kindle and EPUB. You run into too many places where you have to manually include HTML attributes to work around the minimal CSS support in older Kindles, and those HTML attributes are illegal in (valid) EPUB books. For example, if you want one line in reverse text, you might have a div with the attribute bgcolor="#000000" because the older readers don’t handle the background-color CSS property, and you might have a font tag with a color="#FFFFFF" attribute because the older readers don’t handle the color CSS property. Although I think the font tag might be legal in EPUB, the bgcolor attribute on a div definitely isn’t legal XHTML, so it’s not legal in EPUB, either.

    To make matters worse, KindleGen translates certain CSS properties into HTML attributes on its own. The result is a nasty mess in which certain CSS properties appear to work under certain circumstances and not in others. This is further compounded by KindleGen’s lack of support for CSS selectors in any useful sense of the word. It incorrectly interprets nearly every selector in nearly every situation, resulting in the wrong styles getting added to the wrong elements. If a newer Kindle reader knows how to interpret the CSS, too, you’ll likely get some fascinating bugs that suddenly appear.

    More significantly, many CSS properties that are simply ignored by earlier versions of Kindle get handled by Kindle Fire. This often results in significant breakage. Also, I suspect the Kindle Fire probably supports parsing the HTML class attribute correctly, whereas the KindleGen (at least in 1.x) translation code does not. This is sure to catch some people by surprise. In HTML, the class attribute contains a list of CSS classes to apply to the element. KindleGen supports only a single class (either the first one in the list or the last one, I forget which). Thus, if they just reprocessed it from the previous source, there’s a good chance that the Kindle Fire would start seeing styles that were previously ignored entirely during the translation process.

    So, yeah, in theory, Amazon could do something with that EPUB content, but in practice, it would be pretty dangerous for them to do so. At least in my experience, as Amazon enables new functionality, it often takes a *lot* of additional work to make things work cleanly, largely because you’re trying to pull back in all the nice formatting from your EPUB without breaking all the severe, hackish workarounds you have to add just to make the older Kindles display something that’s remotely acceptable. Amazon really should have made a clean break with the MOBI format. The more they try to hack new support into MOBI, the harder it gets to support their readers. Maintaining fallback compatibility is hard enough when it’s clean CSS. Maintaining backwards compatibility with a hodgepodge of CSS and HTML style hacks is an absolute nightmare.
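
To make the class-attribute point in the comment above concrete, here is a small Python sketch. It only models the two interpretations described there: a standards-style reader that honors every class in the list, and a converter that keeps a single class. The function name `classes_seen` and its `mode` parameter are hypothetical, and since the comment does not say whether KindleGen 1.x keeps the first or the last class, the sketch shows both rather than guessing.

```python
def classes_seen(class_attr, mode="all"):
    """Return the class names a renderer would act on for one element.

    mode="all"   -> standard HTML behavior: every whitespace-separated name.
    mode="first" -> hypothetical single-class converter keeping the first name.
    mode="last"  -> hypothetical single-class converter keeping the last name.
    """
    if mode not in ("all", "first", "last"):
        raise ValueError("mode must be 'all', 'first', or 'last'")
    names = class_attr.split()
    if mode == "first":
        return names[:1]
    if mode == "last":
        return names[-1:]
    return names


# A paragraph marked up as class="note reverse":
full = classes_seen("note reverse")             # both classes apply
kept_first = classes_seen("note reverse", "first")
kept_last = classes_seen("note reverse", "last")

# Styles that would newly appear if the source were reprocessed by a reader
# that honors the whole list, for each single-class assumption:
newly_applied_if_first = [c for c in full if c not in kept_first]
newly_applied_if_last = [c for c in full if c not in kept_last]
```

Either way, one of the two classes was previously dropped, which is why reprocessing the stashed source with a reader that parses the full list could switch on styles nobody has ever seen rendered.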

  3. This thread on Amazon’s forums may shed some light on the matter: it seems we don’t need to be so concerned about bloat, because the size of the KindleGen output you upload does not reflect the size of the book that customers download.

    This doesn’t help those not putting books up via Amazon, however, or give authors a feel for how big the downloads will be for each individual device type.

    http://www.amazon.com/forum/kindle%20publishing?_encoding=UTF8&cdForum=Fx21HB0U7MPK8XI&cdThread=Tx1LY7T1QW631YK

    Hello from Amazon Kindle Publishing Team,
    When you create a Mobi using Kindlegen we create and store multiple versions of the content in order to provide the best customer experience. Eg. The image needs for the Kindle device are different from the image needs on iPad. This is the cause of the file size increase. When we fulfill your book to customers, we only send the content version that is most appropriate for that device. Amazon sends the trimmed content to the reading device and hence this increased file size during book compilation does not affect the download charges. Thus, we try providing the highest quality content to our customers while minimizing costs.
