Making (Kindle) books from blogs

By: on July 30, 2012

Admittedly, more and more blogs are doing this already, but this is a slightly more DIY option…

So, you’d like some more reading material for your Kindle. Maybe you’re going away for the holidays, or just want to survive the elongated journey times of the Olympic period. There are a few blogs I’d like to read more of, but I’ve never gotten around to them because of the sheer volume of back-reading I’d need to do first. What if we could dump those blogs (and blog-like things) onto the Kindle? Glad you asked, as that’s exactly what I’ve put together.

The imaginatively named Book Blog does exactly that. I’ve provided a couple of example configurations for blogs I’ve been reading (primarily 365 Tomorrows and Harry Potter and the Methods of Rationality). The format for specifying new books (in series.txt) is fairly readable, and mostly just consists of a series of regexes (being the only sane way to parse arbitrary text) that grab the title, contents and “next” link from a given page.
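To give a feel for the regex approach, here’s a minimal Python sketch of pulling those three pieces out of a page. The patterns, the `Page` type and the `parse_page` function are all made up for illustration against an imaginary blog layout; they aren’t the actual contents of series.txt or the Book Blog code.

```python
import re
from typing import NamedTuple, Optional


class Page(NamedTuple):
    title: str
    content: str
    next_url: Optional[str]


# Hypothetical patterns for an imaginary blog layout (not taken from series.txt).
TITLE_RE = re.compile(r"<h1[^>]*>(.*?)</h1>", re.DOTALL)
CONTENT_RE = re.compile(r'<div class="entry">(.*?)</div>', re.DOTALL)
NEXT_RE = re.compile(r'<a[^>]*rel="next"[^>]*href="([^"]+)"')


def parse_page(html: str) -> Page:
    """Grab the title, contents and "next" link from one page of HTML."""
    title = TITLE_RE.search(html)
    content = CONTENT_RE.search(html)
    next_link = NEXT_RE.search(html)
    if title is None or content is None:
        raise ValueError("page does not match the configured patterns")
    return Page(
        title=title.group(1).strip(),
        content=content.group(1).strip(),
        # The last page of a blog has no "next" link, so this may be None.
        next_url=next_link.group(1) if next_link else None,
    )


sample = (
    '<h1>Day One</h1>'
    '<div class="entry"><p>Hello.</p></div>'
    '<a rel="next" href="/day-two">Next</a>'
)
page = parse_page(sample)
print(page.title, page.next_url)
```

A non-greedy `(.*?)` like this stops at the first closing `</div>`, so a real blog with nested divs would need a more specific pattern. That’s the sort of thing you end up tuning per blog.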

The program starts from the first page (also specified in the config), grabs a series of pages, shoves the content of each into a header/footer that Calibre (which we’ll be using in a minute to actually create the books) can understand, and then writes out the pages and a Table of Contents into a folder named after the blog plus a “book number”. Once it’s got enough entries (defaults to 20, but you can change that at the command line), it hands over to Calibre to do the heavy lifting of actually making the new Kindle file. It then keeps going with the next 20 entries from the blog, trawling through until you get bored or it runs out of items. The entry limit is simply there to avoid having to wait ages to generate a really large book, but you can probably shove that number up if you’re not running off a little netbook like me.
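The loop described above can be sketched roughly like this in Python. It’s only an outline under a few assumptions: the function names, the file names (`page001.html`, `toc.html`) and the `fetch` callback are all invented for the example, and the `.mobi` output name is my guess at a sensible Kindle format. The one real external piece is `ebook-convert`, Calibre’s command-line converter, which takes an input file and an output file.

```python
import html
from pathlib import Path
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

Entry = Tuple[str, str]  # (title, HTML content)
# Hypothetical fetcher: given a URL, returns (title, content, next URL or None).
Fetch = Callable[[str], Tuple[str, str, Optional[str]]]


def crawl(first_url: str, fetch: Fetch) -> Iterator[Entry]:
    """Follow "next" links from the first page until there are none left."""
    url: Optional[str] = first_url
    while url:
        title, content, url = fetch(url)
        yield title, content


def batches(entries: Iterable[Entry], size: int = 20) -> Iterator[List[Entry]]:
    """Group entries into books of `size` entries; the last may be shorter."""
    batch: List[Entry] = []
    for entry in entries:
        batch.append(entry)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def write_book(root: Path, blog: str, number: int, batch: List[Entry]) -> Path:
    """Write each page plus a table of contents; return the TOC path."""
    folder = root / f"{blog} {number}"
    folder.mkdir(parents=True, exist_ok=True)
    links = []
    for i, (title, content) in enumerate(batch, start=1):
        name = f"page{i:03d}.html"
        safe = html.escape(title)
        # Wrap the scraped content in a minimal header/footer.
        (folder / name).write_text(
            f"<html><head><title>{safe}</title></head>"
            f"<body><h1>{safe}</h1>{content}</body></html>",
            encoding="utf-8",
        )
        links.append(f'<li><a href="{name}">{safe}</a></li>')
    toc = folder / "toc.html"
    toc.write_text(
        "<html><body><ul>" + "".join(links) + "</ul></body></html>",
        encoding="utf-8",
    )
    return toc


def calibre_command(toc: Path) -> List[str]:
    """Build the Calibre call: ebook-convert <input> <output>."""
    return ["ebook-convert", str(toc), str(toc.parent) + ".mobi"]
```

To actually build each book you’d pass the result of `calibre_command(toc)` to `subprocess.run(..., check=True)`, assuming Calibre is installed and `ebook-convert` is on your PATH. Because `crawl` and `batches` are generators, each book gets written and converted before the next batch of pages is fetched, so you can stop whenever you get bored.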

I’ve been using this for about the last 2 months and it’s worked pretty well so far. There are a few weird characters coming out of 365 Tomorrows that the Kindle wasn’t happy with, which I’ve had to change manually, but otherwise all good. The only downside is that I now need more reading material. Suggestions, anyone?

