Sunday, August 15, 2010

simple script to extract Stanza epub ebooks from an iTunes backup

iTunes and the iPhone don't like to let go of data they have their claws into. Sometimes, however, you might need that data outside the Apple Garden, in which case you're going to have to get your hands dirty diving into iTunes backups, as they're the easiest way to regain control of your files.

(Tip: if you need to fight your software to access your own files, your platform is hostile. I don't use the iPhone myself and loathe the way Apple does things, but sometimes I have to work with it for other people. Hence this post.)

I needed to recover some Stanza ebooks from an iPhone. It's hard enough to get them *on* to the phone, and getting them off is nigh on impossible, as Apple keeps changing things to make it hard to access the phone with anything but iTunes. In this case, though, the latest change (to the backup format) actually made things easier, not harder.

Thankfully, the newer iTunes backup format isn't too hard to work with. It stores two files for every file on the phone: one with a .mdinfo extension, an Apple binary plist containing the file's path and other metadata, and one with a .mddata extension holding the file's actual contents. Apart from the extension the two share the same name, so they're easy to pair up.
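Pairing the two halves is simple enough to sketch. This assumes only the naming described above (same base name, .mdinfo and .mddata extensions); `paired_backup_files` is a hypothetical helper name, not part of any iTunes tooling.

```python
import os


def paired_backup_files(backup_dir):
    """Yield (mdinfo_path, mddata_path) pairs found in one backup folder.

    Assumes each file on the phone is stored as NAME.mdinfo (metadata
    plist) plus NAME.mddata (contents), both sharing the same NAME.
    """
    for name in sorted(os.listdir(backup_dir)):
        stem, ext = os.path.splitext(name)
        if ext != ".mdinfo":
            continue
        data_path = os.path.join(backup_dir, stem + ".mddata")
        # Skip metadata files whose data half is missing.
        if os.path.exists(data_path):
            yield os.path.join(backup_dir, name), data_path
```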

The plist format is unpleasant to work with and I haven't made any effort to parse it properly. If anyone has a decent plist parser that doesn't require distribution of Apple shared libraries, please let me know. It's relatively easy to hack together a "dumb" processor that looks for and extracts strings within the plist-format .mdinfo files to obtain path information, as I've done here, but it's extremely fragile and likely to break with even a minor revision of the backup format.
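For what it's worth, Python 3.4 and later ship a `plistlib` module that reads binary plists with no Apple libraries at all. The sketch below is an assumption-laden alternative to the string hack: rather than guessing at key names inside .mdinfo files (which I haven't verified), it collects every string in the parsed plist and returns the first one ending in ".epub". It also assumes some metadata may be nested as a second binary plist stored as a data blob, so byte values starting with the `bplist00` header are parsed and searched too. `plist_strings` and `find_epub_path` are hypothetical names.

```python
import plistlib


def plist_strings(obj):
    """Recursively yield every string (keys included) in a parsed plist.

    Byte values that begin with the binary-plist header b"bplist00" are
    assumed to be nested plists and are parsed and searched as well.
    """
    if isinstance(obj, str):
        yield obj
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield key
            yield from plist_strings(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from plist_strings(item)
    elif isinstance(obj, bytes) and obj.startswith(b"bplist00"):
        try:
            nested = plistlib.loads(obj)
        except plistlib.InvalidFileException:
            return  # Looked like a plist but wasn't; ignore it.
        yield from plist_strings(nested)


def find_epub_path(mdinfo_bytes):
    """Return the first string ending in '.epub' in an .mdinfo plist, or None."""
    for s in plist_strings(plistlib.loads(mdinfo_bytes)):
        if s.lower().endswith(".epub"):
            return s
    return None
```

Read the .mdinfo file in binary mode (`open(name, "rb").read()`) and pass the bytes to `find_epub_path`; a `None` result means the file isn't an ebook.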

The following script looks for .mdinfo files containing the string "epub", extracts the filename, and copies the associated .mddata file to an "ebooks" folder in the user's home directory.

Run this script after cd'ing to your iTunes backup folder. On Windows that's %APPDATA%\Apple Computer\MobileSync\Backup; on a Mac it's ~/Library/Application Support/MobileSync/Backup. Inside are folders for each backup. Check the dates to find the most recent one, cd into it, and run the script. Your ebooks should appear in an ebooks folder in your home directory.
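The "check the dates" step can be automated. This is a minimal sketch, assuming a folder's modification time is a good enough stand-in for the date of the backup it holds (iTunes doesn't promise that); `newest_backup` is a hypothetical helper.

```python
import os


def newest_backup(backup_root):
    """Return the most recently modified subfolder of backup_root, or None.

    Folder modification time is used as a proxy for the backup date.
    """
    subdirs = [
        os.path.join(backup_root, name)
        for name in os.listdir(backup_root)
        if os.path.isdir(os.path.join(backup_root, name))
    ]
    if not subdirs:
        return None
    return max(subdirs, key=os.path.getmtime)
```

Pass it the folder that contains the per-backup folders, then `os.chdir()` into the result before running the extraction script.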

This is the dirtiest possible hack for the .mdinfo reading and filename extraction. I need to tidy it up into something a bit less gross, but hey, it's enough to play with.

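#!/usr/bin/env python
import os
import shutil
import errno

outfolder = os.path.expanduser("~/ebooks")

def getfn(f):
    # Read the .mdinfo plist as raw bytes; latin-1 maps every byte to one
    # character, so this behaves the same under Python 2 and Python 3.
    with open(f, "rb") as fh:
        info = fh.read().decode("latin-1")
    if "epub" in info:
        # Very fragile: splits on the Ctrl-P byte (0x10, which terminals
        # display as ^P; assumed to be what the separator is) and trims
        # 11 leading and 6 trailing junk characters from the third field.
        return info.split("\x10")[2][11:-6]

# Create the output folder once, tolerating "already exists".
try:
    os.mkdir(outfolder)
except OSError as e:
    if e.errno != errno.EEXIST:
        raise

for f in os.listdir("."):
    if f.endswith(".mdinfo"):
        fn = getfn(f)
        if fn is not None:
            # The contents live in the .mddata file with the same base name.
            datafile = os.path.splitext(f)[0] + ".mddata"
            # Keep only the file name in case the stored path has folders.
            shutil.copyfile(datafile, os.path.join(outfolder, os.path.basename(fn)))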