Friday, April 16, 2010

LVM vs VSS - it's no contest

Richard Jones writes about Microsoft's Virtual Shadow Copy Service (VSS, not to be confused with Visual Source Safe), and laments the lack of any equivalent or good alternative on Linux servers.

I couldn't agree more, but there's more to it than what he focuses on. Application consistency is in fact the least of the problems.


App consistency and pausing is missing in Linux, but is in principle not hard to introduce. It is something where D-BUS can be very useful, as it provides another piece of the puzzle in the form of inter-application and application<->system signaling. The trouble is that, like with most things in the Linux-related world, some kind of agreement needs to be reached by interested parties.

All it really takes for app consistency is two d-bus events:
  1. "Prepare for snapshot - pause activity, fsync, etc."
  2. "Snapshot taken/failed/cancelled, safe to resume"
... though there needs to be some aliveness-checking and time-limiting in place so that a broken application can't cause indefinite outages in other apps by not responding to a snapshot message.

Off the top of my head, products that'd benefit from this include Cyrus IMAPd, PostgreSQL, MySQL, any other database you care to name...


As compared to VSS, using dm-snapshot (LVM snapshots) on Linux suffers from a number of significant deficiencies and problems:
  • Snapshots require you to use LVM. LVM doesn't support write barriers, so file systems must use much slower full flushes instead.
  • Snapshots are block-level not file-system-level, so the file system isn't aware of the snapshot being taken.
  • Because snapshots are block-level and not filesystem-aware, the snapshot must track even low-level file system activity like file defragmentation, erasing free space, etc. This means they grow fast and have a higher impact on system I/O load.
  • Accessing a snapshot requires mounting it manually to another location and faffing around with the changed paths. There's no way for an app to simply request that it sees a consistent point-in-time view of the FS instead of the current state, as in VSS. This is clumsy and painful especially for backups and the like.
  • The file system being mounted has to be able to cope with being mounted read-only in crashed state - it can't do journal replay/recovery etc. LVM doesn't even give the *filesystem* a chance to make its self consistent before the snapshot. Some file systems, like XFS, offer manual pause/resume commands that may be issued before and after a snapshot is taken to work around this.
  • Snapshots don't age out particularly gracefully. You have to guess how much space you'll need to store changes, as LVM isn't smart enough to just use free space in the VG until it runs out. The snapshot's change tracking space is reserved and can't be used for anything else (even another snapshot) until the snapshot is freed. If you over-estimate you can have fewer snapshots and need to keep more free space around. If you under-estimate your snapshot may die before you finish using it. Argh!
So: even with the ability to pause app activity, there's unfortunately a lot more to be done at the LVM level before anything even half as good as VSS is possible. LVM has been progressing little, if at all, for quite a few years now and some of these problems (such as the namespace issues and lack of FS integration) aren't even practical to solve with a block-driver level approach like LVM's.

At this point, one can only hope that BTRFS can do a better job, so that we can switch to runnning btrfs on mdadm raid volumes and breathe a sigh of relief.

No comments:

Post a Comment

Captchas suck. Bots suck more. Sorry.