Ninja migrations from VMware to KVM using vmdksync

May 15, 2012 | Technical

We recently made the decision to pay off some of our technical debt by eliminating the VMware servers we built when we first started our Virtual Private Server (VPS) offering. VMware is a commercial vendor platform, so it’s not exactly trivial to jump ship, but it is possible with some time and effort. Forcing a few hours of downtime on our customers for business reasons is not cool, so we had to find a better way.

Background and rationale

When we first started offering virtual servers the software landscape was very different. After comparing what was available at the time we settled on VMware ESX – it had the right features for a VPS product, was secure and manageable enough, sufficiently mature and reliable, and came with a nominal level of support.

Things changed over time, VMware was no longer doing what we needed, and we’ve since switched to KVM for our new VPS deployments. The VMware servers are still ticking over, but they’re not without issues – particularly the dedicated Windows machine we’re forced to use to access the console and management tools.

Thus, the (rather easy) decision was made to migrate all the VMs to our KVM-based infrastructure. As a bonus we get to move them to shinier Dell hardware. The only question was how?

Challenges

We’ve done plenty of migrations between VPS servers before – it’s really just swapping one set of virtual hardware for another. Linux doesn’t mind, but Windows will frequently have a fit, and that’s where a bunch of the problems lie.

In addition, downtime needs to be kept to a minimum. You can mount a VMware disk image and pump the data across the network, but that’s slow. If we were dealing with a Xen or KVM system as the source we could use lvmsync to dodge the problem, but no such luck here.
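
For context, lvmsync reads the copy-on-write table of an LVM snapshot and ships only the changed blocks to the destination. A typical invocation looks something like this (device and host names invented; check the lvmsync README for the exact syntax):

    # Send only the blocks that changed since the snapshot was taken
    # to an identically-sized LV on the destination host
    lvmsync /dev/vg0/victim-snapshot otherhost:/dev/vg0/victim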

Enter vmdksync

This last point is what made our Matt Palmer itchy – why can’t we apply a binary diff of disk images? VMware has snapshots just like LVM after all.

VMware looks mostly like an RHEL3 system when you log in, and VM data is stored in a VMFS filesystem with some special access semantics. The VM disks are just raw files – so far, so good – but the filesystem doesn’t show up when you run df or mount, and VMware seems to hold exclusive locks on the .vmdk files. You can’t even use file or hexdump on an in-use image.
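
To make that concrete, here’s roughly what a VM’s directory looks like from the service console (datastore and VM names are invented):

    # VMFS datastores live under /vmfs/volumes, even though they
    # never show up in the output of df or mount
    ls /vmfs/volumes/datastore1/victim/
    # victim.vmx          <- the VM's configuration
    # victim.vmdk         <- small text descriptor for the disk
    # victim-flat.vmdk    <- the raw disk data itself (locked while the VM runs)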

With some experimentation, Matt found that locks only apply to files opened for writing – taking a snapshot releases the lock on the original (“flat”) file, and locks the snapshot (“delta”) instead. Hallelujah! We can make a baseline copy of the disk from the unlocked flat file while the VM is running, then apply the changes in the delta file, which should be very quick.
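
On the service console, taking the snapshot looks something like the following. The vmware-cmd arguments here are from memory of the ESX 3.x-era tooling, so check the documentation for your version:

    # Take a snapshot: writes are redirected to a new delta file and
    # the lock on the original flat file is released
    vmware-cmd /vmfs/volumes/datastore1/victim/victim.vmx createsnapshot migrate pre-migration 0 0

    # The flat file is now readable while the VM keeps running
    file /vmfs/volumes/datastore1/victim/victim-flat.vmdk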

That’s exactly what vmdksync does. After nutting out the perverse description of sparse extents in the Virtual Disk Format 5.0 Technote, Matt put together a little Ruby script to merge a VMware snapshot over a target device. That’s an LVM logical volume in our case, but it could be any sort of disk image you like.
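
The invocation is pleasantly simple – something along these lines, though check the README for the exact arguments:

    # Merge a VMware snapshot delta over a baseline copy sitting on a
    # block device (file and device names invented)
    vmdksync victim-000001-delta.vmdk /dev/vg0/victim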

Step by step

End-to-end, the procedure looks something like the following; a rough sketch of the commands appears after the list. Depending on how you prefer to manhandle VMware, Matt has some convenient command-line examples in his README over at the repo.

  1. Tell Puppet to set up a blank VPS on your new KVM server
  2. Take a snapshot of the victim on the VMware host
  3. Use dd and netcat to copy the flat file across the network and onto the empty logical volume (LV) on the new server
  4. Shut down the victim on the VMware host, releasing the lock on the snapshot delta file
  5. Push the delta across the network to the KVM server
  6. Use vmdksync to apply the delta to the LV
  7. Fire up the VPS on the KVM server – you’re done!
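
For the curious, here’s a rough sketch of steps 3–6 as shell commands. The hostnames, datastore paths, LV names and netcat port are all invented, and netcat flags vary between versions, so treat it as illustrative rather than copy-paste ready:

    ## On the KVM server: listen for the baseline copy and write it to the empty LV
    nc -l -p 5000 | dd of=/dev/vg0/victim bs=1M

    ## On the VMware host (snapshot already taken): stream the flat file across
    dd if=/vmfs/volumes/datastore1/victim/victim-flat.vmdk bs=1M | nc kvmserver 5000

    ## After shutting down the VM, push the (much smaller) delta file across
    scp /vmfs/volumes/datastore1/victim/victim-000001-delta.vmdk kvmserver:/tmp/

    ## On the KVM server: merge the delta into the LV, then boot the VPS
    vmdksync /tmp/victim-000001-delta.vmdk /dev/vg0/victim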

With judicious use of scripting and quick hands, you can do all this with as little as 90 seconds of observed downtime. That’s not much longer than a reboot takes.

Dealing with Windows

It’s not all kittens and rainbows when it comes to Windows, as mentioned earlier. The massive changes in (virtual) hardware often cause problems when booting after migrating the VPS, so we use a commercial tool called ShadowProtect to inject the necessary drivers into the installation before bringing it to life again.

(Screenshot: sometimes you have a problem with the bootloader… yep, that’s broken!)

ShadowProtect is also a very fast way to fix bootloader issues that sometimes crop up during migrations. Once successfully booted, the network interfaces will need to be reconfigured and the system reactivated, thanks to the hardware changes.

Start-to-finish, a Windows system takes about 20–30 minutes to get up and running again, which is quite respectable when you consider that regular Windows updates can take as long to apply. We also remove the remnants of VMware Tools and drivers to keep things tidy.

Wrap up

This was an overwhelmingly successful process that saw us sweep several dozen VMs onto new servers over the course of a couple of weekends. Planning the work and contacting all the affected customers probably took more time than actually doing the hands-on work.

If you’ve run into similar sorts of fun when dealing with VMware we’d love to hear about it. Likewise, if you have any questions just leave a comment.

2 Comments

  • bradj says:

    Could you instead, at steps 5 and 6, have VMware consolidate the snapshots and then rsync? Beforehand I’d have had VMware convert the disk format to thick.

    Also, could you use the VMware P2V converter (lying about the ‘P’) to live-migrate to an NFS datastore on your target server and then rsync the consolidated VMDK to the LV locally?

    • Barney Desmond says:

      You could have VMware consolidate the snapshots, but the problem is that there’s no fast way to sync the block devices on each end after that. rsync could do it (you’d have to install it first), but it has to read the whole (potentially large) VMDK to find the differences. It ends up being much faster to make use of the delta data that you already have.

      The latter method sounds like it would work too, but we suspect there’d be a fair bit of hassle involved in getting the live migration to work nicely. You’d still have to do that rsync, which would at least run over fast local disks, but it’s still not likely to be faster than applying just the delta.