Virtualisation: It’s a Technology, not a Religion

Published September 30th, 2009 by matt

It’s been interesting to look at the press coverage, blog posts, and tweets surrounding the move of Github to an Anchor-managed infrastructure — I’ve never worked on something so public before. I think the article about “Vampire Programmers” has been my favourite so far.

The ZDnet article on the Github move gave me a wry chuckle, though. It made it sound like the move signified some sort of rejection of the Church of the Hypervisor, as if virtualisation had been tested and found wanting. In actual fact, there are more virtual machines running in the Github infrastructure now than there were previously, providing a lot of essential services.

I really don’t think of myself as a virtualisation nay-sayer. I started using virtual machines with User-Mode Linux, back before anyone outside of Cambridge had ever heard of Xen, and I got on board with Xen back in the 2.0 days. I’ve introduced widespread virtualisation at two previous jobs, I was a big supporter of the use of virtualisation at my last job, and I’ve been working on Anchor’s High-Availability VM product recently. Virtualisation hater I ain’t.

Conversely, though, I don’t think VMs are the answer to all the world’s problems. They’re a fantastic opportunity for a lot of sites: everyone can be running on high-quality, server-grade hardware (redundant power, hardware RAID, fast buses, etc.) without the need to either purchase or maintain that hardware. Furthermore, each VM, by virtue of its isolation, is more easily managed and scaled independently of the other VMs. Need more memory? Allocate it. This box is getting a little overloaded? No problem, just move a VM to another piece of hardware.

The simple fact is that very, very few sites need a whole dedicated server — even an entry-level server is massive overkill for most sites. In this situation, you can either:

  • Spend the extra money, assuming that you’ll grow and recoup those costs;
  • Buy a cheaper machine, either a basic desktop machine or second-hand server, and take the hit in reliability;
  • Use shared hosting, where everyone’s on the same OS installation (which has tradeoffs in control and isolation); or
  • Use a virtual machine.

Unsurprisingly, I like the last option. It saves you money and avoids both the reliability headaches of cheaper hardware and the management headaches of shared hosting.

Management is the big ongoing cost of most sites. Virtualisation simplifies that by isolating different sites and services from each other, so that when it comes time to scale them, it’s not a big job. Most people who’ve worked as developers or sysadmins will be able to recall the unpleasant feeling when that big-ball-of-wax that everyone calls “the server” starts to run out of puff, and there’s no better hardware to put it on, and no more software optimisation to be done. The call goes out, “move some services to another server”. Damn.

See, when everything’s on the one machine, the services intertwine and become hard to separate. That little hack that Roger The Talented Intern put in to make mail processing run faster? That involved digging into the SMTP server queue and pulling out messages directly; if you separate the web server and the mail server, that’ll break, but I bet you don’t find that out until you move.

I hate doing archaeology on these sorts of machines, because it’s guaranteed that things will break, tempers will run hot, and sadness will result. The cost of doing the move (in IT staff time, downtime, customer and staff dissatisfaction, and so on) can easily equal or exceed that of the hardware itself — and yes, I’m still talking about good-quality, server-grade hardware here. People are expensive, and good people even more so.

Instead, if you run logically separate services in separate VMs, when the time comes to scale something, it really is a piece of cake to migrate a VM: shut down, copy the disk image, boot it back up. Piece of cake. Sure, there’s some overhead in running those separate services in VMs, and yes, you’ll be looking to buy a second machine sooner than you would otherwise, but again, the savings made by not having to gently tease apart a dozen root-bound systems on a single machine will probably pay for that second machine. Let’s not even consider the costs of another separation in two years’ time when the services you put onto that other machine need to be separated again…

This use of virtualisation is all well and dandy if you’re one of the vast majority of sites that don’t need to service 125,000 users and 2.5TB of filesystem data. Github, though, is one of the (un)lucky few. When you’re using a machine’s worth (or more) of processing power on a single service, there’s no benefit to virtualising that. In Github’s case, there are four physical machines running just the frontend services, each of which has the same specs as the machines that are running the VMs for the site. Sticking the frontend services into VMs in that case would have been a fruitless move. Similarly for the backend file storage, and the database. They’re all single services consuming a machine’s worth (or more) of resources, so we give them physical machines.

Down the track, as Github grows and individual VMs work harder and need more resources, we’ll first increase the size of those VMs, before making the decision to move a power-hungry VM off onto its own physical hardware. That’s an easy move: between the natural isolation provided by virtual machines and the strong configuration management policy we’ve adopted, transitioning from a VM to a physical machine will be painless. And painless systems management is, after all, the aim of the game.

VMware ESX Guest Disk IO

Published April 6th, 2009 by Paul De Audney

Knowing the state of your disk IO latency in VMware ESX can help you pre-empt performance and capacity issues before they occur. There are a few guidelines you should keep in mind. These notes are directed towards people using directly attached storage.

  • Write latency should be effectively 0ms, because you have that fancy battery-backed controller caching writes, right?
  • Read latency should be under 8ms.
  • Use the smallest stripe size possible for your RAID array setting. This helps keep random IO performance acceptable at the cost of some sequential performance.
  • Do not virtualise very heavy random IO workloads on shared arrays; other guest VMs won’t like you for it.
  • Unless you have a very compelling reason not to, use RAID 10.
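
The two latency guidelines above are easy to turn into an automated check. The following is only a minimal Python sketch: the DiskSample structure, the device names and the 0.5ms write tolerance are assumptions made for illustration, and the readings would have to come from whatever monitoring you already have.

```python
from dataclasses import dataclass

# Read threshold from the guidelines above.
READ_LATENCY_LIMIT_MS = 8.0
# Assumption: treat anything up to 0.5ms as "effectively zero" write latency.
WRITE_LATENCY_TOLERANCE_MS = 0.5


@dataclass
class DiskSample:
    """One latency reading for a device (hypothetical structure)."""
    device: str
    read_latency_ms: float
    write_latency_ms: float


def latency_warnings(samples):
    """Return one human-readable warning per guideline a sample breaks."""
    warnings = []
    for sample in samples:
        if sample.read_latency_ms >= READ_LATENCY_LIMIT_MS:
            warnings.append(
                f"{sample.device}: read latency {sample.read_latency_ms:.1f}ms "
                f"(want under {READ_LATENCY_LIMIT_MS:.0f}ms)"
            )
        if sample.write_latency_ms > WRITE_LATENCY_TOLERANCE_MS:
            warnings.append(
                f"{sample.device}: write latency {sample.write_latency_ms:.1f}ms "
                f"(want roughly 0ms)"
            )
    return warnings


# Made-up readings: the first device is healthy, the second breaks both rules.
samples = [
    DiskSample("local-array-a", 3.2, 0.0),
    DiskSample("local-array-b", 12.5, 4.0),
]
for line in latency_warnings(samples):
    print(line)
```

A sustained run of warnings matters more than a single spike, so in practice you would feed this averaged samples rather than instantaneous ones.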

Some other notes, specific to Linux guests are:

  • Mount file systems with noatime and nodiratime; this will help reduce random IO.
  • Allocate enough memory to have some buffers.
  • Do anything possible to stop your VM swapping heavily (see point above).
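
To audit the first of these points across a guest, you can look at the options column in /proc/mounts. The sketch below is a rough Python illustration, not a finished tool: the function name, the list of file system types and the sample text are all assumptions. It only checks for noatime, since on Linux noatime implies nodiratime.

```python
# Assumption: these are the on-disk file system types worth checking.
DEFAULT_FS_TYPES = ("ext2", "ext3", "ext4", "xfs")


def mounts_missing_noatime(mounts_text, fs_types=DEFAULT_FS_TYPES):
    """Return mount points of the given types that lack the noatime option.

    mounts_text is expected in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        if fs_type not in fs_types:
            continue  # ignore proc, sysfs, tmpfs and friends
        if "noatime" not in options.split(","):
            missing.append(mount_point)
    return missing


# Made-up sample in /proc/mounts format.
SAMPLE_MOUNTS = (
    "/dev/sda1 / ext3 rw,noatime,nodiratime 0 0\n"
    "/dev/sda2 /var ext3 rw 0 0\n"
    "proc /proc proc rw 0 0\n"
)
print(mounts_missing_noatime(SAMPLE_MOUNTS))
```

On a real guest you would pass in the contents of /proc/mounts, then add noatime to the options field of the offending entries in /etc/fstab.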

As with any system, great monitoring and performance trending give you an excellent overview of your infrastructure. Even if you don’t have external systems for performance trending, the VMware Infrastructure client with a few tweaks will display the data you want to see.

  1. Log in to the VI Client.
  2. Click on an object in the left navigation tree.
  3. Click on the Performance tab at the top of the main display pane.
  4. Click the “Change Chart Options” button.
  5. Select the Disk chart option from the left expanding menu.
  6. Now change the counters: pick the Latency counters and Number counters, and un-tick the KBps counters.
  7. Save the chart settings as disk-latency.

Now you can view in real time what is happening with your disk IO on the VMware ESX server. If you are more familiar with using a command line and Linux, you can SSH in to the ESX service console (COS) and use the command esxtop to view disk performance information.

  1. Launch esxtop (as root).
  2. Press “v”.
  3. Press “s” and then “1”.

Now you can see the per-VM disk usage counters, with a 1-second sample period.
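
If you would rather keep the numbers than watch them scroll past, esxtop also has a batch mode (the -b flag, with -d for the delay and -n for the number of samples) that writes CSV you can redirect to a file. The Python sketch below averages every column whose header contains a given substring. The column names in the sample are hypothetical stand-ins, so check the header row of your own capture before relying on the substring.

```python
import csv
import io


def average_columns(csv_text, header_substring):
    """Average each numeric column whose header contains header_substring."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if header_substring in name]
    totals = {header[i]: 0.0 for i in wanted}
    counts = {header[i]: 0 for i in wanted}
    for row in reader:
        for i in wanted:
            try:
                value = float(row[i])
            except (IndexError, ValueError):
                continue  # skip short rows and non-numeric cells
            totals[header[i]] += value
            counts[header[i]] += 1
    return {name: totals[name] / counts[name] for name in totals if counts[name]}


# Made-up capture; real esxtop headers are longer and will differ.
SAMPLE_CSV = (
    '"Time","disk0 MilliSec/Read","disk0 MilliSec/Write","disk0 KBps"\n'
    '"04/06/2009 10:00:00","4.0","0.0","512"\n'
    '"04/06/2009 10:00:01","6.0","0.0","768"\n'
)
for name, value in average_columns(SAMPLE_CSV, "MilliSec/").items():
    print(f"{name}: {value:.1f}ms")
```

Matching on a substring keeps the sketch independent of the exact counter names, which vary between devices and ESX versions.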

These rules of thumb are also applicable to Xen and Hyper-V.
