Archive for September, 2009
Virtualisation: It’s a Technology, not a Religion
Wednesday, September 30th, 2009
It’s been interesting to look at the press coverage, blog posts, and tweets surrounding the move of Github to an Anchor-managed infrastructure; I’ve never worked on something so public before. I think the article about “Vampire Programmers” has been my favourite so far.
The ZDnet article on the Github move gave me a wry chuckle, though. It made it sound like the move signified some sort of rejection of the Church of the Hypervisor, as if virtualisation had been tested and found wanting. In actual fact, there are more virtual machines running in the Github infrastructure now than there were previously, providing a lot of very essential services.
I really don’t think of myself as a virtualisation nay-sayer. I started using virtual machines with User-Mode Linux, back before anyone outside of Cambridge had ever heard of Xen, and I got on board with Xen back in the 2.0 days. I’ve introduced widespread virtualisation at two previous jobs, I was a big supporter of the use of virtualisation at my last job, and I’ve been working on Anchor’s High-Availability VM product recently. Virtualisation hater I ain’t.
Conversely, though, I don’t think VMs are the answer to all the world’s problems. They’re a fantastic opportunity for a lot of sites: everyone can be running on high-quality, server-grade hardware (redundant power, hardware RAID, fast busses, etc.) without the need to either purchase or maintain that hardware. Furthermore, each VM, by virtue of its isolation, is more easily managed and scaled independently of the other VMs. Need more memory? Allocate it. This box is getting a little overloaded? No problem, just move a VM to another piece of hardware.
The simple fact is that very, very few sites need a whole dedicated server — even an entry-level server is massive overkill for most sites. In this situation, you can either:
- Spend the extra money, assuming that you’ll grow and recoup those costs;
- Buy a cheaper machine, either a basic desktop machine or second-hand server, and take the hit in reliability;
- Use shared hosting, where everyone’s on the same OS installation (which has tradeoffs in control and isolation); or
- Use a virtual machine.
Unsurprisingly, I like the last option. It saves you money and avoids both the reliability headaches of cheaper hardware and the management headaches of shared hosting.
Management is the big on-going cost of most sites. Virtualisation simplifies that by isolating different sites and services from each other, so that when it comes time to scale them, it’s not a big job. Most people who’ve been working as a developer or sysadmin will be able to recall the unpleasant feeling when that big-ball-of-wax that everyone calls “the server” starts to run out of huff, and there’s no better hardware to put it on, and no more software optimisation to be done. The call goes out, “move some services to another server”. Damn.
See, when everything’s on the one machine, they intertwine and become hard to separate. That little hack that Roger The Talented Intern put in to make mail processing run faster? That involved digging into the SMTP server queue and pulling out messages directly; if you separate the web server and the mail server, that’ll break — but I bet you don’t find that out until you move.
I hate doing archaeology on these sorts of machines, because it’s guaranteed that things will break, tempers will run hot, and sadness will result. The cost of doing the move (in IT staff time, downtime, customer and staff dissatisfaction, and so on) can easily equal or exceed that of the hardware itself — and yes, I’m still talking about good-quality, server-grade hardware here. People are expensive, and good people even more so.
Instead, if you run logically separate services in separate VMs, when the time comes to scale something, it really is a piece of cake to migrate a VM: shut down, copy the disk image, boot it back up. Piece of cake. Sure, there’s some overhead in running those separate services in VMs, and yes, you’ll be looking to buy a second machine sooner than you would otherwise, but again, the savings made by not having to gently tease apart a dozen root-bound systems on a single machine will probably pay for that second machine. Let’s not even consider the costs of another separation in two years’ time when the services you put onto that other machine need to be separated again…
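The shut down, copy, boot sequence can be sketched as a dry-run plan. This is a minimal sketch, not a description of any real tooling: it assumes a Xen host driven by the `xm` toolstack and uses `rsync` and `ssh` for the copy, and the function names, VM name, paths and host name are all hypothetical. It only builds the commands; it does not run them.

```python
import shlex


def migration_plan(vm_name, disk_image, config_file, target_host):
    """Return the ordered commands for a cold VM migration (dry run only)."""
    return [
        # 1. Cleanly shut the guest down and wait for it to finish.
        ["xm", "shutdown", "-w", vm_name],
        # 2. Copy the disk image and the guest config to the new host.
        ["rsync", "--archive", "--sparse", disk_image,
         f"{target_host}:{disk_image}"],
        ["rsync", "--archive", config_file,
         f"{target_host}:{config_file}"],
        # 3. Boot the guest on the new host.
        ["ssh", target_host, "xm", "create", config_file],
    ]


def render(plan):
    """Turn each command into a shell-safe string for review."""
    return [shlex.join(command) for command in plan]


if __name__ == "__main__":
    # Hypothetical VM name, paths and target host.
    plan = migration_plan("web1", "/var/lib/xen/images/web1.img",
                          "/etc/xen/web1.cfg", "xen2")
    for line in render(plan):
        print(line)
```

Printing the plan before running anything is deliberate: the whole point of the approach is that the move is boring enough to read at a glance.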
This use of virtualisation is all well and dandy if you’re one of the vast majority of sites that don’t need to service 125,000 users and 2.5TB of filesystem data. Github, though, is one of the (un)lucky few. When you’re using a machine’s worth (or more) of processing power on a single service, there’s no benefit to virtualising that. In Github’s case, there are four physical machines running just the frontend services, each of which has the same specs as the machines that are running the VMs for the site. Sticking the frontend services into VMs in that case would have been a fruitless move. Similarly for the backend file storage, and the database. They’re all single services consuming a machine’s worth (or more) of resources, so we give them physical machines.
Down the track, as Github grows and individual VMs work harder and need more resources, we’ll first increase the size of those VMs, before making the decision to move a power-hungry VM off onto its own physical hardware. That’s an easy move: between the natural isolation provided by virtual machines and the strong configuration management policy we’ve adopted, transitioning from a VM to a physical machine will be painless, and painless systems management is, after all, the aim of the game.
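The rule of thumb above (virtualise until a single service wants a machine’s worth of resources, then give it the whole machine) can be written down as a tiny function. This is only an illustration of the reasoning: the function name, the threshold and the host size in the example are assumptions, not figures from the Github deployment.

```python
def place_service(demand_cores, demand_ram_gb, host_cores, host_ram_gb,
                  threshold=1.0):
    """Return "physical" when one service needs about a whole host, else "vm".

    The share is the larger of the CPU and RAM fractions of one host,
    because whichever resource runs out first is the one that matters.
    """
    share = max(demand_cores / host_cores, demand_ram_gb / host_ram_gb)
    return "physical" if share >= threshold else "vm"


if __name__ == "__main__":
    # Hypothetical host size: 8 cores and 18GB of RAM.
    print(place_service(2, 4, 8, 18))    # small service, stays in a VM
    print(place_service(8, 12, 8, 18))   # wants every core, gets its own box
```

Growing a VM in place corresponds to the share creeping towards the threshold; the move to bare metal only happens once it gets there.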
GitHub: Speed matters
Tuesday, September 29th, 2009
Impressions from the first article (in its first day) and from the first 24 hours of the GitHub migration have led us at Anchor to believe that:
- GitHub is just as popular as we thought,
- The migration was worth it, as things are running much faster (just check your Twitter feeds, or better yet, check your GitHub source tree for no reason); and
- People are interested in what has gone under the hood of the new GitHub (insert your favorite fast car here; otherwise let’s say a roadster).
Taking these three things into account, this installment will discuss why things are so much faster post-migration than before.
I said ‘faster’ and not ‘fast’, because GitHub is now as fast as any website should be. So in comparison, yes, GitHub is fast now; however, it is akin to riding your bicycle with half-inflated tires: when fully inflated, suddenly your old bike is blazing fast. This is not to be critical of the former architecture, which had its merits when GitHub was founded. GitHub had simply reached a stage where an infrastructure architecture refresh was logical.
The main thing, in the large, that made this new architecture fast was that we were given a blank slate and large amounts of freedom to make an architecture that would do the job well. This is an incredibly rare thing, and it no doubt took a lot of courage on Github’s part. For that, we have to say “thank you” to the Github team for letting us have that freedom. I like to think that we’ve repaid that trust with a pretty awesome architecture that will serve them well for some time to come.
SCALE: When looking at the new architecture as a whole, the increased scale is immediately evident. GitHub now consumes far more hardware than ever before:
Old Infrastructure:
- 10 VMs
- 39 VCPUs
- 54GB RAM
New Infrastructure:
- 16 physical machines
- 128 physical cores
- 288GB RAM
Or for those who enjoy visual cues:

It is a credit to the old infrastructure and GitHub’s code that it ran so well on so little (in comparison). The first credit for increased performance is increased scale.
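To put rough numbers on that jump in scale, the figures listed above can be divided through. A quick sketch; bear in mind that it compares VMs and VCPUs on the old side with physical machines and physical cores on the new side, so the ratios are indicative rather than like-for-like.

```python
# Figures as listed above. Old: VMs and VCPUs. New: physical machines and cores.
old = {"servers": 10, "cores": 39, "ram_gb": 54}
new = {"servers": 16, "cores": 128, "ram_gb": 288}


def growth(before, after):
    """Return the new-to-old ratio for every resource, to one decimal place."""
    return {key: round(after[key] / before[key], 1) for key in before}


if __name__ == "__main__":
    print(growth(old, new))
    # {'servers': 1.6, 'cores': 3.3, 'ram_gb': 5.3}
```

Memory grew the most, a little over five times, with core count a little over three times.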
An important note regarding the hardware is that there is nothing special (or industry secretive) about it. The solution in its entirety runs on commodity hardware. No special black boxes doing scary things with packets and routes. No appliance servers. The solution architecture developed by Anchor can be used with any hardware vendor (insert: Dell, HP, IBM, SuperMicro, etc.). Vendor neutrality means GitHub faces no encumbrance in scaling either up or out, a key issue when considering growth and future flexibility.
Note: The architecture’s flexibility allows for the user repository storage to be expanded with a mix of vendor hardware (should GitHub ever change hardware vendor). Furthermore, any component can be exchanged for another vendor’s hardware with no change to GitHub’s architecture or software.
In a nutshell, the increased scale provides:
- More GitHub front-end servers to service your requests;
- More storage; and
- More I/O bandwidth when working with your repository data
HARDWARE PERFORMANCE: The speed specifications of the underlying components are important, as is how that hardware is utilised.
Storage I/O: A common factor in poor performance with any solution is an I/O bottleneck at the storage level. This pain was GitHub’s. To alleviate this, not only is the storage now distributed across several servers (distributing the I/O), but it is now running on direct-attached 15,000 RPM SAS disks on battery-backed hardware RAID. Therefore, the second credit for increased performance is faster storage.
Direct access to hardware: Virtualisation is great. What isn’t great is when virtualisation is used as a universal solution. At Anchor we believe there is a place for virtualisation, and systems with massive I/O or CPU requirements are not that place. By moving resource-heavy systems onto dedicated hardware, any contention for resources between individual VMs is removed. The third credit goes to less overhead.
ARCHITECTURE: Throwing hardware at a scaling problem is an easy solution, but without the right division of resources and the right software to properly use it, it’s not going to run real fast.
For GitHub, this was their innovative Git command proxying systems, which do an excellent job of taking requests from the frontends (where users connect with their web browser, git client, or SSH client) and shipping them to the fileservers. The database structure, filesystem layout, and code efficiency also contribute to this.
Given that the software isn’t our speciality, there’s not a lot for us to say about this, but Github are planning a series of posts on their blog, and I’m quite sure it’ll be enlightening.
TO REVIEW: The factors involved in GitHub’s faster response on the new infrastructure include (but are not limited to):
- Increased Infrastructure (Scale)
- Faster Hardware (Storage)
- No resource contention (More resources per server)
- Solid, scalable architecture (Awesomeness)
Keep an eye on this space as we delve into technology-specific posts on which 11 herbs and spices Anchor used to realise the new GitHub architecture.
RAIDing USB flash disks – not just a silly stunt
Tuesday, September 29th, 2009
We’ve seen it all before:
hay guyz, check this out, I got a bunch of old 64mb thumb drives and made a RAID out of them! now i can put all my pr0n on there roffle lolololll
RAIDed floppies? It’s been done. RAIDed tapes? Yo dawg, that’s an enterprise storage solution! Let’s talk seriously now.
I have a fileserver that my family uses; it’s just a box with a couple of pairs of hard drives in it (RAID-1, thank you very much. None of this starving-student crap with an oddball assortment of drives in RAID-0). Given that the box is used exclusively for serving up SMB shares, the OS installation is tiny.
I could’ve gone with something really stripped down and optimised, but that would require effort; sysadmins are allergic to unnecessary effort. Instead I just installed Ubuntu jaunty via netinst. Laugh all you want, but I have better things to do, like sleep.
The old system was whining about missing one half of its RAID-1, so I decided to splurge on a pair of 4GB USB flash disks – the princely sum of $22 for the pair. I set up the md software RAID volumes ahead of time, which were happily picked up by the Ubuntu installer – a 512MiB /boot partition and the rest handed off for LVM to manage.
I could bore you with a bunch of details, but who cares about that.
- Does it work? Yes, albeit a bit slower during bootup – total boot time from power-button to login prompt is 90 seconds.
- Does the RAID work? Nicely, thank you. You can yank a drive out and it’ll keep ticking along.
- Is there enough capacity? Plenty, the OS filesystem is 44% full.
- Won’t swapping kill it? Yes, maybe eventually. The system has 1GiB of RAM, more than enough when you consider it’s only really using about 100MiB. At least there’s a chance both drives won’t fail at exactly the same time, so I can replace one.
- Am I taking backups? Of course! If it toasts itself it’s no big deal.
What next? Hmm, if I splash out I could buy another pair of flash disks and kick it up to RAID-10 for a performance boost!
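For the capacity side of that idea, here is a quick sketch of how usable space works out for the RAID levels mentioned in this post. It assumes equal-sized disks and the classic definition of RAID-10 as a stripe across mirrored pairs; the function name is mine, not part of any tool.

```python
def usable_gb(level, disks, disk_gb):
    """Usable capacity in GB for equal-sized disks at a given RAID level."""
    if level == 0:
        # Striping only: every disk contributes, no redundancy.
        return disks * disk_gb
    if level == 1:
        if disks < 2:
            raise ValueError("RAID-1 needs at least two disks")
        # Every disk holds the same data, so capacity is one disk's worth.
        return disk_gb
    if level == 10:
        if disks < 4 or disks % 2:
            raise ValueError("classic RAID-10 needs an even number of disks, at least four")
        # A stripe across mirrored pairs: half the disks' worth of space.
        return (disks // 2) * disk_gb
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    print(usable_gb(1, 2, 4))    # the current pair of 4GB sticks
    print(usable_gb(10, 4, 4))   # after buying a second pair
```

So a second pair of sticks would double the usable space as well as spreading reads and writes over more devices.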
GitHub: Designing Success
Monday, September 28th, 2009
At Anchor we do not believe in black box solutions. Sharing is caring and we like to share. In this post we specifically want to share our triumph with Project StarBug, better known to the wider world as GitHub. For the uninitiated, GitHub is ‘Social Networking meets Source Code management’, or in GitHub’s own words, ‘Git is a fast, efficient, distributed version control system ideal for the collaborative development of software. GitHub is the easiest (and prettiest) way to participate in that collaboration: fork projects, send pull requests, monitor development, all with ease.’
Some readers may protest at this point, noting that GitHub is hosted in the USA while Anchor is located in Australia. How, then, can Anchor architect, implement and (going forward) manage GitHub’s infrastructure with such a geographical encumbrance?
All will be revealed in a blog entry in three of many parts.
Part 1: (This Post) Designing for success (Otherwise known as: Making GitHub’s dream a reality and nightmares a thing of the past)
Part 2: Speed matters
Part N: (To be announced)
For obvious reasons, we cannot expose GitHub’s architecture in full; however, we are sharing some of the more interesting technologies/architecture we have implemented, and the rationale for doing so: essentially, what we have done to make GitHub’s dreams a reality.
Geographical encumbrance
It is a credit to GitHub’s management that they were willing to look the world over for the right team to support them. While they do not want to be harried by anything outside the GitHub application (i.e. Hardware, O/S, Management, etc), they still needed to ensure that the right company was employed to look after these components.
Why Anchor? Anchor’s flexibility to manage a solution on third-party hosted hardware (anywhere in the world) and versatility in developing an architecture to suit this scenario were part of the rationale. Anchor’s reputation for needing to know how technology works (again, no black boxes) and then working out how to improve it was a major contribution.
Enough fluff, now to the meat:
One can imagine that the architecture required to support GitHub is a complex mix. We won’t lie; there are many moving parts. Some of the key criteria for designing the solution included:
Scalability
GitHub states its growth as “400 new users and 1000 new repositories every day”. Post-migration, GitHub will be running on infrastructure spread across 15+ physical hosts/servers. It is essential that the infrastructure can grow with the user base, from 10s to 100s of servers, without the need to re-architect everything. Without a doubt, growing without the associated pain is a major objective for GitHub as it moves forward.
Interesting Note: GitHub’s new physical infrastructure (at migration) consists of:
- 15+ physical servers
- 10+ virtual servers
- 128 physical processor cores
- Over 288GB of RAM
- 1TB+ of storage
GitHub’s software architecture is modular by nature and scalability friendly. Components outside the core software, however, were not as readily scalable. Scalability for those has been achieved with the following improvements:
- Distributed Storage Architecture (with real-time slaves). Distribution of GitHub’s source code repos across multiple partitions and multiple nodes (including redundant slaves) provided improvements in performance, scalability and reliability. By removing the limitation of using a single filesystem volume for storage, the issue of dealing with large scale storage has been avoided. New partitions can be rapidly added on demand with little to no fuss.
The graphic below illustrates a simplified request to the distributed file storage repo:
- (Sensible) Virtualisation. Previously, GitHub’s infrastructure was entirely virtualised. While virtualisation has its merits, there are reasons to avoid it. Services that aren’t I/O-heavy can be virtualised, while components with high I/O requirements are run on dedicated (“bare metal”) servers. For GitHub, this means file storage and databases are not virtualised. Otherwise, virtualisation is used to provide a mix of server consolidation, rapid deployment and service redundancy/HA.
- Horizontal scalability (on-demand, via automated build infrastructure). The ability to add additional components to the infrastructure in an automated fashion reduces scale-out time and removes user error from builds/configuration. In addition, this also turns the server build/deployment procedure into a measurable deliverable. Over time this can be reviewed and improved (thank you, W. Edwards Deming).
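To make the distributed storage idea in the list above a little more concrete, here is a small sketch of one way a request could be routed to the partition holding a repository. This post does not describe GitHub’s actual routing, so treat everything here as an assumption for illustration: the class names, the lookup-table approach, the "fewest repos" placement rule and the node names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    name: str
    active_node: str   # serves reads and writes
    slave_node: str    # real-time replica, ready to take over
    repos: set = field(default_factory=set)


class StorageRouter:
    """Maps each repository to the partition it lives on."""

    def __init__(self):
        self.partitions = {}
        self.location = {}  # repo name -> partition name

    def add_partition(self, name, active_node, slave_node):
        # Adding capacity does not move any existing repository.
        self.partitions[name] = Partition(name, active_node, slave_node)

    def create_repo(self, repo):
        # Place new repos on the emptiest partition (ties broken by name).
        target = min(self.partitions.values(),
                     key=lambda p: (len(p.repos), p.name))
        target.repos.add(repo)
        self.location[repo] = target.name
        return target.name

    def route(self, repo):
        """Return the node a frontend should send this repo's request to."""
        return self.partitions[self.location[repo]].active_node


if __name__ == "__main__":
    router = StorageRouter()
    router.add_partition("fs1", "fs1a", "fs1b")
    router.create_repo("alice/project")
    router.add_partition("fs2", "fs2a", "fs2b")   # grow on demand
    router.create_repo("bob/tool")
    print(router.route("alice/project"), router.route("bob/tool"))
```

The property worth noticing is that a lookup table lets a new partition be added without reshuffling existing repositories, which is the "little to no fuss" part.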
Reliability
As with most businesses, High Availability (or business continuance) is essential to success. To achieve this, a combination of DRBD, virtualisation, Heartbeat and load balancing has been employed.
- Mirroring Data; DRBD is utilised for several purposes.
- It is used to ensure the redundant (read: slave) storage partitions and nodes are in sync with the active counterparts.
- DRBD is also key in providing HA functionality across the virtualised environment.
Several Xen hosts are deployed in the following scenario: Server 1 runs VM A (active), VM B (active), VM C (offline, DRBD mirrored) and VM D (offline, DRBD mirrored), while Server 2 runs VM A (offline, DRBD mirrored), VM B (offline, DRBD mirrored), VM C (active) and VM D (active). This provides active failover if either of the virtualisation hosts fails.
The graphic below illustrates the replicated, highly-available storage architecture:
- Consistency; via automated builds and configuration management. With any horizontally-scaled solution, consistency amongst similar components is essential. One of the most notable achievements across the entire architecture is the complete integration of automated build infrastructure. A new/additional component of the solution can be rapidly built and added to the overall system regardless of the architecture (physical or virtual).
- Redundancy; A simple way to ensure greater uptime and lower the risk of service interruption is to introduce as much redundancy as possible. GitHub is a great example of this practice. Data links, Ethernet/switching, servers and components all have a redundant twin ready to swing into action should the primary fail.
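The paired Xen hosts described under Mirroring Data can be modelled in a few lines to show what happens when one host dies. This is a toy model under stated assumptions: the host and VM names are hypothetical, and in the real setup the promotion is carried out by Heartbeat and DRBD rather than by anything resembling this code.

```python
def fail_over(placement, failed_host):
    """Return the placement after a host failure.

    placement maps each VM to {"active": host, "standby": host}, where the
    standby host holds the offline, DRBD-mirrored copy of that VM.
    """
    result = {}
    for vm, hosts in placement.items():
        if hosts["active"] == failed_host:
            # Promote the mirrored copy on the surviving host.
            result[vm] = {"active": hosts["standby"], "standby": None}
        elif hosts["standby"] == failed_host:
            # Still running, but no longer protected by a mirror.
            result[vm] = {"active": hosts["active"], "standby": None}
        else:
            result[vm] = dict(hosts)
    return result


if __name__ == "__main__":
    # Hypothetical symmetric layout: each host is active for two VMs
    # and holds the mirrors for the other two.
    layout = {
        "vm-a": {"active": "xen1", "standby": "xen2"},
        "vm-b": {"active": "xen1", "standby": "xen2"},
        "vm-c": {"active": "xen2", "standby": "xen1"},
        "vm-d": {"active": "xen2", "standby": "xen1"},
    }
    for vm, hosts in fail_over(layout, "xen1").items():
        print(vm, hosts)
```

After the failure every VM is running on the survivor, which is why each host needs enough headroom to carry its partner’s load as well as its own.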
Conclusions
The implementation of any new architecture for an already mature product is never easy. Anchor engineers have been working tirelessly with GitHub staff to ensure that any growing pains are transparent to the users. In the next entry, we will be sharing some of our insights into migrating GitHub from their existing host and infrastructure to the new Anchor-developed model. Until then, we hope you enjoy the new faster GitHub, more of the time (well, all/any of the time) than ever before.
Just because you CAN, Doesn’t mean you SHOULD
Friday, September 25th, 2009
(Yeah, I’ve been really slack with the blog posts about Project Starbug, but unfortunately when the choice is between doing the cool stuff, and blogging about it, the blogging tends to lose. I am still planning on writing all about things when things die down. In the meantime…)
Remember when you were a kid, and every time you got a new toy you’d just have to play with it all the time? That mentality doesn’t go away as you grow up, it just gets a little more sophisticated. With new technologies, I’m still very much this way. I remember when I first learnt about flex and bison — for the next six months or so, every programming problem I encountered just had to be solved with a minilanguage implemented in flex/bison. I shudder to think that any of that code might still be out there…
Anyway, this week’s shiny new toy has been Heartbeat / Pacemaker. I’ve played with it a fair bit in the past, but just in two-node (Heartbeat v1) clusters. For Project Starbug, though, I’ve been taking it to new heights of awesome (multi-node, easily expandable HA VM clusters, for example). So, of course, anywhere that a bit of high-availability might be good, I’ve laid it on thick. With the Puppet manifests we’ve got for managing Pacemaker, it’s almost harder not to make something HA (seriously, our Pacemaker manifests are awesome).
Unfortunately, in a couple of places I kinda forgot that some services have their own ways of doing HA, and they’re generally superior to tying a service and an IP together and telling Pacemaker to go do its thing. The two services that I’ve just converted back away from Heartbeat are NTP and DNS. Yeah, that’s right: I set up Pacemaker resources for our NTP server and DNS server, because I suffer from occasional bouts of acute “shiny toy syndrome”. I’ve now recovered, having learnt my lesson (for now).
When HA won’t play the way you want it to
Tuesday, September 8th, 2009
In an ideal world every service would support High Availability and Load Balancing, would scale up easily and cleanly and all of us systems administrators would be paid bucketloads to play golf all day while the computers did all the hard work. To quote Dylan Moran of Black Books fame, “Don’t make me laugh…bitterly”.
I’ll cut to the chase – sometimes you have to really shoehorn technologies to do what you want. Fortunately I love doing this, and the technologies of today’s article are virtualised Windows 2008 on Xen, and Oracle XE 10g. Neither likes to play ball, for a few reasons:
- Generally speaking, when you virtualise an OS you want to have para-virtualisation drivers enhancing the hardware support. Open Source Xen has PV drivers, but they are not signed with a legitimate certificate. Windows 2008 does not play nicely with unsigned or test-cert-signed drivers.
- Oracle is just a messy, messy, nasty thing. Yes, paid versions undoubtedly support all manner of loadbalancing and HA options, but the free one does not.
Adding HA to Windows 2008 on Xen
The basic procedure was as follows:
- Install the telnet server within Windows (making sure to lock it down in the firewall to only be accessible by the host machines)
- Create a special admin account and password used for triggering a shutdown
- Create an Expect script which logs into the VM via telnet, and issues the shutdown command
- Create a modified version of the Heartbeat Xen resource agent which calls the expect script to shut down the VM (and wait a safe period of time) before “xm shutdown” is called. Without this, “xm shutdown” will simply power off the VM (in absence of working PV drivers).
The VM was already running on a DRBD volume between the two HA Xen servers, so I was able to just create a standard set of Heartbeat resources to control DRBD primary/secondary mode and the startup/shutdown of the HA Windows VM. For your benefit (if you want to recreate it) here is the expect script:
#!/usr/bin/expect -f
#
# Script which "automates" shutting down a Windows VM
# Don't log telnet output and commands to stdout, and set a reasonable timeout.
log_user 0
set timeout 3
# Log in via telnet and issue commands. Fairly straightforward.
spawn -noecho /usr/bin/telnet 192.168.1.1
sleep 0.5
# login as the "shutdown" user
expect {
    -re "login: $" {send "shutdown\r"}
    timeout exit
}
sleep 0.5
expect {
    -re "password: $" {send "mysecretpassword\r"}
    timeout exit
}
sleep 0.5
expect {
    -re ">$" {send "shutdown /s /t 0\r"}
    timeout exit
}
sleep 0.1
expect {
    -re ">$" {send "exit\r"}
    timeout exit
}
exit
The rest is fairly self-explanatory if you understand Heartbeat.
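For anyone who doesn’t live in Heartbeat, the stop logic of the modified resource agent boils down to three steps. The real agent is a shell script, so this is just a sketch of the ordering: the function name, the script path, the domain name and the grace period are all assumptions made for illustration.

```python
import subprocess
import time


def stop_windows_vm(domain, expect_script, grace_seconds=120,
                    run=subprocess.call, sleep=time.sleep):
    """Ask Windows to shut itself down, wait, then let Xen finish the job.

    run and sleep are injectable so the sequence can be exercised
    without a real VM.
    """
    # 1. Log in over telnet and issue the shutdown command inside the guest.
    run([expect_script])
    # 2. Give Windows a safe period of time to shut down cleanly.
    sleep(grace_seconds)
    # 3. Only now call xm shutdown, which without working PV drivers
    #    simply powers the VM off.
    return run(["xm", "shutdown", domain])


if __name__ == "__main__":
    # Dry run with stand-ins, so nothing is actually executed.
    def fake_run(command):
        print("would run:", command)
        return 0

    stop_windows_vm("win2008", "/usr/local/sbin/win-shutdown.exp",
                    grace_seconds=60, run=fake_run,
                    sleep=lambda seconds: print("would wait", seconds, "seconds"))
```

The ordering is the whole trick: the guest gets a chance to flush and stop its services before the hypervisor pulls the plug.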
Oracle XE 10g
This was more of a learning process, since usually you just install Oracle and leave it the hell alone. Not so for me.
- Install Oracle on both nodes using (fortunately) the RPMs they provide
- Configure Oracle on both nodes including creating the databases, using the same password for SYSDBA
- Shut down both instances of Oracle
- Create the DRBD resource, and mount it on the primary node
- On the primary node, move the contents of /usr/lib/oracle/xe/oradata and /usr/lib/oracle/xe/app/oracle/flash_recovery_area onto the mounted DRBD
- On the secondary node, delete the aforementioned paths
- Bind mount the oradata and flash recovery area from the mounted DRBD volume into the correct places in the directory tree.
- Start Oracle
After I had created a Heartbeat resource group which contained the DRBD resource, the DRBD filesystem mount, the aforementioned bind mounts and the Oracle service itself I was quite pleased to see that Oracle plays quite nicely with our shoehorned HA setup. You’ll want to make sure you have a properly fixed Oracle init script though, as the supplied one is fairly bad.
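The bind mount step from the list above can be sketched as a small command generator. The two Oracle paths are the ones named in the steps; the DRBD mount point, the function name and the assumption that the moved directories keep their original names on the DRBD volume are mine. The commands are only built here, not run.

```python
import os.path

# Directories named in the steps above.
ORACLE_DIRS = [
    "/usr/lib/oracle/xe/oradata",
    "/usr/lib/oracle/xe/app/oracle/flash_recovery_area",
]


def bind_mount_commands(drbd_mount):
    """Build the bind mounts that map DRBD-backed data into Oracle's tree."""
    commands = []
    for target in ORACLE_DIRS:
        # Assumes each directory was moved onto the DRBD volume under
        # its original name, e.g. <drbd_mount>/oradata.
        source = os.path.join(drbd_mount, os.path.basename(target))
        commands.append(["mount", "--bind", source, target])
    return commands


if __name__ == "__main__":
    # /mnt/drbd is a hypothetical mount point.
    for command in bind_mount_commands("/mnt/drbd"):
        print(" ".join(command))
```

Because the mounts depend on the DRBD filesystem already being mounted, they have to come after it in the Heartbeat resource group and before the Oracle service itself.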
After making Oracle and Windows 2008 work nicely in HA, I’m almost certain that any service, no matter how bad, can be shoehorned in a similar way to give you decent availability even when it wasn’t originally intended.
This just in, from the Department of the Bleedin’ Obvious
Tuesday, September 8th, 2009
I kid you not, we just received this in a piece of marketing guff from our favourite enterprise vendor.
“Industry analysts predict that Linux and Windows will soon dominate the operating system space. How you respond to this is critical.”
Meanwhile, industry analysts predict that more than 98% of the population will be consuming oxygen by 2010.
AusNOG conference
Tuesday, September 1st, 2009
I was lucky enough to get a free pass to the Australian Network Operators Group conference from one of our upstream providers, so that’s what I’m up to at the start of this week. It is interesting to compare it to my experiences at the several LinuxConfAU conferences I’ve been to. On the whole I can say it is more Enterprisey, far less smelly, and a generally smaller but more focussed conference. Obviously network topics dominate the conference (although there are a number of presentations that border on other areas).
Somewhat confusingly for a sysadmin, they named this conference AusNOG03. They have decided to not use a year-based numbering system nor one that starts at 0 (which would please most of us), and as a kicker have locked themselves into a two-digit Y2K-style bug. Well, it’s only 3 years old, we’ll let that point slide.

Unhealthy snacks ahoy
Typically tasty and unhealthy snacks could be found upon entry – some delightful mini-croissants with ham and cheese. Coffee and tea staples were omnipresent. Apparently there was a large imbibing session last night and most delegates attended.

Conference room
It is being held at the Four Seasons Hotel in Sydney. I have to give them points for style, and functionality. Not only do we have actual stable desks for writing and computing, but there is a power board for every three seats.

Legacy writing equipment, water glass and mints
An array of useful items were at every seat. They clearly recognise that network operators lack social etiquette and have strewn mints far and wide. They are on the tables, they are in the conference bags.
To briefly summarise what I have taken in so far – the Internet is not yet blowing up; network operators and BGP are doing a good job and making the Internet as a whole (which is going from a long stringy network, to a fat wide network) better; Open-Source content delivery networks are on the horizon and may become a reality some time soon.




