Load balancing at Github: Why ldirectord?

Published October 31st, 2009 by matt

Some comments on Github’s blog post “How We Made Github Fast” have been asking about why ldirectord was chosen as the load balancer for the new site. Since I made most of the architecture decisions for the Github project, it’s probably easiest if I answer that question directly here, rather than in a comment.

Why ldirectord rocks

The reasons for Github using ldirectord are fairly straightforward:

  • I have a lot of experience with ldirectord. Never underestimate the value of knowing where the bodies are buried. In ldirectord’s case, there aren’t many skeletons, but “better the devil you know” is a valid argument. If you’ve got strong experience in making something work (and you’ve managed to make it work), and you don’t have a lot of time for science experiments, then there’s a lot to be said for going with what you know.

    This goes beyond simply knowing what to do when things go wrong, of course. You’ll also know how to install and configure it already, how to monitor it, and so on.

    What’s more, in ldirectord’s case I had already proven that it worked in an architecture almost identical to Github’s, and with a similar load profile. At a previous job, I had ldirectord serving a sustained aggregate of 2500 TCP connections per second on a 128MB Xen VM, passing to a large set of backends in a manner almost identical to Github.

  • Anchor has a lot of experience with ldirectord. Whilst my experiences are one thing, there’s a lot more to building an infrastructure than just setting it up. I like to take holidays as much as anyone, and so there was no point in using something that nobody else in the company had any experience with, if there was something else that we did all know about.

    Thankfully, ldirectord lined up nicely, since it’s what we use for our other load balancing setups (not set up by me, either; these were already in place before I arrived). This meant that there was already a pile of documentation and knowledge amongst the sysadmin team about ldirectord and its quirks. Also, being automation junkies, we already had Puppet dialled in to install and configure ldirectord, and we knew exactly how to monitor it.

  • Ldirectord will do the job. Given my prior experience and that of the rest of the Anchor team, we were confident that ldirectord would do the job, and at the end of the day that’s what really matters.

The Alternatives

It’s all well and good to say “we know it and it works”, but I’m not really expecting anyone to just read that and say “well, OK, I guess we’ll use ldirectord”. In fact, if you apply the above criteria to your own situation, there’s every possibility that you’ll come up with a different answer, and if you’ve never set up a load balancer at all, then you’ve got no experience to guide you.

So, here are the other load balancing options I’ve dealt with, and what I think of them. This might give you a bit of food for thought when choosing your load balancer.

  • keepalived. This is the project closest to ldirectord in terms of functionality and operation. It actually uses the same load balancing “core” as ldirectord, IPVS, part of the Linux Virtual Server project. As such, it performs similarly to ldirectord when it comes to actually redirecting requests to backends, and is another excellent choice for load balancing.

    For Github, though, there wasn’t any benefit in using keepalived. Whilst I used keepalived extensively at my last job, nobody else at Anchor had had much to do with it. Also, keepalived has a built-in failover mechanism, which we didn’t need because we already use Heartbeat/Pacemaker for all our HA/failover requirements. I also feel that keepalived is more complicated when compared directly to ldirectord, largely because of its built-in failover capabilities. That’s not to say that combining Pacemaker and ldirectord is dirt simple, but if you’ve already got Pacemaker on hand anyway…

    If all you need is an HA load balancer, and you have no experience with either ldirectord or keepalived, I’d probably recommend keepalived over ldirectord, as it’s one project and one piece of software to do everything you need.

  • Load-balancing appliances. Sometimes misleadingly referred to as “hardware” load balancers (they’re still chock full of software, kids, and unlike high-end routers, I don’t know of any true L4 load balancer that has its forwarding plane entirely in hardware).

    I loathe these things. They’re expensive, restrictive, slow, and generally cause you a lot more pain and suffering than they’re worth. At my last job, one of my projects was to convert most of one of our existing clusters from a load-balancing appliance to use keepalived. Why would we do this? Because the $100k worth of appliance wasn’t capable of doing the job that $15k worth of commodity hardware and an installation of keepalived were handling with ease — and with capacity to spare. That cluster was our smallest, too, with probably only 2/3 the capacity of the other clusters run by keepalived.

    At the job where I had ldirectord handling 2500 conn/sec, we had also previously used a load-balancing appliance, which was supplied and managed by the hosting provider. It was a management nightmare — we couldn’t get any useful statistics out of it at all, like the conn/sec coming in or going out, and we couldn’t usefully adjust the weightings of each backend (to tune how many connections were going to each different sort of machine) or manage the system in real-time. When we switched to using ldirectord, a small shell script (involving watch and ipvsadm, mostly) was all it took for the CTO to be able to watch exactly how the cluster was performing, in real time, throughout the day. He loved the visibility — and the fact that we were saving several hundred dollars a month didn’t hurt, either.

  • haproxy. While we use haproxy extensively within Github, I don’t think haproxy is the right solution as the front-end load balancer for a high volume website. Being a proxy, rather than a simple TCP connection redirector, it has much larger overheads in CPU and memory, and adds more latency to the connections. All of Github’s load balancing is being done out of one small VM, and it barely raises a sweat. The return traffic doesn’t even go back through the load balancer at Github, since we’re using a really neat mode of IPVS that allows the traffic to return to the client directly. While you can throw hardware at the load balancing problem, I still prefer to be efficient where possible.

    Since haproxy makes a second TCP connection, rather than just redirecting an existing one, it mangles the source IP address information — and while you can work around that in HTTP with custom headers, that doesn’t work for other protocols like SSH. I cringe at the thought of trying to defend against a DDoS attack when the most useful piece of diagnostic information (the source IP) can’t be correlated against the actions of an attacker on the site.

    If all you know is haproxy, and you’re running a low-volume site that only has to deal with HTTP(S), then haproxy will probably do the job; it’s certainly handling more connections inside Github than most sites will ever see. However, I’d recommend getting someone who does systems administration full-time (like us!) to install and manage a real load balancer like ldirectord rather than use haproxy, along with keeping your other basic infrastructure on track. Wouldn’t you rather be developing new features than dealing with this stuff?
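For the curious, here is a rough sketch of what the HTTP “custom headers” workaround mentioned above looks like from the backend’s side, written in Python. It assumes the proxy appends the connecting address to an X-Forwarded-For header (haproxy can do this with `option forwardfor`) and that the backend has a list of proxy addresses it trusts. The function name and all addresses are made up for illustration; this is not code from Github’s setup.

```python
import ipaddress

def client_ip(peer_addr, forwarded_for, trusted_proxies):
    """Best guess at the real client address for one HTTP request.

    peer_addr:       address the TCP connection came from (usually the proxy)
    forwarded_for:   raw X-Forwarded-For header value, or None if absent
    trusted_proxies: set of proxy addresses we operate ourselves
    """
    # If the peer is not one of our proxies, the header could be forged,
    # so the only address we can believe is the TCP peer itself.
    if peer_addr not in trusted_proxies or not forwarded_for:
        return peer_addr

    hops = [hop.strip() for hop in forwarded_for.split(",") if hop.strip()]
    # Walk right to left: the rightmost entries were added by our own
    # proxies; the first untrusted one is whoever connected to them.
    for hop in reversed(hops):
        try:
            ipaddress.ip_address(hop)
        except ValueError:
            break  # malformed entry: stop trusting the header
        if hop not in trusted_proxies:
            return hop
    return peer_addr


if __name__ == "__main__":
    proxies = {"10.0.0.5"}  # hypothetical internal proxy address
    print(client_ip("10.0.0.5", "203.0.113.7", proxies))
```

Every backend application has to get this right, and it only works for protocols that have somewhere to put the header. With a plain TCP redirector the backend sees the real source address on the socket, which is why SSH works without any of this.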

So, there’s one geek’s opinions on load balancing. Questions and comments appreciated, and if you’d like to know more about any part of the Github architecture (or any other aspect of systems administration), please let us know in the comments and I’ll whip up some more blog posts.


Envy our new Leviathan!

Published October 19th, 2009 by Barney Desmond

Our current rdiff and amanda backup server, KRAKEN, is almost full, so it was time to order a new one. After much wrangling, we finally received LEVIATHAN this morning.

LEVIATHAN is, I assure you, teh hardk0rez - dual Xeon 5500-series, 6GB RAM and 12TB usable storage in RAID-10


I was pushing for PHYREXIAN DREADNOUGHT personally, but LEVIATHAN is acceptable too; the upkeep effort of backup servers is pretty high after all.


New dedicated server upgrade offering

Published October 10th, 2009 by Barney Desmond

This is, of course, a fantastic idea:
http://en.gentoo-wiki.com/wiki/Using_Graphics_Card_Memory_as_Swap

Anchor loves to stay abreast of the latest performance options. As such, we’re proud to announce a new range of upgrade options for our dedicated server customers that demand the absolute best in performance for their customers.

It makes sense, really. The best our current systems offer is puny DDR2 memory. Just think of what you could do with several gig of GDDR5. That’s right, FIVE! We’re now offering upgrade options with Geforce 320 and Geforce 340 cards. If you order one of our higher-specced (2RU) dedicated servers, you can have two of these puppies strapped together for insane amounts of swappiness.

Stay tuned for more news on how we’re rolling out ButterFS, phase-change cooling, overvolted Core2 Quad servers, and mass-scale SSD RAID-0 arrays for database optimisation.

Posted in WTF


Interesting failure modes, episode 2501

Published October 5th, 2009 by Barney Desmond

I got woken up by an SMS for low diskspace the other night on one of our customers’ servers. Okay, so that’s a lie, I never sleep, but the SMS is real.

Oh great, they’re making whoopie on their mailing lists again and making some stupidly huge logfile.

Little did I know just how huge that file was. How about 735GB huge, in the space of 12 hours? This customer is already a bit of an oddball, what with 1.4TiB of usable space in their server. “Oh that’s nothing”, you say. Sure, I’ve got a few TiB of kitten pictures on my machine at home, just like you, but to put things in perspective: 300GiB of space would be “big” for most Anchor customers. SCSI disks cost about $1.70/GB, compared to about 10c/GB for SATA.

There was no mailout, no big processing job, and no flood of activity. With a little digging I was able to nail it down to an Apache error log file. That was a surprise, except for the PHP errors all throughout – some things never change.

[Fri Oct 02 02:39:57 2009] [error] [client 63.82.71.139] PHP Warning: fgets(): supplied
argument is not a valid stream resource in /home/wright/public_html/script.php on
line 15, referer: XXX

Nice work there, guys. You need to learn to check your return values from failure-prone functions.
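The customer’s script isn’t shown here, so the following is only an illustration of the habit, sketched in Python rather than PHP: confirm that the open succeeded before you start reading, and fail once instead of looping on a bad handle. The function name is made up for the example.

```python
def read_lines(path):
    """Return the lines of `path`, or None if it cannot be opened."""
    try:
        handle = open(path, "r", encoding="utf-8")
    except OSError as exc:
        # Report the failure once and bail out, rather than carrying on
        # with a handle that was never valid in the first place.
        print(f"could not open {path}: {exc}")
        return None
    with handle:
        return [line.rstrip("\n") for line in handle]


if __name__ == "__main__":
    # A path that should not exist; the function logs one line and stops.
    print(read_lines("/nonexistent/dir/input.txt"))
```

In PHP terms the equivalent is checking what the open call handed back before passing it to fgets().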

Strangely, there were no actual active connections, but the process list showed two Apache processes going balls to the wall, writing the same error message to the log file ad infinitum. By my reckoning that was over 9000 lines per second – nothing a quick service-restart couldn’t fix, thankfully.
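For a rough sense of scale, here is the back-of-the-envelope arithmetic as a few lines of Python. It assumes decimal gigabytes and an average log line of about 200 bytes; the line length is a guess based on the excerpt above (whose referer is redacted), not a measured value.

```python
def write_rate(total_bytes, hours, line_bytes):
    """Return (bytes per second, lines per second) for a log of this size."""
    seconds = hours * 3600
    bytes_per_sec = total_bytes / seconds
    return bytes_per_sec, bytes_per_sec / line_bytes


if __name__ == "__main__":
    # 735 GB in 12 hours, with an assumed 200 bytes per log line.
    bps, lps = write_rate(735 * 10**9, 12, 200)
    print(f"{bps / 1e6:.1f} MB/s, about {lps:,.0f} lines/s")
```

Under those assumptions the two processes were writing roughly 17 MB/s between them, which is comfortably “over 9000” lines per second.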

And to actually fix the problem? It’s tempting to dump the file, but we don’t like doing that; it’s just a bit too cowboy for us. I settled for a forced logrotate run, taking about 4 hours and squishing it down to just 4.3GiB – Crisis (and sleep) Averted.


Ooh, bugger…

Published October 2nd, 2009 by Barney Desmond

And this is why we co-locate in Globalswitch, a top-tier facility with floors that AREN’T MADE OF BALSA WOOD.

Racks are pretty heavy, sure, but they totally wtfpwned those tables there



Upping the maternal ante

Published October 2nd, 2009 by Barney Desmond

We’re gonna need another drinks fridge.

The third refrigeration unit has been ordered, ETA next week


Posted in FTW


Performance tips – good reading for PHP/mysql devs

Published October 1st, 2009 by Barney Desmond

I came across this a little while ago; it’s a good little presentation with some interesting points I’d not considered before.

http://www.slideshare.net/techdude/how-to-kill-mysql-performance

If you’re an Anchor customer, I should point out that the ARCHIVE storage engine isn’t available in Red Hat’s version of MySQL, which is a damned nuisance. :(


Pyramid of Productivity pt.2

Published October 1st, 2009 by Barney Desmond
************************************************************
PRESS RELEASE
FOR IMMEDIATE DISTRIBUTION
************************************************************

ANCHOR SYSTEMS SYSADMINS TO SEEK GUINNESS VERIFICATION
AS "MOST-WIRED MOFOS ON THA PLANET"

Maternal pyramid

It’s a good thing we got those stubby-holders, them mothas is ice cold!