Load balancing at Github: Why ldirectord?

October 31, 2009 | Technical

Some commenters on Github’s blog post “How We Made Github Fast” have asked why ldirectord was chosen as the load balancer for the new site. Since I made most of the architecture decisions for the Github project, it’s probably easiest if I answer that question directly here, rather than in a comment.

Why ldirectord rocks

The reasons for Github using ldirectord are fairly straightforward:

  • I have a lot of experience with ldirectord. Never underestimate the value of knowing where the bodies are buried. In ldirectord’s case, there aren’t many skeletons, but “better the devil you know” is a valid argument. If you’ve got strong experience with something, you’ve managed to make it work, and you don’t have a lot of time for science experiments, then there’s a lot to be said for going with what you know.

    This goes beyond simply knowing what to do when things go wrong, of course. You’ll also know how to install and configure it already, how to monitor it, and so on.

    What’s more, in ldirectord’s case I had already proven that it worked in an architecture almost identical to Github’s, and with a similar load profile. At a previous job, I had ldirectord serving a sustained aggregate of 2500 TCP connections per second on a 128MB Xen VM, passing to a large set of backends in a manner almost identical to Github.

  • Anchor has a lot of experience with ldirectord. Whilst my experiences are one thing, there’s a lot more to building an infrastructure than just setting it up. I like to take holidays as much as anyone, and so there was no point in using something that nobody else in the company had any experience with, if there was something else that we did all know about.

    Thankfully, ldirectord lined up nicely, since it’s what we use for our other load balancing setups (not set up by me, either — these were already in place before I arrived). This meant that there was already a pile of documentation and knowledge amongst the sysadmin team about ldirectord and its quirks. Also, being automation junkies, we already had Puppet dialled in to install and configure ldirectord, and we knew exactly how to monitor it.

  • Ldirectord will do the job. Given my prior experience and that of the rest of the Anchor team, we were confident that ldirectord would do the job, and at the end of the day that’s what really matters. (If you’ve never seen it, there’s a small sketch of what an ldirectord configuration looks like just after this list.)
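
For anyone who hasn’t come across ldirectord before, here’s a rough sketch of its configuration for a single HTTP virtual service in direct-routing mode. The addresses, weights and check URL are made up for illustration (this isn’t Github’s actual config), but it shows the shape of the thing: a handful of global settings, then one stanza per virtual service listing its backends and health check.

    # /etc/ha.d/ldirectord.cf (illustrative only; addresses are made up)
    checktimeout=3
    checkinterval=5
    autoreload=yes
    quiescent=no

    # One virtual service: HTTP on the VIP, direct routing ("gate") to two backends
    virtual=203.0.113.10:80
            real=10.0.0.1:80 gate 100
            real=10.0.0.2:80 gate 50
            scheduler=wlc
            protocol=tcp
            service=http
            checktype=negotiate
            request="/up.html"
            receive="OK"

Ldirectord reads this file, keeps the kernel’s IPVS table in sync with it, and pulls a backend out of rotation whenever the negotiate check stops returning the expected response.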

The Alternatives

It’s all well and good to say “we know it and it works”, but I’m not really expecting anyone to just read that and say “well, OK, I guess we’ll use ldirectord”. In fact, if you apply the above criteria to your own situation, there’s every possibility that you’ll come up with a different answer — and if you’ve never set up a load balancer at all, then you’ve got no experience to guide you.

So, here are the other load balancing options I’ve dealt with, and what I think of them. This might give you a bit of food for thought when choosing your load balancer.

  • keepalived. This is the project closest to ldirectord in terms of functionality and operation. It actually uses the same load balancing “core” as ldirectord, IPVS, part of the Linux Virtual Server project. As such, it performs similarly to ldirectord when it comes to actually redirecting requests to backends, and is another excellent choice for load balancing.

    For Github, though, there wasn’t any benefit in using keepalived. Whilst I used keepalived extensively at my last job, nobody else at Anchor had had much to do with it. Also, keepalived has a built-in failover mechanism, which we didn’t need because we already use Heartbeat/Pacemaker for all our HA/failover requirements. I also feel that keepalived is more complicated when compared directly to ldirectord, largely because of its built-in failover capabilities. That’s not to say that combining Pacemaker and ldirectord is dirt simple (there’s a rough sketch of the glue at the end of the post), but if you’ve already got Pacemaker on hand anyway…

    If all you need is an HA load balancer, and you have no experience with either ldirectord or keepalived, I’d probably recommend keepalived over ldirectord, as it’s one project and one piece of software that does everything you need.

  • Load-balancing appliances. Sometimes misleadingly referred to as “hardware” load balancers (they’re still chock full of software, kids — and unlike high-end routers, I don’t know of any true L4 load balancer that has its forwarding plane entirely in hardware).

    I loathe these things. They’re expensive, restrictive, slow, and generally cause you a lot more pain and suffering than they’re worth. At my last job, one of my projects was to convert most of one of our existing clusters from a load-balancing appliance to use keepalived. Why would we do this? Because the $100k worth of appliance wasn’t capable of doing the job that $15k worth of commodity hardware and an installation of keepalived were handling with ease — and with capacity to spare. That cluster was our smallest, too, with probably only 2/3 the capacity of the other clusters run by keepalived.

    At the job where I had ldirectord handling 2500 conn/sec, we had also previously used a load-balancing appliance, which was supplied and managed by the hosting provider. It was a management nightmare — we couldn’t get any useful statistics out of it at all, like the conn/sec coming in or going out, and we couldn’t usefully adjust the weightings of each backend (to tune how many connections were going to each different sort of machine) or manage the system in real time. When we switched to using ldirectord, a small shell script (involving watch and ipvsadm, mostly — there’s a sketch of the idea at the end of this list) was all it took for the CTO to be able to watch exactly how the cluster was performing, in real time, throughout the day. He loved the visibility — and the fact that we were saving several hundred dollars a month didn’t hurt, either.

  • haproxy. While we use haproxy extensively within Github, I don’t think haproxy is the right solution as the front-end load balancer for a high-volume website. Being a proxy, rather than a simple TCP connection redirector, it has much larger overheads in CPU and memory, and adds more latency to the connections. All of Github’s load balancing is being done out of one small VM, and it barely raises a sweat. The return traffic doesn’t even go back through the load balancer at Github, since we’re using a really neat mode of IPVS (direct routing) that allows the traffic to return to the client directly. While you can throw hardware at the load balancing problem, I still prefer to be efficient where possible.

    Since haproxy makes a second TCP connection, rather than just redirecting an existing one, the backends see the proxy’s address as the source rather than the client’s — and while you can work around that in HTTP with custom headers such as X-Forwarded-For, that doesn’t work for other protocols like SSH. I cringe at the thought of trying to defend against a DDoS attack when the most useful piece of diagnostic information (the source IP) can’t be correlated against the actions of an attacker on the site.

    If all you know is haproxy, and you’re running a low-volume site that only has to deal with HTTP(S), then haproxy will probably do the job — it’s certainly handling more connections inside Github than most sites will ever see. However, I’d recommend getting someone who does systems administration full-time (like us!) to install and manage a real load balancer like ldirectord rather than use haproxy, along with keeping your other basic infrastructure on track. Wouldn’t you rather be developing new features than dealing with this stuff?
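
To make the IPVS side of all this a little more concrete, here’s a rough sketch of what a direct-routing virtual service looks like at the ipvsadm level, plus the sort of one-liner we used for real-time visibility. The addresses and weights are invented for illustration, and in practice ldirectord maintains these rules for you from its config file, so you rarely type them by hand:

    # Create a virtual service on the VIP, weighted least-connection scheduling
    ipvsadm -A -t 203.0.113.10:80 -s wlc

    # Add two backends in direct-routing mode (-g); replies go straight from
    # the backend to the client, bypassing the load balancer entirely
    ipvsadm -a -t 203.0.113.10:80 -r 10.0.0.1:80 -g -w 100
    ipvsadm -a -t 203.0.113.10:80 -r 10.0.0.2:80 -g -w 50

    # (Each backend also needs the VIP configured on a loopback interface, with
    # ARP replies for it suppressed, so it will accept the forwarded traffic.)

    # Watch per-service and per-backend connection rates in real time
    watch -n 1 'ipvsadm -L -n --rate'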

So, that’s one geek’s opinion on load balancing. Questions and comments appreciated, and if you’d like to know more about any part of the Github architecture (or any other aspect of systems administration), please let us know in the comments and I’ll whip up some more blog posts.
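
Since I mentioned pairing ldirectord with Pacemaker above, here’s a very rough sketch of the glue involved, using the crm shell. The resource names and address are invented for illustration rather than being our production configuration; the idea is simply a floating VIP and an ldirectord instance that are kept together and fail over as a unit.

    # Illustrative only: a floating VIP plus an ldirectord resource, grouped so
    # they always run on the same node and fail over together
    crm configure primitive p_vip ocf:heartbeat:IPaddr2 \
        params ip="203.0.113.10" cidr_netmask="24" \
        op monitor interval="10s"
    crm configure primitive p_ldirectord ocf:heartbeat:ldirectord \
        params configfile="/etc/ha.d/ldirectord.cf" \
        op monitor interval="20s"
    crm configure group g_loadbalancer p_vip p_ldirectord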

Comments

  • wtarreau says:

    Being the author of haproxy, I can’t agree with all your points against proxies, especially the points about performance. 2500 connections per second is small for haproxy. This is what it supports on my 5-watt Geode 500 MHz (it does about 2700 in fact). Some sites using more common hardware (core2duo, dual-opteron, …) and properly set-up systems regularly see loads between 15000 and 25000 connections per second.

    Using a proxy for load-balancing provides two very important features for scalability:
    – content switching, which means dedicating servers to some tasks
    – request queuing, to limit the number of servers required to sustain traffic surges

    The first point, content switching, allows you to dedicate two fast web servers to static content. Typically nginx or lighttpd. You don’t need more than two such servers in general, as both are capable of filling a gigabit pipe. Then you only have application servers. Not only can you decide to spread your site across multiple technologies, but you can ensure that you only have to add application servers when the existing ones get close to 100% usage. Such architectures ensure you use the best tool for the job.

    The second point, request queuing, saves you from having to start large numbers of servers to support traffic peaks, and from rejecting users during such peaks. Requests are simply processed at full throttle without ever surpassing the server’s capacity. If the servers are saturated, the only effect is a slowdown for the users, which is generally not perceivable (a few milliseconds). But the real advantage of queuing is to set up QoS. It’s really nice to be able to always dedicate resources to some categories of users (e.g. the ones already logged in, the ones who’re paying for premium service, etc…). A proxy-based load-balancer can do that because it can parse a full request while an L4 LB can’t.

    A third point which is less important but not to be neglected is security. Only really valid HTTP requests pass through a proxy. Half-open connections, SYN floods, slowloris and all similar attacks simply never reach the servers. And if you identify an attack pattern, you add an ACL for it and block it. BTW, on a proxy, ACLs are extremely cheap because they are only evaluated when the request is available, not for each packet.

    Of course, detailed logs are a major help with a proxy, and LVS cannot provide that, nor can it provide timing or response-error statistics. But almost everyone has to be beaten first to appreciate logs :-)

    Now, concerning the other points about non-HTTP protocols like SSH, I agree with you. If the proxy has no added value, let’s not install one on such protocols. Haproxy can work transparently (connect to the server with the client’s IP) on some Linux kernel versions, but the setup is not easy and it generally is not worth it for such protocols.

    Also, if you’re serving more than 1 Gigabit/s of traffic, using a proxy will require 10 Gbps NICs which are still a little bit expensive, while with LVS you can simply do DSR and only see the smaller forward traffic.

    I still have a now-old (though not outdated yet) article on my site about scalable load-balancing architectures, for those interested : http://1wt.eu/articles/2006_lb/

    That was another geek’s comments ;-)
    Willy

  • matt says:

    Hi Willy,

    Thanks for taking the time to provide such an in-depth comment.

    Sorry if I implied that haproxy couldn’t sustain a certain connection rate. I don’t know what the performance limit is with either haproxy or ldirectord on a given hardware setup — I’ve never benchmarked them to that degree. However, given the amount of work that is required to proxy connections rather than simply load balance them, I’d be very surprised if the load balancer couldn’t handle a lot more traffic than a proxy.

    As to the two “important features for scalability” that proxies provide, well, I don’t really consider them benefits.

    Content switching can be achieved more easily with a simple assets domain, which also has the key benefits of avoiding unnecessary cookies and stepping around the per-domain limit on concurrent connections in web browsers.

    I don’t think queueing something as time-sensitive as interactive HTTP requests is a good idea — if you’re running close enough to the line, resource-wise, to ever get a queue forming, my experience is that it doesn’t stay at the “few milliseconds” of delay level for very long. The overload means that while the first connection might get delayed by, say, 5 milliseconds, the next will be delayed by 10 milliseconds, and so on — it doesn’t take too many requests to hit user-noticeable levels, and at several thousand requests per second, it doesn’t take very long either. You’re better off shedding load to a static failover setup and keeping things under control that way. Your users get screwed either way, but at least with load shedding they get told why they’re screwed, instead of getting the spinning throbber of doom.

    Implementing QoS in the frontend is an interesting idea, but I subscribe to the view that you only need QoS if you’re oversubscribed. Proper capacity planning saves the need to play such games, which when all is said and done can be no more than a temporary stopgap before rising traffic levels overwhelm the system capacity even with QoS. Doing more complex decision making on the proxy will also put more pressure on it, as it needs to increase the processing it does on requests before passing them through, which has implications for capacity again.

    I don’t see logs as being any sort of a benefit of a proxy — the end servers can provide timing and error rate information better than the proxy can, and it’ll scale better as you don’t have to do all the processing and logging at a single point.

    There is one point, though, I completely agree with you on: “If the proxy has no added value, let’s not install one”.

    I don’t dislike haproxy; it’s used heavily at Github and we couldn’t do a lot of what we do without it. I’ve got another blog post planned on how haproxy is used in Github, and why it works so well. I just don’t think that it adds any value in the locations we’ve used ldirectord, and the rest of my arguments in favour of ldirectord still stand — we’ve got more experience using it, it is a lightweight solution that does exactly what we need to do, and the monitoring and management work was already done.

  • btucker says:

    Do you have any thoughts on using PF for IP-level load balancing?

  • Ldirectord does indeed rock. So I’ll just agree.

    I do have something to add, though:

    I have experience setting ldirectord up to replace a keepalived setup, because for some reason the keepalived process would go catatonic and drop all states in the kernel. Only a kill -9 and restart would cure it. Very unsatisfying.

    So I’m not a particularly big fan of keepalived, but, that was 4 years ago. It could’ve been fixed by now, I don’t know.

    Another thing I like about ldirectord is the fact that it does one thing well; you then have something like Heartbeat from Linux-HA to handle the highly-available part. Heartbeat does that very well.

    And if the ldirectord process dies it just leaves the kernel IPVS system in whatever state it was in at the time of death, unlike my experiences with keepalived.

    One bad thing about ldirectord: there’s a memory leak if you use negotiate checks on HTTPS services, and you’ll have to restart it every once in a while. This is easily done without interruptions, you just have to do it.

  • wtarreau says:

    Hi Matt,

    first, rest assured that I did not take any of your comments as criticism or attacks, and I’m not discussing your choices, just your perception of what proxies can do, nothing else.

    Concerning the performance, it depends what you are doing. Direct-routing load balancers installed in one-leg configuration where they only see upstream traffic will always be faster than a proxy because they see about half the packets. Common figures are about 4 times haproxy’s max connection rate. A load balancer set up in the middle of the stream will still generally be faster than the proxy on connection rate, but can be slower on high bit rates (multi-gigabit), because it cannot make use of TCP speedups the NIC provides, while a proxy can. For instance, haproxy can forward 10 Gbps of HTTP traffic with only 20% CPU on one of my machines, while this machine has great difficulty simply IP-forwarding the same stream at this rate. This is because of the TCP off-loading capabilities of the NICs, which can only work with the proxy. But granted, this is not everyone’s need!

    Concerning queuing, well, I can tell you that this has saved quite a number of medium to large sites. Yes, connections can sometimes pile up. But better to accept them, parse them, reject wrong ones and get the good ones ready to be served than to simply drop the SYNs and wait for the client to resend them 3 seconds later. Also, it is quite possible that your site has a very predictable load. But gaming sites, sports sites, online newspapers, stock exchange, etc… are extremely unpredictable and need to be prepared for 4-10 times the load without failing, even if some requests get slightly delayed. And quite honestly, it’s very rare that the queue delay is counted in seconds. This happens when something breaks behind. But even there, having the ability to automatically flush requests out of the queue is nice.

    Concerning QoS, well, it also depends on your usage. Gaming sites like to reserve resources for paying customers. Other sites with many external partners will prefer to prioritize some requests over others to the same partner, so that if they become bandwidth-limited, the most important objects are fetched first. But doing that really does not cost much. It’s just a switching rule between two queues. You may drop from 15000 connections per second to 14980 maybe, which is not particularly noticeable.

    Concerning the logs, I don’t agree with you. The web servers will tell what they do, not how it’s perceived. The front LB will see how it’s perceived and will be able to quickly tell you that server XX has higher response times than others or fails to accept a connection 1/1000th of the times, etc… This is very valuable information when people start to complain about performance issues, and it’s even better when you can compare what the LB sees with what the server says. Most often, the difference is in (too short) system queues on the server itself (small TCP backlog, etc…) that can’t even be detected by the server software because the information is dropped before reaching it. That’s also the only place you can detect the famous 3-second stairs indicating some packet loss.

    Overall, all a proxy can bring is a tradeoff, adding more intelligence in the decisions taken by the load balancer at the expense of an increased resources usage on one cheap box. If the smart decisions can improve the site’s responsiveness, reliability or scalability, that’s fine. If they don’t bring anything, it’s not a place for a proxy. That’s why you generally see them in front of HTTP servers, sometimes in front of mail servers, database servers and terminal servers, and rarely in front of anything else.

    Most likely in your case, as you explain it, it would simply not bring anything for your current usage (and I have no problem with that). And I agree with you that with only 128 MB of RAM it’s hard to run a proxy with large numbers of connections, considering that the system itself will take 16 kB minimum per connection!

  • tylerflint says:

    This is a great article! I am currently architecting an alternative to our current web setup. We too are running 4 web servers and are currently using PF as a load balancer. The main issue with PF is that we can’t implement weighted load balancing…blech. 2 of our servers are substantially larger than the other 2, which is why we need weighted round robin.

    I mostly wanted to ask one question. The article is centered around ldirectord, and as I have been researching, ldirectord doesn’t seem to be the load balancing engine at all. Rather, ipvsadm appears to be the load balancing utility, and ldirectord is the monitoring utility that checks the health of the real servers.

    Am I way off here?

  • wtarreau says:

    @tylerflint: you’re almost right. LVS (or IPVS, 2 names for the same thing) is the Linux kernel subsystem performing the load balancing. Ipvsadm is the utility used to configure LVS. It does not do much: it creates server farms, adds and removes servers, and reports statistics. Ldirectord, as well as keepalived which was mentioned above, are daemons which monitor the servers’ health and add/remove them from the farms. All that chain of tools provides a nice and complete layer 4 load balancer.

  • tylerflint says:

    Awesome. I don’t mean to turn this blog into a tutorial, so I will keep my questions short and be on my way:

    Is “all that chain of tools” available within the heartbeat packages?

    Is there anything else that I would need to be familiar with to create a complete LVS load balancer aside from these utilities: iptables, ldirectord or keepalived, and ipvsadm?

    Thanks again. github rocks and you guys nailed it.

    • Davy Jones says:

      Tyler,

      If you just want a “regular” IPVS load balancer, that’s all you need. For a high-availability load balancer (where, if the active node dies, the other node in the pair takes over), you need to add Pacemaker, plus Heartbeat or Corosync, to the list of things to put together.

  • malcolm says:

    Matt,

    Nice blog, I agree Ldirectord rocks… I too hated paying crazy money for naff hardware load balancers, so built my own using LVS, Ldirectord and Heartbeat. Then I thought, hey, this is pretty good, why not sell it… 7 years later Loadbalancer.org owns a small corner of the load balancer appliance market and we’ve added SSL termination, TPROXY, HAProxy, Nginx, feedback agents, SNMP…. all sorts of stuff that initially I couldn’t see the point in (why doesn’t everyone use LVS in DR mode with Ldirectord?). Still, the customer is always right (even when they are wrong). We are a little different ’cause we allow full root access and open source any changes we make. It makes me sad to admit, but now that processors are getting so fast… full proxies (SNAT) like HAProxy are so easy to insert into your architecture that they will probably make layer 4 routers (a.k.a. LVS) almost redundant at some point. P.S. Love the line ‘Never underestimate the value of knowing where the bodies are buried’, exactly why I never worry when a customer says the load balancer is broken… It’s never the load balancer (well, mostly never). And here comes the cheeky load balancing link :-).

  • pachanga says:

    Thanks for sharing! Could you please tell me what the benefits are of running ldirectord instances under Xen? Did you simply not want to run ldirectord on separate hardware hosts, preferring to re-use the existing hardware?

  • Quora says:

    Github: Is the load balancing part of Github infrastructure still ldirectord?…

    Is the part of this (http://www.anchor.com.au/blog/2009/10/load-balancing-at-github-why-ldirectord/) blog-post still valid, that Github’s first-level Loadbalancer is done via LVS direct routing on Xen VMs? It seems like a solution that can scale large…
