GitHub: Speed matters

September 29, 2009 | Technical

Impressions from the first article (in its first day) and from the first 24 hours of the GitHub migration have led us at Anchor to believe that:

  1. GitHub is just as popular as we thought,
  2. The migration was worth it, as things are running much faster (just check your Twitter feeds, or better yet, check your GitHub source tree for no reason 😉 ); and,
  3. People are interested in what has gone on under the hood of the new GitHub (insert your favourite fast car here; otherwise, let’s say a roadster).

Taking these three things into account, this installment will discuss why things are so much faster after the migration than they were before.

I said ‘faster’ and not ‘fast’, because GitHub is now as fast as any website should be. So in comparison, yes, GitHub is fast now; however, the old setup was akin to riding your bicycle on half-inflated tyres: once they are fully pumped up, your old bike is suddenly blazing fast. This is not to be critical of the former architecture, which had its merits when GitHub was founded. GitHub had simply reached a stage where an infrastructure architecture refresh was logical.

The main thing, broadly speaking, that made this new architecture fast was that we were given a blank slate and a large amount of freedom to build an architecture that would do the job well. This is an incredibly rare thing, and it no doubt took a lot of courage on GitHub’s part. For that, we have to say “thank you” to the GitHub team for letting us have that freedom. I like to think that we’ve repaid that trust with a pretty awesome architecture that will serve them well for some time to come.

SCALE: Looking at the new architecture as a whole, the increase in scale is immediately evident. GitHub now consumes far more hardware than ever before:

Old Infrastructure:

  • 10 VMs
  • 39 VCPUs
  • 54GB RAM

New Infrastructure:

  • 16 physical machines
  • 128 physical cores
  • 288GB RAM

Or for those who enjoy visual cues:

[Figure: resource comparison, old vs. new infrastructure]

It is a credit to the old infrastructure and GitHub’s code that it ran so well on so little (in comparison). The first credit for increased performance is increased scale.
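
For those who prefer numbers to pictures, the scale-up is easy to quantify. Here is a minimal Python sketch using only the figures listed above (purely illustrative arithmetic, nothing more):

    # Rough scale-up ratios from the old (virtualised) figures
    # to the new (physical) ones, as listed above.
    for name, before, after in [
        ("machines", 10, 16),
        ("cores", 39, 128),
        ("RAM (GB)", 54, 288),
    ]:
        ratio = after / float(before)
        print("%-8s %3d -> %3d  (%.1fx)" % (name, before, after, ratio))

    # machines  10 ->  16  (1.6x)
    # cores     39 -> 128  (3.3x)
    # RAM (GB)  54 -> 288  (5.3x)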

An important note regarding the hardware is that there is nothing special (or industry-secretive) about it. The solution in its entirety runs on commodity hardware. No special black boxes doing scary things with packets and routes. No appliance servers. The solution architecture developed by Anchor can be used with hardware from any vendor (insert Dell, HP, IBM, SuperMicro, etc.). Vendor neutrality leaves GitHub unencumbered in scaling either up or out, a key issue when considering growth and future flexibility.

Note: The architecture’s flexibility allows the user repository storage to be expanded with a mix of vendor hardware (should GitHub ever change hardware vendor). Furthermore, any component can be exchanged for another vendor’s hardware with no change to GitHub’s architecture or software.

In a nutshell, the increased scale provides:

  • More GitHub front-end servers to service your requests;
  • More storage; and
  • More I/O bandwidth when working with your repository data

HARDWARE PERFORMANCE: The speed specifications of the underlying components are important, as is how that hardware is utilised.

Storage I/O: A common factor in poor performance with any solution is an I/O bottleneck at the storage level. This pain was GitHub’s. To alleviate it, not only is the storage now distributed across several servers (distributing the I/O), but it also now runs on direct-attached 15,000 RPM SAS disks behind battery-backed hardware RAID. The second credit for increased performance, therefore, is faster storage.
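
To illustrate why spreading repositories across independent fileservers helps, consider a toy model in Python. The IOPS figures below are assumptions for illustration only, not measurements from GitHub’s hardware:

    # Toy model: each fileserver has its own direct-attached RAID array,
    # so adding fileservers adds aggregate I/O capacity, whereas a single
    # shared array caps everything at one array's throughput.
    IOPS_PER_ARRAY = 2000  # assumed figure for a battery-backed 15k SAS array
    FILESERVERS = 4        # assumed server count; repos are sharded across them

    print("one shared array: %5d IOPS total" % IOPS_PER_ARRAY)
    print("%d fileservers   : %5d IOPS aggregate"
          % (FILESERVERS, IOPS_PER_ARRAY * FILESERVERS))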

Direct access to hardware: Virtualisation is great. What isn’t great is when virtualisation is used as a universal solution. At Anchor we believe there is a place for virtualisation, and systems with massive I/O or CPU requirements are not that place. Moving resource-heavy systems onto dedicated hardware removes any contention for resources between individual VMs. The third credit goes to less overhead.

ARCHITECTURE: Throwing hardware at a scaling problem is an easy solution, but without the right division of resources and the right software to properly use them, it’s not going to run very fast.

For GitHub, that software is their innovative Git command proxying system, which does an excellent job of taking requests from the frontends (where users connect with their web browser, git client, or SSH client) and shipping them to the fileservers. The database structure, filesystem layout, and code efficiency also contribute to the speed.
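
We can’t speak to how GitHub’s proxying layer is actually implemented, but the general shape is easy to sketch. In the hypothetical Python below, the fileserver names, the hash-based routing, and the use of plain SSH are all our own assumptions for illustration:

    import hashlib
    import subprocess

    # Hypothetical fileserver pool; the names and count are made up.
    FILESERVERS = ["fs1", "fs2", "fs3", "fs4"]

    def fileserver_for(repo_path):
        # Map a repository to the fileserver holding it. A real system
        # would consult a routing table; a bare hash is used for brevity.
        digest = hashlib.md5(repo_path.encode("utf-8")).hexdigest()
        return FILESERVERS[int(digest, 16) % len(FILESERVERS)]

    def proxy_git_upload_pack(repo_path):
        # Ship a read request (a clone or fetch) to the right fileserver,
        # where git does all of its disk I/O locally. subprocess inherits
        # our stdin/stdout, so the git client talks straight through.
        host = fileserver_for(repo_path)
        return subprocess.call(["ssh", host, "git-upload-pack", repo_path])

    # e.g. proxy_git_upload_pack("/data/repos/user/project.git")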

Given that the software isn’t our speciality, there’s not a lot for us to say about it, but GitHub are planning a series of posts on their blog, and I’m quite sure it’ll be enlightening.

TO REVIEW: The factors involved in GitHub’s faster response on the new infrastructure include (but are not limited to):

  • Increased Infrastructure (Scale)
  • Faster Hardware (Storage)
  • No resource contention (More resources per server)
  • Solid, scalable architecture (Awesomeness)

Keep an eye on this space, as we delve into technology-specific posts about the 11 herbs and spices Anchor used to realise the new GitHub architecture.


4 Comments

  • Ben says:

    Buzzword alert! Infrastructure architecture? Try saying that three times fast. The speedup doesn’t require a genius though: move away from VMs, add faster disks, add more RAM & CPU, and optimize the application.

  • matt says:

    Hey Ben,

    You missed ripping on “refresh” as well… yeah, we let the sales guys out of the cage and it does kinda show. They’re back at the martini shaker now.

    Granted, it doesn’t take a genius to know that buying more hardware or making the code more efficient is the way to speed increases; indeed, we write whole articles on how to do that (like this one: http://www.anchor.com.au/hosting/development/HuntingThePerformanceWumpus) for our customers and others to work from.

    On the other hand, you can’t get a commodity machine with 128 cores, 288GB of RAM, and several TB of screamingly-fast disk (at least not for anything approaching a sensible price), and you *certainly* can’t rely on being able to upgrade the hardware at the speed that Github is growing. The architecture of the infrastructure (heh) is quite important in hanging all the pieces together, and whilst it doesn’t take a genius, I’d say that it *does* take a team that knows what they’re doing — but then, yes, I am biased.

    For the record, as far as I know, the code that was running on the new hardware was identical to that running on the old hardware at cutover time (modulo the changes needed to use the new distributed repo storage). In other words, there was no “grand optimisation” of the code base when the cutover happened — the speed boost is down to the way we specced and arranged the machines.

  • cn0308 says:

    Great article. When it comes to storage, what file system is used for sharing? I would guess NFS.

    • Davy Jones says:

      As it happens, your guess would be wrong. <grin> GitHub doesn’t use any network filesystem at all. Instead, “higher-level” requests are made to the fileservers (via work queues, SSH, cron jobs, etc.), and local processes do all the I/O locally. This is a massive efficiency saving: it is rare to the point of non-existence that the network response is larger than the disk I/O would be if the same job were done over NFS, and there are no shared locks or other painful NFS artifacts to deal with.