<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Anchor Web Hosting Blog &#187; moving</title>
	<atom:link href="http://www.anchor.com.au/blog/tag/moving/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.anchor.com.au/blog</link>
	<description>A view into the Anchor Engineroom</description>
	<lastBuildDate>Wed, 08 Feb 2012 00:51:36 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>GitHub: Speed matters</title>
		<link>http://www.anchor.com.au/blog/2009/09/github-speed-matters/</link>
		<comments>http://www.anchor.com.au/blog/2009/09/github-speed-matters/#comments</comments>
		<pubDate>Tue, 29 Sep 2009 06:39:27 +0000</pubDate>
		<dc:creator>bsmith</dc:creator>
				<category><![CDATA[FTW]]></category>
		<category><![CDATA[drbd]]></category>
		<category><![CDATA[github]]></category>
		<category><![CDATA[moving]]></category>
		<category><![CDATA[project starbug]]></category>
		<category><![CDATA[site migration]]></category>
		<category><![CDATA[speed]]></category>
		<category><![CDATA[starbug]]></category>

		<guid isPermaLink="false">http://www.anchor.com.au/blog/?p=1161</guid>
		<description><![CDATA[Impressions from the first article (in its first day) and the first 24 hours of the GitHub migration, have caused us at Anchor to believe that; GitHub is just as popular as we thought, The migration was worth it, as things are running much faster (just check your twitter feeds, or better yet, check your [...]]]></description>
			<content:encoded><![CDATA[<p><em>Impressions from the first article (in its first day) and the first 24 hours of the GitHub migration have led us at Anchor to believe that: </em></p>
<ol>
<li><em>GitHub is just as popular as we thought; </em></li>
<li><em>The migration was worth it, as things are running much faster (just check your Twitter feeds, or better yet, check your GitHub source tree for no reason <img src='http://www.anchor.com.au/blog/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' />  ); and </em></li>
<li><em>People are interested in what has gone under the hood of the new GitHub (insert your favorite fast car here; otherwise let&#8217;s say a roadster). </em></li>
</ol>
<p><em>Taking these three things into account, this installment will discuss why things are so much faster after the migration than before.</em></p>
<p>I said &#8216;faster&#8217; and not &#8216;fast&#8217;, because GitHub is now as fast as any website should be. So in comparison, yes, GitHub is fast now; however, it is akin to riding your bicycle with half-inflated tires: once they are fully inflated, your old bike is suddenly blazing fast. This is not to be critical of the former architecture, which had its merits when GitHub was founded. GitHub had simply reached a stage where an infrastructure architecture refresh was logical.</p>
<p>The main thing that made this new architecture fast was that we were given a blank slate and a large amount of freedom to design an architecture that would do the job well. This is an incredibly rare thing, and it no doubt took a lot of courage on GitHub&#8217;s part. For that, we have to say &#8220;thank you&#8221; to the GitHub team for letting us have that freedom. I like to think that we&#8217;ve repaid that trust with a pretty awesome architecture that will serve them well for some time to come.</p>
<p><strong>SCALE: </strong>When looking at the new architecture as a whole, the increased scale is immediately evident. GitHub now consumes far more hardware than ever before:</p>
<p><em>Old Infrastructure:</em></p>
<ul style="margin-top: 0px;margin-right: 0px;margin-bottom: 0px;margin-left: 1.25em;line-height: 1.4em;padding: 0px">
<li>10 VMs</li>
<li>39 VCPUs</li>
<li>54GB <span style="line-height: 1.4em;padding: 0px;margin: 0px">RAM</span></li>
</ul>
<p style="margin-top: 1em;margin-right: 0px;margin-bottom: 1em;margin-left: 0px;line-height: 1.4em;padding: 0px"><em>New Infrastructure:</em></p>
<ul style="margin-top: 0px;margin-right: 0px;margin-bottom: 0px;margin-left: 1.25em;line-height: 1.4em;padding: 0px">
<li>16 physical machines</li>
<li>128 physical cores</li>
<li>288GB <span style="line-height: 1.4em;padding: 0px;margin: 0px">RAM</span></li>
</ul>
<p>Or for those who enjoy visual cues:</p>
<p><img class="aligncenter size-full wp-image-1179" src="http://www.anchor.com.au/blog/wp-content/uploads/2009/09/Memory_Compare1.png" alt="Resource comparison old to new infrastructure" width="375" height="436" /></p>
<p>It is a credit to the old infrastructure and GitHub&#8217;s code that it ran so well on so little (in comparison). The first credit for increased performance is <strong>increased scale</strong>.</p>
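<p>As a rough illustration of that increase, the figures listed above can be turned into growth factors. This is only a back-of-the-envelope sketch: it treats a VCPU and a physical core as comparable units, which is an assumption made for the sake of the arithmetic, and the variable and function names are ours.</p>

```python
# Figures quoted in this post for the old and new infrastructure.
old = {"cpu_cores": 39, "ram_gb": 54}     # 39 VCPUs and 54GB RAM across 10 VMs
new = {"cpu_cores": 128, "ram_gb": 288}   # 128 physical cores and 288GB RAM across 16 machines


def growth_factor(before, after):
    """Return how many times larger `after` is than `before`."""
    return after / before


# Assumption: one VCPU is counted as one physical core for this comparison.
cpu_factor = growth_factor(old["cpu_cores"], new["cpu_cores"])
ram_factor = growth_factor(old["ram_gb"], new["ram_gb"])

print(f"CPU cores: {cpu_factor:.1f}x")
print(f"RAM: {ram_factor:.1f}x")
```

<p>On these numbers the new platform has a little over three times the cores and a little over five times the memory, before counting the benefit of the cores no longer being shared between VMs.</p>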
<p>An important note regarding the hardware is that there is nothing special (or industry-secret) about it. The solution in its entirety runs on commodity hardware. No special black boxes doing scary things with packets and routes. No appliance servers. The solution architecture developed by Anchor can be used with any hardware vendor (Dell, HP, IBM, SuperMicro, etc.). Vendor neutrality means GitHub faces no encumbrance in scaling either up or out, a key issue when considering growth and future flexibility.</p>
<p><em>Note: The architecture&#8217;s flexibility allows the user repository storage to be expanded with a mix of vendor hardware (should GitHub ever change hardware vendor). Furthermore, any component can be exchanged for another vendor&#8217;s hardware with no change to GitHub&#8217;s architecture or software.</em></p>
<p>In a nutshell, the increased scale provides:</p>
<ul>
<li>More GitHub front-end servers to service your requests;</li>
<li>More storage; and</li>
<li>More I/O bandwidth when working with your repository data</li>
</ul>
<p><strong>HARDWARE PERFORMANCE:</strong> The speed specifications of the underlying components are important, as is how that hardware is utilised.</p>
<p><em>Storage I/O: </em>A common factor in poor performance with any solution is an <a href="http://www.anchor.com.au/hosting/development/HuntingThePerformanceWumpus#head-8f4521847d24e2119a421aa8d89a89d7e8372fdc">I/O bottleneck at the storage level</a>.  This pain was GitHub&#8217;s. To alleviate this, not only is the storage now distributed across several servers (distributing the I/O), but it is now running on direct-attached 15,000 RPM SAS disks on battery-backed hardware RAID. Therefore, the second credit for increased performance is <strong>faster storage</strong>.</p>
<p><em>Direct access to hardware: </em>Virtualisation is great. What isn&#8217;t great is when virtualisation is used as a universal solution. At Anchor we believe there is a place for virtualisation, and systems with massive I/O or CPU requirements are not that place. Moving resource-heavy systems onto dedicated hardware removes any contention for resources between individual VMs. The third credit goes to <strong>less overhead</strong>.</p>
<p><strong>ARCHITECTURE:</strong> Throwing hardware at a scaling problem is an easy solution, but without the right division of resources and the right software to properly use it, it&#8217;s not going to run real fast.</p>
<p>For GitHub, that software includes their innovative Git command proxying systems, which do an excellent job of taking requests from the frontends (where users connect with their web browser, git client, or SSH client) and shipping them to the fileservers. The database structure, filesystem layout, and code efficiency also contribute to this.</p>
<p>Given that the software isn&#8217;t our speciality, there&#8217;s not a lot for us to say about this, but GitHub are planning a series of posts on <a href="http://github.com/blog">their blog</a>, and I&#8217;m quite sure it&#8217;ll be enlightening.</p>
<p><strong>TO REVIEW</strong>: The factors involved in GitHub&#8217;s faster response on the new infrastructure include (but are not limited to):</p>
<ul>
<li>Increased Infrastructure (Scale)</li>
<li>Faster Hardware (Storage)</li>
<li>No resource contention (More resources per server)</li>
<li>Solid, scalable architecture (Awesomeness)</li>
</ul>
<p><em>Keep an eye on this space, as we delve into technology-specific posts on the 11 herbs and spices Anchor used to realise the new GitHub architecture.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://www.anchor.com.au/blog/2009/09/github-speed-matters/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>GitHub: Designing Success</title>
		<link>http://www.anchor.com.au/blog/2009/09/github-designing-success/</link>
		<comments>http://www.anchor.com.au/blog/2009/09/github-designing-success/#comments</comments>
		<pubDate>Mon, 28 Sep 2009 02:43:55 +0000</pubDate>
		<dc:creator>Davy Jones</dc:creator>
				<category><![CDATA[FTW]]></category>
		<category><![CDATA[github]]></category>
		<category><![CDATA[moving]]></category>
		<category><![CDATA[project starbug]]></category>
		<category><![CDATA[site migration]]></category>

		<guid isPermaLink="false">http://www.anchor.com.au/blog/?p=1125</guid>
		<description><![CDATA[At Anchor we do not believe in black box solutions.  Sharing is caring and we like to share. In this post we specifically want to share our triumph with Project StarBug, better known to the wider world as GitHub. For the uninitiated, GitHub is ‘Social Networking meets Source Code management’, or in GitHubs own words [...]]]></description>
			<content:encoded><![CDATA[<p>At Anchor we do not believe in black box solutions.  Sharing is caring and we like to share. In this post we specifically want to share our triumph with Project StarBug, better known to the wider world as GitHub. For the uninitiated, GitHub is ‘Social Networking meets Source Code management’, or in GitHub’s own words ‘<em>Git is a fast, efficient, distributed version control system ideal for the collaborative development of software. GitHub is the easiest (and prettiest) way to participate in that collaboration: fork projects, send pull requests, monitor development, all with ease.</em>’.</p>
<p>Some readers may protest this point, stating that GitHub is hosted in the USA while Anchor is located in Australia. How, then, has Anchor architected and implemented GitHub’s infrastructure, and how will it manage that infrastructure going forward, with such a geographical encumbrance?</p>
<p>All will be revealed in a blog entry <span style="text-decoration: line-through;">in</span> <span style="text-decoration: line-through;">three</span> of many parts.</p>
<p><strong>Part 1: (This Post)</strong> Designing for success (Otherwise known as: Making GitHub&#8217;s dream a reality and nightmares a thing of the past)</p>
<p><strong>Part 2: </strong>Speed matters</p>
<p><strong>Part N:</strong> (To be announced)</p>
<p>For obvious reasons, we cannot expose GitHub&#8217;s architecture in full. However, we are sharing some of the more interesting technologies and architecture we have implemented, and the rationale for doing so: essentially, what we have done to make GitHub&#8217;s dreams a reality.</p>
<p><strong>Geographical encumbrance</strong></p>
<p>It is a credit to GitHub’s management that they were willing to look the world over for the right team to support them. While they do not want to be harried by anything outside the GitHub application (e.g. hardware, O/S, management), they still needed to ensure that the right company was employed to look after these components.</p>
<p><em>Why Anchor?</em> Anchor’s flexibility to manage a solution on third-party hosted hardware (anywhere in the world) and versatility in developing an architecture to suit this scenario were part of the rationale. Anchor’s reputation for needing to know how technology works (again, no black boxes) and then working out how to improve it was a major contributor.</p>
<p>Enough fluff; now to the meat:</p>
<p>One can imagine that the architecture required to support GitHub is a complex mix. We won’t lie; there are many moving parts. Some of the key criteria for designing the solution included:</p>
<p><strong>Scalability</strong></p>
<p>GitHub states its growth as “<em>400 new users and 1000 new repositories every day</em>”. Post-migration, GitHub will be running on infrastructure spread across 15+ physical hosts/servers. It is essential that the infrastructure can grow with the user base, from tens to hundreds of servers, without the need to re-architect everything. Without a doubt, growing without the associated pain is a major objective for GitHub as it moves forward.</p>
<p><strong><em>Interesting Note: </em></strong><em>GitHub&#8217;s new physical infrastructure (at migration) consists of:</em></p>
<ul>
<li><em>15+ physical servers</em></li>
<li><em>10+ virtual servers</em></li>
<li><em>128 physical processor cores</em></li>
<li><em>Over 288GB of RAM</em></li>
<li><em>1TB+ of storage</em></li>
</ul>
<p>GitHub&#8217;s software architecture is modular by nature and scalability friendly. Components outside the core software, however, were not as readily scalable. Scalability for these components has been achieved with the following improvements:</p>
<ul>
<li><em>Distributed Storage Architecture (with real-time slaves).</em> Distribution of GitHub’s source code repos across multiple partitions and multiple nodes (including redundant slaves) provided improvements in performance, scalability and reliability. By removing the limitation of using a single filesystem volume for storage, the issue of dealing with large scale storage has been avoided. New partitions can be rapidly added on demand with little to no fuss.</li>
</ul>
<p>The graphic below illustrates a simplified request to the distributed file storage repo:</p>
<div id="attachment_1142" class="wp-caption aligncenter" style="width: 560px"><a href="http://www.anchor.com.au/blog/wp-content/uploads/2009/09/GitHubStorageDist_Small.png"><img class="size-full wp-image-1142" title="GitHub Storage Distribution (Small)" src="http://www.anchor.com.au/blog/wp-content/uploads/2009/09/GitHubStorageDist_Small.png" alt="GitHub Repo Storage Distribution Illustration" width="550" height="446" /></a><p class="wp-caption-text">GitHub Distributed Repo Storage</p></div>
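<p>To make the idea concrete, here is a minimal sketch of how a lookup over partitioned storage could work. It is not GitHub&#8217;s actual implementation: the <code>StorageMap</code> class, the partition and node names, and the "fewest repositories" placement rule are all assumptions made for illustration. The point it shows is that an explicit repository-to-partition map lets new partitions be added without moving existing repositories.</p>

```python
class StorageMap:
    """Tracks which partition holds each repository (hypothetical structure)."""

    def __init__(self):
        self.partitions = {}   # partition name -> {"active": node, "slave": node}
        self.repos = {}        # repository name -> partition name

    def add_partition(self, name, active, slave):
        """Register a new partition with its active node and real-time slave."""
        self.partitions[name] = {"active": active, "slave": slave}

    def place(self, repo):
        """Assign a new repository to the partition holding the fewest repos."""
        counts = {name: 0 for name in self.partitions}
        for part in self.repos.values():
            counts[part] += 1
        # Ties are broken by partition name so placement is deterministic.
        target = min(counts, key=lambda name: (counts[name], name))
        self.repos[repo] = target
        return target

    def locate(self, repo, failed=frozenset()):
        """Return the node serving `repo`, falling back to the slave node."""
        part = self.partitions[self.repos[repo]]
        if part["active"] not in failed:
            return part["active"]
        return part["slave"]


# Illustrative usage with made-up node and repository names.
storage = StorageMap()
storage.add_partition("part0", active="fs1", slave="fs2")
first = storage.place("alice/project")
storage.add_partition("part1", active="fs3", slave="fs4")   # added on demand
second = storage.place("bob/tool")
normal_node = storage.locate("alice/project")
failover_node = storage.locate("alice/project", failed={"fs1"})
```

<p>In this sketch, adding <code>part1</code> leaves <code>alice/project</code> where it was, and a request for it is answered by the slave node only when the active node is marked as failed.</p>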
<ul>
<li><em>(Sensible) Virtualisation</em>. Previously, GitHub&#8217;s infrastructure was entirely virtualised. While virtualisation has its merits, there are reasons to avoid it. Services that aren&#8217;t I/O-heavy can be virtualised, while components with high I/O requirements are run on dedicated (“bare metal”) servers. For GitHub, this means file storage and databases are <strong>not</strong> virtualised. Otherwise, virtualisation is used to provide a mix of server consolidation, rapid deployment and service redundancy/HA.</li>
<li><em>Horizontal scalability (on-demand, via automated build infrastructure)</em>. The ability to add components to the infrastructure in an automated fashion reduces scale-out time and removes user error from builds/configuration. In addition, this turns the server build/deployment procedure into a measurable deliverable. Over time this can be reviewed and improved (Thank you <a style="text-decoration: none; color: #002bb8; background-image: none; background-repeat: initial; background-attachment: initial; -webkit-background-clip: initial; -webkit-background-origin: initial; background-color: initial; background-position: initial initial;" title="W. Edwards Deming" href="http://en.wikipedia.org/wiki/W._Edwards_Deming">W. Edwards Deming</a>).</li>
</ul>
<p><strong>Reliability</strong></p>
<p>As with most businesses, High Availability (or <em>business continuance</em>) is essential to success. To achieve this, a combination of DRBD, virtualisation, heartbeat and load balancing has been employed.</p>
<ul>
<li><em>Mirroring Data; DRBD is utilised for several purposes. </em></li>
</ul>
<ol>
<li>It is used to ensure the redundant (read: slave) storage partitions and nodes are in sync with the active counterparts.</li>
<li>DRBD is also key in providing HA functionality across the virtualised environment.</li>
</ol>
<p>Several Xen hosts are deployed with the following scenario: Server 1 runs VM A (active), VM B (active), VM C (offline, DRBD mirrored) and VM D (offline, DRBD mirrored), while Server 2 runs VM A (offline, DRBD mirrored), VM B (offline, DRBD mirrored), VM C (active) and VM D (active). This provides active failover if either of the virtualisation hosts fails.</p>
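<p>The failover behaviour of such a pair of hosts can be sketched in a few lines. This is a simplified model, not the heartbeat or DRBD configuration itself: it assumes that each host holds a DRBD-mirrored standby copy of every VM that is active on its peer, and the VM labels, host names and function name are illustrative.</p>

```python
# Hypothetical layout: every VM has an active copy on one host and a
# DRBD-mirrored standby copy on the other host.
PLACEMENT = {
    "A": {"active": "server1", "standby": "server2"},
    "B": {"active": "server1", "standby": "server2"},
    "C": {"active": "server2", "standby": "server1"},
    "D": {"active": "server2", "standby": "server1"},
}


def running_after_failure(placement, failed_host):
    """Return {vm: host} once `failed_host` is down and standbys are promoted."""
    result = {}
    for vm, hosts in placement.items():
        if hosts["active"] != failed_host:
            result[vm] = hosts["active"]      # unaffected by the failure
        elif hosts["standby"] != failed_host:
            result[vm] = hosts["standby"]     # promote the mirrored standby copy
        # Otherwise both copies were on the failed host and the VM stays down.
    return result


after_failure = running_after_failure(PLACEMENT, "server1")
for vm, host in sorted(after_failure.items()):
    print(f"VM {vm} runs on {host}")
```

<p>With this layout, losing either host leaves all four VMs running on the survivor, which is why each host needs enough spare capacity to carry its peer&#8217;s load.</p>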
<p>The graphic below illustrates the replicated, highly-available storage architecture:</p>
<div id="attachment_1132" class="wp-caption aligncenter" style="width: 560px"><a href="http://www.anchor.com.au/blog/wp-content/uploads/2009/09/GitHubStorage_Small.png"><img class="size-full wp-image-1132" title="GitHub Storage Simplified Example (Small)" src="http://www.anchor.com.au/blog/wp-content/uploads/2009/09/GitHubStorage_Small.png" alt="GitHub Storage HA/Replication" width="550" height="446" /></a><p class="wp-caption-text">GitHub Storage HA/Replication</p></div>
<ul>
<li><em>Consistency;</em><strong> </strong>via automated builds and configuration management. With any horizontally-scaled solution, consistency amongst similar components is essential. One of the most notable achievements across the entire architecture is the complete integration of automated build infrastructure. A new/additional component of the solution can be rapidly built and added to the overall system regardless of the architecture (physical or virtual).</li>
<li><em>Redundancy; </em>A simple way to ensure greater uptime and lower the risk of service interruption is to introduce as much redundancy as possible. GitHub is a great example of this practice. Data links, Ethernet/switching, servers and components all have a redundant twin ready to swing into action should the primary fail.</li>
</ul>
<p><strong>Conclusions</strong></p>
<p>The implementation of any new architecture for an already mature product is never easy. Anchor engineers have been working tirelessly with GitHub staff to ensure that any growing pains are transparent to the users. In the next entry, we will be sharing some of our insights in regard to migrating GitHub from their existing host and infrastructure to the new Anchor-developed model. Until then, we hope you enjoy the new, faster GitHub, more of the time (well, all/any of the time) than ever before.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.anchor.com.au/blog/2009/09/github-designing-success/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
	</channel>
</rss>

