Posts Tagged ‘datacentre’

Ooh, bugger…

Friday, October 2nd, 2009

And this is why we co-locate in Globalswitch, a top-tier facility with floors that AREN’T MADE OF BALSA WOOD.

Racks are pretty heavy, sure, but they totally wtfpwned those tables there


Pain-free server migration

Thursday, April 9th, 2009

Being the veteran of a datacentre migration and several whole server migrations I feel like I’m getting the process down to a reasonably fine art. I had to perform another migration last night from another datacentre to ours at Global Switch and the process went very smoothly so I thought I’d share some of the techniques I’ve built up over time so you might benefit if you’re in the same situation.

Preparation

This should go without saying. The more time you have to prepare for the migration, the better. You do not want to leave it until the last minute. My philosophy when approaching a migration is always to leave as little work as possible for the time of the actual migration. Clients will generally want to schedule any server downtime for late at night, when you are not going to be operating at your best (no matter how many coffees or energy drinks you may have consumed). If you can log in to the machine and run a prepared script which takes care of everything and completes the migration for you, you will end up with a happy client and be happy yourself. You will be in the datacentre for less time and get to bed earlier, both of which are good things.

Make good use of scripting

Following on from my last point, I strongly encourage you to script as much as possible. The migration I just performed entailed moving a server from one datacentre and network provider to another which meant a change in address space. Thus, firewalls, IP address configuration files, Apache vhosts, ACLs and more had to change. Ahead of time I determined which files would need to be modified and created a script which took a backup of each of these files before overwriting them with corrected versions. Any failure would cause the script to stop and print the problem which could be easily diagnosed manually.
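To illustrate the backup-then-overwrite approach described above, here is a minimal sketch in Python. It is not the script I actually used; the function name, the `.pre-migration` backup suffix and the idea of passing a mapping of target files to prepared replacements are all assumptions made for the example.

```python
import shutil
from pathlib import Path


def apply_replacements(replacements, backup_suffix=".pre-migration"):
    """Back up each target file, then overwrite it with its corrected version.

    `replacements` maps the path of a live config file to the path of the
    corrected version prepared ahead of time (hypothetical layout).
    Any failure raises an exception, so the run stops at the first problem.
    """
    # Check every file up front so we fail before touching anything.
    for target, corrected in replacements.items():
        if not Path(target).is_file():
            raise FileNotFoundError(f"target missing: {target}")
        if not Path(corrected).is_file():
            raise FileNotFoundError(f"corrected version missing: {corrected}")

    done = []
    for target, corrected in replacements.items():
        target = Path(target)
        backup = target.with_name(target.name + backup_suffix)
        if backup.exists():
            # Never clobber an earlier backup; stop and let a human look.
            raise FileExistsError(f"refusing to overwrite backup: {backup}")
        shutil.copy2(target, backup)         # backup keeps timestamps and mode
        shutil.copyfile(corrected, target)   # contents only; target keeps its mode
        done.append(str(target))
    return done
```

You would call it with something like `apply_replacements({"/etc/httpd/conf.d/vhosts.conf": "/root/migration/vhosts.conf"})`, where both paths are made up for the example. Checking every file before changing any of them means a typo in the list is caught while nothing has been touched yet.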

The more automation and failsafes you can build into your script, the better. Since you will be creating it with plenty of time up your sleeve and your brain operating at full capacity, you can build up the script with your full arsenal of tricks. At 3am in a cold datacentre with noisy air conditioning you can hardly expect to have your full faculties with you, so make life easier for yourself by leaving as little actual work as possible to do at this point.

Fully acquaint yourself with the server

You will only know what needs to be changed on the server if you are familiar with it. Of course, you should have plenty of good documentation already on it but if not, log in and get the lie of the land. Have a plan for how you will find out facts about the system – make use of grep and well structured regexes for finding out configuration details, slocate (if there is a locate database present) for finding critical files, and your usual toolkit of sysadmin techniques.
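As a rough sketch of the grep-style survey described above, the following Python walks a directory tree and reports every line matching a regex, for example references to the old address space. The function name and the example pattern are assumptions for illustration; plain `grep -rn` does the same job from the shell.

```python
import re
from pathlib import Path


def find_references(root, pattern):
    """Yield (path, line_number, line) for each line under `root` matching `pattern`."""
    regex = re.compile(pattern)
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            # Undecodable bytes are replaced so binary files do not abort the scan.
            text = path.read_text(errors="replace")
        except OSError:
            continue  # unreadable file: skip it and carry on
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield path, number, line.strip()
```

For instance, `find_references("/etc", r"\b192\.0\.2\.\d{1,3}\b")` would list every mention of an address in 192.0.2.0/24, a documentation range standing in here for the old network. The list of files it turns up is a good starting point for deciding what the migration script has to rewrite.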

Document as you go

At Anchor, documentation is critically important. We have an internal wiki system in which we make detailed notes on every server and a great number of technical articles (a lot of which we have shared with you in our public wiki). Every migration plan is carefully documented from start to finish. In more complicated scenarios a full change proposal is created and officially ratified, but at the very least you should create a checklist:

  • people involved (and their contact details, if necessary)
  • time frame
  • a detailed list of items that need to be prepared or information that needs to be acquired before the migration takes place
  • actions that will be undertaken just before the migration starts
  • the list of actual migration steps, including details of what any scripts will be doing
  • post-migration actions which need to be done immediately after the migration – e.g. checking that all your monitoring is showing OK for all hosts and services
  • a list of “cleanup” items which can be completed after the migration, but not time critical, e.g. removing stale references to servers from your internal documentation
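The post-migration checks in that list can be partly automated. The sketch below is only a crude stand-in for a proper monitoring system: it tests whether each service accepts a TCP connection. The function name and the host/port pairs in the usage note are hypothetical.

```python
import socket


def check_services(services, timeout=3.0):
    """Return a dict mapping each (host, port) to True if a TCP connection succeeded."""
    results = {}
    for host, port in services:
        try:
            # Open and immediately close a connection; we only care that it works.
            with socket.create_connection((host, port), timeout=timeout):
                results[(host, port)] = True
        except OSError:
            # Covers refused connections, timeouts and name resolution failures.
            results[(host, port)] = False
    return results
```

Calling `check_services([("www.example.com", 80), ("mail.example.com", 25)])` straight after the move gives a quick yes/no per service. A successful connection only shows that something is listening, so it complements your monitoring rather than replacing it.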

Have as many people check over your documentation as possible, preferably those who have knowledge of the systems so that they can find anything you have missed. The more eyes on your documentation, and heads thinking about it, the better the chances that you will have a plan that covers all aspects.

One of the most important things from my point of view with documentation is to forward a copy to the client and keep them involved in the process. It gives them confidence in your ability to conduct the migration successfully, gives them an idea of the work that you have had to put in, and makes the process transparent. It also gives you another point of view on the migration – there may be steps important to them which you have missed, for example lowering TTLs on domains that are solely client-controlled.

Keep the client “in the loop”

Following on from my previous point, as well as giving the client a copy of your migration documentation, it is important to let them know what is going on. Send them a courtesy email every day or two, give them a call, or do whatever you deem appropriate to let them know how you are going with preparations and what information you need from them.

On the day of the migration, double-check everything with them – times, contact details, the migration plan, and so on. Make sure they are still happy to go ahead and that they are happy with your plans. Give them a courtesy call or message when you are about to start the migration, when you are finishing, but most importantly whenever you have any unexpected problems. Nothing upsets clients more than having things go pear-shaped and not being informed about it. Even if you don’t know what the problem is, let them know that you are diligently working on it and will keep them up to date with developments.

Plan for when things go wrong

In a perfect world, you would prepare adequately and everything would go flawlessly (as it did for me last night, luckily). However, every slightly obsessive-compulsive systems administrator knows that things can and will go wrong every now and then despite your best efforts.

Make an escape plan for every point where things can go wrong during the migration. Given you won’t have infinite time available, prepare most for the most likely failure scenarios. Make a rollback plan which will abort the migration, and decide how many failures will cause you to take this rollback plan on the night. Confirm this with the client.
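One way to make the rollback decision mechanical is sketched below, assuming each migration step can be written as an action paired with an undo. The `run_migration` name, the step tuples and the `max_failures` threshold are all illustrative assumptions, not a description of any particular tool.

```python
def run_migration(steps, max_failures=1):
    """Run (name, action, undo) steps in order.

    Once `max_failures` steps have failed, undo the completed steps in
    reverse order and return False. Return True if the run reaches the end.
    """
    completed = []
    failures = 0
    for name, action, undo in steps:
        try:
            action()
            completed.append((name, undo))
        except Exception as exc:
            failures += 1
            print(f"step failed: {name}: {exc}")
            if failures >= max_failures:
                # Threshold reached: abort and unwind what has been done so far.
                for done_name, done_undo in reversed(completed):
                    print(f"rolling back: {done_name}")
                    done_undo()
                return False
    return True
```

The threshold is the number you agree on with the client beforehand. With `max_failures=1` the first failure aborts the migration, while a higher value lets you press on past a non-critical step that failed. Note that a failed step is not itself undone here, so any step that can fail halfway needs an action that cleans up after itself.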

Make sure that every change you make can be reverted (which most times will necessitate backups). There is nothing worse than discovering you have irrevocably destroyed data in the process of making a critical change.

Approach everything with an obsessive-compulsive attitude

The best plans will have considered everything and left no detail to chance. It can be tiresome to be painstakingly thorough in your plans, but ultimately it will pay off. At the same time though, you don’t have to do everything in one sitting – make notes in your migration plan on what you still need to do and follow it up later. Don’t foolishly believe you will remember everything on the migration day, or even an hour from now – WRITE IT DOWN!

Remember, even though the preparation may be slightly tiresome, you are just making life easier for yourself at migration time. Hopefully if you follow these general tips I’ve prepared, they will make your next migration a lot easier.

News flash: widespread power outage hits Sydney CBD, Anchor hosting operations unaffected

Tuesday, March 31st, 2009

Sydney suffered a nasty power outage in the CBD on Monday, which according to reports affected tens of thousands of homes and businesses. Curiously, some traffic lights on George Street were blacked out while others just a block or two away were working fine. From a technical standpoint, a measure of diversity like that is probably a good thing. Rather than having vast areas with unmanaged traffic flow, police could be deployed where necessary, with the knowledge that vehicles could move a meaningful distance before getting stopped at the next set of blacked-out lights.

A friend of one staffer at Anchor was expected to be staying back late last night babysitting the systems in their office that would take some time to come online. Meanwhile over at Anchor’s datacentre, things were humming along nicely without a blip. Globalswitch, our infrastructure provider, has multiple diverse power feeds to cover all equipment, along with redundant power and cooling capacity. In the event of a catastrophic supply failure, diesel generators are on standby to keep things running.

The Anchor NoC was also unaffected; we’ve got big EVA batteries to tide us over. Sure, they’re no competition to a GN drive, but our power requirements are somewhat more modest than a ‘004 Nadleeh in Trans-Am mode, so it’s not really an issue (don’t believe any vendor who tries to tell you otherwise, crunch your numbers first!).


While we’re on the topic of backup power, it seems the CBD’s emergency warning systems don’t have backup power either. I’m not interested in making a call as to whether they should or shouldn’t have backup power, but from a public perspective it sure doesn’t look good on a service that’s meant to function in an emergency.

When deploying an “important” system, an appropriate level of consideration needs to be given to how you’re going to keep that system running; a point that we see missed all too often. Expecting a system to work continuously without fail is … well, doomed to fail, if you don’t have the corresponding redundant systems and fault rectification capabilities in place – standard on all Anchor web hosting, naturally. :)

I’m just happy that power at my house wasn’t affected – I had a lot of interesting browser tabs open, y’know.

“Mr Rees has told Parliament the shutdown of the three other power cables went to plan and 99.4 per cent of Sydney’s public transport services ran on time.”

Huh. Reliability went up as a result of the outage, eh.

Anchor’s New Colocation Fit Out – Stage Two

Wednesday, February 11th, 2009

Our new colocation space will be ready to go very soon! In the last couple of days we’ve had the new racks installed and the basic network infrastructure connected. The power rails in each rack will be powered up on Friday, and the network hardware will be installed. We expect to have live equipment in there in less than a week.

This will mostly be a photo post, since the photos speak for themselves. To keep things interesting we’ve got photos of some of the high-level building infrastructure. This is the heavy-duty, redundancy-everywhere stuff that keeps you up and running, guaranteed. If you’re interested, follow the link on each photo; there’s a little more detail on what you’re looking at.

You can see most of the new floorspace and racks in the photos there. Once it’s online, we’ll have doubled our entire operating capacity. I’d say we’re growing rather nicely.

Photos are licensed under the Creative Commons Attribution-Share Alike 3.0 Unported Licence.

Anchor’s New Colocation Fit Out – Stage One

Friday, February 6th, 2009

Anchor’s colocation requirements have been growing steadily over the past few years, so we’ve recently taken the plunge to significantly increase our total datacentre floor space which will allow for many new racks. At this point we’re in the early stages of fitting out the suite, so we thought it an ideal chance to give our readers some insight into the process of fitting out a data centre.

A long view of the new Anchor suite area


The plan is to produce a new blog post for each step; and, of course, take plenty of pretty pictures along the way!

A handful of our new racks, some still in their wrapping


You can see above a few of the shiny new extra-wide racks that have already been delivered. The keen-eyed might also be able to see the tape marking the locations of the first few racks to be powered up shortly, with the under-floor power installation happening right now. The suite’s cage will be going in real soon now.

A long view from the opposite end of our new suite area


Redundant air-conditioning units


We’ll keep the blog updated as things progress, so check back soon for the next post!
