Anchor’s New Colocation Fit Out – Stage One

Published February 6th, 2009 by Lachlan Cooper

Anchor’s colocation requirements have been growing steadily over the past few years, so we’ve recently taken the plunge to significantly increase our total datacentre floor space which will allow for many new racks. At this point we’re in the early stages of fitting out the suite, so we thought it an ideal chance to give our readers some insight into the process of fitting out a data centre.

A long view of the new Anchor suite area

The plan is to produce a new blog post for each step and, of course, take plenty of pretty pictures along the way!

A handful of our new racks, some still in their wrapping

You can see above a few of the shiny new extra-wide racks that have already been delivered. The keen-eyed might also be able to see the tape marking the locations of the first few racks to be powered up shortly, with the under-floor power installation happening right now. The suite’s cage will be going in real soon now.

A long view from the opposite end of our new suite area

Redundant air-conditioning units

We’ll keep the blog updated as things progress, so check back soon for the next post!


Windows 2008 Hosting Now Available!

Published February 3rd, 2009 by Keiran Holloway

If you’re familiar with the hosting services provided here at Anchor you’re probably aware that we’re big fans of Linux and open source software in general. That said, it may come as a bit of a shock for you to realise that we’ve actually been doing Windows hosting for quite a number of years now.

It all started with a couple of isolated dedicated servers running Windows back in 2004… Since then we’ve deployed a shared Windows hosting server, which is now running hundreds of websites, and taken on multi-server Windows hosting environments.

One thing has been interesting to note whilst making this transition: sure, a lot of us here would prefer to be working with Linux systems, but at the end of the day, the principles behind running robust, reliable and scalable hosting remain much the same, whatever the platform.

To reinforce our commitment to providing a high-quality Windows hosting infrastructure, we spent the latter part of 2008 hiring staff with skills specific to this environment. In addition, we’ve spent some time looking at the new version of Windows and have published an article in our wiki discussing Windows Server 2008 for Web Hosting.

Windows 2008 hosting is now available across all our dedicated server range as well as on all our virtual private servers.


nsscache and LDAP reliability

Published February 2nd, 2009 by oliver

Any company with multiple servers in their authentication domain will know of LDAP. Sadly, on the Linux platform, OpenLDAP (although arguably the most widely used and well known of the few LDAP servers available) is still not particularly reliable, especially when it comes to replication. The overheads involved in querying even a local OpenLDAP server are also much higher than those of reading plaintext files such as /etc/passwd.

Enter nsscache. Created by two boffins at Google (one of whom graduated from Anchor Systems), nsscache gives the reliability and speed of plaintext files (or BDB if you desire) and the scalability of OpenLDAP. Anchor recently started using it and we are confident it will dramatically boost the reliability and lookup speed of all of our LDAP systems.

In terms of performance, we are seeing update times of a second or so for a partial update (only changed or new entries), and around 10 seconds for a full update (replacing the entire on-disk files with a fresh copy of the entire source database) for a bit under 20000 records.

I’d definitely recommend taking a look at it. It currently only operates on Python 2.4 or later (although a patch is nearing release that allows it to work with Python 2.3).


Aggregating RRD data from multiple files

Published February 2nd, 2009 by matt

The RRD (Round-Robin Database) file format is a beautiful piece of work. It is used for storing time-series data in a (storage and CPU time) efficient form, with a fixed file size, and with some great support tools to retrieve, manipulate, and graph the data in various ways.

One problem you tend to hit every now and then, though, is that you want to aggregate the data from multiple separate RRD files into one monster graph. The simple method might be to put all the data into one RRD file, but that doesn’t work in the case where you can’t always collect all the data at once — RRD requires that you insert values for all your data sources at the same time.

Now, since we use Cacti for data collection at Anchor, in theory we should just be able to tell Cacti to do this. However, its interface is utter balls, and it always seems to take 10 times as long to do something as it should, so I tend to script this sort of thing instead of trying to fight Cacti. Also, if you don’t use Cacti (you lucky person, you), then you might need to know how to do this.

Recently, we needed to know the aggregate current draw from all the racks in our data centre. We’ve got APC managed power rails in every rack, and we already collect the current data from these devices, but then it’s stored in one RRD file for each power rail. So, we needed to aggregate this data into one big graph, and take some values out of it for management’s edification. Since there’s not a lot of info out there on aggregating lots of RRDs together, I thought I’d put down some notes on the subject.

The standard form of doing a graph in RRD is like this:

DEF:power=rack1.rrd:apc_current:AVERAGE
CDEF:kw=power,240,*,1000,/
VDEF:avg=power,AVERAGE
VDEF:avg_kw=kw,AVERAGE
LINE:power#ff0000
GPRINT:avg:Average\ current\ is\ %9.2lfA
GPRINT:avg_kw:Average\ nominal\ power\ is\ %9.2lfkW

This just takes the apc_current data source from the file rack1.rrd and stores it in the variable power. Then we scale the data source into kW (line 2), take the average of all the data points for both of those, then draw a line for the current, and print the average values we calculated. All pretty simple stuff, and if you work with RRD files at all, you’re probably quite familiar with this sort of thing.
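As a sanity check on the scaling step, here is the same arithmetic in Ruby. It is only a sketch: the nominal 240 V figure is the one used in the CDEF above, while the amps_to_kw name and the sample reading are made up for illustration. Multiplying amps by volts gives watts, and dividing by 1000 turns that into kilowatts.

```ruby
# Nominal supply voltage, matching the 240 used in the CDEF above.
NOMINAL_VOLTS = 240.0

# Convert a current reading in amps to nominal power in kilowatts.
# (amps_to_kw is a hypothetical helper name, not part of any RRD tooling.)
def amps_to_kw(amps)
  amps * NOMINAL_VOLTS / 1000.0
end

puts amps_to_kw(12.5)  # a 12.5 A reading at 240 V; prints 3.0
```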

What isn’t as common knowledge is that there’s nothing special about the DEF statement above — you can repeat that as many times as you like, and you can point to as many different files as you need. So if you’ve got, say, ten RRD files with current values in them, you can just do:

DEF:power1=rack1.rrd:apc_current:AVERAGE
DEF:power2=rack2.rrd:apc_current:AVERAGE
DEF:power3=rack3.rrd:apc_current:AVERAGE
...
DEF:power8=rack8.rrd:apc_current:AVERAGE
DEF:power9=rack9.rrd:apc_current:AVERAGE
DEF:power10=rack10.rrd:apc_current:AVERAGE

This will define separate variables for the apc_current data source in each of the files. This also works, incidentally, if you’ve got multiple data sources in each file (like, say, incoming bytes and outgoing bytes).

Once you’ve got your data sources mapped, it’s a fairly simple matter of adding them all together:

CDEF:power=power1,power2,+,power3,+,...,power9,+,power10,+

The rest of the definition stays the same.
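If RPN is unfamiliar, it can help to see how an expression of this shape is evaluated. What follows is a rough sketch of a stack evaluator that understands only the + operator; it is not a reimplementation of RRD's CDEF engine, and eval_rpn_sum and the sample readings are hypothetical.

```ruby
# Evaluate a comma-separated RPN expression that uses only '+'.
# vars maps variable names to numeric values (one sample per variable).
def eval_rpn_sum(expr, vars)
  stack = []
  expr.split(',').each do |token|
    if token == '+'
      # Pop the top two operands and push their sum.
      b = stack.pop
      a = stack.pop
      raise ArgumentError, 'not enough operands for +' if a.nil? || b.nil?
      stack.push(a + b)
    else
      # Anything else is treated as a variable name.
      stack.push(vars.fetch(token))
    end
  end
  # A well-formed expression leaves exactly one value on the stack.
  raise ArgumentError, 'malformed expression' unless stack.length == 1
  stack.first
end

readings = { 'power1' => 4.0, 'power2' => 6.5, 'power3' => 3.5 }
puts eval_rpn_sum('power1,power2,+,power3,+', readings)  # prints 14.0
```

Every variable after the first needs its own + operator. Leave one out and a value is stranded on the stack, which this sketch reports as a malformed expression.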

What makes for a slightly more exciting time is when you don’t know, in advance, how many files you’re going to have to merge together. This happens whenever the user gets to specify what data gets included — the script we’ve got here asks you which racks you want to aggregate the data for, and I’ve done bandwidth graphs in the past which showed all of a customer’s IP addresses in one graph. In this case, you need a bit of code, and here’s some Ruby that I use to generate the RPN expression above to add all of the values together:

# Generate an RPN (reverse polish notation) sum of
# the strings given in list.
# A single-element list is supported, with the
# expected lack of addition operator.
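def to_rpn_sum(list)
  if list.length == 1
    list[0]
  else
    x = list.dup
    # Insert a '+' after the second operand and after every
    # operand that follows it (indices 2, 4, 6, ...).
    (x.length - 1).times { |i| x.insert(i * 2 + 2, '+') }
    x.join(',')
  end
end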

Glue that together with the code to create your list of RRD files, something to write out all the DEF lines (and keep a record of what variable names you use) and you’re pretty much done.
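As a rough illustration of that glue, here is a minimal sketch that takes a list of RRD files and produces the DEF lines plus the summing CDEF. It assumes every file holds the apc_current data source used earlier. The aggregate_args name and the generated power1, power2, ... variable names are my own choices for the example, and it makes no attempt to escape awkward file names.

```ruby
# Same helper as above, repeated so this example runs on its own.
def to_rpn_sum(list)
  return list[0] if list.length == 1
  x = list.dup
  (x.length - 1).times { |i| x.insert(i * 2 + 2, '+') }
  x.join(',')
end

# Build the DEF and CDEF arguments for a list of RRD files.
# ds_name defaults to the apc_current data source used earlier;
# the power1, power2, ... variable names are generated here.
def aggregate_args(rrd_files, ds_name = 'apc_current')
  raise ArgumentError, 'need at least one RRD file' if rrd_files.empty?
  vars = []
  defs = rrd_files.each_with_index.map do |file, i|
    var = "power#{i + 1}"
    vars << var  # remember each variable name for the sum
    "DEF:#{var}=#{file}:#{ds_name}:AVERAGE"
  end
  defs + ["CDEF:power=#{to_rpn_sum(vars)}"]
end

puts aggregate_args(%w[rack1.rrd rack2.rrd rack3.rrd])
```

For the three files above this prints three DEF lines followed by CDEF:power=power1,power2,+,power3,+, and the VDEF, LINE and GPRINT lines from the first example can then be appended unchanged.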

Posted in FTW


The Value of Commercial Software Support

Published February 2nd, 2009 by matt

Here at Anchor, we’re often asked to install commercially-supported software products by our customers. Most commonly, it’s Linux distributions, but hosting control panels, app servers, and various other pieces of paraphernalia all get the treatment fairly regularly.

The internal opinion on the subject is that most commercial support agreements for software aren’t worth the paper they’re written on (a problem made much worse by the fact that you can’t wipe your backside on an e-mail). A recently-concluded saga with a certain prominent North American vendor of Linux distributions has done nothing but reinforce this opinion, to the point that a rant is the only way to deal with the insanity.

In March 2006, we got a problem report from a customer that an aspect of our hosting services was not operating correctly. We investigated, and determined that the problem was that the vendor-provided webserver was crashing. Since this system was covered by a support agreement, we lodged a bug report with the vendor.

The log for this report in the vendor’s bug tracker reads like a primer for “how not to provide tech support 101”, with various people from the vendor commenting on the issue and asking for information that had already been provided, and generally tripping over each other to dodge and weave and avoid investigating and fixing the problem.

We also enjoyed the repeated use of a wonderful stalling tactic: demanding the provision of a large (> 650MB) dump of system information before investigating the problem. In addition to the practical problems of uploading a CD’s worth of data over Australian-grade ADSL uplinks to a flaky FTP server on the other side of the world, this dump contained various customer confidential information, which made it a gamble to upload. It also contained nothing of actual use in diagnosing the problem. (I know for a fact that the info dump was unnecessary, because the problem was eventually fixed — by us — without needing anything in that file, but instead entirely using the information we originally provided).

Overall, the entire bug report documented a thoroughly unhelpful exchange, spanning several months, with the guy on our side of the keyboard getting obviously more and more frustrated as the weeks went by. I wasn’t involved in the original bug at all, but even I got worked up reading over the log.

Eventually, in July 2006, we gave up on the vendor, worked out a very ugly and kludgy workaround ourselves, and closed the bug in disgust, hoping that the problem would never rear its ugly head ever again. It did, repeatedly, but each time the kludge was folded, mutilated, and spindled some more to provide further relief, because the idea of going back to the vendor was just too horrible to contemplate.

Things remained in this state of critical stability until a couple of weeks ago, when the problem once again became the focus of our attention. The difference this time was that the bug report landed on my desk, and I was flush with success after finding another Apache segfault bug (this one a security vulnerability) late last year. I figured I could dive in and find the bug.

It turned out to not be quite so easy as the previous one, but after about two and a half days of digging and poking, I did manage to unearth the source of the bug. It was, as it turns out, entirely due to a coding mistake in the vendor-provided webserver, and it was entirely diagnosable with the data that was originally provided in our bug report of 2006.

Things took an ugly turn at this point, though. Despite the vendor having expressed no interest in finding and fixing the bug in their software, I decided to send the patch to them, in the interests of being a good OSS citizen. Their reaction was utterly incomprehensible:

  • Despite being told in the original message that “the attached patch fixes the problem”, they asked “I would like to know if the patch you have uploaded solves your issue” — like I’d upload a known-broken patch, and say it fixes the problem. Sheesh.
  • They again asked for the gigantic system info dump, which we’d previously told them we couldn’t provide.
  • They also claimed that, since the OS release in question would be going out of support in around 6 months’ time, it would be very unlikely that a patched release of the webserver would be forthcoming.

So, in other words, if you’re running a commercially-supported software product, for which you’ve paid quite a considerable sum of money, you can expect that the “supported period” will be shorter than your contract promises, you’ll be given the runaround, the vendor will do anything they can to avoid having to actually do anything, you’ll be asked idiotic questions that anyone with a fundamental grasp of the English language would be able to answer from the existing bug log, and even when you do the vendor’s job for them and fix the problem yourself, they’ll still persist in jerking you around. And somehow, somehow, that’s better than saving the money and just being able to fix problems yourself, when and how you need to?

Sorry, but screw that for a game of skittles. I’m not against paying people for assistance, but if I pay you for assistance, I’d really appreciate it if I actually got some.

I’m having trouble recalling a situation in which I’ve actually gotten a consistently good experience out of a software support organisation. This isn’t an isolated incident: it seems like every time a problem is reported in a piece of commercially-supported software, the relevant vendor deems it more cost-effective to avoid the issue rather than fix it. That this seems to actually work (since people still keep paying for “support” when they don’t get any) is a sad indictment of consumers of IT services, while the fact that nearly all commercial software vendors are willing to screw their customers over is a horrible, soul-destroying realisation.

While the plural of anecdote isn’t data, my experiences, and those of the rest of the Anchor staff, really only suggest one thing: software “support” contracts aren’t really worth an awful lot, in the absence of real, strong performance guarantees. (Why nobody will give you an effective performance guarantee is left as an exercise for the reader.)

That isn’t to say that paying for software is never recommended. If the software you want to run is a commercial product, then there’s only one option — pay for it. Copyright infringement isn’t cool. Personally, I’ve not been the least bit interested in a commercial software product — other than Wii games — in the last 10 years, but I’m weird. Other people have differing opinions on the subject.

However, when you’re making the decision to buy a commercial software product, bear in mind that all you’re paying for, in practice, is the right to use the software. Any support services you are promised are unlikely to be of any value whatsoever. In fact, if the software product isn’t Open Source, then its value is actually lower, because nobody except the vendor can fix problems you come across, and the chances are that the vendor will not fix the problem for you. Ouch.

This might sound like a weird statement coming from a company that makes some of its revenue from servicing software. While we’re a hosting company, it doesn’t take very long for some customers to get out of their depth and need some specialist assistance in getting something running on their server, and we’ve got the expertise on-staff to help with those sorts of things, for a suitable fee.

The difference between what Anchor does and what most software companies do is that we’re not selling software, just expertise. We also have no ability to lock you into using a particular piece of software or service, and hence if we don’t provide a good service, there is nothing stopping you from going to someone else next time. That tends to keep us on our toes.

But that doesn’t mean that our support level couldn’t decrease in the future, so it’s important that our customers don’t accept bad service — from us, but also from anyone else.

Everyone, both customer and service provider, needs to have high standards, and demand those high standards from their suppliers and customers. There’s way too much laissez faire in the IT industry.
