Archive for March, 2009

News flash: widespread power outage hits Sydney CBD, Anchor hosting operations unaffected

Tuesday, March 31st, 2009

Sydney suffered a nasty power outage in the CBD on Monday, which according to reports affected tens of thousands of homes and businesses. Curiously, some traffic lights on George Street were blacked out while others just a block or two away were working fine. From a technical standpoint, a measure of diversity like that is probably a good thing. Rather than having vast areas with unmanaged traffic flow, police could be deployed where necessary, with the knowledge that vehicles could move a meaningful distance before getting stopped at the next set of blacked-out lights.

A friend of one Anchor staffer was expected to stay back late last night, babysitting systems in their office that would take some time to come back online. Meanwhile, over at Anchor’s datacentre, things were humming along nicely without a blip. Globalswitch, our infrastructure provider, has multiple diverse power feeds to cover all equipment, along with redundant power and cooling capacity. In the event of a catastrophic supply failure, diesel generators are on standby to keep things running.

The Anchor NoC was also unaffected; we’ve got big EVA batteries to tide us over. Sure, they’re no match for a GN drive, but our power requirements are somewhat more modest than a ’004 Nadleeh in Trans-Am mode, so it’s not really an issue (don’t believe any vendor who tries to tell you otherwise; crunch your numbers first!).

While we’re on the topic of backup power, it seems the CBD’s emergency warning systems don’t have backup power either. I’m not interested in making a call as to whether they should or shouldn’t have backup power, but from a public perspective it sure doesn’t look good on a service that’s meant to function in an emergency.

When deploying an “important” system, an appropriate level of consideration needs to be given to how you’re going to keep that system running; a point that we see missed all too often. Expecting a system to work continuously without fail is … well, doomed to fail, if you don’t have the corresponding redundant systems and fault rectification capabilities in place – standard on all Anchor web hosting, naturally. :)

I’m just happy that power at my house wasn’t affected – I had a lot of interesting browser tabs open, y’know.

“Mr Rees has told Parliament the shutdown of the three other power cables went to plan and 99.4 per cent of Sydney’s public transport services ran on time.”

Huh. Reliability went up as a result of the outage, eh.

Anchor goes green(er)!

Tuesday, March 31st, 2009

As part of our recent greenification initiative we’ve started adding plants to the office. We’ve decided it’s simply not practical to carry the nautical theme throughout the Chalet, so we’re going for a more jungle-y theme. If Johnny Depp has shown us anything, it’s that a jungle theme should be largely compatible.

Without further ado, the plants (die Pflanzen):

ACMA blacklist not so black?

Tuesday, March 31st, 2009

There’s been a lot going around recently regarding the ACMA’s proposed blacklist for undesirable and illegal content. By all measures it appears to have been an embarrassment for the government (or at least Senator Conroy), and it just isn’t getting any support from anyone, judging by what’s being reported in the mass media (though I admit to minimal exposure to tabloid publications). I don’t even know if the trials of this filtering technology have managed to get off the ground yet.

In case you’ve not been following: ISPs aren’t that interested in participating in these trials, many are accusing the government of censorship, Senator Conroy isn’t really sure what the filtering can or can’t do, the “OMG TOP SECRET” blacklist has been predictably leaked (or has it?), and a little while ago there were suggestions the government would find a way to fine you for even sniffing a blacklisted URL.

Myles Peterson from the Canberra Times has his own little opinion piece which I liked very much for the way he’s summed things up. I’d like to borrow a quote from it:

Grepping for binary data

Tuesday, March 31st, 2009

I was dealing with an interesting content-encoding issue yesterday for a customer’s website. They’re adamant that the problem started a few weeks ago after a routine database restoration, but we beg to differ. In any case, the customer’s site was displaying “funny characters” here and there, classic symptoms of encoding failure. I’ve written about this before, as it relates to MySQL’s handling of character encoding, but it’s not MySQL’s problem alone.

In this case, the content coming from the database and CMS was proper UTF-8, but there were dodgy characters leaking into the rendered page. I knew these would be coming from template files in the user’s account, but how to find them? I could find an instance here and there by searching for nearby strings, but I needed to nail all of them, and I don’t know every page on the customer’s site.

After playing with things a bit, and with the aid of iconv, I determined that one of the bad characters was the trademark symbol (™), and that it was in the Windows-1252 encoding. As we all know, Windows loves standards, so of course it makes sense that developers would be saving files with this encoding.
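If you’re wondering why a single byte causes so much trouble, this small Python sketch illustrates it. It is our own illustration, not part of the original debugging session: Python’s cp1252 codec stands in for iconv, and `cp1252_to_utf8` is a hypothetical helper. It shows three things. Byte 0x99 is the trademark sign in Windows-1252. On its own, that byte is not valid UTF-8. Converting it produces the three-byte UTF-8 sequence the page actually needed.

```python
# Why a Windows-1252 trademark sign breaks a UTF-8 page.
TM_CP1252 = b"\x99"

# In Windows-1252, byte 0x99 is the trademark sign (U+2122).
decoded = TM_CP1252.decode("cp1252")
print(hex(ord(decoded)))          # 0x2122

# In UTF-8, the same character takes three bytes.
print("\u2122".encode("utf-8"))   # b'\xe2\x84\xa2'

# A lone 0x99 is a UTF-8 continuation byte with no lead byte, so decoding fails.
try:
    TM_CP1252.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)   # invalid start byte


def cp1252_to_utf8(data: bytes) -> bytes:
    """Re-encode Windows-1252 bytes as UTF-8 (similar in spirit to
    `iconv -f WINDOWS-1252 -t UTF-8`)."""
    return data.decode("cp1252").encode("utf-8")


print(cp1252_to_utf8(b"Acme\x99 Widgets"))  # b'Acme\xe2\x84\xa2 Widgets'
```

The sample string “Acme™ Widgets” is made up purely for illustration.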

But how to find the other files with these non-UTF-8 trademark characters? I can’t easily paste the dodgy character into my UTF-8 terminal, but I do know that the Windows-1252 trademark character is represented in hex as 0x99. One idea I considered was running hexdump on all HTML and PHP files (did I mention this developer had put PHP in .html files?) and then grepping the textual output for ' 99 ', but this is very messy and takes no advantage of grep’s powerful capabilities.

Then our tech director suggested using echo – grep will happily search for arbitrary bytes, you just need to get them into the search pattern. A few keystrokes later we had a very nice solution.

Compare:

[ciel@phantomhive public_html]$ for i in *.html ; do hexdump -C "$i" | grep -q ' 99 ' && echo "$i" ; done
london.html
baker-st.html
monarch.html

With this:

[sebastian@phantomhive public_html]$ find . -name '*.html' -print0 | xargs -0 grep -l `echo -en '\x99'`
./london.html
./baker-st.html
./monarch.html
./dire/header.html
./includes/header.html

Fantastic! You’ll notice that the latter form also finds relevant files in subdirectories, something which the naive version simply doesn’t do.
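If grep isn’t to hand, or you’d rather sidestep any locale handling of raw bytes in patterns, the same recursive search can be sketched in Python. This is our own illustration, not what we ran on the day. `files_containing` is a hypothetical helper, and it reads each file whole, so it assumes template files of modest size.

```python
from __future__ import annotations

import os


def files_containing(root: str, needle: bytes, suffix: str = ".html") -> list[str]:
    """Return paths under root (searched recursively) whose contents include needle.

    Roughly equivalent to:
        find ROOT -name '*SUFFIX' -print0 | xargs -0 grep -l NEEDLE
    Files are opened in binary mode, so no text decoding is involved.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                if needle in fh.read():
                    matches.append(path)
    return sorted(matches)
```

Calling `files_containing(".", b"\x99")` from the public_html directory should return the same kind of list as the find/xargs pipeline, including files in subdirectories, in sorted order.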

Astute readers might also recognise that echo shouldn’t have worked there; the manpage for echo only mentions printing arbitrary bytes from octal notation. The example here works because we’re actually invoking the shell’s own builtin echo command, which accepts backslash-escaped hex bytes. If this isn’t an option or your shell’s builtin echo is deficient, you’ll have to convert those bytes to octal manually (you can use the ascii command’s character chart for this). Or do you?

Google to the rescue! I use Google as a calculator all the time, but serendipitously discovered that it’ll do simple base conversions as well.

Ask a simple question, get a simple answer
0x99 = 0o231
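Python’s built-ins can do the same base conversion offline. The sketch below also builds the `\0NNN` octal escape that bash’s builtin `echo -e` accepts, which is handy when hex escapes aren’t available. The helper name `to_octal_escape` is our own, for illustration only.

```python
def to_octal_escape(byte_value: int) -> str:
    """Build an echo -e style octal escape (backslash, zero, three octal digits)."""
    if not 0 <= byte_value <= 0xFF:
        raise ValueError("a single byte must be in the range 0x00-0xFF")
    return "\\0" + format(byte_value, "03o")


print(oct(0x99))              # 0o231, the same answer Google gives
print(to_octal_escape(0x99))  # \0231
```

With that, `echo -en '\0231'` should produce the same 0x99 byte as the `\x99` form used earlier.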

# mysql_secure_installation… Ya-ha-! (and ~/.my.cnf)

Tuesday, March 31st, 2009

I was setting up mysql-server for a customer recently and noticed something interesting: there’s a helpful script included with MySQL called mysql_secure_installation. We thought about that for a moment and had a chuckle. Okay, that was a little unfair. It’s no secret that we prefer to use Postgres wherever possible, but having a “make it all secure” script isn’t a bad idea, as long as it doesn’t produce a false sense of security.

I thought web hosting companies were the ones blocking spam

Friday, March 20th, 2009

We use a Barracuda to keep spam out of our email at Anchor. Once we had overcome some early teething issues, and as long as it’s handled with care, it does quite a good job of keeping spam out of our inboxes, to the point that spam doesn’t really bother you – most of the time.

Perhaps that’s why the delightful email I received from Cosmotel Web Hosting caught my eye this morning – I just don’t get that much spam these days. Note the URL’s use of the word “emailmarketing”; I guess to some that is just another name for spam.

My quarantine box always has a good collection of spam covering the ever-enlightening topics of how to last longer on the job, how to make my schlong – well, long, I guess – and of course all manner of exciting prescription medicines. It would be fair to say that the majority of this doesn’t originate from Australia, and that those generating it could benefit from re-evaluating their ethics.

What surprised me about receiving spam from another hosting company is that, as a web hosting provider, you spend a not insignificant amount of time blocking spam, dealing with customer complaints about getting too much spam, and getting your own mail servers removed from spam abuse lists after the occasional overzealous sales cadet. Surely as a hosting provider you’re more aware of the problems with spamming, and of its illegality, than the average punter? Surely you would think more than twice before hitting the send button?

For those who aren’t clued up on the legal problems with spam, our guide to responsible email marketing will run you through the Spam Act of 2003. Yes, 2003! That’s six years since spamming in Australia became illegal (technically a little under five, as the Act only came into effect in April 2004).

Looking on the bright side, this morning’s colourful email promised me 99.95% uptime for $58/year. Really? That allows only 21.6 minutes of downtime per month, from a website that appears to be hosted on a DSL link; perhaps we’re wasting money on our BGP implementation and four upstreams. It did, at least, make me ask the question: is our government actually doing anything to enforce the Spam Act of 2003?
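The 21.6-minute figure comes straight from the uptime percentage, assuming a 30-day month. Here is a quick Python sketch of the arithmetic (the function name is ours) that makes it easy to compare other SLA levels:

```python
def allowed_downtime_minutes(uptime_percent: float, days: float = 30) -> float:
    """Minutes of downtime permitted by an uptime percentage over `days` days."""
    period_minutes = days * 24 * 60
    return period_minutes * (100 - uptime_percent) / 100


for sla in (99.0, 99.9, 99.95, 99.99):
    print(f"{sla}% uptime -> {allowed_downtime_minutes(sla):.1f} minutes/month")
```

For 99.95% over 30 days, that is 43,200 minutes times 0.05%, or 21.6 minutes, matching the figure above.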

My Google searches soon led me to the ACMA website, where I discovered that they appear to be quite active. They have a plugin not just for the Outlook mail client, but for Outlook Express as well. Great, time to bin my Apple and switch to a Windows PC. Dig a little further, though, and to their credit ACMA have made some very usable alternatives available for non-Outlook users to report spam: you can register for an email address to forward messages to, or report spam via a web form. I’m impressed.

What happens to it once a complaint is received? According to ACMA the emails go into a database and are used in investigations and proceedings against spammers. They quote some quite impressive statistics on data collection and enforcement activities.

Will the report from an unhappy camper receiving probably one of the less harmful types of spam from another player in their industry be investigated? I’ll let you know if I hear back from ACMA.

A nice quote from CosmoTel’s website: “CosmoTel has different work ethics to our competitors” – yes you certainly do!

P.S. It seems I’m not the only one who isn’t happy about the Cosmotel spam, or who is questioning why a web hosting company is spamming!

IPv6 Implementors Conference

Thursday, March 19th, 2009

One of the speakers at the IPv6 Implementors Conference, which is being hosted by Google, dropped me a quick note about it: http://sites.google.com/site/ipv6implementors/conference2009/

Sadly, I had no idea this conference was on, as it looks like a valuable opportunity to learn about IPv6 and the progress it is making in the wild. I did get a couple of handy tips on how to improve our implementation plan, though, so not all was lost.

If you are attending this conference, you are more than welcome to leave comments on this blog post with your learnings – or even link to your own site – the more the merrier.

View from the top

Thursday, March 19th, 2009

The venerable (and still exceedingly useful) top tool is great for seeing who is consuming all your CPU and memory. However, it’s not so good at showing who is eating your disk IO or network bandwidth.

Unsurprisingly, people have run with the top concept and produced a wide range of other tools:

  • iotop, to show the consumption of disk IO (which we’ve previously covered in detail);
  • iftop, for your network;
  • htop, an enhanced top with bargraphs and other “Sysadmin 2.0” features;
  • mytop, for when there are queries that are killing your MySQL server.

All top tools, and worthy of a look.

(sorry, couldn’t resist)

Global connectivity monitoring

Thursday, March 19th, 2009

If you manage a network on the Internet, you are committing to providing connectivity to practically the entire world, while only having direct control over your local connectivity. Worse still, you usually only have good visibility into local network conditions, which makes knowing about (as well as investigating and resolving) connectivity problems from other parts of the world a massive pain.

Clever people on the Internet, though, have already noticed this problem and are here to help. My network tool of the week is traceroute.org, which offers a huge list of publicly available traceroute servers sorted by country. You give one of these traceroute servers an IP address or hostname, and it will show you how it got there from wherever it is. If the utility of that isn’t immediately obvious…

There are also lists of BGP looking glasses and a bunch of other handy info, but just the ability to see a traceroute to your network from Azerbaijan is worth the price of admission and more.

BGP Data Visualisation

Thursday, March 19th, 2009

If you are among the upper echelon of network administrators who happen to have BGP administration within their scope of duties, you probably have access to a lot of interesting, albeit quite verbose, information about the Internet at large. Generally, any network with a BGP configuration accepting a full feed from its upstream will have data on just about every entity connected to the Internet. How much you decide to use that information is up to you.

BGP information from your upstream generally has the following pieces of data within it:

  • a network prefix and length
  • the next hop for the prefix
  • path of AS numbers through which the advertisement has passed (subject to some manipulation)
  • information on the originating routing protocol
  • community settings
  • various other BGP settings

After making a decision based on all of these factors, the border router will insert a route into the routing table to reach the advertised network. After this point the data is essentially unused, aside from optionally being passed on to other border routers. There are so many more possibilities for this data, however: you may use it to diagnose network issues, or you may want to use it to visualise your BGP router’s view of the Internet (which is far more interesting).

Here we will use the Quagga routing suite to provide the BGP data. You are free to use Cisco or other proprietary equipment, but I find that a server running Quagga gives you a lot more flexibility, especially when it comes to getting data out of the BGP system.

BGP table version is 0, local router ID is 202.4.236.8
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
* i3.0.0.0          202.4.236.9                    90      0 4826 703 2914 9304 80 i
*>                  114.31.193.74                  90      0 4826 703 2914 9304 80 i
*                   203.134.70.37                  10      0 9443 2914 9304 80 i
*> 4.0.0.0          114.31.193.74                  90      0 4826 3356 i
* i                 202.4.236.9                    90      0 4826 3356 i
*                   203.134.70.37                  10      0 9443 11867 7018 3356 i
*> 4.0.0.0/9        114.31.193.74                  90      0 4826 3356 i
* i                 202.4.236.9                    90      0 4826 3356 i
*                   203.134.70.37                  10      0 9443 11867 7018 3356 i

...

Total number of prefixes 275034

The above is a small snippet of BGP data, straight from the proverbial horse’s mouth (the BGP router). Immediately we can see that there is a lot of information for us to use – almost 300,000 unique network prefixes with associated paths through various entities identified by their AS numbers. With this path information we can build a visualisation of the entire Internet. It must be stressed though that this “view” of the Internet is only as seen from our network’s point of view and could be vastly different if generated from a different network. Due to the decentralised nature of the Internet, there is not one categorically “authoritative” view of it (even if you took the BGP data from a very well-connected network), but that doesn’t mean that our view is not useful!
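To make the idea concrete, here is a minimal Python sketch of how AS paths become graph edges. It is our own illustration, and `path_to_edges` is a hypothetical helper. The sample AS paths are copied from the table above, and the sketch assumes they have already been extracted as lists of integer AS numbers. Repeated (prepended) ASs collapse into a single node, and each link is stored lowest-AS-first so the graph is undirected.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[int, int]


def path_to_edges(as_path: Iterable[int]) -> Set[Edge]:
    """Turn one AS path into undirected AS-to-AS links.

    Consecutive duplicates (AS path prepends) are skipped, and each link is
    stored lowest-AS-first so that A-B and B-A count as the same edge.
    """
    edges: Set[Edge] = set()
    previous = None
    for asn in as_path:
        if previous is not None and asn != previous:
            edges.add((min(previous, asn), max(previous, asn)))
        previous = asn
    return edges


# AS paths taken from the sample "show ip bgp" output above.
paths = [
    [4826, 703, 2914, 9304, 80],
    [9443, 2914, 9304, 80],
    [4826, 3356],
    [9443, 11867, 7018, 3356],
]

graph: Set[Edge] = set()
for p in paths:
    graph |= path_to_edges(p)
print(len(graph))  # 9 unique links between 9 ASs
```

Because the set de-duplicates links automatically, routes that share part of their path (such as the 2914 to 9304 to 80 tail above) add no extra edges.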

Making the Data Usable

I started out with a copy of the full BGP feed from one of our border routers. Using Quagga’s bgpd, you can dump the entire feed (post-filtering, of course) into a file with the `vtysh` command-line tool:

# vtysh -c 'show ip bgp' > /data/bgp.txt

You will end up with the entire feed in Quagga output format in the file `/data/bgp.txt`, unfortunately not in a well-formatted data structure but in a format we can work with (the format shown in the excerpt above).

From here, we need to pass the file through a little bit of manipulation so that our graphing backend of choice can use it. I hacked up a very quick Perl script which takes the output from “show ip bgp” and attempts to break it down into unique paths between ASs. It strips out unnecessary headers and other text, then goes through each AS path and adds direct links between ASs to a hash table (so doubled-up entries are removed automatically). It spits out the list of paths in a fairly Graphviz-centric format, but it can easily be adjusted to fit the requirements of most other graphing engines.

#!/usr/bin/perl
# Reduce Quagga "show ip bgp" output (read on STDIN) to a list of unique
# AS-to-AS links, printed one "ASa ASb" pair per line.
use strict;
use warnings;

my %aslist;     # unique links, keyed "lowAS:highAS"
my %asnodes;    # every AS seen
my $numpaths = 0;

while (<STDIN>) {
	# Skip the first 5 lines of header data
	next if $. < 6;
	chomp;

	# Skip any blank lines
	next if m/^$/;

	# Skip lines without an AS path (e.g. locally originated routes)
	next if m/ 0 (i|e|\?)$/;
	next unless m/ 0 /;

	# Skip the last summary line
	next if m/^Total number of prefixes/;

	# Grab just the AS path bit (between the weight and the origin code)
	s/^.* 0 (.*) (i|e|\?)$/$1/;
	# Flatten AS sets such as {64512,64513} into space-separated ASNs
	s/[{}]//g;
	s/,/ /g;

	# Turn the AS path string into an array
	my @path = split(' ', $_);
	$numpaths++;

	# Walk the path from the origin AS back towards our upstream,
	# recording the link between each adjacent pair of ASs
	my $current = pop(@path);
	while ( defined(my $next = pop(@path)) ) {
		# Don't include AS path prepends (the same AS repeated)
		next if $current == $next;

		# Add both ASs to our global list of ASs
		$asnodes{$current} = 1;
		$asnodes{$next} = 1;

		# Store each link once, lowest AS first, so we have no duplicates
		if ( $current < $next ) { $aslist{"$current:$next"} = 1; }
		else { $aslist{"$next:$current"} = 1; }
		$current = $next;
	}
}

# Print each unique link as "ASa ASb"
foreach my $key (sort keys %aslist) {
	(my $pair = $key) =~ s/:/ /;
	print "$pair\n";
}

# Summary on STDERR so it doesn't pollute the edge list
printf STDERR "%d paths, %d ASs, %d links\n",
	$numpaths, scalar(keys %asnodes), scalar(keys %aslist);
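The script’s output (one “ASa ASb” pair per line) still needs wrapping before Graphviz itself will accept it. The following Python sketch is one way to do that. `read_edge_list` and `edges_to_dot` are hypothetical helpers, and the sketch assumes the edge list fits comfortably in memory. It writes an undirected DOT graph using the `--` edge operator.

```python
from typing import Iterable, List, Tuple


def read_edge_list(text: str) -> List[Tuple[str, str]]:
    """Parse the Perl script's output: one 'ASa ASb' pair per line."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 2:  # ignore blank or malformed lines
            pairs.append((fields[0], fields[1]))
    return pairs


def edges_to_dot(edges: Iterable[Tuple[str, str]], name: str = "bgp") -> str:
    """Render undirected AS links as a Graphviz DOT document."""
    lines = [f"graph {name} {{"]
    for a, b in sorted(edges):
        lines.append(f'  "AS{a}" -- "AS{b}";')
    lines.append("}")
    return "\n".join(lines) + "\n"


sample = "2914 9304\n80 9304\n"
print(edges_to_dot(read_edge_list(sample)))
```

Save the result as, say, bgp.dot and hand it to one of Graphviz’s layout programs (for example `sfdp -Tpng bgp.dot -o bgp.png`). Bear in mind the memory problems described in the Graphviz notes.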

Graphing Engines

This blog post was originally going to be a full-fledged wiki article. I thought it was a nifty idea I could knock over in a day or so, but it turns out that graphing problems can be really, really hard. Who would have thought? So I spent a couple of days on this but didn’t end up getting the pretty yet functional graphs I had hoped for. I also stupidly neglected to take screenshots. In most cases, though, the graphing engines were already churning away on my computer because of the complexity of the data, and taking screenshots as well would only have added insult to injury for my little workstation.

But all that is by the by. If this blog post has piqued your interest in BGP data graphing at all, you’ll hopefully find my summary of a few of the better graphing engines below useful in some way. None of them suited my requirements perfectly, but at the very least it is a starting point for your own work.

  • Graphviz
    • A very popular and flexible graphing library.
    • With the number of nodes and paths in this graph, it consumed too much memory and processing time to be effective.
  • Large Graph Layout (LGL)
    • Very good at handling large graphs, not picky about directed/undirected graphs, and has a very simple input format.
    • Uses a separate Java frontend for 2D visualisation after building its metadata files, and produces VRML output for 3D visualisation (you must provide your own VRML frontend).
    • I found the 2D visualisation satisfactory but not very useful for this type of data. I haven’t had much success with VRML viewers on a 3D graph of this size.
  • Walrus
    • Written entirely in Java, handling both parsing and visualisation.
    • Requires the Java3D library.
    • Only accepts directed graphs and has a fairly strict input syntax.
  • Nodes3D
    • Takes relatively simple Lua files as input.
    • Quite flexible, and uses standard OpenGL libraries to perform the graphing.
    • Sadly has a hard-coded limit of 2000 nodes, and doesn’t handle more nodes efficiently (with respect to memory allocation) if you alter the limit and recompile.
  • aiSee
    • A commercial program that seems to be quite well-rounded and professional-looking.
    • Sadly only produces 2D visualisations, with a 3D “imitation” via a fish-eye lens effect.
    • It was able to handle my large graph well (without blowing out memory usage), but the resulting visualisation in force-directed mode was not sufficient.
  • Tulip
    • Relies heavily on Qt4. If you are compiling from source, grab a cup of coffee while it completes.
    • Has many visualisation possibilities, and can deal with up to one million elements.
    • The 3D visualisations aren’t really suitable for this type of data.
  • Lanet-Vi
    • Calculations and visualisation rendering are taken care of for you.
    • Has probably the most lenient restrictions on input format.
    • An easy option if you don’t want to spend days/weeks/months researching graphing but would like something quickly.
    • Source code for local calculation is also available.

Other Resources

If you are interested in graphs, the following will probably be interesting to you:
