Author Archive

When HA won’t play the way you want it to

Tuesday, September 8th, 2009

In an ideal world every service would support High Availability and Load Balancing, would scale up easily and cleanly and all of us systems administrators would be paid bucketloads to play golf all day while the computers did all the hard work. To quote Dylan Moran of Black Books fame, “Don’t make me laugh…bitterly”.

I’ll cut to the chase – sometimes you have to really shoehorn technologies to do what you want. Fortunately I love doing this, and the technologies of today’s article are virtualised Windows 2008 on Xen, and Oracle XE 10g. Neither likes to play ball, for a few reasons:

  • Generally speaking, when you virtualise an OS you want to have para-virtualisation drivers enhancing the hardware support. Open Source Xen has PV drivers, but they are not signed with a legitimate certificate. Windows 2008 does not play nicely with unsigned or test-cert-signed drivers.
  • Oracle is just a messy, messy, nasty thing. Yes, paid versions undoubtedly support all manner of load balancing and HA options, but the free one does not.

Adding HA to Windows 2008 on Xen

The basic procedure was as follows:

  • Install the telnet server within Windows (making sure to lock it down in the firewall to only be accessible by the host machines)
  • Create a special admin account and password used for triggering a shutdown
  • Create an Expect script which logs into the VM via telnet, and issues the shutdown command
  • Create a modified version of the Heartbeat Xen resource agent which calls the expect script to shut down the VM (and waits a safe period of time) before “xm shutdown” is called. Without this, “xm shutdown” will simply power off the VM (in the absence of working PV drivers).

The VM was already running on a DRBD volume between the two HA Xen servers, so I was able to just create a standard set of Heartbeat resources to control DRBD primary/secondary mode and the startup/shutdown of the HA Windows VM. For your benefit (if you want to recreate it) here is the expect script:

#!/usr/bin/expect -f
#
# Script which "automates" shutting down a Windows VM

# Don't log telnet output and commands to stdout, and set a reasonable timeout.
log_user 0
set timeout 3

# Log in via telnet and issue commands. Fairly straightforward.
spawn -noecho /usr/bin/telnet 192.168.1.1
sleep 0.5

# login as the "shutdown" user
expect {
 -re "login: $" {send "shutdown\r"}
 timeout exit
}
sleep 0.5
expect {
 -re "password: $" {send "mysecretpassword\r"}
 timeout exit
}
sleep 0.5
expect {
 -re ">$" {send "shutdown /s /t 0\r"}
 timeout exit
}
sleep 0.1
expect {
 -re ">$" {send "exit\r"}
 timeout exit
}
exit

The rest is fairly self-explanatory if you understand Heartbeat.
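
If Heartbeat is less familiar, the stop logic described above could be sketched roughly as follows. This is a minimal illustration rather than the actual modified resource agent: the domain name win2008, the script path /usr/local/sbin/win-shutdown.exp, the 120 second wait and the function names are all assumptions. Instead of a fixed pause it polls “xm list”, so it can stop waiting as soon as the domain has gone.

```shell
#!/bin/sh
# Sketch of a "stop" action: ask Windows to shut down, wait, then fall back to xm.
# Every name and value below is hypothetical; adjust for your own cluster.
DOMAIN="${DOMAIN:-win2008}"                                         # Xen domain name (assumed)
EXPECT_SCRIPT="${EXPECT_SCRIPT:-/usr/local/sbin/win-shutdown.exp}"  # the expect script above (assumed path)
WAIT_SECS="${WAIT_SECS:-120}"                                       # the "safe period" (assumed value)
XM="${XM:-xm}"                                                      # overridable so the logic can be dry-run

domain_running() {
    # "xm list <domain>" exits non-zero once the domain no longer exists
    "$XM" list "$DOMAIN" >/dev/null 2>&1
}

stop_windows_vm() {
    # Step 1: request a clean shutdown from inside the guest via telnet
    "$EXPECT_SCRIPT"

    # Step 2: wait up to WAIT_SECS for the domain to disappear
    waited=0
    while domain_running && [ "$waited" -lt "$WAIT_SECS" ]; do
        sleep 1
        waited=$((waited + 1))
    done

    # Step 3: only now hand over to xm if the guest is still there
    if domain_running; then
        "$XM" shutdown "$DOMAIN"
    fi
    return 0
}

# Only act when invoked as "<script> stop", as a resource agent would be
if [ "${1:-}" = "stop" ]; then
    stop_windows_vm
fi
```

The point of the ordering is that “xm shutdown” is never the first thing the guest sees, so Windows gets the chance to flush its disks before anything resembling a power-off happens.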

Oracle XE 10g

This was more of a learning process, since usually you just install Oracle and leave it the hell alone. Not so for me.

  • Install Oracle on both nodes using (fortunately) the RPMs they provide
  • Configure Oracle on both nodes including creating the databases, using the same password for SYSDBA
  • Shut down both instances of Oracle
  • Create the DRBD resource, and mount it on the primary node
  • On the primary node, move the contents of /usr/lib/oracle/xe/oradata and /usr/lib/oracle/xe/app/oracle/flash_recovery_area onto the mounted DRBD
  • On the secondary node, delete the aforementioned paths
  • Bind mount the oradata and flash recovery area from the mounted DRBD volume into the correct places in the directory tree.
  • Start Oracle
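
The bind-mount step above might look something like the following. Treat it as a sketch under assumptions: the DRBD mount point /mnt/drbd-oracle and the directory layout on the DRBD volume are hypothetical, and only the two Oracle paths come from the steps above. It prints the commands rather than running them, so you can review them first.

```shell
#!/bin/sh
# Sketch: print the bind-mount commands for Oracle XE's data directories.
DRBD_MOUNT="${DRBD_MOUNT:-/mnt/drbd-oracle}"   # hypothetical DRBD mount point
ORACLE_DIR="/usr/lib/oracle/xe"                # base of the paths listed above

bind_mount_commands() {
    # Each input line is "<directory on DRBD> <original location>"; the layout
    # on the DRBD volume is an assumption.
    while read -r src dst; do
        # mount needs an existing (empty) directory to mount onto
        echo "mkdir -p $dst"
        echo "mount --bind $src $dst"
    done <<EOF
$DRBD_MOUNT/oradata $ORACLE_DIR/oradata
$DRBD_MOUNT/flash_recovery_area $ORACLE_DIR/app/oracle/flash_recovery_area
EOF
}

# Print the commands for review; pipe the output to "sh" as root to apply them.
bind_mount_commands
```

In the real cluster these mounts are Heartbeat resources rather than a script, so that they follow the DRBD primary from node to node.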

After I had created a Heartbeat resource group containing the DRBD resource, the DRBD filesystem mount, the aforementioned bind mounts and the Oracle service itself, I was quite pleased to see that Oracle plays quite nicely with our shoehorned HA setup. You’ll want to make sure you have a properly fixed Oracle init script though, as the supplied one is fairly bad.

After making Oracle and Windows 2008 work nicely in HA, I’m almost certain any service, no matter how bad, can be shoehorned in a similar way to give you decent availability even when it wasn’t originally designed for it.

AusNOG conference

Tuesday, September 1st, 2009

I was lucky enough to get a free pass to the Australian Network Operators Group conference from one of our upstream providers, so that’s what I’m up to at the start of this week. It is interesting to compare it to my experiences at the several LinuxConfAU conferences I’ve been to. On the whole I can say it is more Enterprisey, far less smelly, and a generally smaller but more focussed conference. Obviously network topics dominate the conference (although there are a number of presentations that border on other areas).

Somewhat confusingly for a sysadmin, they named this conference AusNOG03. They have decided to not use a year-based numbering system nor one that starts at 0 (which would please most of us), and as a kicker have locked themselves into a two-digit Y2K-style bug. Well, it’s only 3 years old, we’ll let that point slide.

Unhealthy snacks ahoy

Typically tasty and unhealthy snacks could be found upon entry – some delightful mini-croissants with ham and cheese. Coffee and tea staples were omnipresent. Apparently there was a large imbibing session last night and most delegates attended.

Conference room

It is being held at the Four Seasons Hotel in Sydney. I have to give them points for style, and functionality. Not only do we have actual stable desks for writing and computing, but there is a power board for every three seats.

Legacy writing equipment, water glass and mints

An array of useful items was at every seat. They clearly recognise that network operators lack social etiquette and have strewn mints far and wide. They are on the tables, they are in the conference bags.

To briefly summarise what I have taken in so far – the Internet is not yet blowing up; network operators and BGP are doing a good job and making the Internet as a whole (which is going from a long stringy network, to a fat wide network) better; Open-Source content delivery networks are on the horizon and may become a reality some time soon.

Server naming schemes, part 5748

Friday, July 3rd, 2009

Unusual, odd and downright disturbing naming schemes for servers have been almost literally done to death already. We at Anchor use a nautical theme, which has proven plentiful and seemingly inexhaustible over the few hundred servers and VPSs that we have under our control. Every now and then though, you come across something altogether new and astounding.

Consider the documentation here which is linked to from this site.

If you read through the PDF you’ll find the gem ZEUSDOGGYDOG. We are clearly dealing with a genius who has combined two distinct themes – Greek mythology (specifically, gods) and modern-era rappers. The namespace is almost inexhaustible (especially considering the current state of popular music)!

I’ve come up with just a few examples:

MARSNOTORIOUSBIG
HADESFLAVAFLAV
JUPITERDRDRE
POSEIDONMETHODMAN
HERMESICET

I’m sure an automated generator based on wordlists would be trivial to implement, and supply you with a bounty of educational yet modern server names.
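
To back up the “trivial” claim, here is one possible sketch of such a generator. The two inline wordlists are just the names used above; the variable and function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: combine every god with every rapper to produce server names.
GODS="ZEUS MARS HADES JUPITER POSEIDON HERMES"
RAPPERS="DOGGYDOG NOTORIOUSBIG FLAVAFLAV DRDRE METHODMAN ICET"

generate_names() {
    # Nested loops give the full cross product, one name per line
    for god in $GODS; do
        for rapper in $RAPPERS; do
            echo "${god}${rapper}"
        done
    done
}

generate_names
```

Six gods and six rappers already yield 36 hostnames, and reading the two lists from wordlist files instead would scale that up as far as your rack space allows.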

It’s fricken cold in here Mr Bigglesworth

Thursday, June 11th, 2009

If you’re in Sydney then you’ll be acutely aware of the extremes of temperature we are currently experiencing – and it’s only a few days after the start of Winter! Anchor is a carbon-neutral company and likes to promote environment-saving measures wherever possible so in our budget we allow for the purchase of thermal undergarments for each employee. A warm employee is a happy employee!

By dressing smart for work, we don’t have to unnecessarily make use of heating systems, cutting down on electricity use and ultimately helping the environment. Plus, you look like a real coding ninja when typing at your workstation with fingerless gloves :)

If your place of employment doesn’t provide for something similar, ask your boss about it. Or if you’re an adept systems administrator, you might consider switching teams and coming to work for us! ;)

Keeping your finger on the pulse of your network

Wednesday, June 3rd, 2009

In the past I’ve written about a few ways you can set up decent IP traffic accounting on your network. If you have already set this up and are champing at the bit for more ways you can increase situational awareness of your network state, you can try one of the following tools related to pmacct:

These tools allow you to graph and/or analyse your traffic data in a variety of ways. If you are currently using one or more of these, drop us a comment and let us know your success or failure stories!

Testing your connectivity

Thursday, May 21st, 2009

Recently I blogged about our new IPv4 address allocation. We don’t need to start using it for a while (we have been conserving IP addresses quite well, and gave ourselves plenty of lead time), but it is a good idea to check now that it is reachable from the Internet at large.

Our new allocation is from the block 110.0.0.0/8, which was only allocated to the Asia-Pacific regional registry APNIC last November. Prior to being allocated to APNIC, it would have been in a state affectionately known as “bogon” to network administrators. Bogons are network ranges that aren’t in use, and therefore can be safely ignored by all live networks on the Internet. There have been cases where spammers or other parties looking to conduct illegal activity on the Internet have attempted to use unallocated network ranges, so most knowledgeable network administrators will block all bogon networks. There are several projects such as the Team Cymru Bogon Reference which put together lists of current bogons to aid network administrators in this task.

The problem comes at the time of removing these bogons from the list. There are currently over 30000 active ASs (Autonomous Systems) on the IPv4 Internet, and effectively each of these must update its own bogon list (unless it peers with an automatic service such as the one Team Cymru provides). Not all network administrators are up to date on the IANA allocations, so this process can take months. We are lucky enough to have an allocation from a brand new APNIC range – others are not so lucky and often will have been allocated a range that was previously used by spammers.

Faced with this situation, we’ve decided to try to find out exactly how reachable our new allocation is. I consulted the members of the NANOG mailing list, and pondered their suggestions. I’ve documented below the success of the various methods they suggested and a method which we thought up and decided to try as well.

Do Nothing

One member of the mailing list said:

IMHO, if a network doesn't either update filters based on IANA
notifications or follow Cymru BOGON, then they don't deserve to receive
traffic from your network ;) 

The BOFH within me likes this response very much, but sadly I don’t think that response would be accepted by the boss… I’d also like to take more of an active role in determining our connectivity.

RIPE Debogon Prefix Reachability

http://www.ris.ripe.net/cgi-bin/debogon.cgi

The RIPE regional registry has this page which is effectively just a rudimentary looking glass allowing you to ping or traceroute to your new IP address space. Unfortunately I found dubious results when pinging from some of the routers listed, on all of our address ranges. I suspect not all routers are available or the script behind the page needs updating. If you only have a single address range it would be hard to figure out if results are correct.

RIPE also performs their own testing of de-bogoned address space and graphs the output of their reachability tests. This only really helps you if your allocation has come from RIPE though.

Looking Glasses

Similar to the above method, this involves advertising a small segment of IP space for testing and then using as many public looking glasses on the web as you can find to test connectivity. It is quite thorough, although very time consuming.

Notify network operator groups

One suggestion from the list was simply to post a message on NOG mailing lists and ask the participants to check their filters and optionally attempt to ping the address space in question. This requires participation from the network administrators on the lists, but my main reservation with this method is that if they are knowledgeable enough to subscribe to NANOG or a similar mailing list, they probably take care of their BGP filters anyway so this method probably won’t reveal too many misconfigurations.

Active testing from BGP data

I’m always interested in using BGP data for new and exciting things, so this was a good challenge. From our border routers we can assemble a list of endpoint ASs (as we’re not that interested in strictly transit ASs unless we spot a problem we can pin down to one of them) and pick at least one subnet advertised by each AS. Then we attempt to ping or in some other way communicate with one IP address on each of those subnets. We do this from a working IP on our existing IP allocation range.

We then take the results of that testing and perform the tests again from an IP address on our new allocation range. In theory if the range has been correctly debogoned we will see identical results (even if not all IPs are reachable). If there are discrepancies we can determine which ASs may have bogon-related issues and attempt to contact them.

Reusing most of the BGP dump manipulation script I wrote for my BGP Data Visualisation article, I was able to pull out a list of unique endpoint AS numbers and a subnet for each from our live BGP data within a few seconds. The wc utility tells me that there are 31056 unique AS numbers, which sounds about right based on the recent AS reports. Here is the perl script I used to generate the list of ASs and a subnet for each. You simply pipe the output of the “show ip bgp” command from your router into it and it will print one AS and one target subnet per line:

#!/usr/bin/perl
use strict;
use warnings;

# Maps each endpoint (origin) AS to one subnet it advertises
my %aslist;

while (<STDIN>) {
    # Skip the first 5 lines of header data
    next if $. < 6;
    chomp;

    # Skip any blank lines
    next if m/^$/;

    # Skip lines without an AS path or subnet
    next if m/ 0 (i|e|\?)$/;
    next if m/^... /;
    next unless m/ 0 /;

    # Skip the last summary line
    next if m/^Total number of prefixes/;

    # Grab the subnet and AS path; skip any line that doesn't parse
    next unless s/^...([0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}(\/[0-9]{1,2})?).* 0 (.*) (i|e|\?)$/$1 $3/;
    s/(\{|\})//g;
    s/,/ /g;

    # Turn the "subnet AS AS ..." string into an array
    my @path = split(' ', $_);

    # The router omits the mask on classful networks, so add it back based
    # on the first octet (class A is /8, class B is /16, class C is /24)
    if ($path[0] !~ m{/}) {
        my ($first) = split /\./, $path[0];
        $path[0] .= ($first < 128) ? '/8' : ($first < 192) ? '/16' : '/24';
    }

    # Record the last AS in the path (the origin) against this subnet
    $aslist{$path[-1]} = $path[0];
}

while (my ($as, $subnet) = each %aslist) {
    print "$as $subnet\n";
}

Since this was very much just a proof of concept I didn’t have much motivation to ensure absolute correctness or make the process as efficient as possible. Ideally I’d have the entire thing within the one script/program which intelligently pings multiple hosts in separate threads with some sort of limiting involved. Instead, I hacked up a couple of quick shell scripts; the first takes the list of ASs and subnets and passes them to the second script, which is forked off for each pair. Forking off indiscriminately would lead to the process scheduler having a fit and the machine becoming unresponsive pretty quickly, so there is a quick check to make sure there aren’t more than 250 concurrent pings running before forking off another instance.

#!/bin/bash
# Fork one pingloop per AS/subnet pair, throttled to roughly 250 pings in flight
while read -r as subnet; do
        while [ "$(ps h -C ping | wc -l)" -gt 250 ]; do
                sleep 60
        done
        /data/pingloop "$as" "$subnet" &
done < AS

“AS” is the file with AS/subnet pairs.

#!/bin/bash
# Usage: pingloop AS SUBNET
as=$1
subnet=$2
# Walk the host addresses in the subnet until one answers from our existing range
for ip in $(/usr/bin/ipcalc "$subnet" 255.255.255.255 | grep Hostroute | awk '{print $2}'); do
        if ping -c 5 -i 0.2 -w 5 "$ip" >/dev/null 2>&1; then
                # Repeat the test sourced from an IP on the new allocation (X.X.X.X)
                if ping -I X.X.X.X -c 5 -i 0.2 -w 5 "$ip" >/dev/null 2>&1; then
                        echo "$as $ip reachable" >> output
                else
                        echo "$as $ip bogoned" >> output
                fi
                exit 0
        fi
done

In the above “pingloop” script, we hamfistedly generate a sequence of IPs on the target subnet and attempt to find one reachable IP address then ping it from our new allocation immediately after to see if it is reachable from both subnets.

The results came back in about 10 hours, which isn’t bad for some fairly non-aggressive ICMP reachability testing of effectively the entire IPv4 Internet. Out of 25446 ASs we were able to reach initially, 1716 couldn’t be reached from our new address space which works out to be around 6.7%. Not terrible, but not great either. From here, we’ll look at the ASs that couldn’t be reached and see if there are any patterns that suggest common upstreams need to update their filters.
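
For anyone repeating the exercise, a tally like the one above can be pulled out of the “output” file with a few lines of awk. This is a sketch rather than the exact command I ran, and the function name is made up; it relies only on the “AS IP reachable” and “AS IP bogoned” line format that pingloop writes.

```shell
#!/bin/sh
# Sketch: tally the "output" file written by pingloop.
summarise() {
    # $1: file containing lines of the form "AS IP reachable" or "AS IP bogoned"
    awk '
        $3 == "reachable" { ok++ }
        $3 == "bogoned"   { bad++ }
        END {
            total = ok + bad
            if (total > 0)
                printf "%d of %d ASs unreachable from the new range (%.1f%%)\n",
                       bad, total, 100 * bad / total
        }
    ' "$1"
}

# Only run when the results file is present
if [ -f output ]; then
    summarise output
fi
```

Every line in the file represents an AS that answered from the existing range, so the total is the 25446 figure and the percentage is simply bogoned divided by total: 1716 / 25446 is about 0.067, or 6.7%.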

One disadvantage of this method is raising the ire of network administrators. The amount of ICMP traffic the scripts generate is pretty minimal, but some networks have overly sensitive network monitoring that will trigger if you perform a sequential ICMP “scan” of their network. Of course, it wasn’t performed with malicious intent, so really they have no cause to complain.

On DNS and GeoIP

While network-based bogon lists are the prime concern, you should also consider DNS resolver ACLs and GeoIP data. Many DNS administrators will maintain bogon lists in their configurations and these are probably updated even less frequently than BGP bogon lists. If you run into issues with nameservers on your new IP allocation range, you will know that someone out there hasn’t updated their BIND configuration. Similarly, a lot of web services utilise GeoIP to determine the location of a remote IP. By virtue of the allocation to APNIC, our new range is displayed as being in Australia, but it does not show a city or geographical coordinates. Sending an email to GeoIP with your details can rectify this problem.

The importance of keeping clean log files

Thursday, May 21st, 2009

I was shoulder-surfing a colleague today while they were trying to diagnose a webserver problem for a client. I noticed, certainly not for the first time, that the Apache error log was filled with messages like “robots.txt not found” and “favicon.ico not found”. Surely these must be amongst the most frequently logged errors (if not the top two).

Multiply that by many hundreds of servers, with millions of hits per day, and you have a significant amount of disk space being taken up by these trivial messages. What’s more, any time you spend scrolling through the hordes of messages like these is time taken away from debugging the real problem, if it exists.

So please, be kind to your sysadmin and include a robots.txt and favicon.ico for your website. It makes sense from a search engine point of view and makes your website just a little bit prettier, so why not?
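
If you just want the robots.txt noise gone, something along these lines would do as a starting point. It is a minimal sketch: the function name and the DOCROOT variable are hypothetical, and the file it writes simply allows all crawlers everywhere. A favicon.ico needs an actual icon file, so it is not generated here.

```shell
#!/bin/sh
# Sketch: make sure a document root has a robots.txt.
ensure_robots_txt() {
    # $1: document root. Never overwrite a file the site owner already wrote.
    if [ ! -e "$1/robots.txt" ]; then
        # An empty Disallow line permits crawlers to fetch everything
        printf 'User-agent: *\nDisallow:\n' > "$1/robots.txt"
    fi
}

# DOCROOT is a hypothetical variable; set it to your site's document root
if [ -n "${DOCROOT:-}" ]; then
    ensure_robots_txt "$DOCROOT"
fi
```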

New IPv6 allocation for Anchor

Thursday, May 14th, 2009

As mentioned on this blog a few times before, we’re committed to getting IPv6 happening at Anchor. While the live rollout date is probably still a while away, we have at least begun making some inroads. Today we received our IPv6 allocation from APNIC:

2407:7800::/32

That equates to 2^96 IP addresses (a /32 leaves 128 - 32 = 96 bits for addressing), which is 79228162514264337593543950336 or roughly 7.9 × 10^28. Quite a mind-boggling number. We’ll continue our research, documentation and testing, and will let you know when we are ready to start handing out live addresses. Until then, if you are a customer or would like to be, please let us know you are interested in IPv6, as there are still not too many hosting companies who are using it. Amazing, given IPv4 addresses will run out in a couple of years at best.

New IPv4 allocation for Anchor

Wednesday, May 6th, 2009

Nobody is under any illusions that IPv6 will be close to 100% usage globally any time soon, so despite many entities having firm IPv6 plans or infrastructure already in place, demand for IPv4 is still strong. With that in mind, we’ve just acquired a new allocation from APNIC which will hopefully see us through until IPv6 is dominant on the Internet.

110.173.128.0/19

This allocation is from the 110/8 class A that was allocated to APNIC in November 2008, and represents a tripling of Anchor’s current IPv4 space. We’ll be following our current strict allocation policies to ensure it is the last additional IPv4 allocation we will need, and continuing with our current IPv6 plans as all responsible entities on the Internet should be doing.
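
For a sense of scale, the size of a /19 can be worked out with shell arithmetic. This is a small sketch with a made-up function name: each bit not covered by the prefix doubles the number of addresses.

```shell
#!/bin/sh
# Sketch: number of addresses in an IPv4 prefix of a given length.
ipv4_prefix_size() {
    # $1: prefix length (0-32); 32 - length bits are left for addressing
    echo $(( 1 << (32 - $1) ))
}

ipv4_prefix_size 19
```

A /19 leaves 13 bits, so the new allocation covers 2^13 = 8192 addresses.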

Covering all your bases

Tuesday, May 5th, 2009

No, this isn’t a pre-baseball game pep-talk. If you use configuration management (and you really, really should) then you will understand the need to manage configuration files. That may seem like a ridiculous statement but configuration files come in many shapes and sizes. What may be simple and consistent to one application author may be strange and erroneous to another.

Thus the lowly sysadmin has the unenviable task of herding this heterogeneous group of files into some semblance of order. With Puppet (a great configuration management application that Anchor uses) you have several options:

  • roll out complete static files
  • roll out templates which are filled in with common or possibly per-host settings
  • make file edits

The last option is admittedly much more painful than the first two. Even here you have a few options – perform edits with shell commands such as sed, perl, awk etc or create your own custom types to deal with the specific configuration file format in question.

A better alternative is Augeas. This is (depending on how you want to use it) a shell command, or native binding in a number of common languages and allows you (through the various “lenses”) to edit configuration files in a consistent manner no matter what the format of that file is. Since version 0.24.7, Puppet has had support for the Augeas resource type so you can conveniently edit files using Augeas through your existing Puppet manifests.

Line-editing configuration files is not going away anytime soon sadly, so Augeas is a great tool to bridge the gap when complete file roll-outs and templates just can’t do the job you want.
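
To give a flavour of what an Augeas edit looks like from the shell, here is a small sketch using augtool, the command-line front end. The wrapper function and the AUGTOOL variable are my own invention for illustration, and the sshd_config setting is just an example; the tree path under /files mirrors the file's path on disk.

```shell
#!/bin/sh
# Sketch: set a single value in a config file through Augeas.
AUGTOOL="${AUGTOOL:-augtool}"   # overridable, e.g. for a dry run

augeas_set() {
    # $1: config file path, $2: setting name, $3: new value
    # "-s" asks augtool to save the change after running the command
    "$AUGTOOL" -s set "/files$1/$2" "$3"
}

# Example (not run here): disable root logins over SSH
# augeas_set /etc/ssh/sshd_config PermitRootLogin no
```

The appeal is that the same “set this path to this value” idea applies whatever the underlying file format is, as long as a lens exists for it.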
