Wireless IP KVM mk II

Published April 28th, 2009 by oliver

If you have been following this blog for a while you might have seen my previous article on the portable, wireless IP KVM that we constructed a while back for datacentre use. It has proven to be an invaluable tool for remotely accessing machines instantly; so invaluable, in fact, that contention for its use frequently causes consternation. When I completed the last device I made a list of ways it could be improved in a future revision, so when I decided we needed a new one I thought I’d take care of some of those improvements.

To refresh your memory:

  • remove covers of internal components to reduce space requirements and improve cooling
  • align the wireless antennae in the middle of the case so cables from the wireless bridge are hidden
  • reuse PCB mount points to screw the boards into the kit box, or use some sort of glue
  • install a unified power supply for the wireless bridge and IPKVM

As you’ll see below, I have taken care of all of these concerns and more.

Wireless Bridge with bottom removed


As soon as the wireless bridge and IPKVM hit my desk, the screwdrivers came out. I set about taking the cases off the bridge and the IPKVM to see what the circuit boards were like.

Wireless bridge circuit board exposed


As you can see, Linksys just use a miniPCI wireless card on the bridge and a couple of standard antenna cables. I wondered what might be under the copper shielding but I didn’t want to destroy anything too early on in the process so I had to suppress my curiosity.

IPKVM circuit board


The IPKVM circuit board is quite neat and tidy. A perfect candidate for removing the casing! At this point I inspected the boards to see what kind of power supply I’d need. The IPKVM takes 5v but the wireless bridge takes 12v. Jaycar had some nice small switched AC/DC power supplies which can output up to 25W at 12v. This is fine for the bridge but I’d need some sort of regulator for the IPKVM.

I started looking at voltage regulator components such as an LM7805, but most would only do up to 1A and the IPKVM is rated to 2A. It is of course possible to build a 2A regulator using two LM7805s and some supporting components but that would mean building a circuit board, finding a good circuit design and spending a reasonable amount on the components – LM7805s aren’t cheap! I decided to find an alternate solution.

5v UBEC


Hobbyists frequently pack a lot of batteries into their remote control cars and planes and use tiny, lightweight voltage regulators (UBECs) to bring the battery voltage, 14.4V for example, down to the 5V which the receiver and servos require. These are easily found on eBay for less than $10 and do the job perfectly, so I ordered one and pretty soon it was on my desk. Above you can see the size comparison with a standard coin.

IEC power cable


After a trip to Jaycar, I had most of the remaining components, including an IEC power socket and 3-wire AC cable. With this connected to the power supply, we can easily connect the Wireless IPKVM straight to our APC power rails in the datacentre.

25W Power Supply


Here you can see the 12v 25W power supply. It is quite small due to it being a switch-mode design, and better yet has a cool little LED showing it is on. Sadly you can’t see this from outside the IPKVM.

Testing 12v


It’s always good to cover all your bases when constructing like this, so I made sure to break out the old multimeter and check all the voltages. 12.2v unloaded voltage out of the PSU here is pretty good.

Cool LED


Note the awesome power LED on the PSU. The input/output terminals are the easy-to-use screw-down type. There also appears to be some sort of fine-tuning adjustable cap on the board which might change the output voltage but I wasn’t keen on making any changes.

Testing the UBEC


Now it was time to test the output from the 5v regulator. Yes, right on 5.3v. I was pleasantly surprised by the glowing red LED that comes as a complimentary extra on the UBEC. The UBEC actually heats up a little bit despite being switch mode. Supposedly it gets up to 92% efficiency, which is great.
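To put the regulator choice in perspective, here is a rough back-of-the-envelope comparison in Python. It assumes the IPKVM really draws its full rated 2A and that the UBEC achieves its claimed 92% efficiency at that load; neither figure was measured, and the function names are just for illustration.

```python
# Rough heat comparison for dropping 12 V to 5 V at the IPKVM's rated 2 A.
# Assumptions: full rated load, and the UBEC's *claimed* 92% efficiency.

V_IN = 12.0        # PSU output (volts)
V_OUT = 5.0        # IPKVM supply (volts)
I_LOAD = 2.0       # IPKVM rated current (amps)
EFFICIENCY = 0.92  # claimed switch-mode efficiency, not a measurement


def linear_loss(v_in, v_out, i_load):
    """A linear regulator burns the whole voltage drop as heat."""
    return (v_in - v_out) * i_load


def switching_loss(v_out, i_load, efficiency):
    """A switching regulator loses only the inefficient part of its input power."""
    p_out = v_out * i_load
    return p_out / efficiency - p_out


print(f"linear regulator heat:    {linear_loss(V_IN, V_OUT, I_LOAD):.1f} W")
print(f"switching regulator heat: {switching_loss(V_OUT, I_LOAD, EFFICIENCY):.2f} W")
```

Under those assumptions a linear design would have to shed roughly sixteen times as much heat inside a sealed kit box, which is a good argument for the switch-mode UBEC even before counting the cost of the parts.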

DC connectors


Voltage testing done, I prepared the 12v and 5v outputs with DC connectors suitable for the wireless bridge and IPKVM.

My neat testbench


Here on my very neat testbench I have connected up the wireless bridge and IPKVM to the 12v and 5v outputs respectively. Lights are on! Everything seemed to be good at this point so I configured the bridge and IPKVM with final deployment settings, updated our documentation and started planning for building it all into a kit box.

Test fitout of the kit box


Another trip to Jaycar later, I had my kit box. The form factor is a bit different to the mk I Wireless IPKVM – whereas the first incarnation was long, wide and flat, this one is a lot taller but narrower. Since we are stacking the boards and other components there will be less open space but hopefully the overall result will be neater and more compact. Above is the test placement of the PSU and IPKVM board. The IEC power connector nicely sits next to the PSU.

Fans!


The project generated a bit of interest around the office as it started to take shape. One coworker offered the use of his Dremel to cut the requisite holes in the box for connectors, and another donated a couple of small 12v cooling fans so the components would get active cooling rather than relying on passive cooling alone. This is probably a necessity, as the PSU and UBEC heat up a reasonable amount under load. I am assuming the IPKVM and wireless bridge are no different.

Fan and power


I installed two fans both operating in the same direction so airflow would go through the box. It is completely sealed apart from these openings so having the fans both blowing or both sucking would be fairly disastrous. You can see that the IEC power socket sits nice and flush with the box wall. I’ve screwed it in, but also added copious amounts of hot glue :)

Starting the glue-in


At this point I’ve got the power supply glued down, the fans in and the power connector secured. The IPKVM is glued down at the front and underneath, with a small section of cardboard to keep it leveled and some more at the back to brace it against the side of the box. This time around I crimped RJ45 connectors on a small length of Cat5 cut to measure so I didn’t have a huge amount of cable inside the box. Everything is ready for the wireless bridge!

Concealed antennae


As per my improvements list, I wanted to have the antennae concealed and this is just what I did. A bit of hot glue here and there and we have the antennae secured to the lid of the kit box, waiting for the mini-N connectors to be screwed in.

Wireless bridge installed


I was able to cut a few notches in the side posts of the kit box which allowed the wireless bridge to slot in nicely. I hadn’t even planned for this so it was a nice surprise. A bit more hot glue, and it is secure in place.

Blinkenlights


It would be a crime against humanity to not make use of blinkenlights wherever possible, so I drilled a few holes in the front of the box to allow the wireless bridge LEDs to poke through. Largely useless, since nobody will be watching it, but this is an important feature nonetheless.

Important warning


Adorned with the official Blinkenlights warning, the Wireless IPKVM is now ready for use!

Finished product


With cables


So there you have it. This project was definitely a success and there are already calls for me to rebuild the original Wireless IPKVM in a box like this one. I can’t think of any major improvements I’d like to make next time around, except perhaps for a transparent kit box with far more blinkenlights.


Large filesystem “support”

Published April 24th, 2009 by oliver

I’ve written recently on how to handle systems with very large storage subsystems. One would think that as we make our way through 2009 that the supporting tools for such large filesystems are at the top of their game, but as I’ve been playing with 24TB of storage I’ve realised that this is hardly the case:

  • The most commonly used bootloader for Linux systems, GRUB, doesn’t yet have capabilities to boot from GPT partitions (at least not in the stable release)
  • The most commonly used partitioner, fdisk, doesn’t support GPT-partitioned disks (and hence no disk larger than 2TB)
  • GNU parted, which does support GPT, insists on performing all partition resize operations itself (including resizing the contained filesystem). Since it doesn’t yet understand LVM, it can’t resize any partition that contains an LVM PV.
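The 2TB ceiling mentioned above comes from the traditional MBR partition table, which stores sector addresses and counts in 32-bit fields. A quick sketch of the arithmetic in Python, assuming the usual 512-byte sectors (the constant and function names are mine):

```python
SECTOR_SIZE = 512        # bytes; assumes traditional 512-byte sectors
MAX_SECTORS = 2 ** 32    # MBR partition entries use 32-bit sector fields


def mbr_limit_bytes(sector_size=SECTOR_SIZE):
    """Largest size addressable through an MBR partition table."""
    return MAX_SECTORS * sector_size


limit = mbr_limit_bytes()
array = 24 * 10 ** 12    # the 24TB of storage mentioned in this post

print(f"MBR limit: {limit / 2 ** 40:.0f} TiB")
print(f"24TB array fits under MBR: {array <= limit}")
```

Anything larger than that limit needs GPT, which is why the state of GPT support in the bootloader and partitioning tools matters so much here.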

Today I ran into what appears to be a bug in the CentOS 5.3 installation partitioner, which left my 12TB RAID volume partitioned to only 8TB even though I had supplied the --grow parameter in the Kickstart script. Since parted can’t resize LVM partitions, and there don’t appear to be any other tools out there at the moment for GPT partitioning on Linux, I’m left in a less than ideal position.

GNU parted can’t resize the partition because it can’t understand LVM. Fortunately I can just use it to create another partition with the remaining space and add it to the existing LVM volume group but this is really just a hack, and one that disturbs my obsessive-compulsive sysadmin nature. Were it not for the flexibility of LVM, we would be in a bit of a mess.

Sadly, it seems that large filesystem support, which will soon become essential for everyone, is still largely lacking in the tools we rely on.


A delicate balancing act

Published April 24th, 2009 by oliver

In our day to day use of computers, we try to forget as much about the boring, inane things and concentrate on the cool, useful or interesting things (like lolcats). Unfortunately for us, there are quite a few things which are boring and inane, but are also very important.

One of these things is IRQ balancing. Without going into excessive detail, IRQs (interrupt requests) are a mechanism that computer devices use to get the attention of the CPU when they need to do something. It’s like a little call for help, saying they need to get something done. You might find that your network card sends an interrupt when it has received a packet of information, or your hard disk sends an interrupt letting you know that it has some data to send to another part of the computer. Each of these things takes a bit of time away from the CPU’s important tasks (like crunching numbers), so it is something the CPU likes to do as little as possible, just like you don’t like being interrupted by coworkers when concentrating on that difficult Sudoku puzzle.

For those of us fortunate enough to have multi-processor computers, we have an added advantage – we can give responsibility for some IRQs to one processor, and the rest to the other processor(s). This “balancing” of IRQs will ensure that we get the most efficient handling of those interrupts done and save more processing time for real work. It is possible to balance these interrupts by hand, but there is a handy package called IRQBalance that will do it all for us, automatically.

Here is an example of a well balanced system, which does a lot of passing network traffic between its interfaces:

[root@linux ~]# cat /proc/interrupts
           CPU0       CPU1
  0:  217555173  176216279    IO-APIC-edge  timer
  1:          2          1    IO-APIC-edge  i8042
  4:        194          9    IO-APIC-edge  serial
  6:          2          1    IO-APIC-edge  floppy
  8:          1          0    IO-APIC-edge  rtc
  9:          0          1   IO-APIC-level  acpi
 12:          3          2    IO-APIC-edge  i8042
 15:         13         27    IO-APIC-edge  ide1
169:        350         26   IO-APIC-level  uhci_hcd:usb2, uhci_hcd:usb5
177:          0          0   IO-APIC-level  uhci_hcd:usb4
185:          0          2   IO-APIC-level  ehci_hcd:usb1
193:          0          0   IO-APIC-level  uhci_hcd:usb3
201:   18145851     130083   IO-APIC-level  aic79xx
209:          7          8   IO-APIC-level  aic79xx
217: 3543045844         70   IO-APIC-level  eth0
233:       9982 4165226804   IO-APIC-level  eth1
NMI:          0          0
LOC:  393789923  393790625
ERR:          0
MIS:          0

The most interesting parts of this listing are the lines referencing the network interfaces, eth0 and eth1. You can see from the cumulative IRQ counts that almost all of eth0’s interrupts are handled by CPU0 and almost all of eth1’s by CPU1, so the load from the two busy interfaces is split evenly between the two CPUs, thanks to irqbalance.
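If you would rather not eyeball the counters, a few lines of Python can summarise them. This is only a sketch: it assumes the column layout shown in the listing above (one counter per CPU, then the interrupt type, then the device name), and the function names are my own.

```python
SAMPLE = """\
           CPU0       CPU1
  0:  217555173  176216279    IO-APIC-edge  timer
217: 3543045844         70   IO-APIC-level  eth0
233:       9982 4165226804   IO-APIC-level  eth1
NMI:          0          0
ERR:          0
"""


def parse_interrupts(text):
    """Map each device (or row label) to its list of per-CPU interrupt counts."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())          # header row names one column per CPU
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        counts = [int(f) for f in fields[:ncpu] if f.isdigit()]
        if len(counts) != ncpu:
            continue                      # e.g. ERR/MIS rows carry a single counter
        # Skip the interrupt type column; what follows is the device name.
        name = " ".join(fields[ncpu + 1:]) or label.strip()
        table[name] = counts
    return table


def cpu_shares(counts):
    """Fraction of an interrupt's total that each CPU handled."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


table = parse_interrupts(SAMPLE)
for dev in ("eth0", "eth1"):
    shares = cpu_shares(table[dev])
    print(dev, ", ".join(f"CPU{i}: {s:.1%}" for i, s in enumerate(shares)))
```

On a live machine you would feed it the contents of /proc/interrupts instead of the SAMPLE string. Note that devices sharing a name (the two aic79xx rows above, for instance) would overwrite each other in this simple dictionary.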

So how does this interact with multi-core or hyperthreading CPUs? The most important thing to consider when balancing your IRQs is not the number of logical CPUs, but the number of cache domains available. The reasons behind this are quite technical but suffice it to say that you want to be balancing your IRQs over discrete cache domains.

This decision is automatically made for you by irqbalance, as you can see in this code snippet:

	/* On single core UP systems irqbalance obviously has no work to do */
	if (core_count<2)
		exit(EXIT_SUCCESS);
	/* On dual core/hyperthreading shared cache systems just do a one shot setup */
	if (cache_domain_count==1)
		one_shot_mode = 1;

Here, one_shot_mode means irqbalance will run once, balance the IRQs then exit and not continue to rebalance periodically. I ran into this problem when diagnosing a configuration management issue. We had some new Intel Core2Duo servers, and even though they had two cores per CPU, which for most purposes serves as a multi-processor environment, it was not enough for irqbalance. Configuration management would see the multi-core CPU and ensure the irqbalance service was running, but it would exit after the “one shot” run. Thus, on the next configuration management run, it would be started again, ad infinitum.

A similar situation arose recently, where I was performing some large data copies between machines over a network link. We were hitting disk and network limits as the storage subsystems were quite fast but I wanted to squeeze every drop of performance out of the machines. I realised I might be hitting IRQ saturation on one CPU if the RAID card and network interface were sending their interrupts to the same CPU. Indeed they were, and I rebalanced them by hand (check out /proc/irq/).
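For anyone who hasn't done this by hand before: each IRQ has a file at /proc/irq/<number>/smp_affinity containing a hexadecimal bitmask of the CPUs allowed to service it (bit 0 is CPU0, bit 1 is CPU1, and so on). The sketch below builds such a mask in Python; the helper names are hypothetical, and the IRQ numbers in the usage note are borrowed from the earlier listing purely as an illustration, since they differ from machine to machine.

```python
def affinity_mask(cpus):
    """Return the hex bitmask string for a list of CPU numbers."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu      # bit N set means CPU N may handle the IRQ
    return format(mask, "x")


def set_irq_affinity(irq, cpus, proc_root="/proc/irq"):
    """Write the mask for one IRQ. Needs root, so it is not called here."""
    with open(f"{proc_root}/{irq}/smp_affinity", "w") as f:
        f.write(affinity_mask(cpus))


print("CPU0 only:  ", affinity_mask([0]))
print("CPU1 only:  ", affinity_mask([1]))
print("CPU0 + CPU1:", affinity_mask([0, 1]))
```

As root, something like set_irq_affinity(201, [0]) followed by set_irq_affinity(217, [1]) would pin the RAID controller and the network card to different CPUs, assuming those were the right IRQ numbers on the machine in question. Keep in mind that a running irqbalance daemon may later move them again.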

Lo and behold, the expected performance difference was not observed. I remembered my prior experience with the Core2Duo processors and irqbalance, which seemed to explain the result. It’s something that not many people think about, since it is quite a boring and inane subject, but nonetheless important to the efficient operation of the computer. Hopefully you’ve learned a little about it from reading this article!


Pain-free server migration

Published April 9th, 2009 by oliver

Being the veteran of a datacentre migration and several whole server migrations I feel like I’m getting the process down to a reasonably fine art. I had to perform another migration last night from another datacentre to ours at Global Switch and the process went very smoothly so I thought I’d share some of the techniques I’ve built up over time so you might benefit if you’re in the same situation.

Preparation

This should go without saying. The more time you have to prepare for the migration, the better. You do not want to leave it until the last minute. My philosophy when approaching the migration is always to leave the least amount of work possible to do at the time of the actual migration. Clients will generally want to schedule any server downtime for late at night, when you are not going to be operating at your best (despite how many coffees or energy drinks you may have consumed). If you can log in to the machine and run a prepared script which takes care of everything and completes the migration for you, you will end up with a happy client and be happy yourself. You will be in the datacentre for less time and get to bed earlier, both of which are good things.

Make good use of scripting

Following on from my last point, I strongly encourage you to script as much as possible. The migration I just performed entailed moving a server from one datacentre and network provider to another which meant a change in address space. Thus, firewalls, IP address configuration files, Apache vhosts, ACLs and more had to change. Ahead of time I determined which files would need to be modified and created a script which took a backup of each of these files before overwriting them with corrected versions. Any failure would cause the script to stop and print the problem which could be easily diagnosed manually.
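To give a flavour of the approach, here is a minimal Python sketch of the "back up, then overwrite, stop on any failure" pattern. It is not the script used for this migration; the function names, the backup naming scheme and the file names in the demo are all made up for illustration.

```python
import shutil
import sys
import tempfile
from pathlib import Path


def backup_name(target):
    """Flatten a path into one file name so same-named files don't collide."""
    return str(target).strip("/").replace("/", "_") + ".orig"


def replace_with_backup(target, replacement, backup_dir):
    """Copy target into backup_dir, then overwrite it with replacement."""
    target, replacement, backup_dir = Path(target), Path(replacement), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup = backup_dir / backup_name(target)
    shutil.copy2(target, backup)        # raises OSError if the backup fails
    shutil.copy2(replacement, target)   # only reached once the backup exists
    return backup


def migrate(changes, backup_dir):
    """Apply (target, replacement) pairs in order, stopping at the first failure."""
    for target, replacement in changes:
        try:
            backup = replace_with_backup(target, replacement, backup_dir)
        except OSError as exc:
            sys.exit(f"stopping: could not update {target}: {exc}")
        print(f"updated {target} (backup: {backup})")


if __name__ == "__main__":
    # Demonstrate on throwaway files rather than real configuration.
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        live, new = tmp / "ifcfg-eth0", tmp / "ifcfg-eth0.new"
        live.write_text("IPADDR=192.0.2.10\n")
        new.write_text("IPADDR=198.51.100.10\n")
        migrate([(live, new)], tmp / "backups")
        print(live.read_text(), end="")
```

Because the backup is always written before the live file is touched, stopping at the first error leaves you with either the original file or the new file plus its backup, never a half-finished state you have to puzzle out at 3am.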

The more automation and failsafes you can build into your script, the better. Since you will be creating it with plenty of time up your sleeve and your brain operating at full capacity you can build up the script with your full arsenal of tricks. At 3am in a cold datacentre with noisy airconditioning you can hardly expect to have your full faculties with you, so make life easier for yourself by leaving as little actual work to do at this point.

Fully acquaint yourself with the server

You will only know what needs to be changed on the server if you are familiar with it. Of course, you should have plenty of good documentation already on it but if not, log in and get the lie of the land. Have a plan for how you will find out facts about the system – make use of grep and well structured regexes for finding out configuration details, slocate (if there is a locate database present) for finding critical files, and your usual toolkit of sysadmin techniques.

Document as you go

At Anchor, documentation is critically important. We have an internal wiki system in which we make detailed notes on every server and a great number of technical articles (a lot of which we have shared with you in our public wiki). Every migration plan is carefully documented from start to finish. In more complicated scenarios a full change proposal is created and officially ratified, but at the very least you should create a checklist:

  • people involved (and their contact details, if necessary)
  • time frame
  • a detailed list of items that need to be prepared or information that needs to be acquired before the migration takes place
  • actions that will be undertaken just before the migration starts
  • the list of actual migration steps, including details of what any scripts will be doing
  • post-migration actions which need to be done immediately after the migration – e.g. checking that all your monitoring is showing OK for all hosts and services
  • a list of “cleanup” items which can be completed after the migration, but not time critical, e.g. removing stale references to servers from your internal documentation

Have as many people check over your documentation as possible, preferably those who have knowledge of the systems so that they can find anything you have missed. The more eyes on your documentation, and heads thinking about it, the better the chances that you will have a plan that covers all aspects.

One of the most important things from my point of view with documentation is to forward a copy to the client, and keep them involved in the process. Not only does it give them confidence in your ability to conduct the migration successfully, but it gives them an idea of the work that you have had to put in, gives transparency to the process and gives you another point of view on the migration. There may be other steps important to them which you have missed, for example lowering TTLs on domains that are solely client-controlled.

Keep the client “in the loop”

Following on from my previous point, as well as giving the client a copy of your migration documentation, it is important to let them know what is going on. Send them a courtesy email every day or two, a call or whatever you deem appropriate to let them know how you are going with preparations and any information you need from them.

On the day of the migration, double-check everything with them – times, contact details, the migration plan, and so on. Make sure they are still happy to go ahead and that they are happy with your plans. Give them a courtesy call or message when you are about to start the migration, when you are finishing, but most importantly whenever you have any unexpected problems. Nothing upsets clients more than having things go pear-shaped and not being informed about it. Even if you don’t know what the problem is, let them know that you are diligently working on it and will keep them up to date with developments.

Plan for when things go wrong

In a perfect world, you would prepare adequately and everything would go flawlessly (as it did for me last night, luckily). However every slightly obsessive-compulsive systems administrator knows that things can and will go wrong every now and then despite your best efforts.

Make an escape plan for every point where things can go wrong during the migration. Given you won’t have infinite time available, prepare most for the most likely failure scenarios. Make a rollback plan which will abort the migration, and decide how many failures will cause you to take this rollback plan on the night. Confirm this with the client.

Make sure that every change you make can be reverted (which most of the time will necessitate backups). There is nothing worse than discovering you have irrevocably destroyed data in the process of making a critical change.

Approach everything with an obsessive-compulsive attitude

The best plans will have considered everything and left no detail to chance. It can be tiresome to be painstakingly thorough in your plans, but ultimately it will pay off. At the same time though, you don’t have to do everything in one sitting – make notes in your migration plan on what you still need to do and follow it up later. Don’t foolishly believe you will remember everything on the migration day, or even an hour from now – WRITE IT DOWN!

Remember, even though the preparation may be slightly tiresome, you are just making life easier for yourself at migration time. Hopefully if you follow these general tips I’ve prepared, they will make your next migration a lot easier.


Standards? Who needs standards?

Published April 6th, 2009 by oliver

Anyone in the sysadmin or developer worlds will know many examples of flagrant violations of standards in the IT world. Some are perpetrated by our coworkers, but a surprisingly high number are perpetrated by vendors. Not all of them are by Microsoft, either!

One big win for systems administration at Anchor is our use of APC Rack Power Distribution Units. These have been documented elsewhere in our blog and wiki but suffice it to say that having remote control over your power ports is a Very Good Thing. Situations where you have servers or other devices with multiple power supply units complicate things slightly, but not that much, especially with the aforementioned Rack PDUs in place.

APCs in particular allow you to configure what are called Multicast Groups. Essentially you tell a couple of the Rack PDUs to talk to each other and share information, and WHAMMO you can turn off and turn on a bunch of ports on separate Rack PDUs simultaneously! So rather than cutting the power to one PSU and then the other, you can power-cycle the ports on both PDUs with a single command.

The confusion comes during the configuration of the Multicast Group option. Multicast is a very under-utilised feature of IPv4 (something which has been partially rectified in IPv6). In fact, a large chunk of the IPv4 address space, 224.0.0.0/4, is allocated to multicast (technically called the Class D space). As with all other portions of IP address space, this has been carefully divided into sections and allocated to various purposes. You can see the full list here:

http://www.iana.org/assignments/multicast-addresses/

Being a good sysadmin I consider standards to be of paramount importance, so naturally I wanted to configure our Rack PDUs with multicast addresses suitable for the purpose. There are many existing references on the Internet for how to pick sane and standards-obeying addresses from the multicast range. However, when attempting to follow standards and good reason, I was confronted with this error message:

Multicast IP Address is out of range. Valid values are 224.0.0.3 - 224.0.0.254.

Uh, what? I was under the impression that the range 224.0.0.0/24 was already heavily allocated to entities and purposes other than APC Rack PDUs! So much for following the standard, APC.
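To make the complaint concrete, here is a small Python check using the standard ipaddress module. The 224.0.0.0/24 block is IANA's Local Network Control Block and 239.0.0.0/8 is the administratively scoped range; picking something from the latter for a private group is my assumption of what a "sane" choice would look like, not anything APC documents. The describe() helper is purely illustrative.

```python
import ipaddress

LOCAL_NETWORK_CONTROL = ipaddress.ip_network("224.0.0.0/24")  # reserved for protocols
ADMIN_SCOPED = ipaddress.ip_network("239.0.0.0/8")            # for private use


def describe(addr):
    """Classify an address against the two multicast blocks of interest."""
    ip = ipaddress.ip_address(addr)
    if not ip.is_multicast:
        return "not multicast"
    if ip in LOCAL_NETWORK_CONTROL:
        return "local network control block"
    if ip in ADMIN_SCOPED:
        return "administratively scoped"
    return "other multicast"


# The two ends of the range the PDU accepts, plus an admin-scoped example.
for addr in ("224.0.0.3", "224.0.0.254", "239.1.2.3"):
    print(f"{addr:<12} {describe(addr)}")
```

Every address the PDU will accept falls inside the block set aside for routing protocols and similar link-local traffic, which is exactly the space a well-behaved application should be staying out of.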


IPv6 Implementors Conference

Published March 19th, 2009 by oliver

I was dropped a quick note by one of the speakers at the IPv6 Implementors Conference which is being hosted by Google – http://sites.google.com/site/ipv6implementors/conference2009/

Sadly I had no idea this conference was on, as it looks like a valuable opportunity to learn about IPv6 and the progress it is making in the wild. I did get a couple of handy tips about how to improve our implementation plan though so not all was lost.

If you are attending this conference, you are more than welcome to leave comments on this blog post with your learnings – or even link to your own site – the more the merrier.


BGP Data Visualisation

Published March 19th, 2009 by oliver

If you are among the upper echelon of network administrators who happen to have BGP administration within their scope of duties you probably have access to a lot of interesting, albeit quite verbose, information about the Internet at large. Generally, any network with a BGP configuration accepting a full feed from their upstream will have data on just about every entity connected to the Internet. How much you decide to use that information is up to you.

BGP information from your upstream generally has the following pieces of data within it:

  • a network prefix and length
  • the next hop for the prefix
  • path of AS numbers through which the advertisement has passed (subject to some manipulation)
  • information on the originating routing protocol
  • community settings
  • various other BGP settings

After making a decision based on all of these factors, the border router will insert a route into the routing table to reach the advertised network and after this point essentially the data is unused (aside from optionally being passed on to other border routers). There are so many more possibilities for this data however – you may use it for diagnosis of network issues or you may want to use it to visualise your BGP router’s view of the Internet (which is far more interesting).

Here we will use the Quagga routing suite to provide the BGP data. You are free to use Cisco or other proprietary equipment but I find that a server running Quagga allows a lot more flexibility, especially in this case of getting data out of the BGP system.

BGP table version is 0, local router ID is 202.4.236.8
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
* i3.0.0.0          202.4.236.9                    90      0 4826 703 2914 9304 80 i
*>                  114.31.193.74                  90      0 4826 703 2914 9304 80 i
*                   203.134.70.37                  10      0 9443 2914 9304 80 i
*> 4.0.0.0          114.31.193.74                  90      0 4826 3356 i
* i                 202.4.236.9                    90      0 4826 3356 i
*                   203.134.70.37                  10      0 9443 11867 7018 3356 i
*> 4.0.0.0/9        114.31.193.74                  90      0 4826 3356 i
* i                 202.4.236.9                    90      0 4826 3356 i
*                   203.134.70.37                  10      0 9443 11867 7018 3356 i

...

Total number of prefixes 275034

The above is a small snippet of BGP data, straight from the proverbial horse’s mouth (the BGP router). Immediately we can see that there is a lot of information for us to use: around 275,000 unique network prefixes with associated paths through various entities identified by their AS numbers. With this path information we can build a visualisation of the entire Internet. It must be stressed though that this “view” of the Internet is only as seen from our network’s point of view and could be vastly different if generated from a different network. Due to the decentralised nature of the Internet, there is not one categorically “authoritative” view of it (even if you took the BGP data from a very well-connected network), but that doesn’t mean that our view is not useful!

Making the Data Usable

I started out with a copy of the full BGP feed from one of our border routers. Using Quagga’s BGPD you can output the entire feed (post-filtering of course) into a file by using the command-line `vtysh` tool:

# vtysh -c 'show ip bgp' > /data/bgp.txt

You will end up with the entire feed in Quagga output format in the file `/data/bgp.txt`, unfortunately not in a well-formatted data structure but in a format we can work with (the format shown in the excerpt above).

From here, we need to pass the file through a little bit of manipulation so that our graphing backend of choice can use it. I hacked up a very quick perl script which takes the output from the “show ip bgp” and attempts to break it down into unique paths between ASs. It strips out unnecessary headers and other text, then goes through each AS path and adds direct links between ASs to a hash table (so we can automatically remove doubled-up entries). It spits out the list of paths in a fairly Graphviz-centric format but can be easily adjusted to fit the requirements of most other graphing engines.

#!/usr/bin/perl

use strict;

my %aslist;         # unique links between ASs, keyed "lower:higher"
my %asnodes;        # every AS seen in any path
my $numpaths = 0;   # number of AS paths processed

while (<STDIN>) {
	# Skip the first 5 lines of header data
	if ($. < 6) { next; }
	chomp;

	# Skip any blank lines
	if ( m/^$/ ) { next; }

	# Skip lines without an AS path
	if ( m/ 0 (i|e|\?)$/ ) { next; }
	unless ( m/ 0 / ) { next; }

	# Skip the last summary line
	if ( m/^Total number of prefixes.*$/ ) { next; }

	# Grab just the AS path bit
	s/^.* 0 (.*) (i|e|\?)$/$1/;
	s/(\{|\})//g;
	s/,/ /g;

	# Turn the AS path string into an array
	my @path = split(' ', $_);
	$numpaths++;

	# Grab the path between each pair of nodes in the array
	my $current = pop(@path);
	while ( my $next = pop(@path) ) {
		# Don't include AS path prepends
		if ( $current == $next ) { next; }

		# Add both ASs to our global list of ASs
		$asnodes{$current} = 1;
		$asnodes{$next} = 1;

		# Add the link between the ASs to the global hash, keyed with the
		# lower AS number first, so we have no duplicates.
		if ( $current < $next ) { $aslist{"$current:$next"} = 1; }
		else { $aslist{"$next:$current"} = 1; }
		$current = $next;
	}
}

# Print each unique link as a space-separated pair of AS numbers
foreach my $key (keys %aslist) {
	$key =~ s/:/ /;
	print "$key\n";
}
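For readers who would rather not read Perl, the core idea (turn each AS path into a set of unique, undirected links, ignoring prepends) can be sketched in a few lines of Python. This is an illustrative re-implementation, not the script used for the article; the function names and the DOT wrapper are my own additions, and the sample paths are taken from the excerpt earlier in the post.

```python
def as_links(path):
    """Return the set of undirected links in one AS path, ignoring prepends."""
    # AS sets appear as {1,2,3}; flatten them into plain hops, as the Perl does.
    cleaned = path.replace("{", " ").replace("}", " ").replace(",", " ")
    hops = [int(asn) for asn in cleaned.split()]
    links = set()
    for a, b in zip(hops, hops[1:]):
        if a != b:                               # a repeated AS is a prepend
            links.add((min(a, b), max(a, b)))    # lower AS first avoids duplicates
    return links


def to_dot(links):
    """Render the links as an undirected Graphviz graph."""
    lines = ["graph bgp {"]
    lines += [f"    {a} -- {b};" for a, b in sorted(links)]
    lines.append("}")
    return "\n".join(lines)


paths = [
    "4826 703 2914 9304 80",
    "9443 2914 9304 80",
    "4826 3356",
    "9443 11867 7018 3356",
]

all_links = set()
for p in paths:
    all_links |= as_links(p)

print(to_dot(all_links))
```

Using a set keyed on the ordered pair does the same job as the hash table in the Perl version: a link seen in thousands of different paths still only appears once in the output, which keeps the graph as small as it can be before it is handed to a layout engine.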

Graphing Engines

This blog post was originally going to be a full-fledged wiki article, but while I originally thought it was a nifty idea I could knock over in a day or so, it turns out that graphing problems can be really, really hard. Who would have thought? So I spent a couple of days on this but didn’t end up getting the pretty yet functional graphs that I had hoped to get. I also stupidly neglected to take screenshots, although with the graphing engines already grinding my little workstation to a halt because of the complexity of the data, taking them would only have added insult to injury.

But all that is by the by. If this blog posting has piqued your interest in BGP data graphing at all, you’ll hopefully find my summary of a few of the better graphing engines below useful in some way. None of them suited my requirements perfectly but at the very least they are a starting point for your own work.

  • Graphviz
    • very popular and flexible graphing library.
    • with the number of nodes and paths in this graph, it consumed too much memory and processing time to be effective
  • Large Graph Layout (LGL)
    • very good at handling large graphs, not picky about directed/undirected and has a very simple input format
    • uses a separate java frontend for 2D visualisation after building its meta-data files, and produces VRML output for 3D visualisation (you must provide your own VRML frontend)
    • I found the 2D visualisation to be satisfactory but not very useful for this type of data. I haven’t had much success with VRML viewers with the 3D graph of this size.
  • Walrus
    • entirely java which handles parsing as well as visualisation
    • requires the Java3D library
    • only accepts directed graphs and has a fairly strict input syntax
  • Nodes3D
    • takes relatively simple Lua files as input
    • quite flexible, and uses standard OpenGL libraries to perform the graphing
    • sadly has a hard-coded limit of 2000 nodes, and doesn’t handle more nodes efficiently (with respect to memory allocation) if you alter the limit and recompile
  • aiSee
    • a commercial program that seems to be quite well-rounded and professional-looking
    • sadly only produces 2D visualisations, with a 3D “imitation” via a fish-eye lens effect
    • it was able to handle my large graph well (not blowing out memory usage) but the resulting visualisation in force-directed mode was not sufficient
  • Tulip
    • Relies heavily on Qt4. If you are compiling from source, grab a cup of coffee while it completes.
    • Has many visualisation possibilities, and can deal with up to one million elements
    • The 3D visualisations aren’t really suitable for this type of data.
  • Lanet-Vi
    • Calculations and visualisation rendering are taken care of for you
    • Has probably the most lenient restrictions on input format
    • An easy option if you don’t want to spend days/weeks/months researching graphing, but would like something quickly
    • source code for local calculation is also available
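
Each of these tools wants its input in its own format, so the "AS AS" pairs printed by the script usually need a small conversion step first. As one example, here is a rough Python sketch that turns those pairs into an undirected graph in Graphviz's DOT language. The function name edges_to_dot and the graph name are placeholders of mine, and the other engines would each need their own equivalent.

```python
def edges_to_dot(lines, name="bgp"):
    """Convert 'AS1 AS2' edge lines into an undirected Graphviz DOT graph."""
    out = ["graph %s {" % name]
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            # Ignore blank or malformed lines rather than guessing at them.
            continue
        out.append('    "%s" -- "%s";' % (parts[0], parts[1]))
    out.append("}")
    return "\n".join(out)


# Hypothetical edge list using AS numbers from the documentation range.
dot_text = edges_to_dot(["64496 64497", "", "64497 64498"])
```

The resulting text can be saved to a file and handed to one of Graphviz's layout programs such as neato, bearing in mind the memory and CPU caveats noted above for a graph of this size.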

Other Resources

If you are interested in graphs, the following will probably be interesting to you:


Filebucketing to the MAXXXXX

Published March 12th, 2009 by oliver

Every now and then we see an example of application failure so astounding it literally brings tears to our eyes. We have a client whose legacy application is unfortunately still running on an ancient version of Oracle Weblogic and which must be maintained until the new, flashy .NET version of their site is complete.

We were alerted this morning to a problem with some of the Weblogic content – the pages were timing out. Diagnostics were fairly fruitless – packet captures showed nothing useful, and the logging from Weblogic left much to be desired. We started considering more outlandish possibilities such as I/O load causing issues, recently applied updates and so on. Even rebooting was considered (given it is running on Windows).

The first clue of note was the open file list from the Weblogic processes – one such example stood out:

C:\weblogic\state\Sa0V\b1gR\O1Ok\WqYN\9kiv\IQT2\SHGx\C3ri\aE1z\L1YH\X5QW\
gdkB\B2PB\pPPw\uHDK\p1a7\I0l5\94sU\kQ43\+533\5517\5738\7484\6253\_-10\
6273\1519\_6_8\888_\8888\_700\2_702_8\888_

For the sake of your screen, I have manually wrapped this Godzilla-like filename.



Perhaps you are familiar with file bucketing already, but if not: typically the directory structure used will have a relatively sane scheme for locating files and only extend a few levels deep. What we saw in this instance was a completely new breed of monster. Admittedly the absolute path of this file is less than 200 characters out of a limit of more than 32,000, but the naming strategy and depth of the structure have us flummoxed.
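
For comparison, here is a minimal sketch of what a "relatively sane" bucketing scheme might look like, written in Python. It is purely illustrative: the function name, the choice of SHA-1 and the two-level layout are my own assumptions, and it has nothing to do with how Weblogic actually lays out its state directory.

```python
import hashlib
import os


def bucket_path(root, filename, levels=2, width=2):
    """Return a bucketed path such as root/ab/cd/filename."""
    # Hash the name so that files spread evenly across the buckets.
    digest = hashlib.sha1(filename.encode("utf-8")).hexdigest()
    # Use the first few hex characters of the digest as directory names.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return os.path.join(root, *parts, filename)


# Hypothetical session file name.
path = bucket_path("state", "session-1234.dat")
```

With two levels of two hex characters each there are at most 256 × 256 = 65,536 buckets, every file sits exactly two directories below the root, and its location can be recomputed from the name alone.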

But this was only the tip of the proverbial iceberg. When we asked Windows to show us the properties of this state folder, it took over an hour to calculate the file and folder totals, and the result is impressive:

Web logic makes efficient use of the filesystem

Yes you read that right – over 10 million nested directories. By this stage we had already moved the state directory out of the way and created a new one, and restarted Weblogic. It seemed happy and quite responsive after that. My suspicion is that someone developing this application at some point ran into a limitation with their filebucketing algorithm, and resolved to solve the problem once and for all, evidently by making it possible to efficiently filebucket every file in the known universe.


Tracing I/O usage on Linux

Published February 19th, 2009 by oliver

I/O subsystems are a whole industry of their own, and many libraries could be (and probably have been) written on the subject already. The particular sub-topic I’m talking about today is when you are faced with a machine that you suspect to be suffering from heavy I/O load, and you want to find the culprit.

Sadly, this is an area where Windows has the upper hand: using the Performance Monitor, you can quite easily determine which process is using the largest chunk of your disk I/O. On Linux things can be a little harder, but not all is lost.

If you are fortunate enough to be experiencing the problem on a machine running at least a 2.6.20 kernel and with Python 2.5 or later available, you can run IOTop. This prints out I/O usage data in a similar format to the standard “top” command, and it looks something like this:

IOTop Output - picture sourced from http://guichaz.free.fr/iotop/iotop_big.png

Sadly at the time I needed to diagnose I/O, the machine I was using had neither a 2.6.20 kernel nor Python 2.5 so I was forced into seeking other methods to trace the I/O. Cue Blktrace. This hooks into the kernel’s debug filesystem to gather I/O stats and presents a fairly raw trace of what’s going on. You can download the source from here or find RPM packages for recent RHEL at the RPMForge Repository.

While it is possible to use blktrace directly, there is also a helper script, btrace, which shortcuts a lot of the most commonly used options and output formatting. You will need to mount debugfs on /sys/kernel/debug, and then you are ready to roll!

root@blarg:~# mount -t debugfs none /sys/kernel/debug
root@blarg:~# btrace /dev/sda
  8,0    0        1     0.000000000  2884  A   W 60060711 + 8 <- (8,1) 60060648
  8,0    0        2     0.000000244  2884  Q   W 60060711 + 8 [kjournald]
  8,0    0        3     0.000005278  2884  G   W 60060711 + 8 [kjournald]
  8,0    0        4     0.000006933  2884  P   N [kjournald]
  8,0    0        5     0.000007515  2884  I   W 60060711 + 8 [kjournald]
  8,0    0        6     0.000010068  2884  A   W 60093063 + 8 <- (8,1) 60093000
  8,0    0        7     0.000010263  2884  Q   W 60093063 + 8 [kjournald]
  8,0    0        8     0.000011588  2884  G   W 60093063 + 8 [kjournald]
  8,0    0        9     0.000012072  2884  I   W 60093063 + 8 [kjournald]

OK so there’s not much I/O happening on my workstation, but on the machine I was diagnosing recently, the output of btrace spewed out hundreds of lines per second, many of them referring to a process running the mutt mail program. It turned out one of the users had approximately 90,000 emails in one folder that mutt was constantly rescanning since the machine didn’t have a recent enough version of mutt to support header caching.
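
When the trace is scrolling past at hundreds of lines per second, a quick tally per process makes the culprit easier to spot. The following Python sketch counts queued ("Q") events per process name. It assumes the default column layout shown in the output above (device, CPU, sequence number, timestamp, PID, action, RWBS, then the request details with the process name in square brackets); the function name and regular expression are my own, not part of blktrace.

```python
import re
from collections import Counter

# device, cpu, sequence, timestamp, pid, action, rwbs, details, [process]
LINE = re.compile(
    r"^\s*\d+,\d+\s+\d+\s+\d+\s+[\d.]+\s+(\d+)\s+(\S+)\s+(\S+)\s.*\[(.+)\]\s*$"
)


def queued_by_process(lines):
    """Count 'Q' (queued) trace events per process name."""
    counts = Counter()
    for line in lines:
        m = LINE.match(line)
        # Lines without a trailing [process], such as remaps, do not match.
        if m and m.group(2) == "Q":
            counts[m.group(4)] += 1
    return counts


# A few lines copied from the btrace output above.
sample = [
    "  8,0    0        1     0.000000000  2884  A   W 60060711 + 8 <- (8,1) 60060648",
    "  8,0    0        2     0.000000244  2884  Q   W 60060711 + 8 [kjournald]",
    "  8,0    0        4     0.000006933  2884  P   N [kjournald]",
    "  8,0    0        7     0.000010263  2884  Q   W 60093063 + 8 [kjournald]",
]
tally = queued_by_process(sample)
```

Passing sys.stdin to the function instead of the sample list would let you pipe btrace output straight into it and then sort the counts to see which process is generating the most requests.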

The emails were archived away and the I/O problem was resolved. Whereas previously we could only guess at what was causing the I/O load, blktrace squarely points the finger at the problem process. On later machines IOTop would have been even more straightforward. Both are valuable additions to the sysadmin toolkit.

Posted in FTW


nsscache and LDAP reliability

Published February 2nd, 2009 by oliver

Any company with multiple servers in their authentication domain will know of LDAP. Sadly on the Linux platform, OpenLDAP (although arguably the most widely used and well known of the few LDAP servers available) is still not particularly reliable, especially when it comes to replication. The overheads involved in querying even a local OpenLDAP server are also much higher than those of reading plaintext files such as /etc/passwd.

Enter nsscache. Created by two boffins at Google (one of whom graduated from Anchor Systems), nsscache gives the reliability and speed of plaintext files (or BDB if you desire) and the scalability of OpenLDAP. Anchor recently started using it and we are confident it will dramatically boost the reliability and lookup speed of all of our LDAP systems.

In terms of performance, with a bit under 20,000 records we are seeing update times of a second or so for a partial update (only changed or new entries), and around 10 seconds for a full update (replacing the on-disk files with a fresh copy of the entire source database).

I’d definitely recommend taking a look at it. It currently only operates on Python 2.4 or later (although a patch is nearing release that allows it to work with Python 2.3).
