Just because you CAN, Doesn’t mean you SHOULD

Published September 25th, 2009 by matt

(Yeah, I’ve been really slack with the blog posts about Project Starbug, but unfortunately when the choice is between doing the cool stuff, and blogging about it, the blogging tends to lose. I am still planning on writing all about things when things die down. In the meantime…)

Remember when you were a kid, and every time you got a new toy you’d just have to play with it all the time? That mentality doesn’t go away as you grow up, it just gets a little more sophisticated. With new technologies, I’m still very much this way. I remember when I first learnt about flex and bison — for the next six months or so, every programming problem I encountered just had to be solved with a minilanguage implemented in flex/bison. I shudder to think that any of that code might still be out there…

Anyway, this week’s shiny new toy has been Heartbeat / Pacemaker. I’ve played with it a fair bit in the past, but just in two-node (Heartbeat v1) clusters. For Project Starbug, though, I’ve been taking it to new heights of awesome (multi-node, easily expandable HA VM clusters, for example). So, of course, anywhere that a bit of high-availability might be good, I’ve laid it on thick. With the Puppet manifests we’ve got for managing Pacemaker, it’s almost harder not to make something HA (seriously, our Pacemaker manifests are awesome).

Unfortunately, in a couple of places I kinda forgot that some services have their own ways of doing HA, and they’re generally superior to tying a service and an IP together and telling Pacemaker to go do its thing. The two services that I’ve just converted back away from Heartbeat are NTP and DNS. Yeah, that’s right: I set up Pacemaker resources for our NTP server and DNS server, because I suffer from occasional bouts of acute “shiny toy syndrome”. I’ve now recovered, having learnt my lesson (for now).

When HA won’t play the way you want it to

Published September 8th, 2009 by oliver

In an ideal world every service would support High Availability and Load Balancing, would scale up easily and cleanly and all of us systems administrators would be paid bucketloads to play golf all day while the computers did all the hard work. To quote Dylan Moran of Black Books fame, “Don’t make me laugh…bitterly”.

I’ll cut to the chase – sometimes you have to really shoehorn technologies to do what you want. Fortunately I love doing this, and the technologies of today’s article are virtualised Windows 2008 on Xen, and Oracle XE 10g. Neither likes to play ball, for a few reasons:

  • Generally speaking, when you virtualise an OS you want to have para-virtualisation drivers enhancing the hardware support. Open Source Xen has PV drivers, but they are not signed with a legitimate certificate. Windows 2008 does not play nicely with unsigned or test-cert-signed drivers.
  • Oracle is just a messy, messy, nasty thing. Yes, paid versions undoubtedly support all manner of load-balancing and HA options, but the free one does not.

Adding HA to Windows 2008 on Xen

The basic procedure was as follows:

  • Install the telnet server within Windows (making sure to lock it down in the firewall to only be accessible by the host machines)
  • Create a special admin account and password used for triggering a shutdown
  • Create an Expect script which logs into the VM via telnet, and issues the shutdown command
  • Create a modified version of the Heartbeat Xen resource agent which calls the Expect script to shut down the VM (and waits a safe period of time) before “xm shutdown” is called. Without this, “xm shutdown” will simply power off the VM (in the absence of working PV drivers).

The VM was already running on a DRBD volume between the two HA Xen servers, so I was able to just create a standard set of Heartbeat resources to control DRBD primary/secondary mode and the startup/shutdown of the HA Windows VM. For your benefit (if you want to recreate it) here is the Expect script:

#!/usr/bin/expect -f
#
# Script which "automates" shutting down a Windows VM

# Don't log telnet output and commands to stdout, and set a reasonable timeout.
log_user 0
set timeout 3

# Log in via telnet and issue commands. Fairly straightforward.
spawn -noecho /usr/bin/telnet 192.168.1.1
sleep 0.5

# login as the "shutdown" user
expect {
 -re "login: $" {send "shutdown\r"}
 timeout exit
}
sleep 0.5
expect {
 -re "password: $" {send "mysecretpassword\r"}
 timeout exit
}
sleep 0.5
expect {
 -re ">$" {send "shutdown /s /t 0\r"}
 timeout exit
}
sleep 0.1
expect {
 -re ">$" {send "exit\r"}
 timeout exit
}
exit

The rest is fairly self-explanatory if you understand Heartbeat.
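For anyone who wants a starting point, the extra stop step in the modified resource agent could look something like the sketch below. It is not the actual agent: the script path, the grace period and the dry-run switch are all assumptions made up for illustration, and only the order of operations (Expect script, wait, then "xm shutdown") comes from the procedure above.

```shell
#!/bin/sh
# Sketch of the extra stop step for a Windows guest without working PV drivers.
# Assumptions (not from the original setup): script path, grace period, DRY_RUN.

EXPECT_SCRIPT="${EXPECT_SCRIPT:-/usr/local/sbin/win-shutdown.exp}"  # hypothetical path
GRACE_SECONDS="${GRACE_SECONDS:-120}"                               # hypothetical "safe period"
DRY_RUN="${DRY_RUN:-1}"                                             # 1 = only print commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Ask Windows to shut itself down, give it time, then run the usual Xen stop.
stop_windows_vm() {
    domain="$1"
    run "$EXPECT_SCRIPT"          # telnet in and issue "shutdown /s /t 0"
    run sleep "$GRACE_SECONDS"    # wait for the guest to finish shutting down
    run xm shutdown "$domain"     # the normal Xen stop step, now safe to run
}
```

With the default DRY_RUN=1, calling "stop_windows_vm winvm" (where "winvm" is a made-up domain name) only prints the three commands, which makes it easy to check the ordering before letting it loose on a real VM with DRY_RUN=0.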

Oracle XE 10g

This was more of a learning process, since usually you just install Oracle and leave it the hell alone. Not so for me.

  • Install Oracle on both nodes using (fortunately) the RPMs they provide
  • Configure Oracle on both nodes including creating the databases, using the same password for SYSDBA
  • Shut down both instances of Oracle
  • Create the DRBD resource, and mount it on the primary node
  • On the primary node, move the contents of /usr/lib/oracle/xe/oradata and /usr/lib/oracle/xe/app/oracle/flash_recovery_area onto the mounted DRBD
  • On the secondary node, delete the aforementioned paths
  • Bind mount the oradata and flash recovery area from the mounted DRBD volume into the correct places in the directory tree.
  • Start Oracle
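The file shuffling and bind mounts in the steps above can be sketched roughly as follows. The two Oracle paths are the ones listed above; the DRBD mount point, the function names and the dry-run switch are assumptions for illustration, and recreating empty directories as mount points is my guess at a sensible way to do it rather than a record of the exact commands used.

```shell
#!/bin/sh
# Sketch of relocating Oracle XE's data onto DRBD and bind-mounting it back.
# DRBD_MOUNT is an assumed mount point; the two Oracle paths are from the steps above.

DRBD_MOUNT="${DRBD_MOUNT:-/mnt/oracle-drbd}"   # hypothetical mount point
ORADATA=/usr/lib/oracle/xe/oradata
FLASH=/usr/lib/oracle/xe/app/oracle/flash_recovery_area
DRY_RUN="${DRY_RUN:-1}"                        # 1 = only print commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Primary node, once: move the existing data onto the mounted DRBD volume.
move_to_drbd() {
    run mv "$ORADATA" "$DRBD_MOUNT/oradata"
    run mv "$FLASH" "$DRBD_MOUNT/flash_recovery_area"
    run mkdir -p "$ORADATA" "$FLASH"    # leave empty mount points behind
}

# Secondary node, once: drop the local copies but keep empty mount points.
clear_local_copy() {
    run rm -rf "$ORADATA" "$FLASH"
    run mkdir -p "$ORADATA" "$FLASH"
}

# On whichever node currently has the DRBD volume mounted.
bind_oracle_dirs() {
    run mount --bind "$DRBD_MOUNT/oradata" "$ORADATA"
    run mount --bind "$DRBD_MOUNT/flash_recovery_area" "$FLASH"
}
```

Run "move_to_drbd" on the primary and "clear_local_copy" on the secondary with Oracle stopped, then "bind_oracle_dirs" wherever the DRBD filesystem is mounted. Everything is printed rather than executed until DRY_RUN=0 is set, which seems wise given the "rm -rf".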

After I had created a Heartbeat resource group which contained the DRBD resource, the DRBD filesystem mount, the aforementioned bind mounts and the Oracle service itself, I was quite pleased to see that Oracle plays quite nicely with our shoehorned HA setup. You’ll want to make sure you have a properly fixed Oracle init script though, as the supplied one is fairly bad.

After making Oracle and Windows 2008 work nicely in HA, I’m almost certain that any service, no matter how bad, can be shoehorned in a similar way to give you decent availability, even when that wasn’t originally intended.