Posts Tagged ‘oracle’

When HA won’t play the way you want it to

Tuesday, September 8th, 2009

In an ideal world every service would support High Availability and Load Balancing, would scale up easily and cleanly and all of us systems administrators would be paid bucketloads to play golf all day while the computers did all the hard work. To quote Dylan Moran of Black Books fame, “Don’t make me laugh…bitterly”.

I’ll cut to the chase – sometimes you have to really shoehorn technologies to do what you want. Fortunately I love doing this, and the technologies of today’s article are virtualised Windows 2008 on Xen, and Oracle XE 10g. Neither likes to play ball, for a few reasons:

  • Generally speaking, when you virtualise an OS you want to have para-virtualisation drivers enhancing the hardware support. Open Source Xen has PV drivers, but they are not signed with a legitimate certificate. Windows 2008 does not play nicely with unsigned or test-cert-signed drivers.
  • Oracle is just a messy, messy, nasty thing. Yes, paid versions undoubtedly support all manner of load balancing and HA options, but the free one does not.

Adding HA to Windows 2008 on Xen

The basic procedure was as follows:

  • Install the telnet server within Windows (making sure to lock it down in the firewall to only be accessible by the host machines)
  • Create a special admin account and password used for triggering a shutdown
  • Create an Expect script which logs into the VM via telnet, and issues the shutdown command
  • Create a modified version of the Heartbeat Xen resource agent which calls the expect script to shut down the VM (and waits a safe period of time) before “xm shutdown” is called. Without this, “xm shutdown” will simply power off the VM (in the absence of working PV drivers).

The VM was already running on a DRBD volume between the two HA Xen servers, so I was able to just create a standard set of Heartbeat resources to control DRBD primary/secondary mode and the startup/shutdown of the HA Windows VM. For your benefit (if you want to recreate it) here is the expect script:

#!/usr/bin/expect -f
#
# Script which "automates" shutting down a Windows VM

# Don't log telnet output and commands to stdout, and set a reasonable timeout.
log_user 0
set timeout 3

# Log in via telnet and issue commands. Fairly straightforward.
spawn -noecho /usr/bin/telnet 192.168.1.1
sleep 0.5

# login as the "shutdown" user
expect {
 -re "login: $" {send "shutdown\r"}
 timeout exit
}
sleep 0.5
expect {
 -re "password: $" {send "mysecretpassword\r"}
 timeout exit
}
sleep 0.5
expect {
 -re ">$" {send "shutdown /s /t 0\r"}
 timeout exit
}
sleep 0.1
expect {
 -re ">$" {send "exit\r"}
 timeout exit
}
exit

The rest is fairly self-explanatory if you understand Heartbeat.
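For the curious, the stop logic in the modified resource agent boils down to something like the following. This is a minimal sketch rather than the actual agent: the domain name, the script path, the grace period and the function names are all made up for illustration, and it assumes that "xm list" on a named domain exits non-zero once that domain is gone.

```shell
# Sketch of a graceful stop for a Windows guest without working PV drivers.
# Every name and default below is a hypothetical placeholder.
DOMAIN="${DOMAIN:-win2008}"                                             # Xen domain name (assumed)
SHUTDOWN_SCRIPT="${SHUTDOWN_SCRIPT:-/usr/local/sbin/win-shutdown.exp}"  # the expect script above
GRACE_SECONDS="${GRACE_SECONDS:-120}"                                   # time allowed for Windows to power off

# Succeeds while Xen still knows about the domain.
domain_running() {
    xm list "$DOMAIN" >/dev/null 2>&1
}

# Ask Windows to shut itself down via telnet, wait for the domain to
# disappear, and only fall back to "xm shutdown" if it is still there.
stop_vm() {
    domain_running || return 0
    "$SHUTDOWN_SCRIPT" || echo "guest shutdown script failed, falling back" >&2
    waited=0
    while domain_running && [ "$waited" -lt "$GRACE_SECONDS" ]; do
        sleep 1
        waited=$((waited + 1))
    done
    if domain_running; then
        xm shutdown "$DOMAIN"
        return $?
    fi
    return 0
}
```

Polling instead of a fixed sleep means a quick shutdown does not hold up a failover any longer than it has to.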

Oracle XE 10g

This was more of a learning process, since usually you just install Oracle and leave it the hell alone. Not so for me.

  • Install Oracle on both nodes using (fortunately) the RPMs they provide
  • Configure Oracle on both nodes including creating the databases, using the same password for SYSDBA
  • Shut down both instances of Oracle
  • Create the DRBD resource, and mount it on the primary node
  • On the primary node, move the contents of /usr/lib/oracle/xe/oradata and /usr/lib/oracle/xe/app/oracle/flash_recovery_area onto the mounted DRBD
  • On the secondary node, delete the aforementioned paths
  • Bind mount the oradata and flash recovery area from the mounted DRBD volume into the correct places in the directory tree.
  • Start Oracle
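The move and bind mount steps above can be sketched roughly as follows. Treat it as an illustration under assumptions: /mnt/drbd is a hypothetical mount point for the DRBD volume, the helper names are invented, and the script only prints the commands unless DRY_RUN=0 is set. It also assumes you recreate empty directories at the original paths to act as mount points.

```shell
# Sketch of relocating Oracle XE data onto DRBD and bind mounting it back.
DRBD_MOUNT="${DRBD_MOUNT:-/mnt/drbd}"   # hypothetical DRBD mount point
ORADATA=/usr/lib/oracle/xe/oradata
FLASH=/usr/lib/oracle/xe/app/oracle/flash_recovery_area

# Print the command instead of running it unless DRY_RUN=0 is set explicitly.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Primary node only: move both directories onto the DRBD volume and
# leave empty directories behind as mount points.
relocate_to_drbd() {
    for dir in "$ORADATA" "$FLASH"; do
        name=$(basename "$dir")
        run mv "$dir" "$DRBD_MOUNT/$name"
        run mkdir -p "$dir"
    done
}

# Whichever node holds the DRBD volume: bind mount the data back into
# the places Oracle expects to find it.
bind_mounts() {
    for dir in "$ORADATA" "$FLASH"; do
        name=$(basename "$dir")
        run mount --bind "$DRBD_MOUNT/$name" "$dir"
    done
}
```

In the real setup Heartbeat does the bind mounts for you, so the second function is only there to show what ends up mounted where.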

After I had created a Heartbeat resource group containing the DRBD resource, the DRBD filesystem mount, the aforementioned bind mounts and the Oracle service itself, I was quite pleased to see that Oracle plays quite nicely with our shoehorned HA setup. You’ll want to make sure you have a properly fixed Oracle init script though, as the supplied one is fairly bad.

After making Oracle and Windows 2008 work nicely in HA, I’m almost certain any service, no matter how bad, can be shoehorned in a similar way to give you decent availability even when it wasn’t originally intended.

Improving your quality of life with Oracle

Friday, August 7th, 2009

Not content to take Oracle lying down, we’ve made a couple of small changes on our systems to make life a little saner. The first is a substantial improvement to the default initscript, the second is some shell/environment hacks that should really be done by default. I alluded to these a few posts ago, but hadn’t gotten to publishing them yet.

If you’re a poor sod that has to use Oracle, head over and have a look at the improvements; feedback is always welcome (it might not be perfect, but it’s a lot better). In the interests of not being acquired by Oracle Corp, we’re publishing a diff to the initscript, rather than the full file.

Oracle, why dost thou sucketh so prodigiously?

Tuesday, July 21st, 2009

We’ve picked up a few larger contracts recently. In such cases the customer has been around for a while, which means they have a legacy app that needs to be supported. This is something we can handle – we specialise in tailored solutions that help leverage your existing assets while synergising with your expanding customer base, to enhance ROI and… oh where was I?

Yeah, so we’ve got customers that want Oracle installed. As I expect you know, Oracle has a long history of being enterprisey, seeing a lot of use on Big Iron hardware. It’s usable on cheap x86 hardware now, and there’s even a free edition if you want to play with it yourself (named Express Edition, much like MS SQL Server). The merits of having a toy-sized version of “enterprise” grade software seem questionable to me, but whatever.

So while we’re talking about “enterprise grade” software (to go with your enterprise support), I’d like to share some snippets of our experiences of making this enterprise-grade software work sanely. To be clear, this isn’t something we do a lot of. We have our own ways of doing things on our own systems, and more often than not, the server isn’t single-purpose (it just doesn’t make sense). That said, we have reasonable expectations of our software and how it should behave. Let’s call this “integration”, and it’s clear that they haven’t done much work on this for Linux.

Let’s start with the installer. For the Enterprise Edition, it’s a 765MiB CPIO archive. What the hell is this, seriously. It feels like a gigantic initrd. That’s the installer for version 10.2.0.1; then there’s the update patch to 10.2.0.4. At least that’s a zip file, but it’s 1.2GiB!

One of my colleagues handled the installation, and it took him a couple of days of banging his head against the wall trying to make it Just Work. The installer is some GUI-based piece of horribleness that required installing a slew of X-related packages so the pretty installer could run. Maybe it’s an enterprise thing, but we don’t sit at a little monitor attached directly to the server, so some trickery had to be used to funnel everything to a VNC session. I guess Oracle expects you to install a full desktop environment on your big iron server as well. Once working, much time was spent dismissing popups letting us know that something wasn’t quite as expected.

Meanwhile, I’ve been working on the initscript. We need this to work properly as we intend to use it with Heartbeat, a failover management tool for Linux. To my joy, I discovered that they eschew all conventions and always return a zero error code, i.e. nothing ever goes wrong. Better yet, they discard all output from the subcommands and hide the fact that there are ever any problems. Whoever was writing this, I suspect the conversation went like this:

Okay, here’s what we’re gonna do. First, set up a LOG variable, so we can use it later:

LOG="$ORACLE_HOME_LISTNER/listener.log"

Note that I haven’t actually defined ORACLE_HOME_LISTNER anywhere, nor have we sourced any other config files yet. Also, I made a typo in the name.

Now get this, this is the best part: we then proceed to NOT USE THE DAMNED LOG VARIABLE ANYWHERE!

Oracle has a pretty high opinion of itself, too, creating a file called /etc/oratab – yep, it’s right up there on the scale of importance, in the echelons of filesystems, raid volumes and the init process.

I think the best part of all this is the command shell. MySQL users will be familiar with the mysql command, and Postgres has psql. Oracle has sql*plus (yes, it has an asterisk in the name). From what I’ve read this isn’t the preferred way to get things done (you have a shiny GUI for that instead), but dammit, a command line is not a bad way to get things done sometimes. Or even a lot of the time. In any case, and whatever their excuses, sql*plus is absolute crap for what’s meant to be enterprise-grade software.

I understand that Oracle isn’t really “native” to linux-y systems, but would it have killed them to add readline support to the damn thing? This means you have no tab-completion, no ability to recall previous commands, and the only way to edit the current command is with the backspace key. The accepted standard way to get these features is to use rlwrap.
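If you want that without having to remember it every time, a small wrapper in your shell startup file does the job. This is a sketch rather than our exact setup: it assumes sqlplus is on your PATH and quietly falls back to the plain binary when rlwrap isn't installed.

```shell
# Wrap sqlplus in rlwrap when available, so history and line editing work.
sqlplus() {
    if command -v rlwrap >/dev/null 2>&1; then
        # rlwrap runs the real sqlplus binary from PATH, not this function.
        rlwrap sqlplus "$@"
    else
        # No rlwrap installed: bypass this function and run sqlplus as-is.
        command sqlplus "$@"
    fi
}
```

After that, running sqlplus as usual gets you command recall and sane editing keys.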

At least once it’s up and running we hopefully won’t have to touch it. The improved initscript is actually useful now, giving meaningful return codes and feedback on what it’s done. Huzzah!
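To give a flavour of what "meaningful return codes and feedback" means, here is a minimal sketch of the idea. It is not the published diff: the run_step helper and the log path are hypothetical. Each step's output is kept in a log instead of being thrown away, and its exit status is passed back to the caller.

```shell
# Hypothetical log location; override LOG to put it elsewhere.
LOG="${LOG:-/var/log/oracle-init.log}"

# Run one step of the initscript, keep its output, and report the result.
# Usage: run_step "description" command [args...]
run_step() {
    desc=$1
    shift
    if "$@" >>"$LOG" 2>&1; then
        echo "$desc: OK"
        return 0
    else
        rc=$?   # exit status of the command that just failed
        echo "$desc: FAILED (exit $rc, see $LOG)" >&2
        return "$rc"
    fi
}
```

A start action can then chain its steps, for example run_step "Starting listener" lsnrctl start || exit 1, so that Heartbeat sees a non-zero exit code as soon as something actually goes wrong.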
