drbd Archives - AWS Managed Services by Anchor

Highly available infrastructure for your own website

By | Company News, Technical | No Comments

Every site is different, so this isn’t so much a tutorial as some things to watch out for. We’ll take a reasonably representative database-backed site and talk about what changes when we make it highly available.

The site

For the purposes of demonstration we’ll use Magento, an e-commerce website written in PHP with a MySQL backend. As well as exemplifying the popular LAMP pattern, Magento allows for extensions that use extra software components, which also need to be taken into consideration in a highly available setup. It’s worth noting that these notes apply even to vastly different systems. Taking some big customers that we’ve worked on as examples, GitHub is effectively a Rails app and TestFlight’s core is mostly Django – the problem is approached in the same way.

Types…


Pacemaker and Corosync for virtual machines

By | Technical | One Comment

In the previous post we talked about using Corosync and Pacemaker to create highly available services. Subject to a couple of caveats, this is a good all-round approach. The caveats are what we’ll deal with today. Sometimes you’re dealing with software that won’t play nice when moved between systems, like a certain Enterprise™ database solution. Sometimes you can’t feasibly decompose an existing system into neat resource tiers to HA-ify it. And sometimes, you just want HA virtual machines! This can be done.

The solution

If the solution to our problem is to run everything on a single server, so be it. We then virtualise that server, and make it highly available. Once again, it’s important to remember that we’re guarding against a physical server going up in smoke. There’s no…


Pacemaker and Corosync for HA services

By | Technical | No Comments

Now that we’ve got our terminology sorted out, we can talk about real deployments. Our most common HA deployments use the Linux HA suite, with multiple services managed by pacemaker. This is roughly the “stack” that we referred to in the first post in the series. We’ve already covered the resources involved, so we’ll focus on the important bit: what happens when something goes wrong?

Normal operation

Recall that on our hypothetical HA database server, we’ve got the following managed resources: DRBD storage, the filesystem, the floating IP address for the service, and the DB service itself. Each resource has its own monitor action, specified by the Resource Agent (RA). Roughly speaking, an RA is a script that implements a common interface between pacemaker and the resources it can manage. It looks…
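As a rough illustration of the "common interface" idea, here is a minimal Python sketch. It is not one of the RAs discussed in the post: real resource agents are usually shell scripts, and `ToyResource` and `run_action` are hypothetical names invented for this example. The numeric return codes follow the OCF resource agent convention (0 for success, 7 for "not running", 3 for an unimplemented action).

```python
# Return codes following the OCF resource agent convention.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7


class ToyResource:
    """Hypothetical stand-in for a managed resource (e.g. a DB service)."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True
        return OCF_SUCCESS

    def stop(self):
        self.running = False
        return OCF_SUCCESS

    def monitor(self):
        # The monitor action reports whether the resource is healthy.
        return OCF_SUCCESS if self.running else OCF_NOT_RUNNING


def run_action(resource, action):
    """Dispatch an action name to the resource, as a cluster manager would."""
    handlers = {
        "start": resource.start,
        "stop": resource.stop,
        "monitor": resource.monitor,
    }
    handler = handlers.get(action)
    if handler is None:
        return OCF_ERR_UNIMPLEMENTED
    return handler()
```

The point of the design is that the cluster manager only needs to know the action names and return codes, not how any particular resource is started or checked.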


Anatomy of an HA stack

By | Technical | No Comments

In what we plan to be a small series of articles about our high availability deployments, we thought we’d start by defining the key components in the stack and how they work together. In future we’ll cover some of the more specific details and things that need to be taken into consideration when deploying such a system. For now we’ll talk about the bits that we use, and why we use them.

Type of deployment

A highly available system is also highly complex, so it’s important to know just what problem you’re trying to solve when you take on that burden. Our systems are designed to deal with the total failure of a server chassis. This is very low-level and was chosen because it provides the greatest flexibility when dealing…


Answers for DRBD time-travel issues

By | Technical | No Comments

A little update on a DRBD problem we wrote about at the start of April, in which we lost a few months of data during a cluster failover. Linbit got in touch with us to offer assistance, and we were happy to be enlightened. We had a good idea of what had happened, but no idea why. It seems that a race condition was introduced in version 8.3.9, when the fence-peer script was changed to run asynchronously. The engineering team explained that if the connection is reestablished while the script runs, it may happen that the peer’s disk-state gets overwritten with stale information. This was fixed in 8.3.11, and of course we’re running version 8.3.10 on the cluster in question. We’d like to thank Linbit for their assistance…
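The affected range described above (introduced in 8.3.9, fixed in 8.3.11) can be expressed as a small version check. This is only a sketch: the helper names are hypothetical, and it assumes plain dotted numeric version strings such as "8.3.10".

```python
AFFECTED_FROM = (8, 3, 9)   # race condition introduced here, per the post
FIXED_IN = (8, 3, 11)       # fixed here, per the post


def parse_version(text):
    """Turn a dotted version string like '8.3.10' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def is_affected(version_text):
    """True if the version falls in the range described in the post."""
    return AFFECTED_FROM <= parse_version(version_text) < FIXED_IN
```

Comparing tuples of integers avoids the string-comparison trap where "8.3.10" would sort before "8.3.9".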


Holy time-travellin’ DRBD, batman!

By | Technical | 6 Comments

Here at Anchor we’ve developed High-Availability (HA) systems for our customers to ensure they remain online in the event of catastrophic hardware failure. Most of our HA systems involve the use of DRBD, the Distributed Replicated Block Device. DRBD is like RAID-1 across a network. We’d like to share some notes on a recent issue that involved a DRBD volume jumping into a time-warp and rolling back four months. If you run your own DRBD setup, you’ll want to know about this. The chances that you hit the same problem are slim, but it’s not hard to avoid. We have a script for Nagios that checks the health of your DRBD volumes; it was basically the go-to default check_drbd script on Nagios Exchange. The script is meant to ensure that…
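For readers unfamiliar with this kind of check, the following is a minimal sketch of the general idea, not the check_drbd script from Nagios Exchange and not Anchor's version of it. It assumes the per-device line format that DRBD 8.3 prints in /proc/drbd (`cs:` connection state, `ro:` roles, `ds:` disk states), takes the status text as a string rather than reading the file, and uses the standard Nagios plugin exit codes. The function name `check_drbd` and the health rule (connected, both disks UpToDate) are assumptions for illustration.

```python
import re

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2

# Matches device lines such as:
#  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
DEVICE_LINE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<ro>\S+)\s+ds:(?P<ds>\S+)"
)


def check_drbd(status_text):
    """Return (exit_code, message) for /proc/drbd-style status text."""
    worst = OK
    messages = []
    for line in status_text.splitlines():
        match = DEVICE_LINE.match(line)
        if not match:
            continue  # skip version banner and statistics lines
        healthy = (
            match["cs"] == "Connected"
            and match["ds"] == "UpToDate/UpToDate"
        )
        if not healthy:
            worst = CRITICAL
        messages.append(
            f"drbd{match['minor']} cs={match['cs']} ds={match['ds']}"
        )
    if not messages:
        return CRITICAL, "no DRBD devices found"
    return worst, "; ".join(messages)
```

A wrapper would typically read /proc/drbd, print the message, and exit with the returned code.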


GitHub: Speed matters

By | Technical | 4 Comments

Impressions from the first article (in its first day) and the first 24 hours of the GitHub migration have led us at Anchor to believe that: GitHub is just as popular as we thought; the migration was worth it, as things are running much faster (just check your Twitter feeds, or better yet, check your GitHub source tree for no reason 😉); and people are interested in what has gone under the hood of the new GitHub (insert your favorite fast car here; otherwise let’s say a roadster). Taking these three things into account, this installment will discuss why things are so much faster post-migration than before. I said ‘faster’ and not ‘fast’, because GitHub is now as fast as any website should be. So in comparison,…


When HA won’t play the way you want it to

By | Technical | No Comments

In an ideal world every service would support High Availability and Load Balancing, would scale up easily and cleanly and all of us systems administrators would be paid bucketloads to play golf all day while the computers did all the hard work. To quote Dylan Moran of Black Books fame, “Don’t make me laugh…bitterly”. I’ll cut to the chase – sometimes you have to really shoehorn technologies to do what you want. Fortunately I love doing this, and the technologies of today’s article are virtualised Windows 2008 on Xen, and Oracle XE 10g. Neither likes to play ball, for a few reasons: Generally speaking, when you virtualise an OS you want to have para-virtualisation drivers enhancing the hardware support. Open Source Xen has PV drivers, but they are not signed…
