pacemaker Archives - AWS Managed Services by Anchor

Highly available infrastructure for your own website

By | Company News, Technical | No Comments

Every site is different, so this isn’t so much a tutorial as some things to watch out for. We’ll take a reasonably representative database-backed site and talk about what changes when we make it highly available.

The site

For the purposes of demonstration we’ll use Magento, an e-commerce website written in PHP with a MySQL backend. As well as exemplifying the popular LAMP pattern, Magento allows for extensions that use extra software components, which also need to be taken into consideration in a highly available setup. It’s worth noting that these notes apply even to vastly different systems. Taking some big customers that we’ve worked on as examples, GitHub is effectively a Rails app and TestFlight’s core is mostly Django, yet the problem is approached in the same way.

Types…

Pacemaker and Corosync for virtual machines

By | Technical | One Comment

In the previous post we talked about using Corosync and Pacemaker to create highly available services. Subject to a couple of caveats, this is a good all-round approach. The caveats are what we’ll deal with today. Sometimes you’re dealing with software that won’t play nice when moved between systems, like a certain Enterprise™ database solution. Sometimes you can’t feasibly decompose an existing system into neat resource tiers to HA-ify it. And sometimes, you just want HA virtual machines! This can be done.

The solution

If the solution to our problem is to run everything on a single server, so be it. We then virtualise that server, and make it highly available. Once again, it’s important to remember that we’re guarding against a physical server going up in smoke. There’s no…

Pacemaker and Corosync for HA services

By | Technical | No Comments

Now that we’ve got our terminology sorted out, we can talk about real deployments. Our most common HA deployments use the Linux HA suite, with multiple services managed by pacemaker. This is roughly the “stack” that we referred to in the first post in the series. We’ve already covered the resources involved, so we’ll focus on the important bit: What happens when something goes wrong?

Normal operation

Recall that on our hypothetical HA database server, we’ve got the following managed resources:

- DRBD storage
- The filesystem
- Floating IP address for the service
- The DB service itself

Each resource has its own monitor action, specified by the Resource Agent (RA). Roughly speaking, an RA is a script that implements a common interface between pacemaker and the resources it can manage. It looks…

Anatomy of an HA stack

By | Technical | No Comments

In what we plan to be a small series of articles about our high availability deployments, we thought we’d start by defining the key components in the stack and how they work together. In future we’ll cover some of the more specific details and things that need to be taken into consideration when deploying such a system. For now we’ll talk about the bits that we use, and why we use them.

Type of deployment

A highly available system is also highly complex, so it’s important to know just what problem you’re trying to solve when you take on that burden. Our systems are designed to deal with the total failure of a server chassis. This is very low-level and was chosen because it provides the greatest flexibility when dealing…

Hunting down unexpected behaviour in Corosync’s IP address selection

By | Technical | No Comments

Update from 2012-05-24: The Corosync devs have addressed this and a patch is in the pipeline. The effect is roughly as described below: the linked list is built by appending to the tail, and an exact IP address match for bindnetaddr is preferred (which was intended all along but got lost along the way). Rejoicing all round!

We’ve been looking at some of Corosync’s internals recently, spurred on by one of our new HA (highly available) clusters spitting the dummy during testing. What we found isn’t a “bug” per se (we’re good at finding those), but a case where the correct behaviour isn’t entirely clear. We thought the findings were worth sharing, and we hope you find them interesting even if you don’t run any clusters yourself.

Disclaimer: We’d like to emphasise…

LCA day 4 – On freedom

By | Technical | No Comments

It goes without saying that Linuxconf is all about free software, free as in beer and free as in speech. A number of today’s talks focused on freedom, in the context of access to data and code, and the freedom to use software (and hardware) the way you see fit. We actually had two great keynote talks on freedom, so I’d like to step back to yesterday’s talk by Karen Sandler (you can see the talk for yourself on YouTube, which I’d highly recommend). Karen was diagnosed with hypertrophic cardiomyopathy, a heart condition that means she could suddenly die at any time. Thankfully there are treatments available, one of which is a pacemaker. Being the person she is, she immediately asked “what software does it run?”. Long story short, the manufacturer ended up…

Just because you CAN, Doesn’t mean you SHOULD

By | Technical | No Comments

(Yeah, I’ve been really slack with the blog posts about Project Starbug, but unfortunately when the choice is between doing the cool stuff, and blogging about it, the blogging tends to lose. I am still planning on writing all about things when things die down. In the meantime…) Remember when you were a kid, and every time you got a new toy you’d just have to play with it all the time? That mentality doesn’t go away as you grow up, it just gets a little more sophisticated. With new technologies, I’m still very much this way. I remember when I first learnt about flex and bison — for the next six months or so, every programming problem I encountered just had to be solved with a minilanguage implemented in…
