High Availability Archives - Anchor Managed Hosting

Nagios checks for iSCSI targets with a blind initiator

We’ve recently found ourselves managing a highly available storage service for a customer without having direct access to the data on the server. HA storage is commonplace for us now, but not having access to the data is unusual, particularly for a full-stack hosting provider like us. The reason is that the server presents iSCSI targets: effectively networked block devices for the clients (“initiators” in iSCSI parlance). We don’t have access to the clients, so we can’t find out what they’re doing. In short, there’s no easy access to the data, and it would probably be dangerous for us to try – we’d have to join their OCFS2 cluster to avoid corruption. That doesn’t mean we can’t monitor them, though. As long as…
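
The full post walks through the checks we built; as a flavour of the approach, here’s a minimal sketch (ours, not the plugin from the article) of a Nagios check that confirms a target is still being advertised, using sendtargets discovery from the open-iscsi tools. The portal address and target IQN are hypothetical placeholders.

```python
#!/usr/bin/env python3
"""Nagios check: verify an iSCSI target is advertised by a portal.

A minimal sketch, assuming the open-iscsi userspace tools are installed
and that sendtargets discovery is permitted from the monitoring host.
The portal address and target IQN below are hypothetical examples.
"""
import subprocess
import sys

PORTAL = "192.0.2.10:3260"                       # hypothetical portal
TARGET = "iqn.2012-01.com.example:storage.lun0"  # hypothetical target IQN

try:
    output = subprocess.check_output(
        ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", PORTAL],
        stderr=subprocess.STDOUT,
    ).decode("utf-8", "replace")
except OSError as exc:
    # Nagios convention: exit 3 is UNKNOWN (we couldn't even test)
    print("UNKNOWN: could not run iscsiadm: %s" % exc)
    sys.exit(3)
except subprocess.CalledProcessError as exc:
    print("CRITICAL: discovery against %s failed (%s)" % (PORTAL, exc))
    sys.exit(2)

if TARGET in output:
    print("OK: %s advertised by %s" % (TARGET, PORTAL))
    sys.exit(0)

print("CRITICAL: %s not advertised by %s" % (TARGET, PORTAL))
sys.exit(2)
```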

Weighing the costs of High Availability

We’re rounding out our series on high availability with a little discussion on the benefits of HA versus the inherent costs. If you’ve been keeping up with the previous articles you’ve probably gotten the impression that it’s a lot of work and easy to get wrong; you’d be correct. That said, HA definitely has its place, so it’s worth arming yourself with the knowledge to assess when it’s appropriate.

Costs

HA systems have obvious financial costs, but there’s a lot more to it than just money. We’ll talk about these first because we think it’s important to have these in mind when you assess the benefits of going down the HA path. The pattern you’ll notice is that almost everything boils down to complexity. Complexity is your enemy when building…

Highly available infrastructure for your own website

Every site is different, so this isn’t so much a tutorial as some things to watch out for. We’ll take a reasonably representative database-backed site and talk about what changes when we make it highly available.

The site

For the purposes of demonstration we’ll use Magento, an e-commerce website written in PHP with a MySQL backend. As well as exemplifying the popular LAMP pattern, Magento allows for extensions that use extra software components, which also need to be taken into consideration in a highly available setup. It’s worth noting that these notes apply even to vastly different systems. Taking some big customers that we’ve worked on as examples, Github is effectively a Rails app and Testflight’s core is mostly Django – the problem is approached in the same way.

Types…
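
One change that comes up for almost any site going HA is that each node must be able to tell the load balancer whether it is genuinely healthy, not merely reachable. As a hedged illustration (ours, not from the article, and not Magento-specific), here’s a tiny Python health-check endpoint that pulls a node out of rotation when it can’t reach its database; the hostname and ports are hypothetical.

```python
#!/usr/bin/env python3
"""Minimal health-check endpoint for a load balancer to poll.

An illustrative sketch only: it merely confirms the database port answers
a TCP connect, whereas a real check would exercise the application itself.
The MySQL host and listen port are hypothetical placeholders.
"""
import socket
from http.server import BaseHTTPRequestHandler, HTTPServer

DB_HOST, DB_PORT = "db.example.com", 3306  # hypothetical database endpoint

def db_reachable():
    """Return True if we can open a TCP connection to the database."""
    try:
        socket.create_connection((DB_HOST, DB_PORT), timeout=2).close()
        return True
    except OSError:
        return False

class HealthCheck(BaseHTTPRequestHandler):
    def do_GET(self):
        # 200 tells the balancer to keep this node in rotation, 503 to pull it
        self.send_response(200 if db_reachable() else 503)
        self.end_headers()

HTTPServer(("", 8080), HealthCheck).serve_forever()
```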

Application Clustering for High Availability

The HA binge continues; today we’re talking about high availability through clustering – providing a service with multiple, independent servers. This differs from the options we’ve discussed so far because it doesn’t involve Corosync and Pacemaker. We’ll still be using the term “clustering”, but it’s now applied high up at the application level. There are no shared resources within the cluster, and the software on each node is independent of the other nodes.

A brief description

For this article we’re talking exclusively about naive applications that aren’t designed for clustering – they’re unaware of other nodes, and rely on an independent load-balancer to distribute incoming requests. There are applications with clustering-awareness built in, but they’re targeted at a specific task and aren’t generally applicable, so they’re not worth discussing here.

Comparison with highly…
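
To make the shape of this concrete, here’s a hedged sketch (our illustration, not the article’s) of the heart of such a load-balancer: rotate through the backends and skip dead ones, so that no node ever needs to know about its peers. The backend addresses are hypothetical.

```python
#!/usr/bin/env python3
"""Core of a naive round-robin balancer for independent cluster nodes.

A sketch under assumptions: real balancers (HAProxy, nginx and friends)
also do health checks, connection pooling and graceful draining. The
backend addresses are hypothetical.
"""
import itertools
import socket

BACKENDS = [("app1.example.com", 80), ("app2.example.com", 80)]  # hypothetical
rotation = itertools.cycle(BACKENDS)

def pick_backend():
    """Return a connection to the next live backend, round-robin.

    We make at most one full pass around the ring; a dead node is simply
    skipped, which is what lets the nodes stay ignorant of each other.
    """
    for _ in range(len(BACKENDS)):
        host, port = next(rotation)
        try:
            return socket.create_connection((host, port), timeout=2)
        except OSError:
            continue  # node is down, move on to the next one
    raise RuntimeError("no backends available")
```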

Extending Redis to scratch an itch

Redis has become one of the most popular “NoSQL” datastores in recent times, and for good reason. Customers love it because it’s fast and fills a niche, and we love it because it’s well behaved and easy to manage. In case you’re not familiar with Redis, it’s a key-value datastore (not a database in the classic sense). The entire dataset is always kept in memory, so it’s stupendously fast. Durability (saving the data to disk) is optional. Data in Redis is minimally structured; there’s a small set of data types, but there’s no schema as in a traditional relational database. Thanks to a peculiarity of the way Redis is implemented – commands execute one at a time on a single thread – it can offer atomic transactions that are difficult to achieve in normal database products. That’s not to say it’s perfect…
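
Those transactions are built from Redis’s WATCH/MULTI/EXEC commands. Here’s a small sketch of the standard optimistic-locking pattern, assuming the redis-py client; the key name and operation are our hypothetical example, not the article’s:

```python
#!/usr/bin/env python3
"""Atomic check-and-set with Redis WATCH/MULTI/EXEC.

A minimal sketch assuming the redis-py client (pip install redis); the
key name and doubling logic are hypothetical examples.
"""
import redis

r = redis.Redis(host="localhost", port=6379)

def atomic_double(key):
    """Double a counter without racing other clients.

    WATCH aborts the EXEC if the key changes between our read and the
    transaction, so we retry until the whole read-modify-write lands
    atomically.
    """
    with r.pipeline() as pipe:
        while True:
            try:
                pipe.watch(key)               # abort EXEC if key changes
                current = int(pipe.get(key) or 0)
                pipe.multi()                  # queue commands atomically
                pipe.set(key, current * 2)
                pipe.execute()
                return current * 2
            except redis.WatchError:
                continue                      # someone else won; retry

r.set("counter", 21)
print(atomic_double("counter"))  # -> 42
```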

Pacemaker and Corosync for virtual machines

In the previous post we talked about using Corosync and Pacemaker to create highly available services. Subject to a couple of caveats, this is a good all-round approach. The caveats are what we’ll deal with today. Sometimes you’re dealing with software that won’t play nice when moved between systems, like a certain Enterprise™ database solution. Sometimes you can’t feasibly decompose an existing system into neat resource tiers to HA-ify it. And sometimes, you just want HA virtual machines! This can be done.

The solution

If the solution to our problem is to run everything on a single server, so be it. We then virtualise that server, and make it highly available. Once again, it’s important to remember that we’re guarding against a physical server going up in smoke. There’s no…

Pacemaker and Corosync for HA services

Now that we’ve got our terminology sorted out, we can talk about real deployments. Our most common HA deployments use the Linux HA suite, with multiple services managed by Pacemaker. This is roughly the “stack” that we referred to in the first post in the series. We’ve already covered the resources involved, so we’ll focus on the important bit: what happens when something goes wrong?

Normal operation

Recall that on our hypothetical HA database server, we’ve got the following managed resources:

- DRBD storage
- The filesystem
- A floating IP address for the service
- The DB service itself

Each resource has its own monitor action, specified by the Resource Agent (RA). Roughly speaking, an RA is a script that implements a common interface between Pacemaker and the resources it can manage. It looks…
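
RAs are conventionally shell scripts, but any executable that speaks the OCF action-and-exit-code interface will do. As a hedged illustration of that interface (ours, not from the article), here’s a skeletal agent in Python for a hypothetical daemon, showing the start/stop/monitor actions Pacemaker drives:

```python
#!/usr/bin/env python3
"""Skeleton OCF resource agent for a hypothetical daemon.

A sketch only: a real agent also implements meta-data and validate-all,
honours OCF_RESKEY_* parameters, and waits for state changes rather than
checking immediately. Pacemaker invokes the agent with the action name
as the first argument and acts on the exit code.
"""
import os
import subprocess
import sys

OCF_SUCCESS, OCF_ERR_GENERIC, OCF_NOT_RUNNING = 0, 1, 7
PIDFILE = "/var/run/mydaemon.pid"  # hypothetical daemon's pidfile

def running():
    """Is the daemon alive? Check the pidfile points at a live process."""
    try:
        os.kill(int(open(PIDFILE).read().strip()), 0)
        return True
    except (OSError, ValueError):
        return False

def main(action):
    if action == "monitor":
        # Pacemaker polls this; 7 (not running) on the active node is
        # what triggers recovery or failover.
        return OCF_SUCCESS if running() else OCF_NOT_RUNNING
    if action == "start":
        if not running():
            subprocess.call(["/usr/sbin/mydaemon"])  # hypothetical binary
        # A real agent would poll/retry here until the daemon is up.
        return OCF_SUCCESS if running() else OCF_ERR_GENERIC
    if action == "stop":
        if running():
            os.kill(int(open(PIDFILE).read().strip()), 15)
        # Stop must report success if the resource is already down.
        return OCF_ERR_GENERIC if running() else OCF_SUCCESS
    return OCF_ERR_GENERIC  # unimplemented action

if len(sys.argv) < 2:
    sys.exit(OCF_ERR_GENERIC)
sys.exit(main(sys.argv[1]))
```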

Squaring off with your high availability terminology

Following our previous post on the basics of high-availability services, it occurred to us that there’s often some confusion about the use of certain terms and phrases. We’d like to clear that up before pressing on, and hopefully reduce some headaches for people in the long run. We’re dealing with a few closely related terms here, with important differences in meaning:

- High availability
- Load balancing
- Linux HA

High Availability (HA) is a concept and a goal. How you achieve it is up to you, but the implication is that it involves more than one server, because a single server is a single point of failure. Having a hot-standby server to take over in the event of a failure is one way to get HA. For certain types of services…

Anatomy of an HA stack

In what we plan to be a small series of articles about our high availability deployments, we thought we’d start by defining the key components in the stack and how they work together. In future we’ll cover some of the more specific details and things that need to be taken into consideration when deploying such a system. For now we’ll talk about the bits that we use, and why we use them.

Type of deployment

A highly available system is also highly complex, so it’s important to know just what problem you’re trying to solve when you take on that burden. Our systems are designed to deal with the total failure of a server chassis. This is very low-level and was chosen because it provides the greatest flexibility when dealing…

Hunting down unexpected behaviour in Corosync’s IP address selection

Update from 2012-05-24: the Corosync devs have addressed this and a patch is in the pipeline. The effect is roughly as described below: the linked list is built by appending to the tail, and an exact IP address match for bindnetaddr is preferred (which was intended all along but got lost along the way). Rejoicing all round!

We’ve been looking at some of Corosync’s internals recently, spurred on by one of our new HA (highly-available) clusters spitting the dummy during testing. What we found isn’t a “bug” per se (we’re good at finding those), but a case where the correct behaviour isn’t entirely clear. We thought the findings were worth sharing, and we hope you find them interesting even if you don’t run any clusters yourself. Disclaimer: We’d like to emphasise…
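
As we read the fix, the selection rule amounts to: walk the interface list in discovery order, bind to an exact match for bindnetaddr if there is one, and otherwise fall back to the first address on the right network. A rough Python rendering of that rule (our illustration, not Corosync’s actual C; the addresses are hypothetical):

```python
#!/usr/bin/env python3
"""Sketch of the fixed bindnetaddr interface-selection rule.

Illustrative only: this mimics the behaviour described above in Python
rather than quoting Corosync's implementation. Addresses are hypothetical.
"""
import ipaddress

def choose_bind_address(interfaces, bindnetaddr, prefix):
    """Pick the address to bind: exact match first, then network match.

    `interfaces` is checked in order; the tail-append fix means the list
    preserves the order the interfaces were discovered in.
    """
    network = ipaddress.ip_network("%s/%d" % (bindnetaddr, prefix),
                                   strict=False)
    candidates = [ipaddress.ip_address(a) for a in interfaces]
    for addr in candidates:
        if str(addr) == bindnetaddr:   # an exact IP match wins outright
            return addr
    for addr in candidates:
        if addr in network:            # else first address on the network
            return addr
    return None

ifaces = ["10.0.0.7", "192.168.1.5", "192.168.1.1"]
print(choose_bind_address(ifaces, "192.168.1.0", 24))  # -> 192.168.1.5
print(choose_bind_address(ifaces, "192.168.1.1", 24))  # -> 192.168.1.1
```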
