failover Archives - AWS Managed Services by Anchor

DNS records and DNS management – an overview

Categories: General, Technical

Arthur C. Clarke’s third law states that “Any sufficiently advanced technology is indistinguishable from magic”. That is a fair description of the elation you feel when, after hours of stumbling around in the dark, you finally fluke the right DNS configuration change and BAM! Your blog, website, mail server or load balancer suddenly springs into life. Well, that’s true for me anyway! DNS (the Domain Name System) can be a complicated beastie if you’re not working with it day in and day out. It is, however, an essential building block of the Internet, and if you have ever tried to get a website online you’ll have had to muck about with DNS records at some point (and probably will again!). The role…


Extending Redis to scratch an itch

Categories: Technical

Redis has become one of the most popular “NoSQL” datastores in recent times, and for good reason. Customers love it because it’s fast and fills a niche, and we love it because it’s well behaved and easy to manage. In case you’re not familiar with Redis, it’s a key-value datastore (not a database in the classic sense). The entire dataset is always kept in memory, so it’s stupendously fast, and durability (saving the data to disk) is optional. Data in Redis is minimally structured: there’s a small set of data types (strings, lists, sets, sorted sets and hashes), but no schema as in a traditional relational database. Thanks to some peculiarities in the way Redis is implemented, notably that commands are executed one at a time, it can offer atomic transactions that are difficult to achieve in normal database products. That’s not to say it’s perfect…

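The Redis excerpt’s point about atomicity can be illustrated with a toy model. The sketch below is not Redis and not its client API: `ToyKV`, `execute` and `transaction` are hypothetical names, and it simply assumes, as Redis does, that commands run one at a time on a single thread. Under that assumption a read-modify-write such as `INCR`, or a batch of queued commands in the spirit of `MULTI`/`EXEC`, can never be interleaved with another client’s command.

```python
class ToyKV:
    """Hypothetical, single-threaded in-memory key-value store (NOT Redis).

    Every command runs to completion before the next one starts, so no
    caller can ever observe a half-applied command.
    """

    def __init__(self):
        self._data = {}  # the whole dataset lives in memory; no persistence

    def execute(self, name, *args):
        # Dispatch one command by name, e.g. execute("INCR", "hits").
        return getattr(self, "_cmd_" + name.lower())(*args)

    def transaction(self, commands):
        # MULTI/EXEC-style batch: the queued commands run back to back.
        # With a single execution thread nothing can run in between them.
        return [self.execute(name, *args) for name, *args in commands]

    def _cmd_set(self, key, value):
        self._data[key] = value
        return "OK"

    def _cmd_get(self, key):
        return self._data.get(key)

    def _cmd_incr(self, key):
        # Read, increment and write back as one uninterruptible step.
        value = int(self._data.get(key, "0")) + 1
        self._data[key] = str(value)
        return value


store = ToyKV()
store.execute("SET", "hits", "10")
print(store.transaction([("INCR", "hits"), ("INCR", "hits"), ("GET", "hits")]))
# prints [11, 12, '12']
```

The toy ignores durability, the other data types and error handling; its only job is to show why serial execution makes these operations atomic without any explicit locking.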

Answers for DRBD time-travel issues

Categories: Technical

A little update on a DRBD problem we wrote about at the start of April, in which we lost a few months of data during a cluster failover. Linbit got in touch with us to offer assistance, and we were happy to be enlightened: we had a good idea of what had happened, but no idea why. It seems that a race condition was introduced in version 8.3.9, when the fence-peer script was changed to run asynchronously. The engineering team explained that if the connection is re-established while the script runs, the peer’s disk state can be overwritten with stale information. This was fixed in 8.3.11, and of course we’re running version 8.3.10 on the cluster in question. We’d like to thank Linbit for their assistance…

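To make the failure mode Linbit described more concrete, here is a deliberately simplified, deterministic replay of that kind of race. It is a hypothetical model, not DRBD’s code or state machine: `PeerView`, `fence_peer_async` and `replay` are invented names, and the disk-state strings are only borrowed from DRBD for flavour. The asynchronous handler makes its decision from the view at launch time, the connection comes back before the handler finishes, and its late write clobbers the fresh state. The guarded variant shows one generic remedy (discarding a result whose connection epoch has changed); it is not necessarily how 8.3.11 fixed the bug.

```python
# Hypothetical model of a stale-write race; this is NOT DRBD code.


class PeerView:
    """One node's record of its peer's disk state."""

    def __init__(self):
        self.peer_disk = "UpToDate"
        self.epoch = 0  # incremented whenever the connection state changes

    def connection_lost(self):
        self.epoch += 1
        self.peer_disk = "DUnknown"

    def reconnected(self, reported_state):
        self.epoch += 1
        self.peer_disk = reported_state  # fresh information from the handshake


def fence_peer_async(view, guarded):
    """Launch a hypothetical async fence handler; return its completion step."""
    started_epoch = view.epoch
    verdict = "Outdated"  # decided from the view as it was at launch time

    def complete():
        if guarded and view.epoch != started_epoch:
            return  # the connection changed meanwhile, so the verdict is stale
        view.peer_disk = verdict

    return complete


def replay(guarded):
    # One specific interleaving: the handler finishes only AFTER reconnection.
    view = PeerView()
    view.connection_lost()
    complete = fence_peer_async(view, guarded)
    view.reconnected("UpToDate")
    complete()
    return view.peer_disk


print(replay(guarded=False))  # prints Outdated (stale verdict overwrote fresh state)
print(replay(guarded=True))   # prints UpToDate
```

Real races are nondeterministic; fixing the interleaving by hand makes the bad ordering reproducible, which is exactly what makes this class of bug hard to spot in production.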

Holy time-travellin’ DRBD, batman!

Categories: Technical

Here at Anchor we’ve developed High-Availability (HA) systems for our customers to ensure they remain online in the event of catastrophic hardware failure. Most of our HA systems use DRBD, the Distributed Replicated Block Device; DRBD is like RAID-1 across a network. We’d like to share some notes on a recent issue in which a DRBD volume jumped into a time-warp and rolled back four months. If you run your own DRBD setup, you’ll want to know about this: the chances that you’ll hit the same problem are slim, but it’s not hard to avoid. We have a Nagios script that checks the health of our DRBD volumes; it was basically the go-to check_drbd script from Nagios Exchange. The script is meant to ensure that…

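The excerpt cuts off before describing what the check_drbd script missed, so the sketch below does not reproduce or fix that problem. It only illustrates what a DRBD health check of this kind typically does: parse the resource lines of `/proc/drbd` as printed by the 8.x series (for example `0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----`) and map them to the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). `check_drbd` and `main` are hypothetical names, and the choice of which states count as WARNING versus CRITICAL is an assumption, not the Nagios Exchange script’s policy.

```python
import re
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Matches DRBD 8.x resource lines such as:
#  0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
LINE_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<ro>\S+)\s+ds:(?P<ds>\S+)"
)


def check_drbd(proc_text):
    """Return (exit_code, message) for the text of /proc/drbd (hypothetical)."""
    resources = [m.groupdict() for m in map(LINE_RE.match, proc_text.splitlines()) if m]
    if not resources:
        return UNKNOWN, "UNKNOWN: no DRBD resources found"

    worst, problems = OK, []
    for res in resources:
        local_ds, _, peer_ds = res["ds"].partition("/")
        if res["cs"] != "Connected":
            # Assumed policy: resyncing is a warning, any other state is critical.
            level = WARNING if res["cs"].startswith("Sync") else CRITICAL
        elif local_ds != "UpToDate" or peer_ds != "UpToDate":
            level = CRITICAL  # connected, but one side holds stale data
        else:
            continue  # healthy resource
        worst = max(worst, level)
        problems.append(f"drbd{res['minor']} cs:{res['cs']} ds:{res['ds']}")

    if worst == OK:
        return OK, f"OK: {len(resources)} resource(s) Connected and UpToDate"
    label = "WARNING" if worst == WARNING else "CRITICAL"
    return worst, f"{label}: " + ", ".join(problems)


def main(path="/proc/drbd"):
    """Plugin entry point: print one status line and exit with its code."""
    try:
        with open(path) as fh:
            code, message = check_drbd(fh.read())
    except OSError as exc:
        code, message = UNKNOWN, f"UNKNOWN: cannot read {path}: {exc}"
    print(message)
    sys.exit(code)
```

Parsing a text argument rather than reading the file directly keeps the logic easy to test; a small wrapper script would call `main()` when the check is installed as a plugin.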