Flying with Redeye

Several months ago we talked about extending the functionality of Redis, the popular NoSQL key-value datastore. Since then we’ve taken things a bit further, and we think it’s worth sharing.

The Customer needs to store a lot of data in Redis: a few hundred GB at last check, and growing. Redis’ single-threaded nature means it can’t scale vertically without bound, though; a single instance can only process so many thousands of transactions per second before it burns up a whole CPU. The solution is to shard the data and scale out horizontally across more CPUs and more RAM, so those hundreds of GB now live on a cluster of about a dozen independent Redis instances.
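As a rough illustration of what sharding by key can look like, here is a minimal sketch. The hashing scheme is not described in this post, so the CRC32-modulo approach, the function name and the shard count of 12 are all assumptions made purely for the example.

```python
import zlib

# Assumption: "about a dozen" instances, so 12 is used for illustration only.
NUM_SHARDS = 12


def shard_for_key(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard index using a stable hash (hypothetical scheme)."""
    # zlib.crc32 gives the same result in every process, unlike the built-in
    # hash(), which is randomised per interpreter run for strings.
    return zlib.crc32(key.encode("utf-8")) % num_shards


if __name__ == "__main__":
    for key in ("user:1001", "session:abc", "cart:77"):
        print(key, "-> shard", shard_for_key(key))
```

Every client has to agree on the same function and the same shard count, which is one reason a scheme like this makes shard boundaries awkward to change later.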

Shard boundaries can’t be easily adjusted, so we created a large number of shards initially and let them fill up. To make efficient use of hardware, one server with many CPUs and gigabytes of RAM plays host to as many Redis shards as it can manage. As shards fill up they can be individually migrated to new empty servers; the CONFIG SET BIND feature that we added last time makes this possible.
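The idea of fixing the shard count up front and only moving whole shards between servers can be sketched as a simple placement table. This is an illustrative model only, not how Redeye stores its state; the class name, hostnames and ports are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Cluster:
    """Hypothetical placement table: shard index -> (host, port)."""

    placement: Dict[int, Tuple[str, int]] = field(default_factory=dict)

    def shards_on(self, host: str) -> List[int]:
        """Return the shard indexes currently hosted on `host`."""
        return sorted(s for s, (h, _) in self.placement.items() if h == host)

    def migrate(self, shard: int, new_host: str, new_port: int) -> None:
        """Repoint one shard at a new server.

        Only the shard's location changes; which keys belong to the shard
        stays exactly the same, so no data is re-partitioned.
        """
        if shard not in self.placement:
            raise KeyError(f"unknown shard {shard}")
        self.placement[shard] = (new_host, new_port)


if __name__ == "__main__":
    # Four shards packed onto one (hypothetical) server to start with.
    cluster = Cluster({i: ("redis01", 6379 + i) for i in range(4)})
    # Shard 2 has filled up, so it moves to a new, empty server.
    cluster.migrate(2, "redis02", 6379)
    print(cluster.shards_on("redis01"))
    print(cluster.shards_on("redis02"))
```

Because the number of shards never changes, growing the cluster is only a matter of changing where a shard lives.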

The migration process is fairly involved and needs to be well-coordinated. It’d really be better if the computer could do it for us, as it’s faster and doesn’t make mistakes. That brings us to Redeye: a little set of tools that automates the tedious aspects of SSHing around the cluster and firing off commands.

Redeye brings all the management functions to your workstation, avoiding the need to open SSH sessions to all the servers.

Redeye’s design is intentionally simple; complex systems tend to have more serious failure modes, and are harder to diagnose and debug in the event of a problem. We went for a small set of guiding principles:

  • Keep it simple; we’re avoiding big tools like Pacemaker for a good reason.
  • Don’t try to do too much; leave it to the human to handle the big picture. Redeye just needs to remove the tedium of the procedure.
  • Play nice with our config management system (Puppet): it’s okay to do things a bit differently, but don’t obviate the whole thing for Redeye’s sake.
  • Make liberal use of hook scripts for each stage of the procedure. This localises any failures and makes them easy to fix if there’s a problem.
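To show why per-stage hook scripts help localise failures, here is a minimal sketch of a hook runner. Redeye’s code isn’t published, so this is not its implementation; the function name and the stage names in the usage example are hypothetical.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple


def run_stages(stages: Sequence[Tuple[str, List[str]]]) -> List[str]:
    """Run (stage name, command) hooks in order, stopping at the first failure.

    Returns the names of the stages that completed successfully, so the
    operator can see exactly where the procedure stopped.
    """
    completed = []
    for name, argv in stages:
        result = subprocess.run(argv)
        if result.returncode != 0:
            print(
                f"stage {name!r} failed with exit code {result.returncode}",
                file=sys.stderr,
            )
            break
        completed.append(name)
    return completed


if __name__ == "__main__":
    # Hypothetical stages; real hooks would be standalone scripts on disk.
    ok = [sys.executable, "-c", "pass"]
    fail = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(run_stages([("pre-migrate", ok), ("sync", fail), ("switch-over", ok)]))
```

Each hook is an ordinary executable, so a failed stage can be rerun or fixed by hand without touching the rest of the procedure.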

Once a new node is provisioned and added to the cluster, migrating Redis instances to it is a simple matter of running a sequence of redeye commands from your workstation. Redeye makes generous use of terminal colour codes to alert the operator to any unexpected situations.

We’re not releasing the code for Redeye at this time as it’s very specific to the way we deploy Redis here at Anchor, but we’re more than happy to answer any questions. If you’re interested in the improvements to allow live migrations of Redis instances, these can be found linked from the previous post.

Enjoy juggling large amounts of mission-critical data? We’re hiring.