Flying with Redeye

By March 7, 2013 Technical

Several months ago we talked about extending the functionality of Redis, the popular NoSQL key-value datastore. Since then we’ve taken things a bit further, and we think it’s worth sharing.

The customer needs to store a lot of data in Redis: a few hundred GB at last check, and growing. Redis’ single-threaded nature means it doesn’t scale vertically without bounds, though; you can only process so many thousands of transactions per second before you burn up a whole CPU. The solution is to shard your data and scale out horizontally across more CPUs and more RAM, so those hundreds of GB now live on a cluster of about a dozen independent Redis instances.

Shard boundaries can’t be easily adjusted, so we created a large number of shards initially and let them fill up. To make efficient use of hardware, one server with many CPUs and gigabytes of RAM plays host to as many Redis shards as it can manage. As shards fill up they can be individually migrated to new, empty servers; the CONFIG SET BIND feature that we added last time makes this possible.
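
To make the idea concrete, here is a minimal Python sketch of a fixed shard layout. It is an illustration only, not our production code: the CRC32 hash, the shard count of 12, the hostnames and the port numbers are all assumptions made for the example. The point it shows is that a key always maps to the same shard, so moving a shard to a new server only changes where that shard lives.

```python
import zlib

# Assumed figure for illustration; fixed up front because shard
# boundaries can't easily be adjusted later.
NUM_SHARDS = 12

# shard number -> (host, port). Hypothetical names: every shard starts
# out on the same server, each on its own port.
shard_location = {n: ("redis-a.example", 7000 + n) for n in range(NUM_SHARDS)}


def shard_for(key: str) -> int:
    """Map a key to a shard number; stable as long as NUM_SHARDS is fixed."""
    return zlib.crc32(key.encode("utf-8")) % NUM_SHARDS


def locate(key: str) -> tuple:
    """Return the (host, port) currently serving the shard that owns key."""
    return shard_location[shard_for(key)]


def migrate_shard(shard: int, new_host: str) -> None:
    """Record that a shard now lives on new_host; key mapping is untouched."""
    _, port = shard_location[shard]
    shard_location[shard] = (new_host, port)
```

After `migrate_shard(shard_for("user:42"), "redis-b.example")`, the key `user:42` still belongs to the same shard; only the host returned by `locate` changes.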

The migration process is fairly involved and needs to be well coordinated. It’d really be better if the computer could do it for us, as it’s faster and doesn’t make mistakes. That brings us to Redeye: a little set of tools that automates the tedious aspects of SSHing around the cluster and firing off commands.

Redeye brings all the management functions to your workstation, avoiding the need to open SSH sessions to all the servers.

Redeye’s design is intentionally simple; complex systems tend to have more serious failure modes, and are harder to diagnose and debug in the event of a problem. We went for a small set of guiding principles:

  • Keep it simple; we’re avoiding big tools like Pacemaker for a good reason.
  • Don’t try to do too much; leave it to the human to handle the big picture. Redeye just needs to remove the tedium of the procedure.
  • Play nice with our config management system (Puppet): it’s okay to do things a bit differently, but don’t obviate the whole thing for Redeye’s sake.
  • Make liberal use of hook scripts for each stage of the procedure. This localises any failures and makes them easy to fix if there’s a problem.
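
The hook-script principle can be sketched roughly as follows. This is not Redeye’s code; the stage names, the `<stage>.d` directory layout and the function names are all hypothetical, chosen only to show how per-stage hooks pin a failure down to one small script.

```python
import os
import subprocess
from pathlib import Path

# Hypothetical stage names; Redeye's real stages are not listed here.
STAGES = ["pre-migrate", "migrate", "post-migrate"]


def hooks_for(hook_root, stage):
    """Return the executable hook scripts for one stage, sorted by name."""
    stage_dir = Path(hook_root) / f"{stage}.d"
    if not stage_dir.is_dir():
        return []
    return sorted(
        p for p in stage_dir.iterdir()
        if p.is_file() and os.access(p, os.X_OK)
    )


def run_stages(hook_root, stages=STAGES, run=subprocess.run):
    """Run hooks stage by stage and stop at the first one that fails.

    Returns (True, None) if everything succeeded, otherwise
    (False, (stage, script_name)) so the operator knows which hook to fix.
    """
    for stage in stages:
        for script in hooks_for(hook_root, stage):
            result = run([str(script)])
            if result.returncode != 0:
                return False, (stage, script.name)
    return True, None
```

Because the run stops at the first non-zero exit status, a failure is reported as a single stage and script name rather than as a half-finished procedure of unknown state.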

Once a new node is provisioned and added to the cluster, migrating Redis instances to it is a simple matter of running a sequence of redeye commands from your workstation. Redeye makes generous use of terminal colour codes to alert the operator to any unexpected situations.
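
For readers unfamiliar with terminal colour codes, here is a small sketch of the general technique using standard ANSI escape sequences. The level names and message format are made up for the example and are not Redeye’s actual output.

```python
import sys

RESET = "\033[0m"
# Standard ANSI foreground colours: green, yellow, red.
COLOURS = {"ok": "\033[32m", "warn": "\033[33m", "error": "\033[31m"}


def colourise(level, message, enabled=True):
    """Wrap a message in the colour for its level, or leave it plain."""
    text = f"[{level}] {message}"
    if not enabled:
        return text
    return f"{COLOURS[level]}{text}{RESET}"


def report(level, message, stream=sys.stderr):
    """Write a message, using colour only when the stream is a terminal."""
    stream.write(colourise(level, message, enabled=stream.isatty()) + "\n")
```

Checking `isatty()` keeps escape sequences out of log files and pipes while still making problems stand out on an operator’s screen.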

We’re not releasing the code for Redeye at this time as it’s very specific to the way we deploy Redis here at Anchor, but we’re more than happy to answer any questions. If you’re interested in the improvements to allow live migrations of Redis instances, these can be found linked from the previous post.

Enjoy juggling large amounts of mission-critical data? We’re hiring.
