Extending Redis to scratch an itch
Redis has become one of the most popular “noSQL” datastores in recent times, and for good reason. Customers love it because it’s fast and fills a niche, and we love it because it’s well behaved and easy to manage.
In case you’re not familiar with Redis, it’s a key-value datastore (not a database in the classic sense). The entire dataset is always kept in memory, so it’s stupendously fast. Durability (saving the data to disk) is optional. Data in Redis is minimally structured; there’s a small set of data types, but there’s no schema as in a traditional relational database. Because Redis executes commands one at a time on a single thread, it can offer atomic transactions that are difficult to achieve in normal database products.
That’s not to say it’s perfect though. One of our larger customers uses Redis extensively and we’ve run into some limitations that just aren’t cool when you’re trying to juggle hundreds of gigabytes of data without dropping any of it.
Redis is still the best software out there for what they’re doing, so we set about making it work for us.
We want to have our cake and eat it too – that means we need data to be safely on disk while enjoying the blinding speed that Redis provides. Let’s talk about data persistence when it comes to Redis.
The default Redis config will periodically perform an “RDB dump”, a full copy of the dataset at a point in time, straight to disk. This is great for small amounts of data, and convenient for quick restarts. However, because it’s a point-in-time snapshot of the dataset, you’ll lose any subsequent changes in the event of a crash.
The alternative is AOF logging (“Append-Only File”), which appends every Redis write command to an ever-growing logfile. In the event of a crash, Redis replays the whole log from scratch to recreate the dataset. This means you don’t lose any data if the server crashes, but the replay is slow and the AOF will keep getting bigger unless you prune it periodically.
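To make the log-and-replay idea concrete, here is a toy sketch in Python. It is not how Redis stores its AOF (the real file uses the Redis wire protocol); the TinyStore class and its one-command-per-line format are invented purely for illustration.

```python
import os


class TinyStore:
    """Toy key-value store that logs every write to an append-only file."""

    def __init__(self, aof_path):
        self.aof_path = aof_path
        self.data = {}
        self._replay()  # rebuild the dataset before accepting new writes
        self._log = open(aof_path, "a", encoding="utf-8")

    def _replay(self):
        """Re-run every logged command from the start of the file."""
        if not os.path.exists(self.aof_path):
            return
        with open(self.aof_path, encoding="utf-8") as log:
            for line in log:
                self._apply(line.rstrip("\n").split(" ", 2))

    def _apply(self, parts):
        if parts[0] == "SET":
            self.data[parts[1]] = parts[2]
        elif parts[0] == "DEL":
            self.data.pop(parts[1], None)

    def execute(self, *parts):
        """Log the command durably, then apply it in memory."""
        self._log.write(" ".join(parts) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        self._apply(list(parts))

    def close(self):
        self._log.close()
```

Every write costs an fsync and the file only ever grows, which is exactly the trade-off described above: nothing is lost on a crash, but startup has to chew through the whole history.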
Both RDB and AOF persistence have their uses, and you can use both at the same time. That’s well and good, but there are a few specific things that we care about:
- No data loss in the event of a crash
- Reliable backups
- Fast startup (important for high availability)
- Solid manageability from a high-availability standpoint
We set about solving these problems, one by one. The first one, at least, is solved by making use of AOF persistence.
Dynamic listen address specification
One of the first improvements we made was allowing a graceful failover to a replication slave. Redis supports replication out of the box, and it works great, but you can’t readily promote a slave to master or swap one in for a failed master. This is because the IP address that Redis listens on is set at startup, forcing you to put some sort of load-balancer or proxy in front. That’s really unnecessary when all you want to do is fail over a single Redis instance.
So we fixed it: you can now change the listening IP address for a running Redis instance, allowing a failover or migration with a couple of seconds of downtime. The alternative is to restart the daemon, which takes a long time when you have a large dataset.
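The general technique of moving a listener without restarting the process can be sketched in a few lines of Python. This is only an illustration of the idea, not the Redis patch itself; the RebindableListener class and its method names are hypothetical.

```python
import socket


class RebindableListener:
    """Holds a listening socket that can be moved to a new address at runtime."""

    def __init__(self, host, port):
        self.sock = self._listen(host, port)

    @staticmethod
    def _listen(host, port):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((host, port))
            sock.listen(128)
        except OSError:
            sock.close()
            raise
        return sock

    def rebind(self, host, port):
        """Start listening on a new address, then stop listening on the old one."""
        # Open the new listener first so a bad address leaves the old one intact.
        new_sock = self._listen(host, port)
        old_sock, self.sock = self.sock, new_sock
        # Closing the listener does not touch already-accepted client sockets.
        old_sock.close()

    @property
    def address(self):
        return self.sock.getsockname()
```

Opening the new socket before closing the old one means a typo in the address can’t leave the server deaf. The catch is that it fails if the old and new addresses overlap (for example, moving from the wildcard address to a specific IP on the same port), in which case the old listener would have to be closed first.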
Backup dumps to an external command pipe
Regular RDB backups are convenient because they’re dumped straight to a self-contained file on disk. That’s fine for standard nightly backups, but getting them off the machine is more hassle, which matters if you want a quick recovery when the server goes up in smoke. You also incur a heavy I/O penalty as everything is written to disk, only to be pulled off to a remote server a short time later.
We developed the PIPESAVE command to work around this; instead of dumping to disk, the Redis server runs a command specified in the config file and shovels bits down the pipe. This is great for offsite backups to Amazon S3, etc.
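The plumbing behind this kind of command is ordinary process piping. Here is a rough Python sketch of the idea, assuming the dump is available as a stream of byte chunks; the pipe_dump function is hypothetical and stands in for what the server does internally.

```python
import subprocess


def pipe_dump(chunks, command):
    """Stream an iterable of byte chunks into the stdin of an external command."""
    proc = subprocess.Popen(command, stdin=subprocess.PIPE)
    broken = False
    try:
        with proc.stdin as pipe:  # closing stdin signals end-of-dump
            for chunk in chunks:
                pipe.write(chunk)
    except BrokenPipeError:
        broken = True  # the command went away before we finished
    status = proc.wait()
    if broken or status != 0:
        raise RuntimeError(f"backup command failed (exit status {status})")
```

The command would be whatever uploader you like, given as an argument list. Nothing touches the local disk, and a non-zero exit status from the command is treated as a failed backup.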
In-band RDB dumps to the client
PIPESAVE is pretty cool, but it’s a “push”-based backup solution. Real backup systems prefer to “pull” data from the server, so they can manage scheduling themselves. A backup system would ideally fetch an RDB dump straight from the Redis server, across the network, without the need for an intermediary file.
That’s what our DUMPSAVE command does. After making an ordinary Redis connection, the Redis client can request a DUMPSAVE using the standard Redis protocol. The Redis server will then switch to a raw, non-protocol mode and pump an RDB dump straight down the wire to the client, which can write it directly to disk.
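A backup client for this can be very small. The Python sketch below sends the command in the standard Redis request format and then copies raw bytes to a file. How the server marks the end of the dump isn’t described here, so the sketch assumes the server closes the connection when it is done; that assumption, and the fetch_dump function itself, are ours for illustration.

```python
import socket


def fetch_dump(host, port, out_path, command=b"DUMPSAVE", timeout=30.0):
    """Send one command, then copy everything the server sends into out_path."""
    # Standard Redis request framing: an array holding one bulk string.
    request = b"*1\r\n$%d\r\n%s\r\n" % (len(command), command)
    total = 0
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(request)
        with open(out_path, "wb") as out:
            while True:
                chunk = conn.recv(65536)
                if not chunk:  # ASSUMPTION: end of stream marks end of dump
                    break
                out.write(chunk)
                total += len(chunk)
    return total
```

A backup server would call this on its own schedule and then sanity-check the file (for instance, that it is non-empty and starts with the RDB magic string) before rotating older copies out.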
We’ve found that this makes for a fantastic backup solution that can be driven by the backup server, on whatever schedule we find suitable. You can find both the pipesave and dumpsave enhancements in one of our topic branches on GitHub.
AOF background rewrite monitoring
AOF files will grow forever, but thankfully Redis comes with functionality to consolidate an AOF file into a minimal set of commands needed to reconstitute the dataset. Redis does this in a background rewrite process to avoid holding up the master process.
This is very good, but there’s no way to know when a rewrite was last performed. The Redis server notes when the last RDB dump was performed, but there’s no equivalent for AOF rewrites. Until now, that is. Our Nagios monitoring checks the time of the last AOF rewrite and notifies us if it’s been too long since the last clean rewrite.
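The decision logic of such a check is simple enough to sketch in Python. How the timestamp is read from the server is left out, since the name of the field our patch exposes isn’t given here; check_rewrite_age and its thresholds are hypothetical. The exit codes are the standard Nagios plugin ones.

```python
import time

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes


def check_rewrite_age(last_rewrite, now=None, warn_after=86400, crit_after=172800):
    """Return (exit_code, message) for the age of the last AOF rewrite.

    last_rewrite is a Unix timestamp, or None if the server reported nothing.
    The one-day and two-day thresholds are illustrative defaults only.
    """
    if last_rewrite is None:
        return UNKNOWN, "UNKNOWN: server reported no AOF rewrite time"
    now = time.time() if now is None else now
    age = int(now - last_rewrite)
    if age >= crit_after:
        return CRITICAL, f"CRITICAL: last AOF rewrite {age}s ago"
    if age >= warn_after:
        return WARNING, f"WARNING: last AOF rewrite {age}s ago"
    return OK, f"OK: last AOF rewrite {age}s ago"
```

A plugin wrapper would print the message and exit with the code, which is all Nagios needs to raise an alert.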
Truncate short AOF writes
It’s possible for an AOF write to bomb out and not write a full record to the end of the file (e.g. if the disk is full). The official docs helpfully mention that the redis-check-aof command can fix this for you, but that takes a long time when your AOF file is massive, and really isn’t fun when you can’t restart Redis because of a few lousy bytes at the end of the file.
This really shouldn’t be necessary, so we fixed it. The background rewrite will now try to clean up after itself if it gets in a pickle.
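To show what “a few lousy bytes” looks like in practice, here is a Python sketch that finds where the last complete command ends and chops off anything after it. It relies on the AOF being a sequence of commands in the Redis wire format (an argument count, then length-prefixed arguments). It is an illustration rather than our patch: the function names are made up, and it reads the whole file into memory, which a real tool for huge files would avoid.

```python
def _read_count(data, pos, marker):
    """Parse b'<marker><digits>\\r\\n' at pos; return (value, next_pos) or None."""
    if data[pos:pos + 1] != marker:
        return None
    eol = data.find(b"\r\n", pos)
    if eol == -1:
        return None
    digits = data[pos + 1:eol]
    if not digits.isdigit():
        return None
    return int(digits), eol + 2


def _parse_command(data, pos):
    """Return the offset just past one complete command, or None if cut short."""
    header = _read_count(data, pos, b"*")
    if header is None:
        return None
    argc, pos = header
    for _ in range(argc):
        bulk = _read_count(data, pos, b"$")
        if bulk is None:
            return None
        length, pos = bulk
        end = pos + length
        if data[end:end + 2] != b"\r\n":
            return None
        pos = end + 2
    return pos


def complete_prefix_length(data):
    """Return how many leading bytes of data form whole commands."""
    pos = 0
    while pos < len(data):
        end = _parse_command(data, pos)
        if end is None:
            break
        pos = end
    return pos


def truncate_short_write(path):
    """Cut a trailing partial command off the file; return the bytes removed."""
    with open(path, "r+b") as aof:
        data = aof.read()
        keep = complete_prefix_length(data)
        aof.truncate(keep)
    return len(data) - keep
```

Note that this stops at the first thing it can’t parse, so it is only safe for damage at the very end of the file. Corruption in the middle needs a human to look at it.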
Retry writes to the AOF if we run out of space
AOF files use a lot of disk space. That’s okay because disk is cheap. However, a background rewrite of an AOF file can cause a huge spike in disk usage, and it’s possible for that to get out of hand before anyone notices.
In the event that a background rewrite fails due to running out of disk space, Redis will crash hard. There’s a certain cruel irony in the fact that the background rewrite process, designed to stop you from running out of disk space due to an ever-growing AOF file, itself causes an out-of-space condition and kills the Redis server.
We’ve fixed this by making the usual AOF logger a bit more relaxed. If we run out of disk space, the background rewrite process will eventually die, but the main AOF logger will retry the write in the hope that it might eventually succeed.
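The retry idea looks roughly like this in Python. It is a sketch of the general pattern only: write_with_retry, its retry count and its delay are hypothetical, and a real logger would also have to cope with writes that only partly succeed.

```python
import errno
import time


def write_with_retry(write, payload, retries=5, delay=1.0, sleep=time.sleep):
    """Call write(payload), retrying while the error is 'no space left on device'.

    Any other error, or running out of retries, is raised to the caller.
    """
    for attempt in range(retries + 1):
        try:
            return write(payload)
        except OSError as exc:
            if exc.errno != errno.ENOSPC or attempt == retries:
                raise
            sleep(delay)  # give the failed rewrite time to die and free its space
```

The pause is what makes this work: once the failed rewrite’s temporary file is gone, there is room again and the next attempt goes through.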
Redis is a very nifty piece of software. And, thanks to it being well written, we’ve been able to extend and improve the functionality to meet our needs and make it more robust.
With these improvements, we now have less downtime in the event of an outage, less chance of an outage to begin with, faster failovers on HA systems, and backups that are timely and produce minimal extra load on the Redis server. That’s pretty cool.