Today we’re talking about our experience with btrfs, the next-gen Linux filesystem. btrfs has been maturing rapidly over the last few years and offers many compelling features for modern systems, so we’ve been putting it through its paces on some of our backup servers.
How does it stack up? Read on!
We chose to test btrfs on backup servers because they can make good use of the features on offer, and the impact of any data loss there is low. For our backups, the biggest benefit comes from copy-on-write and atomic snapshots.
At Anchor we use a modified version of Dirvish with support for btrfs: instead of hardlinking directories to provide historical snapshots we just use btrfs’ snapshot facility, which is very quick. Expiring old snapshots is similarly quick – it’s a btrfs operation instead of traversing a filesystem tree.
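To make the snapshot-and-expire idea concrete, here is a minimal sketch rather than our actual Dirvish patch. It only builds the `btrfs subvolume snapshot` and `btrfs subvolume delete` command lines. The directory layout, the date-based snapshot names and the `keep_days` retention parameter are all assumptions made for illustration.

```python
from __future__ import annotations

from datetime import date, timedelta
from pathlib import Path


def snapshot_cmd(live: Path, snap_root: Path, day: date) -> list[str]:
    """Build the command for a read-only snapshot of `live`, named after `day`.

    Assumes `live` is a btrfs subvolume and `snap_root` is a directory on
    the same filesystem (hypothetical layout).
    """
    target = snap_root / day.isoformat()
    return ["btrfs", "subvolume", "snapshot", "-r", str(live), str(target)]


def expire_cmds(snap_root: Path, existing: list[date], today: date,
                keep_days: int) -> list[list[str]]:
    """Build one delete command per snapshot older than `keep_days` days."""
    cutoff = today - timedelta(days=keep_days)
    return [
        ["btrfs", "subvolume", "delete", str(snap_root / d.isoformat())]
        for d in sorted(existing)
        if d < cutoff
    ]
```

Each returned list can be handed to `subprocess.run(cmd, check=True)` as root. Expiry is a single subvolume delete per snapshot, with no need to walk a tree of hardlinks.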
In general btrfs has been a very positive experience. We’re looking forward to btrfs getting dedupe support in future, something that ZFS already does, which could pay off massively in our environment.
However, in recent weeks it looks like we’ve reached a sort of critical mass, and a few near-showstoppers have cropped up.
Each backup server hosts around 100 to 150 backup clients. Every night at midnight they swing into action, creating dozens of snapshots at once and hammering the network for all it’s worth. The correlation isn’t perfect, but we’ve regularly observed hung tasks coinciding with snapshot creation. When this happens it blocks I/O to the filesystem, which makes for decidedly ineffective backups.
Quota groups (qgroups) are a relatively new feature, added about six months ago in version 3.6 of the kernel. As well as enabling policy-based usage restrictions, quota groups allow for detailed usage reporting, which is very useful for the usage-based billing that we do. We believe we’ve stumbled upon a bug in the qgroups code that causes CPU soft lockups.
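As a rough illustration of the usage-reporting side, the sketch below parses the kind of table that `btrfs qgroup show` prints into per-qgroup byte counts. The sample text, including its column layout and raw byte values, is an assumption: the real output varies between btrfs-progs versions, so treat the parser as a starting point rather than a description of our billing tooling.

```python
from __future__ import annotations

from typing import NamedTuple

# Hypothetical sample of `btrfs qgroup show` output with raw byte counts.
SAMPLE = """\
qgroupid         rfer         excl
--------         ----         ----
0/5             16384        16384
0/257      1073741824     52428800
"""


class QgroupUsage(NamedTuple):
    qgroupid: str    # "level/id", e.g. "0/257"
    referenced: int  # bytes reachable from this qgroup
    exclusive: int   # bytes referenced by this qgroup alone


def parse_qgroup_show(text: str) -> list[QgroupUsage]:
    """Extract usage rows, skipping the header and separator lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or "/" not in fields[0]:
            continue  # header or blank line
        if not (fields[1].isdigit() and fields[2].isdigit()):
            continue  # separator line, or human-readable sizes we don't handle
        rows.append(QgroupUsage(fields[0], int(fields[1]), int(fields[2])))
    return rows
```

The referenced figure counts everything a subvolume can reach, shared or not, while the exclusive figure counts only data that no other subvolume shares. For snapshots, the exclusive figure is roughly the space you would get back by deleting that snapshot.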
CPU soft lockups are basically unrecoverable, so we’re forced to reboot the system. As best we can tell, this has the knock-on effect of corrupting btrfs’ free space cache, which then needs a time-intensive rebuild after the reboot (over an hour on our 16TB filesystems). If we skip the rebuild, btrfs detects the problem some hours later and forces the filesystem into read-only mode for safety. We haven’t pinned the problem on the qgroups code with absolute certainty, but our investigations are pointing in that direction.
The final lesson for today relates to full filesystems: you never, ever want to fill up a btrfs filesystem. Traditional filesystems (e.g. ext3) behave fairly predictably when they fill up or get close to capacity. Your system might behave erratically, but by and large it’s easy enough to fix.
In the one instance where we filled up a btrfs filesystem, thanks to some misplaced rsync options, it slowed everything down to the point of being unusable. It wasn’t practical to diagnose the exact reason for the slowdown, but suffice it to say that if you can’t even navigate the filesystem to fix the error then it’s a big problem. Avoiding a recurrence was actually a key motivator for enabling qgroups when they appeared, but it didn’t quite go to plan.
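One cheap safeguard, independent of qgroups, is a pre-flight check that refuses to start a backup run when free space is already low. The sketch below is only an illustration: the 10% threshold and the function names are assumptions, and the generic free-space figure it relies on can be optimistic on btrfs because of how data and metadata chunks are allocated, so it’s worth leaving a generous margin.

```python
import shutil


def has_headroom(total: int, free: int, min_free_fraction: float = 0.10) -> bool:
    """Return True if at least `min_free_fraction` of the space is free."""
    if total <= 0:
        raise ValueError("total must be positive")
    return free / total >= min_free_fraction


def path_has_headroom(path: str, min_free_fraction: float = 0.10) -> bool:
    """Apply the same check to the filesystem containing `path`."""
    usage = shutil.disk_usage(path)
    return has_headroom(usage.total, usage.free, min_free_fraction)
```

A wrapper script could call `path_has_headroom` on the backup mount point and skip the rsync run, raising an alert instead, when it returns False.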
Summary: it’s not all doom and gloom
This probably all sounds very critical, but we think the reality is actually quite positive:
- Staff have been using btrfs on workstations and personal servers, and it works great
- We’ve used btrfs in conjunction with Ceph and had no problems
- Zero incidents of data loss or corruption in our testing so far
- btrfs has actually detected single-bit corruption in an underlying hardware RAID volume – that’s a win for data integrity!
- btrfs has fairly gracefully managed thousands of snapshots on each of our backup servers
In the short term we’ll be pushing most of our systems back to ext4 with hardlinking, while keeping an eye on btrfs and ZFS for backups.
Development is very active, with new features and bugfixes appearing in every kernel release. Many see btrfs as the future for Linux beyond ext4, and we think it’s worth trying if you haven’t already – it probably won’t be too long until it’s the default recommended filesystem for a distro like Fedora or Ubuntu.