I wrote some documentation for our sysadmins last week detailing how to deal with a critical disk space notification at some ungodly hour of the morning. On the specifics of checking filesystems with the df tool:
“Astute readers will notice that we don’t query btrfs filesystems here; this is because btrfs uses extents, and inodes are a non-issue.”
Well, I wasn’t entirely wrong, but I wasn’t entirely right either.
btrfs is a modern filesystem with lots of shiny new features. It’s definitely not production-ready yet, but like magpies drawn to shiny things, a couple of us use btrfs on our own machines (it’s what backups are for, right?).
Some time ago I wrote about how an ext filesystem can run out of free inodes and bite you. That happened to me last Thursday, only this time it was btrfs under the hood.
I first noticed the problem when puppet wouldn’t run, saying there was already another instance running. puppet is dumber than a bag of rocks so I pressed on, trying to run aptitude update instead.
root@misaka:~# aptitude update
E: Write error - write (28: No space left on device)
O rly? df disagreed about that. I immediately thought of inode exhaustion, but btrfs isn’t meant to suffer from this problem! To prove it, I touched a few files, successfully wrote some bits, deleted them again – all good.
Their curiosity piqued, my fellow sysadmins cracked open the strace and confirmed what we knew: ENOSPC from the write() call. We were at a loss until someone serendipitously spotted some errors in the syslog:
Feb 2 19:09:31 misaka kernel: [683642.593034] no space left, need 4096, 10694656 delalloc bytes, 696373248 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 707067904 total
Feb 2 19:09:55 misaka kernel: [683666.684247] no space left, need 4096, 6905856 delalloc bytes, 700162048 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 707067904 total
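Adding up the counters in those two messages is telling. Here is a minimal Python sketch of the arithmetic, assuming that delalloc bytes (data waiting on delayed allocation) are charged against the same pool as bytes_used. That reading of the counters is an assumption, not something the kernel message spells out, and the `headroom` helper and its regex are purely illustrative.

```python
import re

# The two kernel messages from the syslog, trimmed down to the counters.
LOG_LINES = [
    "no space left, need 4096, 10694656 delalloc bytes, 696373248 bytes_used, "
    "0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 707067904 total",
    "no space left, need 4096, 6905856 delalloc bytes, 700162048 bytes_used, "
    "0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 707067904 total",
]

# Pull out only the fields used below; the zero-valued counters are skipped.
PATTERN = re.compile(
    r"need (?P<need>\d+), (?P<delalloc>\d+) delalloc bytes, "
    r"(?P<used>\d+) bytes_used, .* (?P<total>\d+) total"
)


def headroom(line):
    """Return (bytes_left, bytes_needed) for one log line.

    bytes_left is total minus bytes_used minus delalloc, which is an
    assumed interpretation of how the counters relate to each other.
    """
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("unrecognised log line")
    fields = {name: int(value) for name, value in match.groupdict().items()}
    left = fields["total"] - fields["used"] - fields["delalloc"]
    return left, fields["need"]


for line in LOG_LINES:
    left, need = headroom(line)
    print(f"{left} bytes left, {need} needed")
```

Both lines come out at 0 bytes left against the 4096 needed, which squares with the ENOSPC from write().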
A little googling produced a promising bug ticket on Red Hat’s Bugzilla, “[btrfs] hopeless ENOSPC handling and excessive administration costs”.
The short version for our specific scenario is: df doesn’t expose some exhaustion issues because btrfs doesn’t work like a classic filesystem.
This is where you can start moaning about how btrfs is FitH if you’re so inclined, but I like playing with my shiny toys, thank you.
btrfs has its own version of df for inspecting the filesystem:
root@misaka:~# btrfs filesystem df /var
Metadata, DUP: total=95.12MB, used=15.16MB
System, DUP: total=8.00MB, used=4.00KB
Data: total=674.31MB, used=665.52MB <-- Under 10MB free!!
Metadata: total=8.00MB, used=0.00
System: total=4.00MB, used=0.00
This would explain why I could create files myself, but stuff like aptitude was failing when it tried to write more than several MB. You'll also notice that there's a lot of allocated-but-unused metadata space in the first line of output.
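If you'd rather not do the subtraction by eye at 3am, here is a rough Python sketch that parses that output and reports the headroom per chunk type. It assumes the "MB" and "KB" suffixes are really 1024-based, which seems to hold here: the 707067904 bytes total in the kernel log works out to 674.31 of those units, matching the Data line. The function names and the regex are hypothetical, and the output format of btrfs filesystem df may differ between versions of the tools.

```python
import re

# Output of `btrfs filesystem df /var` from above, with my annotation removed.
BTRFS_DF = """\
Metadata, DUP: total=95.12MB, used=15.16MB
System, DUP: total=8.00MB, used=4.00KB
Data: total=674.31MB, used=665.52MB
Metadata: total=8.00MB, used=0.00
System: total=4.00MB, used=0.00
"""

# Assumption: suffixes are powers of 1024, despite being spelled KB/MB/GB.
UNITS = {"": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

LINE = re.compile(
    r"^(?P<kind>[^:]+): total=(?P<total>[\d.]+)(?P<tunit>[KMGT]B)?, "
    r"used=(?P<used>[\d.]+)(?P<uunit>[KMGT]B)?"
)


def to_bytes(number, unit):
    """Convert a value such as ('4.00', 'KB') to bytes; a missing unit means bytes."""
    return float(number) * UNITS[unit or ""]


def free_per_chunk_type(text):
    """Map each chunk type (e.g. 'Data') to allocated-but-unused bytes."""
    free = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match is None:
            continue  # skip prompts, blank lines, anything unexpected
        total = to_bytes(match["total"], match["tunit"])
        used = to_bytes(match["used"], match["uunit"])
        free[match["kind"]] = total - used
    return free


for kind, free_bytes in free_per_chunk_type(BTRFS_DF).items():
    print(f"{kind}: {free_bytes / 1024**2:.2f}MB free")
```

On the numbers above, Data comes out at about 8.79MB free while the DUP metadata is sitting on nearly 80MB it isn't using.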
We have a tool to fix this, and unlike btrfsck it's actually usable. We can rebalance the filesystem to adjust the proportion reserved for data. Some commenters on the Bugzilla ticket noted that it caused a kernel panic when they ran it, but that was two years ago. It's probably fixed by now...
root@misaka:~# btrfs filesystem balance /var
# Now when we run `df` again...
Metadata, DUP: total=47.56MB, used=15.20MB <--- Much less allocated
System, DUP: total=8.00MB, used=4.00KB
Data: total=745.38MB, used=665.52MB <--- Plenty of free space
Metadata: total=0.00, used=0.00
System: total=4.00MB, used=0.00
aptitude and puppet run fine now, so all is well. As a note, the rebalancing is (subjectively) not fast: it took 7-8 seconds on that 1GB filesystem.
To wrap things up, I thought I might extend that filesystem a bit, as some more breathing room would be good. The btrfs volume is on an LVM logical volume, so this is a pretty easy task.
- Extend the LVM LV by 512MiB
lvextend -L +512M /dev/misaka/var
- Grow the btrfs filesystem to fill the newly-enlarged block device
btrfs filesystem resize max /var
- Rebalance the btrfs filesystem (optional?)
btrfs filesystem balance /var
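The steps above can be sketched as a small dry-run helper in Python, in case you want to review the exact commands before letting them loose. The `grow_commands` function is hypothetical and only builds the argument lists; the commands themselves are the ones listed above, with the LV path, mount point and size taken from this machine.

```python
import shlex


def grow_commands(lv_path, mount_point, extra):
    """Build the three commands for growing a btrfs filesystem on LVM.

    Nothing is executed here; the caller decides what to do with the lists.
    """
    return [
        # 1. Extend the logical volume by the requested amount.
        ["lvextend", "-L", f"+{extra}", lv_path],
        # 2. Grow btrfs to fill the enlarged block device.
        ["btrfs", "filesystem", "resize", "max", mount_point],
        # 3. Rebalance (possibly optional, as discussed in the text).
        ["btrfs", "filesystem", "balance", mount_point],
    ]


# Dry run: print what would be executed.
for command in grow_commands("/dev/misaka/var", "/var", "512M"):
    print(shlex.join(command))
```

To actually run them you could pass each list to `subprocess.run(command, check=True)` as root, so that a failed lvextend stops the sequence before btrfs is touched.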
Now, I'm not sure whether the final rebalance is strictly necessary. The system's df tool acknowledges the extra size after the resize operation, but btrfs filesystem df shows no change in its output until the rebalance is done. A little testing would be in order, but I'd rather do it on a dedicated testing machine.
Any other cowboys out there using btrfs? Your data may or may not be intact when the sun rises tomorrow, but boy it's exciting!