100% FAT-free

Published February 6th, 2012 by Barney Desmond

I wrote some documentation for our sysadmins last week detailing how one should deal with a critical disk space notification at some ungodly hour of the morning. On the specifics of checking filesystems with the df tool:

“Astute readers will notice that we don’t query btrfs filesystems here; this is because btrfs uses extents, and inodes are a non-issue.”

Well, I wasn’t entirely wrong, but I wasn’t entirely right either.


btrfs is a modern filesystem with lots of shiny new features. It’s definitely not production-ready yet, but like a magpie drawn to shiny things, a couple of us use btrfs on our own machines (it’s what backups are for, right?).

Some time ago I wrote about how an ext filesystem can run out of free inodes and bite you. That happened to me last Thursday, only this time it was btrfs under the hood.


I first noticed the problem when puppet wouldn’t run, saying there was already another instance running. puppet is dumber than a bag of rocks so I pressed on, trying to run aptitude update instead.

root@misaka:~# aptitude update
E: Write error - write (28: No space left on device)

O rly? df disagreed about that. I immediately thought of inode exhaustion, but btrfs isn’t meant to suffer from this problem! To prove it, I touched a few files, successfully wrote some bits, deleted them again – all good.

Their curiosity piqued, my fellow sysadmins cracked open the strace and confirmed what we knew: ENOSPC from the write() call. We were at a loss until someone serendipitously spotted some errors in the syslog:

Feb 2 19:09:31 misaka kernel: [683642.593034] no space left, need 4096, 10694656 delalloc bytes, 696373248 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 707067904 total
Feb 2 19:09:55 misaka kernel: [683666.684247] no space left, need 4096, 6905856 delalloc bytes, 700162048 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 707067904 total
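
With hindsight, the numbers in those two messages already tell most of the story: in both of them, bytes_used plus the delalloc bytes (delayed allocation, which as I understand it means writes that have been accepted but not yet placed on disk) add up exactly to the total. Here's a minimal sketch of that arithmetic, with the first message pasted in as a string. The field helper and the variable names are made up for illustration and aren't part of any btrfs tool.

```shell
#!/bin/sh
# First kernel message from the syslog excerpt above, pasted verbatim.
line='no space left, need 4096, 10694656 delalloc bytes, 696373248 bytes_used, 0 bytes_reserved, 0 bytes_pinned, 0 bytes_readonly, 0 may use 707067904 total'

# field NAME: print the number that immediately precedes NAME in $line.
# (Hypothetical helper, only for this example.)
field() {
    printf '%s\n' "$line" | sed -n "s/.* \([0-9][0-9]*\) $1.*/\1/p"
}

delalloc=$(field 'delalloc bytes')
used=$(field 'bytes_used')
total=$(field 'total')

# Space already used plus space promised to pending writes.
committed=$((used + delalloc))
echo "used + delalloc = $committed of $total bytes"
echo "headroom: $((total - committed)) bytes"
```

For the first message this reports zero bytes of headroom, which fits with a request for a mere 4096 bytes being refused.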

A little googling produced a promising bug ticket on Red Hat’s Bugzilla, “[btrfs] hopeless ENOSPC handling and excessive administration costs”.

The short version for our specific scenario is: df doesn’t expose some exhaustion issues because btrfs doesn’t work like a classic filesystem.
This is where you can start moaning about how btrfs is FitH if you’re so inclined, but I like playing with my shiny toys, thank you.


btrfs has its own version of df for inspecting the filesystem:

root@misaka:~# btrfs filesystem df /var

Metadata, DUP: total=95.12MB, used=15.16MB
System, DUP: total=8.00MB, used=4.00KB
Data: total=674.31MB, used=665.52MB       <-- Under 10MB free!!
Metadata: total=8.00MB, used=0.00
System: total=4.00MB, used=0.00

This would explain why I could create files myself, but stuff like aptitude was failing when it tried to write more than several MB. You'll also notice that there's a lot of allocated-but-unused metadata space in the first line of output.
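
As a sanity check on that "under 10MB" figure, here is the subtraction done by a throwaway snippet, with the Data line pasted in as a literal string (the variable names are my own). The second calculation converts the 707067904-byte total from the kernel messages to MiB. It lands on the same 674.31, which suggests that the MB in this output are really MiB; that's my reading of the numbers, not something I've checked against the btrfs source.

```shell
#!/bin/sh
# The Data line from the `btrfs filesystem df /var` output above.
data_line='Data: total=674.31MB, used=665.52MB'

# Split on '=' and ','; adding 0 makes awk drop the trailing "MB".
free_mb=$(printf '%s\n' "$data_line" |
    awk -F'[=,]' '{ printf "%.2f", ($2 + 0) - ($4 + 0) }')
echo "free data space: ${free_mb}MB"

# The "total" from the kernel messages, converted from bytes to MiB.
total_mib=$(awk 'BEGIN { printf "%.2f", 707067904 / 1048576 }')
echo "kernel total: ${total_mib}MiB"
```

That leaves 8.79MB for data, no matter how roomy the filesystem looks to the system's df.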

We have a tool to fix this, and unlike btrfsck it's actually usable. We can rebalance the filesystem to adjust the proportion reserved for data. Some commenters on the bugzilla ticket noted that it caused a kernel panic when they ran it, but that was two years ago. It's probably fixed by now...

root@misaka:~# btrfs filesystem balance /var

# Now when we run `btrfs filesystem df /var` again...
Metadata, DUP: total=47.56MB, used=15.20MB  <--- Much less allocated
System, DUP: total=8.00MB, used=4.00KB
Data: total=745.38MB, used=665.52MB         <--- Plenty of free space
Metadata: total=0.00, used=0.00
System: total=4.00MB, used=0.00

Mission Accomplished!

aptitude and puppet run fine now, so all is well. As a note, the rebalancing is (subjectively) not fast: it took 7-8 seconds on that 1GB filesystem.


To wrap things up, I thought I might extend that filesystem a bit, as some more breathing room would be good. The btrfs volume is on an LVM logical volume, so this is a pretty easy task.

  1. Extend the LVM LV by 512MiB
    lvextend -L +512M /dev/misaka/var
    
  2. Grow the btrfs filesystem to fill the newly-enlarged block device
    btrfs filesystem resize max /var
    
  3. Rebalance the btrfs filesystem (optional?)
    btrfs filesystem balance /var
    

Now, I'm not sure whether the final rebalance is strictly necessary. The system's df tool acknowledges the extra size straight after the resize operation, but btrfs filesystem df shows no change in its output until the rebalance is done. A little testing would be in order, but I'd rather do it on a dedicated testing machine.
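
For that testing, the three steps above could be wrapped in a small script that also prints the btrfs view of the filesystem before and after the balance. This is only a sketch: grow_var, run and DRY_RUN are names I've invented, and by default it just prints the commands instead of running them. The commands themselves are the ones used in this post; I believe newer btrfs-progs spell the balance as btrfs balance start, so adjust to taste.

```shell
#!/bin/sh
# grow_var LV MOUNTPOINT SIZE: hypothetical helper wrapping the steps above.
# Prints the commands by default; set DRY_RUN=0 to really run them (as root).
grow_var() {
    lv=$1
    mountpoint=$2
    extra=$3

    # run CMD...: echo the command in dry-run mode, execute it otherwise.
    run() {
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "+ $*"
        else
            "$@"
        fi
    }

    # Extend the LV, grow the filesystem, then compare the btrfs view
    # of the filesystem before and after the balance.
    run lvextend -L "+$extra" "$lv" &&
    run btrfs filesystem resize max "$mountpoint" &&
    run btrfs filesystem df "$mountpoint" &&
    run btrfs filesystem balance "$mountpoint" &&
    run btrfs filesystem df "$mountpoint"
}

grow_var /dev/misaka/var /var 512M
```

Chaining with && means a failed lvextend stops the whole thing before the filesystem is touched.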

Any other cowboys out there using btrfs? Your data may or may not be intact when the sun rises tomorrow, but boy it's exciting!

Posted in FTW


LCA day 2

Published January 19th, 2012 by Barney Desmond

Bit of a quiet day today; the highlight was probably the presentations on btrfs and xfs. Btrfs has been developing nicely, and Avi Miller got up to spruik some of the newer features of the filesystem. A bit like ZFS (which isn’t compatible with Linux licensing terms), it pulls in a lot of smarts that are usually the domain of your RAID controller/subsystem. This means more flexibility in how you handle your data, but a lot of new complexity too.

It’s exciting stuff, but we’ll be waiting a bit longer to consider it robust enough to use in production. We’d kill for the integrated snapshotting (great for backups) and data integrity checking (store CRCs with your data) features.

Meanwhile, XFS reports steady progress and positions itself as the filesystem of choice for Really Big systems. Not that anyone would admit to it, but it was clear there was a little bit of rivalry between the two, especially since both talks were back-to-back in the same room. :)

Dave Chinner talked about how they’ve spent a lot of time working through the metadata performance issues that have caused headaches for scaling up in the past, and reckons XFS should scale linearly, unlike the competition. Probably not something you’ll lose sleep over when deciding how to format your root filesystem, but definitely important for databases and big filestores.


In lieu of other diversions, let’s have a look at the LeoStick, which was included in the bag of goodies for LCA attendees, alongside the requisite stubby coolers and mousepads.

Unless you’ve been living under a really big rock, the Arduino is the go-to platform for hackers wanting to build embedded systems. This is thanks to ease of programming, fast prototyping, and expansion options (need a thermal probe? fingerprint scanner? CCD camera? there’s probably a single shield module with all of those things). The LeoStick is particularly cute in that it comes in USB thumbdrive form-factor. As this is a pre-release board, the more cynical amongst us will note that this is a stroke of marketing genius that should result in some free beta-testing. Heh.

I know a couple of my fair colleagues are handy with a soldering iron; just quietly, this thing may or may not have had something to do with requests from the LCA organisers to stop messing with the exposed USB ports on the electronic door locks around campus.
