Why you should use LVM: part 1319755409 in an infinite series

Published October 28th, 2011 by matt

At Anchor, we loves us some LVM. It makes managing storage a breeze.

Now, we’ve got one more reason to love it. Because now we have lvmsync.

Like most modern hosting companies, we run a lot of VPSes, and sometimes the chunk of storage they’re on isn’t where it needs to be, so we have to transfer it around. Ordinarily, this would involve a lengthy period of downtime while dd did its business and sent the whole LV across a network.

NOT ANY MORE! Now, with the magic of lvmsync we can transfer the bulk of the data while the VM is running, and then do a quick transfer of just the changes after we shut the VM down.
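For the curious, the two-phase transfer looks roughly like the sketch below. It is a dry run that only prints commands. The names vg0, vm1 and desthost and the 10G snapshot size are hypothetical, the destination LV is assumed to exist already at the same size, and the lvmsync invocation form (snapshot device, then host:device) is our reading of the lvmsync README, so check it against the version you have installed.

```shell
# Dry-run sketch: prints the commands for a two-phase LV migration.
# vg0, vm1, desthost and the snapshot size are hypothetical values.
plan_migration() {
    vg=$1 lv=$2 dest=$3 snap="${2}-sync"

    # Phase 1 (VM still running): snapshot the LV and copy the frozen image.
    echo "lvcreate --snapshot -L 10G -n $snap $vg/$lv"
    echo "dd if=/dev/$vg/$snap bs=1M | ssh $dest dd of=/dev/$vg/$lv bs=1M"

    # Phase 2 (VM shut down): send only the blocks changed since the snapshot.
    # Invocation form assumed from the lvmsync README.
    echo "lvmsync /dev/$vg/$snap $dest:/dev/$vg/$lv"
}

plan_migration vg0 vm1 desthost
```

The snapshot has to be large enough to hold every write the VM makes while the bulk copy is running, and it should be removed once the final sync is done.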

We think our customers will appreciate the reduction in downtime, and I know we’ll appreciate the wonders of LVM just that little bit more.

Posted in FTW



Elegantly reverting unintentional LVM changes

Published June 30th, 2011 by benoit

This is the first in a series of articles we’ll eventually publish about various interesting tricks relating to LVM – the Linux Logical Volume Manager. Stay tuned for more coverage.

Occasionally you find yourself having committed a change to LVM (e.g. extending a logical volume, removing a logical volume, etc.) that you didn’t intend to make.

In some cases, it’s possible to revert the change by simply performing the opposite action (e.g. lvextend with a positive offset after an unintentional lvreduce). This is by no means elegant, but it almost always works.

However, there are unintentional changes which can’t be elegantly reverted in this fashion. For instance, assume I have a logical volume foo split across multiple physical segments:

root@cyllene:~# lvdisplay -m

--- Logical volume ---
LV Name                /dev/cyllene/foo
LV UUID                yaLJM6-ZzWi-cR89-ZJ4T-lUiI-tnqk-9biVeM
LV Size                40.00 GiB
Current LE             10240
Segments               2
Block device           252:3

--- Segments ---
Logical extent 0 to 7679:
Type                linear
Physical volume     /dev/md1
Physical extents    36441 to 44120

Logical extent 7680 to 10239:
Type                linear
Physical volume     /dev/md1
Physical extents    51801 to 54360

There is no guarantee that, if I accidentally delete this logical volume and re-create it with exactly the same size, precisely the same physical extents (and the same two segments shown above) will once again be used to satisfy the logical extent allocation. If different physical extents are used, it becomes impossible to read the data that originally existed within the logical volume before it was deleted (you’d like to be able to use it again, right?). In addition, the UUID of the re-created logical volume will be different, and its block device number may be too, possibly affecting your ability to mount the device by UUID (or by device mapper minor number, but most people don’t do this). The lvcreate utility simply doesn’t give you the flexibility to re-create the logical volume in exactly the state in which it previously existed.

This is where LVM’s wonderful automated configuration archiving feature comes in handy.

Let’s delete the LV from the above example, and re-create it with the same size (40 GiB):

root@cyllene:~# lvremove /dev/cyllene/foo
Do you really want to remove active logical volume foo? [y/n]: y
Logical volume "foo" successfully removed

root@cyllene:~# lvcreate -n foo -L40G cyllene
Logical volume "foo" created

Now, let’s check which physical extents were allocated to satisfy our requested allocation of 10240 logical extents (or roughly 40 GiB):

root@cyllene:~# lvdisplay -m

--- Logical volume ---
LV Name                /dev/cyllene/foo
LV UUID                E4hnBR-ODdX-HwOs-1ssl-C40v-vaus-XefvQe
LV Size                40.00 GiB
Current LE             10240
Segments               1
Block device           252:3

--- Segments ---
Logical extent 0 to 10239:
Type                linear
Physical volume     /dev/md1
Physical extents    51801 to 62040

Whoops. Suddenly our mapping of 10240 logical extents (previously in two physical segments with PE mappings 36441-44120 and 51801-54360) has turned into one large monolithic segment (with PE mapping 51801-62040). This is definitely not what we wanted.
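As a quick sanity check on those numbers, the small sketch below counts the extents in each inclusive PE range. The 4 MiB extent size is not printed by lvdisplay here. It is inferred from 40 GiB spread over 10240 extents.

```shell
# Number of extents in an inclusive physical-extent range "first to last".
pe_count() {
    echo $(( $2 - $1 + 1 ))
}

seg1=$(pe_count 36441 44120)    # first original segment
seg2=$(pe_count 51801 54360)    # second original segment
new=$(pe_count 51801 62040)     # single segment after lvcreate

echo "original: $seg1 + $seg2 = $(( seg1 + seg2 )) extents"
echo "re-created: $new extents"

# 40 GiB over 10240 extents implies a 4 MiB extent size.
echo "extent size: $(( 40 * 1024 / 10240 )) MiB"
```

Both layouts add up to the same 10240 extents, which is why the sizes match even though the data now maps to different places on disk.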

Luckily, LVM takes a snapshot of the volume group’s configuration state before any action on the volume group is taken. These are safely stored in files named /etc/lvm/archive/<volumegroup>_<serial>.vg, where serial is the iteration of the state of the volume group.
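If you want to peek at those files directly, something like the following sketch prints the description recorded in each archive for a volume group. The function name list_archives and the ARCHIVE_DIR override are ours, added for illustration. It relies on each archive file containing a top-level description = "..." line, and reading the real directory normally requires root.

```shell
# Print "file: description" for every archived metadata file of a VG.
# ARCHIVE_DIR (hypothetical override) defaults to /etc/lvm/archive.
list_archives() {
    vg=$1
    dir=${ARCHIVE_DIR:-/etc/lvm/archive}
    for f in "$dir/${vg}"_*.vg; do
        [ -e "$f" ] || continue    # no archives: the glob stays unexpanded
        # Pull the text between the quotes of the description line.
        desc=$(sed -n 's/^description = "\(.*\)"$/\1/p' "$f" | head -n 1)
        echo "$(basename "$f"): $desc"
    done
}

# Usage (as root): list_archives cyllene
```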

Let’s pretend I just deleted the logical volume foo from the above example. I’d now like to roll this change back so that the volume group is returned to the state it was before I executed the lvremove command.

Let’s first list the available restore points, using the vgcfgrestore --list <volume group> command:

root@cyllene:~# vgcfgrestore --list cyllene

<snip>

File:         /etc/lvm/archive/cyllene_00008.vg
VG name:      cyllene
Description:  Created *before* executing 'lvremove /dev/cyllene/foo'
Backup Time:  Thu Jun 30 17:00:53 2011

File:         /etc/lvm/backup/cyllene
VG name:      cyllene
Description:  Created *after* executing 'lvremove /dev/cyllene/foo'
Backup Time:  Thu Jun 30 17:00:53 2011

Ok. It seems like we want to restore the state of the volume group to that which is described in the file /etc/lvm/archive/cyllene_00008.vg. Let’s execute another vgcfgrestore command, this time with the --file <restore file> argument:

root@cyllene:~# vgcfgrestore --file /etc/lvm/archive/cyllene_00008.vg cyllene
Restored volume group cyllene

root@cyllene:~# lvdisplay -m

--- Logical volume ---
LV Name                /dev/cyllene/foo
LV UUID                yaLJM6-ZzWi-cR89-ZJ4T-lUiI-tnqk-9biVeM
LV Size                40.00 GiB
Current LE             10240
Segments               2

--- Segments ---
Logical extent 0 to 7679:
Type                linear
Physical volume     /dev/md1
Physical extents    36441 to 44120

Logical extent 7680 to 10239:
Type                linear
Physical volume     /dev/md1
Physical extents    51801 to 54360

Excellent! Our logical volume has been re-created and returned to precisely the state it existed in before it was removed by our inadvertent lvremove command. Note that you still need to make it active before you’re allowed to mount it again:

root@cyllene:~# lvs
LV         VG      Attr   LSize   Origin Snap%  Move Log Copy%  Convert
foo        cyllene -wi---  40.00g

root@cyllene:~# lvchange -ay /dev/cyllene/foo
root@cyllene:~#

You can now make use of the logical volume exactly as you would have before it was deleted.
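If you would rather check the activation state in a script than by eye, the Attr column from lvs is enough. In the sketch below, the helper name lv_is_active is ours. The fifth attribute character is the state field, and "a" there means the volume is active.

```shell
# Return success if an lv_attr string (e.g. "-wi-a-") marks the LV active.
# The fifth attribute character is the state field; "a" means active.
lv_is_active() {
    state=$(printf '%s\n' "$1" | cut -c5)
    [ "$state" = "a" ]
}

for attr in "-wi---" "-wi-a-"; do
    if lv_is_active "$attr"; then
        echo "$attr: active"
    else
        echo "$attr: inactive, run lvchange -ay first"
    fi
done
```

The first string is what a freshly restored volume shows before lvchange -ay is run, and the second is what it should show afterwards.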

Posted in FTW



Large filesystem “support”

Published April 24th, 2009 by oliver

I’ve written recently on how to handle systems with very large storage subsystems. One would think that, as we make our way through 2009, the supporting tools for such large filesystems would be at the top of their game, but as I’ve been playing with 24TB of storage I’ve realised that this is hardly the case:

  • The most commonly used bootloader for Linux systems, GRUB, doesn’t yet have capabilities to boot from GPT partitions (at least not in the stable release)
  • The most commonly used partitioner, fdisk, doesn’t support GPT-partitioned disks (and hence no disk larger than 2TB)
  • GNU parted, which does support GPT, insists on performing all partition resize operations itself (including resizing the contained filesystem). Since it doesn’t yet understand LVM, it can’t resize any partition that contains an LVM PV.
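The 2TB ceiling in the second point comes from the older MBR partition table, which is what you are left with when GPT is not an option. Assuming the usual 512-byte sectors, its 32-bit sector fields work out as follows:

```shell
# MBR partition entries store sector addresses and counts in 32-bit fields.
# Assuming 512-byte sectors, the largest addressable size is:
max_sectors=4294967296            # 2^32
sector_bytes=512
max_bytes=$(( max_sectors * sector_bytes ))
echo "$max_bytes bytes = $(( max_bytes / 1024 / 1024 / 1024 / 1024 )) TiB"
```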

Today I ran into what appears to be a bug in the CentOS 5.3 installation partitioner, which left my 12TB RAID volume only partitioned to 8TB when I had supplied the --grow parameter in the Kickstart script. Since parted can’t resize LVM partitions, and there don’t appear to be any other tools out there at the moment for GPT partitioning on Linux, I’m left in a less than ideal position.

GNU parted can’t resize the partition because it can’t understand LVM. Fortunately, I can just use it to create another partition with the remaining space and add it to the existing LVM volume group, but this is really just a hack, and one that disturbs my obsessive-compulsive sysadmin nature. Were it not for the flexibility of LVM, we would be in a bit of a mess.
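For reference, the workaround amounts to something like the dry-run sketch below, which only prints the commands. The device and volume names (/dev/sdb2, vg0, data) are hypothetical, and the final resize2fs step assumes an ext2/3 filesystem on the logical volume.

```shell
# Dry-run sketch: print the LVM commands to absorb a new partition into a VG.
# /dev/sdb2, vg0 and data are hypothetical names.
plan_extend() {
    part=$1 vg=$2 lv=$3
    echo "pvcreate $part"                      # label the partition as a PV
    echo "vgextend $vg $part"                  # add the PV to the volume group
    echo "lvextend -l +100%FREE /dev/$vg/$lv"  # grow the LV into the new space
    echo "resize2fs /dev/$vg/$lv"              # grow the filesystem to match
}

plan_extend /dev/sdb2 vg0 data
```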

Sadly, it seems that the large filesystem support that will soon become essential for everyone is still largely inadequate.


Inode shortage reaches critical levels

Published February 16th, 2009 by Barney Desmond

A customer got in touch with us recently saying they couldn’t upload files via FTP due to insufficient disk space, but there was plenty of free space apparent when they logged in and checked. We don’t normally manage their server, but we said we’d take a look.

root@aria:~# touch /srv/www/newfile
touch: cannot touch `/srv/www/newfile': No space left on device

After logging in and taking a look around, the problem became apparent.

root@aria:~# df -h
Filesystem                Size  Used Avail Use% Mounted on
/dev/mapper/nayuki-root  1016M  536M  429M  56% /
/dev/mapper/nayuki-usr    4.0G  1.3G  2.6G  34% /usr
/dev/mapper/nayuki-var    4.0G  1.3G  2.6G  33% /var
/dev/mapper/nayuki-www     50G   41G  6.4G  87% /srv/www

There’s plenty of disk space…

root@aria:~# df -i
Filesystem                Inodes   IUsed   IFree IUse% Mounted on
/dev/mapper/nayuki-root    65536    7244   58292   12% /
/dev/mapper/nayuki-usr    262144   96830  165314   37% /usr
/dev/mapper/nayuki-var    262144   19680  242464    8% /var
/dev/mapper/nayuki-www   3276800 3276800       0  100% /srv/www

But no spare inodes!

Inodes are a filesystem structure used to store metadata about files (we use ext3, the default, so the following details may not necessarily apply to other filesystems). This of course takes up space on the disk, so inodes are preallocated when the filesystem is first created. The practical consequence is that you can’t create any more files once you run out of inodes. Not all filesystems work in the exact same way, so the system gives you a general error about being out of space.
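A check like this is easy to automate. The sketch below reads df -i style output on standard input and prints any mount point whose inode usage is at or above a given percentage. The function name full_inodes and the threshold of 90 are our own choices, and mount points containing spaces are not handled.

```shell
# Print mount points whose inode usage is at or above a threshold (percent).
# Reads `df -i`-style output on stdin so it can be tried on saved output.
full_inodes() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%$/, "", pct)              # strip the trailing percent sign
        if (pct + 0 >= limit + 0) print $6, $5
    }'
}

# Sample taken from the df -i output above.
full_inodes 90 <<'EOF'
Filesystem                Inodes   IUsed   IFree IUse% Mounted on
/dev/mapper/nayuki-var    262144   19680  242464    8% /var
/dev/mapper/nayuki-www   3276800 3276800       0  100% /srv/www
EOF
# prints: /srv/www 100%
```

On a live system the equivalent would be df -iP | full_inodes 90, where -P keeps each filesystem on a single line.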

This is usually correct – to be honest, I’ve never actually seen a filesystem run out of inodes before. It’s something of a nuisance that you can’t allocate extra inodes on an existing filesystem. The best short-term solution we could offer the customer was to remove or migrate some files.

If you use Linux’s Logical Volume Manager (LVM) subsystem, you can pull some trickery and grow an existing filesystem, but this doesn’t let you create an arbitrary number of spare inodes. If you’re going to have a massive number of files, the filesystem needs to be created from scratch with lots of empty inodes.
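To put numbers on that, the sketch below works out the bytes-per-inode ratio of the /srv/www filesystem, assuming the underlying volume is exactly 50 GiB (df rounds, so this is an assumption). It then compares growing the filesystem with re-creating it at a denser ratio using the mke2fs -i option.

```shell
# Bytes of filesystem per inode, from the /srv/www figures above.
fs_bytes=$(( 50 * 1024 * 1024 * 1024 ))   # assumes the LV is exactly 50 GiB
inodes=3276800
ratio=$(( fs_bytes / inodes ))
echo "bytes per inode: $ratio"

# Growing to 100 GiB at the same ratio only doubles the inode count.
echo "inodes at 100 GiB: $(( 100 * 1024 * 1024 * 1024 / ratio ))"

# Re-creating at one inode per 4096 bytes (mke2fs -i 4096) gives four times as many.
echo "inodes at 50 GiB, -i 4096: $(( fs_bytes / 4096 ))"
```

Growing the filesystem adds inodes only in proportion to the space added, whereas the ratio itself can only be chosen when the filesystem is created.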

Chances are you’ll never run into this, but if you deal with systems that have a lot of small files (e.g. mail servers, possibly public-facing web servers) it’s something to keep in mind when you set things up. Although it’s not something we’d run into before, we were able to identify the problem and provide a quick solution thanks to our extensive general knowledge of our systems.

Useful commands (ext2/3 filesystems on Linux):

df -h
     disk space utilisation on filesystems
df -i
     inode utilisation on filesystems
dumpe2fs -h /dev/nayuki/www
     get info on the filesystem (specify the path to the block device)