
Handling Massive Filesystems

Spinning platter-based storage is growing to sizes we once never dreamed of. I still remember my first 286-based computer with a 20MB hard drive, which seemed impressive at the time and could easily hold the hundreds of WordPerfect documents I had created. Now it is quite easy to find multiple terabytes in even a lowly desktop machine.

When you are dealing with storage subsystems of this capacity, the systems administration game changes slightly. Most of us who need this much storage will be running 64-bit systems, so 32-bit limitations are no longer a concern, but many of the tools and supporting systems were not designed for storage of this magnitude. This article presents a few of the issues you will need to start thinking about when dealing with multiple-terabyte storage subsystems.

Hardware

If you are building a system for large volumes of data storage, there are several issues you still need to deal with:

RAID level

RAID levels have been documented to death. We will avoid duplicating that work here, but we have an informative article that describes the most common RAID levels.

Partitions

There are some limitations on Linux that you will need to be aware of when dealing with very large disks:

As mentioned previously, you should check that your hardware RAID solution is flexible enough to work around these sorts of problems. For example, if it does not allow you to create multiple arrays per disk, you will need to dedicate at least two disks to a completely separate RAID1 volume to get a redundant boot volume. This assumes your disks are smaller than 2TB. Once disks larger than 2TB are available, multiple arrays per disk will become a necessity to avoid permanently wasting space on the disks.
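The 2TB figure falls out of simple arithmetic. The sketch below assumes the classic MBR partition table, in which each partition entry records its starting sector and its length as unsigned 32-bit values, and the traditional 512-byte logical sector; the constant and function names are illustrative only.

```python
# Where the 2TB limit comes from, assuming a classic MBR partition table:
# partition start and length are stored as unsigned 32-bit sector numbers.
SECTOR_BYTES = 512        # traditional logical sector size (assumption)
LBA_BITS = 32             # width of the MBR start/length fields

def mbr_limit_bytes(sector_bytes: int = SECTOR_BYTES) -> int:
    """Bytes covered by the 2**32 sectors a 32-bit field can address."""
    return (2 ** LBA_BITS) * sector_bytes

limit = mbr_limit_bytes()
print(limit)                      # 2199023255552
print(limit / 2**40)              # 2.0   (binary TiB)
print(round(limit / 10**12, 2))   # 2.2   (decimal TB)
```

GPT stores sector addresses as 64-bit values, which is why it removes this particular ceiling.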

Hopefully, by that time, the standard Linux utilities in major supported distributions such as RHEL and SLES will support GPT partition tables and disks larger than 2TB.
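Before repartitioning, it can help to confirm which kind of partition table a disk already carries. The minimal sketch below assumes 512-byte logical sectors and relies on two well-known on-disk markers: the protective MBR entry of type 0xEE that GPT disks carry, and the "EFI PART" signature at the start of the GPT header in LBA 1. The function name and the command-line usage are hypothetical; this is a diagnostic illustration, not a replacement for a real partitioning tool such as parted.

```python
import sys

SECTOR = 512  # assumed logical sector size; 4K-native disks differ

def classify_partition_table(head: bytes, sector: int = SECTOR) -> str:
    """Classify the first two sectors of a disk as 'gpt', 'mbr' or 'none'."""
    # Every MBR (including GPT's protective MBR) ends with 0x55 0xAA.
    if len(head) < 512 or head[510:512] != b"\x55\xaa":
        return "none"
    # Four 16-byte partition entries start at offset 446; the
    # partition type byte is at offset 4 inside each entry.
    types = [head[446 + 16 * i + 4] for i in range(4)]
    # GPT disks carry a protective MBR entry of type 0xEE, and the
    # GPT header in LBA 1 starts with the signature "EFI PART".
    if 0xEE in types and head[sector:sector + 8] == b"EFI PART":
        return "gpt"
    return "mbr"

if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python classify.py /dev/sdb  (hypothetical device; needs read access)
    with open(sys.argv[1], "rb") as disk:
        print(classify_partition_table(disk.read(2 * SECTOR)))
```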

As far as we know, recent versions of Windows Server do not have the same problems.

Filesystems

Filesystem choice is often a matter of much contention, and rightly so. You need to select the most appropriate filesystem not only for your application, but also with the limitations of the available filesystems and the real-world limitations of the hardware in mind:
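One such limitation is plain arithmetic. A filesystem that stores block numbers as unsigned 32-bit values (ext3 is the commonly cited example) can address at most 2**32 blocks, so its maximum size depends on the block size chosen when it is created. A sketch of the calculation, with illustrative names:

```python
# Maximum size of a filesystem addressing blocks with 32-bit block numbers.
TIB = 2**40

def max_fs_bytes(block_bytes: int, block_number_bits: int = 32) -> int:
    """Number of addressable blocks times the size of each block."""
    return (2 ** block_number_bits) * block_bytes

for block in (1024, 2048, 4096):
    # 1 KiB -> 4 TiB, 2 KiB -> 8 TiB, 4 KiB -> 16 TiB
    print(f"{block:>5}-byte blocks: {max_fs_bytes(block) // TIB} TiB")
```

The ceiling comes from the width of the on-disk block number, so running a 64-bit kernel does not lift it.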

Recovery

One frequently neglected aspect of large filesystems is the time needed to repair them. Any of the above filesystems may satisfy your operational goals, but if the system crashes and the filesystem needs to be checked, many filesystems will take an unacceptably long time to do so. There are some strategies for dealing with this:

Benchmark

If you are in doubt as to what will work best for your application, it is wise to benchmark the configuration options you have shortlisted. Once you have the hardware, it is usually quite easy to experiment with different RAID levels and filesystems to get a feel for what will work best for you. We have recently added an article on I/O Benchmarking that should help in this regard.
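As a very first sanity check before reaching for a proper benchmarking tool, a sequential write test can be scripted in a few lines. The Python sketch below writes incompressible data to a file on the filesystem under test, includes an fsync in the timing so the page cache does not flatter the result, and reports MiB/s. The function name, the default sizes and the target path are illustrative assumptions.

```python
import os
import tempfile
import time

def sequential_write_mb_s(path: str, total_mib: int = 256, chunk_mib: int = 1) -> float:
    """Write total_mib MiB to path in chunk_mib pieces, fsync, return MiB/s."""
    chunk = os.urandom(chunk_mib * 1024 * 1024)  # incompressible data
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(total_mib // chunk_mib):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # include the flush to disk in the timing
    elapsed = time.perf_counter() - start
    return total_mib / elapsed

if __name__ == "__main__":
    # Point this at the filesystem under test; the temp dir is only a default.
    target = os.path.join(tempfile.gettempdir(), "bench.tmp")
    try:
        print(f"{sequential_write_mb_s(target, total_mib=64):.1f} MiB/s")
    finally:
        os.remove(target)
```

Running it against each shortlisted RAID level and filesystem combination gives a rough comparison, but a single sequential-write figure says nothing about random I/O or metadata-heavy workloads, so it complements rather than replaces a real benchmark.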


Wiki: dedicated/HandlingMassiveFileSystems (last edited 2009-09-25 08:58:21 by KeiranHolloway)