Posts Tagged ‘raid’

RAIDing USB flash disks – not just a silly stunt

Tuesday, September 29th, 2009

We’ve seen it all before:

hay guyz, check this out, I got a bunch of old 64mb thumb drives and made a RAID out of them! now i can put all my pr0n on there roffle lolololll

RAIDed floppies? It’s been done. RAIDed tapes? Yo dawg, that’s an enterprise storage solution! Let’s talk seriously now.

I have a fileserver that my family uses; it’s just a box with a couple of pairs of hard drives in it (RAID-1, thank you very much. None of this starving-student crap with an oddball assortment of drives in RAID-0). Given that the box is used exclusively for serving up SMB shares, the OS installation is tiny.

I could’ve gone with something really stripped down and optimised, but that would require effort, and sysadmins are allergic to unnecessary effort. Instead I just installed Ubuntu Jaunty via netinst. Laugh all you want, but I have better things to do, like sleep.

Close-up of chikage's OS drives


The old system was whining about missing one half of its RAID-1, so I decided to splurge on a pair of 4GB USB flash disks – the princely sum of $22 for the pair. I set up the md software RAID volumes ahead of time, and the Ubuntu installer happily picked them up: a 512MiB /boot partition, with the rest handed off to LVM to manage.
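For a rough sense of how much room that layout leaves, here is a minimal Python sketch of the arithmetic. It assumes each 4GB disk holds exactly 4,000,000,000 bytes (decimal gigabytes, as drives are marketed) and ignores the partition table, alignment padding and md/LVM metadata, so real figures will come out a little lower. `lvm_share_mib` is a hypothetical helper, not something from the original setup.

```python
# Rough partition sizing for one of the two mirrored flash disks.
# Assumption: the disk holds exactly 4,000,000,000 bytes; real drives vary,
# so check the true size with `blockdev --getsize64 /dev/sdX` before relying on it.
MIB = 1024 ** 2


def lvm_share_mib(disk_bytes: int, boot_mib: int = 512) -> float:
    """Return the MiB left for the LVM partition after carving out /boot.

    Ignores partition-table, alignment and RAID/LVM metadata overhead.
    """
    total_mib = disk_bytes / MIB
    if boot_mib >= total_mib:
        raise ValueError("the /boot partition does not fit on this disk")
    return total_mib - boot_mib


if __name__ == "__main__":
    disk = 4 * 10 ** 9  # nominal "4GB", in decimal bytes
    print(f"whole disk:   {disk / MIB:.1f} MiB")
    print(f"left for LVM: {lvm_share_mib(disk):.1f} MiB")
```

The gap between the decimal "4GB" on the packaging and the binary MiB used for partitions is why the disk comes out at roughly 3815 MiB rather than 4096.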

I could bore you with a bunch of details, but who cares about that.

  • Does it work? Yes, albeit a bit slower during bootup – total boot time from power-button to login prompt is 90 seconds.
  • Does the RAID work? Nicely, thank you. You can yank a drive out and it’ll keep ticking along.
  • Is there enough capacity? Plenty, the OS filesystem is 44% full.
  • Won’t swapping kill it? Yes, maybe eventually. The system has 1GiB of RAM, more than enough when you consider it’s only really using about 100MiB. At least there’s a chance both drives won’t fail at exactly the same time, so I can replace one.
  • Am I taking backups? Of course! If it toasts itself, it’s no big deal.
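Knowing that one half of a mirror has gone missing is the whole point of running RAID-1. mdadm’s own `--monitor` mode (which can send mail when an array degrades) is the usual tool for this. As a rough illustration of what it is looking for, here is a minimal Python sketch that reads `/proc/mdstat`-style text and reports arrays with missing members. The `SAMPLE` text is made up for the example, and the parser only assumes the common `[expected/working] [UU]` status format.

```python
import re

# Array header, e.g. "md1 : active raid1 sdb2[1] sda2[0]"
ARRAY_RE = re.compile(r"^(md\d+)\s*:")
# Member summary, e.g. "[2/1] [U_]": expected/working counts, then per-slot state
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text: str) -> list[str]:
    """Return the names of md arrays running with fewer members than expected."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if current and status:
            expected, working = int(status.group(1)), int(status.group(2))
            if working < expected or "_" in status.group(3):
                degraded.append(current)
            current = None  # only the first status line belongs to this array
    return degraded


# Hypothetical /proc/mdstat contents with one half of md1 missing.
SAMPLE = """Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      524224 blocks [2/2] [UU]

md1 : active raid1 sda2[0]
      3382720 blocks [2/1] [U_]

unused devices: <none>
"""

if __name__ == "__main__":
    # On a live system, read the real file instead: open("/proc/mdstat").read()
    print(degraded_arrays(SAMPLE))
```

Run against `SAMPLE`, this flags `md1` and leaves the healthy `md0` alone, which is the same distinction the old system was "whining" about.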

What next? Hmm, if I splash out I could buy another pair of flash disks and kick it up to RAID-10 for a performance boost!
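Speed aside, the jump from a mirrored pair to RAID-10 also changes capacity and fault tolerance. The following minimal Python sketch compares the textbook layouts, assuming identical drives and classic RAID 1+0 built from mirrored pairs. Linux md’s own `raid10` personality is more flexible than this (it also accepts odd drive counts), so treat the function below as a hypothetical simplification.

```python
def raid_summary(level: str, disks: int, disk_gb: float) -> tuple[float, int]:
    """Return (usable capacity in GB, drive failures that are always survivable).

    Assumes identical drives and textbook layouts: RAID-0 stripes across all
    drives, RAID-1 mirrors every drive, RAID-10 stripes across mirrored pairs.
    """
    if level == "0":
        return disks * disk_gb, 0
    if level == "1":
        return disk_gb, disks - 1
    if level == "10":
        if disks < 4 or disks % 2:
            raise ValueError("this sketch needs an even number of drives, at least 4")
        # Losing both halves of one pair is fatal, so only one failure is guaranteed safe.
        return (disks // 2) * disk_gb, 1
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    for level, count in (("1", 2), ("10", 4)):
        usable, safe = raid_summary(level, count, 4)
        print(f"RAID-{level} with {count} x 4GB: {usable}GB usable, "
              f"survives {safe} failure(s) guaranteed")
```

With four of the 4GB disks, RAID-10 doubles the usable space to 8GB, but it is still only guaranteed to survive a single failure; a second failure is survivable only if it lands in the other mirrored pair.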

Tales of Hardware – IBM x3650

Tuesday, March 10th, 2009

All the servers Anchor buys are from Supermicro. Most people won’t have heard of them, but they’re a sizeable hardware vendor that also does some OEM gear. Supermicro certainly doesn’t carry the mindshare of other big brands like HP, Dell, et al., but we chose them because their stuff is reliable and affordable – we focus on the things that actually matter, rather than some enterprise-y idea of sticking with big brands that you trust – “no one ever got fired for buying IBM”, as they say.

Actually, hold that thought for a moment.

Safely handling RAID failure

Monday, March 9th, 2009

With hard discs being by far the most common point of failure in servers, RAID does wonders for protecting against data loss.

With a RAID array in normal operation we’re in a pretty safe place. We know that we can suffer the failure of a drive without loss of data or disruption of service. Once a drive has failed, however, we’re in a more precarious position: the loss of another drive, or damage to the remaining drive, could easily cause major problems. At this point the only thing that can protect you against data loss if you make a mistake is your backups – you did configure backups, didn’t you?

Restoring a damaged RAID array is a task that requires extra caution. 

On our range of dedicated servers and VPSes, it’s one of those things that just happens automatically, and the client usually only finds out after the problem has been fixed. For our co-location customers, however, it’s a task we often find ourselves lending a helping hand with.

With this in mind, we’ve started to put together a series of articles discussing the steps we take to restore a Linux RAID array after hard disc failure and to recover from a Windows software RAID failure. We hope you find them useful.
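Those articles aren’t reproduced here, but the general shape of swapping a failed member out of a Linux md RAID-1 can be sketched. The minimal Python example below only *builds* a list of commands for review, it runs nothing. It assumes sdX-style device names (so partition N of `/dev/sdb` is `/dev/sdb1`), MBR partition tables that `sfdisk -d` can dump and restore, and a replacement disk that reappears under the same name as the failed one. It also leaves out reinstalling the bootloader on the new disk. `replacement_plan` is a hypothetical helper, not Anchor’s tooling.

```python
def replacement_plan(arrays: dict[str, int], good: str, bad: str) -> list[str]:
    """Build (but do not run) commands to replace a failed RAID-1 disk.

    arrays maps each md device to the partition number it uses on every disk,
    e.g. {"/dev/md0": 1}. good is the surviving disk, bad the one being replaced.
    """
    before = []
    # Copy the surviving disk's partition layout onto the replacement (MBR assumed).
    after = [f"sfdisk -d {good} | sfdisk {bad}"]
    for md, part in sorted(arrays.items()):
        member = f"{bad}{part}"
        # Tell md the member is failed (harmless if it already is), then detach it.
        before.append(f"mdadm --manage {md} --fail {member}")
        before.append(f"mdadm --manage {md} --remove {member}")
        # Once the new disk is partitioned, add the fresh partition to start a rebuild.
        after.append(f"mdadm --manage {md} --add {member}")
    return before + ["# physically swap the drive here"] + after


if __name__ == "__main__":
    plan = replacement_plan({"/dev/md0": 1, "/dev/md1": 2}, "/dev/sda", "/dev/sdb")
    print("\n".join(plan))
```

Printing a plan rather than executing it keeps a human in the loop, which matters when the surviving disk is the only copy of the data left; the rebuild's progress can then be watched in `/proc/mdstat`.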

A tale of two drives

Thursday, October 9th, 2008

It’s no secret that we’d rather be working on Linux than Windows here at Anchor. Not only is Windows, by and large, much more annoying to actually get anything done on, but it also breaks in opaque and unexplained ways. O Windowes, let me count the ways in which you are broken! This is one such problem we ran into yesterday.

Hard drive failure is a fact of life when you run servers, by sheer virtue of the fact that you have hundreds of them. To mitigate the risk and reduce unscheduled downtime, we use Windows’ built-in software RAID feature. It’s not an enterprise solution, but it gets the job done. What’s important is staying online and not losing data.

Did I mention that trying to monitor a Windows box is a nightmare? A colleague of mine wrote a script that lets us keep a watchful eye on Windows RAID volumes; it’s a lifesaver. A recently deployed machine got a broken mirror, which we were able to act on immediately. We removed the dodgy mirror and prepared a replacement (we always have plenty of spares, of course). Allow me now to re-enact this scene…

Windows (sounding almost efficient): The driver has detected that device \Device\Harddisk1\DR9 has predicted that it will fail

Sysadmin: Thanks, Windows, I’ll get right on that. You didn’t say whether that was SMART, or just voodoo, but whatever, it’s good to know.

The bad drive is removed and a replacement installed in the hotswap drive bay

Sysadmin: Okay, Windows, do your stuff. “Scan for new hardware”, please.

A pause.

Sysadmin: Ahem, Windows, “Scan for new hardware” and find my drive.

Windows: ‘Ey there, chaps. Do what now, you say? AIEEEEGRH!!

The server stops responding entirely, necessitating a touch of the reset button

Needless to say, we’re rather unimpressed, and have to call the customer to let them know why it’s just dropped offline.

A quick check of the logs is in order. It’s also frustrating that there’s no sane way to scroll through log entries in Windows with something like a text editor, or to “tail” a log as it’s updated in realtime.

09:36 – The previous system shutdown at 9:21:23 AM on 8/10/2008 was unexpected.

Okay, it went down at 09:21, which is correct. Now if we look back in time a little…

09:21 – dmio: Harddisk1 write error at block 1953524618 due to disk removal

*sigh* And this is after the disk was removed cleanly…
