BLU Discuss list archive

Subject: Reminder -- RAID 5 is not your friend
From: feenberg-fCu/yNAGv6M at public.gmane.org (Daniel Feenberg)
Date: Thu, 11 Mar 2010 07:07:07 -0500 (EST)
In-reply-to: <4B98D9D2.7020108-cGmSLFmkI3Y@public.gmane.org>
References: <20100311042755.GO14999@tao.merseine.nu> <4B98D9D2.7020108@napc.com>


On Thu, 11 Mar 2010, Grant M wrote:

> Dan Ritter wrote:
>> RAID 5 is not your friend.
>
> It depends. Most current systems will do RAID 6 now, so it's probably
> moot. Anyhow, read on...
>
>> A server with a mirrored setup for system disks and a RAID 5 for
>> storage reported a disk gone bad in the storage system. OK, the
>> alert is received, and we plan to replace the disk in the
>> morning.
>>
>> Before we can get around to it, another disk in the storage
>> system also dies. Poof.
>
> Typically, this is caused by bad blocks on the spare disk that the
> system rebuilds onto. The system starts to rebuild onto the spare,
> encounters a bad block, and the rebuild dies. This seems typical of
> lower-end SATA RAIDs. Many enterprise-level hardware RAID controllers
> with SATA

This is also characteristic of the software RAID in Linux and FreeBSD.
Note, however, that in degraded mode you can still copy data off the RAID
to another disk array; you just can't rebuild this array in place. Your
data isn't lost.

> will allow you to schedule 'bad-block scrubs'. During the scheduled
> time, the controller scans each disk in the system, including the
> spare, for potentially bad blocks. This helps avoid the type of failure
> described above. For obvious reasons the risk can't be eliminated
> altogether, but scrubbing minimizes the likelihood of it happening and
> makes RAID 5 that much more reliable on SATA. However, I do recommend
> at least RAID 6 on SATA.
>
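
As a rough illustration of the scrub idea described above, here is a
minimal Python sketch. It is not any controller's actual firmware logic:
the in-memory disks, the make_disk helper, and the block counts are all
hypothetical stand-ins. The point is only that every block on every disk,
the hot spare included, gets read on a schedule so that latent bad blocks
are found before a rebuild needs them.

```python
from typing import Callable, Dict, Iterable, List


def make_disk(bad_blocks: Iterable[int]) -> Callable[[int], bytes]:
    """Hypothetical in-memory disk: reading a listed block raises OSError."""
    bad = set(bad_blocks)

    def read_block(lba: int) -> bytes:
        if lba in bad:
            raise OSError(f"unreadable block {lba}")
        return bytes(512)  # contents are irrelevant for a scrub

    return read_block


def scrub(disks: Dict[str, Callable[[int], bytes]],
          blocks_per_disk: int) -> Dict[str, List[int]]:
    """Read every block of every disk and collect the unreadable ones."""
    report: Dict[str, List[int]] = {}
    for name, read_block in disks.items():
        for lba in range(blocks_per_disk):
            try:
                read_block(lba)
            except OSError:
                report.setdefault(name, []).append(lba)
    return report


# Assumed layout: two healthy-looking members plus a spare with latent errors.
disks = {
    "disk0": make_disk([]),
    "disk1": make_disk([7]),
    "spare": make_disk([3, 42]),
}
report = scrub(disks, blocks_per_disk=64)
print(report)
```

A spare that shows up in the report can be swapped out while the array
is still healthy, rather than being discovered mid-rebuild.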

It isn't at all clear to me why the rebuild can't continue past a bad
block. It might be an unallocated sector, or it might be a sector that is
still readable on an otherwise problematic drive. That is, a drive may
have been failed for a problem on another sector but still be available
for this sector. In those cases there is no downside to continuing. If the
sector is truly lost, it still seems extreme to abort the entire rebuild
for a single missing sector, although I realize it might be difficult to
report this to the user in a usable manner (a file name rather than a
sector number).
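
To make the "continue past a bad block" idea concrete, here is a small
Python sketch of a RAID 5 style rebuild. It relies on the fact that the
missing block in a stripe is the XOR of the surviving blocks. Everything
else is an assumption for illustration: disks are modeled as lists of
blocks, None stands for an unreadable sector, and a stripe that cannot be
reconstructed is zero-filled and logged. This is not how any particular
controller or the Linux/FreeBSD software RAID actually behaves.

```python
from functools import reduce
from typing import List, Optional, Tuple

Block = Optional[bytes]  # None models an unreadable sector (assumption)


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(survivors: List[List[Block]],
            block_size: int) -> Tuple[List[bytes], List[int]]:
    """Rebuild the failed disk stripe by stripe without aborting on errors.

    Returns the rebuilt blocks and the stripe numbers that were lost.
    Lost stripes are zero-filled, which is an assumed policy.
    """
    rebuilt: List[bytes] = []
    lost: List[int] = []
    for stripe, blocks in enumerate(zip(*survivors)):
        if any(b is None for b in blocks):
            rebuilt.append(bytes(block_size))  # placeholder for lost data
            lost.append(stripe)                # report it, keep going
        else:
            rebuilt.append(xor_blocks(list(blocks)))
    return rebuilt, lost


# Hypothetical 3-disk array: two data blocks plus parity per stripe.
d0 = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
d1 = [b"\x04\x08", b"\x01\x01", b"\x0f\xf0"]
parity = [xor_blocks([a, b]) for a, b in zip(d0, d1)]

# d1 has failed outright; d0 has one unreadable sector in stripe 1.
d0_damaged: List[Block] = [d0[0], None, d0[2]]
rebuilt, lost = rebuild([d0_damaged, parity], block_size=2)
print("lost stripes:", lost)
```

Only stripe 1 is reported as lost; the other stripes of the failed disk
come back intact. Mapping a lost stripe back to a file name would need
help from the filesystem, which this sketch does not attempt.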

> Rant: this sort of config likely isn't possible with the vendor-embedded
> RAID controllers that come with typical HP/Dell/IBM server hardware. In
> reality I would never recommend using those for anything more than
> mirroring internal disks. Enterprise storage for critical data needs to
> be purpose-built hardware that you spend more on than you spent on your
> server (orders of magnitude more). To the best of my knowledge, Buffalo,
> Netgear, and PlaySkool don't make enterprise-level RAID hardware. If
> you're hacking together some white-box homemade solution, or buying
> something

3ware controllers have a "continue on error" option, but AFAIK it is not
the default. I don't know about other controllers, and I don't know what
happens to the bad sector - perhaps it is just zeroed out?

Daniel Feenberg

> with a name you've only ever seen in consumer-level products, you're
> building a tree fort, and you should expect tree fort level results.
> Assess how much you're going to lose in productivity per hour that this
> device is inaccessible, and then evaluate how much it's worth to NOT
> have that happen.
>
> Grant M.
> -- 
> Grant Mongardi
> Senior Systems Engineer
> NAPC
>
> gmongardi-cGmSLFmkI3Y at public.gmane.org
> http://www.napc.com/
> blog.napc.com
> 781.894.3114 phone
> 781.894.3997 fax
>
> NAPC | technology matters
>
>
> _______________________________________________
> Discuss mailing list
> Discuss-mNDKBlG2WHs at public.gmane.org
> http://lists.blu.org/mailman/listinfo/discuss
>

BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.