Boston Linux & UNIX was originally founded in 1994 as part of The Boston Computer Society. We meet on the third Wednesday of each month, online, via Jitsi Meet.

BLU Discuss list archive



Reminder -- RAID 5 is not your friend



On Mon, Mar 15, 2010 at 3:41 PM, Daniel Feenberg <feenberg-fCu/yNAGv6M at public.gmane.org> wrote:
>
> On Mon, 15 Mar 2010, Kent Borg wrote:
>
>> Richard Pieri wrote:
>>> And neither is RAID 1.  Except when you get lucky.
>>>
>>> I had a failure over the weekend.  Two mirrored pairs, A1/A2 B1/B2 configuration.  A2 and B1 failed simultaneously.
>>
>> Sounds like it is *disks* that are not your friend. And, that they hate
>> you enough that your use of raid isn't enough to save you.
>>
>> My conclusions:
>>
>> 1. don't run matched disks from the same manufacturer and lot
>> 2. watch disk temperature
>> 3. watch smartmon for indications of aging
>> 4. replace disks before they die
>> 5. use your replacements as an opportunity to get your pairs staggered
>> 6. have backups that at minimum are ping-ponged, current, and physically
>> offline
>> 7. goto #1...
>
> In most cases this is not a case of simultaneous failure due to common
> disk wear or defects, or power supply events, or controller problems. In
> most cases of apparent simultaneous failure Disk 2 has a bad sector that
> has never been written to. Such a sector can remain undisturbed for the
> life of the disk, or until the RAID software attempts to sync with another
> disk. When Disk 1 fails (and is noticed by the RAID software) and is
> replaced, the sync starts copying Disk 2 to the new Disk 1 and runs until
> the bad sector on Disk 2 is encountered, at which point it announces the
> fact that Disk 2 has failed. But it didn't fail during the sync - it was
> probably bad from day 1, and if written to would have been remapped
> transparently to the user and the RAID software.

I don't understand what you are suggesting here.  When Disk 2 was made
a fully functioning member of the RAID subsystem, why wasn't every
block relevant to using it for recovery written at least once to
initialize it?   Isn't that what a RAID build/rebuild guarantees?  I
could see in the case of mirrored drives that the block was only
written once and never read again (any reads fulfilled by a different
drive), but I don't see how the block gets away with never being
written at all.

Bill Bogstad
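
The scenario under discussion in this message can be sketched as a toy simulation. This is only an illustration under stated assumptions, not a model of any real RAID implementation: the Disk class, the rebuild and simulate functions, the sector counts, and the rule that all reads are served by Disk 1 are hypothetical. It assumes, as the quoted text does, that a write to a bad sector is remapped silently while a read of one fails.

```python
class Disk:
    """Hypothetical toy disk: sector contents plus a set of latent bad sectors."""

    def __init__(self, n_sectors, latent_bad=()):
        self.data = [0] * n_sectors
        self.latent_bad = set(latent_bad)  # bad, but not yet noticed by anyone
        self.remapped = set()              # bad sectors the drive has replaced

    def read(self, sector):
        # Assumption: reading a latent bad sector is what exposes it.
        if sector in self.latent_bad:
            raise OSError(f"unrecoverable read error at sector {sector}")
        return self.data[sector]

    def write(self, sector, value):
        # Assumption: a write lets the drive remap the sector transparently.
        if sector in self.latent_bad:
            self.latent_bad.discard(sector)
            self.remapped.add(sector)
        self.data[sector] = value


def rebuild(source, target):
    """Copy every sector from source to target.

    Returns the first sector that could not be read, or None on success.
    """
    for sector in range(len(source.data)):
        try:
            target.write(sector, source.read(sector))
        except OSError:
            return sector
    return None


def simulate(n_sectors=100, bad_sector=42, written_sectors=range(10)):
    """Run a two-disk mirror, fail Disk 1, and rebuild from Disk 2."""
    disk1 = Disk(n_sectors)
    disk2 = Disk(n_sectors, latent_bad=[bad_sector])

    # Normal operation: every write goes to both members of the mirror.
    for sector in written_sectors:
        disk1.write(sector, sector + 1)
        disk2.write(sector, sector + 1)

    # Assumption: all reads are served by Disk 1, so Disk 2 is never read
    # and its latent bad sector goes unnoticed.
    for sector in written_sectors:
        disk1.read(sector)

    # Disk 1 fails and is replaced; the sync copies Disk 2 to the new disk.
    new_disk1 = Disk(n_sectors)
    return rebuild(disk2, new_disk1)


if __name__ == "__main__":
    print("bad sector never written:", simulate())
    print("every sector written once:", simulate(written_sectors=range(100)))
```

In this sketch the rebuild stops at the latent bad sector only when that sector was never written on Disk 2. If every sector is written at least once, as the reply above expects an initial build to do, the toy drive remaps the sector and the rebuild completes. Whether a real array writes every block of every member during its initial build depends on the implementation.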







BLU is a member of BostonUserGroups
We also thank MIT for the use of their facilities.




Boston Linux & Unix / webmaster@blu.org