Reminder -- RAID 5 is not your friend

Daniel Feenberg feenberg-fCu/yNAGv6M at public.gmane.org
Mon Mar 15 17:36:48 EDT 2010



On Mon, 15 Mar 2010, Bill Bogstad wrote:

> On Mon, Mar 15, 2010 at 3:41 PM, Daniel Feenberg <feenberg-fCu/yNAGv6M at public.gmane.org> wrote:
>>
>>
>>
>> On Mon, 15 Mar 2010, Kent Borg wrote:
>>
>>> Richard Pieri wrote:
>>>> And neither is RAID 1.  Except when you get lucky.
>>>>
>>>> I had a failure over the weekend.  Two mirrored pairs, A1/A2 B1/B2 configuration.  A2 and B1 failed simultaneously.
>>>
>>> Sounds like it is *disks* that are not your friend. And, that they hate
>>> you enough that your use of raid isn't enough to save you.
>>>
>>> My conclusions:
>>>
>>> 1. don't run matched disks from the same manufacturer and lot
>>> 2. watch disk temperature
>>> 3. watch smartmon for indications of aging
>>> 4. replace disks before they die
>>> 5. use your replacements as an opportunity to get your pairs staggered
>>> 6. have backups that at minimum are ping-ponged, current, and physically
>>> offline
>>> 7. goto #1...
>>
>> In most cases this is not a case of simultaneous failure due to common
>> disk wear or defects, or power supply events, or controller problems. In
>> most cases of apparent simultaneous failure Disk 2 has a bad sector that
>> has never been written to. Such a sector can remain undisturbed for the
>> life of the disk, or until the RAID software attempts to sync with another
>> disk. When Disk 1 fails (and is noticed by the RAID software) and is
>> replaced the sync starts copying Disk 2 to the new Disk 1 and runs until
>> the bad sector on Disk 2 is encountered, at which point it announces the
>> fact that Disk 2 has failed. But it didn't fail during the sync - it was
>> probably bad from day 1, and if written to would have been remapped
>> transparently to the user and the RAID software.
>
> I don't understand what you are suggesting here.  When Disk 2 was made
> a fully functioning member of the RAID subsystem, why wasn't every
> block relevant to using it for recovery written at least once to
> initialize it?   Isn't that what a RAID build/rebuild guarantees?  I
> could see in the case of mirrored drives that the block was only
> written once and never read again (any reads fulfilled by a different
> drive), but I don't see how the block gets away with never being
> written at all.

Sorry, my mistake - the sector would have to go bad between the original
build and the other drive failing, but that may be years, not just the
time between the failure and completing the rebuild.
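The failure mode described above can be sketched in a few lines. This is a
minimal toy model (the disk structure, sector count, and sector number 417
are all hypothetical), not how md or any real controller is implemented: a
sector on the surviving mirror goes unreadable at some point after the
original build, nothing reads it for years, and the rebuild after the first
drive's failure is what finally trips over it.

```python
SECTORS = 1000

def make_disk(bad_sectors=()):
    # A disk is modeled as sector -> data; sectors in "bad" raise on
    # read until they are written (drives remap bad sectors on write).
    return {"data": {i: 0 for i in range(SECTORS)},
            "bad": set(bad_sectors)}

def read(disk, sector):
    if sector in disk["bad"]:
        raise IOError("unrecoverable read error at sector %d" % sector)
    return disk["data"][sector]

def write(disk, sector, value):
    # Writing a bad sector triggers transparent remapping, so the
    # latent error is only ever seen by a *read*.
    disk["bad"].discard(sector)
    disk["data"][sector] = value

def rebuild(surviving, replacement):
    # Mirror rebuild: read every sector of the survivor and copy it.
    for s in range(SECTORS):
        write(replacement, s, read(surviving, s))

# Disk 2 developed a latent bad sector (417, arbitrary) long after the
# original build; no read or write has touched it since.
disk2 = make_disk(bad_sectors={417})
disk1_new = make_disk()
try:
    rebuild(disk2, disk1_new)
except IOError as e:
    # Reported as "Disk 2 failed during rebuild", though the sector
    # may have been unreadable for years.
    print("rebuild failed:", e)
```

The point of the sketch is the asymmetry between read and write: a periodic
full-surface read (or a rewrite of every sector) would have surfaced the bad
sector while the mirror was still redundant.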

Daniel Feenberg

>
> Bill Bogstad
>




More information about the Discuss mailing list