On Tue, Nov 3, 2009 at 2:12 PM, Stephen Adler <adler-wRvlPVLobi1/31tCrMuHxg at public.gmane.org> wrote:
> Hi all,
>
> I'm putting together a backup system at my job and in doing so set up the
> good ol' RAID 5 array. While I was putting the disk array together, I
> read that one could encounter a problem in which you replace a failed
> drive and the rebuilding process trips over another bad sector on one
> of the drives which was good before the rebuild started, and thus you
> end up with a screwed up RAID array. So I was thinking of a way to
> avoid this problem. One solution is to kick off a job once a week or
> month in which you force the whole RAID array to be read. I was thinking
> of possibly forcing a checksum of all the files I had stored on the
> disk.

Reading all the files (whether you checksum them or not) won't read all
of the allocated blocks on the disk:

1. With RAID 5, the parity blocks are only read if a drive error occurs
while reading the data blocks. The result is that the parity blocks won't
ever get read during your testing (unless a failure occurs).

2. If the filesystem you are using supports snapshots, you will only be
reading the data blocks for the current version of the file. (You could
read all the snapshots as well, but that is going to result in the same
physical block on the disk being 'read' multiple times, once for each
snapshot in which it is included.)

If you have direct read access to the drives (partitions), you might try
just reading from them directly. Any drive on which you get read errors
can then be taken offline and a rebuild can be forced. I think this is
slightly better than what you suggest below, because you are at least
taking a drive with a known problem (bad blocks) offline, rather than
ignoring all of the good data on the drive you are randomly picking to
force an error.

What I think you really want is RAID scrubbing. Here is a link to some
Gentoo Linux RAID docs on the subject:

http://en.gentoo-wiki.com/wiki/Software_RAID_Install#Data_Scrubbing

If you are using hardware RAID, you should investigate similar commands
for your hardware controller.

> The other idea I had was to force one of the drives into a failed
> state and then add it back in and thus force the RAID to rebuild. The
> rebuilding process takes about 3 hours on my system, which I could
> easily execute at 2am every Sunday morning.

And what if one of the drives you didn't take offline has a failure
during that window?

Bill Bogstad
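To make the "read the partitions directly" idea concrete, here is a
minimal sketch; /dev/sdb1 is a placeholder for whatever your array
members actually are. badblocks in its default read-only mode reports
any unreadable sectors, and a plain dd to /dev/null exercises the same
blocks:

    # read-only surface scan of one RAID member (device name is an assumption)
    badblocks -sv /dev/sdb1

    # or simply stream the whole partition and throw the data away
    dd if=/dev/sdb1 of=/dev/null bs=1M

Repeat for each member; a member that throws read errors is the one
worth failing out deliberately, rather than one picked at random.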
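For Linux software RAID (md), the scrubbing described in the Gentoo doc
is driven through sysfs; a sketch assuming the array is /dev/md0:

    # ask md to read and verify every block on every member, parity included
    echo check > /sys/block/md0/md/sync_action

    # watch progress
    cat /proc/mdstat

    # number of mismatched sectors found during the last check
    cat /sys/block/md0/md/mismatch_cnt

Dropping that echo into a weekly cron entry (for example 0 2 * * 0)
gives you the 2am-Sunday schedule without ever taking a drive offline.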
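If a member does turn up read errors, the fail/replace/rebuild cycle
with mdadm looks roughly like this (again, /dev/md0 and /dev/sdb1 are
placeholder names):

    # mark the suspect member as failed and pull it from the array
    mdadm /dev/md0 --fail /dev/sdb1
    mdadm /dev/md0 --remove /dev/sdb1

    # after swapping in a replacement disk, add it back;
    # md starts the rebuild automatically
    mdadm /dev/md0 --add /dev/sdb1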