Jerry noted...

> One of the failings of the technical community (I include Linux, Unix,
> Windows, Mac, etc) is that we do not pay a lot of attention to helping
> our user communities to properly recover from errors.

Which reminded me to point out something: there are two sets of tools for managing software RAID, mdadm and raidtools. Currently I think they both have the same basic capabilities, but in the future I would expect mdadm to become much more widespread and to get more new features. So for my own setup I switched to mdadm instead of raidtools.

As for the original question, which was how to replace a failed drive element, I'll address that with another point to anyone else here who owns a software RAID setup: you should--right now if you haven't ever done so--test and verify your installation. You should be able to do the following sequence:

- pull out an active sync'ed drive, run the system for a while (with no interruption)
- see an alert come up from your monitoring tool
- reboot the system to make sure it comes up properly
- see that your monitoring tool notifies you of a degraded array at boot
- put the drive back in and re-sync it with the array
- see that you no longer get alerts

I really emphasize the monitoring tool. It does *no good* to have a degraded array running for months at a time, and without monitoring you will never notice one. Mdadm does this in a very straightforward way; I don't know whether raidtools has this feature. SuSE now has this tool built into their installation script, but you can easily add the command '/sbin/mdadm -F -d 60 -m username@myhost -s -c /etc/mdadm.conf' to any system. Don't run without it.

If you have hardware RAID, you need to figure out how to set up monitoring. I threw out a hardware RAID controller recently because I couldn't figure out how to do monitoring.
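As a rough cross-check alongside the mdadm monitor, a degraded array can also be spotted by reading /proc/mdstat directly. The sketch below is not part of mdadm; the function name degraded_arrays is made up for illustration, and it assumes the usual mdstat layout in which each array's status appears as a bracketed string such as [UU] (healthy) or [U_] (one member missing).

```shell
#!/bin/sh
# Print the name of every md array whose member-status field (for example
# [U_]) contains an underscore, which marks a missing or failed member.
# Reads /proc/mdstat by default; pass another file as $1 to try it on a
# saved copy. The function name is hypothetical, not an mdadm feature.
degraded_arrays() {
    awk '
        # Remember the array name from lines like "md0 : active raid1 ..."
        /^md[0-9]+ :/ { name = $1 }
        # A status field made only of U and _ with at least one _ is degraded
        /\[[U_]*_[U_]*\]/ { if (name != "") print name; name = "" }
    ' "${1:-/proc/mdstat}"
}
```

Calling degraded_arrays with no argument after pulling a drive should list the affected array, which makes it a handy sanity check during the test sequence above. It is no substitute for the mdadm monitor, since it only reports when someone runs it.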
A two-drive software RAID1 disk mirror setup on Linux will deliver 95% of the performance that any hardware RAID will provide; you really only need hardware RAID if you're running a larger configuration.

The recovery steps for replacing a failed drive are:

- Install the drive, reboot
- Create the RAID partition(s) using fdisk, partition-id type 'fd', same size as your existing drive's RAID partition(s)
- Issue raidhotadd (if raidtools) or mdadm --manage --add to start the sync
- You can watch the sync in progress via 'cat /proc/mdstat'
- Re-test the configuration (with monitoring) as noted above

Referring back to some performance problems that I had a couple of weeks ago, one handy command to remember is hdparm, which will tell you about DMA and other settings of your drives.

-rich
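For reference, the mdadm variant of the recovery steps above can be sketched as a dry run that only prints the commands, so nothing touches a disk until you have read them over. The function name recovery_plan and the device names in the usage note are illustrative assumptions, not taken from any particular system.

```shell
#!/bin/sh
# Dry run: print the commands for re-adding a replaced drive to an array.
# Nothing is executed. All three arguments are supplied by the caller;
# the function name and example devices are hypothetical.
recovery_plan() {
    array=$1   # the md array, e.g. /dev/md0
    disk=$2    # the replacement drive, e.g. /dev/sdb
    part=$3    # the RAID partition on that drive, e.g. /dev/sdb1
    echo "fdisk $disk    # create the partition(s), type fd, same size as the surviving drive's"
    echo "mdadm --manage $array --add $part"
    echo "cat /proc/mdstat    # watch the resync progress"
}
```

For example, 'recovery_plan /dev/md0 /dev/sdb /dev/sdb1' prints the three commands for a mirror whose second disk was replaced. Double-check the device names against your own system before running anything, since adding the wrong partition is hard to undo.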
BLU is a member of BostonUserGroups.
We also thank MIT for the use of their facilities.