drive dropped from RAID5 set after reboot
Matthew Gillen
me at mattgillen.net
Thu Mar 15 21:43:04 EDT 2007
Tom Metro wrote:
> but it troubles me that it just disappeared on its own. dmesg doesn't
> seem to show anything interesting, other than the lack of sdb1 being
> picked up by md:
Does its partition type match the others that did get detected?
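To compare them (assuming the members are still ordinary sdX1 partitions; the
device names below are just examples), something like:

  # fdisk -l /dev/sda /dev/sdb /dev/sdc /dev/sdd | grep -i raid

should list every member partition as type 'fd' (Linux raid autodetect) if
they're all set up the same way.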
>
> Other than the DegradedArray event, /var/log/daemon.log doesn't show
> anything interesting. smartd didn't report any problems with /dev/sdb.
> Then again, while looking into this I found:
>
> smartd[6370]: Device: /dev/hda, opened
> smartd[6370]: Device: /dev/hda, found in smartd database.
> smartd[6370]: Device: /dev/hda, is SMART capable. Adding to "monitor" list.
> ...
> smartd[6370]: Device: /dev/sda, opened
> smartd[6370]: Device: /dev/sda, IE (SMART) not enabled, skip device Try
> 'smartctl -s on /dev/sda' to turn on SMART features
> ...
> smartd[6370]: Device: /dev/sdb, IE (SMART) not enabled...
> smartd[6370]: Device: /dev/sdc, IE (SMART) not enabled...
> smartd[6370]: Device: /dev/sdd, IE (SMART) not enabled...
> smartd[6370]: Monitoring 1 ATA and 0 SCSI devices
>
> So it looks like the drives in the RAID array weren't being monitored by
> smartd. Running the suggested command:
>
> # smartctl -s on /dev/sda
> smartctl version 5.36 ...
> unable to fetch IEC (SMART) mode page [unsupported field in scsi
> command]
> A mandatory SMART command failed: exiting. To continue, add one or more
> '-T permissive' options.
>
> Seems it doesn't like these SATA drives. I'll have to investigate
> further...
The older libata doesn't provide most of the ioctls needed for SMART/hdparm
on SATA drives (even though the drives themselves support SMART). I don't know
whether that situation has changed lately; it's been quite a while since I had
a working SATA drive (I reverted to IDE because the drives are cheap and my
SATA controller had very spotty Linux support at the time).
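I haven't tried this myself, so take it as a guess, but with a newer kernel
(roughly 2.6.15 or later) and a recent smartmontools you may be able to tell
smartctl that the device is really ATA behind the SCSI layer:

  # smartctl -d ata -s on /dev/sda
  # smartctl -d ata -a /dev/sda

If the '-d ata' passthrough isn't supported by your kernel/controller, it
should just fail the same way it does now.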
> I've noticed the device names have changed as of a reboot last weekend.
> Probably due to upgrades to the udev system. The array was originally
> setup with /dev/sda1 ... /dev/sdd1 and the output from /proc/mdstat
> prior to a reboot last week showed:
> md1 : active raid5 sda1[0] sdd1[3] sdc1[2] sdb1[1]
>
> and now shows:
> md1 : active raid5 dm-7[4] dm-6[3] dm-5[2] dm-4[0]
>
> but if that was the source of the problem, I'd expect it to throw off
> all the devices, not just one of the drives.
My aforementioned SATA controller gave me fits during one kernel upgrade where
the driver switched from treating the drives as IDE (hdX) devices to SCSI
(sdX). In any event, if the partition types for some of the drives were set to
RAID-autodetect and one wasn't, I could see how the array might only get
partially reconstructed. If that's not the root cause, I haven't the slightest
idea what could be...
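If fdisk shows sdb1's type is still 'fd' like the others and SMART isn't
complaining about the disk itself, it's usually safe to just hot-add the
dropped member back and let md resync (assuming sdb1 really is the one that
got kicked out; substitute whatever name it maps to now):

  # mdadm /dev/md1 --add /dev/sdb1
  # cat /proc/mdstat     (to watch the rebuild progress)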
> <snip>
> so maybe the array is being initially setup by initrd, and then being
> setup again at a later stage.
Almost certainly not, at least not the same arrays. Each array is set up once.
What that config file is telling you is that you don't have to set them all up
at once (i.e., if you have some array that needs a proprietary driver that you
can't put in the initrd, then you'd need to delay initialization of that
particular array until the root filesystem is mounted).
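If the config file in question is Debian's /etc/default/mdadm (I'm guessing;
adjust for your distro), the relevant knob looks something like:

  # /etc/default/mdadm
  # 'all'  = assemble every array from the initrd,
  # 'none' = assemble nothing until the normal init scripts run,
  # or list specific arrays, e.g. '/dev/md1'
  INITRDSTART='all'

Changing it generally means regenerating the initrd afterwards
(update-initramfs -u on Debian-ish systems).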
> This system doesn't have its root file
> system on the array, so I'm going to switch 'all' to 'none'.
If the modules in the initrd match the ones in /lib/modules... for a given
kernel, then there /should/ be absolutely no difference whether you configure
your RAID array before or after your root filesystem is mounted. But switching
probably won't hurt, and I've been known to be wrong on occasion ;-)
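If you want to double-check that, you can peek inside the initrd and compare.
With a gzipped-cpio initramfs (what Debian's initramfs-tools produces; other
setups differ) something along these lines works:

  # zcat /boot/initrd.img-$(uname -r) | cpio -it | grep raid
  # modinfo raid5     (module name may be raid5 or raid456, depending on kernel)

If the md/raid modules in the image don't match what's under /lib/modules for
the running kernel, regenerate the initrd before drawing any conclusions.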
Matt