Problems with raidhotadd (mdadm --add)
Doug Sweetser
sweetser at TheWorld.com
Wed Sep 10 16:35:46 EDT 2003
Hello Rich:
Things have improved, mainly because grub is using a root of /dev/md0.
There still is one big problem: every reboot causes one of the disks in
the array to be marked failed. Right before signing off, the kernel says
"md: md0 still in use," a note also left in /var/log/messages. When I
rebooted the last time, I took notes from the console:
invalidate: busy buffers
invalidate: busy buffers
invalidate: busy buffers (saw on a Google thread Linus doesn't think
... these matter at all)
invalidate: busy buffers
md: marking sb clean...
md: updating md0 RAID super block on device
md: ide/host0/bus0/target0/lun0/part6 (write)
ide/host0/bus0/target0/lun0/part6 sb offset: 6112576
md: ide/host0/bus1/target0/lun0/part6 (write) [only bus, offset
ide/host0/bus1/target0/lun0/part6 sb offset: 6109888 have changed]
md: md0 switched to read-only mode
flushing ide devices: hda hdc hdd
power down
I read in one place that the system may be shutting down too quickly if
APM is in the kernel. The only APM option I have set builds it as a
module, so I'll disable that; but the module does not appear in the
lsmod output, so I don't think it could be a factor.
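That lsmod check can be scripted instead of eyeballed. A small sketch — the captured lsmod output below is made up for illustration; the real check would pipe `lsmod` itself:

```shell
# Hypothetical lsmod capture, standing in for running `lsmod` directly.
lsmod_output="Module                  Size  Used by
raid1                  14988  1
ide-cd                 33856  0"

# Exact match on the first column; the header line (NR==1) is skipped,
# so a module whose name merely contains "apm" can't produce a false hit.
if printf '%s\n' "$lsmod_output" | awk 'NR > 1 && $1 == "apm" { f = 1 } END { exit !f }'; then
    apm_state="loaded"
else
    apm_state="not loaded"
fi
echo "apm module ${apm_state}"
```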
To get things back in sync for a while, I run:
# mdadm /dev/md0 --add /dev/hda6
I'd rather not do that on this machine, which gets turned off a lot.
The device hda6 is set to "Linux raid autodetect".
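If I end up scripting that re-add, I'd only want it fired when the mirror is actually degraded. A sketch, assuming the device names above; `mdstat_text` is a made-up degraded snapshot standing in for `cat /proc/mdstat`, and the command is printed rather than run so it can be reviewed first:

```shell
# Hypothetical degraded /proc/mdstat snapshot: [_U] or [U_] means one
# half of the raid1 mirror is missing.
mdstat_text="md0 : active raid1 ide/host0/bus1/target0/lun0/part6[0]
      6109888 blocks [2/1] [_U]"

readd_if_degraded() {
    if printf '%s\n' "$mdstat_text" | grep -Eq '\[(_U|U_)\]'; then
        # Print rather than execute, so the step can be checked by hand.
        echo "mdadm /dev/md0 --add /dev/hda6"
    else
        echo "array healthy, nothing to do"
    fi
}

readd_if_degraded
```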
# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.00
Creation Time : Sat Sep 6 15:04:56 2003
Raid Level : raid1
Array Size : 6109888 (5.83 GiB 6.26 GB)
Device Size : 6109888 (5.83 GiB 6.26 GB)
Raid Devices : 2
Total Devices : 3
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Wed Sep 10 13:30:05 2003
State : dirty, no-errors
Active Devices : 2
Working Devices : 2
Failed Devices : 1
Spare Devices : 0
Number Major Minor RaidDevice State
0 22 6 0 active sync /dev/hdc6
1 3 6 1 active sync /dev/hda6
UUID : d77ccd49:017683f5:1fa1cf98:f9bb5bac
Events : 0.73
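Rather than eyeballing that report for "dirty", the State line can be pulled out mechanically. A sketch using a snippet of the --detail output above in place of running `mdadm --detail /dev/md0`:

```shell
# Snippet of the --detail report quoted above, standing in for
# the live output of `mdadm --detail /dev/md0`.
detail_output="          State : dirty, no-errors
 Active Devices : 2
Working Devices : 2"

# Strip the label and leading spaces, keeping just the state value.
array_state=$(printf '%s\n' "$detail_output" | sed -n 's/^ *State : //p')
case "$array_state" in
    *dirty*) dirty=yes ;;
    *)       dirty=no  ;;
esac
echo "state: ${array_state} (dirty=${dirty})"
```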
After some Google time, I saw one person who claimed I should not
worry about "State : dirty", since it only indicates that the array
has been started but not shut down cleanly. raidstop /dev/md0 does not
work because / is still active.
After a wait, I see:
# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 ide/host0/bus0/target0/lun0/part6[1]
ide/host0/bus1/target0/lun0/part6[0]
6109888 blocks [2/2] [UU]
unused devices: <none>
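The [UU] marker is the thing to watch for there, and it is easy to test for in a script. A sketch, using the mdstat text above in place of reading /proc/mdstat live:

```shell
# The /proc/mdstat text quoted above, standing in for
# `cat /proc/mdstat`; [UU] means both halves of the mirror are up.
mdstat="md0 : active raid1 ide/host0/bus0/target0/lun0/part6[1] ide/host0/bus1/target0/lun0/part6[0]
      6109888 blocks [2/2] [UU]"

if printf '%s\n' "$mdstat" | grep -q '\[UU\]'; then
    md0_status="both mirrors active"
else
    md0_status="degraded"
fi
echo "md0: ${md0_status}"
```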
This is where I want to stay (I have gotten here twice, and failed on
reboot twice). Any ideas beyond the no-APM kernel games?
doug