Kernel version 2.6 -- RAID performance woes?
Rich Braun
richb at pioneer.ci.net
Wed Nov 16 10:15:41 EST 2005
My experience with replacing a 100% reliable 6-year-old Linux file server has,
so far, made me consider reverting to a 1999-vintage kernel running on
that old hardware.
The thing I didn't bargain for with this upgrade was the bloated buggy
inefficiency of 2005-vintage Linux kernel code. Wow, is this not your
grandfather's Oldsmobile (er, Linus' terminal emulator)! Even after running a
3-hour compile following a 15-minute 50-meg download from ftp.funet.fi, I get
weird syslog messages like the following when running the 2.6.14 kernel:
Nov 15 16:33:18 cumbre kernel: hda: DMA timeout retry
Nov 15 16:33:18 cumbre kernel: hda: timeout waiting for DMA
Nov 15 16:33:18 cumbre kernel: hda: status timeout: status=0xd0 { Busy }
Nov 15 16:33:18 cumbre kernel: ide: failed opcode was: unknown
Nov 15 16:33:18 cumbre kernel: hda: no DRQ after issuing MULTWRITE_EXT
Nov 15 16:33:19 cumbre kernel: ide0: reset: success
...
Nov 15 17:18:02 cumbre kernel: BUG: soft lockup detected on CPU#0!
Nov 15 17:18:02 cumbre kernel: Pid: 1178, comm: md0_raid1
Nov 15 17:18:02 cumbre kernel: EIP: 0060:[<cf83966f>] CPU: 0
Nov 15 17:18:02 cumbre kernel: EIP is at ide_intr+0x8f/0x120 [ide_core]
The latter (the soft lockup) has appeared only since I upgraded from 2.6.13 to
2.6.14; the former (the DMA timeout) shows up frequently in syslog with both my
new motherboard and the other one I used to preconfigure the system.
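For anyone who wants to see how often each of these messages turns up, here is a minimal Python sketch that tallies them from syslog lines. The pattern names and the tally() helper are hypothetical, made up for illustration; the regular expressions only match the message text quoted above, and the log path in the usage note is an assumption that varies by distribution.

```python
import re
from collections import Counter

# Hypothetical labels; the patterns are copied from the syslog excerpts above.
PATTERNS = {
    "dma_timeout": re.compile(r"hda: DMA timeout retry"),
    "ide_reset": re.compile(r"ide\d+: reset: success"),
    "soft_lockup": re.compile(r"BUG: soft lockup detected"),
}

def tally(lines):
    """Count how many lines match each pattern in PATTERNS."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

To run it against a real log, something like `tally(open("/var/log/messages", errors="replace"))` should work, assuming that is where your syslog daemon writes kernel messages.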
But my main gripe about 2.6 is software RAID performance. It's stunningly
worse than under 2.4. On version 2.4, you see a process named raid1d that
never racks up any runtime, and a couple of related ones (bdflush,
mdrecoveryd) that have mere seconds of runtime after 3 weeks of uptime. On
version 2.6, a process called md0_raid1 sucks up so much runtime (at nice
level minus-5) during file creation that the system is brought to its knees.
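To put a number on that runtime, here is a rough Python sketch that reads accumulated CPU time and the nice level for a named process out of /proc/<pid>/stat. The function names parse_stat() and cpu_seconds_by_name() are hypothetical; the field positions (utime and stime as fields 14 and 15, nice as field 19) follow the proc(5) man page, and I am assuming /proc is mounted in the usual place.

```python
import os

def parse_stat(stat_line, ticks_per_sec=100):
    """Return (comm, cpu_seconds, nice) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or
    # parentheses, so split on the LAST ')' rather than on whitespace.
    head, _, rest = stat_line.rpartition(")")
    comm = head.partition("(")[2]
    fields = rest.split()  # fields[0] is field 3 (state)
    utime = int(fields[11])  # field 14: user-mode clock ticks
    stime = int(fields[12])  # field 15: kernel-mode clock ticks
    nice = int(fields[16])   # field 19: nice value
    return comm, (utime + stime) / ticks_per_sec, nice

def cpu_seconds_by_name(name, proc_root="/proc"):
    """Map pid -> (cpu_seconds, nice) for every process whose comm is name."""
    ticks = os.sysconf("SC_CLK_TCK")
    results = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, secs, nice = parse_stat(f.read(), ticks)
        except (OSError, ValueError, IndexError):
            continue  # process exited or the line was not as expected
        if comm == name:
            results[int(entry)] = (secs, nice)
    return results
```

Calling `cpu_seconds_by_name("md0_raid1")` before and after a test copy, and subtracting, should give the CPU time the RAID thread consumed during that copy.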
I use this server mainly for MP3 playback. When I created the files 3 or 4
years ago, I set up four CD readers and ripped a 1000-disc collection with 4
simultaneous streams. Today if I run a single stream of CD ripping, the
server is so sluggish that if I am playing back an MP3 file, Samba gets
CPU-starved and playback is choppy. This is on a 667-MHz processor, which is
more than ample to handle a few streams of 100-megabit file uploads.
I'm *amazed* at how Linux has evolved!
So my question is what to do about this: is there a version 2.6 kernel with
stable software RAID performance that I could revert to, or do I have to go
all the way back to 2.4? Where do I look online these days for technical
discussions of software RAID in the 2.6 kernel?
-rich