I've had a very odd problem with a -stable system on an Asus A7V333-RAID
motherboard, which has an onboard Promise RAID controller. For several days
in a row the system lost its raid0 array during the 3am daily run, leaving
it with no disk. The RAID had actually been disabled in the BIOS, and
manual intervention was required on reboot to re-enable it. I suspected
hardware, but in desperation booted a -stable kernel from 10/3/03. That
kernel survived the daily run and reported the following:
Oct 14 14:41:43 192.168.24.4 /kernel.maybe.ok: ad6: hard error reading fsbn
133757952 of 0-127 (ad6 bn 133757952; cn 132696 tn 6 sn 6) trying PIO mode
(I should note that I added a script in /usr/local/etc/periodic/daily to
back up this system, so files that normally see no access are now read.)
I suspect that something in the newer -stable kernel reacted to this hard
error by doing, intentionally or not, the equivalent of an ATARAIDDELETE
ioctl. Since the bad block has since been remapped, I can't easily test
this idea, but thought I should report it in case it triggers a eureka
moment for a developer.
The problem appears to occur only in response to a disk error; I've been
running a -stable kernel from 10/16/03 with no problem since the bad block
was remapped. I have added code to log calls to ata_raid_destroy and turn
it into a no-op, so I hope to notice if it ever happens again.
--
Barney Wolff http://www.databus.com/bwresume.pdf
I'm available by contract or FT, in the NYC metro area or via the 'Net.