Hello all,
I've run into a problem that I hope someone can help me with.
I have a fileserver that currently has a 4-disk RAID connected to an IDE 3ware
card. I had hoped to replace this dying system with a pair of synchronized
1TB SATA drives. When I tried to newfs them, both eventually failed with DMA
READ or WRITE timeouts. Here's some info:
FreeBSD rum.dub.net 6.2-STABLE FreeBSD 6.2-STABLE #2: Sat Jul 21 09:05:25 PDT 2007 unfurl@rum.dub.net:/usr/obj/usr/src/sys/GENERIC i386
<snip from dmesg>
ad0: 43979MB <IBM DTLA-307045 TX6OA50C> at ata0-master UDMA100 <-- system disk
ad4: 953869MB <Hitachi HDS721010KLA330 GKAOA70F> at ata2-master SATA150
ad6: 953869MB <Hitachi HDS721010KLA330 GKAOA70F> at ata3-master SATA150
twed0: <Unit 0, RAID5, Normal> on twe0
twed0: 583440MB (1194885120 sectors)
A complete dmesg is at http://dub.net/rum.dub.net.dmesg
Initially, attempting the newfs produced this:
Jul 21 00:21:45 rum kernel: ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=54194911
Jul 21 00:22:20 rum kernel: ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=107260543
Jul 21 00:22:57 rum kernel: ad4: FAILURE - device detached
Jul 21 00:22:57 rum kernel: subdisk4: detached
Jul 21 00:22:57 rum kernel: ad4: detached
Jul 21 00:24:19 rum kernel: ad6: FAILURE - device detached
Jul 21 00:24:19 rum kernel: subdisk6: detached
Jul 21 00:24:19 rum kernel: ad6: detached
After several tries I was able to get both disks newfs'd and mounted, but
they quickly failed again with DMA timeouts. On one occasion the machine
actually panicked too:
ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=1456106111
ad4: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=1456106111
ad4: FAILURE - WRITE_DMA48 timed out LBA=1456106111
ad4: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=54194911
ad4: TIMEOUT - WRITE_DMA48 retrying (1 retry left) LBA=461407775
ad4: TIMEOUT - WRITE_DMA48 retrying (0 retries left) LBA=461407775
ad4: FAILURE - WRITE_DMA48 timed out LBA=461407775
Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x66
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc07253c3
stack pointer = 0x28:0xd9724b9c
frame pointer = 0x28:0xd9724ba4
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 779 (mdnsd)
trap number = 12
panic: page fault
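For anyone who wants to tally these messages from a longer log, here is a minimal Python sketch. The regular expression is an assumption inferred only from the lines quoted above, not from any documented log format, and the function name summarize is hypothetical. It expects each kernel message on a single line and skips lines without an LBA, such as the "device detached" ones.

```python
import re
from collections import Counter

# Pattern inferred from the quoted log lines (an assumption, not a documented
# format): "<dev>: TIMEOUT|FAILURE - <COMMAND> ... LBA=<number>"
LINE_RE = re.compile(
    r"(?P<dev>ad\d+): (?P<kind>TIMEOUT|FAILURE) - "
    r"(?P<cmd>[A-Z_0-9]+)\b.*?LBA=(?P<lba>\d+)"
)


def summarize(lines):
    """Count timeout/failure messages per (device, kind, command) and
    collect the distinct LBAs reported for each device."""
    counts = Counter()
    lbas = {}
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            # Not a DMA timeout/failure line with an LBA; ignore it.
            continue
        counts[(m["dev"], m["kind"], m["cmd"])] += 1
        lbas.setdefault(m["dev"], set()).add(int(m["lba"]))
    return counts, lbas
```

Passing it an open log file (for example the syslog file the kernel messages end up in, assuming one message per line) returns the counts and the set of affected LBAs per disk, which may help show whether the timeouts cluster on one drive or one region.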
I've read that bad SATA cables can cause this. The cables I'm using are
brand new, but they are probably pretty cheap.
Help freebsd-stable, you're my only hope! :)
-Bill
--
-=| Bill Swingle - unfurl@dub.net