Hi,
Recently we upgraded a 4.11 box to 6.1-BETA2 by reinstalling and running
newfs on everything. Since then, if hw.ata.ata_dma=1 at boot, the box
panics as soon as it starts fsck -p. This happens only when ad0 is set
to UDMA66 or above. My current workaround is to set hw.ata.ata_dma=0 in
loader.conf and then manually turn DMA on, with ad0 at UDMA33 and the
rest (ad4~ad7) at UDMA100. Back in the 4.x days there was also something
wrong with DMA on ad0, but it fell back to PIO4 automatically without
any problem. We have tried 1) changing the cable, 2) moving from the
primary ATA controller to the secondary one, and 3) upgrading to
RELENG_6 as of March 11, but none of these helped. There is no option in
the BIOS to turn off DMA for the onboard ATA controller.
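For reference, the workaround looks roughly like the sketch below. It
assumes the per-device atacontrol syntax of 6.x (atacontrol mode
<device> <mode>); older releases take a channel number instead, so check
atacontrol(8) on your release. It has to be run as root on the affected
machine after every boot.

```sh
# /boot/loader.conf contains this line, so ATA DMA stays off during boot:
#   hw.ata.ata_dma="0"

# After boot, re-enable DMA by hand (assumed 6.x per-device syntax).
# ad0 panics at UDMA66 or above, so it is kept at UDMA33.
atacontrol mode ad0 UDMA33

# The remaining disks run fine at UDMA100.
for disk in ad4 ad5 ad6 ad7; do
    atacontrol mode "$disk" UDMA100
done
```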
The ATA controller and ad0 are:
atapci0: <VIA 82C686B UDMA100 controller> port
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xffa0-0xffaf at device 7.1 on
pci0
atapci0: Reserved 0x10 bytes for rid 0x20 type 4 at 0xffa0
ata0: <ATA channel 0> on atapci0
atapci0: Reserved 0x8 bytes for rid 0x10 type 4 at 0x1f0
atapci0: Reserved 0x1 bytes for rid 0x14 type 4 at 0x3f6
ata0: reset tp1 mask=03 ostat0=50 ostat1=00
ata0: stat0=0x50 err=0x01 lsb=0x00 msb=0x00
ata0: stat1=0x00 err=0x01 lsb=0x00 msb=0x00
ata0: reset tp2 stat0=50 stat1=00 devices=0x1<ATA_MASTER>
ata0: [MPSAFE]
ata0-master: pio=PIO4 wdma=WDMA2 udma=UDMA100 cable=80 wire
ad0: setting PIO4 on 82C686B chip
ad0: setting UDMA100 on 82C686B chip
ad0: 38166MB <Seagate ST340016A 3.10> at ata0-master UDMA100
ad0: 78165360 sectors [19158C/16H/255S] 16 sectors/interrupt 1 depth queue
I'm pretty sure this disk is capable of UDMA100 (according to the
specification on the Seagate website).
The console messages are:
/dev/ad0s1e: clean, 823031 free (447 frags, 102823 blocks, 0.0% fragmentation)
ad0: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=191
ad0: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=191
ad0: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=131647
ad0: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=131647
ad0: FAILURE - WRITE_DMA status=51<READY,DSC,ERROR>
error=84<ICRC,ABORTED> LBA=131647
g_vfs_done():ad0s1a[WRITE(offset=67371008, length=16384)]error = 5
[...]
kernel trap 12 with interrupts disabled
Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x24
fault code = supervisor read, page not present
instruction pointer = 0x20:0xc04eef95
stack pointer = 0x28:0xe4c714f0
frame pointer = 0x28:0xe4c71500
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = resume, IOPL = 0
current process = 127 (cp)
[thread pid 127 tid 100028 ]
Stopped at turnstile_broadcast+0x9: movl 0x24(%eax),%eax
db> bt
Tracing pid 127 tid 100028 td 0xc474e000
turnstile_broadcast(0) at turnstile_broadcast+0x9
_mtx_unlock_sleep(c068aa60,0,0,0) at _mtx_unlock_sleep+0x6c
softdep_sync_metadata(c4958880) at softdep_sync_metadata+0x7d4
ffs_syncvnode(c4958880,1) at ffs_syncvnode+0x43d
ffs_truncate(c4958880,200,0,880,c4695d00,c474e000) at ffs_truncate+0x77e
ufs_direnter(c4958880,c49de880,e4c7192c,e4c71bd0,0) at ufs_direnter+0x85d
ufs_makeinode(81a4,c4958880,e4c71bbc,e4c71bd0) at ufs_makeinode+0x30f
ufs_create(e4c71a84) at ufs_create+0x37
VOP_CREATE_APV(c0670ec0,e4c71a84) at VOP_CREATE_APV+0x3c
VOP_CREATE(c4958880,e4c71bbc,e4c71bd0,e4c71ae0) at VOP_CREATE+0x34
vn_open_cred(e4c71ba8,e4c71cc4,1a4,c4695d00,4) at vn_open_cred+0x20c
vn_open(e4c71ba8,e4c71cc4,1a4,4) at vn_open+0x29
kern_open(c474e000,804c1c8,0,602,21b6) at kern_open+0xd4
open(c474e000,e4c71cf0) at open+0x22
syscall(3b,3b,3b,8060100,bfbfeec4) at syscall+0x337
Xint0x80_syscall() at Xint0x80_syscall+0x1f
--- syscall (5, FreeBSD ELF32, open), eip = 0x28137ccf, esp 0xbfbfec7c, ebp =
0xbfbfecc8 ---
db> call doadump
Cannot dump. No dump device defined.
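As a cross-check that the failed WRITE_DMA and the g_vfs_done() error
refer to the same request, the byte offset can be converted to an LBA.
This is only a sketch: the 512-byte sector size, the slice start at
sector 63, and ad0s1a starting at offset 0 of the slice are assumptions
(they are not shown in the output above), although the numbers are
consistent with them.

```sh
# Values taken from the g_vfs_done() line in the console output.
offset=67371008      # byte offset within ad0s1a
sector_size=512      # assumed sector size
slice_start=63       # assumed first sector of ad0s1 (and of ad0s1a)

# Absolute LBA on ad0 = slice start + offset in sectors.
lba=$(( slice_start + offset / sector_size ))
echo "$lba"
```

Under those assumptions the result matches the LBA reported in the
FAILURE line (131647).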
The full dmesg (with boot_verbose) is available at
http://www.rafan.org/FreeBSD/ata/20060316-dmesg+db.txt
I also ran alltrace in ddb:
http://www.rafan.org/FreeBSD/ata/20060311-dball.txt
Regards,
Rong-En Fan