Michael Reifenberger
2006-Mar-23 10:16 UTC
strange deadlock and magic resurrection with RELENG_6
Hi,

I'm using a recent RELENG_6 on i386/SMP (Athlon X2 4800+). The dmesg output is available at http://people.freebsd.org/~mr/dmesg.log.gz

Root is on a gmirror volume (2 SATA disks), and a backup FS is on graid3 (5 FireWire disks). This server acts as a Bacula server.

During a Bacula backup, the system froze completely (no keyboard, NFS, disk...) after the following lines appeared on the console:

...
ad0: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=108916879
ad1: TIMEOUT - WRITE_DMA retrying (1 retry left) LBA=116030287
ad1: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=108911183
ad1: WARNING - WRITE_DMA UDMA ICRC error (retrying request) LBA=108378767

Since the system still answered pings, I waited a couple of hours in the hope that it would recover by itself. When it did not, I flood-pinged the machine and, voila, after the following lines appeared:

...
Limiting icmp ping response from 261 to 200 packets/sec
Limiting icmp ping response from 283 to 200 packets/sec
...

everything went back to normal!

It seems to me that there is a deadlock somewhere in the kernel. But how can I debug this when everything is frozen?

BTW: I have seen these DMA errors before; they seem to be caused by an interaction between ata and some GEOM modules. See my earlier post regarding this issue. Maybe the same problem has become fatal after the latest gmirror/graid3 changes?

Has anyone else seen this?

Bye/2

---
Michael Reifenberger, Business Development Manager SAP-Basis, Plaut Consulting
Comp: Michael.Reifenberger@plaut.de | Priv: Michael@Reifenberger.com
http://www.plaut.de | http://www.Reifenberger.com