Martin Horneffer
2006-Mar-22 13:06 UTC
"TIMEOUT - WRITE_DMA" with SiI 3512 SATA on IBM eServer 326
Hi,

I have a problem, probably with the SiI 3512 SATA150 controller in a dual-Opteron IBM eServer 326. Every once in a while the kernel issues a message like:

  ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=150190687

The system waits a few seconds and then continues to work normally. This typically occurs several times a day, most likely depending on the load on the (SATA-connected) hard drive.

We have two machines with the same hardware configuration, each with two hard drives of identical type. The problem is the same with all 4 drives on both machines, so I assume it is a driver problem rather than a bad SATA cable.

We are currently running one of the machines with FreeBSD 5-stable (RELENG_5) and the other with some Linux. While Linux didn't have a problem with the hardware, FreeBSD did. We tried 5.4-Release and 6.0-Release, each on both i386 and amd64. Only 5.4-Release on amd64 was able to install, although with some warnings. The other versions failed to install at all.

After the successful installation we noticed two problems:

- After a couple of hours of uptime, top stopped reporting CPU utilization numbers (all 0). This went away after changing the timecounter hardware from ACPI-fast to i8254 (kern.timecounter.hardware=i8254 in /etc/sysctl.conf).
- The "TIMEOUT - WRITE_DMA" messages occur from time to time, always stalling the system for a few seconds (probably all processes trying to access the hard drive).

So far I haven't managed to solve the latter. I upgraded to 5-stable (RELENG_5) with cvsup (most recently today), but the problem is still the same. Apart from the occasional hiccups the machine runs fine.

The SATA controller reports itself as "SiI 3512A SATALink BIOS Version 4.3.47" during BIOS startup. I'll attach the latest dmesg output.

Any suggestions?

Best regards,
Martin

--
Dr.
Martin Horneffer -- maho@nic.dtag.de
Deutsche Telekom AG T-Com Technology Engineering Internet Backbone Architecture

-------------- next part --------------
Copyright (c) 1992-2006 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
  The Regents of the University of California. All rights reserved.
FreeBSD 5.5-PRERELEASE #1: Mon Mar 20 16:24:38 CET 2006
  root@xxxx.NIC.DTAG.DE:/usr/obj/usr/src/sys/XXXX
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Opteron(tm) Processor 248 (2193.17-MHz K8-class CPU)
  Origin = "AuthenticAMD" Id = 0xf5a Stepping = 10
  Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
  AMD Features=0xe0500800<SYSCALL,NX,MMX+,LM,3DNow+,3DNow>
real memory = 2146893824 (2047 MB)
avail memory = 2063441920 (1967 MB)
ACPI APIC Table: <PTLTD APIC >
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
MADT: Forcing active-low polarity and level trigger for SCI
ioapic0 <Version 1.1> irqs 0-23 on motherboard
ioapic1 <Version 1.1> irqs 24-27 on motherboard
ioapic2 <Version 1.1> irqs 28-31 on motherboard
acpi0: <PTLTD XSDT> on motherboard
acpi0: Power Button (fixed)
unknown: I/O range not supported
unknown: I/O range not supported
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0
cpu0: <ACPI CPU> on acpi0
powernow0: <Cool`n'Quiet K8> on cpu0
device_attach: powernow0 attach returned 6
cpu1: <ACPI CPU> on acpi0
powernow1: <Cool`n'Quiet K8> on cpu1
device_attach: powernow1 attach returned 6
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0x8080-0x80ff,0x8000-0x807f,0xcf8-0xcff iomem 0xd8000-0xdbfff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 6.0 on pci0
pci1: <ACPI PCI bus> on pcib1
ohci0: <OHCI (generic) USB controller> mem 0xfc100000-0xfc100fff irq 19 at device 0.0 on pci1
usb0: OHCI version 1.0, legacy support
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 3 ports with 3 removable, self powered
ohci1: <OHCI (generic) USB controller> mem 0xfc101000-0xfc101fff irq 19 at device 0.1 on pci1
usb1: OHCI version 1.0, legacy support
usb1: <OHCI (generic) USB controller> on ohci1
usb1: USB revision 1.0
uhub1: AMD OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 3 ports with 3 removable, self powered
pci1: <display, VGA> at device 5.0 (no driver attached)
atapci0: <SiI 3512 SATA150 controller> port 0x2400-0x240f,0x2410-0x2413,0x2418-0x241f,0x2414-0x2417,0x2420-0x2427 mem 0xfc103000-0xfc1031ff irq 17 at device 6.0 on pci1
ata2: channel #0 on atapci0
ata3: channel #1 on atapci0
isab0: <PCI-ISA bridge> at device 7.0 on pci0
isa0: <ISA bus> on isab0
atapci1: <AMD 8111 UDMA133 controller> port 0x1020-0x102f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 7.1 on pci0
ata0: channel #0 on atapci1
ata1: channel #1 on atapci1
pci0: <serial bus, SMBus> at device 7.2 (no driver attached)
pci0: <bridge> at device 7.3 (no driver attached)
pcib2: <ACPI PCI-PCI bridge> at device 10.0 on pci0
pci2: <ACPI PCI bus> on pcib2
bge0: <Broadcom BCM5704C Dual Gigabit Ethernet, ASIC rev. 0x2003> mem 0xfe000000-0xfe00ffff,0xfe010000-0xfe01ffff irq 24 at device 1.0 on pci2
miibus0: <MII bus> on bge0
brgphy0: <BCM5704 10/100/1000baseTX PHY> on miibus0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge0: Ethernet address: 00:11:25:1e:23:a4
bge1: <Broadcom BCM5704C Dual Gigabit Ethernet, ASIC rev. 0x2003> mem 0xfe020000-0xfe02ffff,0xfe030000-0xfe03ffff irq 25 at device 1.1 on pci2
miibus1: <MII bus> on bge1
brgphy1: <BCM5704 10/100/1000baseTX PHY> on miibus1
brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
bge1: Ethernet address: 00:11:25:1e:23:a5
pci0: <base peripheral, interrupt controller> at device 10.1 (no driver attached)
pcib3: <ACPI PCI-PCI bridge> at device 11.0 on pci0
pci3: <ACPI PCI bus> on pcib3
pci0: <base peripheral, interrupt controller> at device 11.1 (no driver attached)
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A, console
orm0: <ISA Option ROMs> at iomem 0xcb000-0xcf7ff,0xc9800-0xcafff,0xc8000-0xc97ff,0xc0000-0xc7fff on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x64,0x60 on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x100>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
acd0: CDROM <CD-224E/2.9B> at ata1-master PIO4
ad4: 76324MB <ST380013AS/3.45> [155072/16/63] at ata2-master SATA150
ad6: 76324MB <ST380013AS/3.25> [155072/16/63] at ata3-master SATA150
SMP: AP CPU #1 Launched!
Mounting root from ufs:/dev/ad4s1a
ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=8319
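
The timeout lines quoted in this message all follow one format, so one way to check whether they really track disk load is to tally them per drive from saved dmesg or syslog output. The following is only a rough sketch, not part of the original report: the function names are made up for illustration, and the singular "1 retry left" form is an assumption, since only the "2 retries left" variant appears above.

```python
import re
import sys
from collections import Counter

# Matches lines such as:
#   ad4: TIMEOUT - WRITE_DMA retrying (2 retries left) LBA=150190687
# The "retry" singular alternative is an assumption; only "retries"
# is confirmed by the messages quoted in this post.
TIMEOUT_RE = re.compile(
    r"(?P<dev>ad\d+): TIMEOUT - (?P<cmd>\w+) retrying "
    r"\((?P<left>\d+) retr(?:y|ies) left\) LBA=(?P<lba>\d+)"
)


def parse_timeouts(text):
    """Return (device, command, retries_left, lba) for every timeout found."""
    return [
        (m["dev"], m["cmd"], int(m["left"]), int(m["lba"]))
        for m in TIMEOUT_RE.finditer(text)
    ]


def count_by_device(text):
    """Count timeout messages per device name (hypothetical helper)."""
    return Counter(dev for dev, _cmd, _left, _lba in parse_timeouts(text))


if __name__ == "__main__":
    # Reads log text from standard input and prints one line per drive.
    for dev, n in sorted(count_by_device(sys.stdin.read()).items()):
        print(f"{dev}: {n} timeout(s)")
```

Saved under a name of your choice (say, count_timeouts.py, a hypothetical filename), it could be fed with something like "dmesg | python3 count_timeouts.py". Because it scans with finditer rather than line by line, it also works on log text whose line breaks were lost.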