O. Hartmann
2005-Aug-09 00:25 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
Hello.

My box is a FreeBSD 6.0-BETA2 driven, ASUS A8N-SLI Deluxe based AMD64 box (see dmesg). One of my SATA disks, the SAMSUNG SP2004C, seems to show errors during operation (and also showed them under 5.4-RELEASE-p3). Sometimes I get this error:

ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599

while the machine still keeps working. Other days the box crashes completely.

Is this an operating system bug, or is this message evidence of defective hardware?

By the way, DMA support is enabled:

hw.ata.ata_dma: 1
hw.ata.atapi_dma: 1

Thanks in advance,
Oliver

-------------- next part --------------
Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
  The Regents of the University of California. All rights reserved.
FreeBSD 6.0-BETA2 #23: Sun Aug 7 23:32:03 UTC 2005
  root@thor.schanze.de:/usr/backup/obj/usr/src/sys/THOR
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) 64 Processor 3500+ (2211.34-MHz K8-class CPU)
  Origin = "AuthenticAMD" Id = 0x10ff0 Stepping = 0
  Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
  AMD Features=0xe2500800<SYSCALL,NX,MMX+,<b25>,LM,3DNow+,3DNow>
real memory = 2147418112 (2047 MB)
avail memory = 2064375808 (1968 MB)
ACPI APIC Table: <Nvidia AWRDACPI>
ioapic0 <Version 1.1> irqs 0-23 on motherboard
netsmb_dev: loaded
acpi0: <Nvidia AWRDACPI> on motherboard
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
acpi0: Power Button (fixed)
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
pci_link0: <ACPI PCI Link LNK1> irq 10 on acpi0
pci_link1: <ACPI PCI Link LNK2> on acpi0
pci_link2: <ACPI PCI Link LNK3> irq 5 on acpi0
pci_link3: <ACPI PCI Link LNK4> on acpi0
pci_link4: <ACPI PCI Link LNK5> on acpi0
pci_link5: <ACPI PCI Link LUBA> irq 5 on acpi0
pci_link6: <ACPI PCI Link LUBB> on acpi0
pci_link7: <ACPI PCI Link LMAC> irq 11 on acpi0
pci_link8: <ACPI PCI Link LACI> irq 3 on acpi0
pci_link9: <ACPI PCI Link LMCI> on acpi0
pci_link10: <ACPI PCI Link LSMB> irq 11 on acpi0
pci_link11: <ACPI PCI Link LUB2> irq 3 on acpi0
pci_link12: <ACPI PCI Link LIDE> on acpi0
pci_link13: <ACPI PCI Link LSID> irq 11 on acpi0
pci_link14: <ACPI PCI Link LFID> irq 10 on acpi0
pci_link15: <ACPI PCI Link LPCA> on acpi0
pci_link16: <ACPI PCI Link APC1> irq 0 on acpi0
pci_link17: <ACPI PCI Link APC2> irq 0 on acpi0
pci_link18: <ACPI PCI Link APC3> irq 0 on acpi0
pci_link19: <ACPI PCI Link APC4> irq 0 on acpi0
pci_link20: <ACPI PCI Link APC5> irq 16 on acpi0
pci_link21: <ACPI PCI Link APCF> irq 0 on acpi0
pci_link22: <ACPI PCI Link APCG> irq 0 on acpi0
pci_link23: <ACPI PCI Link APCH> irq 0 on acpi0
pci_link24: <ACPI PCI Link APCJ> irq 0 on acpi0
pci_link25: <ACPI PCI Link APCK> irq 0 on acpi0
pci_link26: <ACPI PCI Link APCS> irq 0 on acpi0
pci_link27: <ACPI PCI Link APCL> irq 0 on acpi0
pci_link28: <ACPI PCI Link APCZ> irq 0 on acpi0
pci_link29: <ACPI PCI Link APSI> irq 0 on acpi0
pci_link30: <ACPI PCI Link APSJ> irq 0 on acpi0
pci_link31: <ACPI PCI Link APCP> irq 0 on acpi0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x4008-0x400b on acpi0
cpu0: <ACPI CPU> on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci_link26: BIOS IRQ 11 for -2145766612.1.INTA is invalid
pci_link21: BIOS IRQ 5 for -2145766612.2.INTA is invalid
pci_link27: BIOS IRQ 3 for -2145766612.2.INTB is invalid
pci_link23: BIOS IRQ 11 for -2145766612.10.INTA is invalid
pci_link24: BIOS IRQ 3 for -2145766612.4.INTA is invalid
pci_link29: BIOS IRQ 11 for -2145766612.7.INTA is invalid
pci_link30: BIOS IRQ 10 for -2145766612.8.INTA is invalid
pci0: <ACPI PCI bus> on pcib0
pci_link26: Unable to choose an IRQ
pci_link21: Unable to choose an IRQ
pci_link27: Unable to choose an IRQ
pci_link24: Unable to choose an IRQ
pci_link29: Unable to choose an IRQ
pci_link30: Unable to choose an IRQ
pci_link23: Unable to choose an IRQ
pci0: <memory> at device 0.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
ichsmb0: <SMBus controller> port 0xe400-0xe41f,0x4c00-0x4c3f,0x4c40-0x4c7f irq 20 at device 1.1 on pci0
ichsmb0: [GIANT-LOCKED]
smbus0: <System Management Bus> on ichsmb0
smb0: <SMBus generic I/O> on smbus0
ohci0: <OHCI (generic) USB controller> mem 0xd8104000-0xd8104fff irq 21 at device 2.0 on pci0
ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 10 ports with 10 removable, self powered
ehci0: <EHCI (generic) USB 2.0 controller> mem 0xfeb00000-0xfeb000ff irq 22 at device 2.1 on pci0
ehci0: [GIANT-LOCKED]
usb1: EHCI version 1.0
usb1: companion controller, 4 ports each: usb0
usb1: <EHCI (generic) USB 2.0 controller> on ehci0
usb1: USB revision 2.0
uhub1: nVidia EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub1: 10 ports with 10 removable, self powered
pcm0: <nVidia nForce4> port 0xdc00-0xdcff,0xe000-0xe0ff mem 0xd8103000-0xd8103fff irq 23 at device 4.0 on pci0
pcm0: [GIANT-LOCKED]
pcm0: <Avance Logic ALC850 AC97 Codec>
atapci0: <nVidia nForce4 UDMA133 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 6.0 on pci0
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
atapci1: <nVidia nForce4 SATA150 controller> port 0x9f0-0x9f7,0xbf0-0xbf3,0x970-0x977,0xb70-0xb73,0xd800-0xd80f mem 0xd8102000-0xd8102fff irq 21 at device 7.0 on pci0
ata2: <ATA channel 0> on atapci1
ata3: <ATA channel 1> on atapci1
atapci2: <nVidia nForce4 SATA150 controller> port 0x9e0-0x9e7,0xbe0-0xbe3,0x960-0x967,0xb60-0xb63,0xc400-0xc40f mem 0xd8101000-0xd8101fff irq 22 at device 8.0 on pci0
ata4: <ATA channel 0> on atapci2
ata5: <ATA channel 1> on atapci2
pcib1: <ACPI PCI-PCI bridge> at device 9.0 on pci0
pci_link17: BIOS IRQ 21 for 0.7.INTA is invalid
pci_link18: BIOS IRQ 22 for 0.8.INTA is invalid
pci_link19: BIOS IRQ 23 for 0.10.INTA is invalid
pci5: <ACPI PCI bus> on pcib1
pci_link16: Unable to choose an IRQ
fwohci0: <Texas Instruments TSB43AB22/A> mem 0xd8004000-0xd80047ff,0xd8000000-0xd8003fff irq 16 at device 11.0 on pci5
fwohci0: OHCI version 1.10 (ROM=1)
fwohci0: No. of Isochronous channels is 4.
fwohci0: EUI64 00:11:d8:00:00:12:53:30
fwohci0: Phy 1394a available S400, 2 ports.
fwohci0: Link S400, max_rec 2048 bytes.
firewire0: <IEEE1394(FireWire) bus> on fwohci0
fwohci0: Initiate bus reset
fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode
firewire0: 1 nodes, maxhop <= 0, cable IRM = 0 (me)
firewire0: bus manager 0 (me)
nve0: <NVIDIA nForce MCP9 Networking Adapter> port 0xb000-0xb007 mem 0xd8100000-0xd8100fff irq 23 at device 10.0 on pci0
nve0: Ethernet address 00:11:d8:92:a3:15
miibus0: <MII bus> on nve0
ukphy0: <Generic IEEE 802.3u media interface> on miibus0
ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
nve0: Ethernet address: 00:11:d8:92:a3:15
nve0: [GIANT-LOCKED]
pcib2: <ACPI PCI-PCI bridge> at device 11.0 on pci0
pci4: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> at device 12.0 on pci0
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> at device 13.0 on pci0
pci2: <ACPI PCI bus> on pcib4
pcib5: <ACPI PCI-PCI bridge> at device 14.0 on pci0
pci1: <ACPI PCI bus> on pcib5
pci_link18: Unable to choose an IRQ
pci1: <display, VGA> at device 0.0 (no driver attached)
acpi_tz0: <Thermal Zone> on acpi0
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
ppc0: <ECP parallel printer port> port 0x378-0x37f,0x778-0x77b irq 7 drq 3 on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/16 bytes threshold
ppbus0: <Parallel port bus> on ppc0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse, device ID 3
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <8 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
fb0 at vga0
Timecounter "TSC" frequency 2211343498 Hz quality 800
Timecounters tick every 1.000 msec
Fast IPsec: Initialized Security Association Processing.
acd0: DVDR <NEC DVD RW ND-3500AG/2.19> at ata0-master UDMA33
ad8: 194481MB <Maxtor 6B200M0 BANC1B70> at ata4-master SATA150
ad10: 190782MB <SAMSUNG SP2004C VM100-31> at ata5-master SATA150
cd0 at ata0 bus 0 target 0 lun 0
cd0: <_NEC DVD_RW ND-3500AG 2.19> Removable CD-ROM SCSI-0 device
cd0: 33.000MB/s transfers
cd0: cd present [2295104 x 2048 byte records]
GEOM_LABEL: Label for provider acd0 is iso9660/CDROM.
Trying to mount root from ufs:/dev/ad8s1a
pflog0: promiscuous mode enabled
WARNING pid 525 (nasd): ioctl sign-extension ioctl ffffffffc0106924
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
Mike Tancsa
2005-Aug-09 03:28 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
At 08:25 PM 08/08/2005, O. Hartmann wrote:
>Hello.
>
>My box is a FreeBSD 6.0-BETA2 driven ASUS A8N-SLI Deluxe based AMD64 boxed
>(see dmesg).
>One of my SATA disks, the SAMSUNG SP2004C seems to show errors during
>operation (and also showd under 5.4-RELEASE-p3).
>Sometimes I get this error:
>ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
>while the machine still keeps working.
>Other days the box crashes completely.
>
>Is this a operating system bug or is this message an evidence of defective
>hardware?

You can probably confirm a hardware issue with the smartmontools port (/usr/ports/sysutils/smartmontools).

It was quite handy the other day for us to narrow down a problem between a drive tray and the actual drive. We started to see

Aug 3 02:02:49 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=391423
Aug 3 02:03:00 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2304319
Aug 3 02:03:10 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2312927
Aug 3 02:03:17 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2308639
Aug 3 02:03:26 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2309855
Aug 3 02:03:37 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2348359
Aug 4 12:12:37 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=1528639
Aug 4 12:13:04 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=1530031
Aug 4 12:13:04 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=1528639
Aug 4 12:13:04 verify1 kernel: ad0: FAILURE - READ_DMA timed out
Aug 4 12:13:04 verify1 kernel: spec_getpages:(ad0s1a) I/O read failure: (error=5) bp 0xd630b4fc vp 0xc2640d68

Yet when we read the actual error info off the drive via smartctl -a ad0, it was clean. So it pointed to the drive tray, which we swapped, and all was well. In other situations, however, the SMART info will often tell you if the drive is starting to fail. It's not 100% reliable, but since we started using it, it has generally given us some sort of heads-up as to whether or not a drive is in trouble.

---Mike
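The smartctl check described in this post can be scripted. What follows is only a minimal sketch, assuming smartmontools is installed and that the drive reports its interface CRC counter as SMART attribute 199 (commonly named UDMA_CRC_Error_Count; some vendors number or name it differently). The helper name crc_count is made up for illustration. It reads the attribute table printed by `smartctl -A` on stdin and prints the raw value from the last column.

```sh
#!/bin/sh
# crc_count: read `smartctl -A` output on stdin and print the raw value
# (last column) of attribute 199, or nothing if the drive does not list it.
# Hypothetical helper; "attribute 199 = CRC counter" is an assumption
# about the drive, not something stated in the thread.
crc_count() {
    awk '$1 == 199 { print $NF }'
}

# Usage (needs root and a real disk, so it is left commented out):
#   smartctl -a /dev/ad10               # full SMART report, as in the post
#   smartctl -A /dev/ad10 | crc_count   # only the CRC counter
```

A counter that keeps rising while the rest of the SMART report stays clean would fit the tray or cable explanation given above, since CRC errors are detected on the link rather than on the platters.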
O. Hartmann
2005-Aug-09 08:23 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
Mike Tancsa wrote:
> At 08:25 PM 08/08/2005, O. Hartmann wrote:
>
>> Hello.
>>
>> My box is a FreeBSD 6.0-BETA2 driven ASUS A8N-SLI Deluxe based AMD64
>> boxed (see dmesg).
>> One of my SATA disks, the SAMSUNG SP2004C seems to show errors during
>> operation (and also showd under 5.4-RELEASE-p3).
>> Sometimes I get this error:
>> ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
>> while the machine still keeps working.
>> Other days the box crashes completely.
>>
>> Is this a operating system bug or is this message an evidence of
>> defective hardware?
>
>
> You can probably confirm a hardware issue with the smartmon tools.
> (/usr/ports/sysutils/smartmontools).
>
> It was quite handy the other day for us to narrow down a problem between
> a drive tray and the actual drive. We started to see
>
> Aug 3 02:02:49 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2
> retries left) LBA=391423
> Aug 3 02:03:00 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2
> retries left) LBA=2304319
> Aug 3 02:03:10 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2
> retries left) LBA=2312927
> Aug 3 02:03:17 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2
> retries left) LBA=2308639
> Aug 3 02:03:26 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2
> retries left) LBA=2309855
> Aug 3 02:03:37 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2
> retries left) LBA=2348359
> Aug 4 12:12:37 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2
> retries left) LBA=1528639
> Aug 4 12:13:04 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2
> retries left) LBA=1530031
> Aug 4 12:13:04 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (1
> retry left) LBA=1528639
> Aug 4 12:13:04 verify1 kernel: ad0: FAILURE - READ_DMA timed out
> Aug 4 12:13:04 verify1 kernel: spec_getpages:(ad0s1a) I/O read failure:
> (error=5) bp 0xd630b4fc vp 0xc2640d68
>
> Yet when we read the actual error info off the drive via smartctl -a
> ad0, it was clean. So it pointed to the drive tray which we swapped and
> all was well. In other situations however, the smart info will often
> tell you if the drive is starting to fail. Its not 100% reliable, but
> since we started using it, it generally gave us some sort of heads up as
> to whether or not a drive is in trouble.
>
>
> ---Mike

Dear Mike.

Thanks a lot for this info. I will use this tool and try to report what I find out. I also use trays for my drives (like I did with SCSI and SCA2 on our servers at the lab). Maybe this could be an issue.

Oliver
Chuck Swiger
2005-Aug-09 22:24 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
O. Hartmann wrote:
[ ... ]
> One of my SATA disks, the SAMSUNG SP2004C seems to show errors during
> operation (and also showd under 5.4-RELEASE-p3).
> Sometimes I get this error:
> ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
> while the machine still keeps working.
> Other days the box crashes completely.
>
> Is this a operating system bug or is this message an evidence of
> defective hardware?

Back up any data you care about now.

Use the smartmontools port or hunt down a utility from Samsung which'll do a surface test (read only, nondestructive). You can also run a "dd if=/dev/ad10 of=/dev/null bs=8192" to do a full read test under FreeBSD, and see how many CRC errors show up.

--
-Chuck
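The read test suggested here can be wrapped so that the CRC errors are counted rather than eyeballed. This is a rough sketch, not a tested procedure: the function names count_icrc and read_test are invented for illustration, and it assumes the kernel logs the errors in the format shown in this thread and that the dmesg buffer does not wrap during the run.

```sh
#!/bin/sh
# count_icrc DEV: count kernel-log lines on stdin that report an ICRC error
# for DEV (e.g. ad10). Works on plain dmesg output and on syslog lines that
# carry a timestamp and host prefix. Hypothetical helper name.
count_icrc() {
    grep -c -E "(^|[[:space:]])$1: .*ICRC error" || true
}

# read_test DEV: read the whole disk once (read-only, as in the post) and
# report how many ICRC errors were logged meanwhile. Needs root and takes
# a long time on a large disk; it is defined here but never called.
read_test() {
    before=$(dmesg | count_icrc "$1")
    dd if="/dev/$1" of=/dev/null bs=8192
    after=$(dmesg | count_icrc "$1")
    echo "new ICRC errors on $1: $((after - before))"
}
```

Calling `read_test ad10` as root would run the same dd command quoted above. Back up first, as advised.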
O. Hartmann
2005-Aug-10 05:59 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
Chuck Swiger wrote:
> O. Hartmann wrote:
> [ ... ]
>
>> One of my SATA disks, the SAMSUNG SP2004C seems to show errors
>> during operation (and also showd under 5.4-RELEASE-p3).
>> Sometimes I get this error:
>> ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
>> while the machine still keeps working.
>> Other days the box crashes completely.
>>
>> Is this a operating system bug or is this message an evidence of
>> defective hardware?
>
> You can also run a "dd if=/dev/ad10 of=/dev/null bs=8192" to do a full
> read test under FreeBSD, and see how many CRC errors show up.
>

I did so and ran into a crash of the system ... I changed the cabling, did it again, and so far nothing has happened ... I hope it was only a cabling issue. This is the first time I have used ATA/SATA, and now these experiences ... When is SCSI back for desktops?
Joel Rees
2005-Aug-10 07:21 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
On 17/08/10, at 7:36, O. Hartmann wrote:
> [...] When is SCSI back for desktops?

I vote for that. In my opinion, ATA is primarily for home media systems, if that.

Joel Rees <rees@ddcom.co.jp>
digitcom, inc.
Kobe, Japan +81-78-672-8800
** <http://www.ddcom.co.jp> **
Andrey V. Elsukov
2005-Aug-10 09:38 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
O. Hartmann wrote:
> Sometimes I get this error:
> ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
> while the machine still keeps working.

Check your disks with MHDD (http://mhdd.com/).

--
WBR, Andrey V. Elsukov
O. Hartmann
2005-Aug-10 17:45 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
Mike Jakubik wrote:
> On Wed, August 10, 2005 6:37 am, Dmitry Mityugov said:
>
>
>>There are Maxtor MaXLine II and III, and perhaps several other models,
>>that are supposed to work 24/7.
>
>
> Right, i have a dead 250GB Maxline Plus II drive on my desk, only after
> about 1.5 years. At least its still on warranty.

On the other hand: in the department for atmospheric physics, where I built a server for meteorological data six years ago, a RAID-5 with four older IBM U160 SCSI discs still works, 24/7. Never had a problem!
Unix
2005-Aug-10 17:49 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
O. Hartmann wrote:
> Mike Jakubik wrote:
>
>> On Wed, August 10, 2005 6:37 am, Dmitry Mityugov said:
>>
>>
>>> There are Maxtor MaXLine II and III, and perhaps several other models,
>>> that are supposed to work 24/7.
>>
>>
>>
>> Right, i have a dead 250GB Maxline Plus II drive on my desk, only after
>> about 1.5 years. At least its still on warranty.
>
>
> On the other hand: In the department for physics of the athmosphere,
> where I built six years ago a server for meteorological data, a RAID-5
> with 4 older IBM U160 SCSI discs still works - 24/7. Never had a problem!
>

I still own some old 1-2 GB SCSI disks and they are still working. I also had an old 500 MB SCSI disk that was in an old Mac and also worked, but I trashed it since it was that old and no longer of use...
Karl Denninger
2005-Aug-10 20:54 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
On Wed, Aug 10, 2005 at 08:36:39PM +0200, Søren Schmidt wrote:
>
> On 10/08/2005, at 20:05, Scot Hetzel wrote:
> >>
> >>Since I came in late in this, I need to know what kind of controller
> >>we are talking about, and if the problem is still present in 6.0.
> >>I plan to backport ATA from 6.0 to 5-stable when it has settled, so
> >>6.0 is the one and only (pre)release to test with and get back to me
> >>with the result.
> >>
> >>
> >
> >They have been talking about SII and Intel ICH6 chips. And a few have
> >stated that they are having problems with the 6.0-Beta releases with
> >these chips.
>
> Well, both work wonderfully here YMMV of course..
>
> No, seriously I need *much* more accurate info than that, I need the
> dmesg from the failing system, and I need an exact description of the
> problem, preferably with logs, dumbs etc etc.
>
> - Søren

http://www.freebsd.org/cgi/query-pr.cgi?pr=i386/83974

Filed on July 24th.

Again, happy to give you SSH access to <THIS SPECIFIC MACHINE> if it will work towards getting this resolved.

--
--
Karl Denninger (karl@denninger.net) Internet Consultant & Kids Rights Activist
http://www.denninger.net My home on the net - links to everything I do!
http://scubaforum.org Your UNCENSORED place to talk about DIVING!
http://genesis3.blogspot.com Musings Of A Sentient Mind
Mark Kane
2005-Aug-16 15:12 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
I've been having similar problems on 5.4-RELEASE. I have a brand new board back from the factory (an RMA) and had a thread going on freebsd-questions about this.

I currently have six Maxtor 7200RPM ATA133 hard drives that I've been trying on and off and with various configurations in my 5.4-RELEASE amd64 machine. The only thing I have done thus far to reproduce the READ and WRITE errors (there have been more WRITE than READ) is copy data between the drives. All the drives check out just fine through PowerMax (Maxtor's utility), and work in other FreeBSD 4.x machines in the same placement that causes errors on my 5.4 box. The cables are also brand new.

However, note that if I turn the drives' speed down to UDMA100, the errors seem to go away. Has anyone else tried this for their problems?

I've read the entire thread and so far there has been no mention of any nForce chipsets doing this, but I've got a Giga-Byte K8NS Pro motherboard with an nForce3 chipset.

I've been troubleshooting this for almost two weeks now on my end, and up until last night I didn't see any "FAILURE" messages. They were all just WARNINGS. I've only seen the FAILURE on one of the six hard drives, and that was last night when I was trying to fdisk it. Right when I hit "w" to write the fdisk information my screen flooded with WARNINGS and FAILURES, so indeed that particular drive might be going. This problem did not happen with any of the other drives.

My whole reason for bringing back this thread is to see if my problems could be a result of the problem discussed here, and not really my hardware at all. As I said, the board is brand new, the cables are brand new, and two of the hard drives are even brand new. I've been pulling my hair out trying to narrow this problem down, and as of yesterday I was just going to run all my drives in UDMA100 mode to save me the hassle (since they seem to run fine in 100). Then I found this thread and thought I'd ask if anyone here might suspect anything other than hardware problems.

Granted, I'm not using FreeBSD 5-STABLE, but I could certainly give it a shot if you guys think it would help anything. I just chose RELEASE hoping for the least problems.

My dmesg is below. Note that I only have three of the drives in there currently.

Thanks

-Mark

---------------------------
DMESG:

FreeBSD 5.4-RELEASE #0: Sun May 8 07:00:26 UTC 2005
  root@portnoy.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
ACPI APIC Table: <Nvidia AWRDACPI>
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) 64 Processor 3000+ (2009.79-MHz K8-class CPU)
  Origin = "AuthenticAMD" Id = 0xfc0 Stepping = 0
  Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
  AMD Features=0xe0500800<SYSCALL,NX,MMX+,LM,3DNow+,3DNow>
real memory = 1610547200 (1535 MB)
avail memory = 1543139328 (1471 MB)
ioapic0 <Version 1.1> irqs 0-23 on motherboard
acpi0: <Nvidia AWRDACPI> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
cpu0: <ACPI CPU> on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf0-0xcf3,0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
pci0: <serial bus, SMBus> at device 1.1 (no driver attached)
ohci0: <OHCI (generic) USB controller> mem 0xfc002000-0xfc002fff irq 22 at device 2.0 on pci0
usb0: OHCI version 1.0, legacy support
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 4 ports with 4 removable, self powered
ohci1: <OHCI (generic) USB controller> mem 0xfc003000-0xfc003fff irq 21 at device 2.1 on pci0
usb1: OHCI version 1.0, legacy support
usb1: <OHCI (generic) USB controller> on ohci1
usb1: USB revision 1.0
uhub1: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 4 ports with 4 removable, self powered
pci0: <serial bus, USB> at device 2.2 (no driver attached)
atapci0: <nVidia nForce3 Pro UDMA133 controller> port 0xf000-0xf00f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 8.0 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
atapci1: <GENERIC ATA controller> port 0xe400-0xe40f,0xb70-0xb73,0x970-0x977,0xbf0-0xbf3,0x9f0-0x9f7 irq 22 at device 10.0 on pci0
ata2: channel #0 on atapci1
ata3: channel #1 on atapci1
pcib1: <ACPI PCI-PCI bridge> at device 11.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pci1: <display, VGA> at device 0.0 (no driver attached)
pcib2: <ACPI PCI-PCI bridge> at device 14.0 on pci0
pci2: <ACPI PCI bus> on pcib2
pci2: <multimedia, audio> at device 9.0 (no driver attached)
pci2: <input device> at device 9.1 (no driver attached)
fwohci0: <1394 Open Host Controller Interface> mem 0xfb004000-0xfb007fff,0xfb00d000-0xfb00d7ff irq 18 at device 9.2 on pci2
fwohci0: OHCI version 1.10 (ROM=0)
fwohci0: No. of Isochronous channels is 4.
fwohci0: EUI64 00:02:3c:00:91:01:6c:20
fwohci0: Phy 1394a available S400, 2 ports.
fwohci0: Link S400, max_rec 2048 bytes.
firewire0: <IEEE1394(FireWire) bus> on fwohci0
fwe0: <Ethernet over FireWire> on firewire0
if_fwe0: Fake Ethernet address: 02:02:3c:01:6c:20
fwe0: Ethernet address: 02:02:3c:01:6c:20
fwe0: if_start running deferred for Giant
sbp0: <SBP-2/SCSI over FireWire> on firewire0
fwohci0: Initiate bus reset
fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode
firewire0: 1 nodes, maxhop <= 0, cable IRM = 0 (me)
firewire0: bus manager 0 (me)
skc0: <Marvell Gigabit Ethernet> port 0xa800-0xa8ff mem 0xfb000000-0xfb003fff irq 19 at device 11.0 on pci2
skc0: Marvell Yukon Lite Gigabit Ethernet rev. A3(0x7)
sk0: <Marvell Semiconductor, Inc. Yukon> on skc0
sk0: Ethernet address: 00:0f:ea:4f:83:8b
miibus0: <MII bus> on sk0
e1000phy0: <Marvell 88E1000 Gigabit PHY> on miibus0
e1000phy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX-FDX, auto
atapci2: <ITE IT8212F ATA133 controller> port 0xbc00-0xbc0f,0xb800-0xb803,0xb410-0xb417,0xb000-0xb003,0xac10-0xac17 irq 16 at device 12.0 on pci2
ata4: channel #0 on atapci2
ata5: channel #1 on atapci2
fwohci1: <Texas Instruments TSB43AB23> mem 0xfb008000-0xfb00bfff,0xfb00c000-0xfb00c7ff irq 18 at device 14.0 on pci2
fwohci1: OHCI version 1.10 (ROM=1)
fwohci1: No. of Isochronous channels is 4.
fwohci1: EUI64 00:0f:ea:00:00:47:38:9b
fwohci1: Phy 1394a available S400, 3 ports.
fwohci1: Link S400, max_rec 2048 bytes.
firewire1: <IEEE1394(FireWire) bus> on fwohci1
fwe1: <Ethernet over FireWire> on firewire1
if_fwe1: Fake Ethernet address: 02:0f:ea:47:38:9b
fwe1: Ethernet address: 02:0f:ea:47:38:9b
fwe1: if_start running deferred for Giant
sbp1: <SBP-2/SCSI over FireWire> on firewire1
fwohci1: Initiate bus reset
fwohci1: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode
firewire1: 1 nodes, maxhop <= 0, cable IRM = 0 (me)
firewire1: bus manager 0 (me)
fdc0: <floppy drive controller> port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
ppc0: <Standard parallel printer port> port 0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: model IntelliMouse, device ID 3
orm0: <ISA Option ROM> at iomem 0xc0000-0xcf7ff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 2009790353 Hz quality 800
Timecounters tick every 1.000 msec
ad0: 194481MB <Maxtor 6B200R0/BAH41BM0> [395136/16/63] at ata0-master UDMA133
acd0: CDRW <TDK CDRW401240X/1t00> at ata1-master PIO4
acd1: DVDR <SONY DVD RW DRU-500A/2.1a> at ata1-slave PIO4
ad8: 156334MB <Maxtor 6Y160P0/YAR41BW0> [317632/16/63] at ata4-master UDMA133
ad9: 156334MB <Maxtor 6Y160P0/YAR41BW0> [317632/16/63] at ata4-slave UDMA133
Mounting root from ufs:/dev/ad0s1a
Mike Tancsa
2005-Aug-17 03:59 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
At 12:10 PM 16/08/2005, Mark Kane wrote:
>However, note that if I turn the drives speed down to UDMA100, the
>errors seem to go away. Has anyone else tried this for their problems?

Yes, I have had Maxtor drives in the past that would not work properly at certain bus speeds, even back in the RELENG_4 days. Also, doesn't UDMA133 assume no slave? I would just run them at 100. I don't think you would see much of a difference anyway. Perhaps Soeren could comment?

---Mike
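For anyone wanting to try the UDMA100 downshift discussed here, a dry-run sketch follows. It assumes the atacontrol(8) syntax of FreeBSD 6.x, where `atacontrol mode <device> <mode>` sets the transfer mode; the 5.x releases take a channel and one mode per device instead, so check the man page on your release before running anything. The helper name downshift_cmds is invented for illustration, and it only prints the commands rather than executing them.

```sh
#!/bin/sh
# downshift_cmds MODE DEV...: print one atacontrol command per device.
# Nothing is executed; review the output, then run it as root if it is right.
# Assumes the 6.x-style "atacontrol mode <device> <mode>" form.
downshift_cmds() {
    mode=$1
    shift
    for dev in "$@"; do
        echo "atacontrol mode $dev $mode"
    done
}

# The three drives listed in the dmesg earlier in the thread:
downshift_cmds UDMA100 ad0 ad8 ad9
```

A mode set this way does not persist across a reboot, so it would have to be reapplied at startup if it turns out to help.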
Mark Kirkwood
2005-Aug-17 04:57 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
Mark Kane wrote:
>
> However, note that if I turn the drives speed down to UDMA100, the
> errors seem to go away. Has anyone else tried this for their problems?
>

I currently do this, not due to problems, but to improve the write performance:

4xMaxtor 6E040L0 RAID0
UDMA133 -> 40M/s
UDMA100 -> 120M/s

(I find read performance is *slightly* slower for UDMA100, but not enough to make it worth worrying about.)

However, this may just be a quirk specific to my system (Tyan S2510 + PCD20271).

Cheers

Mark