O. Hartmann
2005-Aug-09 00:25 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
Hello.
My box is an ASUS A8N-SLI Deluxe based AMD64 machine running FreeBSD
6.0-BETA2 (see dmesg).
One of my SATA disks, the SAMSUNG SP2004C, seems to show errors during
operation (and also showed them under 5.4-RELEASE-p3).
Sometimes I get this error:
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
while the machine still keeps working.
Other days the box crashes completely.
Is this an operating system bug, or is this message evidence of
defective hardware?
By the way, DMA support is enabled:
hw.ata.ata_dma: 1
hw.ata.atapi_dma: 1
Thanks in advance,
Oliver
-------------- next part --------------
Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 6.0-BETA2 #23: Sun Aug 7 23:32:03 UTC 2005
root@thor.schanze.de:/usr/backup/obj/usr/src/sys/THOR
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) 64 Processor 3500+ (2211.34-MHz K8-class CPU)
Origin = "AuthenticAMD" Id = 0x10ff0 Stepping = 0
Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
AMD Features=0xe2500800<SYSCALL,NX,MMX+,<b25>,LM,3DNow+,3DNow>
real memory = 2147418112 (2047 MB)
avail memory = 2064375808 (1968 MB)
ACPI APIC Table: <Nvidia AWRDACPI>
ioapic0 <Version 1.1> irqs 0-23 on motherboard
netsmb_dev: loaded
acpi0: <Nvidia AWRDACPI> on motherboard
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
acpi0: Power Button (fixed)
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
acpi_bus_number: can't get _ADR
pci_link0: <ACPI PCI Link LNK1> irq 10 on acpi0
pci_link1: <ACPI PCI Link LNK2> on acpi0
pci_link2: <ACPI PCI Link LNK3> irq 5 on acpi0
pci_link3: <ACPI PCI Link LNK4> on acpi0
pci_link4: <ACPI PCI Link LNK5> on acpi0
pci_link5: <ACPI PCI Link LUBA> irq 5 on acpi0
pci_link6: <ACPI PCI Link LUBB> on acpi0
pci_link7: <ACPI PCI Link LMAC> irq 11 on acpi0
pci_link8: <ACPI PCI Link LACI> irq 3 on acpi0
pci_link9: <ACPI PCI Link LMCI> on acpi0
pci_link10: <ACPI PCI Link LSMB> irq 11 on acpi0
pci_link11: <ACPI PCI Link LUB2> irq 3 on acpi0
pci_link12: <ACPI PCI Link LIDE> on acpi0
pci_link13: <ACPI PCI Link LSID> irq 11 on acpi0
pci_link14: <ACPI PCI Link LFID> irq 10 on acpi0
pci_link15: <ACPI PCI Link LPCA> on acpi0
pci_link16: <ACPI PCI Link APC1> irq 0 on acpi0
pci_link17: <ACPI PCI Link APC2> irq 0 on acpi0
pci_link18: <ACPI PCI Link APC3> irq 0 on acpi0
pci_link19: <ACPI PCI Link APC4> irq 0 on acpi0
pci_link20: <ACPI PCI Link APC5> irq 16 on acpi0
pci_link21: <ACPI PCI Link APCF> irq 0 on acpi0
pci_link22: <ACPI PCI Link APCG> irq 0 on acpi0
pci_link23: <ACPI PCI Link APCH> irq 0 on acpi0
pci_link24: <ACPI PCI Link APCJ> irq 0 on acpi0
pci_link25: <ACPI PCI Link APCK> irq 0 on acpi0
pci_link26: <ACPI PCI Link APCS> irq 0 on acpi0
pci_link27: <ACPI PCI Link APCL> irq 0 on acpi0
pci_link28: <ACPI PCI Link APCZ> irq 0 on acpi0
pci_link29: <ACPI PCI Link APSI> irq 0 on acpi0
pci_link30: <ACPI PCI Link APSJ> irq 0 on acpi0
pci_link31: <ACPI PCI Link APCP> irq 0 on acpi0
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x4008-0x400b on acpi0
cpu0: <ACPI CPU> on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci_link26: BIOS IRQ 11 for -2145766612.1.INTA is invalid
pci_link21: BIOS IRQ 5 for -2145766612.2.INTA is invalid
pci_link27: BIOS IRQ 3 for -2145766612.2.INTB is invalid
pci_link23: BIOS IRQ 11 for -2145766612.10.INTA is invalid
pci_link24: BIOS IRQ 3 for -2145766612.4.INTA is invalid
pci_link29: BIOS IRQ 11 for -2145766612.7.INTA is invalid
pci_link30: BIOS IRQ 10 for -2145766612.8.INTA is invalid
pci0: <ACPI PCI bus> on pcib0
pci_link26: Unable to choose an IRQ
pci_link21: Unable to choose an IRQ
pci_link27: Unable to choose an IRQ
pci_link24: Unable to choose an IRQ
pci_link29: Unable to choose an IRQ
pci_link30: Unable to choose an IRQ
pci_link23: Unable to choose an IRQ
pci0: <memory> at device 0.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
ichsmb0: <SMBus controller> port 0xe400-0xe41f,0x4c00-0x4c3f,0x4c40-0x4c7f
irq 20 at device 1.1 on pci0
ichsmb0: [GIANT-LOCKED]
smbus0: <System Management Bus> on ichsmb0
smb0: <SMBus generic I/O> on smbus0
ohci0: <OHCI (generic) USB controller> mem 0xd8104000-0xd8104fff irq 21 at
device 2.0 on pci0
ohci0: [GIANT-LOCKED]
usb0: OHCI version 1.0, legacy support
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 10 ports with 10 removable, self powered
ehci0: <EHCI (generic) USB 2.0 controller> mem 0xfeb00000-0xfeb000ff irq
22 at device 2.1 on pci0
ehci0: [GIANT-LOCKED]
usb1: EHCI version 1.0
usb1: companion controller, 4 ports each: usb0
usb1: <EHCI (generic) USB 2.0 controller> on ehci0
usb1: USB revision 2.0
uhub1: nVidia EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub1: 10 ports with 10 removable, self powered
pcm0: <nVidia nForce4> port 0xdc00-0xdcff,0xe000-0xe0ff mem
0xd8103000-0xd8103fff irq 23 at device 4.0 on pci0
pcm0: [GIANT-LOCKED]
pcm0: <Avance Logic ALC850 AC97 Codec>
atapci0: <nVidia nForce4 UDMA133 controller> port
0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xf000-0xf00f at device 6.0 on pci0
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
atapci1: <nVidia nForce4 SATA150 controller> port
0x9f0-0x9f7,0xbf0-0xbf3,0x970-0x977,0xb70-0xb73,0xd800-0xd80f mem
0xd8102000-0xd8102fff irq 21 at device 7.0 on pci0
ata2: <ATA channel 0> on atapci1
ata3: <ATA channel 1> on atapci1
atapci2: <nVidia nForce4 SATA150 controller> port
0x9e0-0x9e7,0xbe0-0xbe3,0x960-0x967,0xb60-0xb63,0xc400-0xc40f mem
0xd8101000-0xd8101fff irq 22 at device 8.0 on pci0
ata4: <ATA channel 0> on atapci2
ata5: <ATA channel 1> on atapci2
pcib1: <ACPI PCI-PCI bridge> at device 9.0 on pci0
pci_link17: BIOS IRQ 21 for 0.7.INTA is invalid
pci_link18: BIOS IRQ 22 for 0.8.INTA is invalid
pci_link19: BIOS IRQ 23 for 0.10.INTA is invalid
pci5: <ACPI PCI bus> on pcib1
pci_link16: Unable to choose an IRQ
fwohci0: <Texas Instruments TSB43AB22/A> mem
0xd8004000-0xd80047ff,0xd8000000-0xd8003fff irq 16 at device 11.0 on pci5
fwohci0: OHCI version 1.10 (ROM=1)
fwohci0: No. of Isochronous channels is 4.
fwohci0: EUI64 00:11:d8:00:00:12:53:30
fwohci0: Phy 1394a available S400, 2 ports.
fwohci0: Link S400, max_rec 2048 bytes.
firewire0: <IEEE1394(FireWire) bus> on fwohci0
fwohci0: Initiate bus reset
fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode
firewire0: 1 nodes, maxhop <= 0, cable IRM = 0 (me)
firewire0: bus manager 0 (me)
nve0: <NVIDIA nForce MCP9 Networking Adapter> port 0xb000-0xb007 mem
0xd8100000-0xd8100fff irq 23 at device 10.0 on pci0
nve0: Ethernet address 00:11:d8:92:a3:15
miibus0: <MII bus> on nve0
ukphy0: <Generic IEEE 802.3u media interface> on miibus0
ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
1000baseT-FDX, auto
nve0: Ethernet address: 00:11:d8:92:a3:15
nve0: [GIANT-LOCKED]
pcib2: <ACPI PCI-PCI bridge> at device 11.0 on pci0
pci4: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> at device 12.0 on pci0
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> at device 13.0 on pci0
pci2: <ACPI PCI bus> on pcib4
pcib5: <ACPI PCI-PCI bridge> at device 14.0 on pci0
pci1: <ACPI PCI bus> on pcib5
pci_link18: Unable to choose an IRQ
pci1: <display, VGA> at device 0.0 (no driver attached)
acpi_tz0: <Thermal Zone> on acpi0
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on
acpi0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on
acpi0
sio0: type 16550A
ppc0: <ECP parallel printer port> port 0x378-0x37f,0x778-0x77b irq 7 drq 3
on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/16 bytes threshold
ppbus0: <Parallel port bus> on ppc0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: model IntelliMouse, device ID 3
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <8 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
fb0 at vga0
Timecounter "TSC" frequency 2211343498 Hz quality 800
Timecounters tick every 1.000 msec
Fast IPsec: Initialized Security Association Processing.
acd0: DVDR <NEC DVD RW ND-3500AG/2.19> at ata0-master UDMA33
ad8: 194481MB <Maxtor 6B200M0 BANC1B70> at ata4-master SATA150
ad10: 190782MB <SAMSUNG SP2004C VM100-31> at ata5-master SATA150
cd0 at ata0 bus 0 target 0 lun 0
cd0: <_NEC DVD_RW ND-3500AG 2.19> Removable CD-ROM SCSI-0 device
cd0: 33.000MB/s transfers
cd0: cd present [2295104 x 2048 byte records]
GEOM_LABEL: Label for provider acd0 is iso9660/CDROM.
Trying to mount root from ufs:/dev/ad8s1a
pflog0: promiscuous mode enabled
WARNING pid 525 (nasd): ioctl sign-extension ioctl ffffffffc0106924
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
Mike Tancsa
2005-Aug-09 03:28 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
At 08:25 PM 08/08/2005, O. Hartmann wrote:
>Hello.
>
>My box is an ASUS A8N-SLI Deluxe based AMD64 machine running FreeBSD
>6.0-BETA2 (see dmesg).
>One of my SATA disks, the SAMSUNG SP2004C, seems to show errors during
>operation (and also showed them under 5.4-RELEASE-p3).
>Sometimes I get this error:
>ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
>while the machine still keeps working.
>Other days the box crashes completely.
>
>Is this an operating system bug, or is this message evidence of defective
>hardware?

You can probably confirm a hardware issue with the smartmontools port
(/usr/ports/sysutils/smartmontools).

It was quite handy the other day for us to narrow down a problem between a drive tray and the actual drive. We started to see

Aug 3 02:02:49 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=391423
Aug 3 02:03:00 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2304319
Aug 3 02:03:10 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2312927
Aug 3 02:03:17 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2308639
Aug 3 02:03:26 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2309855
Aug 3 02:03:37 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2348359
Aug 4 12:12:37 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=1528639
Aug 4 12:13:04 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=1530031
Aug 4 12:13:04 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=1528639
Aug 4 12:13:04 verify1 kernel: ad0: FAILURE - READ_DMA timed out
Aug 4 12:13:04 verify1 kernel: spec_getpages:(ad0s1a) I/O read failure: (error=5) bp 0xd630b4fc vp 0xc2640d68

Yet when we read the actual error info off the drive via smartctl -a ad0, it was clean. So it pointed to the drive tray, which we swapped, and all was well.

In other situations, however, the SMART info will often tell you if the drive is starting to fail. It's not 100% reliable, but since we started using it, it has generally given us some sort of heads-up as to whether or not a drive is in trouble.

---Mike
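
A minimal sketch of the SMART check described above, assuming the smartmontools port is installed and the suspect disk is reachable as /dev/ad10 (both assumptions, as is the helper name `udma_crc_count`). SMART attribute 199, which smartmontools labels UDMA_CRC_Error_Count, counts interface CRC errors, the same class of error as the kernel's ICRC warning. Not every drive reports it, so a missing value proves nothing either way.

```sh
#!/bin/sh
# Hypothetical helper: read `smartctl -A` output on stdin and print the
# raw value of attribute 199 (UDMA_CRC_Error_Count) if the drive reports it.
# In the attribute table the ID is column 1 and the raw value is column 10.
# Exits non-zero when the attribute is absent.
udma_crc_count() {
    awk '$1 == 199 { print $10; found = 1 } END { if (!found) exit 1 }'
}

# Usage on a live system (device name is an assumption; nothing runs here):
#   smartctl -H /dev/ad10                    # overall health verdict
#   smartctl -l error /dev/ad10              # the drive's own error log
#   smartctl -A /dev/ad10 | udma_crc_count   # interface CRC error counter
```

A clean SMART error log combined with a rising CRC counter or repeated ICRC warnings tends to point at the path to the drive (cable, tray, connector) rather than at the platters, which matches the drive-tray case above.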
O. Hartmann
2005-Aug-09 08:23 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
Mike Tancsa wrote:
> At 08:25 PM 08/08/2005, O. Hartmann wrote:
>
>> Hello.
>>
>> My box is an ASUS A8N-SLI Deluxe based AMD64 machine running FreeBSD
>> 6.0-BETA2 (see dmesg).
>> One of my SATA disks, the SAMSUNG SP2004C, seems to show errors during
>> operation (and also showed them under 5.4-RELEASE-p3).
>> Sometimes I get this error:
>> ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
>> while the machine still keeps working.
>> Other days the box crashes completely.
>>
>> Is this an operating system bug, or is this message evidence of
>> defective hardware?
>
>
> You can probably confirm a hardware issue with the smartmontools port
> (/usr/ports/sysutils/smartmontools).
>
> It was quite handy the other day for us to narrow down a problem between
> a drive tray and the actual drive. We started to see
>
> Aug 3 02:02:49 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=391423
> Aug 3 02:03:00 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2304319
> Aug 3 02:03:10 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2312927
> Aug 3 02:03:17 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2308639
> Aug 3 02:03:26 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2309855
> Aug 3 02:03:37 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=2348359
> Aug 4 12:12:37 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=1528639
> Aug 4 12:13:04 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (2 retries left) LBA=1530031
> Aug 4 12:13:04 verify1 kernel: ad0: TIMEOUT - READ_DMA retrying (1 retry left) LBA=1528639
> Aug 4 12:13:04 verify1 kernel: ad0: FAILURE - READ_DMA timed out
> Aug 4 12:13:04 verify1 kernel: spec_getpages:(ad0s1a) I/O read failure: (error=5) bp 0xd630b4fc vp 0xc2640d68
>
> Yet when we read the actual error info off the drive via smartctl -a
> ad0, it was clean. So it pointed to the drive tray, which we swapped, and
> all was well. In other situations, however, the SMART info will often
> tell you if the drive is starting to fail. It's not 100% reliable, but
> since we started using it, it has generally given us some sort of heads-up
> as to whether or not a drive is in trouble.
>
>
> ---Mike

Dear Mike.

Thanks a lot for this info. I will use this tool and try to report what I find. I also use trays for my drives (like I did with SCSI and SCA2 on our servers at the lab). Maybe this could be an issue.

Oliver
Chuck Swiger
2005-Aug-09 22:24 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
O. Hartmann wrote:
[ ... ]
> One of my SATA disks, the SAMSUNG SP2004C, seems to show errors during
> operation (and also showed them under 5.4-RELEASE-p3).
> Sometimes I get this error:
> ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
> while the machine still keeps working.
> Other days the box crashes completely.
>
> Is this an operating system bug, or is this message evidence of
> defective hardware?

Back up any data you care about now.

Use the smartmontools port or hunt down a utility from Samsung which'll do a surface test (read only, nondestructive).

You can also run a "dd if=/dev/ad10 of=/dev/null bs=8192" to do a full read test under FreeBSD, and see how many CRC errors show up.

--
-Chuck
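
The "see how many CRC errors show up" step can be sketched as a small filter over the kernel messages. This is only an illustration: the helper name `count_icrc` is hypothetical, and it assumes the warnings keep the exact wording shown in this thread (`adN: WARNING - ... UDMA ICRC error ...`).

```sh
#!/bin/sh
# Hypothetical helper: read kernel messages on stdin and print
# "<device> <count>" for every ATA disk that logged a UDMA ICRC warning.
count_icrc() {
    awk '/UDMA ICRC error/ {
             # The device name is the field shaped like "ad10:".
             for (i = 1; i <= NF; i++)
                 if ($i ~ /^ad[0-9]+:$/) {
                     sub(/:$/, "", $i)
                     n[$i]++
                     break
                 }
         }
         END { for (d in n) print d, n[d] }' | sort
}

# Usage on a live system (nothing runs here; the read test covers the
# whole disk, so it can take a long time):
#   dd if=/dev/ad10 of=/dev/null bs=8192
#   dmesg | count_icrc
```

Comparing the counts before and after a cable or tray swap gives a rough indication of whether the change helped.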
O. Hartmann
2005-Aug-10 05:59 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
Chuck Swiger wrote:
> O. Hartmann wrote:
> [ ... ]
>
>> One of my SATA disks, the SAMSUNG SP2004C, seems to show errors
>> during operation (and also showed them under 5.4-RELEASE-p3).
>> Sometimes I get this error:
>> ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
>> while the machine still keeps working.
>> Other days the box crashes completely.
>>
>> Is this an operating system bug, or is this message evidence of
>> defective hardware?
>
> You can also run a "dd if=/dev/ad10 of=/dev/null bs=8192" to do a full
> read test under FreeBSD, and see how many CRC errors show up.
>

I did so and the system crashed ... I changed the cabling and ran it again, and so far nothing has happened ... I hope it was only a cabling issue. This is the first time I have used ATA/SATA, and now these experiences ... When is SCSI back for desktops?
Joel Rees
2005-Aug-10 07:21 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
On 17/08/10, at 7:36, O. Hartmann wrote:
> [...] When is SCSI back for desktops?

I vote for that. In my opinion, ATA is primarily for home media systems, if that.

Joel Rees <rees@ddcom.co.jp>
digitcom, inc.
Kobe, Japan
+81-78-672-8800
** <http://www.ddcom.co.jp> **
Andrey V. Elsukov
2005-Aug-10 09:38 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
O. Hartmann wrote:
> Sometimes I get this error:
> ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
> while the machine still keeps working.

Check your disks with MHDD (http://mhdd.com/).

--
WBR, Andrey V. Elsukov
O. Hartmann
2005-Aug-10 17:45 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
Mike Jakubik wrote:
> On Wed, August 10, 2005 6:37 am, Dmitry Mityugov said:
>
>
>>There are Maxtor MaXLine II and III, and perhaps several other models,
>>that are supposed to work 24/7.
>
>
> Right, I have a dead 250GB Maxline Plus II drive on my desk, only after
> about 1.5 years. At least it's still on warranty.

On the other hand: in the department for physics of the atmosphere, where I built a server for meteorological data six years ago, a RAID-5 with 4 older IBM U160 SCSI discs still works 24/7. Never had a problem!
Unix
2005-Aug-10 17:49 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
O. Hartmann wrote:
> Mike Jakubik wrote:
>
>> On Wed, August 10, 2005 6:37 am, Dmitry Mityugov said:
>>
>>
>>> There are Maxtor MaXLine II and III, and perhaps several other models,
>>> that are supposed to work 24/7.
>>
>>
>>
>> Right, I have a dead 250GB Maxline Plus II drive on my desk, only after
>> about 1.5 years. At least it's still on warranty.
>
>
> On the other hand: in the department for physics of the atmosphere,
> where I built a server for meteorological data six years ago, a RAID-5
> with 4 older IBM U160 SCSI discs still works 24/7. Never had a problem!
>

I still own some old 1-2 GB SCSI disks and these are still working. I also had an old 500 MB SCSI disk from an old Mac that still worked, but I trashed it since it was that old and no longer of use...
Karl Denninger
2005-Aug-10 20:54 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
On Wed, Aug 10, 2005 at 08:36:39PM +0200, Søren Schmidt wrote:
>
> On 10/08/2005, at 20:05, Scot Hetzel wrote:
> >>
> >>Since I came in late in this, I need to know what kind of controller
> >>we are talking about, and if the problem is still present in 6.0.
> >>I plan to backport ATA from 6.0 to 5-stable when it has settled, so
> >>6.0 is the one and only (pre)release to test with and get back to me
> >>with the result.
> >>
> >>
> >
> >They have been talking about SII and Intel ICH6 chips. And a few have
> >stated that they are having problems with the 6.0-Beta releases with
> >these chips.
>
> Well, both work wonderfully here YMMV of course..
>
> No, seriously I need *much* more accurate info than that, I need the
> dmesg from the failing system, and I need an exact description of the
> problem, preferably with logs, dumps etc etc.
>
> - Søren

http://www.freebsd.org/cgi/query-pr.cgi?pr=i386/83974

Filed on July 24th.

Again, happy to give you SSH access to <THIS SPECIFIC MACHINE> if it will work towards getting this resolved.

--
--
Karl Denninger (karl@denninger.net) Internet Consultant & Kids Rights Activist
http://www.denninger.net My home on the net - links to everything I do!
http://scubaforum.org Your UNCENSORED place to talk about DIVING!
http://genesis3.blogspot.com Musings Of A Sentient Mind
Mark Kane
2005-Aug-16 15:12 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
I've been having similar problems on 5.4-RELEASE. I have a brand new
board back from the factory (an RMA) and had a thread going on
freebsd-questions about this.
I currently have six Maxtor 7200RPM ATA133 hard drives that I've been
trying on and off and with various configurations in my 5.4-RELEASE
amd64 machine. The only thing I have done thus far to reproduce the READ
and WRITE errors (there have been more WRITE than READ) is copy data
between the drives. All the drives check out just fine through PowerMax
(Maxtor's utility), and work in other FreeBSD 4.x machines in the same
placement that causes errors on my 5.4 box. The cables are also brand new.
However, note that if I turn the drives' speed down to UDMA100, the
errors seem to go away. Has anyone else tried this for their problems?
I've read the entire thread and so far there has been no mention of any
nForce chipsets doing this, but I've got a Giga-Byte K8NS Pro
motherboard with a nForce3 chipset.
I've been troubleshooting this for almost two weeks now on my end, and
up until last night I didn't see any "FAILURE" messages. They were all
just WARNINGS. I've only seen the FAILURE on one of the six hard drives,
and that was last night when I was trying to fdisk it. Right when I hit
"w" to write the fdisk information my screen flooded with WARNINGS and
FAILURES, so indeed that particular drive might be going. This problem
did not happen with any of the other drives.
My whole reason for bringing back this thread is to see if my problems
could be a result of the problem discussed here, and in fact not really
caused by my hardware. As I said, the board is brand new, the cables are brand
new, and two of the hard drives are even brand new. I've been pulling my
hair out trying to narrow this problem down, and as of yesterday I was
just going to run all my drives in UDMA100 mode to save me the hassle
(since they seem to run fine in 100). Then, I found this thread and
thought I'd ask if anyone here might think anything other than hardware
problems.
Granted, I'm not using FreeBSD 5-STABLE, but I could certainly give it a
shot if you guys think it would help anything. I just chose RELEASE
hoping for the least problems.
My dmesg is below. Note that I only have three of the drives in there
currently.
Thanks
-Mark
---------------------------
DMESG:
FreeBSD 5.4-RELEASE #0: Sun May 8 07:00:26 UTC 2005
root@portnoy.cse.buffalo.edu:/usr/obj/usr/src/sys/GENERIC
ACPI APIC Table: <Nvidia AWRDACPI>
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) 64 Processor 3000+ (2009.79-MHz K8-class CPU)
Origin = "AuthenticAMD" Id = 0xfc0 Stepping = 0
Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
AMD Features=0xe0500800<SYSCALL,NX,MMX+,LM,3DNow+,3DNow>
real memory = 1610547200 (1535 MB)
avail memory = 1543139328 (1471 MB)
ioapic0 <Version 1.1> irqs 0-23 on motherboard
acpi0: <Nvidia AWRDACPI> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
cpu0: <ACPI CPU> on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf0-0xcf3,0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
isab0: <PCI-ISA bridge> at device 1.0 on pci0
isa0: <ISA bus> on isab0
pci0: <serial bus, SMBus> at device 1.1 (no driver attached)
ohci0: <OHCI (generic) USB controller> mem 0xfc002000-0xfc002fff irq 22
at device 2.0 on pci0
usb0: OHCI version 1.0, legacy support
usb0: <OHCI (generic) USB controller> on ohci0
usb0: USB revision 1.0
uhub0: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 4 ports with 4 removable, self powered
ohci1: <OHCI (generic) USB controller> mem 0xfc003000-0xfc003fff irq 21
at device 2.1 on pci0
usb1: OHCI version 1.0, legacy support
usb1: <OHCI (generic) USB controller> on ohci1
usb1: USB revision 1.0
uhub1: nVidia OHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 4 ports with 4 removable, self powered
pci0: <serial bus, USB> at device 2.2 (no driver attached)
atapci0: <nVidia nForce3 Pro UDMA133 controller> port
0xf000-0xf00f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 8.0 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
atapci1: <GENERIC ATA controller> port
0xe400-0xe40f,0xb70-0xb73,0x970-0x977,0xbf0-0xbf3,0x9f0-0x9f7 irq 22 at
device 10.0 on pci0
ata2: channel #0 on atapci1
ata3: channel #1 on atapci1
pcib1: <ACPI PCI-PCI bridge> at device 11.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pci1: <display, VGA> at device 0.0 (no driver attached)
pcib2: <ACPI PCI-PCI bridge> at device 14.0 on pci0
pci2: <ACPI PCI bus> on pcib2
pci2: <multimedia, audio> at device 9.0 (no driver attached)
pci2: <input device> at device 9.1 (no driver attached)
fwohci0: <1394 Open Host Controller Interface> mem
0xfb004000-0xfb007fff,0xfb00d000-0xfb00d7ff irq 18 at device 9.2 on pci2
fwohci0: OHCI version 1.10 (ROM=0)
fwohci0: No. of Isochronous channels is 4.
fwohci0: EUI64 00:02:3c:00:91:01:6c:20
fwohci0: Phy 1394a available S400, 2 ports.
fwohci0: Link S400, max_rec 2048 bytes.
firewire0: <IEEE1394(FireWire) bus> on fwohci0
fwe0: <Ethernet over FireWire> on firewire0
if_fwe0: Fake Ethernet address: 02:02:3c:01:6c:20
fwe0: Ethernet address: 02:02:3c:01:6c:20
fwe0: if_start running deferred for Giant
sbp0: <SBP-2/SCSI over FireWire> on firewire0
fwohci0: Initiate bus reset
fwohci0: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode
firewire0: 1 nodes, maxhop <= 0, cable IRM = 0 (me)
firewire0: bus manager 0 (me)
skc0: <Marvell Gigabit Ethernet> port 0xa800-0xa8ff mem
0xfb000000-0xfb003fff irq 19 at device 11.0 on pci2
skc0: Marvell Yukon Lite Gigabit Ethernet rev. A3(0x7)
sk0: <Marvell Semiconductor, Inc. Yukon> on skc0
sk0: Ethernet address: 00:0f:ea:4f:83:8b
miibus0: <MII bus> on sk0
e1000phy0: <Marvell 88E1000 Gigabit PHY> on miibus0
e1000phy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX,
1000baseTX-FDX, auto
atapci2: <ITE IT8212F ATA133 controller> port
0xbc00-0xbc0f,0xb800-0xb803,0xb410-0xb417,0xb000-0xb003,0xac10-0xac17
irq 16 at device 12.0 on pci2
ata4: channel #0 on atapci2
ata5: channel #1 on atapci2
fwohci1: <Texas Instruments TSB43AB23> mem
0xfb008000-0xfb00bfff,0xfb00c000-0xfb00c7ff irq 18 at device 14.0 on pci2
fwohci1: OHCI version 1.10 (ROM=1)
fwohci1: No. of Isochronous channels is 4.
fwohci1: EUI64 00:0f:ea:00:00:47:38:9b
fwohci1: Phy 1394a available S400, 3 ports.
fwohci1: Link S400, max_rec 2048 bytes.
firewire1: <IEEE1394(FireWire) bus> on fwohci1
fwe1: <Ethernet over FireWire> on firewire1
if_fwe1: Fake Ethernet address: 02:0f:ea:47:38:9b
fwe1: Ethernet address: 02:0f:ea:47:38:9b
fwe1: if_start running deferred for Giant
sbp1: <SBP-2/SCSI over FireWire> on firewire1
fwohci1: Initiate bus reset
fwohci1: node_id=0xc800ffc0, gen=1, CYCLEMASTER mode
firewire1: 1 nodes, maxhop <= 0, cable IRM = 0 (me)
firewire1: bus manager 0 (me)
fdc0: <floppy drive controller> port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on
acpi0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on
acpi0
sio0: type 16550A
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
ppc0: <Standard parallel printer port> port 0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (NIBBLE-only) in COMPATIBLE mode
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: model IntelliMouse, device ID 3
orm0: <ISA Option ROM> at iomem 0xc0000-0xcf7ff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 2009790353 Hz quality 800
Timecounters tick every 1.000 msec
ad0: 194481MB <Maxtor 6B200R0/BAH41BM0> [395136/16/63] at ata0-master
UDMA133
acd0: CDRW <TDK CDRW401240X/1t00> at ata1-master PIO4
acd1: DVDR <SONY DVD RW DRU-500A/2.1a> at ata1-slave PIO4
ad8: 156334MB <Maxtor 6Y160P0/YAR41BW0> [317632/16/63] at ata4-master
UDMA133
ad9: 156334MB <Maxtor 6Y160P0/YAR41BW0> [317632/16/63] at ata4-slave
UDMA133
Mounting root from ufs:/dev/ad0s1a
Mike Tancsa
2005-Aug-17 03:59 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
At 12:10 PM 16/08/2005, Mark Kane wrote:
>However, note that if I turn the drives' speed down to UDMA100, the
>errors seem to go away. Has anyone else tried this for their problems?

Yes, I have had Maxtor drives in the past that would not work properly at certain bus speeds, even back in the RELENG_4 days. Also, doesn't UDMA133 assume no slave? I would just run them at 100. I don't think you would see much of a difference anyway. Perhaps Soeren could comment?

---Mike
Mark Kirkwood
2005-Aug-17 04:57 UTC
ad10: WARNING - READ_DMA UDMA ICRC error (retrying request) LBA=11441599
Mark Kane wrote:
>
> However, note that if I turn the drives' speed down to UDMA100, the
> errors seem to go away. Has anyone else tried this for their problems?
>

I currently do this, not due to problems, but to improve the write performance:

4xMaxtor 6E040L0 RAID0
UDMA133 -> 40M/s
UDMA100 -> 120M/s

(I seem to find read performance is *slightly* slower for UDMA100, but not enough to make it worth worrying about.)

However, this may just be a quirk specific to my system (Tyan S2510 + PDC20271).

Cheers

Mark