[Please Cc: any responses, I'm not currently subscribed to the list] Hardware: Intel SE7230NH1-E motherboard, integrated SATA RAID (ICH7) Two 500 GB HDDs set up as RAID1 Intel Pentium D 945 CPU 2 GB RAM OS: FreeBSD mail.domain.tld 6.2-RC1 FreeBSD 6.2-RC1 #0: Fri Nov 24 23:26:25 EET 2006 toomas@mail.domain.tld:/usr/obj/usr/src/sys/TVS/amd64 SMP-enabled kernel Problem: During large file transfer (ca 750000 files, total size ca 45 GB) from another server using gtar via ssh-pipe, kernel reported a series of ATA related error messages. Note that after timestamp "Dec 10 23:59:35" even the messages themselves appear garbled in the log: Dec 10 23:52:50 mail kernel: DOH! ata_alloc_request failed! Dec 10 23:52:52 mail kernel: FAILURE - out of memory in ata_raid_init_request Dec 10 23:52:52 mail kernel: FAILURE - out of memory in ata_raid_init_request Dec 10 23:52:52 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=270864936960, length=4096)]error = 5 Dec 10 23:52:52 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=270920138752, length=16384)]error = 5 Dec 10 23:57:29 mail kernel: DOH! ata_alloc_request failed! Dec 10 23:57:29 mail kernel: FAILURE - out of memory in ata_raid_init_request Dec 10 23:57:29 mail last message repeated 7 times Dec 10 23:57:29 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=271459713024, length=12288)]error = 5 Dec 10 23:57:30 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=271459729408, length=12288)]error = 5 Dec 10 23:57:30 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=271459745792, length=14336)]error = 5 Dec 10 23:57:30 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=271459762176, length=16384)]error = 5 Dec 10 23:57:30 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=271459778560, length=12288)]error = 5 Dec 10 23:57:30 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=271459794944, length=12288)]error = 5 Dec 10 23:57:30 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=271459811328, length=12288)]error = 5 Dec 10 23:57:30 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=271459827712, length=12288)]error = 5 Dec 10 23:58:32 mail kernel: FAILURE - out of memory in ata_raid_init_request Dec 10 23:58:32 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=65536, length=2048)]error = 5 Dec 10 23:59:03 mail kernel: FAILURE - out of memory in ata_raid_init_request Dec 10 23:59:03 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=274533269504, length=14336)]error = 5 Dec 10 23:59:35 mail kernel: FAILURE - out of memory in ata_raid_init_request Dec 10 23:59:35 mail kernel: g_vfs_done():ar0s1g[WRITEDOH! ata_alloc_(offset=275request failed!139362816, length=10240)]error = 5 Dec 10 23:59:35 mail kernel: Dec 10 23:59:35 mail kernel: FAILURE - out of memory in ata_raid_init_request Dec 10 23:59:35 mail kernel: FAg_vfs_done():Iar0s1g[WRITE(LoUfRfEs e-t= o2u7t5 1o1f2 9m2e3m1o3r6y, ilenn gattha_rai=d_1i0n2i4t0_)r]eerqruoers tDec 10 23:59:35 mail kernel: 5 Dec 10 23:59:35 mail kernel: FAILgU_vRfEs _-d oonuet() of m:eamro0rsy1 gi[WnR IaTtEa(_orfafisde_ti=nit2_7r5e1q1u2est Dec 10 23:59:35 mail kernel: 955904, length=10240)]error =F A5IL Dec 10 23:59:35 mail kernel: URE - out of memory in ata_raid_ing_ivtf_sr_edqounee(s)t: Dec 10 23:59:35 mail kernel: ar0s1g[WRITE(offset=27F5A1I3L9U7R8E8 8-0 0o,u tl eonfgt hm=e1m0o2r4y0 )i]ne rartoar_ =ra i5d Dec 10 23:59:35 mail kernel: _init_request Dec 10 23:59:35 mail kernel: FAILURE - ogu_tv fosf_ domneem(o)r:ya ri0ns 1agt[aW_RrIaTiEd(_oifnfiste_tr=eq2u7e5s1t3 Dec 10 23:59:35 mail kernel: 9805184, length=1228FAILURE - out of memory in ata_raid_init_request Dec 10 23:59:35 mail kernel: FAILURE - out of memory in ata_raid_init_request Dec 10 23:59:35 mail kernel: FAILURE - out of memory in ata_raid_init_request Dec 10 23:59:35 mail kernel: 8)]error F=A I5LU Dec 10 23:59:35 mail kernel: RE - out of memory in ata_raid_init_request Dec 10 23:59:35 mail kernel: FAILUgR_Ev f-s _oduotn eo(f) memory in ata_raid_init_request Dec 10 23:59:35 mail kernel: FAILURE - out of memory in ata_raid_init_request Dec 10 23:59:35 mail kernel: FAILURE - out of memory in ata_raid_init_request Dec 10 23:59:35 mail kernel: FAILURE :- oaurt0s 1ogf[ WmReITmEo(royf fisne ta=ta2_7r5a1i3d9_8i2n1i5t6_8r,e qlueensgthtDec 10 23:59:35 mail kernel: 12288)]error = 5 Dec 10 23:59:35 mail kernel: FAILURE - out of memory in ata_raid_init_request Dec 10 23:59:35 mail kernel: FAILURE - out go_fv fmse_mdoonery( )i:na art0as_1rga[iWdR_IiTnEi(to_frfesqeute=st27 Dec 10 23:59:35 mail kernel: 5139837952, length=14336)]error = 5 Dec 10 23:59:35 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=275139854336, FlAILURE - out of emnegmtohr=y 1i4n33 6a)t]ear_rorra i=d _5in Dec 10 23:59:35 mail kernel: it_request Dec 10 23:59:35 mail kernel: g_vfs_done():ar0s1g[WRFIATEI(LUoRfEf s-e to=ut 2o7f5 1m3e9m8o7r0y7 2i0n, altean_grtahi=d1_2i2n8i8t)_]reerqruoers tDec 10 23:59:35 mail kernel: 5 Dec 10 23:59:35 mail kernel: FAIgL_UvRfEs _-d oonuet( )o:fa rm0esm1ogr[yW RiInT Ea(toaf_frsaeitd=_in2i7t5_1r3e9q8uest8 Dec 10 23:59:35 mail kernel: 7104, length=12288)]error = 5 Dec 10 23:59:35 mail kernel: FAILURE - out of memory in ata_raid_init_request Dec 10 23:59:35 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=275115839488, length=10240)]error = 5 Dec 10 23:59:35 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=275117283328, length=10240)]error = 5 Dec 10 23:59:35 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=275117316096, length=10240)]error = 5 Dec 10 23:59:35 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=6144000, length=16384)]error = 5 Dec 10 23:59:35 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=6160384, length=16384)]error = 5 Dec 10 23:59:35 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=6176768, length=6144)]error = 5 Dec 10 23:59:35 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=65536, length=2048)]error = 5 Dec 10 23:59:35 mail kernel: swap_pager: I/O error - pageout failed; blkno 34,size 4096, error 5 Dec 10 23:59:35 mail kernel: swap_pager: I/O error - pageout failed; blkno 35,size 4096, error 5 Dec 10 23:59:35 mail kernel: swap_pager: I/O error - pageout failed; blkno 36,size 4096, error 5 Dec 10 23:59:35 mail kernel: swap_pager: I/O error - pageout failed; blkno 37,size 4096, error 5 Dec 11 00:27:10 mail kernel: DOH! ata_alloc_request failed! Dec 11 00:27:12 mail kernel: FAILURE - out of memory in ata_raid_init_request Dec 11 00:27:12 mail last message repeated 4 times Dec 11 00:27:12 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=299515107328, length=6144)]error = 5 Dec 11 00:27:12 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=299518115840, length=6144)]error = 5 Dec 11 00:27:12 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=299515031552, length=4096)]error = 5 Dec 11 00:27:12 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=299518132224, length=6144)]error = 5 Dec 11 00:27:12 mail kernel: g_vfs_done():ar0s1g[WRITE(offset=299515123712, length=6144)]error = 5 The command I was using for file transfer was: ssh -l root oldserver.domain.tld tar -cpf - -C /storage/www . \ | gtar xpf - --delay-directory-restore During the transfer I received one error message from gtar: gtar: ./tallinn_linnaarhiiv/149-5-2178/TileGroup1/6-43-1.jpg: Cannot write: No space left on device There was actually plenty of free space on the partition to where I was transferring the files. I verified this file using md5 after the transfer completed and found that checksums of this file on old server and new server did not match. I also randomly verified some other files but didn't find any more checksum mismatches. Here's the dmesg of the machine where the problem occurs: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD is a registered trademark of The FreeBSD Foundation. FreeBSD 6.2-RC1 #0: Fri Nov 24 23:26:25 EET 2006 toomas@mail.domain.tld:/usr/obj/usr/src/sys/TVS ACPI APIC Table: <INTEL 04DT044 > Timecounter "i8254" frequency 1193182 Hz quality 0 CPU: Intel(R) Pentium(R) D CPU 3.40GHz (3398.30-MHz K8-class CPU) Origin = "GenuineIntel" Id = 0xf64 Stepping = 4 Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE> Features2=0xe49d<SSE3,RSVD2,MON,DS_CPL,EST,CNTX-ID,CX16,<b14>,<b15>> AMD Features=0x20000800<SYSCALL,LM> AMD Features2=0x1<LAHF> Cores per package: 2 real memory = 2145849344 (2046 MB) avail memory = 2064850944 (1969 MB) FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs cpu0 (BSP): APIC ID: 0 cpu1 (AP): APIC ID: 1 ioapic0: Changing APIC ID to 5 ioapic0 <Version 2.0> irqs 0-23 on motherboard kbd1 at kbdmux0 acpi0: <INTEL 04DT044> on motherboard acpi0: Power Button (fixed) Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000 acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0 cpu0: <ACPI CPU> on acpi0 cpu1: <ACPI CPU> on acpi0 acpi_button0: <Sleep Button> on acpi0 pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0 pci0: <ACPI PCI bus> on pcib0 pcib1: <ACPI PCI-PCI bridge> at device 28.0 on pci0 pci1: <ACPI PCI bus> on pcib1 pcib2: <ACPI PCI-PCI bridge> at device 28.4 on pci0 pci2: <ACPI PCI bus> on pcib2 pcib3: <ACPI PCI-PCI bridge> at device 28.5 on pci0 pci3: <ACPI PCI bus> on pcib3 em0: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port 0x2000-0x201f mem 0x88100000-0x8811ffff irq 17 at device 0.0 on pci3 em0: Ethernet address: 00:16:76:cc:3c:4e uhci0: <UHCI (generic) USB controller> port 0x3080-0x309f irq 23 at device 29.0 on pci0 uhci0: [GIANT-LOCKED] usb0: <UHCI (generic) USB controller> on uhci0 usb0: USB revision 1.0 uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub0: 2 ports with 2 removable, self powered uhci1: <UHCI (generic) USB controller> port 0x3060-0x307f irq 19 at device 29.1 on pci0 uhci1: [GIANT-LOCKED] usb1: <UHCI (generic) USB controller> on uhci1 usb1: USB revision 1.0 uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub1: 2 ports with 2 removable, self powered uhci2: <UHCI (generic) USB controller> port 0x3040-0x305f irq 18 at device 29.2 on pci0 uhci2: [GIANT-LOCKED] usb2: <UHCI (generic) USB controller> on uhci2 usb2: USB revision 1.0 uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub2: 2 ports with 2 removable, self powered uhci3: <UHCI (generic) USB controller> port 0x3020-0x303f irq 16 at device 29.3 on pci0 uhci3: [GIANT-LOCKED] usb3: <UHCI (generic) USB controller> on uhci3 usb3: USB revision 1.0 uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1 uhub3: 2 ports with 2 removable, self powered ehci0: <Intel 82801GB/R (ICH7) USB 2.0 controller> mem 0x88200400-0x882007ff irq 23 at device 29.7 on pci0 ehci0: [GIANT-LOCKED] usb4: EHCI version 1.0 usb4: companion controllers, 2 ports each: usb0 usb1 usb2 usb3 usb4: <Intel 82801GB/R (ICH7) USB 2.0 controller> on ehci0 usb4: USB revision 2.0 uhub4: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1 uhub4: 8 ports with 8 removable, self powered pcib4: <ACPI PCI-PCI bridge> at device 30.0 on pci0 pci4: <ACPI PCI bus> on pcib4 pci4: <display, VGA> at device 4.0 (no driver attached) em1: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port 0x1100-0x113f mem 0x88020000-0x8803ffff,0x88000000-0x8801ffff irq 17 at device 5.0 on pci4 em1: Ethernet address: 00:16:76:cc:3c:4f isab0: <PCI-ISA bridge> at device 31.0 on pci0 isa0: <ISA bus> on isab0 atapci0: <Intel ICH7 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x30b0-0x30bf irq 18 at device 31.1 on pci0 ata0: <ATA channel 0> on atapci0 ata1: <ATA channel 1> on atapci0 atapci1: <Intel ICH7 SATA300 controller> port 0x30c8-0x30cf,0x30e4-0x30e7,0x30c0-0x30c7,0x30e0-0x30e3,0x30a0-0x30af mem 0x88200000-0x882003ff irq 19 at device 31.2 on pci0 atapci1: AHCI Version 01.10 controller with 4 ports detected ata2: <ATA channel 0> on atapci1 ata3: <ATA channel 1> on atapci1 ata4: <ATA channel 2> on atapci1 ata5: <ATA channel 3> on atapci1 ichsmb0: <SMBus controller> port 0x3000-0x301f irq 19 at device 31.3 on pci0 ichsmb0: [GIANT-LOCKED] smbus0: <System Management Bus> on ichsmb0 smb0: <SMBus generic I/O> on smbus0 atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0 atkbd0: <AT Keyboard> irq 1 on atkbdc0 kbd0 at atkbd0 atkbd0: [GIANT-LOCKED] sio0: configured irq 4 not in bitmap of probed irqs 0 sio0: port may not be enabled sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0 sio0: type 16550A orm0: <ISA Option ROM> at iomem 0xc0000-0xcafff on isa0 sc0: <System console> at flags 0x100 on isa0 sc0: VGA <16 virtual consoles, flags=0x300> vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0 Timecounters tick every 1.000 msec acd0: DVDR <DVDRW DRW-6S160P/PSG6> at ata0-master UDMA66 ad4: 476940MB <Seagate ST3500630AS 3.AAE> at ata2-master SATA300 ad6: 476940MB <Seagate ST3500630AS 3.AAE> at ata3-master SATA300 ar0: 476772MB <LSILogic v3 MegaRAID RAID1> status: READY ar0: disk0 READY (master) using ad4 at ata2-master ar0: disk1 READY (mirror) using ad6 at ata3-master SMP: AP CPU #1 Launched! Trying to mount root from ufs:/dev/ar0s1a
Brian A. Seklecki
2007-Jan-12 09:33 UTC
Failure in SATA RAID1 during heavy disk activity (amd64)
Did you run PC-DOC and do a sector-test on each member disk? Or do you think its' != hardware, maybe kernel/interrupt/ata(4) problem? ~BAS On Mon, 11 Dec 2006, Toomas Aas wrote:> [Please Cc: any responses, I'm not currently subscribed to the list] > > Hardware: > Intel SE7230NH1-E motherboard, integrated SATA RAID (ICH7) > Two 500 GB HDDs set up as RAID1