Hi,

I am seeing extremely poor performance (~100kB/s) when untarring large tar files into fresh UFS filesystems. I see the problem with soft updates enabled, and also without soft updates on an async mount. This is a Supermicro X7DB8 board, 4GB, 2 x Xeon 5140.

Sample gstat output:

dT: 1.033s  w: 1.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps    ms/w   %busy Name
  585     61      0      0    0.0     61    170 13812.0   100.1| da2

ms/w starts at about 200ms with ~3MB/s throughput; then ms/w rises and kBps drops. ms/w goes as high as 16-20s, and then suddenly drops back down to about 200ms. According to iostat, while performance is high(er), kb/t is 64kB; as the problem starts, it drops towards 2kB.

Copying a single large file doesn't exhibit this problem, although throughput isn't great (~3-5MB/s). However, that's better than 100kB/s.

arcmsr0: <Areca SATA Host Adapter RAID Controller (RAID6 capable)> mem 0xd8900000-0xd8900fff,0xd8000000-0xd83fffff irq 16 at device 14.0 on pci10
ARECA RAID ADAPTER0: Driver Version 1.20.00.15 2007-10-07
ARECA RAID ADAPTER0: FIRMWARE VERSION V1.46 2008-08-06
arcmsr0: [ITHREAD]

There are eight disks connected in a RAID-6 configuration. The controller's cache is write-through and the disks' write caches are disabled. NCQ is enabled on the drives.

The same hardware did not have this problem when it ran 6.3-p1. However, the system BIOS was updated at the same time as the operating system (in an attempt to solve a recent em problem), so the problem may be BIOS related. The same build on an entirely different machine with an aac controller and SAS disks also doesn't show this problem.

Running 'devinfo -r' doesn't list arcmsr0 as having an interrupt at all (see below). That strikes me as odd; on another machine that is still running 6.2 with an arcmsr controller, I can see the interrupt just fine.

So:

- Does anyone have any suggestions?

- Is it normal for arcmsr not to show an interrupt in the output from devinfo in 7.1?

Full dmesg and devinfo output below.
Thanks,

Jan Mikkelsen

Copyright (c) 1992-2008 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.1-PRERELEASE #0: Mon Dec 1 14:53:12 EST 2008
root@valhalla.transactionware.com:/home/janm/p4/freebsd-image-std-2008.2/work/base-freebsd/home/janm/p4/freebsd-image-std-2008.2/FreeBSD/src/sys/TW-SMP
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU 5140 @ 2.33GHz (2333.35-MHz K8-class CPU)
  Origin = "GenuineIntel" Id = 0x6f6 Stepping = 6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x4e3bd<SSE3,RSVD2,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  Cores per package: 2
usable memory = 4280651776 (4082 MB)
avail memory = 4117843968 (3927 MB)
ACPI APIC Table: <PTLTD APIC >
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 1
cpu2 (AP): APIC ID: 6
cpu3 (AP): APIC ID: 7
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.20.3 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0: <SMCI SMCISLP2> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 2.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> irq 16 at device 0.0 on pci1
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> irq 16 at device 0.0 on pci2
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> at device 0.0 on pci3
pci4: <ACPI PCI bus> on pcib4
ahd0: <Adaptec AIC7902 Ultra320 SCSI adapter> port 0x2400-0x24ff,0x2000-0x20ff mem 0xd8500000-0xd8501fff irq 16 at device 2.0 on pci4
ahd0: [ITHREAD]
aic7902: Ultra320 Wide Channel A, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
ahd1: <Adaptec AIC7902 Ultra320 SCSI adapter> port 0x2c00-0x2cff,0x2800-0x28ff mem 0xd8502000-0xd8503fff irq 17 at device 2.1 on pci4
ahd1: [ITHREAD]
aic7902: Ultra320 Wide Channel B, SCSI Id=7, PCI-X 67-100Mhz, 512 SCBs
pcib5: <ACPI PCI-PCI bridge> at device 0.2 on pci3
pci5: <ACPI PCI bus> on pcib5
bge0: <Altima Gigabit Ethernet Controller, ASIC rev. 0x105> mem 0xd8600000-0xd860ffff irq 16 at device 1.0 on pci5
miibus0: <MII bus> on bge0
brgphy0: <BCM5701 10/100/1000baseTX PHY> PHY 1 on miibus0
brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
bge0: Ethernet address: 00:40:f4:66:b1:56
bge0: [ITHREAD]
pcib6: <ACPI PCI-PCI bridge> irq 18 at device 2.0 on pci2
pci6: <ACPI PCI bus> on pcib6
em0: <Intel(R) PRO/1000 Network Connection 6.9.6> port 0x3000-0x301f mem 0xd8400000-0xd841ffff irq 18 at device 0.0 on pci6
em0: Using MSI interrupt
em0: [FILTER]
em0: Ethernet address: 00:30:48:31:67:86
em1: <Intel(R) PRO/1000 Network Connection 6.9.6> port 0x3020-0x303f mem 0xd8420000-0xd843ffff irq 19 at device 0.1 on pci6
em1: Using MSI interrupt
em1: [FILTER]
em1: Ethernet address: 00:30:48:31:67:87
pcib7: <ACPI PCI-PCI bridge> at device 0.3 on pci1
pci7: <ACPI PCI bus> on pcib7
pcib8: <ACPI PCI-PCI bridge> at device 4.0 on pci0
pci8: <ACPI PCI bus> on pcib8
pcib9: <ACPI PCI-PCI bridge> at device 6.0 on pci0
pci9: <ACPI PCI bus> on pcib9
pcib10: <PCI-PCI bridge> at device 0.0 on pci9
pci10: <PCI bus> on pcib10
arcmsr0: <Areca SATA Host Adapter RAID Controller (RAID6 capable)> mem 0xd8900000-0xd8900fff,0xd8000000-0xd83fffff irq 16 at device 14.0 on pci10
ARECA RAID ADAPTER0: Driver Version 1.20.00.15 2007-10-07
ARECA RAID ADAPTER0: FIRMWARE VERSION V1.46 2008-08-06
arcmsr0: [ITHREAD]
pcib11: <PCI-PCI bridge> at device 0.2 on pci9
pci11: <PCI bus> on pcib11
pci0: <base peripheral> at device 8.0 (no driver attached)
pcib12: <ACPI PCI-PCI bridge> irq 17 at device 28.0 on pci0
pci12: <ACPI PCI bus> on pcib12
uhci0: <Intel 631XESB/632XESB/3100 USB controller USB-1> port 0x1800-0x181f irq 17 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
uhci0: [ITHREAD]
usb0: <Intel 631XESB/632XESB/3100 USB controller USB-1> on uhci0
usb0: USB revision 1.0
uhub0: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
uhub0: 2 ports with 2 removable, self powered
uhci1: <Intel 631XESB/632XESB/3100 USB controller USB-2> port 0x1820-0x183f irq 19 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
uhci1: [ITHREAD]
usb1: <Intel 631XESB/632XESB/3100 USB controller USB-2> on uhci1
usb1: USB revision 1.0
uhub1: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb1
uhub1: 2 ports with 2 removable, self powered
uhci2: <Intel 631XESB/632XESB/3100 USB controller USB-3> port 0x1840-0x185f irq 18 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
uhci2: [ITHREAD]
usb2: <Intel 631XESB/632XESB/3100 USB controller USB-3> on uhci2
usb2: USB revision 1.0
uhub2: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb2
uhub2: 2 ports with 2 removable, self powered
ehci0: <Intel 63XXESB USB 2.0 controller> mem 0xd8c00400-0xd8c007ff irq 17 at device 29.7 on pci0
ehci0: [GIANT-LOCKED]
ehci0: [ITHREAD]
usb3: EHCI version 1.0
usb3: companion controllers, 2 ports each: usb0 usb1 usb2
usb3: <Intel 63XXESB USB 2.0 controller> on ehci0
usb3: USB revision 2.0
uhub3: <Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1> on usb3
uhub3: 6 ports with 6 removable, self powered
pcib13: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci13: <ACPI PCI bus> on pcib13
vgapci0: <VGA-compatible display> port 0x4000-0x40ff mem 0xd0000000-0xd7ffffff,0xd8800000-0xd880ffff irq 18 at device 1.0 on pci13
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel 63XXESB2 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x1860-0x186f at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
acpi_button0: <Power Button> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: [ITHREAD]
psm0: model Generic PS/2 mouse, device ID 0
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio0: [FILTER]
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
sio1: [FILTER]
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: [FILTER]
ppc0: <Parallel port> port 0x378-0x37f,0x778-0x77f irq 7 drq 3 on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/9 bytes threshold
ppbus0: <Parallel port bus> on ppc0
ppbus0: [ITHREAD]
plip0: <PLIP network interface> on ppbus0
plip0: WARNING: using obsoleted IFF_NEEDSGIANT flag
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
ppc0: [GIANT-LOCKED]
ppc0: [ITHREAD]
cpu0: <ACPI CPU> on acpi0
est0: <Enhanced SpeedStep Frequency Control> on cpu0
p4tcc0: <CPU Frequency Thermal Control> on cpu0
cpu1: <ACPI CPU> on acpi0
est1: <Enhanced SpeedStep Frequency Control> on cpu1
p4tcc1: <CPU Frequency Thermal Control> on cpu1
cpu2: <ACPI CPU> on acpi0
est2: <Enhanced SpeedStep Frequency Control> on cpu2
p4tcc2: <CPU Frequency Thermal Control> on cpu2
cpu3: <ACPI CPU> on acpi0
est3: <Enhanced SpeedStep Frequency Control> on cpu3
p4tcc3: <CPU Frequency Thermal Control> on cpu3
orm0: <ISA Option ROM> at iomem 0xc0000-0xcafff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
acd0: DVDR <PIONEER DVD-RW DVR-111D/1.23> at ata0-master UDMA66
Waiting 5 seconds for SCSI devices to settle
(probe46:arcmsr0:0:16:0): inquiry data fails comparison at DV1 step
da0 at arcmsr0 bus 0 target 0 lun 0
da0: <Areca ARC-1220-VOL#00 R001> Fixed Direct Access SCSI-5 device
da0: 166.666MB/s transfers (83.333MHz DT, offset 32, 16bit)
da0: 77247MB (158201856 512 byte sectors: 255H 63S/T 9847C)
da1 at arcmsr0 bus 0 target 1 lun 0
da1: <Areca ARC-1220-VOL#01 R001> Fixed Direct Access SCSI-5 device
da1: 166.666MB/s transfers (83.333MHz DT, offset 32, 16bit)
da1: 953673MB (1953122304 512 byte sectors: 255H 63S/T 121576C)
da2 at arcmsr0 bus 0 target 2 lun 0
da2: <Areca ARC-1220-VOL#02 R001> Fixed Direct Access SCSI-5 device
da2: 166.666MB/s transfers (83.333MHz DT, offset 32, 16bit)
da2: 800131MB (1638669312 512 byte sectors: 255H 63S/T 102002C)
sa0 at ahd1 bus 0 target 6 lun 0
sa0: <SEAGATE DAT 04106-XXX 7600> Removable Sequential Access SCSI-2 device
sa0: 10.000MB/s transfers (10.000MHz, offset 15)
SMP: AP CPU #1 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #2 Launched!
Trying to mount root from ufs:/dev/da0s2a
This module (opensolaris) contains code covered by the
Common Development and Distribution License (CDDL)
see http://opensolaris.org/os/licensing/opensolaris_license/
WARNING: ZFS is considered to be an experimental feature in FreeBSD.
ZFS filesystem version 6
ZFS storage pool version 6
bge0: link state changed to UP
em0: link state changed to UP
em1: link state changed to UP

nexus0
acpi0
    Interrupt request lines:
        9
    I/O ports:
        0x10-0x1f
        0x24-0x25
        0x28-0x29
        0x2c-0x2d
        0x2e-0x2f
        0x30-0x31
        0x34-0x35
        0x38-0x39
        0x3c-0x3d
        0x4e-0x4f
        0x50-0x53
        0x63
        0x65
        0x67
        0x72-0x77
        0x80
        0x90-0x9f
        0xa4-0xa5
        0xa8-0xa9
        0xac-0xad
        0xb0-0xb5
        0xb8-0xb9
        0xbc-0xbd
        0x295-0x296
        0x4d0-0x4d1
        0x800-0x80f
        0xca2-0xca3
        0xca8-0xcaf
        0x1000-0x107f
        0x1180-0x11bf
        0xfe00
    I/O memory addresses:
        0xe0000000-0xefffffff
        0xfe000000-0xfe01ffff
        0xfe600000-0xfe6fffff
        0xfec80000-0xfec80fff
        0xfed1c000-0xfed1ffff
        0xfee00000-0xfee0ffff
cpu0
acpi_perf0
est0
p4tcc0
cpufreq0
cpu1
acpi_perf1
est1
p4tcc1
cpufreq1
cpu2
acpi_perf2
est2
p4tcc2
cpufreq2
cpu3
acpi_perf3
est3
p4tcc3
cpufreq3
pcib0
pci0
hostb0
pcib1
pci1
pcib2
pci2
pcib3
pci3
pcib4
pci4
ahd0
    Interrupt request lines:
        16
    I/O ports:
        0x2000-0x20ff
        0x2400-0x24ff
    I/O memory addresses:
        0xd8500000-0xd8501fff
ahd1
    Interrupt request lines:
        17
    I/O ports:
        0x2800-0x28ff
        0x2c00-0x2cff
    I/O memory addresses:
        0xd8502000-0xd8503fff
pcib5
pci5
bge0
    I/O memory addresses:
        0xd8600000-0xd860ffff
miibus0
brgphy0
pcib6
pci6
em0
    Interrupt request lines:
        256
    I/O ports:
        0x3000-0x301f
    I/O memory addresses:
        0xd8400000-0xd841ffff
em1
    Interrupt request lines:
        257
    I/O ports:
        0x3020-0x303f
    I/O memory addresses:
        0xd8420000-0xd843ffff
pcib7
pci7
pcib8
pci8
pcib9
pci9
pcib10
pci10
arcmsr0
    I/O memory addresses:
        0xd8000000-0xd83fffff
        0xd8900000-0xd8900fff
pcib11
pci11
hostb1
hostb2
hostb3
hostb4
hostb5
hostb6
hostb7
pcib12
pci12
uhci0
    I/O ports:
        0x1800-0x181f
usb0
uhub0
uhci1
    Interrupt request lines:
        19
    I/O ports:
        0x1820-0x183f
usb1
uhub1
uhci2
    Interrupt request lines:
        18
    I/O ports:
        0x1840-0x185f
usb2
uhub2
ehci0
    I/O memory addresses:
        0xd8c00400-0xd8c007ff
usb3
uhub3
pcib13
pci13
vgapci0
    I/O ports:
        0x4000-0x40ff
    I/O memory addresses:
        0xd0000000-0xd7ffffff
        0xd8800000-0xd880ffff
isab0
isa0
sc0
vga0
    I/O ports:
        0x3c0-0x3df
    I/O memory addresses:
        0xa0000-0xbffff
orm0
    I/O memory addresses:
        0xc0000-0xcafff
atapci0
    I/O ports:
        0x170-0x177
        0x1f0-0x1f7
        0x376
        0x3f6
        0x1860-0x186f
ata0
    Interrupt request lines:
        14
acd0
acpi_sysresource0
atdma0
fpupnp0
attimer0
attimer1
pci_link0
pci_link1
pci_link2
pci_link3
pci_link4
pci_link5
pci_link6
pci_link7
atkbdc0
    I/O ports:
        0x60
        0x64
atkbd0
    Interrupt request lines:
        1
psm0
    Interrupt request lines:
        12
psmcpnp0
sio0
    Interrupt request lines:
        4
    I/O ports:
        0x3f8-0x3ff
sio1
    Interrupt request lines:
        3
    I/O ports:
        0x2f8-0x2ff
fdc0
    Interrupt request lines:
        6
    DMA request lines:
        2
    I/O ports:
        0x3f0-0x3f5
        0x3f7
ppc0
    Interrupt request lines:
        7
    DMA request lines:
        3
    I/O ports:
        0x378-0x37f
ppbus0
plip0
lpt0
ppi0
acpi_button0
acpi_timer0
    ACPI I/O ports:
        0x1008-0x100b
apic0
    I/O memory addresses:
        0xfec00000-0xfec0001f
ram0
    I/O memory addresses:
        0x0-0x9dfff
        0x100000-0xcff4ffff
        0x100000000-0x12fffffff
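The observation that arcmsr0 has no interrupt in the devinfo output can be checked mechanically. The following is a minimal Python sketch, not something from the thread: `devices_with_irq` is a hypothetical helper, and its line classification (resource headers end in a colon, values are decimal numbers or 0x ranges, anything else is a device name) is an assumption based on the `devinfo -r` output reproduced above.

```python
import re

# Assumed value formats in devinfo -r output: "16", "0x60", "0x2000-0x20ff".
VALUE_RE = re.compile(r"^(0x[0-9a-fA-F]+(-0x[0-9a-fA-F]+)?|\d+(-\d+)?)$")


def devices_with_irq(devinfo_text: str) -> dict[str, bool]:
    """Map each device name to True if it lists at least one IRQ line.

    Heuristic sketch: indentation is ignored, so this works on output
    whose tree indentation has been lost.
    """
    result: dict[str, bool] = {}
    current = None  # device whose resources we are currently reading
    header = None   # most recent resource header, e.g. "I/O ports:"
    for raw in devinfo_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            header = line
        elif VALUE_RE.match(line):
            if current is not None and header == "Interrupt request lines:":
                result[current] = True
        else:
            # Anything that is neither a header nor a value is a device name.
            current = line
            header = None
            result.setdefault(current, False)
    return result


sample = """\
ahd0
    Interrupt request lines:
        16
    I/O ports:
        0x2000-0x20ff
arcmsr0
    I/O memory addresses:
        0xd8000000-0xd83fffff
        0xd8900000-0xd8900fff
"""
irqs = devices_with_irq(sample)
print(irqs)  # {'ahd0': True, 'arcmsr0': False}
```

Run over the full listing above, the same check would also flag bge0, uhci0 and ehci0, which show no interrupt lines either even though dmesg reports IRQs for them.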
Replying to my own post ...

I have done a test on the same machine comparing 6.3-p1 to 7.1-PRE. The performance on 6.3-p1 is the expected ~6MB/s (low because of the lack of cache), so the BIOS change doesn't seem to be at fault. This looks like a regression somewhere between 6.3 and 7.1. The Areca driver is the same in 6.3 and 7.1, so the problem seems to be elsewhere.

I think this is more than just a "performance" problem. The extremely high ms/w values shown by gstat (I have seen them as high as 22000) make it look as though I/O completion interrupts are being lost.

Any suggestions on where to look next? Are there obvious candidates?

_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
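The gstat row from the original post can be cross-checked with two pieces of arithmetic. The Python sketch below is an illustration only: `parse_gstat_row`, `kb_per_write` and `littles_law_seconds` are hypothetical helpers. It assumes the column order of the gstat header quoted in the thread and treats the `|` after %busy as a separator. Average write size is write kBps divided by w/s, 170 / 61 ≈ 2.79 kB, which is consistent with iostat's kb/t falling towards 2kB. Little's law (mean time in system ≈ queue length / throughput) gives 585 / 61 ≈ 9.6 s, the same order of magnitude as the 13.8 s ms/w figure. Little's law assumes a steady state, which this bursty workload may not be in, so treat that number only as a sanity check.

```python
from dataclasses import dataclass


@dataclass
class GstatRow:
    queue_len: int          # L(q): outstanding requests
    ops_per_s: float        # ops/s: all operations
    write_ops_per_s: float  # w/s
    write_kbps: float       # write kBps
    ms_per_write: float     # ms/w
    name: str


def parse_gstat_row(line: str) -> GstatRow:
    # Assumed column order: L(q) ops/s r/s kBps ms/r w/s kBps ms/w %busy Name
    fields = line.replace("|", " ").split()
    return GstatRow(
        queue_len=int(fields[0]),
        ops_per_s=float(fields[1]),
        write_ops_per_s=float(fields[5]),
        write_kbps=float(fields[6]),
        ms_per_write=float(fields[7]),
        name=fields[9],
    )


def kb_per_write(row: GstatRow) -> float:
    # Average transfer size: write throughput divided by write operations.
    return row.write_kbps / row.write_ops_per_s if row.write_ops_per_s else 0.0


def littles_law_seconds(row: GstatRow) -> float:
    # Little's law: mean time in system ~= queue length / throughput.
    return row.queue_len / row.ops_per_s if row.ops_per_s else float("inf")


row = parse_gstat_row("  585     61      0      0    0.0     61    170 13812.0   100.1| da2")
print(f"{kb_per_write(row):.2f} kB/write")           # 2.79 kB/write
print(f"{littles_law_seconds(row):.1f} s in queue")  # 9.6 s in queue
```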
At 02:58 PM 12/15/2008, Paul MacKenzie wrote:
>This used to be on a 4.11x system with 1 CPU and only 1GB of RAM, and it
>ran flawlessly with far fewer resources and the same web site code
>for a long time. I do not have this problem on the other 7.0
>machine. I originally thought it was just a CPU issue, but it is very
>closely tied to when something is trying to use the RAID arrays, and
>this seems to be the way to reproduce it.
>
>I am having a hard time determining why the system load is so high.
>Can you recommend the best way to identify the culprit?

What does top -S show? Most of the load is in system. Does the machine in question have a rather large master.passwd file by chance? (http://www.freebsd.org/cgi/query-pr.cgi?pr=75855)

---Mike
At 03:27 PM 12/15/2008, Paul MacKenzie wrote:
> > What does top -S show? Most of the load is in system. Does the
> > machine in question have a rather large master.passwd file by chance?
> > (http://www.freebsd.org/cgi/query-pr.cgi?pr=75855)
> > ---Mike
>
>Thanks for your quick reply:
>
>master.passwd is only 9467 bytes (per ls -l)

I would try changing /etc/nsswitch.conf so that the group and passwd lines read

group: files
passwd: files

At that file size, it sounds like you only have about 200 entries? I doubt it's the issue, but it's worth a try. I know that at around 9,000 files, anything to do with UID lookups (e.g. ls -l) takes forever.

>  PID USERNAME   VCSW  IVCSW   READ  WRITE  FAULT  TOTAL PERCENT COMMAND
>   54 root          2      0      0      7      0      7 100.00% syncer

Do you have any other special tuning other than polling? Anything in /boot/loader.conf or /etc/sysctl.conf?

Does gstat show the disks busy?

---Mike
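Mike's estimate of about 200 entries comes from dividing the file size by a typical line length. This minimal Python sketch assumes one account per line, ignores blank lines and `#` comments, and uses an average of roughly 47 bytes per entry. That figure is only what 9467 bytes / 200 entries implies; the thread does not state it. `count_passwd_entries` and `estimate_entries` are hypothetical helper names. On a real system master.passwd is readable only by root, so the text would have to be read with sufficient privileges.

```python
def count_passwd_entries(text: str) -> int:
    """Count accounts in master.passwd-format text (one per non-comment line)."""
    return sum(
        1
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )


def estimate_entries(file_size_bytes: int, avg_entry_bytes: float) -> int:
    # Rough estimate when only the file size (e.g. from ls -l) is known.
    return round(file_size_bytes / avg_entry_bytes)


sample = """# $FreeBSD$
#
root:*:0:0::0:0:Charlie &:/root:/bin/csh
toor:*:0:0::0:0:Bourne-again Superuser:/root:
daemon:*:1:1::0:0:Owner of many system processes:/root:/usr/sbin/nologin
"""
print(count_passwd_entries(sample))  # 3
print(estimate_entries(9467, 47))    # 201 (assumed 47 bytes per entry)
```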
> I would try the change to /etc/nsswitch.conf so that group and passwd
> read
>
> group: files
> passwd: files
>
> At that file size, it sounds like you only have about 200 entries ? I
> doubt its the issue, but its worth a try. I know at around 9,000
> files anything to do with UID lookups (e.g. ls -l) takes forever.
>
>
>> PID USERNAME VCSW IVCSW READ WRITE FAULT TOTAL PERCENT COMMAND
>> 54 root 2 0 0 7 0 7 100.00% syncer
>
> Do you have any other special tuning other than polling ? Any in
> /boot/loader.conf or /etc/sysctl.conf ?
>
> Does gstat show the disks busy ?
>
> ---Mike
>

The next thing I am going to do is remove the QUOTA feature to see whether it has any bearing on this problem. As you can see, the system does not even appear to be writing heavily (almost nothing), but the processes are mostly in UFS when it spirals out of control.

I moved the processing of amavisd-new onto a memory drive to at least take that off the disk I/O, and this seems to have helped a bit. There is not a lot of mail going through the system, but every little bit helps. I suspect this is another factor bringing the problem to the forefront, as amavisd-new can use the disks a fair amount to process each e-mail.

Thanks for your help so far,
Paul
>
> I would also try disabling polling. Is your scheduler ULE or BSD? For
> an 8 core box, it should be ULE
>
> ---Mike

Hi Mike,

Thanks, I will try this now, as I have not tried it yet. Here is the current custom kernel; it is using ULE:

cpu HAMMER
ident MYCOMPUTER
makeoptions DEBUG=-g # Build kernel with gdb(1) debug symbols
options SCHED_ULE # ULE scheduler
options PREEMPTION # Enable kernel thread preemption
options INET # InterNETworking
options INET6 # IPv6 communications protocols
options SCTP # Stream Control Transmission Protocol
options FFS # Berkeley Fast Filesystem
options SOFTUPDATES # Enable FFS soft updates support
options UFS_ACL # Support for access control lists
options UFS_DIRHASH # Improve performance on big directories
options UFS_GJOURNAL # Enable gjournal-based UFS journaling
options MD_ROOT # MD is a potential root device
options CD9660 # ISO 9660 Filesystem
options PROCFS # Process filesystem (requires PSEUDOFS)
options PSEUDOFS # Pseudo-filesystem framework
options GEOM_PART_GPT # GUID Partition Tables.
options GEOM_LABEL # Provides labelization
options COMPAT_43TTY # BSD 4.3 TTY compat [KEEP THIS!]
options COMPAT_IA32 # Compatible with i386 binaries
options COMPAT_FREEBSD4 # Compatible with FreeBSD4
options COMPAT_FREEBSD5 # Compatible with FreeBSD5
options COMPAT_FREEBSD6 # Compatible with FreeBSD6
options SCSI_DELAY=5000 # Delay (in ms) before probing SCSI
options KTRACE # ktrace(1) support
options STACK # stack(9) support
options SYSVSHM # SYSV-style shared memory
options SYSVMSG # SYSV-style message queues
options SYSVSEM # SYSV-style semaphores
options _KPOSIX_PRIORITY_SCHEDULING # POSIX P1003_1B real-time extensions
options KBD_INSTALL_CDEV # install a CDEV entry in /dev
options ADAPTIVE_GIANT # Giant mutex is adaptive.
options STOP_NMI # Stop CPUS using NMI instead of IPI
options AUDIT # Security event auditing
options SMP # Symmetric MultiProcessor Kernel
options IPFIREWALL # Ip Firewall
options IPFIREWALL_VERBOSE # Verbose
options IPFIREWALL_VERBOSE_LIMIT=5000 # limit verbosity
options QUOTA # Disk Quota
options DEVICE_POLLING
device cpufreq
device acpi
device pci
device fdc
device ata
device atadisk # ATA disk drives
device ataraid # ATA RAID drives
device atapicd # ATAPI CDROM drives
device atapifd # ATAPI floppy drives
device atapist # ATAPI tape drives
options ATA_STATIC_ID # Static device numbering
device scbus # SCSI bus (required for SCSI)
device ch # SCSI media changers
device da # Direct Access (disks)
device sa # Sequential Access (tape etc)
device cd # CD
device pass # Passthrough device (direct SCSI access)
device ses # SCSI Environmental Services (and SAF-TE)
device arcmsr # Areca SATA II RAID
device atkbdc # AT keyboard controller
device atkbd # AT keyboard
device psm # PS/2 mouse
device kbdmux # keyboard multiplexer
device vga # VGA video card driver
device splash # Splash screen and screen saver support
device sc
device agp # support several AGP chipsets
device sio # 8250, 16[45]50 based serial ports
device uart # Generic UART driver
device ppc
device ppbus # Parallel port bus (required)
device lpt # Printer
device plip # TCP/IP over parallel
device ppi # Parallel port interface device
device em # Intel PRO/1000 adapter Gigabit Ethernet Card
device miibus # MII bus support
device loop # Network loopback
device random # Entropy device
device ether # Ethernet support
device sl # Kernel SLIP
device ppp # Kernel PPP
device tun # Packet tunnel.
device pty # Pseudo-ttys (telnet etc)
device md # Memory "disks"
device gif # IPv6 and IPv4 tunneling
device faith # IPv6-to-IPv4 relaying (translation)
device firmware # firmware assist module
device lagg # lagg interface for shared multi homed network connection
device bpf # Berkeley packet filter
device uhci # UHCI PCI->USB interface
device ohci # OHCI PCI->USB interface
device ehci # EHCI PCI->USB interface (USB 2.0)
device usb # USB Bus (required)
device ugen # Generic
device uhid # "Human Interface Devices"
device ukbd # Keyboard
device umass # Disks/Mass storage - Requires scbus and da
device ums # Mouse
device uplcom
device ucom
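Mike's suggestions (disable DEVICE_POLLING, confirm the ULE scheduler) and the plan to drop QUOTA all come down to which options a kernel config contains. Below is a small Python sketch that reads a config like the one above and answers those questions. It assumes the simple one-directive-per-line layout shown here and is not a full config(8) parser; parse_kernconf, has_option and scheduler are hypothetical helper names. The traditional scheduler's option is SCHED_4BSD.

```python
from collections import defaultdict
from typing import Dict, List


def parse_kernconf(text: str) -> Dict[str, List[str]]:
    """Group kernel config directives by keyword ('options', 'device', ...).

    Only handles simple 'keyword value  # comment' lines; include files,
    nooptions, quoted values and the like are not supported.
    """
    directives: Dict[str, List[str]] = defaultdict(list)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        parts = line.split(None, 1)  # keyword, then the rest of the line
        value = parts[1].strip() if len(parts) > 1 else ""
        directives[parts[0]].append(value)
    return dict(directives)


def has_option(conf: Dict[str, List[str]], name: str) -> bool:
    """True if 'options NAME' or 'options NAME=value' is present."""
    return any(v.split("=", 1)[0] == name for v in conf.get("options", []))


def scheduler(conf: Dict[str, List[str]]) -> str:
    """Report which scheduler option is compiled in."""
    if has_option(conf, "SCHED_ULE"):
        return "ULE"
    if has_option(conf, "SCHED_4BSD"):
        return "4BSD"
    return "unknown"
```

For example, scheduler(parse_kernconf(open(path).read())) returns "ULE" for the config above, and has_option(..., "DEVICE_POLLING") confirms whether polling is still compiled in; path is whatever file the kernel was built from, which the thread does not give.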
At 05:29 PM 12/15/2008, Paul MacKenzie wrote:
>The next thing I am doing is going to be removing the QUOTA feature
>to see if this has any bearing
>on this problem. It does not appear to be even writing at a heavy
>load as you can see (almost
>nothing) but the processes are mostly in UFS when it spirals out of control.

What's strange is that the output from gstat shows the disks hardly active at all... yet why is the syncer at 100%? Do you have write caching disabled on the array? What does the raw throughput to the disks look like? For example, try a simple:

dd if=/dev/zero of=/var/tmp/ddtest bs=1024k count=1000

>I moved the processing of amavisd-new into a memory drive to at
>least take that off the IO and this
>seems to have helped a bit. There is not a lot of mail going through
>the system but every little bit
>helps. I suspect this is one other reason that is bringing the
>problem to the forefront as
>amavisd-new can use the disks a bit to process each e-mail.

Is the high load average simply a function of processes blocking on network I/O? Our AV/spam scanners, for example, show a high load average because there are many processes waiting on network I/O to complete (e.g. talking to RBL lists, waiting for DCC servers to complete, etc.).

Also, is it really related to the arcmsr driver? That is, if you did the same tasks on a single IDE drive, would the performance profile be the same?

---Mike
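For comparison runs where dd is awkward to script, here is a rough Python equivalent of that sequential-write test. It is a sketch under assumptions: it writes zero-filled blocks to an ordinary file and, unlike plain dd, includes a final fsync() in the timing so that data still sitting in the buffer cache is not counted as written. write_throughput is a hypothetical name, and the result is in MiB/s.

```python
import os
import time


def write_throughput(path: str, block_size: int = 1024 * 1024, count: int = 1000) -> float:
    """Write `count` zero-filled blocks of `block_size` bytes to `path`.

    Returns throughput in MiB/s, timing the writes plus a final fsync().
    """
    block = bytes(block_size)  # a block of zero bytes, like /dev/zero
    start = time.monotonic()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # force the data out to the device
    elapsed = time.monotonic() - start
    mib = block_size * count / (1024 * 1024)
    return mib / elapsed if elapsed > 0 else float("inf")
```

With the defaults, write_throughput("/var/tmp/ddtest") writes the same 1000 MiB as the dd command above (the file name is only an example; remove the file afterwards). Including the fsync makes the figure closer to what the array actually sustains than a cached-write number would be.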
> What does top -S show ? Most of the load is in system. Does the
> machine in question have a rather large master.passwd file by chance ?
> (http://www.freebsd.org/cgi/query-pr.cgi?pr=75855)
> ---Mike
>

Thanks for your quick reply:

master.passwd is only 9467 bytes (from ls -l).

top -ISM at times shows syncer at the top, but this varies and it is not always near the top.

last pid: 55084; load averages: 17.74, 10.08, 5.58 up 0+10:19:24 15:05:23
290 processes: 50 running, 218 sleeping, 14 waiting, 8 lock
CPU: 15.4% user, 0.0% nice, 68.3% system, 3.0% interrupt, 13.2% idle
Mem: 795M Active, 3279M Inact, 492M Wired, 6116K Cache, 214M Buf, 11G Free
Swap: 8192M Total, 8192M Free

PID USERNAME VCSW IVCSW READ WRITE FAULT TOTAL PERCENT COMMAND
54 root 2 0 0 7 0 7 100.00% syncer

Here is a top snapshot taken when the system was not fully locked up but system usage was high:

last pid: 55468; load averages: 9.93, 11.31, 8.99 up 0+10:32:58 15:18:57
259 processes: 19 running, 215 sleeping, 14 waiting, 11 lock
CPU: 19.1% user, 0.0% nice, 58.2% system, 1.9% interrupt, 20.8% idle
Mem: 635M Active, 3258M Inact, 481M Wired, 6856K Cache, 214M Buf, 11G Free
Swap: 8192M Total, 8192M Free

PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
18 root 1 171 ki31 0K 16K RUN 0 439:32 31.15% idle: cpu0
55422 www 1 102 0 193M 59632K RUN 5 0:26 30.96% httpd
12 root 1 171 ki31 0K 16K RUN 6 522:14 28.37% idle: cpu6
54 root 1 20 - 0K 16K syncer 2 81:19 28.37% syncer
15 root 1 171 ki31 0K 16K RUN 3 465:15 26.56% idle: cpu3
55411 www 1 -4 0 157M 33704K *vnode 1 0:21 26.17% httpd
55388 www 1 -4 0 160M 35940K *vnode 1 0:14 26.17% httpd
13 root 1 171 ki31 0K 16K RUN 5 509:35 25.98% idle: cpu5
11 root 1 171 ki31 0K 16K RUN 7 525:53 25.88% idle: cpu7
14 root 1 171 ki31 0K 16K RUN 4 491:32 25.29% idle: cpu4
55453 www 1 101 0 157M 33608K CPU7 7 0:08 24.76% httpd
55365 www 1 -4 0 157M 33408K ufs 3 0:23 24.56% httpd
55440 www 1 69 0 154M 31180K CPU2 7 0:09 24.37% httpd
55412 www 1 -4 0 153M 30156K *vnode 3 0:07 23.97% httpd
16 root 1 171 ki31 0K 16K CPU2 2 444:38 23.88% idle: cpu2
55376 www 1 -4 0 158M 34776K *vnode 0 0:26 23.88% httpd
55459 www 1 -4 0 145M 23920K *vnode 1 0:07 23.49% httpd
55467 www 1 70 0 154M 31056K *vnode 7 0:09 22.66% httpd
17 root 1 171 ki31 0K 16K CPU1 1 443:27 20.90% idle: cpu1
55374 www 1 -4 0 146M 25312K *vnode 7 0:09 13.38% httpd
55418 www 1 -4 0 145M 24192K ufs 0 0:18 12.89% httpd
55400 www 1 58 0 146M 25460K select 5 0:20 12.79% httpd
55443 www 1 -4 0 148M 25788K *vnode 1 0:03 12.50% httpd
55410 www 1 -4 0 147M 25700K *vnode 7 0:05 12.26% httpd
55438 www 1 -4 0 145M 24148K RUN 4 0:08 11.96% httpd
21 root 1 -44 - 0K 16K WAIT 0 34:45 11.77% swi1: net
55451 www 1 -4 0 144M 22704K *vnode 7 0:02 10.99% httpd
55447 www 1 60 0 145M 24008K select 2 0:07 10.50% httpd
55406 www 1 53 0 146M 25324K select 2 0:19 9.77% httpd
55433 www 1 49 0 146M 24912K select 2 0:11 8.06% httpd
55448 www 1 52 0 144M 22972K RUN 6 0:03 8.06% httpd
55383 www 1 45 0 145M 24284K select 2 0:12 7.96% httpd
55446 www 1 44 0 146M 24988K select 3 0:09 7.96% httpd
55430 www 1 4 0 145M 24136K kqread 0 0:03 6.69% httpd
55432 www 1 20 0 146M 24324K lockf 3 0:04 6.05% httpd
55464 www 1 -4 0 145M 23464K RUN 0 0:02 5.66% httpd
55424 www 1 45 0 146M 24876K select 6 0:08 3.66% httpd
55442 www 1 47 0 145M 23852K select 3 0:03 3.56% httpd
55373 www 1 48 0 146M 25364K select 5 0:07 3.17% httpd
55375 www 1 46 0 146M 25420K select 2 0:15 3.08% httpd
19 root 1 -32 - 0K 16K *Giant 2 9:02 2.98% swi4: clock sio
48518 wusage 1 46 0 10424K 2632K select 4 2:50 2.78% wusage
1490 mysql 97 4 -5 402M 184M sbwait 4 0:29 2.78% mysqld
55372 www 1 47 0 144M 22136K CPU6 0 0:01 2.59% httpd
55437 root 1 -32 0 9136K 2940K CPU4 4 0:01 2.59% top
55387 www 1 45 0 144M 22196K CPU5 1 0:02 2.39% httpd
55468 www 1 20 0 144M 21904K lockf 4 0:00 1.56% httpd
55441 www 1 45 0 144M 22088K select 5 0:01 1.46% httpd
51563 root 1 4 0 11848K 5540K connec 5 0:09 1.37% sendmail
55458 www 1 45 0 144M 22140K select 6 0:01 1.17% httpd
55336 root 1 51 0 144M 21888K CPU0 0 0:05 1.07% httpd
55455 www 1 45 0 144M 21992K select 0 0:01 1.07% httpd
55425 www 1 20 0 146M 25304K lockf 3 0:07 0.78% httpd
55415 www 1 45 0 144M 22192K select 7 0:01 0.68% httpd
55439 www 1 44 0 144M 22308K select 1 0:01 0.49% httpd
51561 root 1 45 0 11848K 5400K select 0 0:13 0.29% sendmail
54666 root 1 4 0 10824K 4496K connec 3 0:04 0.20% sendmail
55469 root 1 51 0 144M 21888K CPU3 3 0:00 0.00% httpd
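In the second snapshot many httpd processes are sleeping in the ufs, *vnode and lockf states rather than running, which points at filesystem lock contention rather than raw disk load. When comparing several captures, counting processes per state is easier than reading the whole list. This is a minimal Python sketch, assuming top's default column layout as shown above (STATE is the 8th column); state_histogram is a hypothetical helper, not a FreeBSD tool.

```python
from collections import Counter
from typing import Optional


def state_histogram(top_output: str, command: Optional[str] = None) -> Counter:
    """Count processes per STATE column in top's default process listing.

    Assumes the column order
    PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND.
    COMMAND may contain spaces (e.g. "idle: cpu0"), so when filtering
    only its first word is compared.
    """
    counts: Counter = Counter()
    for line in top_output.splitlines():
        fields = line.split()
        # Process rows start with a numeric PID and have at least 12 fields;
        # header, summary and I/O-mode rows are skipped.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        if command is not None and fields[11] != command:
            continue
        counts[fields[7]] += 1
    return counts
```

Passing command="httpd" restricts the count to the web server, so a jump in the ufs/*vnode/lockf totals between captures can be spotted at a glance.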
> > [ns8]# vmstat -i
> > interrupt total rate
> > irq4: sio0 57065 0
> > irq17: em1 3989494045 554
> > irq18: arcmsr0 558098657 77
> > cpu0: timer 14381393929 2000
> > irq256: em0 22763077 3
> > cpu1: timer 14381384902 1999
> > Total 33333191675 4635
> > [ns8]#
> >
> > arcmsr0: <Areca SATA Host Adapter RAID Controller >
> > mem 0xe8600000-0xe8600fff,0xe8000000-0xe83fffff irq 18 at device
> > 14.0 on pci2
> > ARECA RAID ADAPTER0: Driver Version 1.20.00.15 2007-10-07
> > ARECA RAID ADAPTER0: FIRMWARE VERSION V1.43 2007-4-17
> > arcmsr0: [ITHREAD]
> > .....
> > Waiting 5 seconds for SCSI devices to settle
> > (probe16:arcmsr0:0:16:0): inquiry data fails comparison at DV1 step
> > da0 at arcmsr0 bus 0 target 0 lun 0
> > da0: <Areca ARC-1210-VOL#00 R001> Fixed Direct Access SCSI-5 device
> > da0: 166.666MB/s transfers (83.333MHz DT, offset 32, 16bit)
> > da0: 305175MB (624999424 512 byte sectors: 255H 63S/T 38904C)
> > SMP: AP CPU #1 Launched!

Hi, and thanks for your reply. I do not believe the interrupts are the problem at the moment, judging by the stats. Here is the most recent vmstat -i, taken while system usage was spiking and just before httpd had to be killed to get things going again:

vmstat -i
interrupt total rate
irq1: atkbd0 2 0
irq4: sio0 22880 0
irq14: ata0 58 0
irq22: uhci1 uhci3 18068 0
irq23: uhci0 uhci+ 1 0
irq26: arcmsr0 496094 14
cpu0: timer 61769334 1791
irq256: em0 1 0
irq258: em2 48505 1
irq259: em3 1 0
cpu1: timer 61762043 1791
cpu3: timer 61299367 1777
cpu2: timer 61299179 1777
cpu4: timer 61326132 1778
cpu7: timer 60845245 1764
cpu5: timer 61326513 1778
cpu6: timer 60845018 1764
Total 491058441 14243

There are no errors in the event console for the areca-cli.
ARC-1130-VOL#00 Main Raid Array Raid1+0 1000.0GB 00/00/00 Normal
Main Raid Array 4 2000.0GB 0.0GB 1234 Normal

Main Processor : 500MHz
CPU ICache Size : 32KB
CPU DCache Size : 32KB
CPU SCache Size : 0KB
System Memory : 1024MB/333MHz/ECC
Firmware Version : V1.44 2008-2-1
BOOT ROM Version : V1.44 2008-1-28

The slow buildworld was just an easy-to-quantify example of the problem I am seeing. If I run boxbackup, dump, clamscan or a few other I/O-intensive programs, everything gets VERY slow, even when reading files from the server.

When httpd locks up (another symptom which, in my view, is connected to the same issue), this is what it looks like. From what I can tell, it is almost as if httpd gets backed up and I need a plunger to clear the blockage :) I have to kill it and restart it to get things back to normal for a while.

last pid: 46013; load averages: 105.30, 67.67, 34.45 up 4+23:59:42 19:08:40
629 processes: 89 running, 540 sleeping
CPU: 21.9% user, 0.0% nice, 74.5% system, 3.1% interrupt, 0.4% idle
Mem: 1538M Active, 11G Inact, 898M Wired, 303M Cache, 214M Buf, 1346M Free
Swap: 8192M Total, 1036K Used, 8191M Free

PID USERNAME THR PRI NICE SIZE RES STATE C TIME WCPU COMMAND
46000 www 1 65 0 86728K 15008K RUN 1 0:01 12.06% httpd
45994 www 1 56 0 86728K 15008K CPU1 3 0:01 10.16% httpd
46002 www 1 -4 0 150M 20648K RUN 3 0:00 6.98% httpd
45195 www 1 68 0 121M 19748K RUN 1 0:29 6.88% httpd
45991 www 1 53 0 150M 21060K select 3 0:01 6.59% httpd
45997 www 1 -4 0 150M 20992K ufs 5 0:01 6.59% httpd
45950 www 1 57 0 153M 23388K RUN 2 0:01 6.49% httpd
45999 www 1 -4 0 150M 20640K ufs 6 0:00 5.96% httpd
45189 www 1 66 0 161M 29660K RUN 6 0:26 5.76% httpd
45974 www 1 -4 0 151M 21564K ufs 3 0:01 5.76% httpd
45984 www 1 -4 0 151M 21376K ufs 5 0:01 5.66% httpd
45998 www 1 -4 0 150M 20652K ufs 3 0:00 5.57% httpd
45780 www 1 -4 0 153M 23516K ufs 6 0:06 5.37% httpd
45972 www 1 -4 0 151M 21332K RUN 4 0:01 5.37% httpd
46001 www 1 20 0 150M 20568K lockf 4 0:00 5.37% httpd
45425 www 1 60 0 164M 31516K RUN 7 0:15 5.18% httpd
45995 www 1 63 0 150M 20820K RUN 2 0:00 5.18% httpd
45845 www 1 -4 0 151M 21692K ufs 6 0:02 4.98% httpd
45854 www 1 52 0 151M 22080K CPU6 0 0:02 4.88% httpd
45977 root 1 47 0 10160K 3260K CPU2 6 0:01 4.88% top
45509 www 1 56 0 155M 25028K RUN 0 0:14 4.79% httpd
45735 www 1 -4 0 158M 27096K RUN 3 0:07 4.79% httpd
45730 www 1 20 0 151M 21728K lockf 2 0:04 4.79% httpd
45847 www 1 -4 0 150M 21092K RUN 5 0:02 4.69% httpd
85338 root 1 46 0 150M 20560K select 7 5:03 4.59% httpd
45835 www 1 -4 0 150M 21100K ufs 0 0:02 4.59% httpd
45443 www 1 -4 0 151M 22220K ufs 6 0:12 4.49% httpd
45699 www 1 -4 0 157M 26528K RUN 0 0:07 4.39% httpd
45722 www 1 -4 0 152M 22908K RUN 0 0:05 4.39% httpd
45701 www 1 -4 0 152M 22268K RUN 2 0:07 4.30% httpd
45849 www 1 -4 0 151M 21748K ufs 6 0:02 4.30% httpd
46010 nagios 1 -4 0 7684K 1136K ufs 5 0:00 4.30% check_ping
45515 www 1 -4 0 160M 29048K ufs 5 0:13 4.20% httpd
45606 www 1 -4 0 152M 22420K ufs 0 0:09 4.20% httpd

vfs.numvnodes: 355382
kern.maxvnodes: 400000
vfs.ufs.dirhash_docheck: 0
vfs.ufs.dirhash_mem: 3239015
vfs.ufs.dirhash_maxmem: 10485760
vfs.ufs.dirhash_minsize: 2560
kern.ipc.nsfbufs: 0
kern.openfiles: 3395
kern.maxfiles: 12328

Results from netstat -m
------------------------
1185/3360/4545 mbufs in use (current/cache/total)
1062/2856/3918/25600 mbuf clusters in use (current/cache/total/max)
1062/1556 mbuf+clusters out of packet secondary zone in use (current/cache)
10/1550/1560/12800 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
2460K/12752K/15212K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
46262 requests for I/O initiated by sendfile
0 calls to protocol drain routines

Results from vmstat -m
------------------------
Type InUse MemUse HighUse Requests Size(s)
cdev 22 6K - 22 256
acd_driver 1 2K - 1 2048
sigio 1 1K - 1626 64
filedesc 684 941K - 1199696 16,32,64,128,256,512,1024,2048,4096
kenv 68 11K - 70 16,32,64
kqueue 368 414K - 1740632 256,2048,4096
proc-args 52 4K - 5389885 16,32,64,128,256
ithread 99 19K - 99 32,128,256
acpisem 13 2K - 13 128
CAM queue 12 1K - 302 16,32,64,128,256
KTRACE 100 13K - 100 128
linker 45 4K - 71 16,32,64,128,512
lockf 314 38K - 16413112 64,128,256,512,1024,2048,4096
scsi_da 0 0K - 65 16
ip6ndp 7 1K - 7 64,128
ip6opt 1 1K - 50503 256
temp 66 222K - 6704801 16,32,64,128,256,512,1024,2048,4096
devbuf 16781 35476K - 108258 16,32,64,128,256,512,1024,2048,4096
CAM SIM 2 1K - 2 256
module 204 26K - 204 128
acpidev 78 5K - 78 64
mtx_pool 1 8K - 1
subproc 1111 1606K - 1045562 512,4096
proc 2 16K - 2
session 34 5K - 20772 128
pgrp 39 5K - 158890 128
cred 24950 6238K - 11839905 256
uidinfo 13 3K - 7337 64,2048
plimit 24 6K - 226179 256
CAM periph 7 2K - 45 16,32,64,128,256
sysctltmp 0 0K - 215050 16,32,64,128,256
sysctloid 4373 216K - 4373 16,32,64,128
sysctl 0 0K - 828292 16,32,64
umtx 1692 212K - 1692 128
p1003.1b 1 1K - 1 16
SWAP 2 1097K - 2 64
CAM XPT 51 24K - 19790153 32,64,128,256,1024
bus-sc 111 101K - 1879 16,32,64,128,256,512,1024,2048,4096
bus 804 77K - 5926 16,32,64,128,256,1024
devstat 10 21K - 10 32,4096
eventhandler 57 5K - 57 64,128
kobj 125 500K - 160 4096
kbdmux 6 9K - 6 16,256,512,2048,4096
rman 168 21K - 576 16,64,128
sbuf 0 0K - 840 16,32,64,128,256,512,1024,2048,4096
pci_link 16 2K - 16 16,128
stack 0 0K - 14 256
taskqueue 19 2K - 19 16,32,128
Unitno 16 1K - 22074 32,64
iov 0 0K - 12126863 16,64,128,256,512
ioctlops 0 0K - 388714 16,32,64,128,256,512,1024,2048
msg 4 30K - 4 2048,4096
sem 4 8K - 4 512,1024,2048,4096
shm 1 16K - 1
ttys 1170 169K - 80824 128,1024
ptys 5 2K - 5 256
accf 3 1K - 301 32,64
mbuf_tag 0 0K - 520852 32,128
pcb 47 158K - 1332310 16,32,128,1024,2048,4096
soname 187 23K - 10680643 16,32,128
biobuf 1 2K - 143707 2048
vfscache 1 1024K - 1
cl_savebuf 0 0K - 154293 64,128
vfs_hash 1 512K - 1
vnodes 2 1K - 3 32,256
vnodemarker 1 1K - 124142 512
mount 111 6K - 495 16,32,64,128,256,2048
acpi_perf 8 1K - 8 64
BPF 6 1K - 6 128
ether_multi 29 2K - 32 16,32,64
ifaddr 136 48K - 136 32,64,128,256,512,4096
ifnet 7 13K - 7 256,2048
clone 6 24K - 6 4096
arpcom 5 1K - 5 16
lo 1 1K - 1 32
acpica 3057 292K - 68659 16,32,64,128,256,512,1024
routetbl 303 86K - 1027 32,64,128,256,512
in_multi 4 1K - 4 64
IpFw/IpAcct 60 9K - 60 64,128,2048
sctp_iter 0 0K - 65 256
sctp_ifn 4 1K - 4 128
sctp_ifa 66 9K - 66 128
sctp_vrf 1 1K - 1 64
sctp_a_it 0 0K - 65 16
hostcache 1 36K - 1
entropy 1024 64K - 1024 64
syncache 1 100K - 1
in6_multi 16 1K - 16 32,64,128
audit_evclass 150 5K - 187 32
savedino 0 0K - 406078 256
newdirblk 0 0K - 5047 64
dirrem 18 2K - 2259617 64
mkdir 1 1K - 283528 64
diradd 183 12K - 3426340 64
freefile 55 4K - 1081462 64
freeblks 26 7K - 792864 256
freefrag 2 1K - 781740 64
allocindir 5 1K - 2842332 128
indirdep 4 1K - 116594 64
allocdirect 62 16K - 4832896 256
bmsafemap 12 2K - 271759 128
newblk 1 1K - 7675229 64,512
inodedep 270 580K - 2593883 256
pagedep 12 130K - 318828 128
ufs_dirhash 2848 1230K - 42435 16,32,64,128,256,512,1024,2048,4096
ufs_quota 1 512K - 1
ufs_mount 15 241K - 51 128,256,512,2048,4096
UMAHash 5 572K - 33 512,1024,2048,4096
USBHC 0 0K - 660 16
USBdev 22 10K - 682 16,128,512
USB 761 683K - 4079 16,32,64,128,256,1024
vm_pgdata 2 129K - 2 128
DEVFS1 115 58K - 115 512
DEVFS3 250 63K - 251 256
DEVFS2 115 2K - 115 16
DEVFS_RULE 36 17K - 36 64,512
DEVFS 30 1K - 31 16,128
io_apic 2 4K - 2 2048
pfs_nodes 20 5K - 20 256
memdesc 1 4K - 2 4096
msi 4 1K - 4 128
nexusdev 4 1K - 4 16
acpitask 0 0K - 9 64
GEOM 104 20K - 882 16,32,64,128,256,512,1024,2048
atkbddev 2 1K - 2 64
isadev 7 1K - 7 128
CAM dev queue 2 1K - 2 128
ata_generic 1 1K - 1 1024
ata_dma 1 1K - 1 256

Results from systat -v
-----------------------
1 users Load 143 90.86 47.13 Nov 21 19:10
Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
Tot Share Tot Share Free in out in out
Act 1754100 25964 4719924 55728 1551492 count
All 1916252 113912 9413004 269144 pages
Proc: Interrupts
r p d s w Csw Trp Sys Int Sof Flt 179 cow 16002 total
73 133 454 2 32k 816 2520 3 29k 726 504 zfod atkbd0 1
ozfod sio1 irq3
86.8%Sys 3.5%Intr 9.2%User 0.0%Nice 0.6%Idle %ozfod sio0 irq4
| | | | | | | | | | | daefr ata0 irq14
===========================================++>>>>> 16 prcfr uhci1 uhci
314 dtbuf 90 totfr uhci0 uhci
Namei Name-cache Dir-cache 400000 desvn react 2 arcmsr0 26
Calls hits % hits % 355344 numvn pdwak 2004 cpu0: time
76763 76730 100 24902 frevn pdpgs em0 irq256
intrn 1 em2 irq258
Disks da0 da1 pass0 pass1 pass2 934624 wire em3 irq259
KB/t 9.00 0.00 0.00 0.00 0.00 1697060 act 2000 cpu1: time
tps 1 0 0 0 0 12038912 inact 1996 cpu2: time
MB/s 0.01 0.00 0.00 0.00 0.00 308732 cache 2000 cpu3: time
%busy 0 0 0 0 0 1244784 free 2001 cpu7: time
219632 buf 1999 cpu4: time
1999 cpu6: time
2000 cpu5: time

Here is a "normal" systat -v for comparison, taken when there were no "visible" problems:

3 users Load 1.67 1.03 1.02 Nov 25 22:12
Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
Tot Share Tot Share Free in out in out
Act 797576 31388 2318500 57324 4051340 count
All 952256 114916 6781828 226696 pages
Proc: Interrupts
r p d s w Csw Trp Sys Int Sof Flt 556 cow 16001 total
1 6 474 5463 1602 3387 1 31k 1567 853 zfod atkbd0 1
ozfod sio1 irq3
8.4%Sys 4.0%Intr 2.8%User 0.0%Nice 84.8%Idle %ozfod sio0 irq4
| | | | | | | | | | | daefr ata0 irq14
====++>> 602 prcfr uhci1 uhci
125 dtbuf 1443 totfr uhci0 uhci
Namei Name-cache Dir-cache 400000 desvn react arcmsr0 26
Calls hits % hits % 328748 numvn pdwak 2026 cpu0: time
52734 52660 100 24705 frevn pdpgs em0 irq256
intrn 1 em2 irq258
Disks da0 da1 pass0 pass1 pass2 857028 wire em3 irq259
KB/t 0.00 0.00 0.00 0.00 0.00 750716 act 2026 cpu1: time
tps 0 0 0 0 0 10564316 inact 1975 cpu2: time
MB/s 0.00 0.00 0.00 0.00 0.00 303468 cache 1977 cpu3: time
%busy 0 0 0 0 0 3748056 free 1999 cpu7: time
219632 buf 2000 cpu4: time
1997 cpu6: time
2000 cpu5: time

----------------------------------
Here are the results of vmstat -w 1 during the problem:

procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr da0 da1 in sy cs us sy id
157 110 13 5544M 1111M 1141 0 0 0 1100 44 0 0 47 115 8744 7 14 79
146 34 98 5546M 1099M 4191 0 0 0 729 0 2 0 17 18583 102586 9 91 0
224 33 15 5548M 1091M 3825 0 0 0 664 0 0 0 7 14115 141707 10 90 0
165 103 11 5633M 1064M 12222 0 0 0 6745 0 2 0 42 41519 403437 14 86 0
214 73 4 5653M 1044M 4539 0 0 0 959 0 0 0 7 15698 94269 11 88 1
260 30 1 5664M 1034M 8457 0 0 0 2171 0 0 0 14 36978 248202 12 87 0
57 182 45 5667M 1029M 4761 0 0 0 2535 0 0 0 6 21004 133617 10 90 0
55 24 16 2152M 2454M 7993 0 0 0 3135 0 0 0 13 20263 173347 13 81 7
20 15 2 1972M 2537M 93820 0 0 0 465955 0 10 0 588 99274 716238 23 76 1
13 11 0 1877M 2581M 7820 0 0 0 31044 0 6 0 38 7859 76120 16 83 1
9 12 1 1816M 2599M 6889 0 0 0 14550 0 20 0 79 359198 21333 14 79 7
11 13 0 1797M 2613M 6542 0 0 0 8416 0 3 0 17 606119 15341 18 61 21
1 9 1 1740M 2636M 1744 0 0 0 6267 0 2 0 14 11617 15322 8 63 29
2 3 0 1694M 2657M 3417 0 0 0 8669 0 15 0 52 50341 12045 6 29 65

Here is another top view, taken later while the same problem was happening, using top's I/O display mode:
--------------------------------------------------------------
last pid: 17984; load averages: 39.26, 37.68, 24.75 up 8+09:25:55 04:34:53
539 processes: 59 running, 480 sleeping
CPU: 9.8% user, 0.5% nice, 87.0% system, 2.3% interrupt, 0.4% idle
Mem: 1146M Active, 9663M Inact, 875M Wired, 582M Cache, 214M Buf, 3577M Free
Swap: 8192M Total, 1036K Used, 8191M Free

PID USERNAME VCSW IVCSW READ WRITE FAULT TOTAL PERCENT COMMAND
17587 www 446 62 0 0 0 0 0.00% httpd
17763 www 515 37 0 0 0 0 0.00% httpd
17860 www 538 47 0 0 0 0 0.00% httpd
17703 www 457 43 0 0 0 0 0.00% httpd
17701 www 485 34 0 0 0 0 0.00% httpd
17550 www 423 29 0 0 0 0 0.00% httpd
17579 www 0 0 0 0 0 0 0.00% httpd
17864 www 495 39 0 0 0 0 0.00% httpd
17836 www 520 36 0 0 0 0 0.00% httpd
17847 www 451 28 0 0 0 0 0.00% httpd
17756 www 462 29 0 0 0 0 0.00% httpd
17982 www 445 63 0 0 0 0 0.00% httpd
17581 www 451 60 0 0 0 0 0.00% httpd
17761 www 449 37 0 0 0 0 0.00% httpd
17582 www 509 30 0 0 0 0 0.00% httpd
17709 www 447 28 0 0 0 0 0.00% httpd
17705 www 515 30 0 0 0 0 0.00% httpd
17704 www 469 38 0 0 0 0 0.00% httpd
17706 www 508 53 0 0 0 0 0.00% httpd
17833 www 483 34 0 0 0 0 0.00% httpd
17834 www 499 43 0 0 0 0 0.00% httpd
17974 www 489 38 0 0 0 0 0.00% httpd
17978 www 467 45 0 0 0 0 0.00% httpd
17576 www 447 32 0 0 0 0 0.00% httpd
17570 www 443 37 0 0 0 0 0.00% httpd
17762 www 476 31 0 0 0 0 0.00% httpd
17837 www 508 44 0 0 0 0 0.00% httpd
17548 www 443 32 0 0 0 0 0.00% httpd
17783 www 390 22 0 0 0 0 0.00% httpd
17961 www 534 57 0 0 0 0 0.00% httpd
17590 www 498 50 0 0 0 0 0.00% httpd
17700 www 471 35 0 0 0 0 0.00% httpd
17580 www 438 41 0 0 0 0 0.00% httpd

This used to be on a 4.11x system with 1 CPU and only 1 GB of RAM, and it ran flawlessly for a long time with far fewer resources and the same web site code. I do not have this problem on the other 7.0 machine. I originally thought it was just a CPU issue, but it is very closely tied to when something is trying to use the RAID arrays, and this seems to be the way to reproduce it.

I am having a hard time determining why the system load is so high. Can you recommend the best way to identify the culprit?

Thanks,
Paul
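The vmstat -w 1 samples make Mike's point about idle disks concrete: in seven consecutive samples the cpu "sy" column sits between 81% and 91% while da0 completes at most 2 transfers per second. A minimal Python sketch that flags such samples is below. It assumes the column layout shown above, where "sy" appears twice (system calls in the faults group, then CPU system time); parse_vmstat and system_bound_rows are hypothetical helpers, and the 80% / 5 tps thresholds are arbitrary illustrations.

```python
from collections import Counter
from typing import Dict, List


def parse_vmstat(text: str) -> List[Dict[str, str]]:
    """Parse `vmstat -w` output into one dict per sample row.

    The column-name row (the one starting with "r") is used as the header.
    Repeated names get a numeric suffix, so the faults-group "sy"
    (system calls) stays "sy" and the CPU "sy" becomes "sy_2".
    """
    names: List[str] = []
    rows: List[Dict[str, str]] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":
            seen: Counter = Counter()
            names = []
            for name in fields:
                seen[name] += 1
                names.append(name if seen[name] == 1 else f"{name}_{seen[name]}")
            continue
        # Sample rows start with a number and match the header width.
        if names and len(fields) == len(names) and fields[0].isdigit():
            rows.append(dict(zip(names, fields)))
    return rows


def system_bound_rows(rows: List[Dict[str, str]], disk: str = "da0",
                      min_sys: int = 80, max_tps: int = 5) -> List[Dict[str, str]]:
    """Rows where CPU system time is high while `disk` does few transfers."""
    return [
        row for row in rows
        if int(row["sy_2"]) >= min_sys and int(row[disk]) <= max_tps
    ]
```

Feeding it a saved vmstat -w 1 capture lists the intervals in which the kernel was busy but the array was not, which is the pattern to correlate with the httpd lockups.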
Just to confirm: we see something similar on the box which runs our stats.

We have updated from 5.4 -> 6.0 -> 6.2 -> 7.0, and none of these upgrades has had any effect on the lockups, which happen when the stats run.

This box is also on an Areca controller, but it was previously on an Adaptec and we saw pretty much the same thing, so I suspect it's not related to the controller but more to the way things are read from and flushed to disk.

When we see this problem, any ssh sessions become totally unresponsive.

The stats we are running are a combination of rrdtool updates from a MySQL DB and rrdtool-backed MRTG for network stats.

This is very reproducible here, as it "stalls" the box every few minutes when the stats kick off, so if more investigation is needed we should be able to help.

Regards
Steve

----- Original Message -----
From: "Paul MacKenzie" <paul@elehost.com>
>> > 14.0 on pci2
>> > ARECA RAID ADAPTER0: Driver Version 1.20.00.15 2007-10-07
>> > ARECA RAID ADAPTER0: FIRMWARE VERSION V1.43 2007-4-17
>> > arcmsr0: [ITHREAD]
>> > .....
>> > Waiting 5 seconds for SCSI devices to settle
>> > (probe16:arcmsr0:0:16:0): inquiry data fails comparison at DV1 step
>> > da0 at arcmsr0 bus 0 target 0 lun 0
>> > da0: <Areca ARC-1210-VOL#00 R001> Fixed Direct Access SCSI-5 device
>> > da0: 166.666MB/s transfers (83.333MHz DT, offset 32, 16bit)
>> > da0: 305175MB (624999424 512 byte sectors: 255H 63S/T 38904C)
>> > SMP: AP CPU #1 Launched!...

===============================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission please telephone +44 845 868 1337 or return the E.mail to postmaster@multiplay.co.uk.
> Just to confirm we see something similar on the box which runs our stats.
>
> We have updated from 5.4 -> 6.0 -> 6.2 -> 7.0 all have had no effect on
> the lockups which happen when the stats run.
>
> This box is also on an areca controller but it was on an Adaptec and we
> saw pretty much the same thing so I suspect its not related to the
> controller more to the way things are read from and flushed to disk.
>
> When we see this problem any ssh sessions become totally unresponsive.
>
> The stats we are running are a combination of rrdtool updates from a
> mysql DB and rrdtool backed mrtg for network stats.
>
> This is very reproducible here as it "stalls" the box every few mins
> when the stats kick off so if there needs to be more investigation
> we should be able to help.
>
> Regards
> Steve

Hi Steve,

Thanks for your message. Do your processes also lock in the UFS state?

Well, I went and bought a new Areca controller to see if this would fix it. It is quicker and seems to work much better, but unfortunately I am still seeing the locking. I moved from a PCI-X Areca controller with 1 GB of cache to a PCI-Express controller with 2 GB of cache:

Main Processor : 800MHz
CPU ICache Size : 32KB
CPU DCache Size : 32KB
CPU SCache Size : 512KB
System Memory : 2048MB/533MHz/ECC
Firmware Version : V1.46 2008-08-06
BOOT ROM Version : V1.45 2008-08-26
Controller Name : ARC-1231

I will let you know as I try to see whether there is a hardware connection to this. I guess I am going to have to replace the full chassis (SR2500ALBRPNA) and mainboard (S5000PAL) next. If replacing hardware does not solve it, am I out of options for using FreeBSD? Would there be any need to also try replacing the RAM and CPUs, given that there are no errors?

Thanks,
Paul