Marc G. Fournier
2009-Apr-14 20:13 UTC
7.1-STABLE Sun Mar 29 01:06:46 ADT 2009 Locks up ...
Hi ...

Over the past little while, two of my servers have suddenly started to hang ... servers that, up until this started, have been reasonably rock solid ... they are generally within a day of each other for source code, and the hardware on both is pretty much identical (HP Proliant DL360 servers) ...

I have serial console configured on both so that I can do CR ~ ^b to get to DDB ... except, when it hangs, all I get is:

"KDB: enter: Break sequence on console"

And it hangs there, no prompt.

I set up a simple script (see attached) to run every 5 minutes that gathers various pieces of info that I think are pertinent, but most likely doesn't cover everything ...

Whenever this happens, on either machine, vmstat shows data *like* (notice the high procs -> w values?):

 procs          memory      page                         disks    faults         cpu
  r   b  w      avm    fre   flt   re pi  po    fr      sr da0 pa0   in    sy    cs us sy id
165 106  2 12699168  33840  3080   38  2   2  3082    1623   0   0  337 36961  4731 18  7 75
 64  75  4 12761744  23084 46809  623 65  43 19307     116 334   0 1189 83674 11708 70 20 10
  1  68 25 12773980  23068 11036 3003  9  36  4055     116 282   0 1336 78346 14869 56 16 28
  0  71 25 12774236  23084   186  769  1   5    18      80 249   0  609  9298  5894  5  5 91
  5  90 31 12747296  23352   626 2546  5 104  1147     368 281   0 1536 40945 19980  6  5 90

Where procs -> w just seems to keep rising ... note that the output for vmstat *5 minutes before* shows:

 procs          memory      page                         disks    faults         cpu
  r   b  w      avm    fre   flt   re pi  po    fr      sr da0 pa0   in    sy    cs us sy id
 35 121  0 12414692  90552  3080   32  2   1  3090    1403   0   0  337 37022  4730 18  7 75
 31  93  0 12314408  62024 36550  414 46   6 34285      27 563   0  916 94851  8813 67 33  0
 43 179  0 12270932  23080 24035  101 41  12 13887      36 375   0  766 61969  6945 69 23  7
 92  44  0 12265524 119804  2122 2028  1  32 13051 1096092 205   0  558 19460  4561 19 50 32
 38  34  0 12330068  89140 30758  103 39 119 37037 2837365 165   0  773 92041  7111 47 53  0

I have one QEMU VPS running on this box, with kqemu running the latest kernel module ... but the other machine experiencing the same issue is only running FreeBSD jails ...

Both servers are running SCHED_4BSD, if that matters any ... ?

I'm at a loss as to what to look at / for next ... pointers would be greatly appreciated ...

I have the various output files that the script generates available if anyone thinks they would be useful ...

thank you ...

Marc G. Fournier    Hub.Org Hosting Solutions S.A. (http://www.hub.org)
Email . scrappy@hub.org    MSN . scrappy@hub.org
Yahoo . yscrappy    Skype: hub.org    ICQ . 7615664
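For anyone who wants to check their own periodic vmstat captures for the same pattern, here is a minimal Python sketch. It only parses the rows quoted in Marc's message and reports whether the procs -> w column keeps climbing. As I read vmstat(8), w counts runnable or short-sleeping processes that have been swapped out. The function names and the threshold of 10 are my own assumptions for illustration; this is not the script Marc attached.

```python
# Column names taken from the vmstat header quoted in the message. "sy" occurs
# twice there (syscalls under faults, system time under cpu), so the cpu one is
# renamed "cpu_sy" here to keep the dictionary keys unique.
COLUMNS = ["r", "b", "w", "avm", "fre", "flt", "re", "pi", "po", "fr", "sr",
           "da0", "pa0", "in", "sy", "cs", "us", "cpu_sy", "id"]

# Rows copied from the message: the capture taken around the hang ...
HANG_SAMPLE = """\
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr da0 pa0 in sy cs us sy id
165 106 2 12699168 33840 3080 38 2 2 3082 1623 0 0 337 36961 4731 18 7 75
64 75 4 12761744 23084 46809 623 65 43 19307 116 334 0 1189 83674 11708 70 20 10
1 68 25 12773980 23068 11036 3003 9 36 4055 116 282 0 1336 78346 14869 56 16 28
0 71 25 12774236 23084 186 769 1 5 18 80 249 0 609 9298 5894 5 5 91
5 90 31 12747296 23352 626 2546 5 104 1147 368 281 0 1536 40945 19980 6 5 90
"""

# ... and the capture from five minutes earlier.
BEFORE_SAMPLE = """\
35 121 0 12414692 90552 3080 32 2 1 3090 1403 0 0 337 37022 4730 18 7 75
31 93 0 12314408 62024 36550 414 46 6 34285 27 563 0 916 94851 8813 67 33 0
43 179 0 12270932 23080 24035 101 41 12 13887 36 375 0 766 61969 6945 69 23 7
92 44 0 12265524 119804 2122 2028 1 32 13051 1096092 205 0 558 19460 4561 19 50 32
38 34 0 12330068 89140 30758 103 39 119 37037 2837365 165 0 773 92041 7111 47 53 0
"""


def parse_vmstat(text):
    """Return one dict per data row; header and malformed lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # A data row has exactly one numeric value per column.
        if len(fields) != len(COLUMNS) or not all(f.isdigit() for f in fields):
            continue
        rows.append(dict(zip(COLUMNS, (int(f) for f in fields))))
    return rows


def w_keeps_rising(rows, threshold=10):
    """Hypothetical heuristic: w never drops between samples and ends at or
    above `threshold` (an arbitrary value chosen for this illustration)."""
    w = [row["w"] for row in rows]
    never_drops = all(a <= b for a, b in zip(w, w[1:]))
    return bool(w) and never_drops and w[-1] >= threshold


if __name__ == "__main__":
    for name, sample in (("hang", HANG_SAMPLE), ("5 min before", BEFORE_SAMPLE)):
        rows = parse_vmstat(sample)
        print(name, [row["w"] for row in rows], w_keeps_rising(rows))
```

A check like this could run from the same 5-minute cron job and log a warning before the box becomes unreachable, though a rising w is only a symptom here and says nothing about the cause.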
Hi Marc and List,

I had similar issues with FreeBSD 7.2-PRERELEASE. The server (zfs, nfs) seems to hang at intervals of about 8 hours. The kernel is still there, but no connections can be made to nfs/ssh, and login on the local console doesn't seem to work due to incredible slowness. Breaking to the debugger takes a moment but works (compiling the kernel with WITNESS didn't help). The server had been solid before with a 7-STABLE kernel from around 19 October 2008.

I have now added these lines to /boot/loader.conf

hw.pci.enable_msi=0
hw.pci.enable_msix=0

to disable Message Signaled Interrupts, which are used by the 3ware twa driver and the igb network driver on our server.

With this the server ran for 3 days with no hangs. I then enabled MSI again and had a hang within 24 hours. I disabled it again, and the server has now been online without an issue for 6 days.

I'm not 100% sure yet if this really is the sole source of the problems (e.g. workload might be another factor), but I guess it's worth a try to check if it might help you too.

If this is a known problem, or there are any other hints to solve this problem, or if the server configuration just seems wrong, I would appreciate the feedback.

regards,
Martin

pciconf (with msi):

hostb0@pci0:0:0:0: class=0x060000 card=0xa28015d9 chip=0x40038086 rev=0x20 hdr=0x00
    cap 01[50] = powerspec 3 supports D0 D3 current D0
    cap 05[58] = MSI supports 2 messages
    cap 10[6c] = PCI-Express 2 root port
pcib1@pci0:0:1:0: class=0x060400 card=0xa28015d9 chip=0x40218086 rev=0x20 hdr=0x01
    cap 01[50] = powerspec 3 supports D0 D3 current D0
    cap 05[58] = MSI supports 2 messages
    cap 10[6c] = PCI-Express 2 root port
    cap 0d[b0] = PCI Bridge card=0xa28015d9
pcib2@pci0:0:3:0: class=0x060400 card=0xa28015d9 chip=0x40238086 rev=0x20 hdr=0x01
    cap 01[50] = powerspec 3 supports D0 D3 current D0
    cap 05[58] = MSI supports 2 messages
    cap 10[6c] = PCI-Express 2 root port
    cap 0d[b0] = PCI Bridge card=0xa28015d9
pcib3@pci0:0:5:0: class=0x060400 card=0xa28015d9 chip=0x40258086 rev=0x20 hdr=0x01
    cap 01[50] = powerspec 3 supports D0 D3 current D0
    cap 05[58] = MSI supports 2 messages
    cap 10[6c] = PCI-Express 2 root port
    cap 0d[b0] = PCI Bridge card=0xa28015d9
pcib4@pci0:0:7:0: class=0x060400 card=0xa28015d9 chip=0x40278086 rev=0x20 hdr=0x01
    cap 01[50] = powerspec 3 supports D0 D3 current D0
    cap 05[58] = MSI supports 2 messages
    cap 10[6c] = PCI-Express 2 root port
    cap 0d[b0] = PCI Bridge card=0xa28015d9
pcib8@pci0:0:9:0: class=0x060400 card=0xa28015d9 chip=0x40298086 rev=0x20 hdr=0x01
    cap 01[50] = powerspec 3 supports D0 D3 current D0
    cap 05[58] = MSI supports 2 messages
    cap 10[6c] = PCI-Express 2 root port
    cap 0d[b0] = PCI Bridge card=0xa28015d9
none0@pci0:0:15:0: class=0x088000 card=0xa28015d9 chip=0x402f8086 rev=0x20 hdr=0x00
    cap 01[50] = powerspec 3 supports D0 D3 current D0
    cap 11[58] = MSI-X supports 4 messages in map 0x10
    cap 10[6c] = PCI-Express 2 type 0
hostb1@pci0:0:16:0: class=0x060000 card=0xa28015d9 chip=0x40308086 rev=0x20 hdr=0x00
hostb2@pci0:0:16:1: class=0x060000 card=0xa28015d9 chip=0x40308086 rev=0x20 hdr=0x00
hostb3@pci0:0:16:2: class=0x060000 card=0xa28015d9 chip=0x40308086 rev=0x20 hdr=0x00
hostb4@pci0:0:16:3: class=0x060000 card=0xa28015d9 chip=0x40308086 rev=0x20 hdr=0x00
hostb5@pci0:0:16:4: class=0x060000 card=0xa28015d9 chip=0x40308086 rev=0x20 hdr=0x00
hostb6@pci0:0:17:0: class=0x060000 card=0xa28015d9 chip=0x40318086 rev=0x20 hdr=0x00
hostb7@pci0:0:21:0: class=0x060000 card=0xa28015d9 chip=0x40358086 rev=0x20 hdr=0x00
hostb8@pci0:0:21:1: class=0x060000 card=0xa28015d9 chip=0x40358086 rev=0x20 hdr=0x00
hostb9@pci0:0:22:0: class=0x060000 card=0xa28015d9 chip=0x40368086 rev=0x20 hdr=0x00
hostb10@pci0:0:22:1: class=0x060000 card=0xa28015d9 chip=0x40368086 rev=0x20 hdr=0x00
pcib9@pci0:0:28:0: class=0x060400 card=0xa28015d9 chip=0x26908086 rev=0x09 hdr=0x01
    cap 10[40] = PCI-Express 1 root port
    cap 05[80] = MSI supports 1 message
    cap 0d[90] = PCI Bridge card=0xa28015d9
    cap 01[a0] = powerspec 2 supports D0 D3 current D0
uhci0@pci0:0:29:0: class=0x0c0300 card=0xa28015d9 chip=0x26888086 rev=0x09 hdr=0x00
uhci1@pci0:0:29:1: class=0x0c0300 card=0xa28015d9 chip=0x26898086 rev=0x09 hdr=0x00
uhci2@pci0:0:29:2: class=0x0c0300 card=0xa28015d9 chip=0x268a8086 rev=0x09 hdr=0x00
ehci0@pci0:0:29:7: class=0x0c0320 card=0xa28015d9 chip=0x268c8086 rev=0x09 hdr=0x00
    cap 01[50] = powerspec 2 supports D0 D3 current D0
    cap 0a[58] = EHCI Debug Port at offset 0xa0 in map 0x14
pcib10@pci0:0:30:0: class=0x060401 card=0xa28015d9 chip=0x244e8086 rev=0xd9 hdr=0x01
    cap 0d[50] = PCI Bridge card=0xa28015d9
isab0@pci0:0:31:0: class=0x060100 card=0xa28015d9 chip=0x26708086 rev=0x09 hdr=0x00
atapci0@pci0:0:31:1: class=0x01018a card=0xa28015d9 chip=0x269e8086 rev=0x09 hdr=0x00
atapci1@pci0:0:31:2: class=0x010601 card=0xa28015d9 chip=0x26818086 rev=0x09 hdr=0x00
    cap 01[70] = powerspec 2 supports D0 D3 current D0
    cap 12[a8] = unknown
none1@pci0:0:31:3: class=0x0c0500 card=0xa28015d9 chip=0x269b8086 rev=0x09 hdr=0x00
twa0@pci0:1:0:0: class=0x010400 card=0x100413c1 chip=0x100413c1 rev=0x01 hdr=0x00
    cap 01[40] = powerspec 2 supports D0 D1 D2 D3 current D0
    cap 05[50] = MSI supports 32 messages, 64 bit
    cap 10[70] = PCI-Express 1 legacy endpoint
pcib5@pci0:4:0:0: class=0x060400 card=0xa28015d9 chip=0x35008086 rev=0x01 hdr=0x01
    cap 10[44] = PCI-Express 1 upstream port
    cap 01[70] = powerspec 2 supports D0 D3 current D0
    cap 0d[80] = PCI Bridge card=0xa28015d9
pcib7@pci0:4:0:3: class=0x060400 card=0xa28015d9 chip=0x350c8086 rev=0x01 hdr=0x01
    cap 10[44] = PCI-Express 1 PCI bridge
    cap 01[6c] = powerspec 2 supports D0 D3 current D0
    cap 0d[80] = PCI Bridge card=0xa28015d9
    cap 07[d8] = PCI-X bridge supports
pcib6@pci0:5:0:0: class=0x060400 card=0xa28015d9 chip=0x35108086 rev=0x01 hdr=0x01
    cap 10[44] = PCI-Express 1 downstream port
    cap 05[60] = MSI supports 1 message, 64 bit
    cap 01[70] = powerspec 2 supports D0 D3 current D0
    cap 0d[80] = PCI Bridge card=0xa28015d9
twa1@pci0:6:0:0: class=0x010400 card=0x100413c1 chip=0x100413c1 rev=0x01 hdr=0x00
    cap 01[40] = powerspec 2 supports D0 D1 D2 D3 current D0
    cap 05[50] = MSI supports 32 messages, 64 bit
    cap 10[70] = PCI-Express 1 legacy endpoint
igb0@pci0:8:0:0: class=0x020000 card=0x10a715d9 chip=0x10a78086 rev=0x02 hdr=0x00
    cap 01[40] = powerspec 2 supports D0 D3 current D0
    cap 05[50] = MSI supports 1 message, 64 bit
    cap 11[60] = MSI-X supports 10 messages in map 0x1c enabled
    cap 10[a0] = PCI-Express 2 endpoint
igb1@pci0:8:0:1: class=0x020000 card=0x10a715d9 chip=0x10a78086 rev=0x02 hdr=0x00
    cap 01[40] = powerspec 2 supports D0 D3 current D0
    cap 05[50] = MSI supports 1 message, 64 bit
    cap 11[60] = MSI-X supports 10 messages in map 0x1c enabled
    cap 10[a0] = PCI-Express 2 endpoint
vgapci0@pci0:10:1:0: class=0x030000 card=0xa28015d9 chip=0x515e1002 rev=0x02 hdr=0x00
    cap 01[50] = powerspec 2 supports D0 D1 D2 D3 current D0

vmstat -i (with msi):

interrupt                          total       rate
irq1: atkbd0                           2          0
irq14: ata0                          216          0
irq17: atapci1                    172855        200
irq23: ehci0                          12          0
irq48: twa0                         1472          1
irq54: twa1                         1895          2
cpu0: timer                      1722548       1998
irq256: igb0                         772          0
irq257: igb0                        2673          3
irq258: igb0                         485          0
irq259: igb0                        2121          2
irq260: igb0                        1319          1
irq261: igb0                           2          0
cpu1: timer                      1714417       1988
cpu2: timer                      1713997       1988
cpu3: timer                      1714220       1988
Total                            7049006       8177

vmstat -i (without msi):

interrupt                          total       rate
irq1: atkbd0                           2          0
irq14: ata0                          216          0
irq17: atapci1                    210359        536
irq23: ehci0                          11          0
irq48: twa0                         1331          3
irq54: twa1                         1751          4
irq56: igb0                         3733          9
cpu0: timer                       783575       1998
cpu1: timer                       775435       1978
cpu2: timer                       775251       1977
cpu3: timer                       775364       1977
Total                            3327028       8487

dmesg (without msi):

Copyright (c) 1992-2009 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 7.2-PRERELEASE #6: Mon Apr 13 13:30:07 CEST 2009
    adm...@space.neurobiopsychologie.Uni-Osnabrueck.DE:/usr/obj/usr/src/sys/SPACE
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU E5410 @ 2.33GHz (2327.51-MHz K8-class CPU)
  Origin = "GenuineIntel" Id = 0x10676 Stepping = 6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xce3bd<SSE3,RSVD2,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA,<b19>>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  Cores per package: 4
usable memory = 4280475648 (4082 MB)
avail memory = 4107509760 (3917 MB)
ACPI APIC Table: <PTLTD APIC >
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
 cpu2 (AP): APIC ID: 2
 cpu3 (AP): APIC ID: 3
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
ioapic2 <Version 2.0> irqs 48-71 on motherboard
kbd1 at kbdmux0
acpi0: <PTLTD XSDT> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
acpi_hpet0: <High Precision Event Timer> iomem 0xfed00000-0xfed003ff on acpi0
Timecounter "HPET" frequency 14318180 Hz quality 900
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> irq 48 at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
3ware device driver for 9000 series storage controllers, version: 3.70.05.001
twa0: <3ware 9000 series Storage Controller> port 0x2000-0x20ff mem 0xd8000000-0xd9ffffff,0xdc100000-0xdc100fff irq 48 at device 0.0 on pci1
twa0: [ITHREAD]
twa0: INFO: (0x04: 0x0001): Controller reset occurred: resets=3
twa0: INFO: (0x15: 0x1300): Controller details:: Model 9650SE-8LPML, 8 ports, Firmware FE9X 4.06.00.004, BIOS BE9X 4.05.00.015
pcib2: <ACPI PCI-PCI bridge> irq 50 at device 3.0 on pci0
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> irq 52 at device 5.0 on pci0
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> irq 54 at device 7.0 on pci0
pci4: <ACPI PCI bus> on pcib4
pcib5: <ACPI PCI-PCI bridge> irq 54 at device 0.0 on pci4
pci5: <ACPI PCI bus> on pcib5
pcib6: <ACPI PCI-PCI bridge> irq 54 at device 0.0 on pci5
pci6: <ACPI PCI bus> on pcib6
twa1: <3ware 9000 series Storage Controller> port 0x3000-0x30ff mem 0xda000000-0xdbffffff,0xdc400000-0xdc400fff irq 54 at device 0.0 on pci6
twa1: [ITHREAD]
twa1: INFO: (0x04: 0x0001): Controller reset occurred: resets=3
twa1: INFO: (0x15: 0x1300): Controller details:: Model 9650SE-8LPML, 8 ports, Firmware FE9X 4.06.00.004, BIOS BE9X 4.05.00.015
pcib7: <ACPI PCI-PCI bridge> at device 0.3 on pci4
pci7: <ACPI PCI bus> on pcib7
pcib8: <ACPI PCI-PCI bridge> irq 56 at device 9.0 on pci0
pci8: <ACPI PCI bus> on pcib8
igb0: <Intel(R) PRO/1000 Network Connection version - 1.4.1> port 0x4000-0x401f mem 0xdc020000-0xdc03ffff,0xdc000000-0xdc01ffff,0xdc080000-0xdc083fff irq 56 at device 0.0 on pci8
igb0: [FILTER]
igb0: Ethernet address: 00:30:48:c2:35:76
igb1: <Intel(R) PRO/1000 Network Connection version - 1.4.1> port 0x4020-0x403f mem 0xdc060000-0xdc07ffff,0xdc040000-0xdc05ffff,0xdc084000-0xdc087fff irq 70 at device 0.1 on pci8
igb1: [FILTER]
igb1: Ethernet address: 00:30:48:c2:35:77
pci0: <base peripheral> at device 15.0 (no driver attached)
pcib9: <ACPI PCI-PCI bridge> irq 16 at device 28.0 on pci0
pci9: <ACPI PCI bus> on pcib9
uhci0: <Intel 631XESB/632XESB/3100 USB controller USB-1> port 0x1800-0x181f irq 20 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
uhci0: [ITHREAD]
usb0: <Intel 631XESB/632XESB/3100 USB controller USB-1> on uhci0
usb0: USB revision 1.0
uhub0: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb0
uhub0: 2 ports with 2 removable, self powered
uhci1: <Intel 631XESB/632XESB/3100 USB controller USB-2> port 0x1820-0x183f irq 21 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
uhci1: [ITHREAD]
usb1: <Intel 631XESB/632XESB/3100 USB controller USB-2> on uhci1
usb1: USB revision 1.0
uhub1: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb1
uhub1: 2 ports with 2 removable, self powered
uhci2: <Intel 631XESB/632XESB/3100 USB controller USB-3> port 0x1840-0x185f irq 22 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
uhci2: [ITHREAD]
usb2: <Intel 631XESB/632XESB/3100 USB controller USB-3> on uhci2
usb2: USB revision 1.0
uhub2: <Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1> on usb2
uhub2: 2 ports with 2 removable, self powered
ehci0: <Intel 63XXESB USB 2.0 controller> mem 0xdc704000-0xdc7043ff irq 23 at device 29.7 on pci0
ehci0: [GIANT-LOCKED]
ehci0: [ITHREAD]
usb3: EHCI version 1.0
usb3: companion controllers, 2 ports each: usb0 usb1 usb2
usb3: <Intel 63XXESB USB 2.0 controller> on ehci0
usb3: USB revision 2.0
uhub3: <Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1> on usb3
uhub3: 6 ports with 6 removable, self powered
ums0: <Peppercon AG Multidevice, class 0/0, rev 2.00/0.01, addr 2> on uhub3
ums0: 3 buttons and Z dir.
ukbd0: <Peppercon AG Multidevice, class 0/0, rev 2.00/0.01, addr 2> on uhub3
kbd2 at ukbd0
pcib10: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci10: <ACPI PCI bus> on pcib10
vgapci0: <VGA-compatible display> port 0x5000-0x50ff mem 0xd0000000-0xd7ffffff,0xdc200000-0xdc20ffff irq 18 at device 1.0 on pci10
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel 63XXESB2 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x1860-0x186f at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ata0: [ITHREAD]
atapci1: <Intel AHCI controller> port 0x18b0-0x18b7,0x18a8-0x18ab,0x18a0-0x18a7,0x1874-0x1877,0x1880-0x189f mem 0xdc704400-0xdc7047ff irq 17 at device 31.2 on pci0
atapci1: [ITHREAD]
atapci1: AHCI Version 01.10 controller with 6 ports detected
ata2: <ATA channel 0> on atapci1
ata2: [ITHREAD]
ata3: <ATA channel 1> on atapci1
ata3: [ITHREAD]
ata4: <ATA channel 2> on atapci1
ata4: [ITHREAD]
ata5: <ATA channel 3> on atapci1
ata5: [ITHREAD]
ata6: <ATA channel 4> on atapci1
ata6: [ITHREAD]
ata7: <ATA channel 5> on atapci1
ata7: [ITHREAD]
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
acpi_button0: <Power Button> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: [GIANT-LOCKED]
psm0: [ITHREAD]
psm0: model IntelliMouse, device ID 3
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
sio0: [FILTER]
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
sio1: [FILTER]
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
cpu0: <ACPI CPU> on acpi0
ACPI Error (psargs-0459): [\\_SB_.BCMD] Namespace lookup failure, AE_NOT_FOUND
ACPI Error (psparse-0626): Method parse/execution failed [\\_PR_.CPU0._OSC] (Node 0xffffff0001608c20), AE_NOT_FOUND
ACPI Error (psparse-0626): Method parse/execution failed [\\_PR_.CPU0._PDC] (Node 0xffffff0001608c40), AE_NOT_FOUND
ACPI Error (psargs-0459): [\\_SB_.BCMD] Namespace lookup failure, AE_NOT_FOUND
ACPI Error (psparse-0626): Method parse/execution failed [\\_PR_.CPU0._OSC] (Node 0xffffff0001608c20), AE_NOT_FOUND
coretemp0: <CPU On-Die Thermal Sensors> on cpu0
est0: <Enhanced SpeedStep Frequency Control> on cpu0
p4tcc0: <CPU Frequency Thermal Control> on cpu0
cpu1: <ACPI CPU> on acpi0
coretemp1: <CPU On-Die Thermal Sensors> on cpu1
est1: <Enhanced SpeedStep Frequency Control> on cpu1
p4tcc1: <CPU Frequency Thermal Control> on cpu1
cpu2: <ACPI CPU> on acpi0
coretemp2: <CPU On-Die Thermal Sensors> on cpu2
est2: <Enhanced SpeedStep Frequency Control> on cpu2
p4tcc2: <CPU Frequency Thermal Control> on cpu2
cpu3: <ACPI CPU> on acpi0
coretemp3: <CPU On-Die Thermal Sensors> on cpu3
est3: <Enhanced SpeedStep Frequency Control> on cpu3
p4tcc3: <CPU Frequency Thermal Control> on cpu3
fdc0: <floppy drive controller> port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on acpi0
fdc0: does not respond
device_attach: fdc0 attach returned 6
ipmi0: <IPMI System Interface> on isa0
ipmi0: KCS mode found at io 0xca2 alignment 0x1 on isa
orm0: <ISA Option ROMs> at iomem 0xc0000-0xcafff,0xcb000-0xcd7ff,0xcd800-0xcf7ff,0xcf800-0xcffff on isa0
ppc0: cannot reserve I/O port range
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
acd0: DVDROM <DVD-ROM UJDA780/1.50> at ata0-slave UDMA33
ad4: 238475MB <Seagate ST3250310NS SN06> at ata2-master SATA150
ad6: 238475MB <Seagate ST3250310NS SN06> at ata3-master SATA300
ipmi0: IPMI device rev. 1, firmware rev. 1.2, version 2.0
ipmi0: Number of channels 8
ipmi0: Attached watchdog
da0 at twa0 bus 0 target 0 lun 0
da0: <AMCC 9650SE-8LP DISK 4.06> Fixed Direct Access SCSI-5 device
da0: 100.000MB/s transfers
da0: 715245MB (1464821760 512 byte sectors: 255H 63S/T 91180C)
da1 at twa0 bus 0 target 1 lun 0
da1: <AMCC 9650SE-8LP DISK 4.06> Fixed Direct Access SCSI-5 device
da1: 100.000MB/s transfers
da1: 715245MB (1464821760 512 byte sectors: 255H 63S/T 91180C)
da2 at twa0 bus 0 target 2 lun 0
da2: <AMCC 9650SE-8LP DISK 4.06> Fixed Direct Access SCSI-5 device
da2: 100.000MB/s transfers
da2: 715245MB (1464821760 512 byte sectors: 255H 63S/T 91180C)
da3 at twa0 bus 0 target 3 lun 0
da3: <AMCC 9650SE-8LP DISK 4.06> Fixed Direct Access SCSI-5 device
da3: 100.000MB/s transfers
da3: 715245MB (1464821760 512 byte sectors: 255H 63S/T 91180C)
da4 at twa0 bus 0 target 4 lun 0
da4: <AMCC 9650SE-8LP DISK 4.06> Fixed Direct Access SCSI-5 device
da4: 100.000MB/s transfers
da4: 715245MB (1464821760 512 byte sectors: 255H 63S/T 91180C)
da5 at twa0 bus 0 target 5 lun 0
da5: <AMCC 9650SE-8LP DISK 4.06> Fixed Direct Access SCSI-5 device
da5: 100.000MB/s transfers
da5: 715245MB (1464821760 512 byte sectors: 255H 63S/T 91180C)
da6 at twa0 bus 0 target 6 lun 0
da6: <AMCC 9650SE-8LP DISK 4.06> Fixed Direct Access SCSI-5 device
da6: 100.000MB/s transfers
da6: 715245MB (1464821760 512 byte sectors: 255H 63S/T 91180C)
da7 at twa0 bus 0 target 7 lun 0
da7: <AMCC 9650SE-8LP DISK 4.06> Fixed Direct Access SCSI-5 device
da7: 100.000MB/s transfers
da7: 715245MB (1464821760 512 byte sectors: 255H 63S/T 91180C)
da8 at twa1 bus 0 target 0 lun 0
da8: <AMCC 9650SE-8LP DISK 4.06> Fixed Direct Access SCSI-5 device
da8: 100.000MB/s transfers
da8: 715245MB (1464821760 512 byte sectors: 255H 63S/T 91180C)
da9 at twa1 bus 0 target 1 lun 0
da9: <AMCC 9650SE-8LP DISK 4.06> Fixed Direct Access SCSI-5 device
da9: 100.000MB/s transfers
da9: 715245MB (1464821760 512 byte sectors: 255H 63S/T 91180C)
da10 at twa1 bus 0 target 2 lun 0
da10: <AMCC 9650SE-8LP DISK 4.06> Fixed Direct Access SCSI-5 device
da10: 100.000MB/s transfers
da10: 715245MB (1464821760 512 byte sectors: 255H 63S/T 91180C)
da11 at twa1 bus 0 target 3 lun 0
da11: <AMCC 9650SE-8LP DISK 4.06> Fixed Direct Access SCSI-5 device
da11: 100.000MB/s transfers
da11: 715245MB (1464821760 512 byte sectors: 255H 63S/T 91180C)
da12 at twa1 bus 0 target 4 lun 0
da12: <AMCC 9650SE-8LP DISK 4.06> Fixed Direct Access SCSI-5 device
da12: 100.000MB/s transfers
da12: 715245MB (1464821760 512 byte sectors: 255H 63S/T 91180C)
da13 at twa1 bus 0 target 5 lun 0
da13: <AMCC 9650SE-8LP DISK 4.06> Fixed Direct Access SCSI-5 device
da13: 100.000MB/s transfers
da13: 715245MB (1464821760 512 byte sectors: 255H 63S/T 91180C)
da14 at twa1 bus 0 target 6 lun 0
da14: <AMCC 9650SE-8LP DISK 4.06> Fixed Direct Access SCSI-5 device
da14: 100.000MB/s transfers
da14: 715245MB (1464821760 512 byte sectors: 255H 63S/T 91180C)
da15 at twa1 bus 0 target 7 lun 0
da15: <AMCC 9650SE-8LP DISK 4.06> Fixed Direct Access SCSI-5 device
da15: 100.000MB/s transfers
da15: 715245MB (1464821760 512 byte sectors: 255H 63S/T 91180C)
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #3 Launched!
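
One way to confirm that the loader.conf change actually took effect is to look at which interrupt sources `vmstat -i` reports. The Python sketch below lists the devices that own vectors at irq256 or higher in a `vmstat -i` capture, using rows copied from Martin's two listings. It rests on an assumption that should be checked against your own kernel: that FreeBSD on x86 numbers MSI and MSI-X vectors from irq256 upward, with lower numbers being legacy pins. The function name and the regular expression are my own and purely illustrative.

```python
import re

# Assumption: MSI/MSI-X vectors are numbered from irq256 upward on this
# platform; anything below that is treated as a legacy (INTx) interrupt.
FIRST_MSI_IRQ = 256

# Matches rows such as "irq256: igb0   772   0". Rows like "cpu0: timer ..."
# and the "Total" line do not match and are ignored.
ROW_RE = re.compile(r"^irq(\d+):\s+(\S+)\s+(\d+)\s+(\d+)$")

# Rows copied from the "with msi" listing in the message.
WITH_MSI = """\
interrupt total rate
irq1: atkbd0 2 0
irq14: ata0 216 0
irq17: atapci1 172855 200
irq23: ehci0 12 0
irq48: twa0 1472 1
irq54: twa1 1895 2
cpu0: timer 1722548 1998
irq256: igb0 772 0
irq257: igb0 2673 3
irq258: igb0 485 0
irq259: igb0 2121 2
irq260: igb0 1319 1
irq261: igb0 2 0
Total 7049006 8177
"""

# Rows copied from the "without msi" listing in the message.
WITHOUT_MSI = """\
interrupt total rate
irq1: atkbd0 2 0
irq14: ata0 216 0
irq17: atapci1 210359 536
irq23: ehci0 11 0
irq48: twa0 1331 3
irq54: twa1 1751 4
irq56: igb0 3733 9
cpu0: timer 783575 1998
Total 3327028 8487
"""


def msi_devices(vmstat_i_text):
    """Return {device: [irq, ...]} for sources at or above FIRST_MSI_IRQ."""
    found = {}
    for line in vmstat_i_text.splitlines():
        match = ROW_RE.match(line.strip())
        if match is None:
            continue
        irq, device = int(match.group(1)), match.group(2)
        if irq >= FIRST_MSI_IRQ:
            found.setdefault(device, []).append(irq)
    return found


if __name__ == "__main__":
    print("with msi:   ", msi_devices(WITH_MSI))
    print("without msi:", msi_devices(WITHOUT_MSI))
```

Read this way, the listing taken with MSI shows only igb0 on vectors 256 to 261, while twa0 and twa1 sit on irq48 and irq54 in both listings. If the numbering assumption holds, that would suggest the tunables changed interrupt delivery for igb only on this machine, which may help narrow down which driver is involved.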