I wasn't sure if I should send this to -stable or -questions, so I apologize if I chose the wrong list.

I've had a pre-production server 'hang' twice within the past couple of weeks. The first time the machine locked up it was running 5.4-BETA1 and the second time it was running 5.4-RC1. The machine ran fine and did not exhibit similar symptoms while running -STABLE for several months prior to cvsupping during the 5.4 cycle.

By "hang" I mean that the machine stops responding to console keystrokes (serial or otherwise) while existing ssh and nfs connections stop responding, but do not immediately close. The machine continues to respond to ICMP pings. I cannot make new SSH connections, and my attempts eventually time out rather than giving me a connection refused error.

I wasn't able to find any clues in /var/log/messages. The only obvious similarity that I've found between the two crashes is that the serial console cable had been accidentally unplugged at some point before each of them. (Clarification, in the off chance that it matters: it's an RJ-45 from a portmaster to an RJ-45/DB9 adapter. The RJ-45 end was unplugged, while the adapter was still attached to the server.)

I tried to rule out the unplugged console cable as a contributor to this problem by removing the cable and letting the following run for a few minutes:

    # while true; do echo "blah" > /dev/console; done

The above loop ran for several minutes after unplugging the console cable without any obvious ill effects. I also plugged and unplugged the cable several times while the loop ran.

After cvsupping to 5.4-BETA1 I enabled gvinum to use as a volume manager (RAID is being handled by a 3ware Escalade 9500-8).

I'd appreciate any help in troubleshooting this. I will copy/paste my gvinum configuration followed by dmesg.boot at the end of this e-mail.

Thanks,
-Ash

GVINUM:

gvinum -> printconfig
# Vinum configuration of golem.anhedonia.com, saved at Sat Apr 9 14:36:20 2005
drive array00 device /dev/da1a
volume qbvol00
volume datavol00
plex name qbvol00.p0 org concat vol qbvol00
plex name datavol00.p0 org concat vol datavol00
sd name qbvol00.p0.s0 drive array00 len 20971520s driveoffset 265s plex qbvol00.p0 plexoffset 0s
sd name datavol00.p0.s0 drive array00 len 209715200s driveoffset 20971785s plex datavol00.p0 plexoffset 0s

gvinum -> l
1 drive:
D array00 State: up /dev/da1a A: 840991/953631 MB (88%)

2 volumes:
V qbvol00 State: up Plexes: 1 Size: 10 GB
V datavol00 State: up Plexes: 1 Size: 100 GB

2 plexes:
P qbvol00.p0 C State: up Subdisks: 1 Size: 10 GB
P datavol00.p0 C State: up Subdisks: 1 Size: 100 GB

2 subdisks:
S qbvol00.p0.s0 State: up D: array00 Size: 10 GB
S datavol00.p0.s0 State: up D: array00 Size: 100 GB

DMESG:

Copyright (c) 1992-2005 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 5.4-RC1 #0: Tue Apr 5 18:38:44 CDT 2005
    root@golem.anhedonia.com:/usr/obj/usr/src/sys/GOLEM
ACPI APIC Table: <PTLTD APIC >
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(TM) CPU 2.66GHz (2657.82-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0xf25  Stepping = 5
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Hyperthreading: 2 logical CPUs
real memory = 2146959360 (2047 MB)
avail memory = 2095423488 (1998 MB)
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
ioapic2 <Version 2.0> irqs 48-71 on motherboard
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <PTLTD RSDT> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
cpu0: <ACPI CPU> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pci0: <unknown> at device 0.1 (no driver attached)
pcib1: <ACPI PCI-PCI bridge> at device 2.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pci1: <base peripheral, interrupt controller> at device 28.0 (no driver attached)
pcib2: <ACPI PCI-PCI bridge> at device 29.0 on pci1
pci2: <ACPI PCI bus> on pcib2
pcib3: <PCI-PCI bridge> at device 2.0 on pci2
pci3: <PCI bus> on pcib3
em0: <Intel(R) PRO/1000 Network Connection, Version - 1.7.35> port 0x7000-0x703f mem 0xfb200000-0xfb21ffff irq 28 at device 4.0 on pci3
em0: Ethernet address: 00:04:23:ad:73:e8
em0: Speed:N/A Duplex:N/A
em1: <Intel(R) PRO/1000 Network Connection, Version - 1.7.35> port 0x7040-0x707f mem 0xfb220000-0xfb23ffff irq 29 at device 4.1 on pci3
em1: Ethernet address: 00:04:23:ad:73:e9
em1: Speed:N/A Duplex:N/A
em2: <Intel(R) PRO/1000 Network Connection, Version - 1.7.35> port 0x7080-0x70bf mem 0xfb240000-0xfb25ffff irq 30 at device 6.0 on pci3
em2: Ethernet address: 00:04:23:ad:73:ea
em2: Speed:N/A Duplex:N/A
em3: <Intel(R) PRO/1000 Network Connection, Version - 1.7.35> port 0x70c0-0x70ff mem 0xfb260000-0xfb27ffff irq 31 at device 6.1 on pci3
em3: Ethernet address: 00:04:23:ad:73:eb
em3: Speed:N/A Duplex:N/A
pci1: <base peripheral, interrupt controller> at device 30.0 (no driver attached)
pcib4: <ACPI PCI-PCI bridge> at device 31.0 on pci1
pci4: <ACPI PCI bus> on pcib4
3ware device driver for 9000 series storage controllers, version: 2.50.02.012
twa0: <3ware 9000 series Storage Controller> port 0x8000-0x80ff mem 0xfd800000-0xfdffffff,0xfb300000-0xfb3000ff irq 48 at device 1.0 on pci4
twa0: 8 ports, Firmware FE9X 2.04.00.005, BIOS BE9X 2.03.01.047
pci0: <unknown> at device 2.1 (no driver attached)
pcib5: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci5: <ACPI PCI bus> on pcib5
pci5: <display, VGA> at device 3.0 (no driver attached)
fxp0: <Intel 82550 Pro/100 Ethernet> port 0x9400-0x943f mem 0xfb400000-0xfb41ffff,0xfb441000-0xfb441fff irq 20 at device 4.0 on pci5
miibus0: <MII bus> on fxp0
inphy0: <i82555 10/100 media interface> on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
fxp0: Ethernet address: 00:0e:0c:4e:02:5e
em4: <Intel(R) PRO/1000 Network Connection, Version - 1.7.35> port 0x9440-0x947f mem 0xfb420000-0xfb43ffff irq 23 at device 5.0 on pci5
em4: Ethernet address: 00:0e:0c:4e:01:03
em4: Speed:N/A Duplex:N/A
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel ICH3 UDMA100 controller> port 0x6c60-0x6c6f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 31.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
acpi_button0: <Power Button> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A, console
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
ppc0: <ECP parallel printer port> port 0x778-0x77b,0x378-0x37f irq 7 drq 3 on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/9 bytes threshold
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
orm0: <ISA Option ROMs> at iomem 0xe3000-0xe3fff,0xc8000-0xc97ff,0xc0000-0xc7fff on isa0
pmtimer0 on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 2657824944 Hz quality 800
Timecounters tick every 10.000 msec
acd0: CDROM <SR244W/T01A> at ata0-master UDMA33
da0 at twa0 bus 0 target 0 lun 0
da0: <3ware Logical Disk 00 1.00> Fixed Direct Access SCSI-0 device
da0: 100.000MB/s transfers
da0: 76283MB (156227584 512 byte sectors: 255H 63S/T 9724C)
da1 at twa0 bus 0 target 1 lun 0
da1: <3ware Logical Disk 01 1.00> Fixed Direct Access SCSI-0 device
da1: 100.000MB/s transfers
da1: 953632MB (1953038336 512 byte sectors: 255H 63S/T 121571C)
Mounting root from ufs:/dev/da0s1a

Peter Jeremy
2005-Apr-09 15:03 UTC
5.4-RC1 Freezing, but pingable (may be related to gvinum)

On Sat, 2005-Apr-09 14:51:41 -0500, Ash wrote:
>By "hang" I mean that the machine stops responding to console keystrokes
>(serial or otherwise) while existing ssh and nfs connections stop
>responding, but do not immediately close. The machine continues to
>respond to ICMP pings. I can not make new SSH connections and my
>attempts eventually timeout rather than giving me a connection refused
>error.

This is consistent with the kernel continuing to run normally but being unable to schedule userland processes, usually due to a deadlock. Do the caps-lock, num-lock and scroll-lock keys on a local keyboard still toggle the relevant LEDs?

Assuming the LEDs toggle: Do you have "options DDB" and "options KDB" in your kernel? If so, can you break into DDB? (If not, I think you'll need to build a kernel with DDB.)

Once the system has hung, you need to enter DDB and run 'ps'. The output from that will (hopefully) give an indication as to what is going wrong (and where to look next). If you've built the kernel with debugging symbols and have a dump device enabled, "call doadump()" should also generate a crashdump, which will be much easier to examine.

Peter
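
The kernel options mentioned above could be added to the custom kernel config roughly as follows. This is only a sketch: it assumes the config file is the one named GOLEM in the dmesg build path, and the `BREAK_TO_DEBUGGER` and `makeoptions DEBUG=-g` lines are not from this thread but are assumptions based on stock FreeBSD 5.x options, so check them against sys/conf/NOTES in your source tree before relying on them.

```
# Additions to the custom kernel config (assumed here to be GOLEM)
options         KDB                 # kernel debugger framework
options         DDB                 # interactive debugger backend
options         BREAK_TO_DEBUGGER   # assumption: serial BREAK drops to the debugger
makeoptions     DEBUG=-g            # assumption: build with debugging symbols
```

With a kernel built this way, the steps described in the message are to run `ps` at the `db>` prompt and then `call doadump()`. A dump device has to be configured beforehand (for example through `dumpdev` in /etc/rc.conf); which partition to use is not stated in the thread.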