Had a machine at work crash; got a core dump, but I'm having some trouble getting kgdb to behave usefully, and could use a hint.

Machine is a dual Xeon, running:

out03# uname -rms
FreeBSD 6.2-RELEASE amd64
out03# 

Its primary workload is delivering mail to customers (similar to the role of mx2.freebsd.org, for those familiar with FreeBSD.org infrastructure). Like mx2, the MTA in use is Postfix. It runs a caching-only name server for its own use, and there's a Perl script that runs from time to time to scrape data out of /var/log/maillog and feed said data to some database machine somewhere. It also runs ntpd & sshd, and uses IPFW for packet-filtering.

My first approach was to copy the core dump & kernel.debug files to my work desktop (which runs 6.2-STABLE on i386 as of yesterday; I'm in the habit of tracking RELENG_6 every Sunday on that machine). Results of that were succinct:

catmint(6.2-S)[2] ls -ltr kernel.debug vmcore.0
-rw------- 1 dhw wheel 2913157120 Aug 13 05:49 vmcore.0
-rwxr-xr-x 1 dhw wheel 29215877 Aug 13 06:02 kernel.debug
catmint(6.2-S)[3] kgdb kernel.debug vmcore.0
kgdb: bad namelist
catmint(6.2-S)[4] echo $?
1
catmint(6.2-S)[5] 

I would prefer to continue the work from that machine, ideally.

On the chance that there's something odd about the different environments, I tried invoking kgdb on the machine that crashed:

out03# cd /usr/obj/usr/src/sys/SMP_IPFW/
out03# kgdb kernel.debug /var/spool/crash/vmcore.0
kgdb: kvm_read: kgdb: kvm_read: invalid address (0xffff67e9d231c931)
kgdb: kvm_read: invalid address (0xffff67e9d231c931)
...
kgdb: kvm_read: invalid address (0xffff67e9d231c931)
kgdb: kvm_read: invalid address (0xffff67e9d231c931)
^Ckvm_read: invalid address (0xffff67e9d231c931)
out03# 

It showed no indication of stopping; the novelty had worn off long since, and the machine is back in production at the moment, so I'd prefer to avoid disrupting that.

Checking /var/log/console.log, I see:

...
Aug 13 04:31:28 out03 kernel: 32-bit compatibility ldconfig path: /usr/lib32
Aug 13 04:31:28 out03 kernel: Checking for core dump on /dev/da0s3b...
Aug 13 04:31:28 out03 kernel: savecore: reboot after panic: page fault
Aug 13 04:31:28 out03 kernel:
Aug 13 04:31:28 out03 savecore: reboot after panic: page fault
Aug 13 04:31:28 out03 kernel: savecore: writing core to vmcore.0
Aug 13 04:47:17 out03 kernel: Script /etc/rc.d/savecore interrupted
Aug 13 04:47:17 out03 kernel: Initial amd64 initialization:
Aug 13 04:47:17 out03 kernel: .
Aug 13 04:47:17 out03 kernel: Additional ABI support:
...

I hadn't recalled noticing that "Script /etc/rc.d/savecore interrupted"; hmmm...

Here's dmesg.boot from its most recent boot; since I hadn't changed anything, it should resemble the system from before the crash pretty closely:

out03# cat /var/run/dmesg.boot
Copyright (c) 1992-2007 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 6.2-RELEASE #0: Wed Jan 31 09:03:09 PST 2007
    dhw@h239.dhw.mail-abuse.org:/usr/obj/usr/src/sys/SMP_IPFW
ACPI APIC Table: <PTLTD APIC >
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Xeon(R) CPU 5130 @ 2.00GHz (2000.08-MHz K8-class CPU)
  Origin = "GenuineIntel" Id = 0x6f6 Stepping = 6
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x4e33d<SSE3,RSVD2,MON,DS_CPL,VMX,TM2,<b9>,CX16,<b14>,<b15>,<b18>>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  Cores per package: 2
real memory = 5100273664 (4864 MB)
avail memory = 4122042368 (3931 MB)
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
 cpu0 (BSP): APIC ID: 0
 cpu1 (AP): APIC ID: 1
 cpu2 (AP): APIC ID: 6
 cpu3 (AP): APIC ID: 7
ioapic0 <Version 2.0> irqs 0-23 on motherboard
ioapic1 <Version 2.0> irqs 24-47 on motherboard
kbd1 at kbdmux0
ath_hal: 0.9.17.2 (AR5210, AR5211, AR5212, RF5111, RF5112, RF2413, RF5413)
acpi0: <PTLTD RSDT> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x1008-0x100b on acpi0
cpu0: <ACPI CPU> on acpi0
acpi_throttle0: <ACPI CPU Throttling> on cpu0
cpu1: <ACPI CPU> on acpi0
acpi_throttle1: <ACPI CPU Throttling> on cpu1
acpi_throttle1: failed to attach P_CNT
device_attach: acpi_throttle1 attach returned 6
cpu2: <ACPI CPU> on acpi0
acpi_throttle2: <ACPI CPU Throttling> on cpu2
acpi_throttle2: failed to attach P_CNT
device_attach: acpi_throttle2 attach returned 6
cpu3: <ACPI CPU> on acpi0
acpi_throttle3: <ACPI CPU Throttling> on cpu3
acpi_throttle3: failed to attach P_CNT
device_attach: acpi_throttle3 attach returned 6
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
pcib1: <ACPI PCI-PCI bridge> at device 2.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pcib2: <ACPI PCI-PCI bridge> irq 16 at device 0.0 on pci1
pci2: <ACPI PCI bus> on pcib2
pcib3: <ACPI PCI-PCI bridge> irq 16 at device 0.0 on pci2
pci3: <ACPI PCI bus> on pcib3
pcib4: <ACPI PCI-PCI bridge> irq 18 at device 2.0 on pci2
pci4: <ACPI PCI bus> on pcib4
em0: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port 0x2000-0x201f mem 0xda000000-0xda01ffff irq 18 at device 0.0 on pci4
em0: Ethernet address: 00:30:48:8b:94:72
em1: <Intel(R) PRO/1000 Network Connection Version - 6.2.9> port 0x2020-0x203f mem 0xda020000-0xda03ffff irq 19 at device 0.1 on pci4
em1: Ethernet address: 00:30:48:8b:94:73
pcib5: <ACPI PCI-PCI bridge> at device 0.3 on pci1
pci5: <ACPI PCI bus> on pcib5
3ware device driver for 9000 series storage controllers, version: 3.60.02.012
twa0: <3ware 9000 series Storage Controller> port 0x3000-0x303f mem 0xd8000000-0xd9ffffff,0xda100000-0xda100fff irq 24 at device 1.0 on pci5
twa0: [FAST]
twa0: INFO: (0x04: 0x003B): Rebuild paused: unit=1
twa0: INFO: (0x15: 0x1300): Controller details:: Model 9550SX-4LP, 4 ports, Firmware FE9X 3.04.01.011, BIOS BE9X 3.04.00.002
pci0: <base peripheral> at device 8.0 (no driver attached)
pcib6: <ACPI PCI-PCI bridge> irq 17 at device 28.0 on pci0
pci6: <ACPI PCI bus> on pcib6
uhci0: <UHCI (generic) USB controller> port 0x1800-0x181f irq 17 at device 29.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: <UHCI (generic) USB controller> on uhci0
usb0: USB revision 1.0
uhub0: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <UHCI (generic) USB controller> port 0x1820-0x183f irq 19 at device 29.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: <UHCI (generic) USB controller> on uhci1
usb1: USB revision 1.0
uhub1: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: <UHCI (generic) USB controller> port 0x1840-0x185f irq 18 at device 29.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: <UHCI (generic) USB controller> on uhci2
usb2: USB revision 1.0
uhub2: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
uhci3: <UHCI (generic) USB controller> port 0x1860-0x187f irq 16 at device 29.3 on pci0
uhci3: [GIANT-LOCKED]
usb3: <UHCI (generic) USB controller> on uhci3
usb3: USB revision 1.0
uhub3: Intel UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
ehci0: <EHCI (generic) USB 2.0 controller> mem 0xda600000-0xda6003ff irq 17 at device 29.7 on pci0
ehci0: [GIANT-LOCKED]
usb4: EHCI version 1.0
usb4: companion controllers, 2 ports each: usb0 usb1 usb2 usb3
usb4: <EHCI (generic) USB 2.0 controller> on ehci0
usb4: USB revision 2.0
uhub4: Intel EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub4: 8 ports with 8 removable, self powered
pcib7: <ACPI PCI-PCI bridge> at device 30.0 on pci0
pci7: <ACPI PCI bus> on pcib7
pci7: <display, VGA> at device 1.0 (no driver attached)
isab0: <PCI-ISA bridge> at device 31.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel 63XXESB2 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0x1880-0x188f at device 31.1 on pci0
ata0: <ATA channel 0> on atapci0
ata1: <ATA channel 1> on atapci0
pci0: <serial bus, SMBus> at device 31.3 (no driver attached)
acpi_button0: <Power Button> on acpi0
atkbdc0: <Keyboard controller (i8042)> port 0x60,0x64 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A, console
sio1: <16550A-compatible COM port> port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
ppc0: <ECP parallel printer port> port 0x378-0x37f,0x778-0x77f irq 7 drq 3 on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/9 bytes threshold
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
orm0: <ISA Option ROMs> at iomem 0xc0000-0xcafff,0xcb000-0xcc7ff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounters tick every 1.000 msec
ipfw2 (+ipv6) initialized, divert loadable, rule-based forwarding disabled, default to deny, logging unlimited
acd0: DMA limited to UDMA33, controller found non-ATA66 cable
acd0: DVDROM <SONY DVD-ROM DDU1615/GYS1> at ata0-master UDMA33
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
SMP: AP CPU #3 Launched!
da0 at twa0 bus 0 target 0 lun 0
da0: <AMCC 9550SX-4LP DISK 3.04> Fixed Direct Access SCSI-3 device
da0: 100.000MB/s transfers
da0: 76283MB (156227584 512 byte sectors: 255H 63S/T 9724C)
da1 at twa0 bus 0 target 1 lun 0
da1: <AMCC 9550SX-4LP DISK 3.04> Fixed Direct Access SCSI-3 device
da1: 100.000MB/s transfers
da1: 152577MB (312477696 512 byte sectors: 255H 63S/T 19450C)
Trying to mount root from ufs:/dev/da0s1a
em0: link state changed to UP
twa0: INFO: (0x04: 0x000B): Rebuild started: unit=1
out03# 

Hints and/or clues would be quite welcome; thanks!

Peace,
david
-- 
David H. Wolfskill				david@catwhisker.org
Anything and everything is a (potential) cat toy.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.
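
A rough sanity check on the figures quoted in this message, offered only as a sketch: console.log shows /etc/rc.d/savecore being interrupted, so it may be worth comparing the size of vmcore.0 with the memory sizes in dmesg.boot before blaming kgdb. The comparison assumes the kernel wrote a full memory dump, which nothing in the message confirms; the constant and function names below are hypothetical, and every number is copied from the transcripts above rather than measured.

```python
from datetime import datetime

# Figures copied from the message; nothing here is measured.
VMCORE_BYTES = 2913157120        # size of vmcore.0 in the `ls -ltr` output
REAL_MEMORY_BYTES = 5100273664   # "real memory" line in dmesg.boot
AVAIL_MEMORY_BYTES = 4122042368  # "avail memory" line in dmesg.boot

# Times taken from the console.log excerpt (same day, Aug 13).
SAVECORE_STARTED = "04:31:28"      # "savecore: writing core to vmcore.0"
SAVECORE_INTERRUPTED = "04:47:17"  # "Script /etc/rc.d/savecore interrupted"


def elapsed_seconds(start, end):
    """Seconds between two HH:MM:SS times on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


def fraction(part, whole):
    """Share of `whole` covered by `part`, as a float between 0 and 1."""
    return part / whole


if __name__ == "__main__":
    secs = elapsed_seconds(SAVECORE_STARTED, SAVECORE_INTERRUPTED)
    print(f"savecore ran for {secs} s before being interrupted")
    print(f"vmcore.0 vs real memory:  {fraction(VMCORE_BYTES, REAL_MEMORY_BYTES):.1%}")
    print(f"vmcore.0 vs avail memory: {fraction(VMCORE_BYTES, AVAIL_MEMORY_BYTES):.1%}")
```

If the full-dump assumption holds, a vmcore.0 well short of either memory figure would be consistent with a truncated dump, which might in turn explain the kvm_read errors seen on out03. It would not by itself explain the "bad namelist" result on the i386 desktop.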