Rick Updegrove
2004-Apr-02 05:03 UTC
upgrade from 4.8 SMP to 4.9 SMP causes unexplained rebooting!
(yes sorry for the cross-posting but I haven't been making much progress with this problem and my poor disks are taking a beating from all this rebooting)

A 4.8-STABLE machine I have been running with no problems for over 130 days straight uptime is now having unexplained reboots AFTER upgrading to 4.9 STABLE. The reboots are not every single day, or predictable, but they are happening almost every other day. It is a low-traffic qmail-scanner machine (7k messages a day) and the only reason I even upgraded was due to http://www.freebsd.org/releases/4.9R/errata.html

Now I am sort of wishing I did not : ) I lost all the uptime and now the unexplained rebooting... I hesitate to report this because most people point their fingers at the hardware. I am tempted to abandon this machine for another, but if anyone is interested in taking a look please advise.

I have the following:

#/etc/rc.conf
#rebooting
dumpdev=YES
savecore=YES
dumpdir="/var/crash"

I know this is not quite enough - I need to configure a dump device but I have no tape drive. Is there another way?

I rebuilt this kernel with:

makeoptions DEBUG=-g #Build kernel with gdb(1) debug symbols

Is there anything else I should do?

Right now I do not have the ability to attach a serial console to the crashing system and set the system to serial console. And even if I did have physical access I am not sure how to do that exactly... Is there another way to accomplish the debugging of this?

I have been running FreeBSD so long with no problems that I am sort of rusty at tracking them down, especially the elusive ones. So please point me in the right direction.

Thanks!

Rick

P.S. dmesg follows

Copyright (c) 1992-2003 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 4.9-STABLE #1: Wed Mar 24 08:06:56 PST 2004
    root@govmail.ca.gov:/usr/obj/usr/src/sys/SMP
Timecounter "i8254" frequency 1193182 Hz
CPU: Pentium III/Pentium III Xeon/Celeron (499.15-MHz 686-class CPU)
  Origin = "GenuineIntel" Id = 0x673 Stepping = 3
  Features=0x387fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,PN,MMX,FXSR,SSE>
real memory = 536870912 (524288K bytes)
avail memory = 519516160 (507340K bytes)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
FreeBSD/SMP: Multiprocessor motherboard: 2 CPUs
 cpu0 (BSP): apic id: 1, version: 0x00040011, at 0xfee00000
 cpu1 (AP): apic id: 0, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id: 2, version: 0x00170011, at 0xfec00000
Preloaded elf kernel "kernel" at 0xc0327000.
Pentium Pro MTRR support enabled
md0: Malloc disk
Using $PIR table, 14 entries at 0xc00fdee0
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Intel 82443BX host to PCI bridge (AGP disabled)> on motherboard
IOAPIC #0 intpin 19 -> irq 2
IOAPIC #0 intpin 17 -> irq 16
pci0: <PCI bus> on pcib0
isab0: <Intel 82371AB PCI to ISA bridge> at device 4.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <Intel PIIX4 ATA33 controller> port 0xfcd0-0xfcdf at device 4.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
pci0: <Intel 82371AB/EB (PIIX4) USB controller> at 4.2 irq 2
Timecounter "PIIX" frequency 3579545 Hz
chip1: <Intel 82371AB Power management controller> port 0x2180-0x218f at device 4.3 on pci0
pcib1: <PCI to PCI bridge (vendor=8086 device=0960)> at device 7.0 on pci0
IOAPIC #0 intpin 16 -> irq 17
pci1: <PCI bus> on pcib1
ahc0: <Adaptec 2940 Ultra SCSI adapter> port 0xe800-0xe8ff mem 0xfebfe000-0xfebfefff irq 17 at device 4.0 on pci1
aic7880: Ultra Wide Channel A, SCSI Id=7, 16/253 SCBs
pci1: <unknown card> (vendor=0x1000, dev=0x000c) at 7.0 irq 18
amr0: <LSILogic MegaRAID> mem 0xf0000000-0xf7ffffff irq 16 at device 7.1 on pci0
amr0: <Integrated HP NetRAID (T5)> Firmware D.02.05, BIOS B.01.04, 16MB RAM
pcib2: <DEC 21152 PCI-PCI bridge> at device 8.0 on pci0
pci2: <PCI bus> on pcib2
fxp0: <Intel 82558 Pro/100 Ethernet> port 0xdce0-0xdcff mem 0xfe900000-0xfe9fffff,0xefffe000-0xefffefff irq 16 at device 2.0 on pci2
fxp0: Ethernet address 00:90:27:b7:09:76
inphy0: <i82555 10/100 media interface> on miibus0
inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pci0: <unknown card> (vendor=0x103c, dev=0x10c1) at 11.0
pci0: <Cirrus Logic GD5446 SVGA controller> at 13.0
orm0: <Option ROMs> at iomem 0xc0000-0xc7fff,0xc8000-0xc87ff,0xc8800-0xc8fff,0xc9000-0xc97ff on isa0
pmtimer0 on isa0
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
psm0: <PS/2 Mouse> irq 12 on atkbdc0
psm0: model Generic PS/2 mouse, device ID 0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A
sio1: configured irq 3 not in bitmap of probed irqs 0
ppc0: parallel port not found.
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: Broken MP table detected: 8254 is not connected to IOAPIC #0 intpin 2
APIC_IO: routing 8254 via 8259 and IOAPIC #0 intpin 0
ata0-slave: ATAPI identify retries exceeded
acd0: CDROM <CD-532E-B> at ata0-master PIO4
Waiting 15 seconds for SCSI devices to settle
amrd0: <LSILogic MegaRAID logical drive> on amr0
amrd0: 34708MB (71081984 sectors) RAID 5 (optimal)
SMP: AP CPU #1 Launched!
Mounting root from ufs:/dev/amrd0s1a
WARNING: / was not properly dismounted
dumpon: YES : No such file or directory
swapon: adding /dev/amrd0s1b as swap device
Automatic boot in progress...
/dev/amrd0s1a: <snip>
Mike Harding
2004-Apr-02 10:38 UTC
upgrade from 4.8 SMP to 4.9 SMP causes unexplained rebooting!
I had rebooting problems with 4.9 that I did not have with 4.8 or a recent 4-STABLE. I would recommend going back to 4.8 and then moving to 4.10 when it is released. Alternatively, a recent 4-STABLE will be very close to 4.10.

Inevitably there will be several people on this group who will say that this is a sign of bad hardware, but if so, my hardware is very choosy about which releases it will fail with.

- Mike H.

> (yes sorry for the cross-posting but I haven't been making much progress
> with this problem and my poor disks are taking a beating from all this
> rebooting)
>
> A 4.8-STABLE machine I have been running with no problems for over 130
> days straight uptime is now having unexplained reboots AFTER upgrading
> to 4.9 STABLE.
> ...
Eugene Grosbein
2004-Apr-03 08:39 UTC
upgrade from 4.8 SMP to 4.9 SMP causes unexplained rebooting!
On Thu, Apr 01, 2004 at 11:26:13AM -0800, Rick Updegrove wrote:
> A 4.8-STABLE machine I have been running with no problems for over 130
> days straight uptime is now having unexplained reboots AFTER upgrading
> to 4.9 STABLE.
> only reason I even upgraded was due to
> http://www.freebsd.org/releases/4.9R/errata.html
>
> Now I am sort of wishing I did not : ) I lost all the uptime and now
> the unexplained rebooting... I hesitate to report this because most
> people point their fingers at the hardware. I am tempted to abandon
> this machine for another but if anyone is interested in taking a look
> please advise.

I have a similar situation. My SMP machine runs fine with pre-PAE 4.8-STABLE and freezes with 4.9-STABLE quite often. There is no kernel panic, no crashdump. So I was forced to downgrade it to 4.8-STABLE of 8 August 2003 and keep it secure by applying patches.

> I have the following:
>
> #/etc/rc.conf
> #rebooting
> dumpdev=YES
> savecore=YES
> dumpdir="/var/crash"
>
> I know this is not quite enough - I need to configure a dump device but I
> have no tape drive. Is there another way?

Yes. You should have a swap partition greater than or equal to the size of RAM, and enough free space in /var/crash (using symlinks is OK). Then point dumpdev to your swap like this (example!):

dumpdev=/dev/ad0s1b

> I rebuilt this kernel with:
>
> makeoptions DEBUG=-g #Build kernel with gdb(1) debug symbols
>
> Is there anything else I should do?

Just read the section about kernel debugging in the Developer's Handbook.

> Right now I do not have the ability to attach a serial console to the
> crashing system and set the system to serial console. And even if I did
> have physical access I am not sure how to do that exactly...

It should not be necessary if you manage to get a crashdump.

Eugene Grosbein
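Applied to Rick's machine, Eugene's advice might look like the /etc/rc.conf fragment below. This is a sketch under stated assumptions, not a tested configuration: it assumes that /dev/amrd0s1b (the swap device named in Rick's dmesg) is at least as large as the machine's 512 MB of RAM (`real memory = 536870912`), that /var/crash has at least that much free space, and that the amr(4) driver on this release is able to write crash dumps. The key point is that `dumpdev` takes a device path; the `dumpon: YES : No such file or directory` line in the dmesg suggests that `dumpdev=YES` was handed to dumpon(8) as a literal file name.

```sh
# /etc/rc.conf (fragment): sketch for the machine in this thread.
# Assumption: /dev/amrd0s1b is the swap partition (per the dmesg)
# and is at least as large as physical RAM (512 MB here).
dumpdev="/dev/amrd0s1b"   # dump to swap; must be a device path, not YES
dumpdir="/var/crash"      # where savecore(8) copies the dump at next boot
```

The `savecore=YES` line is left out on purpose: on 4.x, savecore(8) appears to be run at boot whenever `dumpdev` is set, but check /etc/defaults/rc.conf on the machine to confirm which variables it actually reads. After the next panic and reboot, savecore should leave a kernel.N/vmcore.N pair in /var/crash, which can then be opened along the lines described in the Developer's Handbook, e.g. `gdb -k /usr/obj/usr/src/sys/SMP/kernel.debug /var/crash/vmcore.0` (the kernel.debug path is an assumption based on the build directory in the dmesg banner). As Eugene's own case shows, this only helps if the machine actually panics; a hard freeze leaves no dump.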