As others have noted, Tor's patch appears to be a total solution to the
recent instability the PAE patch introduced. So, if you're experiencing
panics with a recent kernel, or are in a position to stress a machine,
please cvsup and give it a test!

Thanks,

Mike "Silby" Silbersack

---------- Forwarded message ----------
Date: Sat, 30 Aug 2003 08:39:08 -0700 (PDT)
From: Tor Egge <tegge@FreeBSD.org>
To: src-committers@FreeBSD.org, cvs-src@FreeBSD.org, cvs-all@FreeBSD.org
Subject: cvs commit: src/sys/i386/i386 genassym.c globals.s mp_machdep.c
    pmap.c src/sys/i386/include globaldata.h globals.h

tegge       2003/08/30 08:39:08 PDT

  FreeBSD src repository

  Modified files:        (Branch: RELENG_4)
    sys/i386/i386        mp_machdep.c genassym.c globals.s pmap.c
    sys/i386/include     globaldata.h globals.h
  Log:
  Avoid conflict between temporary page table mappings performed by
  interrupts and temporary page table mappings performed outside
  interrupt context without splvm() protection.

  Interrupt time async completion callbacks for pageout operations
  triggered this conflict.

  Approved by:	re (murray)

  Revision      Changes    Path
  1.86.2.5      +2 -0      src/sys/i386/i386/genassym.c
  1.13.2.2      +5 -1      src/sys/i386/i386/globals.s
  1.115.2.18    +5 -2      src/sys/i386/i386/mp_machdep.c
  1.250.2.21    +71 -11    src/sys/i386/i386/pmap.c
  1.11.2.3      +5 -2      src/sys/i386/include/globaldata.h
  1.5.2.3       +4 -0      src/sys/i386/include/globals.h
Our -stable machine has been rebooting every 24 hours since I upgraded it
on Jul 18. I ran cvsup again on Aug 31 at 03:00 JST (GMT+0900), but it
still panics:

# gdb -k kernel.1 vmcore.1
IdlePTD at phsyical address 0x00367000
initial pcb at physical address 0x002c55c0
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0x5ea26fef
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc01924b0
stack pointer           = 0x10:0xc8bafd74
frame pointer           = 0x10:0xc8bafd90
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 5674 (perl)
interrupt mask
trap number             = 12
panic: page fault

syncing disks... 18 done
Uptime: 23h55m53s

# dmesg -a
FreeBSD 4.9-PRERELEASE #6: Mon Sep  1 08:09:40 JST 2003
    tss@stargate.tokai-ic.or.jp:/usr/src/sys/compile/STARGATE
Timecounter "i8254"  frequency 1193182 Hz
Timecounter "TSC"  frequency 768413581 Hz
CPU: Intel Celeron (768.41-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x686  Stepping = 6
  Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory  = 133083136 (129964K bytes)
avail memory = 126193664 (123236K bytes)
Preloaded elf kernel "kernel" at 0xc0348000.
Pentium Pro MTRR support enabled

-- 
///////////////////////////////////////////////////////////////////////
// T.Suzuki @ Tokai Internet Council
///////////////////////////////////////////////////////////////////////
On Sat, Aug 30, 2003 at 02:20:48PM -0500, Mike Silbersack wrote:
> As others have noted, Tor's patch appears to be a total solution to the
> recent instability the PAE patch introduced. So, if you're experiencing
> panics with a recent kernel, or are in a position to stress a machine,
> please cvsup and give it a test!

FYI, I'm *not* seeing a 24-hour crash. I'm just running the GENERIC
kernel. I've been cvsupping fairly regularly, and the reboots in the log
below are all from recompiles, not crashes. Just a data point, since I
haven't made any effort to use PAE.

FreeBSD pandora.jk.homeunix.net 4.9-PRERELEASE FreeBSD 4.9-PRERELEASE #6: Tue Sep  2 21:13:12 PDT 2003     root@pandora.jk.homeunix.net:/usr/src/sys/compile/GENERIC  i386

Aug 27 06:45:42 pandora /kernel: FreeBSD 4.9-PRERELEASE #2: Mon Aug 25 17:34:08 PDT 2003
Aug 27 06:57:10 pandora /kernel: FreeBSD 4.9-PRERELEASE #3: Wed Aug 27 06:49:36 PDT 2003
Aug 30 09:12:34 pandora /kernel: FreeBSD 4.9-PRERELEASE #4: Sat Aug 30 09:09:03 PDT 2003
Sep  2 21:56:09 pandora /kernel: FreeBSD 4.9-PRERELEASE #6: Tue Sep  2 21:13:12 PDT 2003
Thanks to Mike and Silby, and sorry for the poor information I gave
earlier.

I have the following options in the kernel:

option DDB
option DDB_UNATTENDED
makeoptions DEBUG=-g

# gdb -k /usr/src/sys/compile/STARGATE/kernel.debug /var/crash/vmcore.1
(kgdb) bt
#0  dumpsys () at ../../kern/kern_shutdown.c:487
#1  0xc01562bf in boot (howto=256) at ../../kern/kern_shutdown.c:316
#2  0xc01566fd in panic (fmt=0xc029ccac "%s") at ../../kern/kern_shutdown.c:595
#3  0xc025f1e7 in trap_fatal (frame=0xc8bafd34, eva=1587703791)
    at ../../i386/i386/trap.c:974
#4  0xc025ee95 in trap_pfault (frame=0xc8bafd34, usermode=0, eva=1587703791)
    at ../../i386/i386/trap.c:867
#5  0xc025ea3b in trap (frame={tf_fs = -1060634608, tf_es = 16,
      tf_ds = -927334384, tf_edi = -1060944235, tf_esi = -1060963163,
      tf_ebp = -927269488, tf_isp = -927269536, tf_ebx = 1587703791,
      tf_edx = -1060963128, tf_ecx = -1060963131, tf_eax = 28, tf_trapno = 12,
      tf_err = 0, tf_eip = -1072094032, tf_cs = 8, tf_eflags = 66050,
      tf_esp = 41216, tf_ss = -1060944240}) at ../../i386/i386/trap.c:466
#6  0xc01924b0 in ifa_ifwithnet (addr=0xc0c34690) at ../../net/if.c:612
#7  0xc019ed31 in in_pcbladdr (inp=0xc81d5b00, nam=0xc0c34690,
    plocal_sin=0xc8bafdc8) at ../../netinet/in_pcb.c:459
#8  0xc019ee16 in in_pcbconnect (inp=0xc81d5b00, nam=0xc0c34690, p=0xc8a7cea0)
    at ../../netinet/in_pcb.c:526
#9  0xc01b5373 in udp_output (inp=0xc81d5b00, m=0xc074d800, addr=0xc0c34690,
    control=0x0, p=0xc8a7cea0) at ../../netinet/udp_usrreq.c:708
#10 0xc01b5784 in udp_send (so=0xc8177a00, flags=0, m=0xc074d800,
    addr=0xc0c34690, control=0x0, p=0xc8a7cea0)
    at ../../netinet/udp_usrreq.c:920
#11 0xc01756b7 in sosend (so=0xc8177a00, addr=0xc0c34690, uio=0xc8bafecc,
    top=0xc074d800, control=0x0, flags=0, p=0xc8a7cea0)
    at ../../kern/uipc_socket.c:609
#12 0xc0178b37 in sendit (p=0xc8a7cea0, s=4, mp=0xc8baff0c, flags=0)
    at ../../kern/uipc_syscalls.c:590
#13 0xc0178c3a in sendto (p=0xc8a7cea0, uap=0xc8baff80)
    at ../../kern/uipc_syscalls.c:643
#14 0xc025f49d in syscall2 (frame={tf_fs = 135725103, tf_es = 135725103,
      tf_ds = -1078001617, tf_edi = 135172116, tf_esi = 135172112,
      tf_ebp = -1077937056, tf_isp = -927268908, tf_ebx = 672126736,
      tf_edx = 139254656, tf_ecx = 139260876, tf_eax = 133, tf_trapno = 0,
      tf_err = 2, tf_eip = 672614432, tf_cs = 31, tf_eflags = 659,
      tf_esp = -1077937148, tf_ss = 47}) at ../../i386/i386/trap.c:1175
#15 0xc0250785 in Xint0x80_syscall ()
#16 0x2807ef19 in ?? ()
#17 0x280e8c58 in ?? ()
#18 0x8048e79 in ?? ()
#19 0x8048d5a in ?? ()

-- 
///////////////////////////////////////////////////////////////////////
// T.Suzuki @ Tokai Internet Council, Japan
///////////////////////////////////////////////////////////////////////
+--- On Wednesday, September 03, 2003 03:09 ---
| Mike Tancsa proclaimed:
| >| in
| >| /etc/rc.conf
| >| add
| >| dumpdev="/dev/ad0s1b"   # Device name to crashdump to (or NO).
| >| dumpdir="/var/crash"    # Directory where crash dumps are to be stored
| >
| >Ok, I am guessing the 'dumpdev' line is the boot-time equivalent of
| >running the dumpon(8) command to set the sysctl kern.dumpdev.
|
| Correct. The above also assumes that's where you have your swap. If it's
| not, then adjust accordingly.
|

Well, it has been a little over 24 hours, and I got a panic, but no dump!
Here is the log from the panic, as well as the message stating that a dump
couldn't be performed:

//--start--//
Fatal trap 12: page fault while in kernel mode
fault virtual address   = 0xbed557c5
fault code              = supervisor read, page not present
instruction pointer     = 0x8:0xc027c38d
stack pointer           = 0x10:0xdea01ecc
frame pointer           = 0x10:0xdea01ef4
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, def32 1, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 5 (syncer)
interrupt mask          = none
trap number             = 12
panic: page fault

syncing disks... 8 done
Uptime: 1d0h9m53s
dumping to dev #ar/0x20001, offset 1279168
dump failed, reason: device doesn't support a dump routine
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
//--end--//

I have a HighPoint IDE RAID controller in this box with a RAID 1
configuration (ar0) using two Seagate 120GB disks (ad4 and ad6).
I put this in /etc/rc.conf before I rebooted last night:

//--start--//
$ head /etc/rc.conf
dumpdev="/dev/ar0s1b"       #swap device configured in /etc/fstab
dumpdir="/usr/var/crash"    #using a dir under /usr as /var isn't big enough
//--end--//

Here is the output from 'disklabel -r ar0', showing that my swap device is
indeed /dev/ar0s1b:

//--start--//
$ disklabel -r ar0
# /dev/ar0c:
type: ESDI
disk: ar0s1
label:
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 14592
sectors/unit: 234436482
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0           # milliseconds
track-to-track seek: 0  # milliseconds
drivedata: 0

8 partitions:
#         size    offset    fstype   [fsize bsize bps/cpg]
  a:    262144         0    4.2BSD     2048 16384    94   # (Cyl.    0 - 16*)
  b:   2589856    262144      swap                        # (Cyl.   16*- 177*)
  c: 234436482         0    unused        0     0         # (Cyl.    0 - 14592*)
  e:    524288   2852000    4.2BSD     2048 16384    94   # (Cyl.  177*- 210*)
  f:    524288   3376288    4.2BSD     2048 16384    94   # (Cyl.  210*- 242*)
  g: 230535906   3900576    4.2BSD     2048 16384    89   # (Cyl.  242*- 14592*)
//--end--//

So, what am I missing here in order to get a dump so that I can help debug
this problem? Can the ar0 device not be used as a dump device? Do I need
to install a dedicated IDE drive on the motherboard's IDE controller just
so that I can get a dump? What makes this worse is that I will have to
wait another 24 hours to try again.

-- 
Mike
perl -e 'print unpack("u","88V]N=&%C=\"!I;F9O(&EN(&AE861E<G,*");'