As others have noted, Tor's patch appears to be a total solution to the
recent instability the PAE patch introduced. So, if you're experiencing
panics with a recent kernel, or are in a position to stress a machine,
please cvsup and give it a test!
Thanks,
Mike "Silby" Silbersack
---------- Forwarded message ----------
Date: Sat, 30 Aug 2003 08:39:08 -0700 (PDT)
From: Tor Egge <tegge@FreeBSD.org>
To: src-committers@FreeBSD.org, cvs-src@FreeBSD.org, cvs-all@FreeBSD.org
Subject: cvs commit: src/sys/i386/i386 genassym.c globals.s mp_machdep.c
pmap.c src/sys/i386/include globaldata.h globals.h
tegge 2003/08/30 08:39:08 PDT
FreeBSD src repository
Modified files: (Branch: RELENG_4)
sys/i386/i386 mp_machdep.c genassym.c globals.s pmap.c
sys/i386/include globaldata.h globals.h
Log:
Avoid conflict between temporary page table mappings performed by
interrupts and temporary page table mappings performed outside
interrupt context without splvm() protection. Interrupt time async
completion callbacks for pageout operations triggered this conflict.
Approved by: re (murray)
Revision Changes Path
1.86.2.5 +2 -0 src/sys/i386/i386/genassym.c
1.13.2.2 +5 -1 src/sys/i386/i386/globals.s
1.115.2.18 +5 -2 src/sys/i386/i386/mp_machdep.c
1.250.2.21 +71 -11 src/sys/i386/i386/pmap.c
1.11.2.3 +5 -2 src/sys/i386/include/globaldata.h
1.5.2.3 +4 -0 src/sys/i386/include/globals.h
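The log message describes two code paths sharing temporary page table mappings. An interrupt can arrive between "map the page" and "use the mapping", and code outside interrupt context runs without splvm() protection. The toy model below shows why that breaks and one general way to avoid it: give interrupt context its own slot. This is a hypothetical user-space sketch. `temp_slot`, `map_temp`, `simulated_interrupt` and `map_then_use` are invented names, and it does not claim to reproduce what the pmap.c change actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL toy model, not the pmap.c code: a temporary mapping
 * slot is reduced to "which physical page is currently mapped here". */
typedef struct {
    uint32_t mapped_pa;
} temp_slot;

static temp_slot normal_slot; /* used outside interrupt context */
static temp_slot intr_slot;   /* reserved for interrupt context */

static void map_temp(temp_slot *slot, uint32_t pa)
{
    assert(slot != NULL);
    slot->mapped_pa = pa;
}

/* Simulated interrupt-time completion callback that needs its own
 * temporary mapping.  With separate_slots == 0 it reuses normal_slot,
 * which models the conflict described in the commit log. */
static void simulated_interrupt(int separate_slots)
{
    map_temp(separate_slots ? &intr_slot : &normal_slot, 0xdead000u);
}

/* Code running outside interrupt context without splvm(): map a page,
 * take an interrupt, then use the mapping.  Returns the physical page
 * that is actually mapped at the moment of use. */
uint32_t map_then_use(uint32_t pa, int separate_slots)
{
    map_temp(&normal_slot, pa);
    simulated_interrupt(separate_slots); /* nothing blocks it here */
    return normal_slot.mapped_pa;
}
```

With one shared slot, `map_then_use(0x1000u, 0)` ends up touching page 0xdead000 instead of 0x1000, and nothing reports the error. The alternatives are to block the interrupt around the critical section (the splvm() protection the log mentions) or to keep the interrupt path off the shared slot, which is what `separate_slots` models.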
Our -stable machine has been rebooting every 24 hours since I upgraded it on Jul 18.
Then I ran cvsup again on Aug 31 at 03:00 JST (GMT +0900), but it still panics:
# gdb -k kernel.1 vmcore.1
IdlePTD at phsyical address 0x00367000
initial pcb at physical address 0x002c55c0
panicstr: page fault
panic messages:
---
Fatal trap 12: page fault while in kernel mode
fault virtual address = 0x5ea26fef
fault code = supervisor read, page not present
instruction pointer = 0x8:0xc01924b0
stack pointer = 0x10:0xc8bafd74
frame pointer = 0x10:0xc8bafd90
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 5674 (perl)
interrupt mask =
trap number = 12
panic: page fault
syncing disks... 18
done
Uptime: 23h55m53s
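The "fault code" and "processor eflags" lines are decoded from raw bit fields. The trap frame in the backtrace further down shows tf_err = 0 and tf_eflags = 66050 (0x10202). The sketch below turns those bits back into the panic wording using the standard i386 page-fault error-code bits (P, W/R, U/S) and EFLAGS bits. The function names are hypothetical, and the output strings are modeled on the panic text above rather than copied from trap.c.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* i386 page-fault error code bits (Intel SDM). */
#define PF_P 0x1u /* set: protection violation; clear: page not present */
#define PF_W 0x2u /* set: write access; clear: read access */
#define PF_U 0x4u /* set: user mode; clear: supervisor mode */

/* EFLAGS bits named in the "processor eflags" line. */
#define EFL_TF   0x00000100u /* trace trap */
#define EFL_IF   0x00000200u /* interrupts enabled */
#define EFL_IOPL 0x00003000u /* I/O privilege level, bits 12-13 */
#define EFL_NT   0x00004000u /* nested task */
#define EFL_RF   0x00010000u /* resume flag */
#define EFL_VM   0x00020000u /* virtual-8086 mode */

/* Format an error code like the "fault code" line of the panic. */
void describe_fault_code(uint32_t err, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s %s, %s",
             (err & PF_U) ? "user" : "supervisor",
             (err & PF_W) ? "write" : "read",
             (err & PF_P) ? "protection violation" : "page not present");
}

/* Append s to buf, truncating rather than overflowing. */
static void append(char *buf, size_t len, const char *s)
{
    size_t used = strlen(buf);
    if (used + 1 < len)
        strncat(buf, s, len - used - 1);
}

/* Format EFLAGS like the "processor eflags" line of the panic. */
void describe_eflags(uint32_t fl, char *buf, size_t len)
{
    size_t used;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    if (fl & EFL_TF) append(buf, len, "trace trap, ");
    if (fl & EFL_IF) append(buf, len, "interrupt enabled, ");
    if (fl & EFL_NT) append(buf, len, "nested task, ");
    if (fl & EFL_RF) append(buf, len, "resume, ");
    if (fl & EFL_VM) append(buf, len, "vm86, ");
    used = strlen(buf);
    snprintf(buf + used, len - used, "IOPL = %u",
             (unsigned)((fl & EFL_IOPL) >> 12));
}
```

Feeding in err = 0 gives "supervisor read, page not present", and eflags = 66050 gives "interrupt enabled, resume, IOPL = 0". Both match the panic message above.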
# dmesg -a
FreeBSD 4.9-PRERELEASE #6: Mon Sep 1 08:09:40 JST 2003
tss@stargate.tokai-ic.or.jp:/usr/src/sys/compile/STARGATE
Timecounter "i8254" frequency 1193182 Hz
Timecounter "TSC" frequency 768413581 Hz
CPU: Intel Celeron (768.41-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0x686 Stepping = 6
Features=0x383f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE>
real memory = 133083136 (129964K bytes)
avail memory = 126193664 (123236K bytes)
Preloaded elf kernel "kernel" at 0xc0348000.
Pentium Pro MTRR support enabled
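The `Features=0x383f9ff` value is the CPUID feature mask, and the names in angle brackets are its set bits. In this line, PAE (bit 6) only says the CPU supports PAE. Whether the kernel was built with the PAE option is a separate question. The following minimal decoder assumes the Intel bit layout for bits 0-25, with bits 10 and 20 treated as reserved. `decode_features` is a hypothetical helper, and the spellings are chosen to match this dmesg line.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Names for CPUID leaf 1 EDX bits 0-25; reserved bits are NULL. */
static const char *const feature_names[] = {
    "FPU", "VME", "DE", "PSE", "TSC", "MSR", "PAE", "MCE",
    "CX8", "APIC", NULL, "SEP", "MTRR", "PGE", "MCA", "CMOV",
    "PAT", "PSE36", "PN", "CLFLUSH", NULL, "DTS", "ACPI", "MMX",
    "FXSR", "SSE"
};

/* Write the comma-separated names of the set, known bits into buf. */
void decode_features(uint32_t edx, char *buf, size_t len)
{
    size_t nbits = sizeof feature_names / sizeof feature_names[0];

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t bit = 0; bit < nbits; bit++) {
        if (!(edx & (UINT32_C(1) << bit)) || feature_names[bit] == NULL)
            continue;
        if (buf[0] != '\0')
            strncat(buf, ",", len - strlen(buf) - 1);
        strncat(buf, feature_names[bit], len - strlen(buf) - 1);
    }
}
```

For 0x383f9ff, this reproduces the list printed above. The APIC bit (9) is clear on this Celeron.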
--
///////////////////////////////////////////////////////////////////////
// T.Suzuki @ Tokai Internet Council
///////////////////////////////////////////////////////////////////////
On Sat, Aug 30, 2003 at 02:20:48PM -0500, Mike Silbersack wrote:
> As others have noted, Tor's patch appears to be a total solution to the
> recent instability the PAE patch introduced. So, if you're experiencing
> panics with a recent kernel, or are in a position to stress a machine,
> please cvsup and give it a test!
FYI, I'm *not* seeing a 24-hour crash, just running the GENERIC kernel.
I've been cvsuping fairly regularly, and my reboots are for recompiles,
not crashes. Just a datapoint, since I haven't made any effort to use PAE.
FreeBSD pandora.jk.homeunix.net 4.9-PRERELEASE FreeBSD 4.9-PRERELEASE #6: Tue Sep 2 21:13:12 PDT 2003 root@pandora.jk.homeunix.net:/usr/src/sys/compile/GENERIC i386
Aug 27 06:45:42 pandora /kernel: FreeBSD 4.9-PRERELEASE #2: Mon Aug 25 17:34:08 PDT 2003
Aug 27 06:57:10 pandora /kernel: FreeBSD 4.9-PRERELEASE #3: Wed Aug 27 06:49:36 PDT 2003
Aug 30 09:12:34 pandora /kernel: FreeBSD 4.9-PRERELEASE #4: Sat Aug 30 09:09:03 PDT 2003
Sep 2 21:56:09 pandora /kernel: FreeBSD 4.9-PRERELEASE #6: Tue Sep 2 21:13:12 PDT 2003
Thanks to Mike and Silby, and sorry for my poor information.
I have the following options in the kernel config:
option DDB
option DDB_UNATTENDED
makeoptions DEBUG=-g
# gdb -k /usr/src/sys/compile/STARGATE/kernel.debug /var/crash/vmcore.1
(kgdb) bt
#0 dumpsys () at ../../kern/kern_shutdown.c:487
#1 0xc01562bf in boot (howto=256) at ../../kern/kern_shutdown.c:316
#2 0xc01566fd in panic (fmt=0xc029ccac "%s") at
../../kern/kern_shutdown.c:595
#3 0xc025f1e7 in trap_fatal (frame=0xc8bafd34, eva=1587703791) at
../../i386/i386/trap.c:974
#4 0xc025ee95 in trap_pfault (frame=0xc8bafd34, usermode=0, eva=1587703791) at
../../i386/i386/trap.c:867
#5 0xc025ea3b in trap (frame={tf_fs = -1060634608, tf_es = 16, tf_ds =
-927334384, tf_edi = -1060944235,
tf_esi = -1060963163, tf_ebp = -927269488, tf_isp = -927269536, tf_ebx =
1587703791,
tf_edx = -1060963128, tf_ecx = -1060963131, tf_eax = 28, tf_trapno = 12,
tf_err = 0,
tf_eip = -1072094032, tf_cs = 8, tf_eflags = 66050, tf_esp = 41216, tf_ss
= -1060944240})
at ../../i386/i386/trap.c:466
#6 0xc01924b0 in ifa_ifwithnet (addr=0xc0c34690) at ../../net/if.c:612
#7 0xc019ed31 in in_pcbladdr (inp=0xc81d5b00, nam=0xc0c34690,
plocal_sin=0xc8bafdc8)
at ../../netinet/in_pcb.c:459
#8 0xc019ee16 in in_pcbconnect (inp=0xc81d5b00, nam=0xc0c34690, p=0xc8a7cea0)
at ../../netinet/in_pcb.c:526
#9 0xc01b5373 in udp_output (inp=0xc81d5b00, m=0xc074d800, addr=0xc0c34690,
control=0x0, p=0xc8a7cea0)
at ../../netinet/udp_usrreq.c:708
#10 0xc01b5784 in udp_send (so=0xc8177a00, flags=0, m=0xc074d800,
addr=0xc0c34690, control=0x0, p=0xc8a7cea0)
at ../../netinet/udp_usrreq.c:920
#11 0xc01756b7 in sosend (so=0xc8177a00, addr=0xc0c34690, uio=0xc8bafecc,
top=0xc074d800, control=0x0, flags=0,
p=0xc8a7cea0) at ../../kern/uipc_socket.c:609
#12 0xc0178b37 in sendit (p=0xc8a7cea0, s=4, mp=0xc8baff0c, flags=0) at
../../kern/uipc_syscalls.c:590
#13 0xc0178c3a in sendto (p=0xc8a7cea0, uap=0xc8baff80) at
../../kern/uipc_syscalls.c:643
#14 0xc025f49d in syscall2 (frame={tf_fs = 135725103, tf_es = 135725103, tf_ds =
-1078001617,
tf_edi = 135172116, tf_esi = 135172112, tf_ebp = -1077937056, tf_isp =
-927268908, tf_ebx = 672126736,
tf_edx = 139254656, tf_ecx = 139260876, tf_eax = 133, tf_trapno = 0,
tf_err = 2, tf_eip = 672614432,
tf_cs = 31, tf_eflags = 659, tf_esp = -1077937148, tf_ss = 47}) at
../../i386/i386/trap.c:1175
#15 0xc0250785 in Xint0x80_syscall ()
#16 0x2807ef19 in ?? ()
#17 0x280e8c58 in ?? ()
#18 0x8048e79 in ?? ()
#19 0x8048d5a in ?? ()
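kgdb prints the trap frame fields as signed 32-bit integers, which hides the addresses. Converting them to unsigned recovers the raw register bits. tf_eip = -1072094032 is 0xc01924b0, the instruction pointer in the panic message and frame #6 (ifa_ifwithnet at if.c:612). tf_ebp = -927269488 is 0xc8bafd90, the frame pointer. tf_ebx = 1587703791 is 0x5ea26fef, the same value as the fault virtual address (eva=1587703791). That suggests the faulting instruction dereferenced whatever pointer was in ebx. The helpers `reg_bits` and `print_reg` are hypothetical names for this sketch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Converting int32_t to uint32_t is defined modulo 2^32, so this
 * recovers the register bits kgdb printed as a signed value. */
uint32_t reg_bits(int32_t printed)
{
    return (uint32_t)printed;
}

/* Print a field in the hex style the panic message uses. */
void print_reg(const char *name, int32_t printed)
{
    assert(name != NULL);
    printf("%-8s = 0x%08" PRIx32 "\n", name, reg_bits(printed));
}
```

For example, `print_reg("tf_eip", -1072094032)` prints the eip in the same form as the "instruction pointer" line.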
--
///////////////////////////////////////////////////////////////////////
// T.Suzuki @ Tokai Internet Council, Japan
///////////////////////////////////////////////////////////////////////
+--- On Wednesday, September 03, 2003 03:09 ---
| Mike Tancsa proclaimed:
| >| in
| >| /etc/rc.conf
| >| add
| >| dumpdev="/dev/ad0s1b" # Device name to crashdump to (or NO).
| >| dumpdir="/var/crash" # Directory where crash dumps are to be stored
| >
| >Ok, I am guessing the 'dumpdev' line is the boot-time equivalent of
| > running the dumpon(8) command to set the sysctl kern.dumpdev.
|
| Correct. The above also assumes that's where you have your swap. If it's
| not, then adjust accordingly.
|
Well, it has been a little over 24 hours, and I got a panic, but no dump!
Here is the log from the panic as well as the message stating that a dump
couldn't be performed:
//--start--//
Fatal trap 12: page fault while in kernel mode
fault virtual address = 0xbed557c5
fault code = supervisor read, page not present
instruction pointer = 0x8:0xc027c38d
stack pointer = 0x10:0xdea01ecc
frame pointer = 0x10:0xdea01ef4
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 5 (syncer)
interrupt mask = none
trap number = 12
panic: page fault
syncing disks... 8
done
Uptime: 1d0h9m53s
dumping to dev #ar/0x20001, offset 1279168
dump failed, reason: device doesn't support a dump routine
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...
//--end--//
I have a HighPoint IDE RAID controller in this box with a RAID 1
configuration (ar0) using two Seagate 120GB disks (ad4 and ad6). I put
this in /etc/rc.conf before I rebooted last night:
//--start--//
$ head /etc/rc.conf
dumpdev="/dev/ar0s1b" #swap device configured in /etc/fstab
dumpdir="/usr/var/crash" #using a dir under /usr as /var isn't big enough
//--end--//
Here is the output from 'disklabel -r ar0' showing that my swap device is
indeed /dev/ar0s1b:
//--start--//
$ disklabel -r ar0
# /dev/ar0c:
type: ESDI
disk: ar0s1
label:
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 14592
sectors/unit: 234436482
rpm: 3600
interleave: 1
trackskew: 0
cylinderskew: 0
headswitch: 0 # milliseconds
track-to-track seek: 0 # milliseconds
drivedata: 0
8 partitions:
# size offset fstype [fsize bsize bps/cpg]
a: 262144 0 4.2BSD 2048 16384 94 # (Cyl. 0 - 16*)
b: 2589856 262144 swap # (Cyl. 16*- 177*)
c: 234436482 0 unused 0 0 # (Cyl. 0 - 14592*)
e: 524288 2852000 4.2BSD 2048 16384 94 # (Cyl. 177*- 210*)
f: 524288 3376288 4.2BSD 2048 16384 94 # (Cyl. 210*- 242*)
g: 230535906 3900576 4.2BSD 2048 16384 89 # (Cyl. 242*- 14592*)
//--end--//
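The disklabel sizes and offsets are in 512-byte sectors. The swap partition b is therefore 2589856 × 512 = 1,326,006,272 bytes, about 1265 MiB. The sizes alone do not explain the failure, which the panic output attributes to the device lacking a dump routine. The sketch below reproduces the cylinder ranges in the comments, including the `*` marker for a boundary that is not cylinder-aligned. The end-cylinder formula is my reconstruction, not disklabel's source. I checked it against partitions a, b, c and e above. `partition`, `part_bytes` and `part_cyls` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* One disklabel partition; size and offset are in sectors. */
typedef struct {
    char     name;
    uint64_t size;
    uint64_t offset;
} partition;

/* Partition size in bytes. */
uint64_t part_bytes(const partition *p, uint64_t bytes_per_sector)
{
    return p->size * bytes_per_sector;
}

/* First and last cylinder touched by the partition; the *_partial
 * flags are set where disklabel would print '*'. */
void part_cyls(const partition *p, uint64_t secpercyl,
               uint64_t *first, int *first_partial,
               uint64_t *last, int *last_partial)
{
    uint64_t end = p->offset + p->size;

    assert(secpercyl > 0 && p->size > 0);
    *first = p->offset / secpercyl;
    *first_partial = (p->offset % secpercyl) != 0;
    *last = (end + secpercyl - 1) / secpercyl - 1;
    *last_partial = (end % secpercyl) != 0;
}
```

With sectors/cylinder = 16065, partition b comes out as "16*- 177*" and partition c as "0 - 14592*", matching the label above.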
So, what am I missing here in order to get a dump so that I can help debug
this problem? Is the ar0 device not able to be a dump device? Do I need to
install a dedicated IDE drive on the MB's IDE controller just so that I can
get a dump?
This is exacerbated by the fact that I will have to wait another 24 hours to
try again.
--
Mike
perl -e 'print
unpack("u","88V]N=&%C=\"!I;F9O(&EN(&AE861E<G,*");'