I have a problem with RELENG_8 (FreeBSD/amd64 running a GENERIC kernel, last built 2012-02-08). It will panic during the daily periodic scripts that run at 3am. Here is the most recent panic message:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer = 0x20:0xffffffff8069d266
stack pointer = 0x28:0xffffff8094b90390
frame pointer = 0x28:0xffffff8094b903a0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = resume, IOPL = 0
current process = 72566 (ps)
trap number = 9
panic: general protection fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff8062cf8e at kdb_backtrace+0x5e
#1 0xffffffff805facd3 at panic+0x183
#2 0xffffffff808e6c20 at trap_fatal+0x290
#3 0xffffffff808e715a at trap+0x10a
#4 0xffffffff808cec64 at calltrap+0x8
#5 0xffffffff805ee034 at fill_kinfo_thread+0x54
#6 0xffffffff805eee76 at fill_kinfo_proc+0x586
#7 0xffffffff805f22b8 at sysctl_out_proc+0x48
#8 0xffffffff805f26c8 at sysctl_kern_proc+0x278
#9 0xffffffff8060473f at sysctl_root+0x14f
#10 0xffffffff80604a2a at userland_sysctl+0x14a
#11 0xffffffff80604f1a at __sysctl+0xaa
#12 0xffffffff808e62d4 at amd64_syscall+0x1f4
#13 0xffffffff808cef5c at Xfast_syscall+0xfc
Uptime: 3d19h6m0s
Dumping 1308 out of 2028 MB:..2%..12%..21%..31%..41%..51%..62%..71%..81%..91%
Dump complete
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...

The reason for the subject line is that I have another RELENG_8 system that uses ZFS + nullfs but doesn't panic, leading me to believe that ZFS + nullfs is not the problem. I am wondering if it is the combination of the three that is deadly here.

Both RELENG_8 systems are root-on-ZFS installs. Each night there is a separate backup script that runs and completes before the regular "periodic daily" run. This script takes a recursive snapshot of the ZFS pool and then mounts these snapshots via mount_nullfs to provide a coherent view of the filesystem under /backup. The only difference between the two RELENG_8 systems is that one uses rsync to back up /backup to another machine and the other uses the Linux Tivoli TSM client to back up /backup to a TSM server. After the backup is completed, a script runs that unmounts the nullfs file systems and then destroys the ZFS snapshot.

The first (rsync backup) RELENG_8 system does not panic. It has been running the ZFS + nullfs rsync backup job without incident for weeks now. The second (Tivoli TSM) RELENG_8 will reliably panic when the subsequent "periodic daily" job runs. (It is using the 32-bit TSM 6.2.4 Linux client running "dsmc schedule" via the linux_base-f10-10_4 package.) The actual ZFS + nullfs Tivoli TSM backup job appears to run successfully, making me wonder if perhaps it has some memory leak or other subtle corruption that sets up the ensuing panic when the "periodic daily" job later gives the system a workout.

If I can provide more information about the panic, please let me know. Despite the message about dumping in the panic output above, when the system reboots I get a "No core dumps found" message during boot. (I have dumpdev="AUTO" set in /etc/rc.conf.) My swap device is on separate partitions but is mirrored using geom_mirror as /dev/mirror/swap. Do crash dumps to gmirror devices work on RELENG_8?

Does anyone have any idea what is to blame for the panic, or how I can fix or work around it?

Cheers,

Paul.

PS: The uptime of three days in the panic message is because I disabled the Tivoli TSM backup job on Friday so it would not run over the weekend.
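
For readers trying to reproduce the setup, the nightly sequence described above (recursive snapshot, read-only nullfs mounts under /backup, backup run, unmount, snapshot destroy) might look roughly like the sketch below. The actual script was not posted, so this is only a guess: the pool name "tank", the snapshot name "nightly" and the mount point list are hypothetical, and the function only builds the command lines rather than running them.

```python
import posixpath
from typing import Dict, List


def plan_backup(pool: str, snap: str, mountpoints: List[str],
                backup_root: str = "/backup") -> Dict[str, List[List[str]]]:
    """Build (but do not run) the commands for one backup cycle.

    All names passed in are illustrative; the original script is unknown.
    """
    # Mount parents before children so nested mount points exist.
    ordered = sorted(mountpoints, key=lambda m: m.rstrip("/").count("/"))

    setup = [["zfs", "snapshot", "-r", f"{pool}@{snap}"]]
    targets = []
    for mp in ordered:
        # A ZFS snapshot is visible under <mountpoint>/.zfs/snapshot/<name>.
        src = posixpath.join(mp, ".zfs", "snapshot", snap)
        dst = posixpath.normpath(backup_root + "/" + mp.lstrip("/"))
        setup.append(["mount_nullfs", "-o", "ro", src, dst])
        targets.append(dst)

    # Tear down in reverse order: children first, then the snapshot itself.
    teardown = [["umount", dst] for dst in reversed(targets)]
    teardown.append(["zfs", "destroy", "-r", f"{pool}@{snap}"])
    return {"setup": setup, "teardown": teardown}


if __name__ == "__main__":
    plan = plan_backup("tank", "nightly", ["/usr", "/"])
    for phase in ("setup", "teardown"):
        for cmd in plan[phase]:
            print(phase, " ".join(cmd))
```

The backup client (rsync on one machine, "dsmc schedule" on the other) would run between the setup and teardown phases; that step is the only difference between the two systems.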

On Tue, Feb 14, 2012 at 09:38:18AM -0500, Paul Mather wrote:
> I have a problem with RELENG_8 (FreeBSD/amd64 running a GENERIC kernel, last built 2012-02-08). It will panic during the daily periodic scripts that run at 3am. Here is the most recent panic message:
>
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 0; apic id = 00
> instruction pointer = 0x20:0xffffffff8069d266
> stack pointer = 0x28:0xffffff8094b90390
> frame pointer = 0x28:0xffffff8094b903a0
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = resume, IOPL = 0
> current process = 72566 (ps)
> trap number = 9
> panic: general protection fault
> cpuid = 0
> KDB: stack backtrace:
> #0 0xffffffff8062cf8e at kdb_backtrace+0x5e
> #1 0xffffffff805facd3 at panic+0x183
> #2 0xffffffff808e6c20 at trap_fatal+0x290
> #3 0xffffffff808e715a at trap+0x10a
> #4 0xffffffff808cec64 at calltrap+0x8
> #5 0xffffffff805ee034 at fill_kinfo_thread+0x54
> #6 0xffffffff805eee76 at fill_kinfo_proc+0x586
> #7 0xffffffff805f22b8 at sysctl_out_proc+0x48
> #8 0xffffffff805f26c8 at sysctl_kern_proc+0x278
> #9 0xffffffff8060473f at sysctl_root+0x14f
> #10 0xffffffff80604a2a at userland_sysctl+0x14a
> #11 0xffffffff80604f1a at __sysctl+0xaa
> #12 0xffffffff808e62d4 at amd64_syscall+0x1f4
> #13 0xffffffff808cef5c at Xfast_syscall+0xfc
> Uptime: 3d19h6m0s
> Dumping 1308 out of 2028 MB:..2%..12%..21%..31%..41%..51%..62%..71%..81%..91%
> Dump complete
> Automatic reboot in 15 seconds - press a key on the console to abort
> Rebooting...
>
>
> The reason for the subject line is that I have another RELENG_8 system that uses ZFS + nullfs but doesn't panic, leading me to believe that ZFS + nullfs is not the problem. I am wondering if it is the combination of the three that is deadly, here.
>
> Both RELENG_8 systems are root-on-ZFS installs. Each night there is a separate backup script that runs and completes before the regular "periodic daily" run. This script takes a recursive snapshot of the ZFS pool and then mounts these snapshots via mount_nullfs to provide a coherent view of the filesystem under /backup. The only difference between the two RELENG_8 systems is that one uses rsync to back up /backup to another machine and the other uses the Linux Tivoli TSM client to back up /backup to a TSM server. After the backup is completed, a script runs that unmounts the nullfs file systems and then destroys the ZFS snapshot.
>
> The first (rsync backup) RELENG_8 system does not panic. It has been running the ZFS + nullfs rsync backup job without incident for weeks now. The second (Tivoli TSM) RELENG_8 will reliably panic when the subsequent "periodic daily" job runs. (It is using the 32-bit TSM 6.2.4 Linux client running "dsmc schedule" via the linux_base-f10-10_4 package.) The actual ZFS + nullfs Tivoli TSM backup job appears to run successfully, making me wonder if perhaps it has some memory leak or other subtle corruption that sets up the ensuing panic when the "periodic daily" job later gives the system a workout.
>
> If I can provide more information about the panic, please let me know. Despite the message about dumping in the panic output above, when the system reboots I get a "No core dumps found" message during boot. (I have dumpdev="AUTO" set in /etc/rc.conf.) My swap device is on separate partitions but is mirrored using geom_mirror as /dev/mirror/swap. Do crash dumps to gmirror devices work on RELENG_8?

See gmirror(8) man page, section NOTES. Read the full thing.

> Does anyone have any idea what is to blame for the panic, or how I can fix or work around it?

Does the panic always happen when "ps" is run? That's what's shown in the above panic message. Quoting:

> current process = 72566 (ps)

And I'm inclined to think it does, based on the backtrace:

> #5 0xffffffff805ee034 at fill_kinfo_thread+0x54
> #6 0xffffffff805eee76 at fill_kinfo_proc+0x586
> #7 0xffffffff805f22b8 at sysctl_out_proc+0x48
> #8 0xffffffff805f26c8 at sysctl_kern_proc+0x278

But if you can go through the previous panics and confirm that, it would be helpful to developers in tracking down the problem.

Sorry I can't be of any more assistance than this.

-- 
| Jeremy Chadwick                                 jdc at parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

On Tue, Feb 14, 2012 at 09:38:18AM -0500, Paul Mather wrote:
> I have a problem with RELENG_8 (FreeBSD/amd64 running a GENERIC kernel, last built 2012-02-08). It will panic during the daily periodic scripts that run at 3am. Here is the most recent panic message:
>
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 0; apic id = 00
> instruction pointer = 0x20:0xffffffff8069d266
> stack pointer = 0x28:0xffffff8094b90390
> frame pointer = 0x28:0xffffff8094b903a0
> code segment = base 0x0, limit 0xfffff, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags = resume, IOPL = 0
> current process = 72566 (ps)
> trap number = 9
> panic: general protection fault
> cpuid = 0
> KDB: stack backtrace:
> #0 0xffffffff8062cf8e at kdb_backtrace+0x5e
> #1 0xffffffff805facd3 at panic+0x183
> #2 0xffffffff808e6c20 at trap_fatal+0x290
> #3 0xffffffff808e715a at trap+0x10a
> #4 0xffffffff808cec64 at calltrap+0x8
> #5 0xffffffff805ee034 at fill_kinfo_thread+0x54
> #6 0xffffffff805eee76 at fill_kinfo_proc+0x586
> #7 0xffffffff805f22b8 at sysctl_out_proc+0x48
> #8 0xffffffff805f26c8 at sysctl_kern_proc+0x278
> #9 0xffffffff8060473f at sysctl_root+0x14f
> #10 0xffffffff80604a2a at userland_sysctl+0x14a
> #11 0xffffffff80604f1a at __sysctl+0xaa
> #12 0xffffffff808e62d4 at amd64_syscall+0x1f4
> #13 0xffffffff808cef5c at Xfast_syscall+0xfc

Please look up the line number for the fill_kinfo_thread+0x54.
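
One way to do that lookup, assuming a kernel image with debug symbols matching the panicking kernel is available, is to load it in kgdb and ask for the source line at symbol+offset with the gdb "list *ADDRESS" form. The small helper below is only an illustration (it is not from the thread, and its function names are made up): it extracts the frames from a pasted backtrace and prints the corresponding commands to type at the kgdb prompt.

```python
import re
from typing import List, Optional, Set, Tuple

# Matches KDB backtrace frames such as:
#   #5 0xffffffff805ee034 at fill_kinfo_thread+0x54
FRAME_RE = re.compile(r"#(\d+)\s+(0x[0-9a-f]+)\s+at\s+(\w+)\+(0x[0-9a-f]+)")


def parse_frames(text: str) -> List[Tuple[int, str, str, int]]:
    """Return (frame number, address, symbol, offset) for each frame found."""
    return [(int(num), addr, sym, int(off, 16))
            for num, addr, sym, off in FRAME_RE.findall(text)]


def kgdb_list_commands(text: str,
                       symbols: Optional[Set[str]] = None) -> List[str]:
    """Build gdb 'list *(symbol+offset)' commands, optionally filtered by symbol."""
    return [f"list *({sym}+{off:#x})"
            for _num, _addr, sym, off in parse_frames(text)
            if symbols is None or sym in symbols]


if __name__ == "__main__":
    trace = """
    > #5 0xffffffff805ee034 at fill_kinfo_thread+0x54
    > #6 0xffffffff805eee76 at fill_kinfo_proc+0x586
    """
    for cmd in kgdb_list_commands(trace, {"fill_kinfo_thread"}):
        print(cmd)
```

The printed command is meant to be entered in a kgdb session opened on the debug kernel; the result is only meaningful if that kernel is the exact build that panicked.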