thr3ads.net - freebsd stable - ZFS + nullfs + Linuxulator = panic? [Feb 2012]

Paul Mather

2012-Feb-14 23:49 UTC

ZFS + nullfs + Linuxulator = panic?

I have a problem with RELENG_8 (FreeBSD/amd64 running a GENERIC kernel, last
built 2012-02-08).  It will panic during the daily periodic scripts that run at
3am.  Here is the most recent panic message:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x20:0xffffffff8069d266
stack pointer           = 0x28:0xffffff8094b90390
frame pointer           = 0x28:0xffffff8094b903a0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = resume, IOPL = 0
current process         = 72566 (ps)
trap number             = 9
panic: general protection fault
cpuid = 0
KDB: stack backtrace:
#0 0xffffffff8062cf8e at kdb_backtrace+0x5e
#1 0xffffffff805facd3 at panic+0x183
#2 0xffffffff808e6c20 at trap_fatal+0x290
#3 0xffffffff808e715a at trap+0x10a
#4 0xffffffff808cec64 at calltrap+0x8
#5 0xffffffff805ee034 at fill_kinfo_thread+0x54
#6 0xffffffff805eee76 at fill_kinfo_proc+0x586
#7 0xffffffff805f22b8 at sysctl_out_proc+0x48
#8 0xffffffff805f26c8 at sysctl_kern_proc+0x278
#9 0xffffffff8060473f at sysctl_root+0x14f
#10 0xffffffff80604a2a at userland_sysctl+0x14a
#11 0xffffffff80604f1a at __sysctl+0xaa
#12 0xffffffff808e62d4 at amd64_syscall+0x1f4
#13 0xffffffff808cef5c at Xfast_syscall+0xfc
Uptime: 3d19h6m0s
Dumping 1308 out of 2028 MB:..2%..12%..21%..31%..41%..51%..62%..71%..81%..91%
Dump complete
Automatic reboot in 15 seconds - press a key on the console to abort
Rebooting...


The reason for the subject line is that I have another RELENG_8 system that uses
ZFS + nullfs but doesn't panic, leading me to believe that ZFS + nullfs is
not the problem.  I am wondering if it is the combination of the three that is
deadly, here.

Both RELENG_8 systems are root-on-ZFS installs.  Each night there is a separate
backup script that runs and completes before the regular "periodic
daily" run.  This script takes a recursive snapshot of the ZFS pool and
then mounts these snapshots via mount_nullfs to provide a coherent view of the
filesystem under /backup.  The only difference between the two RELENG_8 systems
is that one uses rsync to back up /backup to another machine and the other uses
the Linux Tivoli TSM client to back up /backup to a TSM server.  After the
backup is completed, a script runs that unmounts the nullfs file systems and
then destroys the ZFS snapshot.
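
The snapshot, mount, back up, unmount, destroy cycle described above might be sketched roughly as follows. This is not the actual backup script, which was not posted; the pool name, dataset layout, and snapshot name are hypothetical, and the function only builds the command lists so they can be reviewed before anything is run. It assumes each dataset exposes its snapshot under its own .zfs/snapshot directory, which is why every dataset gets a separate mount_nullfs call.

```python
def plan_backup_commands(pool, datasets, snapshot="nightly", backup_root="/backup"):
    """Return (setup, teardown) command lists for one backup run.

    `datasets` maps dataset name -> mountpoint (hypothetical layout),
    e.g. {"zroot": "/", "zroot/usr": "/usr"}.
    Nothing is executed; the caller decides how to run the commands.
    """
    snap = f"{pool}@{snapshot}"
    # One recursive snapshot covers the pool and all child datasets.
    setup = [["zfs", "snapshot", "-r", snap]]
    targets = []
    # Sorting by path puts parents before children ("/" < "/usr" < "/usr/home").
    for mountpoint in sorted(datasets.values()):
        base = mountpoint.rstrip("/")
        source = f"{base}/.zfs/snapshot/{snapshot}"
        target = f"{backup_root}{base}"
        # Read-only nullfs mount of this dataset's snapshot directory.
        setup.append(["mount_nullfs", "-o", "ro", source, target])
        targets.append(target)
    # Unmount children first, then drop the snapshot recursively.
    teardown = [["umount", t] for t in reversed(targets)]
    teardown.append(["zfs", "destroy", "-r", snap])
    return setup, teardown


if __name__ == "__main__":
    setup, teardown = plan_backup_commands("zroot", {"zroot": "/", "zroot/usr": "/usr"})
    for cmd in setup + teardown:
        print(" ".join(cmd))
```

The backup client (rsync on one machine, the TSM client on the other) would run between the setup and teardown lists.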

The first (rsync backup) RELENG_8 system does not panic.  It has been running
the ZFS + nullfs rsync backup job without incident for weeks now.  The second
(Tivoli TSM) RELENG_8 will reliably panic when the subsequent "periodic
daily" job runs.  (It is using the 32-bit TSM 6.2.4 Linux client running
"dsmc schedule" via the linux_base-f10-10_4 package.)  The actual ZFS
+ nullfs Tivoli TSM backup job appears to run successfully, making me wonder if
perhaps it has some memory leak or other subtle corruption that sets up the
ensuing panic when the "periodic daily" job later gives the system a
workout.

If I can provide more information about the panic, please let me know.  Despite
the message about dumping in the panic output above, when the system reboots I
get a "No core dumps found" message during boot.  (I have
dumpdev="AUTO" set in /etc/rc.conf.)  My swap device is on separate
partitions but is mirrored using geom_mirror as /dev/mirror/swap.  Do crash
dumps to gmirror devices work on RELENG_8?

Does anyone have any idea what is to blame for the panic, or how I can fix or
work around it?

Cheers,

Paul.

PS: The uptime of three days in the panic message is because I disabled the
Tivoli TSM backup job on Friday so it would not run over the weekend.

Jeremy Chadwick

2012-Feb-15 00:24 UTC

On Tue, Feb 14, 2012 at 09:38:18AM -0500, Paul Mather wrote:
> I have a problem with RELENG_8 (FreeBSD/amd64 running a GENERIC kernel,
> last built 2012-02-08).  It will panic during the daily periodic scripts that
> run at 3am.  Here is the most recent panic message:
> 
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 0; apic id = 00
> instruction pointer     = 0x20:0xffffffff8069d266
> stack pointer           = 0x28:0xffffff8094b90390
> frame pointer           = 0x28:0xffffff8094b903a0
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = resume, IOPL = 0
> current process         = 72566 (ps)
> trap number             = 9
> panic: general protection fault
> cpuid = 0
> KDB: stack backtrace:
> #0 0xffffffff8062cf8e at kdb_backtrace+0x5e
> #1 0xffffffff805facd3 at panic+0x183
> #2 0xffffffff808e6c20 at trap_fatal+0x290
> #3 0xffffffff808e715a at trap+0x10a
> #4 0xffffffff808cec64 at calltrap+0x8
> #5 0xffffffff805ee034 at fill_kinfo_thread+0x54
> #6 0xffffffff805eee76 at fill_kinfo_proc+0x586
> #7 0xffffffff805f22b8 at sysctl_out_proc+0x48
> #8 0xffffffff805f26c8 at sysctl_kern_proc+0x278
> #9 0xffffffff8060473f at sysctl_root+0x14f
> #10 0xffffffff80604a2a at userland_sysctl+0x14a
> #11 0xffffffff80604f1a at __sysctl+0xaa
> #12 0xffffffff808e62d4 at amd64_syscall+0x1f4
> #13 0xffffffff808cef5c at Xfast_syscall+0xfc
> Uptime: 3d19h6m0s
> Dumping 1308 out of 2028 MB:..2%..12%..21%..31%..41%..51%..62%..71%..81%..91%
> Dump complete
> Automatic reboot in 15 seconds - press a key on the console to abort
> Rebooting...
> 
> 
> The reason for the subject line is that I have another RELENG_8 system that
> uses ZFS + nullfs but doesn't panic, leading me to believe that ZFS + nullfs
> is not the problem.  I am wondering if it is the combination of the three that
> is deadly, here.
>
> Both RELENG_8 systems are root-on-ZFS installs.  Each night there is a
> separate backup script that runs and completes before the regular "periodic
> daily" run.  This script takes a recursive snapshot of the ZFS pool and
> then mounts these snapshots via mount_nullfs to provide a coherent view of the
> filesystem under /backup.  The only difference between the two RELENG_8 systems
> is that one uses rsync to back up /backup to another machine and the other uses
> the Linux Tivoli TSM client to back up /backup to a TSM server.  After the
> backup is completed, a script runs that unmounts the nullfs file systems and
> then destroys the ZFS snapshot.
>
> The first (rsync backup) RELENG_8 system does not panic.  It has been
> running the ZFS + nullfs rsync backup job without incident for weeks now.  The
> second (Tivoli TSM) RELENG_8 will reliably panic when the subsequent
> "periodic daily" job runs.  (It is using the 32-bit TSM 6.2.4 Linux
> client running "dsmc schedule" via the linux_base-f10-10_4 package.)
> The actual ZFS + nullfs Tivoli TSM backup job appears to run successfully,
> making me wonder if perhaps it has some memory leak or other subtle corruption
> that sets up the ensuing panic when the "periodic daily" job later
> gives the system a workout.
>
> If I can provide more information about the panic, please let me know.
> Despite the message about dumping in the panic output above, when the system
> reboots I get a "No core dumps found" message during boot.  (I have
> dumpdev="AUTO" set in /etc/rc.conf.)  My swap device is on separate
> partitions but is mirrored using geom_mirror as /dev/mirror/swap.  Do crash
> dumps to gmirror devices work on RELENG_8?

See gmirror(8) man page, section NOTES.  Read the full thing.

> Does anyone have any idea what is to blame for the panic, or how I can fix
> or work around it?

Does the panic always happen when "ps" is run?  That's what's shown in
the above panic message.  Quoting:
> current process         = 72566 (ps)
And I'm inclined to think it does, based on the backtrace:
> #5 0xffffffff805ee034 at fill_kinfo_thread+0x54
> #6 0xffffffff805eee76 at fill_kinfo_proc+0x586
> #7 0xffffffff805f22b8 at sysctl_out_proc+0x48
> #8 0xffffffff805f26c8 at sysctl_kern_proc+0x278
But if you can go through the previous panics and confirm that, it would
be helpful to developers in tracking down the problem.

Sorry I can't be of any more assistance than this.
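
One way to go through the earlier panics and compare them is to pull the "current process" line and the backtrace symbols out of each saved console message. The snippet below is only a sketch: the regular expressions are written against the panic output quoted in this thread, and the function and field names are made up for illustration. It searches anywhere in the line, so it also copes with text that carries a "> " quote prefix.

```python
import re

# Matches e.g. "current process         = 72566 (ps)"
PROCESS_RE = re.compile(r"current process\s*=\s*(\d+)\s*\(([^)]*)\)")
# Matches e.g. "#5 0xffffffff805ee034 at fill_kinfo_thread+0x54"
FRAME_RE = re.compile(r"#(\d+)\s+(0x[0-9a-fA-F]+)\s+at\s+(\w+)\+(0x[0-9a-fA-F]+)")


def summarize_panic(text):
    """Extract the faulting process and the backtrace symbols from panic text."""
    proc = PROCESS_RE.search(text)
    frames = [m.group(3) for m in FRAME_RE.finditer(text)]
    return {
        "pid": int(proc.group(1)) if proc else None,
        "process": proc.group(2) if proc else None,
        "frames": frames,
    }


def same_signature(panics, depth=4):
    """True if every panic has the same process name and the same frames
    just below the trap handlers (frames #5 onward in the output above)."""
    keys = set()
    for text in panics:
        info = summarize_panic(text)
        keys.add((info["process"], tuple(info["frames"][5:5 + depth])))
    return len(keys) == 1
```

If every saved panic shows "ps" as the current process together with the same fill_kinfo_thread, fill_kinfo_proc, sysctl_out_proc, sysctl_kern_proc sequence, that answers the question above.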

-- 
| Jeremy Chadwick                              jdc at parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |

Konstantin Belousov

2012-Feb-15 01:04 UTC

On Tue, Feb 14, 2012 at 09:38:18AM -0500, Paul Mather wrote:
> I have a problem with RELENG_8 (FreeBSD/amd64 running a GENERIC kernel,
> last built 2012-02-08).  It will panic during the daily periodic scripts that
> run at 3am.  Here is the most recent panic message:
> 
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 0; apic id = 00
> instruction pointer     = 0x20:0xffffffff8069d266
> stack pointer           = 0x28:0xffffff8094b90390
> frame pointer           = 0x28:0xffffff8094b903a0
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = resume, IOPL = 0
> current process         = 72566 (ps)
> trap number             = 9
> panic: general protection fault
> cpuid = 0
> KDB: stack backtrace:
> #0 0xffffffff8062cf8e at kdb_backtrace+0x5e
> #1 0xffffffff805facd3 at panic+0x183
> #2 0xffffffff808e6c20 at trap_fatal+0x290
> #3 0xffffffff808e715a at trap+0x10a
> #4 0xffffffff808cec64 at calltrap+0x8
> #5 0xffffffff805ee034 at fill_kinfo_thread+0x54
> #6 0xffffffff805eee76 at fill_kinfo_proc+0x586
> #7 0xffffffff805f22b8 at sysctl_out_proc+0x48
> #8 0xffffffff805f26c8 at sysctl_kern_proc+0x278
> #9 0xffffffff8060473f at sysctl_root+0x14f
> #10 0xffffffff80604a2a at userland_sysctl+0x14a
> #11 0xffffffff80604f1a at __sysctl+0xaa
> #12 0xffffffff808e62d4 at amd64_syscall+0x1f4
> #13 0xffffffff808cef5c at Xfast_syscall+0xfc
Please look up the line number for the fill_kinfo_thread+0x54.
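
For readers unsure how to do that lookup: a common approach is to ask a debugger or addr2line to map the frame back to a source line, using a kernel file that carries debug symbols. The helper below is a rough sketch that only builds the command strings from a backtrace line; it does not run anything. The symbols path is an assumption (adjust it to wherever the debug kernel for the running build lives), and the function name is made up for illustration.

```python
import re

# Matches the "ADDRESS at SYMBOL+OFFSET" part of a KDB backtrace line.
FRAME_RE = re.compile(r"(0x[0-9a-fA-F]+)\s+at\s+(\w+)\+(0x[0-9a-fA-F]+)")


def lookup_commands(frame_line, symbols="/boot/kernel/kernel.symbols"):
    """Build source-line lookup commands for one backtrace frame.

    `symbols` is an assumed location of a kernel with debug information.
    """
    m = FRAME_RE.search(frame_line)
    if m is None:
        raise ValueError(f"not a backtrace frame: {frame_line!r}")
    address = int(m.group(1), 16)
    symbol = m.group(2)
    offset = int(m.group(3), 16)
    return {
        # Start of the function, useful for cross-checking against nm output.
        "symbol_start": hex(address - offset),
        # To be typed at the (k)gdb prompt once the debug kernel is loaded.
        "gdb": f"list *({symbol}+{offset:#x})",
        # Non-interactive alternative using binutils.
        "addr2line": f"addr2line -f -e {symbols} {address:#x}",
    }


if __name__ == "__main__":
    frame = "#5 0xffffffff805ee034 at fill_kinfo_thread+0x54"
    for name, value in lookup_commands(frame).items():
        print(f"{name}: {value}")
```

The result is only meaningful if the debug symbols come from exactly the same kernel build that panicked.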