Eugene Grosbein
2018-Apr-17 09:30 UTC
stable/11 r332356 started panicing in bpf_dtor/__mtx_lock_sleep
Hi!
I have a server that was running stable/11 rock-stable for many months.
A week ago I've updates it to 11.1-STABLE r332356 and today it paniced and I
have crashdump.
Any thoughts?
Unread portion of the kernel message buffer:
Sleeping thread (tid 100444, pid 28400) owns a non-sleepable lock
KDB: stack backtrace of thread 100444:
sched_switch() at sched_switch+0x626/frame 0xfffffe03550f9740
mi_switch() at mi_switch+0xc5/frame 0xfffffe03550f9770
sleepq_wait() at sleepq_wait+0x2c/frame 0xfffffe03550f97a0
_sx_xlock_hard() at _sx_xlock_hard+0x2f0/frame 0xfffffe03550f9830
vlan_ioctl() at vlan_ioctl+0x53f/frame 0xfffffe03550f98b0
ifpromisc() at ifpromisc+0x106/frame 0xfffffe03550f9910
bpf_detachd_locked() at bpf_detachd_locked+0x1b4/frame 0xfffffe03550f9960
bpf_dtor() at bpf_dtor+0x9a/frame 0xfffffe03550f9990
devfs_fpdrop() at devfs_fpdrop+0x9c/frame 0xfffffe03550f99b0
devfs_close_f() at devfs_close_f+0x45/frame 0xfffffe03550f99e0
closef() at closef+0x209/frame 0xfffffe03550f9a70
closefp() at closefp+0x8b/frame 0xfffffe03550f9ab0
amd64_syscall() at amd64_syscall+0x25c/frame 0xfffffe03550f9bf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe03550f9bf0
--- syscall (6, FreeBSD ELF64, sys_close), rip = 0x4a317a, rsp = 0x7fffffffe3d8,
rbp = 0x7fffffffe3f0 ---
panic: sleeping thread
cpuid = 1
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0354c6d7c0
vpanic() at vpanic+0x177/frame 0xfffffe0354c6d820
panic() at panic+0x43/frame 0xfffffe0354c6d880
propagate_priority() at propagate_priority+0x183/frame 0xfffffe0354c6d8b0
turnstile_wait() at turnstile_wait+0x2bc/frame 0xfffffe0354c6d900
__mtx_lock_sleep() at __mtx_lock_sleep+0x151/frame 0xfffffe0354c6d960
bpf_dtor() at bpf_dtor+0x191/frame 0xfffffe0354c6d990
devfs_fpdrop() at devfs_fpdrop+0x9c/frame 0xfffffe0354c6d9b0
devfs_close_f() at devfs_close_f+0x45/frame 0xfffffe0354c6d9e0
closef() at closef+0x209/frame 0xfffffe0354c6da70
closefp() at closefp+0x8b/frame 0xfffffe0354c6dab0
amd64_syscall() at amd64_syscall+0x25c/frame 0xfffffe0354c6dbf0
fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0354c6dbf0
--- syscall (6, FreeBSD ELF64, sys_close), rip = 0x4a317a, rsp = 0x7fffffffe3d8,
rbp = 0x7fffffffe3f0 ---
Uptime: 6d2h0m17s
Dumping 1414 out of 12265 MB:..2%..11%..21%..31%..41%..51%..62%..71%..81%..91%
(kgdb) bt
#0 doadump (textdump=1) at pcpu.h:229
#1 0xffffffff804f3b9a in kern_reboot (howto=260) at
/usr/local/src/sys/kern/kern_shutdown.c:383
#2 0xffffffff804f3f81 in vpanic (fmt=<value optimized out>, ap=<value
optimized out>)
at /usr/local/src/sys/kern/kern_shutdown.c:776
#3 0xffffffff804f3dc3 in panic (fmt=<value optimized out>) at
/usr/local/src/sys/kern/kern_shutdown.c:707
#4 0xffffffff80546b33 in propagate_priority (td=<value optimized out>) at
/usr/local/src/sys/kern/subr_turnstile.c:226
#5 0xffffffff8054720c in turnstile_wait (ts=0xfffff80151ac4180, owner=<value
optimized out>, queue=<value optimized out>)
at /usr/local/src/sys/kern/subr_turnstile.c:742
#6 0xffffffff804dda81 in __mtx_lock_sleep (c=0xffffffff80c9c910, v=<value
optimized out>)
at /usr/local/src/sys/kern/kern_mutex.c:627
#7 0xffffffff805cd061 in bpf_dtor (data=0xfffff801512df800) at
/usr/local/src/sys/net/bpf.c:778
#8 0xffffffff80455e5c in devfs_fpdrop (fp=<value optimized out>) at
/usr/local/src/sys/fs/devfs/devfs_vnops.c:193
#9 0xffffffff804595d5 in devfs_close_f (fp=0xfffff802af2fe6e0, td=<value
optimized out>)
at /usr/local/src/sys/fs/devfs/devfs_vnops.c:675
#10 0xffffffff804b1549 in closef (fp=0xfffff802af2fe6e0, td=0xfffff8000cf75000)
at file.h:346
#11 0xffffffff804af00b in closefp (fdp=0xfffff80083b48000, fd=<value
optimized out>, fp=0xfffff802af2fe6e0,
td=0xfffff8000cf75000, holdleaders=0) at
/usr/local/src/sys/kern/kern_descrip.c:1191
#12 0xffffffff807afb6c in amd64_syscall (td=0xfffff8000cf75000, traced=0) at
subr_syscall.c:132
#13 0xffffffff80791c3d in fast_syscall_common () at
/usr/local/src/sys/amd64/amd64/exception.S:480
#14 0x00000000004a317a in ?? ()
Previous frame inner to this frame (corrupt stack?)
Eugene Grosbein
2018-Apr-17 10:59 UTC
stable/11 r332356 started panicing in bpf_dtor/__mtx_lock_sleep
On 17.04.2018 16:30, Eugene Grosbein wrote: CCing mjoras@ as author of suspicious change https://svnweb.freebsd.org/base?view=revision&revision=323477> I have a server that was running stable/11 rock-stable for many months.It was running stable/11 r314043 before last update.> A week ago I've updates it to 11.1-STABLE r332356 and today it paniced and I have crashdump. > Any thoughts? > > Unread portion of the kernel message buffer: > Sleeping thread (tid 100444, pid 28400) owns a non-sleepable lock > KDB: stack backtrace of thread 100444: > sched_switch() at sched_switch+0x626/frame 0xfffffe03550f9740 > mi_switch() at mi_switch+0xc5/frame 0xfffffe03550f9770 > sleepq_wait() at sleepq_wait+0x2c/frame 0xfffffe03550f97a0 > _sx_xlock_hard() at _sx_xlock_hard+0x2f0/frame 0xfffffe03550f9830 > vlan_ioctl() at vlan_ioctl+0x53f/frame 0xfffffe03550f98b0 > ifpromisc() at ifpromisc+0x106/frame 0xfffffe03550f9910 > bpf_detachd_locked() at bpf_detachd_locked+0x1b4/frame 0xfffffe03550f9960 > bpf_dtor() at bpf_dtor+0x9a/frame 0xfffffe03550f9990 > devfs_fpdrop() at devfs_fpdrop+0x9c/frame 0xfffffe03550f99b0 > devfs_close_f() at devfs_close_f+0x45/frame 0xfffffe03550f99e0 > closef() at closef+0x209/frame 0xfffffe03550f9a70 > closefp() at closefp+0x8b/frame 0xfffffe03550f9ab0 > amd64_syscall() at amd64_syscall+0x25c/frame 0xfffffe03550f9bf0 > fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe03550f9bf0 > --- syscall (6, FreeBSD ELF64, sys_close), rip = 0x4a317a, rsp = 0x7fffffffe3d8, rbp = 0x7fffffffe3f0 --- > panic: sleeping thread > cpuid = 1 > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0354c6d7c0 > vpanic() at vpanic+0x177/frame 0xfffffe0354c6d820 > panic() at panic+0x43/frame 0xfffffe0354c6d880 > propagate_priority() at propagate_priority+0x183/frame 0xfffffe0354c6d8b0 > turnstile_wait() at turnstile_wait+0x2bc/frame 0xfffffe0354c6d900 > __mtx_lock_sleep() at __mtx_lock_sleep+0x151/frame 0xfffffe0354c6d960 > bpf_dtor() at bpf_dtor+0x191/frame 0xfffffe0354c6d990 > devfs_fpdrop() at devfs_fpdrop+0x9c/frame 0xfffffe0354c6d9b0 > devfs_close_f() at devfs_close_f+0x45/frame 0xfffffe0354c6d9e0 > closef() at closef+0x209/frame 0xfffffe0354c6da70 > closefp() at closefp+0x8b/frame 0xfffffe0354c6dab0 > amd64_syscall() at amd64_syscall+0x25c/frame 0xfffffe0354c6dbf0 > fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0354c6dbf0 > --- syscall (6, FreeBSD ELF64, sys_close), rip = 0x4a317a, rsp = 0x7fffffffe3d8, rbp = 0x7fffffffe3f0 --- > Uptime: 6d2h0m17s > Dumping 1414 out of 12265 MB:..2%..11%..21%..31%..41%..51%..62%..71%..81%..91% > > > (kgdb) bt > #0 doadump (textdump=1) at pcpu.h:229 > #1 0xffffffff804f3b9a in kern_reboot (howto=260) at /usr/local/src/sys/kern/kern_shutdown.c:383 > #2 0xffffffff804f3f81 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) > at /usr/local/src/sys/kern/kern_shutdown.c:776 > #3 0xffffffff804f3dc3 in panic (fmt=<value optimized out>) at /usr/local/src/sys/kern/kern_shutdown.c:707 > #4 0xffffffff80546b33 in propagate_priority (td=<value optimized out>) at /usr/local/src/sys/kern/subr_turnstile.c:226 > #5 0xffffffff8054720c in turnstile_wait (ts=0xfffff80151ac4180, owner=<value optimized out>, queue=<value optimized out>) > at /usr/local/src/sys/kern/subr_turnstile.c:742 > #6 0xffffffff804dda81 in __mtx_lock_sleep (c=0xffffffff80c9c910, v=<value optimized out>) > at /usr/local/src/sys/kern/kern_mutex.c:627 > #7 0xffffffff805cd061 in bpf_dtor (data=0xfffff801512df800) at /usr/local/src/sys/net/bpf.c:778 > #8 0xffffffff80455e5c in devfs_fpdrop (fp=<value optimized out>) at /usr/local/src/sys/fs/devfs/devfs_vnops.c:193 > #9 0xffffffff804595d5 in devfs_close_f (fp=0xfffff802af2fe6e0, td=<value optimized out>) > at /usr/local/src/sys/fs/devfs/devfs_vnops.c:675 > #10 0xffffffff804b1549 in closef (fp=0xfffff802af2fe6e0, td=0xfffff8000cf75000) at file.h:346 > #11 0xffffffff804af00b in closefp (fdp=0xfffff80083b48000, fd=<value optimized out>, fp=0xfffff802af2fe6e0, > td=0xfffff8000cf75000, holdleaders=0) at /usr/local/src/sys/kern/kern_descrip.c:1191 > #12 0xffffffff807afb6c in amd64_syscall (td=0xfffff8000cf75000, traced=0) at subr_syscall.c:132 > #13 0xffffffff80791c3d in fast_syscall_common () at /usr/local/src/sys/amd64/amd64/exception.S:480 > #14 0x00000000004a317a in ?? () > Previous frame inner to this frame (corrupt stack?)Some more details: this monitoring server has many vlans over em1. It runs some custom perl code utilizing Net::Pcap over bpf. Many times per minute it opens/closes bpf to send/receive custom PPPoE frames using vlans (sends over multiple if_vlan and receives using their parent em1). This server also multiple times per minute starts and stops receiving IPTV multicast traffic using another em0 interface as part of another monitoring job. Again, r314043 was pretty stable.