Eugene Grosbein
2018-Apr-17 09:30 UTC
stable/11 r332356 started panicing in bpf_dtor/__mtx_lock_sleep
Hi! I have a server that was running stable/11 rock-stable for many months. A week ago I've updates it to 11.1-STABLE r332356 and today it paniced and I have crashdump. Any thoughts? Unread portion of the kernel message buffer: Sleeping thread (tid 100444, pid 28400) owns a non-sleepable lock KDB: stack backtrace of thread 100444: sched_switch() at sched_switch+0x626/frame 0xfffffe03550f9740 mi_switch() at mi_switch+0xc5/frame 0xfffffe03550f9770 sleepq_wait() at sleepq_wait+0x2c/frame 0xfffffe03550f97a0 _sx_xlock_hard() at _sx_xlock_hard+0x2f0/frame 0xfffffe03550f9830 vlan_ioctl() at vlan_ioctl+0x53f/frame 0xfffffe03550f98b0 ifpromisc() at ifpromisc+0x106/frame 0xfffffe03550f9910 bpf_detachd_locked() at bpf_detachd_locked+0x1b4/frame 0xfffffe03550f9960 bpf_dtor() at bpf_dtor+0x9a/frame 0xfffffe03550f9990 devfs_fpdrop() at devfs_fpdrop+0x9c/frame 0xfffffe03550f99b0 devfs_close_f() at devfs_close_f+0x45/frame 0xfffffe03550f99e0 closef() at closef+0x209/frame 0xfffffe03550f9a70 closefp() at closefp+0x8b/frame 0xfffffe03550f9ab0 amd64_syscall() at amd64_syscall+0x25c/frame 0xfffffe03550f9bf0 fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe03550f9bf0 --- syscall (6, FreeBSD ELF64, sys_close), rip = 0x4a317a, rsp = 0x7fffffffe3d8, rbp = 0x7fffffffe3f0 --- panic: sleeping thread cpuid = 1 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0354c6d7c0 vpanic() at vpanic+0x177/frame 0xfffffe0354c6d820 panic() at panic+0x43/frame 0xfffffe0354c6d880 propagate_priority() at propagate_priority+0x183/frame 0xfffffe0354c6d8b0 turnstile_wait() at turnstile_wait+0x2bc/frame 0xfffffe0354c6d900 __mtx_lock_sleep() at __mtx_lock_sleep+0x151/frame 0xfffffe0354c6d960 bpf_dtor() at bpf_dtor+0x191/frame 0xfffffe0354c6d990 devfs_fpdrop() at devfs_fpdrop+0x9c/frame 0xfffffe0354c6d9b0 devfs_close_f() at devfs_close_f+0x45/frame 0xfffffe0354c6d9e0 closef() at closef+0x209/frame 0xfffffe0354c6da70 closefp() at closefp+0x8b/frame 0xfffffe0354c6dab0 amd64_syscall() at amd64_syscall+0x25c/frame 0xfffffe0354c6dbf0 fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0354c6dbf0 --- syscall (6, FreeBSD ELF64, sys_close), rip = 0x4a317a, rsp = 0x7fffffffe3d8, rbp = 0x7fffffffe3f0 --- Uptime: 6d2h0m17s Dumping 1414 out of 12265 MB:..2%..11%..21%..31%..41%..51%..62%..71%..81%..91% (kgdb) bt #0 doadump (textdump=1) at pcpu.h:229 #1 0xffffffff804f3b9a in kern_reboot (howto=260) at /usr/local/src/sys/kern/kern_shutdown.c:383 #2 0xffffffff804f3f81 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) at /usr/local/src/sys/kern/kern_shutdown.c:776 #3 0xffffffff804f3dc3 in panic (fmt=<value optimized out>) at /usr/local/src/sys/kern/kern_shutdown.c:707 #4 0xffffffff80546b33 in propagate_priority (td=<value optimized out>) at /usr/local/src/sys/kern/subr_turnstile.c:226 #5 0xffffffff8054720c in turnstile_wait (ts=0xfffff80151ac4180, owner=<value optimized out>, queue=<value optimized out>) at /usr/local/src/sys/kern/subr_turnstile.c:742 #6 0xffffffff804dda81 in __mtx_lock_sleep (c=0xffffffff80c9c910, v=<value optimized out>) at /usr/local/src/sys/kern/kern_mutex.c:627 #7 0xffffffff805cd061 in bpf_dtor (data=0xfffff801512df800) at /usr/local/src/sys/net/bpf.c:778 #8 0xffffffff80455e5c in devfs_fpdrop (fp=<value optimized out>) at /usr/local/src/sys/fs/devfs/devfs_vnops.c:193 #9 0xffffffff804595d5 in devfs_close_f (fp=0xfffff802af2fe6e0, td=<value optimized out>) at /usr/local/src/sys/fs/devfs/devfs_vnops.c:675 #10 0xffffffff804b1549 in closef (fp=0xfffff802af2fe6e0, td=0xfffff8000cf75000) at file.h:346 #11 0xffffffff804af00b in closefp (fdp=0xfffff80083b48000, fd=<value optimized out>, fp=0xfffff802af2fe6e0, td=0xfffff8000cf75000, holdleaders=0) at /usr/local/src/sys/kern/kern_descrip.c:1191 #12 0xffffffff807afb6c in amd64_syscall (td=0xfffff8000cf75000, traced=0) at subr_syscall.c:132 #13 0xffffffff80791c3d in fast_syscall_common () at /usr/local/src/sys/amd64/amd64/exception.S:480 #14 0x00000000004a317a in ?? () Previous frame inner to this frame (corrupt stack?)
Eugene Grosbein
2018-Apr-17 10:59 UTC
stable/11 r332356 started panicing in bpf_dtor/__mtx_lock_sleep
On 17.04.2018 16:30, Eugene Grosbein wrote: CCing mjoras@ as author of suspicious change https://svnweb.freebsd.org/base?view=revision&revision=323477> I have a server that was running stable/11 rock-stable for many months.It was running stable/11 r314043 before last update.> A week ago I've updates it to 11.1-STABLE r332356 and today it paniced and I have crashdump. > Any thoughts? > > Unread portion of the kernel message buffer: > Sleeping thread (tid 100444, pid 28400) owns a non-sleepable lock > KDB: stack backtrace of thread 100444: > sched_switch() at sched_switch+0x626/frame 0xfffffe03550f9740 > mi_switch() at mi_switch+0xc5/frame 0xfffffe03550f9770 > sleepq_wait() at sleepq_wait+0x2c/frame 0xfffffe03550f97a0 > _sx_xlock_hard() at _sx_xlock_hard+0x2f0/frame 0xfffffe03550f9830 > vlan_ioctl() at vlan_ioctl+0x53f/frame 0xfffffe03550f98b0 > ifpromisc() at ifpromisc+0x106/frame 0xfffffe03550f9910 > bpf_detachd_locked() at bpf_detachd_locked+0x1b4/frame 0xfffffe03550f9960 > bpf_dtor() at bpf_dtor+0x9a/frame 0xfffffe03550f9990 > devfs_fpdrop() at devfs_fpdrop+0x9c/frame 0xfffffe03550f99b0 > devfs_close_f() at devfs_close_f+0x45/frame 0xfffffe03550f99e0 > closef() at closef+0x209/frame 0xfffffe03550f9a70 > closefp() at closefp+0x8b/frame 0xfffffe03550f9ab0 > amd64_syscall() at amd64_syscall+0x25c/frame 0xfffffe03550f9bf0 > fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe03550f9bf0 > --- syscall (6, FreeBSD ELF64, sys_close), rip = 0x4a317a, rsp = 0x7fffffffe3d8, rbp = 0x7fffffffe3f0 --- > panic: sleeping thread > cpuid = 1 > KDB: stack backtrace: > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0354c6d7c0 > vpanic() at vpanic+0x177/frame 0xfffffe0354c6d820 > panic() at panic+0x43/frame 0xfffffe0354c6d880 > propagate_priority() at propagate_priority+0x183/frame 0xfffffe0354c6d8b0 > turnstile_wait() at turnstile_wait+0x2bc/frame 0xfffffe0354c6d900 > __mtx_lock_sleep() at __mtx_lock_sleep+0x151/frame 0xfffffe0354c6d960 > bpf_dtor() at bpf_dtor+0x191/frame 0xfffffe0354c6d990 > devfs_fpdrop() at devfs_fpdrop+0x9c/frame 0xfffffe0354c6d9b0 > devfs_close_f() at devfs_close_f+0x45/frame 0xfffffe0354c6d9e0 > closef() at closef+0x209/frame 0xfffffe0354c6da70 > closefp() at closefp+0x8b/frame 0xfffffe0354c6dab0 > amd64_syscall() at amd64_syscall+0x25c/frame 0xfffffe0354c6dbf0 > fast_syscall_common() at fast_syscall_common+0x101/frame 0xfffffe0354c6dbf0 > --- syscall (6, FreeBSD ELF64, sys_close), rip = 0x4a317a, rsp = 0x7fffffffe3d8, rbp = 0x7fffffffe3f0 --- > Uptime: 6d2h0m17s > Dumping 1414 out of 12265 MB:..2%..11%..21%..31%..41%..51%..62%..71%..81%..91% > > > (kgdb) bt > #0 doadump (textdump=1) at pcpu.h:229 > #1 0xffffffff804f3b9a in kern_reboot (howto=260) at /usr/local/src/sys/kern/kern_shutdown.c:383 > #2 0xffffffff804f3f81 in vpanic (fmt=<value optimized out>, ap=<value optimized out>) > at /usr/local/src/sys/kern/kern_shutdown.c:776 > #3 0xffffffff804f3dc3 in panic (fmt=<value optimized out>) at /usr/local/src/sys/kern/kern_shutdown.c:707 > #4 0xffffffff80546b33 in propagate_priority (td=<value optimized out>) at /usr/local/src/sys/kern/subr_turnstile.c:226 > #5 0xffffffff8054720c in turnstile_wait (ts=0xfffff80151ac4180, owner=<value optimized out>, queue=<value optimized out>) > at /usr/local/src/sys/kern/subr_turnstile.c:742 > #6 0xffffffff804dda81 in __mtx_lock_sleep (c=0xffffffff80c9c910, v=<value optimized out>) > at /usr/local/src/sys/kern/kern_mutex.c:627 > #7 0xffffffff805cd061 in bpf_dtor (data=0xfffff801512df800) at /usr/local/src/sys/net/bpf.c:778 > #8 0xffffffff80455e5c in devfs_fpdrop (fp=<value optimized out>) at /usr/local/src/sys/fs/devfs/devfs_vnops.c:193 > #9 0xffffffff804595d5 in devfs_close_f (fp=0xfffff802af2fe6e0, td=<value optimized out>) > at /usr/local/src/sys/fs/devfs/devfs_vnops.c:675 > #10 0xffffffff804b1549 in closef (fp=0xfffff802af2fe6e0, td=0xfffff8000cf75000) at file.h:346 > #11 0xffffffff804af00b in closefp (fdp=0xfffff80083b48000, fd=<value optimized out>, fp=0xfffff802af2fe6e0, > td=0xfffff8000cf75000, holdleaders=0) at /usr/local/src/sys/kern/kern_descrip.c:1191 > #12 0xffffffff807afb6c in amd64_syscall (td=0xfffff8000cf75000, traced=0) at subr_syscall.c:132 > #13 0xffffffff80791c3d in fast_syscall_common () at /usr/local/src/sys/amd64/amd64/exception.S:480 > #14 0x00000000004a317a in ?? () > Previous frame inner to this frame (corrupt stack?)Some more details: this monitoring server has many vlans over em1. It runs some custom perl code utilizing Net::Pcap over bpf. Many times per minute it opens/closes bpf to send/receive custom PPPoE frames using vlans (sends over multiple if_vlan and receives using their parent em1). This server also multiple times per minute starts and stops receiving IPTV multicast traffic using another em0 interface as part of another monitoring job. Again, r314043 was pretty stable.