On Oct 8, 2012, at 8:09 AM, Guy Helmer <guy.helmer at gmail.com> wrote:
> I'm seeing a consistent new kernel panic in FreeBSD 8.3:
>
> #0 doadump () at pcpu.h:224
> 224 __asm("movq %%gs:0,%0" : "=r" (td));
> (kgdb) #0 doadump () at pcpu.h:224
> #1 0xffffffff804c82e0 in boot (howto=260) at ../../../kern/kern_shutdown.c:441
> #2 0xffffffff804c8763 in panic (fmt=0x0) at ../../../kern/kern_shutdown.c:614
> #3 0xffffffff8069f3cd in trap_fatal (frame=0xffffffff809ecfc0, eva=Variable "eva" is not available.)
> at ../../../amd64/amd64/trap.c:825
> #4 0xffffffff8069f701 in trap_pfault (frame=0xffffff800014a8b0, usermode=0)
> at ../../../amd64/amd64/trap.c:741
> #5 0xffffffff8069fadf in trap (frame=0xffffff800014a8b0)
> at ../../../amd64/amd64/trap.c:478
> #6 0xffffffff806870f4 in calltrap () at ../../../amd64/amd64/exception.S:228
> #7 0xffffffff8069d026 in bcopy () at ../../../amd64/amd64/support.S:124
> #8 0xffffffff8056ea8e in catchpacket (d=0xffffff00aee2c600,
> pkt=0xffffff00253ac600 "", pktlen=1434, snaplen=Variable "snaplen" is not available.)
> at ../../../net/bpf.c:2005
> #9 0xffffffff8056f0e5 in bpf_mtap (bp=0xffffff0001be1780,
> m=0xffffff00253ac600) at ../../../net/bpf.c:1832
> #10 0xffffffff80579035 in ether_input (ifp=0xffffff0001b73800,
> m=0xffffff00253ac600) at ../../../net/if_ethersubr.c:635
> #11 0xffffffff802b694a in em_rxeof (rxr=0xffffff0001bca200, count=99, done=0x0)
> at ../../../dev/e1000/if_em.c:4404
> #12 0xffffffff802b6db8 in em_handle_que (context=Variable "context" is not available.)
> at ../../../dev/e1000/if_em.c:1494
> #13 0xffffffff80506de5 in taskqueue_run_locked (queue=0xffffff0001bdd600)
> at ../../../kern/subr_taskqueue.c:250
> #14 0xffffffff80506f7e in taskqueue_thread_loop (arg=Variable "arg" is not available.)
> at ../../../kern/subr_taskqueue.c:387
> #15 0xffffffff8049da2f in fork_exit (
> callout=0xffffffff80506f30 <taskqueue_thread_loop>,
> arg=0xffffff8000945768, frame=0xffffff800014ac50)
> at ../../../kern/kern_fork.c:876
> #16 0xffffffff8068763e in fork_trampoline ()
> at ../../../amd64/amd64/exception.S:602
> #17 0x0000000000000000 in ?? ()
>
> (kgdb) frame 8
> #8 0xffffffff8056ea8e in catchpacket (d=0xffffff00aee2c600,
> pkt=0xffffff00253ac600 "", pktlen=1434, snaplen=Variable "snaplen" is not available.
> )
> at ../../../net/bpf.c:2005
> 2005 bpf_append_bytes(d, d->bd_sbuf, curlen, &hdr, sizeof(hdr));
> (kgdb) list
> 2000 bzero(&hdr, sizeof(hdr));
> 2001 hdr.bh_tstamp = *tv;
> 2002 hdr.bh_datalen = pktlen;
> 2003 hdr.bh_hdrlen = hdrlen;
> 2004 hdr.bh_caplen = totlen - hdrlen;
> 2005 bpf_append_bytes(d, d->bd_sbuf, curlen, &hdr, sizeof(hdr));
> 2006
> 2007 /*
> 2008 * Copy the packet data into the store buffer and update its length.
> 2009 */
> (kgdb) print *d
> $2 = {bd_next = {le_next = 0x0, le_prev = 0xffffff0001be1790}, bd_sbuf = 0x0,
> bd_hbuf = 0x0, bd_fbuf = 0xffffff8000eae000 "?OoP", bd_slen = 0,
> bd_hlen = 0, bd_bufsize = 8388608, bd_bif = 0xffffff0001be1780,
> bd_rtout = 1, bd_rfilter = 0xffffff0001e89180, bd_wfilter = 0x0,
> bd_bfilter = 0x0, bd_rcount = 4, bd_dcount = 0, bd_promisc = 1 '\001',
> bd_state = 2 '\002', bd_immediate = 1 '\001', bd_hdrcmplt = 1,
> bd_direction = 0, bd_feedback = 0, bd_async = 0, bd_sig = 23,
> bd_sigio = 0x0, bd_sel = {si_tdlist = {tqh_first = 0x0,
> tqh_last = 0xffffff00aee2c690}, si_note = {kl_list = {slh_first = 0x0},
> kl_lock = 0xffffffff80497980 <knlist_mtx_lock>,
> kl_unlock = 0xffffffff80497950 <knlist_mtx_unlock>,
> kl_assert_locked = 0xffffffff80494630 <knlist_mtx_assert_locked>,
> kl_assert_unlocked = 0xffffffff80494640 <knlist_mtx_assert_unlocked>,
> kl_lockarg = 0xffffff00aee2c6d8}, si_mtx = 0xffffff8000de5270},
> bd_mtx = {lock_object = {lo_name = 0xffffff0001a5fce0 "bpf",
> lo_flags = 16973824, lo_data = 0, lo_witness = 0x0},
> mtx_lock = 18446742974226712770}, bd_callout = {c_links = {sle = {
> sle_next = 0x0}, tqe = {tqe_next = 0x0,
> tqe_prev = 0xffffff80f69c0c00}}, c_time = 20424328,
> c_arg = 0xffffff00aee2c600, c_func = 0xffffffff8056b690 <bpf_timed_out>,
> c_lock = 0xffffff00aee2c6d8, c_flags = 2, c_cpu = 0}, bd_label = 0x0,
> bd_fcount = 4, bd_pid = 95393, bd_locked = 0, bd_bufmode = 1, bd_wcount = 0,
> bd_wfcount = 0, bd_wdcount = 0, bd_zcopy = 0, bd_compat32 = 0 '\0'}
> (kgdb)
>
> I'm not seeing how bd_sbuf would be NULL here. Any ideas?
Since I've not had any replies, I hope nobody minds if I reply with more
information.
This panic seems to be triggered occasionally now that my userland code
changes the packet filter some time after the bpf device has been opened and
an initial filter has been set (previously, my code never changed the filter
after setting it the first time).
I'm focusing on bpf_setf(), since that seems the most likely place to be
tickling a problem, and I see that bpf_setf() calls reset_d(d) to clear the
hold buffer. I have manually verified that the BPFD lock is held during the
call to reset_d(), and that the lock is held in every other place where the
buffers are manipulated, so I haven't found any place that looks vulnerable
to losing one of the bpf buffers. Still searching, but any help would be
appreciated.
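For reference, here is a small userland model of how the three bpf buffers (store, hold, free) could end up in exactly the state shown by `print *d` above. It is a sketch under stated assumptions, not FreeBSD code and not checked against the 8.3 sources: it assumes some path drops the BPFD lock after deciding to consume bd_hbuf and later recycles whatever bd_hbuf points to at that moment (for example, a read that copies the hold buffer to userland without the lock held), and that an immediate-mode read rotates the buffers without checking for a free buffer. All names (`model_d`, `rotate_buffers`, `reset_d_model`, `reader_reclaim`, `store_buffer_usable`, `run_interleaving`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland model of a bpf descriptor's buffer slots.
 * Field names echo struct bpf_d from the dump; this is NOT bpf.c. */
struct model_d {
	char *sbuf;  /* store buffer: new packets are appended here */
	char *hbuf;  /* hold buffer: full buffer waiting to be read */
	char *fbuf;  /* free buffer: spare used by the next rotation */
	size_t slen; /* bytes in the store buffer */
	size_t hlen; /* bytes in the hold buffer */
};

/* Rotation as assumed here: store -> hold, free -> store, free emptied.
 * Nothing checks fbuf, so rotating with fbuf == NULL leaves sbuf NULL. */
static void rotate_buffers(struct model_d *d)
{
	d->hbuf = d->sbuf;
	d->hlen = d->slen;
	d->sbuf = d->fbuf;
	d->slen = 0;
	d->fbuf = NULL;
}

/* Modeled on what reset_d() is described as doing above: clear the
 * hold buffer by handing it back as the free buffer. */
static void reset_d_model(struct model_d *d)
{
	if (d->hbuf != NULL) {
		d->fbuf = d->hbuf;
		d->hbuf = NULL;
		d->hlen = 0;
	}
	d->slen = 0;
}

/* Hypothetical reader step after copying the hold buffer to userland:
 * recycle whatever hbuf points to *now* as the free buffer. */
static void reader_reclaim(struct model_d *d)
{
	d->fbuf = d->hbuf;
	d->hbuf = NULL;
	d->hlen = 0;
}

/* catchpacket() appends to the store buffer; NULL here would fault. */
static bool store_buffer_usable(const struct model_d *d)
{
	return d->sbuf != NULL;
}

/* The assumed interleaving, replayed sequentially. Expects a starting
 * state with a filled hold buffer and no free buffer. */
static void run_interleaving(struct model_d *d)
{
	/* 1. A reader starts copying hbuf out with the lock dropped (assumed). */
	/* 2. Meanwhile bpf_setf() -> reset_d() moves hbuf to fbuf. */
	reset_d_model(d);
	/* 3. The reader relocks and recycles d->hbuf, which is now NULL:
	 *    fbuf becomes NULL and the buffer reset_d() freed is dropped. */
	reader_reclaim(d);
	/* 4. A packet arrives and lands in the store buffer. */
	d->slen = 100;
	/* 5. An immediate-mode read rotates with no free buffer: sbuf = NULL. */
	rotate_buffers(d);
	/* 6. That read completes normally and recycles the hold buffer. */
	reader_reclaim(d);
}
```

Replayed in order from sbuf = A, hbuf = B, fbuf = NULL, the model ends with sbuf and hbuf both NULL, fbuf pointing at A, and both lengths zero, the same shape as bd_sbuf = 0x0, bd_hbuf = 0x0, bd_fbuf != NULL, bd_slen = bd_hlen = 0 in the dump; the next append to the store buffer would then dereference NULL. Whether any 8.3 code path actually drops the lock at step 1 would need to be checked against net/bpf.c.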
Guy