Howdy,

I have some high-traffic squid servers, most of which are running a
flavor of RELENG_7 very successfully, but one that I've been
evaluating 8.x on has had a lot of problems. Most recently we had
the crash below twice in the last 2 weeks. Same exact backtrace. Any
suggestions on where to look would be appreciated.


Thanks,

Doug

#0  doadump () at pcpu.h:224
224     pcpu.h: No such file or directory.
        in pcpu.h
(kgdb) #0  doadump () at pcpu.h:224
#1  0xffffffff803ec4be in boot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:419
#2  0xffffffff803ec8f1 in panic (fmt=Variable "fmt" is not available.
)
    at /usr/src/sys/kern/kern_shutdown.c:592
#3  0xffffffff8069a4d0 in trap_fatal (frame=0x1c, eva=Variable "eva" is not available.
)
    at /usr/src/sys/amd64/amd64/trap.c:783
#4  0xffffffff8069aab9 in trap (frame=0xffffff800012f650)
    at /usr/src/sys/amd64/amd64/trap.c:592
#5  0xffffffff80682e84 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:224
#6  0xffffffff80698896 in bcopy ()
    at /usr/src/sys/amd64/amd64/support.S:124
#7  0xffffffff8044df61 in sbcompress (sb=0xffffff01d98945e0,
    m=0xffffff010b815300, n=0xffffff006baa3700)
    at /usr/src/sys/kern/uipc_sockbuf.c:779
#8  0xffffffff8044e1e6 in sbappendstream_locked (sb=0xffffff01d98945e0,
    m=0xffffff010b815300) at /usr/src/sys/kern/uipc_sockbuf.c:534
#9  0xffffffff80527530 in tcp_do_segment (m=0xffffff010b815300, th=Variable "th" is not available.
)
    at /usr/src/sys/netinet/tcp_input.c:2588
#10 0xffffffff80528b4b in tcp_input (m=0xffffff010b815300, off0=Variable "off0" is not available.
)
    at /usr/src/sys/netinet/tcp_input.c:1029
#11 0xffffffff804c3b2c in ip_input (m=0xffffff010b815300)
    at /usr/src/sys/netinet/ip_input.c:787
#12 0xffffffff804a631e in netisr_dispatch_src (proto=1, source=Variable "source" is not available.
)
    at /usr/src/sys/net/netisr.c:917
#13 0xffffffff8049d73d in ether_demux (ifp=0xffffff0002d30000,
    m=0xffffff010b815300) at /usr/src/sys/net/if_ethersubr.c:894
#14 0xffffffff8049db2d in ether_input (ifp=0xffffff0002d30000,
    m=0xffffff010b815300) at /usr/src/sys/net/if_ethersubr.c:753
#15 0xffffffff8027c18a in em_rxeof (rxr=0xffffff0002d7c600, count=98,
    done=0x0) at /usr/src/sys/dev/e1000/if_em.c:4293
#16 0xffffffff8027c5a8 in em_handle_que (context=Variable "context" is not available.
)
    at /usr/src/sys/dev/e1000/if_em.c:1482
#17 0xffffffff80429ab5 in taskqueue_run_locked (queue=0xffffff0002d8d800)
    at /usr/src/sys/kern/subr_taskqueue.c:250
#18 0xffffffff80429c4e in taskqueue_thread_loop (arg=Variable "arg" is not available.
)
    at /usr/src/sys/kern/subr_taskqueue.c:387
#19 0xffffffff803c30f8 in fork_exit (
    callout=0xffffffff80429c00 <taskqueue_thread_loop>,
    arg=0xffffff80005a8748, frame=0xffffff800012fc40)
    at /usr/src/sys/kern/kern_fork.c:845
#20 0xffffffff8068334e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:565
#21 0x0000000000000000 in ?? ()
#22 0x0000000000000000 in ?? ()
#23 0x0000000000000000 in ?? ()
#24 0x0000000000000000 in ?? ()
#25 0x0000000000000000 in ?? ()
#26 0x0000000000000000 in ?? ()
#27 0x0000000000000000 in ?? ()
#28 0x0000000000000000 in ?? ()
#29 0x0000000000000000 in ?? ()
#30 0x0000000000000000 in ?? ()
#31 0x0000000000000000 in ?? ()
#32 0x0000000000000000 in ?? ()
#33 0x0000000000000000 in ?? ()
#34 0x0000000000000000 in ?? ()
#35 0x0000000000000000 in ?? ()
#36 0x0000000000000000 in ?? ()
#37 0x0000000000000000 in ?? ()
#38 0x0000000000000000 in ?? ()
#39 0x0000000000000000 in ?? ()
#40 0x0000000000000000 in ?? ()
#41 0x0000000000000000 in ?? ()
#42 0x0000000000000000 in ?? ()
#43 0x0000000000000000 in ?? ()
#44 0x0000000000000000 in ?? ()
#45 0xffffffff8095ac00 in affinity ()
#46 0x0000000000000000 in ?? ()
#47 0x0000000000000000 in ?? ()
#48 0xffffff0002d2d8c0 in ?? ()
#49 0xffffff800012f320 in ?? ()
#50 0xffffff800012f2c8 in ?? ()
#51 0xffffff0002c59000 in ?? ()
#52 0xffffffff80411db9 in sched_switch (td=0xffffffff80429c00,
    newtd=0xffffff80005a8748, flags=Variable "flags" is not available.
)
    at /usr/src/sys/kern/sched_ule.c:1852
Previous frame inner to this frame (corrupt stack?)
(kgdb)

-- 
Nothin' ever doesn't change, but nothin' changes much.
    -- OK Go

Breadth of IT experience, and depth of knowledge in the DNS.
Yours for the right price. :) http://SupersetSolutions.com/

Jeremy Chadwick
2011-Aug-19 03:04 UTC
crash on 8.2-RELEASE amd64, high-traffic squid server
On Thu, Aug 18, 2011 at 07:36:50PM -0700, Doug Barton wrote:
> Howdy,
>
> I have some high-traffic squid servers, most of which are running a
> flavor of RELENG_7 very successfully, but one that I've been
> evaluating 8.x on has had a lot of problems. Most recently we had
> the crash below twice in the last 2 weeks. Same exact backtrace. Any
> suggestions on where to look would be appreciated.
>
>
> Thanks,
>
> Doug
>
> #0  doadump () at pcpu.h:224
> 224     pcpu.h: No such file or directory.
>         in pcpu.h
> (kgdb) #0  doadump () at pcpu.h:224
> #1  0xffffffff803ec4be in boot (howto=260)
>     at /usr/src/sys/kern/kern_shutdown.c:419
> #2  0xffffffff803ec8f1 in panic (fmt=Variable "fmt" is not available.
> )
>     at /usr/src/sys/kern/kern_shutdown.c:592
> #3  0xffffffff8069a4d0 in trap_fatal (frame=0x1c, eva=Variable "eva" is not available.
> )
>     at /usr/src/sys/amd64/amd64/trap.c:783
> #4  0xffffffff8069aab9 in trap (frame=0xffffff800012f650)
>     at /usr/src/sys/amd64/amd64/trap.c:592
> #5  0xffffffff80682e84 in calltrap ()
>     at /usr/src/sys/amd64/amd64/exception.S:224
> #6  0xffffffff80698896 in bcopy ()
>     at /usr/src/sys/amd64/amd64/support.S:124
> #7  0xffffffff8044df61 in sbcompress (sb=0xffffff01d98945e0,
>     m=0xffffff010b815300, n=0xffffff006baa3700)
>     at /usr/src/sys/kern/uipc_sockbuf.c:779
> #8  0xffffffff8044e1e6 in sbappendstream_locked (sb=0xffffff01d98945e0,
>     m=0xffffff010b815300) at /usr/src/sys/kern/uipc_sockbuf.c:534
> #9  0xffffffff80527530 in tcp_do_segment (m=0xffffff010b815300, th=Variable "th" is not available.
> )
>     at /usr/src/sys/netinet/tcp_input.c:2588
> #10 0xffffffff80528b4b in tcp_input (m=0xffffff010b815300, off0=Variable "off0" is not available.
> )
>     at /usr/src/sys/netinet/tcp_input.c:1029
> #11 0xffffffff804c3b2c in ip_input (m=0xffffff010b815300)
>     at /usr/src/sys/netinet/ip_input.c:787
> #12 0xffffffff804a631e in netisr_dispatch_src (proto=1, source=Variable "source" is not available.
> )
>     at /usr/src/sys/net/netisr.c:917
> #13 0xffffffff8049d73d in ether_demux (ifp=0xffffff0002d30000,
>     m=0xffffff010b815300) at /usr/src/sys/net/if_ethersubr.c:894
> #14 0xffffffff8049db2d in ether_input (ifp=0xffffff0002d30000,
>     m=0xffffff010b815300) at /usr/src/sys/net/if_ethersubr.c:753
> #15 0xffffffff8027c18a in em_rxeof (rxr=0xffffff0002d7c600, count=98,
>     done=0x0) at /usr/src/sys/dev/e1000/if_em.c:4293
> #16 0xffffffff8027c5a8 in em_handle_que (context=Variable "context" is not available.
> )
>     at /usr/src/sys/dev/e1000/if_em.c:1482
> #17 0xffffffff80429ab5 in taskqueue_run_locked (queue=0xffffff0002d8d800)
>     at /usr/src/sys/kern/subr_taskqueue.c:250
> #18 0xffffffff80429c4e in taskqueue_thread_loop (arg=Variable "arg" is not available.
> )
>     at /usr/src/sys/kern/subr_taskqueue.c:387
> #19 0xffffffff803c30f8 in fork_exit (
>     callout=0xffffffff80429c00 <taskqueue_thread_loop>,
>     arg=0xffffff80005a8748, frame=0xffffff800012fc40)
>     at /usr/src/sys/kern/kern_fork.c:845
> #20 0xffffffff8068334e in fork_trampoline ()
>     at /usr/src/sys/amd64/amd64/exception.S:565
> #21 0x0000000000000000 in ?? ()
> #22 0x0000000000000000 in ?? ()
> #23 0x0000000000000000 in ?? ()
> #24 0x0000000000000000 in ?? ()
> #25 0x0000000000000000 in ?? ()
> #26 0x0000000000000000 in ?? ()
> #27 0x0000000000000000 in ?? ()
> #28 0x0000000000000000 in ?? ()
> #29 0x0000000000000000 in ?? ()
> #30 0x0000000000000000 in ?? ()
> #31 0x0000000000000000 in ?? ()
> #32 0x0000000000000000 in ?? ()
> #33 0x0000000000000000 in ?? ()
> #34 0x0000000000000000 in ?? ()
> #35 0x0000000000000000 in ?? ()
> #36 0x0000000000000000 in ?? ()
> #37 0x0000000000000000 in ?? ()
> #38 0x0000000000000000 in ?? ()
> #39 0x0000000000000000 in ?? ()
> #40 0x0000000000000000 in ?? ()
> #41 0x0000000000000000 in ?? ()
> #42 0x0000000000000000 in ?? ()
> #43 0x0000000000000000 in ?? ()
> #44 0x0000000000000000 in ?? ()
> #45 0xffffffff8095ac00 in affinity ()
> #46 0x0000000000000000 in ?? ()
> #47 0x0000000000000000 in ?? ()
> #48 0xffffff0002d2d8c0 in ?? ()
> #49 0xffffff800012f320 in ?? ()
> #50 0xffffff800012f2c8 in ?? ()
> #51 0xffffff0002c59000 in ?? ()
> #52 0xffffffff80411db9 in sched_switch (td=0xffffffff80429c00,
>     newtd=0xffffff80005a8748, flags=Variable "flags" is not available.
> )
>     at /usr/src/sys/kern/sched_ule.c:1852
> Previous frame inner to this frame (corrupt stack?)
> (kgdb)

CC'ing Jack Vogel here, since I see em(4) is involved. Jack will
probably want this data from the system:

# uname -a        (hostname can be XXX'd out)
# dmesg           (particularly the emX entries and driver version)
# pciconf -lvbc   (specifically the emX entries and related data)
# ifconfig -a     (IPs and MACs can be X'd out; mainly interested in
                   options and other pieces)
# netstat -m      (if possible from a system which has been up a while
                   and is a likely crash candidate)
# vmstat -i       (same condition as netstat -m)

There isn't enough data above for me to determine what's going on, but
from the stack trace it looks like sbcompress() may be given some data
which is null or inaccessible. The source for that hasn't been touched
directly in a while. The TCP stack/code, however, has been (since
8.2-RELEASE for sure). I think em(4) has as well. This may end up being
a case where running RELENG_8 is the fix, but I'd love to be able to say
that for certain.

"bt full" would be helpful but the above indicates the kernel might not
have debugging symbols included in it? I've seen this kind of output
even on a system with "makeoptions DEBUG=-g" in its kernel config before
though. Never was sure how to deal with that problem.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |
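
The six commands requested above can be gathered in one pass. The following is only a rough sketch: the collect() helper and the default output path are made up for illustration and are not from the thread. Each command's output is labelled, and a command that is not installed is noted rather than aborting the run.

```shell
#!/bin/sh
# Sketch: collect the diagnostics requested in the reply into one file.
# The output path and the collect() helper are illustrative assumptions.

out="${1:-/tmp/em-crash-diag.txt}"

# Run one command and label its output; if the command is not
# installed, record that instead of failing.
collect() {
    echo "=== $* ==="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1
    else
        echo "(skipped: $1 not found)"
    fi
    echo
}

{
    collect uname -a
    collect dmesg
    collect pciconf -lvbc
    collect ifconfig -a
    collect netstat -m
    collect vmstat -i
} > "$out"

echo "wrote $out"
```

The script does no redaction, so the hostname, IPs, and MACs would still need to be X'd out by hand before posting. As noted in the reply, the netstat -m and vmstat -i output is most useful from a box that has been up a while.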