Hi,

I'm having trouble with a system on a Supermicro X7SPE-HF, it crashes
about once a day. I haven't found a way to trigger this yet.

The system has a bunch of VLANs on em1, it does routing between them.

Currently it's running 8-STABLE but it happened with 8.1-RELEASE too.

greetings,
Philipp

# kgdb kernel.debug /var/crash/vmcore.0
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...

Unread portion of the kernel message buffer:


Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x20:0xffffffff8061f5a8
stack pointer           = 0x28:0xffffff80000e64d0
frame pointer           = 0x28:0xffffff80000e64e0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 0 (em1 taskq)
trap number             = 9
panic: general protection fault
cpuid = 0
Uptime: 13h24m39s
Physical memory: 4079 MB
Dumping 1349 MB: 1334 1318 1302 1286 1270 1254 1238 1222 1206 1190 1174
1158 1142 1126 1110 1094 1078 1062 1046 1030 1014 998 982 966 950 934
918 902 886 870 854 838 822 806 790 774 758 742 726 710 694 678 662 646
630 614 598 582 566 550 534 518 502 486 470 454 438 422 406 390 374 358
342 326 310 294 278 262 246 230 214 198 182 166 150 134 118 102 86 70 54
38 22 6

Reading symbols from /boot/kernel/zfs.ko...Reading symbols from /boot/kernel/zfs.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/zfs.ko
Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from /boot/kernel/opensolaris.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/opensolaris.ko
Reading symbols from /boot/kernel/coretemp.ko...Reading symbols from /boot/kernel/coretemp.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/coretemp.ko
Reading symbols from /boot/kernel/ahci.ko...Reading symbols from /boot/kernel/ahci.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ahci.ko
Reading symbols from /boot/kernel/ipmi.ko...Reading symbols from /boot/kernel/ipmi.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/ipmi.ko
Reading symbols from /boot/kernel/smbus.ko...Reading symbols from /boot/kernel/smbus.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/smbus.ko
Reading symbols from /boot/kernel/pflog.ko...Reading symbols from /boot/kernel/pflog.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/pflog.ko
Reading symbols from /boot/kernel/pf.ko...Reading symbols from /boot/kernel/pf.ko.symbols...done.
done.
Loaded symbols for /boot/kernel/pf.ko
#0  doadump () at pcpu.h:224
224             __asm("movq %%gs:0,%0" : "=r" (td));
(kgdb) list *0xffffffff8061f5a8
0xffffffff8061f5a8 is in m_tag_locate (/usr/src/sys/kern/uipc_mbuf2.c:389).
384             if (t == NULL)
385                     p = SLIST_FIRST(&m->m_pkthdr.tags);
386             else
387                     p = SLIST_NEXT(t, m_tag_link);
388             while (p != NULL) {
389                     if (p->m_tag_cookie == cookie && p->m_tag_id == type)
390                             return p;
391                     p = SLIST_NEXT(p, m_tag_link);
392             }
393             return NULL;
(kgdb) backtrace
#0  doadump () at pcpu.h:224
#1  0xffffffff805c25ce in boot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:416
#2  0xffffffff805c29dc in panic (fmt=0x0)
    at /usr/src/sys/kern/kern_shutdown.c:590
#3  0xffffffff808d40bd in trap_fatal (frame=0xffffffff80c8af60, eva=Variable "eva" is not available.
)
    at /usr/src/sys/amd64/amd64/trap.c:777
#4  0xffffffff808d4a8b in trap (frame=0xffffff80000e6420)
    at /usr/src/sys/amd64/amd64/trap.c:588
#5  0xffffffff808b9d64 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:224
#6  0xffffffff8061f5a8 in m_tag_locate (m=0xffffff010bb45c00, cookie=0,
    type=6, t=Variable "t" is not available.
) at /usr/src/sys/kern/uipc_mbuf2.c:388
#7  0xffffffff806d7c56 in ip_ipsec_output (m=0xffffff80000e6598,
    inp=0xffffff010be43150, flags=0xffffff80000e6594,
    error=0xffffff80000e65a8, ifp=Variable "ifp" is not available.
) at mbuf.h:1006
#8  0xffffffff806d97ef in ip_output (m=0xffffff010bb45c00, opt=Variable "opt" is not available.
)
    at /usr/src/sys/netinet/ip_output.c:483
#9  0xffffffff8073ef13 in tcp_output (tp=0xffffff000a9eb370)
    at /usr/src/sys/netinet/tcp_output.c:1190
#10 0xffffffff8073a42d in tcp_do_segment (m=0xffffff000a4cd800,
    th=0xffffff000a4df824, so=0xffffff000a9037f8, tp=0xffffff000a9eb370,
    drop_hdrlen=52, tlen=0, iptos=0 '\0', ti_locked=2)
    at /usr/src/sys/netinet/tcp_input.c:1484
#11 0xffffffff8073cf7b in tcp_input (m=0xffffff000a4cd800, off0=Variable "off0" is not available.
)
    at /usr/src/sys/netinet/tcp_input.c:1029
#12 0xffffffff806d7660 in ip_input (m=0xffffff000a4cd800)
    at /usr/src/sys/netinet/ip_input.c:793
#13 0xffffffff8067bd3e in netisr_dispatch_src (proto=1, source=Variable "source" is not available.
)
    at /usr/src/sys/net/netisr.c:917
#14 0xffffffff806720fd in ether_demux (ifp=0xffffff0004fd4000,
    m=0xffffff000a4cd800) at /usr/src/sys/net/if_ethersubr.c:901
#15 0xffffffff806724c7 in ether_input (ifp=0xffffff0004fd4000,
    m=0xffffff000a4cd800) at /usr/src/sys/net/if_ethersubr.c:760
#16 0xffffffff8067201f in ether_demux (ifp=0xffffff0002686800,
    m=0xffffff000a4cd800) at /usr/src/sys/net/if_ethersubr.c:810
#17 0xffffffff806724c7 in ether_input (ifp=0xffffff0002686800,
    m=0xffffff000a4cd800) at /usr/src/sys/net/if_ethersubr.c:760
#18 0xffffffff8033a00b in em_rxeof (rxr=0xffffff00026f7400, count=100,
    done=0x0) at /usr/src/sys/dev/e1000/if_em.c:4196
#19 0xffffffff8033aab8 in em_handle_que (context=Variable "context" is not available.
)
    at /usr/src/sys/dev/e1000/if_em.c:1451
#20 0xffffffff805ff704 in taskqueue_run (queue=0xffffff0002727b80)
    at /usr/src/sys/kern/subr_taskqueue.c:239
#21 0xffffffff805ff976 in taskqueue_thread_loop (arg=Variable "arg" is not available.
)
    at /usr/src/sys/kern/subr_taskqueue.c:360
#22 0xffffffff805985a8 in fork_exit (
    callout=0xffffffff805ff930 <taskqueue_thread_loop>,
    arg=0xffffff80003c1740, frame=0xffffff80000e6c80)
    at /usr/src/sys/kern/kern_fork.c:844
#23 0xffffffff808ba23e in fork_trampoline ()
    at /usr/src/sys/amd64/amd64/exception.S:566
#24 0x0000000000000000 in ?? ()
#25 0x0000000000000000 in ?? ()
#26 0x0000000000000000 in ?? ()
#27 0x0000000000000000 in ?? ()
#28 0x0000000000000000 in ?? ()
#29 0x0000000000000000 in ?? ()
#30 0x0000000000000000 in ?? ()
#31 0x0000000000000000 in ?? ()
#32 0x0000000000000000 in ?? ()
#33 0x0000000000000000 in ?? ()
#34 0x0000000000000000 in ?? ()
#35 0x0000000000000000 in ?? ()
#36 0x0000000000000000 in ?? ()
#37 0x0000000000000000 in ?? ()
#38 0x0000000000000000 in ?? ()
#39 0x0000000000000000 in ?? ()
#40 0x0000000000000000 in ?? ()
#41 0x0000000000000000 in ?? ()
#42 0x0000000000000000 in ?? ()
#43 0x0000000000000000 in ?? ()
#44 0x0000000000000000 in ?? ()
#45 0x0000000000000000 in ?? ()
#46 0x0000000000000000 in ?? ()
#47 0x0000000000000000 in ?? ()
#48 0x00000000010d1000 in ?? ()
#49 0x0000000000000000 in ?? ()
#50 0x0000000000000000 in ?? ()
#51 0xffffffff80cb0fa0 in sleepq_chains ()
#52 0xffffff00025087c0 in ?? ()
#53 0xffffff80000e6b20 in ?? ()
#54 0xffffff80000e6ad8 in ?? ()
#55 0xffffff00026877c0 in ?? ()
#56 0xffffffff805e64fa in sched_switch (td=0xffffff80003c1740,
    newtd=0xffffffff805ff930, flags=Variable "flags" is not available.
) at /usr/src/sys/kern/sched_ule.c:1844
Previous frame inner to this frame (corrupt stack?)
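
For readers following the kgdb listing, here is a minimal userland sketch of the walk shown at uipc_mbuf2.c:384-393. It is an illustration under assumptions, not the kernel code: `struct tag`, `struct tag_list` and `tag_locate` are hypothetical stand-ins for the kernel's mbuf tag types, reduced to the three fields the listing touches. What it shows is that line 389 is the first place `p` is dereferenced, so a list head or link that holds a stale or garbage pointer would be caught there. On amd64, dereferencing a non-canonical address raises a general protection fault rather than a page fault, which is consistent with trap 9 above. The sketch says nothing about where a bad pointer would have come from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

/* Hypothetical, simplified stand-in for the kernel's mbuf tag. */
struct tag {
    SLIST_ENTRY(tag) m_tag_link;   /* next pointer, named as in the listing */
    uint16_t         m_tag_id;     /* tag type */
    uint32_t         m_tag_cookie; /* ABI/module cookie */
};

SLIST_HEAD(tag_list, tag);

/*
 * Same control flow as the listing: start at the list head, or after
 * tag t when t is given, and return the first tag matching cookie/type.
 */
static struct tag *
tag_locate(struct tag_list *tags, uint32_t cookie, int type, struct tag *t)
{
    struct tag *p;

    assert(tags != NULL);
    if (t == NULL)
        p = SLIST_FIRST(tags);
    else
        p = SLIST_NEXT(t, m_tag_link);
    while (p != NULL) {
        /* First dereference of p: a bad pointer faults on this line. */
        if (p->m_tag_cookie == cookie && p->m_tag_id == type)
            return p;
        p = SLIST_NEXT(p, m_tag_link);
    }
    return NULL;
}
```

With a well-formed list the function simply returns the matching tag or NULL; the NULL check in the loop protects only against the end of the list, not against a pointer that is non-NULL but invalid.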
At 06:55 PM 8/24/2010, Philipp Wuensche wrote:
>Hi,
>
>I'm having trouble with a system on a Supermicro X7SPE-HF, it crashes
>about once a day. I haven't found a way to trigger this yet.
>
>The system has a bunch of VLANs on em1, it does routing between them.
>
>Currently it's running 8-STABLE but it happened with 8.1-RELEASE too.

I don't think it's the same problem you are seeing, but the patch in
http://lists.freebsd.org/pipermail/freebsd-stable/2010-August/058296.html
might be worth a try.

        ---Mike

>greetings,
>Philipp
>_______________________________________________
>freebsd-stable@freebsd.org mailing list
>http://lists.freebsd.org/mailman/listinfo/freebsd-stable
>To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"

--------------------------------------------------------------------
Mike Tancsa,                                      tel +1 519 651 3400
Sentex Communications,                            mike@sentex.net
Providing Internet since 1994                    www.sentex.net
Cambridge, Ontario Canada                         www.sentex.net/mike
On Sat, Aug 28, 2010 at 04:19:07PM +0200, Philipp Wuensche wrote:
> Philipp Wuensche wrote:
> >
> > It just now started running the kernel without IPSEC and ALTQ.
>
> Here we go again, this time it crashed with IPSEC and ALTQ disabled,
> crashdump looks different this time though.
>
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
>
> Unread portion of the kernel message buffer:
>
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 0; apic id = 00
> fault virtual address   = 0xffff80400bc58038
> fault code              = supervisor read data, page not present
> instruction pointer     = 0x20:0xffffffff808a41ae
> stack pointer           = 0x28:0xffffff80000e69a0
> frame pointer           = 0x28:0xffffff80000e69b0
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 0 (em1 taskq)
> trap number             = 12
> panic: page fault
> cpuid = 0
> Uptime: 23h30m3s
> Physical memory: 4079 MB
> Dumping 1907 MB: 1892 1876em1: Watchdog timeout -- resetting
>  1860 1844 1828 1812 1796 1780 1764 1748 1732 1716 1700 1684 1668 1652
> 1636 1620 1604 1588 1572 1556 1540 1524 1508 1492 1476 1460 1444 1428
> 1412 1396 1380 1364 1348 1332 1316 1300 1284 1268 1252 1236 1220 1204
> 1188 1172 1156 1140 1124 1108 1092 1076 1060 1044 1028 1012 996 980 964
> 948 932 916 900 884 868 852 836 820 804 788 772 756 740 724 708 692 676
> 660 644 628 612 596 580 564 548 532 516 500 484 468 452 436 420 404 388
> 372 356 340 324 308 292 276 260 244 228 212 196 180 164 148 132 116 100
> 84 68 52 36 20 4
>
> Reading symbols from /boot/kernel/zfs.ko...Reading symbols from
> /boot/kernel/zfs.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/zfs.ko
> Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from
> /boot/kernel/opensolaris.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/opensolaris.ko
> Reading symbols from /boot/kernel/geom_stripe.ko...Reading symbols from
> /boot/kernel/geom_stripe.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/geom_stripe.ko
> Reading symbols from /boot/kernel/coretemp.ko...Reading symbols from
> /boot/kernel/coretemp.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/coretemp.ko
> Reading symbols from /boot/kernel/ahci.ko...Reading symbols from
> /boot/kernel/ahci.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/ahci.ko
> Reading symbols from /boot/kernel/ipmi.ko...Reading symbols from
> /boot/kernel/ipmi.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/ipmi.ko
> Reading symbols from /boot/kernel/smbus.ko...Reading symbols from
> /boot/kernel/smbus.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/smbus.ko
> Reading symbols from /boot/kernel/pflog.ko...Reading symbols from
> /boot/kernel/pflog.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/pflog.ko
> Reading symbols from /boot/kernel/pf.ko...Reading symbols from
> /boot/kernel/pf.ko.symbols...done.
> done.
> Loaded symbols for /boot/kernel/pf.ko
> #0  doadump () at pcpu.h:224
> 224             __asm("movq %%gs:0,%0" : "=r" (td));
> (kgdb) list *0xffffffff808a41ae
> 0xffffffff808a41ae is in pmap_kextract
> (/usr/src/sys/amd64/amd64/pmap.c:1172).
> 1167            vm_paddr_t pa;
> 1168
> 1169            if (va >= DMAP_MIN_ADDRESS && va < DMAP_MAX_ADDRESS) {
> 1170                    pa = DMAP_TO_PHYS(va);
> 1171            } else {
> 1172                    pde = *vtopde(va);
> 1173                    if (pde & PG_PS) {
> 1174                            pa = (pde & PG_PS_FRAME) | (va & PDRMASK);
> 1175                    } else {
> 1176                            /*
> (kgdb) backtrace
> #0  doadump () at pcpu.h:224
> #1  0xffffffff805b2b5e in boot (howto=260)
>     at /usr/src/sys/kern/kern_shutdown.c:416
> #2  0xffffffff805b2f6c in panic (fmt=0x0)
>     at /usr/src/sys/kern/kern_shutdown.c:590
> #3  0xffffffff808ac70d in trap_fatal (frame=0xffffffff80c5cc60,
> eva=Variable "eva" is not available.
> )
>     at /usr/src/sys/amd64/amd64/trap.c:777
> #4  0xffffffff808acacf in trap_pfault (frame=0xffffff80000e68f0, usermode=0)
>     at /usr/src/sys/amd64/amd64/trap.c:693
> #5  0xffffffff808ad2e2 in trap (frame=0xffffff80000e68f0)
>     at /usr/src/sys/amd64/amd64/trap.c:451
> #6  0xffffffff808923b4 in calltrap ()
>     at /usr/src/sys/amd64/amd64/exception.S:224
> #7  0xffffffff808a41ae in pmap_kextract (va=51771551252551)
>     at /usr/src/sys/amd64/amd64/pmap.c:1172
> #8  0xffffffff80890f83 in bus_dmamap_load_mbuf_sg (dmat=0xffffff0002727c00,
>     map=0xffffffff80c99d40, m0=Variable "m0" is not available.
> )
>     at /usr/src/sys/amd64/amd64/busdma_machdep.c:659
> #9  0xffffffff8032f8fc in em_refresh_mbufs (rxr=0xffffff0002712600,
> limit=975)
>     at /usr/src/sys/dev/e1000/if_em.c:3691
> #10 0xffffffff8032ff3c in em_rxeof (rxr=0xffffff0002712600, count=100,
>     done=0x0) at /usr/src/sys/dev/e1000/if_em.c:4210
> #11 0xffffffff80330788 in em_handle_que (context=Variable "context" is
> not available.
> )
>     at /usr/src/sys/dev/e1000/if_em.c:1451
> #12 0xffffffff805efc94 in taskqueue_run (queue=0xffffff0002727b80)
>     at /usr/src/sys/kern/subr_taskqueue.c:239
> #13 0xffffffff805eff06 in taskqueue_thread_loop (arg=Variable "arg" is
> not available.
> )
>     at /usr/src/sys/kern/subr_taskqueue.c:360
> #14 0xffffffff80589998 in fork_exit (
>     callout=0xffffffff805efec0 <taskqueue_thread_loop>,
>     arg=0xffffff80003c2740, frame=0xffffff80000e6c80)
>     at /usr/src/sys/kern/kern_fork.c:844
> #15 0xffffffff8089288e in fork_trampoline ()
>     at /usr/src/sys/amd64/amd64/exception.S:566
> #16 0x0000000000000000 in ?? ()
> #17 0x0000000000000000 in ?? ()
> #18 0x0000000000000000 in ?? ()
> #19 0x0000000000000000 in ?? ()
> #20 0x0000000000000000 in ?? ()
> #21 0x0000000000000000 in ?? ()
> #22 0x0000000000000000 in ?? ()
> #23 0x0000000000000000 in ?? ()
> #24 0x0000000000000000 in ?? ()
> #25 0x0000000000000000 in ?? ()
> #26 0x0000000000000000 in ?? ()
> #27 0x0000000000000000 in ?? ()
> #28 0x0000000000000000 in ?? ()
> #29 0x0000000000000000 in ?? ()
> #30 0x0000000000000000 in ?? ()
> #31 0x0000000000000000 in ?? ()
> #32 0x0000000000000000 in ?? ()
> #33 0x0000000000000000 in ?? ()
> #34 0x0000000000000000 in ?? ()
> #35 0x0000000000000000 in ?? ()
> #36 0x0000000000000000 in ?? ()
> #37 0x0000000000000000 in ?? ()
> #38 0x0000000000000000 in ?? ()
> #39 0x0000000000000000 in ?? ()
> #40 0x000000000109b000 in ?? ()
> #41 0x0000000000000000 in ?? ()
> #42 0x0000000000000000 in ?? ()
> #43 0xffffffff80c823e0 in sleepq_chains ()
> #44 0xffffff00025087c0 in ?? ()
> #45 0xffffff80000e6b20 in ?? ()
> #46 0xffffff80000e6ad8 in ?? ()
> #47 0xffffff000267f7c0 in ?? ()
> #48 0xffffffff805d6a8a in sched_switch (td=0xffffff80003c2740,
>     newtd=0xffffffff805efec0, flags=Variable "flags" is not available.
> ) at /usr/src/sys/kern/sched_ule.c:1844
> Previous frame inner to this frame (corrupt stack?)

Ok, thanks for the backtrace. This one points to a suspicious code
path in em(4). Would you try the attached patch and let me know
whether it makes any difference on your box?
Note that the patch has not been extensively tested, so make sure to
test it first before applying it to a production box.
The patch was generated against HEAD, but I guess it could be applied
to stable/8.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: em.rxdma.patch
Type: text/x-diff
Size: 7615 bytes
Desc: not available
Url : http://lists.freebsd.org/pipermail/freebsd-stable/attachments/20100830/43ecb20c/em.rxdma.bin
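
As a side note on the second crash dump, the reported fault address can be related to the `va` argument of pmap_kextract with a little arithmetic. The sketch below is only that, a sketch: the constants (`PDRSHIFT`, the 27-bit index mask, and the base of the recursive page-directory map) are assumptions taken from memory of the FreeBSD amd64 headers of that era, not from this thread, and `pde_slot_addr` is a hypothetical helper name. It mimics what `vtopde(va)` on line 1172 of the listing is understood to compute: the address of the page-directory entry that would describe `va`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed amd64 constants; not taken from the thread. */
#define PDRSHIFT        21                    /* one PDE maps 2 MB */
#define PDE_INDEX_BITS  27                    /* 9 bits each: PD, PDP, PML4 */
#define PDMAP_BASE      0xffff804000000000ULL /* assumed recursive PD map base */

static_assert(sizeof(uint64_t) == 8, "a page-directory entry is 8 bytes");

/* Address of the PDE slot that a vtopde()-style lookup would read for va. */
static uint64_t
pde_slot_addr(uint64_t va)
{
    uint64_t mask = (1ULL << PDE_INDEX_BITS) - 1;

    return PDMAP_BASE + ((va >> PDRSHIFT) & mask) * sizeof(uint64_t);
}
```

Under these assumptions, feeding in va=51771551252551 (0x2f1600ec0047) from frame #7 gives 0xffff80400bc58038, which is the "fault virtual address" in the trap message. In other words, the page fault happened while reading the page-directory entry for an address that is neither a direct-map address nor a kernel address, which suggests the buffer pointer handed to bus_dmamap_load_mbuf_sg was already bad by the time it got there.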
Subject: Re: igb related(?) panics on 7.3-STABLE
In-Reply-To: <20100830094631.GD12467@core.byshenk.net>

On Mon, Aug 30, 2010 at 11:46:31AM +0200, Greg Byshenk wrote:
> On Sun, Aug 29, 2010 at 08:16:59PM +0200, Greg Byshenk wrote:
>
> > I've begun seeing problems on a machine running FreeBSD-7.3-STABLE, 64-bit,
> > with two igb nics in use.  Previously the machine was fine, running earlier
> > versions of 7-STABLE, although the load on the network has increased due
> > to additional machines being added to the network (the machine functions
> > as a fileserver, serving files to compute machines via NFS(v3)).
> >
> > Any advice is much appreciated.  System info is below.
> >
> Followup with more information.  The machine just panic'ed again, with
> a lot of load on the network.
>
> Output from the 'systat' that was running at the time:
>
> 3 users Load 54.47 42.35 24.25 Aug 30 11:17
>
> Mem:KB REAL VIRTUAL VN PAGER SWAP PAGER
> Tot Share Tot Share Free in out in out
> Act 46232 5504 868140 10548 943324 count
> All 456484 7852 1074772k 27740 pages
> Proc: Interrupts
> r p d s w Csw Trp Sys Int Sof Flt cow 54220 total
> 1 170 392k 8 278 22k 195 1 zfod sio0 irq4
> ozfod fdc0 irq6
> 70.4%Sys 3.1%Intr 0.0%User 0.0%Nice 26.5%Idle %ozfod 27 twa0 uhci0
> | | | | | | | | | | | daefr 2001 cpu0: time
> ===================================++ prcfr igb0 256
> 9938 dtbuf 1247 totfr igb0 257
> Namei Name-cache Dir-cache 100000 desvn react igb0 258
> Calls hits % hits % 34443 numvn 1 pdwak igb0 259
> 24996 frevn 112852 pdpgs igb0 262
> intrn igb0 263
> Disks da0 da1 pass0 pass1 2570672 wire igb0 264
> KB/t 0.00 12.23 0.00 0.00 46760 act igb0 265
> tps 0 26 0 0 14706896 inact 19449 igb1 266
> MB/s 0.00 0.31 0.00 0.00 0 769796 26585
> 0 21 0 0 173528
>
>
> -greg
>
> >
> > Machine:
> > ======
> >
> > FreeBSD server.example.com 7.3-STABLE FreeBSD 7.3-STABLE #36: Wed Aug 25 11:01:07 CEST 2010 root@server.example.com:/usr/obj/usr/src/sys/KERNEL amd64
> >
> > Kernel was csup'd earlier in the day on 25 August, immediately prior to
> > the build.
> >
> >
> > Panic:
> > =====
> >
> > Fatal trap 9: general protection fault while in kernel mode
> > cpuid = 2; apic id = 02
> > instruction pointer     = 0x8:0xffffffff8052f40c
> > stack pointer           = 0x10:0xffffff82056819d0
> > frame pointer           = 0x10:0xffffff82056819f0
> > code segment            = base 0x0, limit 0xfffff, type 0x1b
> >                         = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags        = interrupt enabled, resume, IOPL = 0
> > current process         = 65 (igb1 que)
> > trap number             = 9
> > panic: general protection fault
> > cpuid = 2
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
> > panic() at panic+0x182
> > trap_fatal() at trap_fatal+0x294
> > trap() at trap+0x106
> > calltrap() at calltrap+0x8
> > --- trap 0x9, rip = 0xffffffff8052f40c, rsp = 0xffffff82056819d0, rbp = 0xffffff82056819f0 ---
> > m_tag_delete_chain() at m_tag_delete_chain+0x1c
> > uma_zfree_arg() at uma_zfree_arg+0x41
> > m_freem() at m_freem+0x54
> > ether_demux() at ether_demux+0x85
> > ether_input() at ether_input+0x1bb
> > igb_rxeof() at igb_rxeof+0x29d
> > igb_handle_que() at igb_handle_que+0x9a
> > taskqueue_run() at taskqueue_run+0xac
> > taskqueue_thread_loop() at taskqueue_thread_loop+0x46
> > fork_exit() at fork_exit+0x122
> > fork_trampoline() at fork_trampoline+0xe
> > --- trap 0, rip = 0, rsp = 0xffffff8205681d30, rbp = 0 ---
> > Uptime: 11h57m6s
> > Physical memory: 18411 MB
> > Dumping 3770 MB:
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 0; apic id = 00
> > fault virtual address   = 0x8000000000
> > fault code              = supervisor write data, page not present
> > instruction pointer     = 0x8:0xffffffff80188b5f
> > stack pointer           = 0x10:0xffffff82056811f0
> > frame pointer           = 0x10:0xffffff82056812f0
> > code segment            = base 0x0, limit 0xfffff, type 0x1b
> >                         = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags        = interrupt enabled, resume, IOPL = 0
> > current process         = 65 (igb1 que)
> > trap number             = 12
> >
> >
> > pciconf:
> > ======
> >
> > igb0@pci0:10:0:0:       class=0x020000 card=0x10c915d9 chip=0x10c98086 rev=0x01 hdr=0x00
> >     vendor     = 'Intel Corporation'
> >     class      = network
> >     subclass   = ethernet
> > igb1@pci0:10:0:1:       class=0x020000 card=0x10c915d9 chip=0x10c98086 rev=0x01 hdr=0x00
> >     vendor     = 'Intel Corporation'
> >     class      = network
> >     subclass   = ethernet
> >
> >
> > dmesg:
> > ====
> >
> > igb0: <Intel(R) PRO/1000 Network Connection version - 1.9.5> port 0xe880-0xe89f mem 0xfbe60000-0xfbe7ffff,0xfbe40000-0xfbe5ffff,0xfbeb8000-0xfbebbfff irq 16 at device 0.0 on pci10
> > igb0: Using MSIX interrupts with 10 vectors
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: [ITHREAD]
> > igb0: Ethernet address: 00:30:48:ca:cd:72
> > igb1: <Intel(R) PRO/1000 Network Connection version - 1.9.5> port 0xec00-0xec1f mem 0xfbee0000-0xfbefffff,0xfbec0000-0xfbedffff,0xfbebc000-0xfbebffff irq 17 at device 0.1 on pci10
> > igb1: Using MSIX interrupts with 10 vectors
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: [ITHREAD]
> > igb1: Ethernet address: 00:30:48:ca:cd:73

Adding Jack Vogel of Intel and Yong-Hyeon PYUN to the mix...

I don't know if this is possible for you to do, but do you see the same
problem when running 8.1-STABLE?  I know there has been a lot of
positive work on igb(4) in RELENG_8, but not too many of the fixes and
improvements are backported to RELENG_7.

http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/dev/e1000/if_igb.c

Be sure to check out Revision 1.54 there (which is for HEAD/CURRENT, but
I'm not sure if it's been backported/incorporated in some other way).

Otherwise, as a test/workaround you might try disabling MSI-X support
entirely to see if there's any improvement.  This could degrade system
performance a bit (under heavy interrupt load).  In /boot/loader.conf,
set hw.pci.enable_msix="0" and reboot.  If there's no improvement, be
sure to remove this.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
In the end this turned out to be faulty hardware; at least, the
mainboard died a tragic death and has to be replaced now.

Thanks for the help anyway and sorry for the noise!

Greetings,
philipp

Philipp Wuensche wrote:
> Hi,
>
> I'm having trouble with a system on a Supermicro X7SPE-HF, it crashes
> about once a day. I haven't found a way to trigger this yet.
>
> The system has a bunch of VLANs on em1, it does routing between them.
>
> Currently it's running 8-STABLE but it happened with 8.1-RELEASE too.
>
> greetings,
> Philipp
>
> # kgdb kernel.debug /var/crash/vmcore.0
> GNU gdb 6.1.1 [FreeBSD]
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "amd64-marcel-freebsd"...
>
> Unread portion of the kernel message buffer:
>
>
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 0; apic id = 00
> instruction pointer     = 0x20:0xffffffff8061f5a8
> stack pointer           = 0x28:0xffffff80000e64d0
> frame pointer           = 0x28:0xffffff80000e64e0
> code segment            = base 0x0, limit 0xfffff, type 0x1b
>                         = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags        = interrupt enabled, resume, IOPL = 0
> current process         = 0 (em1 taskq)
> trap number             = 9
> panic: general protection fault
> cpuid = 0
> Uptime: 13h24m39s
> Physical memory: 4079 MB
> Dumping 1349 MB: 1334 1318 1302 1286 1270 1254 1238 1222 1206 1190 1174
> 1158 1142 1126 1110 1094 1078 1062 1046 1030 1014 998 982 966 950 934
> 918 902 886 870 854 838 822 806 790 774 758 742 726 710 694 678 662 646
> 630 614 598 582 566 550 534 518 502 486 470 454 438 422 406 390 374 358
> 342 326 310 294 278 262 246 230 214 198 182 166 150 134 118 102 86 70 54
> 38 22 6
>
> Reading symbols from /boot/kernel/zfs.ko...Reading symbols
from > /boot/kernel/zfs.ko.symbols...done. > done. > Loaded symbols for /boot/kernel/zfs.ko > Reading symbols from /boot/kernel/opensolaris.ko...Reading symbols from > /boot/kernel/opensolaris.ko.symbols...done. > done. > Loaded symbols for /boot/kernel/opensolaris.ko > Reading symbols from /boot/kernel/coretemp.ko...Reading symbols from > /boot/kernel/coretemp.ko.symbols...done. > done. > Loaded symbols for /boot/kernel/coretemp.ko > Reading symbols from /boot/kernel/ahci.ko...Reading symbols from > /boot/kernel/ahci.ko.symbols...done. > done. > Loaded symbols for /boot/kernel/ahci.ko > Reading symbols from /boot/kernel/ipmi.ko...Reading symbols from > /boot/kernel/ipmi.ko.symbols...done. > done. > Loaded symbols for /boot/kernel/ipmi.ko > Reading symbols from /boot/kernel/smbus.ko...Reading symbols from > /boot/kernel/smbus.ko.symbols...done. > done. > Loaded symbols for /boot/kernel/smbus.ko > Reading symbols from /boot/kernel/pflog.ko...Reading symbols from > /boot/kernel/pflog.ko.symbols...done. > done. > Loaded symbols for /boot/kernel/pflog.ko > Reading symbols from /boot/kernel/pf.ko...Reading symbols from > /boot/kernel/pf.ko.symbols...done. > done. > Loaded symbols for /boot/kernel/pf.ko > #0 doadump () at pcpu.h:224 > 224 __asm("movq %%gs:0,%0" : "=r" (td)); > (kgdb) list *0xffffffff8061f5a8 > 0xffffffff8061f5a8 is in m_tag_locate (/usr/src/sys/kern/uipc_mbuf2.c:389). 
> 384 if (t == NULL) > 385 p = SLIST_FIRST(&m->m_pkthdr.tags); > 386 else > 387 p = SLIST_NEXT(t, m_tag_link); > 388 while (p != NULL) { > 389 if (p->m_tag_cookie == cookie && p->m_tag_id == type) > 390 return p; > 391 p = SLIST_NEXT(p, m_tag_link); > 392 } > 393 return NULL; > (kgdb) backtrace > #0 doadump () at pcpu.h:224 > #1 0xffffffff805c25ce in boot (howto=260) > at /usr/src/sys/kern/kern_shutdown.c:416 > #2 0xffffffff805c29dc in panic (fmt=0x0) > at /usr/src/sys/kern/kern_shutdown.c:590 > #3 0xffffffff808d40bd in trap_fatal (frame=0xffffffff80c8af60, > eva=Variable "eva" is not available. > ) > at /usr/src/sys/amd64/amd64/trap.c:777 > #4 0xffffffff808d4a8b in trap (frame=0xffffff80000e6420) > at /usr/src/sys/amd64/amd64/trap.c:588 > #5 0xffffffff808b9d64 in calltrap () > at /usr/src/sys/amd64/amd64/exception.S:224 > #6 0xffffffff8061f5a8 in m_tag_locate (m=0xffffff010bb45c00, cookie=0, > type=6, t=Variable "t" is not available. > ) at /usr/src/sys/kern/uipc_mbuf2.c:388 > #7 0xffffffff806d7c56 in ip_ipsec_output (m=0xffffff80000e6598, > inp=0xffffff010be43150, flags=0xffffff80000e6594, > error=0xffffff80000e65a8, ifp=Variable "ifp" is not available. > ) at mbuf.h:1006 > #8 0xffffffff806d97ef in ip_output (m=0xffffff010bb45c00, opt=Variable > "opt" is not available. > ) > at /usr/src/sys/netinet/ip_output.c:483 > #9 0xffffffff8073ef13 in tcp_output (tp=0xffffff000a9eb370) > at /usr/src/sys/netinet/tcp_output.c:1190 > #10 0xffffffff8073a42d in tcp_do_segment (m=0xffffff000a4cd800, > th=0xffffff000a4df824, so=0xffffff000a9037f8, tp=0xffffff000a9eb370, > drop_hdrlen=52, tlen=0, iptos=0 '\0', ti_locked=2) > at /usr/src/sys/netinet/tcp_input.c:1484 > #11 0xffffffff8073cf7b in tcp_input (m=0xffffff000a4cd800, off0=Variable > "off0" is not available. 
> ) > at /usr/src/sys/netinet/tcp_input.c:1029 > #12 0xffffffff806d7660 in ip_input (m=0xffffff000a4cd800) > at /usr/src/sys/netinet/ip_input.c:793 > #13 0xffffffff8067bd3e in netisr_dispatch_src (proto=1, source=Variable > "source" is not available. > ) > at /usr/src/sys/net/netisr.c:917 > #14 0xffffffff806720fd in ether_demux (ifp=0xffffff0004fd4000, > m=0xffffff000a4cd800) at /usr/src/sys/net/if_ethersubr.c:901 > #15 0xffffffff806724c7 in ether_input (ifp=0xffffff0004fd4000, > m=0xffffff000a4cd800) at /usr/src/sys/net/if_ethersubr.c:760 > #16 0xffffffff8067201f in ether_demux (ifp=0xffffff0002686800, > m=0xffffff000a4cd800) at /usr/src/sys/net/if_ethersubr.c:810 > #17 0xffffffff806724c7 in ether_input (ifp=0xffffff0002686800, > m=0xffffff000a4cd800) at /usr/src/sys/net/if_ethersubr.c:760 > #18 0xffffffff8033a00b in em_rxeof (rxr=0xffffff00026f7400, count=100, > done=0x0) at /usr/src/sys/dev/e1000/if_em.c:4196 > #19 0xffffffff8033aab8 in em_handle_que (context=Variable "context" is > not available. > ) > at /usr/src/sys/dev/e1000/if_em.c:1451 > #20 0xffffffff805ff704 in taskqueue_run (queue=0xffffff0002727b80) > at /usr/src/sys/kern/subr_taskqueue.c:239 > #21 0xffffffff805ff976 in taskqueue_thread_loop (arg=Variable "arg" is > not available. > ) > at /usr/src/sys/kern/subr_taskqueue.c:360 > #22 0xffffffff805985a8 in fork_exit ( > callout=0xffffffff805ff930 <taskqueue_thread_loop>, > arg=0xffffff80003c1740, frame=0xffffff80000e6c80) > at /usr/src/sys/kern/kern_fork.c:844 > #23 0xffffffff808ba23e in fork_trampoline () > at /usr/src/sys/amd64/amd64/exception.S:566 > #24 0x0000000000000000 in ?? () > #25 0x0000000000000000 in ?? () > #26 0x0000000000000000 in ?? () > #27 0x0000000000000000 in ?? () > #28 0x0000000000000000 in ?? () > #29 0x0000000000000000 in ?? () > #30 0x0000000000000000 in ?? () > #31 0x0000000000000000 in ?? () > #32 0x0000000000000000 in ?? () > #33 0x0000000000000000 in ?? () > #34 0x0000000000000000 in ?? () > #35 0x0000000000000000 in ?? 
() > #36 0x0000000000000000 in ?? () > #37 0x0000000000000000 in ?? () > #38 0x0000000000000000 in ?? () > #39 0x0000000000000000 in ?? () > #40 0x0000000000000000 in ?? () > #41 0x0000000000000000 in ?? () > #42 0x0000000000000000 in ?? () > #43 0x0000000000000000 in ?? () > #44 0x0000000000000000 in ?? () > #45 0x0000000000000000 in ?? () > #46 0x0000000000000000 in ?? () > #47 0x0000000000000000 in ?? () > #48 0x00000000010d1000 in ?? () > #49 0x0000000000000000 in ?? () > #50 0x0000000000000000 in ?? () > #51 0xffffffff80cb0fa0 in sleepq_chains () > #52 0xffffff00025087c0 in ?? () > #53 0xffffff80000e6b20 in ?? () > #54 0xffffff80000e6ad8 in ?? () > #55 0xffffff00026877c0 in ?? () > #56 0xffffffff805e64fa in sched_switch (td=0xffffff80003c1740, > newtd=0xffffffff805ff930, flags=Variable "flags" is not available. > ) at /usr/src/sys/kern/sched_ule.c:1844 > Previous frame inner to this frame (corrupt stack?)