*Sometimes* when booting 9.0-RC2 on *some* of my machines, I'll get one of the following two panics during multiuser startup, usually while running the /usr/local/etc/rc.d scripts. (The instruction pointer is always exactly one of these two, and they look fairly related.) If after two or three reboots it manages to not panic, the system will run perfectly stable.

For some probably-unrelated reason, the dump never finishes in either case.

First panic (note em0 warning before it):

-----
em0: discard frame w/o packet header


Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x20:0xffffffff805e4fc5
stack pointer           = 0x28:0xffffff80003299e0
frame pointer           = 0x28:0xffffff8000329a00
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq256: em0:rx 0)
trap number             = 9
panic: general protection fault
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x187
trap_fatal() at trap_fatal+0x290
trap() at trap+0x10a
calltrap() at calltrap+0x8
--- trap 0x9, rip = 0xffffffff805e4fc5, rsp = 0xffffff80003299e0, rbp = 0xffffff8000329a00 ---
m_freem() at m_freem+0x25
ether_nh_input() at ether_nh_input+0x82
netisr_dispatch_src() at netisr_dispatch_src+0x20b
em_rxeof() at em_rxeof+0x1ca
em_msix_rx() at em_msix_rx+0x24
intr_event_execute_handlers() at intr_event_execute_handlers+0x104
ithread_loop() at ithread_loop+0xa4
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff8000329d00, rbp = 0 ---
Uptime: 49s
Dumping 679 out of 12263 MB:
-----

Second panic (no em0 discard warning this time):

-----
Fatal trap 9: general protection fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer     = 0x20:0xffffffff8063c0e4
stack pointer           = 0x28:0xffffff8000329a00
frame pointer           = 0x28:0xffffff8000329a40
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 12 (irq256: em0:rx 0)
trap number             = 9
panic: general protection fault
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
kdb_backtrace() at kdb_backtrace+0x37
panic() at panic+0x187
trap_fatal() at trap_fatal+0x290
trap() at trap+0x10a
calltrap() at calltrap+0x8
--- trap 0x9, rip = 0xffffffff8063c0e4, rsp = 0xffffff8000329a00, rbp = 0xffffff8000329a40 ---
ether_nh_input() at ether_nh_input+0x94
netisr_dispatch_src() at netisr_dispatch_src+0x20b
em_rxeof() at em_rxeof+0x1ca
em_msix_rx() at em_msix_rx+0x24
intr_event_execute_handlers() at intr_event_execute_handlers+0x104
ithread_loop() at ithread_loop+0xa4
fork_exit() at fork_exit+0x11f
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffff8000329d00, rbp = 0 ---
Uptime: 46s
Dumping 657 out of 12263 MB:..3%
-----

On Mon, 28 Nov 2011 23:37:27 +0100, Mike Andrews <mandrews@bit0.com> wrote:

> *Sometimes* when booting 9.0-RC2 on *some* of my machines, I'll get one
> of the following two panics during multiuser startup, usually while
> running the /usr/local/etc/rc.d scripts. [...]

Does it help if you disable msix on your em0?
Google for 'sysctl em msix'. Or run 'sysctl -a | grep msix'.

NB: I know nothing about the details of em or msix, so hopefully somebody with more clue responds also.

Ronald.

On Mon, Nov 28, 2011 at 05:37:27PM -0500, Mike Andrews wrote:

> *Sometimes* when booting 9.0-RC2 on *some* of my machines, I'll get
> one of the following two panics during multiuser startup, usually
> while running the /usr/local/etc/rc.d scripts. [...]

We need the following things:

* uname -a output
* dmesg output (only details specific to emX NICs please)
* pciconf -lvcb output (only details specific to emX NICs please)

CC'ing Jack Vogel (driver author) who can hopefully shed some light on this.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                   Mountain View, CA, US |
| Making life hard for others since 1977.               PGP 4BD6C0CB |

On Monday, November 28, 2011 5:37:27 pm Mike Andrews wrote:

> *Sometimes* when booting 9.0-RC2 on *some* of my machines, I'll get one of
> the following two panics during multiuser startup, usually while running
> the /usr/local/etc/rc.d scripts. (The instruction pointer is always
> exactly one of these two, and they look fairly related.) If after two or
> three reboots it manages to not panic, the system will run perfectly
> stable.
>
> For some probably-unrelated reason, the dump never finishes in either case.
>
> First panic (note em0 warning before it):
> -----
> em0: discard frame w/o packet header

This is odd. I see one bug that could possibly trigger this, but not on x86:

Index: if_em.c
===================================================================
--- if_em.c	(revision 228074)
+++ if_em.c	(working copy)
@@ -4305,8 +4305,10 @@ em_rxeof(struct rx_ring *rxr, int count, int *done
 #ifndef __NO_STRICT_ALIGNMENT
 			if (adapter->max_frame_size >
 			    (MCLBYTES - ETHER_ALIGN) &&
-			    em_fixup_rx(rxr) != 0)
-				goto skip;
+			    em_fixup_rx(rxr) != 0) {
+				sendmp = NULL;
+				goto next_desc;
+			}
 #endif
 			if (status & E1000_RXD_STAT_VP) {
 				sendmp->m_pkthdr.ether_vtag
@@ -4318,9 +4320,6 @@ em_rxeof(struct rx_ring *rxr, int count, int *done
 			sendmp->m_pkthdr.flowid = rxr->msix;
 			sendmp->m_flags |= M_FLOWID;
 #endif
-#ifndef __NO_STRICT_ALIGNMENT
-skip:
-#endif
 			rxr->fmp = rxr->lmp = NULL;
 		}
 next_desc:
@@ -4426,6 +4425,7 @@ em_fixup_rx(struct rx_ring *rxr)
 			adapter->dropped_pkts++;
 			m_freem(rxr->fmp);
 			rxr->fmp = NULL;
+			rxr->lmp = NULL;
 			error = ENOMEM;
 		}
 	}

-- 
John Baldwin
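
For readers following the patch above, here is a minimal user-space sketch of the pattern it appears to address: a local alias (sendmp in the driver) that still points at a frame after the fixup's failure path has freed it. This is one reading of the diff, not driver code. The names toy_mbuf, toy_ring, toy_fixup_fail and toy_rx_one are hypothetical stand-ins invented for illustration, and malloc/free stand in for the kernel's mbuf allocator. The code in the diff sits under #ifndef __NO_STRICT_ALIGNMENT, which fits the remark that this path is not taken on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for an mbuf; NOT the kernel's struct mbuf. */
struct toy_mbuf {
	int len;
};

/* Toy stand-in for the rx ring: first and last mbuf of the current frame. */
struct toy_ring {
	struct toy_mbuf *fmp;
	struct toy_mbuf *lmp;
	int dropped;
};

/*
 * Models the failure path of the fixup routine as patched: count the drop,
 * free the frame, and clear BOTH ring pointers. Always reports failure.
 */
static int
toy_fixup_fail(struct toy_ring *r)
{
	r->dropped++;
	free(r->fmp);
	r->fmp = NULL;
	r->lmp = NULL;
	return (1);
}

/*
 * Models the patched receive path for one completed frame. Returns the
 * frame to hand up the stack, or NULL when the frame was dropped.
 * force_fail is a hypothetical knob that triggers the failure path.
 */
static struct toy_mbuf *
toy_rx_one(struct toy_ring *r, int force_fail)
{
	struct toy_mbuf *sendmp;

	assert(r != NULL);
	sendmp = r->fmp;		/* local alias of the ring's frame */

	if (force_fail && toy_fixup_fail(r) != 0) {
		/*
		 * The frame is already freed. Without this assignment the
		 * alias would dangle and could be passed up the stack.
		 */
		sendmp = NULL;
		return (sendmp);
	}

	r->fmp = r->lmp = NULL;		/* ownership moves to the caller */
	return (sendmp);
}
```

Calling toy_rx_one() with force_fail set returns NULL and leaves both ring pointers cleared, so nothing freed is handed on; with force_fail clear, the caller receives the frame and is responsible for freeing it.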

On 11/28/11 5:48 PM, Ronald Klop wrote:

> On Mon, 28 Nov 2011 23:37:27 +0100, Mike Andrews <mandrews@bit0.com> wrote:
>
>> *Sometimes* when booting 9.0-RC2 on *some* of my machines, I'll get
>> one of the following two panics during multiuser startup, usually
>> while running the /usr/local/etc/rc.d scripts. [...]
>
> Does it help if you disable msix on your em0?
> Google for 'sysctl em msix'. Or run 'sysctl -a | grep msix'.

OK, setting hw.em.enable_msix=0 in /boot/loader.conf does NOT help.