Charles Owens
2012-Oct-31 16:58 UTC
Panic during kernel boot, igb-init related? (8.3-RELEASE)
Hello,

We're seeing boot-time panics in about 4% of cases when upgrading from FreeBSD 8.1 to 8.3-RELEASE (i386). This problem is subtle enough that it escaped detection during our regular testing cycle... now, with over 100 systems upgraded, we're convinced there's a real issue. Our kernel config is essentially PAE (i.e. static modules ... with a few drivers added/removed). The hardware is Intel Server System SR1625UR.

This appears to match a finding discussed in these threads, having to do with the timing of initialization of the igb(4)-based NICs (if I'm understanding it properly):

http://lists.freebsd.org/pipermail/freebsd-stable/2011-May/062596.html
http://lists.freebsd.org/pipermail/freebsd-stable/2011-June/062949.html
http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063867.html
http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063958.html

These threads include some potential patches and the possibility of commit/MFC... but it isn't clear that there was ever a final resolution (and MFC to 8-stable). I've cc'd a few folks from back then.

A real challenge here is the frequency of occurrence. As mentioned, it only hits a fraction of our systems. When it _does_ hit, the system may enter a reboot loop for days and then mysteriously break out of it... and thereafter seem to work fine.

I'd be very grateful for any help. Some questions:

* Was there ever a final "blessed" patch?
    o If so, will it apply to RELENG_8_3?
* Is there anything that could be said that might help us with reproducing the problem, testing, or validating a fix?
Panic message is --

panic: m_getzone: m_getjcl: invalid cluster type
cpuid = 0
KDB: stack backtrace:
#0 0xc059c717 at kdb_backtrace+0x47
#1 0xc056caf7 at panic+0x117
#2 0xc03c979e at igb_refresh_mbufs+0x25e
#3 0xc03c9f98 at igb_rxeof+0x638
#4 0xc03ca135 at igb_msix_que+0x105
#5 0xc0541e2b at intr_event_execute_handlers+0x13b
#6 0xc05434eb at ithread_loop+0x6b
#7 0xc053efb7 at fork_exit+0x97
#8 0xc0806744 at fork_trampoline+0x8

Thanks very much,
Charles

--
Charles Owens
Great Bay Software, Inc.
Eugene Grosbein
2012-Nov-01 06:25 UTC
Panic during kernel boot, igb-init related? (8.3-RELEASE)
On 31.10.2012 23:58, Charles Owens wrote:
> Hello,
>
> We're seeing boot-time panics in about 4% of cases when upgrading from
> FreeBSD 8.1 to 8.3-RELEASE (i386). This problem is subtle enough that
> it escaped detection during our regular testing cycle... now with over
> 100 systems upgraded we're convinced there's a real issue. Our kernel
> config is essentially PAE (i.e. static modules ... with a few drivers
> added/removed). The hardware is Intel Server System SR1625UR.
>
> This appears to match a finding discussed in these threads, having to do
> with timing of initialization of the igb(4)-based NICs (if I'm
> understanding it properly):
>
> http://lists.freebsd.org/pipermail/freebsd-stable/2011-May/062596.html
> http://lists.freebsd.org/pipermail/freebsd-stable/2011-June/062949.html
> http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063867.html
> http://lists.freebsd.org/pipermail/freebsd-stable/2011-September/063958.html
>
> These threads include some potential patches and possibility of
> commit/MFC... but it isn't clear that there was ever final resolution
> (and MFC to 8-stable). I've cc'd a few folks from back then.
>
> A real challenge here is the frequency of occurrence. As mentioned, it
> only hits a fraction of our systems. When it _does_ hit, the system
> may enter a reboot loop for days and then mysteriously break out of
> it... and thereafter seem to work fine.
>
> I'd be very grateful for any help. Some questions:
>
> * Was there ever a final "blessed" patch?
>     o if so, will it apply to RELENG_8_3?
> * Is there anything that could be said that might help us with
>   reproducing-the-problem / testing / validating-a-fix?
>
> Panic message is --
>
> panic: m_getzone: m_getjcl: invalid cluster type
> cpuid = 0
> KDB: stack backtrace:
> #0 0xc059c717 at kdb_backtrace+0x47
> #1 0xc056caf7 at panic+0x117
> #2 0xc03c979e at igb_refresh_mbufs+0x25e
> #3 0xc03c9f98 at igb_rxeof+0x638
> #4 0xc03ca135 at igb_msix_que+0x105
> #5 0xc0541e2b at intr_event_execute_handlers+0x13b
> #6 0xc05434eb at ithread_loop+0x6b
> #7 0xc053efb7 at fork_exit+0x97
> #8 0xc0806744 at fork_trampoline+0x8
>
> Thanks very much,
>
> Charles

Take a look at http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/172113, which contains the fix and, in a follow-up message, a simple workaround that does not involve any patching.

Eugene Grosbein
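
For readers trying to picture the failure in the backtrace above, here is a minimal userland sketch of how a receive-refresh path could reach an "invalid cluster type" panic if an interrupt arrives before initialization has set the receive cluster size. Everything in it is an assumption made for illustration: the names (sketch_getzone, sketch_adapter, sketch_refresh), the size constants, and the "running" guard are hypothetical stand-ins, not the igb(4) driver code and not necessarily the fix recorded in the PR.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative cluster sizes (assumed values, not taken from kernel headers). */
enum {
    SKETCH_MCLBYTES     = 2048,
    SKETCH_MJUMPAGESIZE = 4096,
    SKETCH_MJUM9BYTES   = 9 * 1024,
    SKETCH_MJUM16BYTES  = 16 * 1024
};

/* Possible outcomes of one refresh attempt in this model. */
enum sketch_result { SKETCH_OK, SKETCH_SKIPPED, SKETCH_PANIC };

/*
 * Hypothetical stand-in for a size-to-zone lookup: returns a zone name
 * for a recognized cluster size, or NULL where a kernel would panic.
 */
static const char *
sketch_getzone(int size)
{
    switch (size) {
    case SKETCH_MCLBYTES:     return "cluster_2k";
    case SKETCH_MJUMPAGESIZE: return "jumbo_page";
    case SKETCH_MJUM9BYTES:   return "jumbo_9k";
    case SKETCH_MJUM16BYTES:  return "jumbo_16k";
    default:                  return NULL; /* "invalid cluster type" */
    }
}

/* Hypothetical adapter state: zero-filled until initialization runs. */
struct sketch_adapter {
    int running;     /* set once initialization has completed */
    int rx_mbuf_sz;  /* receive cluster size, 0 before initialization */
};

/* Initialization picks a valid cluster size and marks the adapter running. */
static void
sketch_init(struct sketch_adapter *a)
{
    a->rx_mbuf_sz = SKETCH_MCLBYTES;
    a->running = 1;
}

/*
 * Models the refresh step called from the interrupt path. With guarded
 * set, the work is skipped until the adapter is running; without it, an
 * early interrupt looks up a zone for size 0 and hits the panic case.
 */
static enum sketch_result
sketch_refresh(const struct sketch_adapter *a, int guarded)
{
    if (guarded && !a->running)
        return SKETCH_SKIPPED;
    return sketch_getzone(a->rx_mbuf_sz) != NULL ? SKETCH_OK : SKETCH_PANIC;
}
```

In this model, calling sketch_refresh on a zero-filled adapter without the guard yields SKETCH_PANIC, while the same call after sketch_init yields SKETCH_OK. A race of this shape would also be consistent with the panic appearing on only a fraction of boots, though that remains a guess rather than something the thread confirms.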