Don Lewis
2018-Oct-10 04:30 UTC
early boot netisr_init() panic on older AMD SMP machine with recent 11-STABLE
My desktop machine has an older AMD SMP CPU and tracks 11-STABLE. For
about six months or so it frequently panics early in boot. If I retry a
sufficient number of times I can get a successful boot, but this is
rather annoying.

A normal boot looks like this:

Copyright (c) 1992-2018 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.2-STABLE #16 r339017M: Sat Sep 29 19:18:41 PDT 2018
    dl at mousie.catspoiler.org:/usr/obj/usr/src/sys/GENERICDDB amd64
FreeBSD clang version 6.0.1 (tags/RELEASE_601/final 335540) (based on LLVM 6.0.1)
WARNING: WITNESS option enabled, expect reduced performance.
VT(vga): resolution 640x480
CPU: AMD Athlon(tm) II X3 450 Processor (3214.60-MHz K8-class CPU)
  Origin="AuthenticAMD" Id=0x100f53 Family=0x10 Model=0x5 Stepping=3
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x802009<SSE3,MON,CX16,POPCNT>
  AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x37ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT>
  SVM: NP,NRIP,NAsids=64
  TSC: P-state invariant
real memory = 34359738368 (32768 MB)
avail memory = 33275473920 (31733 MB)
Event timer "LAPIC" quality 100
ACPI APIC Table: <GBT GBTUACPI>
FreeBSD/SMP: Multiprocessor System Detected: 3 CPUs
FreeBSD/SMP: 1 package(s) x 3 core(s)
ioapic0: Changing APIC ID to 2
ioapic0 <Version 2.1> irqs 0-23 on motherboard
SMP: AP CPU #1 Launched!
SMP: AP CPU #2 Launched!
Timecounter "TSC-low" frequency 1607298818 Hz quality 800
random: entropy device external interface
[SNIP]

An unsuccessful boot looks like this (hand transcribed):

[SNIP]
ACPI APIC Table: <GBT GBTUACPI>
FreeBSD/SMP: Multiprocessor System Detected: 3 CPUs
FreeBSD/SMP: 1 package(s) x 3 core(s)
ioapic0: Changing APIC ID to 2
ioapic0 <Version 2.1> irqs 0-23 on motherboard
SMP: AP CPU #2 Launched!
SMP: AP CPU #1 Launched!
Timecounter "TSC-low" frequency 1607298818 Hz quality 800
panic: netisr_init: not on CPU 0
cpuid = 2
KDB: stack backtrace:
db_trace_selfwrapper() ...
vpanic() ...
doadump() ...
netisr_init() ...
mi_startup() ...
btext() ...

This problem may be silently occurring on many other machines. This
machine is running a custom kernel with INVARIANTS and WITNESS. The
panic is coming from a KASSERT(), which is only checked when the kernel
is built with INVARIANTS.

This KASSERT was removed from 12.0-CURRENT with this commit:
https://svnweb.freebsd.org/base/head/sys/net/netisr.c?r1=301270&r2=302595

  Revision 302595 - (view) (download) (annotate) - [select for diffs]
  Modified Mon Jul 11 21:25:28 2016 UTC (2 years, 2 months ago) by nwhitehorn
  File length: 44729 byte(s)
  Diff to previous 301270

  Remove assumptions in MI code that the BSP is CPU 0.

Perhaps this should be MFC'ed, but it seems odd that the BSP is
non-deterministic.
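
To illustrate why only INVARIANTS kernels would hit this, here is a minimal userland sketch of a KASSERT-style check. It is not the code from netisr.c: the `sketch_` and `SKETCH_` names are hypothetical, and the `curcpu == 0` condition is only inferred from the panic message above. The point is that without the build-time option the check compiles to nothing, so the same condition goes unnoticed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-in for the INVARIANTS kernel option; remove this line to
 * build the variant in which the check disappears entirely. */
#define SKETCH_INVARIANTS 1

/* Hypothetical stand-in for panic(): counts and reports instead of halting. */
static int sketch_panic_count = 0;

static void
sketch_panic(const char *msg)
{
	assert(msg != NULL);
	sketch_panic_count++;
	fprintf(stderr, "panic: %s\n", msg);
}

/* KASSERT-style macro: active only when SKETCH_INVARIANTS is defined. */
#ifdef SKETCH_INVARIANTS
#define SKETCH_KASSERT(exp, msg) do {					\
	if (!(exp))							\
		sketch_panic(msg);					\
} while (0)
#else
#define SKETCH_KASSERT(exp, msg) do { } while (0)
#endif

/*
 * Models an init routine that assumes it runs on CPU 0.
 * Returns true if no assertion fired for the given CPU id.
 */
static bool
sketch_netisr_init(int curcpu)
{
	int before = sketch_panic_count;

	SKETCH_KASSERT(curcpu == 0, "netisr_init: not on CPU 0");
	return (sketch_panic_count == before);
}
```

With `SKETCH_INVARIANTS` defined, calling `sketch_netisr_init(2)` reports the failed check, much like the `cpuid = 2` case in the transcript. With the define removed, the same call returns true and nothing is reported.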
Don Lewis
2018-Oct-10 19:07 UTC
early boot netisr_init() panic on older AMD SMP machine with recent 11-STABLE
On 9 Oct, Don Lewis wrote:
> My desktop machine has an older AMD SMP CPU and tracks 11-STABLE. For
> about six months or so it frequently panics early in boot. If I retry a
> sufficient number of times I can get a successful boot, but this is
> rather annoying.
>
> A normal boot looks like this:
>
> Copyright (c) 1992-2018 The FreeBSD Project.
> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>         The Regents of the University of California. All rights reserved.
> FreeBSD is a registered trademark of The FreeBSD Foundation.
> FreeBSD 11.2-STABLE #16 r339017M: Sat Sep 29 19:18:41 PDT 2018
>     dl at mousie.catspoiler.org:/usr/obj/usr/src/sys/GENERICDDB amd64
> FreeBSD clang version 6.0.1 (tags/RELEASE_601/final 335540) (based on LLVM 6.0.1)
> WARNING: WITNESS option enabled, expect reduced performance.
> VT(vga): resolution 640x480
> CPU: AMD Athlon(tm) II X3 450 Processor (3214.60-MHz K8-class CPU)
>   Origin="AuthenticAMD" Id=0x100f53 Family=0x10 Model=0x5 Stepping=3
>   Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
>   Features2=0x802009<SSE3,MON,CX16,POPCNT>
>   AMD Features=0xee500800<SYSCALL,NX,MMX+,FFXSR,Page1GB,RDTSCP,LM,3DNow!+,3DNow!>
>   AMD Features2=0x37ff<LAHF,CMP,SVM,ExtAPIC,CR8,ABM,SSE4A,MAS,Prefetch,OSVW,IBS,SKINIT,WDT>
>   SVM: NP,NRIP,NAsids=64
>   TSC: P-state invariant
> real memory = 34359738368 (32768 MB)
> avail memory = 33275473920 (31733 MB)
> Event timer "LAPIC" quality 100
> ACPI APIC Table: <GBT GBTUACPI>
> FreeBSD/SMP: Multiprocessor System Detected: 3 CPUs
> FreeBSD/SMP: 1 package(s) x 3 core(s)
> ioapic0: Changing APIC ID to 2
> ioapic0 <Version 2.1> irqs 0-23 on motherboard
> SMP: AP CPU #1 Launched!
> SMP: AP CPU #2 Launched!
> Timecounter "TSC-low" frequency 1607298818 Hz quality 800
> random: entropy device external interface
> [SNIP]
>
> An unsuccessful boot looks like this (hand transcribed):
>
> [SNIP]
> ACPI APIC Table: <GBT GBTUACPI>
> FreeBSD/SMP: Multiprocessor System Detected: 3 CPUs
> FreeBSD/SMP: 1 package(s) x 3 core(s)
> ioapic0: Changing APIC ID to 2
> ioapic0 <Version 2.1> irqs 0-23 on motherboard
> SMP: AP CPU #2 Launched!
> SMP: AP CPU #1 Launched!
> Timecounter "TSC-low" frequency 1607298818 Hz quality 800
> panic: netisr_init: not on CPU 0
> cpuid = 2
> KDB: stack backtrace:
> db_trace_selfwrapper() ...
> vpanic() ...
> doadump() ...
> netisr_init() ...
> mi_startup() ...
> btext() ...
>
> This problem may be silently occurring on many other machines. This
> machine is running a custom kernel with INVARIANTS and WITNESS. The
> panic is coming from a KASSERT(), which is only checked when the kernel
> is built with INVARIANTS.
>
> This KASSERT was removed from 12.0-CURRENT with this commit:
> https://svnweb.freebsd.org/base/head/sys/net/netisr.c?r1=301270&r2=302595
>
>   Revision 302595 - (view) (download) (annotate) - [select for diffs]
>   Modified Mon Jul 11 21:25:28 2016 UTC (2 years, 2 months ago) by nwhitehorn
>   File length: 44729 byte(s)
>   Diff to previous 301270
>
>   Remove assumptions in MI code that the BSP is CPU 0.
>
> Perhaps this should be MFC'ed, but it seems odd that the BSP is
> non-deterministic.

I now wonder if this panic is a side effect of EARLY_AP_STARTUP, which
was enabled by default in 11-STABLE GENERIC back in May, so the
timeframe fits.

Since the panic only happens with INVARIANTS enabled, most users are
unlikely to encounter this problem.