On Thu, Feb 04, 2021 at 01:34:13PM -0800, Matthew Macy wrote:
> On Thu, Feb 4, 2021 at 1:31 PM Alan Somers <asomers at freebsd.org> wrote:
> >
> > After upgrading a machine to FreeBSD 12.2, it hit the following panic on
> > its first reboot. I suspect that a few other servers have hit this too,
> > but since it happens before swap is mounted there are no core dumps, and
> > they usually reboot immediately. The code in question hasn't changed since
> > 2018. The panic happened in cmci_monitor at line 930. Does anybody have
> > any suggestions for how I could debug further? I can't readily reproduce
> > it, and I can't dump core, but I'd like to investigate it any way I can.
> > The server in question has dual Xeon Gold 6142 CPUs.
> >
>
> I can't actually help :( but I can add a +1 with similar hardware or
> equivalent specs. It's not frequent, but it's often enough to be
> annoying.
> -M
>
> >         if (!(ctl & MC_CTL2_CMCI_EN))
> >                 /* This bank does not support CMCI. */
> >                 return;
> >
> >         cc = &cmc_state[PCPU_GET(cpuid)][i];    // <- panic here
> >
> >         /* Determine maximum threshold. */
> >
> >
> > Fatal trap 12: page fault while in kernel mode
> > cpuid = 26; apic id = 34
> > fault virtual address = 0xd0
> > fault code = supervisor read data, page not present
> > instruction pointer = 0x20:0xffffffff8125a009
> > stack pointer = 0x28:0xfffffe0000b65f20
> > frame pointer = 0x28:0xfffffe0000b65f50
> > code segment = base 0x0, limit 0xfffff, type 0x1b
> > = DPL 0, pres 1, long 1, def32 0, gran 1
> > processor eflags = resume, IOPL = 0
> > current process = 11 (idle: cpu26)
> > trap number = 12
> > panic: page fault
> > cpuid = 26
> > time = 1
> > KDB: stack backtrace:
> > db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0000b65be0
> > vpanic() at vpanic+0x17b/frame 0xfffffe0000b65c30
> > panic() at panic+0x43/frame 0xfffffe0000b65c90
> > trap_fatal() at trap_fatal+0x391/frame 0xfffffe0000b65cf0
> > trap_pfault() at trap_pfault+0x4f/frame 0xfffffe0000b65d40
> > trap() at trap+0x286/frame 0xfffffe0000b65e50
> > calltrap() at calltrap+0x8/frame 0xfffffe0000b65e50
> > --- trap 0xc, rip = 0xffffffff8125a009, rsp = 0xfffffe0000b65f20, rbp = 0xfffffe0000b65f50 ---
> > _mca_init() at _mca_init+0x5d9/frame 0xfffffe0000b65f50
> > init_secondary_tail() at init_secondary_tail+0xfd/frame 0xfffffe0000b65f80
> > init_secondary() at init_secondary+0x2d1/frame 0xfffffe0000b65ff0
> > KDB: enter: panic
> > [ thread pid 11 tid 100029 ]
> > Stopped at kdb_enter+0x37: movq $0,0x12bc1f6(%rip)
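One possible reading of the trap output above, offered as a sketch and not as a confirmed diagnosis: if cmc_state is a per-CPU array of pointers that has not been allocated yet (so its base is NULL), then loading the slot for CPU 26 on amd64 would touch address 26 * 8, which matches the reported fault address 0xd0. That cmc_state is a pointer array indexed by CPU id is an assumption drawn from the double index in the quoted source line; the helper names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Address that would be read when loading table[cpuid] from an array of
 * pointers starting at `base`.  ptr_size is 8 on amd64.
 * Hypothetical helper, not kernel code.
 */
static uintptr_t
slot_address(uintptr_t base, unsigned int cpuid, size_t ptr_size)
{
	return (base + (uintptr_t)cpuid * ptr_size);
}

/* Plug in the values from the trap message: NULL base, cpuid 26. */
static void
show_fault_address(void)
{
	uintptr_t addr = slot_address(0, 26, 8);

	/* 26 * 8 = 208 = 0xd0, the "fault virtual address" in the panic. */
	assert(addr == 0xd0);
	printf("expected fault address: 0x%jx\n", (uintmax_t)addr);
}
```

If the array were a true two-dimensional array instead of an array of pointers, this arithmetic would not apply, so the match is suggestive only.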
Try this.
I think there are no other dependencies in the startup order, but I
cannot know that for sure.
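The idea behind the patch below can be sketched in userland, under stated assumptions: SYSINIT entries are run sorted by subsystem and then by order, so within SI_SUB_CPU an entry at SI_ORDER_ANY runs last while one at SI_ORDER_SECOND runs before anything at SI_ORDER_THIRD. The constant values are assumed to mirror sys/kernel.h and should be checked against your tree; "start_aps" is a hypothetical stand-in for whatever hook starts the APs, assumed here to sit at SI_ORDER_THIRD; qsort() stands in for the kernel's own sort.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumed to mirror sys/kernel.h; verify against the real header. */
#define SI_SUB_CPU	0x2100000
#define SI_ORDER_SECOND	0x0000001
#define SI_ORDER_THIRD	0x0000002
#define SI_ORDER_ANY	0xfffffff

struct sysinit_entry {
	const char *name;
	unsigned int subsystem;
	unsigned int order;
};

/* Sort by subsystem first, then by order within the subsystem. */
static int
sysinit_cmp(const void *a, const void *b)
{
	const struct sysinit_entry *x = a, *y = b;

	if (x->subsystem != y->subsystem)
		return (x->subsystem < y->subsystem ? -1 : 1);
	if (x->order != y->order)
		return (x->order < y->order ? -1 : 1);
	return (0);
}

/* Position at which `name` would run if mca_init_bsp used `mca_order`. */
static int
run_position(unsigned int mca_order, const char *name)
{
	struct sysinit_entry tab[] = {
		/* Hypothetical AP-start hook; order is an assumption. */
		{ "start_aps", SI_SUB_CPU, SI_ORDER_THIRD },
		{ "mca_init_bsp", SI_SUB_CPU, mca_order },
	};
	size_t i, n = sizeof(tab) / sizeof(tab[0]);

	qsort(tab, n, sizeof(tab[0]), sysinit_cmp);
	for (i = 0; i < n; i++)
		if (strcmp(tab[i].name, name) == 0)
			return ((int)i);
	return (-1);
}
```

With SI_ORDER_ANY, run_position() places mca_init_bsp after the AP-start entry; with SI_ORDER_SECOND it comes first, which is the ordering the patch aims for.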
commit 19584e3d3e9606d591fa30999b370ed758960e8c
Author: Konstantin Belousov <kib at FreeBSD.org>
Date:   Fri Feb 5 00:56:09 2021 +0200

    x86: init mca before APs are started

diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
index 03100e77d455..e2bf2673cf69 100644
--- a/sys/x86/x86/mca.c
+++ b/sys/x86/x86/mca.c
@@ -1371,7 +1371,7 @@ mca_init_bsp(void *arg __unused)
 
 	mca_init();
 }
-SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_ANY, mca_init_bsp, NULL);
+SYSINIT(mca_init_bsp, SI_SUB_CPU, SI_ORDER_SECOND, mca_init_bsp, NULL);
 
 /* Called when a machine check exception fires. */
 void