On Mon, Feb 08, 2021 at 05:33:22PM +0200, Konstantin Belousov
wrote:> On Mon, Feb 08, 2021 at 10:03:59AM -0500, Mark Johnston wrote:
> > On Mon, Feb 08, 2021 at 12:18:12AM +0200, Konstantin Belousov wrote:
> > > On Sun, Feb 07, 2021 at 02:33:11PM -0700, Alan Somers wrote:
> > > > Upgrading the BIOS fixed the problem, by clearing the
MCG_CMCI_P bit on all
> > > > processors. I don't have strong opinions about whether
we should commit
> > > > kib's patch too. Kib, what do you think?
> > >
> > > The patch causes some memory over-use.
> > >
> > > If this issue is not too widely experienced, I prefer to not
commit the patch.
> >
> > Couldn't we short-circuit cmci_monitor() if the BSP did not
allocate
> > anything?
> >
> > diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
> > index 03100e77d45..0619a41b128 100644
> > --- a/sys/x86/x86/mca.c
> > +++ b/sys/x86/x86/mca.c
> I think something should be printed in this case, at least once.
> I believe printf() already works, because spin locks do.
Indeed, the printf() below should only fire on an AP during SI_SUB_SMP.
Access to the static flag is synchronized by mca_lock.
diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
index 03100e77d45..8098bcfb4bd 100644
--- a/sys/x86/x86/mca.c
+++ b/sys/x86/x86/mca.c
@@ -1065,11 +1065,26 @@ mca_setup(uint64_t mcg_cap)
static void
cmci_monitor(int i)
{
+ static bool first = true;
struct cmc_state *cc;
uint64_t ctl;
KASSERT(i < mca_banks, ("CPU %d has more MC banks",
PCPU_GET(cpuid)));
+ /*
+ * It is possible for some APs to report CMCI support even if the BSP
+ * does not, apparently due to a BIOS bug.
+ */
+ if (cmc_state == NULL) {
+ if (first) {
+ printf(
+ "AP %d reports CMCI support but the BSP does not\n",
+ PCPU_GET(cpuid));
+ first = false;
+ }
+ return;
+ }
+
ctl = rdmsr(MSR_MC_CTL2(i));
if (ctl & MC_CTL2_CMCI_EN)
/* Already monitored by another CPU. */
@@ -1114,6 +1129,10 @@ cmci_resume(int i)
KASSERT(i < mca_banks, ("CPU %d has more MC banks",
PCPU_GET(cpuid)));
+ /* See cmci_monitor(). */
+ if (cmc_state == NULL)
+ return;
+
/* Ignore banks not monitored by this CPU. */
if (!(PCPU_GET(cmci_mask) & 1 << i))
return;