On Mon, Feb 08, 2021 at 10:03:59AM -0500, Mark Johnston
wrote:> On Mon, Feb 08, 2021 at 12:18:12AM +0200, Konstantin Belousov wrote:
> > On Sun, Feb 07, 2021 at 02:33:11PM -0700, Alan Somers wrote:
> > > Upgrading the BIOS fixed the problem, by clearing the MCG_CMCI_P
bit on all
> > > processors. I don't have strong opinions about whether we
should commit
> > > kib's patch too. Kib, what do you think?
> >
> > The patch causes some memory over-use.
> >
> > If this issue is not too widely experienced, I prefer to not commit
the patch.
>
> Couldn't we short-circuit cmci_monitor() if the BSP did not allocate
> anything?
>
> diff --git a/sys/x86/x86/mca.c b/sys/x86/x86/mca.c
> index 03100e77d45..0619a41b128 100644
> --- a/sys/x86/x86/mca.c
> +++ b/sys/x86/x86/mca.c
> @@ -1070,6 +1070,13 @@ cmci_monitor(int i)
>
> KASSERT(i < mca_banks, ("CPU %d has more MC banks",
PCPU_GET(cpuid)));
>
> + /*
> + * It is possible for some APs to report CMCI support even if the BSP
> + * does not, apparently due to a BIOS bug.
> + */
> + if (cmc_state == NULL)
> + return;
> +
> ctl = rdmsr(MSR_MC_CTL2(i));
> if (ctl & MC_CTL2_CMCI_EN)
> /* Already monitored by another CPU. */
> @@ -1114,6 +1121,10 @@ cmci_resume(int i)
>
> KASSERT(i < mca_banks, ("CPU %d has more MC banks",
PCPU_GET(cpuid)));
>
> + /* See cmci_monitor(). */
> + if (cmc_state == NULL)
> + return;
> +
> /* Ignore banks not monitored by this CPU. */
> if (!(PCPU_GET(cmci_mask) & 1 << i))
> return;
I think something should be printed in this case, at least once.
I believe printf() already works, because spin locks do.