Sean Christopherson
2020-May-20 05:16 UTC
[PATCH v3 64/75] x86/sev-es: Cache CPUID results for improved performance
On Tue, Apr 28, 2020 at 05:17:14PM +0200, Joerg Roedel wrote:
> From: Mike Stunes <mstunes at vmware.com>
>
> To avoid a future VMEXIT for a subsequent CPUID function, cache the
> results returned by CPUID into an xarray.
>
> [tl: coding standard changes, register zero extension]
>
> Signed-off-by: Mike Stunes <mstunes at vmware.com>
> Signed-off-by: Tom Lendacky <thomas.lendacky at amd.com>
> [ jroedel at suse.de: - Wrapped cache handling into vc_handle_cpuid_cached()
>                    - Used lower_32_bits() where applicable
>                    - Moved cache_index out of struct es_em_ctxt ]
> Co-developed-by: Joerg Roedel <jroedel at suse.de>
> Signed-off-by: Joerg Roedel <jroedel at suse.de>
> ---

...

> +struct sev_es_cpuid_cache_entry {
> +	unsigned long eax;
> +	unsigned long ebx;
> +	unsigned long ecx;
> +	unsigned long edx;

Why are these unsigned longs?  CPUID returns 32-bit values, this wastes
16 bytes per entry.

> +};
> +
> +static struct xarray sev_es_cpuid_cache;
> +static bool __ro_after_init sev_es_cpuid_cache_initialized;
> +
>  /* For early boot hypervisor communication in SEV-ES enabled guests */
>  static struct ghcb boot_ghcb_page __bss_decrypted __aligned(PAGE_SIZE);
>
> @@ -463,6 +474,9 @@ void __init sev_es_init_vc_handling(void)
>  		sev_es_setup_vc_stack(cpu);
>  	}
>
> +	xa_init_flags(&sev_es_cpuid_cache, XA_FLAGS_LOCK_IRQ);
> +	sev_es_cpuid_cache_initialized = true;
> +
>  	init_vc_stack_names();
>  }
>
> @@ -744,6 +758,91 @@ static enum es_result vc_handle_mmio(struct ghcb *ghcb,
>  	return ret;
>  }
>
> +static unsigned long sev_es_get_cpuid_cache_index(struct es_em_ctxt *ctxt)
> +{
> +	unsigned long hi, lo;
> +
> +	/* Don't attempt to cache until the xarray is initialized */
> +	if (!sev_es_cpuid_cache_initialized)
> +		return ULONG_MAX;
> +
> +	lo = lower_32_bits(ctxt->regs->ax);
> +
> +	/*
> +	 * CPUID 0x0000000d requires both RCX and XCR0, so it can't be
> +	 * cached.
> +	 */
> +	if (lo == 0x0000000d)
> +		return ULONG_MAX;
> +
> +	/*
> +	 * Some callers of CPUID don't always set RCX to zero for CPUID
> +	 * functions that don't require RCX, which can result in excessive
> +	 * cached values, so RCX needs to be manually zeroed for use as part
> +	 * of the cache index. Future CPUID values may need RCX, but since
> +	 * they can't be known, they must not be cached.
> +	 */
> +	if (lo > 0x80000020)
> +		return ULONG_MAX;
> +
> +	switch (lo) {
> +	case 0x00000007:

OSPKE may or may not be cached correctly depending on when
sev_es_cpuid_cache_initialized is set.

> +	case 0x0000000b:
> +	case 0x0000000f:
> +	case 0x00000010:
> +	case 0x8000001d:
> +	case 0x80000020:
> +		hi = ctxt->regs->cx << 32;
> +		break;
> +	default:
> +		hi = 0;
> +	}
> +
> +	return hi | lo;

This needs to be way more restrictive on what is cached.  Unless I've
overlooked something, this lets userspace trigger arbitrary, unaccounted
kernel memory allocations.  E.g.

	for (i = 0; i <= 0x80000020; i++) {
		for (j = 0; j <= 0xffffffff; j++) {
			cpuid(i, j);
			if (i != 7 && i != 0xb && i != 0xf && i != 0x10 &&
			    i != 0x8000001d && i != 0x80000020)
				break;
		}
	}

The whole cache on-demand approach seems like overkill.  The number of
CPUID leaves that are invoked after boot with any regularity can probably
be counted on one hand.  IIRC glibc invokes CPUID to gather TLB/cache
info, XCR0-based features, and one or two other leaves.  A statically
sized global array that's arbitrarily indexed a la x86_capability would
be just as simple and more performant.  It would also allow fancier
things like emulating CPUID 0xD in the guest if you want to go down that
road.
Borislav Petkov
2020-May-26 09:19 UTC
[PATCH v3 64/75] x86/sev-es: Cache CPUID results for improved performance
On Tue, May 19, 2020 at 10:16:37PM -0700, Sean Christopherson wrote:
> The whole cache on-demand approach seems like overkill. The number of
> CPUID leaves that are invoked after boot with any regularity can
> probably be counted on one hand. IIRC glibc invokes CPUID to gather
> TLB/cache info, XCR0-based features, and one or two other leaves. A
> statically sized global array that's arbitrarily indexed a la
> x86_capability would be just as simple and more performant. It would
> also allow fancier things like emulating CPUID 0xD in the guest if you
> want to go down that road.

And before we do any of that "caching" or whatnot, I'd like to see
numbers justifying its existence. Because if it is only a couple of
CPUID invocations and the boot delay is immeasurable, then it's not
worth the effort.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Tom Lendacky
2020-May-27 17:49 UTC
[PATCH v3 64/75] x86/sev-es: Cache CPUID results for improved performance
On 5/26/20 4:19 AM, Borislav Petkov wrote:
> On Tue, May 19, 2020 at 10:16:37PM -0700, Sean Christopherson wrote:
>> The whole cache on-demand approach seems like overkill. The number of
>> CPUID leaves that are invoked after boot with any regularity can
>> probably be counted on one hand. IIRC glibc invokes CPUID to gather
>> TLB/cache info, XCR0-based features, and one or two other leaves. A
>> statically sized global array that's arbitrarily indexed a la
>> x86_capability would be just as simple and more performant. It would
>> also allow fancier things like emulating CPUID 0xD in the guest if you
>> want to go down that road.
>
> And before we do any of that "caching" or whatnot, I'd like to see
> numbers justifying its existence. Because if it is only a couple of
> CPUID invocations and the boot delay is immeasurable, then it's not
> worth the effort.

I added some rudimentary stats code to see how many times there was a
CPUID cache hit on a 64-vCPU guest during a kernel build (make clean
followed by make with -j 64):

  SEV-ES CPUID cache statistics
    0x00000000/0x00000000:   220,384
    0x00000007/0x00000000:   213,306
    0x80000000/0x00000000: 1,054,642
    0x80000001/0x00000000:   213,306
    0x80000005/0x00000000:   210,334
    0x80000006/0x00000000:   420,668
    0x80000007/0x00000000:   210,334
    0x80000008/0x00000000:   420,684

  2,963,658 cache hits

So it is significant in quantity, but I'm not sure what the overall
performance difference is. If I can find some more time, I'll try to
compare the kernel builds with and without the caching to see if it is
noticeable.

Thanks,
Tom