Kirill A. Shutemov
2021-Nov-08 14:45 UTC
[PATCH v5 03/16] x86/tdx: Exclude Shared bit from physical_mask
On Fri, Nov 05, 2021 at 10:11:48PM +0000, Sean Christopherson wrote:> On Fri, Oct 08, 2021, Kuppuswamy Sathyanarayanan wrote: > > From: "Kirill A. Shutemov" <kirill.shutemov at linux.intel.com> > > > > Just like MKTME, TDX reassigns bits of the physical address for > > metadata. MKTME used several bits for an encryption KeyID. TDX > > uses a single bit in guests to communicate whether a physical page > > should be protected by TDX as private memory (bit set to 0) or > > unprotected and shared with the VMM (bit set to 1). > > > > Add a helper, tdx_shared_mask() to generate the mask. The processor > > enumerates its physical address width to include the shared bit, which > > means it gets included in __PHYSICAL_MASK by default. > > This is incorrect. The shared bit _may_ be a legal PA bit, but AIUI it's not a > hard requirement.Good point, will fix.> > Remove the shared mask from 'physical_mask' since any bits in > > tdx_shared_mask() are not used for physical addresses in page table > > entries. > > ... > > > @@ -94,6 +100,9 @@ static void tdx_get_info(void) > > > > td_info.gpa_width = out.rcx & GENMASK(5, 0); > > td_info.attributes = out.rdx; > > + > > + /* Exclude Shared bit from the __PHYSICAL_MASK */ > > + physical_mask &= ~tdx_shared_mask(); > > This is insufficient, though it's not really the fault of this patch, the specs > themselves botch this whole thing. > > The TDX Module spec explicitly states that GPAs above GPAW are considered reserved. > > 10.11.1. GPAW-Relate EPT Violations > GPA bits higher than the SHARED bit are considered reserved and must be 0. > Address translation with any of the reserved bits set to 1 cause a #PF with > PFEC (Page Fault Error Code) RSVD bit set. > > But this is contradicted by the architectural extensions spec, which states that > a GPA that satisfies MAXPA >= GPA > GPAW "can" cause an EPT violation, not #PF. > Note, this section also appears to have a bug, as it states that GPA bit 47 is > both the SHARED bit and reserved. I assume that blurb is intended to clarify > that bit 47 _would_ be reserved if it weren't the SHARED bit, but because it's > the shared bit it's ok to access. > > 1.4.2 > Guest Physical Address Translation > If the CPU's maximum physical-address width (MAXPA) is 52 and the guest physical > address width is configured to be 48, accesses with GPA bits 51:48 not all being > 0 can cause an EPT-violation, where such EPT-violations are not mutated to #VE, > even if the ?EPT-violations #VE? execution control is 1. > > If the CPU's physical-address width (MAXPA) is less than 48 and the SHARED bit > is configured to be in bit position 47, GPA bit 47 would be reserved, and GPA > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > bits 46:MAXPA would be reserved. On such CPUs, setting bits 51:48 or bits > 46:MAXPA in any paging structure can cause a reserved bit page fault on access. > > The Module spec also calls out that the effective GPA is not to be confused with > MAXPA, which combined with the above blurb about MAXPA < GPAW, suggests that MAXPA > is enumerated separately by design so that the guest doesn't incorrectly think > 46:MAXPA are usable. But that is problematic for the case where MAXPA > GPAW. > > The effective GPA width (in bits) for this TD (do not confuse with MAXPA). > SHARED bit is at GPA bit GPAW-1. > > I can't find the exact reference, but the TDX module always passes through host's > MAXPHYADDR. As it pertains to this patch, just doing > > physical_mask &= ~tdx_shared_mask() > > means that a guest running with GPAW=0 and MAXPHYADDR>48 will have a discontiguous > physical_mask, and could access "reserved" memory. If the VMM defines legal memory > with bits [MAXPHYADDR:48]!=0, explosions may ensue. That's arguably a VMM bug, but > given that the VMM is untrusted I think the guest should be paranoid when handling > the SHARED bit. I also don't know that the kernel will play nice with a discontiguous > mask.I expect it to be buggy.> Specs aside, unless Intel makes a hardware change to treat GPAW as guest.MAXPHYADDR, > or the TDX Module emulates on EPT violations to inject #PF(RSVD) when appropriate, > this mess isn't going to be truly fixed from the guest perspective. > > So, IMO all bits >= GPAW should be cleared, and the kernel should warn and/or > refuse to boot if the host has defined legal memory in that range.Right. But only >= GPAW-1 as shared bit is the MSB within GPAW: physical_mask &= GENMASK_ULL(td_info.gpa_width - 2, 0); '2' here smells bad, but well... Given that physical_mask is now contiguous we can truncate anything from e820 that cannot be addressed with adjusted __PHYSICAL_MASK: iff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c index bc0657f0deed..16d57a8769e8 100644 --- a/arch/x86/kernel/e820.c +++ b/arch/x86/kernel/e820.c @@ -833,6 +833,9 @@ static unsigned long __init e820_end_pfn(unsigned long limit_pfn, enum e820_type unsigned long last_pfn = 0; unsigned long max_arch_pfn = MAX_ARCH_PFN; + if (max_arch_pfn > PHYS_PFN(__PHYSICAL_MASK + 1)) + max_arch_pfn = PHYS_PFN(__PHYSICAL_MASK + 1); + for (i = 0; i < e820_table->nr_entries; i++) { struct e820_entry *entry = &e820_table->entries[i]; unsigned long start_pfn; Does it look reasonable?> FWIW, from a VMM perspective, I'm pretty sure the only sane approach is to force > GPAW=1, a.k.a. SHARED bit == 51, if host.MAXPHYADDR>=49. But on the guest side, > I think we should be paranoid.-- Kirill A. Shutemov