Andres Lagar-Cavilla
2012-Mar-01 03:15 UTC
[PATCH 0 of 3] RFC Paging support for AMD NPT V2
There has been some progress, but still no joy. Definitely not intended for inclusion at this point. Tim, Wei, I added a Xen command line toggle to disable IOMMU and P2M table sharing. Tim, I verified that changes to p2m-pt.c don''t break shadow mode (64bit hypervisor and Win 7 guest). Hongkaixing, I incorporated your suggestion in patch 2, so I should add your Signed-off-by eventually. Please review. Olaf, I do not see errors when mapping pages prior to eviction. Let me know how it goes for you. If I unleash xenpaging on a domain for a non-ambitious target (64 MiBs paged out for one minute), we sail by just fine. But if I kick things up a notch, still dying on an VMEXIT_SHUTDOWN (a.k.a. triple fault). Not sure at all what to do next. Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> Signed-off-by: Adin Scannell <adin@scannell.ca> xen/drivers/passthrough/iommu.c | 2 + xen/arch/x86/mm/p2m-pt.c | 56 ++++++++++++++++++++++++++++------------ xen/arch/x86/mm/mem_event.c | 7 +++- xen/arch/x86/mm/mem_sharing.c | 7 +++++ 4 files changed, 53 insertions(+), 19 deletions(-)
Andres Lagar-Cavilla
2012-Mar-01 03:15 UTC
[PATCH 1 of 3] IOMMU: Add command line param to disable sharing of IOMMU and hap tables
xen/drivers/passthrough/iommu.c | 2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) The default is 1, and the command line parameter sets to 1. Disabling may be desired if the host will contain VMs that do paging, sharing or mem access, and won''t be doing passthrough. These two features are mutually exclusive for AMD processors. Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> diff -r 6532b3695d7b -r e4013da987e2 xen/drivers/passthrough/iommu.c --- a/xen/drivers/passthrough/iommu.c +++ b/xen/drivers/passthrough/iommu.c @@ -87,6 +87,8 @@ static void __init parse_iommu_param(cha iommu_dom0_strict = 1; else if ( !strcmp(s, "sharept") ) iommu_hap_pt_share = 1; + else if ( !strcmp(s, "no-sharept") ) + iommu_hap_pt_share = 0; s = ss + 1; } while ( ss );
Andres Lagar-Cavilla
2012-Mar-01 03:15 UTC
[PATCH 2 of 3] x86/mm: Teach paging to page table-based p2m
xen/arch/x86/mm/p2m-pt.c | 56 +++++++++++++++++++++++++++++++++-------------- 1 files changed, 39 insertions(+), 17 deletions(-) The p2m-pt.c code, used by both shadow and AMD NPT modes, was not aware of paging types, and the implications those types have on p2m entries. Add support to the page table-based p2m to understand the paging types. This is a necessary step towards enabling memory paging on AMD NPT mode, but not yet the full solution. Tested not to break neither shadow mode nor "normal" (i.e. no paging) AMD NPT mode. Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> diff -r e4013da987e2 -r d6c3c77ad749 xen/arch/x86/mm/p2m-pt.c --- a/xen/arch/x86/mm/p2m-pt.c +++ b/xen/arch/x86/mm/p2m-pt.c @@ -53,6 +53,20 @@ #define P2M_BASE_FLAGS \ (_PAGE_PRESENT | _PAGE_USER | _PAGE_DIRTY | _PAGE_ACCESSED) +#ifdef __x86_64__ +/* l1e_from_pfn is not designed to have INVALID_MFN stored. The 0xff..ff + * value tramples over the higher-order bits used for flags (NX, p2mt, + * etc.) This happens for paging entries. Thus we do this clip/unclip + * juggle for l1 entries only (no paging superpages!) */ +#define EFF_MFN_WIDTH (PADDR_BITS-PAGE_SHIFT) /* 40 bits */ +#define clipped_mfn(mfn) ((mfn) & ((1UL << EFF_MFN_WIDTH) - 1)) +#define unclip_mfn(mfn) (((mfn) == clipped_mfn(INVALID_MFN)) ? \ + INVALID_MFN : (mfn)) +#else +#define clipped_mfn(mfn) (mfn) +#define unclip_mfn(mfn) (mfn) +#endif /* __x86_64__ */ + static unsigned long p2m_type_to_flags(p2m_type_t t, mfn_t mfn) { unsigned long flags; @@ -77,6 +91,9 @@ static unsigned long p2m_type_to_flags(p case p2m_invalid: case p2m_mmio_dm: case p2m_populate_on_demand: + case p2m_ram_paging_out: + case p2m_ram_paged: + case p2m_ram_paging_in: default: return flags; case p2m_ram_ro: @@ -168,7 +185,7 @@ p2m_next_level(struct p2m_domain *p2m, m shift, max)) ) return 0; - /* PoD: Not present doesn''t imply empty. */ + /* PoD/paging: Not present doesn''t imply empty. */ if ( !l1e_get_flags(*p2m_entry) ) { struct page_info *pg; @@ -384,8 +401,8 @@ p2m_set_entry(struct p2m_domain *p2m, un 0, L1_PAGETABLE_ENTRIES); ASSERT(p2m_entry); - if ( mfn_valid(mfn) || (p2mt == p2m_mmio_direct) ) - entry_content = l1e_from_pfn(mfn_x(mfn), + if ( mfn_valid(mfn) || (p2mt == p2m_mmio_direct) || p2m_is_paging(p2mt) ) + entry_content = l1e_from_pfn(clipped_mfn(mfn_x(mfn)), p2m_type_to_flags(p2mt, mfn)); else entry_content = l1e_empty(); @@ -393,7 +410,7 @@ p2m_set_entry(struct p2m_domain *p2m, un if ( entry_content.l1 != 0 ) { p2m_add_iommu_flags(&entry_content, 0, iommu_pte_flags); - old_mfn = l1e_get_pfn(*p2m_entry); + old_mfn = unclip_mfn(l1e_get_pfn(*p2m_entry)); } /* level 1 entry */ p2m->write_p2m_entry(p2m, gfn, p2m_entry, table_mfn, entry_content, 1); @@ -615,11 +632,12 @@ pod_retry_l1: sizeof(l1e)); if ( ret == 0 ) { + unsigned long l1e_mfn = unclip_mfn(l1e_get_pfn(l1e)); p2mt = p2m_flags_to_type(l1e_get_flags(l1e)); - ASSERT(l1e_get_pfn(l1e) != INVALID_MFN || !p2m_is_ram(p2mt)); + ASSERT( (l1e_mfn != INVALID_MFN || !p2m_is_ram(p2mt)) || + (l1e_mfn == INVALID_MFN && p2m_is_paging(p2mt)) ); - if ( p2m_flags_to_type(l1e_get_flags(l1e)) - == p2m_populate_on_demand ) + if ( p2mt == p2m_populate_on_demand ) { /* The read has succeeded, so we know that the mapping * exits at this point. */ @@ -641,7 +659,7 @@ pod_retry_l1: } if ( p2m_is_valid(p2mt) || p2m_is_grant(p2mt) ) - mfn = _mfn(l1e_get_pfn(l1e)); + mfn = _mfn(l1e_mfn); else /* XXX see above */ p2mt = p2m_mmio_dm; @@ -663,6 +681,8 @@ p2m_gfn_to_mfn(struct p2m_domain *p2m, u paddr_t addr = ((paddr_t)gfn) << PAGE_SHIFT; l2_pgentry_t *l2e; l1_pgentry_t *l1e; + unsigned long l1e_flags; + p2m_type_t l1t; ASSERT(paging_mode_translate(p2m->domain)); @@ -781,10 +801,12 @@ pod_retry_l2: l1e = map_domain_page(mfn_x(mfn)); l1e += l1_table_offset(addr); pod_retry_l1: - if ( (l1e_get_flags(*l1e) & _PAGE_PRESENT) == 0 ) + l1e_flags = l1e_get_flags(*l1e); + l1t = p2m_flags_to_type(l1e_flags); + if ( ((l1e_flags & _PAGE_PRESENT) == 0) && (!p2m_is_paging(l1t)) ) { /* PoD: Try to populate */ - if ( p2m_flags_to_type(l1e_get_flags(*l1e)) == p2m_populate_on_demand ) + if ( l1t == p2m_populate_on_demand ) { if ( q != p2m_query ) { if ( !p2m_pod_demand_populate(p2m, gfn, PAGE_ORDER_4K, q) ) @@ -792,15 +814,15 @@ pod_retry_l1: } else *t = p2m_populate_on_demand; } - + unmap_domain_page(l1e); return _mfn(INVALID_MFN); } - mfn = _mfn(l1e_get_pfn(*l1e)); - *t = p2m_flags_to_type(l1e_get_flags(*l1e)); + mfn = _mfn(unclip_mfn(l1e_get_pfn(*l1e))); + *t = l1t; unmap_domain_page(l1e); - ASSERT(mfn_valid(mfn) || !p2m_is_ram(*t)); + ASSERT(mfn_valid(mfn) || !p2m_is_ram(*t) || p2m_is_paging(*t)); if ( page_order ) *page_order = PAGE_ORDER_4K; return (p2m_is_valid(*t) || p2m_is_grant(*t)) ? mfn : _mfn(INVALID_MFN); @@ -914,7 +936,7 @@ static void p2m_change_type_global(struc flags = l1e_get_flags(l1e[i1]); if ( p2m_flags_to_type(flags) != ot ) continue; - mfn = l1e_get_pfn(l1e[i1]); + mfn = unclip_mfn(l1e_get_pfn(l1e[i1])); gfn = i1 + (i2 + (i3 #if CONFIG_PAGING_LEVELS >= 4 + (i4 * L3_PAGETABLE_ENTRIES) @@ -923,7 +945,7 @@ static void p2m_change_type_global(struc * L2_PAGETABLE_ENTRIES) * L1_PAGETABLE_ENTRIES; /* create a new 1le entry with the new type */ flags = p2m_type_to_flags(nt, _mfn(mfn)); - l1e_content = l1e_from_pfn(mfn, flags); + l1e_content = l1e_from_pfn(clipped_mfn(mfn), flags); p2m->write_p2m_entry(p2m, gfn, &l1e[i1], l1mfn, l1e_content, 1); } @@ -1073,7 +1095,7 @@ long p2m_pt_audit_p2m(struct p2m_domain entry_count++; continue; } - mfn = l1e_get_pfn(l1e[i1]); + mfn = unclip_mfn(l1e_get_pfn(l1e[i1])); ASSERT(mfn_valid(_mfn(mfn))); m2pfn = get_gpfn_from_mfn(mfn); if ( m2pfn != gfn &&
Andres Lagar-Cavilla
2012-Mar-01 03:15 UTC
[PATCH 3 of 3] x86/mm: Enable paging and sharing in AMD NPT mode
xen/arch/x86/mm/mem_event.c | 7 +++++-- xen/arch/x86/mm/mem_sharing.c | 7 +++++++ 2 files changed, 12 insertions(+), 2 deletions(-) Both features are mutually exclusive with sharing iommu and p2m tables. Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> Signed-off-by: Adin Scannell <adin@scannell.ca> diff -r d6c3c77ad749 -r 4c6bee5a191a xen/arch/x86/mm/mem_event.c --- a/xen/arch/x86/mm/mem_event.c +++ b/xen/arch/x86/mm/mem_event.c @@ -550,8 +550,11 @@ int mem_event_domctl(struct domain *d, x if ( !hap_enabled(d) ) break; - /* Currently only EPT is supported */ - if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ) + /* Currently EPT or AMD with no iommu/hap page table sharing are + * supported */ + if ( !((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) || + ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) && + !iommu_hap_pt_share)) ) break; rc = -EXDEV; diff -r d6c3c77ad749 -r 4c6bee5a191a xen/arch/x86/mm/mem_sharing.c --- a/xen/arch/x86/mm/mem_sharing.c +++ b/xen/arch/x86/mm/mem_sharing.c @@ -1202,6 +1202,13 @@ int mem_sharing_domctl(struct domain *d, if ( !hap_enabled(d) ) return -ENODEV; + /* Currently EPT or AMD with no iommu/hap page table sharing are + * supported */ + if ( !((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) || + ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) && + !iommu_hap_pt_share)) ) + return -ENODEV; + switch(mec->op) { case XEN_DOMCTL_MEM_SHARING_CONTROL:
> -----Original Message----- > From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org] > Sent: Thursday, March 01, 2012 11:16 AM > To: xen-devel@lists.xensource.com > Cc: tim@xen.org; andres@gridcentric.ca; olaf@aepfle.de; adin@gridcentric.ca; wei.wang2@amd.com; hongkaixing@huawei.com > Subject: [PATCH 0 of 3] RFC Paging support for AMD NPT V2 > > There has been some progress, but still no joy. Definitely not intended for > inclusion at this point. > > Tim, Wei, I added a Xen command line toggle to disable IOMMU and P2M table > sharing. > > Tim, I verified that changes to p2m-pt.c don''t break shadow mode (64bit > hypervisor and Win 7 guest). > > Hongkaixing, I incorporated your suggestion in patch 2, so I should add your > Signed-off-by eventually. Please review.I have checked the code, it looks like OK. But I don''t have AMD machine to run it. We used almost the same code in a AMD server months before, and it worked fine. The only difference is we have not changed the clipped invalid mfn to INVALID_MFN, just like EPT.> > Olaf, I do not see errors when mapping pages prior to eviction. Let me know how > it goes for you. > > If I unleash xenpaging on a domain for a non-ambitious target (64 MiBs paged > out for one minute), we sail by just fine. But if I kick things up a notch, > still dying on an VMEXIT_SHUTDOWN (a.k.a. triple fault). Not sure at all what > to do next. > > Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> > Signed-off-by: Adin Scannell <adin@scannell.ca> > > xen/drivers/passthrough/iommu.c | 2 + > xen/arch/x86/mm/p2m-pt.c | 56 ++++++++++++++++++++++++++++------------ > xen/arch/x86/mm/mem_event.c | 7 +++- > xen/arch/x86/mm/mem_sharing.c | 7 +++++ > 4 files changed, 53 insertions(+), 19 deletions(-)
Andres Lagar-Cavilla
2012-Mar-01 16:34 UTC
Re: [PATCH 0 of 3] RFC Paging support for AMD NPT V2
> > >> -----Original Message----- >> From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org] >> Sent: Thursday, March 01, 2012 11:16 AM >> To: xen-devel@lists.xensource.com >> Cc: tim@xen.org; andres@gridcentric.ca; olaf@aepfle.de; >> adin@gridcentric.ca; wei.wang2@amd.com; hongkaixing@huawei.com >> Subject: [PATCH 0 of 3] RFC Paging support for AMD NPT V2 >> >> There has been some progress, but still no joy. Definitely not intended >> for >> inclusion at this point. >> >> Tim, Wei, I added a Xen command line toggle to disable IOMMU and P2M >> table >> sharing. >> >> Tim, I verified that changes to p2m-pt.c don''t break shadow mode (64bit >> hypervisor and Win 7 guest). >> >> Hongkaixing, I incorporated your suggestion in patch 2, so I should add >> your >> Signed-off-by eventually. Please review. > > I have checked the code, it looks like OK. But I don''t have AMD machine > to run it. > We used almost the same code in a AMD server months before, and it > worked fine. > The only difference is we have not changed the clipped invalid mfn to > INVALID_MFN, just like EPT.That is most interesting. Can you elaborate? do you have functional paging on AMD since "months before"? If you do, can you please post it to xen-devel for inclusion? Thanks a lot, Andres> > >> >> Olaf, I do not see errors when mapping pages prior to eviction. Let me >> know how >> it goes for you. >> >> If I unleash xenpaging on a domain for a non-ambitious target (64 MiBs >> paged >> out for one minute), we sail by just fine. But if I kick things up a >> notch, >> still dying on an VMEXIT_SHUTDOWN (a.k.a. triple fault). Not sure at all >> what >> to do next. >> >> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> >> Signed-off-by: Adin Scannell <adin@scannell.ca> >> >> xen/drivers/passthrough/iommu.c | 2 + >> xen/arch/x86/mm/p2m-pt.c | 56 >> ++++++++++++++++++++++++++++------------ >> xen/arch/x86/mm/mem_event.c | 7 +++- >> xen/arch/x86/mm/mem_sharing.c | 7 +++++ >> 4 files changed, 53 insertions(+), 19 deletions(-) > >
> -----Original Message----- > From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org] > Sent: Friday, March 02, 2012 12:34 AM > To: Hongkaixing > Cc: xen-devel@lists.xensource.com; tim@xen.org; andres@gridcentric.ca; olaf@aepfle.de; adin@gridcentric.ca; wei.wang2@amd.com; > yanqiangjun@huawei.com > Subject: RE: [PATCH 0 of 3] RFC Paging support for AMD NPT V2 > > > > > > >> -----Original Message----- > >> From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org] > >> Sent: Thursday, March 01, 2012 11:16 AM > >> To: xen-devel@lists.xensource.com > >> Cc: tim@xen.org; andres@gridcentric.ca; olaf@aepfle.de; > >> adin@gridcentric.ca; wei.wang2@amd.com; hongkaixing@huawei.com > >> Subject: [PATCH 0 of 3] RFC Paging support for AMD NPT V2 > >> > >> There has been some progress, but still no joy. Definitely not intended > >> for > >> inclusion at this point. > >> > >> Tim, Wei, I added a Xen command line toggle to disable IOMMU and P2M > >> table > >> sharing. > >> > >> Tim, I verified that changes to p2m-pt.c don''t break shadow mode (64bit > >> hypervisor and Win 7 guest). > >> > >> Hongkaixing, I incorporated your suggestion in patch 2, so I should add > >> your > >> Signed-off-by eventually. Please review. > > > > I have checked the code, it looks like OK. But I don''t have AMD machine > > to run it. > > We used almost the same code in a AMD server months before, and it > > worked fine. > > The only difference is we have not changed the clipped invalid mfn to > > INVALID_MFN, just like EPT. > > That is most interesting. Can you elaborate? do you have functional paging > on AMD since "months before"? If you do, can you please post it to > xen-devel for inclusion? > > Thanks a lot, > AndresAll what we have done is shown below. The code is based on xen 4.0.2, it looks informal, but it works fine when xenpaging target is set at arbitrary number.. Now we do not have AMD platform to validate, so I am not sure it can work under unstable code. Index: E:/xen-4.0.2/xen/arch/x86/mm/p2m.c ==================================================================--- E:/xen-4.0.2/xen/arch/x86/mm/p2m.c (revision 30771) +++ E:/xen-4.0.2/xen/arch/x86/mm/p2m.c (revision 30772) @@ -77,6 +77,13 @@ #define SUPERPAGE_PAGES (1UL << 9) #define superpage_aligned(_x) (((_x)&(SUPERPAGE_PAGES-1))==0) +#ifdef __x86_64__ +/* there are only 40bits for pfn in PTE */ +#define MASK_MFN(x) ( x & ((1UL << 40) -1 )) +#else +#define MASK_MFN(x) ( x ) +#endif + static unsigned long p2m_type_to_flags(p2m_type_t t) { unsigned long flags; @@ -1306,6 +1313,8 @@ if ( mfn_valid(mfn) || (p2mt == p2m_mmio_direct) ) entry_content = l1e_from_pfn(mfn_x(mfn), p2m_type_to_flags(p2mt)); + else if ( p2m_is_paging( p2mt ) ) + entry_content = l1e_from_pfn( MASK_MFN(mfn_x(mfn)), p2m_type_to_flags(p2mt)); else entry_content = l1e_empty(); @@ -1370,6 +1379,7 @@ paddr_t addr = ((paddr_t)gfn) << PAGE_SHIFT; l2_pgentry_t *l2e; l1_pgentry_t *l1e; + unsigned long flags; ASSERT(paging_mode_translate(d)); @@ -1455,7 +1465,8 @@ l1e = map_domain_page(mfn_x(mfn)); l1e += l1_table_offset(addr); pod_retry_l1: - if ( (l1e_get_flags(*l1e) & _PAGE_PRESENT) == 0 ) + flags = l1e_get_flags(*l1e); + if ((( flags & _PAGE_PRESENT) == 0) && (!p2m_is_paging(p2m_flags_to_type(flags)))) { /* PoD: Try to populate */ if ( p2m_flags_to_type(l1e_get_flags(*l1e)) == p2m_populate_on_demand ) Index: E: /xen-4.0.2/xen/arch/x86/mm/mem_event.c ==================================================================--- E:/xen-4.0.2/xen/arch/x86/mm/mem_event.c (revision 30771) +++ E:/xen-4.0.2/xen/arch/x86/mm/mem_event.c (revision 30772) @@ -231,10 +231,9 @@ if ( d->mem_event.ring_page ) break; - /* Currently only EPT is supported */ + /* Both AMD NPT and intel EPT are supported */ rc = -ENODEV; - if ( !(is_hvm_domain(d) && d->arch.hvm_domain.hap_enabled && - (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)) ) + if ( !(is_hvm_domain(d) && d->arch.hvm_domain.hap_enabled )) break; /* Get MFN of ring page */> > > > > > >> > >> Olaf, I do not see errors when mapping pages prior to eviction. Let me > >> know how > >> it goes for you. > >> > >> If I unleash xenpaging on a domain for a non-ambitious target (64 MiBs > >> paged > >> out for one minute), we sail by just fine. But if I kick things up a > >> notch, > >> still dying on an VMEXIT_SHUTDOWN (a.k.a. triple fault). Not sure at all > >> what > >> to do next. > >> > >> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> > >> Signed-off-by: Adin Scannell <adin@scannell.ca> > >> > >> xen/drivers/passthrough/iommu.c | 2 + > >> xen/arch/x86/mm/p2m-pt.c | 56 > >> ++++++++++++++++++++++++++++------------ > >> xen/arch/x86/mm/mem_event.c | 7 +++- > >> xen/arch/x86/mm/mem_sharing.c | 7 +++++ > >> 4 files changed, 53 insertions(+), 19 deletions(-) > > > >
Andres Lagar-Cavilla
2012-Mar-02 15:55 UTC
Re: [PATCH 0 of 3] RFC Paging support for AMD NPT V2
> > >> -----Original Message----- >> From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org] >> Sent: Friday, March 02, 2012 12:34 AM >> To: Hongkaixing >> Cc: xen-devel@lists.xensource.com; tim@xen.org; andres@gridcentric.ca; >> olaf@aepfle.de; adin@gridcentric.ca; wei.wang2@amd.com; >> yanqiangjun@huawei.com >> Subject: RE: [PATCH 0 of 3] RFC Paging support for AMD NPT V2 >> >> > >> > >> >> -----Original Message----- >> >> From: Andres Lagar-Cavilla [mailto:andres@lagarcavilla.org] >> >> Sent: Thursday, March 01, 2012 11:16 AM >> >> To: xen-devel@lists.xensource.com >> >> Cc: tim@xen.org; andres@gridcentric.ca; olaf@aepfle.de; >> >> adin@gridcentric.ca; wei.wang2@amd.com; hongkaixing@huawei.com >> >> Subject: [PATCH 0 of 3] RFC Paging support for AMD NPT V2 >> >> >> >> There has been some progress, but still no joy. Definitely not >> intended >> >> for >> >> inclusion at this point. >> >> >> >> Tim, Wei, I added a Xen command line toggle to disable IOMMU and P2M >> >> table >> >> sharing. >> >> >> >> Tim, I verified that changes to p2m-pt.c don''t break shadow mode >> (64bit >> >> hypervisor and Win 7 guest). >> >> >> >> Hongkaixing, I incorporated your suggestion in patch 2, so I should >> add >> >> your >> >> Signed-off-by eventually. Please review. >> > >> > I have checked the code, it looks like OK. But I don''t have AMD >> machine >> > to run it. >> > We used almost the same code in a AMD server months before, and it >> > worked fine. >> > The only difference is we have not changed the clipped invalid mfn >> to >> > INVALID_MFN, just like EPT. >> >> That is most interesting. Can you elaborate? do you have functional >> paging >> on AMD since "months before"? If you do, can you please post it to >> xen-devel for inclusion? >> >> Thanks a lot, >> Andres > > All what we have done is shown below. > The code is based on xen 4.0.2, it looks informal, but it works fine when > xenpaging target is set at arbitrary number.. > Now we do not have AMD platform to validate, so I am not sure it can work > under unstable code.Thanks. That''s actually great. The code is roughly the same, and it shows we''re on a good lead. Cheers! Andres> > Index: E:/xen-4.0.2/xen/arch/x86/mm/p2m.c > ==================================================================> --- E:/xen-4.0.2/xen/arch/x86/mm/p2m.c (revision 30771) > +++ E:/xen-4.0.2/xen/arch/x86/mm/p2m.c (revision 30772) > @@ -77,6 +77,13 @@ > #define SUPERPAGE_PAGES (1UL << 9) > #define superpage_aligned(_x) (((_x)&(SUPERPAGE_PAGES-1))==0) > > +#ifdef __x86_64__ > +/* there are only 40bits for pfn in PTE */ > +#define MASK_MFN(x) ( x & ((1UL << 40) -1 )) > +#else > +#define MASK_MFN(x) ( x ) > +#endif > + > static unsigned long p2m_type_to_flags(p2m_type_t t) > { > unsigned long flags; > @@ -1306,6 +1313,8 @@ > > if ( mfn_valid(mfn) || (p2mt == p2m_mmio_direct) ) > entry_content = l1e_from_pfn(mfn_x(mfn), > p2m_type_to_flags(p2mt)); > + else if ( p2m_is_paging( p2mt ) ) > + entry_content = l1e_from_pfn( MASK_MFN(mfn_x(mfn)), > p2m_type_to_flags(p2mt)); > else > entry_content = l1e_empty(); > > @@ -1370,6 +1379,7 @@ > paddr_t addr = ((paddr_t)gfn) << PAGE_SHIFT; > l2_pgentry_t *l2e; > l1_pgentry_t *l1e; > + unsigned long flags; > > ASSERT(paging_mode_translate(d)); > > @@ -1455,7 +1465,8 @@ > l1e = map_domain_page(mfn_x(mfn)); > l1e += l1_table_offset(addr); > pod_retry_l1: > - if ( (l1e_get_flags(*l1e) & _PAGE_PRESENT) == 0 ) > + flags = l1e_get_flags(*l1e); > + if ((( flags & _PAGE_PRESENT) == 0) && > (!p2m_is_paging(p2m_flags_to_type(flags)))) > { > /* PoD: Try to populate */ > if ( p2m_flags_to_type(l1e_get_flags(*l1e)) => p2m_populate_on_demand ) > Index: E: /xen-4.0.2/xen/arch/x86/mm/mem_event.c > ==================================================================> --- E:/xen-4.0.2/xen/arch/x86/mm/mem_event.c (revision 30771) > +++ E:/xen-4.0.2/xen/arch/x86/mm/mem_event.c (revision 30772) > @@ -231,10 +231,9 @@ > if ( d->mem_event.ring_page ) > break; > > - /* Currently only EPT is supported */ > + /* Both AMD NPT and intel EPT are supported */ > rc = -ENODEV; > - if ( !(is_hvm_domain(d) && d->arch.hvm_domain.hap_enabled && > - (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL)) ) > + if ( !(is_hvm_domain(d) && d->arch.hvm_domain.hap_enabled )) > break; > > /* Get MFN of ring page */ >> >> > >> > >> >> >> >> Olaf, I do not see errors when mapping pages prior to eviction. Let >> me >> >> know how >> >> it goes for you. >> >> >> >> If I unleash xenpaging on a domain for a non-ambitious target (64 >> MiBs >> >> paged >> >> out for one minute), we sail by just fine. But if I kick things up a >> >> notch, >> >> still dying on an VMEXIT_SHUTDOWN (a.k.a. triple fault). Not sure at >> all >> >> what >> >> to do next. >> >> >> >> Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> >> >> Signed-off-by: Adin Scannell <adin@scannell.ca> >> >> >> >> xen/drivers/passthrough/iommu.c | 2 + >> >> xen/arch/x86/mm/p2m-pt.c | 56 >> >> ++++++++++++++++++++++++++++------------ >> >> xen/arch/x86/mm/mem_event.c | 7 +++- >> >> xen/arch/x86/mm/mem_sharing.c | 7 +++++ >> >> 4 files changed, 53 insertions(+), 19 deletions(-) >> > >> > > > >
Tim Deegan
2012-Mar-08 13:30 UTC
Re: [PATCH 3 of 3] x86/mm: Enable paging and sharing in AMD NPT mode
At 22:15 -0500 on 29 Feb (1330553758), Andres Lagar-Cavilla wrote:> xen/arch/x86/mm/mem_event.c | 7 +++++-- > xen/arch/x86/mm/mem_sharing.c | 7 +++++++ > 2 files changed, 12 insertions(+), 2 deletions(-) > > > Both features are mutually exclusive with sharing iommu and p2m tables. > > Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> > Signed-off-by: Adin Scannell <adin@scannell.ca> > > diff -r d6c3c77ad749 -r 4c6bee5a191a xen/arch/x86/mm/mem_event.c > --- a/xen/arch/x86/mm/mem_event.c > +++ b/xen/arch/x86/mm/mem_event.c > @@ -550,8 +550,11 @@ int mem_event_domctl(struct domain *d, x > if ( !hap_enabled(d) ) > break; > > - /* Currently only EPT is supported */ > - if ( boot_cpu_data.x86_vendor != X86_VENDOR_INTEL ) > + /* Currently EPT or AMD with no iommu/hap page table sharing are > + * supported */ > + if ( !((boot_cpu_data.x86_vendor == X86_VENDOR_INTEL) || > + ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) && > + !iommu_hap_pt_share)) )AFAICT p2m tricks are not going to work with IOMMU/passthrough at all, since there''s no mechanism to fault and retry. So the interlock should really be against need_iommu() or similar. (And vice versa, to stop a device being passed through to a VM that has sharing/paging/events enabled). Cheers, Tim
Tim Deegan
2012-Mar-08 14:15 UTC
Re: [PATCH 2 of 3] x86/mm: Teach paging to page table-based p2m
At 22:15 -0500 on 29 Feb (1330553757), Andres Lagar-Cavilla wrote:> --- a/xen/arch/x86/mm/p2m-pt.c > +++ b/xen/arch/x86/mm/p2m-pt.c > @@ -53,6 +53,20 @@ > #define P2M_BASE_FLAGS \ > (_PAGE_PRESENT | _PAGE_USER | _PAGE_DIRTY | _PAGE_ACCESSED) > > +#ifdef __x86_64__ > +/* l1e_from_pfn is not designed to have INVALID_MFN stored. The 0xff..ff > + * value tramples over the higher-order bits used for flags (NX, p2mt, > + * etc.) This happens for paging entries. Thus we do this clip/unclip > + * juggle for l1 entries only (no paging superpages!) */ > +#define EFF_MFN_WIDTH (PADDR_BITS-PAGE_SHIFT) /* 40 bits */ > +#define clipped_mfn(mfn) ((mfn) & ((1UL << EFF_MFN_WIDTH) - 1)) > +#define unclip_mfn(mfn) (((mfn) == clipped_mfn(INVALID_MFN)) ? \ > + INVALID_MFN : (mfn)) > +#else > +#define clipped_mfn(mfn) (mfn) > +#define unclip_mfn(mfn) (mfn) > +#endif /* __x86_64__ */Hmmm. If we need to have this, we should probably have it in the main l*_from_pfn and l*_get_pfn code rather than needing to scatter it around in the callers. And we need a check in the e820 map to make sure we don''t ever use MFN 0xffffffff (once CPUs start supporting it). The alternative would be to add another type so we don''t have to store INVALID_MFN in p2m_ram_paging_in entries. Cheers, Tim.
Andres Lagar-Cavilla
2012-Mar-14 19:12 UTC
Re: [PATCH 2 of 3] x86/mm: Teach paging to page table-based p2m
> At 22:15 -0500 on 29 Feb (1330553757), Andres Lagar-Cavilla wrote: >> --- a/xen/arch/x86/mm/p2m-pt.c >> +++ b/xen/arch/x86/mm/p2m-pt.c >> @@ -53,6 +53,20 @@ >> #define P2M_BASE_FLAGS \ >> (_PAGE_PRESENT | _PAGE_USER | _PAGE_DIRTY | _PAGE_ACCESSED) >> >> +#ifdef __x86_64__ >> +/* l1e_from_pfn is not designed to have INVALID_MFN stored. The >> 0xff..ff >> + * value tramples over the higher-order bits used for flags (NX, p2mt, >> + * etc.) This happens for paging entries. Thus we do this clip/unclip >> + * juggle for l1 entries only (no paging superpages!) */ >> +#define EFF_MFN_WIDTH (PADDR_BITS-PAGE_SHIFT) /* 40 bits */ >> +#define clipped_mfn(mfn) ((mfn) & ((1UL << EFF_MFN_WIDTH) - 1)) >> +#define unclip_mfn(mfn) (((mfn) == clipped_mfn(INVALID_MFN)) ? \ >> + INVALID_MFN : (mfn)) >> +#else >> +#define clipped_mfn(mfn) (mfn) >> +#define unclip_mfn(mfn) (mfn) >> +#endif /* __x86_64__ */ > > Hmmm. If we need to have this, we should probably have it in the main > l*_from_pfn and l*_get_pfn code rather than needing to scatter it around > in the callers. And we need a check in the e820 map to make sure we > don''t ever use MFN 0xffffffff (once CPUs start supporting it). > > The alternative would be to add another type so we don''t have to store > INVALID_MFN in p2m_ram_paging_in entries.A lot of callers store INVALID_MFN into p2m entries (clear_mmio_p2m_entry, p2m_remove_page, more). For all of them, as well as for paging eviction, what matters is the type stored, not the mfn. That is why retrieving the INVALID_MFN is not the problem. The ept code itself clips the INVALID_MFN (typecast to bitfield in ept_entry union) and everything seems to work fine when returning the clipped INVALID_MFN: no one actually cares about that mfn. Because of that, I believe it''s neither necessary to unclip on the extraction path, nor to take any special precautions in the e820. But for p2m-pt, all the callers that set INVALID_MFN seem to work purely by chance. In all cases INVALID_MFN will trample over the higher order bits that are supposed to store the type. When reading the entry, the p2m-pt code will not understand, default to p2m_mmio_dm type, and that seems to be good enough (but not good enough when the type was supposed to be paged_out). So, I could submit one patch to clip the INVALID_MFN for p2m-pt in the 4k page case of set_entry. That is the only place where we need it, and avoids subtly changing semantics for a zillion other callers. Or, we could change the paging code to store _mfn(0). This is the way of PoD. I get the feeling George got lucky, or knew about this all along :) The key, again, is not to trample over the high order bits. Then, the rest of this patch, adding if''s and changing asserts, can be dealt with separately. Thanks, Andres> > Cheers, > > Tim. >
Tim Deegan
2012-Mar-15 10:52 UTC
Re: [PATCH 2 of 3] x86/mm: Teach paging to page table-based p2m
At 12:12 -0700 on 14 Mar (1331727126), Andres Lagar-Cavilla wrote:> > Hmmm. If we need to have this, we should probably have it in the main > > l*_from_pfn and l*_get_pfn code rather than needing to scatter it around > > in the callers. And we need a check in the e820 map to make sure we > > don''t ever use MFN 0xffffffff (once CPUs start supporting it). > > > > The alternative would be to add another type so we don''t have to store > > INVALID_MFN in p2m_ram_paging_in entries. > > A lot of callers store INVALID_MFN into p2m entries (clear_mmio_p2m_entry, > p2m_remove_page, more). For all of them, as well as for paging eviction, > what matters is the type stored, not the mfn. > > That is why retrieving the INVALID_MFN is not the problem. The ept code > itself clips the INVALID_MFN (typecast to bitfield in ept_entry union) and > everything seems to work fine when returning the clipped INVALID_MFN: no > one actually cares about that mfn. > > Because of that, I believe it''s neither necessary to unclip on the > extraction path, nor to take any special precautions in the e820.Righto. In that case, I''d be happy with just clipping MFNs and not trying to unpack them. But I think it should happen in the main pte-building macros, not scattered around the p2m code. It should just be a matter of using PADDR_MASK in the right place. Cheers, Tim.
Andres Lagar-Cavilla
2012-Mar-15 15:46 UTC
Re: [PATCH 2 of 3] x86/mm: Teach paging to page table-based p2m
> At 12:12 -0700 on 14 Mar (1331727126), Andres Lagar-Cavilla wrote: >> > Hmmm. If we need to have this, we should probably have it in the main >> > l*_from_pfn and l*_get_pfn code rather than needing to scatter it >> around >> > in the callers. And we need a check in the e820 map to make sure we >> > don''t ever use MFN 0xffffffff (once CPUs start supporting it). >> > >> > The alternative would be to add another type so we don''t have to store >> > INVALID_MFN in p2m_ram_paging_in entries. >> >> A lot of callers store INVALID_MFN into p2m entries >> (clear_mmio_p2m_entry, >> p2m_remove_page, more). For all of them, as well as for paging eviction, >> what matters is the type stored, not the mfn. >> >> That is why retrieving the INVALID_MFN is not the problem. The ept code >> itself clips the INVALID_MFN (typecast to bitfield in ept_entry union) >> and >> everything seems to work fine when returning the clipped INVALID_MFN: no >> one actually cares about that mfn. >> >> Because of that, I believe it''s neither necessary to unclip on the >> extraction path, nor to take any special precautions in the e820. > > Righto. In that case, I''d be happy with just clipping MFNs and not > trying to unpack them. But I think it should happen in the main > pte-building macros, not scattered around the p2m code. It should just > be a matter of using PADDR_MASK in the right place.Something along these lines? (RFC, not tested yet) Andres # HG changeset patch # Parent e490b3ca615d0141c1d97a89c1880dc8dc8f0783 Clip mfn to allowable width when building a PTE. Otherwise, INVALID_MFN tramples over high order bits used for additional flags. Signed-off-by: Andres Lagar-Cavilla <andres@lagarcavilla.org> diff -r e490b3ca615d xen/include/asm-x86/page.h --- a/xen/include/asm-x86/page.h +++ b/xen/include/asm-x86/page.h @@ -107,13 +107,17 @@ /* Construct a pte from a pfn and access flags. */ #define l1e_from_pfn(pfn, flags) \ - ((l1_pgentry_t) { ((intpte_t)(pfn) << PAGE_SHIFT) | put_pte_flags(flags) }) + ((l1_pgentry_t) { ((intpte_t)((pfn) & (PADDR_MASK >> PAGE_SHIFT)) << \ + PAGE_SHIFT) | put_pte_flags(flags) }) #define l2e_from_pfn(pfn, flags) \ - ((l2_pgentry_t) { ((intpte_t)(pfn) << PAGE_SHIFT) | put_pte_flags(flags) }) + ((l2_pgentry_t) { ((intpte_t)((pfn) & (PADDR_MASK >> PAGE_SHIFT)) << \ + PAGE_SHIFT) | put_pte_flags(flags) }) #define l3e_from_pfn(pfn, flags) \ - ((l3_pgentry_t) { ((intpte_t)(pfn) << PAGE_SHIFT) | put_pte_flags(flags) }) + ((l3_pgentry_t) { ((intpte_t)((pfn) & (PADDR_MASK >> PAGE_SHIFT)) << \ + PAGE_SHIFT) | put_pte_flags(flags) }) #define l4e_from_pfn(pfn, flags) \ - ((l4_pgentry_t) { ((intpte_t)(pfn) << PAGE_SHIFT) | put_pte_flags(flags) }) + ((l4_pgentry_t) { ((intpte_t)((pfn) & (PADDR_MASK >> PAGE_SHIFT)) << \ + PAGE_SHIFT) | put_pte_flags(flags) }) /* Construct a pte from a physical address and access flags. */ #ifndef __ASSEMBLY__> > Cheers, > > Tim. >
Tim Deegan
2012-Mar-15 17:21 UTC
Re: [PATCH 2 of 3] x86/mm: Teach paging to page table-based p2m
At 08:46 -0700 on 15 Mar (1331801170), Andres Lagar-Cavilla wrote:> > Righto. In that case, I''d be happy with just clipping MFNs and not > > trying to unpack them. But I think it should happen in the main > > pte-building macros, not scattered around the p2m code. It should just > > be a matter of using PADDR_MASK in the right place. > > Something along these lines? (RFC, not tested yet) > Andres > > /* Construct a pte from a pfn and access flags. */ > #define l1e_from_pfn(pfn, flags) \ > - ((l1_pgentry_t) { ((intpte_t)(pfn) << PAGE_SHIFT) | put_pte_flags(flags) }) > + ((l1_pgentry_t) { ((intpte_t)((pfn) & (PADDR_MASK >> PAGE_SHIFT)) << \ > + PAGE_SHIFT) | put_pte_flags(flags) })Yes, that''s the idea. I think> + ((l1_pgentry_t) { (((intpte_t)(pfn) << PAGE_SHIFT) & PADDR_MASK) \ > + | put_pte_flags(flags) })is a little neater, maybe? In any case, I''d like Keir''s ack on this, since it will affect all the PV pagetable code too (hopefully in a trivial and correct way). Tim.
Keir Fraser
2012-Mar-15 17:34 UTC
Re: [PATCH 2 of 3] x86/mm: Teach paging to page table-based p2m
On 15/03/2012 17:21, "Tim Deegan" <tim@xen.org> wrote:> At 08:46 -0700 on 15 Mar (1331801170), Andres Lagar-Cavilla wrote: >>> Righto. In that case, I''d be happy with just clipping MFNs and not >>> trying to unpack them. But I think it should happen in the main >>> pte-building macros, not scattered around the p2m code. It should just >>> be a matter of using PADDR_MASK in the right place. >> >> Something along these lines? (RFC, not tested yet) >> Andres >> >> /* Construct a pte from a pfn and access flags. */ >> #define l1e_from_pfn(pfn, flags) \ >> - ((l1_pgentry_t) { ((intpte_t)(pfn) << PAGE_SHIFT) | put_pte_flags(flags) >> }) >> + ((l1_pgentry_t) { ((intpte_t)((pfn) & (PADDR_MASK >> PAGE_SHIFT)) << \ >> + PAGE_SHIFT) | put_pte_flags(flags) }) > > Yes, that''s the idea. I think > >> + ((l1_pgentry_t) { (((intpte_t)(pfn) << PAGE_SHIFT) & PADDR_MASK) \ >> + | put_pte_flags(flags) }) > > is a little neater, maybe? > > In any case, I''d like Keir''s ack on this, since it will affect all the > PV pagetable code too (hopefully in a trivial and correct way).Fine by me. I don''t think any existing users should be intentionally stuffing non-address bits into a pte via the pfn parameter of a pte-constructor. I wonder though whether you should have your own constructor, or wrapper round the generic constructor, for laundering your filthy nasty pfns? ;-) -- Keir> Tim.