From: Zhang Xiantao <xiantao.zhang@intel.com>

With virtual EPT support, the L1 hypervisor can use EPT hardware for the L2 guest's memory virtualization, which improves L2 guest performance sharply. In our testing, some benchmarks show a more than 5x performance gain.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>

Zhang Xiantao (11):
  nestedhap: Change hostcr3 and p2m->cr3 to meaningful words
  nestedhap: Change nested p2m's walker to vendor-specific
  nested_ept: Implement guest ept's walker
  nested_ept: Add permission check for success case
  EPT: Make ept data structure or operations neutral
  nEPT: Try to enable EPT paging for L2 guest.
  nEPT: Sync PDPTR fields if L2 guest in PAE paging mode
  nEPT: Use minimal permission for nested p2m.
  nEPT: handle invept instruction from L1 VMM
  nEPT: expose EPT capability to L1 VMM
  nVMX: Expose VPID capability to nested VMM.

 xen/arch/x86/hvm/hvm.c                  |    7 +-
 xen/arch/x86/hvm/svm/nestedsvm.c        |   31 +++
 xen/arch/x86/hvm/svm/svm.c              |    3 +-
 xen/arch/x86/hvm/vmx/vmcs.c             |    2 +-
 xen/arch/x86/hvm/vmx/vmx.c              |   76 +++++---
 xen/arch/x86/hvm/vmx/vvmx.c             |  208 ++++++++++++++++++-
 xen/arch/x86/mm/guest_walk.c            |   12 +-
 xen/arch/x86/mm/hap/Makefile            |    1 +
 xen/arch/x86/mm/hap/nested_ept.c        |  345 +++++++++++++++++++++++++++++++
 xen/arch/x86/mm/hap/nested_hap.c        |   79 +++----
 xen/arch/x86/mm/mm-locks.h              |    2 +-
 xen/arch/x86/mm/p2m-ept.c               |   96 +++++++--
 xen/arch/x86/mm/p2m.c                   |   44 +++--
 xen/arch/x86/mm/shadow/multi.c          |    2 +-
 xen/include/asm-x86/guest_pt.h          |    8 +
 xen/include/asm-x86/hvm/hvm.h           |    9 +-
 xen/include/asm-x86/hvm/nestedhvm.h     |    1 +
 xen/include/asm-x86/hvm/svm/nestedsvm.h |    3 +
 xen/include/asm-x86/hvm/vmx/vmcs.h      |   31 ++-
 xen/include/asm-x86/hvm/vmx/vmx.h       |    6 +-
 xen/include/asm-x86/hvm/vmx/vvmx.h      |   29 +++-
 xen/include/asm-x86/p2m.h               |   17 +-
 22 files changed, 859 insertions(+), 153 deletions(-)
 create mode 100644 xen/arch/x86/mm/hap/nested_ept.c
xiantao.zhang@intel.com
2012-Dec-10 17:57 UTC
[PATCH 01/11] nestedhap: Change hostcr3 and p2m->cr3 to meaningful words
From: Zhang Xiantao <xiantao.zhang@intel.com>

VMX doesn't have the concept of a host cr3 for the nested p2m (only SVM has), so change the naming to neutral words.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
---
 xen/arch/x86/hvm/hvm.c | 6 +++---
 xen/arch/x86/hvm/svm/svm.c | 2 +-
 xen/arch/x86/hvm/vmx/vmx.c | 2 +-
 xen/arch/x86/hvm/vmx/vvmx.c | 2 +-
 xen/arch/x86/mm/hap/nested_hap.c | 15 ++++++++-------
 xen/arch/x86/mm/mm-locks.h | 2 +-
 xen/arch/x86/mm/p2m.c | 26 +++++++++++++-------------
 xen/include/asm-x86/hvm/hvm.h | 4 ++--
 xen/include/asm-x86/hvm/vmx/vvmx.h | 2 +-
 xen/include/asm-x86/p2m.h | 16 ++++++++--------
 10 files changed, 39 insertions(+), 38 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index b6026d7..85bc9be 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -4536,10 +4536,10 @@ uint64_t nhvm_vcpu_guestcr3(struct vcpu *v) return -EOPNOTSUPP; } -uint64_t nhvm_vcpu_hostcr3(struct vcpu *v) +uint64_t nhvm_vcpu_p2m_base(struct vcpu *v) { - if (hvm_funcs.nhvm_vcpu_hostcr3) - return hvm_funcs.nhvm_vcpu_hostcr3(v); + if (hvm_funcs.nhvm_vcpu_p2m_base) + return hvm_funcs.nhvm_vcpu_p2m_base(v); return -EOPNOTSUPP; }

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c index 4c4abfc..6c469ec 100644 --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -2003,7 +2003,7 @@ static struct hvm_function_table __read_mostly svm_function_table = { .nhvm_vcpu_vmexit = nsvm_vcpu_vmexit_inject, .nhvm_vcpu_vmexit_trap = nsvm_vcpu_vmexit_trap, .nhvm_vcpu_guestcr3 = nsvm_vcpu_guestcr3, - .nhvm_vcpu_hostcr3 = nsvm_vcpu_hostcr3, + .nhvm_vcpu_p2m_base = nsvm_vcpu_hostcr3, .nhvm_vcpu_asid = nsvm_vcpu_asid, .nhvm_vmcx_guest_intercepts_trap = nsvm_vmcb_guest_intercepts_trap, .nhvm_vmcx_hap_enabled = nsvm_vmcb_hap_enabled,

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 9fb9562..47d8ca6 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -1504,7 +1504,7 @@
static struct hvm_function_table __read_mostly vmx_function_table = { .nhvm_vcpu_destroy = nvmx_vcpu_destroy, .nhvm_vcpu_reset = nvmx_vcpu_reset, .nhvm_vcpu_guestcr3 = nvmx_vcpu_guestcr3, - .nhvm_vcpu_hostcr3 = nvmx_vcpu_hostcr3, + .nhvm_vcpu_p2m_base = nvmx_vcpu_eptp_base, .nhvm_vcpu_asid = nvmx_vcpu_asid, .nhvm_vmcx_guest_intercepts_trap = nvmx_intercepts_exception, .nhvm_vcpu_vmexit_trap = nvmx_vmexit_trap, diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c index b005816..6d1a736 100644 --- a/xen/arch/x86/hvm/vmx/vvmx.c +++ b/xen/arch/x86/hvm/vmx/vvmx.c @@ -94,7 +94,7 @@ uint64_t nvmx_vcpu_guestcr3(struct vcpu *v) return 0; } -uint64_t nvmx_vcpu_hostcr3(struct vcpu *v) +uint64_t nvmx_vcpu_eptp_base(struct vcpu *v) { /* TODO */ ASSERT(0); diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c index 317875d..f9a5edc 100644 --- a/xen/arch/x86/mm/hap/nested_hap.c +++ b/xen/arch/x86/mm/hap/nested_hap.c @@ -48,9 +48,10 @@ * 1. If #NPF is from L1 guest, then we crash the guest VM (same as old * code) * 2. If #NPF is from L2 guest, then we continue from (3) - * 3. Get h_cr3 from L1 guest. Map h_cr3 into L0 hypervisor address space. - * 4. Walk the h_cr3 page table - * 5. - if not present, then we inject #NPF back to L1 guest and + * 3. Get np2m base from L1 guest. Map np2m base into L0 hypervisor address space. + * 4. Walk the np2m''s page table + * 5. - if not present or permission check failure, then we inject #NPF back to + * L1 guest and * re-launch L1 guest (L1 guest will either treat this #NPF as MMIO, * or fix its p2m table for L2 guest) * 6. 
- if present, then we will get a new translated value L1-GPA @@ -89,7 +90,7 @@ nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn, if (old_flags & _PAGE_PRESENT) flush_tlb_mask(p2m->dirty_cpumask); - + paging_unlock(d); } @@ -110,7 +111,7 @@ nestedhap_fix_p2m(struct vcpu *v, struct p2m_domain *p2m, /* If this p2m table has been flushed or recycled under our feet, * leave it alone. We'll pick up the right one as we try to * vmenter the guest. */ - if ( p2m->cr3 == nhvm_vcpu_hostcr3(v) ) + if ( p2m->np2m_base == nhvm_vcpu_p2m_base(v) ) { unsigned long gfn, mask; mfn_t mfn; @@ -186,7 +187,7 @@ nestedhap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa, uint32_t pfec; unsigned long nested_cr3, gfn; - nested_cr3 = nhvm_vcpu_hostcr3(v); + nested_cr3 = nhvm_vcpu_p2m_base(v); pfec = PFEC_user_mode | PFEC_page_present; if (access_w) @@ -221,7 +222,7 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa, p2m_type_t p2mt_10; p2m = p2m_get_hostp2m(d); /* L0 p2m */ - nested_p2m = p2m_get_nestedp2m(v, nhvm_vcpu_hostcr3(v)); + nested_p2m = p2m_get_nestedp2m(v, nhvm_vcpu_p2m_base(v)); /* walk the L1 P2M table */ rv = nestedhap_walk_L1_p2m(v, *L2_gpa, &L1_gpa, &page_order_21,

diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h index 3700e32..1817f81 100644 --- a/xen/arch/x86/mm/mm-locks.h +++ b/xen/arch/x86/mm/mm-locks.h @@ -249,7 +249,7 @@ declare_mm_order_constraint(per_page_sharing) * A per-domain lock that protects the mapping from nested-CR3 to * nested-p2m. In particular it covers: * - the array of nested-p2m tables, and all LRU activity therein; and - * - setting the "cr3" field of any p2m table to a non-CR3_EADDR value. + * - setting the "cr3" field of any p2m table to a non-P2M_BASE_EADDR value. * (i.e.
assigning a p2m table to be the shadow of that cr3 */ /* PoD lock (per-p2m-table) diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index e351942..62c2d78 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -81,7 +81,7 @@ static void p2m_initialise(struct domain *d, struct p2m_domain *p2m) p2m->domain = d; p2m->default_access = p2m_access_rwx; - p2m->cr3 = CR3_EADDR; + p2m->np2m_base = P2M_BASE_EADDR; if ( hap_enabled(d) && cpu_has_vmx ) ept_p2m_init(p2m); @@ -1445,7 +1445,7 @@ p2m_flush_table(struct p2m_domain *p2m) ASSERT(page_list_empty(&p2m->pod.single)); /* This is no longer a valid nested p2m for any address space */ - p2m->cr3 = CR3_EADDR; + p2m->np2m_base = P2M_BASE_EADDR; /* Zap the top level of the trie */ top = mfn_to_page(pagetable_get_mfn(p2m_get_pagetable(p2m))); @@ -1483,7 +1483,7 @@ p2m_flush_nestedp2m(struct domain *d) } struct p2m_domain * -p2m_get_nestedp2m(struct vcpu *v, uint64_t cr3) +p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base) { /* Use volatile to prevent gcc to cache nv->nv_p2m in a cpu register as * this may change within the loop by an other (v)cpu. 
@@ -1492,8 +1492,8 @@ p2m_get_nestedp2m(struct vcpu *v, uint64_t cr3) struct domain *d; struct p2m_domain *p2m; - /* Mask out low bits; this avoids collisions with CR3_EADDR */ - cr3 &= ~(0xfffull); + /* Mask out low bits; this avoids collisions with P2M_BASE_EADDR */ + np2m_base &= ~(0xfffull); if (nv->nv_flushp2m && nv->nv_p2m) { nv->nv_p2m = NULL; @@ -1505,14 +1505,14 @@ p2m_get_nestedp2m(struct vcpu *v, uint64_t cr3) if ( p2m ) { p2m_lock(p2m); - if ( p2m->cr3 == cr3 || p2m->cr3 == CR3_EADDR ) + if ( p2m->np2m_base == np2m_base || p2m->np2m_base == P2M_BASE_EADDR ) { nv->nv_flushp2m = 0; p2m_getlru_nestedp2m(d, p2m); nv->nv_p2m = p2m; - if (p2m->cr3 == CR3_EADDR) + if (p2m->np2m_base == P2M_BASE_EADDR) hvm_asid_flush_vcpu(v); - p2m->cr3 = cr3; + p2m->np2m_base = np2m_base; cpumask_set_cpu(v->processor, p2m->dirty_cpumask); p2m_unlock(p2m); nestedp2m_unlock(d); @@ -1527,7 +1527,7 @@ p2m_get_nestedp2m(struct vcpu *v, uint64_t cr3) p2m_flush_table(p2m); p2m_lock(p2m); nv->nv_p2m = p2m; - p2m->cr3 = cr3; + p2m->np2m_base = np2m_base; nv->nv_flushp2m = 0; hvm_asid_flush_vcpu(v); cpumask_set_cpu(v->processor, p2m->dirty_cpumask); @@ -1543,7 +1543,7 @@ p2m_get_p2m(struct vcpu *v) if (!nestedhvm_is_n2(v)) return p2m_get_hostp2m(v->domain); - return p2m_get_nestedp2m(v, nhvm_vcpu_hostcr3(v)); + return p2m_get_nestedp2m(v, nhvm_vcpu_p2m_base(v)); } unsigned long paging_gva_to_gfn(struct vcpu *v, @@ -1561,15 +1561,15 @@ unsigned long paging_gva_to_gfn(struct vcpu *v, struct p2m_domain *p2m; const struct paging_mode *mode; uint32_t pfec_21 = *pfec; - uint64_t ncr3 = nhvm_vcpu_hostcr3(v); + uint64_t np2m_base = nhvm_vcpu_p2m_base(v); /* translate l2 guest va into l2 guest gfn */ - p2m = p2m_get_nestedp2m(v, ncr3); + p2m = p2m_get_nestedp2m(v, np2m_base); mode = paging_get_nestedmode(v); gfn = mode->gva_to_gfn(v, p2m, va, pfec); /* translate l2 guest gfn into l1 guest gfn */ - return hostmode->p2m_ga_to_gfn(v, hostp2m, ncr3, + return hostmode->p2m_ga_to_gfn(v, hostp2m, 
np2m_base, gfn << PAGE_SHIFT, &pfec_21, NULL); } diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h index fdb0f58..d3535b6 100644 --- a/xen/include/asm-x86/hvm/hvm.h +++ b/xen/include/asm-x86/hvm/hvm.h @@ -170,7 +170,7 @@ struct hvm_function_table { uint64_t exitcode); int (*nhvm_vcpu_vmexit_trap)(struct vcpu *v, struct hvm_trap *trap); uint64_t (*nhvm_vcpu_guestcr3)(struct vcpu *v); - uint64_t (*nhvm_vcpu_hostcr3)(struct vcpu *v); + uint64_t (*nhvm_vcpu_p2m_base)(struct vcpu *v); uint32_t (*nhvm_vcpu_asid)(struct vcpu *v); int (*nhvm_vmcx_guest_intercepts_trap)(struct vcpu *v, unsigned int trapnr, int errcode); @@ -475,7 +475,7 @@ uint64_t nhvm_vcpu_guestcr3(struct vcpu *v); /* returns l1 guest''s cr3 that points to the page table used to * translate l2 guest physical address to l1 guest physical address. */ -uint64_t nhvm_vcpu_hostcr3(struct vcpu *v); +uint64_t nhvm_vcpu_p2m_base(struct vcpu *v); /* returns the asid number l1 guest wants to use to run the l2 guest */ uint32_t nhvm_vcpu_asid(struct vcpu *v); diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h index dce2cd8..d97011d 100644 --- a/xen/include/asm-x86/hvm/vmx/vvmx.h +++ b/xen/include/asm-x86/hvm/vmx/vvmx.h @@ -99,7 +99,7 @@ int nvmx_vcpu_initialise(struct vcpu *v); void nvmx_vcpu_destroy(struct vcpu *v); int nvmx_vcpu_reset(struct vcpu *v); uint64_t nvmx_vcpu_guestcr3(struct vcpu *v); -uint64_t nvmx_vcpu_hostcr3(struct vcpu *v); +uint64_t nvmx_vcpu_eptp_base(struct vcpu *v); uint32_t nvmx_vcpu_asid(struct vcpu *v); enum hvm_intblk nvmx_intr_blocked(struct vcpu *v); int nvmx_intercepts_exception(struct vcpu *v, diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h index 907a817..1807ad6 100644 --- a/xen/include/asm-x86/p2m.h +++ b/xen/include/asm-x86/p2m.h @@ -197,17 +197,17 @@ struct p2m_domain { struct domain *domain; /* back pointer to domain */ - /* Nested p2ms only: nested-CR3 value that this p2m shadows. 
- * This can be cleared to CR3_EADDR under the per-p2m lock but + /* Nested p2ms only: nested p2m base value that this p2m shadows. + * This can be cleared to P2M_BASE_EADDR under the per-p2m lock but * needs both the per-p2m lock and the per-domain nestedp2m lock * to set it to any other value. */ -#define CR3_EADDR (~0ULL) - uint64_t cr3; +#define P2M_BASE_EADDR (~0ULL) + uint64_t np2m_base; /* Nested p2ms: linked list of n2pms allocated to this domain. * The host p2m holds the head of the list and the np2ms are * threaded on in LRU order. */ - struct list_head np2m_list; + struct list_head np2m_list; /* Host p2m: when this flag is set, don't flush all the nested-p2m @@ -282,11 +282,11 @@ struct p2m_domain { /* get host p2m table */ #define p2m_get_hostp2m(d) ((d)->arch.p2m) -/* Get p2m table (re)usable for specified cr3. +/* Get p2m table (re)usable for specified np2m base. * Automatically destroys and re-initializes a p2m if none found. - * If cr3 == 0 then v->arch.hvm_vcpu.guest_cr[3] is used. + * If np2m_base == 0 then v->arch.hvm_vcpu.guest_cr[3] is used. */ -struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v, uint64_t cr3); +struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base); /* If vcpu is in host mode then behaviour matches p2m_get_hostp2m(). * If vcpu is in guest mode then behaviour matches p2m_get_nestedp2m().
-- 1.7.1
xiantao.zhang@intel.com
2012-Dec-10 17:57 UTC
[PATCH 02/11] nestedhap: Change nested p2m's walker to vendor-specific
From: Zhang Xiantao <xiantao.zhang@intel.com>

EPT and NPT adopt different formats for each level's entries, so make the walker functions vendor-specific.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
---
 xen/arch/x86/hvm/svm/nestedsvm.c | 31 +++++++++++++++++++++
 xen/arch/x86/hvm/svm/svm.c | 1 +
 xen/arch/x86/hvm/vmx/vmx.c | 3 +-
 xen/arch/x86/hvm/vmx/vvmx.c | 13 +++++++++
 xen/arch/x86/mm/hap/nested_hap.c | 46 +++++++++++--------------------
 xen/include/asm-x86/hvm/hvm.h | 5 +++
 xen/include/asm-x86/hvm/svm/nestedsvm.h | 3 ++
 xen/include/asm-x86/hvm/vmx/vvmx.h | 5 +++
 8 files changed, 76 insertions(+), 31 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c index ed0faa6..5dcb354 100644 --- a/xen/arch/x86/hvm/svm/nestedsvm.c +++ b/xen/arch/x86/hvm/svm/nestedsvm.c @@ -1171,6 +1171,37 @@ nsvm_vmcb_hap_enabled(struct vcpu *v) return vcpu_nestedsvm(v).ns_hap_enabled; } +/* This function uses L2_gpa to walk the P2M page table in L1. If the + * walk is successful, the translated value is returned in + * L1_gpa. The result value tells what to do next.
+ */ +int +nsvm_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa, + unsigned int *page_order, + bool_t access_r, bool_t access_w, bool_t access_x) +{ + uint32_t pfec; + unsigned long nested_cr3, gfn; + + nested_cr3 = nhvm_vcpu_p2m_base(v); + + pfec = PFEC_user_mode | PFEC_page_present; + if (access_w) + pfec |= PFEC_write_access; + if (access_x) + pfec |= PFEC_insn_fetch; + + /* Walk the guest-supplied NPT table, just as if it were a pagetable */ + gfn = paging_ga_to_gfn_cr3(v, nested_cr3, L2_gpa, &pfec, page_order); + + if ( gfn == INVALID_GFN ) + return NESTEDHVM_PAGEFAULT_INJECT; + + *L1_gpa = (gfn << PAGE_SHIFT) + (L2_gpa & ~PAGE_MASK); + return NESTEDHVM_PAGEFAULT_DONE; +} + + enum hvm_intblk nsvm_intr_blocked(struct vcpu *v) { struct nestedsvm *svm = &vcpu_nestedsvm(v); diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c index 6c469ec..a905764 100644 --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -2008,6 +2008,7 @@ static struct hvm_function_table __read_mostly svm_function_table = { .nhvm_vmcx_guest_intercepts_trap = nsvm_vmcb_guest_intercepts_trap, .nhvm_vmcx_hap_enabled = nsvm_vmcb_hap_enabled, .nhvm_intr_blocked = nsvm_intr_blocked, + .nhvm_hap_walk_L1_p2m = nsvm_hap_walk_L1_p2m, }; void svm_vmexit_handler(struct cpu_user_regs *regs) diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 47d8ca6..c67ac59 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -1511,7 +1511,8 @@ static struct hvm_function_table __read_mostly vmx_function_table = { .nhvm_intr_blocked = nvmx_intr_blocked, .nhvm_domain_relinquish_resources = nvmx_domain_relinquish_resources, .update_eoi_exit_bitmap = vmx_update_eoi_exit_bitmap, - .virtual_intr_delivery_enabled = vmx_virtual_intr_delivery_enabled + .virtual_intr_delivery_enabled = vmx_virtual_intr_delivery_enabled, + .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m, }; struct hvm_function_table * __init start_vmx(void) diff --git 
a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c index 6d1a736..4495dd6 100644 --- a/xen/arch/x86/hvm/vmx/vvmx.c +++ b/xen/arch/x86/hvm/vmx/vvmx.c @@ -1445,6 +1445,19 @@ int nvmx_msr_write_intercept(unsigned int msr, u64 msr_content) return 1; } +/* This function uses L2_gpa to walk the P2M page table in L1. If the + * walk is successful, the translated value is returned in + * L1_gpa. The result value tells what to do next. + */ +int +nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa, + unsigned int *page_order, + bool_t access_r, bool_t access_w, bool_t access_x) +{ + /*TODO:*/ + return 0; +} + void nvmx_idtv_handling(void) { struct vcpu *v = current; diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c index f9a5edc..8787c91 100644 --- a/xen/arch/x86/mm/hap/nested_hap.c +++ b/xen/arch/x86/mm/hap/nested_hap.c @@ -136,6 +136,22 @@ nestedhap_fix_p2m(struct vcpu *v, struct p2m_domain *p2m, } } +/* This function uses L2_gpa to walk the P2M page table in L1. If the + * walk is successful, the translated value is returned in + * L1_gpa. The result value tells what to do next. + */ +static int +nestedhap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa, + unsigned int *page_order, + bool_t access_r, bool_t access_w, bool_t access_x) +{ + ASSERT(hvm_funcs.nhvm_hap_walk_L1_p2m); + + return hvm_funcs.nhvm_hap_walk_L1_p2m(v, L2_gpa, L1_gpa, page_order, + access_r, access_w, access_x); +} + + /* This function uses L1_gpa to walk the P2M table in L0 hypervisor. If the * walk is successful, the translated value is returned in L0_gpa. The return * value tells the upper level what to do. @@ -175,36 +191,6 @@ out: return rc; } -/* This function uses L2_gpa to walk the P2M page table in L1. If the - * walk is successful, the translated value is returned in - * L1_gpa. The result value tells what to do next. 
- */ -static int -nestedhap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa, - unsigned int *page_order, - bool_t access_r, bool_t access_w, bool_t access_x) -{ - uint32_t pfec; - unsigned long nested_cr3, gfn; - - nested_cr3 = nhvm_vcpu_p2m_base(v); - - pfec = PFEC_user_mode | PFEC_page_present; - if (access_w) - pfec |= PFEC_write_access; - if (access_x) - pfec |= PFEC_insn_fetch; - - /* Walk the guest-supplied NPT table, just as if it were a pagetable */ - gfn = paging_ga_to_gfn_cr3(v, nested_cr3, L2_gpa, &pfec, page_order); - - if ( gfn == INVALID_GFN ) - return NESTEDHVM_PAGEFAULT_INJECT; - - *L1_gpa = (gfn << PAGE_SHIFT) + (L2_gpa & ~PAGE_MASK); - return NESTEDHVM_PAGEFAULT_DONE; -} - /* * The following function, nestedhap_page_fault(), is for steps (3)--(10). * diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h index d3535b6..80f07e9 100644 --- a/xen/include/asm-x86/hvm/hvm.h +++ b/xen/include/asm-x86/hvm/hvm.h @@ -183,6 +183,11 @@ struct hvm_function_table { /* Virtual interrupt delivery */ void (*update_eoi_exit_bitmap)(struct vcpu *v, u8 vector, u8 trig); int (*virtual_intr_delivery_enabled)(void); + + /*Walk nested p2m */ + int (*nhvm_hap_walk_L1_p2m)(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa, + unsigned int *page_order, + bool_t access_r, bool_t access_w, bool_t access_x); }; extern struct hvm_function_table hvm_funcs; diff --git a/xen/include/asm-x86/hvm/svm/nestedsvm.h b/xen/include/asm-x86/hvm/svm/nestedsvm.h index fa83023..0c90f30 100644 --- a/xen/include/asm-x86/hvm/svm/nestedsvm.h +++ b/xen/include/asm-x86/hvm/svm/nestedsvm.h @@ -133,6 +133,9 @@ int nsvm_wrmsr(struct vcpu *v, unsigned int msr, uint64_t msr_content); void svm_vmexit_do_clgi(struct cpu_user_regs *regs, struct vcpu *v); void svm_vmexit_do_stgi(struct cpu_user_regs *regs, struct vcpu *v); bool_t nestedsvm_gif_isset(struct vcpu *v); +int nsvm_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa, + unsigned int *page_order, + bool_t 
access_r, bool_t access_w, bool_t access_x); #define NSVM_INTR_NOTHANDLED 3 #define NSVM_INTR_NOTINTERCEPTED 2 diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h index d97011d..422f006 100644 --- a/xen/include/asm-x86/hvm/vmx/vvmx.h +++ b/xen/include/asm-x86/hvm/vmx/vvmx.h @@ -108,6 +108,11 @@ void nvmx_domain_relinquish_resources(struct domain *d); int nvmx_handle_vmxon(struct cpu_user_regs *regs); int nvmx_handle_vmxoff(struct cpu_user_regs *regs); + +int +nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa, + unsigned int *page_order, + bool_t access_r, bool_t access_w, bool_t access_x); /* * Virtual VMCS layout * -- 1.7.1
xiantao.zhang@intel.com
2012-Dec-10 17:57 UTC
[PATCH 03/11] nEPT: Implement guest ept's walker
From: Zhang Xiantao <xiantao.zhang@intel.com>

Implement the guest EPT PT walker; some logic is based on the shadow code's ia32e PT walker. During the PT walk, if a target page is not in memory, use the RETRY mechanism to get a chance to bring the target page back.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
---
 xen/arch/x86/hvm/hvm.c | 1 +
 xen/arch/x86/hvm/vmx/vvmx.c | 42 +++++-
 xen/arch/x86/mm/guest_walk.c | 12 +-
 xen/arch/x86/mm/hap/Makefile | 1 +
 xen/arch/x86/mm/hap/nested_ept.c | 327 +++++++++++++++++++++++++++++++
 xen/arch/x86/mm/hap/nested_hap.c | 2 +-
 xen/arch/x86/mm/shadow/multi.c | 2 +-
 xen/include/asm-x86/guest_pt.h | 8 +
 xen/include/asm-x86/hvm/nestedhvm.h | 1 +
 xen/include/asm-x86/hvm/vmx/vmcs.h | 1 +
 xen/include/asm-x86/hvm/vmx/vvmx.h | 14 ++
 11 files changed, 403 insertions(+), 8 deletions(-)
 create mode 100644 xen/arch/x86/mm/hap/nested_ept.c

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 85bc9be..3400e6b 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -1324,6 +1324,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa, access_r, access_w, access_x); switch (rv) { case NESTEDHVM_PAGEFAULT_DONE: + case NESTEDHVM_PAGEFAULT_RETRY: return 1; case NESTEDHVM_PAGEFAULT_L1_ERROR: /* An error occured while translating gpa from

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c index 4495dd6..76cf757 100644 --- a/xen/arch/x86/hvm/vmx/vvmx.c +++ b/xen/arch/x86/hvm/vmx/vvmx.c @@ -906,9 +906,18 @@ static void sync_vvmcs_ro(struct vcpu *v) { int i; struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v); + struct nestedvmx *nvmx = &vcpu_2_nvmx(v); + void *vvmcs = nvcpu->nv_vvmcx; for ( i = 0; i < ARRAY_SIZE(vmcs_ro_field); i++ ) shadow_to_vvmcs(nvcpu->nv_vvmcx, vmcs_ro_field[i]); + + /* Adjust exit_reason/exit_qualification for violation case */ + if ( __get_vvmcs(vvmcs, VM_EXIT_REASON) == EXIT_REASON_EPT_VIOLATION ) { + __set_vvmcs(vvmcs, EXIT_QUALIFICATION, nvmx->ept_exit.exit_qual); + __set_vvmcs(vvmcs,
VM_EXIT_REASON, nvmx->ept_exit.exit_reason); + } } static void load_vvmcs_host_state(struct vcpu *v) @@ -1454,8 +1463,37 @@ nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa, unsigned int *page_order, bool_t access_r, bool_t access_w, bool_t access_x) { - /*TODO:*/ - return 0; + uint64_t exit_qual = __vmread(EXIT_QUALIFICATION); + uint32_t exit_reason = EXIT_REASON_EPT_VIOLATION; + int rc; + unsigned long gfn; + uint32_t rwx_rights = (access_x << 2) | (access_w << 1) | access_r; + struct nestedvmx *nvmx = &vcpu_2_nvmx(v); + + rc = nept_translate_l2ga(v, L2_gpa, page_order, rwx_rights, &gfn, + &exit_qual, &exit_reason); + switch ( rc ) { + case EPT_TRANSLATE_SUCCEED: + *L1_gpa = (gfn << PAGE_SHIFT) + (L2_gpa & ~PAGE_MASK); + rc = NESTEDHVM_PAGEFAULT_DONE; + break; + case EPT_TRANSLATE_VIOLATION: + case EPT_TRANSLATE_MISCONFIG: + rc = NESTEDHVM_PAGEFAULT_INJECT; + nvmx->ept_exit.exit_reason = exit_reason; + nvmx->ept_exit.exit_qual = exit_qual; + break; + case EPT_TRANSLATE_RETRY: + rc = NESTEDHVM_PAGEFAULT_RETRY; + break; + case EPT_TRANSLATE_ERR_PAGE: + rc = NESTEDHVM_PAGEFAULT_L1_ERROR; + break; + default: + gdprintk(XENLOG_ERR, "GUEST EPT translation error!\n"); + } + + return rc; } void nvmx_idtv_handling(void) diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c index 13ea0bb..afbe9db 100644 --- a/xen/arch/x86/mm/guest_walk.c +++ b/xen/arch/x86/mm/guest_walk.c @@ -88,10 +88,11 @@ static uint32_t set_ad_bits(void *guest_p, void *walk_p, int set_dirty) /* If the map is non-NULL, we leave this function having * acquired an extra ref on mfn_to_page(*mfn) */ -static inline void *map_domain_gfn(struct p2m_domain *p2m, +void *map_domain_gfn(struct p2m_domain *p2m, gfn_t gfn, mfn_t *mfn, p2m_type_t *p2mt, + p2m_query_t *q, uint32_t *rc) { struct page_info *page; @@ -99,7 +100,7 @@ static inline void *map_domain_gfn(struct p2m_domain *p2m, /* Translate the gfn, unsharing if shared */ page = get_page_from_gfn_p2m(p2m->domain, p2m, 
gfn_x(gfn), p2mt, NULL, - P2M_ALLOC | P2M_UNSHARE); + *q); if ( p2m_is_paging(*p2mt) ) { ASSERT(!p2m_is_nestedp2m(p2m)); @@ -128,7 +129,6 @@ static inline void *map_domain_gfn(struct p2m_domain *p2m, return map; } - /* Walk the guest pagetables, after the manner of a hardware walker. */ /* Because the walk is essentially random, it can cause a deadlock * warning in the p2m locking code. Highly unlikely this is an actual @@ -149,6 +149,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m, uint32_t gflags, mflags, iflags, rc = 0; int smep; bool_t pse1G = 0, pse2M = 0; + p2m_query_t qt = P2M_ALLOC | P2M_UNSHARE; perfc_incr(guest_walk); memset(gw, 0, sizeof(*gw)); @@ -188,7 +189,8 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m, l3p = map_domain_gfn(p2m, guest_l4e_get_gfn(gw->l4e), &gw->l3mfn, - &p2mt, + &p2mt, + &qt, &rc); if(l3p == NULL) goto out; @@ -249,6 +251,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m, guest_l3e_get_gfn(gw->l3e), &gw->l2mfn, &p2mt, + &qt, &rc); if(l2p == NULL) goto out; @@ -322,6 +325,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m, guest_l2e_get_gfn(gw->l2e), &gw->l1mfn, &p2mt, + &qt, &rc); if(l1p == NULL) goto out; diff --git a/xen/arch/x86/mm/hap/Makefile b/xen/arch/x86/mm/hap/Makefile index 80a6bec..68f2bb5 100644 --- a/xen/arch/x86/mm/hap/Makefile +++ b/xen/arch/x86/mm/hap/Makefile @@ -3,6 +3,7 @@ obj-y += guest_walk_2level.o obj-y += guest_walk_3level.o obj-$(x86_64) += guest_walk_4level.o obj-y += nested_hap.o +obj-y += nested_ept.o guest_walk_%level.o: guest_walk.c Makefile $(CC) $(CFLAGS) -DGUEST_PAGING_LEVELS=$* -c $< -o $@ diff --git a/xen/arch/x86/mm/hap/nested_ept.c b/xen/arch/x86/mm/hap/nested_ept.c new file mode 100644 index 0000000..da868e7 --- /dev/null +++ b/xen/arch/x86/mm/hap/nested_ept.c @@ -0,0 +1,327 @@ +/* + * nested_ept.c: Handling virtulized EPT for guest in nested case. 
+ * + * pt walker logic based on arch/x86/mm/guest_walk.c + * Copyright (c) 2012, Intel Corporation + * Xiantao Zhang <xiantao.zhang@intel.com> + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; if not, write to the Free Software Foundation, Inc., 59 Temple + * Place - Suite 330, Boston, MA 02111-1307 USA. + */ +#include <asm/domain.h> +#include <asm/page.h> +#include <asm/paging.h> +#include <asm/p2m.h> +#include <asm/mem_event.h> +#include <public/mem_event.h> +#include <asm/mem_sharing.h> +#include <xen/event.h> +#include <asm/hap.h> +#include <asm/hvm/support.h> + +#include <asm/hvm/nestedhvm.h> + +#include "private.h" + +#include <asm/hvm/vmx/vmx.h> +#include <asm/hvm/vmx/vvmx.h> + +/* EPT always uses a 4-level paging structure */ +#define GUEST_PAGING_LEVELS 4 +#include <asm/guest_pt.h> + +/* For EPT's walker reserved bits and EMT check */ +#define EPT_MUST_RSV_BITS (((1ull << PADDR_BITS) -1) & \ + ~((1ull << paddr_bits) - 1)) + + +#define EPT_EMT_WB 6 +#define EPT_EMT_UC 0 + +#define NEPT_VPID_CAP_BITS 0 + +#define NEPT_1G_ENTRY_FLAG (1 << 11) +#define NEPT_2M_ENTRY_FLAG (1 << 10) +#define NEPT_4K_ENTRY_FLAG (1 << 9) + +/* Always expose 1G and 2M capability to guest, + so don't need additional check */ +bool_t nept_sp_entry(uint64_t entry) +{ + return !!(entry & EPTE_SUPER_PAGE_MASK); +} + +static bool_t nept_rsv_bits_check(uint64_t entry, uint32_t level) +{ + uint64_t rsv_bits = EPT_MUST_RSV_BITS; + + switch ( level ){ + case 1: + break; + case 2 ...
3: + if (nept_sp_entry(entry)) + rsv_bits |= ((1ull << (9 * (level -1 ))) -1) << PAGE_SHIFT; + else + rsv_bits |= 0xfull << 3; + break; + case 4: + rsv_bits |= 0xf8; + break; + default: + printk("Unsupported EPT paging level: %d\n", level); + } + if ( ((entry & rsv_bits) ^ rsv_bits) == rsv_bits ) + return 0; + return 1; +} + +/* EMT checking*/ +static bool_t nept_emt_bits_check(uint64_t entry, uint32_t level) +{ + ept_entry_t e; + e.epte = entry; + if ( e.sp || level == 1 ) { + if ( e.emt == 2 || e.emt == 3 || e.emt == 7 ) + return 1; + } + return 0; +} + +static bool_t nept_rwx_bits_check(uint64_t entry) { + /*write only or write/execute only*/ + uint8_t rwx_bits = entry & 0x7; + + if ( rwx_bits == 2 || rwx_bits == 6) + return 1; + if ( rwx_bits == 4 && !(NEPT_VPID_CAP_BITS & + VMX_EPT_EXEC_ONLY_SUPPORTED)) + return 1; + return 0; +} + +/* nept''s misconfiguration check */ +static bool_t nept_misconfiguration_check(uint64_t entry, uint32_t level) +{ + return (nept_rsv_bits_check(entry, level) || + nept_emt_bits_check(entry, level) || + nept_rwx_bits_check(entry)); +} + +static bool_t nept_present_check(uint64_t entry) +{ + if (entry & 0x7) + return 1; + return 0; +} + +uint64_t nept_get_ept_vpid_cap(void) +{ + /*TODO: exposed ept and vpid features*/ + return NEPT_VPID_CAP_BITS; +} + +static uint32_t +nept_walk_tables(struct vcpu *v, unsigned long l2ga, walk_t *gw) +{ + p2m_type_t p2mt; + uint32_t rc = 0, ret = 0, gflags; + struct domain *d = v->domain; + struct p2m_domain *p2m = d->arch.p2m; + gfn_t base_gfn = _gfn(nhvm_vcpu_p2m_base(v) >> PAGE_SHIFT); + p2m_query_t qt = P2M_ALLOC; + + guest_l1e_t *l1p = NULL; + guest_l2e_t *l2p = NULL; + guest_l3e_t *l3p = NULL; + guest_l4e_t *l4p = NULL; + + bool_t sp= 0; + + memset(gw, 0, sizeof(*gw)); + gw->va = l2ga; + + /* Map the l4 root entry */ + l4p = map_domain_gfn(p2m, base_gfn, &gw->l4mfn, &p2mt, &qt, &rc); + if ( !l4p ) + goto map_err; + gw->l4e = l4p[guest_l4_table_offset(l2ga)]; + if 
(!nept_present_check(gw->l4e.l4)) + goto non_present; + if (nept_misconfiguration_check(gw->l4e.l4, 4)) + goto misconfig_err; + + /* Map the l3 table */ + base_gfn = guest_l4e_get_gfn(gw->l4e); + l3p = map_domain_gfn(p2m, base_gfn, &gw->l3mfn, &p2mt, &qt, &rc); + if( l3p == NULL ) + goto map_err; + + /* Get the l3e and check its flags*/ + gw->l3e = l3p[guest_l3_table_offset(l2ga)]; + if ( !nept_present_check(gw->l3e.l3) ) + goto non_present; + if ( nept_misconfiguration_check(gw->l3e.l3, 3) ) + goto misconfig_err; + + sp = nept_sp_entry(gw->l3e.l3); + /* Super 1G entry */ + if ( sp ) + { + /* Generate a fake l1 table entry so callers don''t all + * have to understand superpages. */ + gfn_t start = guest_l3e_get_gfn(gw->l3e); + + /* Increment the pfn by the right number of 4k pages. */ + start = _gfn((gfn_x(start) & ~GUEST_L3_GFN_MASK) + + ((l2ga >> PAGE_SHIFT) & GUEST_L3_GFN_MASK)); + gflags = (gw->l3e.l3 & 0x7f) | NEPT_1G_ENTRY_FLAG; + gw->l1e = guest_l1e_from_gfn(start, gflags); + gw->l2mfn = gw->l1mfn = _mfn(INVALID_MFN); + goto done; + } + + /* Map the l2 table */ + base_gfn = guest_l3e_get_gfn(gw->l3e); + l2p = map_domain_gfn(p2m, base_gfn, &gw->l2mfn, &p2mt, &qt, &rc); + if( l2p == NULL ) + goto map_err; + /* Get the l2e */ + gw->l2e = l2p[guest_l2_table_offset(l2ga)]; + if ( !nept_present_check(gw->l2e.l2) ) + goto non_present; + if ( nept_misconfiguration_check(gw->l2e.l2, 2) ) + goto misconfig_err; + sp = nept_sp_entry(gw->l2e.l2); + + if ( sp ) + { + gfn_t start = guest_l2e_get_gfn(gw->l2e); + gflags = (gw->l2e.l2 & 0x7f) | NEPT_2M_ENTRY_FLAG; + + /* Increment the pfn by the right number of 4k pages.*/ + start = _gfn((gfn_x(start) & ~GUEST_L2_GFN_MASK) + + guest_l1_table_offset(l2ga)); + gw->l1e = guest_l1e_from_gfn(start, gflags); + gw->l1mfn = _mfn(INVALID_MFN); + goto done; + } + /* Not a superpage: carry on and find the l1e. 
*/ + base_gfn = guest_l2e_get_gfn(gw->l2e); + l1p = map_domain_gfn(p2m, base_gfn, &gw->l1mfn, &p2mt, &qt, &rc); + if( l1p == NULL ) + goto map_err; + /* Get the l1e */ + gw->l1e = l1p[guest_l1_table_offset(l2ga)]; + if ( !nept_present_check(gw->l1e.l1) ) + goto non_present; + if ( nept_misconfiguration_check(gw->l1e.l1, 1) ) + goto misconfig_err; + + gflags = (gw->l1e.l1 & 0x7f) | NEPT_4K_ENTRY_FLAG; + gw->l1e.l1 = (gw->l1e.l1 & PAGE_MASK) | gflags; + +done: + ret = EPT_TRANSLATE_SUCCEED; + goto unmap; + +misconfig_err: + ret = EPT_TRANSLATE_MISCONFIG; + goto unmap; + +map_err: + if ( rc == _PAGE_PAGED ) + ret = EPT_TRANSLATE_RETRY; + else + ret = EPT_TRANSLATE_ERR_PAGE; + goto unmap; + +non_present: + ret = EPT_TRANSLATE_VIOLATION; + +unmap: + if ( l4p ) + { + unmap_domain_page(l4p); + put_page(mfn_to_page(mfn_x(gw->l4mfn))); + } + if ( l3p ) + { + unmap_domain_page(l3p); + put_page(mfn_to_page(mfn_x(gw->l3mfn))); + } + if ( l2p ) + { + unmap_domain_page(l2p); + put_page(mfn_to_page(mfn_x(gw->l2mfn))); + } + if ( l1p ) + { + unmap_domain_page(l1p); + put_page(mfn_to_page(mfn_x(gw->l1mfn))); + } + return ret; +} + +/* Translate a L2 guest address to L1 gpa via L1 EPT paging structure */ + +int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga, + unsigned int *page_order, uint32_t rwx_acc, + unsigned long *l1gfn, uint64_t *exit_qual, + uint32_t *exit_reason) +{ + uint32_t rc, rwx_bits = 0; + walk_t gw; + + *l1gfn = INVALID_GFN; + + rc = nept_walk_tables(v, l2ga, &gw); + switch ( rc ) { + case EPT_TRANSLATE_SUCCEED: + if ( likely(gw.l1e.l1 & NEPT_2M_ENTRY_FLAG) ) + { + rwx_bits = gw.l4e.l4 & gw.l3e.l3 & gw.l2e.l2 & 0x7; + *page_order = 9; + } + else if ( gw.l1e.l1 & NEPT_4K_ENTRY_FLAG ) { + rwx_bits = gw.l4e.l4 & gw.l3e.l3 & gw.l2e.l2 & gw.l1e.l1 & 0x7; + *page_order = 0; + } + else if ( gw.l1e.l1 & NEPT_1G_ENTRY_FLAG ) + { + rwx_bits = gw.l4e.l4 & gw.l3e.l3 & 0x7; + *page_order = 18; + } + else + gdprintk(XENLOG_ERR, "Uncorrect l1 entry!\n"); + + *l1gfn = 
guest_l1e_get_paddr(gw.l1e) >> PAGE_SHIFT; + break; + case EPT_TRANSLATE_VIOLATION: + *exit_qual = (*exit_qual & 0xffffffc0) | (rwx_bits << 3) | rwx_acc; + *exit_reason = EXIT_REASON_EPT_VIOLATION; + break; + + case EPT_TRANSLATE_ERR_PAGE: + break; + case EPT_TRANSLATE_MISCONFIG: + rc = EPT_TRANSLATE_MISCONFIG; + *exit_qual = 0; + *exit_reason = EXIT_REASON_EPT_MISCONFIG; + break; + case EPT_TRANSLATE_RETRY: + break; + default: + gdprintk(XENLOG_ERR, "Unsupported ept translation type!:%d\n", rc); + } + return rc; +} diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c index 8787c91..6d1264b 100644 --- a/xen/arch/x86/mm/hap/nested_hap.c +++ b/xen/arch/x86/mm/hap/nested_hap.c @@ -217,7 +217,7 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa, /* let caller to handle these two cases */ switch (rv) { case NESTEDHVM_PAGEFAULT_INJECT: - return rv; + case NESTEDHVM_PAGEFAULT_RETRY: case NESTEDHVM_PAGEFAULT_L1_ERROR: return rv; case NESTEDHVM_PAGEFAULT_DONE: diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c index 4967da1..409198c 100644 --- a/xen/arch/x86/mm/shadow/multi.c +++ b/xen/arch/x86/mm/shadow/multi.c @@ -4582,7 +4582,7 @@ static mfn_t emulate_gva_to_mfn(struct vcpu *v, /* Translate the GFN to an MFN */ ASSERT(!paging_locked_by_me(v->domain)); mfn = get_gfn(v->domain, _gfn(gfn), &p2mt); - + if ( p2m_is_readonly(p2mt) ) { put_gfn(v->domain, gfn); diff --git a/xen/include/asm-x86/guest_pt.h b/xen/include/asm-x86/guest_pt.h index 4e1dda0..600c52d 100644 --- a/xen/include/asm-x86/guest_pt.h +++ b/xen/include/asm-x86/guest_pt.h @@ -315,6 +315,14 @@ guest_walk_to_page_order(walk_t *gw) #define GPT_RENAME2(_n, _l) _n ## _ ## _l ## _levels #define GPT_RENAME(_n, _l) GPT_RENAME2(_n, _l) #define guest_walk_tables GPT_RENAME(guest_walk_tables, GUEST_PAGING_LEVELS) +#define map_domain_gfn GPT_RENAME(map_domain_gfn, GUEST_PAGING_LEVELS) + +extern void *map_domain_gfn(struct p2m_domain *p2m, + gfn_t gfn, + 
mfn_t *mfn, + p2m_type_t *p2mt, + p2m_query_t *q, + uint32_t *rc); extern uint32_t guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m, unsigned long va, diff --git a/xen/include/asm-x86/hvm/nestedhvm.h b/xen/include/asm-x86/hvm/nestedhvm.h index 91fde0b..649c511 100644 --- a/xen/include/asm-x86/hvm/nestedhvm.h +++ b/xen/include/asm-x86/hvm/nestedhvm.h @@ -52,6 +52,7 @@ bool_t nestedhvm_vcpu_in_guestmode(struct vcpu *v); #define NESTEDHVM_PAGEFAULT_L1_ERROR 2 #define NESTEDHVM_PAGEFAULT_L0_ERROR 3 #define NESTEDHVM_PAGEFAULT_MMIO 4 +#define NESTEDHVM_PAGEFAULT_RETRY 5 int nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa, bool_t access_r, bool_t access_w, bool_t access_x); diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h index ef2c9c9..9a728b6 100644 --- a/xen/include/asm-x86/hvm/vmx/vmcs.h +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h @@ -194,6 +194,7 @@ extern u32 vmx_secondary_exec_control; extern bool_t cpu_has_vmx_ins_outs_instr_info; +#define VMX_EPT_EXEC_ONLY_SUPPORTED 0x00000001 #define VMX_EPT_WALK_LENGTH_4_SUPPORTED 0x00000040 #define VMX_EPT_MEMORY_TYPE_UC 0x00000100 #define VMX_EPT_MEMORY_TYPE_WB 0x00004000 diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h index 422f006..8eb377b 100644 --- a/xen/include/asm-x86/hvm/vmx/vvmx.h +++ b/xen/include/asm-x86/hvm/vmx/vvmx.h @@ -32,6 +32,10 @@ struct nestedvmx { unsigned long intr_info; u32 error_code; } intr; + struct { + uint32_t exit_reason; + uint32_t exit_qual; + } ept_exit; }; #define vcpu_2_nvmx(v) (vcpu_nestedhvm(v).u.nvmx) @@ -109,6 +113,12 @@ void nvmx_domain_relinquish_resources(struct domain *d); int nvmx_handle_vmxon(struct cpu_user_regs *regs); int nvmx_handle_vmxoff(struct cpu_user_regs *regs); +#define EPT_TRANSLATE_SUCCEED 0 +#define EPT_TRANSLATE_VIOLATION 1 +#define EPT_TRANSLATE_ERR_PAGE 2 +#define EPT_TRANSLATE_MISCONFIG 3 +#define EPT_TRANSLATE_RETRY 4 + int nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t 
L2_gpa, paddr_t *L1_gpa, unsigned int *page_order, @@ -192,5 +202,9 @@ u64 nvmx_get_tsc_offset(struct vcpu *v); int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs, unsigned int exit_reason); +int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga, + unsigned int *page_order, uint32_t rwx_acc, + unsigned long *l1gfn, uint64_t *exit_qual, + uint32_t *exit_reason); #endif /* __ASM_X86_HVM_VVMX_H__ */ -- 1.7.1
xiantao.zhang@intel.com
2012-Dec-10 17:57 UTC
[PATCH 04/11] nEPT: Do further permission check for successful translation.
From: Zhang Xiantao <xiantao.zhang@intel.com>

If the permission check fails, inject an EPT violation vmexit to the guest.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Signed-off-by: Xu Dongxiao <dongxiao.xu@intel.com>
---
 xen/arch/x86/mm/hap/nested_ept.c |   24 ++++++++++++++++++++----
 1 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/mm/hap/nested_ept.c b/xen/arch/x86/mm/hap/nested_ept.c
index da868e7..2d733a8 100644
--- a/xen/arch/x86/mm/hap/nested_ept.c
+++ b/xen/arch/x86/mm/hap/nested_ept.c
@@ -272,6 +272,16 @@ unmap:
     return ret;
 }
 
+static
+bool_t nept_permission_check(uint32_t rwx_acc, uint32_t rwx_bits)
+{
+    if ( ((rwx_acc & 0x1) && !(rwx_bits & 0x1)) ||
+         ((rwx_acc & 0x2) && !(rwx_bits & 0x2 )) ||
+         ((rwx_acc & 0x4) && !(rwx_bits & 0x4 )) )
+        return 0;
+    return 1;
+}
+
 /* Translate a L2 guest address to L1 gpa via L1 EPT paging structure */
 
 int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga,
@@ -301,11 +311,17 @@ int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga,
             rwx_bits = gw.l4e.l4 & gw.l3e.l3 & 0x7;
             *page_order = 18;
         }
-        else
+        else {
             gdprintk(XENLOG_ERR, "Uncorrect l1 entry!\n");
-
-        *l1gfn = guest_l1e_get_paddr(gw.l1e) >> PAGE_SHIFT;
-        break;
+            BUG();
+        }
+        if ( nept_permission_check(rwx_acc, rwx_bits) )
+        {
+            *l1gfn = guest_l1e_get_paddr(gw.l1e) >> PAGE_SHIFT;
+            break;
+        }
+        rc = EPT_TRANSLATE_VIOLATION;
+        /* Fall through to EPT violation if permission check fails. */
     case EPT_TRANSLATE_VIOLATION:
         *exit_qual = (*exit_qual & 0xffffffc0) | (rwx_bits << 3) | rwx_acc;
         *exit_reason = EXIT_REASON_EPT_VIOLATION;
-- 
1.7.1
xiantao.zhang@intel.com
2012-Dec-10 17:57 UTC
[PATCH 05/11] EPT: Make ept data structure or operations neutral
From: Zhang Xiantao <xiantao.zhang@intel.com> Share the current EPT logic with nested EPT case, so make the related data structure or operations netural to comment EPT and nested EPT. Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com> --- xen/arch/x86/hvm/vmx/vmcs.c | 2 +- xen/arch/x86/hvm/vmx/vmx.c | 39 +++++++++------ xen/arch/x86/mm/p2m-ept.c | 96 ++++++++++++++++++++++++++++-------- xen/arch/x86/mm/p2m.c | 16 +++++- xen/include/asm-x86/hvm/vmx/vmcs.h | 30 +++++++---- xen/include/asm-x86/hvm/vmx/vmx.h | 6 ++- xen/include/asm-x86/p2m.h | 1 + 7 files changed, 137 insertions(+), 53 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c index 9adc7a4..b9ebdfe 100644 --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -942,7 +942,7 @@ static int construct_vmcs(struct vcpu *v) } if ( paging_mode_hap(d) ) - __vmwrite(EPT_POINTER, d->arch.hvm_domain.vmx.ept_control.eptp); + __vmwrite(EPT_POINTER, d->arch.hvm_domain.vmx.ept.ept_ctl.eptp); if ( cpu_has_vmx_pat && paging_mode_hap(d) ) { diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index c67ac59..06455bf 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -79,22 +79,23 @@ static void __ept_sync_domain(void *info); static int vmx_domain_initialise(struct domain *d) { int rc; + struct ept_data *ept = &d->arch.hvm_domain.vmx.ept; /* Set the memory type used when accessing EPT paging structures. */ - d->arch.hvm_domain.vmx.ept_control.ept_mt = EPT_DEFAULT_MT; + ept->ept_ctl.ept_mt = EPT_DEFAULT_MT; /* set EPT page-walk length, now it''s actual walk length - 1, i.e. 
3 */ - d->arch.hvm_domain.vmx.ept_control.ept_wl = 3; + ept->ept_ctl.ept_wl = 3; - d->arch.hvm_domain.vmx.ept_control.asr + ept->ept_ctl.asr pagetable_get_pfn(p2m_get_pagetable(p2m_get_hostp2m(d))); - if ( !zalloc_cpumask_var(&d->arch.hvm_domain.vmx.ept_synced) ) + if ( !zalloc_cpumask_var(&ept->ept_synced) ) return -ENOMEM; if ( (rc = vmx_alloc_vlapic_mapping(d)) != 0 ) { - free_cpumask_var(d->arch.hvm_domain.vmx.ept_synced); + free_cpumask_var(ept->ept_synced); return rc; } @@ -103,9 +104,10 @@ static int vmx_domain_initialise(struct domain *d) static void vmx_domain_destroy(struct domain *d) { + struct ept_data *ept = &d->arch.hvm_domain.vmx.ept; if ( paging_mode_hap(d) ) - on_each_cpu(__ept_sync_domain, d, 1); - free_cpumask_var(d->arch.hvm_domain.vmx.ept_synced); + on_each_cpu(__ept_sync_domain, p2m_get_hostp2m(d), 1); + free_cpumask_var(ept->ept_synced); vmx_free_vlapic_mapping(d); } @@ -641,6 +643,7 @@ static void vmx_ctxt_switch_to(struct vcpu *v) { struct domain *d = v->domain; unsigned long old_cr4 = read_cr4(), new_cr4 = mmu_cr4_features; + struct ept_data *ept_data = p2m_get_hostp2m(d)->hap_data; /* HOST_CR4 in VMCS is always mmu_cr4_features. Sync CR4 now. */ if ( old_cr4 != new_cr4 ) @@ -650,10 +653,10 @@ static void vmx_ctxt_switch_to(struct vcpu *v) { unsigned int cpu = smp_processor_id(); /* Test-and-test-and-set this CPU in the EPT-is-synced mask. 
*/ - if ( !cpumask_test_cpu(cpu, d->arch.hvm_domain.vmx.ept_synced) && + if ( !cpumask_test_cpu(cpu, ept_get_synced_mask(ept_data)) && !cpumask_test_and_set_cpu(cpu, - d->arch.hvm_domain.vmx.ept_synced) ) - __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(d), 0); + ept_get_synced_mask(ept_data)) ) + __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept_data), 0); } vmx_restore_guest_msrs(v); @@ -1218,12 +1221,16 @@ static void vmx_update_guest_efer(struct vcpu *v) static void __ept_sync_domain(void *info) { - struct domain *d = info; - __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(d), 0); + struct p2m_domain *p2m = info; + struct ept_data *ept_data = p2m->hap_data; + + __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept_data), 0); } -void ept_sync_domain(struct domain *d) +void ept_sync_domain(struct p2m_domain *p2m) { + struct domain *d = p2m->domain; + struct ept_data *ept_data = p2m->hap_data; /* Only if using EPT and this domain has some VCPUs to dirty. */ if ( !paging_mode_hap(d) || !d->vcpu || !d->vcpu[0] ) return; @@ -1236,11 +1243,11 @@ void ept_sync_domain(struct domain *d) * the ept_synced mask before on_selected_cpus() reads it, resulting in * unnecessary extra flushes, to avoid allocating a cpumask_t on the stack. 
*/ - cpumask_and(d->arch.hvm_domain.vmx.ept_synced, + cpumask_and(ept_get_synced_mask(ept_data), d->domain_dirty_cpumask, &cpu_online_map); - on_selected_cpus(d->arch.hvm_domain.vmx.ept_synced, - __ept_sync_domain, d, 1); + on_selected_cpus(ept_get_synced_mask(ept_data), + __ept_sync_domain, p2m, 1); } void nvmx_enqueue_n2_exceptions(struct vcpu *v, diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c index c964f54..8adf3f9 100644 --- a/xen/arch/x86/mm/p2m-ept.c +++ b/xen/arch/x86/mm/p2m-ept.c @@ -291,9 +291,11 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, int need_modify_vtd_table = 1; int vtd_pte_present = 0; int needs_sync = 1; - struct domain *d = p2m->domain; ept_entry_t old_entry = { .epte = 0 }; + struct ept_data *ept_data = p2m->hap_data; + struct domain *d = p2m->domain; + ASSERT(ept_data); /* * the caller must make sure: * 1. passing valid gfn and mfn at order boundary. @@ -301,17 +303,17 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, * 3. passing a valid order. 
*/ if ( ((gfn | mfn_x(mfn)) & ((1UL << order) - 1)) || - ((u64)gfn >> ((ept_get_wl(d) + 1) * EPT_TABLE_ORDER)) || + ((u64)gfn >> ((ept_get_wl(ept_data) + 1) * EPT_TABLE_ORDER)) || (order % EPT_TABLE_ORDER) ) return 0; - ASSERT((target == 2 && hvm_hap_has_1gb(d)) || - (target == 1 && hvm_hap_has_2mb(d)) || + ASSERT((target == 2 && hvm_hap_has_1gb()) || + (target == 1 && hvm_hap_has_2mb()) || (target == 0)); - table = map_domain_page(ept_get_asr(d)); + table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m))); - for ( i = ept_get_wl(d); i > target; i-- ) + for ( i = ept_get_wl(ept_data); i > target; i-- ) { ret = ept_next_level(p2m, 0, &table, &gfn_remainder, i); if ( !ret ) @@ -439,9 +441,11 @@ out: unmap_domain_page(table); if ( needs_sync ) - ept_sync_domain(p2m->domain); + ept_sync_domain(p2m); - if ( rv && iommu_enabled && need_iommu(p2m->domain) && need_modify_vtd_table ) + /* For non-nested p2m, may need to change VT-d page table.*/ + if ( rv && !p2m_is_nestedp2m(p2m) && iommu_enabled && need_iommu(p2m->domain) && + need_modify_vtd_table ) { if ( iommu_hap_pt_share ) iommu_pte_flush(d, gfn, (u64*)ept_entry, order, vtd_pte_present); @@ -488,14 +492,14 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m, unsigned long gfn, p2m_type_t *t, p2m_access_t* a, p2m_query_t q, unsigned int *page_order) { - struct domain *d = p2m->domain; - ept_entry_t *table = map_domain_page(ept_get_asr(d)); + ept_entry_t *table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m))); unsigned long gfn_remainder = gfn; ept_entry_t *ept_entry; u32 index; int i; int ret = 0; mfn_t mfn = _mfn(INVALID_MFN); + struct ept_data *ept_data = p2m->hap_data; *t = p2m_mmio_dm; *a = p2m_access_n; @@ -506,7 +510,7 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m, /* Should check if gfn obeys GAW here. 
*/ - for ( i = ept_get_wl(d); i > 0; i-- ) + for ( i = ept_get_wl(ept_data); i > 0; i-- ) { retry: ret = ept_next_level(p2m, 1, &table, &gfn_remainder, i); @@ -588,19 +592,20 @@ out: static ept_entry_t ept_get_entry_content(struct p2m_domain *p2m, unsigned long gfn, int *level) { - ept_entry_t *table = map_domain_page(ept_get_asr(p2m->domain)); + ept_entry_t *table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m))); unsigned long gfn_remainder = gfn; ept_entry_t *ept_entry; ept_entry_t content = { .epte = 0 }; u32 index; int i; int ret=0; + struct ept_data *ept_data = p2m->hap_data; /* This pfn is higher than the highest the p2m map currently holds */ if ( gfn > p2m->max_mapped_pfn ) goto out; - for ( i = ept_get_wl(p2m->domain); i > 0; i-- ) + for ( i = ept_get_wl(ept_data); i > 0; i-- ) { ret = ept_next_level(p2m, 1, &table, &gfn_remainder, i); if ( !ret || ret == GUEST_TABLE_POD_PAGE ) @@ -622,7 +627,8 @@ static ept_entry_t ept_get_entry_content(struct p2m_domain *p2m, void ept_walk_table(struct domain *d, unsigned long gfn) { struct p2m_domain *p2m = p2m_get_hostp2m(d); - ept_entry_t *table = map_domain_page(ept_get_asr(d)); + struct ept_data *ept_data = p2m->hap_data; + ept_entry_t *table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m))); unsigned long gfn_remainder = gfn; int i; @@ -638,7 +644,7 @@ void ept_walk_table(struct domain *d, unsigned long gfn) goto out; } - for ( i = ept_get_wl(d); i >= 0; i-- ) + for ( i = ept_get_wl(ept_data); i >= 0; i-- ) { ept_entry_t *ept_entry, *next; u32 index; @@ -778,16 +784,16 @@ static void ept_change_entry_type_page(mfn_t ept_page_mfn, int ept_page_level, static void ept_change_entry_type_global(struct p2m_domain *p2m, p2m_type_t ot, p2m_type_t nt) { - struct domain *d = p2m->domain; - if ( ept_get_asr(d) == 0 ) + struct ept_data *ept_data = p2m->hap_data; + if ( ept_get_asr(ept_data) == 0 ) return; BUG_ON(p2m_is_grant(ot) || p2m_is_grant(nt)); BUG_ON(ot != nt && (ot == p2m_mmio_direct || nt == 
p2m_mmio_direct)); - ept_change_entry_type_page(_mfn(ept_get_asr(d)), ept_get_wl(d), ot, nt); + ept_change_entry_type_page(_mfn(ept_get_asr(ept_data)), ept_get_wl(ept_data), ot, nt); - ept_sync_domain(d); + ept_sync_domain(p2m); } void ept_p2m_init(struct p2m_domain *p2m) @@ -811,6 +817,7 @@ static void ept_dump_p2m_table(unsigned char key) unsigned long gfn, gfn_remainder; unsigned long record_counter = 0; struct p2m_domain *p2m; + struct ept_data *ept_data; for_each_domain(d) { @@ -818,15 +825,16 @@ static void ept_dump_p2m_table(unsigned char key) continue; p2m = p2m_get_hostp2m(d); + ept_data = p2m->hap_data; printk("\ndomain%d EPT p2m table: \n", d->domain_id); for ( gfn = 0; gfn <= p2m->max_mapped_pfn; gfn += (1 << order) ) { gfn_remainder = gfn; mfn = _mfn(INVALID_MFN); - table = map_domain_page(ept_get_asr(d)); + table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m))); - for ( i = ept_get_wl(d); i > 0; i-- ) + for ( i = ept_get_wl(ept_data); i > 0; i-- ) { ret = ept_next_level(p2m, 1, &table, &gfn_remainder, i); if ( ret != GUEST_TABLE_NORMAL_PAGE ) @@ -858,6 +866,52 @@ out: } } +int alloc_p2m_hap_data(struct p2m_domain *p2m) +{ + struct domain *d = p2m->domain; + struct ept_data *ept; + + ASSERT(d); + if (!hap_enabled(d)) + return 0; + + p2m->hap_data = ept = xzalloc(struct ept_data); + if ( !p2m->hap_data ) + return -ENOMEM; + if ( !zalloc_cpumask_var(&ept->ept_synced) ) + { + xfree(ept); + p2m->hap_data = NULL; + return -ENOMEM; + } + return 0; +} + +void free_p2m_hap_data(struct p2m_domain *p2m) +{ + struct ept_data *ept; + + if ( !hap_enabled(p2m->domain) ) + return; + + if ( p2m_is_nestedp2m(p2m)) { + ept = p2m->hap_data; + if ( ept ) { + free_cpumask_var(ept->ept_synced); + xfree(ept); + } + } +} + +void p2m_init_hap_data(struct p2m_domain *p2m) +{ + struct ept_data *ept = p2m->hap_data; + + ept->ept_ctl.ept_wl = 3; + ept->ept_ctl.ept_mt = EPT_DEFAULT_MT; + ept->ept_ctl.asr = pagetable_get_pfn(p2m_get_pagetable(p2m)); +} + static struct 
keyhandler ept_p2m_table = { .diagnostic = 0, .u.fn = ept_dump_p2m_table, diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index 62c2d78..799bbfb 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -105,6 +105,8 @@ p2m_init_nestedp2m(struct domain *d) if ( !zalloc_cpumask_var(&p2m->dirty_cpumask) ) return -ENOMEM; p2m_initialise(d, p2m); + if ( cpu_has_vmx && alloc_p2m_hap_data(p2m) ) + return -ENOMEM; p2m->write_p2m_entry = nestedp2m_write_p2m_entry; list_add(&p2m->np2m_list, &p2m_get_hostp2m(d)->np2m_list); } @@ -126,12 +128,14 @@ int p2m_init(struct domain *d) return -ENOMEM; } p2m_initialise(d, p2m); + if ( hap_enabled(d) && cpu_has_vmx) + p2m->hap_data = &d->arch.hvm_domain.vmx.ept; /* Must initialise nestedp2m unconditionally * since nestedhvm_enabled(d) returns false here. * (p2m_init runs too early for HVM_PARAM_* options) */ rc = p2m_init_nestedp2m(d); - if ( rc ) + if ( rc ) p2m_final_teardown(d); return rc; } @@ -354,6 +358,8 @@ int p2m_alloc_table(struct p2m_domain *p2m) if ( hap_enabled(d) ) iommu_share_p2m_table(d); + if ( p2m_is_nestedp2m(p2m) && hap_enabled(d) ) + p2m_init_hap_data(p2m); P2M_PRINTK("populating p2m table\n"); @@ -436,12 +442,16 @@ void p2m_teardown(struct p2m_domain *p2m) static void p2m_teardown_nestedp2m(struct domain *d) { uint8_t i; + struct p2m_domain *p2m; for (i = 0; i < MAX_NESTEDP2M; i++) { if ( !d->arch.nested_p2m[i] ) continue; - free_cpumask_var(d->arch.nested_p2m[i]->dirty_cpumask); - xfree(d->arch.nested_p2m[i]); + p2m = d->arch.nested_p2m[i]; + if ( p2m->hap_data ) + free_p2m_hap_data(p2m); + free_cpumask_var(p2m->dirty_cpumask); + xfree(p2m); d->arch.nested_p2m[i] = NULL; } } diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h index 9a728b6..e6b4e3b 100644 --- a/xen/include/asm-x86/hvm/vmx/vmcs.h +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h @@ -56,26 +56,34 @@ struct vmx_msr_state { #define EPT_DEFAULT_MT MTRR_TYPE_WRBACK -struct vmx_domain { - unsigned long 
apic_access_mfn; - union { - struct { +union eptp_control{ + struct { u64 ept_mt :3, ept_wl :3, rsvd :6, asr :52; }; u64 eptp; - } ept_control; +}; + +struct ept_data{ + union eptp_control ept_ctl; cpumask_var_t ept_synced; }; -#define ept_get_wl(d) \ - ((d)->arch.hvm_domain.vmx.ept_control.ept_wl) -#define ept_get_asr(d) \ - ((d)->arch.hvm_domain.vmx.ept_control.asr) -#define ept_get_eptp(d) \ - ((d)->arch.hvm_domain.vmx.ept_control.eptp) +struct vmx_domain { + unsigned long apic_access_mfn; + struct ept_data ept; +}; + +#define ept_get_wl(ept_data) \ + (((struct ept_data*)(ept_data))->ept_ctl.ept_wl) +#define ept_get_asr(ept_data) \ + (((struct ept_data*)(ept_data))->ept_ctl.asr) +#define ept_get_eptp(ept_data) \ + (((struct ept_data*)(ept_data))->ept_ctl.eptp) +#define ept_get_synced_mask(ept_data)\ + (((struct ept_data*)(ept_data))->ept_synced) struct arch_vmx_struct { /* Virtual address of VMCS. */ diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h index aa5b080..573a12e 100644 --- a/xen/include/asm-x86/hvm/vmx/vmx.h +++ b/xen/include/asm-x86/hvm/vmx/vmx.h @@ -333,7 +333,7 @@ static inline void ept_sync_all(void) __invept(INVEPT_ALL_CONTEXT, 0, 0); } -void ept_sync_domain(struct domain *d); +void ept_sync_domain(struct p2m_domain *p2m); static inline void vpid_sync_vcpu_gva(struct vcpu *v, unsigned long gva) { @@ -401,6 +401,10 @@ void setup_ept_dump(void); void update_guest_eip(void); +int alloc_p2m_hap_data(struct p2m_domain *p2m); +void free_p2m_hap_data(struct p2m_domain *p2m); +void p2m_init_hap_data(struct p2m_domain *p2m); + /* EPT violation qualifications definitions */ #define _EPT_READ_VIOLATION 0 #define EPT_READ_VIOLATION (1UL<<_EPT_READ_VIOLATION) diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h index 1807ad6..0fb1b2d 100644 --- a/xen/include/asm-x86/p2m.h +++ b/xen/include/asm-x86/p2m.h @@ -277,6 +277,7 @@ struct p2m_domain { mm_lock_t lock; /* Locking of private pod structs, * * not relying on 
the p2m lock. */ } pod; + void *hap_data; }; /* get host p2m table */ -- 1.7.1
xiantao.zhang@intel.com
2012-Dec-10 17:57 UTC
[PATCH 06/11] nEPT: Try to enable EPT paging for L2 guest.
From: Zhang Xiantao <xiantao.zhang@intel.com> Once found EPT is enabled by L1 VMM, enabled nested EPT support for L2 guest. Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com> --- xen/arch/x86/hvm/vmx/vmx.c | 16 +++++++++-- xen/arch/x86/hvm/vmx/vvmx.c | 50 ++++++++++++++++++++++++++++-------- xen/include/asm-x86/hvm/vmx/vvmx.h | 5 +++- 3 files changed, 56 insertions(+), 15 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 06455bf..1bfb67f 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -1513,6 +1513,7 @@ static struct hvm_function_table __read_mostly vmx_function_table = { .nhvm_vcpu_guestcr3 = nvmx_vcpu_guestcr3, .nhvm_vcpu_p2m_base = nvmx_vcpu_eptp_base, .nhvm_vcpu_asid = nvmx_vcpu_asid, + .nhvm_vmcx_hap_enabled = nvmx_ept_enabled, .nhvm_vmcx_guest_intercepts_trap = nvmx_intercepts_exception, .nhvm_vcpu_vmexit_trap = nvmx_vmexit_trap, .nhvm_intr_blocked = nvmx_intr_blocked, @@ -2055,6 +2056,7 @@ static void ept_handle_violation(unsigned long qualification, paddr_t gpa) unsigned long gla, gfn = gpa >> PAGE_SHIFT; mfn_t mfn; p2m_type_t p2mt; + int ret; struct domain *d = current->domain; if ( tb_init_done ) @@ -2073,14 +2075,22 @@ static void ept_handle_violation(unsigned long qualification, paddr_t gpa) __trace_var(TRC_HVM_NPF, 0, sizeof(_d), &_d); } - if ( hvm_hap_nested_page_fault(gpa, + ret = hvm_hap_nested_page_fault(gpa, qualification & EPT_GLA_VALID ? 1 : 0, qualification & EPT_GLA_VALID ? __vmread(GUEST_LINEAR_ADDRESS) : ~0ull, qualification & EPT_READ_VIOLATION ? 1 : 0, qualification & EPT_WRITE_VIOLATION ? 1 : 0, - qualification & EPT_EXEC_VIOLATION ? 1 : 0) ) - return; + qualification & EPT_EXEC_VIOLATION ? 1 : 0); + switch (ret) { + case 0: + break; + case 1: + return; + case -1: + vcpu_nestedhvm(current).nv_vmexit_pending = 1; + return; + } /* Everything else is an error. 
*/ mfn = get_gfn_query_unlocked(d, gfn, &p2mt); diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c index 76cf757..ab68b52 100644 --- a/xen/arch/x86/hvm/vmx/vvmx.c +++ b/xen/arch/x86/hvm/vmx/vvmx.c @@ -41,6 +41,7 @@ int nvmx_vcpu_initialise(struct vcpu *v) gdprintk(XENLOG_ERR, "nest: allocation for shadow vmcs failed\n"); goto out; } + nvmx->ept.enabled = 0; nvmx->vmxon_region_pa = 0; nvcpu->nv_vvmcx = NULL; nvcpu->nv_vvmcxaddr = VMCX_EADDR; @@ -96,9 +97,11 @@ uint64_t nvmx_vcpu_guestcr3(struct vcpu *v) uint64_t nvmx_vcpu_eptp_base(struct vcpu *v) { - /* TODO */ - ASSERT(0); - return 0; + uint64_t eptp_base; + struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v); + + eptp_base = __get_vvmcs(nvcpu->nv_vvmcx, EPT_POINTER); + return eptp_base & PAGE_MASK; } uint32_t nvmx_vcpu_asid(struct vcpu *v) @@ -108,6 +111,13 @@ uint32_t nvmx_vcpu_asid(struct vcpu *v) return 0; } +bool_t nvmx_ept_enabled(struct vcpu *v) +{ + struct nestedvmx *nvmx = &vcpu_2_nvmx(v); + + return !!(nvmx->ept.enabled); +} + static const enum x86_segment sreg_to_index[] = { [VMX_SREG_ES] = x86_seg_es, [VMX_SREG_CS] = x86_seg_cs, @@ -503,14 +513,16 @@ void nvmx_update_exec_control(struct vcpu *v, u32 host_cntrl) } void nvmx_update_secondary_exec_control(struct vcpu *v, - unsigned long value) + unsigned long host_cntrl) { u32 shadow_cntrl; struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v); + struct nestedvmx *nvmx = &vcpu_2_nvmx(v); shadow_cntrl = __get_vvmcs(nvcpu->nv_vvmcx, SECONDARY_VM_EXEC_CONTROL); - shadow_cntrl |= value; - set_shadow_control(v, SECONDARY_VM_EXEC_CONTROL, shadow_cntrl); + nvmx->ept.enabled = !!(shadow_cntrl & SECONDARY_EXEC_ENABLE_EPT); + shadow_cntrl |= host_cntrl; + __vmwrite(SECONDARY_VM_EXEC_CONTROL, shadow_cntrl); } static void nvmx_update_pin_control(struct vcpu *v, unsigned long host_cntrl) @@ -818,6 +830,19 @@ static void load_shadow_guest_state(struct vcpu *v) /* TODO: CR3 target control */ } + +static uint64_t get_shadow_eptp(struct vcpu *v) +{ + uint64_t 
eptp_asr; + uint64_t np2m_base = nvmx_vcpu_eptp_base(v); + struct p2m_domain *p2m = p2m_get_nestedp2m(v, np2m_base); + struct ept_data *ept_data = p2m->hap_data; + + eptp_asr = pagetable_get_pfn(p2m_get_pagetable(p2m)); + ept_data->ept_ctl.asr = eptp_asr; + return ept_data->ept_ctl.eptp; +} + static void virtual_vmentry(struct cpu_user_regs *regs) { struct vcpu *v = current; @@ -862,7 +887,10 @@ static void virtual_vmentry(struct cpu_user_regs *regs) /* updating host cr0 to sync TS bit */ __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0); - /* TODO: EPT_POINTER */ + /* Setup virtual ETP for L2 guest*/ + if ( nestedhvm_paging_mode_hap(v) ) + __vmwrite(EPT_POINTER, get_shadow_eptp(v)); + } static void sync_vvmcs_guest_state(struct vcpu *v, struct cpu_user_regs *regs) @@ -915,8 +943,8 @@ static void sync_vvmcs_ro(struct vcpu *v) /* Adjust exit_reason/exit_qualifciation for violation case */ if ( __get_vvmcs(vvmcs, VM_EXIT_REASON) = EXIT_REASON_EPT_VIOLATION ) { - __set_vvmcs(vvmcs, EXIT_QUALIFICATION, nvmx->ept_exit.exit_qual); - __set_vvmcs(vvmcs, VM_EXIT_REASON, nvmx->ept_exit.exit_reason); + __set_vvmcs(vvmcs, EXIT_QUALIFICATION, nvmx->ept.exit_qual); + __set_vvmcs(vvmcs, VM_EXIT_REASON, nvmx->ept.exit_reason); } } @@ -1480,8 +1508,8 @@ nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa, case EPT_TRANSLATE_VIOLATION: case EPT_TRANSLATE_MISCONFIG: rc = NESTEDHVM_PAGEFAULT_INJECT; - nvmx->ept_exit.exit_reason = exit_reason; - nvmx->ept_exit.exit_qual = exit_qual; + nvmx->ept.exit_reason = exit_reason; + nvmx->ept.exit_qual = exit_qual; break; case EPT_TRANSLATE_RETRY: rc = NESTEDHVM_PAGEFAULT_RETRY; diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h index 8eb377b..661cd8a 100644 --- a/xen/include/asm-x86/hvm/vmx/vvmx.h +++ b/xen/include/asm-x86/hvm/vmx/vvmx.h @@ -33,9 +33,10 @@ struct nestedvmx { u32 error_code; } intr; struct { + char enabled; uint32_t exit_reason; uint32_t exit_qual; - } ept_exit; + } ept; }; #define 
vcpu_2_nvmx(v) (vcpu_nestedhvm(v).u.nvmx) @@ -110,6 +111,8 @@ int nvmx_intercepts_exception(struct vcpu *v, unsigned int trap, int error_code); void nvmx_domain_relinquish_resources(struct domain *d); +bool_t nvmx_ept_enabled(struct vcpu *v); + int nvmx_handle_vmxon(struct cpu_user_regs *regs); int nvmx_handle_vmxoff(struct cpu_user_regs *regs); -- 1.7.1
xiantao.zhang@intel.com
2012-Dec-10 17:57 UTC
[PATCH 07/11] nEPT: Sync PDPTR fields if L2 guest in PAE paging mode
From: Zhang Xiantao <xiantao.zhang@intel.com>

For a PAE L2 guest, the GUEST_PDPTR fields need to be synced on each
virtual vmentry.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
---
 xen/arch/x86/hvm/vmx/vvmx.c |   10 ++++++++--
 1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index ab68b52..3fc128b 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -824,9 +824,15 @@ static void load_shadow_guest_state(struct vcpu *v)
     vvmcs_to_shadow(vvmcs, CR0_READ_SHADOW);
     vvmcs_to_shadow(vvmcs, CR4_READ_SHADOW);
     vvmcs_to_shadow(vvmcs, CR0_GUEST_HOST_MASK);
     vvmcs_to_shadow(vvmcs, CR4_GUEST_HOST_MASK);
-
-    /* TODO: PDPTRs for nested ept */
+    if ( nvmx_ept_enabled(v) && hvm_pae_enabled(v) &&
+         (v->arch.hvm_vcpu.guest_efer & EFER_LMA) ) {
+        vvmcs_to_shadow(vvmcs, GUEST_PDPTR0);
+        vvmcs_to_shadow(vvmcs, GUEST_PDPTR1);
+        vvmcs_to_shadow(vvmcs, GUEST_PDPTR2);
+        vvmcs_to_shadow(vvmcs, GUEST_PDPTR3);
+    }
+
     /* TODO: CR3 target control */
 }
-- 
1.7.1
xiantao.zhang@intel.com
2012-Dec-10 17:57 UTC
[PATCH 08/11] nEPT: Use minimal permission for nested p2m.
From: Zhang Xiantao <xiantao.zhang@intel.com>

Emulate the permission check for the nested p2m. The current solution is
to use minimal permissions: once a permission violation is met in L0,
determine whether it was caused by the guest EPT or the host EPT.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
---
 xen/arch/x86/hvm/svm/nestedsvm.c        |    2 +-
 xen/arch/x86/hvm/vmx/vvmx.c             |    4 ++--
 xen/arch/x86/mm/hap/nested_ept.c        |    9 +++++----
 xen/arch/x86/mm/hap/nested_hap.c        |   22 +++++++++++++---------
 xen/include/asm-x86/hvm/hvm.h           |    2 +-
 xen/include/asm-x86/hvm/svm/nestedsvm.h |    2 +-
 xen/include/asm-x86/hvm/vmx/vvmx.h      |    6 +++---
 7 files changed, 26 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index 5dcb354..ab455a9 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -1177,7 +1177,7 @@ nsvm_vmcb_hap_enabled(struct vcpu *v)
  */
 int nsvm_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
-                         unsigned int *page_order,
+                         unsigned int *page_order, uint8_t *p2m_acc,
                          bool_t access_r, bool_t access_w, bool_t access_x)
 {
     uint32_t pfec;
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 3fc128b..41779bc 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1494,7 +1494,7 @@ int nvmx_msr_write_intercept(unsigned int msr, u64 msr_content)
  */
 int
 nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
-                     unsigned int *page_order,
+                     unsigned int *page_order, uint8_t *p2m_acc,
                      bool_t access_r, bool_t access_w, bool_t access_x)
 {
     uint64_t exit_qual = __vmread(EXIT_QUALIFICATION);
@@ -1504,7 +1504,7 @@ nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
     uint32_t rwx_rights = (access_x << 2) | (access_w << 1) | access_r;
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
 
-    rc = nept_translate_l2ga(v, L2_gpa, page_order, rwx_rights, &gfn,
+    rc = nept_translate_l2ga(v, L2_gpa, page_order, rwx_rights, &gfn, p2m_acc,
&exit_qual, &exit_reason); switch ( rc ) { case EPT_TRANSLATE_SUCCEED: diff --git a/xen/arch/x86/mm/hap/nested_ept.c b/xen/arch/x86/mm/hap/nested_ept.c index 2d733a8..637db1a 100644 --- a/xen/arch/x86/mm/hap/nested_ept.c +++ b/xen/arch/x86/mm/hap/nested_ept.c @@ -286,8 +286,8 @@ bool_t nept_permission_check(uint32_t rwx_acc, uint32_t rwx_bits) int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga, unsigned int *page_order, uint32_t rwx_acc, - unsigned long *l1gfn, uint64_t *exit_qual, - uint32_t *exit_reason) + unsigned long *l1gfn, uint8_t *p2m_acc, + uint64_t *exit_qual, uint32_t *exit_reason) { uint32_t rc, rwx_bits = 0; walk_t gw; @@ -317,8 +317,9 @@ int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga, } if ( nept_permission_check(rwx_acc, rwx_bits) ) { - *l1gfn = guest_l1e_get_paddr(gw.l1e) >> PAGE_SHIFT; - break; + *l1gfn = guest_l1e_get_paddr(gw.l1e) >> PAGE_SHIFT; + *p2m_acc = (uint8_t)rwx_bits; + break; } rc = EPT_TRANSLATE_VIOLATION; /* Fall through to EPT violation if permission check fails. 
*/ diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c index 6d1264b..9c1654d 100644 --- a/xen/arch/x86/mm/hap/nested_hap.c +++ b/xen/arch/x86/mm/hap/nested_hap.c @@ -142,12 +142,12 @@ nestedhap_fix_p2m(struct vcpu *v, struct p2m_domain *p2m, */ static int nestedhap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa, - unsigned int *page_order, + unsigned int *page_order, uint8_t *p2m_acc, bool_t access_r, bool_t access_w, bool_t access_x) { ASSERT(hvm_funcs.nhvm_hap_walk_L1_p2m); - return hvm_funcs.nhvm_hap_walk_L1_p2m(v, L2_gpa, L1_gpa, page_order, + return hvm_funcs.nhvm_hap_walk_L1_p2m(v, L2_gpa, L1_gpa, page_order, p2m_acc, access_r, access_w, access_x); } @@ -158,16 +158,15 @@ nestedhap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa, */ static int nestedhap_walk_L0_p2m(struct p2m_domain *p2m, paddr_t L1_gpa, paddr_t *L0_gpa, - p2m_type_t *p2mt, + p2m_type_t *p2mt, p2m_access_t *p2ma, unsigned int *page_order, bool_t access_r, bool_t access_w, bool_t access_x) { mfn_t mfn; - p2m_access_t p2ma; int rc; /* walk L0 P2M table */ - mfn = get_gfn_type_access(p2m, L1_gpa >> PAGE_SHIFT, p2mt, &p2ma, + mfn = get_gfn_type_access(p2m, L1_gpa >> PAGE_SHIFT, p2mt, p2ma, 0, page_order); rc = NESTEDHVM_PAGEFAULT_MMIO; @@ -206,12 +205,14 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa, struct p2m_domain *p2m, *nested_p2m; unsigned int page_order_21, page_order_10, page_order_20; p2m_type_t p2mt_10; + p2m_access_t p2ma_10; + uint8_t p2ma_21; p2m = p2m_get_hostp2m(d); /* L0 p2m */ nested_p2m = p2m_get_nestedp2m(v, nhvm_vcpu_p2m_base(v)); /* walk the L1 P2M table */ - rv = nestedhap_walk_L1_p2m(v, *L2_gpa, &L1_gpa, &page_order_21, + rv = nestedhap_walk_L1_p2m(v, *L2_gpa, &L1_gpa, &page_order_21, &p2ma_21, access_r, access_w, access_x); /* let caller to handle these two cases */ @@ -229,7 +230,7 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa, /* ==> we have to walk L0 P2M */ rv = 
nestedhap_walk_L0_p2m(p2m, L1_gpa, &L0_gpa, - &p2mt_10, &page_order_10, + &p2mt_10, &p2ma_10, &page_order_10, access_r, access_w, access_x); /* let upper level caller to handle these two cases */ @@ -250,10 +251,13 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa, page_order_20 = min(page_order_21, page_order_10); + if (p2ma_10 > p2m_access_rwx) + p2ma_10 = p2m_access_rwx; + p2ma_10 &= (p2m_access_t)p2ma_21; /* Use minimal permission for nested p2m. */ + /* fix p2m_get_pagetable(nested_p2m) */ nestedhap_fix_p2m(v, nested_p2m, *L2_gpa, L0_gpa, page_order_20, - p2mt_10, - p2m_access_rwx /* FIXME: Should use minimum permission. */); + p2mt_10, p2ma_10); return NESTEDHVM_PAGEFAULT_DONE; } diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h index 80f07e9..889e3c9 100644 --- a/xen/include/asm-x86/hvm/hvm.h +++ b/xen/include/asm-x86/hvm/hvm.h @@ -186,7 +186,7 @@ struct hvm_function_table { /*Walk nested p2m */ int (*nhvm_hap_walk_L1_p2m)(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa, - unsigned int *page_order, + unsigned int *page_order, uint8_t *p2m_acc, bool_t access_r, bool_t access_w, bool_t access_x); }; diff --git a/xen/include/asm-x86/hvm/svm/nestedsvm.h b/xen/include/asm-x86/hvm/svm/nestedsvm.h index 0c90f30..748cc04 100644 --- a/xen/include/asm-x86/hvm/svm/nestedsvm.h +++ b/xen/include/asm-x86/hvm/svm/nestedsvm.h @@ -134,7 +134,7 @@ void svm_vmexit_do_clgi(struct cpu_user_regs *regs, struct vcpu *v); void svm_vmexit_do_stgi(struct cpu_user_regs *regs, struct vcpu *v); bool_t nestedsvm_gif_isset(struct vcpu *v); int nsvm_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa, - unsigned int *page_order, + unsigned int *page_order, uint8_t *p2m_acc, bool_t access_r, bool_t access_w, bool_t access_x); #define NSVM_INTR_NOTHANDLED 3 diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h index 661cd8a..55c0ad1 100644 --- a/xen/include/asm-x86/hvm/vmx/vvmx.h +++ 
b/xen/include/asm-x86/hvm/vmx/vvmx.h @@ -124,7 +124,7 @@ int nvmx_handle_vmxoff(struct cpu_user_regs *regs); int nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa, - unsigned int *page_order, + unsigned int *page_order, uint8_t *p2m_acc, bool_t access_r, bool_t access_w, bool_t access_x); /* * Virtual VMCS layout @@ -207,7 +207,7 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs, int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga, unsigned int *page_order, uint32_t rwx_acc, - unsigned long *l1gfn, uint64_t *exit_qual, - uint32_t *exit_reason); + unsigned long *l1gfn, uint8_t *p2m_acc, + uint64_t *exit_qual, uint32_t *exit_reason); #endif /* __ASM_X86_HVM_VVMX_H__ */ -- 1.7.1
xiantao.zhang@intel.com
2012-Dec-10 17:57 UTC
[PATCH 09/11] nEPT: handle invept instruction from L1 VMM
From: Zhang Xiantao <xiantao.zhang@intel.com> Add the INVEPT instruction emulation logic. Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com> --- xen/arch/x86/hvm/vmx/vmx.c | 6 +++- xen/arch/x86/hvm/vmx/vvmx.c | 37 ++++++++++++++++++++++++++++++++++++ xen/arch/x86/mm/p2m.c | 2 +- xen/include/asm-x86/hvm/vmx/vvmx.h | 1 + 4 files changed, 43 insertions(+), 3 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 1bfb67f..36f6d82 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -2622,11 +2622,13 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) if ( nvmx_handle_vmresume(regs) == X86EMUL_OKAY ) update_guest_eip(); break; - + case EXIT_REASON_INVEPT: + if ( nvmx_handle_invept(regs) == X86EMUL_OKAY ) + update_guest_eip(); + break; case EXIT_REASON_MWAIT_INSTRUCTION: case EXIT_REASON_MONITOR_INSTRUCTION: case EXIT_REASON_GETSEC: - case EXIT_REASON_INVEPT: case EXIT_REASON_INVVPID: /* * We should never exit on GETSEC because CR4.SMXE is always 0 when diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c index 41779bc..07ca90e 100644 --- a/xen/arch/x86/hvm/vmx/vvmx.c +++ b/xen/arch/x86/hvm/vmx/vvmx.c @@ -1357,6 +1357,43 @@ int nvmx_handle_vmwrite(struct cpu_user_regs *regs) return X86EMUL_OKAY; } +int nvmx_handle_invept(struct cpu_user_regs *regs) +{ + struct vmx_inst_decoded decode; + unsigned long eptp; + u64 inv_type; + + if ( decode_vmx_inst(regs, &decode, &eptp, 0) + != X86EMUL_OKAY ) + return X86EMUL_EXCEPTION; + + inv_type = reg_read(regs, decode.reg2); + gdprintk(XENLOG_DEBUG,"inv_type:%ld, eptp:%lx\n", inv_type, eptp); + + switch (inv_type){ + case INVEPT_SINGLE_CONTEXT: + { + struct p2m_domain *p2m = vcpu_nestedhvm(current).nv_p2m; + if ( p2m ) + { + p2m_flush(current, p2m); + ept_sync_domain(p2m); + } + } + break; + case INVEPT_ALL_CONTEXT: + p2m_flush_nestedp2m(current->domain); + __invept(INVEPT_ALL_CONTEXT, 0, 0); + break; + default: + return X86EMUL_EXCEPTION; + } + 
vmreturn(regs, VMSUCCEED); + + return X86EMUL_OKAY; +} + + #define __emul_value(enable1, default1) \ ((enable1 | default1) << 32 | (default1)) diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c index 799bbfb..657fc03 100644 --- a/xen/arch/x86/mm/p2m.c +++ b/xen/arch/x86/mm/p2m.c @@ -1478,7 +1478,7 @@ p2m_flush_table(struct p2m_domain *p2m) void p2m_flush(struct vcpu *v, struct p2m_domain *p2m) { - ASSERT(v->domain == p2m->domain); + ASSERT(p2m && v->domain == p2m->domain); vcpu_nestedhvm(v).nv_p2m = NULL; p2m_flush_table(p2m); hvm_asid_flush_vcpu(v); diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h index 55c0ad1..cf5ed9a 100644 --- a/xen/include/asm-x86/hvm/vmx/vvmx.h +++ b/xen/include/asm-x86/hvm/vmx/vvmx.h @@ -190,6 +190,7 @@ int nvmx_handle_vmread(struct cpu_user_regs *regs); int nvmx_handle_vmwrite(struct cpu_user_regs *regs); int nvmx_handle_vmresume(struct cpu_user_regs *regs); int nvmx_handle_vmlaunch(struct cpu_user_regs *regs); +int nvmx_handle_invept(struct cpu_user_regs *regs); int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content); int nvmx_msr_write_intercept(unsigned int msr, -- 1.7.1
xiantao.zhang@intel.com
2012-Dec-10 17:57 UTC
[PATCH 10/11] nEPT: expost EPT capablity to L1 VMM
From: Zhang Xiantao <xiantao.zhang@intel.com> Expose EPT''s basic features to L1 VMM. No EPT A/D bit feature supported. Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com> --- xen/arch/x86/hvm/vmx/vvmx.c | 6 +++++- xen/arch/x86/mm/hap/nested_ept.c | 2 +- xen/include/asm-x86/hvm/vmx/vvmx.h | 2 ++ 3 files changed, 8 insertions(+), 2 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c index 07ca90e..ec875d2 100644 --- a/xen/arch/x86/hvm/vmx/vvmx.c +++ b/xen/arch/x86/hvm/vmx/vvmx.c @@ -1457,7 +1457,8 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content) case MSR_IA32_VMX_PROCBASED_CTLS2: /* 1-seetings */ data = SECONDARY_EXEC_DESCRIPTOR_TABLE_EXITING | - SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES; + SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES | + SECONDARY_EXEC_ENABLE_EPT; data = gen_vmx_msr(data, 0, host_data); break; case MSR_IA32_VMX_EXIT_CTLS: @@ -1510,6 +1511,9 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content) case MSR_IA32_VMX_MISC: gdprintk(XENLOG_WARNING, "VMX MSR %x not fully supported yet.\n", msr); break; + case MSR_IA32_VMX_EPT_VPID_CAP: + data = nept_get_ept_vpid_cap(); + break; default: r = 0; break; diff --git a/xen/arch/x86/mm/hap/nested_ept.c b/xen/arch/x86/mm/hap/nested_ept.c index 637db1a..8dfb70a 100644 --- a/xen/arch/x86/mm/hap/nested_ept.c +++ b/xen/arch/x86/mm/hap/nested_ept.c @@ -48,7 +48,7 @@ #define EPT_EMT_WB 6 #define EPT_EMT_UC 0 -#define NEPT_VPID_CAP_BITS 0 +#define NEPT_VPID_CAP_BITS 0x0000000006134140ul #define NEPT_1G_ENTRY_FLAG (1 << 11) #define NEPT_2M_ENTRY_FLAG (1 << 10) diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h index cf5ed9a..fcdce62 100644 --- a/xen/include/asm-x86/hvm/vmx/vvmx.h +++ b/xen/include/asm-x86/hvm/vmx/vvmx.h @@ -206,6 +206,8 @@ u64 nvmx_get_tsc_offset(struct vcpu *v); int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs, unsigned int exit_reason); +uint64_t nept_get_ept_vpid_cap(void); + int nept_translate_l2ga(struct 
vcpu *v, paddr_t l2ga, unsigned int *page_order, uint32_t rwx_acc, unsigned long *l1gfn, uint8_t *p2m_acc, -- 1.7.1
xiantao.zhang@intel.com
2012-Dec-10 17:57 UTC
[PATCH 11/11] nVMX: Expose VPID capability to nested VMM.
From: Zhang Xiantao <xiantao.zhang@intel.com> Virtualize VPID for the nested vmm, use host''s VPID to emualte guest''s VPID. For each virtual vmentry, if guest''v vpid is changed, allocate a new host VPID for L2 guest. Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com> --- xen/arch/x86/hvm/vmx/vmx.c | 10 +++++- xen/arch/x86/hvm/vmx/vvmx.c | 60 +++++++++++++++++++++++++++++++++++- xen/arch/x86/mm/hap/nested_ept.c | 7 ++-- xen/include/asm-x86/hvm/vmx/vvmx.h | 2 + 4 files changed, 73 insertions(+), 6 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 36f6d82..fb40392 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -2626,10 +2626,13 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs) if ( nvmx_handle_invept(regs) == X86EMUL_OKAY ) update_guest_eip(); break; + case EXIT_REASON_INVVPID: + if ( nvmx_handle_invvpid(regs) == X86EMUL_OKAY ) + update_guest_eip(); + break; case EXIT_REASON_MWAIT_INSTRUCTION: case EXIT_REASON_MONITOR_INSTRUCTION: case EXIT_REASON_GETSEC: - case EXIT_REASON_INVVPID: /* * We should never exit on GETSEC because CR4.SMXE is always 0 when * running in guest context, and the CPU checks that before getting @@ -2747,8 +2750,11 @@ void vmx_vmenter_helper(void) if ( !cpu_has_vmx_vpid ) goto out; + if ( nestedhvm_vcpu_in_guestmode(curr) ) + p_asid = &vcpu_nestedhvm(curr).nv_n2asid; + else + p_asid = &curr->arch.hvm_vcpu.n1asid; - p_asid = &curr->arch.hvm_vcpu.n1asid; old_asid = p_asid->asid; need_flush = hvm_asid_handle_vmenter(p_asid); new_asid = p_asid->asid; diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c index ec875d2..28a8e78 100644 --- a/xen/arch/x86/hvm/vmx/vvmx.c +++ b/xen/arch/x86/hvm/vmx/vvmx.c @@ -42,6 +42,7 @@ int nvmx_vcpu_initialise(struct vcpu *v) goto out; } nvmx->ept.enabled = 0; + nvmx->guest_vpid = 0; nvmx->vmxon_region_pa = 0; nvcpu->nv_vvmcx = NULL; nvcpu->nv_vvmcxaddr = VMCX_EADDR; @@ -849,6 +850,16 @@ static uint64_t get_shadow_eptp(struct 
vcpu *v) return ept_data->ept_ctl.eptp; } +static bool_t nvmx_vpid_enabled(struct nestedvcpu *nvcpu) +{ + uint32_t second_cntl; + + second_cntl = __get_vvmcs(nvcpu->nv_vvmcx, SECONDARY_VM_EXEC_CONTROL); + if ( second_cntl & SECONDARY_EXEC_ENABLE_VPID ) + return 1; + return 0; +} + static void virtual_vmentry(struct cpu_user_regs *regs) { struct vcpu *v = current; @@ -897,6 +908,18 @@ static void virtual_vmentry(struct cpu_user_regs *regs) if ( nestedhvm_paging_mode_hap(v) ) __vmwrite(EPT_POINTER, get_shadow_eptp(v)); + /* nested VPID support! */ + if ( cpu_has_vmx_vpid && nvmx_vpid_enabled(nvcpu) ) + { + struct nestedvmx *nvmx = &vcpu_2_nvmx(v); + uint32_t new_vpid = __get_vvmcs(vvmcs, VIRTUAL_PROCESSOR_ID); + if ( nvmx->guest_vpid != new_vpid ) + { + hvm_asid_flush_vcpu_asid(&vcpu_nestedhvm(v).nv_n2asid); + nvmx->guest_vpid = new_vpid; + } + } + } static void sync_vvmcs_guest_state(struct vcpu *v, struct cpu_user_regs *regs) @@ -1188,7 +1211,7 @@ int nvmx_handle_vmlaunch(struct cpu_user_regs *regs) if ( vcpu_nestedhvm(v).nv_vvmcxaddr == VMCX_EADDR ) { vmreturn (regs, VMFAIL_INVALID); - return X86EMUL_OKAY; + return X86EMUL_OKAY; } launched = __get_vvmcs(vcpu_nestedhvm(v).nv_vvmcx, @@ -1363,6 +1386,9 @@ int nvmx_handle_invept(struct cpu_user_regs *regs) unsigned long eptp; u64 inv_type; + if(!cpu_has_vmx_ept) + return X86EMUL_EXCEPTION; + if ( decode_vmx_inst(regs, &decode, &eptp, 0) != X86EMUL_OKAY ) return X86EMUL_EXCEPTION; @@ -1401,6 +1427,37 @@ int nvmx_handle_invept(struct cpu_user_regs *regs) (((__emul_value(enable1, default1) & host_value) & (~0ul << 32)) | \ ((uint32_t)(__emul_value(enable1, default1) | host_value))) +int nvmx_handle_invvpid(struct cpu_user_regs *regs) +{ + struct vmx_inst_decoded decode; + unsigned long vpid; + u64 inv_type; + + if(!cpu_has_vmx_vpid) + return X86EMUL_EXCEPTION; + + if ( decode_vmx_inst(regs, &decode, &vpid, 0) + != X86EMUL_OKAY ) + return X86EMUL_EXCEPTION; + + inv_type = reg_read(regs, decode.reg2); + 
gdprintk(XENLOG_DEBUG,"inv_type:%ld, vpid:%lx\n", inv_type, vpid); + + switch ( inv_type ){ + /* Just invalidate all tlb entries for all types! */ + case INVVPID_INDIVIDUAL_ADDR: + case INVVPID_SINGLE_CONTEXT: + case INVVPID_ALL_CONTEXT: + hvm_asid_flush_vcpu_asid(&vcpu_nestedhvm(current).nv_n2asid); + break; + default: + return X86EMUL_EXCEPTION; + } + vmreturn(regs, VMSUCCEED); + + return X86EMUL_OKAY; +} + /* * Capability reporting */ @@ -1458,6 +1515,7 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content) /* 1-seetings */ data = SECONDARY_EXEC_DESCRIPTOR_TABLE_EXITING | SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES | + SECONDARY_EXEC_ENABLE_VPID | SECONDARY_EXEC_ENABLE_EPT; data = gen_vmx_msr(data, 0, host_data); break; diff --git a/xen/arch/x86/mm/hap/nested_ept.c b/xen/arch/x86/mm/hap/nested_ept.c index 8dfb70a..d0be5ce 100644 --- a/xen/arch/x86/mm/hap/nested_ept.c +++ b/xen/arch/x86/mm/hap/nested_ept.c @@ -48,7 +48,7 @@ #define EPT_EMT_WB 6 #define EPT_EMT_UC 0 -#define NEPT_VPID_CAP_BITS 0x0000000006134140ul +#define NEPT_VPID_CAP_BITS 0xf0106134140ul #define NEPT_1G_ENTRY_FLAG (1 << 11) #define NEPT_2M_ENTRY_FLAG (1 << 10) @@ -126,8 +126,9 @@ static bool_t nept_present_check(uint64_t entry) uint64_t nept_get_ept_vpid_cap(void) { - /*TODO: exposed ept and vpid features*/ - return NEPT_VPID_CAP_BITS; + if (cpu_has_vmx_ept && cpu_has_vmx_vpid) + return NEPT_VPID_CAP_BITS; + return 0; } static uint32_t diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h index fcdce62..1e7a6d7 100644 --- a/xen/include/asm-x86/hvm/vmx/vvmx.h +++ b/xen/include/asm-x86/hvm/vmx/vvmx.h @@ -37,6 +37,7 @@ struct nestedvmx { uint32_t exit_reason; uint32_t exit_qual; } ept; + uint32_t guest_vpid; }; #define vcpu_2_nvmx(v) (vcpu_nestedhvm(v).u.nvmx) @@ -191,6 +192,7 @@ int nvmx_handle_vmwrite(struct cpu_user_regs *regs); int nvmx_handle_vmresume(struct cpu_user_regs *regs); int nvmx_handle_vmlaunch(struct cpu_user_regs *regs); int 
nvmx_handle_invept(struct cpu_user_regs *regs); +int nvmx_handle_invvpid(struct cpu_user_regs *regs); int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content); int nvmx_msr_write_intercept(unsigned int msr, -- 1.7.1
Hi, Guys Do you have comments for this patchset ? Thanks! Xiantao> -----Original Message----- > From: Zhang, Xiantao > Sent: Tuesday, December 11, 2012 1:57 AM > To: xen-devel@lists.xensource.com > Cc: JBeulich@suse.com; keir@xen.org; Dong, Eddie; Nakajima, Jun; Zhang, > Xiantao > Subject: [PATCH 00/11] Add virtual EPT support Xen. > > From: Zhang Xiantao <xiantao.zhang@intel.com> > > With virtual EPT support, L1 hyerpvisor can use EPT hardware for L2 guest''s > memory virtualization. In this way, L2 guest''s performance can be improved > sharply. According to our testing, some benchmarks can show > 5x > performance gain. > > Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com> > > Zhang Xiantao (11): > nestedhap: Change hostcr3 and p2m->cr3 to meaningful words > nestedhap: Change nested p2m''s walker to vendor-specific > nested_ept: Implement guest ept''s walker > nested_ept: Add permission check for success case > EPT: Make ept data structure or operations neutral > nEPT: Try to enable EPT paging for L2 guest. > nEPT: Sync PDPTR fields if L2 guest in PAE paging mode > nEPT: Use minimal permission for nested p2m. > nEPT: handle invept instruction from L1 VMM > nEPT: expost EPT capablity to L1 VMM > nVMX: Expose VPID capability to nested VMM. 
> > xen/arch/x86/hvm/hvm.c | 7 +- > xen/arch/x86/hvm/svm/nestedsvm.c | 31 +++ > xen/arch/x86/hvm/svm/svm.c | 3 +- > xen/arch/x86/hvm/vmx/vmcs.c | 2 +- > xen/arch/x86/hvm/vmx/vmx.c | 76 +++++--- > xen/arch/x86/hvm/vmx/vvmx.c | 208 ++++++++++++++++++- > xen/arch/x86/mm/guest_walk.c | 12 +- > xen/arch/x86/mm/hap/Makefile | 1 + > xen/arch/x86/mm/hap/nested_ept.c | 345 > +++++++++++++++++++++++++++++++ > xen/arch/x86/mm/hap/nested_hap.c | 79 +++---- > xen/arch/x86/mm/mm-locks.h | 2 +- > xen/arch/x86/mm/p2m-ept.c | 96 +++++++-- > xen/arch/x86/mm/p2m.c | 44 +++-- > xen/arch/x86/mm/shadow/multi.c | 2 +- > xen/include/asm-x86/guest_pt.h | 8 + > xen/include/asm-x86/hvm/hvm.h | 9 +- > xen/include/asm-x86/hvm/nestedhvm.h | 1 + > xen/include/asm-x86/hvm/svm/nestedsvm.h | 3 + > xen/include/asm-x86/hvm/vmx/vmcs.h | 31 ++- > xen/include/asm-x86/hvm/vmx/vmx.h | 6 +- > xen/include/asm-x86/hvm/vmx/vvmx.h | 29 +++- > xen/include/asm-x86/p2m.h | 17 +- > 22 files changed, 859 insertions(+), 153 deletions(-) create mode 100644 > xen/arch/x86/mm/hap/nested_ept.c
>>> On 13.12.12 at 01:31, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote: > Hi, Guys > Do you have comments for this patchset ? Thanks! > XiantaoI was actually hoping for Tim to take a look. But you should probably be a little more patient - it''s been just two days since this got posted. Jan>> -----Original Message----- >> From: Zhang, Xiantao >> Sent: Tuesday, December 11, 2012 1:57 AM >> To: xen-devel@lists.xensource.com >> Cc: JBeulich@suse.com; keir@xen.org; Dong, Eddie; Nakajima, Jun; Zhang, >> Xiantao >> Subject: [PATCH 00/11] Add virtual EPT support Xen. >> >> From: Zhang Xiantao <xiantao.zhang@intel.com> >> >> With virtual EPT support, L1 hyerpvisor can use EPT hardware for L2 guest''s >> memory virtualization. In this way, L2 guest''s performance can be improved >> sharply. According to our testing, some benchmarks can show > 5x >> performance gain. >> >> Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com> >> >> Zhang Xiantao (11): >> nestedhap: Change hostcr3 and p2m->cr3 to meaningful words >> nestedhap: Change nested p2m''s walker to vendor-specific >> nested_ept: Implement guest ept''s walker >> nested_ept: Add permission check for success case >> EPT: Make ept data structure or operations neutral >> nEPT: Try to enable EPT paging for L2 guest. >> nEPT: Sync PDPTR fields if L2 guest in PAE paging mode >> nEPT: Use minimal permission for nested p2m. >> nEPT: handle invept instruction from L1 VMM >> nEPT: expost EPT capablity to L1 VMM >> nVMX: Expose VPID capability to nested VMM. 
>> >> xen/arch/x86/hvm/hvm.c | 7 +- >> xen/arch/x86/hvm/svm/nestedsvm.c | 31 +++ >> xen/arch/x86/hvm/svm/svm.c | 3 +- >> xen/arch/x86/hvm/vmx/vmcs.c | 2 +- >> xen/arch/x86/hvm/vmx/vmx.c | 76 +++++--- >> xen/arch/x86/hvm/vmx/vvmx.c | 208 ++++++++++++++++++- >> xen/arch/x86/mm/guest_walk.c | 12 +- >> xen/arch/x86/mm/hap/Makefile | 1 + >> xen/arch/x86/mm/hap/nested_ept.c | 345 >> +++++++++++++++++++++++++++++++ >> xen/arch/x86/mm/hap/nested_hap.c | 79 +++---- >> xen/arch/x86/mm/mm-locks.h | 2 +- >> xen/arch/x86/mm/p2m-ept.c | 96 +++++++-- >> xen/arch/x86/mm/p2m.c | 44 +++-- >> xen/arch/x86/mm/shadow/multi.c | 2 +- >> xen/include/asm-x86/guest_pt.h | 8 + >> xen/include/asm-x86/hvm/hvm.h | 9 +- >> xen/include/asm-x86/hvm/nestedhvm.h | 1 + >> xen/include/asm-x86/hvm/svm/nestedsvm.h | 3 + >> xen/include/asm-x86/hvm/vmx/vmcs.h | 31 ++- >> xen/include/asm-x86/hvm/vmx/vmx.h | 6 +- >> xen/include/asm-x86/hvm/vmx/vvmx.h | 29 +++- >> xen/include/asm-x86/p2m.h | 17 +- >> 22 files changed, 859 insertions(+), 153 deletions(-) create mode 100644 >> xen/arch/x86/mm/hap/nested_ept.c
Tim Deegan
2012-Dec-13 14:52 UTC
Re: [PATCH 01/11] nestedhap: Change hostcr3 and p2m->cr3 to meaningful words
At 01:57 +0800 on 11 Dec (1355191033), xiantao.zhang@intel.com wrote:> From: Zhang Xiantao <xiantao.zhang@intel.com> > > VMX doesn''t have the concept about host cr3 for nested p2m, > and only SVM has, so change it to netural words. > > Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>Acked-by: Tim Deegan <tim@xen.org>
Tim Deegan
2012-Dec-13 14:52 UTC
Re: [PATCH 02/11] nestedhap: Change nested p2m''s walker to vendor-specific
At 01:57 +0800 on 11 Dec (1355191034), xiantao.zhang@intel.com wrote:> From: Zhang Xiantao <xiantao.zhang@intel.com> > > EPT and NPT adopts differnt formats for each-level entry, > so change the walker functions to vendor-specific. > > Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>Acked-by: Tim Deegan <tim@xen.org>
At 01:57 +0800 on 11 Dec (1355191035), xiantao.zhang@intel.com wrote:> From: Zhang Xiantao <xiantao.zhang@intel.com> > > Implment guest EPT PT walker, some logic is based on shadow''s > ia32e PT walker. During the PT walking, if the target pages are > not in memory, use RETRY mechanism and get a chance to let the > target page back. > > Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>The design looks pretty good. A few comments below on code details -- I think the only big one is that the ept walker shouldn''t force eptes into ''normal'' pte types just so it can reuse the walk_t struct.> @@ -88,10 +88,11 @@ static uint32_t set_ad_bits(void *guest_p, void *walk_p, int set_dirty) > > /* If the map is non-NULL, we leave this function having > * acquired an extra ref on mfn_to_page(*mfn) */ > -static inline void *map_domain_gfn(struct p2m_domain *p2m, > +void *map_domain_gfn(struct p2m_domain *p2m, > gfn_t gfn, > mfn_t *mfn, > p2m_type_t *p2mt, > + p2m_query_t *q,I think this should just be a plain p2m_query_t and not a pointer to one; the code below only dereferences the pointer to read it. 
That will save you having a variable just to hold ''P2M_ALLOC | P2M_UNSHARE'' in a few places below.> --- /dev/null > +++ b/xen/arch/x86/mm/hap/nested_ept.c > +/* For EPT''s walker reserved bits and EMT check */ > +#define EPT_MUST_RSV_BITS (((1ull << PADDR_BITS) -1) & \ > + ~((1ull << paddr_bits) - 1)) > + > + > +#define EPT_EMT_WB 6 > +#define EPT_EMT_UC 0These two definitions should be in vmx.h along with the other architectural constants for EPTEs.> + > +#define NEPT_VPID_CAP_BITS 0 > + > +#define NEPT_1G_ENTRY_FLAG (1 << 11) > +#define NEPT_2M_ENTRY_FLAG (1 << 10) > +#define NEPT_4K_ENTRY_FLAG (1 << 9) > + > +/* Always expose 1G and 2M capability to guest, > + so don''t need additional check */ > +bool_t nept_sp_entry(uint64_t entry) > +{ > + return !!(entry & EPTE_SUPER_PAGE_MASK); > +} > + > +static bool_t nept_rsv_bits_check(uint64_t entry, uint32_t level) > +{ > + uint64_t rsv_bits = EPT_MUST_RSV_BITS; > + > + switch ( level ){ > + case 1: > + break; > + case 2 ... 3: > + if (nept_sp_entry(entry)) > + rsv_bits |= ((1ull << (9 * (level -1 ))) -1) << PAGE_SHIFT; > + else > + rsv_bits |= 0xfull << 3;Please use EPTE_EMT_MASK rather than open-coding it.> + break; > + case 4: > + rsv_bits |= 0xf8;Again, please use EPTE_EMT_MASK | EPTE_IGMT_MASK | EPTE_SUPER_PAGE_MASK.> + break; > + default: > + printk("Unsupported EPT paging level: %d\n", level); > + } > + if ( ((entry & rsv_bits) ^ rsv_bits) == rsv_bits ) > + return 0;This XOR is useful in the normal walker because we care about _which_ bits are wrong. 
Here, you can just return !(entry & rsv_bits) for the same result.> + return 1; > +} > + > +/* EMT checking*/ > +static bool_t nept_emt_bits_check(uint64_t entry, uint32_t level) > +{ > + ept_entry_t e; > + e.epte = entry; > + if ( e.sp || level == 1 ) { > + if ( e.emt == 2 || e.emt == 3 || e.emt == 7 ) > + return 1;Please define more of the EPT_EMT_* constants for these values and use them.> + } > + return 0; > +} > + > +static bool_t nept_rwx_bits_check(uint64_t entry) { > + /*write only or write/execute only*/ > + uint8_t rwx_bits = entry & 0x7; > + > + if ( rwx_bits == 2 || rwx_bits == 6) > + return 1; > + if ( rwx_bits == 4 && !(NEPT_VPID_CAP_BITS & > + VMX_EPT_EXEC_ONLY_SUPPORTED))Please pass the entry as an ept_entry_t and check the named r, w and x fields rather than using magic numbers.> + return 1; > + return 0; > +} > + > +/* nept''s misconfiguration check */ > +static bool_t nept_misconfiguration_check(uint64_t entry, uint32_t level) > +{ > + return (nept_rsv_bits_check(entry, level) || > + nept_emt_bits_check(entry, level) || > + nept_rwx_bits_check(entry)); > +} > + > +static bool_t nept_present_check(uint64_t entry) > +{ > + if (entry & 0x7)Again, please pass an ept_entry_t and check the r/w/x fields.> + return 1; > + return 0; > +} > + > +uint64_t nept_get_ept_vpid_cap(void) > +{ > + /*TODO: exposed ept and vpid features*/This TODO comment doesn''t get removed later in the series. 
Is returning 0 here always OK?> + return NEPT_VPID_CAP_BITS; > +} > + > +static uint32_t > +nept_walk_tables(struct vcpu *v, unsigned long l2ga, walk_t *gw) > +{ > + p2m_type_t p2mt; > + uint32_t rc = 0, ret = 0, gflags; > + struct domain *d = v->domain; > + struct p2m_domain *p2m = d->arch.p2m; > + gfn_t base_gfn = _gfn(nhvm_vcpu_p2m_base(v) >> PAGE_SHIFT); > + p2m_query_t qt = P2M_ALLOC; > + > + guest_l1e_t *l1p = NULL; > + guest_l2e_t *l2p = NULL; > + guest_l3e_t *l3p = NULL; > + guest_l4e_t *l4p = NULL;These aren''t realy guest_l*es, so I think you should use ept_entry_t * to point to them. While you''re at it, why not define an equivalent ept_walk_t struct that uses the ept-specific types instead of putting EPT entries in a normal walk_t? Also, unlike the normal guest walker, you don''t need to hold these maps open for writing A/D bits, so you could just use a single pointer and unmap as you go.> + sp = nept_sp_entry(gw->l3e.l3); > + /* Super 1G entry */ > + if ( sp ) > + { > + /* Generate a fake l1 table entry so callers don''t all > + * have to understand superpages. */You only have one caller for this function, and it does understand superpages -- it explicitly check for them. So I think you can avoid this part altogether (likewise for 2M superpages) and just DTRT in the caller. Cheers, Tim.> + gfn_t start = guest_l3e_get_gfn(gw->l3e); > + > + /* Increment the pfn by the right number of 4k pages. */ > + start = _gfn((gfn_x(start) & ~GUEST_L3_GFN_MASK) + > + ((l2ga >> PAGE_SHIFT) & GUEST_L3_GFN_MASK)); > + gflags = (gw->l3e.l3 & 0x7f) | NEPT_1G_ENTRY_FLAG; > + gw->l1e = guest_l1e_from_gfn(start, gflags); > + gw->l2mfn = gw->l1mfn = _mfn(INVALID_MFN); > + goto done; > + } > +
Tim Deegan
2012-Dec-13 15:47 UTC
Re: [PATCH 04/11] nEPT: Do further permission check for sucessful translation.
At 01:57 +0800 on 11 Dec (1355191036), xiantao.zhang@intel.com wrote:> +static > +bool_t nept_permission_check(uint32_t rwx_acc, uint32_t rwx_bits) > +{ > + if ( ((rwx_acc & 0x1) && !(rwx_bits & 0x1)) || > + ((rwx_acc & 0x2) && !(rwx_bits & 0x2 )) || > + ((rwx_acc & 0x4) && !(rwx_bits & 0x4 )) ) > + return 0;Ugh. It would be nice to use human-readable names for these. Or, since you know these are both <= 0x7, just test for !(rwx_acc & ~rwx_bits). Also, this should really be folded into the previous patch. Cheers, Tim.> + > /* Translate a L2 guest address to L1 gpa via L1 EPT paging structure */ > > int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga, > @@ -301,11 +311,17 @@ int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga, > rwx_bits = gw.l4e.l4 & gw.l3e.l3 & 0x7; > *page_order = 18; > } > - else > + else { > gdprintk(XENLOG_ERR, "Uncorrect l1 entry!\n"); > - > - *l1gfn = guest_l1e_get_paddr(gw.l1e) >> PAGE_SHIFT; > - break; > + BUG(); > + } > + if ( nept_permission_check(rwx_acc, rwx_bits) ) > + { > + *l1gfn = guest_l1e_get_paddr(gw.l1e) >> PAGE_SHIFT; > + break; > + } > + rc = EPT_TRANSLATE_VIOLATION; > + /* Fall through to EPT violation if permission check fails. */ > case EPT_TRANSLATE_VIOLATION: > *exit_qual = (*exit_qual & 0xffffffc0) | (rwx_bits << 3) | rwx_acc; > *exit_reason = EXIT_REASON_EPT_VIOLATION; > -- > 1.7.1 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Tim Deegan
2012-Dec-13 16:04 UTC
Re: [PATCH 05/11] EPT: Make ept data structure or operations neutral
At 01:57 +0800 on 11 Dec (1355191037), xiantao.zhang@intel.com wrote:> From: Zhang Xiantao <xiantao.zhang@intel.com> > > Share the current EPT logic with nested EPT case, so > make the related data structure or operations netural > to comment EPT and nested EPT. > > Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>Since the struct ept_data is only 16 bytes long, why not just embed it in the struct p2m_domain, as> mm_lock_t lock; /* Locking of private pod structs, * > * not relying on the p2m lock. */ > } pod; > + union { > + struct ept_data ept; > + /* NPT equivalent could go here if needed */ > + }; > };That would tidy up the alloc/free stuff a fair bit, though you''d still need it for the cpumask, I guess. It would be nice to wrap the alloc/free functions up in the usual way so we dont get ept-specific functions with arch-independednt names. Otherwise taht looks fine. Cheers, Tim.> --- > xen/arch/x86/hvm/vmx/vmcs.c | 2 +- > xen/arch/x86/hvm/vmx/vmx.c | 39 +++++++++------ > xen/arch/x86/mm/p2m-ept.c | 96 ++++++++++++++++++++++++++++-------- > xen/arch/x86/mm/p2m.c | 16 +++++- > xen/include/asm-x86/hvm/vmx/vmcs.h | 30 +++++++---- > xen/include/asm-x86/hvm/vmx/vmx.h | 6 ++- > xen/include/asm-x86/p2m.h | 1 + > 7 files changed, 137 insertions(+), 53 deletions(-) > > diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c > index 9adc7a4..b9ebdfe 100644 > --- a/xen/arch/x86/hvm/vmx/vmcs.c > +++ b/xen/arch/x86/hvm/vmx/vmcs.c > @@ -942,7 +942,7 @@ static int construct_vmcs(struct vcpu *v) > } > > if ( paging_mode_hap(d) ) > - __vmwrite(EPT_POINTER, d->arch.hvm_domain.vmx.ept_control.eptp); > + __vmwrite(EPT_POINTER, d->arch.hvm_domain.vmx.ept.ept_ctl.eptp); > > if ( cpu_has_vmx_pat && paging_mode_hap(d) ) > { > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c > index c67ac59..06455bf 100644 > --- a/xen/arch/x86/hvm/vmx/vmx.c > +++ b/xen/arch/x86/hvm/vmx/vmx.c > @@ -79,22 +79,23 @@ static void __ept_sync_domain(void *info); > static 
int vmx_domain_initialise(struct domain *d) > { > int rc; > + struct ept_data *ept = &d->arch.hvm_domain.vmx.ept; > > /* Set the memory type used when accessing EPT paging structures. */ > - d->arch.hvm_domain.vmx.ept_control.ept_mt = EPT_DEFAULT_MT; > + ept->ept_ctl.ept_mt = EPT_DEFAULT_MT; > > /* set EPT page-walk length, now it''s actual walk length - 1, i.e. 3 */ > - d->arch.hvm_domain.vmx.ept_control.ept_wl = 3; > + ept->ept_ctl.ept_wl = 3; > > - d->arch.hvm_domain.vmx.ept_control.asr > + ept->ept_ctl.asr > pagetable_get_pfn(p2m_get_pagetable(p2m_get_hostp2m(d))); > > - if ( !zalloc_cpumask_var(&d->arch.hvm_domain.vmx.ept_synced) ) > + if ( !zalloc_cpumask_var(&ept->ept_synced) ) > return -ENOMEM; > > if ( (rc = vmx_alloc_vlapic_mapping(d)) != 0 ) > { > - free_cpumask_var(d->arch.hvm_domain.vmx.ept_synced); > + free_cpumask_var(ept->ept_synced); > return rc; > } > > @@ -103,9 +104,10 @@ static int vmx_domain_initialise(struct domain *d) > > static void vmx_domain_destroy(struct domain *d) > { > + struct ept_data *ept = &d->arch.hvm_domain.vmx.ept; > if ( paging_mode_hap(d) ) > - on_each_cpu(__ept_sync_domain, d, 1); > - free_cpumask_var(d->arch.hvm_domain.vmx.ept_synced); > + on_each_cpu(__ept_sync_domain, p2m_get_hostp2m(d), 1); > + free_cpumask_var(ept->ept_synced); > vmx_free_vlapic_mapping(d); > } > > @@ -641,6 +643,7 @@ static void vmx_ctxt_switch_to(struct vcpu *v) > { > struct domain *d = v->domain; > unsigned long old_cr4 = read_cr4(), new_cr4 = mmu_cr4_features; > + struct ept_data *ept_data = p2m_get_hostp2m(d)->hap_data; > > /* HOST_CR4 in VMCS is always mmu_cr4_features. Sync CR4 now. */ > if ( old_cr4 != new_cr4 ) > @@ -650,10 +653,10 @@ static void vmx_ctxt_switch_to(struct vcpu *v) > { > unsigned int cpu = smp_processor_id(); > /* Test-and-test-and-set this CPU in the EPT-is-synced mask. 
*/ > - if ( !cpumask_test_cpu(cpu, d->arch.hvm_domain.vmx.ept_synced) && > + if ( !cpumask_test_cpu(cpu, ept_get_synced_mask(ept_data)) && > !cpumask_test_and_set_cpu(cpu, > - d->arch.hvm_domain.vmx.ept_synced) ) > - __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(d), 0); > + ept_get_synced_mask(ept_data)) ) > + __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept_data), 0); > } > > vmx_restore_guest_msrs(v); > @@ -1218,12 +1221,16 @@ static void vmx_update_guest_efer(struct vcpu *v) > > static void __ept_sync_domain(void *info) > { > - struct domain *d = info; > - __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(d), 0); > + struct p2m_domain *p2m = info; > + struct ept_data *ept_data = p2m->hap_data; > + > + __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept_data), 0); > } > > -void ept_sync_domain(struct domain *d) > +void ept_sync_domain(struct p2m_domain *p2m) > { > + struct domain *d = p2m->domain; > + struct ept_data *ept_data = p2m->hap_data; > /* Only if using EPT and this domain has some VCPUs to dirty. */ > if ( !paging_mode_hap(d) || !d->vcpu || !d->vcpu[0] ) > return; > @@ -1236,11 +1243,11 @@ void ept_sync_domain(struct domain *d) > * the ept_synced mask before on_selected_cpus() reads it, resulting in > * unnecessary extra flushes, to avoid allocating a cpumask_t on the stack. 
> */ > - cpumask_and(d->arch.hvm_domain.vmx.ept_synced, > + cpumask_and(ept_get_synced_mask(ept_data), > d->domain_dirty_cpumask, &cpu_online_map); > > - on_selected_cpus(d->arch.hvm_domain.vmx.ept_synced, > - __ept_sync_domain, d, 1); > + on_selected_cpus(ept_get_synced_mask(ept_data), > + __ept_sync_domain, p2m, 1); > } > > void nvmx_enqueue_n2_exceptions(struct vcpu *v, > diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c > index c964f54..8adf3f9 100644 > --- a/xen/arch/x86/mm/p2m-ept.c > +++ b/xen/arch/x86/mm/p2m-ept.c > @@ -291,9 +291,11 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, > int need_modify_vtd_table = 1; > int vtd_pte_present = 0; > int needs_sync = 1; > - struct domain *d = p2m->domain; > ept_entry_t old_entry = { .epte = 0 }; > + struct ept_data *ept_data = p2m->hap_data; > + struct domain *d = p2m->domain; > > + ASSERT(ept_data); > /* > * the caller must make sure: > * 1. passing valid gfn and mfn at order boundary. > @@ -301,17 +303,17 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn, > * 3. passing a valid order. 
> */ > if ( ((gfn | mfn_x(mfn)) & ((1UL << order) - 1)) || > - ((u64)gfn >> ((ept_get_wl(d) + 1) * EPT_TABLE_ORDER)) || > + ((u64)gfn >> ((ept_get_wl(ept_data) + 1) * EPT_TABLE_ORDER)) || > (order % EPT_TABLE_ORDER) ) > return 0; > > - ASSERT((target == 2 && hvm_hap_has_1gb(d)) || > - (target == 1 && hvm_hap_has_2mb(d)) || > + ASSERT((target == 2 && hvm_hap_has_1gb()) || > + (target == 1 && hvm_hap_has_2mb()) || > (target == 0)); > > - table = map_domain_page(ept_get_asr(d)); > + table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m))); > > - for ( i = ept_get_wl(d); i > target; i-- ) > + for ( i = ept_get_wl(ept_data); i > target; i-- ) > { > ret = ept_next_level(p2m, 0, &table, &gfn_remainder, i); > if ( !ret ) > @@ -439,9 +441,11 @@ out: > unmap_domain_page(table); > > if ( needs_sync ) > - ept_sync_domain(p2m->domain); > + ept_sync_domain(p2m); > > - if ( rv && iommu_enabled && need_iommu(p2m->domain) && need_modify_vtd_table ) > + /* For non-nested p2m, may need to change VT-d page table.*/ > + if ( rv && !p2m_is_nestedp2m(p2m) && iommu_enabled && need_iommu(p2m->domain) && > + need_modify_vtd_table ) > { > if ( iommu_hap_pt_share ) > iommu_pte_flush(d, gfn, (u64*)ept_entry, order, vtd_pte_present); > @@ -488,14 +492,14 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m, > unsigned long gfn, p2m_type_t *t, p2m_access_t* a, > p2m_query_t q, unsigned int *page_order) > { > - struct domain *d = p2m->domain; > - ept_entry_t *table = map_domain_page(ept_get_asr(d)); > + ept_entry_t *table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m))); > unsigned long gfn_remainder = gfn; > ept_entry_t *ept_entry; > u32 index; > int i; > int ret = 0; > mfn_t mfn = _mfn(INVALID_MFN); > + struct ept_data *ept_data = p2m->hap_data; > > *t = p2m_mmio_dm; > *a = p2m_access_n; > @@ -506,7 +510,7 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m, > > /* Should check if gfn obeys GAW here. 
*/ > > - for ( i = ept_get_wl(d); i > 0; i-- ) > + for ( i = ept_get_wl(ept_data); i > 0; i-- ) > { > retry: > ret = ept_next_level(p2m, 1, &table, &gfn_remainder, i); > @@ -588,19 +592,20 @@ out: > static ept_entry_t ept_get_entry_content(struct p2m_domain *p2m, > unsigned long gfn, int *level) > { > - ept_entry_t *table = map_domain_page(ept_get_asr(p2m->domain)); > + ept_entry_t *table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m))); > unsigned long gfn_remainder = gfn; > ept_entry_t *ept_entry; > ept_entry_t content = { .epte = 0 }; > u32 index; > int i; > int ret=0; > + struct ept_data *ept_data = p2m->hap_data; > > /* This pfn is higher than the highest the p2m map currently holds */ > if ( gfn > p2m->max_mapped_pfn ) > goto out; > > - for ( i = ept_get_wl(p2m->domain); i > 0; i-- ) > + for ( i = ept_get_wl(ept_data); i > 0; i-- ) > { > ret = ept_next_level(p2m, 1, &table, &gfn_remainder, i); > if ( !ret || ret == GUEST_TABLE_POD_PAGE ) > @@ -622,7 +627,8 @@ static ept_entry_t ept_get_entry_content(struct p2m_domain *p2m, > void ept_walk_table(struct domain *d, unsigned long gfn) > { > struct p2m_domain *p2m = p2m_get_hostp2m(d); > - ept_entry_t *table = map_domain_page(ept_get_asr(d)); > + struct ept_data *ept_data = p2m->hap_data; > + ept_entry_t *table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m))); > unsigned long gfn_remainder = gfn; > > int i; > @@ -638,7 +644,7 @@ void ept_walk_table(struct domain *d, unsigned long gfn) > goto out; > } > > - for ( i = ept_get_wl(d); i >= 0; i-- ) > + for ( i = ept_get_wl(ept_data); i >= 0; i-- ) > { > ept_entry_t *ept_entry, *next; > u32 index; > @@ -778,16 +784,16 @@ static void ept_change_entry_type_page(mfn_t ept_page_mfn, int ept_page_level, > static void ept_change_entry_type_global(struct p2m_domain *p2m, > p2m_type_t ot, p2m_type_t nt) > { > - struct domain *d = p2m->domain; > - if ( ept_get_asr(d) == 0 ) > + struct ept_data *ept_data = p2m->hap_data; > + if ( ept_get_asr(ept_data) == 
0 ) > return; > > BUG_ON(p2m_is_grant(ot) || p2m_is_grant(nt)); > BUG_ON(ot != nt && (ot == p2m_mmio_direct || nt == p2m_mmio_direct)); > > - ept_change_entry_type_page(_mfn(ept_get_asr(d)), ept_get_wl(d), ot, nt); > + ept_change_entry_type_page(_mfn(ept_get_asr(ept_data)), ept_get_wl(ept_data), ot, nt); > > - ept_sync_domain(d); > + ept_sync_domain(p2m); > } > > void ept_p2m_init(struct p2m_domain *p2m) > @@ -811,6 +817,7 @@ static void ept_dump_p2m_table(unsigned char key) > unsigned long gfn, gfn_remainder; > unsigned long record_counter = 0; > struct p2m_domain *p2m; > + struct ept_data *ept_data; > > for_each_domain(d) > { > @@ -818,15 +825,16 @@ static void ept_dump_p2m_table(unsigned char key) > continue; > > p2m = p2m_get_hostp2m(d); > + ept_data = p2m->hap_data; > printk("\ndomain%d EPT p2m table: \n", d->domain_id); > > for ( gfn = 0; gfn <= p2m->max_mapped_pfn; gfn += (1 << order) ) > { > gfn_remainder = gfn; > mfn = _mfn(INVALID_MFN); > - table = map_domain_page(ept_get_asr(d)); > + table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m))); > > - for ( i = ept_get_wl(d); i > 0; i-- ) > + for ( i = ept_get_wl(ept_data); i > 0; i-- ) > { > ret = ept_next_level(p2m, 1, &table, &gfn_remainder, i); > if ( ret != GUEST_TABLE_NORMAL_PAGE ) > @@ -858,6 +866,52 @@ out: > } > } > > +int alloc_p2m_hap_data(struct p2m_domain *p2m) > +{ > + struct domain *d = p2m->domain; > + struct ept_data *ept; > + > + ASSERT(d); > + if (!hap_enabled(d)) > + return 0; > + > + p2m->hap_data = ept = xzalloc(struct ept_data); > + if ( !p2m->hap_data ) > + return -ENOMEM; > + if ( !zalloc_cpumask_var(&ept->ept_synced) ) > + { > + xfree(ept); > + p2m->hap_data = NULL; > + return -ENOMEM; > + } > + return 0; > +} > + > +void free_p2m_hap_data(struct p2m_domain *p2m) > +{ > + struct ept_data *ept; > + > + if ( !hap_enabled(p2m->domain) ) > + return; > + > + if ( p2m_is_nestedp2m(p2m)) { > + ept = p2m->hap_data; > + if ( ept ) { > + free_cpumask_var(ept->ept_synced); > + 
xfree(ept); > + } > + } > +} > + > +void p2m_init_hap_data(struct p2m_domain *p2m) > +{ > + struct ept_data *ept = p2m->hap_data; > + > + ept->ept_ctl.ept_wl = 3; > + ept->ept_ctl.ept_mt = EPT_DEFAULT_MT; > + ept->ept_ctl.asr = pagetable_get_pfn(p2m_get_pagetable(p2m)); > +} > + > static struct keyhandler ept_p2m_table = { > .diagnostic = 0, > .u.fn = ept_dump_p2m_table, > diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c > index 62c2d78..799bbfb 100644 > --- a/xen/arch/x86/mm/p2m.c > +++ b/xen/arch/x86/mm/p2m.c > @@ -105,6 +105,8 @@ p2m_init_nestedp2m(struct domain *d) > if ( !zalloc_cpumask_var(&p2m->dirty_cpumask) ) > return -ENOMEM; > p2m_initialise(d, p2m); > + if ( cpu_has_vmx && alloc_p2m_hap_data(p2m) ) > + return -ENOMEM; > p2m->write_p2m_entry = nestedp2m_write_p2m_entry; > list_add(&p2m->np2m_list, &p2m_get_hostp2m(d)->np2m_list); > } > @@ -126,12 +128,14 @@ int p2m_init(struct domain *d) > return -ENOMEM; > } > p2m_initialise(d, p2m); > + if ( hap_enabled(d) && cpu_has_vmx) > + p2m->hap_data = &d->arch.hvm_domain.vmx.ept; > > /* Must initialise nestedp2m unconditionally > * since nestedhvm_enabled(d) returns false here. 
> * (p2m_init runs too early for HVM_PARAM_* options) */ > rc = p2m_init_nestedp2m(d); > - if ( rc ) > + if ( rc ) > p2m_final_teardown(d); > return rc; > } > @@ -354,6 +358,8 @@ int p2m_alloc_table(struct p2m_domain *p2m) > > if ( hap_enabled(d) ) > iommu_share_p2m_table(d); > + if ( p2m_is_nestedp2m(p2m) && hap_enabled(d) ) > + p2m_init_hap_data(p2m); > > P2M_PRINTK("populating p2m table\n"); > > @@ -436,12 +442,16 @@ void p2m_teardown(struct p2m_domain *p2m) > static void p2m_teardown_nestedp2m(struct domain *d) > { > uint8_t i; > + struct p2m_domain *p2m; > > for (i = 0; i < MAX_NESTEDP2M; i++) { > if ( !d->arch.nested_p2m[i] ) > continue; > - free_cpumask_var(d->arch.nested_p2m[i]->dirty_cpumask); > - xfree(d->arch.nested_p2m[i]); > + p2m = d->arch.nested_p2m[i]; > + if ( p2m->hap_data ) > + free_p2m_hap_data(p2m); > + free_cpumask_var(p2m->dirty_cpumask); > + xfree(p2m); > d->arch.nested_p2m[i] = NULL; > } > } > diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h > index 9a728b6..e6b4e3b 100644 > --- a/xen/include/asm-x86/hvm/vmx/vmcs.h > +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h > @@ -56,26 +56,34 @@ struct vmx_msr_state { > > #define EPT_DEFAULT_MT MTRR_TYPE_WRBACK > > -struct vmx_domain { > - unsigned long apic_access_mfn; > - union { > - struct { > +union eptp_control{ > + struct { > u64 ept_mt :3, > ept_wl :3, > rsvd :6, > asr :52; > }; > u64 eptp; > - } ept_control; > +}; > + > +struct ept_data{ > + union eptp_control ept_ctl; > cpumask_var_t ept_synced; > }; > > -#define ept_get_wl(d) \ > - ((d)->arch.hvm_domain.vmx.ept_control.ept_wl) > -#define ept_get_asr(d) \ > - ((d)->arch.hvm_domain.vmx.ept_control.asr) > -#define ept_get_eptp(d) \ > - ((d)->arch.hvm_domain.vmx.ept_control.eptp) > +struct vmx_domain { > + unsigned long apic_access_mfn; > + struct ept_data ept; > +}; > + > +#define ept_get_wl(ept_data) \ > + (((struct ept_data*)(ept_data))->ept_ctl.ept_wl) > +#define ept_get_asr(ept_data) \ > + (((struct 
ept_data*)(ept_data))->ept_ctl.asr) > +#define ept_get_eptp(ept_data) \ > + (((struct ept_data*)(ept_data))->ept_ctl.eptp) > +#define ept_get_synced_mask(ept_data)\ > + (((struct ept_data*)(ept_data))->ept_synced) > > struct arch_vmx_struct { > /* Virtual address of VMCS. */ > diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h > index aa5b080..573a12e 100644 > --- a/xen/include/asm-x86/hvm/vmx/vmx.h > +++ b/xen/include/asm-x86/hvm/vmx/vmx.h > @@ -333,7 +333,7 @@ static inline void ept_sync_all(void) > __invept(INVEPT_ALL_CONTEXT, 0, 0); > } > > -void ept_sync_domain(struct domain *d); > +void ept_sync_domain(struct p2m_domain *p2m); > > static inline void vpid_sync_vcpu_gva(struct vcpu *v, unsigned long gva) > { > @@ -401,6 +401,10 @@ void setup_ept_dump(void); > > void update_guest_eip(void); > > +int alloc_p2m_hap_data(struct p2m_domain *p2m); > +void free_p2m_hap_data(struct p2m_domain *p2m); > +void p2m_init_hap_data(struct p2m_domain *p2m); > + > /* EPT violation qualifications definitions */ > #define _EPT_READ_VIOLATION 0 > #define EPT_READ_VIOLATION (1UL<<_EPT_READ_VIOLATION) > diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h > index 1807ad6..0fb1b2d 100644 > --- a/xen/include/asm-x86/p2m.h > +++ b/xen/include/asm-x86/p2m.h > @@ -277,6 +277,7 @@ struct p2m_domain { > mm_lock_t lock; /* Locking of private pod structs, * > * not relying on the p2m lock. */ > } pod; > + void *hap_data; > }; > > /* get host p2m table */ > -- > 1.7.1 > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
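Tim's suggestion above, that the alloc/free functions be wrapped so arch-independent callers never see EPT-specific names, can be sketched as a small self-contained mock. This is illustrative only: the structure members and wrapper names here are stand-ins for the real Xen types (the actual patch uses xzalloc/zalloc_cpumask_var inside alloc_p2m_hap_data()).

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for the Xen types involved; the real structures live in
 * xen/include/asm-x86/p2m.h and asm-x86/hvm/vmx/vmcs.h. */
struct ept_data {
    unsigned long eptp;
    unsigned long *ept_synced;   /* stands in for cpumask_var_t */
};

struct p2m_domain {
    void *hap_data;
};

/* Vendor-neutral wrappers: generic p2m code calls only
 * p2m_alloc_hap_data()/p2m_free_hap_data(); the EPT-specific work is
 * hidden behind them, and an SVM/NPT variant could be dispatched here
 * without touching the callers. */
static int p2m_alloc_hap_data(struct p2m_domain *p2m)
{
    struct ept_data *ept = calloc(1, sizeof(*ept));
    if ( !ept )
        return -1;
    ept->ept_synced = calloc(1, sizeof(unsigned long));
    if ( !ept->ept_synced )
    {
        free(ept);
        return -1;
    }
    p2m->hap_data = ept;
    return 0;
}

static void p2m_free_hap_data(struct p2m_domain *p2m)
{
    struct ept_data *ept = p2m->hap_data;
    if ( !ept )
        return;
    free(ept->ept_synced);
    free(ept);
    p2m->hap_data = NULL;
}
```

The point of the wrapper is exactly Tim's: the failure/cleanup paths live in one place, and the generic name gives the arch-independent code nothing EPT-flavoured to depend on.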
Tim Deegan
2012-Dec-13 16:16 UTC
Re: [PATCH 06/11] nEPT: Try to enable EPT paging for L2 guest.
At 01:57 +0800 on 11 Dec (1355191038), xiantao.zhang@intel.com wrote:
> From: Zhang Xiantao <xiantao.zhang@intel.com>
>
> Once EPT is found to be enabled by the L1 VMM, enable nested EPT
> support for the L2 guest.
>
> Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>

Acked-by: Tim Deegan <tim@xen.org>

(though strictly speaking this isn't x86/mm code)
Tim Deegan
2012-Dec-13 16:17 UTC
Re: [PATCH 07/11] nEPT: Sync PDPTR fields if L2 guest in PAE paging mode
At 01:57 +0800 on 11 Dec (1355191039), xiantao.zhang@intel.com wrote:
> From: Zhang Xiantao <xiantao.zhang@intel.com>
>
> For a PAE L2 guest, the GUEST_PDPTR registers need to be synced on each
> virtual vmentry.
>
> Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
> ---
> xen/arch/x86/hvm/vmx/vvmx.c | 10 ++++++++--
> 1 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index ab68b52..3fc128b 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -824,9 +824,15 @@ static void load_shadow_guest_state(struct vcpu *v)
>     vvmcs_to_shadow(vvmcs, CR0_READ_SHADOW);
>     vvmcs_to_shadow(vvmcs, CR4_READ_SHADOW);
>     vvmcs_to_shadow(vvmcs, CR0_GUEST_HOST_MASK);
> -    vvmcs_to_shadow(vvmcs, CR4_GUEST_HOST_MASK);

Did you really mean to remove this line as well?  If so, it'll need
some explanation in the checkin description.

Tim.

>
> -    /* TODO: PDPTRs for nested ept */
> +    if ( nvmx_ept_enabled(v) && hvm_pae_enabled(v) &&
> +         (v->arch.hvm_vcpu.guest_efer & EFER_LMA) ) {
> +        vvmcs_to_shadow(vvmcs, GUEST_PDPTR0);
> +        vvmcs_to_shadow(vvmcs, GUEST_PDPTR1);
> +        vvmcs_to_shadow(vvmcs, GUEST_PDPTR2);
> +        vvmcs_to_shadow(vvmcs, GUEST_PDPTR3);
> +    }
> +
>     /* TODO: CR3 target control */
> }
>
> --
> 1.7.1
Tim Deegan
2012-Dec-13 16:43 UTC
Re: [PATCH 08/11] nEPT: Use minimal permission for nested p2m.
At 01:57 +0800 on 11 Dec (1355191040), xiantao.zhang@intel.com wrote:
> From: Zhang Xiantao <xiantao.zhang@intel.com>
>
> Emulate the permission check for the nested p2m.  The current solution
> is to use minimal permissions, and once a permission violation is met
> in L0, determine whether it was caused by the guest EPT or the host EPT.
>
> Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>

> --- a/xen/arch/x86/hvm/svm/nestedsvm.c
> +++ b/xen/arch/x86/hvm/svm/nestedsvm.c
> @@ -1177,7 +1177,7 @@ nsvm_vmcb_hap_enabled(struct vcpu *v)
>  */
> int
> nsvm_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
> -                     unsigned int *page_order,
> +                     unsigned int *page_order, uint8_t *p2m_acc,
>                      bool_t access_r, bool_t access_w, bool_t access_x)

I don't like these interface changes (see below) but if we do have them,
at least make the SVM version use p2m_access_rwx, to match the old
behaviour, rather than letting it use an uninitialised stack variable. :)

> @@ -250,10 +251,13 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa,
>
>     page_order_20 = min(page_order_21, page_order_10);
>
> +    if (p2ma_10 > p2m_access_rwx)
> +        p2ma_10 = p2m_access_rwx;

That's plain wrong.  If the access type is p2m_access_rx2rw, this will
give the guest write access to what ought to be a read-only page.

I think it would be best to leave the p2m-access stuff to the p2m
walkers, and not add all those extra p2ma arguments.  Instead, just use
the _actual_ access permissions of this fault as the p2ma.  That way you
know you have something that's acceptable to both p2m tables.

I guess that will mean some extra faults on read-then-write behaviour.
If those are measurable, we could look at pulling the p2m-access types
out like this, but you'll have to explicitly handle all the special
types.

Cheers,

Tim.
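Tim's objection about p2m_access_rx2rw can be illustrated with a small self-contained check. The enum mirrors a subset of Xen's p2m_access_t values; the `access_allows()` helper is hypothetical (simplified, not Xen code) and only models which faults each type should permit right now:

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified subset of Xen's p2m_access_t. */
typedef enum {
    p2m_access_n, p2m_access_r, p2m_access_w, p2m_access_rw,
    p2m_access_x, p2m_access_rx, p2m_access_wx, p2m_access_rwx,
    p2m_access_rx2rw,   /* read-execute; converts to rw on first write fault */
} p2m_access_t;

/* Does this access type permit the requested fault rights right now?
 * p2m_access_rx2rw must *deny* writes: the write fault is what triggers
 * its conversion to rw.  Clamping any value above p2m_access_rwx down
 * to rwx (as the quoted hunk does) would instead grant write access to
 * a page that is still supposed to be read-only -- Tim's point above. */
static bool access_allows(p2m_access_t a, bool r, bool w, bool x)
{
    switch ( a )
    {
    case p2m_access_n:     return !r && !w && !x;
    case p2m_access_r:     return !w && !x;
    case p2m_access_w:     return !r && !x;
    case p2m_access_rw:    return !x;
    case p2m_access_x:     return !r && !w;
    case p2m_access_rx:
    case p2m_access_rx2rw: return !w;
    case p2m_access_wx:    return !r;
    case p2m_access_rwx:   return true;
    }
    return false;
}
```

Checking the fault's actual rights against each table, as Tim suggests, sidesteps the need to compute a combined "minimal" access type at all.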
Tim Deegan
2012-Dec-13 16:56 UTC
Re: [PATCH 09/11] nEPT: handle invept instruction from L1 VMM
At 01:57 +0800 on 11 Dec (1355191041), xiantao.zhang@intel.com wrote:
> From: Zhang Xiantao <xiantao.zhang@intel.com>
>
> Add the INVEPT instruction emulation logic.
>
> Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>

Looks fine, but you have some whitespace problems...

> +int nvmx_handle_invept(struct cpu_user_regs *regs)
> +{
> +    struct vmx_inst_decoded decode;
> +    unsigned long eptp;
> +    u64 inv_type;
> +
> +    if ( decode_vmx_inst(regs, &decode, &eptp, 0)
> +             != X86EMUL_OKAY )
> +        return X86EMUL_EXCEPTION;
> +
> +    inv_type = reg_read(regs, decode.reg2);
> +    gdprintk(XENLOG_DEBUG,"inv_type:%ld, eptp:%lx\n", inv_type, eptp);
> +
> +    switch (inv_type){

here

> +    case INVEPT_SINGLE_CONTEXT:
> +    {
> +        struct p2m_domain *p2m = vcpu_nestedhvm(current).nv_p2m;
> +        if ( p2m )
> +        {
> +            p2m_flush(current, p2m);
> +            ept_sync_domain(p2m);

and again here (hard tabs)

> +        }
> +    }
> +    break;

and again.

With those fixed, Acked-by: Tim Deegan <tim@xen.org>
(again with the caveat that this isn't under x86/mm)

Cheers,

Tim.
At 01:57 +0800 on 11 Dec (1355191042), xiantao.zhang@intel.com wrote:
> --- a/xen/arch/x86/mm/hap/nested_ept.c
> +++ b/xen/arch/x86/mm/hap/nested_ept.c
> @@ -48,7 +48,7 @@
>  #define EPT_EMT_WB 6
>  #define EPT_EMT_UC 0
>
> -#define NEPT_VPID_CAP_BITS 0
> +#define NEPT_VPID_CAP_BITS 0x0000000006134140ul

Ah, I didn't spot this earlier.  I think for clarity the definition of
nept_get_ept_vpid_cap() should be moved entirely into this file (and
presumably the TODO comment can be removed).

Where does the magic number 0x0000000006134140ul come from?  Can it be
broken out into meaningful constants?

Cheers,

Tim.
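Following the bit layout of the IA32_VMX_EPT_VPID_CAP MSR in the Intel SDM, one possible breakdown of the magic number is the sketch below. The macro names are illustrative; the real Xen headers may spell the capability bits differently:

```c
#include <assert.h>
#include <stdint.h>

/* Capability bits of the IA32_VMX_EPT_VPID_CAP MSR (Intel SDM).
 * Names here are hypothetical, chosen to describe each bit. */
#define VMX_EPT_WALK_LENGTH_4_SUPPORTED  (1ul << 6)   /* 4-level EPT walk  */
#define VMX_EPT_MEMORY_TYPE_UC           (1ul << 8)   /* UC paging structs */
#define VMX_EPT_MEMORY_TYPE_WB           (1ul << 14)  /* WB paging structs */
#define VMX_EPT_SUPERPAGE_2MB            (1ul << 16)  /* 2MB EPT pages     */
#define VMX_EPT_SUPERPAGE_1GB            (1ul << 17)  /* 1GB EPT pages     */
#define VMX_EPT_INVEPT_INSTRUCTION       (1ul << 20)  /* INVEPT supported  */
#define VMX_EPT_INVEPT_SINGLE_CONTEXT    (1ul << 25)
#define VMX_EPT_INVEPT_ALL_CONTEXT       (1ul << 26)

/* The named bits OR together to the patch's 0x0000000006134140ul. */
#define NEPT_VPID_CAP_BITS \
    (VMX_EPT_WALK_LENGTH_4_SUPPORTED | VMX_EPT_MEMORY_TYPE_UC | \
     VMX_EPT_MEMORY_TYPE_WB | VMX_EPT_SUPERPAGE_2MB | \
     VMX_EPT_SUPERPAGE_1GB | VMX_EPT_INVEPT_INSTRUCTION | \
     VMX_EPT_INVEPT_SINGLE_CONTEXT | VMX_EPT_INVEPT_ALL_CONTEXT)
```

Spelling the constant out this way answers Tim's question directly: the value advertises a 4-level walk, UC/WB paging-structure memory types, 2MB/1GB superpages, and single-/all-context INVEPT to the L1 VMM.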
Tim Deegan
2012-Dec-13 17:15 UTC
Re: [PATCH 11/11] nVMX: Expose VPID capability to nested VMM.
At 01:57 +0800 on 11 Dec (1355191043), xiantao.zhang@intel.com wrote:
> From: Zhang Xiantao <xiantao.zhang@intel.com>
>
> Virtualize VPID for the nested VMM; use the host's VPID
> to emulate the guest's VPID.  For each virtual vmentry, if
> the guest's vpid is changed, allocate a new host VPID for
> the L2 guest.

Looks fine to me, but there's some whitespace mangling:

> @@ -2747,8 +2750,11 @@ void vmx_vmenter_helper(void)
>
>     if ( !cpu_has_vmx_vpid )
>         goto out;
> +    if ( nestedhvm_vcpu_in_guestmode(curr) )
> +        p_asid = &vcpu_nestedhvm(curr).nv_n2asid;

here (after '='),

> @@ -897,6 +908,18 @@ static void virtual_vmentry(struct cpu_user_regs *regs)
>     if ( nestedhvm_paging_mode_hap(v) )
>         __vmwrite(EPT_POINTER, get_shadow_eptp(v));
>
> +    /* nested VPID support! */
> +    if ( cpu_has_vmx_vpid && nvmx_vpid_enabled(nvcpu) )
> +    {
> +        struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
> +        uint32_t new_vpid = __get_vvmcs(vvmcs, VIRTUAL_PROCESSOR_ID);

here (after '='),

> @@ -1363,6 +1386,9 @@ int nvmx_handle_invept(struct cpu_user_regs *regs)
>     unsigned long eptp;
>     u64 inv_type;
>
> +    if(!cpu_has_vmx_ept)

here,

> @@ -1401,6 +1427,37 @@ int nvmx_handle_invept(struct cpu_user_regs *regs)
>     (((__emul_value(enable1, default1) & host_value) & (~0ul << 32)) | \
>     ((uint32_t)(__emul_value(enable1, default1) | host_value)))
>
> +int nvmx_handle_invvpid(struct cpu_user_regs *regs)
> +{
> +    struct vmx_inst_decoded decode;
> +    unsigned long vpid;
> +    u64 inv_type;
> +
> +    if(!cpu_has_vmx_vpid)

here,

> +        return X86EMUL_EXCEPTION;
> +
> +    if ( decode_vmx_inst(regs, &decode, &vpid, 0)
> +             != X86EMUL_OKAY )
> +        return X86EMUL_EXCEPTION;
> +
> +    inv_type = reg_read(regs, decode.reg2);
> +    gdprintk(XENLOG_DEBUG,"inv_type:%ld, vpid:%lx\n", inv_type, vpid);
> +
> +    switch ( inv_type ){
> +    /* Just invalidate all tlb entries for all types! */
> +    case INVVPID_INDIVIDUAL_ADDR:
> +    case INVVPID_SINGLE_CONTEXT:
> +    case INVVPID_ALL_CONTEXT:
> +        hvm_asid_flush_vcpu_asid(&vcpu_nestedhvm(current).nv_n2asid);
> +        break;
> +    default:
> +        return X86EMUL_EXCEPTION;
> +    }

here (lots of tabs),

> @@ -126,8 +126,9 @@ static bool_t nept_present_check(uint64_t entry)
>
> uint64_t nept_get_ept_vpid_cap(void)
> {
> -    /*TODO: exposed ept and vpid features*/
> -    return NEPT_VPID_CAP_BITS;
> +    if (cpu_has_vmx_ept && cpu_has_vmx_vpid)

and here.

With those fixed, Acked-by: Tim Deegan <tim@xen.org>
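The scheme the commit message describes (L1's VPIDs are never programmed into the hardware; a change of the guest-visible VPID just forces a fresh host ASID for the L2 context) can be modelled with a toy, self-contained sketch. The names here are illustrative, not the patch's actual code:

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: each nested (L2) context carries the VPID last seen in the
 * vVMCS plus a host-side ASID generation counter.  Bumping the counter
 * stands in for allocating a new host ASID, which implies a TLB flush
 * for the nested context (hvm_asid_flush_vcpu_asid() in the patch). */
struct nested_ctx {
    uint32_t guest_vpid;   /* VPID last read from VIRTUAL_PROCESSOR_ID */
    uint64_t n2_asid_gen;  /* models the host ASID backing this context */
};

static void virtual_vmentry_vpid(struct nested_ctx *ctx, uint32_t new_vpid)
{
    /* On each virtual vmentry, compare the guest's current VPID with
     * the one we saw last time; only a change needs a new host ASID. */
    if ( ctx->guest_vpid != new_vpid )
    {
        ctx->n2_asid_gen++;
        ctx->guest_vpid = new_vpid;
    }
}
```

This also explains why the quoted nvmx_handle_invvpid() can treat all three INVVPID types the same way: flushing the single n2 ASID over-invalidates but is always correct.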
Zhang, Xiantao
2012-Dec-17 08:57 UTC
Re: [PATCH 05/11] EPT: Make ept data structure or operations neutral
> -----Original Message-----
> From: Tim Deegan [mailto:tim@xen.org]
> Sent: Friday, December 14, 2012 12:04 AM
> To: Zhang, Xiantao
> Cc: xen-devel@lists.xensource.com; Dong, Eddie; keir@xen.org; Nakajima,
> Jun; JBeulich@suse.com
> Subject: Re: [Xen-devel] [PATCH 05/11] EPT: Make ept data structure or
> operations neutral
>
> At 01:57 +0800 on 11 Dec (1355191037), xiantao.zhang@intel.com wrote:
> > From: Zhang Xiantao <xiantao.zhang@intel.com>
> >
> > Share the current EPT logic with the nested EPT case, so make the
> > related data structure or operations neutral to common EPT and
> > nested EPT.
> >
> > Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
>
> Since the struct ept_data is only 16 bytes long, why not just embed it
> in the struct p2m_domain, as
>
> >     mm_lock_t        lock;        /* Locking of private pod structs,  *
> >                                    * not relying on the p2m lock.     */
> >     } pod;
> > +    union {
> > +        struct ept_data ept;
> > +        /* NPT equivalent could go here if needed */
> > +    };
> > };

Hi, Tim,

Thanks for your review!  If we change it like this, p2m.h has to
include asm/hvm/vmx/vmcs.h; is that acceptable?

Xiantao

> That would tidy up the alloc/free stuff a fair bit, though you'd still
> need it for the cpumask, I guess.
>
> It would be nice to wrap the alloc/free functions up in the usual way
> so we don't get ept-specific functions with arch-independent names.
>
> Otherwise that looks fine.
>
> Cheers,
>
> Tim.
> > > --- > > xen/arch/x86/hvm/vmx/vmcs.c | 2 +- > > xen/arch/x86/hvm/vmx/vmx.c | 39 +++++++++------ > > xen/arch/x86/mm/p2m-ept.c | 96 > ++++++++++++++++++++++++++++-------- > > xen/arch/x86/mm/p2m.c | 16 +++++- > > xen/include/asm-x86/hvm/vmx/vmcs.h | 30 +++++++---- > > xen/include/asm-x86/hvm/vmx/vmx.h | 6 ++- > > xen/include/asm-x86/p2m.h | 1 + > > 7 files changed, 137 insertions(+), 53 deletions(-) > > > > diff --git a/xen/arch/x86/hvm/vmx/vmcs.c > b/xen/arch/x86/hvm/vmx/vmcs.c > > index 9adc7a4..b9ebdfe 100644 > > --- a/xen/arch/x86/hvm/vmx/vmcs.c > > +++ b/xen/arch/x86/hvm/vmx/vmcs.c > > @@ -942,7 +942,7 @@ static int construct_vmcs(struct vcpu *v) > > } > > > > if ( paging_mode_hap(d) ) > > - __vmwrite(EPT_POINTER, d- > >arch.hvm_domain.vmx.ept_control.eptp); > > + __vmwrite(EPT_POINTER, > > + d->arch.hvm_domain.vmx.ept.ept_ctl.eptp); > > > > if ( cpu_has_vmx_pat && paging_mode_hap(d) ) > > { > > diff --git a/xen/arch/x86/hvm/vmx/vmx.c > b/xen/arch/x86/hvm/vmx/vmx.c > > index c67ac59..06455bf 100644 > > --- a/xen/arch/x86/hvm/vmx/vmx.c > > +++ b/xen/arch/x86/hvm/vmx/vmx.c > > @@ -79,22 +79,23 @@ static void __ept_sync_domain(void *info); static > > int vmx_domain_initialise(struct domain *d) { > > int rc; > > + struct ept_data *ept = &d->arch.hvm_domain.vmx.ept; > > > > /* Set the memory type used when accessing EPT paging structures. */ > > - d->arch.hvm_domain.vmx.ept_control.ept_mt = EPT_DEFAULT_MT; > > + ept->ept_ctl.ept_mt = EPT_DEFAULT_MT; > > > > /* set EPT page-walk length, now it''s actual walk length - 1, i.e. 
3 */ > > - d->arch.hvm_domain.vmx.ept_control.ept_wl = 3; > > + ept->ept_ctl.ept_wl = 3; > > > > - d->arch.hvm_domain.vmx.ept_control.asr > > + ept->ept_ctl.asr > > pagetable_get_pfn(p2m_get_pagetable(p2m_get_hostp2m(d))); > > > > - if ( !zalloc_cpumask_var(&d->arch.hvm_domain.vmx.ept_synced) ) > > + if ( !zalloc_cpumask_var(&ept->ept_synced) ) > > return -ENOMEM; > > > > if ( (rc = vmx_alloc_vlapic_mapping(d)) != 0 ) > > { > > - free_cpumask_var(d->arch.hvm_domain.vmx.ept_synced); > > + free_cpumask_var(ept->ept_synced); > > return rc; > > } > > > > @@ -103,9 +104,10 @@ static int vmx_domain_initialise(struct domain > > *d) > > > > static void vmx_domain_destroy(struct domain *d) { > > + struct ept_data *ept = &d->arch.hvm_domain.vmx.ept; > > if ( paging_mode_hap(d) ) > > - on_each_cpu(__ept_sync_domain, d, 1); > > - free_cpumask_var(d->arch.hvm_domain.vmx.ept_synced); > > + on_each_cpu(__ept_sync_domain, p2m_get_hostp2m(d), 1); > > + free_cpumask_var(ept->ept_synced); > > vmx_free_vlapic_mapping(d); > > } > > > > @@ -641,6 +643,7 @@ static void vmx_ctxt_switch_to(struct vcpu *v) { > > struct domain *d = v->domain; > > unsigned long old_cr4 = read_cr4(), new_cr4 = mmu_cr4_features; > > + struct ept_data *ept_data = p2m_get_hostp2m(d)->hap_data; > > > > /* HOST_CR4 in VMCS is always mmu_cr4_features. Sync CR4 now. */ > > if ( old_cr4 != new_cr4 ) > > @@ -650,10 +653,10 @@ static void vmx_ctxt_switch_to(struct vcpu *v) > > { > > unsigned int cpu = smp_processor_id(); > > /* Test-and-test-and-set this CPU in the EPT-is-synced mask. 
*/ > > - if ( !cpumask_test_cpu(cpu, d->arch.hvm_domain.vmx.ept_synced) > && > > + if ( !cpumask_test_cpu(cpu, ept_get_synced_mask(ept_data)) && > > !cpumask_test_and_set_cpu(cpu, > > - d->arch.hvm_domain.vmx.ept_synced) ) > > - __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(d), 0); > > + ept_get_synced_mask(ept_data)) ) > > + __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept_data), > > + 0); > > } > > > > vmx_restore_guest_msrs(v); > > @@ -1218,12 +1221,16 @@ static void vmx_update_guest_efer(struct vcpu > > *v) > > > > static void __ept_sync_domain(void *info) { > > - struct domain *d = info; > > - __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(d), 0); > > + struct p2m_domain *p2m = info; > > + struct ept_data *ept_data = p2m->hap_data; > > + > > + __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept_data), 0); > > } > > > > -void ept_sync_domain(struct domain *d) > > +void ept_sync_domain(struct p2m_domain *p2m) > > { > > + struct domain *d = p2m->domain; > > + struct ept_data *ept_data = p2m->hap_data; > > /* Only if using EPT and this domain has some VCPUs to dirty. */ > > if ( !paging_mode_hap(d) || !d->vcpu || !d->vcpu[0] ) > > return; > > @@ -1236,11 +1243,11 @@ void ept_sync_domain(struct domain *d) > > * the ept_synced mask before on_selected_cpus() reads it, resulting in > > * unnecessary extra flushes, to avoid allocating a cpumask_t on the > stack. 
> >       */
> > -    cpumask_and(d->arch.hvm_domain.vmx.ept_synced,
> > +    cpumask_and(ept_get_synced_mask(ept_data),
> >                  d->domain_dirty_cpumask, &cpu_online_map);
> >
> > -    on_selected_cpus(d->arch.hvm_domain.vmx.ept_synced,
> > -                     __ept_sync_domain, d, 1);
> > +    on_selected_cpus(ept_get_synced_mask(ept_data),
> > +                     __ept_sync_domain, p2m, 1);
> >  }
> >
> >  void nvmx_enqueue_n2_exceptions(struct vcpu *v,
> > diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
> > index c964f54..8adf3f9 100644
> > --- a/xen/arch/x86/mm/p2m-ept.c
> > +++ b/xen/arch/x86/mm/p2m-ept.c
> > @@ -291,9 +291,11 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
> >      int need_modify_vtd_table = 1;
> >      int vtd_pte_present = 0;
> >      int needs_sync = 1;
> > -    struct domain *d = p2m->domain;
> >      ept_entry_t old_entry = { .epte = 0 };
> > +    struct ept_data *ept_data = p2m->hap_data;
> > +    struct domain *d = p2m->domain;
> >
> > +    ASSERT(ept_data);
> >      /*
> >       * the caller must make sure:
> >       * 1. passing valid gfn and mfn at order boundary.
> > @@ -301,17 +303,17 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
> >       * 3. passing a valid order.
> >       */
> >      if ( ((gfn | mfn_x(mfn)) & ((1UL << order) - 1)) ||
> > -         ((u64)gfn >> ((ept_get_wl(d) + 1) * EPT_TABLE_ORDER)) ||
> > +         ((u64)gfn >> ((ept_get_wl(ept_data) + 1) * EPT_TABLE_ORDER)) ||
> >           (order % EPT_TABLE_ORDER) )
> >          return 0;
> >
> > -    ASSERT((target == 2 && hvm_hap_has_1gb(d)) ||
> > -           (target == 1 && hvm_hap_has_2mb(d)) ||
> > +    ASSERT((target == 2 && hvm_hap_has_1gb()) ||
> > +           (target == 1 && hvm_hap_has_2mb()) ||
> >             (target == 0));
> >
> > -    table = map_domain_page(ept_get_asr(d));
> > +    table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
> >
> > -    for ( i = ept_get_wl(d); i > target; i-- )
> > +    for ( i = ept_get_wl(ept_data); i > target; i-- )
> >      {
> >          ret = ept_next_level(p2m, 0, &table, &gfn_remainder, i);
> >          if ( !ret )
> > @@ -439,9 +441,11 @@ out:
> >      unmap_domain_page(table);
> >
> >      if ( needs_sync )
> > -        ept_sync_domain(p2m->domain);
> > +        ept_sync_domain(p2m);
> >
> > -    if ( rv && iommu_enabled && need_iommu(p2m->domain) && need_modify_vtd_table )
> > +    /* For non-nested p2m, may need to change VT-d page table.*/
> > +    if ( rv && !p2m_is_nestedp2m(p2m) && iommu_enabled && need_iommu(p2m->domain) &&
> > +         need_modify_vtd_table )
> >      {
> >          if ( iommu_hap_pt_share )
> >              iommu_pte_flush(d, gfn, (u64*)ept_entry, order, vtd_pte_present);
> > @@ -488,14 +492,14 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
> >                             unsigned long gfn, p2m_type_t *t, p2m_access_t* a,
> >                             p2m_query_t q, unsigned int *page_order)
> >  {
> > -    struct domain *d = p2m->domain;
> > -    ept_entry_t *table = map_domain_page(ept_get_asr(d));
> > +    ept_entry_t *table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
> >      unsigned long gfn_remainder = gfn;
> >      ept_entry_t *ept_entry;
> >      u32 index;
> >      int i;
> >      int ret = 0;
> >      mfn_t mfn = _mfn(INVALID_MFN);
> > +    struct ept_data *ept_data = p2m->hap_data;
> >
> >      *t = p2m_mmio_dm;
> >      *a = p2m_access_n;
> > @@ -506,7 +510,7 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
> >
> >      /* Should check if gfn obeys GAW here. */
> >
> > -    for ( i = ept_get_wl(d); i > 0; i-- )
> > +    for ( i = ept_get_wl(ept_data); i > 0; i-- )
> >      {
> >      retry:
> >          ret = ept_next_level(p2m, 1, &table, &gfn_remainder, i);
> > @@ -588,19 +592,20 @@ out:
> >  static ept_entry_t ept_get_entry_content(struct p2m_domain *p2m,
> >      unsigned long gfn, int *level)
> >  {
> > -    ept_entry_t *table = map_domain_page(ept_get_asr(p2m->domain));
> > +    ept_entry_t *table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
> >      unsigned long gfn_remainder = gfn;
> >      ept_entry_t *ept_entry;
> >      ept_entry_t content = { .epte = 0 };
> >      u32 index;
> >      int i;
> >      int ret=0;
> > +    struct ept_data *ept_data = p2m->hap_data;
> >
> >      /* This pfn is higher than the highest the p2m map currently holds */
> >      if ( gfn > p2m->max_mapped_pfn )
> >          goto out;
> >
> > -    for ( i = ept_get_wl(p2m->domain); i > 0; i-- )
> > +    for ( i = ept_get_wl(ept_data); i > 0; i-- )
> >      {
> >          ret = ept_next_level(p2m, 1, &table, &gfn_remainder, i);
> >          if ( !ret || ret == GUEST_TABLE_POD_PAGE )
> > @@ -622,7 +627,8 @@ static ept_entry_t ept_get_entry_content(struct p2m_domain *p2m,
> >  void ept_walk_table(struct domain *d, unsigned long gfn)
> >  {
> >      struct p2m_domain *p2m = p2m_get_hostp2m(d);
> > -    ept_entry_t *table = map_domain_page(ept_get_asr(d));
> > +    struct ept_data *ept_data = p2m->hap_data;
> > +    ept_entry_t *table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
> >      unsigned long gfn_remainder = gfn;
> >
> >      int i;
> > @@ -638,7 +644,7 @@ void ept_walk_table(struct domain *d, unsigned long gfn)
> >          goto out;
> >      }
> >
> > -    for ( i = ept_get_wl(d); i >= 0; i-- )
> > +    for ( i = ept_get_wl(ept_data); i >= 0; i-- )
> >      {
> >          ept_entry_t *ept_entry, *next;
> >          u32 index;
> > @@ -778,16 +784,16 @@ static void ept_change_entry_type_page(mfn_t ept_page_mfn, int ept_page_level,
> >  static void ept_change_entry_type_global(struct p2m_domain *p2m,
> >                                           p2m_type_t ot, p2m_type_t nt)
> >  {
> > -    struct domain *d = p2m->domain;
> > -    if ( ept_get_asr(d) == 0 )
> > +    struct ept_data *ept_data = p2m->hap_data;
> > +    if ( ept_get_asr(ept_data) == 0 )
> >          return;
> >
> >      BUG_ON(p2m_is_grant(ot) || p2m_is_grant(nt));
> >      BUG_ON(ot != nt && (ot == p2m_mmio_direct || nt == p2m_mmio_direct));
> >
> > -    ept_change_entry_type_page(_mfn(ept_get_asr(d)), ept_get_wl(d), ot, nt);
> > +    ept_change_entry_type_page(_mfn(ept_get_asr(ept_data)), ept_get_wl(ept_data), ot, nt);
> >
> > -    ept_sync_domain(d);
> > +    ept_sync_domain(p2m);
> >  }
> >
> >  void ept_p2m_init(struct p2m_domain *p2m)
> > @@ -811,6 +817,7 @@ static void ept_dump_p2m_table(unsigned char key)
> >      unsigned long gfn, gfn_remainder;
> >      unsigned long record_counter = 0;
> >      struct p2m_domain *p2m;
> > +    struct ept_data *ept_data;
> >
> >      for_each_domain(d)
> >      {
> > @@ -818,15 +825,16 @@ static void ept_dump_p2m_table(unsigned char key)
> >              continue;
> >
> >          p2m = p2m_get_hostp2m(d);
> > +        ept_data = p2m->hap_data;
> >          printk("\ndomain%d EPT p2m table: \n", d->domain_id);
> >
> >          for ( gfn = 0; gfn <= p2m->max_mapped_pfn; gfn += (1 << order) )
> >          {
> >              gfn_remainder = gfn;
> >              mfn = _mfn(INVALID_MFN);
> > -            table = map_domain_page(ept_get_asr(d));
> > +            table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
> >
> > -            for ( i = ept_get_wl(d); i > 0; i-- )
> > +            for ( i = ept_get_wl(ept_data); i > 0; i-- )
> >              {
> >                  ret = ept_next_level(p2m, 1, &table, &gfn_remainder, i);
> >                  if ( ret != GUEST_TABLE_NORMAL_PAGE )
> > @@ -858,6 +866,52 @@ out:
> >      }
> >  }
> >
> > +int alloc_p2m_hap_data(struct p2m_domain *p2m)
> > +{
> > +    struct domain *d = p2m->domain;
> > +    struct ept_data *ept;
> > +
> > +    ASSERT(d);
> > +    if (!hap_enabled(d))
> > +        return 0;
> > +
> > +    p2m->hap_data = ept = xzalloc(struct ept_data);
> > +    if ( !p2m->hap_data )
> > +        return -ENOMEM;
> > +    if ( !zalloc_cpumask_var(&ept->ept_synced) )
> > +    {
> > +        xfree(ept);
> > +        p2m->hap_data = NULL;
> > +        return -ENOMEM;
> > +    }
> > +    return 0;
> > +}
> > +
> > +void free_p2m_hap_data(struct p2m_domain *p2m)
> > +{
> > +    struct ept_data *ept;
> > +
> > +    if ( !hap_enabled(p2m->domain) )
> > +        return;
> > +
> > +    if ( p2m_is_nestedp2m(p2m)) {
> > +        ept = p2m->hap_data;
> > +        if ( ept ) {
> > +            free_cpumask_var(ept->ept_synced);
> > +            xfree(ept);
> > +        }
> > +    }
> > +}
> > +
> > +void p2m_init_hap_data(struct p2m_domain *p2m)
> > +{
> > +    struct ept_data *ept = p2m->hap_data;
> > +
> > +    ept->ept_ctl.ept_wl = 3;
> > +    ept->ept_ctl.ept_mt = EPT_DEFAULT_MT;
> > +    ept->ept_ctl.asr = pagetable_get_pfn(p2m_get_pagetable(p2m));
> > +}
> > +
> >  static struct keyhandler ept_p2m_table = {
> >      .diagnostic = 0,
> >      .u.fn = ept_dump_p2m_table,
> > diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
> > index 62c2d78..799bbfb 100644
> > --- a/xen/arch/x86/mm/p2m.c
> > +++ b/xen/arch/x86/mm/p2m.c
> > @@ -105,6 +105,8 @@ p2m_init_nestedp2m(struct domain *d)
> >          if ( !zalloc_cpumask_var(&p2m->dirty_cpumask) )
> >              return -ENOMEM;
> >          p2m_initialise(d, p2m);
> > +        if ( cpu_has_vmx && alloc_p2m_hap_data(p2m) )
> > +            return -ENOMEM;
> >          p2m->write_p2m_entry = nestedp2m_write_p2m_entry;
> >          list_add(&p2m->np2m_list, &p2m_get_hostp2m(d)->np2m_list);
> >      }
> > @@ -126,12 +128,14 @@ int p2m_init(struct domain *d)
> >          return -ENOMEM;
> >      }
> >      p2m_initialise(d, p2m);
> > +    if ( hap_enabled(d) && cpu_has_vmx)
> > +        p2m->hap_data = &d->arch.hvm_domain.vmx.ept;
> >
> >      /* Must initialise nestedp2m unconditionally
> >       * since nestedhvm_enabled(d) returns false here.
> >       * (p2m_init runs too early for HVM_PARAM_* options) */
> >      rc = p2m_init_nestedp2m(d);
> > -    if ( rc ) 
> > +    if ( rc )
> >          p2m_final_teardown(d);
> >      return rc;
> >  }
> > @@ -354,6 +358,8 @@ int p2m_alloc_table(struct p2m_domain *p2m)
> >
> >      if ( hap_enabled(d) )
> >          iommu_share_p2m_table(d);
> > +    if ( p2m_is_nestedp2m(p2m) && hap_enabled(d) )
> > +        p2m_init_hap_data(p2m);
> >
> >      P2M_PRINTK("populating p2m table\n");
> >
> > @@ -436,12 +442,16 @@ void p2m_teardown(struct p2m_domain *p2m)
> >  static void p2m_teardown_nestedp2m(struct domain *d)
> >  {
> >      uint8_t i;
> > +    struct p2m_domain *p2m;
> >
> >      for (i = 0; i < MAX_NESTEDP2M; i++) {
> >          if ( !d->arch.nested_p2m[i] )
> >              continue;
> > -        free_cpumask_var(d->arch.nested_p2m[i]->dirty_cpumask);
> > -        xfree(d->arch.nested_p2m[i]);
> > +        p2m = d->arch.nested_p2m[i];
> > +        if ( p2m->hap_data )
> > +            free_p2m_hap_data(p2m);
> > +        free_cpumask_var(p2m->dirty_cpumask);
> > +        xfree(p2m);
> >          d->arch.nested_p2m[i] = NULL;
> >      }
> >  }
> > diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
> > index 9a728b6..e6b4e3b 100644
> > --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> > +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> > @@ -56,26 +56,34 @@ struct vmx_msr_state {
> >
> >  #define EPT_DEFAULT_MT MTRR_TYPE_WRBACK
> >
> > -struct vmx_domain {
> > -    unsigned long apic_access_mfn;
> > -    union {
> > -        struct {
> > +union eptp_control{
> > +    struct {
> >              u64 ept_mt :3,
> >                  ept_wl :3,
> >                  rsvd   :6,
> >                  asr    :52;
> >          };
> >          u64 eptp;
> > -    } ept_control;
> > +};
> > +
> > +struct ept_data{
> > +    union eptp_control ept_ctl;
> >      cpumask_var_t ept_synced;
> >  };
> >
> > -#define ept_get_wl(d)   \
> > -    ((d)->arch.hvm_domain.vmx.ept_control.ept_wl)
> > -#define ept_get_asr(d)  \
> > -    ((d)->arch.hvm_domain.vmx.ept_control.asr)
> > -#define ept_get_eptp(d) \
> > -    ((d)->arch.hvm_domain.vmx.ept_control.eptp)
> > +struct vmx_domain {
> > +    unsigned long apic_access_mfn;
> > +    struct ept_data ept;
> > +};
> > +
> > +#define ept_get_wl(ept_data)   \
> > +    (((struct ept_data*)(ept_data))->ept_ctl.ept_wl)
> > +#define ept_get_asr(ept_data)  \
> > +    (((struct ept_data*)(ept_data))->ept_ctl.asr)
> > +#define ept_get_eptp(ept_data) \
> > +    (((struct ept_data*)(ept_data))->ept_ctl.eptp)
> > +#define ept_get_synced_mask(ept_data)\
> > +    (((struct ept_data*)(ept_data))->ept_synced)
> >
> >  struct arch_vmx_struct {
> >      /* Virtual address of VMCS. */
> > diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
> > index aa5b080..573a12e 100644
> > --- a/xen/include/asm-x86/hvm/vmx/vmx.h
> > +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
> > @@ -333,7 +333,7 @@ static inline void ept_sync_all(void)
> >      __invept(INVEPT_ALL_CONTEXT, 0, 0);
> >  }
> >
> > -void ept_sync_domain(struct domain *d);
> > +void ept_sync_domain(struct p2m_domain *p2m);
> >
> >  static inline void vpid_sync_vcpu_gva(struct vcpu *v, unsigned long gva)
> >  {
> > @@ -401,6 +401,10 @@ void setup_ept_dump(void);
> >
> >  void update_guest_eip(void);
> >
> > +int alloc_p2m_hap_data(struct p2m_domain *p2m);
> > +void free_p2m_hap_data(struct p2m_domain *p2m);
> > +void p2m_init_hap_data(struct p2m_domain *p2m);
> > +
> >  /* EPT violation qualifications definitions */
> >  #define _EPT_READ_VIOLATION         0
> >  #define EPT_READ_VIOLATION          (1UL<<_EPT_READ_VIOLATION)
> > diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
> > index 1807ad6..0fb1b2d 100644
> > --- a/xen/include/asm-x86/p2m.h
> > +++ b/xen/include/asm-x86/p2m.h
> > @@ -277,6 +277,7 @@ struct p2m_domain {
> >          mm_lock_t        lock;         /* Locking of private pod structs,   *
> >                                          * not relying on the p2m lock.      */
> >      } pod;
> > +    void *hap_data;
> >  };
> >
> >  /* get host p2m table */
> > --
> > 1.7.1
> >
> >
> > _______________________________________________
> > Xen-devel mailing list
> > Xen-devel@lists.xen.org
> > http://lists.xen.org/xen-devel
Jan Beulich
2012-Dec-17 09:56 UTC
Re: [PATCH 05/11] EPT: Make ept data structure or operations neutral
>>> On 17.12.12 at 09:57, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote:
>> -----Original Message-----
>> From: Tim Deegan [mailto:tim@xen.org]
>> Sent: Friday, December 14, 2012 12:04 AM
>> To: Zhang, Xiantao
>> Cc: xen-devel@lists.xensource.com; Dong, Eddie; keir@xen.org; Nakajima,
>> Jun; JBeulich@suse.com
>> Subject: Re: [Xen-devel] [PATCH 05/11] EPT: Make ept data structure or
>> operations neutral
>>
>> At 01:57 +0800 on 11 Dec (1355191037), xiantao.zhang@intel.com wrote:
>> > From: Zhang Xiantao <xiantao.zhang@intel.com>
>> >
>> > Share the current EPT logic with the nested EPT case, so make the related
>> > data structures and operations neutral to common EPT and nested EPT.
>> >
>> > Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
>>
>> Since the struct ept_data is only 16 bytes long, why not just embed it in the
>> struct p2m_domain, as
>>
>> >         mm_lock_t        lock;         /* Locking of private pod structs,   *
>> >                                         * not relying on the p2m lock.      */
>> >     } pod;
>> > +   union {
>> > +       struct ept_data ept;
>> > +       /* NPT equivalent could go here if needed */
>> > +   };
>> > };
>
> Hi, Tim
>     Thanks for your review! If we change it like this, p2m.h has to
> include asm/hvm/vmx/vmcs.h; is that acceptable?

I'm sure there are ways to avoid such a dependency.

Jan