Xiantao Zhang
2012-Dec-24 14:26 UTC
[PATCH v4 00/10] Nested VMX: Add virtual EPT & VPID support to L1 VMM
From: Zhang Xiantao <xiantao.zhang@intel.com>

With virtual EPT support, the L1 hypervisor can use the EPT hardware for
the L2 guest's memory virtualization. This improves the L2 guest's
performance sharply: in our testing, some benchmarks show a > 5x
performance gain.

Changes from v1:
Updated the patches according to Tim's comments.
1. Patch 03: Enhance the virtual EPT walker logic.
2. Patch 04: Add a new field in struct p2m_domain and use it to store
   EPT-specific data. For the host p2m it holds the L1 VMM's EPT data,
   and for a nested p2m it holds the nested EPT's data.
3. Patch 07: Strictly check the host's p2m access type.
4. Other patches: some whitespace mangling fixes.

Changes from v2:
Addressed comments from Jan and Jun:
1. Add Acked-by tags to the patches reviewed by Tim.
2. Fix one whitespace mangling issue in Patch 08.
3. Add comments describing the meaning of the return value of
   hvm_hap_nested_page_fault in Patch 05.
4. Add logic to handle the default case of two switch statements.

Changes from v3:
1. Re-checked all patches for whitespace mangling issues.
2. Addressed Jan's comments on Patch 08 and Patch 09: once
   X86EMUL_EXCEPTION is returned, the callee is responsible for handling
   the exception before it returns.
3. Addressed Tim's comments on Patch 03, Patch 04 and Patch 07:
   Patch 03: If the host doesn't support the exec-only capability, we
             shouldn't expose this feature to the L1 VMM. If mapping the
             guest's EPT table fails, inject an EPT misconfiguration
             error into L1.
   Patch 04: Re-organize the {init/teardown} logic of the p2m and nested
             p2m structures.
   Patch 07: Initialize p2ma_21 -> p2m_access_rwx, so as not to change
             SVM's behavior.

Zhang Xiantao (10):
  nestedhap: Change hostcr3 and p2m->cr3 to meaningful words
  nestedhap: Change nested p2m's walker to vendor-specific
  nested_ept: Implement guest ept's walker
  EPT: Make ept data structure or operations neutral
  nEPT: Try to enable EPT paging for L2 guest.
  nEPT: Sync PDPTR fields if L2 guest in PAE paging mode
  nEPT: Use minimal permission for nested p2m.
  nEPT: handle invept instruction from L1 VMM
  nVMX: virtualize VPID capability to nested VMM.
  nEPT: expose EPT & VPID capabilities to L1 VMM

 xen/arch/x86/hvm/hvm.c                  |    7 +-
 xen/arch/x86/hvm/svm/nestedsvm.c        |   31 ++++
 xen/arch/x86/hvm/svm/svm.c              |    3 +-
 xen/arch/x86/hvm/vmx/vmcs.c             |    8 +-
 xen/arch/x86/hvm/vmx/vmx.c              |   91 ++++------
 xen/arch/x86/hvm/vmx/vvmx.c             |  208 ++++++++++++++++++++--
 xen/arch/x86/mm/guest_walk.c            |   16 +-
 xen/arch/x86/mm/hap/Makefile            |    1 +
 xen/arch/x86/mm/hap/nested_ept.c        |  298 +++++++++++++++++++++++++++++++
 xen/arch/x86/mm/hap/nested_hap.c        |   96 ++++++-----
 xen/arch/x86/mm/mm-locks.h              |    2 +-
 xen/arch/x86/mm/p2m-ept.c               |  104 +++++++++---
 xen/arch/x86/mm/p2m.c                   |  159 +++++++++++------
 xen/include/asm-x86/guest_pt.h          |    8 +
 xen/include/asm-x86/hvm/hvm.h           |    9 +-
 xen/include/asm-x86/hvm/nestedhvm.h     |    1 +
 xen/include/asm-x86/hvm/svm/nestedsvm.h |    3 +
 xen/include/asm-x86/hvm/vmx/vmcs.h      |   24 ++--
 xen/include/asm-x86/hvm/vmx/vmx.h       |   41 ++++-
 xen/include/asm-x86/hvm/vmx/vvmx.h      |   28 +++-
 xen/include/asm-x86/p2m.h               |   20 ++-
 21 files changed, 932 insertions(+), 226 deletions(-)
 create mode 100644 xen/arch/x86/mm/hap/nested_ept.c
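
The exec-only rule noted for Patch 03 in the v3 changelog can be illustrated with a small standalone sketch. This is not code from the series: CAP_EXEC_ONLY, the EPTE_* bits and both function names are hypothetical, and the only hardware fact assumed is that bit 0 of the IA32_VMX_EPT_VPID_CAP MSR reports execute-only support. The idea is that a capability hidden from L1 must also turn an execute-only entry written by L1 into a misconfiguration, alongside the write-only and write/execute-only combinations that EPT never allows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name; bit 0 mirrors the execute-only capability bit. */
#define CAP_EXEC_ONLY (1ULL << 0)

/* Permission bits held in bits 2:0 of an EPT entry (hypothetical names). */
#define EPTE_R 0x1u
#define EPTE_W 0x2u
#define EPTE_X 0x4u

/* Capabilities L1 is allowed to see: drop exec-only if the host lacks it. */
static uint64_t l1_visible_caps(uint64_t wanted, uint64_t host_caps)
{
    if ( !(host_caps & CAP_EXEC_ONLY) )
        wanted &= ~CAP_EXEC_ONLY;
    return wanted;
}

/* True if this R/W/X combination must be reported as a misconfiguration. */
static bool rwx_is_misconfigured(unsigned int rwx, uint64_t l1_caps)
{
    assert(rwx <= (EPTE_R | EPTE_W | EPTE_X));

    /* Write-only and write/execute-only are never valid. */
    if ( rwx == EPTE_W || rwx == (EPTE_W | EPTE_X) )
        return true;

    /* Execute-only is valid only if L1 was told it is supported. */
    if ( rwx == EPTE_X && !(l1_caps & CAP_EXEC_ONLY) )
        return true;

    return false;
}
```

An all-zero permission field is deliberately not flagged here, since a non-present entry is a violation case rather than a misconfiguration.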
Xiantao Zhang
2012-Dec-24 14:26 UTC
[PATCH v4 01/10] nestedhap: Change hostcr3 and p2m->cr3 to meaningful words
From: Zhang Xiantao <xiantao.zhang@intel.com>

VMX has no concept of a host cr3 for the nested p2m; only SVM does. Rename
hostcr3 and p2m->cr3 to vendor-neutral terms.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Tim Deegan <tim@xen.org>
---
 xen/arch/x86/hvm/hvm.c             |    6 +++---
 xen/arch/x86/hvm/svm/svm.c         |    2 +-
 xen/arch/x86/hvm/vmx/vmx.c         |    2 +-
 xen/arch/x86/hvm/vmx/vvmx.c        |    2 +-
 xen/arch/x86/mm/hap/nested_hap.c   |   15 ++++++++-------
 xen/arch/x86/mm/mm-locks.h         |    2 +-
 xen/arch/x86/mm/p2m.c              |   26 +++++++++++++-------------
 xen/include/asm-x86/hvm/hvm.h      |    4 ++--
 xen/include/asm-x86/hvm/vmx/vvmx.h |    2 +-
 xen/include/asm-x86/p2m.h          |   16 ++++++++--------
 10 files changed, 39 insertions(+), 38 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 40c1ab2..f63ee52 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4536,10 +4536,10 @@ uint64_t nhvm_vcpu_guestcr3(struct vcpu *v)
     return -EOPNOTSUPP;
 }
 
-uint64_t nhvm_vcpu_hostcr3(struct vcpu *v)
+uint64_t nhvm_vcpu_p2m_base(struct vcpu *v)
 {
-    if (hvm_funcs.nhvm_vcpu_hostcr3)
-        return hvm_funcs.nhvm_vcpu_hostcr3(v);
+    if ( hvm_funcs.nhvm_vcpu_p2m_base )
+        return hvm_funcs.nhvm_vcpu_p2m_base(v);
     return -EOPNOTSUPP;
 }
 
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 55a5ae5..2c8504a 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2003,7 +2003,7 @@ static struct hvm_function_table __read_mostly svm_function_table = {
     .nhvm_vcpu_vmexit = nsvm_vcpu_vmexit_inject,
     .nhvm_vcpu_vmexit_trap = nsvm_vcpu_vmexit_trap,
     .nhvm_vcpu_guestcr3 = nsvm_vcpu_guestcr3,
-    .nhvm_vcpu_hostcr3 = nsvm_vcpu_hostcr3,
+    .nhvm_vcpu_p2m_base = nsvm_vcpu_hostcr3,
     .nhvm_vcpu_asid = nsvm_vcpu_asid,
     .nhvm_vmcx_guest_intercepts_trap = nsvm_vmcb_guest_intercepts_trap,
     .nhvm_vmcx_hap_enabled = nsvm_vmcb_hap_enabled,
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index aee1f9e..98309da 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1504,7 +1504,7 @@ static struct hvm_function_table __read_mostly vmx_function_table = {
     .nhvm_vcpu_destroy = nvmx_vcpu_destroy,
     .nhvm_vcpu_reset = nvmx_vcpu_reset,
     .nhvm_vcpu_guestcr3 = nvmx_vcpu_guestcr3,
-    .nhvm_vcpu_hostcr3 = nvmx_vcpu_hostcr3,
+    .nhvm_vcpu_p2m_base = nvmx_vcpu_eptp_base,
     .nhvm_vcpu_asid = nvmx_vcpu_asid,
     .nhvm_vmcx_guest_intercepts_trap = nvmx_intercepts_exception,
     .nhvm_vcpu_vmexit_trap = nvmx_vmexit_trap,
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 7b27d2d..6999c25 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -94,7 +94,7 @@ uint64_t nvmx_vcpu_guestcr3(struct vcpu *v)
     return 0;
 }
 
-uint64_t nvmx_vcpu_hostcr3(struct vcpu *v)
+uint64_t nvmx_vcpu_eptp_base(struct vcpu *v)
 {
     /* TODO */
     ASSERT(0);
diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
index 317875d..f9a5edc 100644
--- a/xen/arch/x86/mm/hap/nested_hap.c
+++ b/xen/arch/x86/mm/hap/nested_hap.c
@@ -48,9 +48,10 @@
  * 1. If #NPF is from L1 guest, then we crash the guest VM (same as old
  *    code)
  * 2. If #NPF is from L2 guest, then we continue from (3)
- * 3. Get h_cr3 from L1 guest. Map h_cr3 into L0 hypervisor address space.
- * 4. Walk the h_cr3 page table
- * 5. - if not present, then we inject #NPF back to L1 guest and
+ * 3. Get np2m base from L1 guest. Map np2m base into L0 hypervisor address space.
+ * 4. Walk the np2m's page table
+ * 5. - if not present or permission check failure, then we inject #NPF back to
+ *      L1 guest and
  *      re-launch L1 guest (L1 guest will either treat this #NPF as MMIO,
  *      or fix its p2m table for L2 guest)
  * 6. - if present, then we will get the a new translated value L1-GPA
@@ -89,7 +90,7 @@ nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
 
     if (old_flags & _PAGE_PRESENT)
         flush_tlb_mask(p2m->dirty_cpumask);
-
+
     paging_unlock(d);
 }
 
@@ -110,7 +111,7 @@ nestedhap_fix_p2m(struct vcpu *v, struct p2m_domain *p2m,
     /* If this p2m table has been flushed or recycled under our feet,
      * leave it alone. We'll pick up the right one as we try to
      * vmenter the guest. */
-    if ( p2m->cr3 == nhvm_vcpu_hostcr3(v) )
+    if ( p2m->np2m_base == nhvm_vcpu_p2m_base(v) )
     {
         unsigned long gfn, mask;
         mfn_t mfn;
@@ -186,7 +187,7 @@ nestedhap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
     uint32_t pfec;
     unsigned long nested_cr3, gfn;
 
-    nested_cr3 = nhvm_vcpu_hostcr3(v);
+    nested_cr3 = nhvm_vcpu_p2m_base(v);
 
     pfec = PFEC_user_mode | PFEC_page_present;
     if (access_w)
@@ -221,7 +222,7 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa,
     p2m_type_t p2mt_10;
 
     p2m = p2m_get_hostp2m(d); /* L0 p2m */
-    nested_p2m = p2m_get_nestedp2m(v, nhvm_vcpu_hostcr3(v));
+    nested_p2m = p2m_get_nestedp2m(v, nhvm_vcpu_p2m_base(v));
 
     /* walk the L1 P2M table */
     rv = nestedhap_walk_L1_p2m(v, *L2_gpa, &L1_gpa, &page_order_21,
diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
index 3700e32..1817f81 100644
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -249,7 +249,7 @@ declare_mm_order_constraint(per_page_sharing)
  * A per-domain lock that protects the mapping from nested-CR3 to
  * nested-p2m. In particular it covers:
  * - the array of nested-p2m tables, and all LRU activity therein; and
- * - setting the "cr3" field of any p2m table to a non-CR3_EADDR value.
+ * - setting the "cr3" field of any p2m table to a non-P2M_BASE_EAADR value.
  *   (i.e. assigning a p2m table to be the shadow of that cr3 */
 
 /* PoD lock (per-p2m-table)
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 258f46e..41a461b 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -69,7 +69,7 @@ static void p2m_initialise(struct domain *d, struct p2m_domain *p2m)
     p2m->domain = d;
     p2m->default_access = p2m_access_rwx;
 
-    p2m->cr3 = CR3_EADDR;
+    p2m->np2m_base = P2M_BASE_EADDR;
 
     if ( hap_enabled(d) && cpu_has_vmx )
         ept_p2m_init(p2m);
@@ -1433,7 +1433,7 @@ p2m_flush_table(struct p2m_domain *p2m)
     ASSERT(page_list_empty(&p2m->pod.single));
 
     /* This is no longer a valid nested p2m for any address space */
-    p2m->cr3 = CR3_EADDR;
+    p2m->np2m_base = P2M_BASE_EADDR;
 
     /* Zap the top level of the trie */
     top = mfn_to_page(pagetable_get_mfn(p2m_get_pagetable(p2m)));
@@ -1471,7 +1471,7 @@ p2m_flush_nestedp2m(struct domain *d)
 }
 
 struct p2m_domain *
-p2m_get_nestedp2m(struct vcpu *v, uint64_t cr3)
+p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base)
 {
     /* Use volatile to prevent gcc to cache nv->nv_p2m in a cpu register as
      * this may change within the loop by an other (v)cpu.
@@ -1480,8 +1480,8 @@ p2m_get_nestedp2m(struct vcpu *v, uint64_t cr3)
     struct domain *d;
     struct p2m_domain *p2m;
 
-    /* Mask out low bits; this avoids collisions with CR3_EADDR */
-    cr3 &= ~(0xfffull);
+    /* Mask out low bits; this avoids collisions with P2M_BASE_EADDR */
+    np2m_base &= ~(0xfffull);
 
     if (nv->nv_flushp2m && nv->nv_p2m) {
         nv->nv_p2m = NULL;
@@ -1493,14 +1493,14 @@ p2m_get_nestedp2m(struct vcpu *v, uint64_t cr3)
     if ( p2m )
     {
         p2m_lock(p2m);
-        if ( p2m->cr3 == cr3 || p2m->cr3 == CR3_EADDR )
+        if ( p2m->np2m_base == np2m_base || p2m->np2m_base == P2M_BASE_EADDR )
         {
             nv->nv_flushp2m = 0;
             p2m_getlru_nestedp2m(d, p2m);
             nv->nv_p2m = p2m;
-            if (p2m->cr3 == CR3_EADDR)
+            if ( p2m->np2m_base == P2M_BASE_EADDR )
                 hvm_asid_flush_vcpu(v);
-            p2m->cr3 = cr3;
+            p2m->np2m_base = np2m_base;
             cpumask_set_cpu(v->processor, p2m->dirty_cpumask);
             p2m_unlock(p2m);
             nestedp2m_unlock(d);
@@ -1515,7 +1515,7 @@ p2m_get_nestedp2m(struct vcpu *v, uint64_t cr3)
     p2m_flush_table(p2m);
     p2m_lock(p2m);
     nv->nv_p2m = p2m;
-    p2m->cr3 = cr3;
+    p2m->np2m_base = np2m_base;
     nv->nv_flushp2m = 0;
     hvm_asid_flush_vcpu(v);
     cpumask_set_cpu(v->processor, p2m->dirty_cpumask);
@@ -1531,7 +1531,7 @@ p2m_get_p2m(struct vcpu *v)
     if (!nestedhvm_is_n2(v))
         return p2m_get_hostp2m(v->domain);
 
-    return p2m_get_nestedp2m(v, nhvm_vcpu_hostcr3(v));
+    return p2m_get_nestedp2m(v, nhvm_vcpu_p2m_base(v));
 }
 
 unsigned long paging_gva_to_gfn(struct vcpu *v,
@@ -1549,15 +1549,15 @@ unsigned long paging_gva_to_gfn(struct vcpu *v,
         struct p2m_domain *p2m;
         const struct paging_mode *mode;
         uint32_t pfec_21 = *pfec;
-        uint64_t ncr3 = nhvm_vcpu_hostcr3(v);
+        uint64_t np2m_base = nhvm_vcpu_p2m_base(v);
 
         /* translate l2 guest va into l2 guest gfn */
-        p2m = p2m_get_nestedp2m(v, ncr3);
+        p2m = p2m_get_nestedp2m(v, np2m_base);
         mode = paging_get_nestedmode(v);
         gfn = mode->gva_to_gfn(v, p2m, va, pfec);
 
         /* translate l2 guest gfn into l1 guest gfn */
-        return hostmode->p2m_ga_to_gfn(v, hostp2m, ncr3,
+        return hostmode->p2m_ga_to_gfn(v, hostp2m, np2m_base,
                                        gfn << PAGE_SHIFT, &pfec_21, NULL);
     }
 
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index fdb0f58..d3535b6 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -170,7 +170,7 @@ struct hvm_function_table {
                                 uint64_t exitcode);
     int (*nhvm_vcpu_vmexit_trap)(struct vcpu *v, struct hvm_trap *trap);
     uint64_t (*nhvm_vcpu_guestcr3)(struct vcpu *v);
-    uint64_t (*nhvm_vcpu_hostcr3)(struct vcpu *v);
+    uint64_t (*nhvm_vcpu_p2m_base)(struct vcpu *v);
     uint32_t (*nhvm_vcpu_asid)(struct vcpu *v);
     int (*nhvm_vmcx_guest_intercepts_trap)(struct vcpu *v,
                                unsigned int trapnr, int errcode);
@@ -475,7 +475,7 @@ uint64_t nhvm_vcpu_guestcr3(struct vcpu *v);
 /* returns l1 guest's cr3 that points to the page table used to
  * translate l2 guest physical address to l1 guest physical address.
  */
-uint64_t nhvm_vcpu_hostcr3(struct vcpu *v);
+uint64_t nhvm_vcpu_p2m_base(struct vcpu *v);
 /* returns the asid number l1 guest wants to use to run the l2 guest */
 uint32_t nhvm_vcpu_asid(struct vcpu *v);
 
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index dce2cd8..d97011d 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -99,7 +99,7 @@ int nvmx_vcpu_initialise(struct vcpu *v);
 void nvmx_vcpu_destroy(struct vcpu *v);
 int nvmx_vcpu_reset(struct vcpu *v);
 uint64_t nvmx_vcpu_guestcr3(struct vcpu *v);
-uint64_t nvmx_vcpu_hostcr3(struct vcpu *v);
+uint64_t nvmx_vcpu_eptp_base(struct vcpu *v);
 uint32_t nvmx_vcpu_asid(struct vcpu *v);
 enum hvm_intblk nvmx_intr_blocked(struct vcpu *v);
 int nvmx_intercepts_exception(struct vcpu *v,
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 2bd2048..ce26594 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -197,17 +197,17 @@ struct p2m_domain {
 
     struct domain *domain; /* back pointer to domain */
 
-    /* Nested p2ms only: nested-CR3 value that this p2m shadows.
-     * This can be cleared to CR3_EADDR under the per-p2m lock but
+    /* Nested p2ms only: nested p2m base value that this p2m shadows.
+     * This can be cleared to P2M_BASE_EADDR under the per-p2m lock but
      * needs both the per-p2m lock and the per-domain nestedp2m lock
      * to set it to any other value. */
-#define CR3_EADDR (~0ULL)
-    uint64_t cr3;
+#define P2M_BASE_EADDR (~0ULL)
+    uint64_t np2m_base;
 
     /* Nested p2ms: linked list of n2pms allocated to this domain.
      * The host p2m hasolds the head of the list and the np2ms are
      * threaded on in LRU order. */
-    struct list_head np2m_list;
+    struct list_head np2m_list;
 
 
     /* Host p2m: when this flag is set, don't flush all the nested-p2m
@@ -282,11 +282,11 @@ struct p2m_domain {
 /* get host p2m table */
 #define p2m_get_hostp2m(d) ((d)->arch.p2m)
 
-/* Get p2m table (re)usable for specified cr3.
+/* Get p2m table (re)usable for specified np2m base.
  * Automatically destroys and re-initializes a p2m if none found.
- * If cr3 == 0 then v->arch.hvm_vcpu.guest_cr[3] is used.
+ * If np2m_base == 0 then v->arch.hvm_vcpu.guest_cr[3] is used.
  */
-struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v, uint64_t cr3);
+struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base);
 
 /* If vcpu is in host mode then behaviour matches p2m_get_hostp2m().
  * If vcpu is in guest mode then behaviour matches p2m_get_nestedp2m().
-- 
1.7.1
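
The comment "Mask out low bits; this avoids collisions with P2M_BASE_EADDR" in p2m_get_nestedp2m() may be easier to follow with a toy model. The sketch below is not the Xen implementation: it assumes a fixed pool of NR_SLOTS slots (a made-up number), drops the LRU ordering, locking and flushing, and uses the hypothetical names np2m_slot, np2m_pool_init and np2m_lookup. It only shows why a base value supplied by L1, once its low 12 bits are cleared, can never equal the all-ones "empty" sentinel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define P2M_BASE_EADDR (~0ULL)  /* sentinel: slot shadows no address space */
#define NR_SLOTS 4              /* hypothetical pool size */

struct np2m_slot {
    uint64_t np2m_base;         /* base this slot shadows, or the sentinel */
};

static void np2m_pool_init(struct np2m_slot *pool)
{
    assert(pool != NULL);
    for ( size_t i = 0; i < NR_SLOTS; i++ )
        pool[i].np2m_base = P2M_BASE_EADDR;
}

/* Return the slot index for 'base', claiming an empty slot if there is no
 * match, or -1 if the pool is full (the real code recycles the LRU entry). */
static int np2m_lookup(struct np2m_slot *pool, uint64_t base)
{
    assert(pool != NULL);

    /* A masked value has zero low bits, so it cannot be ~0ULL. */
    base &= ~0xfffULL;

    for ( int i = 0; i < NR_SLOTS; i++ )
        if ( pool[i].np2m_base == base )
            return i;

    for ( int i = 0; i < NR_SLOTS; i++ )
        if ( pool[i].np2m_base == P2M_BASE_EADDR )
        {
            pool[i].np2m_base = base;
            return i;
        }

    return -1;
}
```

Even if L1 passes ~0ULL as its base, the masked value is stored, so the slot is never mistaken for a free one on a later lookup.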
Xiantao Zhang
2012-Dec-24 14:26 UTC
[PATCH v4 02/10] nestedhap: Change nested p2m's walker to vendor-specific
From: Zhang Xiantao <xiantao.zhang@intel.com>

EPT and NPT use different formats for the entries at each level, so make
the walker functions vendor-specific.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Tim Deegan <tim@xen.org>
---
 xen/arch/x86/hvm/svm/nestedsvm.c        |   31 +++++++++++++++++++++
 xen/arch/x86/hvm/svm/svm.c              |    1 +
 xen/arch/x86/hvm/vmx/vmx.c              |    3 +-
 xen/arch/x86/hvm/vmx/vvmx.c             |   13 +++++++++
 xen/arch/x86/mm/hap/nested_hap.c        |   46 +++++++++++--------------------
 xen/include/asm-x86/hvm/hvm.h           |    5 +++
 xen/include/asm-x86/hvm/svm/nestedsvm.h |    3 ++
 xen/include/asm-x86/hvm/vmx/vvmx.h      |    5 +++
 8 files changed, 76 insertions(+), 31 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index ed0faa6..c1c6fa7 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -1171,6 +1171,37 @@ nsvm_vmcb_hap_enabled(struct vcpu *v)
     return vcpu_nestedsvm(v).ns_hap_enabled;
 }
 
+/* This function uses L2_gpa to walk the P2M page table in L1. If the
+ * walk is successful, the translated value is returned in
+ * L1_gpa. The result value tells what to do next.
+ */
+int
+nsvm_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
+                     unsigned int *page_order,
+                     bool_t access_r, bool_t access_w, bool_t access_x)
+{
+    uint32_t pfec;
+    unsigned long nested_cr3, gfn;
+
+    nested_cr3 = nhvm_vcpu_p2m_base(v);
+
+    pfec = PFEC_user_mode | PFEC_page_present;
+    if ( access_w )
+        pfec |= PFEC_write_access;
+    if ( access_x )
+        pfec |= PFEC_insn_fetch;
+
+    /* Walk the guest-supplied NPT table, just as if it were a pagetable */
+    gfn = paging_ga_to_gfn_cr3(v, nested_cr3, L2_gpa, &pfec, page_order);
+
+    if ( gfn == INVALID_GFN )
+        return NESTEDHVM_PAGEFAULT_INJECT;
+
+    *L1_gpa = (gfn << PAGE_SHIFT) + (L2_gpa & ~PAGE_MASK);
+    return NESTEDHVM_PAGEFAULT_DONE;
+}
+
+
 enum hvm_intblk nsvm_intr_blocked(struct vcpu *v)
 {
     struct nestedsvm *svm = &vcpu_nestedsvm(v);
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 2c8504a..acd2d49 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2008,6 +2008,7 @@ static struct hvm_function_table __read_mostly svm_function_table = {
     .nhvm_vmcx_guest_intercepts_trap = nsvm_vmcb_guest_intercepts_trap,
     .nhvm_vmcx_hap_enabled = nsvm_vmcb_hap_enabled,
     .nhvm_intr_blocked = nsvm_intr_blocked,
+    .nhvm_hap_walk_L1_p2m = nsvm_hap_walk_L1_p2m,
 };
 
 void svm_vmexit_handler(struct cpu_user_regs *regs)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 98309da..4abfa90 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1511,7 +1511,8 @@ static struct hvm_function_table __read_mostly vmx_function_table = {
     .nhvm_intr_blocked = nvmx_intr_blocked,
     .nhvm_domain_relinquish_resources = nvmx_domain_relinquish_resources,
     .update_eoi_exit_bitmap = vmx_update_eoi_exit_bitmap,
-    .virtual_intr_delivery_enabled = vmx_virtual_intr_delivery_enabled
+    .virtual_intr_delivery_enabled = vmx_virtual_intr_delivery_enabled,
+    .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
 };
 
 struct hvm_function_table * __init start_vmx(void)
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 6999c25..53f6a4d 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1479,6 +1479,19 @@ int nvmx_msr_write_intercept(unsigned int msr, u64 msr_content)
     return 1;
 }
 
+/* This function uses L2_gpa to walk the P2M page table in L1. If the
+ * walk is successful, the translated value is returned in
+ * L1_gpa. The result value tells what to do next.
+ */
+int
+nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
+                     unsigned int *page_order,
+                     bool_t access_r, bool_t access_w, bool_t access_x)
+{
+    /*TODO:*/
+    return 0;
+}
+
 void nvmx_idtv_handling(void)
 {
     struct vcpu *v = current;
diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
index f9a5edc..8787c91 100644
--- a/xen/arch/x86/mm/hap/nested_hap.c
+++ b/xen/arch/x86/mm/hap/nested_hap.c
@@ -136,6 +136,22 @@ nestedhap_fix_p2m(struct vcpu *v, struct p2m_domain *p2m,
     }
 }
 
+/* This function uses L2_gpa to walk the P2M page table in L1. If the
+ * walk is successful, the translated value is returned in
+ * L1_gpa. The result value tells what to do next.
+ */
+static int
+nestedhap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
+                      unsigned int *page_order,
+                      bool_t access_r, bool_t access_w, bool_t access_x)
+{
+    ASSERT(hvm_funcs.nhvm_hap_walk_L1_p2m);
+
+    return hvm_funcs.nhvm_hap_walk_L1_p2m(v, L2_gpa, L1_gpa, page_order,
+                                          access_r, access_w, access_x);
+}
+
+
 /* This function uses L1_gpa to walk the P2M table in L0 hypervisor. If the
  * walk is successful, the translated value is returned in L0_gpa. The return
  * value tells the upper level what to do.
@@ -175,36 +191,6 @@ out:
     return rc;
 }
 
-/* This function uses L2_gpa to walk the P2M page table in L1. If the
- * walk is successful, the translated value is returned in
- * L1_gpa. The result value tells what to do next.
- */
-static int
-nestedhap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
-                      unsigned int *page_order,
-                      bool_t access_r, bool_t access_w, bool_t access_x)
-{
-    uint32_t pfec;
-    unsigned long nested_cr3, gfn;
-
-    nested_cr3 = nhvm_vcpu_p2m_base(v);
-
-    pfec = PFEC_user_mode | PFEC_page_present;
-    if (access_w)
-        pfec |= PFEC_write_access;
-    if (access_x)
-        pfec |= PFEC_insn_fetch;
-
-    /* Walk the guest-supplied NPT table, just as if it were a pagetable */
-    gfn = paging_ga_to_gfn_cr3(v, nested_cr3, L2_gpa, &pfec, page_order);
-
-    if ( gfn == INVALID_GFN )
-        return NESTEDHVM_PAGEFAULT_INJECT;
-
-    *L1_gpa = (gfn << PAGE_SHIFT) + (L2_gpa & ~PAGE_MASK);
-    return NESTEDHVM_PAGEFAULT_DONE;
-}
-
 /*
  * The following function, nestedhap_page_fault(), is for steps (3)--(10).
  *
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index d3535b6..80f07e9 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -183,6 +183,11 @@ struct hvm_function_table {
     /* Virtual interrupt delivery */
     void (*update_eoi_exit_bitmap)(struct vcpu *v, u8 vector, u8 trig);
     int (*virtual_intr_delivery_enabled)(void);
+
+    /*Walk nested p2m */
+    int (*nhvm_hap_walk_L1_p2m)(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
+                                unsigned int *page_order,
+                                bool_t access_r, bool_t access_w, bool_t access_x);
 };
 
 extern struct hvm_function_table hvm_funcs;
diff --git a/xen/include/asm-x86/hvm/svm/nestedsvm.h b/xen/include/asm-x86/hvm/svm/nestedsvm.h
index fa83023..0c90f30 100644
--- a/xen/include/asm-x86/hvm/svm/nestedsvm.h
+++ b/xen/include/asm-x86/hvm/svm/nestedsvm.h
@@ -133,6 +133,9 @@ int nsvm_wrmsr(struct vcpu *v, unsigned int msr, uint64_t msr_content);
 void svm_vmexit_do_clgi(struct cpu_user_regs *regs, struct vcpu *v);
 void svm_vmexit_do_stgi(struct cpu_user_regs *regs, struct vcpu *v);
 bool_t nestedsvm_gif_isset(struct vcpu *v);
+int nsvm_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
+                         unsigned int *page_order,
+                         bool_t access_r, bool_t access_w, bool_t access_x);
 
 #define NSVM_INTR_NOTHANDLED 3
 #define NSVM_INTR_NOTINTERCEPTED 2
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index d97011d..422f006 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -108,6 +108,11 @@ void nvmx_domain_relinquish_resources(struct domain *d);
 
 int nvmx_handle_vmxon(struct cpu_user_regs *regs);
 int nvmx_handle_vmxoff(struct cpu_user_regs *regs);
+
+int
+nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
+                     unsigned int *page_order,
+                     bool_t access_r, bool_t access_w, bool_t access_x);
 /*
  * Virtual VMCS layout
  *
-- 
1.7.1
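
The shape of this change, a common entry point that asserts the vendor hook is set and then forwards to it, can be shown in a self-contained form. The sketch below is only an illustration under stated assumptions: struct hvm_ops, toy_walk, walk_l1_p2m and the PAGEFAULT_* values are hypothetical stand-ins, and the toy walker simply maps L2 frame N to L1 frame N + 0x100 for the first 16 frames. The one line taken in spirit from the patch is the final address composition, translated frame number shifted up plus the untouched page offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_MASK  (~((1ULL << PAGE_SHIFT) - 1))

/* Hypothetical result codes standing in for NESTEDHVM_PAGEFAULT_*. */
enum { PAGEFAULT_DONE, PAGEFAULT_INJECT };

typedef int (*walk_fn)(uint64_t l2_gpa, uint64_t *l1_gpa);

/* Hypothetical, cut-down stand-in for the per-vendor function table. */
struct hvm_ops {
    walk_fn walk_l1_p2m;
};

/* Toy vendor walker: maps L2 frame N to L1 frame N + 0x100 for N < 16 and
 * asks the caller to inject a fault for anything else. */
static int toy_walk(uint64_t l2_gpa, uint64_t *l1_gpa)
{
    uint64_t gfn = l2_gpa >> PAGE_SHIFT;

    if ( gfn >= 16 )
        return PAGEFAULT_INJECT;

    gfn += 0x100;
    /* Same composition as in the patch: frame plus in-page offset. */
    *l1_gpa = (gfn << PAGE_SHIFT) + (l2_gpa & ~PAGE_MASK);
    return PAGEFAULT_DONE;
}

/* Common entry point: the hook must be registered, then just forward. */
static int walk_l1_p2m(const struct hvm_ops *ops, uint64_t l2_gpa,
                       uint64_t *l1_gpa)
{
    assert(ops != NULL && ops->walk_l1_p2m != NULL);
    return ops->walk_l1_p2m(l2_gpa, l1_gpa);
}
```

Keeping the common wrapper free of format knowledge is what lets the EPT walker in the next patch replace the TODO stub without touching nested_hap.c again.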
Xiantao Zhang
2012-Dec-24 14:26 UTC
[PATCH v4 03/10] nested_ept: Implement guest ept's walker
From: Zhang Xiantao <xiantao.zhang@intel.com>

Implement the guest EPT page-table walker. Some of the logic is based on
the shadow code's ia32e PT walker. During the walk, if a target page is
not in memory, use the RETRY mechanism to give the page a chance to be
brought back in.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
---
 xen/arch/x86/hvm/hvm.c              |    1 +
 xen/arch/x86/hvm/vmx/vvmx.c         |   42 +++++-
 xen/arch/x86/mm/guest_walk.c        |   16 ++-
 xen/arch/x86/mm/hap/Makefile        |    1 +
 xen/arch/x86/mm/hap/nested_ept.c    |  287 +++++++++++++++++++++++++++++++++++
 xen/arch/x86/mm/hap/nested_hap.c    |    2 +-
 xen/include/asm-x86/guest_pt.h      |    8 +
 xen/include/asm-x86/hvm/nestedhvm.h |    1 +
 xen/include/asm-x86/hvm/vmx/vmcs.h  |    1 +
 xen/include/asm-x86/hvm/vmx/vmx.h   |   31 ++++
 xen/include/asm-x86/hvm/vmx/vvmx.h  |   13 ++
 11 files changed, 394 insertions(+), 9 deletions(-)
 create mode 100644 xen/arch/x86/mm/hap/nested_ept.c

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index f63ee52..bd7314f 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1324,6 +1324,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
                                              access_r, access_w, access_x);
         switch (rv) {
         case NESTEDHVM_PAGEFAULT_DONE:
+        case NESTEDHVM_PAGEFAULT_RETRY:
             return 1;
         case NESTEDHVM_PAGEFAULT_L1_ERROR:
             /* An error occured while translating gpa from
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 53f6a4d..1d3090d 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -939,9 +939,18 @@ static void sync_vvmcs_ro(struct vcpu *v)
 {
     int i;
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
+    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
+    void *vvmcs = nvcpu->nv_vvmcx;
 
     for ( i = 0; i < ARRAY_SIZE(vmcs_ro_field); i++ )
         shadow_to_vvmcs(nvcpu->nv_vvmcx, vmcs_ro_field[i]);
+
+    /* Adjust exit_reason/exit_qualifciation for violation case */
+    if ( __get_vvmcs(vvmcs, VM_EXIT_REASON) == EXIT_REASON_EPT_VIOLATION )
+    {
+        __set_vvmcs(vvmcs, EXIT_QUALIFICATION, nvmx->ept_exit.exit_qual);
+        __set_vvmcs(vvmcs, VM_EXIT_REASON, nvmx->ept_exit.exit_reason);
+    }
 }
 
 static void load_vvmcs_host_state(struct vcpu *v)
@@ -1488,8 +1497,37 @@ nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
                      unsigned int *page_order,
                      bool_t access_r, bool_t access_w, bool_t access_x)
 {
-    /*TODO:*/
-    return 0;
+    int rc;
+    unsigned long gfn;
+    uint64_t exit_qual = __vmread(EXIT_QUALIFICATION);
+    uint32_t exit_reason = EXIT_REASON_EPT_VIOLATION;
+    uint32_t rwx_rights = (access_x << 2) | (access_w << 1) | access_r;
+    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
+
+    rc = nept_translate_l2ga(v, L2_gpa, page_order, rwx_rights, &gfn,
+                             &exit_qual, &exit_reason);
+    switch ( rc )
+    {
+    case EPT_TRANSLATE_SUCCEED:
+        *L1_gpa = (gfn << PAGE_SHIFT) + (L2_gpa & ~PAGE_MASK);
+        rc = NESTEDHVM_PAGEFAULT_DONE;
+        break;
+    case EPT_TRANSLATE_VIOLATION:
+    case EPT_TRANSLATE_MISCONFIG:
+        rc = NESTEDHVM_PAGEFAULT_INJECT;
+        nvmx->ept_exit.exit_reason = exit_reason;
+        nvmx->ept_exit.exit_qual = exit_qual;
+        break;
+    case EPT_TRANSLATE_RETRY:
+        rc = NESTEDHVM_PAGEFAULT_RETRY;
+        break;
+    default:
+        gdprintk(XENLOG_ERR, "GUEST EPT translation error!:%d\n", rc);
+        BUG();
+        break;
+    }
+
+    return rc;
 }
 
 void nvmx_idtv_handling(void)
diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
index 0f08fb0..1c165c6 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -88,18 +88,19 @@ static uint32_t set_ad_bits(void *guest_p, void *walk_p, int set_dirty)
 
 /* If the map is non-NULL, we leave this function having
  * acquired an extra ref on mfn_to_page(*mfn) */
-static inline void *map_domain_gfn(struct p2m_domain *p2m,
-                                   gfn_t gfn,
+void *map_domain_gfn(struct p2m_domain *p2m,
+                     gfn_t gfn,
                      mfn_t *mfn,
                      p2m_type_t *p2mt,
-                     uint32_t *rc)
+                     p2m_query_t q,
+                     uint32_t *rc)
 {
     struct page_info *page;
     void *map;
 
     /* Translate the gfn, unsharing if shared */
     page = get_page_from_gfn_p2m(p2m->domain, p2m, gfn_x(gfn), p2mt, NULL,
-                                 P2M_ALLOC | P2M_UNSHARE);
+                                 q);
     if ( p2m_is_paging(*p2mt) )
     {
         ASSERT(!p2m_is_nestedp2m(p2m));
@@ -128,7 +129,6 @@ static inline void *map_domain_gfn(struct p2m_domain *p2m,
     return map;
 }
 
-
 /* Walk the guest pagetables, after the manner of a hardware walker. */
 /* Because the walk is essentially random, it can cause a deadlock
  * warning in the p2m locking code. Highly unlikely this is an actual
@@ -149,6 +149,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
     uint32_t gflags, mflags, iflags, rc = 0;
     int smep;
     bool_t pse1G = 0, pse2M = 0;
+    p2m_query_t qt = P2M_ALLOC | P2M_UNSHARE;
 
     perfc_incr(guest_walk);
     memset(gw, 0, sizeof(*gw));
@@ -188,7 +189,8 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
     l3p = map_domain_gfn(p2m,
                          guest_l4e_get_gfn(gw->l4e),
                          &gw->l3mfn,
-                         &p2mt,
+                         &p2mt,
+                         qt,
                          &rc);
     if(l3p == NULL)
         goto out;
@@ -249,6 +251,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
                          guest_l3e_get_gfn(gw->l3e),
                          &gw->l2mfn,
                          &p2mt,
+                         qt,
                          &rc);
     if(l2p == NULL)
         goto out;
@@ -322,6 +325,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
                          guest_l2e_get_gfn(gw->l2e),
                          &gw->l1mfn,
                          &p2mt,
+                         qt,
                          &rc);
     if(l1p == NULL)
         goto out;
diff --git a/xen/arch/x86/mm/hap/Makefile b/xen/arch/x86/mm/hap/Makefile
index 80a6bec..68f2bb5 100644
--- a/xen/arch/x86/mm/hap/Makefile
+++ b/xen/arch/x86/mm/hap/Makefile
@@ -3,6 +3,7 @@ obj-y += guest_walk_2level.o
 obj-y += guest_walk_3level.o
 obj-$(x86_64) += guest_walk_4level.o
 obj-y += nested_hap.o
+obj-y += nested_ept.o
 
 guest_walk_%level.o: guest_walk.c Makefile
 	$(CC) $(CFLAGS) -DGUEST_PAGING_LEVELS=$* -c $< -o $@
diff --git a/xen/arch/x86/mm/hap/nested_ept.c b/xen/arch/x86/mm/hap/nested_ept.c
new file mode 100644
index 0000000..1463d81
--- /dev/null
+++ b/xen/arch/x86/mm/hap/nested_ept.c
@@ -0,0 +1,287 @@
+/*
+ * nested_ept.c: Handling virtulized EPT for guest in nested case.
+ *
+ * Copyright (c) 2012, Intel Corporation
+ * Xiantao Zhang <xiantao.zhang@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+#include <asm/domain.h>
+#include <asm/page.h>
+#include <asm/paging.h>
+#include <asm/p2m.h>
+#include <asm/mem_event.h>
+#include <public/mem_event.h>
+#include <asm/mem_sharing.h>
+#include <xen/event.h>
+#include <asm/hap.h>
+#include <asm/hvm/support.h>
+
+#include <asm/hvm/nestedhvm.h>
+
+#include "private.h"
+
+#include <asm/hvm/vmx/vmx.h>
+#include <asm/hvm/vmx/vvmx.h>
+
+/* EPT always use 4-level paging structure */
+#define GUEST_PAGING_LEVELS 4
+#include <asm/guest_pt.h>
+
+/* Must reserved bits in all level entries */
+#define EPT_MUST_RSV_BITS (((1ull << PADDR_BITS) -1) & \
+                           ~((1ull << paddr_bits) - 1))
+
+/*
+ *TODO: Just leave it as 0 here for compile pass, will
+ * define real capabilities in the subsequent patches.
+ */
+#define NEPT_VPID_CAP_BITS 0
+
+
+#define NEPT_1G_ENTRY_FLAG (1 << 11)
+#define NEPT_2M_ENTRY_FLAG (1 << 10)
+#define NEPT_4K_ENTRY_FLAG (1 << 9)
+
+bool_t nept_sp_entry(ept_entry_t e)
+{
+    return !!(e.sp);
+}
+
+static bool_t nept_rsv_bits_check(ept_entry_t e, uint32_t level)
+{
+    uint64_t rsv_bits = EPT_MUST_RSV_BITS;
+
+    switch ( level )
+    {
+    case 1:
+        break;
+    case 2 ... 3:
+        if ( nept_sp_entry(e) )
+            rsv_bits |= ((1ull << (9 * (level -1 ))) -1) << PAGE_SHIFT;
+        else
+            rsv_bits |= EPTE_EMT_MASK | EPTE_IGMT_MASK;
+        break;
+    case 4:
+        rsv_bits |= EPTE_EMT_MASK | EPTE_IGMT_MASK | EPTE_SUPER_PAGE_MASK;
+        break;
+    default:
+        gdprintk(XENLOG_ERR,"Unsupported EPT paging level: %d\n", level);
+        BUG();
+        break;
+    }
+    return !!(e.epte & rsv_bits);
+}
+
+/* EMT checking*/
+static bool_t nept_emt_bits_check(ept_entry_t e, uint32_t level)
+{
+    if ( e.sp || level == 1 )
+    {
+        if ( e.emt == EPT_EMT_RSV0 || e.emt == EPT_EMT_RSV1 ||
+             e.emt == EPT_EMT_RSV2 )
+            return 1;
+    }
+    return 0;
+}
+
+static bool_t nept_permission_check(uint32_t rwx_acc, uint32_t rwx_bits)
+{
+    return !(EPTE_RWX_MASK & rwx_acc & ~rwx_bits);
+}
+
+/* nept's non-present check */
+static bool_t nept_non_present_check(ept_entry_t e)
+{
+    if ( e.epte & EPTE_RWX_MASK )
+        return 0;
+    return 1;
+}
+
+uint64_t nept_get_ept_vpid_cap(void)
+{
+    uint64_t caps = NEPT_VPID_CAP_BITS;
+
+    if ( !cpu_has_vmx_ept_exec_only_supported )
+        caps &= ~VMX_EPT_EXEC_ONLY_SUPPORTED;
+    return caps;
+}
+
+static bool_t nept_rwx_bits_check(ept_entry_t e)
+{
+    /*write only or write/execute only*/
+    uint8_t rwx_bits = e.epte & EPTE_RWX_MASK;
+
+    if ( rwx_bits == ept_access_w || rwx_bits == ept_access_wx )
+        return 1;
+
+    if ( rwx_bits == ept_access_x && !(nept_get_ept_vpid_cap() &
+                                       VMX_EPT_EXEC_ONLY_SUPPORTED) )
+        return 1;
+
+    return 0;
+}
+
+/* nept's misconfiguration check */
+static bool_t nept_misconfiguration_check(ept_entry_t e, uint32_t level)
+{
+    return (nept_rsv_bits_check(e, level) ||
+            nept_emt_bits_check(e, level) ||
+            nept_rwx_bits_check(e));
+}
+
+static int ept_lvl_table_offset(unsigned long gpa, int lvl)
+{
+    return (gpa >>(EPT_L4_PAGETABLE_SHIFT -(4 - lvl) * 9)) &
+           (EPT_PAGETABLE_ENTRIES -1 );
+}
+
+static uint32_t
+nept_walk_tables(struct vcpu *v, unsigned long l2ga, ept_walk_t *gw)
+{
+    int lvl;
+    p2m_type_t p2mt;
+    uint32_t rc = 0, ret = 0, gflags;
+    struct domain *d = v->domain;
+    struct p2m_domain *p2m = d->arch.p2m;
+    gfn_t base_gfn = _gfn(nhvm_vcpu_p2m_base(v) >> PAGE_SHIFT);
+    mfn_t lxmfn;
+    ept_entry_t *lxp = NULL;
+
+    memset(gw, 0, sizeof(*gw));
+
+    for (lvl = 4; lvl > 0; lvl--)
+    {
+        lxp = map_domain_gfn(p2m, base_gfn, &lxmfn, &p2mt, P2M_ALLOC, &rc);
+        if ( !lxp )
+            goto map_err;
+        gw->lxe[lvl] = lxp[ept_lvl_table_offset(l2ga, lvl)];
+        unmap_domain_page(lxp);
+        put_page(mfn_to_page(mfn_x(lxmfn)));
+
+        if ( nept_non_present_check(gw->lxe[lvl]) )
+            goto non_present;
+
+        if ( nept_misconfiguration_check(gw->lxe[lvl], lvl) )
+            goto misconfig_err;
+
+        if ( (lvl == 2 || lvl == 3) && nept_sp_entry(gw->lxe[lvl]) )
+        {
+            /* Generate a fake l1 table entry so callers don't all
+             * have to understand superpages. */
+            unsigned long gfn_lvl_mask = (1ull << ((lvl - 1) * 9)) - 1;
+            gfn_t start = _gfn(gw->lxe[lvl].mfn);
+            /* Increment the pfn by the right number of 4k pages. */
+            start = _gfn((gfn_x(start) & ~gfn_lvl_mask) +
+                         ((l2ga >> PAGE_SHIFT) & gfn_lvl_mask));
+            gflags = (gw->lxe[lvl].epte & EPTE_FLAG_MASK) |
+                     (lvl == 3 ? NEPT_1G_ENTRY_FLAG: NEPT_2M_ENTRY_FLAG);
+            gw->lxe[0].epte = (gfn_x(start) << PAGE_SHIFT) | gflags;
+            goto done;
+        }
+        if ( lvl > 1 )
+            base_gfn = _gfn(gw->lxe[lvl].mfn);
+    }
+
+    /* If this is not a super entry, we can reach here. */
+    gflags = (gw->lxe[1].epte & EPTE_FLAG_MASK) | NEPT_4K_ENTRY_FLAG;
+    gw->lxe[0].epte = (gw->lxe[1].epte & PAGE_MASK) | gflags;
+
+done:
+    ret = EPT_TRANSLATE_SUCCEED;
+    goto out;
+
+map_err:
+    if ( rc == _PAGE_PAGED )
+    {
+        ret = EPT_TRANSLATE_RETRY;
+        goto out;
+    }
+    /* fall through to misconfig error */
+misconfig_err:
+    ret = EPT_TRANSLATE_MISCONFIG;
+    goto out;
+
+non_present:
+    ret = EPT_TRANSLATE_VIOLATION;
+    /* fall through. */
+out:
+    return ret;
+}
+
+/* Translate a L2 guest address to L1 gpa via L1 EPT paging structure */
+
+int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga,
+                        unsigned int *page_order, uint32_t rwx_acc,
+                        unsigned long *l1gfn, uint64_t *exit_qual,
+                        uint32_t *exit_reason)
+{
+    uint32_t rc, rwx_bits = 0;
+    ept_walk_t gw;
+    rwx_acc &= EPTE_RWX_MASK;
+
+    *l1gfn = INVALID_GFN;
+
+    rc = nept_walk_tables(v, l2ga, &gw);
+    switch ( rc )
+    {
+    case EPT_TRANSLATE_SUCCEED:
+        if ( likely(gw.lxe[0].epte & NEPT_2M_ENTRY_FLAG) )
+        {
+            rwx_bits = gw.lxe[4].epte & gw.lxe[3].epte & gw.lxe[2].epte &
+                       EPTE_RWX_MASK;
+            *page_order = 9;
+        }
+        else if ( gw.lxe[0].epte & NEPT_4K_ENTRY_FLAG )
+        {
+            rwx_bits = gw.lxe[4].epte & gw.lxe[3].epte & gw.lxe[2].epte &
+                       gw.lxe[1].epte & EPTE_RWX_MASK;
+            *page_order = 0;
+        }
+        else if ( gw.lxe[0].epte & NEPT_1G_ENTRY_FLAG )
+        {
+            rwx_bits = gw.lxe[4].epte & gw.lxe[3].epte & EPTE_RWX_MASK;
+            *page_order = 18;
+        }
+        else
+        {
+            gdprintk(XENLOG_ERR, "Uncorrect l1 entry!\n");
+            BUG();
+        }
+        if ( nept_permission_check(rwx_acc, rwx_bits) )
+        {
+            *l1gfn = gw.lxe[0].mfn;
+            break;
+        }
+        rc = EPT_TRANSLATE_VIOLATION;
+        /* Fall through to EPT violation if permission check fails.
*/ + case EPT_TRANSLATE_VIOLATION: + *exit_qual = (*exit_qual & 0xffffffc0) | (rwx_bits << 3) | rwx_acc; + *exit_reason = EXIT_REASON_EPT_VIOLATION; + break; + + case EPT_TRANSLATE_MISCONFIG: + rc = EPT_TRANSLATE_MISCONFIG; + *exit_qual = 0; + *exit_reason = EXIT_REASON_EPT_MISCONFIG; + break; + case EPT_TRANSLATE_RETRY: + break; + default: + gdprintk(XENLOG_ERR, "Unsupported ept translation type!:%d\n", rc); + BUG(); + break; + } + return rc; +} diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c index 8787c91..6d1264b 100644 --- a/xen/arch/x86/mm/hap/nested_hap.c +++ b/xen/arch/x86/mm/hap/nested_hap.c @@ -217,7 +217,7 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa, /* let caller to handle these two cases */ switch (rv) { case NESTEDHVM_PAGEFAULT_INJECT: - return rv; + case NESTEDHVM_PAGEFAULT_RETRY: case NESTEDHVM_PAGEFAULT_L1_ERROR: return rv; case NESTEDHVM_PAGEFAULT_DONE: diff --git a/xen/include/asm-x86/guest_pt.h b/xen/include/asm-x86/guest_pt.h index 4e1dda0..db8a0b6 100644 --- a/xen/include/asm-x86/guest_pt.h +++ b/xen/include/asm-x86/guest_pt.h @@ -315,6 +315,14 @@ guest_walk_to_page_order(walk_t *gw) #define GPT_RENAME2(_n, _l) _n ## _ ## _l ## _levels #define GPT_RENAME(_n, _l) GPT_RENAME2(_n, _l) #define guest_walk_tables GPT_RENAME(guest_walk_tables, GUEST_PAGING_LEVELS) +#define map_domain_gfn GPT_RENAME(map_domain_gfn, GUEST_PAGING_LEVELS) + +extern void *map_domain_gfn(struct p2m_domain *p2m, + gfn_t gfn, + mfn_t *mfn, + p2m_type_t *p2mt, + p2m_query_t q, + uint32_t *rc); extern uint32_t guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m, unsigned long va, diff --git a/xen/include/asm-x86/hvm/nestedhvm.h b/xen/include/asm-x86/hvm/nestedhvm.h index 91fde0b..649c511 100644 --- a/xen/include/asm-x86/hvm/nestedhvm.h +++ b/xen/include/asm-x86/hvm/nestedhvm.h @@ -52,6 +52,7 @@ bool_t nestedhvm_vcpu_in_guestmode(struct vcpu *v); #define NESTEDHVM_PAGEFAULT_L1_ERROR 2 #define 
NESTEDHVM_PAGEFAULT_L0_ERROR 3 #define NESTEDHVM_PAGEFAULT_MMIO 4 +#define NESTEDHVM_PAGEFAULT_RETRY 5 int nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa, bool_t access_r, bool_t access_w, bool_t access_x); diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h index ef2c9c9..9a728b6 100644 --- a/xen/include/asm-x86/hvm/vmx/vmcs.h +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h @@ -194,6 +194,7 @@ extern u32 vmx_secondary_exec_control; extern bool_t cpu_has_vmx_ins_outs_instr_info; +#define VMX_EPT_EXEC_ONLY_SUPPORTED 0x00000001 #define VMX_EPT_WALK_LENGTH_4_SUPPORTED 0x00000040 #define VMX_EPT_MEMORY_TYPE_UC 0x00000100 #define VMX_EPT_MEMORY_TYPE_WB 0x00004000 diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h index aa5b080..c73946f 100644 --- a/xen/include/asm-x86/hvm/vmx/vmx.h +++ b/xen/include/asm-x86/hvm/vmx/vmx.h @@ -51,6 +51,22 @@ typedef union { u64 epte; } ept_entry_t; +typedef struct { + /*use lxe[0] to save result */ + ept_entry_t lxe[5]; +} ept_walk_t; + +typedef enum { + ept_access_n = 0, /* No access permissions allowed */ + ept_access_r = 1, /* Read only */ + ept_access_w = 2, /* Write only */ + ept_access_rw = 3, /* Read & Write */ + ept_access_x = 4, /* Exec Only */ + ept_access_rx = 5, /* Read & Exec */ + ept_access_wx = 6, /* Write & Exec*/ + ept_access_all = 7, /* Full permissions */ +} ept_access_t; + #define EPT_TABLE_ORDER 9 #define EPTE_SUPER_PAGE_MASK 0x80 #define EPTE_MFN_MASK 0xffffffffff000ULL @@ -60,6 +76,17 @@ typedef union { #define EPTE_AVAIL1_SHIFT 8 #define EPTE_EMT_SHIFT 3 #define EPTE_IGMT_SHIFT 6 +#define EPTE_RWX_MASK 0x7 +#define EPTE_FLAG_MASK 0x7f + +#define EPT_EMT_UC 0 +#define EPT_EMT_WC 1 +#define EPT_EMT_RSV0 2 +#define EPT_EMT_RSV1 3 +#define EPT_EMT_WT 4 +#define EPT_EMT_WP 5 +#define EPT_EMT_WB 6 +#define EPT_EMT_RSV2 7 void vmx_asm_vmexit_handler(struct cpu_user_regs); void vmx_asm_do_vmentry(void); @@ -191,6 +218,9 @@ void 
vmx_update_secondary_exec_control(struct vcpu *v); extern u64 vmx_ept_vpid_cap; +#define cpu_has_vmx_ept_exec_only_supported \ + (vmx_ept_vpid_cap & VMX_EPT_EXEC_ONLY_SUPPORTED) + #define cpu_has_vmx_ept_wl4_supported \ (vmx_ept_vpid_cap & VMX_EPT_WALK_LENGTH_4_SUPPORTED) #define cpu_has_vmx_ept_mt_uc \ @@ -419,6 +449,7 @@ void update_guest_eip(void); #define _EPT_GLA_FAULT 8 #define EPT_GLA_FAULT (1UL<<_EPT_GLA_FAULT) +#define EPT_L4_PAGETABLE_SHIFT 39 #define EPT_PAGETABLE_ENTRIES 512 #endif /* __ASM_X86_HVM_VMX_VMX_H__ */ diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h index 422f006..97554bf 100644 --- a/xen/include/asm-x86/hvm/vmx/vvmx.h +++ b/xen/include/asm-x86/hvm/vmx/vvmx.h @@ -32,6 +32,10 @@ struct nestedvmx { unsigned long intr_info; u32 error_code; } intr; + struct { + uint32_t exit_reason; + uint32_t exit_qual; + } ept_exit; }; #define vcpu_2_nvmx(v) (vcpu_nestedhvm(v).u.nvmx) @@ -109,6 +113,11 @@ void nvmx_domain_relinquish_resources(struct domain *d); int nvmx_handle_vmxon(struct cpu_user_regs *regs); int nvmx_handle_vmxoff(struct cpu_user_regs *regs); +#define EPT_TRANSLATE_SUCCEED 0 +#define EPT_TRANSLATE_VIOLATION 1 +#define EPT_TRANSLATE_MISCONFIG 2 +#define EPT_TRANSLATE_RETRY 3 + int nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa, unsigned int *page_order, @@ -192,5 +201,9 @@ u64 nvmx_get_tsc_offset(struct vcpu *v); int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs, unsigned int exit_reason); +int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga, + unsigned int *page_order, uint32_t rwx_acc, + unsigned long *l1gfn, uint64_t *exit_qual, + uint32_t *exit_reason); #endif /* __ASM_X86_HVM_VVMX_H__ */ -- 1.7.1
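As a reading aid for the walker in this patch, here is a minimal standalone sketch (not part of the series) of the arithmetic it relies on: the per-level table index, the 4K frame picked out of a 2M/1G superpage, the rwx permission and misconfiguration tests, and how the exit qualification is composed. All `demo_` functions and `DEMO_` constants are illustrative stand-ins for the Xen names, and the sketch assumes 4K pages and a 4-level EPT.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_L4_SHIFT   39    /* stands in for EPT_L4_PAGETABLE_SHIFT */
#define DEMO_ENTRIES    512   /* stands in for EPT_PAGETABLE_ENTRIES */
#define DEMO_RWX_MASK   0x7u  /* stands in for EPTE_RWX_MASK */

/* Index into the level-`lvl` table (4 = top, 1 = leaf) for address `gpa`. */
static unsigned int demo_table_offset(uint64_t gpa, int lvl)
{
    assert(lvl >= 1 && lvl <= 4);
    return (gpa >> (DEMO_L4_SHIFT - (4 - lvl) * 9)) & (DEMO_ENTRIES - 1);
}

/* 4K frame inside a superpage mapped at level `lvl` (2 = 2M, 3 = 1G). */
static uint64_t demo_superpage_gfn(uint64_t base_gfn, uint64_t l2ga, int lvl)
{
    uint64_t mask;

    assert(lvl == 2 || lvl == 3);
    mask = (1ull << ((lvl - 1) * 9)) - 1;
    return (base_gfn & ~mask) + ((l2ga >> DEMO_PAGE_SHIFT) & mask);
}

/* Nonzero when every requested access bit is granted by `rwx_bits`. */
static int demo_permission_ok(uint32_t rwx_acc, uint32_t rwx_bits)
{
    return !(DEMO_RWX_MASK & rwx_acc & ~rwx_bits);
}

/*
 * Write-only (2) and write/execute-only (6) are always misconfigurations;
 * execute-only (4) is one unless the capability is exposed.
 */
static int demo_rwx_misconfig(uint32_t rwx, int exec_only_supported)
{
    rwx &= DEMO_RWX_MASK;
    if ( rwx == 2 || rwx == 6 )
        return 1;
    if ( rwx == 4 && !exec_only_supported )
        return 1;
    return 0;
}

/* Bits 2:0 carry the attempted access, bits 5:3 the granted permissions. */
static uint64_t demo_exit_qual(uint64_t old_qual, uint32_t rwx_acc,
                               uint32_t rwx_bits)
{
    return (old_qual & 0xffffffc0) | ((uint64_t)rwx_bits << 3) | rwx_acc;
}
```

For example, a write (access 2) to a page whose combined permissions are read+exec (5) fails `demo_permission_ok`, which corresponds to the path that turns a successful walk into EPT_TRANSLATE_VIOLATION.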
Xiantao Zhang
2012-Dec-24 14:26 UTC
[PATCH v4 04/10] EPT: Make ept data structure or operations neutral
From: Zhang Xiantao <xiantao.zhang@intel.com>

Share the current EPT logic with the nested EPT case by making the related
data structures and operations neutral, so that they are common to EPT and
nested EPT.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c        |    8 ++-
 xen/arch/x86/hvm/vmx/vmx.c         |   53 +-------------
 xen/arch/x86/mm/p2m-ept.c          |  104 ++++++++++++++++++++++------
 xen/arch/x86/mm/p2m.c              |  132 +++++++++++++++++++++++++-----------
 xen/include/asm-x86/hvm/vmx/vmcs.h |   23 +++---
 xen/include/asm-x86/hvm/vmx/vmx.h  |   10 ++-
 xen/include/asm-x86/p2m.h          |    4 +
 7 files changed, 208 insertions(+), 126 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 9adc7a4..de22e03 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -942,7 +942,13 @@ static int construct_vmcs(struct vcpu *v)
     }
 
     if ( paging_mode_hap(d) )
-        __vmwrite(EPT_POINTER, d->arch.hvm_domain.vmx.ept_control.eptp);
+    {
+        struct p2m_domain *p2m = p2m_get_hostp2m(d);
+        struct ept_data *ept = &p2m->ept;
+
+        ept->asr = pagetable_get_pfn(p2m_get_pagetable(p2m));
+        __vmwrite(EPT_POINTER, ept_get_eptp(ept));
+    }
 
     if ( cpu_has_vmx_pat && paging_mode_hap(d) )
     {
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 4abfa90..d74aae0 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -74,38 +74,19 @@ static void vmx_fpu_dirty_intercept(void);
 static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content);
 static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
 static void vmx_invlpg_intercept(unsigned long vaddr);
-static void __ept_sync_domain(void *info);
 
 static int vmx_domain_initialise(struct domain *d)
 {
     int rc;
 
-    /* Set the memory type used when accessing EPT paging structures. */
-    d->arch.hvm_domain.vmx.ept_control.ept_mt = EPT_DEFAULT_MT;
-
-    /* set EPT page-walk length, now it's actual walk length - 1, i.e. 3 */
-    d->arch.hvm_domain.vmx.ept_control.ept_wl = 3;
-
-    d->arch.hvm_domain.vmx.ept_control.asr =
-        pagetable_get_pfn(p2m_get_pagetable(p2m_get_hostp2m(d)));
-
-    if ( !zalloc_cpumask_var(&d->arch.hvm_domain.vmx.ept_synced) )
-        return -ENOMEM;
-
     if ( (rc = vmx_alloc_vlapic_mapping(d)) != 0 )
-    {
-        free_cpumask_var(d->arch.hvm_domain.vmx.ept_synced);
         return rc;
-    }
 
     return 0;
 }
 
 static void vmx_domain_destroy(struct domain *d)
 {
-    if ( paging_mode_hap(d) )
-        on_each_cpu(__ept_sync_domain, d, 1);
-    free_cpumask_var(d->arch.hvm_domain.vmx.ept_synced);
     vmx_free_vlapic_mapping(d);
 }
 
@@ -641,6 +622,7 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
 {
     struct domain *d = v->domain;
     unsigned long old_cr4 = read_cr4(), new_cr4 = mmu_cr4_features;
+    struct ept_data *ept_data = &p2m_get_hostp2m(d)->ept;
 
     /* HOST_CR4 in VMCS is always mmu_cr4_features. Sync CR4 now. */
     if ( old_cr4 != new_cr4 )
@@ -650,10 +632,10 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
     {
         unsigned int cpu = smp_processor_id();
         /* Test-and-test-and-set this CPU in the EPT-is-synced mask. */
-        if ( !cpumask_test_cpu(cpu, d->arch.hvm_domain.vmx.ept_synced) &&
+        if ( !cpumask_test_cpu(cpu, ept_get_synced_mask(ept_data)) &&
              !cpumask_test_and_set_cpu(cpu,
-                                       d->arch.hvm_domain.vmx.ept_synced) )
-            __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(d), 0);
+                                       ept_get_synced_mask(ept_data)) )
+            __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept_data), 0);
     }
 
     vmx_restore_guest_msrs(v);
@@ -1216,33 +1198,6 @@ static void vmx_update_guest_efer(struct vcpu *v)
                    (v->arch.hvm_vcpu.guest_efer & EFER_SCE));
 }
 
-static void __ept_sync_domain(void *info)
-{
-    struct domain *d = info;
-    __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(d), 0);
-}
-
-void ept_sync_domain(struct domain *d)
-{
-    /* Only if using EPT and this domain has some VCPUs to dirty. */
-    if ( !paging_mode_hap(d) || !d->vcpu || !d->vcpu[0] )
-        return;
-
-    ASSERT(local_irq_is_enabled());
-
-    /*
-     * Flush active cpus synchronously. Flush others the next time this domain
-     * is scheduled onto them. We accept the race of other CPUs adding to
-     * the ept_synced mask before on_selected_cpus() reads it, resulting in
-     * unnecessary extra flushes, to avoid allocating a cpumask_t on the stack.
-     */
-    cpumask_and(d->arch.hvm_domain.vmx.ept_synced,
-                d->domain_dirty_cpumask, &cpu_online_map);
-
-    on_selected_cpus(d->arch.hvm_domain.vmx.ept_synced,
-                     __ept_sync_domain, d, 1);
-}
-
 void nvmx_enqueue_n2_exceptions(struct vcpu *v,
             unsigned long intr_fields, int error_code)
 {
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index c964f54..e33f415 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -291,9 +291,11 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
     int need_modify_vtd_table = 1;
     int vtd_pte_present = 0;
     int needs_sync = 1;
-    struct domain *d = p2m->domain;
     ept_entry_t old_entry = { .epte = 0 };
+    struct ept_data *ept = &p2m->ept;
+    struct domain *d = p2m->domain;
 
+    ASSERT(ept);
     /*
      * the caller must make sure:
      * 1. passing valid gfn and mfn at order boundary.
@@ -301,17 +303,17 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
      * 3. passing a valid order.
      */
     if ( ((gfn | mfn_x(mfn)) & ((1UL << order) - 1)) ||
-         ((u64)gfn >> ((ept_get_wl(d) + 1) * EPT_TABLE_ORDER)) ||
+         ((u64)gfn >> ((ept_get_wl(ept) + 1) * EPT_TABLE_ORDER)) ||
          (order % EPT_TABLE_ORDER) )
         return 0;
 
-    ASSERT((target == 2 && hvm_hap_has_1gb(d)) ||
-           (target == 1 && hvm_hap_has_2mb(d)) ||
+    ASSERT((target == 2 && hvm_hap_has_1gb()) ||
+           (target == 1 && hvm_hap_has_2mb()) ||
            (target == 0));
 
-    table = map_domain_page(ept_get_asr(d));
+    table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
 
-    for ( i = ept_get_wl(d); i > target; i-- )
+    for ( i = ept_get_wl(ept); i > target; i-- )
     {
         ret = ept_next_level(p2m, 0, &table, &gfn_remainder, i);
         if ( !ret )
@@ -439,9 +441,11 @@ out:
     unmap_domain_page(table);
 
     if ( needs_sync )
-        ept_sync_domain(p2m->domain);
+        ept_sync_domain(p2m);
 
-    if ( rv && iommu_enabled && need_iommu(p2m->domain) && need_modify_vtd_table )
+    /* For non-nested p2m, may need to change VT-d page table.*/
+    if ( rv && !p2m_is_nestedp2m(p2m) && iommu_enabled && need_iommu(p2m->domain) &&
+         need_modify_vtd_table )
     {
         if ( iommu_hap_pt_share )
             iommu_pte_flush(d, gfn, (u64*)ept_entry, order, vtd_pte_present);
@@ -488,14 +492,14 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
                            unsigned long gfn, p2m_type_t *t, p2m_access_t* a,
                            p2m_query_t q, unsigned int *page_order)
 {
-    struct domain *d = p2m->domain;
-    ept_entry_t *table = map_domain_page(ept_get_asr(d));
+    ept_entry_t *table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
     unsigned long gfn_remainder = gfn;
     ept_entry_t *ept_entry;
     u32 index;
     int i;
     int ret = 0;
     mfn_t mfn = _mfn(INVALID_MFN);
+    struct ept_data *ept = &p2m->ept;
 
     *t = p2m_mmio_dm;
     *a = p2m_access_n;
@@ -506,7 +510,7 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
 
     /* Should check if gfn obeys GAW here. */
 
-    for ( i = ept_get_wl(d); i > 0; i-- )
+    for ( i = ept_get_wl(ept); i > 0; i-- )
     {
     retry:
         ret = ept_next_level(p2m, 1, &table, &gfn_remainder, i);
@@ -588,19 +592,20 @@ out:
 static ept_entry_t ept_get_entry_content(struct p2m_domain *p2m,
     unsigned long gfn, int *level)
 {
-    ept_entry_t *table = map_domain_page(ept_get_asr(p2m->domain));
+    ept_entry_t *table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
     unsigned long gfn_remainder = gfn;
     ept_entry_t *ept_entry;
     ept_entry_t content = { .epte = 0 };
     u32 index;
     int i;
     int ret=0;
+    struct ept_data *ept = &p2m->ept;
 
     /* This pfn is higher than the highest the p2m map currently holds */
     if ( gfn > p2m->max_mapped_pfn )
         goto out;
 
-    for ( i = ept_get_wl(p2m->domain); i > 0; i-- )
+    for ( i = ept_get_wl(ept); i > 0; i-- )
     {
         ret = ept_next_level(p2m, 1, &table, &gfn_remainder, i);
         if ( !ret || ret == GUEST_TABLE_POD_PAGE )
@@ -622,7 +627,8 @@ static ept_entry_t ept_get_entry_content(struct p2m_domain *p2m,
 void ept_walk_table(struct domain *d, unsigned long gfn)
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
-    ept_entry_t *table = map_domain_page(ept_get_asr(d));
+    struct ept_data *ept = &p2m->ept;
+    ept_entry_t *table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
     unsigned long gfn_remainder = gfn;
 
     int i;
@@ -638,7 +644,7 @@ void ept_walk_table(struct domain *d, unsigned long gfn)
         goto out;
     }
 
-    for ( i = ept_get_wl(d); i >= 0; i-- )
+    for ( i = ept_get_wl(ept); i >= 0; i-- )
     {
         ept_entry_t *ept_entry, *next;
         u32 index;
@@ -778,24 +784,76 @@ static void ept_change_entry_type_page(mfn_t ept_page_mfn, int ept_page_level,
 static void ept_change_entry_type_global(struct p2m_domain *p2m,
                                          p2m_type_t ot, p2m_type_t nt)
 {
-    struct domain *d = p2m->domain;
-    if ( ept_get_asr(d) == 0 )
+    struct ept_data *ept = &p2m->ept;
+    if ( ept_get_asr(ept) == 0 )
         return;
 
     BUG_ON(p2m_is_grant(ot) || p2m_is_grant(nt));
     BUG_ON(ot != nt && (ot == p2m_mmio_direct || nt == p2m_mmio_direct));
 
-    ept_change_entry_type_page(_mfn(ept_get_asr(d)), ept_get_wl(d), ot, nt);
+    ept_change_entry_type_page(_mfn(ept_get_asr(ept)),
+                               ept_get_wl(ept), ot, nt);
+
+    ept_sync_domain(p2m);
+}
+
+static void __ept_sync_domain(void *info)
+{
+    struct ept_data *ept = &((struct p2m_domain *)info)->ept;
 
-    ept_sync_domain(d);
+    __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept), 0);
 }
 
-void ept_p2m_init(struct p2m_domain *p2m)
+void ept_sync_domain(struct p2m_domain *p2m)
 {
+    struct domain *d = p2m->domain;
+    struct ept_data *ept = &p2m->ept;
+    /* Only if using EPT and this domain has some VCPUs to dirty. */
+    if ( !paging_mode_hap(d) || !d->vcpu || !d->vcpu[0] )
+        return;
+
+    ASSERT(local_irq_is_enabled());
+
+    /*
+     * Flush active cpus synchronously. Flush others the next time this domain
+     * is scheduled onto them. We accept the race of other CPUs adding to
+     * the ept_synced mask before on_selected_cpus() reads it, resulting in
+     * unnecessary extra flushes, to avoid allocating a cpumask_t on the stack.
+     */
+    cpumask_and(ept_get_synced_mask(ept),
+                d->domain_dirty_cpumask, &cpu_online_map);
+
+    on_selected_cpus(ept_get_synced_mask(ept),
+                     __ept_sync_domain, p2m, 1);
+}
+
+int ept_p2m_init(struct p2m_domain *p2m)
+{
+    struct ept_data *ept = &p2m->ept;
+
     p2m->set_entry = ept_set_entry;
     p2m->get_entry = ept_get_entry;
     p2m->change_entry_type_global = ept_change_entry_type_global;
     p2m->audit_p2m = NULL;
+
+    /* Set the memory type used when accessing EPT paging structures. */
+    ept->ept_mt = EPT_DEFAULT_MT;
+
+    /* set EPT page-walk length, now it's actual walk length - 1, i.e. 3 */
+    ept->ept_wl = 3;
+
+    if ( !zalloc_cpumask_var(&ept->synced_mask) )
+        return -ENOMEM;
+
+    on_each_cpu(__ept_sync_domain, p2m, 1);
+
+    return 0;
+}
+
+void ept_p2m_uninit(struct p2m_domain *p2m)
+{
+    struct ept_data *ept = &p2m->ept;
+    free_cpumask_var(ept->synced_mask);
 }
 
 static void ept_dump_p2m_table(unsigned char key)
@@ -811,6 +869,7 @@ static void ept_dump_p2m_table(unsigned char key)
     unsigned long gfn, gfn_remainder;
     unsigned long record_counter = 0;
     struct p2m_domain *p2m;
+    struct ept_data *ept;
 
     for_each_domain(d)
     {
@@ -818,15 +877,16 @@ static void ept_dump_p2m_table(unsigned char key)
             continue;
 
         p2m = p2m_get_hostp2m(d);
+        ept = &p2m->ept;
         printk("\ndomain%d EPT p2m table: \n", d->domain_id);
 
         for ( gfn = 0; gfn <= p2m->max_mapped_pfn; gfn += (1 << order) )
         {
             gfn_remainder = gfn;
             mfn = _mfn(INVALID_MFN);
-            table = map_domain_page(ept_get_asr(d));
+            table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
 
-            for ( i = ept_get_wl(d); i > 0; i-- )
+            for ( i = ept_get_wl(ept); i > 0; i-- )
             {
                 ret = ept_next_level(p2m, 1, &table, &gfn_remainder, i);
                 if ( ret != GUEST_TABLE_NORMAL_PAGE )
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 41a461b..49eb8af 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -57,8 +57,10 @@ boolean_param("hap_2mb", opt_hap_2mb);
 
 
 /* Init the datastructures for later use by the p2m code */
-static void p2m_initialise(struct domain *d, struct p2m_domain *p2m)
+static int p2m_initialise(struct domain *d, struct p2m_domain *p2m)
 {
+    int ret = 0;
+
     mm_rwlock_init(&p2m->lock);
     mm_lock_init(&p2m->pod.lock);
     INIT_LIST_HEAD(&p2m->np2m_list);
@@ -72,27 +74,81 @@ static void p2m_initialise(struct domain *d, struct p2m_domain *p2m)
     p2m->np2m_base = P2M_BASE_EADDR;
 
     if ( hap_enabled(d) && cpu_has_vmx )
-        ept_p2m_init(p2m);
+        ret = ept_p2m_init(p2m);
     else
         p2m_pt_init(p2m);
 
-    return;
+    return ret;
+}
+
+static struct p2m_domain *p2m_init_one(struct domain *d)
+{
+    struct p2m_domain *p2m = xzalloc(struct p2m_domain);
+
+    if ( !p2m )
+        return NULL;
+
+    if ( !zalloc_cpumask_var(&p2m->dirty_cpumask) )
+        goto free_p2m;
+
+    if ( p2m_initialise(d, p2m) )
+        goto free_cpumask;
+    return p2m;
+
+free_cpumask:
+    free_cpumask_var(p2m->dirty_cpumask);
+free_p2m:
+    xfree(p2m);
+    return NULL;
 }
 
-static int
-p2m_init_nestedp2m(struct domain *d)
+static void p2m_free_one(struct p2m_domain *p2m)
+{
+    if ( hap_enabled(p2m->domain) && cpu_has_vmx )
+        ept_p2m_uninit(p2m);
+    free_cpumask_var(p2m->dirty_cpumask);
+    xfree(p2m);
+}
+
+static int p2m_init_hostp2m(struct domain *d)
+{
+    struct p2m_domain *p2m = p2m_init_one(d);
+
+    if ( p2m )
+    {
+        d->arch.p2m = p2m;
+        return 0;
+    }
+    return -ENOMEM;
+}
+
+static void p2m_teardown_hostp2m(struct domain *d)
+{
+    /* Iterate over all p2m tables per domain */
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+
+    if ( p2m ) {
+        p2m_free_one(p2m);
+        d->arch.p2m = NULL;
+    }
+}
+
+static void p2m_teardown_nestedp2m(struct domain *d);
+
+static int p2m_init_nestedp2m(struct domain *d)
 {
     uint8_t i;
     struct p2m_domain *p2m;
 
     mm_lock_init(&d->arch.nested_p2m_lock);
-    for (i = 0; i < MAX_NESTEDP2M; i++) {
-        d->arch.nested_p2m[i] = p2m = xzalloc(struct p2m_domain);
-        if (p2m == NULL)
-            return -ENOMEM;
-        if ( !zalloc_cpumask_var(&p2m->dirty_cpumask) )
+    for (i = 0; i < MAX_NESTEDP2M; i++)
+    {
+        d->arch.nested_p2m[i] = p2m = p2m_init_one(d);
+        if ( p2m == NULL )
+        {
+            p2m_teardown_nestedp2m(d);
             return -ENOMEM;
-        p2m_initialise(d, p2m);
+        }
         p2m->write_p2m_entry = nestedp2m_write_p2m_entry;
         list_add(&p2m->np2m_list, &p2m_get_hostp2m(d)->np2m_list);
     }
@@ -100,27 +156,37 @@ p2m_init_nestedp2m(struct domain *d)
     return 0;
 }
 
-int p2m_init(struct domain *d)
+static void p2m_teardown_nestedp2m(struct domain *d)
 {
+    uint8_t i;
     struct p2m_domain *p2m;
-    int rc;
 
-    p2m_get_hostp2m(d) = p2m = xzalloc(struct p2m_domain);
-    if ( p2m == NULL )
-        return -ENOMEM;
-    if ( !zalloc_cpumask_var(&p2m->dirty_cpumask) )
+    for (i = 0; i < MAX_NESTEDP2M; i++)
     {
-        xfree(p2m);
-        return -ENOMEM;
+        if ( !d->arch.nested_p2m[i] )
+            continue;
+        p2m = d->arch.nested_p2m[i];
+        list_del(&p2m->np2m_list);
+        p2m_free_one(p2m);
+        d->arch.nested_p2m[i] = NULL;
     }
-    p2m_initialise(d, p2m);
+}
+
+int p2m_init(struct domain *d)
+{
+    int rc;
+
+    rc = p2m_init_hostp2m(d);
+    if ( rc )
+        return rc;
 
     /* Must initialise nestedp2m unconditionally
      * since nestedhvm_enabled(d) returns false here.
      * (p2m_init runs too early for HVM_PARAM_* options) */
     rc = p2m_init_nestedp2m(d);
-    if ( rc )
-        p2m_final_teardown(d);
+    if ( rc )
+        p2m_teardown_hostp2m(d);
+
     return rc;
 }
 
@@ -421,28 +487,12 @@ void p2m_teardown(struct p2m_domain *p2m)
     p2m_unlock(p2m);
 }
 
-static void p2m_teardown_nestedp2m(struct domain *d)
-{
-    uint8_t i;
-
-    for (i = 0; i < MAX_NESTEDP2M; i++) {
-        if ( !d->arch.nested_p2m[i] )
-            continue;
-        free_cpumask_var(d->arch.nested_p2m[i]->dirty_cpumask);
-        xfree(d->arch.nested_p2m[i]);
-        d->arch.nested_p2m[i] = NULL;
-    }
-}
-
 void p2m_final_teardown(struct domain *d)
 {
     /* Iterate over all p2m tables per domain */
-    if ( d->arch.p2m )
-    {
-        free_cpumask_var(d->arch.p2m->dirty_cpumask);
-        xfree(d->arch.p2m);
-        d->arch.p2m = NULL;
-    }
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    if ( p2m )
+        p2m_teardown_hostp2m(d);
 
     /* We must teardown unconditionally because
      * we initialise them unconditionally.
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 9a728b6..2d38b43 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -56,26 +56,27 @@ struct vmx_msr_state {
 
 #define EPT_DEFAULT_MT MTRR_TYPE_WRBACK
 
-struct vmx_domain {
-    unsigned long apic_access_mfn;
+struct ept_data{
     union {
-        struct {
+    struct {
             u64 ept_mt :3,
                 ept_wl :3,
                 rsvd :6,
                 asr :52;
         };
         u64 eptp;
-    } ept_control;
-    cpumask_var_t ept_synced;
+    };
+    cpumask_var_t synced_mask;
+};
+
+struct vmx_domain {
+    unsigned long apic_access_mfn;
 };
 
-#define ept_get_wl(d) \
-    ((d)->arch.hvm_domain.vmx.ept_control.ept_wl)
-#define ept_get_asr(d) \
-    ((d)->arch.hvm_domain.vmx.ept_control.asr)
-#define ept_get_eptp(d) \
-    ((d)->arch.hvm_domain.vmx.ept_control.eptp)
+#define ept_get_wl(ept) ((ept)->ept_wl)
+#define ept_get_asr(ept) ((ept)->asr)
+#define ept_get_eptp(ept) ((ept)->eptp)
+#define ept_get_synced_mask(ept) ((ept)->synced_mask)
 
 struct arch_vmx_struct {
     /* Virtual address of VMCS. */
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index c73946f..d4d6feb 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -363,7 +363,7 @@ static inline void ept_sync_all(void)
     __invept(INVEPT_ALL_CONTEXT, 0, 0);
 }
 
-void ept_sync_domain(struct domain *d);
+void ept_sync_domain(struct p2m_domain *p2m);
 
 static inline void vpid_sync_vcpu_gva(struct vcpu *v, unsigned long gva)
 {
@@ -425,12 +425,18 @@ void vmx_get_segment_register(struct vcpu *, enum x86_segment,
 void vmx_inject_extint(int trap);
 void vmx_inject_nmi(void);
 
-void ept_p2m_init(struct p2m_domain *p2m);
+int ept_p2m_init(struct p2m_domain *p2m);
+void ept_p2m_uninit(struct p2m_domain *p2m);
+
 void ept_walk_table(struct domain *d, unsigned long gfn);
 void setup_ept_dump(void);
 
 void update_guest_eip(void);
 
+int alloc_p2m_hap_data(struct p2m_domain *p2m);
+void free_p2m_hap_data(struct p2m_domain *p2m);
+void p2m_init_hap_data(struct p2m_domain *p2m);
+
 /* EPT violation qualifications definitions */
 #define _EPT_READ_VIOLATION 0
 #define EPT_READ_VIOLATION (1UL<<_EPT_READ_VIOLATION)
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index ce26594..b6a84b6 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -277,6 +277,10 @@ struct p2m_domain {
         mm_lock_t lock; /* Locking of private pod structs, *
                          * not relying on the p2m lock. */
     } pod;
+    union {
+        struct ept_data ept;
+        /* NPT-equivalent structure could be added here. */
+    };
 };
 
 /* get host p2m table */
-- 
1.7.1
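The new `struct ept_data` overlays the bit-fields `ept_mt`, `ept_wl`, `rsvd` and `asr` on the 64-bit `eptp` value. As a minimal sketch of that layout (not code from the patch), the same packing can be written with explicit shifts. The `demo_` helpers are hypothetical, and the sketch assumes the compiler allocates bit-fields from the least significant bit upward, as GCC does on x86; the value 6 for write-back is the usual MTRR encoding and is also an assumption here.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_EPT_MT_WB 6u  /* write-back memory type (assumed MTRR encoding) */

/*
 * Pack an EPT pointer:
 *   bits  2:0  memory type for the paging structures (ept_mt)
 *   bits  5:3  page-walk length minus one (ept_wl)
 *   bits 11:6  reserved, left as zero (rsvd)
 *   bits 63:12 frame number of the top-level table (asr)
 */
static uint64_t demo_make_eptp(unsigned int mt, unsigned int wl,
                               uint64_t asr_pfn)
{
    assert(mt < 8 && wl < 8);
    assert(asr_pfn < (1ull << 52));
    return (uint64_t)mt | ((uint64_t)wl << 3) | (asr_pfn << 12);
}

/* Accessors mirroring ept_get_wl() and ept_get_asr() on the packed value. */
static unsigned int demo_eptp_mt(uint64_t eptp)  { return eptp & 0x7; }
static unsigned int demo_eptp_wl(uint64_t eptp)  { return (eptp >> 3) & 0x7; }
static uint64_t     demo_eptp_asr(uint64_t eptp) { return eptp >> 12; }
```

With the values set in `ept_p2m_init()` (write-back type, walk length field 3), a top-level table at frame 0x1234 would pack to 0x123401e under these assumptions.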
Xiantao Zhang
2012-Dec-24 14:26 UTC
[PATCH v4 05/10] nEPT: Try to enable EPT paging for L2 guest.
From: Zhang Xiantao <xiantao.zhang@intel.com> Once found EPT is enabled by L1 VMM, enabled nested EPT support for L2 guest. Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com> Acked-by: Tim Deegan <tim@xen.org> --- xen/arch/x86/hvm/vmx/vmx.c | 16 +++++++++-- xen/arch/x86/hvm/vmx/vvmx.c | 48 +++++++++++++++++++++++++++-------- xen/include/asm-x86/hvm/vmx/vvmx.h | 5 +++- 3 files changed, 54 insertions(+), 15 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index d74aae0..ed8d532 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -1461,6 +1461,7 @@ static struct hvm_function_table __read_mostly vmx_function_table = { .nhvm_vcpu_guestcr3 = nvmx_vcpu_guestcr3, .nhvm_vcpu_p2m_base = nvmx_vcpu_eptp_base, .nhvm_vcpu_asid = nvmx_vcpu_asid, + .nhvm_vmcx_hap_enabled = nvmx_ept_enabled, .nhvm_vmcx_guest_intercepts_trap = nvmx_intercepts_exception, .nhvm_vcpu_vmexit_trap = nvmx_vmexit_trap, .nhvm_intr_blocked = nvmx_intr_blocked, @@ -2003,6 +2004,7 @@ static void ept_handle_violation(unsigned long qualification, paddr_t gpa) unsigned long gla, gfn = gpa >> PAGE_SHIFT; mfn_t mfn; p2m_type_t p2mt; + int ret; struct domain *d = current->domain; if ( tb_init_done ) @@ -2017,18 +2019,26 @@ static void ept_handle_violation(unsigned long qualification, paddr_t gpa) _d.gpa = gpa; _d.qualification = qualification; _d.mfn = mfn_x(get_gfn_query_unlocked(d, gfn, &_d.p2mt)); - + __trace_var(TRC_HVM_NPF, 0, sizeof(_d), &_d); } - if ( hvm_hap_nested_page_fault(gpa, + ret = hvm_hap_nested_page_fault(gpa, qualification & EPT_GLA_VALID ? 1 : 0, qualification & EPT_GLA_VALID ? __vmread(GUEST_LINEAR_ADDRESS) : ~0ull, qualification & EPT_READ_VIOLATION ? 1 : 0, qualification & EPT_WRITE_VIOLATION ? 1 : 0, - qualification & EPT_EXEC_VIOLATION ? 1 : 0) ) + qualification & EPT_EXEC_VIOLATION ? 
1 : 0);
+    switch ( ret ) {
+    case 0:         // Unhandled L1 EPT violation
+        break;
+    case 1:         // This violation is handled completely
         return;
+    case -1:        // This violation should be injected to L1 VMM
+        vcpu_nestedhvm(current).nv_vmexit_pending = 1;
+        return;
+    }
 
     /* Everything else is an error. */
     mfn = get_gfn_query_unlocked(d, gfn, &p2mt);
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 1d3090d..f9699dc 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -41,6 +41,7 @@ int nvmx_vcpu_initialise(struct vcpu *v)
         gdprintk(XENLOG_ERR, "nest: allocation for shadow vmcs failed\n");
         goto out;
     }
+    nvmx->ept.enabled = 0;
     nvmx->vmxon_region_pa = 0;
     nvcpu->nv_vvmcx = NULL;
     nvcpu->nv_vvmcxaddr = VMCX_EADDR;
@@ -96,9 +97,11 @@ uint64_t nvmx_vcpu_guestcr3(struct vcpu *v)
 
 uint64_t nvmx_vcpu_eptp_base(struct vcpu *v)
 {
-    /* TODO */
-    ASSERT(0);
-    return 0;
+    uint64_t eptp_base;
+    struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
+
+    eptp_base = __get_vvmcs(nvcpu->nv_vvmcx, EPT_POINTER);
+    return eptp_base & PAGE_MASK;
 }
 
 uint32_t nvmx_vcpu_asid(struct vcpu *v)
@@ -108,6 +111,13 @@ uint32_t nvmx_vcpu_asid(struct vcpu *v)
     return 0;
 }
 
+bool_t nvmx_ept_enabled(struct vcpu *v)
+{
+    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
+
+    return !!(nvmx->ept.enabled);
+}
+
 static const enum x86_segment sreg_to_index[] = {
     [VMX_SREG_ES] = x86_seg_es,
     [VMX_SREG_CS] = x86_seg_cs,
@@ -502,14 +512,16 @@ void nvmx_update_exec_control(struct vcpu *v, u32 host_cntrl)
 }
 
 void nvmx_update_secondary_exec_control(struct vcpu *v,
-                                            unsigned long value)
+                                            unsigned long host_cntrl)
 {
     u32 shadow_cntrl;
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
+    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
 
     shadow_cntrl = __get_vvmcs(nvcpu->nv_vvmcx, SECONDARY_VM_EXEC_CONTROL);
-    shadow_cntrl |= value;
-    set_shadow_control(v, SECONDARY_VM_EXEC_CONTROL, shadow_cntrl);
+    nvmx->ept.enabled = !!(shadow_cntrl & SECONDARY_EXEC_ENABLE_EPT);
+    shadow_cntrl |= host_cntrl;
+    __vmwrite(SECONDARY_VM_EXEC_CONTROL, shadow_cntrl);
 }
 
 static void nvmx_update_pin_control(struct vcpu *v, unsigned long host_cntrl)
@@ -851,6 +863,17 @@ static void load_shadow_guest_state(struct vcpu *v)
     /* TODO: CR3 target control */
 }
 
+
+static uint64_t get_shadow_eptp(struct vcpu *v)
+{
+    uint64_t np2m_base = nvmx_vcpu_eptp_base(v);
+    struct p2m_domain *p2m = p2m_get_nestedp2m(v, np2m_base);
+    struct ept_data *ept = &p2m->ept;
+
+    ept->asr = pagetable_get_pfn(p2m_get_pagetable(p2m));
+    return ept_get_eptp(ept);
+}
+
 static void virtual_vmentry(struct cpu_user_regs *regs)
 {
     struct vcpu *v = current;
@@ -895,7 +918,10 @@ static void virtual_vmentry(struct cpu_user_regs *regs)
     /* updating host cr0 to sync TS bit */
     __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0);
 
-    /* TODO: EPT_POINTER */
+    /* Setup virtual EPT for L2 guest */
+    if ( nestedhvm_paging_mode_hap(v) )
+        __vmwrite(EPT_POINTER, get_shadow_eptp(v));
+
 }
 
 static void sync_vvmcs_guest_state(struct vcpu *v, struct cpu_user_regs *regs)
@@ -948,8 +974,8 @@ static void sync_vvmcs_ro(struct vcpu *v)
     /* Adjust exit_reason/exit_qualifciation for violation case */
     if ( __get_vvmcs(vvmcs, VM_EXIT_REASON) == EXIT_REASON_EPT_VIOLATION )
     {
-        __set_vvmcs(vvmcs, EXIT_QUALIFICATION, nvmx->ept_exit.exit_qual);
-        __set_vvmcs(vvmcs, VM_EXIT_REASON, nvmx->ept_exit.exit_reason);
+        __set_vvmcs(vvmcs, EXIT_QUALIFICATION, nvmx->ept.exit_qual);
+        __set_vvmcs(vvmcs, VM_EXIT_REASON, nvmx->ept.exit_reason);
     }
 }
 
@@ -1515,8 +1541,8 @@ nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
     case EPT_TRANSLATE_VIOLATION:
     case EPT_TRANSLATE_MISCONFIG:
         rc = NESTEDHVM_PAGEFAULT_INJECT;
-        nvmx->ept_exit.exit_reason = exit_reason;
-        nvmx->ept_exit.exit_qual = exit_qual;
+        nvmx->ept.exit_reason = exit_reason;
+        nvmx->ept.exit_qual = exit_qual;
         break;
     case EPT_TRANSLATE_RETRY:
         rc = NESTEDHVM_PAGEFAULT_RETRY;
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index 97554bf..e3d1a22 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -33,9 +33,10 @@ struct nestedvmx {
         u32 error_code;
     } intr;
     struct {
+        bool_t   enabled;
         uint32_t exit_reason;
         uint32_t exit_qual;
-    } ept_exit;
+    } ept;
 };
 
 #define vcpu_2_nvmx(v) (vcpu_nestedhvm(v).u.nvmx)
@@ -110,6 +111,8 @@ int nvmx_intercepts_exception(struct vcpu *v,
                               unsigned int trap, int error_code);
 void nvmx_domain_relinquish_resources(struct domain *d);
 
+bool_t nvmx_ept_enabled(struct vcpu *v);
+
 int nvmx_handle_vmxon(struct cpu_user_regs *regs);
 int nvmx_handle_vmxoff(struct cpu_user_regs *regs);
 
-- 
1.7.1
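The nvmx_vcpu_eptp_base() hunk above masks off the low bits of the EPT pointer that L1 wrote into its virtual VMCS, keeping only the table base. A minimal standalone sketch of that masking follows, assuming 4 KiB pages; SKETCH_PAGE_MASK and sketch_eptp_base are illustrative names, not Xen identifiers.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, so the low 12 bits of an EPT pointer hold
 * attributes (memory type, page-walk length, ...) and not address bits. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_MASK  (~((UINT64_C(1) << SKETCH_PAGE_SHIFT) - 1))

/* Hypothetical helper: drop the attribute bits and keep the table base. */
static uint64_t sketch_eptp_base(uint64_t eptp)
{
    return eptp & SKETCH_PAGE_MASK;
}
```

The masked value is what the patch uses as the key when looking up the nested p2m for the L2 guest.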
Xiantao Zhang
2012-Dec-24 14:26 UTC
[PATCH v4 06/10] nEPT: Sync PDPTR fields if L2 guest in PAE paging mode
From: Zhang Xiantao <xiantao.zhang@intel.com>

For a PAE L2 guest, the GUEST_PDPTR registers need to be synced for each
virtual vmentry.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Tim Deegan <tim@xen.org>
---
 xen/arch/x86/hvm/vmx/vvmx.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index f9699dc..7b48436 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -859,7 +859,15 @@ static void load_shadow_guest_state(struct vcpu *v)
     vvmcs_to_shadow(vvmcs, CR0_GUEST_HOST_MASK);
     vvmcs_to_shadow(vvmcs, CR4_GUEST_HOST_MASK);
 
-    /* TODO: PDPTRs for nested ept */
+    if ( nvmx_ept_enabled(v) && hvm_pae_enabled(v) &&
+         (v->arch.hvm_vcpu.guest_efer & EFER_LMA) )
+    {
+        vvmcs_to_shadow(vvmcs, GUEST_PDPTR0);
+        vvmcs_to_shadow(vvmcs, GUEST_PDPTR1);
+        vvmcs_to_shadow(vvmcs, GUEST_PDPTR2);
+        vvmcs_to_shadow(vvmcs, GUEST_PDPTR3);
+    }
+
     /* TODO: CR3 target control */
 }
 
-- 
1.7.1
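For reference, a sketch of the architectural test for "PAE paging is in use", written as a standalone predicate. It assumes the Intel SDM definition (paging enabled, CR4.PAE set, long mode not active); note that the hunk as posted tests EFER_LMA being set, so this sketch follows the architectural definition rather than the posted condition. All SKETCH_* names and sketch_pae_paging_active are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions: CR0.PG = bit 31, CR4.PAE = bit 5,
 * IA32_EFER.LMA = bit 10. */
#define SKETCH_CR0_PG   (UINT64_C(1) << 31)
#define SKETCH_CR4_PAE  (UINT64_C(1) << 5)
#define SKETCH_EFER_LMA (UINT64_C(1) << 10)

/* Hypothetical predicate: true when the guest uses PAE paging, i.e. the
 * only paging mode in which the four PDPTE registers are consulted. */
static bool sketch_pae_paging_active(uint64_t cr0, uint64_t cr4, uint64_t efer)
{
    return (cr0 & SKETCH_CR0_PG) &&
           (cr4 & SKETCH_CR4_PAE) &&
           !(efer & SKETCH_EFER_LMA);
}
```

Under this definition a 64-bit L2 guest (LMA set) would not need its PDPTRs copied, while a 32-bit PAE guest would.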
Xiantao Zhang
2012-Dec-24 14:26 UTC
[PATCH v4 07/10] nEPT: Use minimal permission for nested p2m.
From: Zhang Xiantao <xiantao.zhang@intel.com>

Emulate the permission check for the nested p2m. The current solution is
to use the minimal permission; once a permission violation is met in L0,
determine whether it is caused by the guest EPT or the host EPT.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
---
 xen/arch/x86/hvm/svm/nestedsvm.c        |    2 +-
 xen/arch/x86/hvm/vmx/vvmx.c             |    4 +-
 xen/arch/x86/mm/hap/nested_ept.c        |    5 ++-
 xen/arch/x86/mm/hap/nested_hap.c        |   39 +++++++++++++++++++++++------
 xen/include/asm-x86/hvm/hvm.h           |    2 +-
 xen/include/asm-x86/hvm/svm/nestedsvm.h |    2 +-
 xen/include/asm-x86/hvm/vmx/vvmx.h      |    6 ++--
 7 files changed, 41 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index c1c6fa7..b8a93f4 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -1177,7 +1177,7 @@ nsvm_vmcb_hap_enabled(struct vcpu *v)
  */
 int
 nsvm_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
-                      unsigned int *page_order,
+                      unsigned int *page_order, uint8_t *p2m_acc,
                       bool_t access_r, bool_t access_w, bool_t access_x)
 {
     uint32_t pfec;
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 7b48436..f1f6af2 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1528,7 +1528,7 @@ int nvmx_msr_write_intercept(unsigned int msr, u64 msr_content)
  */
 int
 nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
-                      unsigned int *page_order,
+                      unsigned int *page_order, uint8_t *p2m_acc,
                       bool_t access_r, bool_t access_w, bool_t access_x)
 {
     int rc;
@@ -1538,7 +1538,7 @@ nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
     uint32_t rwx_rights = (access_x << 2) | (access_w << 1) | access_r;
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
 
-    rc = nept_translate_l2ga(v, L2_gpa, page_order, rwx_rights, &gfn,
+    rc = nept_translate_l2ga(v, L2_gpa, page_order, rwx_rights, &gfn, p2m_acc,
                              &exit_qual, &exit_reason);
     switch ( rc )
     {
diff --git a/xen/arch/x86/mm/hap/nested_ept.c b/xen/arch/x86/mm/hap/nested_ept.c
index 1463d81..4393065 100644
--- a/xen/arch/x86/mm/hap/nested_ept.c
+++ b/xen/arch/x86/mm/hap/nested_ept.c
@@ -224,8 +224,8 @@ out:
 
 int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga,
                         unsigned int *page_order, uint32_t rwx_acc,
-                        unsigned long *l1gfn, uint64_t *exit_qual,
-                        uint32_t *exit_reason)
+                        unsigned long *l1gfn, uint8_t *p2m_acc,
+                        uint64_t *exit_qual, uint32_t *exit_reason)
 {
     uint32_t rc, rwx_bits = 0;
     ept_walk_t gw;
@@ -262,6 +262,7 @@ int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga,
         if ( nept_permission_check(rwx_acc, rwx_bits) )
         {
             *l1gfn = gw.lxe[0].mfn;
+            *p2m_acc = (uint8_t)rwx_bits;
             break;
         }
         rc = EPT_TRANSLATE_VIOLATION;
diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
index 6d1264b..7722a2a 100644
--- a/xen/arch/x86/mm/hap/nested_hap.c
+++ b/xen/arch/x86/mm/hap/nested_hap.c
@@ -142,12 +142,12 @@ nestedhap_fix_p2m(struct vcpu *v, struct p2m_domain *p2m,
  */
 static int
 nestedhap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
-                      unsigned int *page_order,
+                      unsigned int *page_order, uint8_t *p2m_acc,
                       bool_t access_r, bool_t access_w, bool_t access_x)
 {
     ASSERT(hvm_funcs.nhvm_hap_walk_L1_p2m);
 
-    return hvm_funcs.nhvm_hap_walk_L1_p2m(v, L2_gpa, L1_gpa, page_order,
+    return hvm_funcs.nhvm_hap_walk_L1_p2m(v, L2_gpa, L1_gpa, page_order, p2m_acc,
         access_r, access_w, access_x);
 }
 
@@ -158,16 +158,15 @@ nestedhap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
  */
 static int
 nestedhap_walk_L0_p2m(struct p2m_domain *p2m, paddr_t L1_gpa, paddr_t *L0_gpa,
-                      p2m_type_t *p2mt,
+                      p2m_type_t *p2mt, p2m_access_t *p2ma,
                       unsigned int *page_order,
                       bool_t access_r, bool_t access_w, bool_t access_x)
 {
     mfn_t mfn;
-    p2m_access_t p2ma;
     int rc;
 
     /* walk L0 P2M table */
-    mfn = get_gfn_type_access(p2m, L1_gpa >> PAGE_SHIFT, p2mt, &p2ma,
+    mfn = get_gfn_type_access(p2m, L1_gpa >> PAGE_SHIFT, p2mt, p2ma,
                               0, page_order);
 
     rc = NESTEDHVM_PAGEFAULT_MMIO;
@@ -206,12 +205,14 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa,
     struct p2m_domain *p2m, *nested_p2m;
     unsigned int page_order_21, page_order_10, page_order_20;
     p2m_type_t p2mt_10;
+    p2m_access_t p2ma_10 = p2m_access_rwx;
+    uint8_t p2ma_21 = p2m_access_rwx;
 
     p2m = p2m_get_hostp2m(d); /* L0 p2m */
     nested_p2m = p2m_get_nestedp2m(v, nhvm_vcpu_p2m_base(v));
 
     /* walk the L1 P2M table */
-    rv = nestedhap_walk_L1_p2m(v, *L2_gpa, &L1_gpa, &page_order_21,
+    rv = nestedhap_walk_L1_p2m(v, *L2_gpa, &L1_gpa, &page_order_21, &p2ma_21,
         access_r, access_w, access_x);
 
     /* let caller to handle these two cases */
@@ -229,7 +230,7 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa,
 
     /* ==> we have to walk L0 P2M */
     rv = nestedhap_walk_L0_p2m(p2m, L1_gpa, &L0_gpa,
-        &p2mt_10, &page_order_10,
+        &p2mt_10, &p2ma_10, &page_order_10,
         access_r, access_w, access_x);
 
     /* let upper level caller to handle these two cases */
@@ -250,10 +251,30 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa,
 
     page_order_20 = min(page_order_21, page_order_10);
 
+    ASSERT(p2ma_10 <= p2m_access_n2rwx);
+    /*NOTE: if assert fails, needs to handle new access type here */
+
+    switch ( p2ma_10 )
+    {
+    case p2m_access_n ... p2m_access_rwx:
+        break;
+    case p2m_access_rx2rw:
+        p2ma_10 = p2m_access_rx;
+        break;
+    case p2m_access_n2rwx:
+        p2ma_10 = p2m_access_n;
+        break;
+    default:
+        p2ma_10 = p2m_access_n;
+        /* For safety, remove all permissions. */
+        gdprintk(XENLOG_ERR, "Unhandled p2m access type:%d\n", p2ma_10);
+    }
+    /* Use minimal permission for nested p2m. */
+    p2ma_10 &= (p2m_access_t)p2ma_21;
+
     /* fix p2m_get_pagetable(nested_p2m) */
     nestedhap_fix_p2m(v, nested_p2m, *L2_gpa, L0_gpa, page_order_20,
-        p2mt_10,
-        p2m_access_rwx /* FIXME: Should use minimum permission. */);
+        p2mt_10, p2ma_10);
 
     return NESTEDHVM_PAGEFAULT_DONE;
 }
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 80f07e9..889e3c9 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -186,7 +186,7 @@ struct hvm_function_table {
 
     /*Walk nested p2m */
     int (*nhvm_hap_walk_L1_p2m)(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
-                                unsigned int *page_order,
+                                unsigned int *page_order, uint8_t *p2m_acc,
                                 bool_t access_r, bool_t access_w, bool_t access_x);
 };
 
diff --git a/xen/include/asm-x86/hvm/svm/nestedsvm.h b/xen/include/asm-x86/hvm/svm/nestedsvm.h
index 0c90f30..748cc04 100644
--- a/xen/include/asm-x86/hvm/svm/nestedsvm.h
+++ b/xen/include/asm-x86/hvm/svm/nestedsvm.h
@@ -134,7 +134,7 @@ void svm_vmexit_do_clgi(struct cpu_user_regs *regs, struct vcpu *v);
 void svm_vmexit_do_stgi(struct cpu_user_regs *regs, struct vcpu *v);
 bool_t nestedsvm_gif_isset(struct vcpu *v);
 int nsvm_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
-                      unsigned int *page_order,
+                      unsigned int *page_order, uint8_t *p2m_acc,
                       bool_t access_r, bool_t access_w, bool_t access_x);
 
 #define NSVM_INTR_NOTHANDLED 3
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index e3d1a22..1da0e77 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -123,7 +123,7 @@ int nvmx_handle_vmxoff(struct cpu_user_regs *regs);
 
 int
 nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
-                      unsigned int *page_order,
+                      unsigned int *page_order, uint8_t *p2m_acc,
                       bool_t access_r, bool_t access_w, bool_t access_x);
 /*
  * Virtual VMCS layout
@@ -206,7 +206,7 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
 
 int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga,
                         unsigned int *page_order, uint32_t rwx_acc,
-                        unsigned long *l1gfn, uint64_t *exit_qual,
-                        uint32_t *exit_reason);
+                        unsigned long *l1gfn, uint8_t *p2m_acc,
+                        uint64_t *exit_qual, uint32_t *exit_reason);
 #endif /* __ASM_X86_HVM_VVMX_H__ */
 
-- 
1.7.1
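The "minimal permission" step in nested_hap.c can be pictured as a bitwise intersection of the host (L0) access and the rights found in the guest (L1) EPT walk, after the two converting access types are folded back to plain r/w/x masks. The sketch below is standalone and uses its own enum; the sk_* names and the numeric encoding are assumptions made for illustration, not Xen's definitions.

```c
#include <stdint.h>

/* Assumed encoding: the first eight values are an r/w/x bitmask
 * (bit 0 = read, bit 1 = write, bit 2 = execute); the last two are
 * "converting" types that change on first access. */
enum sk_access {
    SK_N = 0, SK_R = 1, SK_W = 2, SK_RW = 3,
    SK_X = 4, SK_RX = 5, SK_WX = 6, SK_RWX = 7,
    SK_RX2RW = 8, SK_N2RWX = 9
};

/* Hypothetical helper: fold converting types down to a plain mask, then
 * keep only the rights granted by both the host and the guest tables. */
static enum sk_access sk_min_access(enum sk_access host, uint8_t guest_rwx)
{
    unsigned int h;

    switch ( host )
    {
    case SK_RX2RW:
        h = SK_RX;
        break;
    case SK_N2RWX:
        h = SK_N;
        break;
    default:
        /* Unknown types lose all permissions, for safety. */
        h = ((unsigned int)host <= SK_RWX) ? (unsigned int)host : SK_N;
        break;
    }

    return (enum sk_access)(h & (guest_rwx & 7u));
}
```

For example, a host mapping of rwx combined with a guest mapping of r-x yields r-x, so a later write by L2 faults in L0 and can then be attributed to the guest EPT.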
Xiantao Zhang
2012-Dec-24 14:26 UTC
[PATCH v4 08/10] nEPT: handle invept instruction from L1 VMM
From: Zhang Xiantao <xiantao.zhang@intel.com>

Add the INVEPT instruction emulation logic.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Tim Deegan <tim@xen.org>
---
 xen/arch/x86/hvm/vmx/vmx.c         |    6 +++++-
 xen/arch/x86/hvm/vmx/vvmx.c        |   36 ++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/vmx/vvmx.h |    1 +
 3 files changed, 42 insertions(+), 1 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index ed8d532..94cac17 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2573,10 +2573,14 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
             update_guest_eip();
         break;
 
+    case EXIT_REASON_INVEPT:
+        if ( nvmx_handle_invept(regs) == X86EMUL_OKAY )
+            update_guest_eip();
+        break;
+
     case EXIT_REASON_MWAIT_INSTRUCTION:
     case EXIT_REASON_MONITOR_INSTRUCTION:
     case EXIT_REASON_GETSEC:
-    case EXIT_REASON_INVEPT:
     case EXIT_REASON_INVVPID:
         /*
          * We should never exit on GETSEC because CR4.SMXE is always 0 when
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index f1f6af2..c31f7ba 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1390,6 +1390,42 @@ int nvmx_handle_vmwrite(struct cpu_user_regs *regs)
     return X86EMUL_OKAY;
 }
 
+int nvmx_handle_invept(struct cpu_user_regs *regs)
+{
+    struct vmx_inst_decoded decode;
+    unsigned long eptp;
+    u64 inv_type;
+
+    if ( decode_vmx_inst(regs, &decode, &eptp, 0) != X86EMUL_OKAY )
+        return X86EMUL_EXCEPTION;
+
+    inv_type = reg_read(regs, decode.reg2);
+
+    switch ( inv_type )
+    {
+    case INVEPT_SINGLE_CONTEXT:
+    {
+        struct p2m_domain *p2m = vcpu_nestedhvm(current).nv_p2m;
+        if ( p2m )
+        {
+            p2m_flush(current, p2m);
+            ept_sync_domain(p2m);
+        }
+        break;
+    }
+    case INVEPT_ALL_CONTEXT:
+        p2m_flush_nestedp2m(current->domain);
+        __invept(INVEPT_ALL_CONTEXT, 0, 0);
+        break;
+    default:
+        vmreturn(regs, VMFAIL_INVALID);
+        return X86EMUL_OKAY;
+    }
+    vmreturn(regs, VMSUCCEED);
+    return X86EMUL_OKAY;
+}
+
+
 #define __emul_value(enable1, default1) \
     ((enable1 | default1) << 32 | (default1))
 
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index 1da0e77..e671635 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -189,6 +189,7 @@ int nvmx_handle_vmread(struct cpu_user_regs *regs);
 int nvmx_handle_vmwrite(struct cpu_user_regs *regs);
 int nvmx_handle_vmresume(struct cpu_user_regs *regs);
 int nvmx_handle_vmlaunch(struct cpu_user_regs *regs);
+int nvmx_handle_invept(struct cpu_user_regs *regs);
 int nvmx_msr_read_intercept(unsigned int msr,
                                 u64 *msr_content);
 int nvmx_msr_write_intercept(unsigned int msr,
-- 
1.7.1
Xiantao Zhang
2012-Dec-24 14:26 UTC
[PATCH v4 09/10] nVMX: virtualize VPID capability to nested VMM.
From: Zhang Xiantao <xiantao.zhang@intel.com>

Virtualize VPID for the nested VMM, using the host's VPID to emulate the
guest's VPID. For each virtual vmentry, if the guest's VPID has changed,
allocate a new host VPID for the L2 guest.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Tim Deegan <tim@xen.org>
---
 xen/arch/x86/hvm/vmx/vmx.c         |   11 ++++++-
 xen/arch/x86/hvm/vmx/vvmx.c        |   52 +++++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/hvm/vmx/vvmx.h |    2 +
 3 files changed, 62 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 94cac17..0e479f8 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2578,10 +2578,14 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
             update_guest_eip();
         break;
 
+    case EXIT_REASON_INVVPID:
+        if ( nvmx_handle_invvpid(regs) == X86EMUL_OKAY )
+            update_guest_eip();
+        break;
+
     case EXIT_REASON_MWAIT_INSTRUCTION:
     case EXIT_REASON_MONITOR_INSTRUCTION:
     case EXIT_REASON_GETSEC:
-    case EXIT_REASON_INVVPID:
         /*
          * We should never exit on GETSEC because CR4.SMXE is always 0 when
          * running in guest context, and the CPU checks that before getting
@@ -2699,8 +2703,11 @@ void vmx_vmenter_helper(void)
 
     if ( !cpu_has_vmx_vpid )
         goto out;
+    if ( nestedhvm_vcpu_in_guestmode(curr) )
+        p_asid = &vcpu_nestedhvm(curr).nv_n2asid;
+    else
+        p_asid = &curr->arch.hvm_vcpu.n1asid;
 
-    p_asid = &curr->arch.hvm_vcpu.n1asid;
     old_asid = p_asid->asid;
     need_flush = hvm_asid_handle_vmenter(p_asid);
     new_asid = p_asid->asid;
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index c31f7ba..c54ee44 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -42,6 +42,7 @@ int nvmx_vcpu_initialise(struct vcpu *v)
         goto out;
     }
     nvmx->ept.enabled = 0;
+    nvmx->guest_vpid = 0;
     nvmx->vmxon_region_pa = 0;
     nvcpu->nv_vvmcx = NULL;
     nvcpu->nv_vvmcxaddr = VMCX_EADDR;
@@ -882,6 +883,16 @@ static uint64_t get_shadow_eptp(struct vcpu *v)
     return ept_get_eptp(ept);
 }
 
+static bool_t nvmx_vpid_enabled(struct nestedvcpu *nvcpu)
+{
+    uint32_t second_cntl;
+
+    second_cntl = __get_vvmcs(nvcpu->nv_vvmcx, SECONDARY_VM_EXEC_CONTROL);
+    if ( second_cntl & SECONDARY_EXEC_ENABLE_VPID )
+        return 1;
+    return 0;
+}
+
 static void virtual_vmentry(struct cpu_user_regs *regs)
 {
     struct vcpu *v = current;
@@ -930,6 +941,18 @@ static void virtual_vmentry(struct cpu_user_regs *regs)
     if ( nestedhvm_paging_mode_hap(v) )
         __vmwrite(EPT_POINTER, get_shadow_eptp(v));
 
+    /* nested VPID support! */
+    if ( cpu_has_vmx_vpid && nvmx_vpid_enabled(nvcpu) )
+    {
+        struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
+        uint32_t new_vpid = __get_vvmcs(vvmcs, VIRTUAL_PROCESSOR_ID);
+        if ( nvmx->guest_vpid != new_vpid )
+        {
+            hvm_asid_flush_vcpu_asid(&vcpu_nestedhvm(v).nv_n2asid);
+            nvmx->guest_vpid = new_vpid;
+        }
+    }
+
 }
 
 static void sync_vvmcs_guest_state(struct vcpu *v, struct cpu_user_regs *regs)
@@ -1221,7 +1244,7 @@ int nvmx_handle_vmlaunch(struct cpu_user_regs *regs)
     if ( vcpu_nestedhvm(v).nv_vvmcxaddr == VMCX_EADDR )
     {
         vmreturn (regs, VMFAIL_INVALID);
-        return X86EMUL_OKAY;
+        return X86EMUL_OKAY;
     }
 
     launched = __get_vvmcs(vcpu_nestedhvm(v).nv_vvmcx,
@@ -1433,6 +1456,33 @@ int nvmx_handle_invept(struct cpu_user_regs *regs)
     (((__emul_value(enable1, default1) & host_value) & (~0ul << 32)) | \
     ((uint32_t)(__emul_value(enable1, default1) | host_value)))
 
+int nvmx_handle_invvpid(struct cpu_user_regs *regs)
+{
+    struct vmx_inst_decoded decode;
+    unsigned long vpid;
+    u64 inv_type;
+
+    if ( decode_vmx_inst(regs, &decode, &vpid, 0) != X86EMUL_OKAY )
+        return X86EMUL_EXCEPTION;
+
+    inv_type = reg_read(regs, decode.reg2);
+
+    switch ( inv_type ) {
+    /* Just invalidate all tlb entries for all types! */
+    case INVVPID_INDIVIDUAL_ADDR:
+    case INVVPID_SINGLE_CONTEXT:
+    case INVVPID_ALL_CONTEXT:
+        hvm_asid_flush_vcpu_asid(&vcpu_nestedhvm(current).nv_n2asid);
+        break;
+    default:
+        vmreturn(regs, VMFAIL_INVALID);
+        return X86EMUL_OKAY;
+    }
+
+    vmreturn(regs, VMSUCCEED);
+    return X86EMUL_OKAY;
+}
+
 /*
  * Capability reporting
  */
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index e671635..d1368a3 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -37,6 +37,7 @@ struct nestedvmx {
         uint32_t exit_reason;
         uint32_t exit_qual;
     } ept;
+    uint32_t guest_vpid;
 };
 
 #define vcpu_2_nvmx(v) (vcpu_nestedhvm(v).u.nvmx)
@@ -190,6 +191,7 @@ int nvmx_handle_vmwrite(struct cpu_user_regs *regs);
 int nvmx_handle_vmresume(struct cpu_user_regs *regs);
 int nvmx_handle_vmlaunch(struct cpu_user_regs *regs);
 int nvmx_handle_invept(struct cpu_user_regs *regs);
+int nvmx_handle_invvpid(struct cpu_user_regs *regs);
 int nvmx_msr_read_intercept(unsigned int msr,
                                 u64 *msr_content);
 int nvmx_msr_write_intercept(unsigned int msr,
-- 
1.7.1
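The virtual_vmentry() hunk in this patch only requests a fresh host ASID when the VPID that L1 programmed differs from the one seen at the previous virtual vmentry. A small standalone model of that bookkeeping is sketched below; struct sk_nested_vpid, its flush_pending flag (standing in for the real flush call) and sk_vmentry_vpid are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-vCPU state for the sketch. */
struct sk_nested_vpid {
    uint32_t guest_vpid;    /* last VPID seen in L1's virtual VMCS */
    bool     flush_pending; /* models "allocate a new host VPID for L2" */
};

/* Returns true when the guest VPID changed and a flush was requested. */
static bool sk_vmentry_vpid(struct sk_nested_vpid *s, uint32_t new_vpid)
{
    if ( s->guest_vpid != new_vpid )
    {
        s->flush_pending = true;
        s->guest_vpid = new_vpid;
        return true;
    }
    return false;
}
```

Repeated vmentries with an unchanged VPID therefore keep the existing host ASID and avoid a TLB flush.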
Xiantao Zhang
2012-Dec-24 14:26 UTC
[PATCH v4 10/10] nEPT: Expose EPT & VPID capabilities to L1 VMM
From: Zhang Xiantao <xiantao.zhang@intel.com>

Expose EPT's and VPID's basic features to L1 VMM. For EPT, the A/D bit
feature is not supported. For VPID, expose all features to L1 VMM.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
---
 xen/arch/x86/hvm/vmx/vvmx.c        |   17 +++++++++++++++--
 xen/arch/x86/mm/hap/nested_ept.c   |   24 +++++++++++++++++-------
 xen/include/asm-x86/hvm/vmx/vvmx.h |    2 ++
 3 files changed, 34 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index c54ee44..5f12f03 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1513,6 +1513,8 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
         break;
     case MSR_IA32_VMX_PROCBASED_CTLS:
     case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
+    {
+        u32 default1_bits = VMX_PROCBASED_CTLS_DEFAULT1;
         /* 1-seetings */
         data = CPU_BASED_HLT_EXITING |
                CPU_BASED_VIRTUAL_INTR_PENDING |
@@ -1535,12 +1537,20 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
                CPU_BASED_RDPMC_EXITING |
                CPU_BASED_TPR_SHADOW |
                CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
-        data = gen_vmx_msr(data, VMX_PROCBASED_CTLS_DEFAULT1, host_data);
+
+        if ( msr == MSR_IA32_VMX_TRUE_PROCBASED_CTLS )
+            default1_bits &= ~(CPU_BASED_CR3_LOAD_EXITING |
+                    CPU_BASED_CR3_STORE_EXITING | CPU_BASED_INVLPG_EXITING);
+
+        data = gen_vmx_msr(data, default1_bits, host_data);
         break;
+    }
     case MSR_IA32_VMX_PROCBASED_CTLS2:
         /* 1-seetings */
         data = SECONDARY_EXEC_DESCRIPTOR_TABLE_EXITING |
-               SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
+               SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
+               SECONDARY_EXEC_ENABLE_VPID |
+               SECONDARY_EXEC_ENABLE_EPT;
         data = gen_vmx_msr(data, 0, host_data);
         break;
     case MSR_IA32_VMX_EXIT_CTLS:
@@ -1593,6 +1603,9 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
     case MSR_IA32_VMX_MISC:
         gdprintk(XENLOG_WARNING, "VMX MSR %x not fully supported yet.\n", msr);
         break;
+    case MSR_IA32_VMX_EPT_VPID_CAP:
+        data = nept_get_ept_vpid_cap();
+        break;
     default:
         r = 0;
         break;
diff --git a/xen/arch/x86/mm/hap/nested_ept.c b/xen/arch/x86/mm/hap/nested_ept.c
index 4393065..83431e1 100644
--- a/xen/arch/x86/mm/hap/nested_ept.c
+++ b/xen/arch/x86/mm/hap/nested_ept.c
@@ -43,12 +43,17 @@
 #define EPT_MUST_RSV_BITS (((1ull << PADDR_BITS) -1) & \
                      ~((1ull << paddr_bits) - 1))
 
-/*
- *TODO: Just leave it as 0 here for compile pass, will
- * define real capabilities in the subsequent patches.
- */
-#define NEPT_VPID_CAP_BITS 0
-
+#define NEPT_CAP_BITS \
+        (VMX_EPT_INVEPT_ALL_CONTEXT | VMX_EPT_INVEPT_SINGLE_CONTEXT | \
+         VMX_EPT_INVEPT_INSTRUCTION | VMX_EPT_SUPERPAGE_1GB | \
+         VMX_EPT_SUPERPAGE_2MB | VMX_EPT_MEMORY_TYPE_WB | \
+         VMX_EPT_MEMORY_TYPE_UC | VMX_EPT_WALK_LENGTH_4_SUPPORTED | \
+         VMX_EPT_EXEC_ONLY_SUPPORTED)
+
+#define NVPID_CAP_BITS \
+        (VMX_VPID_INVVPID_INSTRUCTION | VMX_VPID_INVVPID_INDIVIDUAL_ADDR |\
+         VMX_VPID_INVVPID_SINGLE_CONTEXT | VMX_VPID_INVVPID_ALL_CONTEXT |\
+         VMX_VPID_INVVPID_SINGLE_CONTEXT_RETAINING_GLOBAL)
 
 #define NEPT_1G_ENTRY_FLAG (1 << 11)
 #define NEPT_2M_ENTRY_FLAG (1 << 10)
@@ -111,10 +116,15 @@ static bool_t nept_non_present_check(ept_entry_t e)
 
 uint64_t nept_get_ept_vpid_cap(void)
 {
-    uint64_t caps = NEPT_VPID_CAP_BITS;
+    uint64_t caps = 0;
 
+    if ( cpu_has_vmx_ept )
+        caps |= NEPT_CAP_BITS;
     if ( !cpu_has_vmx_ept_exec_only_supported )
         caps &= ~VMX_EPT_EXEC_ONLY_SUPPORTED;
+    if ( cpu_has_vmx_vpid )
+        caps |= NVPID_CAP_BITS;
+
     return caps;
 }
 
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index d1368a3..375c7f1 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -207,6 +207,8 @@ u64 nvmx_get_tsc_offset(struct vcpu *v);
 int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
                           unsigned int exit_reason);
 
+uint64_t nept_get_ept_vpid_cap(void);
+
 int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga,
                         unsigned int *page_order, uint32_t rwx_acc,
                         unsigned long *l1gfn, uint8_t *p2m_acc,
-- 
1.7.1
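The capability logic in nept_get_ept_vpid_cap() can be read as "start from nothing, add each feature group only if the host has it, and drop exec-only when the host lacks it". A standalone sketch of the same shape is given below; the SK_* bit values are placeholders chosen for the example and the three boolean parameters stand in for the cpu_has_* checks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder capability bits for the sketch (not taken from Xen headers). */
#define SK_EPT_EXEC_ONLY  (UINT64_C(1) << 0)
#define SK_EPT_WALK_LEN_4 (UINT64_C(1) << 6)
#define SK_EPT_INVEPT     (UINT64_C(1) << 20)
#define SK_VPID_INVVPID   (UINT64_C(1) << 32)

#define SK_NEPT_CAP_BITS  (SK_EPT_EXEC_ONLY | SK_EPT_WALK_LEN_4 | SK_EPT_INVEPT)
#define SK_NVPID_CAP_BITS (SK_VPID_INVVPID)

/* Hypothetical helper mirroring the structure of the patched function. */
static uint64_t sk_ept_vpid_cap(bool host_ept, bool host_exec_only,
                                bool host_vpid)
{
    uint64_t caps = 0;

    if ( host_ept )
        caps |= SK_NEPT_CAP_BITS;
    if ( !host_exec_only )
        caps &= ~SK_EPT_EXEC_ONLY;   /* never advertise what L0 lacks */
    if ( host_vpid )
        caps |= SK_NVPID_CAP_BITS;

    return caps;
}
```

With this shape, a host without EPT but with VPID reports only the VPID group to L1, and exec-only mappings are advertised only when the hardware can back them.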
Jan Beulich
2013-Jan-03 12:03 UTC
Re: [PATCH v4 08/10] nEPT: handle invept instruction from L1 VMM
>>> On 24.12.12 at 15:26, Xiantao Zhang <xiantao.zhang@intel.com> wrote:
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -1390,6 +1390,42 @@ int nvmx_handle_vmwrite(struct cpu_user_regs *regs)
>      return X86EMUL_OKAY;
>  }
>
> +int nvmx_handle_invept(struct cpu_user_regs *regs)
> +{
> +    struct vmx_inst_decoded decode;
> +    unsigned long eptp;
> +    u64 inv_type;
> +
> +    if ( decode_vmx_inst(regs, &decode, &eptp, 0) != X86EMUL_OKAY )
> +        return X86EMUL_EXCEPTION;

So in the overview you said you fixed this, but here it is again:
There are more than the two X86EMUL_* values referenced above, and
hence you can't imply that if it's not one, it's the other.

> +
> +    inv_type = reg_read(regs, decode.reg2);
> +
> +    switch ( inv_type )

There doesn't appear to be a second use of inv_type, and hence you can

    switch ( reg_read(regs, decode.reg2) )

and remove the local variable.

> +    {
> +    case INVEPT_SINGLE_CONTEXT:
> +    {
> +        struct p2m_domain *p2m = vcpu_nestedhvm(current).nv_p2m;
> +        if ( p2m )
> +        {
> +            p2m_flush(current, p2m);

And similarly you said you fixed all the white space issues.

Jan

> +            ept_sync_domain(p2m);
> +        }
> +        break;
> +    }
> +    case INVEPT_ALL_CONTEXT:
> +        p2m_flush_nestedp2m(current->domain);
> +        __invept(INVEPT_ALL_CONTEXT, 0, 0);
> +        break;
> +    default:
> +        vmreturn(regs, VMFAIL_INVALID);
> +        return X86EMUL_OKAY;
> +    }
> +    vmreturn(regs, VMSUCCEED);
> +    return X86EMUL_OKAY;
> +}
> +
> +
>  #define __emul_value(enable1, default1) \
>      ((enable1 | default1) << 32 | (default1))
>
Jan Beulich
2013-Jan-03 12:05 UTC
Re: [PATCH v4 09/10] nVMX: virtualize VPID capability to nested VMM.
>>> On 24.12.12 at 15:26, Xiantao Zhang <xiantao.zhang@intel.com> wrote:
> @@ -1433,6 +1456,33 @@ int nvmx_handle_invept(struct cpu_user_regs *regs)
>      (((__emul_value(enable1, default1) & host_value) & (~0ul << 32)) | \
>      ((uint32_t)(__emul_value(enable1, default1) | host_value)))
>
> +int nvmx_handle_invvpid(struct cpu_user_regs *regs)
> +{
> +    struct vmx_inst_decoded decode;
> +    unsigned long vpid;
> +    u64 inv_type;
> +
> +    if ( decode_vmx_inst(regs, &decode, &vpid, 0) != X86EMUL_OKAY )
> +        return X86EMUL_EXCEPTION;

Same comment as for patch 8.

> +
> +    inv_type = reg_read(regs, decode.reg2);
> +
> +    switch ( inv_type ) {

And here.

Jan

> +    /* Just invalidate all tlb entries for all types! */
> +    case INVVPID_INDIVIDUAL_ADDR:
> +    case INVVPID_SINGLE_CONTEXT:
> +    case INVVPID_ALL_CONTEXT:
> +        hvm_asid_flush_vcpu_asid(&vcpu_nestedhvm(current).nv_n2asid);
> +        break;
> +    default:
> +        vmreturn(regs, VMFAIL_INVALID);
> +        return X86EMUL_OKAY;
> +    }
> +
> +    vmreturn(regs, VMSUCCEED);
> +    return X86EMUL_OKAY;
> +}
> +
>  /*
>   * Capability reporting
>   */
Zhang, Xiantao
2013-Jan-04 00:57 UTC
Re: [PATCH v4 08/10] nEPT: handle invept instruction from L1 VMM
> >
> > +int nvmx_handle_invept(struct cpu_user_regs *regs) {
> > +    struct vmx_inst_decoded decode;
> > +    unsigned long eptp;
> > +    u64 inv_type;
> > +
> > +    if ( decode_vmx_inst(regs, &decode, &eptp, 0) != X86EMUL_OKAY )
> > +        return X86EMUL_EXCEPTION;
>
> So in the overview you said you fixed this, but here it is again:
> There are more than the two X86EMUL_* values referenced above, and
> hence you can't imply that if it's not one, it's the other.

Do you mean X86EMUL_EXCEPTION can't be returned here? I think
decode_vmx_inst handles the exception already, and the caller doesn't need
to do anything. Once the caller of nvmx_handle_invept gets this return
value, it doesn't do RIP++, and just injects an exception instead in its
return path.

> > +
> > +    inv_type = reg_read(regs, decode.reg2);
> > +
> > +    switch ( inv_type )
>
> There doesn't appear to be a second use of inv_type, and hence you can
>
>     switch ( reg_read(regs, decode.reg2) )
>
> and remove the local variable.

Okay.

> > +    {
> > +    case INVEPT_SINGLE_CONTEXT:
> > +    {
> > +        struct p2m_domain *p2m = vcpu_nestedhvm(current).nv_p2m;
> > +        if ( p2m )
> > +        {
> > +            p2m_flush(current, p2m);
>
> And similarly you said you fixed all the white space issues.

Very strange, and I will fix it. Thanks!
Xiantao

> Jan
>
> > +            ept_sync_domain(p2m);
> > +        }
> > +        break;
> > +    }
> > +    case INVEPT_ALL_CONTEXT:
> > +        p2m_flush_nestedp2m(current->domain);
> > +        __invept(INVEPT_ALL_CONTEXT, 0, 0);
> > +        break;
> > +    default:
> > +        vmreturn(regs, VMFAIL_INVALID);
> > +        return X86EMUL_OKAY;
> > +    }
> > +    vmreturn(regs, VMSUCCEED);
> > +    return X86EMUL_OKAY;
> > +}
> > +
> > +
> >  #define __emul_value(enable1, default1) \
> >      ((enable1 | default1) << 32 | (default1))
> >
>
Jan Beulich
2013-Jan-04 08:24 UTC
Re: [PATCH v4 08/10] nEPT: handle invept instruction from L1 VMM
>>> On 04.01.13 at 01:57, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote:
>> >
>> > +int nvmx_handle_invept(struct cpu_user_regs *regs) {
>> > +    struct vmx_inst_decoded decode;
>> > +    unsigned long eptp;
>> > +    u64 inv_type;
>> > +
>> > +    if ( decode_vmx_inst(regs, &decode, &eptp, 0) != X86EMUL_OKAY )
>> > +        return X86EMUL_EXCEPTION;
>>
>> So in the overview you said you fixed this, but here it is again:
>> There are more than the two X86EMUL_* values referenced above, and
>> hence you can't imply that if it's not one, it's the other.
>
> Do you mean X86EMUL_EXCEPTION can't be returned here ?

No - I'm trying to tell you that you should return whatever
decode_vmx_inst() returned.

Jan
Zhang, Xiantao
2013-Jan-05 03:19 UTC
Re: [PATCH v4 08/10] nEPT: handle invept instruction from L1 VMM
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Friday, January 04, 2013 4:25 PM
> To: Zhang, Xiantao
> Cc: Dong, Eddie; Nakajima, Jun; xen-devel@lists.xen.org; keir@xen.org;
> tim@xen.org
> Subject: RE: [PATCH v4 08/10] nEPT: handle invept instruction from L1 VMM
>
> >>> On 04.01.13 at 01:57, "Zhang, Xiantao" <xiantao.zhang@intel.com>
> wrote:
> >> >
> >> > +int nvmx_handle_invept(struct cpu_user_regs *regs) {
> >> > +    struct vmx_inst_decoded decode;
> >> > +    unsigned long eptp;
> >> > +    u64 inv_type;
> >> > +
> >> > +    if ( decode_vmx_inst(regs, &decode, &eptp, 0) != X86EMUL_OKAY )
> >> > +        return X86EMUL_EXCEPTION;
> >>
> >> So in the overview you said you fixed this, but here it is again:
> >> There are more than the two X86EMUL_* values referenced above, and
> >> hence you can't imply that if it's not one, it's the other.
> >
> > Do you mean X86EMUL_EXCEPTION can't be returned here ?
>
> No - I'm trying to tell you that you should return whatever
> decode_vmx_inst() returned.

Since the caller doesn't distinguish between the non-X86EMUL_OKAY cases, I
just returned X86EMUL_EXCEPTION here. Anyway, I will change it. Thanks!
Xiantao
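The change requested in this sub-thread amounts to passing the decoder's status straight through instead of collapsing every failure into one value. A minimal standalone sketch of that pattern follows; the sk_* status codes, the decoder stub and the handler are all hypothetical stand-ins, not the Xen functions.

```c
/* Stand-in status codes; the numeric values are assumptions. */
enum sk_rc { SK_OKAY = 0, SK_UNHANDLEABLE = 1, SK_EXCEPTION = 2, SK_RETRY = 3 };

/* Hypothetical decoder stub: reports whatever status it is told to. */
static int sk_decode(int forced_rc, unsigned long *operand)
{
    if ( forced_rc == SK_OKAY )
        *operand = 0;
    return forced_rc;
}

/* Hypothetical handler: propagates the decoder's return value unchanged. */
static int sk_handle_invept(int decode_rc)
{
    unsigned long eptp = 0;
    int rc = sk_decode(decode_rc, &eptp);

    if ( rc != SK_OKAY )
        return rc;   /* not rewritten to SK_EXCEPTION */

    (void)eptp;      /* the invalidation work would use the operand here */
    return SK_OKAY;
}
```

Callers that only compare against the "okay" value keep working, while callers that care about retry or unhandleable cases can still see them.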
Ian Jackson
2013-Feb-12 16:22 UTC
Re: [PATCH v4 00/10] Nested VMX: Add virtual EPT & VPID support to L1 VMM
Xiantao Zhang writes ("[Xen-devel] [PATCH v4 00/10] Nested VMX: Add virtual EPT & VPID support to L1 VMM"):
> From: Zhang Xiantao <xiantao.zhang@intel.com>
>
> With virtual EPT support, L1 hypervisor can use EPT hardware for L2 guest's memory virtualization.
> In this way, L2 guest's performance can be improved sharply.
> According to our testing, some benchmarks can show > 5x performance gain.

I'm no expert on the areas of code you're touching, so perhaps you've
already done this, but:

I think there may need to be some high-level knob to turn this feature
on and off (probably, for individual guests). This is because this
feature exposes a richer attack surface for guests (AFAICT). I know
there's already a feature check for nested HVM, but I wonder if that's
enough.

I'd like to hear other people's opinions on this point.

Thanks,
Ian.
Nakajima, Jun
2013-Feb-12 17:54 UTC
Re: [PATCH v4 00/10] Nested VMX: Add virtual EPT & VPID support to L1 VMM
On Tue, Feb 12, 2013 at 8:22 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

> Xiantao Zhang writes ("[Xen-devel] [PATCH v4 00/10] Nested VMX: Add
> virtual EPT & VPID support to L1 VMM"):
> > From: Zhang Xiantao <xiantao.zhang@intel.com>
> >
> > With virtual EPT support, L1 hypervisor can use EPT hardware for L2
> guest's memory virtualization.
> > In this way, L2 guest's performance can be improved sharply.
> > According to our testing, some benchmarks can show > 5x performance gain.
>
> I'm no expert on the areas of code you're touching, so perhaps you've
> already done this, but:
>
> I think there may need to be some high-level knob to turn this feature
> on and off (probably, for individual guests). This is because this
> feature exposes a richer attack surface for guests (AFAICT). I know
> there's already a feature check for nested HVM, but I wonder if that's
> enough.
>

I agree that the feature does or can expose a richer attack surface for
guests today. We need to set "nestedhvm" in the config ('false' by default)
for each guest, to turn on the feature, as far as I know. I don't think we
need a global switch like a boot parameter for Xen at this point.

-- 
Jun
Intel Open Source Technology Center

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Ian Jackson
2013-Feb-12 17:56 UTC
Re: [PATCH v4 00/10] Nested VMX: Add virtual EPT & VPID support to L1 VMM
Nakajima, Jun writes ("Re: [Xen-devel] [PATCH v4 00/10] Nested VMX: Add virtual EPT & VPID support to L1 VMM"):
> I agree that the feature does or can expose a richer attack surface
> for guests today. We need to set "nestedhvm" in the config ('false'
> by default) for each guest, to turn on the feature, as far as I
> know. I don't think we need a global switch like a boot parameter
> for Xen at this point.

Yes, but my point was whether the "nestedhvm" switch is sufficient.
As I understand it nestedhvm with virtual EPT provides a richer attack
surface than without. So the question is whether we should provide a
switch to disable virtual EPT while leaving nestedhvm enabled.

Ian.
Nakajima, Jun
2013-Feb-12 18:20 UTC
Re: [PATCH v4 00/10] Nested VMX: Add virtual EPT & VPID support to L1 VMM
On Tue, Feb 12, 2013 at 9:56 AM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:

> Nakajima, Jun writes ("Re: [Xen-devel] [PATCH v4 00/10] Nested VMX: Add
> virtual EPT & VPID support to L1 VMM"):
> > I agree that the feature does or can expose a richer attack surface
> > for guests today. We need to set "nestedhvm" in the config ('false'
> > by default) for each guest, to turn on the feature, as far as I
> > know. I don't think we need a global switch like a boot parameter
> > for Xen at this point.
>
> Yes, but my point was whether the "nestedhvm" switch is sufficient.
> As I understand it nestedhvm with virtual EPT provides a richer attack
> surface than without. So the question is whether we should provide a
> switch to disable virtual EPT while leaving nestedhvm enabled.
>

Given the simple implementation in Xen that utilizes the real H/W feature,
I think nestedhvm with virtual EPT should be able to provide more secure
implementations with less testing/QA. It's possible that we may see more
security issues as a side effect of virtual EPT support in the short term,
because people may use the nestedhvm feature more. In other words, the
nestedhvm option may not be practical without virtual EPT from a
performance point of view.

-- 
Jun
Intel Open Source Technology Center