thr3ads.net - Xen devel - [PATCH v3 00/10] Nested VMX: Add virtual EPT & VPID support to L1 VMM [Dec 2012]

Jan Beulich

2012-Dec-20 09:39 UTC

Re: [PATCH v3 06/10] nEPT: Sync PDPTR fields if L2 guest in PAE paging mode

>>> On 20.12.12 at 16:43, Xiantao Zhang <xiantao.zhang@intel.com>
wrote:
> From: Zhang Xiantao <xiantao.zhang@intel.com>
> 
> For PAE L2 guest, GUEST_PDPTR registers need to be synced for each virtual
> vmentry.
> Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vvmx.c |    9 ++++++++-
>  1 files changed, 8 insertions(+), 1 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index 2ae6f6a..1f7de7a 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -826,7 +826,14 @@ static void load_shadow_guest_state(struct vcpu *v)
>      vvmcs_to_shadow(vvmcs, CR0_GUEST_HOST_MASK);
>      vvmcs_to_shadow(vvmcs, CR4_GUEST_HOST_MASK);
>  
> -    /* TODO: PDPTRs for nested ept */
> +    if ( nvmx_ept_enabled(v) && hvm_pae_enabled(v) &&
> +                    (v->arch.hvm_vcpu.guest_efer & EFER_LMA) ) {
> +	    vvmcs_to_shadow(vvmcs, GUEST_PDPTR0);
> +	    vvmcs_to_shadow(vvmcs, GUEST_PDPTR1);
> +	    vvmcs_to_shadow(vvmcs, GUEST_PDPTR2);
> +	    vvmcs_to_shadow(vvmcs, GUEST_PDPTR3);
When I commented on the previous version's white space issue
in one of the patches, I really expected you to check all patches.
Yet here we have tabs again, and the placement of the opening
brace isn't right either, nor is the indentation of the continued
if() clause.

Please, before re-submitting, make sure you look through all of
the patches for further coding style issues.
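
For reference, a sketch of how the hunk might look in Xen's hypervisor style: four-space indentation without tabs, the opening brace on its own line, and the continued if() condition aligned under its first operand. The condition is copied unchanged from the patch; only the layout differs. The stub types and helpers are hypothetical stand-ins so the fragment compiles on its own, and their signatures are simplified relative to the real Xen ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the Xen definitions, for illustration only. */
#define EFER_LMA (1u << 10)

enum vmcs_field {
    GUEST_PDPTR0, GUEST_PDPTR1, GUEST_PDPTR2, GUEST_PDPTR3, NR_FIELDS
};

struct vcpu {
    bool ept_enabled;
    bool pae_enabled;
    uint64_t guest_efer;
    uint64_t vvmcs[NR_FIELDS];   /* values written by the L1 VMM */
    uint64_t shadow[NR_FIELDS];  /* values loaded for the L2 guest */
};

static bool nvmx_ept_enabled(const struct vcpu *v) { return v->ept_enabled; }
static bool hvm_pae_enabled(const struct vcpu *v) { return v->pae_enabled; }

/* Simplified: copies one field from the virtual VMCS to the shadow VMCS. */
static void vvmcs_to_shadow(struct vcpu *v, enum vmcs_field f)
{
    v->shadow[f] = v->vvmcs[f];
}

static void sync_pdptrs(struct vcpu *v)
{
    /* Same condition as in the patch; only whitespace and braces changed. */
    if ( nvmx_ept_enabled(v) && hvm_pae_enabled(v) &&
         (v->guest_efer & EFER_LMA) )
    {
        vvmcs_to_shadow(v, GUEST_PDPTR0);
        vvmcs_to_shadow(v, GUEST_PDPTR1);
        vvmcs_to_shadow(v, GUEST_PDPTR2);
        vvmcs_to_shadow(v, GUEST_PDPTR3);
    }
}
```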

Jan
> +    }
> +
>      /* TODO: CR3 target control */
>  }
>  
> -- 
> 1.7.1

Jan Beulich

2012-Dec-20 09:54 UTC

Re: [PATCH v3 08/10] nEPT: handle invept instruction from L1 VMM

>>> On 20.12.12 at 16:43, Xiantao Zhang <xiantao.zhang@intel.com>
wrote:
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -2573,10 +2573,14 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>              update_guest_eip();
>          break;
>  
> +    case EXIT_REASON_INVEPT:
> +        if ( nvmx_handle_invept(regs) == X86EMUL_OKAY )
> +            update_guest_eip();
> +        break;
> +
I realize that you're just copying code written the same way
elsewhere, but: What if (here and elsewhere) X86EMUL_OKAY
is not being returned (e.g. in the non-nested case)? Without
the nested VMX code, all of these would have ended up at
the default case (crashing the guest). Iiuc the correct action
would be to inject an exception at least when X86EMUL_EXCEPTION
is being returned here - whether that's done here or (perhaps
better, as only it can know _what_ exception to inject) by the
callee is another thing to decide.

Also, at the example of nvmx_handle_vmclear() I see that it
produces exceptions in most of the cases, but I think all of the
related code needs auditing that things are being handled
consistently _and_ completely (constructs like

    if ( ... != X86EMUL_OKAY )
        return X86EMUL_EXCEPTION;

are definitely not okay, as there are further X86EMUL_* values
that can occur; if you know only the two must ever occur at a
given place, ASSERT() so, making things clear to the reader
without having to follow all code paths).
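
A minimal sketch of the pattern suggested above: the handler raises the exception itself before returning the "exception" result, and the exit path handles each expected result explicitly while asserting that nothing else can appear. All names here (the enum values, inject_ud(), handle_invept(), exit_invept()) are simplified, hypothetical stand-ins and not Xen's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced copy of the emulator result codes. */
enum emul_rc { EMUL_OKAY, EMUL_UNHANDLEABLE, EMUL_EXCEPTION, EMUL_RETRY };

struct guest {
    bool nested;             /* nested virtualisation enabled? */
    bool exception_pending;  /* set once an exception has been queued */
    unsigned long eip;
};

/* Stand-in for injecting #UD into the guest. */
static void inject_ud(struct guest *g)
{
    g->exception_pending = true;
}

/* The callee knows which exception applies, so it raises it itself. */
static enum emul_rc handle_invept(struct guest *g)
{
    if ( !g->nested )
    {
        inject_ud(g);
        return EMUL_EXCEPTION;
    }

    return EMUL_OKAY;
}

/* Exit path: advance the instruction pointer only on success. */
static void exit_invept(struct guest *g, unsigned int insn_len)
{
    enum emul_rc rc = handle_invept(g);

    switch ( rc )
    {
    case EMUL_OKAY:
        g->eip += insn_len;
        break;

    case EMUL_EXCEPTION:
        /* Exception already queued by the callee; do not skip the insn. */
        assert(g->exception_pending);
        break;

    default:
        /* Only the two values above are expected from this handler. */
        assert(!"unexpected emulation result");
        break;
    }
}
```

With this shape, a caller cannot silently fall through on a result it did not anticipate, and the reader can see the full set of expected values without tracing every code path in the callee.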

Jan

Jan Beulich

2012-Dec-20 09:56 UTC

Re: [PATCH v3 09/10] nVMX: virutalize VPID capability to nested VMM.

>>> On 20.12.12 at 16:43, Xiantao Zhang <xiantao.zhang@intel.com>
wrote:
> +int nvmx_handle_invvpid(struct cpu_user_regs *regs)
> +{
> +    struct vmx_inst_decoded decode;
> +    unsigned long vpid;
> +    u64 inv_type;
> +
> +    if ( !cpu_has_vmx_vpid )
> +        return X86EMUL_EXCEPTION;
Same problem here - you mustn't return X86EMUL_EXCEPTION
without also raising an exception.

Jan
> +
> +    if ( decode_vmx_inst(regs, &decode, &vpid, 0) != X86EMUL_OKAY )
> +        return X86EMUL_EXCEPTION;
> +
> +    inv_type = reg_read(regs, decode.reg2);
> +    gdprintk(XENLOG_DEBUG,"inv_type:%ld, vpid:%lx\n", inv_type, vpid);
> +
> +    switch ( inv_type ) {
> +        /* Just invalidate all tlb entries for all types! */
> +        case INVVPID_INDIVIDUAL_ADDR:
> +        case INVVPID_SINGLE_CONTEXT:
> +        case INVVPID_ALL_CONTEXT:
> +            hvm_asid_flush_vcpu_asid(&vcpu_nestedhvm(current).nv_n2asid);
> +            break;
> +        default:
> +            return X86EMUL_EXCEPTION;
> +    }
> +    vmreturn(regs, VMSUCCEED);
> +
> +    return X86EMUL_OKAY;
> +}
> +
>  /*
>   * Capability reporting
>   */

Tim Deegan

2012-Dec-20 12:11 UTC

Re: [PATCH v3 01/10] nestedhap: Change hostcr3 and p2m->cr3 to meaningful words

At 23:43 +0800 on 20 Dec (1356047022), Xiantao Zhang wrote:
> From: Zhang Xiantao <xiantao.zhang@intel.com>
> 
> VMX doesn't have the concept of a host cr3 for nested p2m,
> and only SVM has, so change it to neutral words.
> 
> Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
> Acded-by: Tim Deegan <tim@xen.org>
Ac_k_ed-by. :)

Tim Deegan

2012-Dec-20 12:18 UTC

Re: [PATCH v3 06/10] nEPT: Sync PDPTR fields if L2 guest in PAE paging mode

At 23:43 +0800 on 20 Dec (1356047027), Xiantao Zhang wrote:
> From: Zhang Xiantao <xiantao.zhang@intel.com>
> 
> For PAE L2 guest, GUEST_PDPTR registers need to be synced for each virtual
> vmentry.
> Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Apart from the whitespace mangling that Jan pointed out, 

Acked-by: Tim Deegan <tim@xen.org>
> ---
>  xen/arch/x86/hvm/vmx/vvmx.c |    9 ++++++++-
>  1 files changed, 8 insertions(+), 1 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index 2ae6f6a..1f7de7a 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -826,7 +826,14 @@ static void load_shadow_guest_state(struct vcpu *v)
>      vvmcs_to_shadow(vvmcs, CR0_GUEST_HOST_MASK);
>      vvmcs_to_shadow(vvmcs, CR4_GUEST_HOST_MASK);
>  
> -    /* TODO: PDPTRs for nested ept */
> +    if ( nvmx_ept_enabled(v) && hvm_pae_enabled(v) &&
> +                    (v->arch.hvm_vcpu.guest_efer & EFER_LMA) ) {
> +	    vvmcs_to_shadow(vvmcs, GUEST_PDPTR0);
> +	    vvmcs_to_shadow(vvmcs, GUEST_PDPTR1);
> +	    vvmcs_to_shadow(vvmcs, GUEST_PDPTR2);
> +	    vvmcs_to_shadow(vvmcs, GUEST_PDPTR3);
> +    }
> +
>      /* TODO: CR3 target control */
>  }
>  
> -- 
> 1.7.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

Tim Deegan

2012-Dec-20 12:51 UTC

Re: [PATCH v3 03/10] nested_ept: Implement guest ept's walker

At 23:43 +0800 on 20 Dec (1356047024), Xiantao Zhang wrote:
> From: Zhang Xiantao <xiantao.zhang@intel.com>
> 
> Implement guest EPT PT walker, some logic is based on shadow's
> ia32e PT walker. During the PT walking, if the target pages are
> not in memory, use RETRY mechanism and get a chance to let the
> target page back.
> 
> Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
This is much nicer than v1, thanks.  I have some comments below, and the
whole thing needs to be checked for whitespace mangling.
> +static bool_t nept_rwx_bits_check(ept_entry_t e) {
> +    /*write only or write/execute only*/
> +    uint8_t rwx_bits = e.epte & EPTE_RWX_MASK;
> +
> +    if ( rwx_bits == ept_access_w || rwx_bits == ept_access_wx )
> +        return 1;
> +
> +    if ( rwx_bits == ept_access_x && !(NEPT_VPID_CAP_BITS &
> +                        VMX_EPT_EXEC_ONLY_SUPPORTED))
In a later patch you add VMX_EPT_EXEC_ONLY_SUPPORTED to this field.  How
can that work when running on a CPU that doesn't support exec-only?  The
nested-ept tables will have exec-only mappings in them which the CPU will
reject.
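
One way to address this, sketched here under assumptions, is to advertise exec-only to L1 only when the host CPU itself reports it, and to run the walker's permission check against that filtered value. The capability bit positions, the `host_caps` parameter, and both helper functions are hypothetical placeholders rather than the names or encodings used in the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits; permission encoding is R=1, W=2, X=4. */
#define CAP_EXEC_ONLY  (1u << 0)
#define CAP_WB_MEMTYPE (1u << 1)
#define RWX_MASK       0x7u

enum { ACC_W = 2, ACC_X = 4, ACC_WX = 6 };

/* Capabilities offered to L1: wanted features filtered by host support. */
static uint32_t nested_ept_caps(uint32_t host_caps)
{
    const uint32_t wanted = CAP_EXEC_ONLY | CAP_WB_MEMTYPE;

    return wanted & host_caps;
}

/* Returns true if the entry's permission bits are a misconfiguration. */
static bool rwx_misconfigured(uint64_t epte, uint32_t l1_caps)
{
    unsigned int rwx = epte & RWX_MASK;

    /* Write-only and write/execute-only are never valid. */
    if ( rwx == ACC_W || rwx == ACC_WX )
        return true;

    /* Exec-only is valid only if it was advertised to L1. */
    if ( rwx == ACC_X && !(l1_caps & CAP_EXEC_ONLY) )
        return true;

    return false;
}
```

Because the advertised set is derived from the host's, an exec-only entry can only pass the check on hardware that will also accept it in the nested tables.
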
> +done:
> +    ret = EPT_TRANSLATE_SUCCEED;
> +    goto out;
> +
> +map_err:
> +    if ( rc == _PAGE_PAGED )
> +        ret = EPT_TRANSLATE_RETRY;
> +    else
> +        ret = EPT_TRANSLATE_ERR_PAGE;
What does this error code mean?  The caller just retries the faulting
instruction when it sees it, which sounds wrong.  Why not just return
EPT_TRANSLATE_MISCONFIG if the guest uses an unmappable frame for EPT
tables?
> +    default:
> +        rc = EPT_TRANSLATE_UNSUPPORTED;
> +        gdprintk(XENLOG_ERR, "Unsupported ept translation type!:%d\n", rc);
Just BUG() here and get rid of EPT_TRANSLATE_UNSUPPORTED and
NESTEDHVM_PAGEFAULT_UNHANDLED.  The function that provided rc is right
above and we can see it hasn't got any other return values.
> --- a/xen/arch/x86/mm/shadow/multi.c
> +++ b/xen/arch/x86/mm/shadow/multi.c
> @@ -4582,7 +4582,7 @@ static mfn_t emulate_gva_to_mfn(struct vcpu *v,
>      /* Translate the GFN to an MFN */
>      ASSERT(!paging_locked_by_me(v->domain));
>      mfn = get_gfn(v->domain, _gfn(gfn), &p2mt);
> -        
> +
This stray change should be dropped. 
> +typedef enum {
> +    ept_access_n     = 0, /* No access permissions allowed */
> +    ept_access_r     = 1,
> +    ept_access_w     = 2,
> +    ept_access_rw    = 3,
> +    ept_access_x     = 4,
> +    ept_access_rx    = 5,
> +    ept_access_wx    = 6,
> +    ept_access_all   = 7,
> +} ept_access_t;
This enum isn't used anywhere.

Cheers,

Tim.

Tim Deegan

2012-Dec-20 13:01 UTC

Re: [PATCH v3 04/10] EPT: Make ept data structure or operations neutral

At 23:43 +0800 on 20 Dec (1356047025), Xiantao Zhang wrote:
> From: Zhang Xiantao <xiantao.zhang@intel.com>
> 
> Share the current EPT logic with nested EPT case, so
> make the related data structure or operations neutral
> to common EPT and nested EPT.
> 
> Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Looking good.  I would ack this, except for one thing -- you've made
p2m_initialise() return an error code but not updated either of the
callers to use that code.

Cheers,

Tim.

Tim Deegan

2012-Dec-20 13:10 UTC

Re: [PATCH v3 07/10] nEPT: Use minimal permission for nested p2m.

At 23:43 +0800 on 20 Dec (1356047028), Xiantao Zhang wrote:
> @@ -206,12 +205,14 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa,
>      struct p2m_domain *p2m, *nested_p2m;
>      unsigned int page_order_21, page_order_10, page_order_20;
>      p2m_type_t p2mt_10;
> +    p2m_access_t p2ma_10 = p2m_access_rwx;
> +    uint8_t p2ma_21;
>  
>      p2m = p2m_get_hostp2m(d); /* L0 p2m */
>      nested_p2m = p2m_get_nestedp2m(v, nhvm_vcpu_p2m_base(v));
>  
>      /* walk the L1 P2M table */
> -    rv = nestedhap_walk_L1_p2m(v, *L2_gpa, &L1_gpa, &page_order_21,
> +    rv = nestedhap_walk_L1_p2m(v, *L2_gpa, &L1_gpa, &page_order_21, &p2ma_21,
>          access_r, access_w, access_x);
Once again, please either initialise p2ma_21 to rwx or have the SVM
version of this lookup set it to something sensible. 
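
A small sketch of the first option: give the caller's access variable a permissive default, so that a vendor walker which never writes it still leaves a defined value behind. The types and both walker functions below are hypothetical simplifications, not the Xen interfaces.

```c
#include <stdint.h>

#define ACCESS_RWX 0x7u

typedef int (*walk_fn)(uint64_t l2_gpa, uint64_t *l1_gpa, uint8_t *access);

/* EPT-like walker: reports the permissions found in the guest tables. */
static int walk_reports_access(uint64_t l2_gpa, uint64_t *l1_gpa,
                               uint8_t *access)
{
    *l1_gpa = l2_gpa;
    *access = 0x5; /* read + execute */
    return 0;
}

/* NPT-like walker: translates but leaves *access untouched. */
static int walk_ignores_access(uint64_t l2_gpa, uint64_t *l1_gpa,
                               uint8_t *access)
{
    (void)access;
    *l1_gpa = l2_gpa;
    return 0;
}

/* Caller: default to full access so the value is defined on every path. */
static uint8_t effective_access(walk_fn walk, uint64_t l2_gpa)
{
    uint64_t l1_gpa = 0;
    uint8_t p2ma_21 = ACCESS_RWX;

    if ( walk(l2_gpa, &l1_gpa, &p2ma_21) != 0 )
        return 0;

    return p2ma_21;
}
```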

Cheers,

Tim.

Tim Deegan

2012-Dec-20 13:55 UTC

Re: [PATCH v3 00/10] Nested VMX: Add virtual EPT & VPID support to L1 VMM

Hi, 

At 23:43 +0800 on 20 Dec (1356047021), Xiantao Zhang wrote:
> Received: from hax-build.sh.intel.com ([10.239.48.28])
>         by fmsmga001.fm.intel.com with ESMTP; 19 Dec 2012 19:59:04 -0800
> From: Xiantao Zhang <xiantao.zhang@intel.com>
> To: xen-devel@lists.xen.org
> Date: Thu, 20 Dec 2012 23:43:41 +0800
I think the clock on your computer or your email client is confused:
your email is datestamped about 12 hours in the future.

Tim.

Xiantao Zhang

2012-Dec-20 15:43 UTC

[PATCH v3 00/10] Nested VMX: Add virtual EPT & VPID support to L1 VMM

From: Zhang Xiantao <xiantao.zhang@intel.com>

With virtual EPT support, L1 hypervisor can use EPT hardware for L2
guest's memory virtualization.
In this way, L2 guest's performance can be improved sharply.
According to our testing, some benchmarks can show > 5x performance gain.

Changes from v1:
Update the patches according to Tim's comments.
1. Patch 03: Enhance the virtual EPT's walker logic.
2. Patch 04: Add a new field in struct p2m_domain, and use it to store
   EPT-specific data. For host p2m, it saves L1 VMM's EPT data,
   and for nested p2m, it saves nested EPT's data.
3. Patch 07: strictly check host's p2m access type.
4. Other patches: some whitespace mangling fixes.

Changes from v2:
Addressed comments from Jan and Jun:
1. Add Acked-by message for reviewed patches by Tim.
2. Fixed one whitespace mangling issue in PATCH 08.
3. Add some comments to describe the meaning of
   the return value of hvm_hap_nested_page_fault
   in PATCH 05.
4. Add the logic for handling default case of two switch
   statements.

Zhang Xiantao (10):
  nestedhap: Change hostcr3 and p2m->cr3 to meaningful words
  nestedhap: Change nested p2m's walker to vendor-specific
  nested_ept: Implement guest ept's walker
  EPT: Make ept data structure or operations neutral
  nEPT: Try to enable EPT paging for L2 guest.
  nEPT: Sync PDPTR fields if L2 guest in PAE paging mode
  nEPT: Use minimal permission for nested p2m.
  nEPT: handle invept instruction from L1 VMM
  nVMX: virutalize VPID capability to nested VMM.
  nEPT: expost EPT & VPID capablities to L1 VMM

 xen/arch/x86/hvm/hvm.c                  |    7 +-
 xen/arch/x86/hvm/svm/nestedsvm.c        |   31 ++++
 xen/arch/x86/hvm/svm/svm.c              |    3 +-
 xen/arch/x86/hvm/vmx/vmcs.c             |    9 +-
 xen/arch/x86/hvm/vmx/vmx.c              |   91 ++++------
 xen/arch/x86/hvm/vmx/vvmx.c             |  215 ++++++++++++++++++++++--
 xen/arch/x86/mm/guest_walk.c            |   16 +-
 xen/arch/x86/mm/hap/Makefile            |    1 +
 xen/arch/x86/mm/hap/nested_ept.c        |  286 +++++++++++++++++++++++++++++++
 xen/arch/x86/mm/hap/nested_hap.c        |   95 ++++++-----
 xen/arch/x86/mm/mm-locks.h              |    2 +-
 xen/arch/x86/mm/p2m-ept.c               |  104 +++++++++---
 xen/arch/x86/mm/p2m.c                   |   49 +++---
 xen/arch/x86/mm/shadow/multi.c          |    2 +-
 xen/include/asm-x86/guest_pt.h          |    8 +
 xen/include/asm-x86/hvm/hvm.h           |    9 +-
 xen/include/asm-x86/hvm/nestedhvm.h     |    2 +
 xen/include/asm-x86/hvm/svm/nestedsvm.h |    3 +
 xen/include/asm-x86/hvm/vmx/vmcs.h      |   24 ++--
 xen/include/asm-x86/hvm/vmx/vmx.h       |   38 ++++-
 xen/include/asm-x86/hvm/vmx/vvmx.h      |   30 +++-
 xen/include/asm-x86/p2m.h               |   20 ++-
 22 files changed, 852 insertions(+), 193 deletions(-)
 create mode 100644 xen/arch/x86/mm/hap/nested_ept.c

Xiantao Zhang

2012-Dec-20 15:43 UTC

[PATCH v3 01/10] nestedhap: Change hostcr3 and p2m->cr3 to meaningful words

From: Zhang Xiantao <xiantao.zhang@intel.com>

VMX doesn't have the concept of a host cr3 for nested p2m,
and only SVM has, so change it to neutral words.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acded-by: Tim Deegan <tim@xen.org>
---
 xen/arch/x86/hvm/hvm.c             |    6 +++---
 xen/arch/x86/hvm/svm/svm.c         |    2 +-
 xen/arch/x86/hvm/vmx/vmx.c         |    2 +-
 xen/arch/x86/hvm/vmx/vvmx.c        |    2 +-
 xen/arch/x86/mm/hap/nested_hap.c   |   15 ++++++++-------
 xen/arch/x86/mm/mm-locks.h         |    2 +-
 xen/arch/x86/mm/p2m.c              |   26 +++++++++++++-------------
 xen/include/asm-x86/hvm/hvm.h      |    4 ++--
 xen/include/asm-x86/hvm/vmx/vvmx.h |    2 +-
 xen/include/asm-x86/p2m.h          |   16 ++++++++--------
 10 files changed, 39 insertions(+), 38 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 40c1ab2..1cae8a8 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4536,10 +4536,10 @@ uint64_t nhvm_vcpu_guestcr3(struct vcpu *v)
     return -EOPNOTSUPP;
 }
 
-uint64_t nhvm_vcpu_hostcr3(struct vcpu *v)
+uint64_t nhvm_vcpu_p2m_base(struct vcpu *v)
 {
-    if (hvm_funcs.nhvm_vcpu_hostcr3)
-        return hvm_funcs.nhvm_vcpu_hostcr3(v);
+    if (hvm_funcs.nhvm_vcpu_p2m_base)
+        return hvm_funcs.nhvm_vcpu_p2m_base(v);
     return -EOPNOTSUPP;
 }
 
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 55a5ae5..2c8504a 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2003,7 +2003,7 @@ static struct hvm_function_table __read_mostly svm_function_table = {
     .nhvm_vcpu_vmexit = nsvm_vcpu_vmexit_inject,
     .nhvm_vcpu_vmexit_trap = nsvm_vcpu_vmexit_trap,
     .nhvm_vcpu_guestcr3 = nsvm_vcpu_guestcr3,
-    .nhvm_vcpu_hostcr3 = nsvm_vcpu_hostcr3,
+    .nhvm_vcpu_p2m_base = nsvm_vcpu_hostcr3,
     .nhvm_vcpu_asid = nsvm_vcpu_asid,
     .nhvm_vmcx_guest_intercepts_trap = nsvm_vmcb_guest_intercepts_trap,
     .nhvm_vmcx_hap_enabled = nsvm_vmcb_hap_enabled,
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index aee1f9e..98309da 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1504,7 +1504,7 @@ static struct hvm_function_table __read_mostly vmx_function_table = {
     .nhvm_vcpu_destroy    = nvmx_vcpu_destroy,
     .nhvm_vcpu_reset      = nvmx_vcpu_reset,
     .nhvm_vcpu_guestcr3   = nvmx_vcpu_guestcr3,
-    .nhvm_vcpu_hostcr3    = nvmx_vcpu_hostcr3,
+    .nhvm_vcpu_p2m_base   = nvmx_vcpu_eptp_base,
     .nhvm_vcpu_asid       = nvmx_vcpu_asid,
     .nhvm_vmcx_guest_intercepts_trap = nvmx_intercepts_exception,
     .nhvm_vcpu_vmexit_trap = nvmx_vmexit_trap,
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index b005816..6d1a736 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -94,7 +94,7 @@ uint64_t nvmx_vcpu_guestcr3(struct vcpu *v)
     return 0;
 }
 
-uint64_t nvmx_vcpu_hostcr3(struct vcpu *v)
+uint64_t nvmx_vcpu_eptp_base(struct vcpu *v)
 {
     /* TODO */
     ASSERT(0);
diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
index 317875d..f9a5edc 100644
--- a/xen/arch/x86/mm/hap/nested_hap.c
+++ b/xen/arch/x86/mm/hap/nested_hap.c
@@ -48,9 +48,10 @@
  *    1. If #NPF is from L1 guest, then we crash the guest VM (same as old 
  *       code)
  *    2. If #NPF is from L2 guest, then we continue from (3)
- *    3. Get h_cr3 from L1 guest. Map h_cr3 into L0 hypervisor address space.
- *    4. Walk the h_cr3 page table
- *    5.    - if not present, then we inject #NPF back to L1 guest and 
+ *    3. Get np2m base from L1 guest. Map np2m base into L0 hypervisor address space.
+ *    4. Walk the np2m's  page table
+ *    5.    - if not present or permission check failure, then we inject #NPF back to
+ *    L1 guest and 
  *            re-launch L1 guest (L1 guest will either treat this #NPF as MMIO,
  *            or fix its p2m table for L2 guest)
  *    6.    - if present, then we will get the a new translated value L1-GPA 
@@ -89,7 +90,7 @@ nestedp2m_write_p2m_entry(struct p2m_domain *p2m, unsigned long gfn,
 
     if (old_flags & _PAGE_PRESENT)
         flush_tlb_mask(p2m->dirty_cpumask);
-    
+
     paging_unlock(d);
 }
 
@@ -110,7 +111,7 @@ nestedhap_fix_p2m(struct vcpu *v, struct p2m_domain *p2m,
     /* If this p2m table has been flushed or recycled under our feet, 
      * leave it alone.  We'll pick up the right one as we try to 
      * vmenter the guest. */
-    if ( p2m->cr3 == nhvm_vcpu_hostcr3(v) )
+    if ( p2m->np2m_base == nhvm_vcpu_p2m_base(v) )
     {
         unsigned long gfn, mask;
         mfn_t mfn;
@@ -186,7 +187,7 @@ nestedhap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
     uint32_t pfec;
     unsigned long nested_cr3, gfn;
     
-    nested_cr3 = nhvm_vcpu_hostcr3(v);
+    nested_cr3 = nhvm_vcpu_p2m_base(v);
 
     pfec = PFEC_user_mode | PFEC_page_present;
     if (access_w)
@@ -221,7 +222,7 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa,
     p2m_type_t p2mt_10;
 
     p2m = p2m_get_hostp2m(d); /* L0 p2m */
-    nested_p2m = p2m_get_nestedp2m(v, nhvm_vcpu_hostcr3(v));
+    nested_p2m = p2m_get_nestedp2m(v, nhvm_vcpu_p2m_base(v));
 
     /* walk the L1 P2M table */
     rv = nestedhap_walk_L1_p2m(v, *L2_gpa, &L1_gpa, &page_order_21,
diff --git a/xen/arch/x86/mm/mm-locks.h b/xen/arch/x86/mm/mm-locks.h
index 3700e32..1817f81 100644
--- a/xen/arch/x86/mm/mm-locks.h
+++ b/xen/arch/x86/mm/mm-locks.h
@@ -249,7 +249,7 @@ declare_mm_order_constraint(per_page_sharing)
  * A per-domain lock that protects the mapping from nested-CR3 to 
  * nested-p2m.  In particular it covers:
  * - the array of nested-p2m tables, and all LRU activity therein; and
- * - setting the "cr3" field of any p2m table to a non-CR3_EADDR value.
+ * - setting the "cr3" field of any p2m table to a non-P2M_BASE_EAADR value.
  *   (i.e. assigning a p2m table to be the shadow of that cr3 */
 
 /* PoD lock (per-p2m-table)
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 258f46e..6a4bdd9 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -69,7 +69,7 @@ static void p2m_initialise(struct domain *d, struct p2m_domain *p2m)
     p2m->domain = d;
     p2m->default_access = p2m_access_rwx;
 
-    p2m->cr3 = CR3_EADDR;
+    p2m->np2m_base = P2M_BASE_EADDR;
 
     if ( hap_enabled(d) && cpu_has_vmx )
         ept_p2m_init(p2m);
@@ -1433,7 +1433,7 @@ p2m_flush_table(struct p2m_domain *p2m)
     ASSERT(page_list_empty(&p2m->pod.single));
 
     /* This is no longer a valid nested p2m for any address space */
-    p2m->cr3 = CR3_EADDR;
+    p2m->np2m_base = P2M_BASE_EADDR;
     
     /* Zap the top level of the trie */
     top = mfn_to_page(pagetable_get_mfn(p2m_get_pagetable(p2m)));
@@ -1471,7 +1471,7 @@ p2m_flush_nestedp2m(struct domain *d)
 }
 
 struct p2m_domain *
-p2m_get_nestedp2m(struct vcpu *v, uint64_t cr3)
+p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base)
 {
     /* Use volatile to prevent gcc to cache nv->nv_p2m in a cpu register as
      * this may change within the loop by an other (v)cpu.
@@ -1480,8 +1480,8 @@ p2m_get_nestedp2m(struct vcpu *v, uint64_t cr3)
     struct domain *d;
     struct p2m_domain *p2m;
 
-    /* Mask out low bits; this avoids collisions with CR3_EADDR */
-    cr3 &= ~(0xfffull);
+    /* Mask out low bits; this avoids collisions with P2M_BASE_EADDR */
+    np2m_base &= ~(0xfffull);
 
     if (nv->nv_flushp2m && nv->nv_p2m) {
         nv->nv_p2m = NULL;
@@ -1493,14 +1493,14 @@ p2m_get_nestedp2m(struct vcpu *v, uint64_t cr3)
     if ( p2m ) 
     {
         p2m_lock(p2m);
-        if ( p2m->cr3 == cr3 || p2m->cr3 == CR3_EADDR )
+        if ( p2m->np2m_base == np2m_base || p2m->np2m_base == P2M_BASE_EADDR )
         {
             nv->nv_flushp2m = 0;
             p2m_getlru_nestedp2m(d, p2m);
             nv->nv_p2m = p2m;
-            if (p2m->cr3 == CR3_EADDR)
+            if (p2m->np2m_base == P2M_BASE_EADDR)
                 hvm_asid_flush_vcpu(v);
-            p2m->cr3 = cr3;
+            p2m->np2m_base = np2m_base;
             cpumask_set_cpu(v->processor, p2m->dirty_cpumask);
             p2m_unlock(p2m);
             nestedp2m_unlock(d);
@@ -1515,7 +1515,7 @@ p2m_get_nestedp2m(struct vcpu *v, uint64_t cr3)
     p2m_flush_table(p2m);
     p2m_lock(p2m);
     nv->nv_p2m = p2m;
-    p2m->cr3 = cr3;
+    p2m->np2m_base = np2m_base;
     nv->nv_flushp2m = 0;
     hvm_asid_flush_vcpu(v);
     cpumask_set_cpu(v->processor, p2m->dirty_cpumask);
@@ -1531,7 +1531,7 @@ p2m_get_p2m(struct vcpu *v)
     if (!nestedhvm_is_n2(v))
         return p2m_get_hostp2m(v->domain);
 
-    return p2m_get_nestedp2m(v, nhvm_vcpu_hostcr3(v));
+    return p2m_get_nestedp2m(v, nhvm_vcpu_p2m_base(v));
 }
 
 unsigned long paging_gva_to_gfn(struct vcpu *v,
@@ -1549,15 +1549,15 @@ unsigned long paging_gva_to_gfn(struct vcpu *v,
         struct p2m_domain *p2m;
         const struct paging_mode *mode;
         uint32_t pfec_21 = *pfec;
-        uint64_t ncr3 = nhvm_vcpu_hostcr3(v);
+        uint64_t np2m_base = nhvm_vcpu_p2m_base(v);
 
         /* translate l2 guest va into l2 guest gfn */
-        p2m = p2m_get_nestedp2m(v, ncr3);
+        p2m = p2m_get_nestedp2m(v, np2m_base);
         mode = paging_get_nestedmode(v);
         gfn = mode->gva_to_gfn(v, p2m, va, pfec);
 
         /* translate l2 guest gfn into l1 guest gfn */
-        return hostmode->p2m_ga_to_gfn(v, hostp2m, ncr3,
+        return hostmode->p2m_ga_to_gfn(v, hostp2m, np2m_base,
                                        gfn << PAGE_SHIFT, &pfec_21, NULL);
     }
 
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index fdb0f58..d3535b6 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -170,7 +170,7 @@ struct hvm_function_table {
                                 uint64_t exitcode);
     int (*nhvm_vcpu_vmexit_trap)(struct vcpu *v, struct hvm_trap *trap);
     uint64_t (*nhvm_vcpu_guestcr3)(struct vcpu *v);
-    uint64_t (*nhvm_vcpu_hostcr3)(struct vcpu *v);
+    uint64_t (*nhvm_vcpu_p2m_base)(struct vcpu *v);
     uint32_t (*nhvm_vcpu_asid)(struct vcpu *v);
     int (*nhvm_vmcx_guest_intercepts_trap)(struct vcpu *v, 
                                unsigned int trapnr, int errcode);
@@ -475,7 +475,7 @@ uint64_t nhvm_vcpu_guestcr3(struct vcpu *v);
 /* returns l1 guest's cr3 that points to the page table used to
  * translate l2 guest physical address to l1 guest physical address.
  */
-uint64_t nhvm_vcpu_hostcr3(struct vcpu *v);
+uint64_t nhvm_vcpu_p2m_base(struct vcpu *v);
 /* returns the asid number l1 guest wants to use to run the l2 guest */
 uint32_t nhvm_vcpu_asid(struct vcpu *v);
 
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index dce2cd8..d97011d 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -99,7 +99,7 @@ int nvmx_vcpu_initialise(struct vcpu *v);
 void nvmx_vcpu_destroy(struct vcpu *v);
 int nvmx_vcpu_reset(struct vcpu *v);
 uint64_t nvmx_vcpu_guestcr3(struct vcpu *v);
-uint64_t nvmx_vcpu_hostcr3(struct vcpu *v);
+uint64_t nvmx_vcpu_eptp_base(struct vcpu *v);
 uint32_t nvmx_vcpu_asid(struct vcpu *v);
 enum hvm_intblk nvmx_intr_blocked(struct vcpu *v);
 int nvmx_intercepts_exception(struct vcpu *v, 
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index 2bd2048..ce26594 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -197,17 +197,17 @@ struct p2m_domain {
 
     struct domain     *domain;   /* back pointer to domain */
 
-    /* Nested p2ms only: nested-CR3 value that this p2m shadows. 
-     * This can be cleared to CR3_EADDR under the per-p2m lock but
+    /* Nested p2ms only: nested p2m base value that this p2m shadows. 
+     * This can be cleared to P2M_BASE_EADDR under the per-p2m lock but
      * needs both the per-p2m lock and the per-domain nestedp2m lock
      * to set it to any other value. */
-#define CR3_EADDR     (~0ULL)
-    uint64_t           cr3;
+#define P2M_BASE_EADDR     (~0ULL)
+    uint64_t           np2m_base;
 
     /* Nested p2ms: linked list of n2pms allocated to this domain. 
      * The host p2m hasolds the head of the list and the np2ms are 
      * threaded on in LRU order. */
-    struct list_head np2m_list; 
+    struct list_head   np2m_list; 
 
 
     /* Host p2m: when this flag is set, don't flush all the nested-p2m
@@ -282,11 +282,11 @@ struct p2m_domain {
 /* get host p2m table */
 #define p2m_get_hostp2m(d)      ((d)->arch.p2m)
 
-/* Get p2m table (re)usable for specified cr3.
+/* Get p2m table (re)usable for specified np2m base.
  * Automatically destroys and re-initializes a p2m if none found.
- * If cr3 == 0 then v->arch.hvm_vcpu.guest_cr[3] is used.
+ * If np2m_base == 0 then v->arch.hvm_vcpu.guest_cr[3] is used.
  */
-struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v, uint64_t cr3);
+struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v, uint64_t np2m_base);
 
 /* If vcpu is in host mode then behaviour matches p2m_get_hostp2m().
  * If vcpu is in guest mode then behaviour matches p2m_get_nestedp2m().
-- 
1.7.1

Xiantao Zhang

2012-Dec-20 15:43 UTC

[PATCH v3 02/10] nestedhap: Change nested p2m's walker to vendor-specific

From: Zhang Xiantao <xiantao.zhang@intel.com>

EPT and NPT adopt different formats for each-level entry,
so change the walker functions to vendor-specific.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Tim Deegan <tim@xen.org>
---
 xen/arch/x86/hvm/svm/nestedsvm.c        |   31 +++++++++++++++++++++
 xen/arch/x86/hvm/svm/svm.c              |    1 +
 xen/arch/x86/hvm/vmx/vmx.c              |    3 +-
 xen/arch/x86/hvm/vmx/vvmx.c             |   13 +++++++++
 xen/arch/x86/mm/hap/nested_hap.c        |   46 +++++++++++--------------------
 xen/include/asm-x86/hvm/hvm.h           |    5 +++
 xen/include/asm-x86/hvm/svm/nestedsvm.h |    3 ++
 xen/include/asm-x86/hvm/vmx/vvmx.h      |    5 +++
 8 files changed, 76 insertions(+), 31 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index ed0faa6..5dcb354 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -1171,6 +1171,37 @@ nsvm_vmcb_hap_enabled(struct vcpu *v)
     return vcpu_nestedsvm(v).ns_hap_enabled;
 }
 
+/* This function uses L2_gpa to walk the P2M page table in L1. If the 
+ * walk is successful, the translated value is returned in
+ * L1_gpa. The result value tells what to do next.
+ */
+int
+nsvm_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
+                      unsigned int *page_order,
+                      bool_t access_r, bool_t access_w, bool_t access_x)
+{
+    uint32_t pfec;
+    unsigned long nested_cr3, gfn;
+    
+    nested_cr3 = nhvm_vcpu_p2m_base(v);
+
+    pfec = PFEC_user_mode | PFEC_page_present;
+    if (access_w)
+        pfec |= PFEC_write_access;
+    if (access_x)
+        pfec |= PFEC_insn_fetch;
+
+    /* Walk the guest-supplied NPT table, just as if it were a pagetable */
+    gfn = paging_ga_to_gfn_cr3(v, nested_cr3, L2_gpa, &pfec, page_order);
+
+    if ( gfn == INVALID_GFN ) 
+        return NESTEDHVM_PAGEFAULT_INJECT;
+
+    *L1_gpa = (gfn << PAGE_SHIFT) + (L2_gpa & ~PAGE_MASK);
+    return NESTEDHVM_PAGEFAULT_DONE;
+}
+
+
 enum hvm_intblk nsvm_intr_blocked(struct vcpu *v)
 {
     struct nestedsvm *svm = &vcpu_nestedsvm(v);
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 2c8504a..acd2d49 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -2008,6 +2008,7 @@ static struct hvm_function_table __read_mostly svm_function_table = {
     .nhvm_vmcx_guest_intercepts_trap = nsvm_vmcb_guest_intercepts_trap,
     .nhvm_vmcx_hap_enabled = nsvm_vmcb_hap_enabled,
     .nhvm_intr_blocked = nsvm_intr_blocked,
+    .nhvm_hap_walk_L1_p2m = nsvm_hap_walk_L1_p2m,
 };
 
 void svm_vmexit_handler(struct cpu_user_regs *regs)
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 98309da..4abfa90 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1511,7 +1511,8 @@ static struct hvm_function_table __read_mostly vmx_function_table = {
     .nhvm_intr_blocked    = nvmx_intr_blocked,
     .nhvm_domain_relinquish_resources = nvmx_domain_relinquish_resources,
     .update_eoi_exit_bitmap = vmx_update_eoi_exit_bitmap,
-    .virtual_intr_delivery_enabled = vmx_virtual_intr_delivery_enabled
+    .virtual_intr_delivery_enabled = vmx_virtual_intr_delivery_enabled,
+    .nhvm_hap_walk_L1_p2m = nvmx_hap_walk_L1_p2m,
 };
 
 struct hvm_function_table * __init start_vmx(void)
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 6d1a736..4495dd6 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1445,6 +1445,19 @@ int nvmx_msr_write_intercept(unsigned int msr, u64 msr_content)
     return 1;
 }
 
+/* This function uses L2_gpa to walk the P2M page table in L1. If the 
+ * walk is successful, the translated value is returned in
+ * L1_gpa. The result value tells what to do next.
+ */
+int
+nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
+                      unsigned int *page_order,
+                      bool_t access_r, bool_t access_w, bool_t access_x)
+{
+    /*TODO:*/
+    return 0;
+}
+
 void nvmx_idtv_handling(void)
 {
     struct vcpu *v = current;
diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
index f9a5edc..8787c91 100644
--- a/xen/arch/x86/mm/hap/nested_hap.c
+++ b/xen/arch/x86/mm/hap/nested_hap.c
@@ -136,6 +136,22 @@ nestedhap_fix_p2m(struct vcpu *v, struct p2m_domain *p2m,
     }
 }
 
+/* This function uses L2_gpa to walk the P2M page table in L1. If the 
+ * walk is successful, the translated value is returned in
+ * L1_gpa. The result value tells what to do next.
+ */
+static int
+nestedhap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
+                      unsigned int *page_order,
+                      bool_t access_r, bool_t access_w, bool_t access_x)
+{
+    ASSERT(hvm_funcs.nhvm_hap_walk_L1_p2m);
+
+    return hvm_funcs.nhvm_hap_walk_L1_p2m(v, L2_gpa, L1_gpa, page_order,
+        access_r, access_w, access_x);
+}
+
+
 /* This function uses L1_gpa to walk the P2M table in L0 hypervisor. If the
  * walk is successful, the translated value is returned in L0_gpa. The return 
  * value tells the upper level what to do.
@@ -175,36 +191,6 @@ out:
     return rc;
 }
 
-/* This function uses L2_gpa to walk the P2M page table in L1. If the 
- * walk is successful, the translated value is returned in
- * L1_gpa. The result value tells what to do next.
- */
-static int
-nestedhap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
-                      unsigned int *page_order,
-                      bool_t access_r, bool_t access_w, bool_t access_x)
-{
-    uint32_t pfec;
-    unsigned long nested_cr3, gfn;
-    
-    nested_cr3 = nhvm_vcpu_p2m_base(v);
-
-    pfec = PFEC_user_mode | PFEC_page_present;
-    if (access_w)
-        pfec |= PFEC_write_access;
-    if (access_x)
-        pfec |= PFEC_insn_fetch;
-
-    /* Walk the guest-supplied NPT table, just as if it were a pagetable */
-    gfn = paging_ga_to_gfn_cr3(v, nested_cr3, L2_gpa, &pfec, page_order);
-
-    if ( gfn == INVALID_GFN ) 
-        return NESTEDHVM_PAGEFAULT_INJECT;
-
-    *L1_gpa = (gfn << PAGE_SHIFT) + (L2_gpa & ~PAGE_MASK);
-    return NESTEDHVM_PAGEFAULT_DONE;
-}
-
 /*
  * The following function, nestedhap_page_fault(), is for steps (3)--(10).
  *
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index d3535b6..80f07e9 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -183,6 +183,11 @@ struct hvm_function_table {
     /* Virtual interrupt delivery */
     void (*update_eoi_exit_bitmap)(struct vcpu *v, u8 vector, u8 trig);
     int (*virtual_intr_delivery_enabled)(void);
+
+    /*Walk nested p2m  */
+    int (*nhvm_hap_walk_L1_p2m)(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
+                      unsigned int *page_order,
+                      bool_t access_r, bool_t access_w, bool_t access_x);
 };
 
 extern struct hvm_function_table hvm_funcs;
diff --git a/xen/include/asm-x86/hvm/svm/nestedsvm.h b/xen/include/asm-x86/hvm/svm/nestedsvm.h
index fa83023..0c90f30 100644
--- a/xen/include/asm-x86/hvm/svm/nestedsvm.h
+++ b/xen/include/asm-x86/hvm/svm/nestedsvm.h
@@ -133,6 +133,9 @@ int nsvm_wrmsr(struct vcpu *v, unsigned int msr, uint64_t msr_content);
 void svm_vmexit_do_clgi(struct cpu_user_regs *regs, struct vcpu *v);
 void svm_vmexit_do_stgi(struct cpu_user_regs *regs, struct vcpu *v);
 bool_t nestedsvm_gif_isset(struct vcpu *v);
+int nsvm_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
+                      unsigned int *page_order,
+                      bool_t access_r, bool_t access_w, bool_t access_x);
 
 #define NSVM_INTR_NOTHANDLED     3
 #define NSVM_INTR_NOTINTERCEPTED 2
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index d97011d..422f006 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -108,6 +108,11 @@ void nvmx_domain_relinquish_resources(struct domain *d);
 
 int nvmx_handle_vmxon(struct cpu_user_regs *regs);
 int nvmx_handle_vmxoff(struct cpu_user_regs *regs);
+
+int
+nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
+                      unsigned int *page_order,
+                      bool_t access_r, bool_t access_w, bool_t access_x);
 /*
  * Virtual VMCS layout
  *
-- 
1.7.1
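
The patch above turns the L1 p2m walk into a per-vendor hook in hvm_funcs, with the generic nested-HAP code only asserting that the hook exists and calling through it. A minimal standalone sketch of that dispatch pattern follows. It is not Xen code: struct walk_ops, toy_svm_walk, toy_vmx_walk and generic_walk are hypothetical names invented for illustration, and the result codes only mirror the spirit of NESTEDHVM_PAGEFAULT_DONE / NESTEDHVM_PAGEFAULT_INJECT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t paddr_t;

/* Illustrative result codes, local to this sketch. */
enum { WALK_DONE = 0, WALK_INJECT = 1 };

/* Hypothetical stand-in for struct hvm_function_table: one hook only. */
struct walk_ops {
    int (*walk_l1_p2m)(paddr_t l2_gpa, paddr_t *l1_gpa);
};

/* Toy "SVM-style" backend: identity-maps the first 1 MiB, faults above it. */
static int toy_svm_walk(paddr_t l2_gpa, paddr_t *l1_gpa)
{
    if (l2_gpa >= (1ull << 20))
        return WALK_INJECT;
    *l1_gpa = l2_gpa;
    return WALK_DONE;
}

/* Toy "VMX-style" backend: maps every address one page higher. */
static int toy_vmx_walk(paddr_t l2_gpa, paddr_t *l1_gpa)
{
    *l1_gpa = l2_gpa + 0x1000;
    return WALK_DONE;
}

/* Generic caller: it knows only the hook, not which backend is behind it. */
static int generic_walk(const struct walk_ops *ops, paddr_t l2_gpa,
                        paddr_t *l1_gpa)
{
    assert(ops->walk_l1_p2m != NULL);
    return ops->walk_l1_p2m(l2_gpa, l1_gpa);
}
```

The design point is that the generic layer stays free of vendor-specific paging details, so a new backend (here, the nested EPT walker added later in the series) only has to fill in the hook.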

Xiantao Zhang

2012-Dec-20 15:43 UTC

[PATCH v3 03/10] nested_ept: Implement guest ept's walker

From: Zhang Xiantao <xiantao.zhang@intel.com>

Implement the guest EPT page-table walker; some of the logic is based
on the shadow code's ia32e PT walker. During the walk, if a target page
is not in memory, use the RETRY mechanism so that the page gets a
chance to be brought back in.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
---
 xen/arch/x86/hvm/hvm.c              |    1 +
 xen/arch/x86/hvm/vmx/vvmx.c         |   44 +++++-
 xen/arch/x86/mm/guest_walk.c        |   16 ++-
 xen/arch/x86/mm/hap/Makefile        |    1 +
 xen/arch/x86/mm/hap/nested_ept.c    |  280 +++++++++++++++++++++++++++++++++++
 xen/arch/x86/mm/hap/nested_hap.c    |    2 +-
 xen/arch/x86/mm/shadow/multi.c      |    2 +-
 xen/include/asm-x86/guest_pt.h      |    8 +
 xen/include/asm-x86/hvm/nestedhvm.h |    2 +
 xen/include/asm-x86/hvm/vmx/vmcs.h  |    1 +
 xen/include/asm-x86/hvm/vmx/vmx.h   |   28 ++++
 xen/include/asm-x86/hvm/vmx/vvmx.h  |   15 ++
 12 files changed, 390 insertions(+), 10 deletions(-)
 create mode 100644 xen/arch/x86/mm/hap/nested_ept.c

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 1cae8a8..3cd0075 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -1324,6 +1324,7 @@ int hvm_hap_nested_page_fault(paddr_t gpa,
                                              access_r, access_w, access_x);
         switch (rv) {
         case NESTEDHVM_PAGEFAULT_DONE:
+        case NESTEDHVM_PAGEFAULT_RETRY:
             return 1;
         case NESTEDHVM_PAGEFAULT_L1_ERROR:
             /* An error occured while translating gpa from
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 4495dd6..f9e620c 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -906,9 +906,18 @@ static void sync_vvmcs_ro(struct vcpu *v)
 {
     int i;
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
+    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
+    void *vvmcs = nvcpu->nv_vvmcx;
 
     for ( i = 0; i < ARRAY_SIZE(vmcs_ro_field); i++ )
         shadow_to_vvmcs(nvcpu->nv_vvmcx, vmcs_ro_field[i]);
+
+    /* Adjust exit_reason/exit_qualification for violation case */
+    if ( __get_vvmcs(vvmcs, VM_EXIT_REASON) ==
+                EXIT_REASON_EPT_VIOLATION ) {
+        __set_vvmcs(vvmcs, EXIT_QUALIFICATION, nvmx->ept_exit.exit_qual);
+        __set_vvmcs(vvmcs, VM_EXIT_REASON, nvmx->ept_exit.exit_reason);
+    }
 }
 
 static void load_vvmcs_host_state(struct vcpu *v)
@@ -1454,8 +1463,39 @@ nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
                       unsigned int *page_order,
                       bool_t access_r, bool_t access_w, bool_t access_x)
 {
-    /*TODO:*/
-    return 0;
+    uint64_t exit_qual = __vmread(EXIT_QUALIFICATION);
+    uint32_t exit_reason = EXIT_REASON_EPT_VIOLATION;
+    int rc;
+    unsigned long gfn;
+    uint32_t rwx_rights = (access_x << 2) | (access_w << 1) | access_r;
+    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
+
+    rc = nept_translate_l2ga(v, L2_gpa, page_order, rwx_rights, &gfn,
+                                &exit_qual, &exit_reason);
+    switch ( rc ) {
+        case EPT_TRANSLATE_SUCCEED:
+            *L1_gpa = (gfn << PAGE_SHIFT) + (L2_gpa & ~PAGE_MASK);
+            rc = NESTEDHVM_PAGEFAULT_DONE;
+            break;
+        case EPT_TRANSLATE_VIOLATION:
+        case EPT_TRANSLATE_MISCONFIG:
+            rc = NESTEDHVM_PAGEFAULT_INJECT;
+            nvmx->ept_exit.exit_reason = exit_reason;
+            nvmx->ept_exit.exit_qual = exit_qual;
+            break;
+        case EPT_TRANSLATE_RETRY:
+            rc = NESTEDHVM_PAGEFAULT_RETRY;
+            break;
+        case EPT_TRANSLATE_ERR_PAGE:
+            rc = NESTEDHVM_PAGEFAULT_L1_ERROR;
+            break;
+        default:
+            gdprintk(XENLOG_ERR, "GUEST EPT translation error!:%d\n", rc);
+            rc = NESTEDHVM_PAGEFAULT_UNHANDLED;
+            break;
+    }
+
+    return rc;
 }
 
 void nvmx_idtv_handling(void)
diff --git a/xen/arch/x86/mm/guest_walk.c b/xen/arch/x86/mm/guest_walk.c
index 0f08fb0..1c165c6 100644
--- a/xen/arch/x86/mm/guest_walk.c
+++ b/xen/arch/x86/mm/guest_walk.c
@@ -88,18 +88,19 @@ static uint32_t set_ad_bits(void *guest_p, void *walk_p, int set_dirty)
 
 /* If the map is non-NULL, we leave this function having 
  * acquired an extra ref on mfn_to_page(*mfn) */
-static inline void *map_domain_gfn(struct p2m_domain *p2m,
-                                   gfn_t gfn, 
+void *map_domain_gfn(struct p2m_domain *p2m,
+                                   gfn_t gfn,
                                    mfn_t *mfn,
                                    p2m_type_t *p2mt,
-                                   uint32_t *rc) 
+                                   p2m_query_t q,
+                                   uint32_t *rc)
 {
     struct page_info *page;
     void *map;
 
     /* Translate the gfn, unsharing if shared */
     page = get_page_from_gfn_p2m(p2m->domain, p2m, gfn_x(gfn), p2mt, NULL,
-                                  P2M_ALLOC | P2M_UNSHARE);
+                                  q);
     if ( p2m_is_paging(*p2mt) )
     {
         ASSERT(!p2m_is_nestedp2m(p2m));
@@ -128,7 +129,6 @@ static inline void *map_domain_gfn(struct p2m_domain *p2m,
     return map;
 }
 
-
 /* Walk the guest pagetables, after the manner of a hardware walker. */
 /* Because the walk is essentially random, it can cause a deadlock 
  * warning in the p2m locking code. Highly unlikely this is an actual
@@ -149,6 +149,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
     uint32_t gflags, mflags, iflags, rc = 0;
     int smep;
     bool_t pse1G = 0, pse2M = 0;
+    p2m_query_t qt = P2M_ALLOC | P2M_UNSHARE;
 
     perfc_incr(guest_walk);
     memset(gw, 0, sizeof(*gw));
@@ -188,7 +189,8 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
     l3p = map_domain_gfn(p2m, 
                          guest_l4e_get_gfn(gw->l4e), 
                          &gw->l3mfn,
-                         &p2mt, 
+                         &p2mt,
+                         qt, 
                          &rc); 
     if(l3p == NULL)
         goto out;
@@ -249,6 +251,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
                          guest_l3e_get_gfn(gw->l3e), 
                          &gw->l2mfn,
                          &p2mt, 
+                         qt,
                          &rc); 
     if(l2p == NULL)
         goto out;
@@ -322,6 +325,7 @@ guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m,
                              guest_l2e_get_gfn(gw->l2e), 
                              &gw->l1mfn,
                              &p2mt,
+                             qt,
                              &rc);
         if(l1p == NULL)
             goto out;
diff --git a/xen/arch/x86/mm/hap/Makefile b/xen/arch/x86/mm/hap/Makefile
index 80a6bec..68f2bb5 100644
--- a/xen/arch/x86/mm/hap/Makefile
+++ b/xen/arch/x86/mm/hap/Makefile
@@ -3,6 +3,7 @@ obj-y += guest_walk_2level.o
 obj-y += guest_walk_3level.o
 obj-$(x86_64) += guest_walk_4level.o
 obj-y += nested_hap.o
+obj-y += nested_ept.o
 
 guest_walk_%level.o: guest_walk.c Makefile
 	$(CC) $(CFLAGS) -DGUEST_PAGING_LEVELS=$* -c $< -o $@
diff --git a/xen/arch/x86/mm/hap/nested_ept.c b/xen/arch/x86/mm/hap/nested_ept.c
new file mode 100644
index 0000000..c3e698c
--- /dev/null
+++ b/xen/arch/x86/mm/hap/nested_ept.c
@@ -0,0 +1,280 @@
+/*
+ * nested_ept.c: Handling virtualized EPT for guest in nested case.
+ *
+ * Copyright (c) 2012, Intel Corporation
+ *  Xiantao Zhang <xiantao.zhang@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
+ * Place - Suite 330, Boston, MA 02111-1307 USA.
+ */
+#include <asm/domain.h>
+#include <asm/page.h>
+#include <asm/paging.h>
+#include <asm/p2m.h>
+#include <asm/mem_event.h>
+#include <public/mem_event.h>
+#include <asm/mem_sharing.h>
+#include <xen/event.h>
+#include <asm/hap.h>
+#include <asm/hvm/support.h>
+
+#include <asm/hvm/nestedhvm.h>
+
+#include "private.h"
+
+#include <asm/hvm/vmx/vmx.h>
+#include <asm/hvm/vmx/vvmx.h>
+
+/* EPT always use 4-level paging structure */
+#define GUEST_PAGING_LEVELS 4
+#include <asm/guest_pt.h>
+
+/* Must reserved bits in all level entries  */
+#define EPT_MUST_RSV_BITS (((1ull << PADDR_BITS) -1) & \
+                     ~((1ull << paddr_bits) - 1))
+
+/*
+ *TODO: Just leave it as 0 here for compile pass, will
+ * define real capabilities in the subsequent patches.
+ */
+#define NEPT_VPID_CAP_BITS 0
+
+
+#define NEPT_1G_ENTRY_FLAG (1 << 11)
+#define NEPT_2M_ENTRY_FLAG (1 << 10)
+#define NEPT_4K_ENTRY_FLAG (1 << 9)
+
+bool_t nept_sp_entry(ept_entry_t e)
+{
+    return !!(e.sp);
+}
+
+static bool_t nept_rsv_bits_check(ept_entry_t e, uint32_t level)
+{
+    uint64_t rsv_bits = EPT_MUST_RSV_BITS;
+
+    switch ( level ) {
+    case 1:
+        break;
+    case 2 ... 3:
+        if (nept_sp_entry(e))
+            rsv_bits |=  ((1ull << (9 * (level -1 ))) -1) << PAGE_SHIFT;
+        else
+            rsv_bits |= EPTE_EMT_MASK | EPTE_IGMT_MASK;
+        break;
+    case 4:
+        rsv_bits |= EPTE_EMT_MASK | EPTE_IGMT_MASK | EPTE_SUPER_PAGE_MASK;
+    break;
+    default:
+        gdprintk(XENLOG_ERR,"Unsupported EPT paging level: %d\n", level);
+        BUG();
+        break;
+    }
+    return !!(e.epte & rsv_bits);
+}
+
+/* EMT checking*/
+static bool_t nept_emt_bits_check(ept_entry_t e, uint32_t level)
+{
+    if ( e.sp || level == 1 ) {
+        if ( e.emt == EPT_EMT_RSV0 || e.emt == EPT_EMT_RSV1 ||
+                e.emt == EPT_EMT_RSV2 )
+            return 1;
+    }
+    return 0;
+}
+
+static bool_t nept_rwx_bits_check(ept_entry_t e) {
+    /*write only or write/execute only*/
+    uint8_t rwx_bits = e.epte & EPTE_RWX_MASK;
+
+    if ( rwx_bits == ept_access_w || rwx_bits == ept_access_wx )
+        return 1;
+
+    if ( rwx_bits == ept_access_x && !(NEPT_VPID_CAP_BITS &
+                        VMX_EPT_EXEC_ONLY_SUPPORTED))
+        return 1;
+
+    return 0;
+}
+
+/* nept's misconfiguration check */
+static bool_t nept_misconfiguration_check(ept_entry_t e, uint32_t level)
+{
+    return (nept_rsv_bits_check(e, level) ||
+                nept_emt_bits_check(e, level) ||
+                nept_rwx_bits_check(e));
+}
+
+static bool_t nept_permission_check(uint32_t rwx_acc, uint32_t rwx_bits)
+{
+    return !(EPTE_RWX_MASK & rwx_acc & ~rwx_bits);
+}
+
+/* nept's non-present check */
+static bool_t nept_non_present_check(ept_entry_t e)
+{
+    if (e.epte & EPTE_RWX_MASK)
+        return 0;
+    return 1;
+}
+
+uint64_t nept_get_ept_vpid_cap(void)
+{
+    return NEPT_VPID_CAP_BITS;
+}
+
+static int ept_lvl_table_offset(unsigned long gpa, int lvl)
+{
+    return (gpa >>(EPT_L4_PAGETABLE_SHIFT -(4 - lvl) * 9)) &
+                (EPT_PAGETABLE_ENTRIES -1 );
+}
+
+static uint32_t
+nept_walk_tables(struct vcpu *v, unsigned long l2ga, ept_walk_t *gw)
+{
+    int lvl;
+    p2m_type_t p2mt;
+    uint32_t rc = 0, ret = 0, gflags;
+    struct domain *d = v->domain;
+    struct p2m_domain *p2m = d->arch.p2m;
+    gfn_t base_gfn = _gfn(nhvm_vcpu_p2m_base(v) >> PAGE_SHIFT);
+    mfn_t lxmfn;
+    ept_entry_t *lxp = NULL;
+
+    memset(gw, 0, sizeof(*gw));
+
+    for (lvl = 4; lvl > 0; lvl--)
+    {
+        lxp = map_domain_gfn(p2m, base_gfn, &lxmfn, &p2mt, P2M_ALLOC, &rc);
+        if ( !lxp )
+            goto map_err;
+        gw->lxe[lvl] = lxp[ept_lvl_table_offset(l2ga, lvl)];
+        unmap_domain_page(lxp);
+        put_page(mfn_to_page(mfn_x(lxmfn)));
+
+        if (nept_non_present_check(gw->lxe[lvl]))
+            goto non_present;
+
+        if (nept_misconfiguration_check(gw->lxe[lvl], lvl))
+            goto misconfig_err;
+
+        if ( (lvl == 2 || lvl == 3) && nept_sp_entry(gw->lxe[lvl]) )
+        {
+            /* Generate a fake l1 table entry so callers don't all
+             * have to understand superpages. */
+            unsigned long gfn_lvl_mask =  (1ull << ((lvl - 1) * 9)) - 1;
+            gfn_t start = _gfn(gw->lxe[lvl].mfn);
+            /* Increment the pfn by the right number of 4k pages. */
+            start = _gfn((gfn_x(start) & ~gfn_lvl_mask) +
+                     ((l2ga >> PAGE_SHIFT) & gfn_lvl_mask));
+            gflags = (gw->lxe[lvl].epte & EPTE_FLAG_MASK) |
+                    (lvl == 3 ? NEPT_1G_ENTRY_FLAG: NEPT_2M_ENTRY_FLAG);
+            gw->lxe[0].epte = (gfn_x(start) << PAGE_SHIFT) | gflags;
+            goto done;
+        }
+        if ( lvl > 1 )
+            base_gfn = _gfn(gw->lxe[lvl].mfn);
+    }
+
+    /* If this is not a super entry, we can reach here. */
+    gflags = (gw->lxe[1].epte & EPTE_FLAG_MASK) | NEPT_4K_ENTRY_FLAG;
+    gw->lxe[0].epte = (gw->lxe[1].epte & PAGE_MASK) | gflags;
+
+done:
+    ret = EPT_TRANSLATE_SUCCEED;
+    goto out;
+
+map_err:
+    if ( rc == _PAGE_PAGED )
+        ret = EPT_TRANSLATE_RETRY;
+    else
+        ret = EPT_TRANSLATE_ERR_PAGE;
+    goto out;
+
+misconfig_err:
+    ret =  EPT_TRANSLATE_MISCONFIG;
+    goto out;
+
+non_present:
+    ret = EPT_TRANSLATE_VIOLATION;
+    /* fall through. */
+out:
+    return ret;
+}
+
+/* Translate a L2 guest address to L1 gpa via L1 EPT paging structure */
+
+int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga,
+                        unsigned int *page_order, uint32_t rwx_acc,
+                        unsigned long *l1gfn, uint64_t *exit_qual,
+                        uint32_t *exit_reason)
+{
+    uint32_t rc, rwx_bits = 0;
+    ept_walk_t gw;
+    rwx_acc &= EPTE_RWX_MASK;
+
+    *l1gfn = INVALID_GFN;
+
+    rc = nept_walk_tables(v, l2ga, &gw);
+    switch ( rc ) {
+    case EPT_TRANSLATE_SUCCEED:
+        if ( likely(gw.lxe[0].epte & NEPT_2M_ENTRY_FLAG) )
+        {
+            rwx_bits = gw.lxe[4].epte & gw.lxe[3].epte & gw.lxe[2].epte &
+                            EPTE_RWX_MASK;
+            *page_order = 9;
+        }
+        else if ( gw.lxe[0].epte & NEPT_4K_ENTRY_FLAG ) {
+            rwx_bits = gw.lxe[4].epte & gw.lxe[3].epte & gw.lxe[2].epte &
+                    gw.lxe[1].epte & EPTE_RWX_MASK;
+            *page_order = 0;
+        }
+        else if ( gw.lxe[0].epte & NEPT_1G_ENTRY_FLAG  )
+        {
+            rwx_bits = gw.lxe[4].epte & gw.lxe[3].epte  & EPTE_RWX_MASK;
+            *page_order = 18;
+        }
+        else
+        {
+            gdprintk(XENLOG_ERR, "Uncorrect l1 entry!\n");
+            BUG();
+        }
+        if ( nept_permission_check(rwx_acc, rwx_bits) )
+        {
+            *l1gfn = gw.lxe[0].mfn;
+            break;
+        }
+        rc = EPT_TRANSLATE_VIOLATION;
+    /* Fall through to EPT violation if permission check fails. */
+    case EPT_TRANSLATE_VIOLATION:
+        *exit_qual = (*exit_qual & 0xffffffc0) | (rwx_bits << 3) | rwx_acc;
+        *exit_reason = EXIT_REASON_EPT_VIOLATION;
+        break;
+
+    case EPT_TRANSLATE_ERR_PAGE:
+        break;
+    case EPT_TRANSLATE_MISCONFIG:
+        rc = EPT_TRANSLATE_MISCONFIG;
+        *exit_qual = 0;
+        *exit_reason = EXIT_REASON_EPT_MISCONFIG;
+        break;
+    case EPT_TRANSLATE_RETRY:
+        break;
+    default:
+        gdprintk(XENLOG_ERR, "Unsupported ept translation type!:%d\n", rc);
+        rc = EPT_TRANSLATE_UNSUPPORTED;
+        break;
+    }
+    return rc;
+}
diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
index 8787c91..6d1264b 100644
--- a/xen/arch/x86/mm/hap/nested_hap.c
+++ b/xen/arch/x86/mm/hap/nested_hap.c
@@ -217,7 +217,7 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa,
     /* let caller to handle these two cases */
     switch (rv) {
     case NESTEDHVM_PAGEFAULT_INJECT:
-        return rv;
+    case NESTEDHVM_PAGEFAULT_RETRY:
     case NESTEDHVM_PAGEFAULT_L1_ERROR:
         return rv;
     case NESTEDHVM_PAGEFAULT_DONE:
diff --git a/xen/arch/x86/mm/shadow/multi.c b/xen/arch/x86/mm/shadow/multi.c
index 4967da1..409198c 100644
--- a/xen/arch/x86/mm/shadow/multi.c
+++ b/xen/arch/x86/mm/shadow/multi.c
@@ -4582,7 +4582,7 @@ static mfn_t emulate_gva_to_mfn(struct vcpu *v,
     /* Translate the GFN to an MFN */
     ASSERT(!paging_locked_by_me(v->domain));
     mfn = get_gfn(v->domain, _gfn(gfn), &p2mt);
-        
+
     if ( p2m_is_readonly(p2mt) )
     {
         put_gfn(v->domain, gfn);
diff --git a/xen/include/asm-x86/guest_pt.h b/xen/include/asm-x86/guest_pt.h
index 4e1dda0..db8a0b6 100644
--- a/xen/include/asm-x86/guest_pt.h
+++ b/xen/include/asm-x86/guest_pt.h
@@ -315,6 +315,14 @@ guest_walk_to_page_order(walk_t *gw)
 #define GPT_RENAME2(_n, _l) _n ## _ ## _l ## _levels
 #define GPT_RENAME(_n, _l) GPT_RENAME2(_n, _l)
 #define guest_walk_tables GPT_RENAME(guest_walk_tables, GUEST_PAGING_LEVELS)
+#define map_domain_gfn GPT_RENAME(map_domain_gfn, GUEST_PAGING_LEVELS)
+
+extern void *map_domain_gfn(struct p2m_domain *p2m,
+                                   gfn_t gfn,
+                                   mfn_t *mfn,
+                                   p2m_type_t *p2mt,
+                                   p2m_query_t q,
+                                   uint32_t *rc);
 
 extern uint32_t 
 guest_walk_tables(struct vcpu *v, struct p2m_domain *p2m, unsigned long va,
diff --git a/xen/include/asm-x86/hvm/nestedhvm.h b/xen/include/asm-x86/hvm/nestedhvm.h
index 91fde0b..4c489d2 100644
--- a/xen/include/asm-x86/hvm/nestedhvm.h
+++ b/xen/include/asm-x86/hvm/nestedhvm.h
@@ -47,11 +47,13 @@ bool_t nestedhvm_vcpu_in_guestmode(struct vcpu *v);
     vcpu_nestedhvm(v).nv_guestmode = 0
 
 /* Nested paging */
+#define NESTEDHVM_PAGEFAULT_UNHANDLED -1
 #define NESTEDHVM_PAGEFAULT_DONE       0
 #define NESTEDHVM_PAGEFAULT_INJECT     1
 #define NESTEDHVM_PAGEFAULT_L1_ERROR   2
 #define NESTEDHVM_PAGEFAULT_L0_ERROR   3
 #define NESTEDHVM_PAGEFAULT_MMIO       4
+#define NESTEDHVM_PAGEFAULT_RETRY      5
 int nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa,
     bool_t access_r, bool_t access_w, bool_t access_x);
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index ef2c9c9..9a728b6 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -194,6 +194,7 @@ extern u32 vmx_secondary_exec_control;
 
 extern bool_t cpu_has_vmx_ins_outs_instr_info;
 
+#define VMX_EPT_EXEC_ONLY_SUPPORTED             0x00000001
 #define VMX_EPT_WALK_LENGTH_4_SUPPORTED         0x00000040
 #define VMX_EPT_MEMORY_TYPE_UC                  0x00000100
 #define VMX_EPT_MEMORY_TYPE_WB                  0x00004000
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index aa5b080..feaaa80 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -51,6 +51,11 @@ typedef union {
     u64 epte;
 } ept_entry_t;
 
+typedef struct {
+    /*use lxe[0] to save result */
+    ept_entry_t lxe[5];
+} ept_walk_t;
+
 #define EPT_TABLE_ORDER         9
 #define EPTE_SUPER_PAGE_MASK    0x80
 #define EPTE_MFN_MASK           0xffffffffff000ULL
@@ -60,6 +65,28 @@ typedef union {
 #define EPTE_AVAIL1_SHIFT       8
 #define EPTE_EMT_SHIFT          3
 #define EPTE_IGMT_SHIFT         6
+#define EPTE_RWX_MASK           0x7
+#define EPTE_FLAG_MASK          0x7f
+
+#define EPT_EMT_UC              0
+#define EPT_EMT_WC              1
+#define EPT_EMT_RSV0            2
+#define EPT_EMT_RSV1            3
+#define EPT_EMT_WT              4
+#define EPT_EMT_WP              5
+#define EPT_EMT_WB              6
+#define EPT_EMT_RSV2            7
+
+typedef enum {
+    ept_access_n     = 0, /* No access permissions allowed */
+    ept_access_r     = 1,
+    ept_access_w     = 2,
+    ept_access_rw    = 3,
+    ept_access_x     = 4,
+    ept_access_rx    = 5,
+    ept_access_wx    = 6,
+    ept_access_all   = 7,
+} ept_access_t;
 
 void vmx_asm_vmexit_handler(struct cpu_user_regs);
 void vmx_asm_do_vmentry(void);
@@ -419,6 +446,7 @@ void update_guest_eip(void);
 #define _EPT_GLA_FAULT              8
 #define EPT_GLA_FAULT               (1UL<<_EPT_GLA_FAULT)
 
+#define EPT_L4_PAGETABLE_SHIFT      39
 #define EPT_PAGETABLE_ENTRIES       512
 
 #endif /* __ASM_X86_HVM_VMX_VMX_H__ */
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index 422f006..245fddb 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -32,6 +32,10 @@ struct nestedvmx {
         unsigned long intr_info;
         u32           error_code;
     } intr;
+    struct {
+        uint32_t exit_reason;
+        uint32_t exit_qual;
+    } ept_exit;
 };
 
 #define vcpu_2_nvmx(v)	(vcpu_nestedhvm(v).u.nvmx)
@@ -109,6 +113,13 @@ void nvmx_domain_relinquish_resources(struct domain *d);
 int nvmx_handle_vmxon(struct cpu_user_regs *regs);
 int nvmx_handle_vmxoff(struct cpu_user_regs *regs);
 
+#define EPT_TRANSLATE_UNSUPPORTED  -1
+#define EPT_TRANSLATE_SUCCEED       0
+#define EPT_TRANSLATE_VIOLATION     1
+#define EPT_TRANSLATE_ERR_PAGE      2
+#define EPT_TRANSLATE_MISCONFIG     3
+#define EPT_TRANSLATE_RETRY         4
+
 int
 nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
                       unsigned int *page_order,
@@ -192,5 +203,9 @@ u64 nvmx_get_tsc_offset(struct vcpu *v);
 int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
                           unsigned int exit_reason);
 
+int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga, 
+                        unsigned int *page_order, uint32_t rwx_acc,
+                        unsigned long *l1gfn, uint64_t *exit_qual,
+                        uint32_t *exit_reason);
 #endif /* __ASM_X86_HVM_VVMX_H__ */
 
-- 
1.7.1
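
The arithmetic in the walker above (per-level table index, 4K frame inside a superpage, rwx misconfiguration and permission checks, and the exit qualification handed back to L1) can be tried outside Xen. The following is a standalone sketch that mirrors those formulas. The function and macro names are hypothetical and chosen for illustration, only the constants 39, 9, 12 and the mask 0xffffffc0 are taken from the patch, and nothing here touches real page tables.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT     12
#define TABLE_ORDER    9                      /* 512 entries per table */
#define TABLE_ENTRIES  (1u << TABLE_ORDER)
#define L4_SHIFT       39                     /* EPT_L4_PAGETABLE_SHIFT in the patch */
#define RWX_MASK       0x7u                   /* bit 0 = r, bit 1 = w, bit 2 = x */

/* Index into the table at level lvl (4 = top, 1 = leaf) for address gpa. */
static unsigned int table_index(uint64_t gpa, int lvl)
{
    assert(lvl >= 1 && lvl <= 4);
    return (unsigned int)((gpa >> (L4_SHIFT - (4 - lvl) * TABLE_ORDER)) &
                          (TABLE_ENTRIES - 1));
}

/* 4K frame number inside a superpage mapped at level lvl (2 = 2M, 3 = 1G). */
static uint64_t superpage_gfn(uint64_t base_gfn, uint64_t gpa, int lvl)
{
    uint64_t mask = (1ull << ((lvl - 1) * TABLE_ORDER)) - 1;

    return (base_gfn & ~mask) + ((gpa >> PAGE_SHIFT) & mask);
}

/*
 * Write-only and write+execute are always misconfigurations; execute-only
 * is one unless the (virtual) CPU advertises support for it.
 */
static int rwx_misconfigured(uint32_t rwx, int exec_only_supported)
{
    rwx &= RWX_MASK;
    if (rwx == 2 || rwx == 6)
        return 1;
    return rwx == 4 && !exec_only_supported;
}

/* Non-zero when every requested access bit is granted. */
static int access_allowed(uint32_t requested, uint32_t granted)
{
    return !(RWX_MASK & requested & ~granted);
}

/*
 * Exit qualification as composed in the patch: bits 2:0 carry the access
 * that was attempted, bits 5:3 the permissions the walk found, and the
 * remaining bits under the 0xffffffc0 mask are kept from the hardware value.
 */
static uint64_t make_exit_qual(uint64_t hw_qual, uint32_t requested,
                               uint32_t granted)
{
    return (hw_qual & 0xffffffc0ull) |
           ((uint64_t)(granted & RWX_MASK) << 3) |
           (requested & RWX_MASK);
}
```

For example, a write to a page whose combined permissions are read+execute fails access_allowed(), and make_exit_qual() then reports access bits 010b with granted bits 101b.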

Xiantao Zhang

2012-Dec-20 15:43 UTC

[PATCH v3 04/10] EPT: Make ept data structure or operations neutral

From: Zhang Xiantao <xiantao.zhang@intel.com>

Share the current EPT logic with the nested EPT case by making
the related data structures and operations neutral to
common EPT and nested EPT.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c        |    9 +++-
 xen/arch/x86/hvm/vmx/vmx.c         |   53 ++-----------------
 xen/arch/x86/mm/p2m-ept.c          |  104 ++++++++++++++++++++++++++++--------
 xen/arch/x86/mm/p2m.c              |   23 ++++++---
 xen/include/asm-x86/hvm/vmx/vmcs.h |   23 ++++----
 xen/include/asm-x86/hvm/vmx/vmx.h  |   10 +++-
 xen/include/asm-x86/p2m.h          |    4 ++
 7 files changed, 133 insertions(+), 93 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 9adc7a4..379b75c 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -941,8 +941,13 @@ static int construct_vmcs(struct vcpu *v)
         __vmwrite(TPR_THRESHOLD, 0);
     }
 
-    if ( paging_mode_hap(d) )
-        __vmwrite(EPT_POINTER, d->arch.hvm_domain.vmx.ept_control.eptp);
+    if ( paging_mode_hap(d) ) {
+        struct p2m_domain *p2m = p2m_get_hostp2m(d);
+        struct ept_data *ept = &p2m->ept;
+
+        ept->asr  = pagetable_get_pfn(p2m_get_pagetable(p2m));
+        __vmwrite(EPT_POINTER, ept_get_eptp(ept));
+    }
 
     if ( cpu_has_vmx_pat && paging_mode_hap(d) )
     {
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 4abfa90..d74aae0 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -74,38 +74,19 @@ static void vmx_fpu_dirty_intercept(void);
 static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content);
 static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content);
 static void vmx_invlpg_intercept(unsigned long vaddr);
-static void __ept_sync_domain(void *info);
 
 static int vmx_domain_initialise(struct domain *d)
 {
     int rc;
 
-    /* Set the memory type used when accessing EPT paging structures. */
-    d->arch.hvm_domain.vmx.ept_control.ept_mt = EPT_DEFAULT_MT;
-
-    /* set EPT page-walk length, now it's actual walk length - 1, i.e. 3 */
-    d->arch.hvm_domain.vmx.ept_control.ept_wl = 3;
-
-    d->arch.hvm_domain.vmx.ept_control.asr  =
-        pagetable_get_pfn(p2m_get_pagetable(p2m_get_hostp2m(d)));
-
-    if ( !zalloc_cpumask_var(&d->arch.hvm_domain.vmx.ept_synced) )
-        return -ENOMEM;
-
     if ( (rc = vmx_alloc_vlapic_mapping(d)) != 0 )
-    {
-        free_cpumask_var(d->arch.hvm_domain.vmx.ept_synced);
         return rc;
-    }
 
     return 0;
 }
 
 static void vmx_domain_destroy(struct domain *d)
 {
-    if ( paging_mode_hap(d) )
-        on_each_cpu(__ept_sync_domain, d, 1);
-    free_cpumask_var(d->arch.hvm_domain.vmx.ept_synced);
     vmx_free_vlapic_mapping(d);
 }
 
@@ -641,6 +622,7 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
 {
     struct domain *d = v->domain;
     unsigned long old_cr4 = read_cr4(), new_cr4 = mmu_cr4_features;
+    struct ept_data *ept_data = &p2m_get_hostp2m(d)->ept;
 
     /* HOST_CR4 in VMCS is always mmu_cr4_features. Sync CR4 now. */
     if ( old_cr4 != new_cr4 )
@@ -650,10 +632,10 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
     {
         unsigned int cpu = smp_processor_id();
         /* Test-and-test-and-set this CPU in the EPT-is-synced mask. */
-        if ( !cpumask_test_cpu(cpu, d->arch.hvm_domain.vmx.ept_synced) &&
+        if ( !cpumask_test_cpu(cpu, ept_get_synced_mask(ept_data)) &&
              !cpumask_test_and_set_cpu(cpu,
-                                       d->arch.hvm_domain.vmx.ept_synced) )
-            __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(d), 0);
+                                       ept_get_synced_mask(ept_data)) )
+            __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept_data), 0);
     }
 
     vmx_restore_guest_msrs(v);
@@ -1216,33 +1198,6 @@ static void vmx_update_guest_efer(struct vcpu *v)
                    (v->arch.hvm_vcpu.guest_efer & EFER_SCE));
 }
 
-static void __ept_sync_domain(void *info)
-{
-    struct domain *d = info;
-    __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(d), 0);
-}
-
-void ept_sync_domain(struct domain *d)
-{
-    /* Only if using EPT and this domain has some VCPUs to dirty. */
-    if ( !paging_mode_hap(d) || !d->vcpu || !d->vcpu[0] )
-        return;
-
-    ASSERT(local_irq_is_enabled());
-
-    /*
-     * Flush active cpus synchronously. Flush others the next time this domain
-     * is scheduled onto them. We accept the race of other CPUs adding to
-     * the ept_synced mask before on_selected_cpus() reads it, resulting in
-     * unnecessary extra flushes, to avoid allocating a cpumask_t on the stack.
-     */
-    cpumask_and(d->arch.hvm_domain.vmx.ept_synced,
-                d->domain_dirty_cpumask, &cpu_online_map);
-
-    on_selected_cpus(d->arch.hvm_domain.vmx.ept_synced,
-                     __ept_sync_domain, d, 1);
-}
-
 void nvmx_enqueue_n2_exceptions(struct vcpu *v, 
             unsigned long intr_fields, int error_code)
 {
diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
index c964f54..e33f415 100644
--- a/xen/arch/x86/mm/p2m-ept.c
+++ b/xen/arch/x86/mm/p2m-ept.c
@@ -291,9 +291,11 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
     int need_modify_vtd_table = 1;
     int vtd_pte_present = 0;
     int needs_sync = 1;
-    struct domain *d = p2m->domain;
     ept_entry_t old_entry = { .epte = 0 };
+    struct ept_data *ept = &p2m->ept;
+    struct domain *d = p2m->domain;
 
+    ASSERT(ept);
     /*
      * the caller must make sure:
      * 1. passing valid gfn and mfn at order boundary.
@@ -301,17 +303,17 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long gfn, mfn_t mfn,
      * 3. passing a valid order.
      */
     if ( ((gfn | mfn_x(mfn)) & ((1UL << order) - 1)) ||
-         ((u64)gfn >> ((ept_get_wl(d) + 1) * EPT_TABLE_ORDER)) ||
+         ((u64)gfn >> ((ept_get_wl(ept) + 1) * EPT_TABLE_ORDER)) ||
          (order % EPT_TABLE_ORDER) )
         return 0;
 
-    ASSERT((target == 2 && hvm_hap_has_1gb(d)) ||
-           (target == 1 && hvm_hap_has_2mb(d)) ||
+    ASSERT((target == 2 && hvm_hap_has_1gb()) ||
+           (target == 1 && hvm_hap_has_2mb()) ||
            (target == 0));
 
-    table = map_domain_page(ept_get_asr(d));
+    table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
 
-    for ( i = ept_get_wl(d); i > target; i-- )
+    for ( i = ept_get_wl(ept); i > target; i-- )
     {
         ret = ept_next_level(p2m, 0, &table, &gfn_remainder, i);
         if ( !ret )
@@ -439,9 +441,11 @@ out:
     unmap_domain_page(table);
 
     if ( needs_sync )
-        ept_sync_domain(p2m->domain);
+        ept_sync_domain(p2m);
 
-    if ( rv && iommu_enabled && need_iommu(p2m->domain) && need_modify_vtd_table )
+    /* For non-nested p2m, may need to change VT-d page table.*/
+    if ( rv && !p2m_is_nestedp2m(p2m) && iommu_enabled && need_iommu(p2m->domain) &&
+                need_modify_vtd_table )
     {
         if ( iommu_hap_pt_share )
             iommu_pte_flush(d, gfn, (u64*)ept_entry, order, vtd_pte_present);
@@ -488,14 +492,14 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
                            unsigned long gfn, p2m_type_t *t, p2m_access_t* a,
                            p2m_query_t q, unsigned int *page_order)
 {
-    struct domain *d = p2m->domain;
-    ept_entry_t *table = map_domain_page(ept_get_asr(d));
+    ept_entry_t *table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
     unsigned long gfn_remainder = gfn;
     ept_entry_t *ept_entry;
     u32 index;
     int i;
     int ret = 0;
     mfn_t mfn = _mfn(INVALID_MFN);
+    struct ept_data *ept = &p2m->ept;
 
     *t = p2m_mmio_dm;
     *a = p2m_access_n;
@@ -506,7 +510,7 @@ static mfn_t ept_get_entry(struct p2m_domain *p2m,
 
     /* Should check if gfn obeys GAW here. */
 
-    for ( i = ept_get_wl(d); i > 0; i-- )
+    for ( i = ept_get_wl(ept); i > 0; i-- )
     {
     retry:
         ret = ept_next_level(p2m, 1, &table, &gfn_remainder, i);
@@ -588,19 +592,20 @@ out:
 static ept_entry_t ept_get_entry_content(struct p2m_domain *p2m,
     unsigned long gfn, int *level)
 {
-    ept_entry_t *table = map_domain_page(ept_get_asr(p2m->domain));
+    ept_entry_t *table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
     unsigned long gfn_remainder = gfn;
     ept_entry_t *ept_entry;
     ept_entry_t content = { .epte = 0 };
     u32 index;
     int i;
     int ret=0;
+    struct ept_data *ept = &p2m->ept;
 
     /* This pfn is higher than the highest the p2m map currently holds */
     if ( gfn > p2m->max_mapped_pfn )
         goto out;
 
-    for ( i = ept_get_wl(p2m->domain); i > 0; i-- )
+    for ( i = ept_get_wl(ept); i > 0; i-- )
     {
         ret = ept_next_level(p2m, 1, &table, &gfn_remainder, i);
         if ( !ret || ret == GUEST_TABLE_POD_PAGE )
@@ -622,7 +627,8 @@ static ept_entry_t ept_get_entry_content(struct p2m_domain *p2m,
 void ept_walk_table(struct domain *d, unsigned long gfn)
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
-    ept_entry_t *table = map_domain_page(ept_get_asr(d));
+    struct ept_data *ept = &p2m->ept;
+    ept_entry_t *table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
     unsigned long gfn_remainder = gfn;
 
     int i;
@@ -638,7 +644,7 @@ void ept_walk_table(struct domain *d, unsigned long gfn)
         goto out;
     }
 
-    for ( i = ept_get_wl(d); i >= 0; i-- )
+    for ( i = ept_get_wl(ept); i >= 0; i-- )
     {
         ept_entry_t *ept_entry, *next;
         u32 index;
@@ -778,24 +784,76 @@ static void ept_change_entry_type_page(mfn_t ept_page_mfn, int ept_page_level,
 static void ept_change_entry_type_global(struct p2m_domain *p2m,
                                          p2m_type_t ot, p2m_type_t nt)
 {
-    struct domain *d = p2m->domain;
-    if ( ept_get_asr(d) == 0 )
+    struct ept_data *ept = &p2m->ept;
+    if ( ept_get_asr(ept) == 0 )
         return;
 
     BUG_ON(p2m_is_grant(ot) || p2m_is_grant(nt));
     BUG_ON(ot != nt && (ot == p2m_mmio_direct || nt == p2m_mmio_direct));
 
-    ept_change_entry_type_page(_mfn(ept_get_asr(d)), ept_get_wl(d), ot, nt);
+    ept_change_entry_type_page(_mfn(ept_get_asr(ept)),
+            ept_get_wl(ept), ot, nt);
+
+    ept_sync_domain(p2m);
+}
+
+static void __ept_sync_domain(void *info)
+{
+    struct ept_data *ept = &((struct p2m_domain *)info)->ept;
 
-    ept_sync_domain(d);
+    __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept), 0);
 }
 
-void ept_p2m_init(struct p2m_domain *p2m)
+void ept_sync_domain(struct p2m_domain *p2m)
 {
+    struct domain *d = p2m->domain;
+    struct ept_data *ept = &p2m->ept;
+    /* Only if using EPT and this domain has some VCPUs to dirty. */
+    if ( !paging_mode_hap(d) || !d->vcpu || !d->vcpu[0] )
+        return;
+
+    ASSERT(local_irq_is_enabled());
+
+    /*
+     * Flush active cpus synchronously. Flush others the next time this domain
+     * is scheduled onto them. We accept the race of other CPUs adding to
+     * the ept_synced mask before on_selected_cpus() reads it, resulting in
+     * unnecessary extra flushes, to avoid allocating a cpumask_t on the stack.
+     */
+    cpumask_and(ept_get_synced_mask(ept),
+                d->domain_dirty_cpumask, &cpu_online_map);
+
+    on_selected_cpus(ept_get_synced_mask(ept),
+                     __ept_sync_domain, p2m, 1);
+}
+
+int ept_p2m_init(struct p2m_domain *p2m)
+{
+    struct ept_data *ept = &p2m->ept;
+
     p2m->set_entry = ept_set_entry;
     p2m->get_entry = ept_get_entry;
     p2m->change_entry_type_global = ept_change_entry_type_global;
     p2m->audit_p2m = NULL;
+
+    /* Set the memory type used when accessing EPT paging structures. */
+    ept->ept_mt = EPT_DEFAULT_MT;
+
+    /* set EPT page-walk length, now it's actual walk length - 1, i.e. 3 */
+    ept->ept_wl = 3;
+
+    if ( !zalloc_cpumask_var(&ept->synced_mask) )
+        return -ENOMEM;
+
+    on_each_cpu(__ept_sync_domain, p2m, 1);
+
+    return 0;
+}
+
+void ept_p2m_uninit(struct p2m_domain *p2m)
+{
+    struct ept_data *ept = &p2m->ept;
+    free_cpumask_var(ept->synced_mask);
 }
 
 static void ept_dump_p2m_table(unsigned char key)
@@ -811,6 +869,7 @@ static void ept_dump_p2m_table(unsigned char key)
     unsigned long gfn, gfn_remainder;
     unsigned long record_counter = 0;
     struct p2m_domain *p2m;
+    struct ept_data *ept;
 
     for_each_domain(d)
     {
@@ -818,15 +877,16 @@ static void ept_dump_p2m_table(unsigned char key)
             continue;
 
         p2m = p2m_get_hostp2m(d);
+        ept = &p2m->ept;
         printk("\ndomain%d EPT p2m table: \n", d->domain_id);
 
         for ( gfn = 0; gfn <= p2m->max_mapped_pfn; gfn += (1 << order) )
         {
             gfn_remainder = gfn;
             mfn = _mfn(INVALID_MFN);
-            table = map_domain_page(ept_get_asr(d));
+            table = map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
 
-            for ( i = ept_get_wl(d); i > 0; i-- )
+            for ( i = ept_get_wl(ept); i > 0; i-- )
             {
                 ret = ept_next_level(p2m, 1, &table, &gfn_remainder, i);
                 if ( ret != GUEST_TABLE_NORMAL_PAGE )
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 6a4bdd9..1f59410 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -57,8 +57,10 @@ boolean_param("hap_2mb", opt_hap_2mb);
 
 
 /* Init the datastructures for later use by the p2m code */
-static void p2m_initialise(struct domain *d, struct p2m_domain *p2m)
+static int p2m_initialise(struct domain *d, struct p2m_domain *p2m)
 {
+    int ret = 0;
+
     mm_rwlock_init(&p2m->lock);
     mm_lock_init(&p2m->pod.lock);
     INIT_LIST_HEAD(&p2m->np2m_list);
@@ -72,11 +74,11 @@ static void p2m_initialise(struct domain *d, struct p2m_domain *p2m)
     p2m->np2m_base = P2M_BASE_EADDR;
 
     if ( hap_enabled(d) && cpu_has_vmx )
-        ept_p2m_init(p2m);
+        ret = ept_p2m_init(p2m);
     else
         p2m_pt_init(p2m);
 
-    return;
+    return ret;
 }
 
 static int
@@ -119,7 +121,7 @@ int p2m_init(struct domain *d)
      * since nestedhvm_enabled(d) returns false here.
      * (p2m_init runs too early for HVM_PARAM_* options) */
     rc = p2m_init_nestedp2m(d);
-    if ( rc ) 
+    if ( rc )
         p2m_final_teardown(d);
     return rc;
 }
@@ -424,12 +426,16 @@ void p2m_teardown(struct p2m_domain *p2m)
 static void p2m_teardown_nestedp2m(struct domain *d)
 {
     uint8_t i;
+    struct p2m_domain *p2m;
 
     for (i = 0; i < MAX_NESTEDP2M; i++) {
         if ( !d->arch.nested_p2m[i] )
             continue;
-        free_cpumask_var(d->arch.nested_p2m[i]->dirty_cpumask);
-        xfree(d->arch.nested_p2m[i]);
+        p2m = d->arch.nested_p2m[i];
+        free_cpumask_var(p2m->dirty_cpumask);
+        if ( hap_enabled(d) && cpu_has_vmx )
+            ept_p2m_uninit(p2m);
+        xfree(p2m);
         d->arch.nested_p2m[i] = NULL;
     }
 }
@@ -437,9 +443,12 @@ static void p2m_teardown_nestedp2m(struct domain *d)
 void p2m_final_teardown(struct domain *d)
 {
     /* Iterate over all p2m tables per domain */
-    if ( d->arch.p2m )
+    struct p2m_domain *p2m = p2m_get_hostp2m(d);
+    if ( p2m )
     {
         free_cpumask_var(d->arch.p2m->dirty_cpumask);
+        if ( hap_enabled(d) && cpu_has_vmx )
+            ept_p2m_uninit(p2m);
         xfree(d->arch.p2m);
         d->arch.p2m = NULL;
     }
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 9a728b6..2d38b43 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -56,26 +56,27 @@ struct vmx_msr_state {
 
 #define EPT_DEFAULT_MT      MTRR_TYPE_WRBACK
 
-struct vmx_domain {
-    unsigned long apic_access_mfn;
+struct ept_data{
     union {
-        struct {
+    struct {
             u64 ept_mt :3,
                 ept_wl :3,
                 rsvd   :6,
                 asr    :52;
         };
         u64 eptp;
-    } ept_control;
-    cpumask_var_t ept_synced;
+    };
+    cpumask_var_t synced_mask;
+};
+
+struct vmx_domain {
+    unsigned long apic_access_mfn;
 };
 
-#define ept_get_wl(d)   \
-    ((d)->arch.hvm_domain.vmx.ept_control.ept_wl)
-#define ept_get_asr(d)  \
-    ((d)->arch.hvm_domain.vmx.ept_control.asr)
-#define ept_get_eptp(d) \
-    ((d)->arch.hvm_domain.vmx.ept_control.eptp)
+#define ept_get_wl(ept)   ((ept)->ept_wl)
+#define ept_get_asr(ept)  ((ept)->asr)
+#define ept_get_eptp(ept) ((ept)->eptp)
+#define ept_get_synced_mask(ept) ((ept)->synced_mask)
 
 struct arch_vmx_struct {
     /* Virtual address of VMCS. */
diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h b/xen/include/asm-x86/hvm/vmx/vmx.h
index feaaa80..2600694 100644
--- a/xen/include/asm-x86/hvm/vmx/vmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vmx.h
@@ -360,7 +360,7 @@ static inline void ept_sync_all(void)
     __invept(INVEPT_ALL_CONTEXT, 0, 0);
 }
 
-void ept_sync_domain(struct domain *d);
+void ept_sync_domain(struct p2m_domain *p2m);
 
 static inline void vpid_sync_vcpu_gva(struct vcpu *v, unsigned long gva)
 {
@@ -422,12 +422,18 @@ void vmx_get_segment_register(struct vcpu *, enum x86_segment,
 void vmx_inject_extint(int trap);
 void vmx_inject_nmi(void);
 
-void ept_p2m_init(struct p2m_domain *p2m);
+int ept_p2m_init(struct p2m_domain *p2m);
+void ept_p2m_uninit(struct p2m_domain *p2m);
+
 void ept_walk_table(struct domain *d, unsigned long gfn);
 void setup_ept_dump(void);
 
 void update_guest_eip(void);
 
+int alloc_p2m_hap_data(struct p2m_domain *p2m);
+void free_p2m_hap_data(struct p2m_domain *p2m);
+void p2m_init_hap_data(struct p2m_domain *p2m);
+
 /* EPT violation qualifications definitions */
 #define _EPT_READ_VIOLATION         0
 #define EPT_READ_VIOLATION          (1UL<<_EPT_READ_VIOLATION)
diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
index ce26594..b6a84b6 100644
--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -277,6 +277,10 @@ struct p2m_domain {
         mm_lock_t        lock;         /* Locking of private pod structs,   *
                                         * not relying on the p2m lock.      */
     } pod;
+    union {
+        struct ept_data ept;
+        /* NPT-equivalent structure could be added here. */
+    };
 };
 
 /* get host p2m table */
-- 
1.7.1
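
The ept_data union in the patch above packs the EPT memory type, the page-walk length and the address-space root (ASR) into one 64-bit EPTP. A minimal sketch of that layout follows, assuming the field widths from the patch (3/3/6/52 bits, lowest bits first). It uses explicit shifts instead of bitfields so the result does not depend on compiler bitfield ordering. The helper names eptp_make, eptp_wl and eptp_asr are hypothetical and are not Xen functions.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions assumed from the ept_data bitfield in the patch:
 * ept_mt:3, ept_wl:3, rsvd:6, asr:52 (lowest bits first). */
#define EPTP_MT_MASK    0x7ULL
#define EPTP_WL_SHIFT   3
#define EPTP_WL_MASK    0x7ULL
#define EPTP_ASR_SHIFT  12

/* Hypothetical helper: compose an EPTP value from its three fields. */
static uint64_t eptp_make(unsigned int mt, unsigned int wl, uint64_t asr)
{
    assert(mt <= EPTP_MT_MASK);          /* memory type fits in 3 bits */
    assert(wl <= EPTP_WL_MASK);          /* walk length fits in 3 bits */
    assert(asr < (1ULL << 52));          /* ASR is a 52-bit frame number */

    return (uint64_t)mt |
           ((uint64_t)wl << EPTP_WL_SHIFT) |
           (asr << EPTP_ASR_SHIFT);
}

/* Hypothetical helper: extract the page-walk length field (walk length - 1). */
static unsigned int eptp_wl(uint64_t eptp)
{
    return (unsigned int)((eptp >> EPTP_WL_SHIFT) & EPTP_WL_MASK);
}

/* Hypothetical helper: extract the root table frame number. */
static uint64_t eptp_asr(uint64_t eptp)
{
    return eptp >> EPTP_ASR_SHIFT;
}
```

With the defaults set in ept_p2m_init (write-back memory type, which is 6 in the MTRR encoding, and ept_wl = 3), a root table at frame 0x1234 would give an EPTP of 0x123401e under these assumptions.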

Xiantao Zhang

2012-Dec-20 15:43 UTC

[PATCH v3 05/10] nEPT: Try to enable EPT paging for L2 guest.

From: Zhang Xiantao <xiantao.zhang@intel.com>

Once EPT is found to be enabled by the L1 VMM, enable nested EPT support
for the L2 guest.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Tim Deegan <tim@xen.org>
---
 xen/arch/x86/hvm/vmx/vmx.c         |   16 +++++++++--
 xen/arch/x86/hvm/vmx/vvmx.c        |   48 +++++++++++++++++++++++++++--------
 xen/include/asm-x86/hvm/vmx/vvmx.h |    5 +++-
 3 files changed, 54 insertions(+), 15 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index d74aae0..ed8d532 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -1461,6 +1461,7 @@ static struct hvm_function_table __read_mostly vmx_function_table = {
     .nhvm_vcpu_guestcr3   = nvmx_vcpu_guestcr3,
     .nhvm_vcpu_p2m_base   = nvmx_vcpu_eptp_base,
     .nhvm_vcpu_asid       = nvmx_vcpu_asid,
+    .nhvm_vmcx_hap_enabled = nvmx_ept_enabled,
     .nhvm_vmcx_guest_intercepts_trap = nvmx_intercepts_exception,
     .nhvm_vcpu_vmexit_trap = nvmx_vmexit_trap,
     .nhvm_intr_blocked    = nvmx_intr_blocked,
@@ -2003,6 +2004,7 @@ static void ept_handle_violation(unsigned long qualification, paddr_t gpa)
     unsigned long gla, gfn = gpa >> PAGE_SHIFT;
     mfn_t mfn;
     p2m_type_t p2mt;
+    int ret;
     struct domain *d = current->domain;
 
     if ( tb_init_done )
@@ -2017,18 +2019,26 @@ static void ept_handle_violation(unsigned long qualification, paddr_t gpa)
         _d.gpa = gpa;
         _d.qualification = qualification;
         _d.mfn = mfn_x(get_gfn_query_unlocked(d, gfn, &_d.p2mt));
-        
+
         __trace_var(TRC_HVM_NPF, 0, sizeof(_d), &_d);
     }
 
-    if ( hvm_hap_nested_page_fault(gpa,
+    ret = hvm_hap_nested_page_fault(gpa,
                                    qualification & EPT_GLA_VALID       ? 1 : 0,
                                    qualification & EPT_GLA_VALID
                                      ? __vmread(GUEST_LINEAR_ADDRESS) : ~0ull,
                                    qualification & EPT_READ_VIOLATION  ? 1 : 0,
                                    qualification & EPT_WRITE_VIOLATION ? 1 : 0,
-                                   qualification & EPT_EXEC_VIOLATION  ? 1 : 0) )
+                                   qualification & EPT_EXEC_VIOLATION  ? 1 : 0);
+    switch ( ret ) {
+    case 0:         // Unhandled L1 EPT violation
+        break;
+    case 1:         // This violation is handled completely
         return;
+    case -1:        // This violation should be injected to L1 VMM
+        vcpu_nestedhvm(current).nv_vmexit_pending = 1;
+        return;
+    }
 
     /* Everything else is an error. */
     mfn = get_gfn_query_unlocked(d, gfn, &p2mt);
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index f9e620c..2ae6f6a 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -41,6 +41,7 @@ int nvmx_vcpu_initialise(struct vcpu *v)
         gdprintk(XENLOG_ERR, "nest: allocation for shadow vmcs failed\n");
 	goto out;
     }
+    nvmx->ept.enabled = 0;
     nvmx->vmxon_region_pa = 0;
     nvcpu->nv_vvmcx = NULL;
     nvcpu->nv_vvmcxaddr = VMCX_EADDR;
@@ -96,9 +97,11 @@ uint64_t nvmx_vcpu_guestcr3(struct vcpu *v)
 
 uint64_t nvmx_vcpu_eptp_base(struct vcpu *v)
 {
-    /* TODO */
-    ASSERT(0);
-    return 0;
+    uint64_t eptp_base;
+    struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
+
+    eptp_base = __get_vvmcs(nvcpu->nv_vvmcx, EPT_POINTER);
+    return eptp_base & PAGE_MASK; 
 }
 
 uint32_t nvmx_vcpu_asid(struct vcpu *v)
@@ -108,6 +111,13 @@ uint32_t nvmx_vcpu_asid(struct vcpu *v)
     return 0;
 }
 
+bool_t nvmx_ept_enabled(struct vcpu *v)
+{
+    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
+
+    return !!(nvmx->ept.enabled);
+}
+
 static const enum x86_segment sreg_to_index[] = {
     [VMX_SREG_ES] = x86_seg_es,
     [VMX_SREG_CS] = x86_seg_cs,
@@ -503,14 +513,16 @@ void nvmx_update_exec_control(struct vcpu *v, u32 host_cntrl)
 }
 
 void nvmx_update_secondary_exec_control(struct vcpu *v,
-                                            unsigned long value)
+                                            unsigned long host_cntrl)
 {
     u32 shadow_cntrl;
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
+    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
 
     shadow_cntrl = __get_vvmcs(nvcpu->nv_vvmcx, SECONDARY_VM_EXEC_CONTROL);
-    shadow_cntrl |= value;
-    set_shadow_control(v, SECONDARY_VM_EXEC_CONTROL, shadow_cntrl);
+    nvmx->ept.enabled = !!(shadow_cntrl & SECONDARY_EXEC_ENABLE_EPT);
+    shadow_cntrl |= host_cntrl;
+    __vmwrite(SECONDARY_VM_EXEC_CONTROL, shadow_cntrl);
 }
 
 static void nvmx_update_pin_control(struct vcpu *v, unsigned long host_cntrl)
@@ -818,6 +830,17 @@ static void load_shadow_guest_state(struct vcpu *v)
     /* TODO: CR3 target control */
 }
 
+
+static uint64_t get_shadow_eptp(struct vcpu *v)
+{
+    uint64_t np2m_base = nvmx_vcpu_eptp_base(v);
+    struct p2m_domain *p2m = p2m_get_nestedp2m(v, np2m_base);
+    struct ept_data *ept = &p2m->ept;
+
+    ept->asr = pagetable_get_pfn(p2m_get_pagetable(p2m));
+    return ept_get_eptp(ept);
+}
+
 static void virtual_vmentry(struct cpu_user_regs *regs)
 {
     struct vcpu *v = current;
@@ -862,7 +885,10 @@ static void virtual_vmentry(struct cpu_user_regs *regs)
     /* updating host cr0 to sync TS bit */
     __vmwrite(HOST_CR0, v->arch.hvm_vmx.host_cr0);
 
-    /* TODO: EPT_POINTER */
+    /* Set up virtual EPTP for L2 guest */
+    if ( nestedhvm_paging_mode_hap(v) )
+        __vmwrite(EPT_POINTER, get_shadow_eptp(v));
+
 }
 
 static void sync_vvmcs_guest_state(struct vcpu *v, struct cpu_user_regs *regs)
@@ -915,8 +941,8 @@ static void sync_vvmcs_ro(struct vcpu *v)
     /* Adjust exit_reason/exit_qualifciation for violation case */
     if ( __get_vvmcs(vvmcs, VM_EXIT_REASON) ==
                 EXIT_REASON_EPT_VIOLATION ) {
-        __set_vvmcs(vvmcs, EXIT_QUALIFICATION, nvmx->ept_exit.exit_qual);
-        __set_vvmcs(vvmcs, VM_EXIT_REASON, nvmx->ept_exit.exit_reason);
+        __set_vvmcs(vvmcs, EXIT_QUALIFICATION, nvmx->ept.exit_qual);
+        __set_vvmcs(vvmcs, VM_EXIT_REASON, nvmx->ept.exit_reason);
     }
 }
 
@@ -1480,8 +1506,8 @@ nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
         case EPT_TRANSLATE_VIOLATION:
         case EPT_TRANSLATE_MISCONFIG:
             rc = NESTEDHVM_PAGEFAULT_INJECT;
-            nvmx->ept_exit.exit_reason = exit_reason;
-            nvmx->ept_exit.exit_qual = exit_qual;
+            nvmx->ept.exit_reason = exit_reason;
+            nvmx->ept.exit_qual = exit_qual;
             break;
         case EPT_TRANSLATE_RETRY:
             rc = NESTEDHVM_PAGEFAULT_RETRY;
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index 245fddb..3114ec0 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -33,9 +33,10 @@ struct nestedvmx {
         u32           error_code;
     } intr;
     struct {
+        bool_t   enabled;
         uint32_t exit_reason;
         uint32_t exit_qual;
-    } ept_exit;
+    } ept;
 };
 
 #define vcpu_2_nvmx(v)	(vcpu_nestedhvm(v).u.nvmx)
@@ -110,6 +111,8 @@ int nvmx_intercepts_exception(struct vcpu *v,
                               unsigned int trap, int error_code);
 void nvmx_domain_relinquish_resources(struct domain *d);
 
+bool_t nvmx_ept_enabled(struct vcpu *v);
+
 int nvmx_handle_vmxon(struct cpu_user_regs *regs);
 int nvmx_handle_vmxoff(struct cpu_user_regs *regs);
 
-- 
1.7.1
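
The ept_handle_violation() hunk in the patch above turns the return value of hvm_hap_nested_page_fault() into a three-way decision: 1 means the violation was fully handled, -1 means it has to be reflected to the L1 VMM as a virtual vmexit, and 0 (or anything else) falls through to the existing error path. A minimal sketch of that dispatch, with the enum and function names being hypothetical and the pending flag standing in for nv_vmexit_pending:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome labels for the three cases in the patch. */
enum npf_action {
    NPF_RESUME,      /* ret == 1: handled completely, resume the guest */
    NPF_INJECT_L1,   /* ret == -1: reflect the violation to the L1 VMM */
    NPF_ERROR        /* ret == 0 or unknown: fall through to error path */
};

/* Hypothetical helper mirroring the switch added to ept_handle_violation().
 * *vmexit_pending plays the role of vcpu_nestedhvm(current).nv_vmexit_pending. */
static enum npf_action classify_npf(int ret, bool *vmexit_pending)
{
    switch ( ret )
    {
    case 1:
        return NPF_RESUME;
    case -1:
        *vmexit_pending = true;
        return NPF_INJECT_L1;
    default:
        /* The patch has no default label; unknown values reach the
         * same error handling as case 0. */
        return NPF_ERROR;
    }
}
```

Returning an explicit outcome rather than using bare return statements is only a presentation choice for the sketch; the patch itself returns directly from the handler.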

Xiantao Zhang

2012-Dec-20 15:43 UTC

[PATCH v3 06/10] nEPT: Sync PDPTR fields if L2 guest in PAE paging mode

From: Zhang Xiantao <xiantao.zhang@intel.com>

For a PAE L2 guest, the GUEST_PDPTR fields need to be synced on each virtual
vmentry.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
---
 xen/arch/x86/hvm/vmx/vvmx.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 2ae6f6a..1f7de7a 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -826,7 +826,14 @@ static void load_shadow_guest_state(struct vcpu *v)
     vvmcs_to_shadow(vvmcs, CR0_GUEST_HOST_MASK);
     vvmcs_to_shadow(vvmcs, CR4_GUEST_HOST_MASK);
 
-    /* TODO: PDPTRs for nested ept */
+    if ( nvmx_ept_enabled(v) && hvm_pae_enabled(v) &&
+                    (v->arch.hvm_vcpu.guest_efer & EFER_LMA) ) {
+	    vvmcs_to_shadow(vvmcs, GUEST_PDPTR0);
+	    vvmcs_to_shadow(vvmcs, GUEST_PDPTR1);
+	    vvmcs_to_shadow(vvmcs, GUEST_PDPTR2);
+	    vvmcs_to_shadow(vvmcs, GUEST_PDPTR3);
+    }
+
     /* TODO: CR3 target control */
 }
 
-- 
1.7.1
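
The condition guarding the PDPTR sync above can be written as a standalone predicate. The sketch below follows the Intel SDM definition of PAE paging (CR0.PG = 1, CR4.PAE = 1, long mode not active), which means it tests EFER.LMA as clear. The hunk as posted tests EFER_LMA as set, so the difference is worth checking against the tree before relying on either form. The function name and its parameters are hypothetical; only the bit positions are architectural.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions. */
#define X86_CR0_PG   (1ULL << 31)   /* paging enabled */
#define X86_CR4_PAE  (1ULL << 5)    /* physical address extension */
#define EFER_LMA     (1ULL << 10)   /* long mode active */

/* Hypothetical predicate: does the L2 guest use the four PDPTEs, so that
 * GUEST_PDPTR0..3 must be copied from the virtual VMCS on vmentry?
 * Assumes the SDM definition of PAE paging (long mode not active). */
static bool l2_needs_pdptr_sync(bool nested_ept, uint64_t cr0,
                                uint64_t cr4, uint64_t efer)
{
    return nested_ept &&
           (cr0 & X86_CR0_PG) &&
           (cr4 & X86_CR4_PAE) &&
           !(efer & EFER_LMA);
}
```

Without nested EPT the guest PDPTR fields are not synced at all, which matches the first term of the condition in the patch.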

Xiantao Zhang

2012-Dec-20 15:43 UTC

[PATCH v3 07/10] nEPT: Use minimal permission for nested p2m.

From: Zhang Xiantao <xiantao.zhang@intel.com>

Emulate the permission check for the nested p2m. The current solution is to
use the minimal permission; when a permission violation is hit in L0,
determine whether it was caused by the guest EPT or the host EPT.
---
 xen/arch/x86/hvm/svm/nestedsvm.c        |    2 +-
 xen/arch/x86/hvm/vmx/vvmx.c             |    4 +-
 xen/arch/x86/mm/hap/nested_ept.c        |    5 ++-
 xen/arch/x86/mm/hap/nested_hap.c        |   38 +++++++++++++++++++++++-------
 xen/include/asm-x86/hvm/hvm.h           |    2 +-
 xen/include/asm-x86/hvm/svm/nestedsvm.h |    2 +-
 xen/include/asm-x86/hvm/vmx/vvmx.h      |    6 ++--
 7 files changed, 40 insertions(+), 19 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index 5dcb354..ab455a9 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -1177,7 +1177,7 @@ nsvm_vmcb_hap_enabled(struct vcpu *v)
  */
 int
 nsvm_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
-                      unsigned int *page_order,
+                      unsigned int *page_order, uint8_t *p2m_acc,
                       bool_t access_r, bool_t access_w, bool_t access_x)
 {
     uint32_t pfec;
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 1f7de7a..b275044 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1493,7 +1493,7 @@ int nvmx_msr_write_intercept(unsigned int msr, u64 msr_content)
  */
 int
 nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
-                      unsigned int *page_order,
+                      unsigned int *page_order, uint8_t *p2m_acc,
                       bool_t access_r, bool_t access_w, bool_t access_x)
 {
     uint64_t exit_qual = __vmread(EXIT_QUALIFICATION);
@@ -1503,7 +1503,7 @@ nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
     uint32_t rwx_rights = (access_x << 2) | (access_w << 1) | access_r;
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
 
-    rc = nept_translate_l2ga(v, L2_gpa, page_order, rwx_rights, &gfn,
+    rc = nept_translate_l2ga(v, L2_gpa, page_order, rwx_rights, &gfn, p2m_acc,
                                 &exit_qual, &exit_reason);
     switch ( rc ) {
         case EPT_TRANSLATE_SUCCEED:
diff --git a/xen/arch/x86/mm/hap/nested_ept.c b/xen/arch/x86/mm/hap/nested_ept.c
index c3e698c..447b5d5 100644
--- a/xen/arch/x86/mm/hap/nested_ept.c
+++ b/xen/arch/x86/mm/hap/nested_ept.c
@@ -217,8 +217,8 @@ out:
 
 int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga,
                         unsigned int *page_order, uint32_t rwx_acc,
-                        unsigned long *l1gfn, uint64_t *exit_qual,
-                        uint32_t *exit_reason)
+                        unsigned long *l1gfn, uint8_t *p2m_acc,
+                        uint64_t *exit_qual, uint32_t *exit_reason)
 {
     uint32_t rc, rwx_bits = 0;
     ept_walk_t gw;
@@ -253,6 +253,7 @@ int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga,
         if ( nept_permission_check(rwx_acc, rwx_bits) )
         {
             *l1gfn = gw.lxe[0].mfn;
+            *p2m_acc = (uint8_t)rwx_bits;
             break;
         }
         rc = EPT_TRANSLATE_VIOLATION;
diff --git a/xen/arch/x86/mm/hap/nested_hap.c b/xen/arch/x86/mm/hap/nested_hap.c
index 6d1264b..84dbf15 100644
--- a/xen/arch/x86/mm/hap/nested_hap.c
+++ b/xen/arch/x86/mm/hap/nested_hap.c
@@ -142,12 +142,12 @@ nestedhap_fix_p2m(struct vcpu *v, struct p2m_domain *p2m,
  */
 static int
 nestedhap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
-                      unsigned int *page_order,
+                      unsigned int *page_order, uint8_t *p2m_acc,
                       bool_t access_r, bool_t access_w, bool_t access_x)
 {
     ASSERT(hvm_funcs.nhvm_hap_walk_L1_p2m);
 
-    return hvm_funcs.nhvm_hap_walk_L1_p2m(v, L2_gpa, L1_gpa, page_order,
+    return hvm_funcs.nhvm_hap_walk_L1_p2m(v, L2_gpa, L1_gpa, page_order, p2m_acc,
         access_r, access_w, access_x);
 }
 
@@ -158,16 +158,15 @@ nestedhap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
  */
 static int
 nestedhap_walk_L0_p2m(struct p2m_domain *p2m, paddr_t L1_gpa, paddr_t *L0_gpa,
-                      p2m_type_t *p2mt,
+                      p2m_type_t *p2mt, p2m_access_t *p2ma,
                       unsigned int *page_order,
                       bool_t access_r, bool_t access_w, bool_t access_x)
 {
     mfn_t mfn;
-    p2m_access_t p2ma;
     int rc;
 
     /* walk L0 P2M table */
-    mfn = get_gfn_type_access(p2m, L1_gpa >> PAGE_SHIFT, p2mt, &p2ma,
+    mfn = get_gfn_type_access(p2m, L1_gpa >> PAGE_SHIFT, p2mt, p2ma, 
                               0, page_order);
 
     rc = NESTEDHVM_PAGEFAULT_MMIO;
@@ -206,12 +205,14 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa,
     struct p2m_domain *p2m, *nested_p2m;
     unsigned int page_order_21, page_order_10, page_order_20;
     p2m_type_t p2mt_10;
+    p2m_access_t p2ma_10 = p2m_access_rwx;
+    uint8_t p2ma_21;
 
     p2m = p2m_get_hostp2m(d); /* L0 p2m */
     nested_p2m = p2m_get_nestedp2m(v, nhvm_vcpu_p2m_base(v));
 
     /* walk the L1 P2M table */
-    rv = nestedhap_walk_L1_p2m(v, *L2_gpa, &L1_gpa, &page_order_21,
+    rv = nestedhap_walk_L1_p2m(v, *L2_gpa, &L1_gpa, &page_order_21, &p2ma_21,
         access_r, access_w, access_x);
 
     /* let caller to handle these two cases */
@@ -229,7 +230,7 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa,
 
     /* ==> we have to walk L0 P2M */
     rv = nestedhap_walk_L0_p2m(p2m, L1_gpa, &L0_gpa,
-        &p2mt_10, &page_order_10,
+        &p2mt_10, &p2ma_10, &page_order_10,
         access_r, access_w, access_x);
 
     /* let upper level caller to handle these two cases */
@@ -250,10 +251,29 @@ nestedhvm_hap_nested_page_fault(struct vcpu *v, paddr_t *L2_gpa,
 
     page_order_20 = min(page_order_21, page_order_10);
 
+    ASSERT(p2ma_10 <= p2m_access_n2rwx);
+    /*NOTE: if assert fails, needs to handle new access type here */
+
+    switch ( p2ma_10 ) {
+    case p2m_access_n ... p2m_access_rwx:
+        break;
+    case p2m_access_rx2rw:
+        p2ma_10 = p2m_access_rx;
+        break;
+    case p2m_access_n2rwx:
+        p2ma_10 = p2m_access_n;
+        break;
+    default:
+        p2ma_10 = p2m_access_n;
+        /* For safety, remove all permissions. */
+        gdprintk(XENLOG_ERR, "Unhandled p2m access type:%d\n", p2ma_10);
+    }
+    /* Use minimal permission for nested p2m. */
+    p2ma_10 &= (p2m_access_t)p2ma_21;
+
     /* fix p2m_get_pagetable(nested_p2m) */
     nestedhap_fix_p2m(v, nested_p2m, *L2_gpa, L0_gpa, page_order_20,
-        p2mt_10,
-        p2m_access_rwx /* FIXME: Should use minimum permission. */);
+        p2mt_10, p2ma_10);
 
     return NESTEDHVM_PAGEFAULT_DONE;
 }
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 80f07e9..889e3c9 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -186,7 +186,7 @@ struct hvm_function_table {
 
     /*Walk nested p2m  */
     int (*nhvm_hap_walk_L1_p2m)(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
-                      unsigned int *page_order,
+                      unsigned int *page_order, uint8_t *p2m_acc,
                       bool_t access_r, bool_t access_w, bool_t access_x);
 };
 
diff --git a/xen/include/asm-x86/hvm/svm/nestedsvm.h b/xen/include/asm-x86/hvm/svm/nestedsvm.h
index 0c90f30..748cc04 100644
--- a/xen/include/asm-x86/hvm/svm/nestedsvm.h
+++ b/xen/include/asm-x86/hvm/svm/nestedsvm.h
@@ -134,7 +134,7 @@ void svm_vmexit_do_clgi(struct cpu_user_regs *regs, struct vcpu *v);
 void svm_vmexit_do_stgi(struct cpu_user_regs *regs, struct vcpu *v);
 bool_t nestedsvm_gif_isset(struct vcpu *v);
 int nsvm_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
-                      unsigned int *page_order,
+                      unsigned int *page_order, uint8_t *p2m_acc,
                       bool_t access_r, bool_t access_w, bool_t access_x);
 
 #define NSVM_INTR_NOTHANDLED     3
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index 3114ec0..e35e425 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -125,7 +125,7 @@ int nvmx_handle_vmxoff(struct cpu_user_regs *regs);
 
 int
 nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa,
-                      unsigned int *page_order,
+                      unsigned int *page_order, uint8_t *p2m_acc,
                       bool_t access_r, bool_t access_w, bool_t access_x);
 /*
  * Virtual VMCS layout
@@ -208,7 +208,7 @@ int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
 
 int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga, 
                         unsigned int *page_order, uint32_t rwx_acc,
-                        unsigned long *l1gfn, uint64_t *exit_qual,
-                        uint32_t *exit_reason);
+                        unsigned long *l1gfn, uint8_t *p2m_acc,
+                        uint64_t *exit_qual, uint32_t *exit_reason);
 #endif /* __ASM_X86_HVM_VVMX_H__ */
 
-- 
1.7.1
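
The "minimal permission" rule in the nested_hap.c hunk above amounts to intersecting what L0 allows for the L1 frame with what the L1 EPT allows for the L2 address. The sketch below shows that intersection, assuming access values 0..7 are plain R/W/X bitmasks (R = 1, W = 2, X = 4), which is what the cast and the &= in the patch rely on. The enum is a hypothetical stand-in for p2m_access_t, and min_access is not a Xen function.

```c
#include <assert.h>

/* Hypothetical stand-in for p2m_access_t. Values 0..7 are assumed to be
 * R/W/X bitmasks; the two values above 7 are special conversion types. */
enum acc {
    ACC_N = 0, ACC_R = 1, ACC_W = 2, ACC_RW = 3,
    ACC_X = 4, ACC_RX = 5, ACC_WX = 6, ACC_RWX = 7,
    ACC_RX2RW = 8, ACC_N2RWX = 9
};

/* Hypothetical helper: combine the L0 access type with the rwx bits
 * reported by the L1 EPT walk and return the most restrictive result. */
static unsigned int min_access(unsigned int l0_acc, unsigned int l1_rwx)
{
    if ( l0_acc == ACC_RX2RW )
        l0_acc = ACC_RX;        /* start from the pre-conversion rights */
    else if ( l0_acc > ACC_RWX )
        l0_acc = ACC_N;         /* n2rwx and unknown types: no rights */

    /* Bitwise AND keeps only the permissions both levels grant. */
    return l0_acc & (l1_rwx & ACC_RWX);
}
```

For example, if L0 grants rwx but the L1 EPT entry is read/execute only, the nested p2m entry ends up read/execute; a later write by L2 then faults in L0, and the walk of the L1 EPT decides whether the fault belongs to L1.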

Xiantao Zhang

2012-Dec-20 15:43 UTC

[PATCH v3 08/10] nEPT: handle invept instruction from L1 VMM

From: Zhang Xiantao <xiantao.zhang@intel.com>

Add the INVEPT instruction emulation logic.
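
As a rough sketch of the emulation, the handler dispatches on the INVEPT type taken from the register operand: type 1 (single-context) flushes the nested p2m currently used by the vcpu, type 2 (all-context) flushes every nested p2m of the domain, and any other type is rejected. The type values below are the architectural ones; the enum and the function name are hypothetical and only classify the request, they do not perform a flush.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural INVEPT types. */
#define INVEPT_SINGLE_CONTEXT 1
#define INVEPT_ALL_CONTEXT    2

/* Hypothetical labels for what the emulation does with each type. */
enum invept_action {
    INVEPT_FLUSH_CURRENT,   /* flush the vcpu's current nested p2m */
    INVEPT_FLUSH_ALL,       /* flush all nested p2m tables of the domain */
    INVEPT_REJECT           /* unsupported type: report an exception */
};

/* Hypothetical helper mirroring the switch in nvmx_handle_invept(). */
static enum invept_action invept_classify(uint64_t inv_type)
{
    switch ( inv_type )
    {
    case INVEPT_SINGLE_CONTEXT:
        return INVEPT_FLUSH_CURRENT;
    case INVEPT_ALL_CONTEXT:
        return INVEPT_FLUSH_ALL;
    default:
        return INVEPT_REJECT;
    }
}
```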

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Tim Deegan <tim@xen.org>
---
 xen/arch/x86/hvm/vmx/vmx.c         |    6 ++++-
 xen/arch/x86/hvm/vmx/vvmx.c        |   39 ++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/vmx/vvmx.h |    1 +
 3 files changed, 45 insertions(+), 1 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index ed8d532..94cac17 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2573,10 +2573,14 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
             update_guest_eip();
         break;
 
+    case EXIT_REASON_INVEPT:
+        if ( nvmx_handle_invept(regs) == X86EMUL_OKAY )
+            update_guest_eip();
+        break;
+
     case EXIT_REASON_MWAIT_INSTRUCTION:
     case EXIT_REASON_MONITOR_INSTRUCTION:
     case EXIT_REASON_GETSEC:
-    case EXIT_REASON_INVEPT:
     case EXIT_REASON_INVVPID:
         /*
          * We should never exit on GETSEC because CR4.SMXE is always 0 when
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index b275044..8346387 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1356,6 +1356,45 @@ int nvmx_handle_vmwrite(struct cpu_user_regs *regs)
     return X86EMUL_OKAY;
 }
 
+int nvmx_handle_invept(struct cpu_user_regs *regs)
+{
+    struct vmx_inst_decoded decode;
+    unsigned long eptp;
+    u64 inv_type;
+
+    if ( !cpu_has_vmx_ept )
+        return X86EMUL_EXCEPTION;
+
+    if ( decode_vmx_inst(regs, &decode, &eptp, 0)
+             != X86EMUL_OKAY )
+        return X86EMUL_EXCEPTION;
+
+    inv_type = reg_read(regs, decode.reg2);
+    gdprintk(XENLOG_DEBUG,"inv_type:%ld, eptp:%lx\n", inv_type, eptp);
+
+    switch ( inv_type ) {
+    case INVEPT_SINGLE_CONTEXT:
+    {
+        struct p2m_domain *p2m = vcpu_nestedhvm(current).nv_p2m;
+        if ( p2m )
+        {
+	        p2m_flush(current, p2m);
+            ept_sync_domain(p2m);
+        }
+        break;
+    }
+    case INVEPT_ALL_CONTEXT:
+        p2m_flush_nestedp2m(current->domain);
+        __invept(INVEPT_ALL_CONTEXT, 0, 0);
+        break;
+    default:
+        return X86EMUL_EXCEPTION;
+    }
+    vmreturn(regs, VMSUCCEED);
+    return X86EMUL_OKAY;
+}
+
+
 #define __emul_value(enable1, default1) \
     ((enable1 | default1) << 32 | (default1))
 
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index e35e425..03ab987 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -191,6 +191,7 @@ int nvmx_handle_vmread(struct cpu_user_regs *regs);
 int nvmx_handle_vmwrite(struct cpu_user_regs *regs);
 int nvmx_handle_vmresume(struct cpu_user_regs *regs);
 int nvmx_handle_vmlaunch(struct cpu_user_regs *regs);
+int nvmx_handle_invept(struct cpu_user_regs *regs);
 int nvmx_msr_read_intercept(unsigned int msr,
                                 u64 *msr_content);
 int nvmx_msr_write_intercept(unsigned int msr,
-- 
1.7.1

Xiantao Zhang

2012-Dec-20 15:43 UTC

[PATCH v3 09/10] nVMX: virutalize VPID capability to nested VMM.

From: Zhang Xiantao <xiantao.zhang@intel.com>

Virtualize VPID for the nested VMM, using the host's VPID
to emulate the guest's VPID. For each virtual vmentry, if
the guest's VPID has changed, allocate a new host VPID for
the L2 guest.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
Acked-by: Tim Deegan <tim@xen.org>
---
 xen/arch/x86/hvm/vmx/vmx.c         |   11 ++++++-
 xen/arch/x86/hvm/vmx/vvmx.c        |   56 ++++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/hvm/vmx/vvmx.h |    2 +
 3 files changed, 65 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 94cac17..0e479f8 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2578,10 +2578,14 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
             update_guest_eip();
         break;
 
+    case EXIT_REASON_INVVPID:
+        if ( nvmx_handle_invvpid(regs) == X86EMUL_OKAY )
+            update_guest_eip();
+        break;
+
     case EXIT_REASON_MWAIT_INSTRUCTION:
     case EXIT_REASON_MONITOR_INSTRUCTION:
     case EXIT_REASON_GETSEC:
-    case EXIT_REASON_INVVPID:
         /*
          * We should never exit on GETSEC because CR4.SMXE is always 0 when
          * running in guest context, and the CPU checks that before getting
@@ -2699,8 +2703,11 @@ void vmx_vmenter_helper(void)
 
     if ( !cpu_has_vmx_vpid )
         goto out;
+    if ( nestedhvm_vcpu_in_guestmode(curr) )
+        p_asid = &vcpu_nestedhvm(curr).nv_n2asid;
+    else
+        p_asid = &curr->arch.hvm_vcpu.n1asid;
 
-    p_asid = &curr->arch.hvm_vcpu.n1asid;
     old_asid = p_asid->asid;
     need_flush = hvm_asid_handle_vmenter(p_asid);
     new_asid = p_asid->asid;
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 8346387..0e1a5ee 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -42,6 +42,7 @@ int nvmx_vcpu_initialise(struct vcpu *v)
 	goto out;
     }
     nvmx->ept.enabled = 0;
+    nvmx->guest_vpid = 0;
     nvmx->vmxon_region_pa = 0;
     nvcpu->nv_vvmcx = NULL;
     nvcpu->nv_vvmcxaddr = VMCX_EADDR;
@@ -848,6 +849,16 @@ static uint64_t get_shadow_eptp(struct vcpu *v)
     return ept_get_eptp(ept);
 }
 
+static bool_t nvmx_vpid_enabled(struct nestedvcpu *nvcpu)
+{
+    uint32_t second_cntl;
+
+    second_cntl = __get_vvmcs(nvcpu->nv_vvmcx, SECONDARY_VM_EXEC_CONTROL);
+    if ( second_cntl & SECONDARY_EXEC_ENABLE_VPID )
+        return 1;
+    return 0;
+}
+
 static void virtual_vmentry(struct cpu_user_regs *regs)
 {
     struct vcpu *v = current;
@@ -896,6 +907,18 @@ static void virtual_vmentry(struct cpu_user_regs *regs)
     if ( nestedhvm_paging_mode_hap(v) )
         __vmwrite(EPT_POINTER, get_shadow_eptp(v));
 
+    /* nested VPID support! */
+    if ( cpu_has_vmx_vpid && nvmx_vpid_enabled(nvcpu) )
+    {
+        struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
+        uint32_t new_vpid =  __get_vvmcs(vvmcs, VIRTUAL_PROCESSOR_ID);
+        if ( nvmx->guest_vpid != new_vpid )
+        {
+            hvm_asid_flush_vcpu_asid(&vcpu_nestedhvm(v).nv_n2asid);
+            nvmx->guest_vpid = new_vpid;
+        }
+    }
+
 }
 
 static void sync_vvmcs_guest_state(struct vcpu *v, struct cpu_user_regs *regs)
@@ -1187,7 +1210,7 @@ int nvmx_handle_vmlaunch(struct cpu_user_regs *regs)
     if ( vcpu_nestedhvm(v).nv_vvmcxaddr == VMCX_EADDR )
     {
         vmreturn (regs, VMFAIL_INVALID);
-        return X86EMUL_OKAY;        
+        return X86EMUL_OKAY;
     }
 
     launched = __get_vvmcs(vcpu_nestedhvm(v).nv_vvmcx,
@@ -1370,7 +1393,6 @@ int nvmx_handle_invept(struct cpu_user_regs *regs)
         return X86EMUL_EXCEPTION;
 
     inv_type = reg_read(regs, decode.reg2);
-    gdprintk(XENLOG_DEBUG,"inv_type:%ld, eptp:%lx\n", inv_type, eptp);
 
     switch ( inv_type ) {
     case INVEPT_SINGLE_CONTEXT:
@@ -1402,6 +1424,36 @@ int nvmx_handle_invept(struct cpu_user_regs *regs)
     (((__emul_value(enable1, default1) & host_value) & (~0ul << 32)) | \
     ((uint32_t)(__emul_value(enable1, default1) | host_value)))
 
+int nvmx_handle_invvpid(struct cpu_user_regs *regs)
+{
+    struct vmx_inst_decoded decode;
+    unsigned long vpid;
+    u64 inv_type;
+
+    if ( !cpu_has_vmx_vpid )
+        return X86EMUL_EXCEPTION;
+
+    if ( decode_vmx_inst(regs, &decode, &vpid, 0) != X86EMUL_OKAY )
+        return X86EMUL_EXCEPTION;
+
+    inv_type = reg_read(regs, decode.reg2);
+    gdprintk(XENLOG_DEBUG,"inv_type:%ld, vpid:%lx\n", inv_type, vpid);
+
+    switch ( inv_type ) {
+        /* Just invalidate all tlb entries for all types! */
+        case INVVPID_INDIVIDUAL_ADDR:
+        case INVVPID_SINGLE_CONTEXT:
+        case INVVPID_ALL_CONTEXT:
+            hvm_asid_flush_vcpu_asid(&vcpu_nestedhvm(current).nv_n2asid);
+            break;
+        default:
+            return X86EMUL_EXCEPTION;
+    }
+    vmreturn(regs, VMSUCCEED);
+
+    return X86EMUL_OKAY;
+}
+
 /*
  * Capability reporting
  */
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index 03ab987..af702c4 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -37,6 +37,7 @@ struct nestedvmx {
         uint32_t exit_reason;
         uint32_t exit_qual;
     } ept;
+    uint32_t guest_vpid;
 };
 
 #define vcpu_2_nvmx(v)	(vcpu_nestedhvm(v).u.nvmx)
@@ -192,6 +193,7 @@ int nvmx_handle_vmwrite(struct cpu_user_regs *regs);
 int nvmx_handle_vmresume(struct cpu_user_regs *regs);
 int nvmx_handle_vmlaunch(struct cpu_user_regs *regs);
 int nvmx_handle_invept(struct cpu_user_regs *regs);
+int nvmx_handle_invvpid(struct cpu_user_regs *regs);
 int nvmx_msr_read_intercept(unsigned int msr,
                                 u64 *msr_content);
 int nvmx_msr_write_intercept(unsigned int msr,
-- 
1.7.1

Xiantao Zhang

2012-Dec-20 15:43 UTC

[PATCH v3 10/10] nEPT: expost EPT & VPID capablities to L1 VMM

From: Zhang Xiantao <xiantao.zhang@intel.com>

Expose the basic EPT and VPID features to the L1 VMM.
For EPT, the A/D bit feature is not supported.
For VPID, all features are exposed to the L1 VMM.

Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
---
 xen/arch/x86/hvm/vmx/vvmx.c        |   17 +++++++++++++++--
 xen/arch/x86/mm/hap/nested_ept.c   |   19 ++++++++++++-------
 xen/include/asm-x86/hvm/vmx/vvmx.h |    2 ++
 3 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 0e1a5ee..241e295 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1484,6 +1484,8 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
         break;
     case MSR_IA32_VMX_PROCBASED_CTLS:
     case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
+    {
+        u32 default1_bits = VMX_PROCBASED_CTLS_DEFAULT1;
         /* 1-seetings */
         data = CPU_BASED_HLT_EXITING |
                CPU_BASED_VIRTUAL_INTR_PENDING |
@@ -1505,12 +1507,20 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
                CPU_BASED_PAUSE_EXITING |
                CPU_BASED_RDPMC_EXITING |
                CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
-        data = gen_vmx_msr(data, VMX_PROCBASED_CTLS_DEFAULT1, host_data);
+
+        if ( msr == MSR_IA32_VMX_TRUE_PROCBASED_CTLS )
+            default1_bits &= ~(CPU_BASED_CR3_LOAD_EXITING |
+                    CPU_BASED_CR3_STORE_EXITING | CPU_BASED_INVLPG_EXITING);
+
+        data = gen_vmx_msr(data, default1_bits, host_data);
         break;
+    }
     case MSR_IA32_VMX_PROCBASED_CTLS2:
         /* 1-seetings */
         data = SECONDARY_EXEC_DESCRIPTOR_TABLE_EXITING |
-               SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
+               SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
+               SECONDARY_EXEC_ENABLE_VPID |
+               SECONDARY_EXEC_ENABLE_EPT;
         data = gen_vmx_msr(data, 0, host_data);
         break;
     case MSR_IA32_VMX_EXIT_CTLS:
@@ -1563,6 +1573,9 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
     case MSR_IA32_VMX_MISC:
         gdprintk(XENLOG_WARNING, "VMX MSR %x not fully supported yet.\n", msr);
         break;
+    case MSR_IA32_VMX_EPT_VPID_CAP:
+        data = nept_get_ept_vpid_cap();
+        break;
     default:
         r = 0;
         break;
diff --git a/xen/arch/x86/mm/hap/nested_ept.c b/xen/arch/x86/mm/hap/nested_ept.c
index 447b5d5..b1738fa 100644
--- a/xen/arch/x86/mm/hap/nested_ept.c
+++ b/xen/arch/x86/mm/hap/nested_ept.c
@@ -43,12 +43,15 @@
 #define EPT_MUST_RSV_BITS (((1ull << PADDR_BITS) -1) & \
                      ~((1ull << paddr_bits) - 1))
 
-/*
- *TODO: Just leave it as 0 here for compile pass, will
- * define real capabilities in the subsequent patches.
- */
-#define NEPT_VPID_CAP_BITS 0
-
+#define NEPT_VPID_CAP_BITS  \
+        (VMX_EPT_INVEPT_ALL_CONTEXT | VMX_EPT_INVEPT_SINGLE_CONTEXT |   \
+        VMX_EPT_INVEPT_INSTRUCTION | VMX_EPT_SUPERPAGE_1GB |            \
+        VMX_EPT_SUPERPAGE_2MB | VMX_EPT_MEMORY_TYPE_WB |                \
+        VMX_EPT_MEMORY_TYPE_UC | VMX_EPT_WALK_LENGTH_4_SUPPORTED |      \
+        VMX_EPT_EXEC_ONLY_SUPPORTED | VMX_VPID_INVVPID_INSTRUCTION |    \
+        VMX_VPID_INVVPID_INDIVIDUAL_ADDR |                              \
+        VMX_VPID_INVVPID_SINGLE_CONTEXT | VMX_VPID_INVVPID_ALL_CONTEXT |\
+        VMX_VPID_INVVPID_SINGLE_CONTEXT_RETAINING_GLOBAL)
 
 #define NEPT_1G_ENTRY_FLAG (1 << 11)
 #define NEPT_2M_ENTRY_FLAG (1 << 10)
@@ -131,7 +134,9 @@ static bool_t nept_non_present_check(ept_entry_t e)
 
 uint64_t nept_get_ept_vpid_cap(void)
 {
-    return NEPT_VPID_CAP_BITS;
+    if ( cpu_has_vmx_ept && cpu_has_vmx_vpid )
+        return NEPT_VPID_CAP_BITS;
+    return 0;
 }
 
 static int ept_lvl_table_offset(unsigned long gpa, int lvl)
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h
index af702c4..ea33ed0 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -209,6 +209,8 @@ u64 nvmx_get_tsc_offset(struct vcpu *v);
 int nvmx_n2_vmexit_handler(struct cpu_user_regs *regs,
                           unsigned int exit_reason);
 
+uint64_t nept_get_ept_vpid_cap(void);
+
 int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga, 
                         unsigned int *page_order, uint32_t rwx_acc,
                         unsigned long *l1gfn, uint8_t *p2m_acc,
-- 
1.7.1

Zhang, Xiantao

2012-Dec-21 01:14 UTC

Re: [PATCH v3 08/10] nEPT: handle invept instruction from L1 VMM

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, December 20, 2012 5:55 PM
> To: Zhang, Xiantao
> Cc: Dong, Eddie; Nakajima, Jun; xen-devel@lists.xen.org; keir@xen.org;
> tim@xen.org
> Subject: Re: [PATCH v3 08/10] nEPT: handle invept instruction from L1 VMM
> 
> >>> On 20.12.12 at 16:43, Xiantao Zhang <xiantao.zhang@intel.com> wrote:
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -2573,10 +2573,14 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
> >              update_guest_eip();
> >          break;
> >
> > +    case EXIT_REASON_INVEPT:
> > +        if ( nvmx_handle_invept(regs) == X86EMUL_OKAY )
> > +            update_guest_eip();
> > +        break;
> > +
> 
> I realize that you're just copying code written the same way elsewhere, but:
> What if (here and elsewhere) X86EMUL_OKAY is not being returned (e.g. in
> the non-nested case)? Without the nested VMX code, all of these would
> have ended up at the default case (crashing the guest). Iiuc the correct action
> would be to inject an exception at least when X86EMUL_EXCEPTION is being
> returned here - whether that's done here or (perhaps better, as only it can
> know _what_ exception to inject) by the callee is another thing to decide.
> 
> Also, at the example of nvmx_handle_vmclear() I see that it produces
> exceptions in most of the cases, but I think all of the related code needs
> auditing that things are being handled consistently _and_ completely
> (constructs like
> 
>     if ( ... != X86EMUL_OKAY )
>         return X86EMUL_EXCEPTION;
> 
> are definitely not okay, as there are further X86EMUL_* values that can
> occur; if you know only the two must ever occur at a given place, ASSERT() so,
> making things clear to the reader without having to follow all code paths).

Hi Jan,
I think it is better for the callee to be responsible for handling the
exception before returning X86EMUL_EXCEPTION to its caller.
So for the two newly introduced functions, I will add logic to handle their
possible exceptions before they return.
Xiantao
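
For readers following the thread, here is a minimal sketch of the callee-handles-the-exception convention agreed on above. It is not the Xen code: every demo_* name, the mock vcpu structure and the trap vectors chosen are hypothetical stand-ins, used only to show the shape of the control flow (the callee queues the exception, asserts the return values it expects, and the caller advances the instruction pointer only on success).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not Xen's real definitions. */
enum demo_rc { DEMO_OKAY, DEMO_UNHANDLEABLE, DEMO_EXCEPTION, DEMO_RETRY };
enum { DEMO_TRAP_NONE = -1, DEMO_TRAP_INVALID_OP = 6, DEMO_TRAP_GP_FAULT = 13 };

struct demo_vcpu {
    bool nested_ept;     /* is the emulated feature available? */
    int pending_trap;    /* exception queued for the guest, if any */
    unsigned long ip;    /* guest instruction pointer */
};

/* Mock decoder: on failure it queues the exception itself. */
static enum demo_rc demo_decode(struct demo_vcpu *v, bool operand_ok)
{
    if (!operand_ok) {
        v->pending_trap = DEMO_TRAP_GP_FAULT;
        return DEMO_EXCEPTION;
    }
    return DEMO_OKAY;
}

/* Callee: every DEMO_EXCEPTION return has an exception already queued. */
enum demo_rc demo_handle_invept(struct demo_vcpu *v, bool operand_ok)
{
    enum demo_rc rc;

    if (!v->nested_ept) {
        v->pending_trap = DEMO_TRAP_INVALID_OP;
        return DEMO_EXCEPTION;
    }

    rc = demo_decode(v, operand_ok);
    /* Only these two values can occur here; state it explicitly. */
    assert(rc == DEMO_OKAY || rc == DEMO_EXCEPTION);
    if (rc != DEMO_OKAY)
        return rc;       /* propagate the code instead of rewriting it */

    /* ... the actual invalidation would happen here ... */
    return DEMO_OKAY;
}

/* Caller: skip past the instruction only when emulation succeeded. */
void demo_vmexit_invept(struct demo_vcpu *v, bool operand_ok,
                        unsigned int insn_len)
{
    if (demo_handle_invept(v, operand_ok) == DEMO_OKAY)
        v->ip += insn_len;
}
```

With this shape the vmexit handler never has to guess which exception applies; it only distinguishes success from everything else.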

Zhang, Xiantao

2012-Dec-21 01:27 UTC

Re: [PATCH v3 00/10] Nested VMX: Add virtual EPT & VPID support to L1 VMM

Oops! It seems my dev machine has the wrong date setting. Anyway, it tells us the
End of the World is not real, because some mails were sent out after the END. :)
Xiantao
> -----Original Message-----
> From: Tim Deegan [mailto:tim@xen.org]
> Sent: Thursday, December 20, 2012 9:56 PM
> To: Zhang, Xiantao
> Cc: xen-devel@lists.xen.org; keir@xen.org; Nakajima, Jun; Dong, Eddie;
> JBeulich@suse.com
> Subject: Re: [Xen-devel] [PATCH v3 00/10] Nested VMX: Add virtual EPT &
> VPID support to L1 VMM
> 
> Hi,
> 
> At 23:43 +0800 on 20 Dec (1356047021), Xiantao Zhang wrote:
> > Received: from hax-build.sh.intel.com ([10.239.48.28])
> >         by fmsmga001.fm.intel.com with ESMTP; 19 Dec 2012 19:59:04 -0800
> > From: Xiantao Zhang <xiantao.zhang@intel.com>
> > To: xen-devel@lists.xen.org
> > Date: Thu, 20 Dec 2012 23:43:41 +0800
> 
> I think the clock on your computer or your email client is confused:
> your email is datestamped about 12 hours in the future.
> 
> Tim.

Zhang, Xiantao

2012-Dec-24 09:01 UTC

Re: [PATCH v3 03/10] nested_ept: Implement guest ept's walker

> -----Original Message-----
> From: Tim Deegan [mailto:tim@xen.org]
> Sent: Thursday, December 20, 2012 8:52 PM
> To: Zhang, Xiantao
> Cc: xen-devel@lists.xen.org; keir@xen.org; Nakajima, Jun; Dong, Eddie;
> JBeulich@suse.com
> Subject: Re: [Xen-devel] [PATCH v3 03/10] nested_ept: Implement guest
> ept's walker
> 
> At 23:43 +0800 on 20 Dec (1356047024), Xiantao Zhang wrote:
> > From: Zhang Xiantao <xiantao.zhang@intel.com>
> >
> > Implment guest EPT PT walker, some logic is based on shadow's ia32e PT
> > walker. During the PT walking, if the target pages are not in memory,
> > use RETRY mechanism and get a chance to let the target page back.
> >
> > Signed-off-by: Zhang Xiantao <xiantao.zhang@intel.com>
> 
> This is much nicer than v1, thanks.  I have some comments below, and the
> whole thing needs to be checked for whitespace mangling.
> 
> > +static bool_t nept_rwx_bits_check(ept_entry_t e) {
> > +    /*write only or write/execute only*/
> > +    uint8_t rwx_bits = e.epte & EPTE_RWX_MASK;
> > +
> > +    if ( rwx_bits == ept_access_w || rwx_bits == ept_access_wx )
> > +        return 1;
> > +
> > +    if ( rwx_bits == ept_access_x && !(NEPT_VPID_CAP_BITS &
> > +                        VMX_EPT_EXEC_ONLY_SUPPORTED))
> 
> In a later patch you add VMX_EPT_EXEC_ONLY_SUPPORTED to this field.
> How can that work when running on a CPU that doesn't support exec-only?
> The nested-ept tables will have exec-only mapping in them which the CPU
> will reject.
Fixed; the host's capability will be consulted first.
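
A rough sketch of what consulting the host's capability could look like, for readers of the thread. This is not the actual follow-up patch: the DEMO_* bit names and positions and both function names are assumptions made up for illustration. The access values mirror the ept_access_t enum quoted elsewhere in this review (r = 1, w = 2, x = 4). The idea is that the capability word reported to L1 is masked by what the host really supports, and the exec-only check reads that masked value rather than a compile-time constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative capability bits; names and positions are hypothetical. */
#define DEMO_CAP_EXEC_ONLY  (1ull << 0)
#define DEMO_CAP_2MB        (1ull << 1)
#define DEMO_CAP_1GB        (1ull << 2)

/* Everything the nested implementation is prepared to emulate. */
#define DEMO_NESTED_WANTED  (DEMO_CAP_EXEC_ONLY | DEMO_CAP_2MB | DEMO_CAP_1GB)

/* Access combinations, mirroring the r/w/x bit layout quoted in the thread. */
enum {
    DEMO_ACCESS_R  = 1,
    DEMO_ACCESS_W  = 2,
    DEMO_ACCESS_RW = 3,
    DEMO_ACCESS_X  = 4,
    DEMO_ACCESS_WX = 6
};

/* Report to L1 only what is both wanted and supported by the host. */
uint64_t demo_nested_caps(uint64_t host_caps)
{
    return host_caps & DEMO_NESTED_WANTED;
}

/* Returns true when the rwx combination must be treated as misconfigured. */
bool demo_rwx_misconfigured(unsigned int rwx, uint64_t nested_caps)
{
    assert(rwx <= 7);    /* rwx is a 3-bit field */

    /* Write-only and write/execute-only are never valid. */
    if (rwx == DEMO_ACCESS_W || rwx == DEMO_ACCESS_WX)
        return true;

    /* Exec-only is valid only if the host-derived capability allows it. */
    if (rwx == DEMO_ACCESS_X && !(nested_caps & DEMO_CAP_EXEC_ONLY))
        return true;

    return false;
}
```

On a host without exec-only support, an exec-only entry written by L1 is then reported as a misconfiguration instead of being copied into tables that the CPU would reject.
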
> > +done:
> > +    ret = EPT_TRANSLATE_SUCCEED;
> > +    goto out;
> > +
> > +map_err:
> > +    if ( rc == _PAGE_PAGED )
> > +        ret = EPT_TRANSLATE_RETRY;
> > +    else
> > +        ret = EPT_TRANSLATE_ERR_PAGE;
> 
> What does this error code mean?  The caller just retries the faulting
> instruction when it sees it, which sounds wrong.  Why not just return
> EPT_TRANSLATE_MISCONFIG if the guest uses an unmappable frame for EPT
> tables?
Okay. Although this doesn't fully follow the SDM, injecting an EPT
misconfiguration in this case should be better than hanging there.
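
The outcome agreed on here can be summarized in a small sketch. The demo_* enums and functions below are hypothetical and only model the decision, not the walker itself: a paged-out table frame leads to a retry, while a frame that cannot be mapped at all is reported as a misconfiguration so the guest is not left re-executing the same faulting instruction.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes modelling the walker's outcomes. */
enum demo_translate {
    DEMO_TRANSLATE_SUCCEED,
    DEMO_TRANSLATE_RETRY,
    DEMO_TRANSLATE_MISCONFIG
};

/* Hypothetical status of mapping one guest EPT table frame. */
enum demo_map_status {
    DEMO_MAP_OK,
    DEMO_MAP_PAGED_OUT,    /* frame exists but is currently paged out */
    DEMO_MAP_UNMAPPABLE    /* guest pointed its tables at a bad frame */
};

enum demo_translate demo_map_result(enum demo_map_status st)
{
    switch (st) {
    case DEMO_MAP_OK:
        return DEMO_TRANSLATE_SUCCEED;
    case DEMO_MAP_PAGED_OUT:
        return DEMO_TRANSLATE_RETRY;      /* page will come back; try again */
    case DEMO_MAP_UNMAPPABLE:
        return DEMO_TRANSLATE_MISCONFIG;  /* report to L1 instead of looping */
    }
    assert(0 && "unknown map status");    /* no other value is expected */
    return DEMO_TRANSLATE_MISCONFIG;
}

/* Only a retry result justifies re-executing the faulting instruction. */
bool demo_should_reexecute(enum demo_translate t)
{
    return t == DEMO_TRANSLATE_RETRY;
}
```
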
> > +    default:
> > +        rc = EPT_TRANSLATE_UNSUPPORTED;
> > +        gdprintk(XENLOG_ERR, "Unsupported ept translation type!:%d\n", rc);
> 
> Just BUG() here and get rid of EPT_TRANSLATE_UNSUPPORTED and
> NESTEDHVM_PAGEFAULT_UNHANDLED.  The function that provided rc is
> right above and we can see it hasn't got any other return values.
Okay, this is also what version 1 did.
> > --- a/xen/arch/x86/mm/shadow/multi.c
> > +++ b/xen/arch/x86/mm/shadow/multi.c
> > @@ -4582,7 +4582,7 @@ static mfn_t emulate_gva_to_mfn(struct vcpu *v,
> >      /* Translate the GFN to an MFN */
> >      ASSERT(!paging_locked_by_me(v->domain));
> >      mfn = get_gfn(v->domain, _gfn(gfn), &p2mt);
> > -
> > +
> 
> This stray change should be dropped.

Dropped. 
> > +typedef enum {
> > +    ept_access_n     = 0, /* No access permissions allowed */
> > +    ept_access_r     = 1,
> > +    ept_access_w     = 2,
> > +    ept_access_rw    = 3,
> > +    ept_access_x     = 4,
> > +    ept_access_rx    = 5,
> > +    ept_access_wx    = 6,
> > +    ept_access_all   = 7,
> > +} ept_access_t;
> 
> This enum isn't used anywhere.
Actually, it is used in the function nept_rwx_bits_check. :)

Xiantao

Tim Deegan

2013-Jan-10 11:19 UTC

Re: [PATCH v3 03/10] nested_ept: Implement guest ept's walker

At 09:01 +0000 on 24 Dec (1356339693), Zhang, Xiantao wrote:
> > > +typedef enum {
> > > +    ept_access_n     = 0, /* No access permissions allowed */
> > > +    ept_access_r     = 1,
> > > +    ept_access_w     = 2,
> > > +    ept_access_rw    = 3,
> > > +    ept_access_x     = 4,
> > > +    ept_access_rx    = 5,
> > > +    ept_access_wx    = 6,
> > > +    ept_access_all   = 7,
> > > +} ept_access_t;
> > 
> > This enum isn't used anywhere.
> 
> Actually,  it is used in the function nept_rwx_bits_check. :)
Oops - so it is. :)

Tim.
