thr3ads.net - Xen devel - [PATCH v5 rebased 0/4] nested vmx: enable VMCS shadowing feature [Jan 2013]

If this information is useful, please help other people find it:
Share via:

Dongxiao Xu

2013-Jan-23 14:32 UTC

[PATCH v5 rebased 0/4] nested vmx: enable VMCS shadowing feature

Changes from v5 to v5 rebased:
- Rebase the patches to latest staging tree.

Changes from v4 to v5:
- Change the size for vvmcs buffer from 500 to 100.
- Store page_info in per-pCPU structure instead of MFN in per-vCPU structure
for vmread/vmwrite bitmaps.
- Define correct data type to avoid force type conversion.

Changes from v3 to v4:
- Allocate a buffer in initialization time for data exchange instead of runtime
allocation (reflect Jun''s comment).
- Move the vmread/vmwrite bitmap logic into nvmx_vcpu_initialise(), also track
MFN instead of virtual address in the vcpu global data. (reflect Jan''s
comment).

Changes from v2 to v3:
- Use pfn_to_paddr() to get the address from frame number instead of doing
shift directly.
- Remove some unnecessary initialization code and add "static" to
vmentry_fields and gpdptr_fields.
- Enable the VMREAD/VMWRITE bitmap only if nested hvm is enabled.
- Use clear_page() to set all 0 to the page instead of memset().
- Use domheap to allocate the VMREAD/VMWRITE bitmap instead of xenheap.

Changes from v1 to v2:
- Use domain_page_map_to_mfn() instead of virt_to_mfn() to get the machine
frame.
- Address other comments from Jan, including some programming format, return
value type, etc.

Latest Intel SDM introduced a new feature "VMCS shadowing" at bit 14
in
Secondary Processor-Based VM-Execution Controls for nested virtualization.

The main purpose of this feature is to reduce or eliminate the number of VM exit
for non-root VMREAD and VMWRITE. It provides the capability to link a
"virtual VMCS" with the current running VMCS, so that after VM entry,
the
non-root VMREAD and VMWRITE can get/set related data directly from/to the
"virtual VMCS" without trap and emulation.

A separate bitmap is introduced for VMREAD and VMWRITE, from which hypervisor
can control whether VMREAD/VMWRITE from/to certain VMCS field will trigger
VM exit or directly get/set data by hardware.

With the new feature introduced, all the in "virtual VMCS" need to be
operated
by VMREAD and VMWRITE because this VMCS will also be loaded into hardware.
This requires the capability to VMWRITE all the VMCS fields, including those
readonly ones. Intel SDM introduces this functionality at bit 29 in
IA32_VMX_MISC MSR.

For details, please refer to:
http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html

Thanks,
Dongxiao

Dongxiao Xu (4):
nested vmx: Use a list to store the launched vvmcs for L1 VMM
nested vmx: use VMREAD/VMWRITE to construct vVMCS if enabled VMCS
shadowing
nested vmx: optimize for bulk access of virtual VMCS
nested vmx: enable VMCS shadowing feature

xen/arch/x86/hvm/vmx/vmcs.c | 48 ++++++
xen/arch/x86/hvm/vmx/vvmx.c | 324 ++++++++++++++++++++++++++++++++----
xen/include/asm-x86/hvm/vmx/vmcs.h | 23 +++-
xen/include/asm-x86/hvm/vmx/vvmx.h | 24 +++-
4 files changed, 383 insertions(+), 36 deletions(-)

Dongxiao Xu

2013-Jan-23 14:32 UTC

head link

[PATCH v5 rebased 1/4] nested vmx: Use a list to store the launched vvmcs for L1 VMM

Originally we use a virtual VMCS field to store the launch state of
a certain vmcs. However if we introduce VMCS shadowing feature, this
virtual VMCS should also be able to load into real hardware,
and VMREAD/VMWRITE operate invalid fields.

The new approach is to store the launch state into a list for L1 VMM.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
 xen/arch/x86/hvm/vmx/vvmx.c        |   96 ++++++++++++++++++++++++++++++++----
 xen/include/asm-x86/hvm/vmx/vmcs.h |    2 -
 xen/include/asm-x86/hvm/vmx/vvmx.h |    6 ++
 3 files changed, 92 insertions(+), 12 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 90b4da9..cd8bb90 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -51,6 +51,7 @@ int nvmx_vcpu_initialise(struct vcpu *v)
     nvmx->iobitmap[0] = NULL;
     nvmx->iobitmap[1] = NULL;
     nvmx->msrbitmap = NULL;
+    INIT_LIST_HEAD(&nvmx->launched_list);
     return 0;
 out:
     return -ENOMEM;
@@ -58,7 +59,9 @@ out:
  
 void nvmx_vcpu_destroy(struct vcpu *v)
 {
+    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
+    struct vvmcs_list *item, *n;
 
     /* 
      * When destroying the vcpu, it may be running on behalf of L2 guest.
@@ -74,6 +77,12 @@ void nvmx_vcpu_destroy(struct vcpu *v)
         free_xenheap_page(nvcpu->nv_n2vmcx);
         nvcpu->nv_n2vmcx = NULL;
     }
+
+    list_for_each_entry_safe(item, n, &nvmx->launched_list, node)
+    {
+        list_del(&item->node);
+        xfree(item);
+    }
 }
  
 void nvmx_domain_relinquish_resources(struct domain *d)
@@ -1202,6 +1211,62 @@ int nvmx_handle_vmxoff(struct cpu_user_regs *regs)
     return X86EMUL_OKAY;
 }
 
+static bool_t vvmcs_launched(struct list_head *launched_list,
+                             unsigned long vvmcs_mfn)
+{
+    struct vvmcs_list *vvmcs;
+    struct list_head *pos;
+    bool_t launched = 0;
+
+    list_for_each(pos, launched_list)
+    {
+        vvmcs = list_entry(pos, struct vvmcs_list, node);
+        if ( vvmcs_mfn == vvmcs->vvmcs_mfn )
+        {
+            launched = 1;
+            break;
+        }
+    }
+
+    return launched;
+}
+
+static int set_vvmcs_launched(struct list_head *launched_list,
+                              unsigned long vvmcs_mfn)
+{
+    struct vvmcs_list *vvmcs;
+
+    if ( vvmcs_launched(launched_list, vvmcs_mfn) )
+        return 0;
+
+    vvmcs = xzalloc(struct vvmcs_list);
+    if ( !vvmcs )
+        return -ENOMEM;
+
+    vvmcs->vvmcs_mfn = vvmcs_mfn;
+    list_add(&vvmcs->node, launched_list);
+
+    return 0;
+}
+
+static void clear_vvmcs_launched(struct list_head *launched_list,
+                                 paddr_t vvmcs_mfn)
+{
+    struct vvmcs_list *vvmcs;
+    struct list_head *pos;
+
+    list_for_each(pos, launched_list)
+    {
+        vvmcs = list_entry(pos, struct vvmcs_list, node);
+        if ( vvmcs_mfn == vvmcs->vvmcs_mfn )
+        {
+            list_del(&vvmcs->node);
+            xfree(vvmcs);
+            break;
+        }
+    }
+}
+
 int nvmx_vmresume(struct vcpu *v, struct cpu_user_regs *regs)
 {
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
@@ -1225,8 +1290,10 @@ int nvmx_vmresume(struct vcpu *v, struct cpu_user_regs
*regs)
 
 int nvmx_handle_vmresume(struct cpu_user_regs *regs)
 {
-    int launched;
+    bool_t launched;
     struct vcpu *v = current;
+    struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
+    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
 
     if ( vcpu_nestedhvm(v).nv_vvmcxaddr == VMCX_EADDR )
     {
@@ -1234,8 +1301,8 @@ int nvmx_handle_vmresume(struct cpu_user_regs *regs)
         return X86EMUL_OKAY;        
     }
 
-    launched = __get_vvmcs(vcpu_nestedhvm(v).nv_vvmcx,
-                           NVMX_LAUNCH_STATE);
+    launched = vvmcs_launched(&nvmx->launched_list,
+                   domain_page_map_to_mfn(nvcpu->nv_vvmcx));
     if ( !launched ) {
        vmreturn (regs, VMFAIL_VALID);
        return X86EMUL_OKAY;
@@ -1245,9 +1312,11 @@ int nvmx_handle_vmresume(struct cpu_user_regs *regs)
 
 int nvmx_handle_vmlaunch(struct cpu_user_regs *regs)
 {
-    int launched;
+    bool_t launched;
     int rc;
     struct vcpu *v = current;
+    struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
+    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
 
     if ( vcpu_nestedhvm(v).nv_vvmcxaddr == VMCX_EADDR )
     {
@@ -1255,8 +1324,8 @@ int nvmx_handle_vmlaunch(struct cpu_user_regs *regs)
         return X86EMUL_OKAY;
     }
 
-    launched = __get_vvmcs(vcpu_nestedhvm(v).nv_vvmcx,
-                           NVMX_LAUNCH_STATE);
+    launched = vvmcs_launched(&nvmx->launched_list,
+                   domain_page_map_to_mfn(nvcpu->nv_vvmcx));
     if ( launched ) {
        vmreturn (regs, VMFAIL_VALID);
        return X86EMUL_OKAY;
@@ -1264,8 +1333,12 @@ int nvmx_handle_vmlaunch(struct cpu_user_regs *regs)
     else {
         rc = nvmx_vmresume(v,regs);
         if ( rc == X86EMUL_OKAY )
-            __set_vvmcs(vcpu_nestedhvm(v).nv_vvmcx,
-                        NVMX_LAUNCH_STATE, 1);
+        {
+            if ( set_vvmcs_launched(&nvmx->launched_list,
+                    domain_page_map_to_mfn(nvcpu->nv_vvmcx)) < 0 )
+                return X86EMUL_UNHANDLEABLE;
+        }
+          
     }
     return rc;
 }
@@ -1332,6 +1405,7 @@ int nvmx_handle_vmclear(struct cpu_user_regs *regs)
     struct vcpu *v = current;
     struct vmx_inst_decoded decode;
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
+    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
     unsigned long gpa = 0;
     void *vvmcs;
     int rc;
@@ -1348,7 +1422,8 @@ int nvmx_handle_vmclear(struct cpu_user_regs *regs)
     
     if ( gpa == nvcpu->nv_vvmcxaddr ) 
     {
-        __set_vvmcs(nvcpu->nv_vvmcx, NVMX_LAUNCH_STATE, 0);
+        clear_vvmcs_launched(&nvmx->launched_list,
+            domain_page_map_to_mfn(nvcpu->nv_vvmcx));
         nvmx_purge_vvmcs(v);
     }
     else 
@@ -1356,7 +1431,8 @@ int nvmx_handle_vmclear(struct cpu_user_regs *regs)
         /* Even if this VMCS isn''t the current one, we must clear it.
*/
         vvmcs = hvm_map_guest_frame_rw(gpa >> PAGE_SHIFT, 0);
         if ( vvmcs ) 
-            __set_vvmcs(vvmcs, NVMX_LAUNCH_STATE, 0);
+            clear_vvmcs_launched(&nvmx->launched_list,
+                domain_page_map_to_mfn(vvmcs));
         hvm_unmap_guest_frame(vvmcs, 0);
     }
 
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h
b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 51df81e..9ff741f 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -421,8 +421,6 @@ enum vmcs_field {
     HOST_SYSENTER_EIP               = 0x00006c12,
     HOST_RSP                        = 0x00006c14,
     HOST_RIP                        = 0x00006c16,
-    /* A virtual VMCS field used for nestedvmx only */
-    NVMX_LAUNCH_STATE               = 0x00006c20,
 };
 
 #define VMCS_VPID_WIDTH 16
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h
b/xen/include/asm-x86/hvm/vmx/vvmx.h
index 9e1dc77..89e839f 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -23,6 +23,11 @@
 #ifndef __ASM_X86_HVM_VVMX_H__
 #define __ASM_X86_HVM_VVMX_H__
 
+struct vvmcs_list {
+    unsigned long vvmcs_mfn;
+    struct list_head node;
+};
+
 struct nestedvmx {
     paddr_t    vmxon_region_pa;
     void       *iobitmap[2];		/* map (va) of L1 guest I/O bitmap */
@@ -38,6 +43,7 @@ struct nestedvmx {
         uint32_t exit_qual;
     } ept;
     uint32_t guest_vpid;
+    struct list_head launched_list;
 };
 
 #define vcpu_2_nvmx(v)	(vcpu_nestedhvm(v).u.nvmx)
-- 
1.7.1

Dongxiao Xu

2013-Jan-23 14:32 UTC

head link

[PATCH v5 rebased 2/4] nested vmx: use VMREAD/VMWRITE to construct vVMCS if enabled VMCS shadowing

Before the VMCS shadowing feature, we use memory operation to build up
the virtual VMCS. This does work since this virtual VMCS will never be
loaded into real hardware. However after we introduce the VMCS
shadowing feature, this VMCS will be loaded into hardware, which
requires all fields in the VMCS accessed by VMREAD/VMWRITE.

Besides, the virtual VMCS revision identifer should also meet the
hardware''s requirement, instead of using a faked one.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c        |   29 +++++++++++++++++++++++++++++
 xen/arch/x86/hvm/vmx/vvmx.c        |   20 ++++++++++++++++----
 xen/include/asm-x86/hvm/vmx/vmcs.h |    5 +++++
 xen/include/asm-x86/hvm/vmx/vvmx.h |   16 ++++++++++++----
 4 files changed, 62 insertions(+), 8 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index de22e03..82a8d91 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -725,6 +725,35 @@ void vmx_vmcs_switch(struct vmcs_struct *from, struct
vmcs_struct *to)
     spin_unlock(&vmx->vmcs_lock);
 }
 
+void virtual_vmcs_enter(void *vvmcs)
+{
+    __vmptrld(pfn_to_paddr(domain_page_map_to_mfn(vvmcs)));
+}
+
+void virtual_vmcs_exit(void *vvmcs)
+{
+    __vmpclear(pfn_to_paddr(domain_page_map_to_mfn(vvmcs)));
+    __vmptrld(virt_to_maddr(this_cpu(current_vmcs)));
+}
+
+u64 virtual_vmcs_vmread(void *vvmcs, u32 vmcs_encoding)
+{
+    u64 res;
+
+    virtual_vmcs_enter(vvmcs);
+    res = __vmread(vmcs_encoding);
+    virtual_vmcs_exit(vvmcs);
+
+    return res;
+}
+
+void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val)
+{
+    virtual_vmcs_enter(vvmcs);
+    __vmwrite(vmcs_encoding, val);
+    virtual_vmcs_exit(vvmcs);
+}
+
 static int construct_vmcs(struct vcpu *v)
 {
     struct domain *d = v->domain;
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index cd8bb90..8000f84 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -175,7 +175,7 @@ static int vvmcs_offset(u32 width, u32 type, u32 index)
     return offset;
 }
 
-u64 __get_vvmcs(void *vvmcs, u32 vmcs_encoding)
+u64 __get_vvmcs_virtual(void *vvmcs, u32 vmcs_encoding)
 {
     union vmcs_encoding enc;
     u64 *content = (u64 *) vvmcs;
@@ -205,7 +205,12 @@ u64 __get_vvmcs(void *vvmcs, u32 vmcs_encoding)
     return res;
 }
 
-void __set_vvmcs(void *vvmcs, u32 vmcs_encoding, u64 val)
+u64 __get_vvmcs_real(void *vvmcs, u32 vmcs_encoding)
+{
+    return virtual_vmcs_vmread(vvmcs, vmcs_encoding);
+}
+
+void __set_vvmcs_virtual(void *vvmcs, u32 vmcs_encoding, u64 val)
 {
     union vmcs_encoding enc;
     u64 *content = (u64 *) vvmcs;
@@ -241,6 +246,11 @@ void __set_vvmcs(void *vvmcs, u32 vmcs_encoding, u64 val)
     content[offset] = res;
 }
 
+void __set_vvmcs_real(void *vvmcs, u32 vmcs_encoding, u64 val)
+{
+    virtual_vmcs_vmwrite(vvmcs, vmcs_encoding, val);
+}
+
 static unsigned long reg_read(struct cpu_user_regs *regs,
                               enum vmx_regs_enc index)
 {
@@ -1568,10 +1578,11 @@ int nvmx_handle_invvpid(struct cpu_user_regs *regs)
  */
 int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
 {
+    struct vcpu *v = current;
     u64 data = 0, host_data = 0;
     int r = 1;
 
-    if ( !nestedhvm_enabled(current->domain) )
+    if ( !nestedhvm_enabled(v->domain) )
         return 0;
 
     rdmsrl(msr, host_data);
@@ -1581,7 +1592,8 @@ int nvmx_msr_read_intercept(unsigned int msr, u64
*msr_content)
      */
     switch (msr) {
     case MSR_IA32_VMX_BASIC:
-        data = (host_data & (~0ul << 32)) | VVMCS_REVISION;
+        data = (host_data & (~0ul << 32)) |
+               ((v->arch.hvm_vmx.vmcs)->vmcs_revision_id);
         break;
     case MSR_IA32_VMX_PINBASED_CTLS:
     case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h
b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 9ff741f..652dc21 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -244,6 +244,7 @@ extern bool_t cpu_has_vmx_ins_outs_instr_info;
     (vmx_secondary_exec_control & SECONDARY_EXEC_APIC_REGISTER_VIRT)
 #define cpu_has_vmx_virtual_intr_delivery \
     (vmx_secondary_exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY)
+#define cpu_has_vmx_vmcs_shadowing 0
 
 /* GUEST_INTERRUPTIBILITY_INFO flags. */
 #define VMX_INTR_SHADOW_STI             0x00000001
@@ -436,6 +437,10 @@ void vmx_vmcs_switch(struct vmcs_struct *from, struct
vmcs_struct *to);
 void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector);
 void vmx_clear_eoi_exit_bitmap(struct vcpu *v, u8 vector);
 int vmx_check_msr_bitmap(unsigned long *msr_bitmap, u32 msr, int access_type);
+void virtual_vmcs_enter(void *vvmcs);
+void virtual_vmcs_exit(void *vvmcs);
+u64 virtual_vmcs_vmread(void *vvmcs, u32 vmcs_encoding);
+void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val);
 
 #endif /* ASM_X86_HVM_VMX_VMCS_H__ */
 
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h
b/xen/include/asm-x86/hvm/vmx/vvmx.h
index 89e839f..73a67cc 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -152,8 +152,6 @@ nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t
*L1_gpa,
  *
  */
 
-#define VVMCS_REVISION 0x40000001u
-
 struct vvmcs_header {
     u32 revision;
     u32 abort;
@@ -185,8 +183,18 @@ enum vvmcs_encoding_type {
     VVMCS_TYPE_HSTATE,
 };
 
-u64 __get_vvmcs(void *vvmcs, u32 vmcs_encoding);
-void __set_vvmcs(void *vvmcs, u32 vmcs_encoding, u64 val);
+u64 __get_vvmcs_virtual(void *vvmcs, u32 vmcs_encoding);
+u64 __get_vvmcs_real(void *vvmcs, u32 vmcs_encoding);
+void __set_vvmcs_virtual(void *vvmcs, u32 vmcs_encoding, u64 val);
+void __set_vvmcs_real(void *vvmcs, u32 vmcs_encoding, u64 val);
+
+#define __get_vvmcs(_vvmcs, _vmcs_encoding) \
+  (cpu_has_vmx_vmcs_shadowing ? __get_vvmcs_real(_vvmcs, _vmcs_encoding) \
+                              : __get_vvmcs_virtual(_vvmcs, _vmcs_encoding))
+
+#define __set_vvmcs(_vvmcs, _vmcs_encoding, _val) \
+  (cpu_has_vmx_vmcs_shadowing ? __set_vvmcs_real(_vvmcs, _vmcs_encoding, _val)
\
+                              : __set_vvmcs_virtual(_vvmcs, _vmcs_encoding,
_val))
 
 uint64_t get_shadow_eptp(struct vcpu *v);
 
-- 
1.7.1

Dongxiao Xu

2013-Jan-23 14:32 UTC

head link

[PATCH v5 rebased 3/4] nested vmx: optimize for bulk access of virtual VMCS

After we use the VMREAD/VMWRITE to build up the virtual VMCS, each
access to the virtual VMCS needs two VMPTRLD and one VMCLEAR to
switch the environment, which might be an overhead to performance.
This commit tries to handle multiple virtual VMCS access together
to improve the performance.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c        |   10 +++
 xen/arch/x86/hvm/vmx/vvmx.c        |  126 +++++++++++++++++++++++++++++++-----
 xen/include/asm-x86/hvm/vmx/vvmx.h |    2 +
 3 files changed, 122 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 82a8d91..f89ea93 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -31,6 +31,7 @@
 #include <asm/hvm/io.h>
 #include <asm/hvm/support.h>
 #include <asm/hvm/vmx/vmx.h>
+#include <asm/hvm/vmx/vvmx.h>
 #include <asm/hvm/vmx/vmcs.h>
 #include <asm/flushtlb.h>
 #include <xen/event.h>
@@ -423,6 +424,13 @@ static void vmx_load_vmcs(struct vcpu *v)
 
 int vmx_cpu_up_prepare(unsigned int cpu)
 {
+    /*
+     * If nvmx_cpu_up_prepare() failed, do not return failure and just fallback
+     * to legacy mode for vvmcs synchronization.
+     */
+    if ( nvmx_cpu_up_prepare(cpu) != 0 )
+        printk("CPU%d: Could not allocate virtual VMCS buffer.\n",
cpu);
+
     if ( per_cpu(vmxon_region, cpu) != NULL )
         return 0;
 
@@ -431,6 +439,7 @@ int vmx_cpu_up_prepare(unsigned int cpu)
         return 0;
 
     printk("CPU%d: Could not allocate host VMCS\n", cpu);
+    nvmx_cpu_dead(cpu);
     return -ENOMEM;
 }
 
@@ -438,6 +447,7 @@ void vmx_cpu_dead(unsigned int cpu)
 {
     vmx_free_vmcs(per_cpu(vmxon_region, cpu));
     per_cpu(vmxon_region, cpu) = NULL;
+    nvmx_cpu_dead(cpu);
 }
 
 int vmx_cpu_up(void)
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 8000f84..1e1ad56 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -28,8 +28,31 @@
 #include <asm/hvm/vmx/vvmx.h>
 #include <asm/hvm/nestedhvm.h>
 
+static DEFINE_PER_CPU(u64 *, vvmcs_buf);
+
 static void nvmx_purge_vvmcs(struct vcpu *v);
 
+#define VMCS_BUF_SIZE 100
+
+int nvmx_cpu_up_prepare(unsigned int cpu)
+{
+    if ( per_cpu(vvmcs_buf, cpu) != NULL )
+        return 0;
+
+    per_cpu(vvmcs_buf, cpu) = xzalloc_array(u64, VMCS_BUF_SIZE);
+
+    if ( per_cpu(vvmcs_buf, cpu) != NULL )
+        return 0;
+
+    return -ENOMEM;
+}
+
+void nvmx_cpu_dead(unsigned int cpu)
+{
+    xfree(per_cpu(vvmcs_buf, cpu));
+    per_cpu(vvmcs_buf, cpu) = NULL;
+}
+
 int nvmx_vcpu_initialise(struct vcpu *v)
 {
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
@@ -834,6 +857,40 @@ static void vvmcs_to_shadow(void *vvmcs, unsigned int
field)
     __vmwrite(field, value);
 }
 
+static void vvmcs_to_shadow_bulk(struct vcpu *v, unsigned int n,
+                                 const u16 *field)
+{
+    struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
+    void *vvmcs = nvcpu->nv_vvmcx;
+    u64 *value = this_cpu(vvmcs_buf);
+    unsigned int i;
+
+    if ( !cpu_has_vmx_vmcs_shadowing )
+        goto fallback;
+
+    if ( !value || n > VMCS_BUF_SIZE )
+    {
+        gdprintk(XENLOG_DEBUG, "vmcs sync fall back to non-bulk mode, \
+                 buffer: %p, buffer size: %d, fields number: %d.\n",
+                 value, VMCS_BUF_SIZE, n);
+        goto fallback;
+    }
+
+    virtual_vmcs_enter(vvmcs);
+    for ( i = 0; i < n; i++ )
+        value[i] = __vmread(field[i]);
+    virtual_vmcs_exit(vvmcs);
+
+    for ( i = 0; i < n; i++ )
+        __vmwrite(field[i], value[i]);
+
+    return;
+
+fallback:
+    for ( i = 0; i < n; i++ )
+        vvmcs_to_shadow(vvmcs, field[i]);
+}
+
 static void shadow_to_vvmcs(void *vvmcs, unsigned int field)
 {
     u64 value;
@@ -844,6 +901,40 @@ static void shadow_to_vvmcs(void *vvmcs, unsigned int
field)
         __set_vvmcs(vvmcs, field, value);
 }
 
+static void shadow_to_vvmcs_bulk(struct vcpu *v, unsigned int n,
+                                 const u16 *field)
+{
+    struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
+    void *vvmcs = nvcpu->nv_vvmcx;
+    u64 *value = this_cpu(vvmcs_buf);
+    unsigned int i;
+
+    if ( !cpu_has_vmx_vmcs_shadowing )
+        goto fallback;
+
+    if ( !value || n > VMCS_BUF_SIZE )
+    {
+        gdprintk(XENLOG_DEBUG, "vmcs sync fall back to non-bulk mode, \
+                 buffer: %p, buffer size: %d, fields number: %d.\n",
+                 value, VMCS_BUF_SIZE, n);
+        goto fallback;
+    }
+
+    for ( i = 0; i < n; i++ )
+        value[i] = __vmread(field[i]);
+
+    virtual_vmcs_enter(vvmcs);
+    for ( i = 0; i < n; i++ )
+        __vmwrite(field[i], value[i]);
+    virtual_vmcs_exit(vvmcs);
+
+    return;
+
+fallback:
+    for ( i = 0; i < n; i++ )
+        shadow_to_vvmcs(vvmcs, field[i]);
+}
+
 static void load_shadow_control(struct vcpu *v)
 {
     /*
@@ -867,13 +958,18 @@ static void load_shadow_guest_state(struct vcpu *v)
 {
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
     void *vvmcs = nvcpu->nv_vvmcx;
-    int i;
     u32 control;
     u64 cr_gh_mask, cr_read_shadow;
 
+    static const u16 vmentry_fields[] = {
+        VM_ENTRY_INTR_INFO,
+        VM_ENTRY_EXCEPTION_ERROR_CODE,
+        VM_ENTRY_INSTRUCTION_LEN,
+    };
+
     /* vvmcs.gstate to shadow vmcs.gstate */
-    for ( i = 0; i < ARRAY_SIZE(vmcs_gstate_field); i++ )
-        vvmcs_to_shadow(vvmcs, vmcs_gstate_field[i]);
+    vvmcs_to_shadow_bulk(v, ARRAY_SIZE(vmcs_gstate_field),
+                         vmcs_gstate_field);
 
     hvm_set_cr0(__get_vvmcs(vvmcs, GUEST_CR0));
     hvm_set_cr4(__get_vvmcs(vvmcs, GUEST_CR4));
@@ -887,9 +983,7 @@ static void load_shadow_guest_state(struct vcpu *v)
 
     hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset);
 
-    vvmcs_to_shadow(vvmcs, VM_ENTRY_INTR_INFO);
-    vvmcs_to_shadow(vvmcs, VM_ENTRY_EXCEPTION_ERROR_CODE);
-    vvmcs_to_shadow(vvmcs, VM_ENTRY_INSTRUCTION_LEN);
+    vvmcs_to_shadow_bulk(v, ARRAY_SIZE(vmentry_fields), vmentry_fields);
 
     /*
      * While emulate CR0 and CR4 for nested virtualization, set the CR0/CR4
@@ -909,10 +1003,13 @@ static void load_shadow_guest_state(struct vcpu *v)
     if ( nvmx_ept_enabled(v) && hvm_pae_enabled(v) &&
          (v->arch.hvm_vcpu.guest_efer & EFER_LMA) )
     {
-        vvmcs_to_shadow(vvmcs, GUEST_PDPTR0);
-        vvmcs_to_shadow(vvmcs, GUEST_PDPTR1);
-        vvmcs_to_shadow(vvmcs, GUEST_PDPTR2);
-        vvmcs_to_shadow(vvmcs, GUEST_PDPTR3);
+        static const u16 gpdptr_fields[] = {
+            GUEST_PDPTR0,
+            GUEST_PDPTR1,
+            GUEST_PDPTR2,
+            GUEST_PDPTR3,
+        };
+        vvmcs_to_shadow_bulk(v, ARRAY_SIZE(gpdptr_fields), gpdptr_fields);
     }
 
     /* TODO: CR3 target control */
@@ -1003,13 +1100,12 @@ static void virtual_vmentry(struct cpu_user_regs *regs)
 
 static void sync_vvmcs_guest_state(struct vcpu *v, struct cpu_user_regs *regs)
 {
-    int i;
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
     void *vvmcs = nvcpu->nv_vvmcx;
 
     /* copy shadow vmcs.gstate back to vvmcs.gstate */
-    for ( i = 0; i < ARRAY_SIZE(vmcs_gstate_field); i++ )
-        shadow_to_vvmcs(vvmcs, vmcs_gstate_field[i]);
+    shadow_to_vvmcs_bulk(v, ARRAY_SIZE(vmcs_gstate_field),
+                         vmcs_gstate_field);
     /* RIP, RSP are in user regs */
     __set_vvmcs(vvmcs, GUEST_RIP, regs->eip);
     __set_vvmcs(vvmcs, GUEST_RSP, regs->esp);
@@ -1021,13 +1117,11 @@ static void sync_vvmcs_guest_state(struct vcpu *v,
struct cpu_user_regs *regs)
 
 static void sync_vvmcs_ro(struct vcpu *v)
 {
-    int i;
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
     struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
     void *vvmcs = nvcpu->nv_vvmcx;
 
-    for ( i = 0; i < ARRAY_SIZE(vmcs_ro_field); i++ )
-        shadow_to_vvmcs(nvcpu->nv_vvmcx, vmcs_ro_field[i]);
+    shadow_to_vvmcs_bulk(v, ARRAY_SIZE(vmcs_ro_field), vmcs_ro_field);
 
     /* Adjust exit_reason/exit_qualifciation for violation case */
     if ( __get_vvmcs(vvmcs, VM_EXIT_REASON) == EXIT_REASON_EPT_VIOLATION )
diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h
b/xen/include/asm-x86/hvm/vmx/vvmx.h
index 73a67cc..3874525 100644
--- a/xen/include/asm-x86/hvm/vmx/vvmx.h
+++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
@@ -229,5 +229,7 @@ int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga,
                         unsigned int *page_order, uint32_t rwx_acc,
                         unsigned long *l1gfn, uint8_t *p2m_acc,
                         uint64_t *exit_qual, uint32_t *exit_reason);
+int nvmx_cpu_up_prepare(unsigned int cpu);
+void nvmx_cpu_dead(unsigned int cpu);
 #endif /* __ASM_X86_HVM_VVMX_H__ */
 
-- 
1.7.1

Dongxiao Xu

2013-Jan-23 14:32 UTC

head link

[PATCH v5 rebased 4/4] nested vmx: enable VMCS shadowing feature

The current logic for handling the non-root VMREAD/VMWRITE is by
VM-Exit and emulate, which may bring certain overhead.

On new Intel platform, it introduces a new feature called VMCS
shadowing, where non-root VMREAD/VMWRITE will not trigger VM-Exit,
and the hardware will read/write the virtual VMCS instead.
This is proved to have performance improvement with the feature.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
---
 xen/arch/x86/hvm/vmx/vmcs.c        |    9 ++++
 xen/arch/x86/hvm/vmx/vvmx.c        |   82 ++++++++++++++++++++++++++++++++++++
 xen/include/asm-x86/hvm/vmx/vmcs.h |   18 +++++++-
 3 files changed, 108 insertions(+), 1 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index f89ea93..69b7dae 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -92,6 +92,7 @@ static void __init vmx_display_features(void)
     P(cpu_has_vmx_unrestricted_guest, "Unrestricted Guest");
     P(cpu_has_vmx_apic_reg_virt, "APIC Register Virtualization");
     P(cpu_has_vmx_virtual_intr_delivery, "Virtual Interrupt
Delivery");
+    P(cpu_has_vmx_vmcs_shadowing, "VMCS shadowing");
 #undef P
 
     if ( !printed )
@@ -133,6 +134,7 @@ static int vmx_init_vmcs_config(void)
     u32 _vmx_cpu_based_exec_control;
     u32 _vmx_secondary_exec_control = 0;
     u64 _vmx_ept_vpid_cap = 0;
+    u64 _vmx_misc_cap = 0;
     u32 _vmx_vmexit_control;
     u32 _vmx_vmentry_control;
     bool_t mismatch = 0;
@@ -180,6 +182,9 @@ static int vmx_init_vmcs_config(void)
                SECONDARY_EXEC_ENABLE_RDTSCP |
                SECONDARY_EXEC_PAUSE_LOOP_EXITING |
                SECONDARY_EXEC_ENABLE_INVPCID);
+        rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
+        if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
+            opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
         if ( opt_vpid_enabled )
             opt |= SECONDARY_EXEC_ENABLE_VPID;
         if ( opt_unrestricted_guest_enabled )
@@ -383,6 +388,8 @@ static void __vmx_clear_vmcs(void *info)
     if ( arch_vmx->active_cpu == smp_processor_id() )
     {
         __vmpclear(virt_to_maddr(arch_vmx->vmcs));
+        if ( arch_vmx->vmcs_shadow_maddr )
+            __vmpclear(arch_vmx->vmcs_shadow_maddr);
 
         arch_vmx->active_cpu = -1;
         arch_vmx->launched   = 0;
@@ -720,6 +727,8 @@ void vmx_vmcs_switch(struct vmcs_struct *from, struct
vmcs_struct *to)
     spin_lock(&vmx->vmcs_lock);
 
     __vmpclear(virt_to_maddr(from));
+    if ( vmx->vmcs_shadow_maddr )
+        __vmpclear(vmx->vmcs_shadow_maddr);
     __vmptrld(virt_to_maddr(to));
 
     vmx->vmcs = to;
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 1e1ad56..3c152c5 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -64,6 +64,48 @@ int nvmx_vcpu_initialise(struct vcpu *v)
         gdprintk(XENLOG_ERR, "nest: allocation for shadow vmcs
failed\n");
 	goto out;
     }
+
+    /* non-root VMREAD/VMWRITE bitmap. */
+    if ( cpu_has_vmx_vmcs_shadowing )
+    {
+        struct page_info *vmread_bitmap, *vmwrite_bitmap;
+        unsigned long *vr, *vw;
+
+        vmread_bitmap = alloc_domheap_page(NULL, 0);
+        if ( !vmread_bitmap )
+        {
+            gdprintk(XENLOG_ERR, "nest: allocation for vmread bitmap
failed\n");
+            goto out1;
+        }
+        v->arch.hvm_vmx.vmread_bitmap = vmread_bitmap;
+
+        vmwrite_bitmap = alloc_domheap_page(NULL, 0);
+        if ( !vmwrite_bitmap )
+        {
+            gdprintk(XENLOG_ERR, "nest: allocation for vmwrite bitmap
failed\n");
+            goto out2;
+        }
+        v->arch.hvm_vmx.vmwrite_bitmap = vmwrite_bitmap;
+
+        vr = __map_domain_page(vmread_bitmap);
+        vw = __map_domain_page(vmwrite_bitmap);
+
+        clear_page(vr);
+        clear_page(vw);
+
+        /* 
+         * For the following 4 encodings, we need to handle them in VMM.
+         * Let them vmexit as usual.
+         */
+        set_bit(IO_BITMAP_A, vw);
+        set_bit(IO_BITMAP_A_HIGH, vw);
+        set_bit(IO_BITMAP_B, vw);
+        set_bit(IO_BITMAP_B_HIGH, vw);
+
+        unmap_domain_page(vr);
+        unmap_domain_page(vw);
+    }
+
     nvmx->ept.enabled = 0;
     nvmx->guest_vpid = 0;
     nvmx->vmxon_region_pa = 0;
@@ -76,6 +118,10 @@ int nvmx_vcpu_initialise(struct vcpu *v)
     nvmx->msrbitmap = NULL;
     INIT_LIST_HEAD(&nvmx->launched_list);
     return 0;
+out2:
+    free_domheap_page(v->arch.hvm_vmx.vmread_bitmap);
+out1:
+    free_xenheap_page(nvcpu->nv_n2vmcx);
 out:
     return -ENOMEM;
 }
@@ -106,6 +152,11 @@ void nvmx_vcpu_destroy(struct vcpu *v)
         list_del(&item->node);
         xfree(item);
     }
+
+    if ( v->arch.hvm_vmx.vmread_bitmap )
+        free_domheap_page(v->arch.hvm_vmx.vmread_bitmap);
+    if ( v->arch.hvm_vmx.vmwrite_bitmap )
+        free_domheap_page(v->arch.hvm_vmx.vmwrite_bitmap);
 }
  
 void nvmx_domain_relinquish_resources(struct domain *d)
@@ -1035,6 +1086,32 @@ static bool_t nvmx_vpid_enabled(struct nestedvcpu *nvcpu)
     return 0;
 }
 
+static void nvmx_set_vmcs_pointer(struct vcpu *v, struct vmcs_struct *vvmcs)
+{
+    unsigned long vvmcs_mfn = domain_page_map_to_mfn(vvmcs);
+    paddr_t vvmcs_maddr = vvmcs_mfn << PAGE_SHIFT;
+
+    __vmpclear(vvmcs_maddr);
+    vvmcs->vmcs_revision_id |= VMCS_RID_TYPE_MASK;
+    v->arch.hvm_vmx.vmcs_shadow_maddr = vvmcs_maddr;
+    __vmwrite(VMCS_LINK_POINTER, vvmcs_maddr);
+    __vmwrite(VMREAD_BITMAP, page_to_maddr(v->arch.hvm_vmx.vmread_bitmap));
+    __vmwrite(VMWRITE_BITMAP,
page_to_maddr(v->arch.hvm_vmx.vmwrite_bitmap));
+}
+
+static void nvmx_clear_vmcs_pointer(struct vcpu *v, struct vmcs_struct *vvmcs)
+{
+    unsigned long vvmcs_mfn = domain_page_map_to_mfn(vvmcs);
+    paddr_t vvmcs_maddr = vvmcs_mfn << PAGE_SHIFT;
+
+    __vmpclear(vvmcs_maddr);
+    vvmcs->vmcs_revision_id &= ~VMCS_RID_TYPE_MASK;
+    v->arch.hvm_vmx.vmcs_shadow_maddr = 0;
+    __vmwrite(VMCS_LINK_POINTER, ~0ul);
+    __vmwrite(VMREAD_BITMAP, 0);
+    __vmwrite(VMWRITE_BITMAP, 0);
+}
+
 static void virtual_vmentry(struct cpu_user_regs *regs)
 {
     struct vcpu *v = current;
@@ -1476,6 +1553,9 @@ int nvmx_handle_vmptrld(struct cpu_user_regs *regs)
         __map_msr_bitmap(v);
     }
 
+    if ( cpu_has_vmx_vmcs_shadowing )
+        nvmx_set_vmcs_pointer(v, nvcpu->nv_vvmcx);
+
     vmreturn(regs, VMSUCCEED);
 
 out:
@@ -1526,6 +1606,8 @@ int nvmx_handle_vmclear(struct cpu_user_regs *regs)
     
     if ( gpa == nvcpu->nv_vvmcxaddr ) 
     {
+        if ( cpu_has_vmx_vmcs_shadowing )
+            nvmx_clear_vmcs_pointer(v, nvcpu->nv_vvmcx);
         clear_vvmcs_launched(&nvmx->launched_list,
             domain_page_map_to_mfn(nvcpu->nv_vvmcx));
         nvmx_purge_vvmcs(v);
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h
b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 652dc21..ba02221 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -81,6 +81,8 @@ struct vmx_domain {
 struct arch_vmx_struct {
     /* Virtual address of VMCS. */
     struct vmcs_struct  *vmcs;
+    /* VMCS shadow machine address. */
+    paddr_t             vmcs_shadow_maddr;
 
     /* Protects remote usage of VMCS (VMPTRLD/VMCLEAR). */
     spinlock_t           vmcs_lock;
@@ -125,6 +127,10 @@ struct arch_vmx_struct {
     /* Remember EFLAGS while in virtual 8086 mode */
     uint32_t             vm86_saved_eflags;
     int                  hostenv_migrated;
+
+    /* Bitmap to control vmexit policy for Non-root VMREAD/VMWRITE */
+    struct page_info     *vmread_bitmap;
+    struct page_info     *vmwrite_bitmap;
 };
 
 int vmx_create_vmcs(struct vcpu *v);
@@ -191,6 +197,7 @@ extern u32 vmx_vmentry_control;
 #define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY    0x00000200
 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING       0x00000400
 #define SECONDARY_EXEC_ENABLE_INVPCID           0x00001000
+#define SECONDARY_EXEC_ENABLE_VMCS_SHADOWING    0x00004000
 extern u32 vmx_secondary_exec_control;
 
 extern bool_t cpu_has_vmx_ins_outs_instr_info;
@@ -205,6 +212,8 @@ extern bool_t cpu_has_vmx_ins_outs_instr_info;
 #define VMX_EPT_INVEPT_SINGLE_CONTEXT           0x02000000
 #define VMX_EPT_INVEPT_ALL_CONTEXT              0x04000000
 
+#define VMX_MISC_VMWRITE_ALL                    0x20000000
+
 #define VMX_VPID_INVVPID_INSTRUCTION                        0x100000000ULL
 #define VMX_VPID_INVVPID_INDIVIDUAL_ADDR                    0x10000000000ULL
 #define VMX_VPID_INVVPID_SINGLE_CONTEXT                     0x20000000000ULL
@@ -244,7 +253,10 @@ extern bool_t cpu_has_vmx_ins_outs_instr_info;
     (vmx_secondary_exec_control & SECONDARY_EXEC_APIC_REGISTER_VIRT)
 #define cpu_has_vmx_virtual_intr_delivery \
     (vmx_secondary_exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY)
-#define cpu_has_vmx_vmcs_shadowing 0
+#define cpu_has_vmx_vmcs_shadowing \
+    (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VMCS_SHADOWING)
+
+#define VMCS_RID_TYPE_MASK              0x80000000
 
 /* GUEST_INTERRUPTIBILITY_INFO flags. */
 #define VMX_INTR_SHADOW_STI             0x00000001
@@ -305,6 +317,10 @@ enum vmcs_field {
     EOI_EXIT_BITMAP2_HIGH           = 0x00002021,
     EOI_EXIT_BITMAP3                = 0x00002022,
     EOI_EXIT_BITMAP3_HIGH           = 0x00002023,
+    VMREAD_BITMAP                   = 0x00002026,
+    VMREAD_BITMAP_HIGH              = 0x00002027,
+    VMWRITE_BITMAP                  = 0x00002028,
+    VMWRITE_BITMAP_HIGH             = 0x00002029,
     GUEST_PHYSICAL_ADDRESS          = 0x00002400,
     GUEST_PHYSICAL_ADDRESS_HIGH     = 0x00002401,
     VMCS_LINK_POINTER               = 0x00002800,
-- 
1.7.1

Nakajima, Jun

2013-Jan-23 14:55 UTC

head link

Re: [PATCH v5 rebased 4/4] nested vmx: enable VMCS shadowing feature

On Wed, Jan 23, 2013 at 6:32 AM, Dongxiao Xu <dongxiao.xu@intel.com>
wrote:>
> The current logic for handling the non-root VMREAD/VMWRITE is by
> VM-Exit and emulate, which may bring certain overhead.
>
> On new Intel platform, it introduces a new feature called VMCS
> shadowing, where non-root VMREAD/VMWRITE will not trigger VM-Exit,
> and the hardware will read/write the virtual VMCS instead.
> This is proved to have performance improvement with the feature.
>
> Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmcs.c        |    9 ++++
>  xen/arch/x86/hvm/vmx/vvmx.c        |   82
> ++++++++++++++++++++++++++++++++++++
>  xen/include/asm-x86/hvm/vmx/vmcs.h |   18 +++++++-
>  3 files changed, 108 insertions(+), 1 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index f89ea93..69b7dae 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -92,6 +92,7 @@ static void __init vmx_display_features(void)
>      P(cpu_has_vmx_unrestricted_guest, "Unrestricted Guest");
>      P(cpu_has_vmx_apic_reg_virt, "APIC Register
Virtualization");
>      P(cpu_has_vmx_virtual_intr_delivery, "Virtual Interrupt
Delivery");
> +    P(cpu_has_vmx_vmcs_shadowing, "VMCS shadowing");
>  #undef P
>
>      if ( !printed )
> @@ -133,6 +134,7 @@ static int vmx_init_vmcs_config(void)
>      u32 _vmx_cpu_based_exec_control;
>      u32 _vmx_secondary_exec_control = 0;
>      u64 _vmx_ept_vpid_cap = 0;
> +    u64 _vmx_misc_cap = 0;
>      u32 _vmx_vmexit_control;
>      u32 _vmx_vmentry_control;
>      bool_t mismatch = 0;
> @@ -180,6 +182,9 @@ static int vmx_init_vmcs_config(void)
>                 SECONDARY_EXEC_ENABLE_RDTSCP |
>                 SECONDARY_EXEC_PAUSE_LOOP_EXITING |
>                 SECONDARY_EXEC_ENABLE_INVPCID);
> +        rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
> +        if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
> +            opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
>          if ( opt_vpid_enabled )
>              opt |= SECONDARY_EXEC_ENABLE_VPID;
>          if ( opt_unrestricted_guest_enabled )
> @@ -383,6 +388,8 @@ static void __vmx_clear_vmcs(void *info)
>      if ( arch_vmx->active_cpu == smp_processor_id() )
>      {
>          __vmpclear(virt_to_maddr(arch_vmx->vmcs));
> +        if ( arch_vmx->vmcs_shadow_maddr )
> +            __vmpclear(arch_vmx->vmcs_shadow_maddr);
>
>          arch_vmx->active_cpu = -1;
>          arch_vmx->launched   = 0;
> @@ -720,6 +727,8 @@ void vmx_vmcs_switch(struct vmcs_struct *from, struct
> vmcs_struct *to)
>      spin_lock(&vmx->vmcs_lock);
>
>      __vmpclear(virt_to_maddr(from));
> +    if ( vmx->vmcs_shadow_maddr )
> +        __vmpclear(vmx->vmcs_shadow_maddr);
>      __vmptrld(virt_to_maddr(to));
>
>      vmx->vmcs = to;
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index 1e1ad56..3c152c5 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -64,6 +64,48 @@ int nvmx_vcpu_initialise(struct vcpu *v)
>          gdprintk(XENLOG_ERR, "nest: allocation for shadow vmcs
> failed\n");
>         goto out;
>      }
> +
> +    /* non-root VMREAD/VMWRITE bitmap. */
> +    if ( cpu_has_vmx_vmcs_shadowing )
> +    {
> +        struct page_info *vmread_bitmap, *vmwrite_bitmap;
> +        unsigned long *vr, *vw;
> +
> +        vmread_bitmap = alloc_domheap_page(NULL, 0);
> +        if ( !vmread_bitmap )
> +        {
> +            gdprintk(XENLOG_ERR, "nest: allocation for vmread bitmap
> failed\n");
> +            goto out1;
> +        }
> +        v->arch.hvm_vmx.vmread_bitmap = vmread_bitmap;
> +
> +        vmwrite_bitmap = alloc_domheap_page(NULL, 0);
> +        if ( !vmwrite_bitmap )
> +        {
> +            gdprintk(XENLOG_ERR, "nest: allocation for vmwrite bitmap
> failed\n");
> +            goto out2;
> +        }
> +        v->arch.hvm_vmx.vmwrite_bitmap = vmwrite_bitmap;
> +
> +        vr = __map_domain_page(vmread_bitmap);
> +        vw = __map_domain_page(vmwrite_bitmap);
> +
> +        clear_page(vr);
> +        clear_page(vw);
> +
> +        /*
> +         * For the following 4 encodings, we need to handle them in VMM.
> +         * Let them vmexit as usual.
> +         */
> +        set_bit(IO_BITMAP_A, vw);
> +        set_bit(IO_BITMAP_A_HIGH, vw);
> +        set_bit(IO_BITMAP_B, vw);
> +        set_bit(IO_BITMAP_B_HIGH, vw);
> +
> +        unmap_domain_page(vr);
> +        unmap_domain_page(vw);
> +    }
> +
>      nvmx->ept.enabled = 0;
>      nvmx->guest_vpid = 0;
>      nvmx->vmxon_region_pa = 0;
> @@ -76,6 +118,10 @@ int nvmx_vcpu_initialise(struct vcpu *v)
>      nvmx->msrbitmap = NULL;
>      INIT_LIST_HEAD(&nvmx->launched_list);
>      return 0;
> +out2:
> +    free_domheap_page(v->arch.hvm_vmx.vmread_bitmap);
> +out1:
> +    free_xenheap_page(nvcpu->nv_n2vmcx);
>  out:
>      return -ENOMEM;
>  }
> @@ -106,6 +152,11 @@ void nvmx_vcpu_destroy(struct vcpu *v)
>          list_del(&item->node);
>          xfree(item);
>      }
> +
> +    if ( v->arch.hvm_vmx.vmread_bitmap )
> +        free_domheap_page(v->arch.hvm_vmx.vmread_bitmap);
> +    if ( v->arch.hvm_vmx.vmwrite_bitmap )
> +        free_domheap_page(v->arch.hvm_vmx.vmwrite_bitmap);
>  }
>
>  void nvmx_domain_relinquish_resources(struct domain *d)
> @@ -1035,6 +1086,32 @@ static bool_t nvmx_vpid_enabled(struct nestedvcpu
> *nvcpu)
>      return 0;
>  }
>
> +static void nvmx_set_vmcs_pointer(struct vcpu *v, struct vmcs_struct
> *vvmcs)
> +{
> +    unsigned long vvmcs_mfn = domain_page_map_to_mfn(vvmcs);
> +    paddr_t vvmcs_maddr = vvmcs_mfn << PAGE_SHIFT;
> +
> +    __vmpclear(vvmcs_maddr);
> +    vvmcs->vmcs_revision_id |= VMCS_RID_TYPE_MASK;
> +    v->arch.hvm_vmx.vmcs_shadow_maddr = vvmcs_maddr;
> +    __vmwrite(VMCS_LINK_POINTER, vvmcs_maddr);
> +    __vmwrite(VMREAD_BITMAP,
> page_to_maddr(v->arch.hvm_vmx.vmread_bitmap));
> +    __vmwrite(VMWRITE_BITMAP,
> page_to_maddr(v->arch.hvm_vmx.vmwrite_bitmap));
> +}
> +
> +static void nvmx_clear_vmcs_pointer(struct vcpu *v, struct vmcs_struct
> *vvmcs)
> +{
> +    unsigned long vvmcs_mfn = domain_page_map_to_mfn(vvmcs);
> +    paddr_t vvmcs_maddr = vvmcs_mfn << PAGE_SHIFT;
> +
> +    __vmpclear(vvmcs_maddr);
> +    vvmcs->vmcs_revision_id &= ~VMCS_RID_TYPE_MASK;
> +    v->arch.hvm_vmx.vmcs_shadow_maddr = 0;
> +    __vmwrite(VMCS_LINK_POINTER, ~0ul);
> +    __vmwrite(VMREAD_BITMAP, 0);
> +    __vmwrite(VMWRITE_BITMAP, 0);
> +}
> +
>  static void virtual_vmentry(struct cpu_user_regs *regs)
>  {
>      struct vcpu *v = current;
> @@ -1476,6 +1553,9 @@ int nvmx_handle_vmptrld(struct cpu_user_regs *regs)
>          __map_msr_bitmap(v);
>      }
>
> +    if ( cpu_has_vmx_vmcs_shadowing )
> +        nvmx_set_vmcs_pointer(v, nvcpu->nv_vvmcx);
> +
>      vmreturn(regs, VMSUCCEED);
>
>  out:
> @@ -1526,6 +1606,8 @@ int nvmx_handle_vmclear(struct cpu_user_regs *regs)
>
>      if ( gpa == nvcpu->nv_vvmcxaddr )
>      {
> +        if ( cpu_has_vmx_vmcs_shadowing )
> +            nvmx_clear_vmcs_pointer(v, nvcpu->nv_vvmcx);
>          clear_vvmcs_launched(&nvmx->launched_list,
>              domain_page_map_to_mfn(nvcpu->nv_vvmcx));
>          nvmx_purge_vvmcs(v);
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h
> b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index 652dc21..ba02221 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -81,6 +81,8 @@ struct vmx_domain {
>  struct arch_vmx_struct {
>      /* Virtual address of VMCS. */
>      struct vmcs_struct  *vmcs;
> +    /* VMCS shadow machine address. */
> +    paddr_t             vmcs_shadow_maddr;
>
>      /* Protects remote usage of VMCS (VMPTRLD/VMCLEAR). */
>      spinlock_t           vmcs_lock;
> @@ -125,6 +127,10 @@ struct arch_vmx_struct {
>      /* Remember EFLAGS while in virtual 8086 mode */
>      uint32_t             vm86_saved_eflags;
>      int                  hostenv_migrated;
> +
> +    /* Bitmap to control vmexit policy for Non-root VMREAD/VMWRITE */
> +    struct page_info     *vmread_bitmap;
> +    struct page_info     *vmwrite_bitmap;
>  };
>
>  int vmx_create_vmcs(struct vcpu *v);
> @@ -191,6 +197,7 @@ extern u32 vmx_vmentry_control;
>  #define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY    0x00000200
>  #define SECONDARY_EXEC_PAUSE_LOOP_EXITING       0x00000400
>  #define SECONDARY_EXEC_ENABLE_INVPCID           0x00001000
> +#define SECONDARY_EXEC_ENABLE_VMCS_SHADOWING    0x00004000
>  extern u32 vmx_secondary_exec_control;
>
>  extern bool_t cpu_has_vmx_ins_outs_instr_info;
> @@ -205,6 +212,8 @@ extern bool_t cpu_has_vmx_ins_outs_instr_info;
>  #define VMX_EPT_INVEPT_SINGLE_CONTEXT           0x02000000
>  #define VMX_EPT_INVEPT_ALL_CONTEXT              0x04000000
>
> +#define VMX_MISC_VMWRITE_ALL                    0x20000000
> +
>  #define VMX_VPID_INVVPID_INSTRUCTION
> 0x100000000ULL
>  #define VMX_VPID_INVVPID_INDIVIDUAL_ADDR
> 0x10000000000ULL
>  #define VMX_VPID_INVVPID_SINGLE_CONTEXT
> 0x20000000000ULL
> @@ -244,7 +253,10 @@ extern bool_t cpu_has_vmx_ins_outs_instr_info;
>      (vmx_secondary_exec_control & SECONDARY_EXEC_APIC_REGISTER_VIRT)
>  #define cpu_has_vmx_virtual_intr_delivery \
>      (vmx_secondary_exec_control &
SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY)
> -#define cpu_has_vmx_vmcs_shadowing 0
> +#define cpu_has_vmx_vmcs_shadowing \
> +    (vmx_secondary_exec_control &
SECONDARY_EXEC_ENABLE_VMCS_SHADOWING)
> +
> +#define VMCS_RID_TYPE_MASK              0x80000000
>
>  /* GUEST_INTERRUPTIBILITY_INFO flags. */
>  #define VMX_INTR_SHADOW_STI             0x00000001
> @@ -305,6 +317,10 @@ enum vmcs_field {
>      EOI_EXIT_BITMAP2_HIGH           = 0x00002021,
>      EOI_EXIT_BITMAP3                = 0x00002022,
>      EOI_EXIT_BITMAP3_HIGH           = 0x00002023,
> +    VMREAD_BITMAP                   = 0x00002026,
> +    VMREAD_BITMAP_HIGH              = 0x00002027,
> +    VMWRITE_BITMAP                  = 0x00002028,
> +    VMWRITE_BITMAP_HIGH             = 0x00002029,
>      GUEST_PHYSICAL_ADDRESS          = 0x00002400,
>      GUEST_PHYSICAL_ADDRESS_HIGH     = 0x00002401,
>      VMCS_LINK_POINTER               = 0x00002800,
> --
> 1.7.1
>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>


--
Jun
Intel Open Source Technology Center

Dong, Eddie

2013-Jan-25 02:13 UTC

head link

Re: [PATCH v5 rebased 1/4] nested vmx: Use a list to store the launched vvmcs for L1 VMM

Acked-by Eddie Dong <eddie.dong@intel.com>
> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org
> [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Dongxiao Xu
> Sent: Wednesday, January 23, 2013 10:32 PM
> To: xen-devel@lists.xensource.com
> Cc: JBeulich@suse.com; Dong, Eddie; Zhang, Xiantao; Nakajima, Jun
> Subject: [Xen-devel] [PATCH v5 rebased 1/4] nested vmx: Use a list to store
> the launched vvmcs for L1 VMM
> 
> Originally we use a virtual VMCS field to store the launch state of
> a certain vmcs. However if we introduce VMCS shadowing feature, this
> virtual VMCS should also be able to load into real hardware,
> and VMREAD/VMWRITE operate invalid fields.
> 
> The new approach is to store the launch state into a list for L1 VMM.
> 
> Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vvmx.c        |   96
> ++++++++++++++++++++++++++++++++----
>  xen/include/asm-x86/hvm/vmx/vmcs.h |    2 -
>  xen/include/asm-x86/hvm/vmx/vvmx.h |    6 ++
>  3 files changed, 92 insertions(+), 12 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index 90b4da9..cd8bb90 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -51,6 +51,7 @@ int nvmx_vcpu_initialise(struct vcpu *v)
>      nvmx->iobitmap[0] = NULL;
>      nvmx->iobitmap[1] = NULL;
>      nvmx->msrbitmap = NULL;
> +    INIT_LIST_HEAD(&nvmx->launched_list);
>      return 0;
>  out:
>      return -ENOMEM;
> @@ -58,7 +59,9 @@ out:
> 
>  void nvmx_vcpu_destroy(struct vcpu *v)
>  {
> +    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
>      struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
> +    struct vvmcs_list *item, *n;
> 
>      /*
>       * When destroying the vcpu, it may be running on behalf of L2 guest.
> @@ -74,6 +77,12 @@ void nvmx_vcpu_destroy(struct vcpu *v)
>          free_xenheap_page(nvcpu->nv_n2vmcx);
>          nvcpu->nv_n2vmcx = NULL;
>      }
> +
> +    list_for_each_entry_safe(item, n, &nvmx->launched_list, node)
> +    {
> +        list_del(&item->node);
> +        xfree(item);
> +    }
>  }
> 
>  void nvmx_domain_relinquish_resources(struct domain *d)
> @@ -1202,6 +1211,62 @@ int nvmx_handle_vmxoff(struct cpu_user_regs
> *regs)
>      return X86EMUL_OKAY;
>  }
> 
> +static bool_t vvmcs_launched(struct list_head *launched_list,
> +                             unsigned long vvmcs_mfn)
> +{
> +    struct vvmcs_list *vvmcs;
> +    struct list_head *pos;
> +    bool_t launched = 0;
> +
> +    list_for_each(pos, launched_list)
> +    {
> +        vvmcs = list_entry(pos, struct vvmcs_list, node);
> +        if ( vvmcs_mfn == vvmcs->vvmcs_mfn )
> +        {
> +            launched = 1;
> +            break;
> +        }
> +    }
> +
> +    return launched;
> +}
> +
> +static int set_vvmcs_launched(struct list_head *launched_list,
> +                              unsigned long vvmcs_mfn)
> +{
> +    struct vvmcs_list *vvmcs;
> +
> +    if ( vvmcs_launched(launched_list, vvmcs_mfn) )
> +        return 0;
> +
> +    vvmcs = xzalloc(struct vvmcs_list);
> +    if ( !vvmcs )
> +        return -ENOMEM;
> +
> +    vvmcs->vvmcs_mfn = vvmcs_mfn;
> +    list_add(&vvmcs->node, launched_list);
> +
> +    return 0;
> +}
> +
> +static void clear_vvmcs_launched(struct list_head *launched_list,
> +                                 paddr_t vvmcs_mfn)
> +{
> +    struct vvmcs_list *vvmcs;
> +    struct list_head *pos;
> +
> +    list_for_each(pos, launched_list)
> +    {
> +        vvmcs = list_entry(pos, struct vvmcs_list, node);
> +        if ( vvmcs_mfn == vvmcs->vvmcs_mfn )
> +        {
> +            list_del(&vvmcs->node);
> +            xfree(vvmcs);
> +            break;
> +        }
> +    }
> +}
> +
>  int nvmx_vmresume(struct vcpu *v, struct cpu_user_regs *regs)
>  {
>      struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
> @@ -1225,8 +1290,10 @@ int nvmx_vmresume(struct vcpu *v, struct
> cpu_user_regs *regs)
> 
>  int nvmx_handle_vmresume(struct cpu_user_regs *regs)
>  {
> -    int launched;
> +    bool_t launched;
>      struct vcpu *v = current;
> +    struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
> +    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
> 
>      if ( vcpu_nestedhvm(v).nv_vvmcxaddr == VMCX_EADDR )
>      {
> @@ -1234,8 +1301,8 @@ int nvmx_handle_vmresume(struct cpu_user_regs
> *regs)
>          return X86EMUL_OKAY;
>      }
> 
> -    launched = __get_vvmcs(vcpu_nestedhvm(v).nv_vvmcx,
> -                           NVMX_LAUNCH_STATE);
> +    launched = vvmcs_launched(&nvmx->launched_list,
> +                   domain_page_map_to_mfn(nvcpu->nv_vvmcx));
>      if ( !launched ) {
>         vmreturn (regs, VMFAIL_VALID);
>         return X86EMUL_OKAY;
> @@ -1245,9 +1312,11 @@ int nvmx_handle_vmresume(struct
> cpu_user_regs *regs)
> 
>  int nvmx_handle_vmlaunch(struct cpu_user_regs *regs)
>  {
> -    int launched;
> +    bool_t launched;
>      int rc;
>      struct vcpu *v = current;
> +    struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
> +    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
> 
>      if ( vcpu_nestedhvm(v).nv_vvmcxaddr == VMCX_EADDR )
>      {
> @@ -1255,8 +1324,8 @@ int nvmx_handle_vmlaunch(struct cpu_user_regs
> *regs)
>          return X86EMUL_OKAY;
>      }
> 
> -    launched = __get_vvmcs(vcpu_nestedhvm(v).nv_vvmcx,
> -                           NVMX_LAUNCH_STATE);
> +    launched = vvmcs_launched(&nvmx->launched_list,
> +                   domain_page_map_to_mfn(nvcpu->nv_vvmcx));
>      if ( launched ) {
>         vmreturn (regs, VMFAIL_VALID);
>         return X86EMUL_OKAY;
> @@ -1264,8 +1333,12 @@ int nvmx_handle_vmlaunch(struct cpu_user_regs
> *regs)
>      else {
>          rc = nvmx_vmresume(v,regs);
>          if ( rc == X86EMUL_OKAY )
> -            __set_vvmcs(vcpu_nestedhvm(v).nv_vvmcx,
> -                        NVMX_LAUNCH_STATE, 1);
> +        {
> +            if ( set_vvmcs_launched(&nvmx->launched_list,
> +                    domain_page_map_to_mfn(nvcpu->nv_vvmcx)) <
> 0 )
> +                return X86EMUL_UNHANDLEABLE;
> +        }
> +
>      }
>      return rc;
>  }
> @@ -1332,6 +1405,7 @@ int nvmx_handle_vmclear(struct cpu_user_regs
> *regs)
>      struct vcpu *v = current;
>      struct vmx_inst_decoded decode;
>      struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
> +    struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
>      unsigned long gpa = 0;
>      void *vvmcs;
>      int rc;
> @@ -1348,7 +1422,8 @@ int nvmx_handle_vmclear(struct cpu_user_regs
> *regs)
> 
>      if ( gpa == nvcpu->nv_vvmcxaddr )
>      {
> -        __set_vvmcs(nvcpu->nv_vvmcx, NVMX_LAUNCH_STATE, 0);
> +        clear_vvmcs_launched(&nvmx->launched_list,
> +            domain_page_map_to_mfn(nvcpu->nv_vvmcx));
>          nvmx_purge_vvmcs(v);
>      }
>      else
> @@ -1356,7 +1431,8 @@ int nvmx_handle_vmclear(struct cpu_user_regs
> *regs)
>          /* Even if this VMCS isn''t the current one, we must clear
it. */
>          vvmcs = hvm_map_guest_frame_rw(gpa >> PAGE_SHIFT, 0);
>          if ( vvmcs )
> -            __set_vvmcs(vvmcs, NVMX_LAUNCH_STATE, 0);
> +            clear_vvmcs_launched(&nvmx->launched_list,
> +                domain_page_map_to_mfn(vvmcs));
>          hvm_unmap_guest_frame(vvmcs, 0);
>      }
> 
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h
> b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index 51df81e..9ff741f 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -421,8 +421,6 @@ enum vmcs_field {
>      HOST_SYSENTER_EIP               = 0x00006c12,
>      HOST_RSP                        = 0x00006c14,
>      HOST_RIP                        = 0x00006c16,
> -    /* A virtual VMCS field used for nestedvmx only */
> -    NVMX_LAUNCH_STATE               = 0x00006c20,
>  };
> 
>  #define VMCS_VPID_WIDTH 16
> diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h
> b/xen/include/asm-x86/hvm/vmx/vvmx.h
> index 9e1dc77..89e839f 100644
> --- a/xen/include/asm-x86/hvm/vmx/vvmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
> @@ -23,6 +23,11 @@
>  #ifndef __ASM_X86_HVM_VVMX_H__
>  #define __ASM_X86_HVM_VVMX_H__
> 
> +struct vvmcs_list {
> +    unsigned long vvmcs_mfn;
> +    struct list_head node;
> +};
> +
>  struct nestedvmx {
>      paddr_t    vmxon_region_pa;
>      void       *iobitmap[2];		/* map (va) of L1 guest I/O bitmap */
> @@ -38,6 +43,7 @@ struct nestedvmx {
>          uint32_t exit_qual;
>      } ept;
>      uint32_t guest_vpid;
> +    struct list_head launched_list;
>  };
> 
>  #define vcpu_2_nvmx(v)	(vcpu_nestedhvm(v).u.nvmx)
> --
> 1.7.1
> 
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

Dong, Eddie

2013-Jan-25 02:26 UTC

head link

Re: [PATCH v5 rebased 2/4] nested vmx: use VMREAD/VMWRITE to construct vVMCS if enabled VMCS shadowing

Acked-by Eddie Dong <eddie.dong@intel.com>
> -----Original Message-----
> From: Xu, Dongxiao
> Sent: Wednesday, January 23, 2013 10:32 PM
> To: xen-devel@lists.xensource.com
> Cc: Dong, Eddie; Nakajima, Jun; Zhang, Xiantao; JBeulich@suse.com
> Subject: [PATCH v5 rebased 2/4] nested vmx: use VMREAD/VMWRITE to
> construct vVMCS if enabled VMCS shadowing
> 
> Before the VMCS shadowing feature, we use memory operation to build up
> the virtual VMCS. This does work since this virtual VMCS will never be
> loaded into real hardware. However after we introduce the VMCS
> shadowing feature, this VMCS will be loaded into hardware, which
> requires all fields in the VMCS accessed by VMREAD/VMWRITE.
> 
> Besides, the virtual VMCS revision identifer should also meet the
> hardware''s requirement, instead of using a faked one.
> 
> Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmcs.c        |   29
> +++++++++++++++++++++++++++++
>  xen/arch/x86/hvm/vmx/vvmx.c        |   20 ++++++++++++++++----
>  xen/include/asm-x86/hvm/vmx/vmcs.h |    5 +++++
>  xen/include/asm-x86/hvm/vmx/vvmx.h |   16 ++++++++++++----
>  4 files changed, 62 insertions(+), 8 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index de22e03..82a8d91 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -725,6 +725,35 @@ void vmx_vmcs_switch(struct vmcs_struct *from,
> struct vmcs_struct *to)
>      spin_unlock(&vmx->vmcs_lock);
>  }
> 
> +void virtual_vmcs_enter(void *vvmcs)
> +{
> +    __vmptrld(pfn_to_paddr(domain_page_map_to_mfn(vvmcs)));
> +}
> +
> +void virtual_vmcs_exit(void *vvmcs)
> +{
> +    __vmpclear(pfn_to_paddr(domain_page_map_to_mfn(vvmcs)));
> +    __vmptrld(virt_to_maddr(this_cpu(current_vmcs)));
> +}
> +
> +u64 virtual_vmcs_vmread(void *vvmcs, u32 vmcs_encoding)
> +{
> +    u64 res;
> +
> +    virtual_vmcs_enter(vvmcs);
> +    res = __vmread(vmcs_encoding);
> +    virtual_vmcs_exit(vvmcs);
> +
> +    return res;
> +}
> +
> +void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val)
> +{
> +    virtual_vmcs_enter(vvmcs);
> +    __vmwrite(vmcs_encoding, val);
> +    virtual_vmcs_exit(vvmcs);
> +}
> +
>  static int construct_vmcs(struct vcpu *v)
>  {
>      struct domain *d = v->domain;
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index cd8bb90..8000f84 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -175,7 +175,7 @@ static int vvmcs_offset(u32 width, u32 type, u32
> index)
>      return offset;
>  }
> 
> -u64 __get_vvmcs(void *vvmcs, u32 vmcs_encoding)
> +u64 __get_vvmcs_virtual(void *vvmcs, u32 vmcs_encoding)
>  {
>      union vmcs_encoding enc;
>      u64 *content = (u64 *) vvmcs;
> @@ -205,7 +205,12 @@ u64 __get_vvmcs(void *vvmcs, u32 vmcs_encoding)
>      return res;
>  }
> 
> -void __set_vvmcs(void *vvmcs, u32 vmcs_encoding, u64 val)
> +u64 __get_vvmcs_real(void *vvmcs, u32 vmcs_encoding)
> +{
> +    return virtual_vmcs_vmread(vvmcs, vmcs_encoding);
> +}
> +
> +void __set_vvmcs_virtual(void *vvmcs, u32 vmcs_encoding, u64 val)
>  {
>      union vmcs_encoding enc;
>      u64 *content = (u64 *) vvmcs;
> @@ -241,6 +246,11 @@ void __set_vvmcs(void *vvmcs, u32 vmcs_encoding,
> u64 val)
>      content[offset] = res;
>  }
> 
> +void __set_vvmcs_real(void *vvmcs, u32 vmcs_encoding, u64 val)
> +{
> +    virtual_vmcs_vmwrite(vvmcs, vmcs_encoding, val);
> +}
> +
>  static unsigned long reg_read(struct cpu_user_regs *regs,
>                                enum vmx_regs_enc index)
>  {
> @@ -1568,10 +1578,11 @@ int nvmx_handle_invvpid(struct cpu_user_regs
> *regs)
>   */
>  int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
>  {
> +    struct vcpu *v = current;
>      u64 data = 0, host_data = 0;
>      int r = 1;
> 
> -    if ( !nestedhvm_enabled(current->domain) )
> +    if ( !nestedhvm_enabled(v->domain) )
>          return 0;
> 
>      rdmsrl(msr, host_data);
> @@ -1581,7 +1592,8 @@ int nvmx_msr_read_intercept(unsigned int msr,
> u64 *msr_content)
>       */
>      switch (msr) {
>      case MSR_IA32_VMX_BASIC:
> -        data = (host_data & (~0ul << 32)) | VVMCS_REVISION;
> +        data = (host_data & (~0ul << 32)) |
> +               ((v->arch.hvm_vmx.vmcs)->vmcs_revision_id);
>          break;
>      case MSR_IA32_VMX_PINBASED_CTLS:
>      case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h
> b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index 9ff741f..652dc21 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -244,6 +244,7 @@ extern bool_t cpu_has_vmx_ins_outs_instr_info;
>      (vmx_secondary_exec_control &
> SECONDARY_EXEC_APIC_REGISTER_VIRT)
>  #define cpu_has_vmx_virtual_intr_delivery \
>      (vmx_secondary_exec_control &
> SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY)
> +#define cpu_has_vmx_vmcs_shadowing 0
> 
>  /* GUEST_INTERRUPTIBILITY_INFO flags. */
>  #define VMX_INTR_SHADOW_STI             0x00000001
> @@ -436,6 +437,10 @@ void vmx_vmcs_switch(struct vmcs_struct *from,
> struct vmcs_struct *to);
>  void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector);
>  void vmx_clear_eoi_exit_bitmap(struct vcpu *v, u8 vector);
>  int vmx_check_msr_bitmap(unsigned long *msr_bitmap, u32 msr, int
> access_type);
> +void virtual_vmcs_enter(void *vvmcs);
> +void virtual_vmcs_exit(void *vvmcs);
> +u64 virtual_vmcs_vmread(void *vvmcs, u32 vmcs_encoding);
> +void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val);
> 
>  #endif /* ASM_X86_HVM_VMX_VMCS_H__ */
> 
> diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h
> b/xen/include/asm-x86/hvm/vmx/vvmx.h
> index 89e839f..73a67cc 100644
> --- a/xen/include/asm-x86/hvm/vmx/vvmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
> @@ -152,8 +152,6 @@ nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t
> L2_gpa, paddr_t *L1_gpa,
>   *
>   */
> 
> -#define VVMCS_REVISION 0x40000001u
> -
>  struct vvmcs_header {
>      u32 revision;
>      u32 abort;
> @@ -185,8 +183,18 @@ enum vvmcs_encoding_type {
>      VVMCS_TYPE_HSTATE,
>  };
> 
> -u64 __get_vvmcs(void *vvmcs, u32 vmcs_encoding);
> -void __set_vvmcs(void *vvmcs, u32 vmcs_encoding, u64 val);
> +u64 __get_vvmcs_virtual(void *vvmcs, u32 vmcs_encoding);
> +u64 __get_vvmcs_real(void *vvmcs, u32 vmcs_encoding);
> +void __set_vvmcs_virtual(void *vvmcs, u32 vmcs_encoding, u64 val);
> +void __set_vvmcs_real(void *vvmcs, u32 vmcs_encoding, u64 val);
> +
> +#define __get_vvmcs(_vvmcs, _vmcs_encoding) \
> +  (cpu_has_vmx_vmcs_shadowing ? __get_vvmcs_real(_vvmcs,
> _vmcs_encoding) \
> +                              : __get_vvmcs_virtual(_vvmcs,
> _vmcs_encoding))
> +
> +#define __set_vvmcs(_vvmcs, _vmcs_encoding, _val) \
> +  (cpu_has_vmx_vmcs_shadowing ? __set_vvmcs_real(_vvmcs,
> _vmcs_encoding, _val) \
> +                              : __set_vvmcs_virtual(_vvmcs,
> _vmcs_encoding, _val))
> 
>  uint64_t get_shadow_eptp(struct vcpu *v);
> 
> --
> 1.7.1

Dong, Eddie

2013-Jan-25 02:27 UTC

head link

Re: [PATCH v5 rebased 3/4] nested vmx: optimize for bulk access of virtual VMCS

Acked-by Eddie Dong <eddie.dong@intel.com>
> -----Original Message-----
> From: Xu, Dongxiao
> Sent: Wednesday, January 23, 2013 10:32 PM
> To: xen-devel@lists.xensource.com
> Cc: Dong, Eddie; Nakajima, Jun; Zhang, Xiantao; JBeulich@suse.com
> Subject: [PATCH v5 rebased 3/4] nested vmx: optimize for bulk access of
> virtual VMCS
> 
> After we use the VMREAD/VMWRITE to build up the virtual VMCS, each
> access to the virtual VMCS needs two VMPTRLD and one VMCLEAR to
> switch the environment, which might be an overhead to performance.
> This commit tries to handle multiple virtual VMCS access together
> to improve the performance.
> 
> Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmcs.c        |   10 +++
>  xen/arch/x86/hvm/vmx/vvmx.c        |  126
> +++++++++++++++++++++++++++++++-----
>  xen/include/asm-x86/hvm/vmx/vvmx.h |    2 +
>  3 files changed, 122 insertions(+), 16 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index 82a8d91..f89ea93 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -31,6 +31,7 @@
>  #include <asm/hvm/io.h>
>  #include <asm/hvm/support.h>
>  #include <asm/hvm/vmx/vmx.h>
> +#include <asm/hvm/vmx/vvmx.h>
>  #include <asm/hvm/vmx/vmcs.h>
>  #include <asm/flushtlb.h>
>  #include <xen/event.h>
> @@ -423,6 +424,13 @@ static void vmx_load_vmcs(struct vcpu *v)
> 
>  int vmx_cpu_up_prepare(unsigned int cpu)
>  {
> +    /*
> +     * If nvmx_cpu_up_prepare() failed, do not return failure and just
> fallback
> +     * to legacy mode for vvmcs synchronization.
> +     */
> +    if ( nvmx_cpu_up_prepare(cpu) != 0 )
> +        printk("CPU%d: Could not allocate virtual VMCS
buffer.\n", cpu);
> +
>      if ( per_cpu(vmxon_region, cpu) != NULL )
>          return 0;
> 
> @@ -431,6 +439,7 @@ int vmx_cpu_up_prepare(unsigned int cpu)
>          return 0;
> 
>      printk("CPU%d: Could not allocate host VMCS\n", cpu);
> +    nvmx_cpu_dead(cpu);
>      return -ENOMEM;
>  }
> 
> @@ -438,6 +447,7 @@ void vmx_cpu_dead(unsigned int cpu)
>  {
>      vmx_free_vmcs(per_cpu(vmxon_region, cpu));
>      per_cpu(vmxon_region, cpu) = NULL;
> +    nvmx_cpu_dead(cpu);
>  }
> 
>  int vmx_cpu_up(void)
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index 8000f84..1e1ad56 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -28,8 +28,31 @@
>  #include <asm/hvm/vmx/vvmx.h>
>  #include <asm/hvm/nestedhvm.h>
> 
> +static DEFINE_PER_CPU(u64 *, vvmcs_buf);
> +
>  static void nvmx_purge_vvmcs(struct vcpu *v);
> 
> +#define VMCS_BUF_SIZE 100
> +
> +int nvmx_cpu_up_prepare(unsigned int cpu)
> +{
> +    if ( per_cpu(vvmcs_buf, cpu) != NULL )
> +        return 0;
> +
> +    per_cpu(vvmcs_buf, cpu) = xzalloc_array(u64, VMCS_BUF_SIZE);
> +
> +    if ( per_cpu(vvmcs_buf, cpu) != NULL )
> +        return 0;
> +
> +    return -ENOMEM;
> +}
> +
> +void nvmx_cpu_dead(unsigned int cpu)
> +{
> +    xfree(per_cpu(vvmcs_buf, cpu));
> +    per_cpu(vvmcs_buf, cpu) = NULL;
> +}
> +
>  int nvmx_vcpu_initialise(struct vcpu *v)
>  {
>      struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
> @@ -834,6 +857,40 @@ static void vvmcs_to_shadow(void *vvmcs, unsigned
> int field)
>      __vmwrite(field, value);
>  }
> 
> +static void vvmcs_to_shadow_bulk(struct vcpu *v, unsigned int n,
> +                                 const u16 *field)
> +{
> +    struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
> +    void *vvmcs = nvcpu->nv_vvmcx;
> +    u64 *value = this_cpu(vvmcs_buf);
> +    unsigned int i;
> +
> +    if ( !cpu_has_vmx_vmcs_shadowing )
> +        goto fallback;
> +
> +    if ( !value || n > VMCS_BUF_SIZE )
> +    {
> +        gdprintk(XENLOG_DEBUG, "vmcs sync fall back to non-bulk mode,
> \
> +                 buffer: %p, buffer size: %d, fields number: %d.\n",
> +                 value, VMCS_BUF_SIZE, n);
> +        goto fallback;
> +    }
> +
> +    virtual_vmcs_enter(vvmcs);
> +    for ( i = 0; i < n; i++ )
> +        value[i] = __vmread(field[i]);
> +    virtual_vmcs_exit(vvmcs);
> +
> +    for ( i = 0; i < n; i++ )
> +        __vmwrite(field[i], value[i]);
> +
> +    return;
> +
> +fallback:
> +    for ( i = 0; i < n; i++ )
> +        vvmcs_to_shadow(vvmcs, field[i]);
> +}
> +
>  static void shadow_to_vvmcs(void *vvmcs, unsigned int field)
>  {
>      u64 value;
> @@ -844,6 +901,40 @@ static void shadow_to_vvmcs(void *vvmcs, unsigned
> int field)
>          __set_vvmcs(vvmcs, field, value);
>  }
> 
> +static void shadow_to_vvmcs_bulk(struct vcpu *v, unsigned int n,
> +                                 const u16 *field)
> +{
> +    struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
> +    void *vvmcs = nvcpu->nv_vvmcx;
> +    u64 *value = this_cpu(vvmcs_buf);
> +    unsigned int i;
> +
> +    if ( !cpu_has_vmx_vmcs_shadowing )
> +        goto fallback;
> +
> +    if ( !value || n > VMCS_BUF_SIZE )
> +    {
> +        gdprintk(XENLOG_DEBUG, "vmcs sync fall back to non-bulk mode,
> \
> +                 buffer: %p, buffer size: %d, fields number: %d.\n",
> +                 value, VMCS_BUF_SIZE, n);
> +        goto fallback;
> +    }
> +
> +    for ( i = 0; i < n; i++ )
> +        value[i] = __vmread(field[i]);
> +
> +    virtual_vmcs_enter(vvmcs);
> +    for ( i = 0; i < n; i++ )
> +        __vmwrite(field[i], value[i]);
> +    virtual_vmcs_exit(vvmcs);
> +
> +    return;
> +
> +fallback:
> +    for ( i = 0; i < n; i++ )
> +        shadow_to_vvmcs(vvmcs, field[i]);
> +}
> +
>  static void load_shadow_control(struct vcpu *v)
>  {
>      /*
> @@ -867,13 +958,18 @@ static void load_shadow_guest_state(struct vcpu
> *v)
>  {
>      struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
>      void *vvmcs = nvcpu->nv_vvmcx;
> -    int i;
>      u32 control;
>      u64 cr_gh_mask, cr_read_shadow;
> 
> +    static const u16 vmentry_fields[] = {
> +        VM_ENTRY_INTR_INFO,
> +        VM_ENTRY_EXCEPTION_ERROR_CODE,
> +        VM_ENTRY_INSTRUCTION_LEN,
> +    };
> +
>      /* vvmcs.gstate to shadow vmcs.gstate */
> -    for ( i = 0; i < ARRAY_SIZE(vmcs_gstate_field); i++ )
> -        vvmcs_to_shadow(vvmcs, vmcs_gstate_field[i]);
> +    vvmcs_to_shadow_bulk(v, ARRAY_SIZE(vmcs_gstate_field),
> +                         vmcs_gstate_field);
> 
>      hvm_set_cr0(__get_vvmcs(vvmcs, GUEST_CR0));
>      hvm_set_cr4(__get_vvmcs(vvmcs, GUEST_CR4));
> @@ -887,9 +983,7 @@ static void load_shadow_guest_state(struct vcpu *v)
> 
>      hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset);
> 
> -    vvmcs_to_shadow(vvmcs, VM_ENTRY_INTR_INFO);
> -    vvmcs_to_shadow(vvmcs, VM_ENTRY_EXCEPTION_ERROR_CODE);
> -    vvmcs_to_shadow(vvmcs, VM_ENTRY_INSTRUCTION_LEN);
> +    vvmcs_to_shadow_bulk(v, ARRAY_SIZE(vmentry_fields),
> vmentry_fields);
> 
>      /*
>       * While emulate CR0 and CR4 for nested virtualization, set the
> CR0/CR4
> @@ -909,10 +1003,13 @@ static void load_shadow_guest_state(struct vcpu
> *v)
>      if ( nvmx_ept_enabled(v) && hvm_pae_enabled(v) &&
>           (v->arch.hvm_vcpu.guest_efer & EFER_LMA) )
>      {
> -        vvmcs_to_shadow(vvmcs, GUEST_PDPTR0);
> -        vvmcs_to_shadow(vvmcs, GUEST_PDPTR1);
> -        vvmcs_to_shadow(vvmcs, GUEST_PDPTR2);
> -        vvmcs_to_shadow(vvmcs, GUEST_PDPTR3);
> +        static const u16 gpdptr_fields[] = {
> +            GUEST_PDPTR0,
> +            GUEST_PDPTR1,
> +            GUEST_PDPTR2,
> +            GUEST_PDPTR3,
> +        };
> +        vvmcs_to_shadow_bulk(v, ARRAY_SIZE(gpdptr_fields),
> gpdptr_fields);
>      }
> 
>      /* TODO: CR3 target control */
> @@ -1003,13 +1100,12 @@ static void virtual_vmentry(struct cpu_user_regs
> *regs)
> 
>  static void sync_vvmcs_guest_state(struct vcpu *v, struct cpu_user_regs
> *regs)
>  {
> -    int i;
>      struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
>      void *vvmcs = nvcpu->nv_vvmcx;
> 
>      /* copy shadow vmcs.gstate back to vvmcs.gstate */
> -    for ( i = 0; i < ARRAY_SIZE(vmcs_gstate_field); i++ )
> -        shadow_to_vvmcs(vvmcs, vmcs_gstate_field[i]);
> +    shadow_to_vvmcs_bulk(v, ARRAY_SIZE(vmcs_gstate_field),
> +                         vmcs_gstate_field);
>      /* RIP, RSP are in user regs */
>      __set_vvmcs(vvmcs, GUEST_RIP, regs->eip);
>      __set_vvmcs(vvmcs, GUEST_RSP, regs->esp);
> @@ -1021,13 +1117,11 @@ static void sync_vvmcs_guest_state(struct vcpu
> *v, struct cpu_user_regs *regs)
> 
>  static void sync_vvmcs_ro(struct vcpu *v)
>  {
> -    int i;
>      struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
>      struct nestedvmx *nvmx = &vcpu_2_nvmx(v);
>      void *vvmcs = nvcpu->nv_vvmcx;
> 
> -    for ( i = 0; i < ARRAY_SIZE(vmcs_ro_field); i++ )
> -        shadow_to_vvmcs(nvcpu->nv_vvmcx, vmcs_ro_field[i]);
> +    shadow_to_vvmcs_bulk(v, ARRAY_SIZE(vmcs_ro_field), vmcs_ro_field);
> 
>      /* Adjust exit_reason/exit_qualifciation for violation case */
>      if ( __get_vvmcs(vvmcs, VM_EXIT_REASON) =>
EXIT_REASON_EPT_VIOLATION )
> diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h
> b/xen/include/asm-x86/hvm/vmx/vvmx.h
> index 73a67cc..3874525 100644
> --- a/xen/include/asm-x86/hvm/vmx/vvmx.h
> +++ b/xen/include/asm-x86/hvm/vmx/vvmx.h
> @@ -229,5 +229,7 @@ int nept_translate_l2ga(struct vcpu *v, paddr_t l2ga,
>                          unsigned int *page_order, uint32_t rwx_acc,
>                          unsigned long *l1gfn, uint8_t *p2m_acc,
>                          uint64_t *exit_qual, uint32_t *exit_reason);
> +int nvmx_cpu_up_prepare(unsigned int cpu);
> +void nvmx_cpu_dead(unsigned int cpu);
>  #endif /* __ASM_X86_HVM_VVMX_H__ */
> 
> --
> 1.7.1

Dong, Eddie

2013-Jan-25 02:28 UTC

head link

Re: [PATCH v5 rebased 4/4] nested vmx: enable VMCS shadowing feature

Acked-by Eddie Dong <eddie.dong@intel.com>
> -----Original Message-----
> From: Xu, Dongxiao
> Sent: Wednesday, January 23, 2013 10:32 PM
> To: xen-devel@lists.xensource.com
> Cc: Dong, Eddie; Nakajima, Jun; Zhang, Xiantao; JBeulich@suse.com
> Subject: [PATCH v5 rebased 4/4] nested vmx: enable VMCS shadowing
> feature
> 
> The current logic for handling the non-root VMREAD/VMWRITE is by
> VM-Exit and emulate, which may bring certain overhead.
> 
> On new Intel platform, it introduces a new feature called VMCS
> shadowing, where non-root VMREAD/VMWRITE will not trigger VM-Exit,
> and the hardware will read/write the virtual VMCS instead.
> This is proved to have performance improvement with the feature.
> 
> Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmcs.c        |    9 ++++
>  xen/arch/x86/hvm/vmx/vvmx.c        |   82
> ++++++++++++++++++++++++++++++++++++
>  xen/include/asm-x86/hvm/vmx/vmcs.h |   18 +++++++-
>  3 files changed, 108 insertions(+), 1 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
> index f89ea93..69b7dae 100644
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -92,6 +92,7 @@ static void __init vmx_display_features(void)
>      P(cpu_has_vmx_unrestricted_guest, "Unrestricted Guest");
>      P(cpu_has_vmx_apic_reg_virt, "APIC Register
Virtualization");
>      P(cpu_has_vmx_virtual_intr_delivery, "Virtual Interrupt
Delivery");
> +    P(cpu_has_vmx_vmcs_shadowing, "VMCS shadowing");
>  #undef P
> 
>      if ( !printed )
> @@ -133,6 +134,7 @@ static int vmx_init_vmcs_config(void)
>      u32 _vmx_cpu_based_exec_control;
>      u32 _vmx_secondary_exec_control = 0;
>      u64 _vmx_ept_vpid_cap = 0;
> +    u64 _vmx_misc_cap = 0;
>      u32 _vmx_vmexit_control;
>      u32 _vmx_vmentry_control;
>      bool_t mismatch = 0;
> @@ -180,6 +182,9 @@ static int vmx_init_vmcs_config(void)
>                 SECONDARY_EXEC_ENABLE_RDTSCP |
>                 SECONDARY_EXEC_PAUSE_LOOP_EXITING |
>                 SECONDARY_EXEC_ENABLE_INVPCID);
> +        rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
> +        if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
> +            opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
>          if ( opt_vpid_enabled )
>              opt |= SECONDARY_EXEC_ENABLE_VPID;
>          if ( opt_unrestricted_guest_enabled )
> @@ -383,6 +388,8 @@ static void __vmx_clear_vmcs(void *info)
>      if ( arch_vmx->active_cpu == smp_processor_id() )
>      {
>          __vmpclear(virt_to_maddr(arch_vmx->vmcs));
> +        if ( arch_vmx->vmcs_shadow_maddr )
> +            __vmpclear(arch_vmx->vmcs_shadow_maddr);
> 
>          arch_vmx->active_cpu = -1;
>          arch_vmx->launched   = 0;
> @@ -720,6 +727,8 @@ void vmx_vmcs_switch(struct vmcs_struct *from,
> struct vmcs_struct *to)
>      spin_lock(&vmx->vmcs_lock);
> 
>      __vmpclear(virt_to_maddr(from));
> +    if ( vmx->vmcs_shadow_maddr )
> +        __vmpclear(vmx->vmcs_shadow_maddr);
>      __vmptrld(virt_to_maddr(to));
> 
>      vmx->vmcs = to;
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index 1e1ad56..3c152c5 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -64,6 +64,48 @@ int nvmx_vcpu_initialise(struct vcpu *v)
>          gdprintk(XENLOG_ERR, "nest: allocation for shadow vmcs
> failed\n");
>  	goto out;
>      }
> +
> +    /* non-root VMREAD/VMWRITE bitmap. */
> +    if ( cpu_has_vmx_vmcs_shadowing )
> +    {
> +        struct page_info *vmread_bitmap, *vmwrite_bitmap;
> +        unsigned long *vr, *vw;
> +
> +        vmread_bitmap = alloc_domheap_page(NULL, 0);
> +        if ( !vmread_bitmap )
> +        {
> +            gdprintk(XENLOG_ERR, "nest: allocation for vmread bitmap
> failed\n");
> +            goto out1;
> +        }
> +        v->arch.hvm_vmx.vmread_bitmap = vmread_bitmap;
> +
> +        vmwrite_bitmap = alloc_domheap_page(NULL, 0);
> +        if ( !vmwrite_bitmap )
> +        {
> +            gdprintk(XENLOG_ERR, "nest: allocation for vmwrite bitmap
> failed\n");
> +            goto out2;
> +        }
> +        v->arch.hvm_vmx.vmwrite_bitmap = vmwrite_bitmap;
> +
> +        vr = __map_domain_page(vmread_bitmap);
> +        vw = __map_domain_page(vmwrite_bitmap);
> +
> +        clear_page(vr);
> +        clear_page(vw);
> +
> +        /*
> +         * For the following 4 encodings, we need to handle them in
> VMM.
> +         * Let them vmexit as usual.
> +         */
> +        set_bit(IO_BITMAP_A, vw);
> +        set_bit(IO_BITMAP_A_HIGH, vw);
> +        set_bit(IO_BITMAP_B, vw);
> +        set_bit(IO_BITMAP_B_HIGH, vw);
> +
> +        unmap_domain_page(vr);
> +        unmap_domain_page(vw);
> +    }
> +
>      nvmx->ept.enabled = 0;
>      nvmx->guest_vpid = 0;
>      nvmx->vmxon_region_pa = 0;
> @@ -76,6 +118,10 @@ int nvmx_vcpu_initialise(struct vcpu *v)
>      nvmx->msrbitmap = NULL;
>      INIT_LIST_HEAD(&nvmx->launched_list);
>      return 0;
> +out2:
> +    free_domheap_page(v->arch.hvm_vmx.vmread_bitmap);
> +out1:
> +    free_xenheap_page(nvcpu->nv_n2vmcx);
>  out:
>      return -ENOMEM;
>  }
> @@ -106,6 +152,11 @@ void nvmx_vcpu_destroy(struct vcpu *v)
>          list_del(&item->node);
>          xfree(item);
>      }
> +
> +    if ( v->arch.hvm_vmx.vmread_bitmap )
> +        free_domheap_page(v->arch.hvm_vmx.vmread_bitmap);
> +    if ( v->arch.hvm_vmx.vmwrite_bitmap )
> +        free_domheap_page(v->arch.hvm_vmx.vmwrite_bitmap);
>  }
> 
>  void nvmx_domain_relinquish_resources(struct domain *d)
> @@ -1035,6 +1086,32 @@ static bool_t nvmx_vpid_enabled(struct
> nestedvcpu *nvcpu)
>      return 0;
>  }
> 
> +static void nvmx_set_vmcs_pointer(struct vcpu *v, struct vmcs_struct
> *vvmcs)
> +{
> +    unsigned long vvmcs_mfn = domain_page_map_to_mfn(vvmcs);
> +    paddr_t vvmcs_maddr = vvmcs_mfn << PAGE_SHIFT;
> +
> +    __vmpclear(vvmcs_maddr);
> +    vvmcs->vmcs_revision_id |= VMCS_RID_TYPE_MASK;
> +    v->arch.hvm_vmx.vmcs_shadow_maddr = vvmcs_maddr;
> +    __vmwrite(VMCS_LINK_POINTER, vvmcs_maddr);
> +    __vmwrite(VMREAD_BITMAP,
> page_to_maddr(v->arch.hvm_vmx.vmread_bitmap));
> +    __vmwrite(VMWRITE_BITMAP,
> page_to_maddr(v->arch.hvm_vmx.vmwrite_bitmap));
> +}
> +
> +static void nvmx_clear_vmcs_pointer(struct vcpu *v, struct vmcs_struct
> *vvmcs)
> +{
> +    unsigned long vvmcs_mfn = domain_page_map_to_mfn(vvmcs);
> +    paddr_t vvmcs_maddr = vvmcs_mfn << PAGE_SHIFT;
> +
> +    __vmpclear(vvmcs_maddr);
> +    vvmcs->vmcs_revision_id &= ~VMCS_RID_TYPE_MASK;
> +    v->arch.hvm_vmx.vmcs_shadow_maddr = 0;
> +    __vmwrite(VMCS_LINK_POINTER, ~0ul);
> +    __vmwrite(VMREAD_BITMAP, 0);
> +    __vmwrite(VMWRITE_BITMAP, 0);
> +}
> +
>  static void virtual_vmentry(struct cpu_user_regs *regs)
>  {
>      struct vcpu *v = current;
> @@ -1476,6 +1553,9 @@ int nvmx_handle_vmptrld(struct cpu_user_regs
> *regs)
>          __map_msr_bitmap(v);
>      }
> 
> +    if ( cpu_has_vmx_vmcs_shadowing )
> +        nvmx_set_vmcs_pointer(v, nvcpu->nv_vvmcx);
> +
>      vmreturn(regs, VMSUCCEED);
> 
>  out:
> @@ -1526,6 +1606,8 @@ int nvmx_handle_vmclear(struct cpu_user_regs
> *regs)
> 
>      if ( gpa == nvcpu->nv_vvmcxaddr )
>      {
> +        if ( cpu_has_vmx_vmcs_shadowing )
> +            nvmx_clear_vmcs_pointer(v, nvcpu->nv_vvmcx);
>          clear_vvmcs_launched(&nvmx->launched_list,
>              domain_page_map_to_mfn(nvcpu->nv_vvmcx));
>          nvmx_purge_vvmcs(v);
> diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h
> b/xen/include/asm-x86/hvm/vmx/vmcs.h
> index 652dc21..ba02221 100644
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -81,6 +81,8 @@ struct vmx_domain {
>  struct arch_vmx_struct {
>      /* Virtual address of VMCS. */
>      struct vmcs_struct  *vmcs;
> +    /* VMCS shadow machine address. */
> +    paddr_t             vmcs_shadow_maddr;
> 
>      /* Protects remote usage of VMCS (VMPTRLD/VMCLEAR). */
>      spinlock_t           vmcs_lock;
> @@ -125,6 +127,10 @@ struct arch_vmx_struct {
>      /* Remember EFLAGS while in virtual 8086 mode */
>      uint32_t             vm86_saved_eflags;
>      int                  hostenv_migrated;
> +
> +    /* Bitmap to control vmexit policy for Non-root VMREAD/VMWRITE */
> +    struct page_info     *vmread_bitmap;
> +    struct page_info     *vmwrite_bitmap;
>  };
> 
>  int vmx_create_vmcs(struct vcpu *v);
> @@ -191,6 +197,7 @@ extern u32 vmx_vmentry_control;
>  #define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY    0x00000200
>  #define SECONDARY_EXEC_PAUSE_LOOP_EXITING       0x00000400
>  #define SECONDARY_EXEC_ENABLE_INVPCID           0x00001000
> +#define SECONDARY_EXEC_ENABLE_VMCS_SHADOWING    0x00004000
>  extern u32 vmx_secondary_exec_control;
> 
>  extern bool_t cpu_has_vmx_ins_outs_instr_info;
> @@ -205,6 +212,8 @@ extern bool_t cpu_has_vmx_ins_outs_instr_info;
>  #define VMX_EPT_INVEPT_SINGLE_CONTEXT           0x02000000
>  #define VMX_EPT_INVEPT_ALL_CONTEXT              0x04000000
> 
> +#define VMX_MISC_VMWRITE_ALL                    0x20000000
> +
>  #define VMX_VPID_INVVPID_INSTRUCTION
> 0x100000000ULL
>  #define VMX_VPID_INVVPID_INDIVIDUAL_ADDR
> 0x10000000000ULL
>  #define VMX_VPID_INVVPID_SINGLE_CONTEXT
> 0x20000000000ULL
> @@ -244,7 +253,10 @@ extern bool_t cpu_has_vmx_ins_outs_instr_info;
>      (vmx_secondary_exec_control &
> SECONDARY_EXEC_APIC_REGISTER_VIRT)
>  #define cpu_has_vmx_virtual_intr_delivery \
>      (vmx_secondary_exec_control &
> SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY)
> -#define cpu_has_vmx_vmcs_shadowing 0
> +#define cpu_has_vmx_vmcs_shadowing \
> +    (vmx_secondary_exec_control &
> SECONDARY_EXEC_ENABLE_VMCS_SHADOWING)
> +
> +#define VMCS_RID_TYPE_MASK              0x80000000
> 
>  /* GUEST_INTERRUPTIBILITY_INFO flags. */
>  #define VMX_INTR_SHADOW_STI             0x00000001
> @@ -305,6 +317,10 @@ enum vmcs_field {
>      EOI_EXIT_BITMAP2_HIGH           = 0x00002021,
>      EOI_EXIT_BITMAP3                = 0x00002022,
>      EOI_EXIT_BITMAP3_HIGH           = 0x00002023,
> +    VMREAD_BITMAP                   = 0x00002026,
> +    VMREAD_BITMAP_HIGH              = 0x00002027,
> +    VMWRITE_BITMAP                  = 0x00002028,
> +    VMWRITE_BITMAP_HIGH             = 0x00002029,
>      GUEST_PHYSICAL_ADDRESS          = 0x00002400,
>      GUEST_PHYSICAL_ADDRESS_HIGH     = 0x00002401,
>      VMCS_LINK_POINTER               = 0x00002800,
> --
> 1.7.1

Joerg Roedel

2013-Jan-29 10:19 UTC

head link

Re: [PATCH v5 rebased 0/4] nested vmx: enable VMCS shadowing feature

Hi Dongxiao,

On Wed, Jan 23, 2013 at 10:32:08PM +0800, Dongxiao Xu
wrote:> The main purpose of this feature is to reduce or eliminate the number of VM
exit
> for non-root VMREAD and VMWRITE. It provides the capability to link a
> "virtual VMCS" with the current running VMCS, so that after VM
entry, the
> non-root VMREAD and VMWRITE can get/set related data directly from/to the
> "virtual VMCS" without trap and emulation.
This looks like a neat feature. I am wondering when I can use it. Is it
planned for Haswell or a later chip?


Thanks,

	Joerg

Xen devel - Jan 2013 - [PATCH v5 rebased 0/4] nested vmx: enable VMCS shadowing feature

[PATCH v5 rebased 0/4] nested vmx: enable VMCS shadowing feature

[PATCH v5 rebased 1/4] nested vmx: Use a list to store the launched vvmcs for L1 VMM

[PATCH v5 rebased 2/4] nested vmx: use VMREAD/VMWRITE to construct vVMCS if enabled VMCS shadowing

[PATCH v5 rebased 3/4] nested vmx: optimize for bulk access of virtual VMCS

[PATCH v5 rebased 4/4] nested vmx: enable VMCS shadowing feature

Re: [PATCH v5 rebased 4/4] nested vmx: enable VMCS shadowing feature

Re: [PATCH v5 rebased 1/4] nested vmx: Use a list to store the launched vvmcs for L1 VMM

Re: [PATCH v5 rebased 2/4] nested vmx: use VMREAD/VMWRITE to construct vVMCS if enabled VMCS shadowing

Re: [PATCH v5 rebased 3/4] nested vmx: optimize for bulk access of virtual VMCS

Re: [PATCH v5 rebased 4/4] nested vmx: enable VMCS shadowing feature

Re: [PATCH v5 rebased 0/4] nested vmx: enable VMCS shadowing feature