The latest Intel SDM introduces a new feature, "VMCS shadowing", at bit 14 of the Secondary Processor-Based VM-Execution Controls, intended for nested virtualization. Its main purpose is to reduce or eliminate the VM exits caused by non-root VMREAD and VMWRITE. It provides the capability to link a "virtual VMCS" with the currently running VMCS, so that after VM entry, non-root VMREAD and VMWRITE can get/set the related data directly from/to the "virtual VMCS" without trap and emulation. Separate bitmaps are introduced for VMREAD and VMWRITE, with which the hypervisor controls whether a VMREAD/VMWRITE of a certain VMCS field triggers a VM exit or is handled directly by hardware against the virtual VMCS.

With this feature, all fields in the "virtual VMCS" need to be accessed via VMREAD and VMWRITE, because this VMCS will also be loaded into hardware. That in turn requires the capability to VMWRITE all VMCS fields, including the read-only ones. The Intel SDM advertises this capability at bit 29 of the IA32_VMX_MISC MSR.

For details, please refer to:
http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html

Thanks,
Dongxiao

Dongxiao Xu (4):
  nested vmx: Use a list to store the launched vvmcs for L1 VMM
  nested vmx: use VMREAD/VMWRITE to construct vVMCS if enabled VMCS shadowing
  nested vmx: optimize for bulk access of virtual VMCS
  nested vmx: enable VMCS shadowing feature

 xen/arch/x86/hvm/vmx/vmcs.c        |   92 +++++++++++++++-
 xen/arch/x86/hvm/vmx/vvmx.c        |  220 +++++++++++++++++++++++++++++++-----
 xen/include/asm-x86/hvm/vmx/vmcs.h |   23 ++++-
 xen/include/asm-x86/hvm/vmx/vvmx.h |   22 +++-
 4 files changed, 321 insertions(+), 36 deletions(-)
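A minimal sketch of how the two capability bits described above can be checked, assuming the raw values of IA32_VMX_PROCBASED_CTLS2 (MSR 0x48B) and IA32_VMX_MISC (MSR 0x485) have already been read with whatever rdmsr facility the environment provides; the bit positions follow the description in this cover letter, and Xen's actual detection logic is added in patch 4's vmx_init_vmcs_config() changes:

#include <stdint.h>

/* Bit 14 of the secondary processor-based VM-execution controls. */
#define SECONDARY_EXEC_ENABLE_VMCS_SHADOWING  (1u << 14)
/* Bit 29 of IA32_VMX_MISC: VMWRITE may target any VMCS field,
 * including the read-only (VM-exit information) ones. */
#define VMX_MISC_VMWRITE_ALL                  (1u << 29)

static int vmcs_shadowing_usable(uint64_t procbased_ctls2, uint64_t misc)
{
    /* The allowed-1 settings are reported in the control MSR's upper 32 bits. */
    if ( !((procbased_ctls2 >> 32) & SECONDARY_EXEC_ENABLE_VMCS_SHADOWING) )
        return 0;

    /*
     * A software-built virtual VMCS can only be loaded as a shadow if
     * VMWRITE also works on the read-only fields, hence the extra check.
     */
    return (misc & VMX_MISC_VMWRITE_ALL) != 0;
}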
Dongxiao Xu
2013-Jan-17 05:37 UTC
[PATCH 1/4] nested vmx: Use a list to store the launched vvmcs for L1 VMM
Originally we use a virtual VMCS field to store the launch state of a certain vmcs. However if we introduce VMCS shadowing feature, this virtual VMCS should also be able to load into real hardware, and VMREAD/VMWRITE operate invalid fields. The new approach is to store the launch state into a list for L1 VMM. Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com> --- xen/arch/x86/hvm/vmx/vvmx.c | 88 ++++++++++++++++++++++++++++++++---- xen/include/asm-x86/hvm/vmx/vmcs.h | 2 - xen/include/asm-x86/hvm/vmx/vvmx.h | 6 +++ 3 files changed, 85 insertions(+), 11 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c index d4e9b02..1c7b1d4 100644 --- a/xen/arch/x86/hvm/vmx/vvmx.c +++ b/xen/arch/x86/hvm/vmx/vvmx.c @@ -51,6 +51,7 @@ int nvmx_vcpu_initialise(struct vcpu *v) nvmx->iobitmap[0] = NULL; nvmx->iobitmap[1] = NULL; nvmx->msrbitmap = NULL; + INIT_LIST_HEAD(&nvmx->launched_list); return 0; out: return -ENOMEM; @@ -58,7 +59,9 @@ out: void nvmx_vcpu_destroy(struct vcpu *v) { + struct nestedvmx *nvmx = &vcpu_2_nvmx(v); struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v); + struct vvmcs_list *item, *n; /* * When destroying the vcpu, it may be running on behalf of L2 guest. @@ -74,6 +77,11 @@ void nvmx_vcpu_destroy(struct vcpu *v) free_xenheap_page(nvcpu->nv_n2vmcx); nvcpu->nv_n2vmcx = NULL; } + + list_for_each_entry_safe(item, n, &nvmx->launched_list, node) { + list_del(&item->node); + xfree(item); + } } void nvmx_domain_relinquish_resources(struct domain *d) @@ -1198,6 +1206,59 @@ int nvmx_handle_vmxoff(struct cpu_user_regs *regs) return X86EMUL_OKAY; } +static int vvmcs_launched(struct list_head *launched_list, paddr_t vvmcs_pa) +{ + struct vvmcs_list *vvmcs = NULL; + struct list_head *pos; + int launched = 0; + + list_for_each(pos, launched_list) + { + vvmcs = list_entry(pos, struct vvmcs_list, node); + if ( vvmcs_pa == vvmcs->vvmcs_pa ) + { + launched = 1; + break; + } + } + + return launched; +} + +static int set_vvmcs_launched(struct list_head *launched_list, paddr_t vvmcs_pa) +{ + struct vvmcs_list *vvmcs; + + if ( vvmcs_launched(launched_list, vvmcs_pa) ) + return 0; + + vvmcs = xzalloc(struct vvmcs_list); + if ( !vvmcs ) + return -ENOMEM; + + vvmcs->vvmcs_pa = vvmcs_pa; + list_add(&vvmcs->node, launched_list); + + return 0; +} + +static void clear_vvmcs_launched(struct list_head *launched_list, paddr_t vvmcs_pa) +{ + struct vvmcs_list *vvmcs; + struct list_head *pos; + + list_for_each(pos, launched_list) + { + vvmcs = list_entry(pos, struct vvmcs_list, node); + if ( vvmcs_pa == vvmcs->vvmcs_pa ) + { + list_del(&vvmcs->node); + xfree(vvmcs); + break; + } + } +} + int nvmx_vmresume(struct vcpu *v, struct cpu_user_regs *regs) { struct nestedvmx *nvmx = &vcpu_2_nvmx(v); @@ -1221,8 +1282,10 @@ int nvmx_vmresume(struct vcpu *v, struct cpu_user_regs *regs) int nvmx_handle_vmresume(struct cpu_user_regs *regs) { - int launched; struct vcpu *v = current; + struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v); + struct nestedvmx *nvmx = &vcpu_2_nvmx(v); + int launched; if ( vcpu_nestedhvm(v).nv_vvmcxaddr == VMCX_EADDR ) { @@ -1230,8 +1293,8 @@ int nvmx_handle_vmresume(struct cpu_user_regs *regs) return X86EMUL_OKAY; } - launched = __get_vvmcs(vcpu_nestedhvm(v).nv_vvmcx, - NVMX_LAUNCH_STATE); + launched = vvmcs_launched(&nvmx->launched_list, + virt_to_maddr(nvcpu->nv_vvmcx)); if ( !launched ) { vmreturn (regs, VMFAIL_VALID); return X86EMUL_OKAY; @@ -1244,6 +1307,8 @@ int nvmx_handle_vmlaunch(struct cpu_user_regs *regs) int launched; int rc; struct vcpu *v = current; + struct nestedvcpu 
*nvcpu = &vcpu_nestedhvm(v); + struct nestedvmx *nvmx = &vcpu_2_nvmx(v); if ( vcpu_nestedhvm(v).nv_vvmcxaddr == VMCX_EADDR ) { @@ -1251,8 +1316,8 @@ int nvmx_handle_vmlaunch(struct cpu_user_regs *regs) return X86EMUL_OKAY; } - launched = __get_vvmcs(vcpu_nestedhvm(v).nv_vvmcx, - NVMX_LAUNCH_STATE); + launched = vvmcs_launched(&nvmx->launched_list, + virt_to_maddr(nvcpu->nv_vvmcx)); if ( launched ) { vmreturn (regs, VMFAIL_VALID); return X86EMUL_OKAY; @@ -1260,8 +1325,12 @@ int nvmx_handle_vmlaunch(struct cpu_user_regs *regs) else { rc = nvmx_vmresume(v,regs); if ( rc == X86EMUL_OKAY ) - __set_vvmcs(vcpu_nestedhvm(v).nv_vvmcx, - NVMX_LAUNCH_STATE, 1); + { + if ( set_vvmcs_launched(&nvmx->launched_list, + virt_to_maddr(nvcpu->nv_vvmcx)) < 0 ) + return X86EMUL_UNHANDLEABLE; + } + } return rc; } @@ -1328,6 +1397,7 @@ int nvmx_handle_vmclear(struct cpu_user_regs *regs) struct vcpu *v = current; struct vmx_inst_decoded decode; struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v); + struct nestedvmx *nvmx = &vcpu_2_nvmx(v); unsigned long gpa = 0; void *vvmcs; int rc; @@ -1344,7 +1414,7 @@ int nvmx_handle_vmclear(struct cpu_user_regs *regs) if ( gpa == nvcpu->nv_vvmcxaddr ) { - __set_vvmcs(nvcpu->nv_vvmcx, NVMX_LAUNCH_STATE, 0); + clear_vvmcs_launched(&nvmx->launched_list, virt_to_maddr(nvcpu->nv_vvmcx)); nvmx_purge_vvmcs(v); } else @@ -1352,7 +1422,7 @@ int nvmx_handle_vmclear(struct cpu_user_regs *regs) /* Even if this VMCS isn''t the current one, we must clear it. */ vvmcs = hvm_map_guest_frame_rw(gpa >> PAGE_SHIFT); if ( vvmcs ) - __set_vvmcs(vvmcs, NVMX_LAUNCH_STATE, 0); + clear_vvmcs_launched(&nvmx->launched_list, virt_to_maddr(vvmcs)); hvm_unmap_guest_frame(vvmcs); } diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h index 51df81e..9ff741f 100644 --- a/xen/include/asm-x86/hvm/vmx/vmcs.h +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h @@ -421,8 +421,6 @@ enum vmcs_field { HOST_SYSENTER_EIP = 0x00006c12, HOST_RSP = 0x00006c14, HOST_RIP = 0x00006c16, - /* A virtual VMCS field used for nestedvmx only */ - NVMX_LAUNCH_STATE = 0x00006c20, }; #define VMCS_VPID_WIDTH 16 diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h index 9e1dc77..1c5313d 100644 --- a/xen/include/asm-x86/hvm/vmx/vvmx.h +++ b/xen/include/asm-x86/hvm/vmx/vvmx.h @@ -23,6 +23,11 @@ #ifndef __ASM_X86_HVM_VVMX_H__ #define __ASM_X86_HVM_VVMX_H__ +struct vvmcs_list { + paddr_t vvmcs_pa; + struct list_head node; +}; + struct nestedvmx { paddr_t vmxon_region_pa; void *iobitmap[2]; /* map (va) of L1 guest I/O bitmap */ @@ -38,6 +43,7 @@ struct nestedvmx { uint32_t exit_qual; } ept; uint32_t guest_vpid; + struct list_head launched_list; }; #define vcpu_2_nvmx(v) (vcpu_nestedhvm(v).u.nvmx) -- 1.7.1
Dongxiao Xu
2013-Jan-17 05:37 UTC
[PATCH 2/4] nested vmx: use VMREAD/VMWRITE to construct vVMCS if enabled VMCS shadowing
Before the VMCS shadowing feature, we use memory operation to build up the virtual VMCS. This does work since this virtual VMCS will never be loaded into real hardware. However after we introduce the VMCS shadowing feature, this VMCS will be loaded into hardware, which requires all fields in the VMCS accessed by VMREAD/VMWRITE. Besides, the virtual VMCS revision identifer should also meet the hardware''s requirement, instead of using a faked one. Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com> --- xen/arch/x86/hvm/vmx/vmcs.c | 29 +++++++++++++++++++++++++++++ xen/arch/x86/hvm/vmx/vvmx.c | 20 ++++++++++++++++---- xen/include/asm-x86/hvm/vmx/vmcs.h | 6 +++++- xen/include/asm-x86/hvm/vmx/vvmx.h | 16 ++++++++++++---- 4 files changed, 62 insertions(+), 9 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c index de22e03..4b0e8e0 100644 --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -725,6 +725,35 @@ void vmx_vmcs_switch(struct vmcs_struct *from, struct vmcs_struct *to) spin_unlock(&vmx->vmcs_lock); } +void virtual_vmcs_enter(void *vvmcs) +{ + __vmptrld(virt_to_maddr(vvmcs)); +} + +void virtual_vmcs_exit(void *vvmcs) +{ + __vmpclear(virt_to_maddr(vvmcs)); + __vmptrld(virt_to_maddr(this_cpu(current_vmcs))); +} + +u64 virtual_vmcs_vmread(void *vvmcs, u32 vmcs_encoding) +{ + u64 res; + + virtual_vmcs_enter(vvmcs); + res = __vmread(vmcs_encoding); + virtual_vmcs_exit(vvmcs); + + return res; +} + +void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val) +{ + virtual_vmcs_enter(vvmcs); + __vmwrite(vmcs_encoding, val); + virtual_vmcs_exit(vvmcs); +} + static int construct_vmcs(struct vcpu *v) { struct domain *d = v->domain; diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c index 1c7b1d4..2f0076a 100644 --- a/xen/arch/x86/hvm/vmx/vvmx.c +++ b/xen/arch/x86/hvm/vmx/vvmx.c @@ -174,7 +174,7 @@ static int vvmcs_offset(u32 width, u32 type, u32 index) return offset; } -u64 __get_vvmcs(void *vvmcs, u32 vmcs_encoding) +u64 __get_vvmcs_virtual(void *vvmcs, u32 vmcs_encoding) { union vmcs_encoding enc; u64 *content = (u64 *) vvmcs; @@ -204,7 +204,12 @@ u64 __get_vvmcs(void *vvmcs, u32 vmcs_encoding) return res; } -void __set_vvmcs(void *vvmcs, u32 vmcs_encoding, u64 val) +u64 __get_vvmcs_real(void *vvmcs, u32 vmcs_encoding) +{ + return virtual_vmcs_vmread(vvmcs, vmcs_encoding); +} + +void __set_vvmcs_virtual(void *vvmcs, u32 vmcs_encoding, u64 val) { union vmcs_encoding enc; u64 *content = (u64 *) vvmcs; @@ -240,6 +245,11 @@ void __set_vvmcs(void *vvmcs, u32 vmcs_encoding, u64 val) content[offset] = res; } +void __set_vvmcs_real(void *vvmcs, u32 vmcs_encoding, u64 val) +{ + virtual_vmcs_vmwrite(vvmcs, vmcs_encoding, val); +} + static unsigned long reg_read(struct cpu_user_regs *regs, enum vmx_regs_enc index) { @@ -1558,10 +1568,11 @@ int nvmx_handle_invvpid(struct cpu_user_regs *regs) */ int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content) { + struct vcpu *v = current; u64 data = 0, host_data = 0; int r = 1; - if ( !nestedhvm_enabled(current->domain) ) + if ( !nestedhvm_enabled(v->domain) ) return 0; rdmsrl(msr, host_data); @@ -1571,7 +1582,8 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content) */ switch (msr) { case MSR_IA32_VMX_BASIC: - data = (host_data & (~0ul << 32)) | VVMCS_REVISION; + data = (host_data & (~0ul << 32)) | + ((v->arch.hvm_vmx.vmcs)->vmcs_revision_id); break; case MSR_IA32_VMX_PINBASED_CTLS: case MSR_IA32_VMX_TRUE_PINBASED_CTLS: diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h 
b/xen/include/asm-x86/hvm/vmx/vmcs.h index 9ff741f..901652d 100644 --- a/xen/include/asm-x86/hvm/vmx/vmcs.h +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h @@ -244,7 +244,7 @@ extern bool_t cpu_has_vmx_ins_outs_instr_info; (vmx_secondary_exec_control & SECONDARY_EXEC_APIC_REGISTER_VIRT) #define cpu_has_vmx_virtual_intr_delivery \ (vmx_secondary_exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY) - +#define cpu_has_vmx_vmcs_shadowing 0 /* GUEST_INTERRUPTIBILITY_INFO flags. */ #define VMX_INTR_SHADOW_STI 0x00000001 #define VMX_INTR_SHADOW_MOV_SS 0x00000002 @@ -436,6 +436,10 @@ void vmx_vmcs_switch(struct vmcs_struct *from, struct vmcs_struct *to); void vmx_set_eoi_exit_bitmap(struct vcpu *v, u8 vector); void vmx_clear_eoi_exit_bitmap(struct vcpu *v, u8 vector); int vmx_check_msr_bitmap(unsigned long *msr_bitmap, u32 msr, int access_type); +void virtual_vmcs_enter(void *vvmcs); +void virtual_vmcs_exit(void *vvmcs); +u64 virtual_vmcs_vmread(void *vvmcs, u32 vmcs_encoding); +void virtual_vmcs_vmwrite(void *vvmcs, u32 vmcs_encoding, u64 val); #endif /* ASM_X86_HVM_VMX_VMCS_H__ */ diff --git a/xen/include/asm-x86/hvm/vmx/vvmx.h b/xen/include/asm-x86/hvm/vmx/vvmx.h index 1c5313d..b87bfb1 100644 --- a/xen/include/asm-x86/hvm/vmx/vvmx.h +++ b/xen/include/asm-x86/hvm/vmx/vvmx.h @@ -152,8 +152,6 @@ nvmx_hap_walk_L1_p2m(struct vcpu *v, paddr_t L2_gpa, paddr_t *L1_gpa, * */ -#define VVMCS_REVISION 0x40000001u - struct vvmcs_header { u32 revision; u32 abort; @@ -185,8 +183,18 @@ enum vvmcs_encoding_type { VVMCS_TYPE_HSTATE, }; -u64 __get_vvmcs(void *vvmcs, u32 vmcs_encoding); -void __set_vvmcs(void *vvmcs, u32 vmcs_encoding, u64 val); +u64 __get_vvmcs_virtual(void *vvmcs, u32 vmcs_encoding); +u64 __get_vvmcs_real(void *vvmcs, u32 vmcs_encoding); +void __set_vvmcs_virtual(void *vvmcs, u32 vmcs_encoding, u64 val); +void __set_vvmcs_real(void *vvmcs, u32 vmcs_encoding, u64 val); + +#define __get_vvmcs(_vvmcs, _vmcs_encoding) \ + (cpu_has_vmx_vmcs_shadowing ? __get_vvmcs_real(_vvmcs, _vmcs_encoding) \ + : __get_vvmcs_virtual(_vvmcs, _vmcs_encoding)) + +#define __set_vvmcs(_vvmcs, _vmcs_encoding, _val) \ + (cpu_has_vmx_vmcs_shadowing ? __set_vvmcs_real(_vvmcs, _vmcs_encoding, _val) \ + : __set_vvmcs_virtual(_vvmcs, _vmcs_encoding, _val)) uint64_t get_shadow_eptp(struct vcpu *v); -- 1.7.1
Dongxiao Xu
2013-Jan-17 05:37 UTC
[PATCH 3/4] nested vmx: optimize for bulk access of virtual VMCS
After we use the VMREAD/VMWRITE to build up the virtual VMCS, each access to the virtual VMCS needs two VMPTRLD and one VMCLEAR to switch the environment, which might be an overhead to performance. This commit tries to handle multiple virtual VMCS access together to improve the performance. Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com> --- xen/arch/x86/hvm/vmx/vvmx.c | 89 +++++++++++++++++++++++++++++++++++-------- 1 files changed, 73 insertions(+), 16 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c index 2f0076a..9aba89e 100644 --- a/xen/arch/x86/hvm/vmx/vvmx.c +++ b/xen/arch/x86/hvm/vmx/vvmx.c @@ -829,6 +829,34 @@ static void vvmcs_to_shadow(void *vvmcs, unsigned int field) __vmwrite(field, value); } +static void vvmcs_to_shadow_bulk(void *vvmcs, int n, u16 *field) +{ + u64 *value = NULL; + int i = 0; + + if ( cpu_has_vmx_vmcs_shadowing ) + { + value = xzalloc_array(u64, n); + if ( !value ) + goto fallback; + + virtual_vmcs_enter(vvmcs); + for ( i = 0; i < n; i++ ) + value[i] = __vmread(field[i]); + virtual_vmcs_exit(vvmcs); + + for ( i = 0; i < n; i++ ) + __vmwrite(field[i], value[i]); + + xfree(value); + return; + } + +fallback: + for ( i = 0; i < n; i++ ) + vvmcs_to_shadow(vvmcs, field[i]); +} + static void shadow_to_vvmcs(void *vvmcs, unsigned int field) { u64 value; @@ -839,6 +867,34 @@ static void shadow_to_vvmcs(void *vvmcs, unsigned int field) __set_vvmcs(vvmcs, field, value); } +static void shadow_to_vvmcs_bulk(void *vvmcs, int n, u16 *field) +{ + u64 *value = NULL; + int i = 0; + + if ( cpu_has_vmx_vmcs_shadowing ) + { + value = xzalloc_array(u64, n); + if ( !value ) + goto fallback; + + for ( i = 0; i < n; i++ ) + value[i] = __vmread(field[i]); + + virtual_vmcs_enter(vvmcs); + for ( i = 0; i < n; i++ ) + __vmwrite(field[i], value[i]); + virtual_vmcs_exit(vvmcs); + + xfree(value); + return; + } + +fallback: + for ( i = 0; i < n; i++ ) + shadow_to_vvmcs(vvmcs, field[i]); +} + static void load_shadow_control(struct vcpu *v) { /* @@ -862,13 +918,17 @@ static void load_shadow_guest_state(struct vcpu *v) { struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v); void *vvmcs = nvcpu->nv_vvmcx; - int i; u32 control; u64 cr_gh_mask, cr_read_shadow; + u16 vmentry_fields[] = { + VM_ENTRY_INTR_INFO, + VM_ENTRY_EXCEPTION_ERROR_CODE, + VM_ENTRY_INSTRUCTION_LEN, + }; + /* vvmcs.gstate to shadow vmcs.gstate */ - for ( i = 0; i < ARRAY_SIZE(vmcs_gstate_field); i++ ) - vvmcs_to_shadow(vvmcs, vmcs_gstate_field[i]); + vvmcs_to_shadow_bulk(vvmcs, ARRAY_SIZE(vmcs_gstate_field), (u16 *)vmcs_gstate_field); hvm_set_cr0(__get_vvmcs(vvmcs, GUEST_CR0)); hvm_set_cr4(__get_vvmcs(vvmcs, GUEST_CR4)); @@ -882,9 +942,7 @@ static void load_shadow_guest_state(struct vcpu *v) hvm_funcs.set_tsc_offset(v, v->arch.hvm_vcpu.cache_tsc_offset); - vvmcs_to_shadow(vvmcs, VM_ENTRY_INTR_INFO); - vvmcs_to_shadow(vvmcs, VM_ENTRY_EXCEPTION_ERROR_CODE); - vvmcs_to_shadow(vvmcs, VM_ENTRY_INSTRUCTION_LEN); + vvmcs_to_shadow_bulk(vvmcs, ARRAY_SIZE(vmentry_fields), vmentry_fields); /* * While emulate CR0 and CR4 for nested virtualization, set the CR0/CR4 @@ -904,10 +962,13 @@ static void load_shadow_guest_state(struct vcpu *v) if ( nvmx_ept_enabled(v) && hvm_pae_enabled(v) && (v->arch.hvm_vcpu.guest_efer & EFER_LMA) ) { - vvmcs_to_shadow(vvmcs, GUEST_PDPTR0); - vvmcs_to_shadow(vvmcs, GUEST_PDPTR1); - vvmcs_to_shadow(vvmcs, GUEST_PDPTR2); - vvmcs_to_shadow(vvmcs, GUEST_PDPTR3); + u16 gpdptr_fields[] = { + GUEST_PDPTR0, + GUEST_PDPTR1, + GUEST_PDPTR2, + GUEST_PDPTR3, + }; + vvmcs_to_shadow_bulk(vvmcs, 
ARRAY_SIZE(gpdptr_fields), gpdptr_fields); } /* TODO: CR3 target control */ @@ -998,13 +1059,11 @@ static void virtual_vmentry(struct cpu_user_regs *regs) static void sync_vvmcs_guest_state(struct vcpu *v, struct cpu_user_regs *regs) { - int i; struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v); void *vvmcs = nvcpu->nv_vvmcx; /* copy shadow vmcs.gstate back to vvmcs.gstate */ - for ( i = 0; i < ARRAY_SIZE(vmcs_gstate_field); i++ ) - shadow_to_vvmcs(vvmcs, vmcs_gstate_field[i]); + shadow_to_vvmcs_bulk(vvmcs, ARRAY_SIZE(vmcs_gstate_field), (u16 *)vmcs_gstate_field); /* RIP, RSP are in user regs */ __set_vvmcs(vvmcs, GUEST_RIP, regs->eip); __set_vvmcs(vvmcs, GUEST_RSP, regs->esp); @@ -1016,13 +1075,11 @@ static void sync_vvmcs_guest_state(struct vcpu *v, struct cpu_user_regs *regs) static void sync_vvmcs_ro(struct vcpu *v) { - int i; struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v); struct nestedvmx *nvmx = &vcpu_2_nvmx(v); void *vvmcs = nvcpu->nv_vvmcx; - for ( i = 0; i < ARRAY_SIZE(vmcs_ro_field); i++ ) - shadow_to_vvmcs(nvcpu->nv_vvmcx, vmcs_ro_field[i]); + shadow_to_vvmcs_bulk(vvmcs, ARRAY_SIZE(vmcs_ro_field), (u16 *)vmcs_ro_field); /* Adjust exit_reason/exit_qualifciation for violation case */ if ( __get_vvmcs(vvmcs, VM_EXIT_REASON) == EXIT_REASON_EPT_VIOLATION ) -- 1.7.1
The current logic for handling the non-root VMREAD/VMWRITE is by VM-Exit and emulate, which may bring certain overhead. On new Intel platform, it introduces a new feature called VMCS shadowing, where non-root VMREAD/VMWRITE will not trigger VM-Exit, and the hardware will read/write the virtual VMCS instead. This is proved to have performance improvement with the feature. Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com> --- xen/arch/x86/hvm/vmx/vmcs.c | 63 +++++++++++++++++++++++++++++++++++- xen/arch/x86/hvm/vmx/vvmx.c | 23 +++++++++++++ xen/include/asm-x86/hvm/vmx/vmcs.h | 19 ++++++++++- 3 files changed, 103 insertions(+), 2 deletions(-) diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c index 4b0e8e0..190113f 100644 --- a/xen/arch/x86/hvm/vmx/vmcs.c +++ b/xen/arch/x86/hvm/vmx/vmcs.c @@ -91,6 +91,7 @@ static void __init vmx_display_features(void) P(cpu_has_vmx_unrestricted_guest, "Unrestricted Guest"); P(cpu_has_vmx_apic_reg_virt, "APIC Register Virtualization"); P(cpu_has_vmx_virtual_intr_delivery, "Virtual Interrupt Delivery"); + P(cpu_has_vmx_vmcs_shadowing, "VMCS shadowing"); #undef P if ( !printed ) @@ -132,6 +133,7 @@ static int vmx_init_vmcs_config(void) u32 _vmx_cpu_based_exec_control; u32 _vmx_secondary_exec_control = 0; u64 _vmx_ept_vpid_cap = 0; + u64 _vmx_misc_cap = 0; u32 _vmx_vmexit_control; u32 _vmx_vmentry_control; bool_t mismatch = 0; @@ -179,6 +181,9 @@ static int vmx_init_vmcs_config(void) SECONDARY_EXEC_ENABLE_RDTSCP | SECONDARY_EXEC_PAUSE_LOOP_EXITING | SECONDARY_EXEC_ENABLE_INVPCID); + rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap); + if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL ) + opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING; if ( opt_vpid_enabled ) opt |= SECONDARY_EXEC_ENABLE_VPID; if ( opt_unrestricted_guest_enabled ) @@ -382,6 +387,8 @@ static void __vmx_clear_vmcs(void *info) if ( arch_vmx->active_cpu == smp_processor_id() ) { __vmpclear(virt_to_maddr(arch_vmx->vmcs)); + if ( arch_vmx->shadow_vmcs_pa && arch_vmx->shadow_vmcs_pa != ~0ul ) + __vmpclear(arch_vmx->shadow_vmcs_pa); arch_vmx->active_cpu = -1; arch_vmx->launched = 0; @@ -710,6 +717,8 @@ void vmx_vmcs_switch(struct vmcs_struct *from, struct vmcs_struct *to) spin_lock(&vmx->vmcs_lock); __vmpclear(virt_to_maddr(from)); + if ( vmx->shadow_vmcs_pa && vmx->shadow_vmcs_pa != ~0ul ) + __vmpclear(vmx->shadow_vmcs_pa); __vmptrld(virt_to_maddr(to)); vmx->vmcs = to; @@ -761,6 +770,7 @@ static int construct_vmcs(struct vcpu *v) unsigned long sysenter_eip; u32 vmexit_ctl = vmx_vmexit_control; u32 vmentry_ctl = vmx_vmentry_control; + int ret = 0; vmx_vmcs_enter(v); @@ -816,7 +826,10 @@ static int construct_vmcs(struct vcpu *v) unsigned long *msr_bitmap = alloc_xenheap_page(); if ( msr_bitmap == NULL ) - return -ENOMEM; + { + ret = -ENOMEM; + goto out; + } memset(msr_bitmap, ~0, PAGE_SIZE); v->arch.hvm_vmx.msr_bitmap = msr_bitmap; @@ -843,6 +856,45 @@ static int construct_vmcs(struct vcpu *v) } } + /* non-root VMREAD/VMWRITE bitmap. 
*/ + if ( cpu_has_vmx_vmcs_shadowing ) + { + unsigned long *vmread_bitmap, *vmwrite_bitmap; + + vmread_bitmap = alloc_xenheap_page(); + if ( !vmread_bitmap ) + { + gdprintk(XENLOG_ERR, "nest: allocation for vmread bitmap failed\n"); + ret = -ENOMEM; + goto out1; + } + v->arch.hvm_vmx.vmread_bitmap = vmread_bitmap; + + vmwrite_bitmap = alloc_xenheap_page(); + if ( !vmwrite_bitmap ) + { + gdprintk(XENLOG_ERR, "nest: allocation for vmwrite bitmap failed\n"); + ret = -ENOMEM; + goto out2; + } + v->arch.hvm_vmx.vmwrite_bitmap = vmwrite_bitmap; + + memset(vmread_bitmap, 0, PAGE_SIZE); + memset(vmwrite_bitmap, 0, PAGE_SIZE); + + /* + * For the following 4 encodings, we need to handle them in VMM. + * Let them vmexit as usual. + */ + set_bit(IO_BITMAP_A, vmwrite_bitmap); + set_bit(IO_BITMAP_A_HIGH, vmwrite_bitmap); + set_bit(IO_BITMAP_B, vmwrite_bitmap); + set_bit(IO_BITMAP_B_HIGH, vmwrite_bitmap); + + __vmwrite(VMREAD_BITMAP, virt_to_maddr(vmread_bitmap)); + __vmwrite(VMWRITE_BITMAP, virt_to_maddr(vmwrite_bitmap)); + } + /* I/O access bitmap. */ __vmwrite(IO_BITMAP_A, virt_to_maddr((char *)hvm_io_bitmap + 0)); __vmwrite(IO_BITMAP_B, virt_to_maddr((char *)hvm_io_bitmap + PAGE_SIZE)); @@ -997,6 +1049,13 @@ static int construct_vmcs(struct vcpu *v) vmx_vlapic_msr_changed(v); return 0; + +out2: + free_xenheap_page(v->arch.hvm_vmx.vmread_bitmap); +out1: + free_xenheap_page(v->arch.hvm_vmx.msr_bitmap); +out: + return ret; } int vmx_read_guest_msr(u32 msr, u64 *val) @@ -1154,6 +1213,8 @@ void vmx_destroy_vmcs(struct vcpu *v) free_xenheap_page(v->arch.hvm_vmx.host_msr_area); free_xenheap_page(v->arch.hvm_vmx.msr_area); free_xenheap_page(v->arch.hvm_vmx.msr_bitmap); + free_xenheap_page(v->arch.hvm_vmx.vmread_bitmap); + free_xenheap_page(v->arch.hvm_vmx.vmwrite_bitmap); } void vm_launch_fail(void) diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c index 9aba89e..e75e997 100644 --- a/xen/arch/x86/hvm/vmx/vvmx.c +++ b/xen/arch/x86/hvm/vmx/vvmx.c @@ -994,6 +994,24 @@ static bool_t nvmx_vpid_enabled(struct nestedvcpu *nvcpu) return 0; } +static void nvmx_set_vmcs_pointer(struct vcpu *v, struct vmcs_struct *vvmcs) +{ + paddr_t vvmcs_pa = virt_to_maddr(vvmcs); + + __vmpclear(vvmcs_pa); + vvmcs->vmcs_revision_id |= VMCS_RID_TYPE_MASK; + v->arch.hvm_vmx.shadow_vmcs_pa = vvmcs_pa; + __vmwrite(VMCS_LINK_POINTER, vvmcs_pa); +} + +static void nvmx_clear_vmcs_pointer(struct vcpu *v, struct vmcs_struct *vvmcs) +{ + __vmpclear(virt_to_maddr(vvmcs)); + vvmcs->vmcs_revision_id &= ~VMCS_RID_TYPE_MASK; + v->arch.hvm_vmx.shadow_vmcs_pa = ~0ul; + __vmwrite(VMCS_LINK_POINTER, ~0ul); +} + static void virtual_vmentry(struct cpu_user_regs *regs) { struct vcpu *v = current; @@ -1431,6 +1449,9 @@ int nvmx_handle_vmptrld(struct cpu_user_regs *regs) __map_msr_bitmap(v); } + if ( cpu_has_vmx_vmcs_shadowing ) + nvmx_set_vmcs_pointer(v, nvcpu->nv_vvmcx); + vmreturn(regs, VMSUCCEED); out: @@ -1481,6 +1502,8 @@ int nvmx_handle_vmclear(struct cpu_user_regs *regs) if ( gpa == nvcpu->nv_vvmcxaddr ) { + if ( cpu_has_vmx_vmcs_shadowing ) + nvmx_clear_vmcs_pointer(v, nvcpu->nv_vvmcx); clear_vvmcs_launched(&nvmx->launched_list, virt_to_maddr(nvcpu->nv_vvmcx)); nvmx_purge_vvmcs(v); } diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h index 901652d..61c6655 100644 --- a/xen/include/asm-x86/hvm/vmx/vmcs.h +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h @@ -81,6 +81,8 @@ struct vmx_domain { struct arch_vmx_struct { /* Virtual address of VMCS. 
*/ struct vmcs_struct *vmcs; + /* Physical address of shadow VMCS. */ + paddr_t shadow_vmcs_pa; /* Protects remote usage of VMCS (VMPTRLD/VMCLEAR). */ spinlock_t vmcs_lock; @@ -125,6 +127,10 @@ struct arch_vmx_struct { /* Remember EFLAGS while in virtual 8086 mode */ uint32_t vm86_saved_eflags; int hostenv_migrated; + + /* Bitmap to control vmexit policy for Non-root VMREAD/VMWRITE */ + unsigned long *vmread_bitmap; + unsigned long *vmwrite_bitmap; }; int vmx_create_vmcs(struct vcpu *v); @@ -191,6 +197,7 @@ extern u32 vmx_vmentry_control; #define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY 0x00000200 #define SECONDARY_EXEC_PAUSE_LOOP_EXITING 0x00000400 #define SECONDARY_EXEC_ENABLE_INVPCID 0x00001000 +#define SECONDARY_EXEC_ENABLE_VMCS_SHADOWING 0x00004000 extern u32 vmx_secondary_exec_control; extern bool_t cpu_has_vmx_ins_outs_instr_info; @@ -205,6 +212,8 @@ extern bool_t cpu_has_vmx_ins_outs_instr_info; #define VMX_EPT_INVEPT_SINGLE_CONTEXT 0x02000000 #define VMX_EPT_INVEPT_ALL_CONTEXT 0x04000000 +#define VMX_MISC_VMWRITE_ALL 0x20000000 + #define VMX_VPID_INVVPID_INSTRUCTION 0x100000000ULL #define VMX_VPID_INVVPID_INDIVIDUAL_ADDR 0x10000000000ULL #define VMX_VPID_INVVPID_SINGLE_CONTEXT 0x20000000000ULL @@ -244,7 +253,11 @@ extern bool_t cpu_has_vmx_ins_outs_instr_info; (vmx_secondary_exec_control & SECONDARY_EXEC_APIC_REGISTER_VIRT) #define cpu_has_vmx_virtual_intr_delivery \ (vmx_secondary_exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY) -#define cpu_has_vmx_vmcs_shadowing 0 +#define cpu_has_vmx_vmcs_shadowing \ + (vmx_secondary_exec_control & SECONDARY_EXEC_ENABLE_VMCS_SHADOWING) + +#define VMCS_RID_TYPE_MASK 0x80000000 + /* GUEST_INTERRUPTIBILITY_INFO flags. */ #define VMX_INTR_SHADOW_STI 0x00000001 #define VMX_INTR_SHADOW_MOV_SS 0x00000002 @@ -304,6 +317,10 @@ enum vmcs_field { EOI_EXIT_BITMAP2_HIGH = 0x00002021, EOI_EXIT_BITMAP3 = 0x00002022, EOI_EXIT_BITMAP3_HIGH = 0x00002023, + VMREAD_BITMAP = 0x00002026, + VMREAD_BITMAP_HIGH = 0x00002027, + VMWRITE_BITMAP = 0x00002028, + VMWRITE_BITMAP_HIGH = 0x00002029, GUEST_PHYSICAL_ADDRESS = 0x00002400, GUEST_PHYSICAL_ADDRESS_HIGH = 0x00002401, VMCS_LINK_POINTER = 0x00002800, -- 1.7.1
Jan Beulich
2013-Jan-17 11:38 UTC
Re: [PATCH 1/4] nested vmx: Use a list to store the launched vvmcs for L1 VMM
>>> On 17.01.13 at 06:37, Dongxiao Xu <dongxiao.xu@intel.com> wrote:
> @@ -74,6 +77,11 @@ void nvmx_vcpu_destroy(struct vcpu *v)
>          free_xenheap_page(nvcpu->nv_n2vmcx);
>          nvcpu->nv_n2vmcx = NULL;
>      }
> +
> +    list_for_each_entry_safe(item, n, &nvmx->launched_list, node) {

Misplaced brace.

> +        list_del(&item->node);
> +        xfree(item);
> +    }
>  }
>
>  void nvmx_domain_relinquish_resources(struct domain *d)
> @@ -1198,6 +1206,59 @@ int nvmx_handle_vmxoff(struct cpu_user_regs *regs)
>      return X86EMUL_OKAY;
>  }
>
> +static int vvmcs_launched(struct list_head *launched_list, paddr_t vvmcs_pa)

Returning bool_t really?

> +{
> +    struct vvmcs_list *vvmcs = NULL;

Pointless initializer.

> +    struct list_head *pos;
> +    int launched = 0;

bool_t?

> @@ -1230,8 +1293,8 @@ int nvmx_handle_vmresume(struct cpu_user_regs *regs)
>          return X86EMUL_OKAY;
>      }
>
> -    launched = __get_vvmcs(vcpu_nestedhvm(v).nv_vvmcx,
> -                           NVMX_LAUNCH_STATE);
> +    launched = vvmcs_launched(&nvmx->launched_list,
> +                              virt_to_maddr(nvcpu->nv_vvmcx));

nv_vvmcx is obtained through hvm_map_guest_frame_rw(), so you can't validly use virt_to_maddr() on it. I know there are other examples of this in the code, but they're all wrong and will all get fixed once I get to submit the 16Tb support patches.

Jan
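Taken together, the comments above point at a helper of roughly the following shape (a sketch of the suggested rework only, not the version that was eventually committed):

/*
 * vvmcs_launched() reworked per the review: bool_t return type,
 * list_for_each_entry() instead of an open-coded loop, and no
 * redundant initializers.
 */
static bool_t vvmcs_launched(struct list_head *launched_list, paddr_t vvmcs_pa)
{
    struct vvmcs_list *vvmcs;

    list_for_each_entry(vvmcs, launched_list, node)
        if ( vvmcs->vvmcs_pa == vvmcs_pa )
            return 1;

    return 0;
}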
Jan Beulich
2013-Jan-17 11:40 UTC
Re: [PATCH 2/4] nested vmx: use VMREAD/VMWRITE to construct vVMCS if enabled VMCS shadowing
>>> On 17.01.13 at 06:37, Dongxiao Xu <dongxiao.xu@intel.com> wrote:
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -244,7 +244,7 @@ extern bool_t cpu_has_vmx_ins_outs_instr_info;
>      (vmx_secondary_exec_control & SECONDARY_EXEC_APIC_REGISTER_VIRT)
>  #define cpu_has_vmx_virtual_intr_delivery \
>      (vmx_secondary_exec_control & SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY)
> -

Please keep the blank line (and insert above it).

Jan

> +#define cpu_has_vmx_vmcs_shadowing 0
>  /* GUEST_INTERRUPTIBILITY_INFO flags. */
>  #define VMX_INTR_SHADOW_STI             0x00000001
>  #define VMX_INTR_SHADOW_MOV_SS          0x00000002
Jan Beulich
2013-Jan-17 11:48 UTC
Re: [PATCH 3/4] nested vmx: optimize for bulk access of virtual VMCS
>>> On 17.01.13 at 06:37, Dongxiao Xu <dongxiao.xu@intel.com> wrote:
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -829,6 +829,34 @@ static void vvmcs_to_shadow(void *vvmcs, unsigned int field)
>      __vmwrite(field, value);
>  }
>
> +static void vvmcs_to_shadow_bulk(void *vvmcs, int n, u16 *field)
> +{
> +    u64 *value = NULL;
> +    int i = 0;

Both 'n' and 'i' should be unsigned. And there again is a pointless initializer here.

> +
> +    if ( cpu_has_vmx_vmcs_shadowing )
> +    {
> +        value = xzalloc_array(u64, n);
> +        if ( !value )
> +            goto fallback;
> +
> +        virtual_vmcs_enter(vvmcs);
> +        for ( i = 0; i < n; i++ )
> +            value[i] = __vmread(field[i]);
> +        virtual_vmcs_exit(vvmcs);
> +
> +        for ( i = 0; i < n; i++ )
> +            __vmwrite(field[i], value[i]);
> +
> +        xfree(value);
> +        return;
> +    }
> +
> +fallback:
> +    for ( i = 0; i < n; i++ )
> +        vvmcs_to_shadow(vvmcs, field[i]);

Putting the fallback code in a conditional and the "normal" code outside would reduce overall amount of indentation.

> +}
> +
>  static void shadow_to_vvmcs(void *vvmcs, unsigned int field)
>  {
>      u64 value;
> @@ -862,13 +918,17 @@ static void load_shadow_guest_state(struct vcpu *v)
>  {
>      struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
>      void *vvmcs = nvcpu->nv_vvmcx;
> -    int i;
>      u32 control;
>      u64 cr_gh_mask, cr_read_shadow;
>
> +    u16 vmentry_fields[] = {
> +        VM_ENTRY_INTR_INFO,
> +        VM_ENTRY_EXCEPTION_ERROR_CODE,
> +        VM_ENTRY_INSTRUCTION_LEN,
> +    };
> +
>      /* vvmcs.gstate to shadow vmcs.gstate */
> -    for ( i = 0; i < ARRAY_SIZE(vmcs_gstate_field); i++ )
> -        vvmcs_to_shadow(vvmcs, vmcs_gstate_field[i]);
> +    vvmcs_to_shadow_bulk(vvmcs, ARRAY_SIZE(vmcs_gstate_field), (u16 *)vmcs_gstate_field);

The cast should be dropped as being dangerous. Just const qualify the function parameter. Also - long line?

Jan
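One possible reading of the restructuring suggested above, with unsigned counters, no redundant initializer, a const-qualified field array, and the fallback path inside the conditional (a sketch only, not the committed code):

static void vvmcs_to_shadow_bulk(void *vvmcs, unsigned int n,
                                 const u16 *field)
{
    u64 *value;
    unsigned int i;

    if ( !cpu_has_vmx_vmcs_shadowing ||
         (value = xzalloc_array(u64, n)) == NULL )
    {
        /* Fallback: copy the fields one at a time. */
        for ( i = 0; i < n; i++ )
            vvmcs_to_shadow(vvmcs, field[i]);
        return;
    }

    /* Read all fields from the virtual VMCS in one enter/exit cycle... */
    virtual_vmcs_enter(vvmcs);
    for ( i = 0; i < n; i++ )
        value[i] = __vmread(field[i]);
    virtual_vmcs_exit(vvmcs);

    /* ...then write them into the currently loaded shadow VMCS. */
    for ( i = 0; i < n; i++ )
        __vmwrite(field[i], value[i]);

    xfree(value);
}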
Xu, Dongxiao
2013-Jan-17 12:39 UTC
Re: [PATCH 1/4] nested vmx: Use a list to store the launched vvmcs for L1 VMM
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Thursday, January 17, 2013 7:38 PM
> To: Xu, Dongxiao
> Cc: Dong, Eddie; Nakajima, Jun; Zhang, Xiantao; xen-devel
> Subject: Re: [Xen-devel] [PATCH 1/4] nested vmx: Use a list to store the
> launched vvmcs for L1 VMM
>
> >>> On 17.01.13 at 06:37, Dongxiao Xu <dongxiao.xu@intel.com> wrote:
> > @@ -74,6 +77,11 @@ void nvmx_vcpu_destroy(struct vcpu *v)
> >          free_xenheap_page(nvcpu->nv_n2vmcx);
> >          nvcpu->nv_n2vmcx = NULL;
> >      }
> > +
> > +    list_for_each_entry_safe(item, n, &nvmx->launched_list, node) {
>
> Misplaced brace.
>
> > +        list_del(&item->node);
> > +        xfree(item);
> > +    }
> >  }
> >
> >  void nvmx_domain_relinquish_resources(struct domain *d)
> > @@ -1198,6 +1206,59 @@ int nvmx_handle_vmxoff(struct cpu_user_regs *regs)
> >      return X86EMUL_OKAY;
> >  }
> >
> > +static int vvmcs_launched(struct list_head *launched_list, paddr_t vvmcs_pa)
>
> Returning bool_t really?
>
> > +{
> > +    struct vvmcs_list *vvmcs = NULL;
>
> Pointless initializer.
>
> > +    struct list_head *pos;
> > +    int launched = 0;
>
> bool_t?
>
> > @@ -1230,8 +1293,8 @@ int nvmx_handle_vmresume(struct cpu_user_regs *regs)
> >          return X86EMUL_OKAY;
> >      }
> >
> > -    launched = __get_vvmcs(vcpu_nestedhvm(v).nv_vvmcx,
> > -                           NVMX_LAUNCH_STATE);
> > +    launched = vvmcs_launched(&nvmx->launched_list,
> > +                              virt_to_maddr(nvcpu->nv_vvmcx));
>
> nv_vvmcx is obtained through hvm_map_guest_frame_rw(), so
> you can't validly use virt_to_maddr() on it. I know there are other
> examples of this in the code, but they're all wrong and will all get
> fixed once I get to submit the 16Tb support patches.

What's the correct way to get such machine address? Also could you help to explain why it is wrong?

Thanks,
Dongxiao

>
> Jan
Jan Beulich
2013-Jan-17 12:58 UTC
Re: [PATCH 1/4] nested vmx: Use a list to store the launched vvmcs for L1 VMM
>>> On 17.01.13 at 13:39, "Xu, Dongxiao" <dongxiao.xu@intel.com> wrote:
>> > -    launched = __get_vvmcs(vcpu_nestedhvm(v).nv_vvmcx,
>> > -                           NVMX_LAUNCH_STATE);
>> > +    launched = vvmcs_launched(&nvmx->launched_list,
>> > +                              virt_to_maddr(nvcpu->nv_vvmcx));
>>
>> nv_vvmcx is obtained through hvm_map_guest_frame_rw(), so
>> you can't validly use virt_to_maddr() on it. I know there are other
>> examples of this in the code, but they're all wrong and will all get
>> fixed once I get to submit the 16Tb support patches.
>
> What's the correct way to get such machine address?

domain_page_map_to_mfn().

> Also could you help to explain why it is wrong?

Because you're assuming that map_domain_page() reduces to mfn_to_virt(), which only happens to be the case right now (where 32-bit code is gone and huge memory support isn't there yet). In order to not introduce further (latent) bugs, I'm going to veto (as far as I'm being listened to) any change that violates the abstract model, and I regret that I didn't notice the other flaws before they went in.

Jan
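In code, the distinction being made here might look roughly as follows (a sketch assuming the existing Xen helpers named in this thread; the helper name vvmcs_maddr() and the shift to a paddr_t are illustrative only):

/*
 * nv_vvmcx is a mapping established via hvm_map_guest_frame_rw(), so
 * its machine address has to be derived from the mapping layer rather
 * than computed with virt_to_maddr(), which is only valid for
 * direct-mapped (1:1) addresses.
 */
static paddr_t vvmcs_maddr(struct nestedvcpu *nvcpu)
{
    /* Wrong in general: virt_to_maddr(nvcpu->nv_vvmcx). */

    /* Ask the domain-page layer which MFN backs the mapping. */
    return (paddr_t)domain_page_map_to_mfn(nvcpu->nv_vvmcx) << PAGE_SHIFT;
}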