Yang Zhang
2013-Sep-10 06:14 UTC
[PATCH v2 0/3] Nested VMX: fix bugs when reading VMX MSRs.
From: Yang Zhang <yang.z.zhang@Intel.com>

The following patches fix issues existing in the current VMX MSR reading logic.

Yang Zhang (3):
  Nested VMX: Check VMX capability before read VMX related MSRs.
  Nested VMX: Clear bit 31 of IA32_VMX_BASIC MSR
  Nested VMX: Fix IA32_VMX_CR4_FIXED1 msr emulation

 xen/arch/x86/hvm/vmx/vvmx.c      | 68 ++++++++++++++++++++++++++++++++++++--
 xen/include/asm-x86/cpufeature.h |  1 +
 xen/include/asm-x86/processor.h  |  1 +
 3 files changed, 67 insertions(+), 3 deletions(-)
Yang Zhang
2013-Sep-10 06:14 UTC
[PATCH v2 1/3] Nested VMX: Check VMX capability before read VMX related MSRs.
From: Yang Zhang <yang.z.zhang@Intel.com>

VMX MSRs are only available when the CPU supports the VMX feature. In
addition, VMX_TRUE* MSRs are only available when bit 55 of VMX_BASIC
MSR is set.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
---
 xen/arch/x86/hvm/vmx/vvmx.c | 19 +++++++++++++++++++
 1 files changed, 19 insertions(+), 0 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index cecc72f..c6c8b88 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1814,12 +1814,31 @@ int nvmx_handle_invvpid(struct cpu_user_regs *regs)
 int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
 {
     struct vcpu *v = current;
+    unsigned int eax, ebx, ecx, edx;
     u64 data = 0, host_data = 0;
     int r = 1;
 
     if ( !nestedhvm_enabled(v->domain) )
         return 0;
 
+    /*
+     * VMX capablitys MSRs available only when guest
+     * support VMX.
+     */
+    hvm_cpuid(0x1, &eax, &ebx, &ecx, &edx);
+    if ( !(ecx & cpufeat_mask(X86_FEATURE_VMXE)) )
+        return 0;
+
+    /*
+     * Those MSRs available only when bit 55 of
+     * MSR_IA32_VMX_BASIC is set.
+     */
+    rdmsrl(MSR_IA32_VMX_BASIC, data);
+    if ( msr >= MSR_IA32_VMX_TRUE_PINBASED_CTLS &&
+         msr <= MSR_IA32_VMX_TRUE_ENTRY_CTLS &&
+         !(data & VMX_BASIC_DEFAULT1_ZERO) )
+        return 0;
+
     rdmsrl(msr, host_data);
 
     /*
-- 
1.7.1
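[Editor's note] The gating rule this patch introduces can be sketched outside Xen as a small standalone predicate. The TRUE_* MSR indices and the bit-55 constant below match the Intel SDM; the function name and the boolean parameter are hypothetical stand-ins for the in-hypervisor CPUID and rdmsrl checks:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Intel SDM values; names mirror (but are not) the Xen definitions. */
#define MSR_IA32_VMX_TRUE_PINBASED_CTLS  0x48d
#define MSR_IA32_VMX_TRUE_ENTRY_CTLS     0x490
#define VMX_BASIC_DEFAULT1_ZERO          (1ULL << 55)

/*
 * Hypothetical sketch of the patch's rule: no VMX MSR is serviced for a
 * guest without VMX, and the VMX_TRUE_* range is additionally serviced
 * only when bit 55 of IA32_VMX_BASIC is set.
 */
static bool vmx_msr_available(unsigned int msr, bool guest_has_vmx,
                              uint64_t vmx_basic)
{
    if ( !guest_has_vmx )
        return false;
    if ( msr >= MSR_IA32_VMX_TRUE_PINBASED_CTLS &&
         msr <= MSR_IA32_VMX_TRUE_ENTRY_CTLS &&
         !(vmx_basic & VMX_BASIC_DEFAULT1_ZERO) )
        return false;
    return true;
}
```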
Yang Zhang
2013-Sep-10 06:14 UTC
[PATCH v2 2/3] Nested VMX: Clear bit 31 of IA32_VMX_BASIC MSR
From: Yang Zhang <yang.z.zhang@Intel.com>

Bit 31 of revision_id is set to 1 if VMCS shadowing is enabled, and
according to the Intel SDM, bit 31 of the IA32_VMX_BASIC MSR is always 0.
So we cannot copy the low 32 bits of revision_id into IA32_VMX_BASIC
directly; bit 31 must be cleared.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
---
 xen/arch/x86/hvm/vmx/vvmx.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index c6c8b88..122462f 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1847,7 +1847,7 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
     switch (msr) {
     case MSR_IA32_VMX_BASIC:
         data = (host_data & (~0ul << 32)) |
-               ((v->arch.hvm_vmx.vmcs)->vmcs_revision_id);
+               (v->arch.hvm_vmx.vmcs->vmcs_revision_id & 0x7fffffff);
         break;
     case MSR_IA32_VMX_PINBASED_CTLS:
     case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
-- 
1.7.1
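[Editor's note] The composition the patch performs — host upper half, guest revision ID in the low half with bit 31 forced to zero — can be checked in isolation. `compose_vmx_basic` is a hypothetical stand-in for the expression inside the MSR intercept:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the patched expression: keep the host's
 * upper 32 bits, take the revision ID for the low 32 bits, and mask off
 * bit 31, which the SDM defines as always 0 in IA32_VMX_BASIC.
 */
static uint64_t compose_vmx_basic(uint64_t host_data, uint32_t revision_id)
{
    return (host_data & (~0ULL << 32)) | (revision_id & 0x7fffffff);
}
```

With VMCS shadowing enabled the revision ID read from hardware has bit 31 set, so the mask is what keeps the emulated MSR SDM-conformant.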
Yang Zhang
2013-Sep-10 06:14 UTC
[PATCH v2 3/3] Nested VMX: Fix IA32_VMX_CR4_FIXED1 msr emulation
From: Yang Zhang <yang.z.zhang@Intel.com>

Currently, a hardcoded value is used for IA32_VMX_CR4_FIXED1. This is wrong.
We should check the guest's CPUID to know which bits are writeable in CR4 by
the guest, and allow the guest to set the corresponding bit only when the
guest has the feature.

Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
---
 xen/arch/x86/hvm/vmx/vvmx.c      | 47 ++++++++++++++++++++++++++++++++++++-
 xen/include/asm-x86/cpufeature.h |  1 +
 xen/include/asm-x86/processor.h  |  1 +
 3 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index 122462f..716891e 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -1944,8 +1944,51 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
         data = X86_CR4_VMXE;
         break;
     case MSR_IA32_VMX_CR4_FIXED1:
-        /* allow 0-settings except SMXE */
-        data = 0x267ff & ~X86_CR4_SMXE;
+        if ( edx & cpufeat_mask(X86_FEATURE_VME) )
+            data |= X86_CR4_VME | X86_CR4_PVI;
+        if ( edx & cpufeat_mask(X86_FEATURE_TSC) )
+            data |= X86_CR4_TSD;
+        if ( edx & cpufeat_mask(X86_FEATURE_DE) )
+            data |= X86_CR4_DE;
+        if ( edx & cpufeat_mask(X86_FEATURE_PSE) )
+            data |= X86_CR4_PSE;
+        if ( edx & cpufeat_mask(X86_FEATURE_PAE) )
+            data |= X86_CR4_PAE;
+        if ( edx & cpufeat_mask(X86_FEATURE_MCE) )
+            data |= X86_CR4_MCE;
+        if ( edx & cpufeat_mask(X86_FEATURE_PGE) )
+            data |= X86_CR4_PGE;
+        if ( edx & cpufeat_mask(X86_FEATURE_FXSR) )
+            data |= X86_CR4_OSFXSR;
+        if ( edx & cpufeat_mask(X86_FEATURE_XMM) )
+            data |= X86_CR4_OSXMMEXCPT;
+        if ( ecx & cpufeat_mask(X86_FEATURE_VMXE) )
+            data |= X86_CR4_VMXE;
+        if ( ecx & cpufeat_mask(X86_FEATURE_SMXE) )
+            data |= X86_CR4_SMXE;
+        if ( ecx & cpufeat_mask(X86_FEATURE_PCID) )
+            data |= X86_CR4_PCIDE;
+        if ( ecx & cpufeat_mask(X86_FEATURE_XSAVE) )
+            data |= X86_CR4_OSXSAVE;
+
+        hvm_cpuid(0x0, &eax, &ebx, &ecx, &edx);
+        if ( eax >= 0xa )
+        {
+            hvm_cpuid(0xa, &eax, &ebx, &ecx, &edx);
+            /* Check whether guest has the perf monitor feature. */
+            if ( (eax & 0xff) && (eax & 0xff00) )
+                data |= X86_CR4_PCE;
+        }
+        else if ( eax >= 0x7 )
+        {
+            hvm_cpuid(0x7, &eax, &ebx, &ecx, &edx);
+            if ( ebx & cpufeat_mask(X86_FEATURE_FSGSBASE) )
+                data |= X86_CR4_FSGSBASE;
+            if ( ebx & cpufeat_mask(X86_FEATURE_SMEP) )
+                data |= X86_CR4_SMEP;
+            if ( ebx & cpufeat_mask(X86_FEATURE_SMAP) )
+                data |= X86_CR4_SMAP;
+        }
         break;
     case MSR_IA32_VMX_MISC:
         /* Do not support CR3-target feature now */
diff --git a/xen/include/asm-x86/cpufeature.h b/xen/include/asm-x86/cpufeature.h
index 065c265..73d5cb6 100644
--- a/xen/include/asm-x86/cpufeature.h
+++ b/xen/include/asm-x86/cpufeature.h
@@ -148,6 +148,7 @@
 #define X86_FEATURE_INVPCID	(7*32+10) /* Invalidate Process Context ID */
 #define X86_FEATURE_RTM 	(7*32+11) /* Restricted Transactional Memory */
 #define X86_FEATURE_NO_FPU_SEL 	(7*32+13) /* FPU CS/DS stored as zero */
+#define X86_FEATURE_SMAP	(7*32+20) /* Supervisor Mode Access Prevention */
 
 #define cpu_has(c, bit)		test_bit(bit, (c)->x86_capability)
 #define boot_cpu_has(bit)	test_bit(bit, boot_cpu_data.x86_capability)
diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index 5cdacc7..893afa3 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -87,6 +87,7 @@
 #define X86_CR4_PCIDE	0x20000 /* enable PCID */
 #define X86_CR4_OSXSAVE	0x40000 /* enable XSAVE/XRSTOR */
 #define X86_CR4_SMEP	0x100000 /* enable SMEP */
+#define X86_CR4_SMAP	0x200000 /* enable SMAP */
 
 /*
  * Trap/fault mnemonics.
-- 
1.7.1
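[Editor's note] The per-feature `if` chain in this patch is essentially a mapping from CPUID feature bits to allowed CR4 bits. A table-driven sketch of the same idea — not the patch's actual form, and using only a small illustrative subset of features — could look like this; the CPUID.1:EDX bit positions and CR4 bit values are from the Intel SDM, everything else is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative subset of CR4 bits; values per the Intel SDM. */
#define X86_CR4_TSD  0x00000004ULL
#define X86_CR4_DE   0x00000008ULL
#define X86_CR4_PAE  0x00000020ULL

/* Hypothetical mapping from CPUID.1:EDX feature bits to allowed CR4 bits. */
static const struct { uint32_t edx_mask; uint64_t cr4_bits; } cr4_map[] = {
    { 1u << 2, X86_CR4_DE  },  /* DE  -> CR4.DE  */
    { 1u << 4, X86_CR4_TSD },  /* TSC -> CR4.TSD */
    { 1u << 6, X86_CR4_PAE },  /* PAE -> CR4.PAE */
};

/* Accumulate the CR4 bits whose backing CPUID features the guest has. */
static uint64_t cr4_fixed1_from_edx(uint32_t edx)
{
    uint64_t data = 0;
    for ( size_t i = 0; i < sizeof(cr4_map) / sizeof(cr4_map[0]); i++ )
        if ( edx & cr4_map[i].edx_mask )
            data |= cr4_map[i].cr4_bits;
    return data;
}
```

A table keeps each CPUID-to-CR4 association on one line, at the cost of not expressing one-to-many cases such as VME setting both CR4.VME and CR4.PVI without a combined mask.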
Andrew Cooper
2013-Sep-10 08:51 UTC
Re: [PATCH v2 2/3] Nested VMX: Clear bit 31 of IA32_VMX_BASIC MSR
On 10/09/13 07:14, Yang Zhang wrote:
> From: Yang Zhang <yang.z.zhang@Intel.com>
>
> Bit 31 of revision_id is set to 1 if VMCS shadowing is enabled, and
> according to the Intel SDM, bit 31 of the IA32_VMX_BASIC MSR is always 0.
> So we cannot copy the low 32 bits of revision_id into IA32_VMX_BASIC
> directly; bit 31 must be cleared.
>
> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>

Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com>

> ---
>  xen/arch/x86/hvm/vmx/vvmx.c | 2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index c6c8b88..122462f 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -1847,7 +1847,7 @@ int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
>      switch (msr) {
>      case MSR_IA32_VMX_BASIC:
>          data = (host_data & (~0ul << 32)) |
> -               ((v->arch.hvm_vmx.vmcs)->vmcs_revision_id);
> +               (v->arch.hvm_vmx.vmcs->vmcs_revision_id & 0x7fffffff);
>          break;
>      case MSR_IA32_VMX_PINBASED_CTLS:
>      case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
Andrew Cooper
2013-Sep-10 10:56 UTC
Re: [PATCH v2 1/3] Nested VMX: Check VMX capability before read VMX related MSRs.
On 10/09/13 07:14, Yang Zhang wrote:
> From: Yang Zhang <yang.z.zhang@Intel.com>
>
> VMX MSRs are only available when the CPU supports the VMX feature. In
> addition, VMX_TRUE* MSRs are only available when bit 55 of VMX_BASIC
> MSR is set.
>
> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vvmx.c | 19 +++++++++++++++++++
>  1 files changed, 19 insertions(+), 0 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index cecc72f..c6c8b88 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -1814,12 +1814,31 @@ int nvmx_handle_invvpid(struct cpu_user_regs *regs)
>  int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
>  {
>      struct vcpu *v = current;
> +    unsigned int eax, ebx, ecx, edx;
>      u64 data = 0, host_data = 0;
>      int r = 1;
>
>      if ( !nestedhvm_enabled(v->domain) )
>          return 0;
>
> +    /*
> +     * VMX capablitys MSRs available only when guest

sp. capability.

> +     * support VMX.
> +     */
> +    hvm_cpuid(0x1, &eax, &ebx, &ecx, &edx);
> +    if ( !(ecx & cpufeat_mask(X86_FEATURE_VMXE)) )
> +        return 0;
> +
> +    /*
> +     * Those MSRs available only when bit 55 of
> +     * MSR_IA32_VMX_BASIC is set.

Given the pedantry about spelling, "Those MSRs are".

Furthermore, do those comments really not fit on a single line?

~Andrew

> +     */
> +    rdmsrl(MSR_IA32_VMX_BASIC, data);
> +    if ( msr >= MSR_IA32_VMX_TRUE_PINBASED_CTLS &&
> +         msr <= MSR_IA32_VMX_TRUE_ENTRY_CTLS &&
> +         !(data & VMX_BASIC_DEFAULT1_ZERO) )
> +        return 0;
> +
>      rdmsrl(msr, host_data);
>
>      /*
Jan Beulich
2013-Sep-10 13:41 UTC
Re: [PATCH v2 1/3] Nested VMX: Check VMX capability before read VMX related MSRs.
>>> On 10.09.13 at 08:14, Yang Zhang <yang.z.zhang@intel.com> wrote:
> +    /*
> +     * Those MSRs available only when bit 55 of
> +     * MSR_IA32_VMX_BASIC is set.
> +     */
> +    rdmsrl(MSR_IA32_VMX_BASIC, data);
> +    if ( msr >= MSR_IA32_VMX_TRUE_PINBASED_CTLS &&
> +         msr <= MSR_IA32_VMX_TRUE_ENTRY_CTLS &&

Doing this with a range check is pretty odd - neither the lower nor the
upper bound is really meant to be a boundary. I'd prefer if you did this
with a switch statement, enumerating all four MSRs. The compiler can then
still convert that to a range comparison. And that way you can also move
the rdmsrl() out of the fast path.

Which makes me ask - wouldn't it be better to read this MSR into a global
variable once at boot time, and then use that variable instead of reading
the MSR over and over?

Jan

> +         !(data & VMX_BASIC_DEFAULT1_ZERO) )
> +        return 0;
> +
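[Editor's note] The switch-statement shape Jan suggests might look like the following standalone sketch; the MSR index values are from the Intel SDM, while the function name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* TRUE_* capability MSR indices per the Intel SDM. */
#define MSR_IA32_VMX_TRUE_PINBASED_CTLS   0x48d
#define MSR_IA32_VMX_TRUE_PROCBASED_CTLS  0x48e
#define MSR_IA32_VMX_TRUE_EXIT_CTLS       0x48f
#define MSR_IA32_VMX_TRUE_ENTRY_CTLS      0x490

/*
 * Enumerate the four MSRs explicitly. The compiler may still emit a
 * range comparison, but the code no longer relies on the two bounds
 * happening to delimit exactly these MSRs.
 */
static bool is_vmx_true_ctls_msr(unsigned int msr)
{
    switch ( msr )
    {
    case MSR_IA32_VMX_TRUE_PINBASED_CTLS:
    case MSR_IA32_VMX_TRUE_PROCBASED_CTLS:
    case MSR_IA32_VMX_TRUE_EXIT_CTLS:
    case MSR_IA32_VMX_TRUE_ENTRY_CTLS:
        return true;
    default:
        return false;
    }
}
```

With the predicate in this form, the IA32_VMX_BASIC read can be performed only when the predicate is true, keeping it off the fast path as Jan asks.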
Jan Beulich
2013-Sep-10 13:45 UTC
Re: [PATCH v2 3/3] Nested VMX: Fix IA32_VMX_CR4_FIXED1 msr emulation
>>> On 10.09.13 at 08:14, Yang Zhang <yang.z.zhang@intel.com> wrote:
> +        hvm_cpuid(0x0, &eax, &ebx, &ecx, &edx);
> +        if ( eax >= 0xa )
> +        {
> +            hvm_cpuid(0xa, &eax, &ebx, &ecx, &edx);
> +            /* Check whether guest has the perf monitor feature. */
> +            if ( (eax & 0xff) && (eax & 0xff00) )
> +                data |= X86_CR4_PCE;
> +        }
> +        else if ( eax >= 0x7 )
> +        {
> +            hvm_cpuid(0x7, &eax, &ebx, &ecx, &edx);
> +            if ( ebx & cpufeat_mask(X86_FEATURE_FSGSBASE) )
> +                data |= X86_CR4_FSGSBASE;
> +            if ( ebx & cpufeat_mask(X86_FEATURE_SMEP) )
> +                data |= X86_CR4_SMEP;
> +            if ( ebx & cpufeat_mask(X86_FEATURE_SMAP) )
> +                data |= X86_CR4_SMAP;
> +        }

I told you on v1 already that if/else-if doesn't work here: if the maximum
leaf is 10 or higher, the last three bits will never get set. Either you
need to retain the use of another variable for the maximum leaf, or (my
preference) you need to use a switch statement with suitably annotated
fall through.

Jan
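[Editor's note] The bug Jan points out, and his first proposed fix (keeping the maximum leaf in a separate variable so the checks are independent), can be sketched abstractly. Leaf numbers 0x7 and 0xa are the real CPUID leaves in question; the CR4 bit values, function, and flag parameters are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define CR4_PCE   0x100ULL     /* illustrative CR4 bit values */
#define CR4_SMEP  0x100000ULL

/*
 * Hypothetical fixed version: because max_leaf is preserved, both the
 * leaf-7 and leaf-0xa checks run when the maximum leaf is 0xa or higher.
 * The patch's else-if skipped the leaf-7 bits in exactly that case.
 */
static uint64_t cr4_extra_bits(uint32_t max_leaf, int has_pmu, int has_smep)
{
    uint64_t data = 0;

    if ( max_leaf >= 0x7 && has_smep )
        data |= CR4_SMEP;
    if ( max_leaf >= 0xa && has_pmu )
        data |= CR4_PCE;

    return data;
}
```

The original code overwrote eax with each hvm_cpuid() call, which is why the maximum leaf was lost and the else-if looked necessary; Jan's alternative, a switch on max_leaf with fall-through cases, avoids the extra variable at the cost of fall-through annotations.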
Zhang, Yang Z
2013-Sep-11 00:40 UTC
Re: [PATCH v2 1/3] Nested VMX: Check VMX capability before read VMX related MSRs.
Andrew Cooper wrote on 2013-09-10:
> On 10/09/13 07:14, Yang Zhang wrote:
>> From: Yang Zhang <yang.z.zhang@Intel.com>
>>
>> VMX MSRs are only available when the CPU supports the VMX feature. In
>> addition, VMX_TRUE* MSRs are only available when bit 55 of VMX_BASIC
>> MSR is set.
>>
>> Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
>> ---
>>  xen/arch/x86/hvm/vmx/vvmx.c | 19 +++++++++++++++++++
>>  1 files changed, 19 insertions(+), 0 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
>> index cecc72f..c6c8b88 100644
>> --- a/xen/arch/x86/hvm/vmx/vvmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
>> @@ -1814,12 +1814,31 @@ int nvmx_handle_invvpid(struct cpu_user_regs *regs)
>>  int nvmx_msr_read_intercept(unsigned int msr, u64 *msr_content)
>>  {
>>      struct vcpu *v = current;
>> +    unsigned int eax, ebx, ecx, edx;
>>      u64 data = 0, host_data = 0;
>>      int r = 1;
>>
>>      if ( !nestedhvm_enabled(v->domain) )
>>          return 0;
>>
>> +    /*
>> +     * VMX capablitys MSRs available only when guest
>
> sp. capability.
>
>> +     * support VMX.
>> +     */
>> +    hvm_cpuid(0x1, &eax, &ebx, &ecx, &edx);
>> +    if ( !(ecx & cpufeat_mask(X86_FEATURE_VMXE)) )
>> +        return 0;
>> +
>> +    /*
>> +     * Those MSRs available only when bit 55 of
>> +     * MSR_IA32_VMX_BASIC is set.
>
> Given the pedantry about spelling, "Those MSRs are".
>
> Furthermore, do those comments really not fit on a single line?

The first one is OK on a single line, but the second one would exceed 80
characters.

> ~Andrew
>
>> +     */
>> +    rdmsrl(MSR_IA32_VMX_BASIC, data);
>> +    if ( msr >= MSR_IA32_VMX_TRUE_PINBASED_CTLS &&
>> +         msr <= MSR_IA32_VMX_TRUE_ENTRY_CTLS &&
>> +         !(data & VMX_BASIC_DEFAULT1_ZERO) )
>> +        return 0;
>> +
>>      rdmsrl(msr, host_data);
>>
>>      /*

Best regards,
Yang
Zhang, Yang Z
2013-Sep-11 00:41 UTC
Re: [PATCH v2 1/3] Nested VMX: Check VMX capability before read VMX related MSRs.
Jan Beulich wrote on 2013-09-10:
>>>> On 10.09.13 at 08:14, Yang Zhang <yang.z.zhang@intel.com> wrote:
>> +    /*
>> +     * Those MSRs available only when bit 55 of
>> +     * MSR_IA32_VMX_BASIC is set.
>> +     */
>> +    rdmsrl(MSR_IA32_VMX_BASIC, data);
>> +    if ( msr >= MSR_IA32_VMX_TRUE_PINBASED_CTLS &&
>> +         msr <= MSR_IA32_VMX_TRUE_ENTRY_CTLS &&
>
> Doing this with a range check is pretty odd - neither the lower nor the
> upper bound is really meant to be a boundary. I'd prefer if you did this
> with a switch statement, enumerating all four MSRs. The compiler can
> then still convert that to a range comparison.
>
> And that way you can also move the rdmsrl() out of the fast path. Which
> makes me ask - wouldn't it be better to read this MSR into a global
> variable once at boot time, and then use that variable instead of
> reading the MSR over and over?

Sure. I will update the patch according to your comments here and in the
third patch.

> Jan
>
>> +         !(data & VMX_BASIC_DEFAULT1_ZERO) )
>> +        return 0;
>> +

Best regards,
Yang