The first patch really isn''t as much of a follow-up than what triggered the security issues to be noticed in the first place. 1: x86: preserve FPU selectors for 32-bit guest code 2: x86: fix XCR0 handling 3: x86/xsave: adjust state management 4: x86/fxsave: bring in line with recent xsave adjustments The first two I would see as candidates for 4.3 (as well as subsequent backporting, albeit I realize that especially the first one is non-trivial), while the third is code improvement only, and the fourth is really just cleanup, and hence I''d be fine with deferring them until after 4.3. Signed-off-by: Jan Beulich <jbeulich@suse.com>
Jan Beulich
2013-Jun-04 08:05 UTC
[PATCH 1/4] x86: preserve FPU selectors for 32-bit guest code
Doing {,F}X{SAVE,RSTOR} unconditionally with 64-bit operand size leads to the selector values associated with the last instruction/data pointers getting lost. This, besides being inconsistent and not compatible with native hardware behavior especially for 32-bit guests, leads to bug checks in 32-bit Windows when running with "Driver verifier" (see e.g. http://support.microsoft.com/kb/244617). In a first try I made the code figure out the current guest mode, but that has the disadvantage of only taking care of the issue when the guest executes in the mode for which the state currently is (i.e. namely not in the case of a 64-bit guest running a 32-bit application, but being in kernle [64-bit] mode). The solution presented here is to conditionally execute an auxiliary FNSTENV and use the selectors from there. In either case the determined word size gets stored in the last byte of the FPU/SSE save area, which is available for software use (and I verified is being cleared to zero by all versions of Xen, i.e. will not present a problem when migrating guests from older to newer hosts), and evaluated for determining the operand size {,F}XRSTOR is to be issued with. Note that I did check whether using a second FXSAVE or a partial second XSAVE would be faster than FNSTENV - neither on my Westmere (FXSAVE) nor on my Romley (XSAVE) they are. I was really tempted to use branches into the middle of instructions (i.e. past the REX64 prefixes) here, as that would have allowed to collapse the otherwise identical fault recovery blocks. I stayed away from doing so just because I expect others to dislike such slightly subtle/tricky code. Reported-by: Ben Guthro <Benjamin.Guthro@citrix.com> Signed-off-by: Jan Beulich <jbeulich@suse.com> --- a/xen/arch/x86/i387.c +++ b/xen/arch/x86/i387.c @@ -55,28 +55,53 @@ static inline void fpu_fxrstor(struct vc * possibility, which may occur if the block was passed to us by control * tools or through VCPUOP_initialise, by silently clearing the block. */ - asm volatile ( - /* See above for why the operands/constraints are this way. */ - "1: " REX64_PREFIX "fxrstor (%2)\n" - ".section .fixup,\"ax\" \n" - "2: push %%"__OP"ax \n" - " push %%"__OP"cx \n" - " push %%"__OP"di \n" - " lea %0,%%"__OP"di \n" - " mov %1,%%ecx \n" - " xor %%eax,%%eax \n" - " rep ; stosl \n" - " pop %%"__OP"di \n" - " pop %%"__OP"cx \n" - " pop %%"__OP"ax \n" - " jmp 1b \n" - ".previous \n" - _ASM_EXTABLE(1b, 2b) - : - : "m" (*fpu_ctxt), - "i" (sizeof(v->arch.xsave_area->fpu_sse)/4) - ,"cdaSDb" (fpu_ctxt) - ); + switch ( __builtin_expect(fpu_ctxt[FPU_WORD_SIZE_OFFSET], 8) ) + { + default: + asm volatile ( + /* See below for why the operands/constraints are this way. */ + "1: " REX64_PREFIX "fxrstor (%2)\n" + ".section .fixup,\"ax\" \n" + "2: push %%"__OP"ax \n" + " push %%"__OP"cx \n" + " push %%"__OP"di \n" + " mov %2,%%"__OP"di \n" + " mov %1,%%ecx \n" + " xor %%eax,%%eax \n" + " rep ; stosl \n" + " pop %%"__OP"di \n" + " pop %%"__OP"cx \n" + " pop %%"__OP"ax \n" + " jmp 1b \n" + ".previous \n" + _ASM_EXTABLE(1b, 2b) + : + : "m" (*fpu_ctxt), + "i" (sizeof(v->arch.xsave_area->fpu_sse) / 4), + "cdaSDb" (fpu_ctxt) ); + break; + case 4: case 2: + asm volatile ( + "1: fxrstor %0 \n" + ".section .fixup,\"ax\"\n" + "2: push %%"__OP"ax \n" + " push %%"__OP"cx \n" + " push %%"__OP"di \n" + " lea %0,%%"__OP"di \n" + " mov %1,%%ecx \n" + " xor %%eax,%%eax \n" + " rep ; stosl \n" + " pop %%"__OP"di \n" + " pop %%"__OP"cx \n" + " pop %%"__OP"ax \n" + " jmp 1b \n" + ".previous \n" + _ASM_EXTABLE(1b, 2b) + : + : "m" (*fpu_ctxt), + "i" (sizeof(v->arch.xsave_area->fpu_sse) / 4) ); + break; + } } /* Restore x87 extended state */ @@ -104,19 +129,49 @@ static inline void fpu_xsave(struct vcpu /* Save x87 FPU, MMX, SSE and SSE2 state */ static inline void fpu_fxsave(struct vcpu *v) { - char *fpu_ctxt = v->arch.fpu_ctxt; + typeof(v->arch.xsave_area->fpu_sse) *fpu_ctxt = v->arch.fpu_ctxt; + int word_size = cpu_has_fpu_sel ? 8 : 0; - /* - * The only way to force fxsaveq on a wide range of gas versions. On - * older versions the rex64 prefix works only if we force an - * addressing mode that doesn''t require extended registers. - */ - asm volatile ( - REX64_PREFIX "fxsave (%1)" - : "=m" (*fpu_ctxt) : "cdaSDb" (fpu_ctxt) ); + if ( !is_pv_32bit_vcpu(v) ) + { + /* + * The only way to force fxsaveq on a wide range of gas versions. + * On older versions the rex64 prefix works only if we force an + * addressing mode that doesn''t require extended registers. + */ + asm volatile ( REX64_PREFIX "fxsave (%1)" + : "=m" (*fpu_ctxt) : "cdaSDb" (fpu_ctxt) ); + + /* + * AMD CPUs don''t save/restore FDP/FIP/FOP unless an exception + * is pending. + */ + if ( !(fpu_ctxt->fsw & 0x0080) && + boot_cpu_data.x86_vendor == X86_VENDOR_AMD ) + word_size = -1; + + if ( word_size > 0 && + !((fpu_ctxt->fip.addr | fpu_ctxt->fdp.addr) >> 32) ) + { + struct ix87_env fpu_env; + + asm volatile ( "fnstenv %0" : "=m" (fpu_env) ); + fpu_ctxt->fip.sel = fpu_env.fcs; + fpu_ctxt->fdp.sel = fpu_env.fds; + word_size = 4; + } + } + else + { + asm volatile ( "fxsave %0" : "=m" (*fpu_ctxt) ); + word_size = 4; + } + + if ( word_size >= 0 ) + fpu_ctxt->x[FPU_WORD_SIZE_OFFSET] = word_size; /* Clear exception flags if FSW.ES is set. */ - if ( unlikely(fpu_ctxt[2] & 0x80) ) + if ( unlikely(fpu_ctxt->fsw & 0x0080) ) asm volatile ("fnclex"); /* --- a/xen/arch/x86/xstate.c +++ b/xen/arch/x86/xstate.c @@ -10,6 +10,7 @@ #include <asm/current.h> #include <asm/processor.h> #include <asm/hvm/support.h> +#include <asm/i387.h> #include <asm/xstate.h> #include <asm/asm_defns.h> @@ -56,26 +57,59 @@ void xsave(struct vcpu *v, uint64_t mask struct xsave_struct *ptr = v->arch.xsave_area; uint32_t hmask = mask >> 32; uint32_t lmask = mask; + int word_size = mask & XSTATE_FP ? (cpu_has_fpu_sel ? 8 : 0) : -1; - if ( cpu_has_xsaveopt ) - asm volatile ( - ".byte " REX_PREFIX "0x0f,0xae,0x37" - : - : "a" (lmask), "d" (hmask), "D"(ptr) - : "memory" ); + if ( word_size <= 0 || !is_pv_32bit_vcpu(v) ) + { + if ( cpu_has_xsaveopt ) + asm volatile ( ".byte 0x48,0x0f,0xae,0x37" + : "=m" (*ptr) + : "a" (lmask), "d" (hmask), "D" (ptr) ); + else + asm volatile ( ".byte 0x48,0x0f,0xae,0x27" + : "=m" (*ptr) + : "a" (lmask), "d" (hmask), "D" (ptr) ); + + if ( !(mask & ptr->xsave_hdr.xstate_bv & XSTATE_FP) || + /* + * AMD CPUs don''t save/restore FDP/FIP/FOP unless an exception + * is pending. + */ + (!(ptr->fpu_sse.fsw & 0x0080) && + boot_cpu_data.x86_vendor == X86_VENDOR_AMD) ) + return; + + if ( word_size > 0 && + !((ptr->fpu_sse.fip.addr | ptr->fpu_sse.fdp.addr) >> 32) ) + { + struct ix87_env fpu_env; + + asm volatile ( "fnstenv %0" : "=m" (fpu_env) ); + ptr->fpu_sse.fip.sel = fpu_env.fcs; + ptr->fpu_sse.fdp.sel = fpu_env.fds; + word_size = 4; + } + } else - asm volatile ( - ".byte " REX_PREFIX "0x0f,0xae,0x27" - : - : "a" (lmask), "d" (hmask), "D"(ptr) - : "memory" ); + { + if ( cpu_has_xsaveopt ) + asm volatile ( ".byte 0x0f,0xae,0x37" + : "=m" (*ptr) + : "a" (lmask), "d" (hmask), "D" (ptr) ); + else + asm volatile ( ".byte 0x0f,0xae,0x27" + : "=m" (*ptr) + : "a" (lmask), "d" (hmask), "D" (ptr) ); + word_size = 4; + } + if ( word_size >= 0 ) + ptr->fpu_sse.x[FPU_WORD_SIZE_OFFSET] = word_size; } void xrstor(struct vcpu *v, uint64_t mask) { uint32_t hmask = mask >> 32; uint32_t lmask = mask; - struct xsave_struct *ptr = v->arch.xsave_area; /* @@ -98,20 +132,41 @@ void xrstor(struct vcpu *v, uint64_t mas * possibility, which may occur if the block was passed to us by control * tools or through VCPUOP_initialise, by silently clearing the block. */ - asm volatile ( "1: .byte " REX_PREFIX "0x0f,0xae,0x2f\n" - ".section .fixup,\"ax\"\n" - "2: mov %5,%%ecx \n" - " xor %1,%1 \n" - " rep stosb \n" - " lea %2,%0 \n" - " mov %3,%1 \n" - " jmp 1b \n" - ".previous \n" - _ASM_EXTABLE(1b, 2b) - : "+&D" (ptr), "+&a" (lmask) - : "m" (*ptr), "g" (lmask), "d" (hmask), - "m" (xsave_cntxt_size) - : "ecx" ); + switch ( __builtin_expect(ptr->fpu_sse.x[FPU_WORD_SIZE_OFFSET], 8) ) + { + default: + asm volatile ( "1: .byte 0x48,0x0f,0xae,0x2f\n" + ".section .fixup,\"ax\" \n" + "2: mov %5,%%ecx \n" + " xor %1,%1 \n" + " rep stosb \n" + " lea %2,%0 \n" + " mov %3,%1 \n" + " jmp 1b \n" + ".previous \n" + _ASM_EXTABLE(1b, 2b) + : "+&D" (ptr), "+&a" (lmask) + : "m" (*ptr), "g" (lmask), "d" (hmask), + "m" (xsave_cntxt_size) + : "ecx" ); + break; + case 4: case 2: + asm volatile ( "1: .byte 0x0f,0xae,0x2f\n" + ".section .fixup,\"ax\" \n" + "2: mov %5,%%ecx \n" + " xor %1,%1 \n" + " rep stosb \n" + " lea %2,%0 \n" + " mov %3,%1 \n" + " jmp 1b \n" + ".previous \n" + _ASM_EXTABLE(1b, 2b) + : "+&D" (ptr), "+&a" (lmask) + : "m" (*ptr), "g" (lmask), "d" (hmask), + "m" (xsave_cntxt_size) + : "ecx" ); + break; + } } bool_t xsave_enabled(const struct vcpu *v) --- a/xen/include/asm-x86/cpufeature.h +++ b/xen/include/asm-x86/cpufeature.h @@ -147,6 +147,7 @@ #define X86_FEATURE_ERMS (7*32+ 9) /* Enhanced REP MOVSB/STOSB */ #define X86_FEATURE_INVPCID (7*32+10) /* Invalidate Process Context ID */ #define X86_FEATURE_RTM (7*32+11) /* Restricted Transactional Memory */ +#define X86_FEATURE_NO_FPU_SEL (7*32+13) /* FPU CS/DS stored as zero */ #define cpu_has(c, bit) test_bit(bit, (c)->x86_capability) #define boot_cpu_has(bit) test_bit(bit, boot_cpu_data.x86_capability) @@ -183,6 +184,7 @@ #define cpu_has_fsgsbase boot_cpu_has(X86_FEATURE_FSGSBASE) #define cpu_has_smep boot_cpu_has(X86_FEATURE_SMEP) +#define cpu_has_fpu_sel (!boot_cpu_has(X86_FEATURE_NO_FPU_SEL)) #define cpu_has_ffxsr ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) \ && boot_cpu_has(X86_FEATURE_FFXSR)) --- a/xen/include/asm-x86/i387.h +++ b/xen/include/asm-x86/i387.h @@ -14,6 +14,27 @@ #include <xen/types.h> #include <xen/percpu.h> +/* Byte offset of the stored word size within the FXSAVE area/portion. */ +#define FPU_WORD_SIZE_OFFSET 511 + +struct ix87_state { + struct ix87_env { + uint16_t fcw, _res0; + uint16_t fsw, _res1; + uint16_t ftw, _res2; + uint32_t fip; + uint16_t fcs; + uint16_t fop; + uint32_t fdp; + uint16_t fds, _res6; + } env; + struct ix87_reg { + uint64_t mantissa; + uint16_t exponent:15; + uint16_t sign:1; + } __attribute__((__packed__)) r[8]; +}; + void vcpu_restore_fpu_eager(struct vcpu *v); void vcpu_restore_fpu_lazy(struct vcpu *v); void vcpu_save_fpu(struct vcpu *v); --- a/xen/include/asm-x86/xstate.h +++ b/xen/include/asm-x86/xstate.h @@ -34,8 +34,6 @@ #define XSTATE_NONLAZY (XSTATE_LWP) #define XSTATE_LAZY (XSTATE_ALL & ~XSTATE_NONLAZY) -#define REX_PREFIX "0x48, " - /* extended state variables */ DECLARE_PER_CPU(uint64_t, xcr0); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
- both VMX and SVM ignored the ECX input to XSETBV - both SVM and VMX used the full 64-bit RAX when calculating the input mask to XSETBV - faults on XSETBV did not get recovered from Also consolidate the handling for PV and HVM into a single function, and make the per-CPU variable "xcr0" static to xstate.c. Signed-off-by: Jan Beulich <jbeulich@suse.com> --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1347,8 +1347,9 @@ static void __context_switch(void) if ( !is_idle_vcpu(n) ) { memcpy(stack_regs, &n->arch.user_regs, CTXT_SWITCH_STACK_BYTES); - if ( xsave_enabled(n) && n->arch.xcr0 != get_xcr0() ) - set_xcr0(n->arch.xcr0); + if ( xsave_enabled(n) && n->arch.xcr0 != get_xcr0() && + !set_xcr0(n->arch.xcr0) ) + BUG(); vcpu_restore_fpu_eager(n); n->arch.ctxt_switch_to(n); } --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -1502,25 +1502,17 @@ out: return rc; } -int hvm_handle_xsetbv(u64 new_bv) +int hvm_handle_xsetbv(u32 index, u64 new_bv) { - struct vcpu *v = current; struct segment_register sreg; - hvm_get_segment_register(v, x86_seg_ss, &sreg); + hvm_get_segment_register(current, x86_seg_ss, &sreg); if ( sreg.attr.fields.dpl != 0 ) goto err; - if ( ((new_bv ^ xfeature_mask) & ~xfeature_mask) || !(new_bv & 1) ) - goto err; - - if ( (xfeature_mask & XSTATE_YMM & new_bv) && !(new_bv & XSTATE_SSE) ) + if ( handle_xsetbv(index, new_bv) ) goto err; - v->arch.xcr0 = new_bv; - v->arch.xcr0_accum |= new_bv; - set_xcr0(new_bv); - return 0; err: hvm_inject_hw_exception(TRAP_gp_fault, 0); --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -2367,7 +2367,8 @@ void svm_vmexit_handler(struct cpu_user_ case VMEXIT_XSETBV: if ( (inst_len = __get_instruction_length(current, INSTR_XSETBV))==0 ) break; - if ( hvm_handle_xsetbv((((u64)regs->edx) << 32) | regs->eax) == 0 ) + if ( hvm_handle_xsetbv(regs->ecx, + (regs->rdx << 32) | regs->_eax) == 0 ) __update_guest_eip(regs, inst_len); break; --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -2827,12 +2827,10 @@ void vmx_vmexit_handler(struct cpu_user_ break; case EXIT_REASON_XSETBV: - { - u64 new_bv = (((u64)regs->edx) << 32) | regs->eax; - if ( hvm_handle_xsetbv(new_bv) == 0 ) + if ( hvm_handle_xsetbv(regs->ecx, + (regs->rdx << 32) | regs->_eax) == 0 ) update_guest_eip(); /* Safe: XSETBV */ break; - } case EXIT_REASON_APIC_WRITE: if ( vmx_handle_apic_write() ) --- a/xen/arch/x86/i387.c +++ b/xen/arch/x86/i387.c @@ -36,13 +36,17 @@ static void fpu_init(void) /* Restore x87 extended state */ static inline void fpu_xrstor(struct vcpu *v, uint64_t mask) { + bool_t ok; + /* * XCR0 normally represents what guest OS set. In case of Xen itself, * we set all supported feature mask before doing save/restore. */ - set_xcr0(v->arch.xcr0_accum); + ok = set_xcr0(v->arch.xcr0_accum); + ASSERT(ok); xrstor(v, mask); - set_xcr0(v->arch.xcr0); + ok = set_xcr0(v->arch.xcr0); + ASSERT(ok); } /* Restor x87 FPU, MMX, SSE and SSE2 state */ @@ -118,12 +122,16 @@ static inline void fpu_frstor(struct vcp /* Save x87 extended state */ static inline void fpu_xsave(struct vcpu *v) { + bool_t ok; + /* XCR0 normally represents what guest OS set. In case of Xen itself, * we set all accumulated feature mask before doing save/restore. */ - set_xcr0(v->arch.xcr0_accum); + ok = set_xcr0(v->arch.xcr0_accum); + ASSERT(ok); xsave(v, v->arch.nonlazy_xstate_used ? XSTATE_ALL : XSTATE_LAZY); - set_xcr0(v->arch.xcr0); + ok = set_xcr0(v->arch.xcr0); + ASSERT(ok); } /* Save x87 FPU, MMX, SSE and SSE2 state */ --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -2198,25 +2198,9 @@ static int emulate_privileged_op(struct if ( !guest_kernel_mode(v, regs) ) goto fail; - switch ( (u32)regs->ecx ) - { - case XCR_XFEATURE_ENABLED_MASK: - /* bit 0 of XCR0 must be set and reserved bit must not be set */ - if ( !(new_xfeature & XSTATE_FP) || (new_xfeature & ~xfeature_mask) ) - goto fail; - - /* YMM state takes SSE state as prerequisite. */ - if ( (xfeature_mask & new_xfeature & XSTATE_YMM) && - !(new_xfeature & XSTATE_SSE) ) - goto fail; - - v->arch.xcr0 = new_xfeature; - v->arch.xcr0_accum |= new_xfeature; - set_xcr0(new_xfeature); - break; - default: - goto fail; - } + if ( handle_xsetbv(regs->ecx, new_xfeature) ) + goto fail; + break; } default: --- a/xen/arch/x86/xstate.c +++ b/xen/arch/x86/xstate.c @@ -5,7 +5,7 @@ * */ -#include <xen/config.h> +#include <xen/percpu.h> #include <xen/sched.h> #include <asm/current.h> #include <asm/processor.h> @@ -27,24 +27,34 @@ u32 xsave_cntxt_size; u64 xfeature_mask; /* Cached xcr0 for fast read */ -DEFINE_PER_CPU(uint64_t, xcr0); +static DEFINE_PER_CPU(uint64_t, xcr0); /* Because XCR0 is cached for each CPU, xsetbv() is not exposed. Users should * use set_xcr0() instead. */ -static inline void xsetbv(u32 index, u64 xfeatures) +static inline bool_t xsetbv(u32 index, u64 xfeatures) { u32 hi = xfeatures >> 32; u32 lo = (u32)xfeatures; - asm volatile (".byte 0x0f,0x01,0xd1" :: "c" (index), - "a" (lo), "d" (hi)); + asm volatile ( "1: .byte 0x0f,0x01,0xd1\n" + "3: \n" + ".section .fixup,\"ax\" \n" + "2: xor %0,%0 \n" + " jmp 3b \n" + ".previous \n" + _ASM_EXTABLE(1b, 2b) + : "+a" (lo) + : "c" (index), "d" (hi)); + return lo != 0; } -void set_xcr0(u64 xfeatures) +bool_t set_xcr0(u64 xfeatures) { + if ( !xsetbv(XCR_XFEATURE_ENABLED_MASK, xfeatures) ) + return 0; this_cpu(xcr0) = xfeatures; - xsetbv(XCR_XFEATURE_ENABLED_MASK, xfeatures); + return 1; } uint64_t get_xcr0(void) @@ -236,7 +246,8 @@ void xstate_init(void) * Set CR4_OSXSAVE and run "cpuid" to get xsave_cntxt_size. */ set_in_cr4(X86_CR4_OSXSAVE); - set_xcr0((((u64)edx << 32) | eax) & XCNTXT_MASK); + if ( !set_xcr0((((u64)edx << 32) | eax) & XCNTXT_MASK) ) + BUG(); cpuid_count(XSTATE_CPUID, 0, &eax, &ebx, &ecx, &edx); if ( cpu == 0 ) @@ -262,6 +273,28 @@ void xstate_init(void) } } +int handle_xsetbv(u32 index, u64 new_bv) +{ + struct vcpu *curr = current; + + if ( index != XCR_XFEATURE_ENABLED_MASK ) + return -EOPNOTSUPP; + + if ( (new_bv & ~xfeature_mask) || !(new_bv & XSTATE_FP) ) + return -EINVAL; + + if ( (new_bv & XSTATE_YMM) && !(new_bv & XSTATE_SSE) ) + return -EINVAL; + + if ( !set_xcr0(new_bv) ) + return -EFAULT; + + curr->arch.xcr0 = new_bv; + curr->arch.xcr0_accum |= new_bv; + + return 0; +} + /* * Local variables: * mode: C --- a/xen/include/asm-x86/hvm/support.h +++ b/xen/include/asm-x86/hvm/support.h @@ -128,7 +128,7 @@ void hvm_triple_fault(void); void hvm_rdtsc_intercept(struct cpu_user_regs *regs); -int hvm_handle_xsetbv(u64 new_bv); +int __must_check hvm_handle_xsetbv(u32 index, u64 new_bv); /* These functions all return X86EMUL return codes. */ int hvm_set_efer(uint64_t value); --- a/xen/include/asm-x86/xstate.h +++ b/xen/include/asm-x86/xstate.h @@ -9,7 +9,6 @@ #define __ASM_XSTATE_H #include <xen/types.h> -#include <xen/percpu.h> #define FCW_DEFAULT 0x037f #define MXCSR_DEFAULT 0x1f80 @@ -34,9 +33,6 @@ #define XSTATE_NONLAZY (XSTATE_LWP) #define XSTATE_LAZY (XSTATE_ALL & ~XSTATE_NONLAZY) -/* extended state variables */ -DECLARE_PER_CPU(uint64_t, xcr0); - extern unsigned int xsave_cntxt_size; extern u64 xfeature_mask; @@ -75,11 +71,12 @@ struct xsave_struct } __attribute__ ((packed, aligned (64))); /* extended state operations */ -void set_xcr0(u64 xfeatures); +bool_t __must_check set_xcr0(u64 xfeatures); uint64_t get_xcr0(void); void xsave(struct vcpu *v, uint64_t mask); void xrstor(struct vcpu *v, uint64_t mask); bool_t xsave_enabled(const struct vcpu *v); +int __must_check handle_xsetbv(u32 index, u64 new_bv); /* extended state init and cleanup functions */ void xstate_free_save_area(struct vcpu *v); _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
The initial state for a vCPU is using default values, so there''s no need to force the XRSTOR to read the state from memory. This saves a couple of thousand restores from memory just during boot of Linux on my Sandy Bridge system (I didn''t try to make further measurements). The above requires that arch_set_info_guest() updates the state flags in the save area when valid floating point state got passed in, but that would really have been needed even before in case XSAVE{,OPT} decided to clear one or both of the FP and SSE bits. Furthermore, hvm_vcpu_reset_state() shouldn''t just clear out the FPU/ SSE area, but needs to re-initialized MXCSR and FCW. Signed-off-by: Jan Beulich <jbeulich@suse.com> --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -712,7 +712,11 @@ int arch_set_info_guest( v->arch.vgc_flags = flags; if ( flags & VGCF_I387_VALID ) + { memcpy(v->arch.fpu_ctxt, &c.nat->fpu_ctxt, sizeof(c.nat->fpu_ctxt)); + if ( v->arch.xsave_area ) + v->arch.xsave_area->xsave_hdr.xstate_bv = XSTATE_FP_SSE; + } if ( !compat ) { --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -3454,6 +3454,7 @@ void hvm_vcpu_reset_state(struct vcpu *v { struct domain *d = v->domain; struct segment_register reg; + typeof(v->arch.xsave_area->fpu_sse) *fpu_ctxt = v->arch.fpu_ctxt; domain_lock(d); @@ -3467,7 +3468,12 @@ void hvm_vcpu_reset_state(struct vcpu *v v->arch.guest_table = pagetable_null(); } - memset(v->arch.fpu_ctxt, 0, sizeof(v->arch.xsave_area->fpu_sse)); + memset(fpu_ctxt, 0, sizeof(*fpu_ctxt)); + fpu_ctxt->fcw = FCW_RESET; + fpu_ctxt->mxcsr = MXCSR_DEFAULT; + if ( v->arch.xsave_area ) + v->arch.xsave_area->xsave_hdr.xstate_bv = XSTATE_FP; + v->arch.vgc_flags = VGCF_online; memset(&v->arch.user_regs, 0, sizeof(v->arch.user_regs)); v->arch.user_regs.eflags = 2; --- a/xen/arch/x86/xstate.c +++ b/xen/arch/x86/xstate.c @@ -204,9 +204,13 @@ int xstate_alloc_save_area(struct vcpu * if ( save_area == NULL ) return -ENOMEM; + /* + * Set the memory image to default values, but don''t force the context + * to be loaded from memory (i.e. keep save_area->xsave_hdr.xstate_bv + * clear). + */ save_area->fpu_sse.fcw = FCW_DEFAULT; save_area->fpu_sse.mxcsr = MXCSR_DEFAULT; - save_area->xsave_hdr.xstate_bv = XSTATE_FP_SSE; v->arch.xsave_area = save_area; v->arch.xcr0 = XSTATE_FP_SSE; --- a/xen/include/asm-x86/xstate.h +++ b/xen/include/asm-x86/xstate.h @@ -11,6 +11,7 @@ #include <xen/types.h> #define FCW_DEFAULT 0x037f +#define FCW_RESET 0x0040 #define MXCSR_DEFAULT 0x1f80 #define XSTATE_CPUID 0x0000000d _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Jan Beulich
2013-Jun-04 08:10 UTC
[PATCH 4/4] x86/fxsave: bring in line with recent xsave adjustments
Defer the FIP/FDP pointer reset needed on AMD CPUs to the restore path, and switch from using EMMS to FFREE here too (to be resistant against eventual future CPUs without MMX support). Also switch from using an almost typeless pointer in fpu_fxrstor() to a properly typed one, thus telling the compiler the truth about which memory gets accessed. Signed-off-by: Jan Beulich <jbeulich@suse.com> --- a/xen/arch/x86/i387.c +++ b/xen/arch/x86/i387.c @@ -52,14 +52,30 @@ static inline void fpu_xrstor(struct vcp /* Restor x87 FPU, MMX, SSE and SSE2 state */ static inline void fpu_fxrstor(struct vcpu *v) { - const char *fpu_ctxt = v->arch.fpu_ctxt; + const typeof(v->arch.xsave_area->fpu_sse) *fpu_ctxt = v->arch.fpu_ctxt; + + /* + * AMD CPUs don''t save/restore FDP/FIP/FOP unless an exception + * is pending. Clear the x87 state here by setting it to fixed + * values. The hypervisor data segment can be sometimes 0 and + * sometimes new user value. Both should be ok. Use the FPU saved + * data block as a safe address because it should be in L1. + */ + if ( !(fpu_ctxt->fsw & 0x0080) && + boot_cpu_data.x86_vendor == X86_VENDOR_AMD ) + { + asm volatile ( "fnclex\n\t" + "ffree %%st(7)\n\t" /* clear stack tag */ + "fildl %0" /* load to clear state */ + : : "m" (*fpu_ctxt) ); + } /* * FXRSTOR can fault if passed a corrupted data block. We handle this * possibility, which may occur if the block was passed to us by control * tools or through VCPUOP_initialise, by silently clearing the block. */ - switch ( __builtin_expect(fpu_ctxt[FPU_WORD_SIZE_OFFSET], 8) ) + switch ( __builtin_expect(fpu_ctxt->x[FPU_WORD_SIZE_OFFSET], 8) ) { default: asm volatile ( @@ -80,8 +96,7 @@ static inline void fpu_fxrstor(struct vc ".previous \n" _ASM_EXTABLE(1b, 2b) : - : "m" (*fpu_ctxt), - "i" (sizeof(v->arch.xsave_area->fpu_sse) / 4), + : "m" (*fpu_ctxt), "i" (sizeof(*fpu_ctxt) / 4), "cdaSDb" (fpu_ctxt) ); break; case 4: case 2: @@ -102,8 +117,7 @@ static inline void fpu_fxrstor(struct vc ".previous \n" _ASM_EXTABLE(1b, 2b) : - : "m" (*fpu_ctxt), - "i" (sizeof(v->arch.xsave_area->fpu_sse) / 4) ); + : "m" (*fpu_ctxt), "i" (sizeof(*fpu_ctxt) / 4) ); break; } } @@ -156,7 +170,7 @@ static inline void fpu_fxsave(struct vcp */ if ( !(fpu_ctxt->fsw & 0x0080) && boot_cpu_data.x86_vendor == X86_VENDOR_AMD ) - word_size = -1; + return; if ( word_size > 0 && !((fpu_ctxt->fip.addr | fpu_ctxt->fdp.addr) >> 32) ) @@ -177,25 +191,6 @@ static inline void fpu_fxsave(struct vcp if ( word_size >= 0 ) fpu_ctxt->x[FPU_WORD_SIZE_OFFSET] = word_size; - - /* Clear exception flags if FSW.ES is set. */ - if ( unlikely(fpu_ctxt->fsw & 0x0080) ) - asm volatile ("fnclex"); - - /* - * AMD CPUs don''t save/restore FDP/FIP/FOP unless an exception - * is pending. Clear the x87 state here by setting it to fixed - * values. The hypervisor data segment can be sometimes 0 and - * sometimes new user value. Both should be ok. Use the FPU saved - * data block as a safe address because it should be in L1. - */ - if ( boot_cpu_data.x86_vendor == X86_VENDOR_AMD ) - { - asm volatile ( - "emms\n\t" /* clear stack tags */ - "fildl %0" /* load to clear state */ - : : "m" (*fpu_ctxt) ); - } } /* Save x87 FPU state */ _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
On 04/06/2013 08:58, "Jan Beulich" <JBeulich@suse.com> wrote:> The first patch really isn''t as much of a follow-up than what triggered > the security issues to be noticed in the first place. > > 1: x86: preserve FPU selectors for 32-bit guest code > 2: x86: fix XCR0 handling > 3: x86/xsave: adjust state management > 4: x86/fxsave: bring in line with recent xsave adjustments > > The first two I would see as candidates for 4.3 (as well as > subsequent backporting, albeit I realize that especially the first > one is non-trivial), while the third is code improvement only, > and the fourth is really just cleanup, and hence I''d be fine with > deferring them until after 4.3. > > Signed-off-by: Jan Beulich <jbeulich@suse.com>I like the patches, 1 & 2 are good bug fixes. Acked-by: Keir Fraser <keir@xen.org> Patch #1 is quite scary though! I wonder really whether these long-lived issues must be fixed right now, let alone backported? -- Keir
On 06/04/2013 11:00 AM, Keir Fraser wrote:> On 04/06/2013 08:58, "Jan Beulich" <JBeulich@suse.com> wrote: > >> The first patch really isn''t as much of a follow-up than what triggered >> the security issues to be noticed in the first place. >> >> 1: x86: preserve FPU selectors for 32-bit guest code >> 2: x86: fix XCR0 handling >> 3: x86/xsave: adjust state management >> 4: x86/fxsave: bring in line with recent xsave adjustments >> >> The first two I would see as candidates for 4.3 (as well as >> subsequent backporting, albeit I realize that especially the first >> one is non-trivial), while the third is code improvement only, >> and the fourth is really just cleanup, and hence I''d be fine with >> deferring them until after 4.3. >> >> Signed-off-by: Jan Beulich <jbeulich@suse.com> > > I like the patches, 1 & 2 are good bug fixes. > Acked-by: Keir Fraser <keir@xen.org> > > Patch #1 is quite scary though! I wonder really whether these long-lived > issues must be fixed right now, let alone backported?Yeah, I was going to say, with all this tricky code going in, including this one, and the XSA-55 (?) one that seems to have tons of tricky changes, whether it might not be a good idea to make sure we have at least 2 weeks of testing and another test day -- or, delay the test day Wednesday until we can get all of these in. Jan, looking at the comments, it seems like 3 and 4 are more about performance than correctness? I think those should probably wait until the 4.4 dev window opens up. -George
>>> On 04.06.13 at 12:00, Keir Fraser <keir.xen@gmail.com> wrote: > On 04/06/2013 08:58, "Jan Beulich" <JBeulich@suse.com> wrote: > >> The first patch really isn''t as much of a follow-up than what triggered >> the security issues to be noticed in the first place. >> >> 1: x86: preserve FPU selectors for 32-bit guest code >> 2: x86: fix XCR0 handling >> 3: x86/xsave: adjust state management >> 4: x86/fxsave: bring in line with recent xsave adjustments >> >> The first two I would see as candidates for 4.3 (as well as >> subsequent backporting, albeit I realize that especially the first >> one is non-trivial), while the third is code improvement only, >> and the fourth is really just cleanup, and hence I''d be fine with >> deferring them until after 4.3. >> >> Signed-off-by: Jan Beulich <jbeulich@suse.com> > > I like the patches, 1 & 2 are good bug fixes. > Acked-by: Keir Fraser <keir@xen.org> > > Patch #1 is quite scary though! I wonder really whether these long-lived > issues must be fixed right now, let alone backported?You can also view this as "finally we get these long standing issues fixed" - I was aware of the selector corruption since back when dealing with the original FXSAVE information leak, just that no-one had an issue with this in practice. Ben, who originally initiated this work, might want to give an estimation on how important these fixes are for their practical purposes. With that, I''ll then defer to George for whether to pull them in right now or only once branched. Jan
>>> On 04.06.13 at 12:09, George Dunlap <george.dunlap@eu.citrix.com> wrote: > On 06/04/2013 11:00 AM, Keir Fraser wrote: >> On 04/06/2013 08:58, "Jan Beulich" <JBeulich@suse.com> wrote: >> >>> The first patch really isn''t as much of a follow-up than what triggered >>> the security issues to be noticed in the first place. >>> >>> 1: x86: preserve FPU selectors for 32-bit guest code >>> 2: x86: fix XCR0 handling >>> 3: x86/xsave: adjust state management >>> 4: x86/fxsave: bring in line with recent xsave adjustments >>> >>> The first two I would see as candidates for 4.3 (as well as >>> subsequent backporting, albeit I realize that especially the first >>> one is non-trivial), while the third is code improvement only, >>> and the fourth is really just cleanup, and hence I''d be fine with >>> deferring them until after 4.3. >>> >>> Signed-off-by: Jan Beulich <jbeulich@suse.com> >> >> I like the patches, 1 & 2 are good bug fixes. >> Acked-by: Keir Fraser <keir@xen.org> >> >> Patch #1 is quite scary though! I wonder really whether these long-lived >> issues must be fixed right now, let alone backported? > > Yeah, I was going to say, with all this tricky code going in, including > this one, and the XSA-55 (?) one that seems to have tons of tricky > changes, whether it might not be a good idea to make sure we have at > least 2 weeks of testing and another test day -- or, delay the test day > Wednesday until we can get all of these in.Agreed, but I don''t know what implications delaying a Test Day would have. We certainly don''t want to release in a rush with all these new fixes.> Jan, looking at the comments, it seems like 3 and 4 are more about > performance than correctness? I think those should probably wait until > the 4.4 dev window opens up.Yes, as I also said in the overview description above. The question is really just about the first two to go in right away. Jan
On 06/04/2013 11:45 AM, Jan Beulich wrote:>>>> On 04.06.13 at 12:09, George Dunlap <george.dunlap@eu.citrix.com> wrote: >> On 06/04/2013 11:00 AM, Keir Fraser wrote: >>> On 04/06/2013 08:58, "Jan Beulich" <JBeulich@suse.com> wrote: >>> >>>> The first patch really isn''t as much of a follow-up than what triggered >>>> the security issues to be noticed in the first place. >>>> >>>> 1: x86: preserve FPU selectors for 32-bit guest code >>>> 2: x86: fix XCR0 handling >>>> 3: x86/xsave: adjust state management >>>> 4: x86/fxsave: bring in line with recent xsave adjustments >>>> >>>> The first two I would see as candidates for 4.3 (as well as >>>> subsequent backporting, albeit I realize that especially the first >>>> one is non-trivial), while the third is code improvement only, >>>> and the fourth is really just cleanup, and hence I''d be fine with >>>> deferring them until after 4.3. >>>> >>>> Signed-off-by: Jan Beulich <jbeulich@suse.com> >>> >>> I like the patches, 1 & 2 are good bug fixes. >>> Acked-by: Keir Fraser <keir@xen.org> >>> >>> Patch #1 is quite scary though! I wonder really whether these long-lived >>> issues must be fixed right now, let alone backported? >> >> Yeah, I was going to say, with all this tricky code going in, including >> this one, and the XSA-55 (?) one that seems to have tons of tricky >> changes, whether it might not be a good idea to make sure we have at >> least 2 weeks of testing and another test day -- or, delay the test day >> Wednesday until we can get all of these in. > > Agreed, but I don''t know what implications delaying a Test Day > would have. We certainly don''t want to release in a rush with all > these new fixes. > >> Jan, looking at the comments, it seems like 3 and 4 are more about >> performance than correctness? I think those should probably wait until >> the 4.4 dev window opens up. > > Yes, as I also said in the overview description above. The question > is really just about the first two to go in right away.These seem pretty clearly like things we need to have fixed in the release -- they''re the kind of thing that is likely to have potentially nasty, hard-to-track-down side effects. -George
On 06/04/2013 06:32 AM, Jan Beulich wrote:>>>> On 04.06.13 at 12:00, Keir Fraser <keir.xen@gmail.com> wrote: >> On 04/06/2013 08:58, "Jan Beulich" <JBeulich@suse.com> wrote: >> >>> The first patch really isn''t as much of a follow-up than what triggered >>> the security issues to be noticed in the first place. >>> >>> 1: x86: preserve FPU selectors for 32-bit guest code >>> 2: x86: fix XCR0 handling >>> 3: x86/xsave: adjust state management >>> 4: x86/fxsave: bring in line with recent xsave adjustments >>> >>> The first two I would see as candidates for 4.3 (as well as >>> subsequent backporting, albeit I realize that especially the first >>> one is non-trivial), while the third is code improvement only, >>> and the fourth is really just cleanup, and hence I''d be fine with >>> deferring them until after 4.3. >>> >>> Signed-off-by: Jan Beulich <jbeulich@suse.com> >> >> I like the patches, 1 & 2 are good bug fixes. >> Acked-by: Keir Fraser <keir@xen.org> >> >> Patch #1 is quite scary though! I wonder really whether these long-lived >> issues must be fixed right now, let alone backported? > > You can also view this as "finally we get these long standing > issues fixed" - I was aware of the selector corruption since back > when dealing with the original FXSAVE information leak, just that > no-one had an issue with this in practice. > > Ben, who originally initiated this work, might want to give an > estimation on how important these fixes are for their practical > purposes. With that, I''ll then defer to George for whether to > pull them in right now or only once branched.This was more of an exercise of QA exercising of code paths uncovering a bug, rather than real-world usage implications. Microsoft ships something called the "driver verifier"[1] in every OS after XP, that can dynamically be turned on for any driver. When enabled for some Citrix PV drivers, they would bug-check (occasionally) during QA nightly test runs on 32bit OSes on this XSAVE/XRSTOR data mismatch. To the best of my knowledge, outside of this somewhat contrived scenario, we never saw this fail in any real-world usage. Therefore, I think the assessment of deferring until after 4.3 is prudent, given the release cycle stage. With my downstream hat on, it would certainly be nice to see these backported to 4.2 though, once in mainline, so we don''t have to carry a patch. However, we can always do so, if necessary. Ben [1] http://support.microsoft.com/kb/244617
>>> On 04.06.13 at 12:47, George Dunlap <george.dunlap@eu.citrix.com> wrote: > On 06/04/2013 11:45 AM, Jan Beulich wrote: >>>>> On 04.06.13 at 12:09, George Dunlap <george.dunlap@eu.citrix.com> wrote: >>> On 06/04/2013 11:00 AM, Keir Fraser wrote: >>>> On 04/06/2013 08:58, "Jan Beulich" <JBeulich@suse.com> wrote: >>>> >>>>> The first patch really isn''t as much of a follow-up than what triggered >>>>> the security issues to be noticed in the first place. >>>>> >>>>> 1: x86: preserve FPU selectors for 32-bit guest code >>>>> 2: x86: fix XCR0 handling >>>>> 3: x86/xsave: adjust state management >>>>> 4: x86/fxsave: bring in line with recent xsave adjustments >>>>> >>>>> The first two I would see as candidates for 4.3 (as well as >>>>> subsequent backporting, albeit I realize that especially the first >>>>> one is non-trivial), while the third is code improvement only, >>>>> and the fourth is really just cleanup, and hence I''d be fine with >>>>> deferring them until after 4.3. >>>>> >>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com> >>>> >>>> I like the patches, 1 & 2 are good bug fixes. >>>> Acked-by: Keir Fraser <keir@xen.org> >>>> >>>> Patch #1 is quite scary though! I wonder really whether these long-lived >>>> issues must be fixed right now, let alone backported? >>> >>> Yeah, I was going to say, with all this tricky code going in, including >>> this one, and the XSA-55 (?) one that seems to have tons of tricky >>> changes, whether it might not be a good idea to make sure we have at >>> least 2 weeks of testing and another test day -- or, delay the test day >>> Wednesday until we can get all of these in. >> >> Agreed, but I don''t know what implications delaying a Test Day >> would have. We certainly don''t want to release in a rush with all >> these new fixes. >> >>> Jan, looking at the comments, it seems like 3 and 4 are more about >>> performance than correctness? I think those should probably wait until >>> the 4.4 dev window opens up. >> >> Yes, as I also said in the overview description above. The question >> is really just about the first two to go in right away. > > These seem pretty clearly like things we need to have fixed in the > release -- they''re the kind of thing that is likely to have potentially > nasty, hard-to-track-down side effects.So together with Ben''s most recent reply - do I read the above as an ack for 4.3? Jan
On Tue, Jun 4, 2013 at 2:58 PM, Jan Beulich <JBeulich@suse.com> wrote:>>>> On 04.06.13 at 12:47, George Dunlap <george.dunlap@eu.citrix.com> wrote: >> On 06/04/2013 11:45 AM, Jan Beulich wrote: >>>>>> On 04.06.13 at 12:09, George Dunlap <george.dunlap@eu.citrix.com> wrote: >>>> On 06/04/2013 11:00 AM, Keir Fraser wrote: >>>>> On 04/06/2013 08:58, "Jan Beulich" <JBeulich@suse.com> wrote: >>>>> >>>>>> The first patch really isn''t as much of a follow-up than what triggered >>>>>> the security issues to be noticed in the first place. >>>>>> >>>>>> 1: x86: preserve FPU selectors for 32-bit guest code >>>>>> 2: x86: fix XCR0 handling >>>>>> 3: x86/xsave: adjust state management >>>>>> 4: x86/fxsave: bring in line with recent xsave adjustments >>>>>> >>>>>> The first two I would see as candidates for 4.3 (as well as >>>>>> subsequent backporting, albeit I realize that especially the first >>>>>> one is non-trivial), while the third is code improvement only, >>>>>> and the fourth is really just cleanup, and hence I''d be fine with >>>>>> deferring them until after 4.3. >>>>>> >>>>>> Signed-off-by: Jan Beulich <jbeulich@suse.com> >>>>> >>>>> I like the patches, 1 & 2 are good bug fixes. >>>>> Acked-by: Keir Fraser <keir@xen.org> >>>>> >>>>> Patch #1 is quite scary though! I wonder really whether these long-lived >>>>> issues must be fixed right now, let alone backported? >>>> >>>> Yeah, I was going to say, with all this tricky code going in, including >>>> this one, and the XSA-55 (?) one that seems to have tons of tricky >>>> changes, whether it might not be a good idea to make sure we have at >>>> least 2 weeks of testing and another test day -- or, delay the test day >>>> Wednesday until we can get all of these in. >>> >>> Agreed, but I don''t know what implications delaying a Test Day >>> would have. We certainly don''t want to release in a rush with all >>> these new fixes. >>> >>>> Jan, looking at the comments, it seems like 3 and 4 are more about >>>> performance than correctness? I think those should probably wait until >>>> the 4.4 dev window opens up. >>> >>> Yes, as I also said in the overview description above. The question >>> is really just about the first two to go in right away. >> >> These seem pretty clearly like things we need to have fixed in the >> release -- they''re the kind of thing that is likely to have potentially >> nasty, hard-to-track-down side effects. > > So together with Ben''s most recent reply - do I read the above as > an ack for 4.3?Yes; the two bug fixes, re the release: Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Possibly Parallel Threads
- [PATCH 2/5] X86 architecture instruction set extension definiation
- [PATCH 2/3 v2] XSAVE/XRSTOR: fix frozen states
- [PATCH 2/4 V2] X86: enable support for new ISA extensions
- [PATCH v3 64/75] x86/sev-es: Cache CPUID results for improved performance
- [PATCH v3 64/75] x86/sev-es: Cache CPUID results for improved performance