Liu, Jinsong
2013-Oct-30 16:07 UTC
[PATCH 4/4] XSA-60 security hole: flush cache when vmentry back to UC guest
From 159251a04afcdcd8ca08e9f2bdfae279b2aa5471 Mon Sep 17 00:00:00 2001
From: Liu Jinsong <jinsong.liu@intel.com>
Date: Thu, 31 Oct 2013 06:38:15 +0800
Subject: [PATCH 4/4] XSA-60 security hole: flush cache when vmentry back to
 UC guest

This patch flushes the cache when vmentry returns to a UC guest, to
prevent the cache being polluted by hypervisor accesses to guest memory
while the guest is in UC mode.

The elegant way to do this would be to simply add a wbinvd just before
vmentry. However, currently a wbinvd before vmentry mysteriously
triggers a lapic timer interrupt storm, hanging the boot stage for
10s ~ 60s. We have not yet found the root cause of the interrupt storm,
so for now this patch adds a flag indicating that the hypervisor has
accessed UC guest memory, to avoid the interrupt storm. Once the
interrupt storm is root-caused and fixed, the protection flag can be
removed.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
---
 xen/arch/x86/hvm/hvm.c        |    7 +++++++
 xen/arch/x86/hvm/vmx/vmx.c    |    7 +++++++
 xen/include/asm-x86/hvm/hvm.h |    1 +
 3 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index df021de..47eb18d 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -68,6 +68,7 @@
 #include <public/mem_event.h>
 
 bool_t __read_mostly hvm_enabled;
+bool_t __read_mostly hypervisor_access_uc_hvm_memory;
 
 unsigned int opt_hvm_debug_level __read_mostly;
 integer_param("hvm_debug", opt_hvm_debug_level);
@@ -2483,6 +2484,9 @@ static enum hvm_copy_result __hvm_copy(
         return HVMCOPY_unhandleable;
 #endif
 
+    if ( unlikely(curr->arch.hvm_vcpu.cache_mode == NO_FILL_CACHE_MODE) )
+        hypervisor_access_uc_hvm_memory = 1;
+
     while ( todo > 0 )
     {
         count = min_t(int, PAGE_SIZE - (addr & ~PAGE_MASK), todo);
@@ -2596,6 +2600,9 @@ static enum hvm_copy_result __hvm_clear(paddr_t addr, int size)
         return HVMCOPY_unhandleable;
 #endif
 
+    if ( unlikely(curr->arch.hvm_vcpu.cache_mode == NO_FILL_CACHE_MODE) )
+        hypervisor_access_uc_hvm_memory = 1;
+
     while ( todo > 0 )
     {
         count = min_t(int, PAGE_SIZE - (addr & ~PAGE_MASK), todo);
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index d846a9c..1cea5a3 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2974,6 +2974,13 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
     struct hvm_vcpu_asid *p_asid;
     bool_t need_flush;
 
+    /* In case the hypervisor accessed hvm memory while the guest was in UC mode. */
+    if ( unlikely(hypervisor_access_uc_hvm_memory) )
+    {
+        hypervisor_access_uc_hvm_memory = 0;
+        wbinvd();
+    }
+
     if ( !cpu_has_vmx_vpid )
         goto out;
     if ( nestedhvm_vcpu_in_guestmode(curr) )
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index c9afb56..c7ac6b8 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -197,6 +197,7 @@ struct hvm_function_table {
 
 extern struct hvm_function_table hvm_funcs;
 extern bool_t hvm_enabled;
+extern bool_t hypervisor_access_uc_hvm_memory;
 extern bool_t cpu_has_lmsl;
 extern s8 hvm_port80_allowed;
 
-- 
1.7.1
Jan Beulich
2013-Oct-30 16:39 UTC
Re: [PATCH 4/4] XSA-60 security hole: flush cache when vmentry back to UC guest
>>> On 30.10.13 at 17:07, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> From 159251a04afcdcd8ca08e9f2bdfae279b2aa5471 Mon Sep 17 00:00:00 2001
> From: Liu Jinsong <jinsong.liu@intel.com>
> Date: Thu, 31 Oct 2013 06:38:15 +0800
> Subject: [PATCH 4/4] XSA-60 security hole: flush cache when vmentry
> back to UC guest
>
> This patch flushes the cache when vmentry returns to a UC guest, to
> prevent the cache being polluted by hypervisor accesses to guest memory
> while the guest is in UC mode.
>
> The elegant way to do this would be to simply add a wbinvd just before
> vmentry. However, currently a wbinvd before vmentry mysteriously
> triggers a lapic timer interrupt storm, hanging the boot stage for
> 10s ~ 60s. We have not yet found the root cause of the interrupt storm,
> so for now this patch adds a flag indicating that the hypervisor has
> accessed UC guest memory, to avoid the interrupt storm. Once the
> interrupt storm is root-caused and fixed, the protection flag can be
> removed.

Yeah, almost, except that
- the flag should be per-vCPU
- you should mention in the description that this still leaves aspects
  un-addressed (speculative reads at least, and multi-vCPU issues, and
  I'm sure there are more that I didn't think of so far)

Jan

> Suggested-by: Jan Beulich <jbeulich@suse.com>
> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
> [...]
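For illustration, the per-vCPU variant Jan asks for might look like the
sketch below. The field hypervisor_access_uc_hvm_memory inside struct
hvm_vcpu is hypothetical, not part of the patch as posted; the other
identifiers reuse names from the patch above.

    /* Hypothetical per-vCPU flag added to struct hvm_vcpu
       (xen/include/asm-x86/hvm/vcpu.h), replacing the global: */
    bool_t hypervisor_access_uc_hvm_memory;

    /* In __hvm_copy() / __hvm_clear(): tag only the current vCPU. */
    if ( unlikely(curr->arch.hvm_vcpu.cache_mode == NO_FILL_CACHE_MODE) )
        curr->arch.hvm_vcpu.hypervisor_access_uc_hvm_memory = 1;

    /* In vmx_vmenter_helper(): test-and-clear this vCPU's flag, so the
       flush is done on the pCPU about to resume this vCPU. */
    if ( unlikely(curr->arch.hvm_vcpu.hypervisor_access_uc_hvm_memory) )
    {
        curr->arch.hvm_vcpu.hypervisor_access_uc_hvm_memory = 0;
        wbinvd();
    }

Among other things, this avoids one vCPU clearing a globally-set flag
on its own vmentry before another vCPU's pending flush has happened,
though, as noted above, other multi-vCPU aspects remain un-addressed.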
Liu, Jinsong
2013-Oct-30 17:22 UTC
Re: [PATCH 4/4] XSA-60 security hole: flush cache when vmentry back to UC guest
Jan Beulich wrote:
>>>> On 30.10.13 at 17:07, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
>> From 159251a04afcdcd8ca08e9f2bdfae279b2aa5471 Mon Sep 17 00:00:00 2001
>> From: Liu Jinsong <jinsong.liu@intel.com>
>> Date: Thu, 31 Oct 2013 06:38:15 +0800
>> Subject: [PATCH 4/4] XSA-60 security hole: flush cache when vmentry
>> back to UC guest
>>
>> [...]
>
> Yeah, almost, except that
> - the flag should be per-vCPU
> - you should mention in the description that this still leaves aspects
>   un-addressed (speculative reads at least, and multi-vCPU issues, and
>   I'm sure there are more that I didn't think of so far)
>
> Jan

Update, thanks!
Jinsong
Liu, Jinsong
2013-Nov-25 16:08 UTC
Re: [PATCH 4/4] XSA-60 security hole: flush cache when vmentry back to UC guest
>> From: Liu Jinsong <jinsong.liu@intel.com>
>> Date: Thu, 31 Oct 2013 06:38:15 +0800
>> Subject: [PATCH 4/4] XSA-60 security hole: flush cache when vmentry
>> back to UC guest
>>
>> [...]
>>
>> The elegant way to do this would be to simply add a wbinvd just before
>> vmentry. However, currently a wbinvd before vmentry mysteriously
>> triggers a lapic timer interrupt storm, hanging the boot stage for
>> 10s ~ 60s. We have not yet found the root cause of the interrupt storm,
>> so for now this patch adds a flag indicating that the hypervisor has
>> accessed UC guest memory, to avoid the interrupt storm. Once the
>> interrupt storm is root-caused and fixed, the protection flag can be
>> removed.

Hi,

We re-did some investigation this weekend and dug out the root cause of
the interrupt storm. Per the test data:

========================================================
Test 1: add wbinvd before vmentry (at vmx_vmenter_helper())

(XEN) uc_vmenter_count = 10607
(XEN) uc_vmexit_count  = 10607
(XEN) EXIT-REASON    COUNT
(XEN)  1             10463    // EXIT_REASON_EXTERNAL_INTERRUPT
(XEN) 28                10    // EXIT_REASON_CR_ACCESS
(XEN) 31               114    // EXIT_REASON_MSR_READ
(XEN) 32                15    // EXIT_REASON_MSR_WRITE
(XEN) 54                 5    // EXIT_REASON_WBINVD
(XEN) TOTAL EXIT-REASON-COUNT = 10607
(XEN)
(XEN) vcpu[0] vmentry count = 10492
(XEN) vcpu[1] vmentry count = 37
(XEN) vcpu[2] vmentry count = 40
(XEN) vcpu[3] vmentry count = 38
(XEN) interrupt vec 0xfa occurs 10450 times    // lapic timer
(XEN) interrupt vec 0xfb occurs 13 times       // call function IPI

Test 2: current patch, which does not add wbinvd before vmentry

(XEN) uc_vmenter_count = 147
(XEN) uc_vmexit_count  = 147
(XEN) EXIT-REASON    COUNT
(XEN)  1                 3    // EXIT_REASON_EXTERNAL_INTERRUPT
(XEN) 28                10    // EXIT_REASON_CR_ACCESS
(XEN) 31               114    // EXIT_REASON_MSR_READ
(XEN) 32                15    // EXIT_REASON_MSR_WRITE
(XEN) 54                 5    // EXIT_REASON_WBINVD
(XEN) TOTAL EXIT-REASON-COUNT = 147
(XEN)
(XEN) vcpu[0] vmentry count = 45
(XEN) vcpu[1] vmentry count = 34
(XEN) vcpu[2] vmentry count = 34
(XEN) vcpu[3] vmentry count = 34
(XEN) interrupt vec 0xfa occurs 3 times    // lapic timer
=================================================================

From the data above:

(uc_vmentry_count of test 1) - (uc_vmentry_count of test 2) = 10607 - 147 = 10460
(intr exit of test 1)        - (intr exit of test 2)        = 10463 - 3   = 10460

That means the newly-added vmentries _all_ come from external
interrupts: almost every wbinvd triggers one lapic timer interrupt.

The root cause is that wbinvd is a _very_ time-consuming operation, so:

1. wbinvd at vmentry ... the timer has a good chance to expire during
   it, and since irqs are disabled the interrupt is delayed until
2. ... vmentry returns to the guest (and irqs are enabled); the timer
   interrupt then fires and drops the guest at once;
3. back in the hypervisor ... vmentry and wbinvd again.

This loop runs again and again, interrupting and vmexiting again and
again, until a wbinvd is lucky enough not to overrun the timer and the
loop breaks; usually this takes 10K~60K iterations, blocking the guest
for 10s~60s.

To break this dead-like loop and the interrupt storm, an attractive
approach is to do the wbinvd at vmentry but while irqs are still
enabled, so that the lapic timer interrupt can fire after the wbinvd
and be handled in the hypervisor, after which we 'cleanly' vmentry
back to the guest. Unfortunately, this triggers another, real dead
loop inside the hypervisor:

1. wbinvd at vmentry ... then the lapic timer interrupt fires, since
   irqs are enabled;
2. the timer interrupt handler raises TIMER_SOFTIRQ;
3. the vmentry path checks the softirq pending bits and calls
   do_softirq();
4. after the softirq is handled, vmentry is re-done; --> real dead loop

So our current approach is to still do the wbinvd at the irq-disabled
stage of vmentry, but to adjust the timer by delaying it a little, so
that it does not fire immediately when we get back to the guest. This
way everything works fine, except that in the rarely-occurring case
the timer is delayed a little, say, 1ms. The patch will be sent out
later.

Thanks,
Jinsong
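For illustration only, a minimal sketch of that direction follows. It
is not the pending patch: the helper name and the exact deadline
adjustment are assumptions. reprogram_timer(), NOW() and MILLISECS()
are existing Xen timer interfaces, and timer_deadline is the per-CPU
deadline bookkeeping in common/timer.c.

    /*
     * Hypothetical sketch, not the actual pending patch: flush the
     * cache with irqs still disabled on the vmentry path, then, if
     * the local timer deadline expired while wbinvd ran, push it out
     * by ~1ms so the lapic timer does not fire the instant the guest
     * resumes (which would restart the vmexit/wbinvd cycle).
     */
    static void uc_wbinvd_and_defer_timer(void)
    {
        wbinvd();    /* expensive: can overrun the timer deadline */

        if ( this_cpu(timer_deadline) &&
             this_cpu(timer_deadline) <= NOW() )
            reprogram_timer(NOW() + MILLISECS(1));  /* assumed delay */
    }

The trade-off matches the description above: in the rare case the
deadline is overrun, the timer fires up to ~1ms late, instead of the
guest stalling for tens of seconds.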