From e2d47e2f75bac6876b7c2eaecfe946966bf27516 Mon Sep 17 00:00:00 2001
From: Liu Jinsong <jinsong.liu@intel.com>
Date: Tue, 26 Nov 2013 04:53:17 +0800
Subject: [PATCH] VMX: wbinvd when vmentry under UC

This patch flushes the cache when a vmentry returns to a UC guest, to
prevent the cache being polluted by hypervisor accesses to guest memory
while the guest is in UC mode.

However, wbinvd is a _very_ time consuming operation, so
1. wbinvd ... the timer is quite likely to expire while irqs are
   disabled, so its interrupt is delayed until
2. ... vmentry back to the guest (irqs enabled), when the timer
   interrupt fires and drops the guest straight back out;
3. drop to hypervisor ... then vmentry and wbinvd again;

This loop runs again and again until, by luck, a wbinvd happens not to
overrun the timer and the loop breaks; typically this takes 10K~60K
iterations, blocking the guest for 10s~60s.

Reprogram the timer to avoid this dead-like loop.

Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
---
 xen/arch/x86/hvm/vmx/vmx.c | 32 ++++++++++++++++++++++++++++----
 1 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 75be62e..4768c9b 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -642,10 +642,6 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
             __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept_data), 0);
     }
 
-    /* For guest cr0.cd setting, do not use potentially polluted cache */
-    if ( unlikely(v->arch.hvm_vcpu.cache_mode == NO_FILL_CACHE_MODE) )
-        wbinvd();
-
     vmx_restore_guest_msrs(v);
     vmx_restore_dr(v);
 }
@@ -2967,6 +2963,27 @@ out:
         nvmx_idtv_handling();
 }
 
+/*
+ * wbinvd is a _very_ time consuming operation, so
+ * 1. wbinvd ... timer has a good possibility to expire while
+ *    irq disabled, it then would be delayed until
+ * 2. ... vmentry back to guest (and irq enalbed), timer interrupt
+ *    then occurs and drops guest at once;
+ * 3. drop to hypervisor ... then vmentry and wbinvd again;
+ *
+ * This loop will run again and again, until lucky enough wbinvd
+ * happens not to expire timer and then loop break, usually it would
+ * occur 10K~60K times, blocking guest 10s~60s.
+ *
+ * reprogram timer to avoid dead_like_loop.
+ */
+static inline void uc_wbinvd_and_timer_adjust(void)
+{
+    reprogram_timer(0);
+    wbinvd();
+    reprogram_timer(NOW() + MILLISECS(1));
+}
+
 void vmx_vmenter_helper(const struct cpu_user_regs *regs)
 {
     struct vcpu *curr = current;
@@ -2974,6 +2991,13 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
     struct hvm_vcpu_asid *p_asid;
     bool_t need_flush;
 
+    /*
+     * In case hypervisor may access hvm guest memory, and then
+     * cache line polluted under UC mode.
+     */
+    if ( unlikely(curr->arch.hvm_vcpu.cache_mode == NO_FILL_CACHE_MODE) )
+        uc_wbinvd_and_timer_adjust();
+
     if ( !cpu_has_vmx_vpid )
         goto out;
     if ( nestedhvm_vcpu_in_guestmode(curr) )
-- 
1.7.1
On 25/11/13 16:14, Liu, Jinsong wrote:
> From e2d47e2f75bac6876b7c2eaecfe946966bf27516 Mon Sep 17 00:00:00 2001
> From: Liu Jinsong <jinsong.liu@intel.com>
> Date: Tue, 26 Nov 2013 04:53:17 +0800
> Subject: [PATCH] VMX: wbinvd when vmentry under UC
>
> This patch flush cache when vmentry back to UC guest, to prevent
> cache polluted by hypervisor access guest memory during UC mode.
>
> However, wbinvd is a _very_ time consuming operation, so
> 1. wbinvd ... timer has a good possibility to expire while
>    irq disabled, it then would be delayed until
> 2. ... vmentry back to guest (and irq enalbed), timer interrupt
>    then occurs and drops guest at once;
> 3. drop to hypervisor ... then vmentry and wbinvd again;
>
> This loop will run again and again, until lucky enough wbinvd
> happens not to expire timer and then loop break, usually it would
> occur 10K~60K times, blocking guest 10s~60s.
>
> reprogram timer to avoid dead_like_loop.
>
> Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
> ---
>  xen/arch/x86/hvm/vmx/vmx.c | 32 ++++++++++++++++++++++++++++----
>  1 files changed, 28 insertions(+), 4 deletions(-)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index 75be62e..4768c9b 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -642,10 +642,6 @@ static void vmx_ctxt_switch_to(struct vcpu *v)
>              __invept(INVEPT_SINGLE_CONTEXT, ept_get_eptp(ept_data), 0);
>      }
>
> -    /* For guest cr0.cd setting, do not use potentially polluted cache */
> -    if ( unlikely(v->arch.hvm_vcpu.cache_mode == NO_FILL_CACHE_MODE) )
> -        wbinvd();
> -
>      vmx_restore_guest_msrs(v);
>      vmx_restore_dr(v);
>  }
> @@ -2967,6 +2963,27 @@ out:
>          nvmx_idtv_handling();
>  }
>
> +/*
> + * wbinvd is a _very_ time consuming operation, so
> + * 1. wbinvd ... timer has a good possibility to expire while
> + *    irq disabled, it then would be delayed until
> + * 2. ... vmentry back to guest (and irq enalbed), timer interrupt
> + *    then occurs and drops guest at once;
> + * 3. drop to hypervisor ... then vmentry and wbinvd again;
> + *
> + * This loop will run again and again, until lucky enough wbinvd
> + * happens not to expire timer and then loop break, usually it would
> + * occur 10K~60K times, blocking guest 10s~60s.
> + *
> + * reprogram timer to avoid dead_like_loop.
> + */
> +static inline void uc_wbinvd_and_timer_adjust(void)
> +{
> +    reprogram_timer(0);
> +    wbinvd();
> +    reprogram_timer(NOW() + MILLISECS(1));

Do you have any number of the time delta across the wbinvd() ?

As it currently stands, I do not think it is reasonable to reprogram
the timer like this.

~Andrew

> +}
> +
>  void vmx_vmenter_helper(const struct cpu_user_regs *regs)
>  {
>      struct vcpu *curr = current;
> @@ -2974,6 +2991,13 @@ void vmx_vmenter_helper(const struct cpu_user_regs *regs)
>      struct hvm_vcpu_asid *p_asid;
>      bool_t need_flush;
>
> +    /*
> +     * In case hypervisor may access hvm guest memory, and then
> +     * cache line polluted under UC mode.
> +     */
> +    if ( unlikely(curr->arch.hvm_vcpu.cache_mode == NO_FILL_CACHE_MODE) )
> +        uc_wbinvd_and_timer_adjust();
> +
>      if ( !cpu_has_vmx_vpid )
>          goto out;
>      if ( nestedhvm_vcpu_in_guestmode(curr) )
>>> On 25.11.13 at 17:39, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> On 25/11/13 16:14, Liu, Jinsong wrote:
>> +/*
>> + * wbinvd is a _very_ time consuming operation, so
>> + * 1. wbinvd ... timer has a good possibility to expire while
>> + *    irq disabled, it then would be delayed until
>> + * 2. ... vmentry back to guest (and irq enalbed), timer interrupt
>> + *    then occurs and drops guest at once;
>> + * 3. drop to hypervisor ... then vmentry and wbinvd again;
>> + *
>> + * This loop will run again and again, until lucky enough wbinvd
>> + * happens not to expire timer and then loop break, usually it would
>> + * occur 10K~60K times, blocking guest 10s~60s.
>> + *
>> + * reprogram timer to avoid dead_like_loop.
>> + */
>> +static inline void uc_wbinvd_and_timer_adjust(void)
>> +{
>> +    reprogram_timer(0);
>> +    wbinvd();
>> +    reprogram_timer(NOW() + MILLISECS(1));
>
> Do you have any number of the time delta across the wbinvd() ?
>
> As it currently stands, I do not think it is reasonable to reprogram the
> timer like this.

Indeed I was wondering too, but didn't get to look in detail at what
consequences would arise from doing this.

Jan
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, November 25, 2013 8:47 AM
> To: Andrew Cooper; Liu, Jinsong
> Cc: sherry.hurwitz@amd.com; suravee.suthikulpanit@amd.com; Dugger,
> Donald D; Dong, Eddie; Nakajima, Jun; Auld, Will; Zhang, Xiantao; xen-
> devel@lists.xen.org; konrad.wilk@oracle.com; zhenzhong.duan@oracle.com;
> keir@xen.org; tim@xen.org
> Subject: Re: [PATCH] VMX: wbinvd when vmentry under UC
>
> >>> On 25.11.13 at 17:39, Andrew Cooper <andrew.cooper3@citrix.com>
> wrote:
> > On 25/11/13 16:14, Liu, Jinsong wrote:
> >> +/*
> >> + * wbinvd is a _very_ time consuming operation, so
> >> + * 1. wbinvd ... timer has a good possibility to expire while
> >> + *    irq disabled, it then would be delayed until
> >> + * 2. ... vmentry back to guest (and irq enalbed), timer interrupt
> >> + *    then occurs and drops guest at once;
> >> + * 3. drop to hypervisor ... then vmentry and wbinvd again;
> >> + *
> >> + * This loop will run again and again, until lucky enough wbinvd
> >> + * happens not to expire timer and then loop break, usually it
> would
> >> + * occur 10K~60K times, blocking guest 10s~60s.
> >> + *
> >> + * reprogram timer to avoid dead_like_loop.
> >> + */
> >> +static inline void uc_wbinvd_and_timer_adjust(void) {
> >> +    reprogram_timer(0); +    wbinvd();
> >> +    reprogram_timer(NOW() + MILLISECS(1));
> >
> > Do you have any number of the time delta across the wbinvd() ?
> >
> > As it currently stands, I do not think it is reasonable to reprogram
> > the timer like this.
>
> Indeed I was wondering too, but didn't get to look in detail at what
> consequences would arise from doing this.
>
> Jan

Basically, increase the timer setting so that it is unlikely to fire
during wbinvd() but still be there as a safeguard. Then reset as you
are currently doing after wbinvd().

Will
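To make the suggestion above concrete, here is a minimal sketch of that
variant, reusing the primitives the patch already uses (reprogram_timer(),
NOW(), MILLISECS(), wbinvd()). The helper name and the 4ms margin are
illustrative assumptions, not values proposed in the thread:

/*
 * Sketch: keep a timer deadline armed as a safeguard, but push it out
 * far enough that it is unlikely to fire during the flush; re-arm a
 * near deadline afterwards so pending work is handled promptly.
 */
static inline void uc_wbinvd_with_timer_guard(void)
{
    /* Assumed margin -- would need tuning against measured wbinvd cost. */
    reprogram_timer(NOW() + MILLISECS(4));

    wbinvd();

    reprogram_timer(NOW() + MILLISECS(1));
}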
Auld, Will wrote:
>> -----Original Message-----
>> From: Jan Beulich [mailto:JBeulich@suse.com]
>> Sent: Monday, November 25, 2013 8:47 AM
>> To: Andrew Cooper; Liu, Jinsong
>> Cc: sherry.hurwitz@amd.com; suravee.suthikulpanit@amd.com; Dugger,
>> Donald D; Dong, Eddie; Nakajima, Jun; Auld, Will; Zhang, Xiantao;
>> xen- devel@lists.xen.org; konrad.wilk@oracle.com;
>> zhenzhong.duan@oracle.com; keir@xen.org; tim@xen.org Subject: Re:
>> [PATCH] VMX: wbinvd when vmentry under UC
>>
>>>>> On 25.11.13 at 17:39, Andrew Cooper <andrew.cooper3@citrix.com>
>> wrote:
>>> On 25/11/13 16:14, Liu, Jinsong wrote:
>>>> +/*
>>>> + * wbinvd is a _very_ time consuming operation, so
>>>> + * 1. wbinvd ... timer has a good possibility to expire while
>>>> + *    irq disabled, it then would be delayed until
>>>> + * 2. ... vmentry back to guest (and irq enalbed), timer interrupt
>>>> + *    then occurs and drops guest at once;
>>>> + * 3. drop to hypervisor ... then vmentry and wbinvd again; + *
>>>> + * This loop will run again and again, until lucky enough wbinvd
>>>> + * happens not to expire timer and then loop break, usually it
>>>> would + * occur 10K~60K times, blocking guest 10s~60s.
>>>> + *
>>>> + * reprogram timer to avoid dead_like_loop.
>>>> + */
>>>> +static inline void uc_wbinvd_and_timer_adjust(void) { +
>>>> reprogram_timer(0); + wbinvd();
>>>> + reprogram_timer(NOW() + MILLISECS(1));
>>>
>>> Do you have any number of the time delta across the wbinvd() ?

Generally it depends on how big and how dirty the cache is.
In my environment (L1/L2/L3 cache: 64/256/20480K, 2.3G processor), it
varies from 1~3ms:
(XEN) wbinvd overhead: 0x6be9ad tsc, 3082209 ns
...
(XEN) wbinvd overhead: 0x26bc68 tsc, 1106378 ns

>>>
>>> As it currently stands, I do not think it is reasonable to reprogram
>>> the timer like this.
>>
>> Indeed I was wondering too, but didn't get to look in detail at what
>> consequences would arise from doing this.
>>
>> Jan
>
> Basically, increase the timer setting so that it is unlikely to fire
> during wbinvd() but still be there as a safeguard. Then reset as you
> are currently doing after wbinvd().
>
> Will

Yes. reprogram_timer here just delay timer a little slot, say, 1~2ms.
I think it's OK, i.e. at any point of wbinvd() operation at
hypervisor, or any irq disabled area, timer interrupt in fact also
has good chance to be delayed some time -- however at TIMER_SOFTIRQ,
all expired thing would be executed, and re-calculated and set next
time point via reprogram_timer -- that's OK.

Another option is to trust guest. We can remove wbinvd from vmentry --
per SDM when guest sets cr0.cd it should explicitly flush cache,
otherwise itself cannot guarantee cache coherency.

Thoughts?

Thanks,
Jinsong
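For reference, a sketch of how overhead figures like those quoted above
could be sampled; this is an assumed instrumentation, not the probe that
was actually used (which evidently also recorded a raw TSC delta). NOW()
returns Xen system time in nanoseconds:

/* Illustrative timing probe around the flush. */
static inline void uc_wbinvd_timed(void)
{
    s_time_t start = NOW();

    wbinvd();

    printk(XENLOG_INFO "wbinvd overhead: %lu ns\n",
           (unsigned long)(NOW() - start));
}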
>>> On 26.11.13 at 09:56, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> Another option is to trust guest. We can remove wbinvd from vmentry -- per SDM
> when guest sets cr0.cd it should explicitly flush cache, otherwise itself
> cannot guarantee cache coherency.

Which - once again - wouldn't account for hypervisor writes to guest memory.

Jan
Liu, Jinsong wrote:
> Auld, Will wrote:
>>> -----Original Message-----
>>> From: Jan Beulich [mailto:JBeulich@suse.com]
>>> Sent: Monday, November 25, 2013 8:47 AM
>>> To: Andrew Cooper; Liu, Jinsong
>>> Cc: sherry.hurwitz@amd.com; suravee.suthikulpanit@amd.com; Dugger,
>>> Donald D; Dong, Eddie; Nakajima, Jun; Auld, Will; Zhang, Xiantao;
>>> xen- devel@lists.xen.org; konrad.wilk@oracle.com;
>>> zhenzhong.duan@oracle.com; keir@xen.org; tim@xen.org Subject: Re:
>>> [PATCH] VMX: wbinvd when vmentry under UC
>>>
>>>>>> On 25.11.13 at 17:39, Andrew Cooper <andrew.cooper3@citrix.com>
>>> wrote:
>>>> On 25/11/13 16:14, Liu, Jinsong wrote:
>>>>> +/*
>>>>> + * wbinvd is a _very_ time consuming operation, so
>>>>> + * 1. wbinvd ... timer has a good possibility to expire while
>>>>> + *    irq disabled, it then would be delayed until
>>>>> + * 2. ... vmentry back to guest (and irq enalbed), timer
>>>>> interrupt + * then occurs and drops guest at once;
>>>>> + * 3. drop to hypervisor ... then vmentry and wbinvd again; + *
>>>>> + * This loop will run again and again, until lucky enough wbinvd
>>>>> + * happens not to expire timer and then loop break, usually it
>>>>> would + * occur 10K~60K times, blocking guest 10s~60s. + *
>>>>> + * reprogram timer to avoid dead_like_loop.
>>>>> + */
>>>>> +static inline void uc_wbinvd_and_timer_adjust(void) { +
>>>>> reprogram_timer(0); + wbinvd();
>>>>> + reprogram_timer(NOW() + MILLISECS(1));
>>>>
>>>> Do you have any number of the time delta across the wbinvd() ?
>
> Generally it depends on how big and how dirty the cache is.
> In my environment (L1/L2/L3 cache: 64/256/20480K, 2.3G processor), it
> varies from 1~3ms:
> (XEN) wbinvd overhead: 0x6be9ad tsc, 3082209 ns
> ...
> (XEN) wbinvd overhead: 0x26bc68 tsc, 1106378 ns
>
>>>>
>>>> As it currently stands, I do not think it is reasonable to
>>>> reprogram the timer like this.
>>>
>>> Indeed I was wondering too, but didn't get to look in detail at what
>>> consequences would arise from doing this.
>>>
>>> Jan
>>
>> Basically, increase the timer setting so that it is unlikely to fire
>> during wbinvd() but still be there as a safeguard. Then reset as you
>> are currently doing after wbinvd().
>>
>> Will
>
> Yes. reprogram_timer here just delay timer a little slot, say, 1~2ms.
> I think it's OK, i.e. at any point of wbinvd() operation at
> hypervisor, or any irq disabled area, timer interrupt in fact also
> has good chance to be delayed some time -- however at TIMER_SOFTIRQ,
> all expired thing would be executed, and re-calculated and set next
> time point via reprogram_timer -- that's OK.

Comments/thoughts about this option?

> Another option is to trust guest. We can remove wbinvd from vmentry
> -- per SDM when guest sets cr0.cd it should explicitly flush cache,
> otherwise itself cannot guarantee cache coherency.

OK, as you said we don't consider option 2.

Thanks,
Jinsong
>>> "Liu, Jinsong" <jinsong.liu@intel.com> 11/28/13 8:17 AM >>> >Liu, Jinsong wrote: >> Yes. reprogram_timer here just delay timer a little slot, say, 1~2ms. >> I think it''s OK, i.e. at any point of wbinvd() operation at >> hypervisor, or any irq disabled area, timer interrupt in fact also >> has good chance to be delayed some time -- however at TIMER_SOFTIRQ, >> all expired thing would be executed, and re-calculated and set next >> time point via reprogram_timer -- that''s OK. > >Comments/thoughts about this option?Apart from continuing to be very uncertain that this won''t have any bad side effects, I''m also rather concerned that you deal with one special case interrupt here, ignoring other potentially high rate ones (like such coming from NICs). Jan
Jan Beulich wrote:
>>>> "Liu, Jinsong" <jinsong.liu@intel.com> 11/28/13 8:17 AM >>>
>> Liu, Jinsong wrote:
>>> Yes. reprogram_timer here just delay timer a little slot, say,
>>> 1~2ms. I think it's OK, i.e. at any point of wbinvd() operation at
>>> hypervisor, or any irq disabled area, timer interrupt in fact also
>>> has good chance to be delayed some time -- however at TIMER_SOFTIRQ,
>>> all expired thing would be executed, and re-calculated and set next
>>> time point via reprogram_timer -- that's OK.
>>
>> Comments/thoughts about this option?
>
> Apart from continuing to be very uncertain that this won't have any
> bad side effects, I'm also rather concerned that you deal with one
> special case interrupt here, ignoring other potentially high rate ones
> (like such coming from NICs).
>
> Jan

Considering this, seems adding flag is the only work around way since
high freq interrupt would result in dead-like-loop. My concern of
adding flag is it's not easy to clean every possible path, especially
future extension.

Or, do not support vt-d w/o snoop.

Thanks,
Jinsong
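For illustration, a rough sketch of the flag-based approach mentioned
above: flush on vmentry only when the hypervisor has actually touched
guest memory while the guest runs with caching disabled. The field and
helper names are hypothetical, and, as noted, the hard part is making
every such hypervisor path call the marking helper:

/* Hypothetical: called from any hypervisor path that accesses guest memory. */
static inline void hvm_mark_uc_cache_dirty(struct vcpu *v)
{
    if ( unlikely(v->arch.hvm_vcpu.cache_mode == NO_FILL_CACHE_MODE) )
        v->arch.hvm_vcpu.uc_cache_dirty = 1;   /* hypothetical field */
}

/* Hypothetical: called on the vmentry path instead of flushing unconditionally. */
static inline void hvm_flush_uc_cache_if_dirty(struct vcpu *v)
{
    if ( unlikely(v->arch.hvm_vcpu.uc_cache_dirty) )
    {
        v->arch.hvm_vcpu.uc_cache_dirty = 0;
        wbinvd();
    }
}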
On 29/11/13 14:15, Liu, Jinsong wrote:
> Jan Beulich wrote:
>>>>> "Liu, Jinsong" <jinsong.liu@intel.com> 11/28/13 8:17 AM >>>
>>> Liu, Jinsong wrote:
>>>> Yes. reprogram_timer here just delay timer a little slot, say,
>>>> 1~2ms. I think it's OK, i.e. at any point of wbinvd() operation at
>>>> hypervisor, or any irq disabled area, timer interrupt in fact also
>>>> has good chance to be delayed some time -- however at TIMER_SOFTIRQ,
>>>> all expired thing would be executed, and re-calculated and set next
>>>> time point via reprogram_timer -- that's OK.
>>> Comments/thoughts about this option?
>> Apart from continuing to be very uncertain that this won't have any
>> bad side effects, I'm also rather concerned that you deal with one
>> special case interrupt here, ignoring other potentially high rate ones
>> (like such coming from NICs).
>>
>> Jan
> Considering this, seems adding flag is the only work around way since
> high freq interrupt would result in dead-like-loop. My concern of
> adding flag is it's not easy to clean every possible path, especially
> future extension.
>
> Or, do not support vt-d w/o snoop.
>
> Thanks,
> Jinsong

Do you know how many systems have vt-d without snoop ?

~Andrew
Andrew Cooper wrote:
> On 29/11/13 14:15, Liu, Jinsong wrote:
>> Jan Beulich wrote:
>>>>>> "Liu, Jinsong" <jinsong.liu@intel.com> 11/28/13 8:17 AM >>>
>>>> Liu, Jinsong wrote:
>>>>> Yes. reprogram_timer here just delay timer a little slot, say,
>>>>> 1~2ms. I think it's OK, i.e. at any point of wbinvd() operation at
>>>>> hypervisor, or any irq disabled area, timer interrupt in fact also
>>>>> has good chance to be delayed some time -- however at
>>>>> TIMER_SOFTIRQ, all expired thing would be executed, and
>>>>> re-calculated and set next time point via reprogram_timer --
>>>>> that's OK.
>>>> Comments/thoughts about this option?
>>> Apart from continuing to be very uncertain that this won't have any
>>> bad side effects, I'm also rather concerned that you deal with one
>>> special case interrupt here, ignoring other potentially high rate
>>> ones (like such coming from NICs).
>>>
>>> Jan
>> Considering this, seems adding flag is the only work around way
>> since high freq interrupt would result in dead-like-loop. My concern
>> of adding flag is it's not easy to clean every possible path,
>> especially future extension.
>>
>> Or, do not support vt-d w/o snoop.
>>
>> Thanks,
>> Jinsong
>
> Do you know how many systems have vt-d without snoop ?
>
> ~Andrew

Yes, that's what I need check inside Intel. Maybe not feasible idea I agree.

Thanks,
Jinsong
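As background for the question above: whether a given VT-d unit offers
snoop control can be read from its extended capability register. A
sketch, assuming the 64-bit ECAP_REG value has already been read from
the unit; per the VT-d specification the Snoop Control (SC) capability
is bit 7 of that register:

/*
 * Sketch: a platform "has VT-d without snoop" if any unit relevant to
 * the passed-through devices lacks this capability bit.
 */
static inline bool_t vtd_unit_has_snoop_ctl(uint64_t ecap)
{
    return (ecap >> 7) & 1;   /* ECAP.SC, per the VT-d spec */
}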
On 29/11/13 14:31, Liu, Jinsong wrote:
> Andrew Cooper wrote:
>> On 29/11/13 14:15, Liu, Jinsong wrote:
>>> Jan Beulich wrote:
>>>>>>> "Liu, Jinsong" <jinsong.liu@intel.com> 11/28/13 8:17 AM >>>
>>>>> Liu, Jinsong wrote:
>>>>>> Yes. reprogram_timer here just delay timer a little slot, say,
>>>>>> 1~2ms. I think it's OK, i.e. at any point of wbinvd() operation at
>>>>>> hypervisor, or any irq disabled area, timer interrupt in fact also
>>>>>> has good chance to be delayed some time -- however at
>>>>>> TIMER_SOFTIRQ, all expired thing would be executed, and
>>>>>> re-calculated and set next time point via reprogram_timer --
>>>>>> that's OK.
>>>>> Comments/thoughts about this option?
>>>> Apart from continuing to be very uncertain that this won't have any
>>>> bad side effects, I'm also rather concerned that you deal with one
>>>> special case interrupt here, ignoring other potentially high rate
>>>> ones (like such coming from NICs).
>>>>
>>>> Jan
>>> Considering this, seems adding flag is the only work around way
>>> since high freq interrupt would result in dead-like-loop. My concern
>>> of adding flag is it's not easy to clean every possible path,
>>> especially future extension.
>>>
>>> Or, do not support vt-d w/o snoop.
>>>
>>> Thanks,
>>> Jinsong
>> Do you know how many systems have vt-d without snoop ?
>>
>> ~Andrew
> Yes, that's what I need check inside Intel. Maybe not feasible idea I agree.
>
> Thanks,
> Jinsong

Given that PCI passthrough realistically requires trusting the guest
administrator, it might be feasible to have another iommu= option of
"allow passthrough even without snoop".

~Andrew
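A sketch of what such a command-line control could look like; the option
and variable names are hypothetical, and the existing iommu= handling in
xen/drivers/passthrough/iommu.c would be the natural place to hook it in.
This only illustrates the shape of the knob, not a concrete proposal:

/* Hypothetical global, consulted wherever snoop-less VT-d is currently refused. */
static bool_t __read_mostly opt_allow_unsnooped_passthrough;

/* Illustrative fragment of an iommu= sub-option parser. */
static void parse_iommu_suboption(const char *s)
{
    if ( !strcmp(s, "allow-unsnooped-passthrough") )
        opt_allow_unsnooped_passthrough = 1;
}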
>>> On 29.11.13 at 15:31, "Liu, Jinsong" <jinsong.liu@intel.com> wrote:
> Andrew Cooper wrote:
>> On 29/11/13 14:15, Liu, Jinsong wrote:
>>> Jan Beulich wrote:
>>>>>>> "Liu, Jinsong" <jinsong.liu@intel.com> 11/28/13 8:17 AM >>>
>>>>> Liu, Jinsong wrote:
>>>>>> Yes. reprogram_timer here just delay timer a little slot, say,
>>>>>> 1~2ms. I think it's OK, i.e. at any point of wbinvd() operation at
>>>>>> hypervisor, or any irq disabled area, timer interrupt in fact also
>>>>>> has good chance to be delayed some time -- however at
>>>>>> TIMER_SOFTIRQ, all expired thing would be executed, and
>>>>>> re-calculated and set next time point via reprogram_timer --
>>>>>> that's OK.
>>>>> Comments/thoughts about this option?
>>>> Apart from continuing to be very uncertain that this won't have any
>>>> bad side effects, I'm also rather concerned that you deal with one
>>>> special case interrupt here, ignoring other potentially high rate
>>>> ones (like such coming from NICs).
>>>>
>>> Considering this, seems adding flag is the only work around way
>>> since high freq interrupt would result in dead-like-loop. My concern
>>> of adding flag is it's not easy to clean every possible path,
>>> especially future extension.
>>>
>>> Or, do not support vt-d w/o snoop.
>>
>> Do you know how many systems have vt-d without snoop ?
>
> Yes, that's what I need check inside Intel. Maybe not feasible idea I agree.

But let's take a step back here: Even with the old solution, there
wasn't any cache flushing being done when the guest was running
in UC mode. So with what we have now we're already doing better
than before in this regard. The primary goal here (and the criteria
for backporting the patches) is to address XSA-60. Which, if I'm not
mistaken, we achieved with the first three of the four initial
patches. Patch 4 and anything beyond really are addressing a distinct
issue.

Jan
>>> On 29.11.13 at 15:50, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> Given that PCI passthrough realistically requires trusting the
> guest administrator,

Why? As long as the passed through device isn't buggy, there shouldn't
be any risk associated with handing it to the guest (once we managed to
fix all respective software bugs).

Jan

> it might be feasible to have another iommu= option
> of "allow passthrough even without snoop".
>
> ~Andrew
Jan Beulich wrote:
>>>> On 29.11.13 at 15:31, "Liu, Jinsong" <jinsong.liu@intel.com>
>>>> wrote: Andrew Cooper wrote: On 29/11/13 14:15, Liu, Jinsong wrote:
>>>> Jan Beulich wrote:
>>>>>>>> "Liu, Jinsong" <jinsong.liu@intel.com> 11/28/13 8:17 AM >>>
>>>>>> Liu, Jinsong wrote:
>>>>>>> Yes. reprogram_timer here just delay timer a little slot, say,
>>>>>>> 1~2ms. I think it's OK, i.e. at any point of wbinvd() operation
>>>>>>> at hypervisor, or any irq disabled area, timer interrupt in
>>>>>>> fact also has good chance to be delayed some time -- however at
>>>>>>> TIMER_SOFTIRQ, all expired thing would be executed, and
>>>>>>> re-calculated and set next time point via reprogram_timer --
>>>>>>> that's OK.
>>>>>> Comments/thoughts about this option?
>>>>> Apart from continuing to be very uncertain that this won't have
>>>>> any bad side effects, I'm also rather concerned that you deal
>>>>> with one special case interrupt here, ignoring other potentially
>>>>> high rate ones (like such coming from NICs).
>>>>>
>>>> Considering this, seems adding flag is the only work around way
>>>> since high freq interrupt would result in dead-like-loop. My
>>>> concern of adding flag is it's not easy to clean every possible
>>>> path, especially future extension.
>>>>
>>>> Or, do not support vt-d w/o snoop.
>>>
>>> Do you know how many systems have vt-d without snoop ?
>>
>> Yes, that's what I need check inside Intel. Maybe not feasible idea
>> I agree.
>
> But let's take a step back here: Even with the old solution, there
> wasn't any cache flushing being done when the guest was running
> in UC mode. So with what we have now we're already doing better
> than before in this regard. The primary goal here (and the criteria
> for backporting the patches) is to address XSA-60. Which, if I'm not
> mistaken, we achieved with the first three of the four initial
> patches. Patch 4 and anything beyond really are addressing a distinct
> issue.
>
> Jan

The reason for 'do not support vt-d w/o snoop' is not only XSA-60,
though that appears to be fixed now. My real concern is the
cacheability conflicts that come with VT-d without snoop (and the IPAT
bit) -- the same CPU can access the same physical memory with different
cache attributes via different mappings -- something the Intel SDM
explicitly warns against, even though so far it has not caused another
safety issue in Xen.

Thanks,
Jinsong