Tim Deegan
2008-Oct-20 09:30 UTC
[Xen-devel] [PATCH] VMX: avoid taking locks with irqs disabled
Shuffle the bits of the vmexit handler that run with EFLAGS.IF == 0 up to the top. Otherwise we end up calling spin_lock() with interrupts disabled, which can deadlock against the time-synchronization rendezvous code. Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com> _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2008-Oct-20 10:08 UTC
Re: [Xen-devel] [PATCH] VMX: avoid taking locks with irqs disabled
Oh yes. That''s a class of race that I hadn''t realised would be introduced by the new rendezvous code. Spinlocks which are ever taken with IRQs disabled must *always* be taken with IRQs disabled. Spinlocks which are ever taken with IRQs enabled must *always* be taken with IRQs enabled. It''s quite an extra constraint on spinlock usage actually. :-( Actually this new partitioning of locks into two equivalence classes is begging for some run-time checking in debug builds. I''ll sort out a patch for that. -- Keir On 20/10/08 10:30, "Tim Deegan" <Tim.Deegan@citrix.com> wrote:> Shuffle the bits of the vmexit handler that run with EFLAGS.IF == 0 up > to the top. Otherwise we end up calling spin_lock() with interrupts > disabled, which can deadlock against the time-synchronization rendezvous > code. > > Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com> > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tian, Kevin
2008-Oct-21 12:29 UTC
RE: [Xen-devel] [PATCH] VMX: avoid taking locks with irqs disabled
>From: Keir Fraser >Sent: Monday, October 20, 2008 6:09 PM > >Oh yes. That''s a class of race that I hadn''t realised would be >introduced by >the new rendezvous code. Spinlocks which are ever taken with >IRQs disabled >must *always* be taken with IRQs disabled. Spinlocks which are >ever taken >with IRQs enabled must *always* be taken with IRQs enabled.We encountered a similar bug today, when Jimmy is developing FSB HPET broadcast feature. The reason is same that above guideline is not obeyed.> >It''s quite an extra constraint on spinlock usage actually. :-( > >Actually this new partitioning of locks into two equivalence classes is >begging for some run-time checking in debug builds. I''ll sort >out a patch >for that. >Did you flush out the patch into xen bits? I guess there may be other instances breaking that guideline. Just curious whether Linux side already recognized similar constraints. It''s possibly not. Then another angle is, instead of pushing constraint on spinlock usage, could we instead change timer rendezvous style? Use softirq instead of call function should release the constraints like stop_machine. :-) Thanks, Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2008-Oct-21 12:37 UTC
Re: [Xen-devel] [PATCH] VMX: avoid taking locks with irqs disabled
On 21/10/08 13:29, "Tian, Kevin" <kevin.tian@intel.com> wrote:>> Actually this new partitioning of locks into two equivalence classes is >> begging for some run-time checking in debug builds. I''ll sort >> out a patch >> for that. > > Did you flush out the patch into xen bits? I guess there may > be other instances breaking that guideline. Just curious whether > Linux side already recognized similar constraints. It''s possibly > not. Then another angle is, instead of pushing constraint on > spinlock usage, could we instead change timer rendezvous > style? Use softirq instead of call function should release the > constraints like stop_machine. :-)I''m flushing out fixes for this class of race before I check in the debug code. I don''t think it''s an unreasonable new constraint, it just hasn''t been enforced before and so various Xen code breaks. I should get the patches flushed out later today. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tian, Kevin
2008-Oct-21 12:50 UTC
RE: [Xen-devel] [PATCH] VMX: avoid taking locks with irqs disabled
>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com] >Sent: Tuesday, October 21, 2008 8:37 PM >On 21/10/08 13:29, "Tian, Kevin" <kevin.tian@intel.com> wrote: > >>> Actually this new partitioning of locks into two >equivalence classes is >>> begging for some run-time checking in debug builds. I''ll sort >>> out a patch >>> for that. >> >> Did you flush out the patch into xen bits? I guess there may >> be other instances breaking that guideline. Just curious whether >> Linux side already recognized similar constraints. It''s possibly >> not. Then another angle is, instead of pushing constraint on >> spinlock usage, could we instead change timer rendezvous >> style? Use softirq instead of call function should release the >> constraints like stop_machine. :-) > >I''m flushing out fixes for this class of race before I check >in the debug >code. I don''t think it''s an unreasonable new constraint, it >just hasn''t been >enforced before and so various Xen code breaks. I should get >the patches >flushed out later today.I''m a bit curious why call funtion ipi is required here, or why rendezvous is required here. All the rendezvous stuff in current ipi function is just: a) cpu0 waits for all other cpus entering rendezvous loop, and then update master_stime b) other cpus enter loop and wait for cpu0 to update master_stime Then each cpu continues with rest stuff independently. In this case, it seems enough to just ensure master_stime updated before sending softirq, and thus ipi is actually not required. Do I miss anything? :-) Thanks, Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Tian, Kevin
2008-Oct-21 13:06 UTC
RE: [Xen-devel] [PATCH] VMX: avoid taking locks with irqs disabled
BTW, I just meant for time rendezvous code, not for what you''re adding to spinlock which seems safer to avoid similar dead lock when real rendezvous usage is introduced in future. Thanks, Kevin>From: Tian, Kevin >Sent: Tuesday, October 21, 2008 8:50 PM > >>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com] >>Sent: Tuesday, October 21, 2008 8:37 PM >>On 21/10/08 13:29, "Tian, Kevin" <kevin.tian@intel.com> wrote: >> >>>> Actually this new partitioning of locks into two >>equivalence classes is >>>> begging for some run-time checking in debug builds. I''ll sort >>>> out a patch >>>> for that. >>> >>> Did you flush out the patch into xen bits? I guess there may >>> be other instances breaking that guideline. Just curious whether >>> Linux side already recognized similar constraints. It''s possibly >>> not. Then another angle is, instead of pushing constraint on >>> spinlock usage, could we instead change timer rendezvous >>> style? Use softirq instead of call function should release the >>> constraints like stop_machine. :-) >> >>I''m flushing out fixes for this class of race before I check >>in the debug >>code. I don''t think it''s an unreasonable new constraint, it >>just hasn''t been >>enforced before and so various Xen code breaks. I should get >>the patches >>flushed out later today. > >I''m a bit curious why call funtion ipi is required here, or why >rendezvous is required here. All the rendezvous stuff in current >ipi function is just: >a) cpu0 waits for all other cpus entering rendezvous loop, and >then update master_stime >b) other cpus enter loop and wait for cpu0 to update master_stime > >Then each cpu continues with rest stuff independently. In this >case, it seems enough to just ensure master_stime updated >before sending softirq, and thus ipi is actually not required. >Do I miss anything? :-) > >Thanks, >Kevin > >_______________________________________________ >Xen-devel mailing list >Xen-devel@lists.xensource.com >http://lists.xensource.com/xen-devel >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2008-Oct-21 13:08 UTC
Re: [Xen-devel] [PATCH] VMX: avoid taking locks with irqs disabled
On 21/10/08 13:50, "Tian, Kevin" <kevin.tian@intel.com> wrote:> > I''m a bit curious why call funtion ipi is required here, or why > rendezvous is required here. All the rendezvous stuff in current > ipi function is just: > a) cpu0 waits for all other cpus entering rendezvous loop, and > then update master_stime > b) other cpus enter loop and wait for cpu0 to update master_stime > > Then each cpu continues with rest stuff independently. In this > case, it seems enough to just ensure master_stime updated > before sending softirq, and thus ipi is actually not required. > Do I miss anything? :-)We want to gather all timestamps as close together as possible. Dan measured that this produced vastly less system-time skew across CPUs. Hence we do all the stamp gathering in IRQ context. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel