David Vrabel
2013-Oct-10 17:29 UTC
[PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
From: David Vrabel <david.vrabel@citrix.com>

sched_move_domain() changes v->processor for all the domain's VCPUs.
If another domain, softirq etc. triggers a simultaneous call to
vcpu_wake() (e.g., by setting an event channel as pending), then
vcpu_wake() may lock one schedule lock and try to unlock another.

vcpu_schedule_lock() attempts to handle this but only does so for the
window between reading the schedule_lock from the per-CPU data and the
spin_lock() call.  This does not help with sched_move_domain()
changing v->processor between the calls to vcpu_schedule_lock() and
vcpu_schedule_unlock().

Fix the race by taking the schedule_lock for v->processor in
sched_move_domain().

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Juergen Gross <juergen.gross@ts.fujitsu.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---

Just taking the lock for the old processor seemed sufficient to me as
anything seeing the new value would lock and unlock using the same new
value.  But do we need to take the schedule_lock for the new processor
as well (in the right order of course)?

This is reproducible by constantly migrating a domain between two CPU
pools.
8<------------
while true; do
    xl cpupool-migrate $1 Pool-1
    xl cpupool-migrate $1 Pool-0
done
---
 xen/common/schedule.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 1ddfb22..28e063e 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -278,6 +278,9 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     new_p = cpumask_first(c->cpu_valid);
     for_each_vcpu ( d, v )
     {
+        spinlock_t *schedule_lock = per_cpu(schedule_data,
+                                            v->processor).schedule_lock;
+
         vcpudata = v->sched_priv;

         migrate_timer(&v->periodic_timer, new_p);
@@ -285,7 +288,11 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
         migrate_timer(&v->poll_timer, new_p);

         cpumask_setall(v->cpu_affinity);
+
+        spin_lock_irq(schedule_lock);
         v->processor = new_p;
+        spin_unlock_irq(schedule_lock);
+
         v->sched_priv = vcpu_priv[v->vcpu_id];
         evtchn_move_pirqs(v);
--
1.7.2.5
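For context, the locking helper the commit message refers to retries until the
lock it took still matches v->processor.  The following is a paraphrased
sketch of that helper, not the verbatim Xen source; it assumes the
per_cpu()/schedule_data definitions used alongside xen/common/schedule.c:

static inline void vcpu_schedule_lock(struct vcpu *v)
{
    unsigned int cpu;

    for ( ; ; )
    {
        cpu = v->processor;
        spin_lock(per_cpu(schedule_data, cpu).schedule_lock);
        /* Retry if the vcpu moved while we were waiting for the lock. */
        if ( likely(v->processor == cpu) )
            break;
        spin_unlock(per_cpu(schedule_data, cpu).schedule_lock);
    }
    /*
     * Nothing pins v->processor from here on: the matching unlock
     * re-reads it, and can therefore release a different lock than
     * the one taken above -- exactly the window the patch closes.
     */
}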
Andrew Cooper
2013-Oct-10 18:01 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On 10/10/13 18:29, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> sched_move_domain() changes v->processor for all the domain's VCPUs.
> If another domain, softirq etc. triggers a simultaneous call to
> vcpu_wake() (e.g., by setting an event channel as pending), then
> vcpu_wake() may lock one schedule lock and try to unlock another.
>
> vcpu_schedule_lock() attempts to handle this but only does so for the
> window between reading the schedule_lock from the per-CPU data and the
> spin_lock() call.  This does not help with sched_move_domain()
> changing v->processor between the calls to vcpu_schedule_lock() and
> vcpu_schedule_unlock().
>
> Fix the race by taking the schedule_lock for v->processor in
> sched_move_domain().
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Juergen Gross <juergen.gross@ts.fujitsu.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>
> Just taking the lock for the old processor seemed sufficient to me as
> anything seeing the new value would lock and unlock using the same new
> value.  But do we need to take the schedule_lock for the new processor
> as well (in the right order of course)?

David and I have been discussing this for a while, involving a
whiteboard, and not come to a firm conclusion either way.

From my point of view, holding the appropriate vcpu schedule lock
entitles you to play with vcpu scheduling state, which involves
following v->sched_priv, which we update outside the critical region
later.

Only taking the one lock still leaves a race condition where another
cpu can follow the new v->processor and obtain the schedule lock, at
which point we have two threads both working on the internals of a
vcpu.  The change below certainly will fix the current bug of locking
one spinlock and unlocking another.

My gut feeling is that we do need to take both locks to be safe in
terms of data access, but we would appreciate advice from someone more
familiar with the scheduler locking.

~Andrew

> This is reproducible by constantly migrating a domain between two CPU
> pools.
> 8<------------
> while true; do
>     xl cpupool-migrate $1 Pool-1
>     xl cpupool-migrate $1 Pool-0
> done
> ---
>  xen/common/schedule.c |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
>
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 1ddfb22..28e063e 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -278,6 +278,9 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>      new_p = cpumask_first(c->cpu_valid);
>      for_each_vcpu ( d, v )
>      {
> +        spinlock_t *schedule_lock = per_cpu(schedule_data,
> +                                            v->processor).schedule_lock;
> +
>          vcpudata = v->sched_priv;
>
>          migrate_timer(&v->periodic_timer, new_p);
> @@ -285,7 +288,11 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>          migrate_timer(&v->poll_timer, new_p);
>
>          cpumask_setall(v->cpu_affinity);
> +
> +        spin_lock_irq(schedule_lock);
>          v->processor = new_p;
> +        spin_unlock_irq(schedule_lock);
> +
>          v->sched_priv = vcpu_priv[v->vcpu_id];
>          evtchn_move_pirqs(v);
>
Keir Fraser
2013-Oct-10 18:27 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On 10/10/2013 19:01, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote:>> Just taking the lock for the old processor seemed sufficient to me as >> anything seeing the new value would lock and unlock using the same new >> value. But do we need to take the schedule_lock for the new processor >> as well (in the right order of course)? > > David and I have been discussing this for a while, involving a > whiteboard, and not come to a firm conclusion either way. > > From my point of view, holding the appropriate vcpu schedule lock > entitles you to play with vcpu scheduling state, which involves > following v->sched_priv which we update outside the critical region later. > > Only taking the one lock still leaves a race condition where another cpu > can follow the new v->processor and obtain the schedule lock, at which > point we have two threads both working on the internals of a vcpu. The > change below certainly will fix the current bug of locking one spinlock > and unlocking another. > > My gut feeling is that we do need to take both locks to be safe in terms > of data access, but we would appreciate advice from someone more > familiar with the scheduler locking.If it''s that tricky to work out, why not just take the two locks, appropriately ordered? This isn''t a hot path. -- Keir
Juergen Gross
2013-Oct-11 06:37 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On 10.10.2013 19:29, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> sched_move_domain() changes v->processor for all the domain's VCPUs.
> If another domain, softirq etc. triggers a simultaneous call to
> vcpu_wake() (e.g., by setting an event channel as pending), then
> vcpu_wake() may lock one schedule lock and try to unlock another.
>
> vcpu_schedule_lock() attempts to handle this but only does so for the
> window between reading the schedule_lock from the per-CPU data and the
> spin_lock() call.  This does not help with sched_move_domain()
> changing v->processor between the calls to vcpu_schedule_lock() and
> vcpu_schedule_unlock().
>
> Fix the race by taking the schedule_lock for v->processor in
> sched_move_domain().
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Juergen Gross <juergen.gross@ts.fujitsu.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>
> Just taking the lock for the old processor seemed sufficient to me as
> anything seeing the new value would lock and unlock using the same new
> value.  But do we need to take the schedule_lock for the new processor
> as well (in the right order of course)?

I don't think it is necessary to take both locks.  There can't be any
scheduler-specific (e.g. credit) activity on the vcpu(s), as they are
removed from the source scheduler before, and will be added to the
target scheduler after, switching the processor.

BTW: good catch!  I think this explains a problem I have been searching
for some time now...

Acked-by: Juergen Gross <juergen.gross@ts.fujitsu.com>

> This is reproducible by constantly migrating a domain between two CPU
> pools.
> 8<------------
> while true; do
>     xl cpupool-migrate $1 Pool-1
>     xl cpupool-migrate $1 Pool-0
> done
> ---
>  xen/common/schedule.c |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
>
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 1ddfb22..28e063e 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -278,6 +278,9 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>      new_p = cpumask_first(c->cpu_valid);
>      for_each_vcpu ( d, v )
>      {
> +        spinlock_t *schedule_lock = per_cpu(schedule_data,
> +                                            v->processor).schedule_lock;
> +
>          vcpudata = v->sched_priv;
>
>          migrate_timer(&v->periodic_timer, new_p);
> @@ -285,7 +288,11 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>          migrate_timer(&v->poll_timer, new_p);
>
>          cpumask_setall(v->cpu_affinity);
> +
> +        spin_lock_irq(schedule_lock);
>          v->processor = new_p;
> +        spin_unlock_irq(schedule_lock);
> +
>          v->sched_priv = vcpu_priv[v->vcpu_id];
>          evtchn_move_pirqs(v);
>

-- 
Juergen Gross                 Principal Developer Operating Systems
PBG PDG ES&S SWE OS6                  Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions          e-mail: juergen.gross@ts.fujitsu.com
Mies van der Rohe Str. 8              Internet: ts.fujitsu.com
D-80807 Muenchen                      Company details: ts.fujitsu.com/imprint.html
Jan Beulich
2013-Oct-11 07:12 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
>>> On 10.10.13 at 20:27, Keir Fraser <keir.xen@gmail.com> wrote: > On 10/10/2013 19:01, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote: > >>> Just taking the lock for the old processor seemed sufficient to me as >>> anything seeing the new value would lock and unlock using the same new >>> value. But do we need to take the schedule_lock for the new processor >>> as well (in the right order of course)? >> >> David and I have been discussing this for a while, involving a >> whiteboard, and not come to a firm conclusion either way. >> >> From my point of view, holding the appropriate vcpu schedule lock >> entitles you to play with vcpu scheduling state, which involves >> following v->sched_priv which we update outside the critical region later. >> >> Only taking the one lock still leaves a race condition where another cpu >> can follow the new v->processor and obtain the schedule lock, at which >> point we have two threads both working on the internals of a vcpu. The >> change below certainly will fix the current bug of locking one spinlock >> and unlocking another. >> >> My gut feeling is that we do need to take both locks to be safe in terms >> of data access, but we would appreciate advice from someone more >> familiar with the scheduler locking. > > If it''s that tricky to work out, why not just take the two locks, > appropriately ordered? This isn''t a hot path.Shouldn''t we rather fix the locking mechanism itself, making vcpu_schedule_lock...() return the lock, such that the unlock will unavoidably use the correct lock? That would at once allow dropping vcpu_schedule_unlock...() altogether, which would be a good thing even if only considering the explicit uses of local_irq_disable() there (instead of using the right spin lock primitives). And if done that way, replacing the explicit uses of local_irq_enable() in the locking paths would also seem rather desirable - after all this defeats the spin lock primitives wanting to re-enable interrupts while waiting for a lock. Jan
Keir Fraser
2013-Oct-11 08:07 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On 11/10/2013 08:12, "Jan Beulich" <JBeulich@suse.com> wrote:>>>> On 10.10.13 at 20:27, Keir Fraser <keir.xen@gmail.com> wrote: >> On 10/10/2013 19:01, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote: >> >>>> Just taking the lock for the old processor seemed sufficient to me as >>>> anything seeing the new value would lock and unlock using the same new >>>> value. But do we need to take the schedule_lock for the new processor >>>> as well (in the right order of course)? >>> >>> David and I have been discussing this for a while, involving a >>> whiteboard, and not come to a firm conclusion either way. >>> >>> From my point of view, holding the appropriate vcpu schedule lock >>> entitles you to play with vcpu scheduling state, which involves >>> following v->sched_priv which we update outside the critical region later. >>> >>> Only taking the one lock still leaves a race condition where another cpu >>> can follow the new v->processor and obtain the schedule lock, at which >>> point we have two threads both working on the internals of a vcpu. The >>> change below certainly will fix the current bug of locking one spinlock >>> and unlocking another. >>> >>> My gut feeling is that we do need to take both locks to be safe in terms >>> of data access, but we would appreciate advice from someone more >>> familiar with the scheduler locking. >> >> If it''s that tricky to work out, why not just take the two locks, >> appropriately ordered? This isn''t a hot path. > > Shouldn''t we rather fix the locking mechanism itself, making > vcpu_schedule_lock...() return the lock, such that the unlock > will unavoidably use the correct lock? > > That would at once allow dropping vcpu_schedule_unlock...() > altogether, which would be a good thing even if only considering > the explicit uses of local_irq_disable() there (instead of using the > right spin lock primitives). And if done that way, replacing the > explicit uses of local_irq_enable() in the locking paths would also > seem rather desirable - after all this defeats the spin lock > primitives wanting to re-enable interrupts while waiting for a > lock.It feels to me like this is separate from Andrew''s concern. Also I think that holding the schedule_lock should protect you from changes to v->processor. But if that''s really unreasonable (e.g., inefficient) then your suggestion here is perfectly sensible. Improving the vcpu_schedule_lock_irq implementations to use the providied underlying spin_lock_irq functions would also be nice, I guess :) -- Keir> Jan >
Andrew Cooper
2013-Oct-11 09:02 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On 11/10/2013 09:07, Keir Fraser wrote:
> On 11/10/2013 08:12, "Jan Beulich" <JBeulich@suse.com> wrote:
>
>>>>> On 10.10.13 at 20:27, Keir Fraser <keir.xen@gmail.com> wrote:
>>> On 10/10/2013 19:01, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote:
>>>
>>>>> Just taking the lock for the old processor seemed sufficient to me as
>>>>> anything seeing the new value would lock and unlock using the same new
>>>>> value.  But do we need to take the schedule_lock for the new processor
>>>>> as well (in the right order of course)?
>>>> David and I have been discussing this for a while, involving a
>>>> whiteboard, and not come to a firm conclusion either way.
>>>>
>>>> From my point of view, holding the appropriate vcpu schedule lock
>>>> entitles you to play with vcpu scheduling state, which involves
>>>> following v->sched_priv, which we update outside the critical region
>>>> later.
>>>>
>>>> Only taking the one lock still leaves a race condition where another cpu
>>>> can follow the new v->processor and obtain the schedule lock, at which
>>>> point we have two threads both working on the internals of a vcpu.  The
>>>> change below certainly will fix the current bug of locking one spinlock
>>>> and unlocking another.
>>>>
>>>> My gut feeling is that we do need to take both locks to be safe in terms
>>>> of data access, but we would appreciate advice from someone more
>>>> familiar with the scheduler locking.
>>>
>>> If it's that tricky to work out, why not just take the two locks,
>>> appropriately ordered?  This isn't a hot path.
>>
>> Shouldn't we rather fix the locking mechanism itself, making
>> vcpu_schedule_lock...() return the lock, such that the unlock
>> will unavoidably use the correct lock?
>>
>> That would at once allow dropping vcpu_schedule_unlock...()
>> altogether, which would be a good thing even if only considering
>> the explicit uses of local_irq_disable() there (instead of using the
>> right spin lock primitives).  And if done that way, replacing the
>> explicit uses of local_irq_enable() in the locking paths would also
>> seem rather desirable - after all this defeats the spin lock
>> primitives wanting to re-enable interrupts while waiting for a
>> lock.
>
> It feels to me like this is separate from Andrew's concern.  Also I think
> that holding the schedule_lock should protect you from changes to
> v->processor.  But if that's really unreasonable (e.g., inefficient) then
> your suggestion here is perfectly sensible.
>
> Improving the vcpu_schedule_lock_irq implementations to use the provided
> underlying spin_lock_irq functions would also be nice, I guess :)

This is an orthogonal issue which could do with fixing.  Do note that
simply making changes to vcpu_schedule_lock() to return the appropriate
lock is not sufficient to fix this issue, as the race with changing
v->processor can cause two cpus to "successfully" lock the vcpu
schedule lock for a particular vcpu.

~Andrew
Jan Beulich
2013-Oct-11 09:32 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
>>> On 11.10.13 at 11:02, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > On 11/10/2013 09:07, Keir Fraser wrote: >> It feels to me like this is separate from Andrew''s concern. Also I think >> that holding the schedule_lock should protect you from changes to >> v->processor. But if that''s really unreasonable (e.g., inefficient) then >> your suggestion here is perfectly sensible. >> >> Improving the vcpu_schedule_lock_irq implementations to use the providied >> underlying spin_lock_irq functions would also be nice, I guess :) > > This is an orthogonal issue which could do with fixing. Do note that > simply making changes to vcpu_schedule_lock() to return the appropriate > lock is not sufficient to fix this issue, as the race with changing > v->processor can cause two cpus to "successfully" lock the vcpu schedule > lock for a particular vcpu.Yes indeed. It''s just that with such adjustments the fix here would become more "natural" in no longer having to open-code the schedule_lock access. I suppose you scanned the code for other cases like this, and there are none? Jan
David Vrabel
2013-Oct-11 09:36 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On 11/10/13 10:32, Jan Beulich wrote:
>>>> On 11.10.13 at 11:02, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 11/10/2013 09:07, Keir Fraser wrote:
>>> It feels to me like this is separate from Andrew's concern.  Also I think
>>> that holding the schedule_lock should protect you from changes to
>>> v->processor.  But if that's really unreasonable (e.g., inefficient) then
>>> your suggestion here is perfectly sensible.
>>>
>>> Improving the vcpu_schedule_lock_irq implementations to use the provided
>>> underlying spin_lock_irq functions would also be nice, I guess :)
>>
>> This is an orthogonal issue which could do with fixing.  Do note that
>> simply making changes to vcpu_schedule_lock() to return the appropriate
>> lock is not sufficient to fix this issue, as the race with changing
>> v->processor can cause two cpus to "successfully" lock the vcpu schedule
>> lock for a particular vcpu.
>
> Yes indeed.  It's just that with such adjustments the fix here
> would become more "natural" in no longer having to open-code
> the schedule_lock access.
>
> I suppose you scanned the code for other cases like this, and
> there are none?

Would it be sensible to get this fix in as-is?  It's a minimal fix,
which I think makes it more suitable for backporting to the stable
trees than a reworking of vcpu_schedule_lock() and friends.

David
Jan Beulich
2013-Oct-11 09:37 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
>>> On 11.10.13 at 11:36, David Vrabel <david.vrabel@citrix.com> wrote: > On 11/10/13 10:32, Jan Beulich wrote: >>>>> On 11.10.13 at 11:02, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >>> On 11/10/2013 09:07, Keir Fraser wrote: >>>> It feels to me like this is separate from Andrew''s concern. Also I think >>>> that holding the schedule_lock should protect you from changes to >>>> v->processor. But if that''s really unreasonable (e.g., inefficient) then >>>> your suggestion here is perfectly sensible. >>>> >>>> Improving the vcpu_schedule_lock_irq implementations to use the providied >>>> underlying spin_lock_irq functions would also be nice, I guess :) >>> >>> This is an orthogonal issue which could do with fixing. Do note that >>> simply making changes to vcpu_schedule_lock() to return the appropriate >>> lock is not sufficient to fix this issue, as the race with changing >>> v->processor can cause two cpus to "successfully" lock the vcpu schedule >>> lock for a particular vcpu. >> >> Yes indeed. It''s just that with such adjustments the fix here >> would become more "natural" in no longer having to open-code >> the schedule_lock access. >> >> I suppose you scanned the code for other cases like this, and >> there are none? > > Would it be sensible to get this fix in as-is? It''s a minimal fix that > I think would be more suitable for backporting to the stable trees > rather than a reworking of the vcpu_schedule_lock() and friends?Oh yes, I''m not denying that; it just needs an ack from either Keir or George. And I''m already working on the follow-on one. Jan
George Dunlap
2013-Oct-11 10:32 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On 10/10/13 18:29, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> sched_move_domain() changes v->processor for all the domain's VCPUs.
> If another domain, softirq etc. triggers a simultaneous call to
> vcpu_wake() (e.g., by setting an event channel as pending), then
> vcpu_wake() may lock one schedule lock and try to unlock another.
>
> vcpu_schedule_lock() attempts to handle this but only does so for the
> window between reading the schedule_lock from the per-CPU data and the
> spin_lock() call.  This does not help with sched_move_domain()
> changing v->processor between the calls to vcpu_schedule_lock() and
> vcpu_schedule_unlock().
>
> Fix the race by taking the schedule_lock for v->processor in
> sched_move_domain().
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Juergen Gross <juergen.gross@ts.fujitsu.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>
> Just taking the lock for the old processor seemed sufficient to me as
> anything seeing the new value would lock and unlock using the same new
> value.  But do we need to take the schedule_lock for the new processor
> as well (in the right order of course)?

So going through the code and trying to reconstruct all the state in my
head...

If you look at vcpu_migrate(), it grabs both locks.  But it looks like
the main purpose for that is so that we can call the migrate
SCHED_OP(), which for credit2 needs to do some mucking about with
runqueues, and thus needs both locks.  In the case of move_domain, this
is unnecessary, since it is removed from the old scheduler and then
added to the new one.

In a sense, Andrew, you're right: if you change v->processor, then you
no longer hold v's schedule lock (unless you do what vcpu_migrate()
does, and grab the lock of the processor you're moving to as well).  In
this case, it doesn't matter, because you're just about to release the
lock anyway.  But it may be misleading to people in the future trying
to figure out what the right thing is to do -- we should at the very
least put a comment saying that changing v->processor without holding
the new lock effectively unlocks v, so don't do any more changes to the
processor state.  (Or we can do as Keir says, and do the
double-locking, but that's a bit of a pain, as you can see from
vcpu_migrate().)

But I think this patch is still not quite right: both v->processor and
per_cpu(schedule_data, ...).schedule_lock may change under your feet;
so you always need to do the lock in a loop, checking to make sure that
you *still* have the right lock after you have actually grabbed it.

The gears on this code are rusty, however, so please do double-check my
thinking here...

 -George
George Dunlap
2013-Oct-11 10:36 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On 11/10/13 08:12, Jan Beulich wrote:
>>>> On 10.10.13 at 20:27, Keir Fraser <keir.xen@gmail.com> wrote:
>> On 10/10/2013 19:01, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote:
>>
>>>> Just taking the lock for the old processor seemed sufficient to me as
>>>> anything seeing the new value would lock and unlock using the same new
>>>> value.  But do we need to take the schedule_lock for the new processor
>>>> as well (in the right order of course)?
>>> David and I have been discussing this for a while, involving a
>>> whiteboard, and not come to a firm conclusion either way.
>>>
>>> From my point of view, holding the appropriate vcpu schedule lock
>>> entitles you to play with vcpu scheduling state, which involves
>>> following v->sched_priv, which we update outside the critical region
>>> later.
>>>
>>> Only taking the one lock still leaves a race condition where another cpu
>>> can follow the new v->processor and obtain the schedule lock, at which
>>> point we have two threads both working on the internals of a vcpu.  The
>>> change below certainly will fix the current bug of locking one spinlock
>>> and unlocking another.
>>>
>>> My gut feeling is that we do need to take both locks to be safe in terms
>>> of data access, but we would appreciate advice from someone more
>>> familiar with the scheduler locking.
>>
>> If it's that tricky to work out, why not just take the two locks,
>> appropriately ordered?  This isn't a hot path.
>
> Shouldn't we rather fix the locking mechanism itself, making
> vcpu_schedule_lock...() return the lock, such that the unlock
> will unavoidably use the correct lock?

That's an idea; but I half wonder if it wouldn't be better to actually
keep vcpu_schedule_unlock(), but pass it the old lock.  Then for debug
builds we can ASSERT that the lock hasn't changed.

 -George
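A sketch of the variant George describes (hypothetical, for illustration):
keep an explicit unlock helper, but pass it the lock that was taken, so debug
builds can catch the lock having changed underneath the caller:

/* Hypothetical unlock helper that sanity-checks the lock it is given. */
static inline void vcpu_schedule_unlock_irq(spinlock_t *lock, struct vcpu *v)
{
    /* Catch v having been repointed since the lock was taken. */
    ASSERT(lock == per_cpu(schedule_data, v->processor).schedule_lock);
    spin_unlock_irq(lock);
}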
Dario Faggioli
2013-Oct-11 11:15 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On ven, 2013-10-11 at 11:32 +0100, George Dunlap wrote:
> So going through the code and trying to reconstruct all the state in my
> head...
>
> If you look at vcpu_migrate(), it grabs both locks.  But it looks like
> the main purpose for that is so that we can call the migrate SCHED_OP(),
> which for credit2 needs to do some mucking about with runqueues, and
> thus needs both locks.
>
Just to make sure I understood what's going on, the SCHED_OP you're
referring to is SCHED_OP(..,migrate,..) here, right?

> In the case of move_domain, this is unnecessary,
> since it is removed from the old scheduler and then added to the new one.
>
I think that too, and that's why I wouldn't take both locks: it'd
actually be misleading rather than enlightening for people reading the
code, at least that's how I see it.

Perhaps we can put a comment somewhere (as George is also saying).

Regarding the patch, I personally like Jan's idea.

> But I think this patch is still not quite right: both v->processor and
> per_cpu(schedule_data, ...).schedule_lock may change under your feet; so
> you always need to do the lock in a loop, checking to make sure that you
> *still* have the right lock after you have actually grabbed it.
>
Which, if I'm not mistaken, we sort of get for free if we go Jan's way,
don't we?

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
George Dunlap
2013-Oct-11 11:32 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On 11/10/13 12:15, Dario Faggioli wrote:
> On ven, 2013-10-11 at 11:32 +0100, George Dunlap wrote:
>> So going through the code and trying to reconstruct all the state in my
>> head...
>>
>> If you look at vcpu_migrate(), it grabs both locks.  But it looks like
>> the main purpose for that is so that we can call the migrate SCHED_OP(),
>> which for credit2 needs to do some mucking about with runqueues, and
>> thus needs both locks.
>>
> Just to make sure I understood what's going on, the SCHED_OP you're
> referring to is SCHED_OP(..,migrate,..) here, right?
>
>> In the case of move_domain, this is unnecessary,
>> since it is removed from the old scheduler and then added to the new one.
>>
> I think that too, and that's why I wouldn't take both locks: it'd
> actually be misleading rather than enlightening for people reading the
> code, at least that's how I see it.
>
> Perhaps we can put a comment somewhere (as George is also saying).
>
> Regarding the patch, I personally like Jan's idea.
>
>> But I think this patch is still not quite right: both v->processor and
>> per_cpu(schedule_data, ...).schedule_lock may change under your feet; so
>> you always need to do the lock in a loop, checking to make sure that you
>> *still* have the right lock after you have actually grabbed it.
>>
> Which, if I'm not mistaken, we sort of get for free if we go Jan's way,
> don't we?

You mean, we could just call vcpu_schedule_lock..() instead of writing
bespoke loop code?  Sure, that's definitely an advantage.

 -George
Keir Fraser
2013-Oct-11 11:47 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On 10/10/2013 18:29, "David Vrabel" <david.vrabel@citrix.com> wrote:> From: David Vrabel <david.vrabel@citrix.com> > > sched_move_domain() changes v->processor for all the domain''s VCPUs. > If another domain, softirq etc. triggers a simultaneous call to > vcpu_wake() (e.g., by setting an event channel as pending), then > vcpu_wake() may lock one schedule lock and try to unlock another. > > vcpu_schedule_lock() attempts to handle this but only does so for the > window between reading the schedule_lock from the per-CPU data and the > spin_lock() call. This does not help with sched_move_domain() > changing v->processor between the calls to vcpu_schedule_lock() and > vcpu_schedule_unlock(). > > Fix the race by taking the schedule_lock for v->processor in > sched_move_domain(). > > Signed-off-by: David Vrabel <david.vrabel@citrix.com> > Cc: George Dunlap <george.dunlap@eu.citrix.com> > Cc: Juergen Gross <juergen.gross@ts.fujitsu.com> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>Acked-by: Keir Fraser <keir@xen.org>
Dario Faggioli
2013-Oct-11 11:49 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On ven, 2013-10-11 at 12:32 +0100, George Dunlap wrote:
> On 11/10/13 12:15, Dario Faggioli wrote:
> > On ven, 2013-10-11 at 11:32 +0100, George Dunlap wrote:
>
> >> But I think this patch is still not quite right: both v->processor and
> >> per_cpu(schedule_data, ...).schedule_lock may change under your feet; so
> >> you always need to do the lock in a loop, checking to make sure that you
> >> *still* have the right lock after you have actually grabbed it.
> >>
> > Which, if I'm not mistaken, we sort of get for free if we go Jan's way,
> > don't we?
>
> You mean, we could just call vcpu_schedule_lock..() instead of writing
> bespoke loop code?  Sure, that's definitely an advantage.
>
Yes, provided we go Jan's way and have it return the lock.

That way, we'll have vcpu_schedule_lock() responsible for both finding
the proper lock, and doing it in the right way (with the loop, as you're
suggesting above), and returning it to the caller.

That would result in code that is both correct and looks better (no
per_cpu().schedule_lock in the caller), so it's a win-win. :-)

Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Jan Beulich
2013-Oct-11 12:03 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
>>> On 11.10.13 at 13:49, Dario Faggioli <dario.faggioli@citrix.com> wrote: > On ven, 2013-10-11 at 12:32 +0100, George Dunlap wrote: >> On 11/10/13 12:15, Dario Faggioli wrote: >> > On ven, 2013-10-11 at 11:32 +0100, George Dunlap wrote: > >> >> But I think this patch is still not quite right: both v->processor and >> >> per_cpu(schedule_data, ...).schedule_lock may change under your feet; so >> >> you always need to do the lock in a loop, checking to make sure that you >> >> *still* have the right lock after you have actually grabbed it. >> >> >> > Which, if I''m not mistaken, we sort of get for free it we go Jan''s way, >> > don''t we? >> >> You mean, we could just call vcpu_schedule_lock..() instead of writing a >> bespoke loop code? Sure, that''s definitely an advantage. >> > Yes, provided we go Jan''s way and have it return the lock. > > That way, we''ll have vcpu_schedule_lock() responsible for both finding > the proper lock, and doing it in the right way (with the loop, as you''re > suggesting above), and returning it to the caller. > > That would mean result in code that is both correct and looks better (no > per_cpu().schedule_lock in the caller), so it''s a win win. :-)So right now I have that change on top of David''s. As David''s alone is an improvement (even if still leaving a hole as you validly pointed out), I''d think we should commit it as is and then discuss the details of my additional change (in particular the question of whether to keep [vp]cpu_schedule_unlock...(), which I for now the patch is dropping). Jan
Jan Beulich
2013-Oct-11 12:20 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
>>> On 11.10.13 at 11:32, "Jan Beulich" <JBeulich@suse.com> wrote: > I suppose you scanned the code for other cases like this, and > there are none?Actually I did just now, and I think there''s a similar issue in credit2''s init_pcpu(): After taking pcpu_schedule_lock(cpu) it alters schedule_lock and hence effectively drops the locking, yet continues to do other stuff before in fact releasing it. What is being done prior to unlocking, however, looks to be unrelated to the lock being held, and rather independently (of the effective releasing) wanting &rqd->lock held. Jan
George Dunlap
2013-Oct-11 14:39 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On 11/10/13 13:20, Jan Beulich wrote:
>>>> On 11.10.13 at 11:32, "Jan Beulich" <JBeulich@suse.com> wrote:
>> I suppose you scanned the code for other cases like this, and
>> there are none?
>
> Actually I did just now, and I think there's a similar issue in
> credit2's init_pcpu(): after taking pcpu_schedule_lock(cpu) it
> alters schedule_lock and hence effectively drops the locking,
> yet continues to do other stuff before in fact releasing it.
>
> What is being done prior to unlocking, however, looks to be
> unrelated to the lock being held, and rather independently
> (of the effective releasing) wanting &rqd->lock held.

I can't quite make out what you mean in the last sentence; but setting
the cpu in rqd->idle and rqd->active should certainly be protected by
rqd->lock, and it certainly looks like it's not being grabbed at the
moment.

Hmm -- I think we may need to do some kind of fancy looping thing like
we do in vcpu_migrate, to lock both the current schedule lock and
rqd->lock; with the difference, I suppose, that the rqd lock won't
change (since the assignment of cpu->runqueue at the moment is static).

Let me put this on my list of things to do before the release.

 -George
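A rough sketch of the ordering George outlines (hypothetical; the names rqd,
rqd->idle, rqd->active, and struct csched2_runqueue_data follow
xen/common/sched_credit2.c, but the eventual fix may well differ):

/* Hypothetical rework of credit2's init_pcpu() locking. */
static void init_pcpu_locked(int cpu, struct csched2_runqueue_data *rqd)
{
    unsigned long flags;
    spinlock_t *old_lock = per_cpu(schedule_data, cpu).schedule_lock;

    spin_lock_irqsave(old_lock, flags);
    spin_lock(&rqd->lock);  /* cpu->runqueue is static, so no retry loop */

    /* Runqueue updates now happen with rqd->lock held. */
    cpumask_set_cpu(cpu, &rqd->idle);
    cpumask_set_cpu(cpu, &rqd->active);

    /*
     * Repoint the per-cpu lock while holding both: anyone following
     * the new pointer serializes on rqd->lock, which we still hold.
     */
    per_cpu(schedule_data, cpu).schedule_lock = &rqd->lock;

    spin_unlock(&rqd->lock);
    spin_unlock_irqrestore(old_lock, flags);
}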
George Dunlap
2013-Oct-11 14:45 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
create ^
title it credit2: init_pcpu doesn't grab runqueue lock before pointing
pcpu schedule_lock to it
thanks

On Fri, Oct 11, 2013 at 1:20 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 11.10.13 at 11:32, "Jan Beulich" <JBeulich@suse.com> wrote:
>> I suppose you scanned the code for other cases like this, and
>> there are none?
>
> Actually I did just now, and I think there's a similar issue in
> credit2's init_pcpu(): after taking pcpu_schedule_lock(cpu) it
> alters schedule_lock and hence effectively drops the locking,
> yet continues to do other stuff before in fact releasing it.
>
> What is being done prior to unlocking, however, looks to be
> unrelated to the lock being held, and rather independently
> (of the effective releasing) wanting &rqd->lock held.
>
> Jan
xen@bugs.xenproject.org
2013-Oct-11 15:00 UTC
Processed: Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
Processing commands for xen@bugs.xenproject.org:

> create ^
Created new bug #21 rooted at `<5258094502000078000FA917@nat28.tlf.novell.com>'
Title: `Re: [Xen-devel] [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()'

> title it credit2: init_pcpu doesn't grab runqueue lock before pointing
Set title for #21 to `credit2: init_pcpu doesn't grab runqueue lock before pointing'

> pcpu schedule_lock to it
Command failed: Unknown command `pcpu'. at /srv/xen-devel-bugs/lib/emesinae/control.pl line 437, <M> line 42.

Stop processing here.

Modified/created Bugs:
 - 21: http://bugs.xenproject.org/xen/bug/21 (new)

---
Xen Hypervisor Bug Tracker
See http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen for information on
reporting bugs
Contact xen-bugs-owner@bugs.xenproject.org with any infrastructure issues