David Vrabel
2013-Oct-10 17:29 UTC
[PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
From: David Vrabel <david.vrabel@citrix.com>

sched_move_domain() changes v->processor for all the domain's VCPUs.
If another domain, softirq etc. triggers a simultaneous call to
vcpu_wake() (e.g., by setting an event channel as pending), then
vcpu_wake() may lock one schedule lock and try to unlock another.

vcpu_schedule_lock() attempts to handle this but only does so for the
window between reading the schedule_lock from the per-CPU data and the
spin_lock() call.  This does not help with sched_move_domain()
changing v->processor between the calls to vcpu_schedule_lock() and
vcpu_schedule_unlock().

Fix the race by taking the schedule_lock for v->processor in
sched_move_domain().

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Juergen Gross <juergen.gross@ts.fujitsu.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
---

Just taking the lock for the old processor seemed sufficient to me as
anything seeing the new value would lock and unlock using the same new
value.  But do we need to take the schedule_lock for the new processor
as well (in the right order of course)?

This is reproducible by constantly migrating a domain between two CPU
pools.
8<------------
while true; do
    xl cpupool-migrate $1 Pool-1
    xl cpupool-migrate $1 Pool-0
done
---
 xen/common/schedule.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 1ddfb22..28e063e 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -278,6 +278,9 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
     new_p = cpumask_first(c->cpu_valid);
     for_each_vcpu ( d, v )
     {
+        spinlock_t *schedule_lock = per_cpu(schedule_data,
+                                            v->processor).schedule_lock;
+
         vcpudata = v->sched_priv;

         migrate_timer(&v->periodic_timer, new_p);
@@ -285,7 +288,11 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
         migrate_timer(&v->poll_timer, new_p);

         cpumask_setall(v->cpu_affinity);
+
+        spin_lock_irq(schedule_lock);
         v->processor = new_p;
+        spin_unlock_irq(schedule_lock);
+
         v->sched_priv = vcpu_priv[v->vcpu_id];
         evtchn_move_pirqs(v);
--
1.7.2.5
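For context, the locking helper the commit message refers to retries until the
lock it took still matches v->processor.  The following is a paraphrased
sketch of that helper, not the verbatim Xen source; it assumes the
per_cpu()/schedule_data definitions used alongside xen/common/schedule.c:

static inline void vcpu_schedule_lock(struct vcpu *v)
{
    unsigned int cpu;

    for ( ; ; )
    {
        cpu = v->processor;
        spin_lock(per_cpu(schedule_data, cpu).schedule_lock);
        /* Retry if the vcpu moved while we were waiting for the lock. */
        if ( likely(v->processor == cpu) )
            break;
        spin_unlock(per_cpu(schedule_data, cpu).schedule_lock);
    }
    /*
     * Nothing pins v->processor from here on: the matching unlock
     * re-reads it, and can therefore release a different lock than
     * the one taken above -- exactly the window the patch closes.
     */
}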
Andrew Cooper
2013-Oct-10 18:01 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On 10/10/13 18:29, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> sched_move_domain() changes v->processor for all the domain's VCPUs.
> If another domain, softirq etc. triggers a simultaneous call to
> vcpu_wake() (e.g., by setting an event channel as pending), then
> vcpu_wake() may lock one schedule lock and try to unlock another.
>
> vcpu_schedule_lock() attempts to handle this but only does so for the
> window between reading the schedule_lock from the per-CPU data and the
> spin_lock() call.  This does not help with sched_move_domain()
> changing v->processor between the calls to vcpu_schedule_lock() and
> vcpu_schedule_unlock().
>
> Fix the race by taking the schedule_lock for v->processor in
> sched_move_domain().
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Juergen Gross <juergen.gross@ts.fujitsu.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>
> Just taking the lock for the old processor seemed sufficient to me as
> anything seeing the new value would lock and unlock using the same new
> value.  But do we need to take the schedule_lock for the new processor
> as well (in the right order of course)?

David and I have been discussing this for a while, involving a
whiteboard, and not come to a firm conclusion either way.

From my point of view, holding the appropriate vcpu schedule lock
entitles you to play with vcpu scheduling state, which involves
following v->sched_priv, which we update outside the critical region
later.

Only taking the one lock still leaves a race condition where another
cpu can follow the new v->processor and obtain the schedule lock, at
which point we have two threads both working on the internals of a
vcpu.  The change below certainly will fix the current bug of locking
one spinlock and unlocking another.

My gut feeling is that we do need to take both locks to be safe in
terms of data access, but we would appreciate advice from someone more
familiar with the scheduler locking.

~Andrew

> This is reproducible by constantly migrating a domain between two CPU
> pools.
> 8<------------
> while true; do
>     xl cpupool-migrate $1 Pool-1
>     xl cpupool-migrate $1 Pool-0
> done
> ---
>  xen/common/schedule.c |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
>
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 1ddfb22..28e063e 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -278,6 +278,9 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>      new_p = cpumask_first(c->cpu_valid);
>      for_each_vcpu ( d, v )
>      {
> +        spinlock_t *schedule_lock = per_cpu(schedule_data,
> +                                            v->processor).schedule_lock;
> +
>          vcpudata = v->sched_priv;
>
>          migrate_timer(&v->periodic_timer, new_p);
> @@ -285,7 +288,11 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>          migrate_timer(&v->poll_timer, new_p);
>
>          cpumask_setall(v->cpu_affinity);
> +
> +        spin_lock_irq(schedule_lock);
>          v->processor = new_p;
> +        spin_unlock_irq(schedule_lock);
> +
>          v->sched_priv = vcpu_priv[v->vcpu_id];
>          evtchn_move_pirqs(v);
>
Keir Fraser
2013-Oct-10 18:27 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On 10/10/2013 19:01, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote:>> Just taking the lock for the old processor seemed sufficient to me as >> anything seeing the new value would lock and unlock using the same new >> value. But do we need to take the schedule_lock for the new processor >> as well (in the right order of course)? > > David and I have been discussing this for a while, involving a > whiteboard, and not come to a firm conclusion either way. > > From my point of view, holding the appropriate vcpu schedule lock > entitles you to play with vcpu scheduling state, which involves > following v->sched_priv which we update outside the critical region later. > > Only taking the one lock still leaves a race condition where another cpu > can follow the new v->processor and obtain the schedule lock, at which > point we have two threads both working on the internals of a vcpu. The > change below certainly will fix the current bug of locking one spinlock > and unlocking another. > > My gut feeling is that we do need to take both locks to be safe in terms > of data access, but we would appreciate advice from someone more > familiar with the scheduler locking.If it''s that tricky to work out, why not just take the two locks, appropriately ordered? This isn''t a hot path. -- Keir
Juergen Gross
2013-Oct-11 06:37 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On 10.10.2013 19:29, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> sched_move_domain() changes v->processor for all the domain's VCPUs.
> If another domain, softirq etc. triggers a simultaneous call to
> vcpu_wake() (e.g., by setting an event channel as pending), then
> vcpu_wake() may lock one schedule lock and try to unlock another.
>
> vcpu_schedule_lock() attempts to handle this but only does so for the
> window between reading the schedule_lock from the per-CPU data and the
> spin_lock() call.  This does not help with sched_move_domain()
> changing v->processor between the calls to vcpu_schedule_lock() and
> vcpu_schedule_unlock().
>
> Fix the race by taking the schedule_lock for v->processor in
> sched_move_domain().
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Juergen Gross <juergen.gross@ts.fujitsu.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>
> Just taking the lock for the old processor seemed sufficient to me as
> anything seeing the new value would lock and unlock using the same new
> value.  But do we need to take the schedule_lock for the new processor
> as well (in the right order of course)?

I don't think it is necessary to take both locks.  There can't be any
scheduler-specific (e.g. credit) activity on the vcpu(s), as they are
removed from the source scheduler before, and will be added to the
target scheduler after, switching the processor.

BTW: good catch!  I think this explains a problem I have been searching
for some time now...

Acked-by: Juergen Gross <juergen.gross@ts.fujitsu.com>

> This is reproducible by constantly migrating a domain between two CPU
> pools.
> 8<------------
> while true; do
>     xl cpupool-migrate $1 Pool-1
>     xl cpupool-migrate $1 Pool-0
> done
> ---
>  xen/common/schedule.c |    7 +++++++
>  1 files changed, 7 insertions(+), 0 deletions(-)
>
> diff --git a/xen/common/schedule.c b/xen/common/schedule.c
> index 1ddfb22..28e063e 100644
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -278,6 +278,9 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>      new_p = cpumask_first(c->cpu_valid);
>      for_each_vcpu ( d, v )
>      {
> +        spinlock_t *schedule_lock = per_cpu(schedule_data,
> +                                            v->processor).schedule_lock;
> +
>          vcpudata = v->sched_priv;
>
>          migrate_timer(&v->periodic_timer, new_p);
> @@ -285,7 +288,11 @@ int sched_move_domain(struct domain *d, struct cpupool *c)
>          migrate_timer(&v->poll_timer, new_p);
>
>          cpumask_setall(v->cpu_affinity);
> +
> +        spin_lock_irq(schedule_lock);
>          v->processor = new_p;
> +        spin_unlock_irq(schedule_lock);
> +
>          v->sched_priv = vcpu_priv[v->vcpu_id];
>          evtchn_move_pirqs(v);
>

-- 
Juergen Gross                 Principal Developer Operating Systems
PBG PDG ES&S SWE OS6                  Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions          e-mail: juergen.gross@ts.fujitsu.com
Mies van der Rohe Str. 8              Internet: ts.fujitsu.com
D-80807 Muenchen                      Company details: ts.fujitsu.com/imprint.html
Jan Beulich
2013-Oct-11 07:12 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
>>> On 10.10.13 at 20:27, Keir Fraser <keir.xen@gmail.com> wrote: > On 10/10/2013 19:01, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote: > >>> Just taking the lock for the old processor seemed sufficient to me as >>> anything seeing the new value would lock and unlock using the same new >>> value. But do we need to take the schedule_lock for the new processor >>> as well (in the right order of course)? >> >> David and I have been discussing this for a while, involving a >> whiteboard, and not come to a firm conclusion either way. >> >> From my point of view, holding the appropriate vcpu schedule lock >> entitles you to play with vcpu scheduling state, which involves >> following v->sched_priv which we update outside the critical region later. >> >> Only taking the one lock still leaves a race condition where another cpu >> can follow the new v->processor and obtain the schedule lock, at which >> point we have two threads both working on the internals of a vcpu. The >> change below certainly will fix the current bug of locking one spinlock >> and unlocking another. >> >> My gut feeling is that we do need to take both locks to be safe in terms >> of data access, but we would appreciate advice from someone more >> familiar with the scheduler locking. > > If it''s that tricky to work out, why not just take the two locks, > appropriately ordered? This isn''t a hot path.Shouldn''t we rather fix the locking mechanism itself, making vcpu_schedule_lock...() return the lock, such that the unlock will unavoidably use the correct lock? That would at once allow dropping vcpu_schedule_unlock...() altogether, which would be a good thing even if only considering the explicit uses of local_irq_disable() there (instead of using the right spin lock primitives). And if done that way, replacing the explicit uses of local_irq_enable() in the locking paths would also seem rather desirable - after all this defeats the spin lock primitives wanting to re-enable interrupts while waiting for a lock. Jan
Keir Fraser
2013-Oct-11 08:07 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On 11/10/2013 08:12, "Jan Beulich" <JBeulich@suse.com> wrote:>>>> On 10.10.13 at 20:27, Keir Fraser <keir.xen@gmail.com> wrote: >> On 10/10/2013 19:01, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote: >> >>>> Just taking the lock for the old processor seemed sufficient to me as >>>> anything seeing the new value would lock and unlock using the same new >>>> value. But do we need to take the schedule_lock for the new processor >>>> as well (in the right order of course)? >>> >>> David and I have been discussing this for a while, involving a >>> whiteboard, and not come to a firm conclusion either way. >>> >>> From my point of view, holding the appropriate vcpu schedule lock >>> entitles you to play with vcpu scheduling state, which involves >>> following v->sched_priv which we update outside the critical region later. >>> >>> Only taking the one lock still leaves a race condition where another cpu >>> can follow the new v->processor and obtain the schedule lock, at which >>> point we have two threads both working on the internals of a vcpu. The >>> change below certainly will fix the current bug of locking one spinlock >>> and unlocking another. >>> >>> My gut feeling is that we do need to take both locks to be safe in terms >>> of data access, but we would appreciate advice from someone more >>> familiar with the scheduler locking. >> >> If it''s that tricky to work out, why not just take the two locks, >> appropriately ordered? This isn''t a hot path. > > Shouldn''t we rather fix the locking mechanism itself, making > vcpu_schedule_lock...() return the lock, such that the unlock > will unavoidably use the correct lock? > > That would at once allow dropping vcpu_schedule_unlock...() > altogether, which would be a good thing even if only considering > the explicit uses of local_irq_disable() there (instead of using the > right spin lock primitives). And if done that way, replacing the > explicit uses of local_irq_enable() in the locking paths would also > seem rather desirable - after all this defeats the spin lock > primitives wanting to re-enable interrupts while waiting for a > lock.It feels to me like this is separate from Andrew''s concern. Also I think that holding the schedule_lock should protect you from changes to v->processor. But if that''s really unreasonable (e.g., inefficient) then your suggestion here is perfectly sensible. Improving the vcpu_schedule_lock_irq implementations to use the providied underlying spin_lock_irq functions would also be nice, I guess :) -- Keir> Jan >
Andrew Cooper
2013-Oct-11 09:02 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On 11/10/2013 09:07, Keir Fraser wrote:
> On 11/10/2013 08:12, "Jan Beulich" <JBeulich@suse.com> wrote:
>
>>>>> On 10.10.13 at 20:27, Keir Fraser <keir.xen@gmail.com> wrote:
>>> On 10/10/2013 19:01, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote:
>>>
>>>>> Just taking the lock for the old processor seemed sufficient to me as
>>>>> anything seeing the new value would lock and unlock using the same new
>>>>> value.  But do we need to take the schedule_lock for the new processor
>>>>> as well (in the right order of course)?
>>>> David and I have been discussing this for a while, involving a
>>>> whiteboard, and not come to a firm conclusion either way.
>>>>
>>>> From my point of view, holding the appropriate vcpu schedule lock
>>>> entitles you to play with vcpu scheduling state, which involves
>>>> following v->sched_priv, which we update outside the critical region
>>>> later.
>>>>
>>>> Only taking the one lock still leaves a race condition where another cpu
>>>> can follow the new v->processor and obtain the schedule lock, at which
>>>> point we have two threads both working on the internals of a vcpu.  The
>>>> change below certainly will fix the current bug of locking one spinlock
>>>> and unlocking another.
>>>>
>>>> My gut feeling is that we do need to take both locks to be safe in terms
>>>> of data access, but we would appreciate advice from someone more
>>>> familiar with the scheduler locking.
>>>
>>> If it's that tricky to work out, why not just take the two locks,
>>> appropriately ordered?  This isn't a hot path.
>>
>> Shouldn't we rather fix the locking mechanism itself, making
>> vcpu_schedule_lock...() return the lock, such that the unlock
>> will unavoidably use the correct lock?
>>
>> That would at once allow dropping vcpu_schedule_unlock...()
>> altogether, which would be a good thing even if only considering
>> the explicit uses of local_irq_disable() there (instead of using the
>> right spin lock primitives).  And if done that way, replacing the
>> explicit uses of local_irq_enable() in the locking paths would also
>> seem rather desirable - after all this defeats the spin lock
>> primitives wanting to re-enable interrupts while waiting for a
>> lock.
>
> It feels to me like this is separate from Andrew's concern.  Also I think
> that holding the schedule_lock should protect you from changes to
> v->processor.  But if that's really unreasonable (e.g., inefficient) then
> your suggestion here is perfectly sensible.
>
> Improving the vcpu_schedule_lock_irq implementations to use the provided
> underlying spin_lock_irq functions would also be nice, I guess :)

This is an orthogonal issue which could do with fixing.  Do note that
simply making changes to vcpu_schedule_lock() to return the appropriate
lock is not sufficient to fix this issue, as the race with changing
v->processor can cause two cpus to "successfully" lock the vcpu
schedule lock for a particular vcpu.

~Andrew
Jan Beulich
2013-Oct-11 09:32 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
>>> On 11.10.13 at 11:02, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > On 11/10/2013 09:07, Keir Fraser wrote: >> It feels to me like this is separate from Andrew''s concern. Also I think >> that holding the schedule_lock should protect you from changes to >> v->processor. But if that''s really unreasonable (e.g., inefficient) then >> your suggestion here is perfectly sensible. >> >> Improving the vcpu_schedule_lock_irq implementations to use the providied >> underlying spin_lock_irq functions would also be nice, I guess :) > > This is an orthogonal issue which could do with fixing. Do note that > simply making changes to vcpu_schedule_lock() to return the appropriate > lock is not sufficient to fix this issue, as the race with changing > v->processor can cause two cpus to "successfully" lock the vcpu schedule > lock for a particular vcpu.Yes indeed. It''s just that with such adjustments the fix here would become more "natural" in no longer having to open-code the schedule_lock access. I suppose you scanned the code for other cases like this, and there are none? Jan
David Vrabel
2013-Oct-11 09:36 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On 11/10/13 10:32, Jan Beulich wrote:
>>>> On 11.10.13 at 11:02, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>> On 11/10/2013 09:07, Keir Fraser wrote:
>>> It feels to me like this is separate from Andrew's concern.  Also I think
>>> that holding the schedule_lock should protect you from changes to
>>> v->processor.  But if that's really unreasonable (e.g., inefficient) then
>>> your suggestion here is perfectly sensible.
>>>
>>> Improving the vcpu_schedule_lock_irq implementations to use the provided
>>> underlying spin_lock_irq functions would also be nice, I guess :)
>>
>> This is an orthogonal issue which could do with fixing.  Do note that
>> simply making changes to vcpu_schedule_lock() to return the appropriate
>> lock is not sufficient to fix this issue, as the race with changing
>> v->processor can cause two cpus to "successfully" lock the vcpu schedule
>> lock for a particular vcpu.
>
> Yes indeed.  It's just that with such adjustments the fix here
> would become more "natural" in no longer having to open-code
> the schedule_lock access.
>
> I suppose you scanned the code for other cases like this, and
> there are none?

Would it be sensible to get this fix in as-is?  It's a minimal fix,
which I think makes it more suitable for backporting to the stable
trees than a reworking of vcpu_schedule_lock() and friends.

David
Jan Beulich
2013-Oct-11 09:37 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
>>> On 11.10.13 at 11:36, David Vrabel <david.vrabel@citrix.com> wrote: > On 11/10/13 10:32, Jan Beulich wrote: >>>>> On 11.10.13 at 11:02, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >>> On 11/10/2013 09:07, Keir Fraser wrote: >>>> It feels to me like this is separate from Andrew''s concern. Also I think >>>> that holding the schedule_lock should protect you from changes to >>>> v->processor. But if that''s really unreasonable (e.g., inefficient) then >>>> your suggestion here is perfectly sensible. >>>> >>>> Improving the vcpu_schedule_lock_irq implementations to use the providied >>>> underlying spin_lock_irq functions would also be nice, I guess :) >>> >>> This is an orthogonal issue which could do with fixing. Do note that >>> simply making changes to vcpu_schedule_lock() to return the appropriate >>> lock is not sufficient to fix this issue, as the race with changing >>> v->processor can cause two cpus to "successfully" lock the vcpu schedule >>> lock for a particular vcpu. >> >> Yes indeed. It''s just that with such adjustments the fix here >> would become more "natural" in no longer having to open-code >> the schedule_lock access. >> >> I suppose you scanned the code for other cases like this, and >> there are none? > > Would it be sensible to get this fix in as-is? It''s a minimal fix that > I think would be more suitable for backporting to the stable trees > rather than a reworking of the vcpu_schedule_lock() and friends?Oh yes, I''m not denying that; it just needs an ack from either Keir or George. And I''m already working on the follow-on one. Jan
George Dunlap
2013-Oct-11 10:32 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On 10/10/13 18:29, David Vrabel wrote:
> From: David Vrabel <david.vrabel@citrix.com>
>
> sched_move_domain() changes v->processor for all the domain's VCPUs.
> If another domain, softirq etc. triggers a simultaneous call to
> vcpu_wake() (e.g., by setting an event channel as pending), then
> vcpu_wake() may lock one schedule lock and try to unlock another.
>
> vcpu_schedule_lock() attempts to handle this but only does so for the
> window between reading the schedule_lock from the per-CPU data and the
> spin_lock() call.  This does not help with sched_move_domain()
> changing v->processor between the calls to vcpu_schedule_lock() and
> vcpu_schedule_unlock().
>
> Fix the race by taking the schedule_lock for v->processor in
> sched_move_domain().
>
> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Juergen Gross <juergen.gross@ts.fujitsu.com>
> Cc: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
>
> Just taking the lock for the old processor seemed sufficient to me as
> anything seeing the new value would lock and unlock using the same new
> value.  But do we need to take the schedule_lock for the new processor
> as well (in the right order of course)?

So going through the code and trying to reconstruct all the state in my
head...

If you look at vcpu_migrate(), it grabs both locks.  But it looks like
the main purpose for that is so that we can call the migrate
SCHED_OP(), which for credit2 needs to do some mucking about with
runqueues, and thus needs both locks.  In the case of move_domain, this
is unnecessary, since it is removed from the old scheduler and then
added to the new one.

In a sense, Andrew, you're right: if you change v->processor, then you
no longer hold v's schedule lock (unless you do what vcpu_migrate()
does, and grab the lock of the processor you're moving to as well).  In
this case, it doesn't matter, because you're just about to release the
lock anyway.  But it may be misleading to people in the future trying
to figure out what the right thing is to do -- we should at the very
least put a comment saying that changing v->processor without holding
the new lock effectively unlocks v, so don't do any more changes to the
processor state.  (Or we can do as Keir says, and do the
double-locking, but that's a bit of a pain, as you can see from
vcpu_migrate().)

But I think this patch is still not quite right: both v->processor and
per_cpu(schedule_data, ...).schedule_lock may change under your feet;
so you always need to do the lock in a loop, checking to make sure that
you *still* have the right lock after you have actually grabbed it.

The gears on this code are rusty, however, so please do double-check my
thinking here...

 -George
George Dunlap
2013-Oct-11 10:36 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On 11/10/13 08:12, Jan Beulich wrote:
>>>> On 10.10.13 at 20:27, Keir Fraser <keir.xen@gmail.com> wrote:
>> On 10/10/2013 19:01, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote:
>>
>>>> Just taking the lock for the old processor seemed sufficient to me as
>>>> anything seeing the new value would lock and unlock using the same new
>>>> value.  But do we need to take the schedule_lock for the new processor
>>>> as well (in the right order of course)?
>>> David and I have been discussing this for a while, involving a
>>> whiteboard, and not come to a firm conclusion either way.
>>>
>>> From my point of view, holding the appropriate vcpu schedule lock
>>> entitles you to play with vcpu scheduling state, which involves
>>> following v->sched_priv, which we update outside the critical region
>>> later.
>>>
>>> Only taking the one lock still leaves a race condition where another cpu
>>> can follow the new v->processor and obtain the schedule lock, at which
>>> point we have two threads both working on the internals of a vcpu.  The
>>> change below certainly will fix the current bug of locking one spinlock
>>> and unlocking another.
>>>
>>> My gut feeling is that we do need to take both locks to be safe in terms
>>> of data access, but we would appreciate advice from someone more
>>> familiar with the scheduler locking.
>>
>> If it's that tricky to work out, why not just take the two locks,
>> appropriately ordered?  This isn't a hot path.
>
> Shouldn't we rather fix the locking mechanism itself, making
> vcpu_schedule_lock...() return the lock, such that the unlock
> will unavoidably use the correct lock?

That's an idea; but I half wonder if it wouldn't be better to actually
keep vcpu_schedule_unlock(), but pass it the old lock.  Then for debug
builds we can ASSERT that the lock hasn't changed.

 -George
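A sketch of the variant George describes (hypothetical, for illustration):
keep an explicit unlock helper, but pass it the lock that was taken, so debug
builds can catch the lock having changed underneath the caller:

/* Hypothetical unlock helper that sanity-checks the lock it is given. */
static inline void vcpu_schedule_unlock_irq(spinlock_t *lock, struct vcpu *v)
{
    /* Catch v having been repointed since the lock was taken. */
    ASSERT(lock == per_cpu(schedule_data, v->processor).schedule_lock);
    spin_unlock_irq(lock);
}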
Dario Faggioli
2013-Oct-11 11:15 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On ven, 2013-10-11 at 11:32 +0100, George Dunlap wrote:
> So going through the code and trying to reconstruct all the state in my
> head...
>
> If you look at vcpu_migrate(), it grabs both locks.  But it looks like
> the main purpose for that is so that we can call the migrate SCHED_OP(),
> which for credit2 needs to do some mucking about with runqueues, and
> thus needs both locks.
>
Just to make sure I understood what's going on, the SCHED_OP you're
referring to is SCHED_OP(..,migrate,..) here, right?

> In the case of move_domain, this is unnecessary,
> since it is removed from the old scheduler and then added to the new one.
>
I think that too, and that's why I wouldn't take both locks: it'd
actually be misleading rather than enlightening for people reading the
code, at least that's how I see it.

Perhaps we can put a comment somewhere (as George is also saying).

Regarding the patch, I personally like Jan's idea.

> But I think this patch is still not quite right: both v->processor and
> per_cpu(schedule_data, ...).schedule_lock may change under your feet; so
> you always need to do the lock in a loop, checking to make sure that you
> *still* have the right lock after you have actually grabbed it.
>
Which, if I'm not mistaken, we sort of get for free if we go Jan's way,
don't we?

Regards,
Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
George Dunlap
2013-Oct-11 11:32 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On 11/10/13 12:15, Dario Faggioli wrote:
> On ven, 2013-10-11 at 11:32 +0100, George Dunlap wrote:
>> So going through the code and trying to reconstruct all the state in my
>> head...
>>
>> If you look at vcpu_migrate(), it grabs both locks.  But it looks like
>> the main purpose for that is so that we can call the migrate SCHED_OP(),
>> which for credit2 needs to do some mucking about with runqueues, and
>> thus needs both locks.
>>
> Just to make sure I understood what's going on, the SCHED_OP you're
> referring to is SCHED_OP(..,migrate,..) here, right?
>
>> In the case of move_domain, this is unnecessary,
>> since it is removed from the old scheduler and then added to the new one.
>>
> I think that too, and that's why I wouldn't take both locks: it'd
> actually be misleading rather than enlightening for people reading the
> code, at least that's how I see it.
>
> Perhaps we can put a comment somewhere (as George is also saying).
>
> Regarding the patch, I personally like Jan's idea.
>
>> But I think this patch is still not quite right: both v->processor and
>> per_cpu(schedule_data, ...).schedule_lock may change under your feet; so
>> you always need to do the lock in a loop, checking to make sure that you
>> *still* have the right lock after you have actually grabbed it.
>>
> Which, if I'm not mistaken, we sort of get for free if we go Jan's way,
> don't we?

You mean, we could just call vcpu_schedule_lock..() instead of writing
bespoke loop code?  Sure, that's definitely an advantage.

 -George
Keir Fraser
2013-Oct-11 11:47 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On 10/10/2013 18:29, "David Vrabel" <david.vrabel@citrix.com> wrote:> From: David Vrabel <david.vrabel@citrix.com> > > sched_move_domain() changes v->processor for all the domain''s VCPUs. > If another domain, softirq etc. triggers a simultaneous call to > vcpu_wake() (e.g., by setting an event channel as pending), then > vcpu_wake() may lock one schedule lock and try to unlock another. > > vcpu_schedule_lock() attempts to handle this but only does so for the > window between reading the schedule_lock from the per-CPU data and the > spin_lock() call. This does not help with sched_move_domain() > changing v->processor between the calls to vcpu_schedule_lock() and > vcpu_schedule_unlock(). > > Fix the race by taking the schedule_lock for v->processor in > sched_move_domain(). > > Signed-off-by: David Vrabel <david.vrabel@citrix.com> > Cc: George Dunlap <george.dunlap@eu.citrix.com> > Cc: Juergen Gross <juergen.gross@ts.fujitsu.com> > Cc: Andrew Cooper <andrew.cooper3@citrix.com>Acked-by: Keir Fraser <keir@xen.org>
Dario Faggioli
2013-Oct-11 11:49 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On ven, 2013-10-11 at 12:32 +0100, George Dunlap wrote:
> On 11/10/13 12:15, Dario Faggioli wrote:
> > On ven, 2013-10-11 at 11:32 +0100, George Dunlap wrote:
>
> >> But I think this patch is still not quite right: both v->processor and
> >> per_cpu(schedule_data, ...).schedule_lock may change under your feet; so
> >> you always need to do the lock in a loop, checking to make sure that you
> >> *still* have the right lock after you have actually grabbed it.
> >>
> > Which, if I'm not mistaken, we sort of get for free if we go Jan's way,
> > don't we?
>
> You mean, we could just call vcpu_schedule_lock..() instead of writing
> bespoke loop code?  Sure, that's definitely an advantage.
>
Yes, provided we go Jan's way and have it return the lock.

That way, we'll have vcpu_schedule_lock() responsible for both finding
the proper lock, and doing it in the right way (with the loop, as you're
suggesting above), and returning it to the caller.

That would result in code that is both correct and looks better (no
per_cpu().schedule_lock in the caller), so it's a win-win. :-)

Dario

-- 
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
Jan Beulich
2013-Oct-11 12:03 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
>>> On 11.10.13 at 13:49, Dario Faggioli <dario.faggioli@citrix.com> wrote: > On ven, 2013-10-11 at 12:32 +0100, George Dunlap wrote: >> On 11/10/13 12:15, Dario Faggioli wrote: >> > On ven, 2013-10-11 at 11:32 +0100, George Dunlap wrote: > >> >> But I think this patch is still not quite right: both v->processor and >> >> per_cpu(schedule_data, ...).schedule_lock may change under your feet; so >> >> you always need to do the lock in a loop, checking to make sure that you >> >> *still* have the right lock after you have actually grabbed it. >> >> >> > Which, if I''m not mistaken, we sort of get for free it we go Jan''s way, >> > don''t we? >> >> You mean, we could just call vcpu_schedule_lock..() instead of writing a >> bespoke loop code? Sure, that''s definitely an advantage. >> > Yes, provided we go Jan''s way and have it return the lock. > > That way, we''ll have vcpu_schedule_lock() responsible for both finding > the proper lock, and doing it in the right way (with the loop, as you''re > suggesting above), and returning it to the caller. > > That would mean result in code that is both correct and looks better (no > per_cpu().schedule_lock in the caller), so it''s a win win. :-)So right now I have that change on top of David''s. As David''s alone is an improvement (even if still leaving a hole as you validly pointed out), I''d think we should commit it as is and then discuss the details of my additional change (in particular the question of whether to keep [vp]cpu_schedule_unlock...(), which I for now the patch is dropping). Jan
Jan Beulich
2013-Oct-11 12:20 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
>>> On 11.10.13 at 11:32, "Jan Beulich" <JBeulich@suse.com> wrote: > I suppose you scanned the code for other cases like this, and > there are none?Actually I did just now, and I think there''s a similar issue in credit2''s init_pcpu(): After taking pcpu_schedule_lock(cpu) it alters schedule_lock and hence effectively drops the locking, yet continues to do other stuff before in fact releasing it. What is being done prior to unlocking, however, looks to be unrelated to the lock being held, and rather independently (of the effective releasing) wanting &rqd->lock held. Jan
George Dunlap
2013-Oct-11 14:39 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
On 11/10/13 13:20, Jan Beulich wrote:
>>>> On 11.10.13 at 11:32, "Jan Beulich" <JBeulich@suse.com> wrote:
>> I suppose you scanned the code for other cases like this, and
>> there are none?
>
> Actually I did just now, and I think there's a similar issue in
> credit2's init_pcpu(): after taking pcpu_schedule_lock(cpu) it
> alters schedule_lock and hence effectively drops the locking,
> yet continues to do other stuff before in fact releasing it.
>
> What is being done prior to unlocking, however, looks to be
> unrelated to the lock being held, and rather independently
> (of the effective releasing) wanting &rqd->lock held.

I can't quite make out what you mean in the last sentence; but setting
the cpu in rqd->idle and rqd->active should certainly be protected by
rqd->lock, and it certainly looks like it's not being grabbed at the
moment.

Hmm -- I think we may need to do some kind of fancy looping thing like
we do in vcpu_migrate, to lock both the current schedule lock and
rqd->lock; with the difference, I suppose, that the rqd lock won't
change (since the assignment of cpu->runqueue at the moment is static).

Let me put this on my list of things to do before the release.

 -George
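A rough sketch of the ordering George outlines (hypothetical; the names rqd,
rqd->idle, rqd->active, and struct csched2_runqueue_data follow
xen/common/sched_credit2.c, but the eventual fix may well differ):

/* Hypothetical rework of credit2's init_pcpu() locking. */
static void init_pcpu_locked(int cpu, struct csched2_runqueue_data *rqd)
{
    unsigned long flags;
    spinlock_t *old_lock = per_cpu(schedule_data, cpu).schedule_lock;

    spin_lock_irqsave(old_lock, flags);
    spin_lock(&rqd->lock);  /* cpu->runqueue is static, so no retry loop */

    /* Runqueue updates now happen with rqd->lock held. */
    cpumask_set_cpu(cpu, &rqd->idle);
    cpumask_set_cpu(cpu, &rqd->active);

    /*
     * Repoint the per-cpu lock while holding both: anyone following
     * the new pointer serializes on rqd->lock, which we still hold.
     */
    per_cpu(schedule_data, cpu).schedule_lock = &rqd->lock;

    spin_unlock(&rqd->lock);
    spin_unlock_irqrestore(old_lock, flags);
}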
George Dunlap
2013-Oct-11 14:45 UTC
Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
create ^
title it credit2: init_pcpu doesn't grab runqueue lock before pointing
pcpu schedule_lock to it
thanks

On Fri, Oct 11, 2013 at 1:20 PM, Jan Beulich <JBeulich@suse.com> wrote:
>>>> On 11.10.13 at 11:32, "Jan Beulich" <JBeulich@suse.com> wrote:
>> I suppose you scanned the code for other cases like this, and
>> there are none?
>
> Actually I did just now, and I think there's a similar issue in
> credit2's init_pcpu(): after taking pcpu_schedule_lock(cpu) it
> alters schedule_lock and hence effectively drops the locking,
> yet continues to do other stuff before in fact releasing it.
>
> What is being done prior to unlocking, however, looks to be
> unrelated to the lock being held, and rather independently
> (of the effective releasing) wanting &rqd->lock held.
>
> Jan
xen@bugs.xenproject.org
2013-Oct-11 15:00 UTC
Processed: Re: [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()
Processing commands for xen@bugs.xenproject.org:

> create ^
Created new bug #21 rooted at `<5258094502000078000FA917@nat28.tlf.novell.com>'
Title: `Re: [Xen-devel] [PATCH] sched: fix race between sched_move_domain() and vcpu_wake()'

> title it credit2: init_pcpu doesn't grab runqueue lock before pointing
Set title for #21 to `credit2: init_pcpu doesn't grab runqueue lock before pointing'

> pcpu schedule_lock to it
Command failed: Unknown command `pcpu'. at /srv/xen-devel-bugs/lib/emesinae/control.pl line 437, <M> line 42.

Stop processing here.

Modified/created Bugs:
 - 21: http://bugs.xenproject.org/xen/bug/21 (new)

---
Xen Hypervisor Bug Tracker
See http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen for information on
reporting bugs
Contact xen-bugs-owner@bugs.xenproject.org with any infrastructure issues