thr3ads.net - Xen devel - [Xen-devel] CPU offlining patch xen-unstable:21049 [Apr 2010]

If this information is useful, please help other people find it:
Share via:

Keir Fraser

2010-Apr-15 08:31 UTC

[Xen-devel] CPU offlining patch xen-unstable:21049

Yunhong,

Just thinking some more about the above patch, which made the xenpf_lock and
systctl_lock both retryable. I don''t think this is necessary after all,
since cpu_down() is called via continue_hypercall_on_cpu(), and hence these
locks are released before the hypercall continuation is executed. Hence we
should be able to revert a couple of hunks of this changeset, as only
cpu_add_remove_lock is actually a problem, wouldn''t you agree?

 -- Keir



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Jiang, Yunhong

2010-Apr-15 08:50 UTC

head link

[Xen-devel] RE: CPU offlining patch xen-unstable:21049

I think the try_lock is not for the cpu_down(). The point is, if another CPU is
trying the get the lock.

Considering following scnerio:
1) cpu_down() in CPU A, and get the xenpf_lock, then call to stop_machine_run(),
trying to bring all CPU to stop_machine_run context.
2) At the same time, another vcpu in CPU B do a xenpf hypercall, and try to get
the xenpf_lock. If ther is no retyr for this lock, it can''t get
xenpf_lock, it will never go to the softirq
So the system will hang.

Hope this make thing clear.

Thanks
--jyh
>-----Original Message-----
>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>Sent: Thursday, April 15, 2010 4:31 PM
>To: Jiang, Yunhong
>Cc: xen-devel@lists.xensource.com
>Subject: CPU offlining patch xen-unstable:21049
>
>Yunhong,
>
>Just thinking some more about the above patch, which made the xenpf_lock and
>systctl_lock both retryable. I don''t think this is necessary after
all,
>since cpu_down() is called via continue_hypercall_on_cpu(), and hence these
>locks are released before the hypercall continuation is executed. Hence we
>should be able to revert a couple of hunks of this changeset, as only
>cpu_add_remove_lock is actually a problem, wouldn''t you agree?
>
> -- Keir
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Keir Fraser

2010-Apr-15 10:17 UTC

head link

[Xen-devel] Re: CPU offlining patch xen-unstable:21049

On 15/04/2010 09:50, "Jiang, Yunhong" <yunhong.jiang@intel.com>
wrote:
> I think the try_lock is not for the cpu_down(). The point is, if another
CPU
> is trying the get the lock.
> 
> Considering following scnerio:
> 1) cpu_down() in CPU A, and get the xenpf_lock, then call to
> stop_machine_run(), trying to bring all CPU to stop_machine_run context.
> 2) At the same time, another vcpu in CPU B do a xenpf hypercall, and try to
> get the xenpf_lock. If ther is no retyr for this lock, it can''t
get
> xenpf_lock, it will never go to the softirq
> So the system will hang.
> 
> Hope this make thing clear.
But CPU A doesn''t hold the xenpf_lock when it calls stop_machine_run().
It
dropped it before cpu_down() got invoked, because that gets executed via
continue_hypercall_on_cpu().

 -- Keir



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Jiang, Yunhong

2010-Apr-15 10:19 UTC

head link

[Xen-devel] RE: CPU offlining patch xen-unstable:21049

Aha, yes, you are right. So do I need create a patch, or you can simply revert
some chunks?

--jyh
>-----Original Message-----
>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>Sent: Thursday, April 15, 2010 6:17 PM
>To: Jiang, Yunhong
>Cc: xen-devel@lists.xensource.com
>Subject: Re: CPU offlining patch xen-unstable:21049
>
>On 15/04/2010 09:50, "Jiang, Yunhong"
<yunhong.jiang@intel.com> wrote:
>
>> I think the try_lock is not for the cpu_down(). The point is, if
another CPU
>> is trying the get the lock.
>>
>> Considering following scnerio:
>> 1) cpu_down() in CPU A, and get the xenpf_lock, then call to
>> stop_machine_run(), trying to bring all CPU to stop_machine_run
context.
>> 2) At the same time, another vcpu in CPU B do a xenpf hypercall, and
try to
>> get the xenpf_lock. If ther is no retyr for this lock, it
can''t get
>> xenpf_lock, it will never go to the softirq
>> So the system will hang.
>>
>> Hope this make thing clear.
>
>But CPU A doesn''t hold the xenpf_lock when it calls
stop_machine_run(). It
>dropped it before cpu_down() got invoked, because that gets executed via
>continue_hypercall_on_cpu().
>
> -- Keir
>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Keir Fraser

2010-Apr-15 11:04 UTC

head link

[Xen-devel] Re: CPU offlining patch xen-unstable:21049

I decided to keep the spin_trylock as they quell my paranoia about other
possible deadlock scenarios inside those complicated hypercall functions.
But I have modified the comments appropriately, in xen-unstable:21179. Note
that this also depends on xen-unstable:21178 (we mustn''t execute the
hypercall continuation immediately, in the context of the caller of
c_h_o_c()). Thanks.

But, here''s a more subtle and more tricky deadlock scenario for you.
You''ll
like this one :-): stop_machine_run() schedules a softirq on every CPU.
Let''s say CPU A enters our softirq handler, interrupting some guest
VCPU X
which is still scheduled on CPU A. But some other CPU B could be waiting for
X to be descheduled (one obvious example is hvmop_flush_tlb_all, which is a
good one because some HVM guest can call that at any time). So we never get
full softirq rendezvous because CPU B is spinning in hvmop_flush_tlb_all(),
while CPU A spins in the stop_machine softirq handler. Deadlock!

What do you think of that? :-D

 -- Keir

On 15/04/2010 11:19, "Jiang, Yunhong" <yunhong.jiang@intel.com>
wrote:
> Aha, yes, you are right. So do I need create a patch, or you can simply
revert
> some chunks?
> 
> --jyh
> 
>> -----Original Message-----
>> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>> Sent: Thursday, April 15, 2010 6:17 PM
>> To: Jiang, Yunhong
>> Cc: xen-devel@lists.xensource.com
>> Subject: Re: CPU offlining patch xen-unstable:21049
>> 
>> On 15/04/2010 09:50, "Jiang, Yunhong"
<yunhong.jiang@intel.com> wrote:
>> 
>>> I think the try_lock is not for the cpu_down(). The point is, if
another CPU
>>> is trying the get the lock.
>>> 
>>> Considering following scnerio:
>>> 1) cpu_down() in CPU A, and get the xenpf_lock, then call to
>>> stop_machine_run(), trying to bring all CPU to stop_machine_run
context.
>>> 2) At the same time, another vcpu in CPU B do a xenpf hypercall,
and try to
>>> get the xenpf_lock. If ther is no retyr for this lock, it
can''t get
>>> xenpf_lock, it will never go to the softirq
>>> So the system will hang.
>>> 
>>> Hope this make thing clear.
>> 
>> But CPU A doesn''t hold the xenpf_lock when it calls
stop_machine_run(). It
>> dropped it before cpu_down() got invoked, because that gets executed
via
>> continue_hypercall_on_cpu().
>> 
>> -- Keir
>> 
> 

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Xen devel - Apr 2010 - CPU offlining patch xen-unstable:21049

[Xen-devel] CPU offlining patch xen-unstable:21049

[Xen-devel] RE: CPU offlining patch xen-unstable:21049

[Xen-devel] Re: CPU offlining patch xen-unstable:21049

[Xen-devel] RE: CPU offlining patch xen-unstable:21049

[Xen-devel] Re: CPU offlining patch xen-unstable:21049