Juergen Gross
2010-Apr-13 13:19 UTC
[Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
Hi,

attached is a patch to use tasklets for continue_hypercall_on_cpu instead of
temporarily pinning the vcpu to the target physical cpu.

This is intended as a base for cpupools, as Keir requested to get rid of the
"borrow cpu" stuff in my original solution. It is a rework of the patch I sent
before, based on some comments by Jan Beulich.

Keir, if you agree to take this patch, I'll send out the cpupool stuff based
on this patch again.

There is just one question pending: cpupools use continue_hypercall_on_cpu,
but this function is defined for x86 only, not for ia64. I see 2 possible
solutions:

1. make continue_hypercall_on_cpu available on ia64, too
2. make cpupools a pure x86 feature

I think the first solution would be better, but I would need help on this as
I don't have any test machine.


Juergen

--
Juergen Gross                  Principal Developer Operating Systems
TSP ES&S SWE OS6               Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions   e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                  Internet: ts.fujitsu.com
D-80807 Muenchen               Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jan Beulich
2010-Apr-13 13:40 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
>>> Juergen Gross <juergen.gross@ts.fujitsu.com> 13.04.10 15:19 >>>
> cpupools use continue_hypercall_on_cpu, but this function is defined for
> x86 only, not for ia64. I see 2 possible solutions:
>
> 1. make continue_hypercall_on_cpu available on ia64, too
> 2. make cpupools a pure x86 feature

Is there anything meaningful in the new code that's really x86-specific
(i.e. can't the whole code chunk be moved to xen/common/)?

Jan
Juergen Gross
2010-Apr-13 13:49 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
Jan Beulich wrote:
>>>> Juergen Gross <juergen.gross@ts.fujitsu.com> 13.04.10 15:19 >>>
>> cpupools use continue_hypercall_on_cpu, but this function is defined for
>> x86 only, not for ia64. I see 2 possible solutions:
>>
>> 1. make continue_hypercall_on_cpu available on ia64, too
>> 2. make cpupools a pure x86 feature
>
> Is there anything meaningful in the new code that's really x86-specific
> (i.e. can't the whole code chunk be moved to xen/common/)?

My main concern is the difference in the handling of schedule_tail. I just
can't tell how to change the ia64 variant to make it work correctly.
The other x86-specific part is the setting of the return register, but this
is rather easily done using an architecture-specific macro.


Juergen
Keir Fraser
2010-Apr-13 14:41 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 13/04/2010 14:19, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

> attached is a patch to use tasklets for continue_hypercall_on_cpu instead of
> temporarily pinning the vcpu to the target physical cpu.
>
> This is intended as a base for cpupools, as Keir requested to get rid of the
> "borrow cpu" stuff in my original solution.

Why do you change the interface of continue_hypercall_on_cpu()? What's a
'hdl' anyway?

 -- Keir
Keir Fraser
2010-Apr-13 15:08 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 13/04/2010 14:49, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>>> 1. make continue_hypercall_on_cpu available on ia64, too
>>> 2. make cpupools a pure x86 feature
>>
>> Is there anything meaningful in the new code that's really x86-specific
>> (i.e. can't the whole code chunk be moved to xen/common/)?
>
> My main concern is the difference in the handling of schedule_tail. I just
> can't tell how to change the ia64 variant to make it work correctly.

We were messing with schedule_tail in order to get control into the
hypercall continuation in the context of the same vcpu but on a different
pcpu. Now that you just execute the continuation in a straightforward
softirq/tasklet context, you don't need to mess with schedule_tail at all...
Right?

 -- Keir
Juergen Gross
2010-Apr-14 04:26 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
Keir Fraser wrote:
> On 13/04/2010 14:19, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>> attached is a patch to use tasklets for continue_hypercall_on_cpu instead
>> of temporarily pinning the vcpu to the target physical cpu.
>>
>> This is intended as a base for cpupools, as Keir requested to get rid of
>> the "borrow cpu" stuff in my original solution.
>
> Why do you change the interface of continue_hypercall_on_cpu()? What's a
> 'hdl' anyway?

I need a way to find the migrate_info structure in case of nested calls of
continue_hypercall_on_cpu(). Originally this was done by storing it in the
vcpu structure, but this can't be done any more using tasklets. In my first
attempt I saved it in the per-cpu area, but this approach doesn't work if
continue_hypercall_on_cpu() is called concurrently. So the cleanest way is
to pass it via a parameter.
'hdl' is just a 'handle'.


Juergen
Keir Fraser
2010-Apr-14 06:46 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 14/04/2010 05:26, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>> Why do you change the interface of continue_hypercall_on_cpu()? What's a
>> 'hdl' anyway?
>
> I need a way to find the migrate_info structure in case of nested calls of
> continue_hypercall_on_cpu(). Originally this was done by storing it in the
> vcpu structure, but this can't be done any more using tasklets. In my first
> attempt I saved it in the per-cpu area, but this approach doesn't work if
> continue_hypercall_on_cpu() is called concurrently. So the cleanest way is
> to pass it via a parameter.

The per-cpu area method should work fine, since Xen is non-preemptive? I
don't think the concurrency you are concerned about can happen.

 -- Keir
Juergen Gross
2010-Apr-14 06:54 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
Keir Fraser wrote:
> On 13/04/2010 14:49, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>>>> 1. make continue_hypercall_on_cpu available on ia64, too
>>>> 2. make cpupools a pure x86 feature
>>> Is there anything meaningful in the new code that's really x86-specific
>>> (i.e. can't the whole code chunk be moved to xen/common/)?
>> My main concern is the difference in the handling of schedule_tail. I just
>> can't tell how to change the ia64 variant to make it work correctly.
>
> We were messing with schedule_tail in order to get control into the
> hypercall continuation in the context of the same vcpu but on a different
> pcpu. Now that you just execute the continuation in a straightforward
> softirq/tasklet context, you don't need to mess with schedule_tail at
> all... Right?

Aah, you mean something like the attached patch?


Juergen
Juergen Gross
2010-Apr-14 06:58 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
Keir Fraser wrote:
> On 14/04/2010 05:26, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>>> Why do you change the interface of continue_hypercall_on_cpu()? What's a
>>> 'hdl' anyway?
>> I need a way to find the migrate_info structure in case of nested calls of
>> continue_hypercall_on_cpu(). Originally this was done by storing it in the
>> vcpu structure, but this can't be done any more using tasklets. In my
>> first attempt I saved it in the per-cpu area, but this approach doesn't
>> work if continue_hypercall_on_cpu() is called concurrently. So the
>> cleanest way is to pass it via a parameter.
>
> The per-cpu area method should work fine, since Xen is non-preemptive? I
> don't think the concurrency you are concerned about can happen.

The tasklet knows only on which cpu it is running, so the data has to be
stored on the target cpu. And one pcpu can be the target of concurrent calls
from different calling cpus...


Juergen
Keir Fraser
2010-Apr-14 07:15 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 14/04/2010 07:58, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>>> The per-cpu area method should work fine, since Xen is non-preemptive? I
>>> don't think the concurrency you are concerned about can happen.
>
> The tasklet knows only on which cpu it is running, so the data has to be
> stored on the target cpu. And one pcpu can be the target of concurrent
> calls from different calling cpus...

A tasklet also takes an arbitrary ulong parameter, which you can cast to a
pointer to your informational structure. The parameter is specified via
tasklet_init(). That should suffice.

 -- Keir
Juergen Gross
2010-Apr-14 07:25 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
Keir Fraser wrote:
> On 14/04/2010 07:58, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>>> The per-cpu area method should work fine, since Xen is non-preemptive? I
>>> don't think the concurrency you are concerned about can happen.
>> The tasklet knows only on which cpu it is running, so the data has to be
>> stored on the target cpu. And one pcpu can be the target of concurrent
>> calls from different calling cpus...
>
> A tasklet also takes an arbitrary ulong parameter, which you can cast to a
> pointer to your informational structure. The parameter is specified via
> tasklet_init(). That should suffice.

I'm already using this. The problem is to find the original calling vcpu in
case of a nested call of continue_hypercall_on_cpu() while not conflicting
with concurrent calls from other vcpus which happen to address the same pcpu.


Juergen
Keir Fraser
2010-Apr-14 07:35 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 14/04/2010 08:25, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>> A tasklet also takes an arbitrary ulong parameter, which you can cast to a
>> pointer to your informational structure. The parameter is specified via
>> tasklet_init(). That should suffice.
>
> I'm already using this. The problem is to find the original calling vcpu in
> case of a nested call of continue_hypercall_on_cpu() while not conflicting
> with concurrent calls from other vcpus which happen to address the same
> pcpu.

There can be only one nested invocation on any given pcpu, since a running
invocation is never preempted. Hence on entry to c_h_o_c() you can check a
per-cpu variable to see whether this invocation is nesting, or not. And if
it is, that variable can be a pointer to an info structure which includes a
pointer to the invoking vcpu.

 -- Keir
Juergen Gross
2010-Apr-14 08:04 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
Keir Fraser wrote:
> On 14/04/2010 08:25, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>>> A tasklet also takes an arbitrary ulong parameter, which you can cast to
>>> a pointer to your informational structure. The parameter is specified via
>>> tasklet_init(). That should suffice.
>> I'm already using this. The problem is to find the original calling vcpu
>> in case of a nested call of continue_hypercall_on_cpu() while not
>> conflicting with concurrent calls from other vcpus which happen to address
>> the same pcpu.
>
> There can be only one nested invocation on any given pcpu, since a running
> invocation is never preempted. Hence on entry to c_h_o_c() you can check a
> per-cpu variable to see whether this invocation is nesting, or not. And if
> it is, that variable can be a pointer to an info structure which includes a
> pointer to the invoking vcpu.

Okay, attached is the modified patch again.


Juergen
Keir Fraser
2010-Apr-14 10:30 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 14/04/2010 09:04, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>> There can be only one nested invocation on any given pcpu, since a running
>> invocation is never preempted. Hence on entry to c_h_o_c() you can check a
>> per-cpu variable to see whether this invocation is nesting, or not. And if
>> it is, that variable can be a pointer to an info structure which includes
>> a pointer to the invoking vcpu.
>
> Okay, attached is the modified patch again.

I cleaned up some more and applied as xen-unstable:21165 and
xen-unstable:21166.

 -- Keir
Juergen Gross
2010-Apr-15 06:31 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
Keir Fraser wrote:
> On 14/04/2010 09:04, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>>> There can be only one nested invocation on any given pcpu, since a
>>> running invocation is never preempted. Hence on entry to c_h_o_c() you
>>> can check a per-cpu variable to see whether this invocation is nesting,
>>> or not. And if it is, that variable can be a pointer to an info structure
>>> which includes a pointer to the invoking vcpu.
>> Okay, attached is the modified patch again.
>
> I cleaned up some more and applied as xen-unstable:21165 and
> xen-unstable:21166.

Thanks. Unfortunately, your modifications are not working. Microcode update
hangs. My version worked without problems.


Juergen
Keir Fraser
2010-Apr-15 06:39 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 15/04/2010 07:31, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>>> Okay, attached is the modified patch again.
>>
>> I cleaned up some more and applied as xen-unstable:21165 and
>> xen-unstable:21166.
>
> Thanks. Unfortunately, your modifications are not working. Microcode update
> hangs. My version worked without problems.

What revision did you test? I put in some fixes as c/s 21173.

 -- Keir
Juergen Gross
2010-Apr-15 07:57 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
Keir Fraser wrote:
> On 15/04/2010 07:31, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>>>> Okay, attached is the modified patch again.
>>> I cleaned up some more and applied as xen-unstable:21165 and
>>> xen-unstable:21166.
>> Thanks. Unfortunately, your modifications are not working. Microcode
>> update hangs. My version worked without problems.
>
> What revision did you test? I put in some fixes as c/s 21173.

My highest c/s was 21167.
c/s 21173 is hanging, too (sorry for the delay, but I had to remove my
cpupool stuff due to the scheduler changes for credit2).

Is a call of sync_vcpu_execstate() from a tasklet really allowed? I don't
think the ASSERTs in __sync_lazy_execstate() are all fulfilled in this case.


Juergen
Keir Fraser
2010-Apr-15 08:13 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 15/04/2010 08:57, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>> What revision did you test? I put in some fixes as c/s 21173.
>
> My highest c/s was 21167.
> c/s 21173 is hanging, too (sorry for the delay, but I had to remove my
> cpupool stuff due to the scheduler changes for credit2).

Ah yes, just done a test myself (clearly my dom0 setup is not doing
microcode updates) and I've now fixed it as c/s 21176. Thanks!

> Is a call of sync_vcpu_execstate() from a tasklet really allowed? I don't
> think the ASSERTs in __sync_lazy_execstate() are all fulfilled in this
> case.

Better hope so, or e.g.:
  acpi_enter_sleep
   -> continue_hypercall_on_cpu(enter_state_helper)
    -> enter_state
     -> freeze_domains
      -> domain_pause
       -> vcpu_sleep_sync
        -> sync_vcpu_execstate
also wouldn't work.

There is only one ASSERT in __sync_lazy_execstate, and it's safe for this
case. Bear in mind that our softirqs always run in the context of whatever
domain happens to be running on that cpu currently -- they don't have their
own proper vcpu context.

By the by, your original attempt at synchronisation (spin on return value in
regs changing) was risky, as it could take an unbounded time before the vcpu
registers get copied out of the original cpu's stack. Especially during
early dom0 boot, when the system is very idle.

 -- Keir
Keir Fraser
2010-Apr-15 08:22 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 15/04/2010 09:13, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> Better hope so, or e.g.:
>   acpi_enter_sleep
>    -> continue_hypercall_on_cpu(enter_state_helper)
>     -> enter_state
>      -> freeze_domains
>       -> domain_pause
>        -> vcpu_sleep_sync
>         -> sync_vcpu_execstate
> also wouldn't work.

Actually that's a good example, because it now won't work, but for other
reasons! The hypercall continuation can interrupt another vcpu's execution,
and then try to synchronously pause that vcpu. Which will deadlock.

Luckily I think we can re-jig this code to freeze_domains() before doing the
continue_hypercall_on_cpu(). I've cc'ed one of the CPU RAS guys. :-)

 -- Keir
Juergen Gross
2010-Apr-15 08:29 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
Keir Fraser wrote:
> On 15/04/2010 08:57, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>>> What revision did you test? I put in some fixes as c/s 21173.
>> My highest c/s was 21167.
>> c/s 21173 is hanging, too (sorry for the delay, but I had to remove my
>> cpupool stuff due to the scheduler changes for credit2).
>
> Ah yes, just done a test myself (clearly my dom0 setup is not doing
> microcode updates) and I've now fixed it as c/s 21176. Thanks!

Great! Works for me, too.

>> Is a call of sync_vcpu_execstate() from a tasklet really allowed? I don't
>> think the ASSERTs in __sync_lazy_execstate() are all fulfilled in this
>> case.
>
> There is only one ASSERT in __sync_lazy_execstate, and it's safe for this
> case. Bear in mind that our softirqs always run in the context of whatever
> domain happens to be running on that cpu currently -- they don't have
> their own proper vcpu context.

I don't see how it is guaranteed that the current vcpu MUST be an idle one...

> By the by, your original attempt at synchronisation (spin on return value
> in regs changing) was risky, as it could take an unbounded time before the
> vcpu registers get copied out of the original cpu's stack. Especially
> during early dom0 boot, when the system is very idle.

Indeed. I realized my version had another flaw in case of nested calls of
c_h_o_c: if one call returned a non-zero value and issued another call of
c_h_o_c, the tasklet would hang forever... Your version is cleaner.


Juergen
Keir Fraser
2010-Apr-15 09:08 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 15/04/2010 09:29, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>> There is only one ASSERT in __sync_lazy_execstate, and it's safe for this
>> case. Bear in mind that our softirqs always run in the context of whatever
>> domain happens to be running on that cpu currently -- they don't have
>> their own proper vcpu context.
>
> I don't see how it is guaranteed that the current vcpu MUST be an idle
> one...

It's odd, because someone else asked this exact same question, so the code
must be more subtle than I thought. Note that the ASSERT is inside
if(switch_required), and switch_required is true only if the vcpu whose
state is loaded on this CPU is not the same as the currently-scheduled vcpu.
This odd situation is only possible if we are lazy-syncing the former vcpu's
state, and that only occurs if the currently-running vcpu is the idle vcpu
(as all other vcpus would need their state loaded immediately, so lazy sync
does not apply).

 -- Keir
Jiang, Yunhong
2010-Apr-15 09:59 UTC
RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Thursday, April 15, 2010 4:22 PM
> To: Juergen Gross
> Cc: xen-devel@lists.xensource.com; Jiang, Yunhong
> Subject: Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using
> tasklets
>
> On 15/04/2010 09:13, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>
>> Better hope so, or e.g.:
>>   acpi_enter_sleep
>>    -> continue_hypercall_on_cpu(enter_state_helper)
>>     -> enter_state
>>      -> freeze_domains
>>       -> domain_pause
>>        -> vcpu_sleep_sync
>>         -> sync_vcpu_execstate
>> also wouldn't work.
>
> Actually that's a good example, because it now won't work, but for other
> reasons! The hypercall continuation can interrupt another vcpu's execution,
> and then try to synchronously pause that vcpu. Which will deadlock.
>
> Luckily I think we can re-jig this code to freeze_domains() before doing
> the continue_hypercall_on_cpu(). I've cc'ed one of the CPU RAS guys. :-)

Hmm, I have cc'ed one of the PM guys because it is enter_state :-)

Can we add a check in vcpu_sleep_sync() for current? It is meaningless to
cpu_relax for the current vcpu in that situation, especially if we are not
in irq context.
I'm not sure why freeze_domains only checks dom0's vcpus for current,
instead of all domains.

--jyh
Keir Fraser
2010-Apr-15 11:06 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 15/04/2010 10:59, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

>> Actually that's a good example, because it now won't work, but for other
>> reasons! The hypercall continuation can interrupt another vcpu's
>> execution, and then try to synchronously pause that vcpu. Which will
>> deadlock.
>>
>> Luckily I think we can re-jig this code to freeze_domains() before doing
>> the continue_hypercall_on_cpu(). I've cc'ed one of the CPU RAS guys. :-)
>
> Hmm, I have cc'ed one of the PM guys because it is enter_state :-)
> Can we add a check in vcpu_sleep_sync() for current? It is meaningless to
> cpu_relax for the current vcpu in that situation, especially if we are not
> in irq context.
> I'm not sure why freeze_domains only checks dom0's vcpus for current,
> instead of all domains.

Well actually, pausing any vcpu from within the hypercall continuation is
dangerous. The softirq handler running the hypercall continuation may have
interrupted some running VCPU X. And the VCPU Y that the continuation is
currently trying to pause may itself be trying to pause X. So we can get a
deadlock that way. The freeze_domains() *has* to be pulled outside of the
hypercall continuation.

It's a little bit similar to the super-subtle stop_machine_run deadlock
possibility I just emailed to you a second ago. :-)

 -- Keir
Keir Fraser
2010-Apr-15 11:22 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 15/04/2010 12:06, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

>> Hmm, I have cc'ed one of the PM guys because it is enter_state :-)
>> Can we add a check in vcpu_sleep_sync() for current? It is meaningless to
>> cpu_relax for the current vcpu in that situation, especially if we are
>> not in irq context.
>> I'm not sure why freeze_domains only checks dom0's vcpus for current,
>> instead of all domains.
>
> Well actually, pausing any vcpu from within the hypercall continuation is
> dangerous. The softirq handler running the hypercall continuation may have
> interrupted some running VCPU X. And the VCPU Y that the continuation is
> currently trying to pause may itself be trying to pause X. So we can get a
> deadlock that way. The freeze_domains() *has* to be pulled outside of the
> hypercall continuation.

Changesets 21180 and 21181 in xen-unstable are my attempts to fix this. Yu
Ke, please let me know if these look and test okay for you guys.

Thanks,
Keir
Jiang, Yunhong
2010-Apr-16 06:42 UTC
RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Thursday, April 15, 2010 7:07 PM
> To: Jiang, Yunhong; Juergen Gross
> Cc: xen-devel@lists.xensource.com; Yu, Ke
> Subject: Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using
> tasklets
>
> Well actually, pausing any vcpu from within the hypercall continuation is
> dangerous. The softirq handler running the hypercall continuation may have
> interrupted some running VCPU X. And the VCPU Y that the continuation is
> currently trying to pause may itself be trying to pause X. So we can get a
> deadlock that way. The freeze_domains() *has* to be pulled outside of the
> hypercall continuation.
>
> It's a little bit similar to the super-subtle stop_machine_run deadlock
> possibility I just emailed to you a second ago. :-)

Thanks for pointing out the stop_machine_run deadlock issue.

After more consideration and internal discussion, the key point seems to be
that the tasklet softirq in effect holds a lock on the current vcpu's state
(i.e. no one else can change that state until the softirq has finished).
So any blocking action in softirq context, not just vcpu_pause_sync, is
dangerous and should be avoided (we can't hold a lock and then block, per my
understanding). This is because a vcpu's state can only be changed by the
schedule softirq (am I right on this?), while the schedule softirq can't
preempt other softirqs. So, more generally, anything that is updated by a
softirq context and synchronously/blockingly waited on in Xen's vcpu context
is in fact an implicit lock held by the softirq.

As for the tricky bug in stop_machine_run(), I think it is in fact similar
to the cpu_add_remove_lock. stop_machine_run() is a blocking action, so we
must make sure no one blocks waiting for a lock that stop_machine_run()
already holds. Back then, we changed all components that try to get the
cpu_add_remove_lock to use try_lock. The change caused by the tasklet is
that a new implicit lock is added, i.e. the vcpu's state. The
straightforward method is the same as for the cpu_add_remove_lock: make
everything that waits for a vcpu state change process softirqs between the
checks. Maybe the cleaner way is your previous suggestion, that is, to run
stop_machine_run() in the idle vcpu; another way is to turn back to the
original method, i.e. do it in schedule_tail.

Also, we are not sure why continue_hypercall_on_cpu was changed to use
tasklets. What is the benefit? At least I find it quite confusing, because
per our understanding a hypercall is usually assumed to execute in current
context, while this change breaks that assumption. So any hypercall that
may use c_h_o_c, and any function called by such a hypercall, has to be
aware of this. I'm not sure this is really correct; at least it may cause
trouble if someone uses it without realizing the limitation.
From Juergen Gross's mail, it seems to be for cpupools, but I have no idea
of the cpupool :-(

--jyh
Jiang, Yunhong
2010-Apr-16 06:45 UTC
RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
BTW, I wonder whether the cpu online/offline lock should really wrap the stop_machine_run(). I remember Criping questioned this also. Will have a look at it.

--jyh

>-----Original Message-----
>From: Jiang, Yunhong
>Sent: Friday, April 16, 2010 2:43 PM
>To: Keir Fraser; Juergen Gross
>Cc: xen-devel@lists.xensource.com; Yu, Ke
>Subject: RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
>
>[...]
Juergen Gross
2010-Apr-16 06:55 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
Jiang, Yunhong wrote:
> [...]
>
> Also, we are not sure why continue_hypercall_on_cpu was changed to use a
> tasklet. What is the benefit? At least I think this is quite confusing,
> because per our understanding a hypercall is usually assumed to execute in
> current context, while this change breaks that assumption. [...] From
> Juergen Gross's mail, it seems to be for cpupools, but I have no idea of
> the cpupool :-(

Cpupools introduce something like "scheduling domains" in xen. Each cpupool
owns a set of physical cpus and has its own scheduler. Each domain is a
member of a cpupool.

It is possible to move cpus or domains between pools, but a domain is always
limited to the physical cpus in the cpupool of the domain.

This limitation makes it impossible to use continue_hypercall_on_cpu with
cpupools for an arbitrary physical cpu without changing it. My original
solution temporarily moved the target cpu into the cpupool of the issuing
domain, but this was regarded as an ugly hack.


Juergen
Keir Fraser
2010-Apr-16 07:13 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 16/04/2010 07:42, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

> Thanks for pointing out the stop_machine_run deadlock issue.
>
> After more consideration and internal discussion, the key point seems to be
> that the tasklet softirq effectively holds a lock on the current vcpu's
> state (i.e. no one else can change that state until the softirq finishes).
>
> So any blocking action in softirq context, not just vcpu_pause_sync,
> is dangerous and should be avoided (we can't take a lock and then block,
> per my understanding).
> This is because a vcpu's state can only be changed by the schedule softirq
> (am I right on this?), while the schedule softirq can't preempt another
> softirq. So, more generally, anything that is updated in softirq context
> and synchronously/blockingly waited on in xen's vcpu context is in fact an
> implicit lock held by the softirq.

I think you're at risk of over-analysing the situation. You cannot safely
synchronously pause vcpus from within softirq context. I'm not sure we can
extrapolate further than that.

> As for the tricky bug in stop_machine_run(), I think it is in fact similar
> to the cpu_add_remove_lock. stop_machine_run() is a blocking action, so we
> must make sure no one will block waiting for a lock that stop_machine_run()
> already holds. Back then, we changed all components that take the
> cpu_add_remove_lock to use try_lock.
>
> The change caused by the tasklet is that a new implicit lock is added, i.e.
> the vcpu's state.

Let me be clear: the deadlock I described in stop_machine_run() is *not*
introduced by the c_h_o_c reimplementation. It has been there all along.
Nothing in my description of the deadlock depended on the implementation of
c_h_o_c: it depended only on the behaviour of stop_machine_run itself, which
does not internally use c_h_o_c.

> The straightforward method is the same as for the cpu_add_remove_lock: make
> everything that waits for a vcpu state change run softirqs between checks.

Possible, though it could have impacts of its own. We couldn't do
SCHEDULE_SOFTIRQ as we would lose the caller's context, and the caller could
not hold locks while pausing others (although I suppose they generally can't
anyway).

> Maybe the cleaner way is your previous suggestion, that is, put the
> stop_machine_run() in the idle vcpu; another way is to turn back to the
> original method, i.e. do it in the schedule_tail.

Argh. That would be annoying!

Another possibility is to... shudder... introduce a timeout. Wait only e.g.
1ms for all vcpus to enter the STOPMACHINE_PREPARE state and if they don't,
cancel the operation and return -EBUSY. There are already a bunch of
opportunities to return 'spurious' -EBUSY in the cpu-offline paths, so tools
already need to know to retry at least a certain number of times.
Undoubtedly this is the dirty solution, but it is easy-ish to implement and
would only be allowing us to avoid an extremely rare deadlock possibility.
It would just be critical to pick a large enough timeout!

> Also, we are not sure why continue_hypercall_on_cpu was changed to use a
> tasklet. What is the benefit? At least I think this is quite confusing,
> because per our understanding a hypercall is usually assumed to execute in
> current context, while this change breaks that assumption. So any hypercall
> that may use this _c_h_o_c, and any function called by that hypercall,
> should be aware of this. I'm not sure this is really correct; at least it
> may cause trouble if someone uses it without realizing the limitation. From
> Juergen Gross's mail, it seems to be for cpupools, but I have no idea of
> the cpupool :-(

The implementation is simpler and lets us get rid of the complexity around
vcpu affinity logic. There aren't that many users of c_h_o_c and most are
still trivially correct. I don't think the changes to c_h_o_c have
introduced any more deadlocks, apart from the freeze_domains issue (which I
believe I have now fixed).

 -- Keir
Jiang, Yunhong
2010-Apr-16 08:16 UTC
RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
>-----Original Message-----
>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>Sent: Friday, April 16, 2010 3:14 PM
>To: Jiang, Yunhong; Juergen Gross
>Cc: xen-devel@lists.xensource.com; Yu, Ke
>Subject: Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
>
>[...]
>
>Let me be clear: the deadlock I described in stop_machine_run() is *not*
>introduced by the c_h_o_c reimplementation. It has been there all along.
>Nothing in my description of the deadlock depended on the implementation of
>c_h_o_c: it depended only on the behaviour of stop_machine_run itself, which
>does not internally use c_h_o_c.

Oops, yes, this should be there already, without c_h_o_c. So, if another vcpu is trying to vcpu_pause() this vcpu, it will deadlock because this vcpu can't be paused, right? (I assume the same reasoning applies to hvmop_flush_tlb_all.)

If this is true, then how about checking !vcpu_runnable(current) in stop_machine_run()'s stopmachine_set_state()? If this check fails, it means someone is trying to change our state and a potential deadlock may happen, so we can cancel this stop_machine_run(). This is at least a bit cleaner than the timeout mechanism.

>> Maybe the cleaner way is your previous suggestion, that is, put the
>> stop_machine_run() in the idle vcpu; another way is to turn back to the
>> original method, i.e. do it in the schedule_tail.
>
>Argh. That would be annoying!

Doing it in the schedule_tail does not seem to help with this deadlock issue.

>Another possibility is to... shudder... introduce a timeout. [...]
>
>The implementation is simpler and lets us get rid of the complexity around
>vcpu affinity logic. There aren't that many users of c_h_o_c and most are
>still trivially correct. I don't think the changes to c_h_o_c have
>introduced any more deadlocks, apart from the freeze_domains issue (which I
>believe I have now fixed).

Yes, c_h_o_c does not introduce any more deadlocks; my concern is that its name may be confusing if one tries to use it for another hypercall. After all, if a hypercall, or a function used by the hypercall, depends on current, it should not use this c_h_o_c.

--jyh

>
> -- Keir
Jiang, Yunhong
2010-Apr-16 08:30 UTC
RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
Juergen, thanks for your explanation.

--jyh

>-----Original Message-----
>From: Juergen Gross [mailto:juergen.gross@ts.fujitsu.com]
>Sent: Friday, April 16, 2010 2:56 PM
>To: Jiang, Yunhong
>Cc: Keir Fraser; xen-devel@lists.xensource.com; Yu, Ke
>Subject: Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
>
>[...]
Yu, Ke
2010-Apr-16 09:20 UTC
RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Thursday, April 15, 2010 7:23 PM
> To: Jiang, Yunhong
> Cc: xen-devel@lists.xensource.com; Yu, Ke
> Subject: Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using
> tasklets
>
> On 15/04/2010 12:06, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>
> [...]
>
> Changesets 21180 and 21181 in xen-unstable are my attempts to fix this. Yu
> Ke, please let me know if these look and test okay for you guys.
>

We have run an S3 stress test on xen-unstable. Unfortunately, cset 21181 cannot pass it: it hangs after 48 S3 suspend/resume cycles, and another test run even hangs after 2 cycles. But it is too early to say cset 21181/21180 is the culprit, because even cset 21172 (the cset just ahead of the continue_hypercall_on_cpu rework) cannot pass the S3 test either, although it survives more suspend/resume cycles than cset 21181. So generally, something must have gone wrong in the S3 logic since the Xen 4.0 release. We are still trying to dig it out...

Best Regards
Ke
Keir Fraser
2010-Apr-16 17:51 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 16/04/2010 10:20, "Yu, Ke" <ke.yu@intel.com> wrote:

>> Changesets 21180 and 21181 in xen-unstable are my attempts to fix this. Yu
>> Ke, please let me know if these look and test okay for you guys.
>
> We have run an S3 stress test on xen-unstable. Unfortunately, cset 21181
> cannot pass it: it hangs after 48 S3 suspend/resume cycles, and another
> test run even hangs after 2 cycles. But it is too early to say cset
> 21181/21180 is the culprit, because even cset 21172 (the cset just ahead of
> the continue_hypercall_on_cpu rework) cannot pass the S3 test either,
> although it survives more suspend/resume cycles than cset 21181. So
> generally, something must have gone wrong in the S3 logic since the Xen 4.0
> release. We are still trying to dig it out...

21172 was not a good changeset to pick. I suggest trying xen-unstable tip
with just 21181/21180 reverted.

 Thanks,
 Keir
Keir Fraser
2010-Apr-16 17:57 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 16/04/2010 09:16, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

> Oops, yes, this should be there already, without c_h_o_c.
> So, if another vcpu is trying to vcpu_pause() this vcpu, it will deadlock
> because this vcpu can't be paused, right? (I assume the same reasoning
> applies to hvmop_flush_tlb_all.)

Yes.

> If this is true, then how about checking !vcpu_runnable(current) in
> stop_machine_run()'s stopmachine_set_state()? If this check fails, it means
> someone is trying to change our state and a potential deadlock may happen,
> so we can cancel this stop_machine_run(). This is at least a bit cleaner
> than the timeout mechanism.

Interesting... That could work. But...

>>> Maybe the cleaner way is your previous suggestion, that is, put the
>>> stop_machine_run() in the idle vcpu; another way is to turn back to the
>>> original method, i.e. do it in the schedule_tail.
>>
>> Argh. That would be annoying!
>
> Doing it in the schedule_tail does not seem to help with this deadlock
> issue.

Yes, but now I think about it, scheduling c_h_o_c() and s_m_r() in idle-vcpu
context instead of softirq context might not be *that* hard to do, and it
avoids all these subtle deadlock possibilities. I think I'm warming to it
compared with the alternatives.

> Yes, c_h_o_c does not introduce any more deadlocks; my concern is that its
> name may be confusing if one tries to use it for another hypercall. After
> all, if a hypercall, or a function used by the hypercall, depends on
> current, it should not use this c_h_o_c.

Well, I think there are two issues: (1) we run the continuation as a
softirq; (2) we don't run the continuation in the context of the original
vcpu. I think (1) is a bigger problem than (2) as it introduces the
possibility of all these nasty subtle deadlocks. (2) is obviously not
desirable, *but* I don't think any callers much care about the context of
the original caller, *and* if they do, the resulting bug is generally going
to be pretty obvious. That is, the hypercall just won't work, ever -- it's
much less likely than (1) to result in very rare very subtle bugs.

 -- Keir
Jiang, Yunhong
2010-Apr-19 05:53 UTC
RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
>-----Original Message----- >From: Keir Fraser [mailto:keir.fraser@eu.citrix.com] >Sent: Saturday, April 17, 2010 1:58 AM >To: Jiang, Yunhong; Juergen Gross >Cc: xen-devel@lists.xensource.com; Yu, Ke >Subject: Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets > >On 16/04/2010 09:16, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote: > >> Oops, yes, this should be there already, without c_h_o_c. >> So, if another vcpu is trying to vcpu_pause() this vcpu, it will deadlock >> becaues this vcpu can''t be paused, right? (I assume the same reason to >> hvmop_flush_tlb_all). > >Yes. > >> If this is true, then how about checking !vcpu_runnable(current) in >> stop_machine_run()''s stopmachine_set_state()? If this check failed, it means >> someone try to change our state and potential deadlock may happen, we can >> cancel this stop_machine_run()? This is at least a bit cleaner than the >> timeout mechanism. > >Interesting... That could work. But...I agree that idle vcpu method is more cleaner in the long run.> >>>> Maybe the cleaner way is your previous suggestion, that is, put the >>>> stop_machine_run() in the idle_vcpu(), another way is, turn back to the >>>> original method, i.e. do it in the schedule_tail. >>> >>> Argh. That would be annoying! >> >> Seems do it in the schedule_tail will not benifit this deadlock issues. > >Yes, but now I think about it, scheduling c_h_o_c() and s_m_r() in idle-vcpu >context instead of softirq context might not be *that* hard to do, and it >avoids all these subtle deadlock possibilities. I think I''m warming to it >compared with the alternatives. > >> Yes, c_h_o_c does not introduce any more deadlock, my concern is, it''s name >> maybe confusing if one try to use it for other hypercall . After all, if a >> hypercall and function used by the hypercall depends on current, it should not >> use this c_h_o_c. 
>
> Well, I think there are two issues: (1) we run the continuation as a
> softirq; (2) we don't run the continuation in the context of the original
> vcpu. I think (1) is a bigger problem than (2) as it introduces the
> possibility of all these nasty subtle deadlocks. (2) is obviously not
> desirable, *but* I don't think any callers much care about the context of
> the original caller, *and* if they do the resulting bug is generally going
> to be pretty obvious. That is, the hypercall just won't work, ever -- it's
> much less likely than (1) to result in very rare very subtle bugs.

For issue 2, this c_h_o_c is in fact something like a Xen action, i.e. it is
a utility provided by the Xen hypervisor that can be used by the guest, so
maybe we could change the name of c_h_o_c to something like call_xen_XXX?

Regarding your idle-vcpu context work, I think it is a bit like an HVM domain
waiting for an IO assist from qemu (i.e., using
prepare_wait_on_xen_event_channel()) -- is that right?

--jyh

>
> -- Keir
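The !vcpu_runnable(current) cancellation idea discussed in this exchange can
be sketched in isolation. This is a simplified user-space model of the idea,
not the real Xen code; all `demo_*` names are illustrative, and the real
stopmachine_set_state() and vcpu_runnable() have different signatures:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified model: before a CPU commits to the stop-machine rendezvous,
 * it checks whether its current vcpu is still runnable.  If another vcpu
 * has initiated a vcpu_pause() against us, spinning in the rendezvous
 * would deadlock, so we bail out with -EBUSY and let the caller retry. */

struct demo_vcpu {
    bool pause_requested;   /* set by a would-be vcpu_pause() caller */
};

static bool demo_vcpu_runnable(const struct demo_vcpu *v)
{
    return !v->pause_requested;
}

/* Returns 0 if it is safe to enter the rendezvous, -EBUSY otherwise. */
int demo_stopmachine_set_state(struct demo_vcpu *current_vcpu)
{
    if ( !demo_vcpu_runnable(current_vcpu) )
        return -EBUSY;  /* someone is pausing us: cancel to avoid deadlock */
    /* ... proceed with the real rendezvous state transition here ... */
    return 0;
}
```

As the thread notes, this avoids the deadlock only by cancelling the
operation; running the work from idle-vcpu context avoids it structurally.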
Keir Fraser
2010-Apr-19 06:48 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 19/04/2010 06:53, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

>> Well, I think there are two issues: (1) we run the continuation as a
>> softirq; (2) we don't run the continuation in the context of the original
>> vcpu. I think (1) is a bigger problem than (2) as it introduces the
>> possibility of all these nasty subtle deadlocks. (2) is obviously not
>> desirable, *but* I don't think any callers much care about the context of
>> the original caller, *and* if they do the resulting bug is generally going
>> to be pretty obvious. That is, the hypercall just won't work, ever -- it's
>> much less likely than (1) to result in very rare very subtle bugs.
>
> For issue 2, this c_h_o_c is in fact something like a Xen action, i.e. it is
> a utility provided by the Xen hypervisor that can be used by the guest, so
> maybe we could change the name of c_h_o_c to something like call_xen_XXX?

Well, the handler does provide the final hypercall return code, so it is
still a hypercall continuation even if it doesn't run in the correct vcpu's
context.

> Regarding your idle-vcpu context work, I think it is a bit like an HVM
> domain waiting for an IO assist from qemu (i.e., using
> prepare_wait_on_xen_event_channel()) -- is that right?

It's easier than that because the work flow does not leave the hypervisor
itself. So we can simply pause/unpause the guest vcpu -- exactly as we are
currently doing in the new tasklet-based c_h_o_c().

 -- Keir
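The pause/run/unpause flow described here can be sketched as a minimal
user-space model. This is not the real continue_hypercall_on_cpu()
implementation; the `demo_*` names are illustrative, and Xen's actual
primitives (vcpu_pause_nosync(), vcpu_unpause(), the tasklet machinery) are
only referenced in comments:

```c
#include <assert.h>
#include <stdbool.h>

typedef long (*demo_work_fn)(void *data);

struct demo_pvcpu {
    bool paused;
    long hypercall_ret;   /* delivered as the final hypercall return code */
};

/* Hypercall side: pause the calling vcpu and hand the work off. */
static void demo_choc_start(struct demo_pvcpu *v)
{
    v->paused = true;             /* vcpu_pause_nosync() in Xen */
}

/* Worker side (tasklet context today, idle-vcpu context in the proposal):
 * run the continuation, record its result, and unpause the vcpu. */
static void demo_choc_complete(struct demo_pvcpu *v, demo_work_fn fn,
                               void *data)
{
    v->hypercall_ret = fn(data);  /* handler supplies the return code */
    v->paused = false;            /* vcpu_unpause() in Xen */
}

/* Example continuation: negate its argument. */
static long demo_negate(void *data)
{
    return -*(long *)data;
}
```

The point of the sketch is that the guest vcpu stays paused while the work
runs elsewhere, so the workflow never has to leave the hypervisor.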
Keir Fraser
2010-Apr-19 10:50 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 16/04/2010 18:51, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

>> We have a stress S3 test on Xen 4.1 unstable. Unfortunately, cset 21181
>> cannot pass the stress test: it hangs after 48 S3 suspend/resume cycles.
>> Another test round even showed a hang after 2 suspend/resume cycles. But it
>> is too early to say cset 21181/21180 is the culprit, because even cset
>> 21172 (the cset ahead of the continue_hypercall_on_cpu improvement patch)
>> cannot pass the S3 test either, although it survived more suspend/resume
>> cycles than cset 21181. So there must be something wrong with the S3 logic
>> since the Xen 4.0 release. We are still trying to dig it out...
>
> 21172 was not a good changeset to pick. I suggest trying xen-unstable tip
> with just 21181/21180 reverted.

Actually, try xen-unstable staging tip (c/s 21202). I've just reimplemented
tasklets, so things are quite different at latest tip.

 Thanks,
 Keir
Yu, Ke
2010-Apr-19 12:04 UTC
RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Monday, April 19, 2010 6:51 PM
> To: Yu, Ke; Jiang, Yunhong
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using
> tasklets
>
> On 16/04/2010 18:51, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>
>>> We have a stress S3 test on Xen 4.1 unstable. Unfortunately, cset 21181
>>> cannot pass the stress test: it hangs after 48 S3 suspend/resume cycles.
>>> Another test round even showed a hang after 2 suspend/resume cycles. But
>>> it is too early to say cset 21181/21180 is the culprit, because even cset
>>> 21172 (the cset ahead of the continue_hypercall_on_cpu improvement patch)
>>> cannot pass the S3 test either, although it survived more suspend/resume
>>> cycles than cset 21181. So there must be something wrong with the S3 logic
>>> since the Xen 4.0 release. We are still trying to dig it out...
>>
>> 21172 was not a good changeset to pick. I suggest trying xen-unstable tip
>> with just 21181/21180 reverted.
>
> Actually, try xen-unstable staging tip (c/s 21202). I've just reimplemented
> tasklets, so things are quite different at latest tip.
>
> Thanks,
> Keir

Yeah, moving this stuff to idle-vcpu context is cleaner. I will try it.

Best Regards
Ke
Wei, Gang
2010-Apr-20 05:35 UTC
RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On Monday, 2010-4-19 6:51 PM, Keir Fraser wrote:

> On 16/04/2010 18:51, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>
>>> We have a stress S3 test on Xen 4.1 unstable. Unfortunately, cset 21181
>>> cannot pass the stress test: it hangs after 48 S3 suspend/resume cycles.
>>> Another test round even showed a hang after 2 suspend/resume cycles. But
>>> it is too early to say cset 21181/21180 is the culprit, because even cset
>>> 21172 (the cset ahead of the continue_hypercall_on_cpu improvement patch)
>>> cannot pass the S3 test either, although it survived more suspend/resume
>>> cycles than cset 21181. So there must be something wrong with the S3 logic
>>> since the Xen 4.0 release. We are still trying to dig it out...
>>
>> 21172 was not a good changeset to pick. I suggest trying xen-unstable tip
>> with just 21181/21180 reverted.
>
> Actually, try xen-unstable staging tip (c/s 21202). I've just reimplemented
> tasklets, so things are quite different at latest tip.

Just tried c/s 21202: it hung during the 285th S3 resume on a system with
only legacy HPET broadcast. It hangs even more easily on an HPET MSI
broadcast capable system -- fewer than 100 cycles are needed.

Nothing is wrong in the serial log, and ctrl-a did not respond. I will try to
find a debugging system with ITP to figure out the hang point. Meanwhile, I
will run a cpu online/offline stress test to see whether this is a general
problem with cpu online/offline or is S3 specific.

Jimmy
Keir Fraser
2010-Apr-20 12:51 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 20/04/2010 06:35, "Wei, Gang" <gang.wei@intel.com> wrote:

>>> 21172 was not a good changeset to pick. I suggest trying xen-unstable tip
>>> with just 21181/21180 reverted.
>>
>> Actually, try xen-unstable staging tip (c/s 21202). I've just reimplemented
>> tasklets, so things are quite different at latest tip.
>
> Just tried c/s 21202: it hung during the 285th S3 resume on a system with
> only legacy HPET broadcast. It hangs even more easily on an HPET MSI
> broadcast capable system -- fewer than 100 cycles are needed.
>
> Nothing is wrong in the serial log, and ctrl-a did not respond. I will try
> to find a debugging system with ITP to figure out the hang point. Meanwhile,
> I will run a cpu online/offline stress test to see whether this is a general
> problem with cpu online/offline or is S3 specific.

How reliable is S3 in 4.0, for comparison? Did you get it rock solid?

Also, try c/s 21204 before getting into heavy debugging. It fixes being able
to access VMCS fields from c_h_o_c() context. Worth a try.

 -- Keir
Wei, Gang
2010-Apr-20 14:10 UTC
RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On Tuesday, 2010-4-20 8:52 PM, Keir Fraser wrote:

> How reliable is S3 in 4.0, for comparison? Did you get it rock solid?

S3 in 4.0 only just passed several 100-cycle S3 stress tests. It is not that
rock solid, so I am not sure yet whether this is a regression or not.

> Also, try c/s 21204 before getting into heavy debugging. It fixes being
> able to access VMCS fields from c_h_o_c() context. Worth a try.

I will give 21204 a try.

Jimmy
Wei, Gang
2010-Apr-21 06:47 UTC
RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On Tuesday, 2010-4-20 8:52 PM, Keir Fraser wrote:

> Also, try c/s 21204 before getting into heavy debugging. It fixes being
> able to access VMCS fields from c_h_o_c() context. Worth a try.

Tried c/s 21208; it still cannot pass the 100-cycle S3 test. It hangs after
30-60 cycles.

Jimmy