Juergen Gross
2010-Apr-13 13:19 UTC
[Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
Hi,

attached is a patch to use tasklets for continue_hypercall_on_cpu instead of
temporarily pinning the vcpu to the target physical cpu.

This is intended as a base for cpupools, as Keir requested to get rid of the
"borrow cpu" stuff in my original solution. It is a rework of the patch I sent
before, based on some comments by Jan Beulich.

Keir, if you agree to take this patch, I'll send out the cpupool stuff based
on this patch again.

There is just one question pending: cpupools use continue_hypercall_on_cpu,
but this function is defined for x86 only, not for ia64. I see 2 possible
solutions:

1. make continue_hypercall_on_cpu available on ia64, too
2. make cpupools a pure x86 feature

I think the first solution would be better, but I would need help on this as
I don't have any test machine.


Juergen

--
Juergen Gross                  Principal Developer Operating Systems
TSP ES&S SWE OS6               Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions   e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                  Internet: ts.fujitsu.com
D-80807 Muenchen               Company details: ts.fujitsu.com/imprint.html

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jan Beulich
2010-Apr-13 13:40 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
>>> Juergen Gross <juergen.gross@ts.fujitsu.com> 13.04.10 15:19 >>>
> cpupools use continue_hypercall_on_cpu, but this function is defined for
> x86 only, not for ia64. I see 2 possible solutions:
>
> 1. make continue_hypercall_on_cpu available on ia64, too
> 2. make cpupools a pure x86 feature

Is there anything meaningful in the new code that's really x86-specific
(i.e. can't the whole code chunk be moved to xen/common/)?

Jan
Juergen Gross
2010-Apr-13 13:49 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
Jan Beulich wrote:
>>>> Juergen Gross <juergen.gross@ts.fujitsu.com> 13.04.10 15:19 >>>
>> cpupools use continue_hypercall_on_cpu, but this function is defined for
>> x86 only, not for ia64. I see 2 possible solutions:
>>
>> 1. make continue_hypercall_on_cpu available on ia64, too
>> 2. make cpupools a pure x86 feature
>
> Is there anything meaningful in the new code that's really x86-specific
> (i.e. can't the whole code chunk be moved to xen/common/)?

My main concern is the difference in the handling of schedule_tail. I just
can't tell how to change the ia64 variant to make it work correctly.
The other x86-specific part is the setting of the return register, but this
is rather easily done using an architecture-specific macro.


Juergen
Keir Fraser
2010-Apr-13 14:41 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 13/04/2010 14:19, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

> attached is a patch to use tasklets for continue_hypercall_on_cpu instead of
> temporarily pinning the vcpu to the target physical cpu.
>
> This is intended as a base for cpupools, as Keir requested to get rid of the
> "borrow cpu" stuff in my original solution.

Why do you change the interface of continue_hypercall_on_cpu()? What's a
'hdl' anyway?

 -- Keir
Keir Fraser
2010-Apr-13 15:08 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 13/04/2010 14:49, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>>> 1. make continue_hypercall_on_cpu available on ia64, too
>>> 2. make cpupools a pure x86 feature
>>
>> Is there anything meaningful in the new code that's really x86-specific
>> (i.e. can't the whole code chunk be moved to xen/common/)?
>
> My main concern is the difference in the handling of schedule_tail. I just
> can't tell how to change the ia64 variant to make it work correctly.

We were messing with schedule_tail in order to get control into the
hypercall continuation in the context of the same vcpu but on a different
pcpu. Now that you just execute the continuation in a straightforward
softirq/tasklet context, you don't need to mess with schedule_tail at all...
Right?

 -- Keir
Juergen Gross
2010-Apr-14 04:26 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
Keir Fraser wrote:
> On 13/04/2010 14:19, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>> attached is a patch to use tasklets for continue_hypercall_on_cpu instead
>> of temporarily pinning the vcpu to the target physical cpu.
>>
>> This is intended as a base for cpupools, as Keir requested to get rid of
>> the "borrow cpu" stuff in my original solution.
>
> Why do you change the interface of continue_hypercall_on_cpu()? What's a
> 'hdl' anyway?

I need a way to find the migrate_info structure in case of nested calls of
continue_hypercall_on_cpu(). Originally this was done by storing it in the
vcpu structure, but this can't be done any more using tasklets. In my first
attempt I saved it in the per-cpu area, but this approach doesn't work if
continue_hypercall_on_cpu() is called concurrently. So the cleanest way is
to pass it via a parameter.
'hdl' is just a 'handle'.


Juergen
Keir Fraser
2010-Apr-14 06:46 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 14/04/2010 05:26, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>> Why do you change the interface of continue_hypercall_on_cpu()? What's a
>> 'hdl' anyway?
>
> I need a way to find the migrate_info structure in case of nested calls of
> continue_hypercall_on_cpu(). Originally this was done by storing it in the
> vcpu structure, but this can't be done any more using tasklets. In my first
> attempt I saved it in the per-cpu area, but this approach doesn't work if
> continue_hypercall_on_cpu() is called concurrently. So the cleanest way is
> to pass it via a parameter.

The per-cpu area method should work fine, since Xen is non-preemptive? I
don't think the concurrency you are concerned about can happen.

 -- Keir
Juergen Gross
2010-Apr-14 06:54 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
Keir Fraser wrote:
> On 13/04/2010 14:49, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>>>> 1. make continue_hypercall_on_cpu available on ia64, too
>>>> 2. make cpupools a pure x86 feature
>>> Is there anything meaningful in the new code that's really x86-specific
>>> (i.e. can't the whole code chunk be moved to xen/common/)?
>> My main concern is the difference in the handling of schedule_tail. I just
>> can't tell how to change the ia64 variant to make it work correctly.
>
> We were messing with schedule_tail in order to get control into the
> hypercall continuation in the context of the same vcpu but on a different
> pcpu. Now that you just execute the continuation in a straightforward
> softirq/tasklet context, you don't need to mess with schedule_tail at
> all... Right?

Aah, you mean something like the attached patch?


Juergen
Juergen Gross
2010-Apr-14 06:58 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
Keir Fraser wrote:
> On 14/04/2010 05:26, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>>> Why do you change the interface of continue_hypercall_on_cpu()? What's a
>>> 'hdl' anyway?
>> I need a way to find the migrate_info structure in case of nested calls of
>> continue_hypercall_on_cpu(). Originally this was done by storing it in the
>> vcpu structure, but this can't be done any more using tasklets. In my
>> first attempt I saved it in the per-cpu area, but this approach doesn't
>> work if continue_hypercall_on_cpu() is called concurrently. So the
>> cleanest way is to pass it via a parameter.
>
> The per-cpu area method should work fine, since Xen is non-preemptive? I
> don't think the concurrency you are concerned about can happen.

The tasklet knows only on which cpu it is running, so the data has to be
stored on the target cpu. And one pcpu can be the target of concurrent calls
from different calling cpus...


Juergen
Keir Fraser
2010-Apr-14 07:15 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 14/04/2010 07:58, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>>> The per-cpu area method should work fine, since Xen is non-preemptive? I
>>> don't think the concurrency you are concerned about can happen.
>
> The tasklet knows only on which cpu it is running, so the data has to be
> stored on the target cpu. And one pcpu can be the target of concurrent
> calls from different calling cpus...

A tasklet also takes an arbitrary ulong parameter, which you can cast to a
pointer to your informational structure. The parameter is specified via
tasklet_init(). That should suffice.

 -- Keir
Juergen Gross
2010-Apr-14 07:25 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
Keir Fraser wrote:
> On 14/04/2010 07:58, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>>> The per-cpu area method should work fine, since Xen is non-preemptive? I
>>> don't think the concurrency you are concerned about can happen.
>> The tasklet knows only on which cpu it is running, so the data has to be
>> stored on the target cpu. And one pcpu can be the target of concurrent
>> calls from different calling cpus...
>
> A tasklet also takes an arbitrary ulong parameter, which you can cast to a
> pointer to your informational structure. The parameter is specified via
> tasklet_init(). That should suffice.

I'm already using this. The problem is to find the original calling vcpu in
case of a nested call of continue_hypercall_on_cpu() while not conflicting
with concurrent calls from other vcpus which happen to address the same pcpu.


Juergen
Keir Fraser
2010-Apr-14 07:35 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 14/04/2010 08:25, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>> A tasklet also takes an arbitrary ulong parameter, which you can cast to a
>> pointer to your informational structure. The parameter is specified via
>> tasklet_init(). That should suffice.
>
> I'm already using this. The problem is to find the original calling vcpu in
> case of a nested call of continue_hypercall_on_cpu() while not conflicting
> with concurrent calls from other vcpus which happen to address the same
> pcpu.

There can be only one nested invocation on any given pcpu, since a running
invocation is never preempted. Hence on entry to c_h_o_c() you can check a
per-cpu variable to see whether this invocation is nesting, or not. And if
it is, that variable can be a pointer to an info structure which includes a
pointer to the invoking vcpu.

 -- Keir
Juergen Gross
2010-Apr-14 08:04 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
Keir Fraser wrote:
> On 14/04/2010 08:25, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>>> A tasklet also takes an arbitrary ulong parameter, which you can cast to
>>> a pointer to your informational structure. The parameter is specified via
>>> tasklet_init(). That should suffice.
>> I'm already using this. The problem is to find the original calling vcpu
>> in case of a nested call of continue_hypercall_on_cpu() while not
>> conflicting with concurrent calls from other vcpus which happen to address
>> the same pcpu.
>
> There can be only one nested invocation on any given pcpu, since a running
> invocation is never preempted. Hence on entry to c_h_o_c() you can check a
> per-cpu variable to see whether this invocation is nesting, or not. And if
> it is, that variable can be a pointer to an info structure which includes a
> pointer to the invoking vcpu.

Okay, attached is the modified patch again.


Juergen
Keir Fraser
2010-Apr-14 10:30 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 14/04/2010 09:04, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>> There can be only one nested invocation on any given pcpu, since a running
>> invocation is never preempted. Hence on entry to c_h_o_c() you can check a
>> per-cpu variable to see whether this invocation is nesting, or not. And if
>> it is, that variable can be a pointer to an info structure which includes
>> a pointer to the invoking vcpu.
>
> Okay, attached is the modified patch again.

I cleaned up some more and applied as xen-unstable:21165 and
xen-unstable:21166.

 -- Keir
Juergen Gross
2010-Apr-15 06:31 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
Keir Fraser wrote:
> On 14/04/2010 09:04, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>>> There can be only one nested invocation on any given pcpu, since a
>>> running invocation is never preempted. Hence on entry to c_h_o_c() you
>>> can check a per-cpu variable to see whether this invocation is nesting,
>>> or not. And if it is, that variable can be a pointer to an info structure
>>> which includes a pointer to the invoking vcpu.
>> Okay, attached is the modified patch again.
>
> I cleaned up some more and applied as xen-unstable:21165 and
> xen-unstable:21166.

Thanks. Unfortunately, your modifications are not working. Microcode update
hangs. My version worked without problems.


Juergen
Keir Fraser
2010-Apr-15 06:39 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 15/04/2010 07:31, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>>> Okay, attached is the modified patch again.
>>
>> I cleaned up some more and applied as xen-unstable:21165 and
>> xen-unstable:21166.
>
> Thanks. Unfortunately, your modifications are not working. Microcode update
> hangs. My version worked without problems.

What revision did you test? I put in some fixes as c/s 21173.

 -- Keir
Juergen Gross
2010-Apr-15 07:57 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
Keir Fraser wrote:
> On 15/04/2010 07:31, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>>>> Okay, attached is the modified patch again.
>>> I cleaned up some more and applied as xen-unstable:21165 and
>>> xen-unstable:21166.
>> Thanks. Unfortunately, your modifications are not working. Microcode
>> update hangs. My version worked without problems.
>
> What revision did you test? I put in some fixes as c/s 21173.

My highest c/s was 21167.
c/s 21173 is hanging, too (sorry for the delay, but I had to remove my
cpupool stuff due to the scheduler changes for credit2).

Is a call of sync_vcpu_execstate() from a tasklet really allowed? I don't
think the ASSERTs in __sync_lazy_execstate() are all fulfilled in this case.


Juergen
Keir Fraser
2010-Apr-15 08:13 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 15/04/2010 08:57, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>> What revision did you test? I put in some fixes as c/s 21173.
>
> My highest c/s was 21167.
> c/s 21173 is hanging, too (sorry for the delay, but I had to remove my
> cpupool stuff due to the scheduler changes for credit2).

Ah yes, just done a test myself (clearly my dom0 setup is not doing
microcode updates) and I've now fixed it as c/s 21176. Thanks!

> Is a call of sync_vcpu_execstate() from a tasklet really allowed? I don't
> think the ASSERTs in __sync_lazy_execstate() are all fulfilled in this
> case.

Better hope so, or e.g.:
  acpi_enter_sleep
   -> continue_hypercall_on_cpu(enter_state_helper)
    -> enter_state
     -> freeze_domains
      -> domain_pause
       -> vcpu_sleep_sync
        -> sync_vcpu_execstate
also wouldn't work.

There is only one ASSERT in __sync_lazy_execstate, and it's safe for this
case. Bear in mind that our softirqs always run in the context of whatever
domain happens to be running on that cpu currently -- they don't have their
own proper vcpu context.

By the by, your original attempt at synchronisation (spin on return value in
regs changing) was risky, as it could take an unbounded time before the vcpu
registers get copied out of the original cpu's stack. Especially during
early dom0 boot, when the system is very idle.

 -- Keir
Keir Fraser
2010-Apr-15 08:22 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 15/04/2010 09:13, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

> Better hope so, or e.g.:
>   acpi_enter_sleep
>    -> continue_hypercall_on_cpu(enter_state_helper)
>     -> enter_state
>      -> freeze_domains
>       -> domain_pause
>        -> vcpu_sleep_sync
>         -> sync_vcpu_execstate
> also wouldn't work.

Actually that's a good example, because it now won't work, but for other
reasons! The hypercall continuation can interrupt another vcpu's execution,
and then try to synchronously pause that vcpu. Which will deadlock.

Luckily I think we can re-jig this code to freeze_domains() before doing the
continue_hypercall_on_cpu(). I've cc'ed one of the CPU RAS guys. :-)

 -- Keir
Juergen Gross
2010-Apr-15 08:29 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
Keir Fraser wrote:
> On 15/04/2010 08:57, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>
>>> What revision did you test? I put in some fixes as c/s 21173.
>> My highest c/s was 21167.
>> c/s 21173 is hanging, too (sorry for the delay, but I had to remove my
>> cpupool stuff due to the scheduler changes for credit2).
>
> Ah yes, just done a test myself (clearly my dom0 setup is not doing
> microcode updates) and I've now fixed it as c/s 21176. Thanks!

Great! Works for me, too.

>> Is a call of sync_vcpu_execstate() from a tasklet really allowed? I don't
>> think the ASSERTs in __sync_lazy_execstate() are all fulfilled in this
>> case.
>
> There is only one ASSERT in __sync_lazy_execstate, and it's safe for this
> case. Bear in mind that our softirqs always run in the context of whatever
> domain happens to be running on that cpu currently -- they don't have
> their own proper vcpu context.

I don't see how it is guaranteed that the current vcpu MUST be an idle one...

> By the by, your original attempt at synchronisation (spin on return value
> in regs changing) was risky, as it could take an unbounded time before the
> vcpu registers get copied out of the original cpu's stack. Especially
> during early dom0 boot, when the system is very idle.

Indeed. I realized my version had another flaw in case of nested calls of
c_h_o_c: if one call returned a non-zero value and issued another call of
c_h_o_c, the tasklet would hang forever... Your version is cleaner.


Juergen
Keir Fraser
2010-Apr-15 09:08 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 15/04/2010 09:29, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:

>> There is only one ASSERT in __sync_lazy_execstate, and it's safe for this
>> case. Bear in mind that our softirqs always run in the context of whatever
>> domain happens to be running on that cpu currently -- they don't have
>> their own proper vcpu context.
>
> I don't see how it is guaranteed that the current vcpu MUST be an idle
> one...

It's odd, because someone else asked this exact same question, so the code
must be more subtle than I thought. Note that the ASSERT is inside
if(switch_required), and switch_required is true only if the vcpu whose
state is loaded on this CPU is not the same as the currently-scheduled vcpu.
This odd situation is only possible if we are lazy-syncing the former vcpu's
state, and that only occurs if the currently-running vcpu is the idle vcpu
(as all other vcpus would need their state loaded immediately, so lazy sync
does not apply).

 -- Keir
Jiang, Yunhong
2010-Apr-15 09:59 UTC
RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Thursday, April 15, 2010 4:22 PM
> To: Juergen Gross
> Cc: xen-devel@lists.xensource.com; Jiang, Yunhong
> Subject: Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using
> tasklets
>
> On 15/04/2010 09:13, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>
>> Better hope so, or e.g.:
>>   acpi_enter_sleep
>>    -> continue_hypercall_on_cpu(enter_state_helper)
>>     -> enter_state
>>      -> freeze_domains
>>       -> domain_pause
>>        -> vcpu_sleep_sync
>>         -> sync_vcpu_execstate
>> also wouldn't work.
>
> Actually that's a good example, because it now won't work, but for other
> reasons! The hypercall continuation can interrupt another vcpu's execution,
> and then try to synchronously pause that vcpu. Which will deadlock.
>
> Luckily I think we can re-jig this code to freeze_domains() before doing
> the continue_hypercall_on_cpu(). I've cc'ed one of the CPU RAS guys. :-)

Hmm, I have cc'ed one of the PM guys because it is enter_state :-)

Can we add a check in vcpu_sleep_sync() for current? It is meaningless to
cpu_relax for the current vcpu in that situation, especially if we are not
in irq context.
I'm not sure why freeze_domains only checks dom0's vcpus for current,
instead of all domains.

--jyh
Keir Fraser
2010-Apr-15 11:06 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 15/04/2010 10:59, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

>> Actually that's a good example, because it now won't work, but for other
>> reasons! The hypercall continuation can interrupt another vcpu's
>> execution, and then try to synchronously pause that vcpu. Which will
>> deadlock.
>>
>> Luckily I think we can re-jig this code to freeze_domains() before doing
>> the continue_hypercall_on_cpu(). I've cc'ed one of the CPU RAS guys. :-)
>
> Hmm, I have cc'ed one of the PM guys because it is enter_state :-)
> Can we add a check in vcpu_sleep_sync() for current? It is meaningless to
> cpu_relax for the current vcpu in that situation, especially if we are not
> in irq context.
> I'm not sure why freeze_domains only checks dom0's vcpus for current,
> instead of all domains.

Well actually, pausing any vcpu from within the hypercall continuation is
dangerous. The softirq handler running the hypercall continuation may have
interrupted some running VCPU X. And the VCPU Y that the continuation is
currently trying to pause may itself be trying to pause X. So we can get a
deadlock that way. The freeze_domains() *has* to be pulled outside of the
hypercall continuation.

It's a little bit similar to the super-subtle stop_machine_run deadlock
possibility I just emailed to you a second ago. :-)

 -- Keir
Keir Fraser
2010-Apr-15 11:22 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 15/04/2010 12:06, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

>> Hmm, I have cc'ed one of the PM guys because it is enter_state :-)
>> Can we add a check in vcpu_sleep_sync() for current? It is meaningless to
>> cpu_relax for the current vcpu in that situation, especially if we are
>> not in irq context.
>> I'm not sure why freeze_domains only checks dom0's vcpus for current,
>> instead of all domains.
>
> Well actually, pausing any vcpu from within the hypercall continuation is
> dangerous. The softirq handler running the hypercall continuation may have
> interrupted some running VCPU X. And the VCPU Y that the continuation is
> currently trying to pause may itself be trying to pause X. So we can get a
> deadlock that way. The freeze_domains() *has* to be pulled outside of the
> hypercall continuation.

Changesets 21180 and 21181 in xen-unstable are my attempts to fix this. Yu
Ke, please let me know if these look and test okay for you guys.

Thanks,
Keir
Jiang, Yunhong
2010-Apr-16 06:42 UTC
RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Thursday, April 15, 2010 7:07 PM
> To: Jiang, Yunhong; Juergen Gross
> Cc: xen-devel@lists.xensource.com; Yu, Ke
> Subject: Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using
> tasklets
>
> Well actually, pausing any vcpu from within the hypercall continuation is
> dangerous. The softirq handler running the hypercall continuation may have
> interrupted some running VCPU X. And the VCPU Y that the continuation is
> currently trying to pause may itself be trying to pause X. So we can get a
> deadlock that way. The freeze_domains() *has* to be pulled outside of the
> hypercall continuation.
>
> It's a little bit similar to the super-subtle stop_machine_run deadlock
> possibility I just emailed to you a second ago. :-)

Thanks for pointing out the stop_machine_run deadlock issue.

After more consideration and internal discussion, the key point seems to be
that the tasklet softirq in effect holds a lock on the current vcpu's state
(i.e. no one else can change that state until the softirq has finished).
So any blocking action in softirq context, not just vcpu_pause_sync, is
dangerous and should be avoided (we can't hold a lock and then block, per my
understanding). This is because a vcpu's state can only be changed by the
schedule softirq (am I right on this?), while the schedule softirq can't
preempt other softirqs. So, more generally, anything that is updated by a
softirq context and synchronously/blockingly waited on in Xen's vcpu context
is in fact an implicit lock held by the softirq.

As for the tricky bug in stop_machine_run(), I think it is in fact similar
to the cpu_add_remove_lock. stop_machine_run() is a blocking action, so we
must make sure no one blocks waiting for a lock that stop_machine_run()
already holds. Back then, we changed all components that try to get the
cpu_add_remove_lock to use try_lock. The change caused by the tasklet is
that a new implicit lock is added, i.e. the vcpu's state. The
straightforward method is the same as for the cpu_add_remove_lock: make
everything that waits for a vcpu state change process softirqs between the
checks. Maybe the cleaner way is your previous suggestion, that is, to run
stop_machine_run() in the idle vcpu; another way is to turn back to the
original method, i.e. do it in schedule_tail.

Also, we are not sure why continue_hypercall_on_cpu was changed to use
tasklets. What is the benefit? At least I find it quite confusing, because
per our understanding a hypercall is usually assumed to execute in current
context, while this change breaks that assumption. So any hypercall that
may use c_h_o_c, and any function called by such a hypercall, has to be
aware of this. I'm not sure this is really correct; at least it may cause
trouble if someone uses it without realizing the limitation.
From Juergen Gross's mail, it seems to be for cpupools, but I have no idea
of the cpupool :-(

--jyh
Jiang, Yunhong
2010-Apr-16 06:45 UTC
RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
BTW, I wonder whether the cpu online/offline lock should really wrap the stop_machine_run(). I remember Criping questioned this also. Will have a look at it.

--jyh

>-----Original Message-----
>From: Jiang, Yunhong
>Sent: Friday, April 16, 2010 2:43 PM
>To: Keir Fraser; Juergen Gross
>Cc: xen-devel@lists.xensource.com; Yu, Ke
>Subject: RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
>
>[...]
Juergen Gross
2010-Apr-16 06:55 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
Jiang, Yunhong wrote:
> [...]
>
> Also, we are not sure why continue_hypercall_on_cpu was changed to use a
> tasklet. What is the benefit? At least I think this is quite confusing,
> because per our understanding a hypercall is usually assumed to execute in
> current context, while this change breaks that assumption. [...] From
> Juergen Gross's mail, it seems to be for cpupools, but I have no idea of
> the cpupool :-(

Cpupools introduce something like "scheduling domains" in xen. Each cpupool
owns a set of physical cpus and has its own scheduler. Each domain is a
member of a cpupool.

It is possible to move cpus or domains between pools, but a domain is always
limited to the physical cpus in the cpupool of the domain.

This limitation makes it impossible to use continue_hypercall_on_cpu with
cpupools for an arbitrary physical cpu without changing it. My original
solution temporarily moved the target cpu into the cpupool of the issuing
domain, but this was regarded as an ugly hack.


Juergen
Keir Fraser
2010-Apr-16 07:13 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 16/04/2010 07:42, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

> Thanks for pointing out the stop_machine_run deadlock issue.
>
> After more consideration and internal discussion, the key point seems to be
> that the tasklet softirq effectively holds a lock on the current vcpu's
> state (i.e. no one else can change that state until the softirq finishes).
>
> So any blocking action in softirq context, not just vcpu_pause_sync,
> is dangerous and should be avoided (we can't take a lock and then block,
> per my understanding).
> This is because a vcpu's state can only be changed by the schedule softirq
> (am I right on this?), while the schedule softirq can't preempt another
> softirq. So, more generally, anything that is updated in softirq context
> and synchronously/blockingly waited on in xen's vcpu context is in fact an
> implicit lock held by the softirq.

I think you're at risk of over-analysing the situation. You cannot safely
synchronously pause vcpus from within softirq context. I'm not sure we can
extrapolate further than that.

> As for the tricky bug in stop_machine_run(), I think it is in fact similar
> to the cpu_add_remove_lock. stop_machine_run() is a blocking action, so we
> must make sure no one will block waiting for a lock that stop_machine_run()
> already holds. Back then, we changed all components that take the
> cpu_add_remove_lock to use try_lock.
>
> The change caused by the tasklet is that a new implicit lock is added, i.e.
> the vcpu's state.

Let me be clear: the deadlock I described in stop_machine_run() is *not*
introduced by the c_h_o_c reimplementation. It has been there all along.
Nothing in my description of the deadlock depended on the implementation of
c_h_o_c: it depended only on the behaviour of stop_machine_run itself, which
does not internally use c_h_o_c.

> The straightforward method is the same as for the cpu_add_remove_lock: make
> everything that waits for a vcpu state change run softirqs between checks.

Possible, though it could have impacts of its own. We couldn't do
SCHEDULE_SOFTIRQ as we would lose the caller's context, and the caller could
not hold locks while pausing others (although I suppose they generally can't
anyway).

> Maybe the cleaner way is your previous suggestion, that is, put the
> stop_machine_run() in the idle vcpu; another way is to turn back to the
> original method, i.e. do it in the schedule_tail.

Argh. That would be annoying!

Another possibility is to... shudder... introduce a timeout. Wait only e.g.
1ms for all vcpus to enter the STOPMACHINE_PREPARE state and if they don't,
cancel the operation and return -EBUSY. There are already a bunch of
opportunities to return 'spurious' -EBUSY in the cpu-offline paths, so tools
already need to know to retry at least a certain number of times.
Undoubtedly this is the dirty solution, but it is easy-ish to implement and
would only be allowing us to avoid an extremely rare deadlock possibility.
It would just be critical to pick a large enough timeout!

> Also, we are not sure why continue_hypercall_on_cpu was changed to use a
> tasklet. What is the benefit? At least I think this is quite confusing,
> because per our understanding a hypercall is usually assumed to execute in
> current context, while this change breaks that assumption. So any hypercall
> that may use this _c_h_o_c, and any function called by that hypercall,
> should be aware of this. I'm not sure this is really correct; at least it
> may cause trouble if someone uses it without realizing the limitation. From
> Juergen Gross's mail, it seems to be for cpupools, but I have no idea of
> the cpupool :-(

The implementation is simpler and lets us get rid of the complexity around
vcpu affinity logic. There aren't that many users of c_h_o_c and most are
still trivially correct. I don't think the changes to c_h_o_c have
introduced any more deadlocks, apart from the freeze_domains issue (which I
believe I have now fixed).

 -- Keir
Jiang, Yunhong
2010-Apr-16 08:16 UTC
RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
>-----Original Message-----
>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>Sent: Friday, April 16, 2010 3:14 PM
>To: Jiang, Yunhong; Juergen Gross
>Cc: xen-devel@lists.xensource.com; Yu, Ke
>Subject: Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
>
>[...]
>
>Let me be clear: the deadlock I described in stop_machine_run() is *not*
>introduced by the c_h_o_c reimplementation. It has been there all along.
>Nothing in my description of the deadlock depended on the implementation of
>c_h_o_c: it depended only on the behaviour of stop_machine_run itself, which
>does not internally use c_h_o_c.

Oops, yes, this should be there already, without c_h_o_c. So, if another vcpu is trying to vcpu_pause() this vcpu, it will deadlock because this vcpu can't be paused, right? (I assume the same reasoning applies to hvmop_flush_tlb_all.)

If this is true, then how about checking !vcpu_runnable(current) in stop_machine_run()'s stopmachine_set_state()? If this check fails, it means someone is trying to change our state and a potential deadlock may happen, so we can cancel this stop_machine_run(). This is at least a bit cleaner than the timeout mechanism.

>> Maybe the cleaner way is your previous suggestion, that is, put the
>> stop_machine_run() in the idle vcpu; another way is to turn back to the
>> original method, i.e. do it in the schedule_tail.
>
>Argh. That would be annoying!

Doing it in the schedule_tail does not seem to help with this deadlock issue.

>Another possibility is to... shudder... introduce a timeout. [...]
>
>The implementation is simpler and lets us get rid of the complexity around
>vcpu affinity logic. There aren't that many users of c_h_o_c and most are
>still trivially correct. I don't think the changes to c_h_o_c have
>introduced any more deadlocks, apart from the freeze_domains issue (which I
>believe I have now fixed).

Yes, c_h_o_c does not introduce any more deadlocks; my concern is that its name may be confusing if one tries to use it for another hypercall. After all, if a hypercall, or a function used by the hypercall, depends on current, it should not use this c_h_o_c.

--jyh

>
> -- Keir
Jiang, Yunhong
2010-Apr-16 08:30 UTC
RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
Juergen, thanks for your explanation.

--jyh

>-----Original Message-----
>From: Juergen Gross [mailto:juergen.gross@ts.fujitsu.com]
>Sent: Friday, April 16, 2010 2:56 PM
>To: Jiang, Yunhong
>Cc: Keir Fraser; xen-devel@lists.xensource.com; Yu, Ke
>Subject: Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
>
>[...]
Yu, Ke
2010-Apr-16 09:20 UTC
RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Thursday, April 15, 2010 7:23 PM
> To: Jiang, Yunhong
> Cc: xen-devel@lists.xensource.com; Yu, Ke
> Subject: Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using
> tasklets
>
> On 15/04/2010 12:06, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>
> [...]
>
> Changesets 21180 and 21181 in xen-unstable are my attempts to fix this. Yu
> Ke, please let me know if these look and test okay for you guys.
>

We have run an S3 stress test on xen-unstable. Unfortunately, cset 21181 cannot pass it: it hangs after 48 S3 suspend/resume cycles, and another test run even hangs after 2 cycles. But it is too early to say cset 21181/21180 is the culprit, because even cset 21172 (the cset just ahead of the continue_hypercall_on_cpu rework) cannot pass the S3 test either, although it survives more suspend/resume cycles than cset 21181. So generally, something must have gone wrong in the S3 logic since the Xen 4.0 release. We are still trying to dig it out...

Best Regards
Ke
Keir Fraser
2010-Apr-16 17:51 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 16/04/2010 10:20, "Yu, Ke" <ke.yu@intel.com> wrote:

>> Changesets 21180 and 21181 in xen-unstable are my attempts to fix this. Yu
>> Ke, please let me know if these look and test okay for you guys.
>
> We have run an S3 stress test on xen-unstable. Unfortunately, cset 21181
> cannot pass it: it hangs after 48 S3 suspend/resume cycles, and another
> test run even hangs after 2 cycles. But it is too early to say cset
> 21181/21180 is the culprit, because even cset 21172 (the cset just ahead of
> the continue_hypercall_on_cpu rework) cannot pass the S3 test either,
> although it survives more suspend/resume cycles than cset 21181. So
> generally, something must have gone wrong in the S3 logic since the Xen 4.0
> release. We are still trying to dig it out...

21172 was not a good changeset to pick. I suggest trying xen-unstable tip
with just 21181/21180 reverted.

 Thanks,
 Keir
Keir Fraser
2010-Apr-16 17:57 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 16/04/2010 09:16, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

> Oops, yes, this should be there already, without c_h_o_c.
> So, if another vcpu is trying to vcpu_pause() this vcpu, it will deadlock
> because this vcpu can't be paused, right? (I assume the same reasoning
> applies to hvmop_flush_tlb_all.)

Yes.

> If this is true, then how about checking !vcpu_runnable(current) in
> stop_machine_run()'s stopmachine_set_state()? If this check fails, it means
> someone is trying to change our state and a potential deadlock may happen,
> so we can cancel this stop_machine_run(). This is at least a bit cleaner
> than the timeout mechanism.

Interesting... That could work. But...

>>> Maybe the cleaner way is your previous suggestion, that is, put the
>>> stop_machine_run() in the idle vcpu; another way is to turn back to the
>>> original method, i.e. do it in the schedule_tail.
>>
>> Argh. That would be annoying!
>
> Doing it in the schedule_tail does not seem to help with this deadlock
> issue.

Yes, but now I think about it, scheduling c_h_o_c() and s_m_r() in idle-vcpu
context instead of softirq context might not be *that* hard to do, and it
avoids all these subtle deadlock possibilities. I think I'm warming to it
compared with the alternatives.

> Yes, c_h_o_c does not introduce any more deadlocks; my concern is that its
> name may be confusing if one tries to use it for another hypercall. After
> all, if a hypercall, or a function used by the hypercall, depends on
> current, it should not use this c_h_o_c.

Well, I think there are two issues: (1) we run the continuation as a
softirq; (2) we don't run the continuation in the context of the original
vcpu. I think (1) is a bigger problem than (2) as it introduces the
possibility of all these nasty subtle deadlocks. (2) is obviously not
desirable, *but* I don't think any callers much care about the context of
the original caller, *and* if they do, the resulting bug is generally going
to be pretty obvious. That is, the hypercall just won't work, ever -- it's
much less likely than (1) to result in very rare very subtle bugs.

 -- Keir
Jiang, Yunhong
2010-Apr-19 05:53 UTC
RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
>-----Original Message----- >From: Keir Fraser [mailto:keir.fraser@eu.citrix.com] >Sent: Saturday, April 17, 2010 1:58 AM >To: Jiang, Yunhong; Juergen Gross >Cc: xen-devel@lists.xensource.com; Yu, Ke >Subject: Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets > >On 16/04/2010 09:16, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote: > >> Oops, yes, this should be there already, without c_h_o_c. >> So, if another vcpu is trying to vcpu_pause() this vcpu, it will deadlock >> becaues this vcpu can''t be paused, right? (I assume the same reason to >> hvmop_flush_tlb_all). > >Yes. > >> If this is true, then how about checking !vcpu_runnable(current) in >> stop_machine_run()''s stopmachine_set_state()? If this check failed, it means >> someone try to change our state and potential deadlock may happen, we can >> cancel this stop_machine_run()? This is at least a bit cleaner than the >> timeout mechanism. > >Interesting... That could work. But...I agree that idle vcpu method is more cleaner in the long run.> >>>> Maybe the cleaner way is your previous suggestion, that is, put the >>>> stop_machine_run() in the idle_vcpu(), another way is, turn back to the >>>> original method, i.e. do it in the schedule_tail. >>> >>> Argh. That would be annoying! >> >> Seems do it in the schedule_tail will not benifit this deadlock issues. > >Yes, but now I think about it, scheduling c_h_o_c() and s_m_r() in idle-vcpu >context instead of softirq context might not be *that* hard to do, and it >avoids all these subtle deadlock possibilities. I think I''m warming to it >compared with the alternatives. > >> Yes, c_h_o_c does not introduce any more deadlock, my concern is, it''s name >> maybe confusing if one try to use it for other hypercall . After all, if a >> hypercall and function used by the hypercall depends on current, it should not >> use this c_h_o_c. 
>
> Well, I think there are two issues: (1) we run the continuation as a
> softirq; (2) we don't run the continuation in the context of the original
> vcpu. I think (1) is a bigger problem than (2) as it introduces the
> possibility of all these nasty subtle deadlocks. (2) is obviously not
> desirable, *but* I don't think any callers much care about the context of
> the original caller, *and* if they do the resulting bug is generally going
> to be pretty obvious. That is, the hypercall just won't work, ever -- it's
> much less likely than (1) to result in very rare very subtle bugs.

For issue 2, this c_h_o_c is in fact something like a Xen action, i.e. it is
a utility provided by the Xen hypervisor that can be used by the guest, so
maybe we could change the name of c_h_o_c to something like call_xen_XXX?

Regarding your idle-vcpu context work, I think it is a bit like an HVM domain
waiting for an IO assist from qemu (i.e., using
prepare_wait_on_xen_event_channel()) -- is that right?

--jyh

>
> -- Keir
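The !vcpu_runnable(current) cancellation idea discussed in this exchange can
be sketched in isolation. This is a simplified user-space model of the idea,
not the real Xen code; all `demo_*` names are illustrative, and the real
stopmachine_set_state() and vcpu_runnable() have different signatures:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified model: before a CPU commits to the stop-machine rendezvous,
 * it checks whether its current vcpu is still runnable.  If another vcpu
 * has initiated a vcpu_pause() against us, spinning in the rendezvous
 * would deadlock, so we bail out with -EBUSY and let the caller retry. */

struct demo_vcpu {
    bool pause_requested;   /* set by a would-be vcpu_pause() caller */
};

static bool demo_vcpu_runnable(const struct demo_vcpu *v)
{
    return !v->pause_requested;
}

/* Returns 0 if it is safe to enter the rendezvous, -EBUSY otherwise. */
int demo_stopmachine_set_state(struct demo_vcpu *current_vcpu)
{
    if ( !demo_vcpu_runnable(current_vcpu) )
        return -EBUSY;  /* someone is pausing us: cancel to avoid deadlock */
    /* ... proceed with the real rendezvous state transition here ... */
    return 0;
}
```

As the thread notes, this avoids the deadlock only by cancelling the
operation; running the work from idle-vcpu context avoids it structurally.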
Keir Fraser
2010-Apr-19 06:48 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 19/04/2010 06:53, "Jiang, Yunhong" <yunhong.jiang@intel.com> wrote:

>> Well, I think there are two issues: (1) we run the continuation as a
>> softirq; (2) we don't run the continuation in the context of the original
>> vcpu. I think (1) is a bigger problem than (2) as it introduces the
>> possibility of all these nasty subtle deadlocks. (2) is obviously not
>> desirable, *but* I don't think any callers much care about the context of
>> the original caller, *and* if they do the resulting bug is generally going
>> to be pretty obvious. That is, the hypercall just won't work, ever -- it's
>> much less likely than (1) to result in very rare very subtle bugs.
>
> For issue 2, this c_h_o_c is in fact something like a Xen action, i.e. it is
> a utility provided by the Xen hypervisor that can be used by the guest, so
> maybe we could change the name of c_h_o_c to something like call_xen_XXX?

Well, the handler does provide the final hypercall return code, so it is
still a hypercall continuation even if it doesn't run in the correct vcpu's
context.

> Regarding your idle-vcpu context work, I think it is a bit like an HVM
> domain waiting for an IO assist from qemu (i.e., using
> prepare_wait_on_xen_event_channel()) -- is that right?

It's easier than that because the work flow does not leave the hypervisor
itself. So we can simply pause/unpause the guest vcpu -- exactly as we are
currently doing in the new tasklet-based c_h_o_c().

 -- Keir
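The pause/run/unpause flow described here can be sketched as a minimal
user-space model. This is not the real continue_hypercall_on_cpu()
implementation; the `demo_*` names are illustrative, and Xen's actual
primitives (vcpu_pause_nosync(), vcpu_unpause(), the tasklet machinery) are
only referenced in comments:

```c
#include <assert.h>
#include <stdbool.h>

typedef long (*demo_work_fn)(void *data);

struct demo_pvcpu {
    bool paused;
    long hypercall_ret;   /* delivered as the final hypercall return code */
};

/* Hypercall side: pause the calling vcpu and hand the work off. */
static void demo_choc_start(struct demo_pvcpu *v)
{
    v->paused = true;             /* vcpu_pause_nosync() in Xen */
}

/* Worker side (tasklet context today, idle-vcpu context in the proposal):
 * run the continuation, record its result, and unpause the vcpu. */
static void demo_choc_complete(struct demo_pvcpu *v, demo_work_fn fn,
                               void *data)
{
    v->hypercall_ret = fn(data);  /* handler supplies the return code */
    v->paused = false;            /* vcpu_unpause() in Xen */
}

/* Example continuation: negate its argument. */
static long demo_negate(void *data)
{
    return -*(long *)data;
}
```

The point of the sketch is that the guest vcpu stays paused while the work
runs elsewhere, so the workflow never has to leave the hypervisor.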
Keir Fraser
2010-Apr-19 10:50 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 16/04/2010 18:51, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

>> We have a stress S3 test on Xen 4.1 unstable. Unfortunately, cset 21181
>> cannot pass the stress test: it hangs after 48 S3 suspend/resume cycles.
>> Another test round even showed a hang after 2 suspend/resume cycles. But it
>> is too early to say cset 21181/21180 is the culprit, because even cset
>> 21172 (the cset ahead of the continue_hypercall_on_cpu improvement patch)
>> cannot pass the S3 test either, although it survived more suspend/resume
>> cycles than cset 21181. So there must be something wrong with the S3 logic
>> since the Xen 4.0 release. We are still trying to dig it out...
>
> 21172 was not a good changeset to pick. I suggest trying xen-unstable tip
> with just 21181/21180 reverted.

Actually, try xen-unstable staging tip (c/s 21202). I've just reimplemented
tasklets, so things are quite different at latest tip.

 Thanks,
 Keir
Yu, Ke
2010-Apr-19 12:04 UTC
RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Monday, April 19, 2010 6:51 PM
> To: Yu, Ke; Jiang, Yunhong
> Cc: xen-devel@lists.xensource.com
> Subject: Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using
> tasklets
>
> On 16/04/2010 18:51, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>
>>> We have a stress S3 test on Xen 4.1 unstable. Unfortunately, cset 21181
>>> cannot pass the stress test: it hangs after 48 S3 suspend/resume cycles.
>>> Another test round even showed a hang after 2 suspend/resume cycles. But
>>> it is too early to say cset 21181/21180 is the culprit, because even cset
>>> 21172 (the cset ahead of the continue_hypercall_on_cpu improvement patch)
>>> cannot pass the S3 test either, although it survived more suspend/resume
>>> cycles than cset 21181. So there must be something wrong with the S3 logic
>>> since the Xen 4.0 release. We are still trying to dig it out...
>>
>> 21172 was not a good changeset to pick. I suggest trying xen-unstable tip
>> with just 21181/21180 reverted.
>
> Actually, try xen-unstable staging tip (c/s 21202). I've just reimplemented
> tasklets, so things are quite different at latest tip.
>
> Thanks,
> Keir

Yeah, moving this stuff to idle-vcpu context is cleaner. I will try it.

Best Regards
Ke
Wei, Gang
2010-Apr-20 05:35 UTC
RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On Monday, 2010-4-19 6:51 PM, Keir Fraser wrote:

> On 16/04/2010 18:51, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>
>>> We have a stress S3 test on Xen 4.1 unstable. Unfortunately, cset 21181
>>> cannot pass the stress test: it hangs after 48 S3 suspend/resume cycles.
>>> Another test round even showed a hang after 2 suspend/resume cycles. But
>>> it is too early to say cset 21181/21180 is the culprit, because even cset
>>> 21172 (the cset ahead of the continue_hypercall_on_cpu improvement patch)
>>> cannot pass the S3 test either, although it survived more suspend/resume
>>> cycles than cset 21181. So there must be something wrong with the S3 logic
>>> since the Xen 4.0 release. We are still trying to dig it out...
>>
>> 21172 was not a good changeset to pick. I suggest trying xen-unstable tip
>> with just 21181/21180 reverted.
>
> Actually, try xen-unstable staging tip (c/s 21202). I've just reimplemented
> tasklets, so things are quite different at latest tip.

Just tried c/s 21202: it hung during the 285th S3 resume on a system with
only legacy HPET broadcast. It hangs even more easily on an HPET MSI
broadcast capable system -- fewer than 100 cycles are needed.

Nothing is wrong in the serial log, and ctrl-a did not respond. I will try to
find a debugging system with ITP to figure out the hang point. Meanwhile, I
will run a cpu online/offline stress test to see whether this is a general
problem with cpu online/offline or is S3 specific.

Jimmy
Keir Fraser
2010-Apr-20 12:51 UTC
Re: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On 20/04/2010 06:35, "Wei, Gang" <gang.wei@intel.com> wrote:

>>> 21172 was not a good changeset to pick. I suggest trying xen-unstable tip
>>> with just 21181/21180 reverted.
>>
>> Actually, try xen-unstable staging tip (c/s 21202). I've just reimplemented
>> tasklets, so things are quite different at latest tip.
>
> Just tried c/s 21202: it hung during the 285th S3 resume on a system with
> only legacy HPET broadcast. It hangs even more easily on an HPET MSI
> broadcast capable system -- fewer than 100 cycles are needed.
>
> Nothing is wrong in the serial log, and ctrl-a did not respond. I will try
> to find a debugging system with ITP to figure out the hang point. Meanwhile,
> I will run a cpu online/offline stress test to see whether this is a general
> problem with cpu online/offline or is S3 specific.

How reliable is S3 in 4.0, for comparison? Did you get it rock solid?

Also, try c/s 21204 before getting into heavy debugging. It fixes being able
to access VMCS fields from c_h_o_c() context. Worth a try.

 -- Keir
Wei, Gang
2010-Apr-20 14:10 UTC
RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On Tuesday, 2010-4-20 8:52 PM, Keir Fraser wrote:

> How reliable is S3 in 4.0, for comparison? Did you get it rock solid?

S3 in 4.0 only just passed several 100-cycle S3 stress tests. It is not that
rock solid, so I am not sure yet whether this is a regression or not.

> Also, try c/s 21204 before getting into heavy debugging. It fixes being
> able to access VMCS fields from c_h_o_c() context. Worth a try.

I will give 21204 a try.

Jimmy
Wei, Gang
2010-Apr-21 06:47 UTC
RE: [Xen-devel] [Patch] continue_hypercall_on_cpu rework using tasklets
On Tuesday, 2010-4-20 8:52 PM, Keir Fraser wrote:

> Also, try c/s 21204 before getting into heavy debugging. It fixes being
> able to access VMCS fields from c_h_o_c() context. Worth a try.

Tried c/s 21208; it still cannot pass the 100-cycle S3 test. It hangs after
30-60 cycles.

Jimmy