Juergen Gross
2011-Mar-14  14:39 UTC
[Xen-devel] [PATCH] Avoid endless loop for vcpu migration
On multi-thread multi-core systems an endless loop can occur in vcpu_migrate() with credit scheduler. Avoid this loop by changing the interface of pick_cpu to indicate a repeated call in this case.

Signed-off-by: juergen.gross@ts.fujitsu.com

6 files changed, 11 insertions(+), 15 deletions(-)
 xen/common/sched_arinc653.c |  2 +-
 xen/common/sched_credit.c   | 12 +++---------
 xen/common/sched_credit2.c  |  2 +-
 xen/common/sched_sedf.c     |  2 +-
 xen/common/schedule.c       |  6 ++++--
 xen/include/xen/sched-if.h  |  2 +-

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jan Beulich
2011-Mar-14  15:03 UTC
Re: [Xen-devel] [PATCH] Avoid endless loop for vcpu migration
>>> On 14.03.11 at 15:39, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:
> On multi-thread multi-core systems an endless loop can occur in vcpu_migrate()
> with credit scheduler. Avoid this loop by changing the interface of pick_cpu
> to indicate a repeated call in this case.

But you're not changing in any way the loop that doesn't get exited - did you perhaps read my original description as the pick function itself looping (which - afaict - it doesn't)?

Further, the change still isn't consistent with idle_bias - the updating ought to happen on the last iteration (if you need to call the function more than once), not the first one, which creates a chicken-and-egg problem for you as you will know it's the last one only when it returned.

Jan
Keir Fraser
2011-Mar-14  15:06 UTC
Re: [Xen-devel] [PATCH] Avoid endless loop for vcpu migration
I'm not versed enough in this aspect of the scheduler to ack or commit this. It just looks a bit ugly and confusing to me. Needs an Ack from George therefore.

 -- Keir

On 14/03/2011 14:39, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
> On multi-thread multi-core systems an endless loop can occur in vcpu_migrate()
> with credit scheduler. Avoid this loop by changing the interface of pick_cpu
> to indicate a repeated call in this case.
>
> Signed-off-by: juergen.gross@ts.fujitsu.com
>
> 6 files changed, 11 insertions(+), 15 deletions(-)
>  xen/common/sched_arinc653.c |  2 +-
>  xen/common/sched_credit.c   | 12 +++---------
>  xen/common/sched_credit2.c  |  2 +-
>  xen/common/sched_sedf.c     |  2 +-
>  xen/common/schedule.c       |  6 ++++--
>  xen/include/xen/sched-if.h  |  2 +-
Tim Deegan
2011-Mar-14  15:06 UTC
Re: [Xen-devel] [PATCH] Avoid endless loop for vcpu migration
At 15:03 +0000 on 14 Mar (1300115028), Jan Beulich wrote:
> >>> On 14.03.11 at 15:39, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:
> > On multi-thread multi-core systems an endless loop can occur in vcpu_migrate()
> > with credit scheduler. Avoid this loop by changing the interface of pick_cpu
> > to indicate a repeated call in this case.
>
> But you're not changing in any way the loop that doesn't get
> exited - did you perhaps read my original description as the
> pick function itself looping (which - afaict - it doesn't)?
>
> Further, the change still isn't consistent with idle_bias - the
> updating ought to happen on the last iteration (if you need
> to call the function more than once), not the first one, which
> creates a chicken-and-egg problem for you as you will know
> it's the last one only when it returned.

Perhaps you could submit a comment patch that describes exactly what idle_bias is and how it's supposed to work. At the moment it's entirely uncommented.

Tim.

--
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd. (Company #02937203, SL9 0BG)
Juergen Gross
2011-Mar-15  05:50 UTC
Re: [Xen-devel] [PATCH] Avoid endless loop for vcpu migration
On 03/14/11 16:03, Jan Beulich wrote:
>>>> On 14.03.11 at 15:39, Juergen Gross<juergen.gross@ts.fujitsu.com> wrote:
>> On multi-thread multi-core systems an endless loop can occur in vcpu_migrate()
>> with credit scheduler. Avoid this loop by changing the interface of pick_cpu
>> to indicate a repeated call in this case.
>
> But you're not changing in any way the loop that doesn't get
> exited - did you perhaps read my original description as the
> pick function itself looping (which - afaict - it doesn't)?

I'm changing the way the pick_cpu function is reacting on multiple calls in a loop. If I've understood the idle_bias correctly, updating it in each loop iteration did result in returning another cpu for each call. By updating idle_bias only once, it should return the same cpu in subsequent calls. This should exit the loop in vcpu_migrate.

> Further, the change still isn't consistent with idle_bias - the
> updating ought to happen on the last iteration (if you need
> to call the function more than once), not the first one, which
> creates a chicken-and-egg problem for you as you will know
> it's the last one only when it returned.

Is it really so important idle_bias is reflecting the last cpu selected? I was under the impression it should be okay when this is true in most cases. With my patch idle_bias might be "wrong" if there is a race with other cpus forcing a selection of a different cpu in the second iteration of the loop in vcpu_migrate. Is this really critical? I doubt it.

Juergen

--
Juergen Gross                 Principal Developer Operating Systems
TSP ES&S SWE OS6                       Telephone: +49 (0) 89 3222 2967
Fujitsu Technology Solutions              e-mail: juergen.gross@ts.fujitsu.com
Domagkstr. 28                           Internet: ts.fujitsu.com
D-80807 Muenchen                 Company details: ts.fujitsu.com/imprint.html
Jan Beulich
2011-Mar-15  07:57 UTC
Re: [Xen-devel] [PATCH] Avoid endless loop for vcpu migration
>>> On 15.03.11 at 06:50, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:
> On 03/14/11 16:03, Jan Beulich wrote:
>>>>> On 14.03.11 at 15:39, Juergen Gross<juergen.gross@ts.fujitsu.com> wrote:
>>> On multi-thread multi-core systems an endless loop can occur in vcpu_migrate()
>>> with credit scheduler. Avoid this loop by changing the interface of pick_cpu
>>> to indicate a repeated call in this case.
>>
>> But you're not changing in any way the loop that doesn't get
>> exited - did you perhaps read my original description as the
>> pick function itself looping (which - afaict - it doesn't)?
>
> I'm changing the way the pick_cpu function is reacting on multiple calls in
> a loop. If I've understood the idle_bias correctly, updating it in each
> loop iteration did result in returning another cpu for each call.
> By updating idle_bias only once, it should return the same cpu in subsequent
> calls. This should exit the loop in vcpu_migrate.

You're only decreasing the likelihood of a live lock, as the return value of pick_cpu not only depends on idle_bias.

>> Further, the change still isn't consistent with idle_bias - the
>> updating ought to happen on the last iteration (if you need
>> to call the function more than once), not the first one, which
>> creates a chicken-and-egg problem for you as you will know
>> it's the last one only when it returned.
>
> Is it really so important idle_bias is reflecting the last cpu selected?
> I was under the impression it should be okay when this is true in most
> cases. With my patch idle_bias might be "wrong" if there is a race with
> other cpus forcing a selection of a different cpu in the second iteration
> of the loop in vcpu_migrate. Is this really critical? I doubt it.

It's not critical, and not affecting correctness. But with updating idle_bias on the first invocation you're (on the right hardware) basically guaranteeing the second invocation to return a different CPU. That way, your loop will be run minimally three times on such systems. I already find it odd to require two iterations when previously this was a straight code path.

If there's really no way around the iterative approach, one possibility might be to not take into consideration idle_bias on non-initial invocations at all.

Jan
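The exchange above can be made concrete with a toy model in C. This is only a sketch under stated assumptions, not Xen code: it assumes two sibling CPUs (0 and 1), a single global `idle_bias`, and a hypothetical `toy_pick_cpu()` that steers away from the CPU recorded in `idle_bias` and optionally records its own choice. The caller, `toy_migrate()`, picks once and then picks again to confirm, which is a simplification of the vcpu_migrate loop under discussion. The real pick_cpu depends on more than idle_bias, as noted in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the scheduler's idle_bias; not the Xen data layout. */
static int idle_bias;

/*
 * Hypothetical pick function over two sibling CPUs (0 and 1).
 * It prefers the sibling that is NOT recorded in idle_bias and, when
 * asked to commit, records its own choice as the new bias.
 */
static int toy_pick_cpu(bool commit)
{
    int cpu = idle_bias ^ 1;

    if (commit)
        idle_bias = cpu;
    return cpu;
}

/*
 * Toy caller: pick a CPU, then pick again to confirm the choice.
 * commit_only_once == false: every call updates idle_bias.
 * commit_only_once == true:  only the very first call updates it.
 * Returns the number of loop iterations until both picks agree,
 * or -1 if they never agree within max_iter iterations.
 */
static int toy_migrate(bool commit_only_once, int max_iter)
{
    bool committed = false;

    idle_bias = 0;
    for (int i = 1; i <= max_iter; i++) {
        int first = toy_pick_cpu(!(commit_only_once && committed));
        committed = true;
        int second = toy_pick_cpu(!commit_only_once);

        if (first == second)
            return i;
    }
    return -1;
}
```

In this model, updating the bias on every call makes the two picks disagree forever, while updating it only on the first call lets the loop exit, but only after the first round has been wasted because the committing call changed what the confirming call returns.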
Juergen Gross
2011-Mar-15  08:46 UTC
Re: [Xen-devel] [PATCH] Avoid endless loop for vcpu migration
On 03/15/11 08:57, Jan Beulich wrote:
>>>> On 15.03.11 at 06:50, Juergen Gross<juergen.gross@ts.fujitsu.com> wrote:
>> On 03/14/11 16:03, Jan Beulich wrote:
>>>>>> On 14.03.11 at 15:39, Juergen Gross<juergen.gross@ts.fujitsu.com> wrote:
>>>> On multi-thread multi-core systems an endless loop can occur in vcpu_migrate()
>>>> with credit scheduler. Avoid this loop by changing the interface of pick_cpu
>>>> to indicate a repeated call in this case.
>>>
>>> But you're not changing in any way the loop that doesn't get
>>> exited - did you perhaps read my original description as the
>>> pick function itself looping (which - afaict - it doesn't)?
>>
>> I'm changing the way the pick_cpu function is reacting on multiple calls in
>> a loop. If I've understood the idle_bias correctly, updating it in each
>> loop iteration did result in returning another cpu for each call.
>> By updating idle_bias only once, it should return the same cpu in subsequent
>> calls. This should exit the loop in vcpu_migrate.
>
> You're only decreasing the likelihood of a live lock, as the return
> value of pick_cpu not only depends on idle_bias.

Hmm, then another solution would be to let pick_cpu really return the proposed cpu from the first iteration, if it doesn't contradict the allowed settings. It could be sub-optimal, but I don't think this is critical, as vcpu_migrate is called rarely.

Patch attached.

>
>>> Further, the change still isn't consistent with idle_bias - the
>>> updating ought to happen on the last iteration (if you need
>>> to call the function more than once), not the first one, which
>>> creates a chicken-and-egg problem for you as you will know
>>> it's the last one only when it returned.
>>
>> Is it really so important idle_bias is reflecting the last cpu selected?
>> I was under the impression it should be okay when this is true in most
>> cases. With my patch idle_bias might be "wrong" if there is a race with
>> other cpus forcing a selection of a different cpu in the second iteration
>> of the loop in vcpu_migrate. Is this really critical? I doubt it.
>
> It's not critical, and not affecting correctness. But with updating
> idle_bias on the first invocation you're (on the right hardware)
> basically guaranteeing the second invocation to return a
> different CPU. That way, your loop will be run minimally three
> times on such systems. I already find it odd to require two
> iterations when previously this was a straight code path.

This was wrong. It was always required to hold the schedule lock of the picked cpu as well, otherwise a race with cpu hotplug would be possible.

>
> If there's really no way around the iterative approach, one
> possibility might be to not take into consideration idle_bias
> on non-initial invocations at all.

This would be a side effect of my suggestion.

Juergen
Keir Fraser
2011-Mar-15  08:50 UTC
Re: [Xen-devel] [PATCH] Avoid endless loop for vcpu migration
On 15/03/2011 08:46, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>> It's not critical, and not affecting correctness. But with updating
>> idle_bias on the first invocation you're (on the right hardware)
>> basically guaranteeing the second invocation to return a
>> different CPU. That way, your loop will be run minimally three
>> times on such systems. I already find it odd to require two
>> iterations when previously this was a straight code path.
>
> This was wrong. It was always required to hold the schedule lock of the
> picked cpu as well, otherwise a race with cpu hotplug would be possible.

What would that race be? CPU offlining is done in stop_machine context.

 -- Keir
Juergen Gross
2011-Mar-15  08:53 UTC
Re: [Xen-devel] [PATCH] Avoid endless loop for vcpu migration
On 03/15/11 09:50, Keir Fraser wrote:
> On 15/03/2011 08:46, "Juergen Gross"<juergen.gross@ts.fujitsu.com> wrote:
>
>>> It's not critical, and not affecting correctness. But with updating
>>> idle_bias on the first invocation you're (on the right hardware)
>>> basically guaranteeing the second invocation to return a
>>> different CPU. That way, your loop will be run minimally three
>>> times on such systems. I already find it odd to require two
>>> iterations when previously this was a straight code path.
>>
>> This was wrong. It was always required to hold the schedule lock of the
>> picked cpu as well, otherwise a race with cpu hotplug would be possible.
>
> What would that race be? CPU offlining is done in stop_machine context.

Ahh, okay.

Juergen
Jan Beulich
2011-Mar-15  09:01 UTC
Re: [Xen-devel] [PATCH] Avoid endless loop for vcpu migration
>>> On 15.03.11 at 09:46, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:
> On 03/15/11 08:57, Jan Beulich wrote:
>>>>> On 15.03.11 at 06:50, Juergen Gross<juergen.gross@ts.fujitsu.com> wrote:
>>> On 03/14/11 16:03, Jan Beulich wrote:
>>>>>>> On 14.03.11 at 15:39, Juergen Gross<juergen.gross@ts.fujitsu.com> wrote:
>>>>> On multi-thread multi-core systems an endless loop can occur in vcpu_migrate()
>>>>> with credit scheduler. Avoid this loop by changing the interface of pick_cpu
>>>>> to indicate a repeated call in this case.
>>>>
>>>> But you're not changing in any way the loop that doesn't get
>>>> exited - did you perhaps read my original description as the
>>>> pick function itself looping (which - afaict - it doesn't)?
>>>
>>> I'm changing the way the pick_cpu function is reacting on multiple calls in
>>> a loop. If I've understood the idle_bias correctly, updating it in each
>>> loop iteration did result in returning another cpu for each call.
>>> By updating idle_bias only once, it should return the same cpu in subsequent
>>> calls. This should exit the loop in vcpu_migrate.
>>
>> You're only decreasing the likelihood of a live lock, as the return
>> value of pick_cpu not only depends on idle_bias.
>
> Hmm, then another solution would be to let pick_cpu really return the
> proposed cpu from the first iteration, if it doesn't contradict the
> allowed settings. It could be sub-optimal, but I don't think this is
> critical, as vcpu_migrate is called rarely.
>
> Patch attached.

That candidate-is-valid check seems absolutely independent of the particular scheduler used, and hence could be done in the (sole) caller, thus not requiring any change to the scheduler interface.

Which at once would eliminate unnecessary calls into pick_cpu (i.e. you'd call it a second time only if the previously selected CPU really is no longer valid to be used for that vCPU).

Jan
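A minimal sketch of the caller-side check suggested above, written as a self-contained toy in C rather than as a patch against schedule.c. Everything here is an assumption for illustration: `toy_cpumask_t` is a plain 64-bit mask instead of Xen's cpumask type, `struct toy_vcpu` carries only the two masks the check needs, and `toy_pick_cpu()` is a deliberately unhelpful placeholder scheduler that returns a different allowed CPU on every call. Locking is omitted; each loop iteration stands for one attempt made with the relevant locks held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy CPU mask: bit n set means CPU n is a member. Not Xen's cpumask_t. */
typedef uint64_t toy_cpumask_t;

struct toy_vcpu {
    toy_cpumask_t cpu_affinity;  /* CPUs this vCPU may run on */
    toy_cpumask_t pool_valid;    /* CPUs belonging to the vCPU's cpupool */
};

static int toy_last_pick = -1;   /* state of the placeholder scheduler */
static int toy_pick_calls = 0;   /* how often the scheduler was consulted */

static bool toy_cpu_isset(int cpu, toy_cpumask_t mask)
{
    return cpu >= 0 && cpu < 64 && ((mask >> cpu) & 1u) != 0;
}

/* Scheduler-independent check: is the earlier choice still allowed? */
static bool candidate_still_valid(const struct toy_vcpu *v, int cpu)
{
    return toy_cpu_isset(cpu, v->cpu_affinity) &&
           toy_cpu_isset(cpu, v->pool_valid);
}

/*
 * Placeholder for the scheduler's pick function (hypothetical): on every
 * call it returns the next allowed CPU after its previous answer, or -1
 * if no CPU is allowed at all.
 */
static int toy_pick_cpu(const struct toy_vcpu *v)
{
    toy_cpumask_t allowed = v->cpu_affinity & v->pool_valid;

    toy_pick_calls++;
    for (int i = 1; i <= 64; i++) {
        int cpu = (toy_last_pick + i) % 64;

        if (toy_cpu_isset(cpu, allowed)) {
            toy_last_pick = cpu;
            return cpu;
        }
    }
    return -1;
}

/*
 * Toy caller loop: the scheduler is asked again only when the previously
 * selected CPU is no longer valid for this vCPU.
 */
static int toy_migrate(const struct toy_vcpu *v, int max_iter)
{
    int new_cpu = -1;   /* no candidate yet */

    for (int i = 0; i < max_iter; i++) {
        if (candidate_still_valid(v, new_cpu))
            return new_cpu;          /* keep the earlier choice */
        new_cpu = toy_pick_cpu(v);   /* choose (again) and retry */
        if (new_cpu < 0)
            return -1;
    }
    return -1;
}
```

With stable masks the loop ends on its second pass after a single call into the pick function, even though this placeholder pick would never return the same CPU twice in a row. A loop that instead compared two consecutive picks would not terminate with such a scheduler.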
Juergen Gross
2011-Mar-15  09:21 UTC
Re: [Xen-devel] [PATCH] Avoid endless loop for vcpu migration
On 03/15/11 10:01, Jan Beulich wrote:
>>>> On 15.03.11 at 09:46, Juergen Gross<juergen.gross@ts.fujitsu.com> wrote:
>> On 03/15/11 08:57, Jan Beulich wrote:
>>>>>> On 15.03.11 at 06:50, Juergen Gross<juergen.gross@ts.fujitsu.com> wrote:
>>>> On 03/14/11 16:03, Jan Beulich wrote:
>>>>>>>> On 14.03.11 at 15:39, Juergen Gross<juergen.gross@ts.fujitsu.com> wrote:
>>>>>> On multi-thread multi-core systems an endless loop can occur in vcpu_migrate()
>>>>>> with credit scheduler. Avoid this loop by changing the interface of pick_cpu
>>>>>> to indicate a repeated call in this case.
>>>>>
>>>>> But you're not changing in any way the loop that doesn't get
>>>>> exited - did you perhaps read my original description as the
>>>>> pick function itself looping (which - afaict - it doesn't)?
>>>>
>>>> I'm changing the way the pick_cpu function is reacting on multiple calls in
>>>> a loop. If I've understood the idle_bias correctly, updating it in each
>>>> loop iteration did result in returning another cpu for each call.
>>>> By updating idle_bias only once, it should return the same cpu in subsequent
>>>> calls. This should exit the loop in vcpu_migrate.
>>>
>>> You're only decreasing the likelihood of a live lock, as the return
>>> value of pick_cpu not only depends on idle_bias.
>>
>> Hmm, then another solution would be to let pick_cpu really return the
>> proposed cpu from the first iteration, if it doesn't contradict the
>> allowed settings. It could be sub-optimal, but I don't think this is
>> critical, as vcpu_migrate is called rarely.
>>
>> Patch attached.
>
> That candidate-is-valid check seems absolutely independent of the
> particular scheduler used, and hence could be done in the (sole)
> caller, thus not requiring any change to the scheduler interface.
>
> Which at once would eliminate unnecessary calls into pick_cpu (i.e.
> you'd call it a second time only if the previously selected CPU really
> is no longer valid to be used for that vCPU).

True.

The patch seems to become smaller :-)

Juergen
Jan Beulich
2011-Mar-15  09:34 UTC
Re: [Xen-devel] [PATCH] Avoid endless loop for vcpu migration
>>> On 15.03.11 at 10:21, Juergen Gross <juergen.gross@ts.fujitsu.com> wrote:
> On 03/15/11 10:01, Jan Beulich wrote:
>>>>> On 15.03.11 at 09:46, Juergen Gross<juergen.gross@ts.fujitsu.com> wrote:
>>> On 03/15/11 08:57, Jan Beulich wrote:
>>>>>>> On 15.03.11 at 06:50, Juergen Gross<juergen.gross@ts.fujitsu.com> wrote:
>>>>> On 03/14/11 16:03, Jan Beulich wrote:
>>>>>>>>> On 14.03.11 at 15:39, Juergen Gross<juergen.gross@ts.fujitsu.com> wrote:
>>>>>>> On multi-thread multi-core systems an endless loop can occur in vcpu_migrate()
>>>>>>> with credit scheduler. Avoid this loop by changing the interface of pick_cpu
>>>>>>> to indicate a repeated call in this case.
>>>>>>
>>>>>> But you're not changing in any way the loop that doesn't get
>>>>>> exited - did you perhaps read my original description as the
>>>>>> pick function itself looping (which - afaict - it doesn't)?
>>>>>
>>>>> I'm changing the way the pick_cpu function is reacting on multiple calls in
>>>>> a loop. If I've understood the idle_bias correctly, updating it in each
>>>>> loop iteration did result in returning another cpu for each call.
>>>>> By updating idle_bias only once, it should return the same cpu in subsequent
>>>>> calls. This should exit the loop in vcpu_migrate.
>>>>
>>>> You're only decreasing the likelihood of a live lock, as the return
>>>> value of pick_cpu not only depends on idle_bias.
>>>
>>> Hmm, then another solution would be to let pick_cpu really return the
>>> proposed cpu from the first iteration, if it doesn't contradict the
>>> allowed settings. It could be sub-optimal, but I don't think this is
>>> critical, as vcpu_migrate is called rarely.
>>>
>>> Patch attached.
>>
>> That candidate-is-valid check seems absolutely independent of the
>> particular scheduler used, and hence could be done in the (sole)
>> caller, thus not requiring any change to the scheduler interface.
>>
>> Which at once would eliminate unnecessary calls into pick_cpu (i.e.
>> you'd call it a second time only if the previously selected CPU really
>> is no longer valid to be used for that vCPU).
>
> True.
>
> The patch seems to become smaller :-)

This looks good to me now, and it makes quite obvious that there is a likely exit path from the loop (it can only live lock now if v->cpu_affinity and/or v->domain->cpupool->cpu_valid are constantly changing, which could only be due to a misbehaving administrator).

Jan
Keir Fraser
2011-Mar-15  09:58 UTC
Re: [Xen-devel] [PATCH] Avoid endless loop for vcpu migration
On 15/03/2011 09:21, "Juergen Gross" <juergen.gross@ts.fujitsu.com> wrote:
>> That candidate-is-valid check seems absolutely independent of the
>> particular scheduler used, and hence could be done in the (sole)
>> caller, thus not requiring any change to the scheduler interface.
>>
>> Which at once would eliminate unnecessary calls into pick_cpu (i.e.
>> you'd call it a second time only if the previously selected CPU really
>> is no longer valid to be used for that vCPU).
>
> True.
>
> The patch seems to become smaller :-)

By the way, why is the cpu_isset(new_cpu, v->domain->cpupool->cpu_valid) check required (after calling pick_cpu, in the currently checked-in code)? You already check that pick_cpu was called holding the correct pair of locks, if it has returned a cpu that is not in the pool's cpu_valid mask, what would make pick_cpu return anything different on the next invocation thus avoiding an endless loop?

Looks like this question would remain even if this new patch was applied.

 -- Keir
Juergen Gross
2011-Mar-15  10:29 UTC
Re: [Xen-devel] [PATCH] Avoid endless loop for vcpu migration
On 03/15/11 10:58, Keir Fraser wrote:
> On 15/03/2011 09:21, "Juergen Gross"<juergen.gross@ts.fujitsu.com> wrote:
>
>>> That candidate-is-valid check seems absolutely independent of the
>>> particular scheduler used, and hence could be done in the (sole)
>>> caller, thus not requiring any change to the scheduler interface.
>>>
>>> Which at once would eliminate unnecessary calls into pick_cpu (i.e.
>>> you'd call it a second time only if the previously selected CPU really
>>> is no longer valid to be used for that vCPU).
>>
>> True.
>>
>> The patch seems to become smaller :-)
>
> By the way, why is the cpu_isset(new_cpu, v->domain->cpupool->cpu_valid)
> check required (after calling pick_cpu, in the currently checked-in code)?

Good question. It shouldn't be required, as pick_cpu should check this and should return only cpus in the current cpupool. With the latest patches from Jan this seems to be true. :-)

I think I'll send a separate patch to remove the check.

> You already check that pick_cpu was called holding the correct pair of
> locks, if it has returned a cpu that is not in the pool's cpu_valid mask,
> what would make pick_cpu return anything different on the next invocation
> thus avoiding an endless loop?

Nothing. If pick_cpu is returning a cpu outside of its cpupool, the loop could be infinite. Inserting a BUG_ON would be a good idea, but this would require a bit more logic, as a cpu might be removed from the cpupool during a running vcpu_migrate (this case is handled by the call of cpu_disable_scheduler() during removing a cpu from a cpupool).

Juergen
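One way to read the "bit more logic" a BUG_ON would need is sketched below in C. This is an interpretation, not something proposed in the thread: it assumes the caller keeps a snapshot of the cpupool mask as it was when pick_cpu ran, so that a pick outside that snapshot (a scheduler bug, where retrying cannot help) can be told apart from a CPU that left the pool afterwards (a legitimate reason to pick again). The type `toy_cpumask_t`, the enum, and `classify_pick()` are all hypothetical names; a real implementation would use Xen's cpumask helpers and BUG_ON.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy CPU mask: bit n set means CPU n is a member. Not Xen's cpumask_t. */
typedef uint64_t toy_cpumask_t;

enum pick_verdict {
    PICK_OK,     /* picked CPU is in the pool: use it */
    PICK_RETRY,  /* CPU left the pool after the pick: pick again */
    PICK_BUG     /* pick ignored the pool it was given: would be a BUG_ON */
};

static bool toy_cpu_isset(int cpu, toy_cpumask_t mask)
{
    return cpu >= 0 && cpu < 64 && ((mask >> cpu) & 1u) != 0;
}

/*
 * Classify the result of a pick.
 * pool_at_pick: pool membership as seen when the pick was made (assumed
 *               to be snapshotted by the caller).
 * pool_now:     pool membership at the time of the check.
 */
static enum pick_verdict classify_pick(int cpu,
                                       toy_cpumask_t pool_at_pick,
                                       toy_cpumask_t pool_now)
{
    if (!toy_cpu_isset(cpu, pool_at_pick))
        return PICK_BUG;
    if (!toy_cpu_isset(cpu, pool_now))
        return PICK_RETRY;
    return PICK_OK;
}
```

Only the PICK_BUG case corresponds to the endless loop discussed above, since a pick function that ignores the pool will keep doing so on every retry.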