Tian, Kevin
2005-Jul-13  09:54 UTC
[Xen-devel] [Patch] Fix IDLE issue with sedf scheduler on IA64
Hi Dan,

This patch fixes strange behavior on IA64, where IDLE is scheduled more than Dom0 with the default sEDF scheduler.

The key point is reprogram_ac_timer at the end of the ac_timer dispatcher, which on x86 programs the local APIC timer with the expiry of the next ac_timer. The higher-precision lapic timer can trigger ac_timer more precisely than simply doing so in the PIT interrupt handler. That works perfectly on x86 because the PIT is already dedicated to updating system time, wall time, etc.

However, on IA64 there's only one timer source (the local SAPIC timer), which has to cover both updating time-related statistics and triggering the ac_timer softirq. In this case the delta between adjacent timer interrupts can't be changed dynamically, or else system time (especially NOW()) becomes inaccurate. So reprogram_ac_timer should always return 1 on IA64.

To make the ac_timer softirq trigger more precisely, another parameter tuned in the patch is HZ, changed from 100 (10ms) to 1024 (~1ms). The cpu_time of IDLE then decreases to about 0.09% from the previous 1%.

Signed-off-by: Kevin Tian <Kevin.Tian@intel.com>

Thanks,
Kevin
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
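As a rough illustration of the two changes described in this message, the sketch below computes the tick period for a given HZ and shows a reprogram hook that never touches the hardware. The names tick_period_ns and reprogram_timer_fixed_tick and the s_time_t typedef are assumptions made for the example; this is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for this sketch: time values are signed nanoseconds. */
typedef int64_t s_time_t;

/* Length of one periodic tick, in nanoseconds, for a given HZ. */
s_time_t tick_period_ns(unsigned int hz)
{
    assert(hz > 0);
    return (s_time_t)(1000000000LL / hz);
}

/*
 * Hypothetical hook for a platform with a single timer source: the tick
 * period stays fixed so that system time remains accurate, and the
 * requested timeout is ignored. It always returns 1, as the message
 * says reprogram_ac_timer should on IA64.
 */
int reprogram_timer_fixed_tick(s_time_t timeout)
{
    (void)timeout;
    return 1;
}
```

With a fixed tick, an expired timer is noticed at the next tick at the latest, so the worst-case delay falls from 10,000,000 ns at HZ=100 to 976,562 ns at HZ=1024.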
Magenheimer, Dan (HP Labs Fort Collins)
2005-Jul-13  12:24 UTC
[Xen-devel] RE: [Patch] Fix IDLE issue with sedf scheduler on IA64
> To make ac_timer softirq triggered more precisely, another
> parameter tuned in the patch is the HZ, changed from 100(10ms) to
> 1024(1ms). Then the cpu_time of IDLE is decreased to about 0.09% from
> previous 1%.

Hmmm... Are there other side effects from the HZ change?

This patch costs an extra ~1000 interrupts/second. Since each of these interrupts needs to be handled in C code, the save/restore overhead is a large fraction of the cost of a domain switch, thus increasing the total Xen overhead (on _every_ application, even one which is CPU bound) substantially. Well, maybe not _substantially_ but probably on the order of 0.1%.

Is there a better way (for ia64)? I kind of like the solution Keir and Ian imply... is it possible in context_switch to simply "refuse" to switch to the idle domain? E.g. if the idle domain is the target of the switch, instead switch to domain0 (and make it runnable)?

Dan

P.S. If this can be done in ia64-specific code, let's switch this thread to xen-ia64-devel. If not, we can later move the discussion back to xen-devel.
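The overhead estimate above can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function names are made up for this example, and the per-interrupt cost of 1000 ns is an assumed figure, not a measurement from the thread.

```c
#include <assert.h>

/* Extra timer interrupts per second caused by raising HZ. */
unsigned int extra_interrupts_per_sec(unsigned int old_hz, unsigned int new_hz)
{
    assert(new_hz >= old_hz);
    return new_hz - old_hz;
}

/*
 * Share of each CPU-second spent handling the extra interrupts, in parts
 * per million. cost_ns is the assumed cost of one interrupt in
 * nanoseconds; 1 ppm corresponds to 1000 ns out of one second.
 */
unsigned long overhead_ppm(unsigned int extra_per_sec, unsigned long cost_ns)
{
    return ((unsigned long)extra_per_sec * cost_ns) / 1000UL;
}
```

Going from HZ=100 to HZ=1024 adds 924 interrupts per second. If each one costs about 1000 ns, that is 924 ppm, or roughly 0.09% of the CPU, which is in line with the "order of 0.1%" guess.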
Tian, Kevin
2005-Jul-13  13:34 UTC
RE: [Xen-devel] RE: [Patch] Fix IDLE issue with sedf scheduler on IA64
>From: xen-devel-bounces@lists.xensource.com
>[mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Magenheimer,
>Dan (HP Labs Fort Collins)
>Sent: Wednesday, July 13, 2005 8:25 PM
>To: Tian, Kevin
>
>Is there a better way (for ia64)? I kind of like the solution
>Keir and Ian imply... is it possible in context_switch to simply
>"refuse" to switch to the idle domain? E.g. if the idle domain
>is the target of the switch, instead switch to domain0 (and
>make it runnable)?
>

This does not seem easy to do in context_switch alone, without a change to common code. Preventing a switch to IDLE is easy; a simple check in context_switch can achieve it. However, the really bad thing is the housekeeping info within the scheduler. E.g. domain0 may have been placed on the waitq, with the beginning of its next period still far away. Stealing IDLE's slice for Dom0 without notifying the scheduler may mess up future decisions, since the next schedule will happen in Dom0's context and be based on dom0's statistics...

Thanks,
Kevin
Hollis Blanchard
2005-Jul-13  14:32 UTC
Re: [Xen-devel] RE: [Patch] Fix IDLE issue with sedf scheduler on IA64
On Jul 13, 2005, at 8:34 AM, Tian, Kevin wrote:
>> Magenheimer, Dan (HP Labs Fort Collins) wrote:
>>
>> Is there a better way (for ia64)? I kind of like the solution
>> Keir and Ian imply... is it possible in context_switch to simply
>> "refuse" to switch to the idle domain? E.g. if the idle domain
>> is the target of the switch, instead switch to domain0 (and
>> make it runnable)?
>
> This does not seem easy to do in context_switch alone, without a
> change to common code. Preventing a switch to IDLE is easy; a simple
> check in context_switch can achieve it. However, the really bad thing
> is the housekeeping info within the scheduler. E.g. domain0 may have
> been placed on the waitq, with the beginning of its next period still
> far away. Stealing IDLE's slice for Dom0 without notifying the
> scheduler may mess up future decisions, since the next schedule will
> happen in Dom0's context and be based on dom0's statistics...

See how x86 does this in context_switch() (arch/x86/domain.c). In particular, __context_switch is avoided for the idle domain, so the context restored is some register pops in ret_from_intr/restore_all_xen (arch/x86/x86_32/entry.S).

No scheduler changes or confusion necessary...

--
Hollis Blanchard
IBM Linux Technology Center
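The lazy-switch idea described above can be sketched in a few lines. This is a simplified model under stated assumptions, not the x86 code: struct vcpu, struct cpu_state, full_switch and the counter are all invented for the illustration. The point is that the scheduler's notion of "current" is tracked separately from whose register and mm state is actually loaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal stand-ins for the real structures. */
struct vcpu {
    int  id;
    bool is_idle;
};

struct cpu_state {
    struct vcpu  *current;       /* what the scheduler believes is running */
    struct vcpu  *state_owner;   /* whose register/mm state is loaded */
    unsigned long full_switches; /* how often the expensive path ran */
};

/* Stand-in for the expensive save/restore path. */
void full_switch(struct cpu_state *cpu, struct vcpu *next)
{
    cpu->state_owner = next;
    cpu->full_switches++;
}

/*
 * Lazy switch: the scheduler's bookkeeping is always updated, but the
 * heavy state switch is skipped when entering idle, or when the state
 * of the incoming vcpu is still loaded from last time.
 */
void context_switch(struct cpu_state *cpu, struct vcpu *next)
{
    assert(cpu != NULL && next != NULL);
    cpu->current = next;
    if (next->is_idle || cpu->state_owner == next)
        return;
    full_switch(cpu, next);
}
```

In this model a sequence dom0, idle, dom0 pays for one full switch only, and the scheduler still sees idle as the running entity in between, so its accounting is untouched.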
Magenheimer, Dan (HP Labs Fort Collins)
2005-Jul-13  15:05 UTC
RE: [Xen-devel] RE: [Patch] Fix IDLE issue with sedf scheduler on IA64
> >> Magenheimer, Dan (HP Labs Fort Collins) wrote:
> >>
> >> Is there a better way (for ia64)? I kind of like the solution
> >> Keir and Ian imply... is it possible in context_switch to simply
> >> "refuse" to switch to the idle domain? E.g. if the idle domain
> >> is the target of the switch, instead switch to domain0 (and
> >> make it runnable)?
> >
> > This does not seem easy to do in context_switch alone, without a
> > change to common code. Preventing a switch to IDLE is easy; a simple
> > check in context_switch can achieve it. However, the really bad thing
> > is the housekeeping info within the scheduler. E.g. domain0 may have
> > been placed on the waitq, with the beginning of its next period still
> > far away. Stealing IDLE's slice for Dom0 without notifying the
> > scheduler may mess up future decisions, since the next schedule will
> > happen in Dom0's context and be based on dom0's statistics...
>
> See how x86 does this in context_switch() (arch/x86/domain.c). In
> particular, __context_switch is avoided for the idle domain, so the
> context restored is some register pops in
> ret_from_intr/restore_all_xen (arch/x86/x86_32/entry.S).

Neat, but doesn't this only solve half the problem? Idle is now an "impostor" for the last runnable domain. Generally the machine goes idle because all domains are waiting for a device interrupt. Since (in general) all device interrupts go through domain0, a context switch is still necessary from idle=last_runnable_domain to domain0 to process the device interrupt, then back to domU to process the virtual interrupt.

In an I/O-bound system, interrupt latency still seems to be twice what it could be.

A related idea though for the scheduler experts to think about: Is it possible for idle to be an "alias" for domain0?

Dan
Keir Fraser
2005-Jul-13  15:14 UTC
Re: [Xen-devel] RE: [Patch] Fix IDLE issue with sedf scheduler on IA64
On 13 Jul 2005, at 16:05, Magenheimer, Dan (HP Labs Fort Collins) wrote:
> Neat, but doesn't this only solve half the problem? Idle is now
> an "impostor" for the last runnable domain. Generally the machine
> goes idle because all domains are waiting for a device interrupt.
> Since (in general) all device interrupts go through domain0,
> a context switch is still necessary from idle=last_runnable_domain
> to domain0 to process the device interrupt, then back to domU to
> process the virtual interrupt.
>
> In an I/O-bound system, interrupt latency still seems to be
> twice what it could be.

If you really care about i/o throughput you should probably be prepared to burn a core or a hyperthread on your driver domain (probably dom0), as otherwise the domu<->dom0 context switches are a serious overhead.

Mostly, if context switch times are a bottleneck then you are cpu bound anyway, and there will be no idle time to optimise usage of.

 -- Keir
Tian, Kevin
2005-Jul-14  01:02 UTC
RE: [Xen-devel] RE: [Patch] Fix IDLE issue with sedf scheduler on IA64
>From: Hollis Blanchard [mailto:hollisb@us.ibm.com]
>Sent: Wednesday, July 13, 2005 10:32 PM
>
>On Jul 13, 2005, at 8:34 AM, Tian, Kevin wrote:
>
>> This does not seem easy to do in context_switch alone, without a
>> change to common code. Preventing a switch to IDLE is easy; a simple
>> check in context_switch can achieve it. However, the really bad thing
>> is the housekeeping info within the scheduler. E.g. domain0 may have
>> been placed on the waitq, with the beginning of its next period still
>> far away. Stealing IDLE's slice for Dom0 without notifying the
>> scheduler may mess up future decisions, since the next schedule will
>> happen in Dom0's context and be based on dom0's statistics...
>
>See how x86 does this in context_switch() (arch/x86/domain.c). In
>particular, __context_switch is avoided for the idle domain, so the
>context restored is some register pops in ret_from_intr/restore_all_xen
>(arch/x86/x86_32/entry.S).
>
>No scheduler changes or confusion necessary...
>

What you describe is the optimization, which is the neat way I prefer, just as Ian suggested earlier: use the concept of a lazy context switch instead of eliminating the IDLE domain. But this is not the same point as what Dan is suggesting. Dan is suggesting removing IDLE completely, or making IDLE simply an alias for Dom0. My concern was raised about that direction: it is difficult to prevent scheduling IDLE (I don't mean optimizing the context switch) without a common scheduler change...

Thanks,
Kevin
Ian Pratt
2005-Jul-14  01:23 UTC
RE: [Xen-devel] RE: [Patch] Fix IDLE issue with sedf scheduler on IA64
> Neat, but doesn't this only solve half the problem? Idle is
> now an "impostor" for the last runnable domain. Generally
> the machine goes idle because all domains are waiting for a
> device interrupt.
> Since (in general) all device interrupts go through domain0,
> a context switch is still necessary from
> idle=last_runnable_domain to domain0 to process the device
> interrupt, then back to domU to process the virtual interrupt.
>
> In an I/O-bound system, interrupt latency still seems to be
> twice what it could be.
>
> A related idea though for the scheduler experts to think about:
> Is it possible for idle to be an "alias" for domain0?

If you really want to do something like this, it would be much better just to detect a switch to the idle domain (on whatever CPU dom0 happens to be running on) and load the register and mm state for dom0 and make it appear to be the last domain that ran. The lazy switching logic will then take care of things.

Ian
Tian, Kevin
2005-Jul-14  02:01 UTC
RE: [Xen-devel] RE: [Patch] Fix IDLE issue with sedf scheduler on IA64
>From: Magenheimer, Dan (HP Labs Fort Collins)
>[mailto:dan.magenheimer@hp.com]
>Sent: Wednesday, July 13, 2005 11:05 PM
>
>Neat, but doesn't this only solve half the problem? Idle is now
>an "impostor" for the last runnable domain. Generally the machine
>goes idle because all domains are waiting for a device interrupt.
>Since (in general) all device interrupts go through domain0,
>a context switch is still necessary from idle=last_runnable_domain
>to domain0 to process the device interrupt, then back to domU to
>process the virtual interrupt.
>
>In an I/O-bound system, interrupt latency still seems to be
>twice what it could be.
>
>A related idea though for the scheduler experts to think about:
>Is it possible for idle to be an "alias" for domain0?
>
>Dan

If Dom0 is always doing meaningful work, that's possibly wanted. However, I'm not sure whether this approach performs better in the other case...

Say dom0 is explicitly requesting to block (in its idle loop, due to a bunch of I/O sessions). Always scheduling dom0 (eliminating IDLE) actually makes do_block return immediately to dom0's idle loop, which will issue another do_block to Xen, and such heavy context switching will continue until a new schedule or an expected event happens.

The net effect is to move the idle concept from Xen into Dom0 at a time when the hardware really wants to sleep. This is much worse than the idle loop in Xen (the current IDLE domain), because more power is consumed needlessly by the many context switches. Moreover, the scheduler will be triggered with more latency, because a context switch is heavier with dom0 than with the IDLE domain, and less optimization can be done.

Thanks,
Kevin
Tian, Kevin
2005-Jul-14  02:45 UTC
RE: [Xen-devel] RE: [Patch] Fix IDLE issue with sedf scheduler on IA64
>From: Ian Pratt [mailto:m+Ian.Pratt@cl.cam.ac.uk]
>Sent: Thursday, July 14, 2005 9:24 AM
>
>
>If you really want to do something like this, it would be much better
>just to detect a switch to the idle domain (on whatever CPU dom0 happens
>to be running on) and load the register and mm state for dom0 and make
>it appear to be the last domain that ran. The lazy switching logic will
>then take care of things.
>
>Ian

I still doubt the real gain of "not switching to idle", which may leave dom0 less time for real work compared to other domains. Take the current model: dom0 requests to block in its own idle loop. This way dom0's idle loop doesn't occupy any real slice, and the whole period allocated to dom0 is spent on meaningful work.

Now let's not switch to the idle domain, using some trick in context_switch. What happens? If dom0 is in its idle loop and requests to block, its next period/slice is set to 20ms, and the scheduler decides to switch to IDLE with the scheduler's next expiry in 2ms. Then in context_switch the idle domain is replaced by Dom0, which may simply run its idle loop too. When the next schedule happens, 2ms is subtracted from Dom0's slice, yet this 2ms was likely spent in a completely useless loop. In the end Dom0 gets only 18ms in this period to do real work, and I think this will also affect Dom0's throughput.

This is just one special case as an example... :-) So we need to find a balance based on future experimental data.

Thanks,
Kevin
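The accounting in this scenario can be written out as a small worked example. The function name and the microsecond units are assumptions for the sketch; it models only the subtraction described above, not the sEDF scheduler itself.

```c
#include <assert.h>

/*
 * Slice left for real work after the scheduler charges a domain for time
 * it actually spent spinning in its own idle loop. Both arguments are in
 * microseconds.
 */
long remaining_slice_us(long slice_us, long charged_idle_us)
{
    assert(slice_us >= 0);
    assert(charged_idle_us >= 0 && charged_idle_us <= slice_us);
    return slice_us - charged_idle_us;
}
```

With a 20ms slice and 2ms charged while Dom0 stands in for IDLE, remaining_slice_us(20000, 2000) gives 18000, i.e. the 18ms mentioned above. When IDLE itself is scheduled, nothing is charged to Dom0 and the full 20ms remains.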
Ian Pratt
2005-Jul-14  10:39 UTC
RE: [Xen-devel] RE: [Patch] Fix IDLE issue with sedf scheduler on IA64
> >If you really want to do something like this, it would be much better
> >just to detect a switch to the idle domain (on whatever CPU dom0 happens
> >to be running on) and load the register and mm state for dom0 and make
> >it appear to be the last domain that ran. The lazy switching logic will
> >then take care of things.
>
> I still doubt the real gain of "not switching to idle",
> which may leave dom0 less time for real work compared to
> other domains. Take the current model: dom0 requests to
> block in its own idle loop. This way dom0's idle loop
> doesn't occupy any real slice, and the whole period allocated
> to dom0 is spent on meaningful work.

You misunderstood my suggestion. We would still switch to the idle domain, we just load the dom0 bulk state such that the lazy switch logic won't have to do anything should dom0 be the next domain to run.

Ian
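A minimal sketch of this suggestion follows, assuming a simplified model in which each CPU remembers whose bulk state is loaded. Every name here (struct domain, struct cpu_state, load_state, wakeup_switches) is hypothetical and chosen for the illustration; none of it is taken from the Xen sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, minimal stand-ins for the real structures. */
struct domain {
    int  id;
    bool is_idle;
};

struct cpu_state {
    struct domain *current;         /* scheduler's view of what is running */
    struct domain *state_owner;     /* whose bulk state is loaded */
    struct domain *dom0;            /* dom0 if it runs on this CPU, else NULL */
    unsigned long  wakeup_switches; /* bulk switches paid when a domain starts */
};

/* Stand-in for loading register and mm state. */
void load_state(struct cpu_state *cpu, struct domain *d)
{
    cpu->state_owner = d;
}

void context_switch(struct cpu_state *cpu, struct domain *next)
{
    assert(cpu != NULL && next != NULL);
    cpu->current = next;                /* idle is still scheduled as idle */
    if (next->is_idle) {
        /* Use the idle period to preload dom0's bulk state. */
        if (cpu->dom0 != NULL && cpu->state_owner != cpu->dom0)
            load_state(cpu, cpu->dom0);
        return;
    }
    if (cpu->state_owner != next) {     /* lazy check */
        load_state(cpu, next);
        cpu->wakeup_switches++;
    }
}
```

The bulk switch to dom0 still happens, but it is moved into the idle period, so when a device interrupt wakes dom0 the lazy check finds its state already loaded. The scheduler continues to account the idle time to the idle domain, which addresses the concern about dom0's slice.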
Tian, Kevin
2005-Jul-14  11:33 UTC
RE: [Xen-devel] RE: [Patch] Fix IDLE issue with sedf scheduler on IA64
>> I still doubt the real gain of "not switching to idle",
>> which may leave dom0 less time for real work compared to
>> other domains. Take the current model: dom0 requests to
>> block in its own idle loop. This way dom0's idle loop
>> doesn't occupy any real slice, and the whole period allocated
>> to dom0 is spent on meaningful work.
>
>You misunderstood my suggestion. We would still switch to the idle
>domain, we just load the dom0 bulk state such that the lazy switch logic
>won't have to do anything should dom0 be the next domain to run.
>
>Ian

Then, I fully agree! ;-)

Thanks,
Kevin
Magenheimer, Dan (HP Labs Fort Collins)
2005-Jul-14  14:48 UTC
RE: [Xen-devel] RE: [Patch] Fix IDLE issue with sedf scheduler on IA64
Keir> An idle domain is a convenient abstraction for thinking about the
Keir> current scheduling state of a cpu -- it allows us to treat things more
Keir> uniformly in the common code. Of course we can treat the idle domains
Keir> very specially during state switch for performance.

So... an idle domain is a convenient abstraction which, it seems, results in every platform inconveniently adding code to work around the abstraction? ;-)

Isn't it really the case that an idle domain/process is an anachronistic concept that pre-dates "low power states" and is used by Xen mostly because Xen is leveraging OS scheduler designs (that also pre-date low power states)? I recognize that that's still a perfectly reasonable design choice for Xen... just trying to ensure I understand.

Ian> >You misunderstood my suggestion. We would still switch to the idle
Ian> >domain, we just load the dom0 bulk state such that the lazy switch
Ian> >logic won't have to do anything should dom0 be the next
Ian> >domain to run.

Kevin> Then, I fully agree! ;-)

The non-lazy state on ia64 is still significant, but I agree this reduces the issue to a level where it's probably not worth worrying about any more.

One additional question however: Is there a way to determine/query "am I (current) the only domain on the run queue (other than idle)?"**. If so, I can leave the processor fully in "domain0 context" (and leave domain0 runnable) when I enter a low power state, thus eliminating the non-lazy state switch and reducing the interrupt latency when all domains are (virtual-) I/O bound.

Dan
Keir Fraser
2005-Jul-14  15:10 UTC
[Xen-ia64-devel] Re: [Xen-devel] RE: [Patch] Fix IDLE issue with sedf scheduler on IA64
On 14 Jul 2005, at 15:48, Magenheimer, Dan (HP Labs Fort Collins) wrote:
> So... an idle domain is a convenient abstraction which, it seems,
> results in every platform inconveniently adding code to work around
> the abstraction? ;-)
>
> Isn't it really the case that an idle domain/process is an
> anachronistic concept that pre-dates "low power states" and is used
> by Xen mostly because Xen is leveraging OS scheduler designs (that
> also pre-date low power states)? I recognize that that's still a
> perfectly reasonable design choice for Xen... just trying to ensure I
> understand.

I think that what you execute during idle time, and what state you save/restore, is orthogonal to how you represent the idle state to the scheduler and other subsystems in the hypervisor.

 -- Keir
_______________________________________________
Xen-ia64-devel mailing list
Xen-ia64-devel@lists.xensource.com
http://lists.xensource.com/xen-ia64-devel
Magenheimer, Dan (HP Labs Fort Collins)
2005-Jul-14  15:17 UTC
RE: [Xen-devel] RE: [Patch] Fix IDLE issue with sedf scheduler on IA64
> > So... an idle domain is a convenient abstraction which, it seems,
> > results in every platform inconveniently adding code to work around
> > the abstraction? ;-)
> >
> > Isn't it really the case that an idle domain/process is an
> > anachronistic concept that pre-dates "low power states" and is used
> > by Xen mostly because Xen is leveraging OS scheduler designs (that
> > also pre-date low power states)? I recognize that that's still a
> > perfectly reasonable design choice for Xen... just trying to ensure I
> > understand.
>
> I think that what you execute during idle time, and what state you
> save/restore, is orthogonal to how you represent the idle
> state to the scheduler and other subsystems in the hypervisor.

Which is exactly my point... The existing paradigm for schedulers mixes/confuses the two because scheduling the idle process/thread/domain is a convenient way for a scheduler to report that it has nothing runnable.