It looks like a HYPERVISOR_yield() call will end up placing the yielded VCPU at the head of the run queue if there are only equal-or-lower priority VCPUs on the queue. Shouldn't it place it after any equal-priority VCPUs on the list?

thanks
john
Atsushi SAKAI
2007-Oct-09 01:23 UTC
Re: [Xen-devel] credit scheduler and HYPERVISOR_yield()
Hi, John

Did you find a problem in your testing (with HYPERVISOR_yield())? Please describe your problem first.

Thanks
Atsushi SAKAI
On Tue, Oct 09, 2007 at 10:23:19AM +0900, Atsushi SAKAI wrote:
> Did you find a problem in your testing (with HYPERVISOR_yield())?
> Please describe your problem first.

The problem is about the semantics of HYPERVISOR_yield(). We use this after doing an IPI, when we want to wait for the other CPU to respond before continuing. If we're just going into the hypervisor and straight back out again until our credit expires, then it's clearly sub-optimal for this case.

I haven't looked hard at the implementation, or even verified this behaviour, but it does look like that.

regards
john
Emmanuel Ackaouy
2007-Oct-09 07:06 UTC
Re: [Xen-devel] credit scheduler and HYPERVISOR_yield()
Hi John.

The expected behavior of yield() (or any schedule operation, really) is that the current VCPU will be placed on the runq behind all VCPUs of equal or greater priority.

Looking at __runq_insert() in sched_credit.c, it looks correct to me in that respect. Can you clarify what's going wrong?
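For reference, the rule Emmanuel describes can be modelled in miniature. This is a standalone sketch of the insertion logic, not the actual sched_credit.c code (which walks a list_head run queue): a vcpu is inserted before the first entry of strictly lower priority, and therefore behind all of its equal-priority peers.

#include <stdio.h>

/* Toy vcpu: a higher pri value means a higher priority. */
struct vcpu {
    int pri;
    struct vcpu *next;
};

/* Insert svc before the first entry with strictly lower priority,
 * i.e. behind every vcpu of equal or greater priority -- which is
 * why a yielding vcpu still queues after its equal-priority peers. */
static void runq_insert(struct vcpu **runq, struct vcpu *svc)
{
    while (*runq != NULL && (*runq)->pri >= svc->pri)
        runq = &(*runq)->next;
    svc->next = *runq;
    *runq = svc;
}

int main(void)
{
    struct vcpu over = { 0, NULL };   /* e.g. TS_OVER  */
    struct vcpu v1   = { 1, NULL };   /* e.g. TS_UNDER */
    struct vcpu v2   = { 1, NULL };   /* e.g. TS_UNDER */
    struct vcpu *runq = NULL;

    runq_insert(&runq, &over);
    runq_insert(&runq, &v1);   /* goes ahead of over        */
    runq_insert(&runq, &v2);   /* queues behind its peer v1 */

    for (struct vcpu *v = runq; v != NULL; v = v->next)
        printf("pri=%d\n", v->pri);   /* prints 1, 1, 0 */
    return 0;
}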
On Tue, Oct 09, 2007 at 09:06:14AM +0200, Emmanuel Ackaouy wrote:
> The expected behavior of yield() (or any schedule operation, really) is
> that the current VCPU will be placed on the runq behind all VCPUs of
> equal or greater priority.
>
> Looking at __runq_insert() in sched_credit.c, it looks correct to me in
> that respect. Can you clarify what's going wrong?

It looks fine... no idea how I misread the code.

sorry,
john
George Dunlap
2007-Oct-09 13:22 UTC
Re: [Xen-devel] credit scheduler and HYPERVISOR_yield()
The code does what it's designed to -- put the current vcpu behind any vcpus of equal priority. But that behavior isn't always ideal, specifically in situations like the one John describes.

The credit scheduler has two basic priorities -- TS_UNDER (hasn't yet used all its credits) and TS_OVER (has used all its credits). The scheduler switches between these two based on how much cpu time a vcpu has actually had compared to how much it's allocated. This is a very clever way to make sure that each vcpu gets its fair share, but that spare cycles are still used effectively.

What this means in the case of a yield(), unfortunately, is that if a given vcpu is the only vcpu on its processor with credits left, all it can do is burn up its extra credits spinning or calling yield() to no effect.

A simple option would be, for the credit scheduler, to temporarily reduce the priority from TS_UNDER to TS_OVER. This will cause it to actually yield if there are any other vcpus that can run. The next time accounting is done, the priority will be reset, and it should get more time because of the time it's given up.

Thoughts?
 -George
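For concreteness, here is a minimal sketch of that proposal, assuming invented names and priority values rather than Xen's actual constants. The demotion only has to survive until the next accounting pass, which recomputes the priority from the credit balance anyway.

#include <stdbool.h>

/* Illustrative values, not Xen's CSCHED_PRI_* constants. */
enum pri { TS_OVER = -1, TS_UNDER = 0 };

struct vcpu {
    enum pri pri;
    int credits;
    bool yield_demoted;   /* set while a yield demotion is in effect */
};

/* On yield, demote a TS_UNDER vcpu to TS_OVER so that run-queue
 * insertion places it behind every other runnable vcpu. */
static void vcpu_yield(struct vcpu *v)
{
    if (v->pri == TS_UNDER) {
        v->pri = TS_OVER;
        v->yield_demoted = true;
    }
}

/* The next accounting pass recomputes the priority from the credit
 * balance, clearing the demotion; the vcpu also has more credits
 * left, because of the time it gave up. */
static void acct_vcpu(struct vcpu *v)
{
    v->yield_demoted = false;
    v->pri = (v->credits > 0) ? TS_UNDER : TS_OVER;
}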
Emmanuel Ackaouy
2007-Oct-09 14:48 UTC
Re: [Xen-devel] credit scheduler and HYPERVISOR_yield()
On Oct 9, 2007, at 15:22, George Dunlap wrote:
> What this means in the case of a yield(), unfortunately, is that if a
> given vcpu is the only vcpu on its processor with credits left, all it
> can do is burn up its extra credits spinning or calling yield() to no
> effect.
>
> A simple option would be, for the credit scheduler, to temporarily
> reduce the priority from TS_UNDER to TS_OVER. This will cause it to
> actually yield if there are any other vcpus that can run. The next time
> accounting is done, the priority will be reset, and it should get more
> time because of the time it's given up.

Temporarily changing the priority to TS_OVER strikes me as a reasonable idea. However, changing it for an average of half of the accounting period (half of 100ms, i.e. 50ms) is hardly "temporary". A VCPU that calls yield() more than once every 50ms or so -- which isn't unreasonable -- would never be able to run at TS_UNDER. That would totally distort accounting fairness for users of yield(). Maybe something more in the temporary spirit of the TS_BOOST priority (but lower, not higher, than TS_UNDER) would be better?

It may be worthwhile to consider whether yield() can be replaced with more intelligent mechanisms for VCPU synchronization in SMP guests. In the case of ACKed IPIs, for example, if all target VCPUs are not running at the time of the IPI initiation, it might be a good idea to put the source to sleep until all targets have ACKed. If all target VCPUs are running, though, I suspect things will work best if the IPI initiator does not yield at all.
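One way to read that suggestion -- an assumption, not something spelled out in the thread -- is an extra transient priority below both TS_UNDER and TS_OVER that is cleared as soon as the vcpu is next scheduled, rather than surviving until the next accounting pass. The names and values below are invented for illustration.

/* Illustrative only: TS_YIELD is not an existing Xen priority. */
enum pri {
    TS_BOOST =  1,   /* transient boost for waking vcpus            */
    TS_UNDER =  0,   /* credits remaining                           */
    TS_OVER  = -1,   /* credits exhausted                           */
    TS_YIELD = -2,   /* transient: set on yield, restored to the
                      * credit-derived priority when next scheduled */
};

A TS_UNDER vcpu that yields would drop to TS_YIELD, queuing behind every runnable vcpu, but would regain its credit-derived priority the moment it runs again, instead of staying demoted for up to a full accounting period.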
On Tue, Oct 09, 2007 at 02:22:13PM +0100, George Dunlap wrote:
> What this means in the case of a yield(), unfortunately, is that if a
> given vcpu is the only vcpu on its processor with credits left, all it
> can do is burn up its extra credits spinning or calling yield() to no
> effect.
>
> A simple option would be, for the credit scheduler, to temporarily
> reduce the priority from TS_UNDER to TS_OVER. This will cause it to

We prototyped this change and it made quite a difference (though it didn't solve our problems entirely). Would it be possible to get a proper fix available?

Emmanuel Ackaouy wrote:
> It may be worthwhile to consider whether yield() can be replaced with
> more intelligent mechanisms for VCPU synchronization in SMP
> guests. In the case of ACKed IPIs, for example, if all target VCPUs
> are not running at the time of the IPI initiation, it might be a good
> idea to put the source to sleep until all targets have ACKed.
> If all target VCPUs are running, though, I suspect things will work
> best if the IPI initiator does not yield at all.

This seems like a bad idea, since we may be IPIing to several CPUs and we don't want to sleep whilst we can usefully move on and IPI the other CPUs (even if they can't quite respond yet).

cheers
john
Emmanuel Ackaouy
2007-Oct-14 19:20 UTC
Re: [Xen-devel] credit scheduler and HYPERVISOR_yield()
On Oct 14, 2007, at 20:45, John Levon wrote:
> Emmanuel Ackaouy wrote:
>> It may be worthwhile to consider whether yield() can be replaced with
>> more intelligent mechanisms for VCPU synchronization in SMP
>> guests. [...] If all target VCPUs are running, though, I suspect
>> things will work best if the IPI initiator does not yield at all.
>
> This seems like a bad idea, since we may be IPIing to several CPUs and
> we don't want to sleep whilst we can usefully move on and IPI the
> other CPUs (even if they can't quite respond yet).

Why can't you initiate the IPI to all the destination CPUs first and then wait for them to ACK, going to sleep if it looks like at least one of them won't be able to ACK in a reasonable timeframe (for example, if it is asleep or on the run queue of the running VCPU's physical CPU)? I'm probably not understanding what you're trying to do?

> On Tue, Oct 09, 2007 at 02:22:13PM +0100, George Dunlap wrote:
>> A simple option would be, for the credit scheduler, to temporarily
>> reduce the priority from TS_UNDER to TS_OVER. This will cause it to
>
> We prototyped this change and it made quite a difference (though it
> didn't solve our problems entirely). Would it be possible to get a
> proper fix available?

Doing the change that George proposed may help in your case, but I suspect that, as I described in my previous post, it will cause problems for other workloads.

I think it is reasonable for a yield() operation to yield to runnable VCPUs of equal or higher priority than the running VCPU. That is the behavior of the scheduler today. Maybe your problem can be addressed without changing the behavior of yield?

With that said, it's unlikely that I'll be making a change to the scheduler myself: I haven't worked at XenSource for some time now and don't have the resources (not to mention time) to test any such change. I'm happy to learn about your problem and suggest potential fixes, but I'm probably not the person you need to convince if you want to make a significant scheduler change these days.

Arguably, a number of things need to be done in the Xen scheduler and synchronization primitives to improve the performance of SMP guests. It may be worthwhile to have a generic discussion about that on top of the specific problem you're encountering.

Cheers,
Emmanuel.
On Sun, Oct 14, 2007 at 09:20:50PM +0200, Emmanuel Ackaouy wrote:
>>> It may be worthwhile to consider whether yield() can be replaced with
>>> more intelligent mechanisms for VCPU synchronization in SMP
>>> guests. [...]
>>
>> This seems like a bad idea, since we may be IPIing to several CPUs and
>> we don't want to sleep whilst we can usefully move on and IPI the
>> other CPUs (even if they can't quite respond yet).
>
> Why can't you initiate the IPI to all the destination CPUs first

We do; maybe I misunderstood what you were suggesting.

> Doing the change that George proposed may help in your case,
> but I suspect that, as I described in my previous post, it will cause
> problems for other workloads.
>
> I think it is reasonable for a yield() operation to yield to runnable
> VCPUs of equal or higher priority than the running VCPU. That
> is the behavior of the scheduler today.

Well, yes, except the priorities are "wrong". We've explicitly asked not to be scheduled; it doesn't seem right for the scheduler not to heed that suggestion.

> Maybe your problem can be addressed without changing the behavior of
> yield?

For this particular problem, sure.

> With that said, it's unlikely that I'll be making a change to the
> scheduler myself: I haven't worked at XenSource for some time

That's fine...

regards
john
Emmanuel Ackaouy
2007-Oct-14 21:25 UTC
Re: [Xen-devel] credit scheduler and HYPERVISOR_yield()
On Oct 14, 2007, at 21:49, John Levon wrote:
> On Sun, Oct 14, 2007 at 09:20:50PM +0200, Emmanuel Ackaouy wrote:
>> Why can't you initiate the IPI to all the destination CPUs first
>
> We do; maybe I misunderstood what you were suggesting.

That's ok. I still don't understand what problem you're working on myself. I think you're suggesting you'd like to always deschedule a VCPU initiating an IPI for potentially multiple time slices. That seems crazy to me.
On Sun, Oct 14, 2007 at 11:25:43PM +0200, Emmanuel Ackaouy wrote:
> That's ok. I still don't understand what problem you're working
> on myself. I think you're suggesting you'd like to always
> deschedule a VCPU initiating an IPI for potentially multiple
> time slices. That seems crazy to me.

Yes, that would be crazy indeed. I'd like a VCPU that does a yield to actually yield; from what I can tell, that's typically not happening right now.

john
George Dunlap
2007-Oct-15 12:26 UTC
Re: [Xen-devel] credit scheduler and HYPERVISOR_yield()
On 10/14/07, Emmanuel Ackaouy <ackaouy@gmail.com> wrote:
> Doing the change that George proposed may help in your case,
> but I suspect that, as I described in my previous post, it will cause
> problems for other workloads.
>
> I think it is reasonable for a yield() operation to yield to runnable
> VCPUs of equal or higher priority than the running VCPU. That
> is the behavior of the scheduler today. Maybe your problem can
> be addressed without changing the behavior of yield?

Part of the problem is that for the credit scheduler, the "priority" is used a bit differently. It changes over time, and it has no fundamental relationship to more important versus less important work; it's just a mechanism for implementing time allocations. (And a very clever one, I might add.)

It's clear that "yield-I-really-mean-it" is useful for SMP synchronization issues (like yielding when waiting for a spinlock held by scheduled-out vcpus, or waiting for a scheduled-out processor to ACK an IPI). But I can't really think of a situation where "yield-to-other-cpus-that-haven't-used-all-their-credits-yet" is particularly useful. Can you think of an example?

Perhaps a better implementation of "yield-I-really-mean-it" would be:
* Reduce the priority only if there are no vcpus of the same priority in the queue; and perhaps, only if there are no vcpus in the queue and no work to steal.
* As soon as the vcpu in question is scheduled, raise its priority again.

That should avoid some of the problems you've pointed out with the yield-reduces-priority technique.

> Arguably, a number of things need to be done in
> the Xen scheduler and synchronization primitives to improve
> the performance of SMP guests. It may be worthwhile to have
> a generic discussion about that on top of the specific problem
> you're encountering.

Here are some random ideas:
* Expose to the guest, via the shared-info page, which vcpus are actively scheduled or not.
* Implement some kind of a yield or block primitive (see the sketch after this message), like:
  + yield to a specific vcpu (i.e., the one holding the lock you want)
  + block with a vcpu mask. The vcpu will then be blocked until each of the vcpus in the mask has been scheduled at least once.

Thoughts?
 -George
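Neither primitive exists in Xen; purely to make the two proposals concrete, a hypothetical guest-visible interface might look like the following. Every name and number here is invented.

#include <stdint.h>

#define SCHEDOP_yield_to  8   /* hypothetical sub-op number */
#define SCHEDOP_block_on  9   /* hypothetical sub-op number */

/* Yield, hinting that the scheduler should run the named vcpu next --
 * e.g. the vcpu currently holding the spinlock we are waiting on. */
struct sched_yield_to {
    uint32_t vcpu_id;
};

/* Block until every vcpu set in the mask has been scheduled at least
 * once -- e.g. every vcpu that must still ACK our IPI. */
struct sched_block_on {
    uint64_t vcpu_mask;
};

/* A guest IPI-wait path might then read (pseudo-usage):
 *
 *     struct sched_block_on op = { .vcpu_mask = unacked_vcpus };
 *     HYPERVISOR_sched_op(SCHEDOP_block_on, &op);
 */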
On Mon, Oct 15, 2007 at 01:26:06PM +0100, George Dunlap wrote:
> Part of the problem is that for the credit scheduler, the "priority"
> is used a bit differently. [...]
>
> Perhaps a better implementation of "yield-I-really-mean-it" would be:
> * Reduce the priority only if there are no vcpus of the same priority
> in the queue; and perhaps, only if there are no vcpus in the queue and
> no work to steal.

Isn't this the opposite of what our case needs? That is, we yield, and we want to schedule another VCPU, whether it's of the same priority or not.

> Here are some random ideas:
> * Expose to the guest, via the shared-info page, which vcpus are
> actively scheduled or not.

That info is already available via the runstate (although we don't use it, and it wouldn't help us -- the problem is that the 'other' VCPU doesn't get scheduled when we yield, not that we don't know whether to yield or not).

> * Implement some kind of a yield or block primitive, like:
> + yield to a specific vcpu (i.e., the one holding the lock you want)
> + block with a vcpu mask. The vcpu will then be blocked until each of
> the vcpus in the mask has been scheduled at least once.

Possible, if the scheduler can't be fixed in a similar way.

regards,
john
Samuel Thibault
2007-Oct-15 12:43 UTC
Re: [Xen-devel] credit scheduler and HYPERVISOR_yield()
Hi,

George Dunlap, on Mon 15 Oct 2007 13:26:06 +0100, wrote:
> I can't really think of a situation where
> "yield-to-other-cpus-that-haven't-used-all-their-credits-yet" is
> particularly useful. Can you think of an example?

That could actually be the counterpart of "yield-I-really-mean-it":
- vCPU0 yields-really-means-it so as to hopefully schedule vCPU1.
- vCPU1 realizes why it got scheduled and does the needed urgent job.
- vCPU1 "yields-to-other-cpus-that-blabla", letting Xen know it has finished its urgent job and usual priorities can be taken into account again.
- vCPU0 gets scheduled again because it actually had the higher priority.

> Here are some random ideas:
> * Expose to the guest, via the shared-info page, which vcpus are
> actively scheduled or not.

That could be useful, but one can't safely rely on it, since it may change asynchronously.

> * Implement some kind of a yield or block primitive, like:
> + yield to a specific vcpu (i.e., the one holding the lock you want)

That should be quite fine. Xen could use it as a strong scheduling hint. If scheduling that vCPU immediately would break some quota rules, for instance, Xen could still remember that it shouldn't reschedule the calling vCPU before having scheduled the target vCPU at least once.

> + block with a vcpu mask. The vcpu will then be blocked until each of
> the vcpus in the mask has been scheduled at least once.

That could also be called yield_to_vcpus, actually.

Samuel
Emmanuel Ackaouy
2007-Oct-15 17:13 UTC
Re: [Xen-devel] credit scheduler and HYPERVISOR_yield()
I suspect yield() was first devised as a simple synchronization mechanism for uni-processor round-robin schedulers. Then strict priorities were added to make certain tasks (like pagers) run more aggressively than "normal" ones. As long as these high-priority threads don't use the yield() mechanism, things are fine.

I believe you are pointing out that from the perspective of the yield() mechanism, all time-share priorities (UNDER and OVER) should be considered one and the same, because they are not strict priorities. This is a good observation and I agree with you (as long as reasonable uses of yield() don't cause fairness to go out the window).

However, before you go and fix yield(), you might want to consider this:

1- It's been proposed before that things like dom0 VCPUs be scheduled with a priority strictly greater than any domU VCPU. If strict priorities are introduced into the Xen scheduler at some point in the future, code that assumes that a yield() from a VCPU will allow all other runnable VCPUs in the system a chance to run ahead of it will break (again).

2- Priorities aside, on an SMP host (i.e. all computers) with distributed run queues, it is non-trivial to guarantee that a VCPU will not be rescheduled until all other runnable VCPUs have had a chance to run first. If you can come up with a simple and scalable way to do it, great. I suspect you will need to approximate this definition of yield() though, perhaps by using some form of directed yield, targeted at one or more VCPUs, as you have suggested.

3- Yield really isn't a great model for doing synchronization in an SMP world. If you're going to para-virtualize your IPI and spinlock paths, as you pointed out in your last mail, you might as well do something that can be directed and block if necessary.

I guess my point is that instead of working real hard to try and maintain the old yield behavior ("don't run again until all other runnable VCPUs have had a chance to run first") on an SMP scheduler which potentially also has to deal with strict priorities, you'd be better off spending your energy on building and optimizing simpler and more targeted synchronization mechanisms and using those instead.

User-level threads libraries may be a good place to look for inspiration if you're really worried about the costs of supervisor-to-hypervisor context switches. I'm not a huge fan of shared pages, but it was popular to write papers about them for user-level thread synchronization back in the 90s.

In the case of IPIs, you're already going into the hypervisor, so you should be able to do something straightforward with a sleeping semaphore. Maybe you spin a little before you sleep, though, to give running VCPUs a chance to respond before you give up the end of your time slice. For spinlocks, I suspect turning a spinlock into a sleeping lock after a reasonable number of spins would work well too.

In the long run, it would probably be beneficial to remove most uses of the generic yield mechanism.

Emmanuel.
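For the IPI case, a sketch of the spin-then-sleep wait Emmanuel describes. The ack counter and the sleep primitive are hypothetical stand-ins for whatever the guest and hypervisor would actually provide.

#define ACK_SPIN_TRIES 1024

extern int ipi_acks_pending(void);    /* stand-in: targets yet to ACK  */
extern void sleep_until_acked(void);  /* stand-in: block until woken by
                                       * the last ACKing vcpu          */
extern void cpu_relax(void);

void wait_for_ipi_acks(void)
{
    int i;

    /* Spin briefly first: targets that are running usually ACK
     * almost immediately. */
    for (i = 0; i < ACK_SPIN_TRIES; i++) {
        if (ipi_acks_pending() == 0)
            return;
        cpu_relax();
    }

    /* Some target is evidently not running; give up the CPU until
     * the last ACK arrives, instead of burning the time slice. */
    while (ipi_acks_pending() != 0)
        sleep_until_acked();
}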
On 15/10/07 18:13, "Emmanuel Ackaouy" <ackaouy@gmail.com> wrote:
> In the case of IPIs, you're already going into the hypervisor, so you
> should be able to do something straightforward with a sleeping
> semaphore. Maybe you spin a little before you sleep, though, to give
> running VCPUs a chance to respond before you give up the end of
> your time slice.

Actually, a blocking spinlock could be implemented with no Xen changes. Change the spinlock function in Linux to spin a few times, then set a waiting bit in a cpumask and SCHEDOP_poll on a per-VCPU spinlock-wakeup event channel. On spin_unlock, the unlocker sends an event to any VCPU in the cpumask, using each VCPU's spinlock-wakeup event channel.

Yes, this is probably nicer than using yield() and praying.

 -- Keir
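A guest-side sketch of this scheme: SCHEDOP_poll and struct sched_poll are real Xen interfaces, while the lock layout, the per-vcpu port array, and the kernel-style helpers are simplified stand-ins rather than existing Linux code.

#include <xen/interface/sched.h>   /* SCHEDOP_poll, struct sched_poll */

#define SPIN_TRIES 1024

extern evtchn_port_t wakeup_port[NR_CPUS]; /* one event channel per vcpu */

struct pv_lock {
    atomic_t locked;       /* 0 = free, 1 = held */
    cpumask_t waiters;     /* vcpus blocked in SCHEDOP_poll */
};

static bool pv_trylock(struct pv_lock *l)
{
    return atomic_xchg(&l->locked, 1) == 0;
}

void pv_lock(struct pv_lock *l)
{
    int cpu = smp_processor_id();
    struct sched_poll poll;

    for (;;) {
        int i;

        /* Spin a few times first: the holder is often running and
         * will release shortly. */
        for (i = 0; i < SPIN_TRIES; i++) {
            if (pv_trylock(l))
                return;
            cpu_relax();
        }

        /* Advertise that we are waiting, then re-check to close the
         * race with an unlock that ran before our bit was visible. */
        cpumask_set_cpu(cpu, &l->waiters);
        if (pv_trylock(l)) {
            cpumask_clear_cpu(cpu, &l->waiters);
            return;
        }

        /* Block until our wakeup channel fires. A kick that raced
         * with us leaves the port pending, so the poll returns at
         * once rather than losing the wakeup. */
        set_xen_guest_handle(poll.ports, &wakeup_port[cpu]);
        poll.nr_ports = 1;
        poll.timeout  = 0;         /* 0 = wait indefinitely */
        HYPERVISOR_sched_op(SCHEDOP_poll, &poll);

        cpumask_clear_cpu(cpu, &l->waiters);
    }
}

void pv_unlock(struct pv_lock *l)
{
    int cpu;

    atomic_set(&l->locked, 0);
    /* Kick every advertised waiter; a spurious wakeup just re-spins. */
    for_each_cpu(cpu, &l->waiters)
        notify_remote_via_evtchn(wakeup_port[cpu]);
}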