I'm working on a new hypercall, do_confer, which allows the directed
yielding of a vcpu to another vcpu. It is mainly used when a vcpu fails
to acquire a spinlock, yielding to the lock holder instead of spinning.
I ported the ppc64 spinlock implementation to the i386 Linux portion.
In implementing the hypercall, I've been trying to figure out how to get
the scheduler (I've only played with bvt) to run the vcpu passed in the
hypercall (after some validation), but I've run into various bad-state
situations (a do_softirq pending != 0 assert, '!active_ac_timer(timer)'
failed, and __task_on_runqueue(prev) failed), which tells me I don't
fully understand all of the book-keeping that is needed. Has anyone
thought about how to do this with either BVT or the new EDF scheduler?

--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com
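For context, the guest-side slow path in the ppc64 style looks roughly like
the sketch below. The names used here (the lock word's holder_vcpu field,
the xen_yield_count[] per-vcpu counters, __raw_trylock(), and the confer()
hypercall wrapper) are placeholders for illustration, not the actual i386
port:

/* Guest-side sketch of a confer-aware spin loop.  The holder writes its
 * vcpu id into the lock word when it takes the lock; a waiter that sees
 * the holder preempted confers its remaining slice to it instead of
 * burning cycles spinning.  All names here are assumed for illustration. */
static inline void spin_lock_confer(spinlock_t *lock)
{
    while (!__raw_trylock(lock)) {
        unsigned int holder = lock->holder_vcpu;       /* who owns it      */
        unsigned int count  = xen_yield_count[holder];  /* snapshot count   */

        if (count & 1)               /* odd => holder is currently preempted */
            confer(holder, count);   /* directed yield; the hypervisor
                                      * re-checks the count to avoid races   */
    }
}

The parity convention matches the hypercall's validation: an even count means
the holder is running, an odd count means it has been preempted, so conferring
is only worthwhile in the odd case.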
Stephan Diestelhorst
2005-May-18 12:10 UTC
Re: [Xen-devel] scheduler independent forced vcpu selection
That is a good idea; there are quite a number of other spinlock
optimisations on the way...

> I'm working on a new hypercall, do_confer, which allows the directed
> yielding of a vcpu to another vcpu. It is mainly used when a vcpu fails
> to acquire a spinlock, yielding to the lock holder instead of spinning.
> I ported the ppc64 spinlock implementation to the i386 Linux portion.
> In implementing the hypercall, I've been trying to figure out how to get
> the scheduler (I've only played with bvt) to run the vcpu passed in the
> hypercall (after some validation), but I've run into various bad-state
> situations (a do_softirq pending != 0 assert, '!active_ac_timer(timer)'
> failed, and __task_on_runqueue(prev) failed), which tells me I don't
> fully understand all of the book-keeping that is needed. Has anyone
> thought about how to do this with either BVT or the new EDF scheduler?

Building code similar to do_block and __enter_scheduler in
xen/common/schedule.c should work fine, except of course not running the
original scheduler but switching directly to the hinted domain.

Are you calling do_softirq directly? If not, then it is quite strange
that this assertion fails.

The timer assertion might be the old scheduling timer, which probably
gets reset but not deleted beforehand... And the on-runqueue assertion
suggests that you are 'stealing' the domain from the scheduler's queues
without giving it a chance to notice.

I'd guess cloning do_block and appending code from __enter_scheduler,
with some checks (is the 'receiver' domain runnable? if not, run the
proper sched.do_schedule), should give you a solid base to start from.

Cheers,
Stephan
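As a rough illustration of that suggestion (not working code: confer_to()
and the yield_hint field are invented names, and the blocked-flag
manipulation is only assumed to mirror what do_block() does in this tree):

/* Sketch: directed yield built from do_block-style blocking plus a hint
 * that __enter_scheduler() can honour.  confer_to(), yield_hint and the
 * flag name below are assumptions for illustration only. */
long confer_to(struct exec_domain *receiver)
{
    /* Block the caller through the normal path, as do_block() does, so the
     * scheduler's runqueue and timer bookkeeping stay consistent. */
    set_bit(EDF_BLOCKED, &current->ed_flags);       /* assumed flag name */

    /* Record the hint; __enter_scheduler() switches straight to it when it
     * is runnable, otherwise it falls back to ops.do_schedule(). */
    if (domain_runnable(receiver))
        current->yield_hint = receiver;             /* assumed field */

    raise_softirq(SCHEDULE_SOFTIRQ);
    return 0;
}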
Ryan Harper
2005-May-18 14:55 UTC
Re: [Xen-devel] scheduler independent forced vcpu selection
* Stephan Diestelhorst <sd386@cl.cam.ac.uk> [2005-05-18 09:04]:
> Are you calling do_softirq directly? If not, then it is quite strange
> that this assertion fails.

No, I just call raise_softirq(SCHEDULE_SOFTIRQ); without a subsequent
do_softirq().

> The timer assertion might be the old scheduling timer, which probably
> gets reset but not deleted beforehand... And the on-runqueue assertion
> suggests that you are 'stealing' the domain from the scheduler's queues
> without giving it a chance to notice.

Could you explain what 'giving it a chance to notice' means?

> I'd guess cloning do_block and appending code from __enter_scheduler,
> with some checks (is the 'receiver' domain runnable? if not, run the
> proper sched.do_schedule), should give you a solid base to start from.

Let me add in a check for domain_runnable and see if that helps. Thanks
for the feedback. Let me know if you want me to post the patch of where
I'm at right now.

--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com
Ryan Harper
2005-May-18 18:03 UTC
Re: [Xen-devel] scheduler independent forced vcpu selection
* Stephan Diestelhorst <sd386@cl.cam.ac.uk> [2005-05-18 09:04]:
> The timer assertion might be the old scheduling timer, which probably
> gets reset but not deleted beforehand... And the on-runqueue assertion
> suggests that you are 'stealing' the domain from the scheduler's queues
> without giving it a chance to notice.

Looking at both bvt and sedf, the runqueue is ordered by some metric or
another (evt and deadline, respectively). What I think we need is a way
to swap positions in the runqueues. That is, if the lock holder is
runnable, I want the holder to run instead of current. Is there some way
to do this in a scheduler-independent manner with the current set of
scheduler ops defined in sched-if.h?

I noticed that neither bvt nor sedf implements the rem_task function,
which I thought could be used to help out with the 'stealing' by
notifying the schedulers that prev was going away (removing it from the
runqueue), but just removing the exec_domain from the runqueue didn't
help.

I'm including a patch that I'm currently using so you can get a better
idea of the modifications to schedule.c I'm making.

--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com

---
--- b/xen/common/schedule.c 2005-05-17 22:16:55.000000000 -0500
+++ c/xen/common/schedule.c 2005-05-18 12:42:44.765691872 -0500
@@ -273,6 +273,49 @@
     return 0;
 }
 
+/* Confer control to another vcpu */
+long do_confer(unsigned int vcpu, unsigned int yield_count)
+{
+    struct domain *d = current->domain;
+
+    /* count hcalls */
+    current->confercnt++;
+
+    /* Validate CONFER prereqs:
+     * - vcpu is within bounds
+     * - vcpu is valid in this domain
+     * - current has not already conferred its slice to vcpu
+     * - vcpu is not already running
+     * - designated vcpu's yield_count matches value from call
+     *
+     * If 1-4 are ok, then set conferred value and enter scheduler
+     */
+
+    if (vcpu >= MAX_VIRT_CPUS)
+        return 0;
+
+    if (d->exec_domain[vcpu] == NULL)
+        return 0;
+
+    if (current->conferred != VCPU_CANCONFER)
+        return 0;
+
+    /* even counts indicate a running vcpu, odd is preempted/conferred */
+    if ((d->exec_domain[vcpu]->vcpu_info->yield_count & 1) == 0)
+        return 0;
+
+    if (d->exec_domain[vcpu]->vcpu_info->yield_count != yield_count)
+        return 0;
+
+    /*
+     * set which vcpu should run in conferred state, request scheduling
+     */
+    current->conferred = (VCPU_CONFERRING|vcpu);
+    raise_softirq(SCHEDULE_SOFTIRQ);
+
+    return 0;
+}
+
 /*
  * Demultiplex scheduler-related hypercalls.
  */
@@ -412,8 +455,9 @@
  */
 static void __enter_scheduler(void)
 {
-    struct exec_domain *prev = current, *next = NULL;
+    struct exec_domain *prev = current, *next = NULL, *holder = NULL;
     int                 cpu = prev->processor;
+    unsigned int        holder_vcpu;
     s_time_t            now;
     struct task_slice   next_slice;
     s32                 r_time;     /* time for new dom to run */
@@ -436,12 +480,39 @@
 
     prev->cpu_time += now - prev->lastschd;
 
-    /* get policy-specific decision on scheduling... */
-    next_slice = ops.do_schedule(now);
+    /* get ed pointer to holder vcpu */
+    holder_vcpu = 0xffff & prev->conferred;
+    holder = prev->domain->exec_domain[holder_vcpu];
+
+    if (unlikely(prev->conferred & VCPU_CONFERRING) &&
+        domain_runnable(holder))
+    {
+        /* run holder next */
+        next = holder;
+
+        /* run for the remainder of prev's slice */
+        r_time = schedule_data[cpu].s_timer.expires - now;
+
+        /* increment confer counters */
+        prev->confer_out++;
+        next->confer_in++;
+
+        /* change prev's confer state to prevent re-entrance */
+        prev->conferred = VCPU_CONFERRED;
+
+    } else {
+        /* get policy-specific decision on scheduling... */
+        next_slice = ops.do_schedule(now);
+
+        r_time = next_slice.time;
+        next = next_slice.task;
+    }
+
+    /*
+     * always clear conferred state so this vcpu can confer during its slice
+     */
+    next->conferred = 0;
 
-    r_time = next_slice.time;
-    next = next_slice.task;
-
     schedule_data[cpu].curr = next;
 
     next->lastschd = now;
@@ -455,6 +526,12 @@
 
     spin_unlock_irq(&schedule_data[cpu].schedule_lock);
 
+    /* bump vcpu yield_count when controlling domain is not-idle */
+    if ( !is_idle_task(prev->domain) )
+        prev->vcpu_info->yield_count++;
+    if ( !is_idle_task(next->domain) )
+        next->vcpu_info->yield_count++;
+
     if ( unlikely(prev == next) )
     {
 #ifdef ADV_SCHED_HISTO
         adv_sched_hist_to_stop(cpu);
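For reference, the conferred field above acts as a small state word; its
constants are not defined in the hunks shown, so the values below are only an
assumed encoding that is consistent with the 0xffff vcpu mask and with
next->conferred being reset to 0:

/* Assumed encoding of exec_domain->conferred (illustration only; the real
 * constant values are not part of the posted patch). */
#define VCPU_CANCONFER   0x00000000  /* default: this vcpu may issue do_confer */
#define VCPU_CONFERRING  0x00010000  /* confer requested; bits 0-15 hold the
                                      * target vcpu id                         */
#define VCPU_CONFERRED   0x00020000  /* slice already handed over; cleared back
                                      * to VCPU_CANCONFER at the next reschedule */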
Ryan Harper
2005-May-18 22:37 UTC
Re: [Xen-devel] scheduler independent forced vcpu selection
* Stephan Diestelhorst <sd386@cl.cam.ac.uk> [2005-05-18 09:04]:
> > I'm working on a new hypercall, do_confer, which allows the directed
> > yielding of a vcpu to another vcpu. It is mainly used when a vcpu fails
> > to acquire a spinlock, yielding to the lock holder instead of spinning.
> > I ported the ppc64 spinlock implementation to the i386 Linux portion.
> > In implementing the hypercall, I've been trying to figure out how to get
> > the scheduler (I've only played with bvt) to run the vcpu passed in the
> > hypercall (after some validation), but I've run into various bad-state
> > situations (a do_softirq pending != 0 assert, '!active_ac_timer(timer)'
> > failed, and __task_on_runqueue(prev) failed), which tells me I don't
> > fully understand all of the book-keeping that is needed. Has anyone
> > thought about how to do this with either BVT or the new EDF scheduler?

After some thought, domain_wake(), followed by
raise_softirq(SCHEDULE_SOFTIRQ), does what I want and removes the huge
mess I was making in __enter_scheduler().

--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com
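In other words, the hypercall body can shrink to roughly the following sketch
(validation abbreviated; only domain_wake() and raise_softirq() carry the real
work, and the lookup mirrors the earlier patch rather than final code):

/* Sketch of the simplified do_confer: wake the lock holder and let the
 * scheduler make its own decision on the next SCHEDULE_SOFTIRQ.
 * The yield_count validation from the earlier patch is elided here. */
long do_confer(unsigned int vcpu, unsigned int yield_count)
{
    struct domain *d = current->domain;
    struct exec_domain *holder;

    if (vcpu >= MAX_VIRT_CPUS || d->exec_domain[vcpu] == NULL)
        return 0;
    holder = d->exec_domain[vcpu];

    /* ... yield_count checks as before ... */

    domain_wake(holder);               /* make the lock holder runnable */
    raise_softirq(SCHEDULE_SOFTIRQ);   /* and request a reschedule      */
    return 0;
}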
Stephan Diestelhorst
2005-May-19 13:22 UTC
Re: [Xen-devel] scheduler independent forced vcpu selection
Ryan Harper wrote:
> * Stephan Diestelhorst <sd386@cl.cam.ac.uk> [2005-05-18 09:04]:
>
>>The timer assertion might be the old scheduling timer, which probably
>>gets reset but not deleted beforehand... And the on-runqueue assertion
>>suggests that you are 'stealing' the domain from the scheduler's queues
>>without giving it a chance to notice.
>
> Looking at both bvt and sedf, the runqueue is ordered by some metric or
> another (evt and deadline, respectively). What I think we need is a way
> to swap positions in the runqueues. That is, if the lock holder is
> runnable, I want the holder to run instead of current. Is there some
> way to do this in a scheduler-independent manner with the current set
> of scheduler ops defined in sched-if.h?

How about blocking/pausing the currently running domain? I can't think
of another way of doing this in a scheduler-independent fashion...

> I noticed that neither bvt nor sedf implements the rem_task function,
> which I thought could be used to help out with the 'stealing' by
> notifying the schedulers that prev was going away (removing it from the
> runqueue), but just removing the exec_domain from the runqueue didn't
> help.

That is really nasty, and is exactly what I meant by "stealing" a domain
from the scheduler! :-)

> I'm including a patch that I'm currently using so you can get a better
> idea of the modifications to schedule.c I'm making.

Thanks,
Stephan
Stephan Diestelhorst
2005-May-19 13:25 UTC
Re: [Xen-devel] scheduler independent forced vcpu selection
Ryan Harper wrote:
> * Stephan Diestelhorst <sd386@cl.cam.ac.uk> [2005-05-18 09:04]:
>
>>>I'm working on a new hypercall, do_confer, which allows the directed
>>>yielding of a vcpu to another vcpu. It is mainly used when a vcpu fails
>>>to acquire a spinlock, yielding to the lock holder instead of spinning.
>>>I ported the ppc64 spinlock implementation to the i386 Linux portion.
>>>In implementing the hypercall, I've been trying to figure out how to get
>>>the scheduler (I've only played with bvt) to run the vcpu passed in the
>>>hypercall (after some validation), but I've run into various bad-state
>>>situations (a do_softirq pending != 0 assert, '!active_ac_timer(timer)'
>>>failed, and __task_on_runqueue(prev) failed), which tells me I don't
>>>fully understand all of the book-keeping that is needed. Has anyone
>>>thought about how to do this with either BVT or the new EDF scheduler?
>
> After some thought, domain_wake(), followed by
> raise_softirq(SCHEDULE_SOFTIRQ), does what I want and removes the huge
> mess I was making in __enter_scheduler().

Are you waking up the domain that holds the lock? Then you would rely on
the scheduler to give the woken domain a high "priority" (whatever this
means for the current scheduler), and it should start that domain
immediately, right?

Best,
Stephan
Ryan Harper
2005-May-19 14:55 UTC
Re: [Xen-devel] scheduler independent forced vcpu selection
* Stephan Diestelhorst <sd386@cl.cam.ac.uk> [2005-05-19 09:04]:
> Ryan Harper wrote:
> > * Stephan Diestelhorst <sd386@cl.cam.ac.uk> [2005-05-18 09:04]:
> >
> >>>I'm working on a new hypercall, do_confer, which allows the directed
> >>>yielding of a vcpu to another vcpu. It is mainly used when a vcpu fails
> >>>to acquire a spinlock, yielding to the lock holder instead of spinning.
> >>>I ported the ppc64 spinlock implementation to the i386 Linux portion.
> >>>In implementing the hypercall, I've been trying to figure out how to get
> >>>the scheduler (I've only played with bvt) to run the vcpu passed in the
> >>>hypercall (after some validation), but I've run into various bad-state
> >>>situations (a do_softirq pending != 0 assert, '!active_ac_timer(timer)'
> >>>failed, and __task_on_runqueue(prev) failed), which tells me I don't
> >>>fully understand all of the book-keeping that is needed. Has anyone
> >>>thought about how to do this with either BVT or the new EDF scheduler?
> >
> > After some thought, domain_wake(), followed by
> > raise_softirq(SCHEDULE_SOFTIRQ), does what I want and removes the huge
> > mess I was making in __enter_scheduler().
>
> Are you waking up the domain that holds the lock?

Yes, that is the idea.

> Then you would rely on the scheduler to give the woken domain a high
> "priority" (whatever this means for the current scheduler), and it
> should start that domain immediately, right?

Yes, that is part of what is required. I need to do two things after
validation of do_confer:

  1) Wake the lock-holder vcpu.
  2) Schedule the lock-holder to run only for the remaining time-slice of
     the currently running vcpu.

Using domain_wake() and the softirq, I'm only getting (1), but I have no
guarantee when the lock-holder is actually woken up. Any thoughts on how
to get (2)?

--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com
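For reference, the earlier __enter_scheduler() patch approximated (2) by
reusing the already-armed slice timer:

/* From the earlier patch: bound the conferred run by whatever is left of
 * prev's slice (schedule_data[cpu].s_timer was armed for the end of the
 * current slice, so expires - now is the remainder). */
r_time = schedule_data[cpu].s_timer.expires - now;
next   = holder;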
Ryan Harper
2005-May-19 15:05 UTC
Re: [Xen-devel] scheduler independent forced vcpu selection
* Stephan Diestelhorst <sd386@cl.cam.ac.uk> [2005-05-19 09:04]:
> Ryan Harper wrote:
> > * Stephan Diestelhorst <sd386@cl.cam.ac.uk> [2005-05-18 09:04]:
> >
> >>>I'm working on a new hypercall, do_confer, which allows the directed
> >>>yielding of a vcpu to another vcpu. It is mainly used when a vcpu fails
> >>>to acquire a spinlock, yielding to the lock holder instead of spinning.
> >>>I ported the ppc64 spinlock implementation to the i386 Linux portion.
> >>>In implementing the hypercall, I've been trying to figure out how to get
> >>>the scheduler (I've only played with bvt) to run the vcpu passed in the
> >>>hypercall (after some validation), but I've run into various bad-state
> >>>situations (a do_softirq pending != 0 assert, '!active_ac_timer(timer)'
> >>>failed, and __task_on_runqueue(prev) failed), which tells me I don't
> >>>fully understand all of the book-keeping that is needed. Has anyone
> >>>thought about how to do this with either BVT or the new EDF scheduler?
> >
> > After some thought, domain_wake(), followed by
> > raise_softirq(SCHEDULE_SOFTIRQ), does what I want and removes the huge
> > mess I was making in __enter_scheduler().
>
> Are you waking up the domain that holds the lock? Then you would rely on
> the scheduler to give the woken domain a high "priority" (whatever this
> means for the current scheduler), and it should start that domain
> immediately, right?

I noticed your comments in sched_sedf.c about domain waking:

 * 3. Unconservative (i.e. incorrect)
 *    -to boost the performance of I/O dependent domains it would be possible
 *     to put the domain into the runnable queue immediately, and let it run
 *     for the remainder of the slice of the current period
 *     (or even worse: allocate a new full slice for the domain)
 *    -either behaviour can lead to missed deadlines in other domains as
 *     opposed to approaches 1,2a,2b

Giving the remainder of the current slice to the domain we are waking
*sounds* like what I wanted, but you are concerned that it causes missed
deadlines. Could you elaborate on when we would have such a case? If we
are only running in the remaining timeslice (which would expire before
the next deadline), then why would such behaviour lead to missing
deadlines?

--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
(512) 838-9253   T/L: 678-9253
ryanh@us.ibm.com