Hui Lv
2011-Dec-19 22:13 UTC
[PATCH] xen/sched_credit: Use delay to control scheduling frequency
1. Basically, the "delay method" can achieve 11% overall performance boost for SPECvirt than original credit scheduler. 2. We have tried 1ms delay and 10ms delay, there is no big difference between these two configurations. (1ms is enough to achieve a good performance) 3. We have compared different load level response time/latency (low, high, peak), "delay method" didn''t bring very much response time increase. 4. 1ms delay can reduce 30% context switch at peak performance, where produces the benefits. (.int sched_ratelimit_us = 1000. is the recommended setting) Signed-off-by: Hui Lv <hui.lv@intel.com> Signed-off-by: George Dunlap <George.Dunlap@eu.citrix.com> diff -r a4bff36780a3 -r 114495d1ff94 xen/common/sched_credit.c --- a/xen/common/sched_credit.c Fri Dec 16 18:46:27 2011 +0000 +++ b/xen/common/sched_credit.c Mon Dec 19 15:58:50 2011 -0500 @@ -110,6 +110,10 @@ boolean_param("sched_credit_default_yiel static int __read_mostly sched_credit_tslice_ms = CSCHED_DEFAULT_TSLICE_MS; integer_param("sched_credit_tslice_ms", sched_credit_tslice_ms); +/* Scheduler generic parameters +*/ +extern int sched_ratelimit_us; + /* * Physical CPU */ @@ -1297,10 +1301,15 @@ csched_schedule( struct csched_private *prv = CSCHED_PRIV(ops); struct csched_vcpu *snext; struct task_slice ret; + s_time_t runtime, tslice; CSCHED_STAT_CRANK(schedule); CSCHED_VCPU_CHECK(current); + runtime = now - current->runstate.state_entry_time; + if ( runtime < 0 ) /* Does this ever happen? */ + runtime = 0; + if ( !is_idle_vcpu(scurr->vcpu) ) { /* Update credits of a non-idle VCPU. */ @@ -1313,6 +1322,41 @@ csched_schedule( scurr->pri = CSCHED_PRI_IDLE; } + /* Choices, choices: + * - If we have a tasklet, we need to run the idle vcpu no matter what. + * - If sched rate limiting is in effect, and the current vcpu has + * run for less than that amount of time, continue the current one, + * but with a shorter timeslice and return it immediately + * - Otherwise, chose the one with the highest priority (which may + * be the one currently running) + * - If the currently running one is TS_OVER, see if there + * is a higher priority one waiting on the runqueue of another + * cpu and steal it. + */ + + /* If we have schedule rate limiting enabled, check to see + * how long we''ve run for. */ + if ( !tasklet_work_scheduled + && sched_ratelimit_us + && vcpu_runnable(current) + && !is_idle_vcpu(current) + && runtime < MICROSECS(sched_ratelimit_us) ) + { + snext = scurr; + snext->start_time += now; + perfc_incr(delay_ms); + tslice = MICROSECS(sched_ratelimit_us); + ret.migrated = 0; + goto out; + } + else + { + /* + * Select next runnable local VCPU (ie top of local runq) + */ + tslice = MILLISECS(prv->tslice_ms); + } + /* * Select next runnable local VCPU (ie top of local runq) */ @@ -1367,11 +1411,12 @@ csched_schedule( if ( !is_idle_vcpu(snext->vcpu) ) snext->start_time += now; +out: /* * Return task to run next... */ ret.time = (is_idle_vcpu(snext->vcpu) ? - -1 : MILLISECS(prv->tslice_ms)); + -1 : tslice); ret.task = snext->vcpu; CSCHED_VCPU_CHECK(ret.task); diff -r a4bff36780a3 -r 114495d1ff94 xen/common/schedule.c --- a/xen/common/schedule.c Fri Dec 16 18:46:27 2011 +0000 +++ b/xen/common/schedule.c Mon Dec 19 15:58:50 2011 -0500 @@ -47,6 +47,12 @@ string_param("sched", opt_sched); bool_t sched_smt_power_savings = 0; boolean_param("sched_smt_power_savings", sched_smt_power_savings); +/* Default scheduling rate limit: 1ms + * It''s not recommended to set this value bigger than "sched_credit_tslice_ms" + * otherwise, something weired may happen + * */ +int sched_ratelimit_us = 1000; +integer_param("sched_ratelimit_us", sched_ratelimit_us); /* Various timer handlers. */ static void s_timer_fn(void *unused); static void vcpu_periodic_timer_fn(void *data); diff -r a4bff36780a3 -r 114495d1ff94 xen/include/xen/perfc_defn.h --- a/xen/include/xen/perfc_defn.h Fri Dec 16 18:46:27 2011 +0000 +++ b/xen/include/xen/perfc_defn.h Mon Dec 19 15:58:50 2011 -0500 @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq, "sch PERFCOUNTER(sched_run, "sched: runs through scheduler") PERFCOUNTER(sched_ctx, "sched: context switches") +PERFCOUNTER(delay_ms, "csched: delay") PERFCOUNTER(vcpu_check, "csched: vcpu_check") PERFCOUNTER(schedule, "csched: schedule") PERFCOUNTER(acct_run, "csched: acct_run")
Ian Campbell
2011-Dec-20 09:07 UTC
Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
On Mon, 2011-12-19 at 22:13 +0000, Hui Lv wrote:> 1. Basically, the "delay method" can achieve 11% overall performance > boost for SPECvirt than original credit scheduler.Is it possible to publish the raw numbers?> 2. We have tried 1ms delay and 10ms delay, there is no big difference > between these two configurations. (1ms is enough to achieve a good > performance) > 3. We have compared different load level response time/latency (low, > high, peak), "delay method" didn''t bring very much response time > increase. > 4. 1ms delay can reduce 30% context switch at peak performance, where > produces the benefits. (.int sched_ratelimit_us = 1000. is the > recommended setting) > > Signed-off-by: Hui Lv <hui.lv@intel.com> > Signed-off-by: George Dunlap <George.Dunlap@eu.citrix.com> > > diff -r a4bff36780a3 -r 114495d1ff94 xen/common/sched_credit.c > --- a/xen/common/sched_credit.c Fri Dec 16 18:46:27 2011 +0000 > +++ b/xen/common/sched_credit.c Mon Dec 19 15:58:50 2011 -0500 > @@ -110,6 +110,10 @@ boolean_param("sched_credit_default_yiel > static int __read_mostly sched_credit_tslice_ms = CSCHED_DEFAULT_TSLICE_MS; > integer_param("sched_credit_tslice_ms", sched_credit_tslice_ms); > > +/* Scheduler generic parameters > +*/ > +extern int sched_ratelimit_us;If this is generic then it seems like xen/sched.h is the right place for it. Is it really generic though if only the credit scheduler applies it?> + > /* > * Physical CPU > */ > @@ -1297,10 +1301,15 @@ csched_schedule( > struct csched_private *prv = CSCHED_PRIV(ops); > struct csched_vcpu *snext; > struct task_slice ret; > + s_time_t runtime, tslice; > > CSCHED_STAT_CRANK(schedule); > CSCHED_VCPU_CHECK(current); > > + runtime = now - current->runstate.state_entry_time; > + if ( runtime < 0 ) /* Does this ever happen? */ > + runtime = 0; > + > if ( !is_idle_vcpu(scurr->vcpu) ) > { > /* Update credits of a non-idle VCPU. */ > @@ -1313,6 +1322,41 @@ csched_schedule( > scurr->pri = CSCHED_PRI_IDLE; > } > > + /* Choices, choices: > + * - If we have a tasklet, we need to run the idle vcpu no matter what. > + * - If sched rate limiting is in effect, and the current vcpu has > + * run for less than that amount of time, continue the current one, > + * but with a shorter timeslice and return it immediately > + * - Otherwise, chose the one with the highest priority (which may > + * be the one currently running) > + * - If the currently running one is TS_OVER, see if there > + * is a higher priority one waiting on the runqueue of another > + * cpu and steal it. > + */ > + > + /* If we have schedule rate limiting enabled, check to see > + * how long we''ve run for. */ > + if ( !tasklet_work_scheduled > + && sched_ratelimit_us > + && vcpu_runnable(current) > + && !is_idle_vcpu(current) > + && runtime < MICROSECS(sched_ratelimit_us) ) > + { > + snext = scurr; > + snext->start_time += now; > + perfc_incr(delay_ms); > + tslice = MICROSECS(sched_ratelimit_us); > + ret.migrated = 0; > + goto out; > + } > + else > + {This is the same comment as the next block below, does it really apply here too? Since the previous if block ends with a goto the else clause is a bit redundant.> + /* > + * Select next runnable local VCPU (ie top of local runq) > + */ > + tslice = MILLISECS(prv->tslice_ms); > + } > + > /* > * Select next runnable local VCPU (ie top of local runq) > */ > @@ -1367,11 +1411,12 @@ csched_schedule( > if ( !is_idle_vcpu(snext->vcpu) ) > snext->start_time += now; > > +out: > /* > * Return task to run next... > */ > ret.time = (is_idle_vcpu(snext->vcpu) ? > - -1 : MILLISECS(prv->tslice_ms)); > + -1 : tslice); > ret.task = snext->vcpu; > > CSCHED_VCPU_CHECK(ret.task); > diff -r a4bff36780a3 -r 114495d1ff94 xen/common/schedule.c > --- a/xen/common/schedule.c Fri Dec 16 18:46:27 2011 +0000 > +++ b/xen/common/schedule.c Mon Dec 19 15:58:50 2011 -0500 > @@ -47,6 +47,12 @@ string_param("sched", opt_sched); > bool_t sched_smt_power_savings = 0; > boolean_param("sched_smt_power_savings", sched_smt_power_savings); > > +/* Default scheduling rate limit: 1ms > + * It''s not recommended to set this value bigger than "sched_credit_tslice_ms" > + * otherwise, something weired may happenweird It that constraint enforced anywhere?> + * */ > +int sched_ratelimit_us = 1000; > +integer_param("sched_ratelimit_us", sched_ratelimit_us); > /* Various timer handlers. */ > static void s_timer_fn(void *unused); > static void vcpu_periodic_timer_fn(void *data); > diff -r a4bff36780a3 -r 114495d1ff94 xen/include/xen/perfc_defn.h > --- a/xen/include/xen/perfc_defn.h Fri Dec 16 18:46:27 2011 +0000 > +++ b/xen/include/xen/perfc_defn.h Mon Dec 19 15:58:50 2011 -0500 > @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq, "sch > PERFCOUNTER(sched_run, "sched: runs through scheduler") > PERFCOUNTER(sched_ctx, "sched: context switches") > > +PERFCOUNTER(delay_ms, "csched: delay") > PERFCOUNTER(vcpu_check, "csched: vcpu_check") > PERFCOUNTER(schedule, "csched: schedule") > PERFCOUNTER(acct_run, "csched: acct_run") > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel
Lv, Hui
2011-Dec-20 09:44 UTC
Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
> Is it possible to publish the raw numbers?Yes. Actually, I have posted the raw data before. You can check the attachment for the details.> If this is generic then it seems like xen/sched.h is the right place for it. > > Is it really generic though if only the credit scheduler applies it?We have discussed the "generic" issue in previous mails with George, Dario Faggioli and some other people. The conclusion is that currently, it is applied to credit scheduler. In future, it may be developed into a generic solution.> This is the same comment as the next block below, does it really apply here > too? > > Since the previous if block ends with a goto the else clause is a bit redundant.Thanks, I'll remove the redundant comment and else clause.> It that constraint enforced anywhere?Currently, it's not constraint enforced. I think George have gave the explanation in yesterday's discussion. Best regards, Lv, Hui _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
George Dunlap
2011-Dec-20 10:21 UTC
Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
On Tue, 2011-12-20 at 09:07 +0000, Ian Campbell wrote:> > > +/* Scheduler generic parameters > > +*/ > > +extern int sched_ratelimit_us; > > If this is generic then it seems like xen/sched.h is the right place for > it.Yes, this should be declared in xen/include/xen/sched.h.> Is it really generic though if only the credit scheduler applies it?The concept is generic enough that I think saying, "This is the control, not all schedulers implement it" makes sense. credit2 *may* in the future have use for such a control; at which point it would be a bit asinine to have two switches, named "credit_sched_ratelimit_us" and "credit2_sched_ratelimit_us".> > + /* If we have schedule rate limiting enabled, check to see > > + * how long we''ve run for. */ > > + if ( !tasklet_work_scheduled > > + && sched_ratelimit_us > > + && vcpu_runnable(current) > > + && !is_idle_vcpu(current) > > + && runtime < MICROSECS(sched_ratelimit_us) ) > > + { > > + snext = scurr; > > + snext->start_time += now; > > + perfc_incr(delay_ms); > > + tslice = MICROSECS(sched_ratelimit_us); > > + ret.migrated = 0; > > + goto out; > > + } > > + else > > + { > > This is the same comment as the next block below, does it really apply > here too? > > Since the previous if block ends with a goto the else clause is a bit > redundant.Yeah, probably better w/o the else.> > +/* Default scheduling rate limit: 1ms > > + * It''s not recommended to set this value bigger than "sched_credit_tslice_ms" > > + * otherwise, something weired may happenHui, "something weird may happen" is fine for informal speech, but for comments we tend to use more formal language. I might say something like this: "The behavior when sched_ratelimit_us is greater than sched_credit_tslice_ms is undefined." -George
George Dunlap
2011-Dec-20 10:26 UTC
Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
On Tue, 2011-12-20 at 10:09 +0000, Dario Faggioli wrote:> On Tue, 2011-12-20 at 17:44 +0800, Lv, Hui wrote: > > > It that constraint enforced anywhere? > > > > Currently, it''s not constraint enforced. > > I think George have gave the explanation in yesterday''s discussion. > > > I think that too, but maybe you can print some kind of warning when it > happens?I guess the consensus is that we should put in some effort to make the interface more polished. How about this: First, add a ratelimit_us element to csched_priv, just like the current tslice_ms element. When the scheduler comes up (in csched_init), it checks to see if MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms); if so, it prints a warning, and sets prv->ratelimit_us to 1000*prv->tslice_ms. In the future, when we implement the domctls to change a scheduler''s ratelimit_us, and tslice_ms, we disallow changes which would violate the "ratelimit < tslice" rule (returning -EINVAL or something like that). Thoughts? -George
Lv, Hui
2011-Dec-20 12:14 UTC
Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
> How about this: > > First, add a ratelimit_us element to csched_priv, just like the current tslice_ms > element. > > When the scheduler comes up (in csched_init), it checks to see if > MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms); if so, it > prints a warning, and sets prv->ratelimit_us to 1000*prv->tslice_ms. >I think it makes sense to do so.
Hui Lv
2011-Dec-26 08:46 UTC
[PATCH] xen/sched_credit: Use delay to control scheduling frequency
Some modifications for this patch. 1.Based on George''s proposal, added a ratelimit_us element to csched_priv to constrain prv->ratelimit_us <= 1000* prv->tslice_ms in csched_init. 2.Move the definition of sched_ratelimit_us to xen/sched.h 3.Remove the redundant "else" structure See if you have any comment for such changes. This patch can improve Xen performance: 1. Basically, the "delay method" can achieve 11% overall performance boost for SPECvirt than original credit scheduler. 2. We have tried 1ms delay and 10ms delay, there is no big difference between these two configurations. (1ms is enough to achieve a good performance) 3. We have compared different load level response time/latency (low, high, peak), "delay method" didn''t bring very much response time increase. 4. 1ms delay can reduce 30% context switch at peak performance, where produces the benefits. (int sched_ratelimit_us = 1000 is the recommended setting) Signed-off-by: Hui Lv <hui.lv@intel.com> Signed-off-by: George Dunlap <George.Dunlap@eu.citrix.com> diff -r a4bff36780a3 -r 18f40e1d0274 xen/common/sched_credit.c --- a/xen/common/sched_credit.c Fri Dec 16 18:46:27 2011 +0000 +++ b/xen/common/sched_credit.c Mon Dec 26 03:44:39 2011 -0500 @@ -172,6 +172,7 @@ struct csched_private { uint32_t credit; int credit_balance; uint32_t runq_sort; + unsigned ratelimit_us; /* Period of master and tick in milliseconds */ unsigned tslice_ms, tick_period_us, ticks_per_tslice; unsigned credits_per_tslice; @@ -1297,10 +1298,15 @@ csched_schedule( struct csched_private *prv = CSCHED_PRIV(ops); struct csched_vcpu *snext; struct task_slice ret; + s_time_t runtime, tslice; CSCHED_STAT_CRANK(schedule); CSCHED_VCPU_CHECK(current); + runtime = now - current->runstate.state_entry_time; + if ( runtime < 0 ) /* Does this ever happen? */ + runtime = 0; + if ( !is_idle_vcpu(scurr->vcpu) ) { /* Update credits of a non-idle VCPU. */ @@ -1313,6 +1319,35 @@ csched_schedule( scurr->pri = CSCHED_PRI_IDLE; } + /* Choices, choices: + * - If we have a tasklet, we need to run the idle vcpu no matter what. + * - If sched rate limiting is in effect, and the current vcpu has + * run for less than that amount of time, continue the current one, + * but with a shorter timeslice and return it immediately + * - Otherwise, chose the one with the highest priority (which may + * be the one currently running) + * - If the currently running one is TS_OVER, see if there + * is a higher priority one waiting on the runqueue of another + * cpu and steal it. + */ + + /* If we have schedule rate limiting enabled, check to see + * how long we''ve run for. */ + if ( !tasklet_work_scheduled + && prv->ratelimit_us + && vcpu_runnable(current) + && !is_idle_vcpu(current) + && runtime < MICROSECS(prv->ratelimit_us) ) + { + snext = scurr; + snext->start_time += now; + perfc_incr(delay_ms); + tslice = MICROSECS(prv->ratelimit_us); + ret.migrated = 0; + goto out; + } + tslice = MILLISECS(prv->tslice_ms); + /* * Select next runnable local VCPU (ie top of local runq) */ @@ -1367,11 +1402,12 @@ csched_schedule( if ( !is_idle_vcpu(snext->vcpu) ) snext->start_time += now; +out: /* * Return task to run next... */ ret.time = (is_idle_vcpu(snext->vcpu) ? - -1 : MILLISECS(prv->tslice_ms)); + -1 : tslice); ret.task = snext->vcpu; CSCHED_VCPU_CHECK(ret.task); @@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops) prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice; prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms; + if ( MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms) ) + { + printk("WARNING: sched_ratelimit_us >" + "sched_credit_tslice_ms is undefined\n" + "ratelimit_us is set to 1000 * tslice_ms forcely\n"); + prv->ratelimit_us = 1000 * prv->tslice_ms; + } + else + prv->ratelimit_us = sched_ratelimit_us; return 0; } diff -r a4bff36780a3 -r 18f40e1d0274 xen/common/schedule.c --- a/xen/common/schedule.c Fri Dec 16 18:46:27 2011 +0000 +++ b/xen/common/schedule.c Mon Dec 26 03:44:39 2011 -0500 @@ -47,6 +47,11 @@ string_param("sched", opt_sched); bool_t sched_smt_power_savings = 0; boolean_param("sched_smt_power_savings", sched_smt_power_savings); +/* Default scheduling rate limit: 1ms + * The behavior when sched_ratelimit_us is greater than sched_credit_tslice_ms is undefined + * */ +int sched_ratelimit_us = 1000; +integer_param("sched_ratelimit_us", sched_ratelimit_us); /* Various timer handlers. */ static void s_timer_fn(void *unused); static void vcpu_periodic_timer_fn(void *data); diff -r a4bff36780a3 -r 18f40e1d0274 xen/include/xen/perfc_defn.h --- a/xen/include/xen/perfc_defn.h Fri Dec 16 18:46:27 2011 +0000 +++ b/xen/include/xen/perfc_defn.h Mon Dec 26 03:44:39 2011 -0500 @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq, "sch PERFCOUNTER(sched_run, "sched: runs through scheduler") PERFCOUNTER(sched_ctx, "sched: context switches") +PERFCOUNTER(delay_ms, "csched: delay") PERFCOUNTER(vcpu_check, "csched: vcpu_check") PERFCOUNTER(schedule, "csched: schedule") PERFCOUNTER(acct_run, "csched: acct_run") diff -r a4bff36780a3 -r 18f40e1d0274 xen/include/xen/sched-if.h --- a/xen/include/xen/sched-if.h Fri Dec 16 18:46:27 2011 +0000 +++ b/xen/include/xen/sched-if.h Mon Dec 26 03:44:39 2011 -0500 @@ -16,6 +16,11 @@ extern struct cpupool *cpupool0; /* cpus currently in no cpupool */ extern cpumask_t cpupool_free_cpus; +/* Scheduler generic parameters + * */ +extern int sched_ratelimit_us; + + /* * In order to allow a scheduler to remap the lock->cpu mapping, * we have a per-cpu pointer, along with a pre-allocated set of
Jan Beulich
2012-Jan-02 08:19 UTC
Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
>>> On 26.12.11 at 09:46, Hui Lv <hui.lv@intel.com> wrote: > Some modifications for this patch. > 1.Based on George''s proposal, added a ratelimit_us element to csched_priv to > constrain prv->ratelimit_us <= 1000* prv->tslice_ms in csched_init.I do not see why you need a per-scheduler-instance variable for issuing the warning and correcting the value - one warning (during the first initialization) is entirely sufficient. (Otoh the already existing tslice_ms field is pointless currently too, as it never gets changed after being initialized from sched_credit_tslice_ms - George?) Jan> 2.Move the definition of sched_ratelimit_us to xen/sched.h > 3.Remove the redundant "else" structure > > See if you have any comment for such changes. > > > This patch can improve Xen performance: > 1. Basically, the "delay method" can achieve 11% overall performance boost > for SPECvirt than original credit scheduler. > 2. We have tried 1ms delay and 10ms delay, there is no big difference > between these two configurations. (1ms is enough to achieve a good > performance) > 3. We have compared different load level response time/latency (low, high, > peak), "delay method" didn''t bring very much response time increase. > 4. 1ms delay can reduce 30% context switch at peak performance, where > produces the benefits. (int sched_ratelimit_us = 1000 is the recommended > setting) > > Signed-off-by: Hui Lv <hui.lv@intel.com> > Signed-off-by: George Dunlap <George.Dunlap@eu.citrix.com> > > diff -r a4bff36780a3 -r 18f40e1d0274 xen/common/sched_credit.c > --- a/xen/common/sched_credit.c Fri Dec 16 18:46:27 2011 +0000 > +++ b/xen/common/sched_credit.c Mon Dec 26 03:44:39 2011 -0500 > @@ -172,6 +172,7 @@ struct csched_private { > uint32_t credit; > int credit_balance; > uint32_t runq_sort; > + unsigned ratelimit_us; > /* Period of master and tick in milliseconds */ > unsigned tslice_ms, tick_period_us, ticks_per_tslice; > unsigned credits_per_tslice; > @@ -1297,10 +1298,15 @@ csched_schedule( > struct csched_private *prv = CSCHED_PRIV(ops); > struct csched_vcpu *snext; > struct task_slice ret; > + s_time_t runtime, tslice; > > CSCHED_STAT_CRANK(schedule); > CSCHED_VCPU_CHECK(current); > > + runtime = now - current->runstate.state_entry_time; > + if ( runtime < 0 ) /* Does this ever happen? */ > + runtime = 0; > + > if ( !is_idle_vcpu(scurr->vcpu) ) > { > /* Update credits of a non-idle VCPU. */ > @@ -1313,6 +1319,35 @@ csched_schedule( > scurr->pri = CSCHED_PRI_IDLE; > } > > + /* Choices, choices: > + * - If we have a tasklet, we need to run the idle vcpu no matter what. > + * - If sched rate limiting is in effect, and the current vcpu has > + * run for less than that amount of time, continue the current one, > + * but with a shorter timeslice and return it immediately > + * - Otherwise, chose the one with the highest priority (which may > + * be the one currently running) > + * - If the currently running one is TS_OVER, see if there > + * is a higher priority one waiting on the runqueue of another > + * cpu and steal it. > + */ > + > + /* If we have schedule rate limiting enabled, check to see > + * how long we''ve run for. */ > + if ( !tasklet_work_scheduled > + && prv->ratelimit_us > + && vcpu_runnable(current) > + && !is_idle_vcpu(current) > + && runtime < MICROSECS(prv->ratelimit_us) ) > + { > + snext = scurr; > + snext->start_time += now; > + perfc_incr(delay_ms); > + tslice = MICROSECS(prv->ratelimit_us); > + ret.migrated = 0; > + goto out; > + } > + tslice = MILLISECS(prv->tslice_ms); > + > /* > * Select next runnable local VCPU (ie top of local runq) > */ > @@ -1367,11 +1402,12 @@ csched_schedule( > if ( !is_idle_vcpu(snext->vcpu) ) > snext->start_time += now; > > +out: > /* > * Return task to run next... > */ > ret.time = (is_idle_vcpu(snext->vcpu) ? > - -1 : MILLISECS(prv->tslice_ms)); > + -1 : tslice); > ret.task = snext->vcpu; > > CSCHED_VCPU_CHECK(ret.task); > @@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops) > prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice; > prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms; > > + if ( MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms) ) > + { > + printk("WARNING: sched_ratelimit_us >" > + "sched_credit_tslice_ms is undefined\n" > + "ratelimit_us is set to 1000 * tslice_ms forcely\n"); > + prv->ratelimit_us = 1000 * prv->tslice_ms; > + } > + else > + prv->ratelimit_us = sched_ratelimit_us; > return 0; > } > > diff -r a4bff36780a3 -r 18f40e1d0274 xen/common/schedule.c > --- a/xen/common/schedule.c Fri Dec 16 18:46:27 2011 +0000 > +++ b/xen/common/schedule.c Mon Dec 26 03:44:39 2011 -0500 > @@ -47,6 +47,11 @@ string_param("sched", opt_sched); > bool_t sched_smt_power_savings = 0; > boolean_param("sched_smt_power_savings", sched_smt_power_savings); > > +/* Default scheduling rate limit: 1ms > + * The behavior when sched_ratelimit_us is greater than > sched_credit_tslice_ms is undefined > + * */ > +int sched_ratelimit_us = 1000; > +integer_param("sched_ratelimit_us", sched_ratelimit_us); > /* Various timer handlers. */ > static void s_timer_fn(void *unused); > static void vcpu_periodic_timer_fn(void *data); > diff -r a4bff36780a3 -r 18f40e1d0274 xen/include/xen/perfc_defn.h > --- a/xen/include/xen/perfc_defn.h Fri Dec 16 18:46:27 2011 +0000 > +++ b/xen/include/xen/perfc_defn.h Mon Dec 26 03:44:39 2011 -0500 > @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq, "sch > PERFCOUNTER(sched_run, "sched: runs through scheduler") > PERFCOUNTER(sched_ctx, "sched: context switches") > > +PERFCOUNTER(delay_ms, "csched: delay") > PERFCOUNTER(vcpu_check, "csched: vcpu_check") > PERFCOUNTER(schedule, "csched: schedule") > PERFCOUNTER(acct_run, "csched: acct_run") > diff -r a4bff36780a3 -r 18f40e1d0274 xen/include/xen/sched-if.h > --- a/xen/include/xen/sched-if.h Fri Dec 16 18:46:27 2011 +0000 > +++ b/xen/include/xen/sched-if.h Mon Dec 26 03:44:39 2011 -0500 > @@ -16,6 +16,11 @@ extern struct cpupool *cpupool0; > /* cpus currently in no cpupool */ > extern cpumask_t cpupool_free_cpus; > > +/* Scheduler generic parameters > + * */ > +extern int sched_ratelimit_us; > + > + > /* > * In order to allow a scheduler to remap the lock->cpu mapping, > * we have a per-cpu pointer, along with a pre-allocated set of
Lv, Hui
2012-Jan-05 04:08 UTC
Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
George, Any comments for this? Is it the right "warning" way?> -----Original Message----- > From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: Monday, January 02, 2012 4:19 PM > To: Lv, Hui > Cc: Ian.Campbell@citrix.com; George.Dunlap@eu.citrix.com; > raistlin@linux.it; xen-devel@lists.xensource.com > Subject: Re: [PATCH] xen/sched_credit: Use delay to control scheduling > frequency > > >>> On 26.12.11 at 09:46, Hui Lv <hui.lv@intel.com> wrote: > > Some modifications for this patch. > > 1.Based on George''s proposal, added a ratelimit_us element to > csched_priv to > > constrain prv->ratelimit_us <= 1000* prv->tslice_ms in csched_init. > > I do not see why you need a per-scheduler-instance variable for > issuing the warning and correcting the value - one warning (during > the first initialization) is entirely sufficient. (Otoh the already > existing > tslice_ms field is pointless currently too, as it never gets changed > after being initialized from sched_credit_tslice_ms - George?) > > Jan > > > 2.Move the definition of sched_ratelimit_us to xen/sched.h > > 3.Remove the redundant "else" structure > > > > See if you have any comment for such changes. > > > > > > This patch can improve Xen performance: > > 1. Basically, the "delay method" can achieve 11% overall performance > boost > > for SPECvirt than original credit scheduler. > > 2. We have tried 1ms delay and 10ms delay, there is no big difference > > between these two configurations. (1ms is enough to achieve a good > > performance) > > 3. We have compared different load level response time/latency (low, > high, > > peak), "delay method" didn''t bring very much response time increase. > > 4. 1ms delay can reduce 30% context switch at peak performance, where > > produces the benefits. (int sched_ratelimit_us = 1000 is the > recommended > > setting) > > > > Signed-off-by: Hui Lv <hui.lv@intel.com> > > Signed-off-by: George Dunlap <George.Dunlap@eu.citrix.com> > > > > diff -r a4bff36780a3 -r 18f40e1d0274 xen/common/sched_credit.c > > --- a/xen/common/sched_credit.c Fri Dec 16 18:46:27 2011 +0000 > > +++ b/xen/common/sched_credit.c Mon Dec 26 03:44:39 2011 -0500 > > @@ -172,6 +172,7 @@ struct csched_private { > > uint32_t credit; > > int credit_balance; > > uint32_t runq_sort; > > + unsigned ratelimit_us; > > /* Period of master and tick in milliseconds */ > > unsigned tslice_ms, tick_period_us, ticks_per_tslice; > > unsigned credits_per_tslice; > > @@ -1297,10 +1298,15 @@ csched_schedule( > > struct csched_private *prv = CSCHED_PRIV(ops); > > struct csched_vcpu *snext; > > struct task_slice ret; > > + s_time_t runtime, tslice; > > > > CSCHED_STAT_CRANK(schedule); > > CSCHED_VCPU_CHECK(current); > > > > + runtime = now - current->runstate.state_entry_time; > > + if ( runtime < 0 ) /* Does this ever happen? */ > > + runtime = 0; > > + > > if ( !is_idle_vcpu(scurr->vcpu) ) > > { > > /* Update credits of a non-idle VCPU. */ > > @@ -1313,6 +1319,35 @@ csched_schedule( > > scurr->pri = CSCHED_PRI_IDLE; > > } > > > > + /* Choices, choices: > > + * - If we have a tasklet, we need to run the idle vcpu no > matter what. > > + * - If sched rate limiting is in effect, and the current vcpu > has > > + * run for less than that amount of time, continue the current > one, > > + * but with a shorter timeslice and return it immediately > > + * - Otherwise, chose the one with the highest priority (which > may > > + * be the one currently running) > > + * - If the currently running one is TS_OVER, see if there > > + * is a higher priority one waiting on the runqueue of another > > + * cpu and steal it. > > + */ > > + > > + /* If we have schedule rate limiting enabled, check to see > > + * how long we''ve run for. */ > > + if ( !tasklet_work_scheduled > > + && prv->ratelimit_us > > + && vcpu_runnable(current) > > + && !is_idle_vcpu(current) > > + && runtime < MICROSECS(prv->ratelimit_us) ) > > + { > > + snext = scurr; > > + snext->start_time += now; > > + perfc_incr(delay_ms); > > + tslice = MICROSECS(prv->ratelimit_us); > > + ret.migrated = 0; > > + goto out; > > + } > > + tslice = MILLISECS(prv->tslice_ms); > > + > > /* > > * Select next runnable local VCPU (ie top of local runq) > > */ > > @@ -1367,11 +1402,12 @@ csched_schedule( > > if ( !is_idle_vcpu(snext->vcpu) ) > > snext->start_time += now; > > > > +out: > > /* > > * Return task to run next... > > */ > > ret.time = (is_idle_vcpu(snext->vcpu) ? > > - -1 : MILLISECS(prv->tslice_ms)); > > + -1 : tslice); > > ret.task = snext->vcpu; > > > > CSCHED_VCPU_CHECK(ret.task); > > @@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops) > > prv->tick_period_us = prv->tslice_ms * 1000 / prv- > >ticks_per_tslice; > > prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv- > >tslice_ms; > > > > + if ( MICROSECS(sched_ratelimit_us) > > MILLISECS(sched_credit_tslice_ms) ) > > + { > > + printk("WARNING: sched_ratelimit_us >" > > + "sched_credit_tslice_ms is undefined\n" > > + "ratelimit_us is set to 1000 * tslice_ms forcely\n"); > > + prv->ratelimit_us = 1000 * prv->tslice_ms; > > + } > > + else > > + prv->ratelimit_us = sched_ratelimit_us; > > return 0; > > } > > > > diff -r a4bff36780a3 -r 18f40e1d0274 xen/common/schedule.c > > --- a/xen/common/schedule.c Fri Dec 16 18:46:27 2011 +0000 > > +++ b/xen/common/schedule.c Mon Dec 26 03:44:39 2011 -0500 > > @@ -47,6 +47,11 @@ string_param("sched", opt_sched); > > bool_t sched_smt_power_savings = 0; > > boolean_param("sched_smt_power_savings", sched_smt_power_savings); > > > > +/* Default scheduling rate limit: 1ms > > + * The behavior when sched_ratelimit_us is greater than > > sched_credit_tslice_ms is undefined > > + * */ > > +int sched_ratelimit_us = 1000; > > +integer_param("sched_ratelimit_us", sched_ratelimit_us); > > /* Various timer handlers. */ > > static void s_timer_fn(void *unused); > > static void vcpu_periodic_timer_fn(void *data); > > diff -r a4bff36780a3 -r 18f40e1d0274 xen/include/xen/perfc_defn.h > > --- a/xen/include/xen/perfc_defn.h Fri Dec 16 18:46:27 2011 > +0000 > > +++ b/xen/include/xen/perfc_defn.h Mon Dec 26 03:44:39 2011 - > 0500 > > @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq, "sch > > PERFCOUNTER(sched_run, "sched: runs through scheduler") > > PERFCOUNTER(sched_ctx, "sched: context switches") > > > > +PERFCOUNTER(delay_ms, "csched: delay") > > PERFCOUNTER(vcpu_check, "csched: vcpu_check") > > PERFCOUNTER(schedule, "csched: schedule") > > PERFCOUNTER(acct_run, "csched: acct_run") > > diff -r a4bff36780a3 -r 18f40e1d0274 xen/include/xen/sched-if.h > > --- a/xen/include/xen/sched-if.h Fri Dec 16 18:46:27 2011 +0000 > > +++ b/xen/include/xen/sched-if.h Mon Dec 26 03:44:39 2011 -0500 > > @@ -16,6 +16,11 @@ extern struct cpupool *cpupool0; > > /* cpus currently in no cpupool */ > > extern cpumask_t cpupool_free_cpus; > > > > +/* Scheduler generic parameters > > + * */ > > +extern int sched_ratelimit_us; > > + > > + > > /* > > * In order to allow a scheduler to remap the lock->cpu mapping, > > * we have a per-cpu pointer, along with a pre-allocated set of >
George Dunlap
2012-Jan-06 19:50 UTC
Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
On 02/01/12 03:19, Jan Beulich wrote:>>>> On 26.12.11 at 09:46, Hui Lv<hui.lv@intel.com> wrote: >> Some modifications for this patch. >> 1.Based on George''s proposal, added a ratelimit_us element to csched_priv to >> constrain prv->ratelimit_us<= 1000* prv->tslice_ms in csched_init. > I do not see why you need a per-scheduler-instance variable for > issuing the warning and correcting the value - one warning (during > the first initialization) is entirely sufficient. (Otoh the already existing > tslice_ms field is pointless currently too, as it never gets changed > after being initialized from sched_credit_tslice_ms - George?)Before the 4.2 release, I plan to add SYSCTL_scheduler_op calls for the credit scheduler for both the timeslice and ratelimiting. But I don''t think it''s necessary for this patch to be admitted. -George
George Dunlap
2012-Jan-06 19:56 UTC
Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
Sorry for the delay; just catching up after the Christmas holidays. On 26/12/11 03:46, Hui Lv wrote:> @@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops) > prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice; > prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms; > > + if ( MICROSECS(sched_ratelimit_us)> MILLISECS(sched_credit_tslice_ms) ) > + { > + printk("WARNING: sched_ratelimit_us>" > + "sched_credit_tslice_ms is undefined\n" > + "ratelimit_us is set to 1000 * tslice_ms forcely\n")The standard idiom for this kind of message would be: WARNING [what''s wrong] [What you''re doing about it] So the last sentence of the warning should be: Setting ratelimit_us to 1000 * tslice_ms (Grammatically, you could say "Forcing ratelimit..." but I think "force" is too strong in this case.) Other than that, I''m happy with it, if everyone else is: Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Lv, Hui
2012-Jan-08 12:03 UTC
Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
Thanks, George. Should I send a revised version? Can it be checked in? -----Original Message----- From: George Dunlap [mailto:george.dunlap@eu.citrix.com] Sent: Saturday, January 07, 2012 3:57 AM To: Lv, Hui Cc: xen-devel@lists.xensource.com; raistlin@linux.it; JBeulich@suse.com; Ian Campbell Subject: Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency Sorry for the delay; just catching up after the Christmas holidays. On 26/12/11 03:46, Hui Lv wrote:> @@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops) > prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice; > prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms; > > + if ( MICROSECS(sched_ratelimit_us)> MILLISECS(sched_credit_tslice_ms) ) > + { > + printk("WARNING: sched_ratelimit_us>" > + "sched_credit_tslice_ms is undefined\n" > + "ratelimit_us is set to 1000 * tslice_ms forcely\n")The standard idiom for this kind of message would be: WARNING [what''s wrong] [What you''re doing about it] So the last sentence of the warning should be: Setting ratelimit_us to 1000 * tslice_ms (Grammatically, you could say "Forcing ratelimit..." but I think "force" is too strong in this case.) Other than that, I''m happy with it, if everyone else is: Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
George Dunlap
2012-Jan-09 09:49 UTC
Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
On Sun, 2012-01-08 at 12:03 +0000, Lv, Hui wrote:> Thanks, George. > Should I send a revised version? > Can it be checked in?Yes, please send a revised version. It will be checked in if one of the maintainers thinks there''s enough consensus (probably if no one has any more comments in the next day or two). Thanks, -George> > -----Original Message----- > From: George Dunlap [mailto:george.dunlap@eu.citrix.com] > Sent: Saturday, January 07, 2012 3:57 AM > To: Lv, Hui > Cc: xen-devel@lists.xensource.com; raistlin@linux.it; JBeulich@suse.com; Ian Campbell > Subject: Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency > > Sorry for the delay; just catching up after the Christmas holidays. > > On 26/12/11 03:46, Hui Lv wrote: > > @@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops) > > prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice; > > prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms; > > > > + if ( MICROSECS(sched_ratelimit_us)> MILLISECS(sched_credit_tslice_ms) ) > > + { > > + printk("WARNING: sched_ratelimit_us>" > > + "sched_credit_tslice_ms is undefined\n" > > + "ratelimit_us is set to 1000 * tslice_ms forcely\n") > The standard idiom for this kind of message would be: > WARNING [what''s wrong] > [What you''re doing about it] > > So the last sentence of the warning should be: > Setting ratelimit_us to 1000 * tslice_ms > > (Grammatically, you could say "Forcing ratelimit..." but I think "force" > is too strong in this case.) > > Other than that, I''m happy with it, if everyone else is: > > Acked-by: George Dunlap <george.dunlap@eu.citrix.com> >
Hui Lv
2012-Jan-09 10:22 UTC
[PATCH] xen/sched_credit: Use delay to control scheduling frequency
Updated the warning sentence for ratelimit_us. This patch can improve Xen performance: 1. Basically, the "delay method" can achieve 11% overall performance boost for SPECvirt than original credit scheduler. 2. We have tried 1ms delay and 10ms delay, there is no big difference between these two configurations. (1ms is enough to achieve a good performance) 3. We have compared different load level response time/latency (low, high, peak), "delay method" didn''t bring very much response time increase. 4. 1ms delay can reduce 30% context switch at peak performance, where produces the benefits. (int sched_ratelimit_us = 1000 is the recommended setting) Signed-off-by: Hui Lv <hui.lv@intel.com> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/sched_credit.c --- a/xen/common/sched_credit.c Fri Dec 16 18:46:27 2011 +0000 +++ b/xen/common/sched_credit.c Mon Jan 09 05:21:35 2012 -0500 @@ -172,6 +172,7 @@ struct csched_private { uint32_t credit; int credit_balance; uint32_t runq_sort; + unsigned ratelimit_us; /* Period of master and tick in milliseconds */ unsigned tslice_ms, tick_period_us, ticks_per_tslice; unsigned credits_per_tslice; @@ -1297,10 +1298,15 @@ csched_schedule( struct csched_private *prv = CSCHED_PRIV(ops); struct csched_vcpu *snext; struct task_slice ret; + s_time_t runtime, tslice; CSCHED_STAT_CRANK(schedule); CSCHED_VCPU_CHECK(current); + runtime = now - current->runstate.state_entry_time; + if ( runtime < 0 ) /* Does this ever happen? */ + runtime = 0; + if ( !is_idle_vcpu(scurr->vcpu) ) { /* Update credits of a non-idle VCPU. */ @@ -1313,6 +1319,35 @@ csched_schedule( scurr->pri = CSCHED_PRI_IDLE; } + /* Choices, choices: + * - If we have a tasklet, we need to run the idle vcpu no matter what. + * - If sched rate limiting is in effect, and the current vcpu has + * run for less than that amount of time, continue the current one, + * but with a shorter timeslice and return it immediately + * - Otherwise, chose the one with the highest priority (which may + * be the one currently running) + * - If the currently running one is TS_OVER, see if there + * is a higher priority one waiting on the runqueue of another + * cpu and steal it. + */ + + /* If we have schedule rate limiting enabled, check to see + * how long we''ve run for. */ + if ( !tasklet_work_scheduled + && prv->ratelimit_us + && vcpu_runnable(current) + && !is_idle_vcpu(current) + && runtime < MICROSECS(prv->ratelimit_us) ) + { + snext = scurr; + snext->start_time += now; + perfc_incr(delay_ms); + tslice = MICROSECS(prv->ratelimit_us); + ret.migrated = 0; + goto out; + } + tslice = MILLISECS(prv->tslice_ms); + /* * Select next runnable local VCPU (ie top of local runq) */ @@ -1367,11 +1402,12 @@ csched_schedule( if ( !is_idle_vcpu(snext->vcpu) ) snext->start_time += now; +out: /* * Return task to run next... */ ret.time = (is_idle_vcpu(snext->vcpu) ? - -1 : MILLISECS(prv->tslice_ms)); + -1 : tslice); ret.task = snext->vcpu; CSCHED_VCPU_CHECK(ret.task); @@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops) prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice; prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms; + if ( MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms) ) + { + printk("WARNING: sched_ratelimit_us >" + "sched_credit_tslice_ms is undefined\n" + "Setting ratelimit_us to 1000 * tslice_ms\n"); + prv->ratelimit_us = 1000 * prv->tslice_ms; + } + else + prv->ratelimit_us = sched_ratelimit_us; return 0; } diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/schedule.c --- a/xen/common/schedule.c Fri Dec 16 18:46:27 2011 +0000 +++ b/xen/common/schedule.c Mon Jan 09 05:21:35 2012 -0500 @@ -47,6 +47,11 @@ string_param("sched", opt_sched); bool_t sched_smt_power_savings = 0; boolean_param("sched_smt_power_savings", sched_smt_power_savings); +/* Default scheduling rate limit: 1ms + * The behavior when sched_ratelimit_us is greater than sched_credit_tslice_ms is undefined + * */ +int sched_ratelimit_us = 1000; +integer_param("sched_ratelimit_us", sched_ratelimit_us); /* Various timer handlers. */ static void s_timer_fn(void *unused); static void vcpu_periodic_timer_fn(void *data); diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/perfc_defn.h --- a/xen/include/xen/perfc_defn.h Fri Dec 16 18:46:27 2011 +0000 +++ b/xen/include/xen/perfc_defn.h Mon Jan 09 05:21:35 2012 -0500 @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq, "sch PERFCOUNTER(sched_run, "sched: runs through scheduler") PERFCOUNTER(sched_ctx, "sched: context switches") +PERFCOUNTER(delay_ms, "csched: delay") PERFCOUNTER(vcpu_check, "csched: vcpu_check") PERFCOUNTER(schedule, "csched: schedule") PERFCOUNTER(acct_run, "csched: acct_run") diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/sched-if.h --- a/xen/include/xen/sched-if.h Fri Dec 16 18:46:27 2011 +0000 +++ b/xen/include/xen/sched-if.h Mon Jan 09 05:21:35 2012 -0500 @@ -16,6 +16,11 @@ extern struct cpupool *cpupool0; /* cpus currently in no cpupool */ extern cpumask_t cpupool_free_cpus; +/* Scheduler generic parameters + * */ +extern int sched_ratelimit_us; + + /* * In order to allow a scheduler to remap the lock->cpu mapping, * we have a per-cpu pointer, along with a pre-allocated set of
George Dunlap
2012-Jan-10 10:05 UTC
Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
On Mon, Jan 9, 2012 at 10:22 AM, Hui Lv <hui.lv@intel.com> wrote:> Updated the warning sentence for ratelimit_us. > > This patch can improve Xen performance: > 1. Basically, the "delay method" can achieve 11% overall performance boost for SPECvirt than original credit scheduler. > 2. We have tried 1ms delay and 10ms delay, there is no big difference between these two configurations. (1ms is enough to achieve a good performance) > 3. We have compared different load level response time/latency (low, high, peak), "delay method" didn''t bring very much response time increase. > 4. 1ms delay can reduce 30% context switch at peak performance, where produces the benefits. (int sched_ratelimit_us = 1000 is the recommended setting) > > Signed-off-by: Hui Lv <hui.lv@intel.com> > Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> > > Acked-by: George Dunlap <george.dunlap@eu.citrix.com>Just confirming this: Acked-by: George Dunlap <george.dunlap@eu.citrix.com> Thanks, Hui. -George> > diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/sched_credit.c > --- a/xen/common/sched_credit.c Fri Dec 16 18:46:27 2011 +0000 > +++ b/xen/common/sched_credit.c Mon Jan 09 05:21:35 2012 -0500 > @@ -172,6 +172,7 @@ struct csched_private { > uint32_t credit; > int credit_balance; > uint32_t runq_sort; > + unsigned ratelimit_us; > /* Period of master and tick in milliseconds */ > unsigned tslice_ms, tick_period_us, ticks_per_tslice; > unsigned credits_per_tslice; > @@ -1297,10 +1298,15 @@ csched_schedule( > struct csched_private *prv = CSCHED_PRIV(ops); > struct csched_vcpu *snext; > struct task_slice ret; > + s_time_t runtime, tslice; > > CSCHED_STAT_CRANK(schedule); > CSCHED_VCPU_CHECK(current); > > + runtime = now - current->runstate.state_entry_time; > + if ( runtime < 0 ) /* Does this ever happen? */ > + runtime = 0; > + > if ( !is_idle_vcpu(scurr->vcpu) ) > { > /* Update credits of a non-idle VCPU. */ > @@ -1313,6 +1319,35 @@ csched_schedule( > scurr->pri = CSCHED_PRI_IDLE; > } > > + /* Choices, choices: > + * - If we have a tasklet, we need to run the idle vcpu no matter what. > + * - If sched rate limiting is in effect, and the current vcpu has > + * run for less than that amount of time, continue the current one, > + * but with a shorter timeslice and return it immediately > + * - Otherwise, chose the one with the highest priority (which may > + * be the one currently running) > + * - If the currently running one is TS_OVER, see if there > + * is a higher priority one waiting on the runqueue of another > + * cpu and steal it. > + */ > + > + /* If we have schedule rate limiting enabled, check to see > + * how long we''ve run for. */ > + if ( !tasklet_work_scheduled > + && prv->ratelimit_us > + && vcpu_runnable(current) > + && !is_idle_vcpu(current) > + && runtime < MICROSECS(prv->ratelimit_us) ) > + { > + snext = scurr; > + snext->start_time += now; > + perfc_incr(delay_ms); > + tslice = MICROSECS(prv->ratelimit_us); > + ret.migrated = 0; > + goto out; > + } > + tslice = MILLISECS(prv->tslice_ms); > + > /* > * Select next runnable local VCPU (ie top of local runq) > */ > @@ -1367,11 +1402,12 @@ csched_schedule( > if ( !is_idle_vcpu(snext->vcpu) ) > snext->start_time += now; > > +out: > /* > * Return task to run next... > */ > ret.time = (is_idle_vcpu(snext->vcpu) ? > - -1 : MILLISECS(prv->tslice_ms)); > + -1 : tslice); > ret.task = snext->vcpu; > > CSCHED_VCPU_CHECK(ret.task); > @@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops) > prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice; > prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms; > > + if ( MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms) ) > + { > + printk("WARNING: sched_ratelimit_us >" > + "sched_credit_tslice_ms is undefined\n" > + "Setting ratelimit_us to 1000 * tslice_ms\n"); > + prv->ratelimit_us = 1000 * prv->tslice_ms; > + } > + else > + prv->ratelimit_us = sched_ratelimit_us; > return 0; > } > > diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/schedule.c > --- a/xen/common/schedule.c Fri Dec 16 18:46:27 2011 +0000 > +++ b/xen/common/schedule.c Mon Jan 09 05:21:35 2012 -0500 > @@ -47,6 +47,11 @@ string_param("sched", opt_sched); > bool_t sched_smt_power_savings = 0; > boolean_param("sched_smt_power_savings", sched_smt_power_savings); > > +/* Default scheduling rate limit: 1ms > + * The behavior when sched_ratelimit_us is greater than sched_credit_tslice_ms is undefined > + * */ > +int sched_ratelimit_us = 1000; > +integer_param("sched_ratelimit_us", sched_ratelimit_us); > /* Various timer handlers. */ > static void s_timer_fn(void *unused); > static void vcpu_periodic_timer_fn(void *data); > diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/perfc_defn.h > --- a/xen/include/xen/perfc_defn.h Fri Dec 16 18:46:27 2011 +0000 > +++ b/xen/include/xen/perfc_defn.h Mon Jan 09 05:21:35 2012 -0500 > @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq, "sch > PERFCOUNTER(sched_run, "sched: runs through scheduler") > PERFCOUNTER(sched_ctx, "sched: context switches") > > +PERFCOUNTER(delay_ms, "csched: delay") > PERFCOUNTER(vcpu_check, "csched: vcpu_check") > PERFCOUNTER(schedule, "csched: schedule") > PERFCOUNTER(acct_run, "csched: acct_run") > diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/sched-if.h > --- a/xen/include/xen/sched-if.h Fri Dec 16 18:46:27 2011 +0000 > +++ b/xen/include/xen/sched-if.h Mon Jan 09 05:21:35 2012 -0500 > @@ -16,6 +16,11 @@ extern struct cpupool *cpupool0; > /* cpus currently in no cpupool */ > extern cpumask_t cpupool_free_cpus; > > +/* Scheduler generic parameters > + * */ > +extern int sched_ratelimit_us; > + > + > /* > * In order to allow a scheduler to remap the lock->cpu mapping, > * we have a per-cpu pointer, along with a pre-allocated set of > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel
George Dunlap
2012-Jan-24 12:01 UTC
Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
Ping? On Tue, Jan 10, 2012 at 10:05 AM, George Dunlap <George.Dunlap@eu.citrix.com> wrote:> On Mon, Jan 9, 2012 at 10:22 AM, Hui Lv <hui.lv@intel.com> wrote: >> Updated the warning sentence for ratelimit_us. >> >> This patch can improve Xen performance: >> 1. Basically, the "delay method" can achieve 11% overall performance boost for SPECvirt than original credit scheduler. >> 2. We have tried 1ms delay and 10ms delay, there is no big difference between these two configurations. (1ms is enough to achieve a good performance) >> 3. We have compared different load level response time/latency (low, high, peak), "delay method" didn''t bring very much response time increase. >> 4. 1ms delay can reduce 30% context switch at peak performance, where produces the benefits. (int sched_ratelimit_us = 1000 is the recommended setting) >> >> Signed-off-by: Hui Lv <hui.lv@intel.com> >> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> >> >> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> > > Just confirming this: > Acked-by: George Dunlap <george.dunlap@eu.citrix.com> > > Thanks, Hui. > -George > >> >> diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/sched_credit.c >> --- a/xen/common/sched_credit.c Fri Dec 16 18:46:27 2011 +0000 >> +++ b/xen/common/sched_credit.c Mon Jan 09 05:21:35 2012 -0500 >> @@ -172,6 +172,7 @@ struct csched_private { >> uint32_t credit; >> int credit_balance; >> uint32_t runq_sort; >> + unsigned ratelimit_us; >> /* Period of master and tick in milliseconds */ >> unsigned tslice_ms, tick_period_us, ticks_per_tslice; >> unsigned credits_per_tslice; >> @@ -1297,10 +1298,15 @@ csched_schedule( >> struct csched_private *prv = CSCHED_PRIV(ops); >> struct csched_vcpu *snext; >> struct task_slice ret; >> + s_time_t runtime, tslice; >> >> CSCHED_STAT_CRANK(schedule); >> CSCHED_VCPU_CHECK(current); >> >> + runtime = now - current->runstate.state_entry_time; >> + if ( runtime < 0 ) /* Does this ever happen? */ >> + runtime = 0; >> + >> if ( !is_idle_vcpu(scurr->vcpu) ) >> { >> /* Update credits of a non-idle VCPU. */ >> @@ -1313,6 +1319,35 @@ csched_schedule( >> scurr->pri = CSCHED_PRI_IDLE; >> } >> >> + /* Choices, choices: >> + * - If we have a tasklet, we need to run the idle vcpu no matter what. >> + * - If sched rate limiting is in effect, and the current vcpu has >> + * run for less than that amount of time, continue the current one, >> + * but with a shorter timeslice and return it immediately >> + * - Otherwise, chose the one with the highest priority (which may >> + * be the one currently running) >> + * - If the currently running one is TS_OVER, see if there >> + * is a higher priority one waiting on the runqueue of another >> + * cpu and steal it. >> + */ >> + >> + /* If we have schedule rate limiting enabled, check to see >> + * how long we''ve run for. */ >> + if ( !tasklet_work_scheduled >> + && prv->ratelimit_us >> + && vcpu_runnable(current) >> + && !is_idle_vcpu(current) >> + && runtime < MICROSECS(prv->ratelimit_us) ) >> + { >> + snext = scurr; >> + snext->start_time += now; >> + perfc_incr(delay_ms); >> + tslice = MICROSECS(prv->ratelimit_us); >> + ret.migrated = 0; >> + goto out; >> + } >> + tslice = MILLISECS(prv->tslice_ms); >> + >> /* >> * Select next runnable local VCPU (ie top of local runq) >> */ >> @@ -1367,11 +1402,12 @@ csched_schedule( >> if ( !is_idle_vcpu(snext->vcpu) ) >> snext->start_time += now; >> >> +out: >> /* >> * Return task to run next... >> */ >> ret.time = (is_idle_vcpu(snext->vcpu) ? >> - -1 : MILLISECS(prv->tslice_ms)); >> + -1 : tslice); >> ret.task = snext->vcpu; >> >> CSCHED_VCPU_CHECK(ret.task); >> @@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops) >> prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice; >> prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms; >> >> + if ( MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms) ) >> + { >> + printk("WARNING: sched_ratelimit_us >" >> + "sched_credit_tslice_ms is undefined\n" >> + "Setting ratelimit_us to 1000 * tslice_ms\n"); >> + prv->ratelimit_us = 1000 * prv->tslice_ms; >> + } >> + else >> + prv->ratelimit_us = sched_ratelimit_us; >> return 0; >> } >> >> diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/schedule.c >> --- a/xen/common/schedule.c Fri Dec 16 18:46:27 2011 +0000 >> +++ b/xen/common/schedule.c Mon Jan 09 05:21:35 2012 -0500 >> @@ -47,6 +47,11 @@ string_param("sched", opt_sched); >> bool_t sched_smt_power_savings = 0; >> boolean_param("sched_smt_power_savings", sched_smt_power_savings); >> >> +/* Default scheduling rate limit: 1ms >> + * The behavior when sched_ratelimit_us is greater than sched_credit_tslice_ms is undefined >> + * */ >> +int sched_ratelimit_us = 1000; >> +integer_param("sched_ratelimit_us", sched_ratelimit_us); >> /* Various timer handlers. */ >> static void s_timer_fn(void *unused); >> static void vcpu_periodic_timer_fn(void *data); >> diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/perfc_defn.h >> --- a/xen/include/xen/perfc_defn.h Fri Dec 16 18:46:27 2011 +0000 >> +++ b/xen/include/xen/perfc_defn.h Mon Jan 09 05:21:35 2012 -0500 >> @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq, "sch >> PERFCOUNTER(sched_run, "sched: runs through scheduler") >> PERFCOUNTER(sched_ctx, "sched: context switches") >> >> +PERFCOUNTER(delay_ms, "csched: delay") >> PERFCOUNTER(vcpu_check, "csched: vcpu_check") >> PERFCOUNTER(schedule, "csched: schedule") >> PERFCOUNTER(acct_run, "csched: acct_run") >> diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/sched-if.h >> --- a/xen/include/xen/sched-if.h Fri Dec 16 18:46:27 2011 +0000 >> +++ b/xen/include/xen/sched-if.h Mon Jan 09 05:21:35 2012 -0500 >> @@ -16,6 +16,11 @@ extern struct cpupool *cpupool0; >> /* cpus currently in no cpupool */ >> extern cpumask_t cpupool_free_cpus; >> >> +/* Scheduler generic parameters >> + * */ >> +extern int sched_ratelimit_us; >> + >> + >> /* >> * In order to allow a scheduler to remap the lock->cpu mapping, >> * we have a per-cpu pointer, along with a pre-allocated set of >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xensource.com >> http://lists.xensource.com/xen-devel
Keir Fraser
2012-Jan-24 14:17 UTC
Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
On 24/01/2012 12:01, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:> Ping?It''s been in the tree for a week.> On Tue, Jan 10, 2012 at 10:05 AM, George Dunlap > <George.Dunlap@eu.citrix.com> wrote: >> On Mon, Jan 9, 2012 at 10:22 AM, Hui Lv <hui.lv@intel.com> wrote: >>> Updated the warning sentence for ratelimit_us. >>> >>> This patch can improve Xen performance: >>> 1. Basically, the "delay method" can achieve 11% overall performance boost >>> for SPECvirt than original credit scheduler. >>> 2. We have tried 1ms delay and 10ms delay, there is no big difference >>> between these two configurations. (1ms is enough to achieve a good >>> performance) >>> 3. We have compared different load level response time/latency (low, high, >>> peak), "delay method" didn''t bring very much response time increase. >>> 4. 1ms delay can reduce 30% context switch at peak performance, where >>> produces the benefits. (int sched_ratelimit_us = 1000 is the recommended >>> setting) >>> >>> Signed-off-by: Hui Lv <hui.lv@intel.com> >>> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> >>> >>> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> >> >> Just confirming this: >> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> >> >> Thanks, Hui. >> -George >> >>> >>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/sched_credit.c >>> --- a/xen/common/sched_credit.c Fri Dec 16 18:46:27 2011 +0000 >>> +++ b/xen/common/sched_credit.c Mon Jan 09 05:21:35 2012 -0500 >>> @@ -172,6 +172,7 @@ struct csched_private { >>> uint32_t credit; >>> int credit_balance; >>> uint32_t runq_sort; >>> + unsigned ratelimit_us; >>> /* Period of master and tick in milliseconds */ >>> unsigned tslice_ms, tick_period_us, ticks_per_tslice; >>> unsigned credits_per_tslice; >>> @@ -1297,10 +1298,15 @@ csched_schedule( >>> struct csched_private *prv = CSCHED_PRIV(ops); >>> struct csched_vcpu *snext; >>> struct task_slice ret; >>> + s_time_t runtime, tslice; >>> >>> CSCHED_STAT_CRANK(schedule); >>> CSCHED_VCPU_CHECK(current); >>> >>> + runtime = now - current->runstate.state_entry_time; >>> + if ( runtime < 0 ) /* Does this ever happen? */ >>> + runtime = 0; >>> + >>> if ( !is_idle_vcpu(scurr->vcpu) ) >>> { >>> /* Update credits of a non-idle VCPU. */ >>> @@ -1313,6 +1319,35 @@ csched_schedule( >>> scurr->pri = CSCHED_PRI_IDLE; >>> } >>> >>> + /* Choices, choices: >>> + * - If we have a tasklet, we need to run the idle vcpu no matter what. >>> + * - If sched rate limiting is in effect, and the current vcpu has >>> + * run for less than that amount of time, continue the current one, >>> + * but with a shorter timeslice and return it immediately >>> + * - Otherwise, chose the one with the highest priority (which may >>> + * be the one currently running) >>> + * - If the currently running one is TS_OVER, see if there >>> + * is a higher priority one waiting on the runqueue of another >>> + * cpu and steal it. >>> + */ >>> + >>> + /* If we have schedule rate limiting enabled, check to see >>> + * how long we''ve run for. */ >>> + if ( !tasklet_work_scheduled >>> + && prv->ratelimit_us >>> + && vcpu_runnable(current) >>> + && !is_idle_vcpu(current) >>> + && runtime < MICROSECS(prv->ratelimit_us) ) >>> + { >>> + snext = scurr; >>> + snext->start_time += now; >>> + perfc_incr(delay_ms); >>> + tslice = MICROSECS(prv->ratelimit_us); >>> + ret.migrated = 0; >>> + goto out; >>> + } >>> + tslice = MILLISECS(prv->tslice_ms); >>> + >>> /* >>> * Select next runnable local VCPU (ie top of local runq) >>> */ >>> @@ -1367,11 +1402,12 @@ csched_schedule( >>> if ( !is_idle_vcpu(snext->vcpu) ) >>> snext->start_time += now; >>> >>> +out: >>> /* >>> * Return task to run next... >>> */ >>> ret.time = (is_idle_vcpu(snext->vcpu) ? >>> - -1 : MILLISECS(prv->tslice_ms)); >>> + -1 : tslice); >>> ret.task = snext->vcpu; >>> >>> CSCHED_VCPU_CHECK(ret.task); >>> @@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops) >>> prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice; >>> prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms; >>> >>> + if ( MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms) >>> ) >>> + { >>> + printk("WARNING: sched_ratelimit_us >" >>> + "sched_credit_tslice_ms is undefined\n" >>> + "Setting ratelimit_us to 1000 * tslice_ms\n"); >>> + prv->ratelimit_us = 1000 * prv->tslice_ms; >>> + } >>> + else >>> + prv->ratelimit_us = sched_ratelimit_us; >>> return 0; >>> } >>> >>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/schedule.c >>> --- a/xen/common/schedule.c Fri Dec 16 18:46:27 2011 +0000 >>> +++ b/xen/common/schedule.c Mon Jan 09 05:21:35 2012 -0500 >>> @@ -47,6 +47,11 @@ string_param("sched", opt_sched); >>> bool_t sched_smt_power_savings = 0; >>> boolean_param("sched_smt_power_savings", sched_smt_power_savings); >>> >>> +/* Default scheduling rate limit: 1ms >>> + * The behavior when sched_ratelimit_us is greater than >>> sched_credit_tslice_ms is undefined >>> + * */ >>> +int sched_ratelimit_us = 1000; >>> +integer_param("sched_ratelimit_us", sched_ratelimit_us); >>> /* Various timer handlers. */ >>> static void s_timer_fn(void *unused); >>> static void vcpu_periodic_timer_fn(void *data); >>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/perfc_defn.h >>> --- a/xen/include/xen/perfc_defn.h Fri Dec 16 18:46:27 2011 +0000 >>> +++ b/xen/include/xen/perfc_defn.h Mon Jan 09 05:21:35 2012 -0500 >>> @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq, "sch >>> PERFCOUNTER(sched_run, "sched: runs through scheduler") >>> PERFCOUNTER(sched_ctx, "sched: context switches") >>> >>> +PERFCOUNTER(delay_ms, "csched: delay") >>> PERFCOUNTER(vcpu_check, "csched: vcpu_check") >>> PERFCOUNTER(schedule, "csched: schedule") >>> PERFCOUNTER(acct_run, "csched: acct_run") >>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/sched-if.h >>> --- a/xen/include/xen/sched-if.h Fri Dec 16 18:46:27 2011 +0000 >>> +++ b/xen/include/xen/sched-if.h Mon Jan 09 05:21:35 2012 -0500 >>> @@ -16,6 +16,11 @@ extern struct cpupool *cpupool0; >>> /* cpus currently in no cpupool */ >>> extern cpumask_t cpupool_free_cpus; >>> >>> +/* Scheduler generic parameters >>> + * */ >>> +extern int sched_ratelimit_us; >>> + >>> + >>> /* >>> * In order to allow a scheduler to remap the lock->cpu mapping, >>> * we have a per-cpu pointer, along with a pre-allocated set of >>> >>> _______________________________________________ >>> Xen-devel mailing list >>> Xen-devel@lists.xensource.com >>> http://lists.xensource.com/xen-devel
George Dunlap
2012-Jan-24 14:31 UTC
Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency
On Tue, 2012-01-24 at 14:17 +0000, Keir Fraser wrote:> On 24/01/2012 12:01, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote: > > > Ping? > > It''s been in the tree for a week.Sorry -- I was sure I looked for it and didn''t see it; but now I look again, there it is. :-/ -George> > > On Tue, Jan 10, 2012 at 10:05 AM, George Dunlap > > <George.Dunlap@eu.citrix.com> wrote: > >> On Mon, Jan 9, 2012 at 10:22 AM, Hui Lv <hui.lv@intel.com> wrote: > >>> Updated the warning sentence for ratelimit_us. > >>> > >>> This patch can improve Xen performance: > >>> 1. Basically, the "delay method" can achieve 11% overall performance boost > >>> for SPECvirt than original credit scheduler. > >>> 2. We have tried 1ms delay and 10ms delay, there is no big difference > >>> between these two configurations. (1ms is enough to achieve a good > >>> performance) > >>> 3. We have compared different load level response time/latency (low, high, > >>> peak), "delay method" didn''t bring very much response time increase. > >>> 4. 1ms delay can reduce 30% context switch at peak performance, where > >>> produces the benefits. (int sched_ratelimit_us = 1000 is the recommended > >>> setting) > >>> > >>> Signed-off-by: Hui Lv <hui.lv@intel.com> > >>> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com> > >>> > >>> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> > >> > >> Just confirming this: > >> Acked-by: George Dunlap <george.dunlap@eu.citrix.com> > >> > >> Thanks, Hui. > >> -George > >> > >>> > >>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/sched_credit.c > >>> --- a/xen/common/sched_credit.c Fri Dec 16 18:46:27 2011 +0000 > >>> +++ b/xen/common/sched_credit.c Mon Jan 09 05:21:35 2012 -0500 > >>> @@ -172,6 +172,7 @@ struct csched_private { > >>> uint32_t credit; > >>> int credit_balance; > >>> uint32_t runq_sort; > >>> + unsigned ratelimit_us; > >>> /* Period of master and tick in milliseconds */ > >>> unsigned tslice_ms, tick_period_us, ticks_per_tslice; > >>> unsigned credits_per_tslice; > >>> @@ -1297,10 +1298,15 @@ csched_schedule( > >>> struct csched_private *prv = CSCHED_PRIV(ops); > >>> struct csched_vcpu *snext; > >>> struct task_slice ret; > >>> + s_time_t runtime, tslice; > >>> > >>> CSCHED_STAT_CRANK(schedule); > >>> CSCHED_VCPU_CHECK(current); > >>> > >>> + runtime = now - current->runstate.state_entry_time; > >>> + if ( runtime < 0 ) /* Does this ever happen? */ > >>> + runtime = 0; > >>> + > >>> if ( !is_idle_vcpu(scurr->vcpu) ) > >>> { > >>> /* Update credits of a non-idle VCPU. */ > >>> @@ -1313,6 +1319,35 @@ csched_schedule( > >>> scurr->pri = CSCHED_PRI_IDLE; > >>> } > >>> > >>> + /* Choices, choices: > >>> + * - If we have a tasklet, we need to run the idle vcpu no matter what. > >>> + * - If sched rate limiting is in effect, and the current vcpu has > >>> + * run for less than that amount of time, continue the current one, > >>> + * but with a shorter timeslice and return it immediately > >>> + * - Otherwise, chose the one with the highest priority (which may > >>> + * be the one currently running) > >>> + * - If the currently running one is TS_OVER, see if there > >>> + * is a higher priority one waiting on the runqueue of another > >>> + * cpu and steal it. > >>> + */ > >>> + > >>> + /* If we have schedule rate limiting enabled, check to see > >>> + * how long we''ve run for. */ > >>> + if ( !tasklet_work_scheduled > >>> + && prv->ratelimit_us > >>> + && vcpu_runnable(current) > >>> + && !is_idle_vcpu(current) > >>> + && runtime < MICROSECS(prv->ratelimit_us) ) > >>> + { > >>> + snext = scurr; > >>> + snext->start_time += now; > >>> + perfc_incr(delay_ms); > >>> + tslice = MICROSECS(prv->ratelimit_us); > >>> + ret.migrated = 0; > >>> + goto out; > >>> + } > >>> + tslice = MILLISECS(prv->tslice_ms); > >>> + > >>> /* > >>> * Select next runnable local VCPU (ie top of local runq) > >>> */ > >>> @@ -1367,11 +1402,12 @@ csched_schedule( > >>> if ( !is_idle_vcpu(snext->vcpu) ) > >>> snext->start_time += now; > >>> > >>> +out: > >>> /* > >>> * Return task to run next... > >>> */ > >>> ret.time = (is_idle_vcpu(snext->vcpu) ? > >>> - -1 : MILLISECS(prv->tslice_ms)); > >>> + -1 : tslice); > >>> ret.task = snext->vcpu; > >>> > >>> CSCHED_VCPU_CHECK(ret.task); > >>> @@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops) > >>> prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice; > >>> prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms; > >>> > >>> + if ( MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms) > >>> ) > >>> + { > >>> + printk("WARNING: sched_ratelimit_us >" > >>> + "sched_credit_tslice_ms is undefined\n" > >>> + "Setting ratelimit_us to 1000 * tslice_ms\n"); > >>> + prv->ratelimit_us = 1000 * prv->tslice_ms; > >>> + } > >>> + else > >>> + prv->ratelimit_us = sched_ratelimit_us; > >>> return 0; > >>> } > >>> > >>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/schedule.c > >>> --- a/xen/common/schedule.c Fri Dec 16 18:46:27 2011 +0000 > >>> +++ b/xen/common/schedule.c Mon Jan 09 05:21:35 2012 -0500 > >>> @@ -47,6 +47,11 @@ string_param("sched", opt_sched); > >>> bool_t sched_smt_power_savings = 0; > >>> boolean_param("sched_smt_power_savings", sched_smt_power_savings); > >>> > >>> +/* Default scheduling rate limit: 1ms > >>> + * The behavior when sched_ratelimit_us is greater than > >>> sched_credit_tslice_ms is undefined > >>> + * */ > >>> +int sched_ratelimit_us = 1000; > >>> +integer_param("sched_ratelimit_us", sched_ratelimit_us); > >>> /* Various timer handlers. */ > >>> static void s_timer_fn(void *unused); > >>> static void vcpu_periodic_timer_fn(void *data); > >>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/perfc_defn.h > >>> --- a/xen/include/xen/perfc_defn.h Fri Dec 16 18:46:27 2011 +0000 > >>> +++ b/xen/include/xen/perfc_defn.h Mon Jan 09 05:21:35 2012 -0500 > >>> @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq, "sch > >>> PERFCOUNTER(sched_run, "sched: runs through scheduler") > >>> PERFCOUNTER(sched_ctx, "sched: context switches") > >>> > >>> +PERFCOUNTER(delay_ms, "csched: delay") > >>> PERFCOUNTER(vcpu_check, "csched: vcpu_check") > >>> PERFCOUNTER(schedule, "csched: schedule") > >>> PERFCOUNTER(acct_run, "csched: acct_run") > >>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/sched-if.h > >>> --- a/xen/include/xen/sched-if.h Fri Dec 16 18:46:27 2011 +0000 > >>> +++ b/xen/include/xen/sched-if.h Mon Jan 09 05:21:35 2012 -0500 > >>> @@ -16,6 +16,11 @@ extern struct cpupool *cpupool0; > >>> /* cpus currently in no cpupool */ > >>> extern cpumask_t cpupool_free_cpus; > >>> > >>> +/* Scheduler generic parameters > >>> + * */ > >>> +extern int sched_ratelimit_us; > >>> + > >>> + > >>> /* > >>> * In order to allow a scheduler to remap the lock->cpu mapping, > >>> * we have a per-cpu pointer, along with a pre-allocated set of > >>> > >>> _______________________________________________ > >>> Xen-devel mailing list > >>> Xen-devel@lists.xensource.com > >>> http://lists.xensource.com/xen-devel > >