Lv, Hui
2011-Dec-17 03:24 UTC
[PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
The delay method for credit scheduler can do as well as SRC patch (previous one) to gain significant performance boost without obvious drawbacks. 1. Basically, the "delay method" can achieve nearly the same benefits as my previous SRC patch, 11% overall performance boost for SPECvirt than original credit scheduler. 2. We have tried 1ms delay and 10ms delay, there is no big difference between these two configurations. (1ms is enough to achieve a good performance) 3. We have compared different load level response time/latency (low, high, peak), "delay method" didn''t bring very much response time increase. 4. 1ms delay can reduce 30% context switch at peak performance, where produces the benefits. ("int sched_ratelimit_us = 1000" is the recommended setting) Signed-off-by: Hui Lv <hui.lv@intel.com<mailto:hui.lv@intel.com>> Signed-off-by: George Dunlap <George.Dunlap@eu.citrix.com<mailto:George.Dunlap@eu.citrix.com>> diff -r 1c58bb664d8d xen/common/sched_credit.c --- a/xen/common/sched_credit.c Thu Dec 08 17:15:16 2011 +0000 +++ b/xen/common/sched_credit.c Fri Dec 16 15:08:09 2011 -0500 @@ -110,6 +110,9 @@ boolean_param("sched_credit_default_yiel static int __read_mostly sched_credit_tslice_ms = CSCHED_DEFAULT_TSLICE_MS; integer_param("sched_credit_tslice_ms", sched_credit_tslice_ms); +/* Scheduler generic parameters +*/ +extern int sched_ratelimit_us; /* * Physical CPU */ @@ -1297,10 +1300,15 @@ csched_schedule( struct csched_private *prv = CSCHED_PRIV(ops); struct csched_vcpu *snext; struct task_slice ret; + s_time_t runtime, tslice; CSCHED_STAT_CRANK(schedule); CSCHED_VCPU_CHECK(current); + runtime = now - current->runstate.state_entry_time; + if ( runtime < 0 ) /* Does this ever happen? */ + runtime = 0; + if ( !is_idle_vcpu(scurr->vcpu) ) { /* Update credits of a non-idle VCPU. */ @@ -1313,6 +1321,41 @@ csched_schedule( scurr->pri = CSCHED_PRI_IDLE; } + /* Choices, choices: + * - If we have a tasklet, we need to run the idle vcpu no matter what. + * - If sched rate limiting is in effect, and the current vcpu has + * run for less than that amount of time, continue the current one, + * but with a shorter timeslice and return it immediately + * - Otherwise, chose the one with the highest priority (which may + * be the one currently running) + * - If the currently running one is TS_OVER, see if there + * is a higher priority one waiting on the runqueue of another + * cpu and steal it. + */ + + /* If we have schedule rate limiting enabled, check to see + * how long we''ve run for. */ + if ( sched_ratelimit_us + && !tasklet_work_scheduled + && vcpu_runnable(current) + && !is_idle_vcpu(current) + && runtime < MICROSECS(sched_ratelimit_us) ) + { + snext = scurr; + snext->start_time += now; + perfc_incr(delay_ms); + tslice = MICROSECS(sched_ratelimit_us); + ret.migrated = 0; + goto out; + } + else + { + /* + * Select next runnable local VCPU (ie top of local runq) + */ + tslice = MILLISECS(prv->tslice_ms); + } + /* * Select next runnable local VCPU (ie top of local runq) */ @@ -1367,11 +1410,12 @@ csched_schedule( if ( !is_idle_vcpu(snext->vcpu) ) snext->start_time += now; +out: /* * Return task to run next... */ ret.time = (is_idle_vcpu(snext->vcpu) ? - -1 : MILLISECS(prv->tslice_ms)); + -1 : tslice); ret.task = snext->vcpu; CSCHED_VCPU_CHECK(ret.task); diff -r 1c58bb664d8d xen/common/schedule.c --- a/xen/common/schedule.c Thu Dec 08 17:15:16 2011 +0000 +++ b/xen/common/schedule.c Fri Dec 16 15:08:09 2011 -0500 @@ -47,6 +47,10 @@ string_param("sched", opt_sched); bool_t sched_smt_power_savings = 0; boolean_param("sched_smt_power_savings", sched_smt_power_savings); +/* Default scheduling rate limit: 1ms */ +int sched_ratelimit_us = 1000; +integer_param("sched_ratelimit_us", sched_ratelimit_us); + /* Various timer handlers. */ static void s_timer_fn(void *unused); static void vcpu_periodic_timer_fn(void *data); diff -r 1c58bb664d8d xen/include/xen/perfc_defn.h --- a/xen/include/xen/perfc_defn.h Thu Dec 08 17:15:16 2011 +0000 +++ b/xen/include/xen/perfc_defn.h Fri Dec 16 15:08:09 2011 -0500 @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq, "sch PERFCOUNTER(sched_run, "sched: runs through scheduler") PERFCOUNTER(sched_ctx, "sched: context switches") +PERFCOUNTER(delay_ms, "csched: delay") PERFCOUNTER(vcpu_check, "csched: vcpu_check") PERFCOUNTER(schedule, "csched: schedule") PERFCOUNTER(acct_run, "csched: acct_run") _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jan Beulich
2011-Dec-19 08:24 UTC
Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
>>> On 17.12.11 at 04:24, "Lv, Hui" <hui.lv@intel.com> wrote: > The delay method for credit scheduler can do as well as SRC patch (previous > one) to gain significant performance boost without obvious drawbacks. > > 1. Basically, the "delay method" can achieve nearly the same benefits as my > previous SRC patch, 11% overall performance boost for SPECvirt than original > credit scheduler. > 2. We have tried 1ms delay and 10ms delay, there is no big difference > between these two configurations. (1ms is enough to achieve a good > performance) > 3. We have compared different load level response time/latency (low, high, > peak), "delay method" didn''t bring very much response time increase. > 4. 1ms delay can reduce 30% context switch at peak performance, where > produces the benefits. ("int sched_ratelimit_us = 1000" is the recommended > setting) > > > Signed-off-by: Hui Lv <hui.lv@intel.com<mailto:hui.lv@intel.com>> > Signed-off-by: George Dunlap > <George.Dunlap@eu.citrix.com<mailto:George.Dunlap@eu.citrix.com>> > > diff -r 1c58bb664d8d xen/common/sched_credit.c > --- a/xen/common/sched_credit.c Thu Dec 08 17:15:16 2011 +0000 > +++ b/xen/common/sched_credit.c Fri Dec 16 15:08:09 2011 -0500 > @@ -110,6 +110,9 @@ boolean_param("sched_credit_default_yiel > static int __read_mostly sched_credit_tslice_ms = CSCHED_DEFAULT_TSLICE_MS; > integer_param("sched_credit_tslice_ms", sched_credit_tslice_ms); > > +/* Scheduler generic parameters > +*/ > +extern int sched_ratelimit_us; > /* > * Physical CPU > */ > @@ -1297,10 +1300,15 @@ csched_schedule( > struct csched_private *prv = CSCHED_PRIV(ops); > struct csched_vcpu *snext; > struct task_slice ret; > + s_time_t runtime, tslice; > > CSCHED_STAT_CRANK(schedule); > CSCHED_VCPU_CHECK(current); > > + runtime = now - current->runstate.state_entry_time; > + if ( runtime < 0 ) /* Does this ever happen? */ > + runtime = 0; > + > if ( !is_idle_vcpu(scurr->vcpu) ) > { > /* Update credits of a non-idle VCPU. */ > @@ -1313,6 +1321,41 @@ csched_schedule( > scurr->pri = CSCHED_PRI_IDLE; > } > > + /* Choices, choices: > + * - If we have a tasklet, we need to run the idle vcpu no matter what. > + * - If sched rate limiting is in effect, and the current vcpu has > + * run for less than that amount of time, continue the current one, > + * but with a shorter timeslice and return it immediately > + * - Otherwise, chose the one with the highest priority (which may > + * be the one currently running) > + * - If the currently running one is TS_OVER, see if there > + * is a higher priority one waiting on the runqueue of another > + * cpu and steal it. > + */ > + > + /* If we have schedule rate limiting enabled, check to see > + * how long we''ve run for. */ > + if ( sched_ratelimit_us > + && !tasklet_work_scheduledHow about making the checking order match the description above (which might also be slightly faster given that tasklet_work_scheduled is a function parameter [in a register or written recently], and [given the default value you''re picking] you expect sched_ratelimit_us to be non-zero generally)?> + && vcpu_runnable(current) > + && !is_idle_vcpu(current) > + && runtime < MICROSECS(sched_ratelimit_us) ) > + { > + snext = scurr; > + snext->start_time += now; > + perfc_incr(delay_ms); > + tslice = MICROSECS(sched_ratelimit_us);So if there happens to be a VM with MILLISECS(prv->tslice_ms) < MICROSECS(sched_ratelimit_us) it''d get *more* time than allowed/intended through this mechanism. Jan> + ret.migrated = 0; > + goto out; > + } > + else > + { > + /* > + * Select next runnable local VCPU (ie top of local runq) > + */ > + tslice = MILLISECS(prv->tslice_ms); > + } > + > /* > * Select next runnable local VCPU (ie top of local runq) > */ > @@ -1367,11 +1410,12 @@ csched_schedule( > if ( !is_idle_vcpu(snext->vcpu) ) > snext->start_time += now; > > +out: > /* > * Return task to run next... > */ > ret.time = (is_idle_vcpu(snext->vcpu) ? > - -1 : MILLISECS(prv->tslice_ms)); > + -1 : tslice); > ret.task = snext->vcpu; > > CSCHED_VCPU_CHECK(ret.task); > diff -r 1c58bb664d8d xen/common/schedule.c > --- a/xen/common/schedule.c Thu Dec 08 17:15:16 2011 +0000 > +++ b/xen/common/schedule.c Fri Dec 16 15:08:09 2011 -0500 > @@ -47,6 +47,10 @@ string_param("sched", opt_sched); > bool_t sched_smt_power_savings = 0; > boolean_param("sched_smt_power_savings", sched_smt_power_savings); > > +/* Default scheduling rate limit: 1ms */ > +int sched_ratelimit_us = 1000; > +integer_param("sched_ratelimit_us", sched_ratelimit_us); > + > /* Various timer handlers. */ > static void s_timer_fn(void *unused); > static void vcpu_periodic_timer_fn(void *data); > diff -r 1c58bb664d8d xen/include/xen/perfc_defn.h > --- a/xen/include/xen/perfc_defn.h Thu Dec 08 17:15:16 2011 +0000 > +++ b/xen/include/xen/perfc_defn.h Fri Dec 16 15:08:09 2011 -0500 > @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq, "sch > PERFCOUNTER(sched_run, "sched: runs through scheduler") > PERFCOUNTER(sched_ctx, "sched: context switches") > > +PERFCOUNTER(delay_ms, "csched: delay") > PERFCOUNTER(vcpu_check, "csched: vcpu_check") > PERFCOUNTER(schedule, "csched: schedule") > PERFCOUNTER(acct_run, "csched: acct_run")
George Dunlap
2011-Dec-19 10:54 UTC
Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
Hui, Unfortunately your mailer is mangling the patch: static int __read_mostly sched_credit_tslice_ms =3D CSCHED_DEFAULT_TSLICE_MS; Try using "hg email", or sending it as an attachment. -George On Sat, Dec 17, 2011 at 3:24 AM, Lv, Hui <hui.lv@intel.com> wrote:> The delay method for credit scheduler can do as well as SRC patch (previous > one) to gain significant performance boost without obvious drawbacks. > > 1. Basically, the "delay method" can achieve nearly the same benefits as my > previous SRC patch, 11% overall performance boost for SPECvirt than original > credit scheduler. > 2. We have tried 1ms delay and 10ms delay, there is no big difference > between these two configurations. (1ms is enough to achieve a good > performance) > 3. We have compared different load level response time/latency (low, high, > peak), "delay method" didn''t bring very much response time increase. > 4. 1ms delay can reduce 30% context switch at peak performance, where > produces the benefits. (“int sched_ratelimit_us = 1000” is the recommended > setting) > > > Signed-off-by: Hui Lv <hui.lv@intel.com> > Signed-off-by: George Dunlap <George.Dunlap@eu.citrix.com> > > diff -r 1c58bb664d8d xen/common/sched_credit.c > --- a/xen/common/sched_credit.c Thu Dec 08 17:15:16 2011 +0000 > +++ b/xen/common/sched_credit.c Fri Dec 16 15:08:09 2011 -0500 > @@ -110,6 +110,9 @@ boolean_param("sched_credit_default_yiel > static int __read_mostly sched_credit_tslice_ms = CSCHED_DEFAULT_TSLICE_MS; > integer_param("sched_credit_tslice_ms", sched_credit_tslice_ms); > > +/* Scheduler generic parameters > +*/ > +extern int sched_ratelimit_us; > /* > * Physical CPU > */ > @@ -1297,10 +1300,15 @@ csched_schedule( > struct csched_private *prv = CSCHED_PRIV(ops); > struct csched_vcpu *snext; > struct task_slice ret; > + s_time_t runtime, tslice; > > CSCHED_STAT_CRANK(schedule); > CSCHED_VCPU_CHECK(current); > > + runtime = now - current->runstate.state_entry_time; > + if ( runtime < 0 ) /* Does this ever happen? */ > + runtime = 0; > + > if ( !is_idle_vcpu(scurr->vcpu) ) > { > /* Update credits of a non-idle VCPU. */ > @@ -1313,6 +1321,41 @@ csched_schedule( > scurr->pri = CSCHED_PRI_IDLE; > } > > + /* Choices, choices: > + * - If we have a tasklet, we need to run the idle vcpu no matter what. > + * - If sched rate limiting is in effect, and the current vcpu has > + * run for less than that amount of time, continue the current one, > + * but with a shorter timeslice and return it immediately > + * - Otherwise, chose the one with the highest priority (which may > + * be the one currently running) > + * - If the currently running one is TS_OVER, see if there > + * is a higher priority one waiting on the runqueue of another > + * cpu and steal it. > + */ > + > + /* If we have schedule rate limiting enabled, check to see > + * how long we''ve run for. */ > + if ( sched_ratelimit_us > + && !tasklet_work_scheduled > + && vcpu_runnable(current) > + && !is_idle_vcpu(current) > + && runtime < MICROSECS(sched_ratelimit_us) ) > + { > + snext = scurr; > + snext->start_time += now; > + perfc_incr(delay_ms); > + tslice = MICROSECS(sched_ratelimit_us); > + ret.migrated = 0; > + goto out; > + } > + else > + { > + /* > + * Select next runnable local VCPU (ie top of local runq) > + */ > + tslice = MILLISECS(prv->tslice_ms); > + } > + > /* > * Select next runnable local VCPU (ie top of local runq) > */ > @@ -1367,11 +1410,12 @@ csched_schedule( > if ( !is_idle_vcpu(snext->vcpu) ) > snext->start_time += now; > > +out: > /* > * Return task to run next... > */ > ret.time = (is_idle_vcpu(snext->vcpu) ? > - -1 : MILLISECS(prv->tslice_ms)); > + -1 : tslice); > ret.task = snext->vcpu; > > CSCHED_VCPU_CHECK(ret.task); > diff -r 1c58bb664d8d xen/common/schedule.c > --- a/xen/common/schedule.c Thu Dec 08 17:15:16 2011 +0000 > +++ b/xen/common/schedule.c Fri Dec 16 15:08:09 2011 -0500 > @@ -47,6 +47,10 @@ string_param("sched", opt_sched); > bool_t sched_smt_power_savings = 0; > boolean_param("sched_smt_power_savings", sched_smt_power_savings); > > +/* Default scheduling rate limit: 1ms */ > +int sched_ratelimit_us = 1000; > +integer_param("sched_ratelimit_us", sched_ratelimit_us); > + > /* Various timer handlers. */ > static void s_timer_fn(void *unused); > static void vcpu_periodic_timer_fn(void *data); > diff -r 1c58bb664d8d xen/include/xen/perfc_defn.h > --- a/xen/include/xen/perfc_defn.h Thu Dec 08 17:15:16 2011 +0000 > +++ b/xen/include/xen/perfc_defn.h Fri Dec 16 15:08:09 2011 -0500 > @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq, "sch > PERFCOUNTER(sched_run, "sched: runs through scheduler") > PERFCOUNTER(sched_ctx, "sched: context switches") > > +PERFCOUNTER(delay_ms, "csched: delay") > PERFCOUNTER(vcpu_check, "csched: vcpu_check") > PERFCOUNTER(schedule, "csched: schedule") > PERFCOUNTER(acct_run, "csched: acct_run") > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel >
George Dunlap
2011-Dec-19 11:32 UTC
Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
On Mon, Dec 19, 2011 at 8:24 AM, Jan Beulich <JBeulich@suse.com> wrote:>> + && vcpu_runnable(current) >> + && !is_idle_vcpu(current) >> + && runtime < MICROSECS(sched_ratelimit_us) ) >> + { >> + snext = scurr; >> + snext->start_time += now; >> + perfc_incr(delay_ms); >> + tslice = MICROSECS(sched_ratelimit_us); > > So if there happens to be a VM with > > MILLISECS(prv->tslice_ms) < MICROSECS(sched_ratelimit_us) > > it''d get *more* time than allowed/intended through this mechanism.Yeah, if you set your default timeslice to 1ms, and then set your minimum scheduling rate to 5ms, you''re going to get weird results. :-) The way it stands now, the ratelimit value will override the timeslice value. It had to be one way or the other; do you think the timeslice value should be the priority? -George
Jan Beulich
2011-Dec-19 12:05 UTC
Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
>>> On 19.12.11 at 12:32, George Dunlap <George.Dunlap@eu.citrix.com> wrote: > On Mon, Dec 19, 2011 at 8:24 AM, Jan Beulich <JBeulich@suse.com> wrote: >>> + && vcpu_runnable(current) >>> + && !is_idle_vcpu(current) >>> + && runtime < MICROSECS(sched_ratelimit_us) ) >>> + { >>> + snext = scurr; >>> + snext->start_time += now; >>> + perfc_incr(delay_ms); >>> + tslice = MICROSECS(sched_ratelimit_us); >> >> So if there happens to be a VM with >> >> MILLISECS(prv->tslice_ms) < MICROSECS(sched_ratelimit_us) >> >> it''d get *more* time than allowed/intended through this mechanism. > > Yeah, if you set your default timeslice to 1ms, and then set your > minimum scheduling rate to 5ms, you''re going to get weird results. :-) > > The way it stands now, the ratelimit value will override the timeslice > value. It had to be one way or the other; do you think the timeslice > value should be the priority?The minimum of both should be used, I would think. Jan
Lv, Hui
2011-Dec-19 13:56 UTC
Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
> > + /* If we have schedule rate limiting enabled, check to see > > + * how long we''ve run for. */ > > + if ( sched_ratelimit_us > > + && !tasklet_work_scheduled > > How about making the checking order match the description above (which > might also be slightly faster given that tasklet_work_scheduled is a function > parameter [in a register or written recently], and [given the default value > you''re picking] you expect sched_ratelimit_us to be non-zero generally)? >Thanks, agree this.> > + && vcpu_runnable(current) > > + && !is_idle_vcpu(current) > > + && runtime < MICROSECS(sched_ratelimit_us) ) > > + { > > + snext = scurr; > > + snext->start_time += now; > > + perfc_incr(delay_ms); > > + tslice = MICROSECS(sched_ratelimit_us); > > So if there happens to be a VM with > > MILLISECS(prv->tslice_ms) < MICROSECS(sched_ratelimit_us) > > it''d get *more* time than allowed/intended through this mechanism.First, it does not make sense to set prv->tslice_ms < sched_ratelimit_us. If people really need an extreme smaller time slice(1ms, for example), remember to set sched_ratelimit_us smaller than prv->tslice_ms or zero. If people forgot to do so, I think sched_ratelimit_us is higher priority to be considered (instead of minimum of both), although this is not the "right"xe configuration.
George Dunlap
2011-Dec-19 13:59 UTC
Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
On Mon, Dec 19, 2011 at 12:05 PM, Jan Beulich <JBeulich@suse.com> wrote:>> The way it stands now, the ratelimit value will override the timeslice >> value. It had to be one way or the other; do you think the timeslice >> value should be the priority? > > The minimum of both should be used, I would think.What do you mean? You mean in the assignment above? That won''t have any effect other than increasing interrupts and trips through the scheduler. Suppose the following set of events: * timeslice is set to 1ms, ratelimit_us to 5000. * a vcpu V is chosen; it''s set to run with 1ms timeout. * 1ms later, we go through the scheduler; the ratelimit code determines that it hasn''t run for long enough (only 1ms), so choses it to run again (with a 1ms timeslice) * Repeat until V has run for 5ms. So although the timeslice is set to 1ms, and interrupts are happening after 1ms, the ratelimit is overriding the 1ms of the timeslice and making it 5ms. Fundamentally, one of the two parameters must override the other. With this patch the way it is now, ratelimit will override timeslice. if you want the timeslice to override ratelimit, then there will have to be more code to make that happen. -George
Jan Beulich
2011-Dec-19 14:59 UTC
Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
>>> On 19.12.11 at 14:59, George Dunlap <George.Dunlap@eu.citrix.com> wrote: > On Mon, Dec 19, 2011 at 12:05 PM, Jan Beulich <JBeulich@suse.com> wrote: >>> The way it stands now, the ratelimit value will override the timeslice >>> value. It had to be one way or the other; do you think the timeslice >>> value should be the priority? >> >> The minimum of both should be used, I would think. > > What do you mean? You mean in the assignment above?Yes, I had thought that this would suffice. But ...> That won''t have any effect other than increasing interrupts and trips > through the scheduler. Suppose the following set of events: > * timeslice is set to 1ms, ratelimit_us to 5000. > * a vcpu V is chosen; it''s set to run with 1ms timeout. > * 1ms later, we go through the scheduler; the ratelimit code > determines that it hasn''t run for long enough (only 1ms), so choses it > to run again (with a 1ms timeslice) > * Repeat until V has run for 5ms.... if that''s what''s happening, the whole thing looks bogus to me. Clearly the rate limit code should not force the time slice to be extended.> So although the timeslice is set to 1ms, and interrupts are happening > after 1ms, the ratelimit is overriding the 1ms of the timeslice and > making it 5ms. Fundamentally, one of the two parameters must override > the other. With this patch the way it is now, ratelimit will override > timeslice. if you want the timeslice to override ratelimit, then > there will have to be more code to make that happen.Overriding the rate limit by the time slice isn''t the right thing either, as that (the way I "read" it) means there''s no rate limiting at all. What "rate limit" to me means is preventing quickly switching away from a vCPU recently scheduled without extending its (remaining) time slice, i.e. in any place a respective evaluation is done the shorter of the two should be used. Jan
Lv, Hui
2011-Dec-19 15:25 UTC
Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
> Overriding the rate limit by the time slice isn''t the right thing either, as that > (the way I "read" it) means there''s no rate limiting at all. > What "rate limit" to me means is preventing quickly switching away from a > vCPU recently scheduled without extending its (remaining) time slice, i.e. in any > place a respective evaluation is done the shorter of the two should be used. > > JanSo the basic thing is to avoid "time slice" < "rate limit", happen. I really don''t understand why people want a 1ms time slice, but set the rate_limit to 5000(us), that is insubstantial. If, this unfortunately happens, I prefer to put "rate_limit" at higher priority, which means extending the running time slice. Some warnings should be put before the parameter of sched_ratelimit_us to avoid this. Is this reasonable?
Jan Beulich
2011-Dec-19 15:38 UTC
Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
>>> On 19.12.11 at 16:25, "Lv, Hui" <hui.lv@intel.com> wrote: >> Overriding the rate limit by the time slice isn''t the right thing either, as > that >> (the way I "read" it) means there''s no rate limiting at all. >> What "rate limit" to me means is preventing quickly switching away from a >> vCPU recently scheduled without extending its (remaining) time slice, i.e. > in any >> place a respective evaluation is done the shorter of the two should be used. >> >> Jan > > So the basic thing is to avoid "time slice" < "rate limit", happen. > I really don''t understand why people want a 1ms time slice, but set the > rate_limit to 5000(us), that is insubstantial.So if someone set the (global) rate limit value to 5000us and then, days or weeks later, migrates a VM with a 3ms time slice to that host, why should this be an admin mistake? To me, the rate limit is a performance improvement, while the time slice may be (an indirect result of) a to be enforced policy. Jan> If, this unfortunately happens, I prefer to put "rate_limit" at higher > priority, which means extending the running time slice. > Some warnings should be put before the parameter of sched_ratelimit_us to > avoid this. > Is this reasonable?
George Dunlap
2011-Dec-19 16:48 UTC
Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
On Mon, 2011-12-19 at 15:38 +0000, Jan Beulich wrote:> >>> On 19.12.11 at 16:25, "Lv, Hui" <hui.lv@intel.com> wrote: > >> Overriding the rate limit by the time slice isn''t the right thing either, as > > that > >> (the way I "read" it) means there''s no rate limiting at all. > >> What "rate limit" to me means is preventing quickly switching away from a > >> vCPU recently scheduled without extending its (remaining) time slice, i.e. > > in any > >> place a respective evaluation is done the shorter of the two should be used. > >> > >> Jan > > > > So the basic thing is to avoid "time slice" < "rate limit", happen. > > I really don''t understand why people want a 1ms time slice, but set the > > rate_limit to 5000(us), that is insubstantial. > > So if someone set the (global) rate limit value to 5000us and then, > days or weeks later, migrates a VM with a 3ms time slice to that > host, why should this be an admin mistake? To me, the rate limit is a > performance improvement, while the time slice may be (an indirect > result of) a to be enforced policy.Right now the timeslice is effectively global. There''s a per-scheduler variable, but it''s only set from the boot-time option. Before 4.2 I''m going to add some code that will allow that to be changed on a scheduler granularity; but there was never a plan to make it per-VM. It would make sense, in theory, to let the VM run for the rest of its timeslice; but there''s not an easy way at the moment to get that information from an already-running vcpu. The timescales we''re talking about here are so small that it doesn''t seem worth the extra complication to me. We''re really bike-shedding at this point. I don''t think functionality-wise it matters either way, and the code is simpler the way it is. I think we should just leave it. -George
Jan Beulich
2011-Dec-19 17:04 UTC
Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency
>>> On 19.12.11 at 17:48, George Dunlap <george.dunlap@citrix.com> wrote: > On Mon, 2011-12-19 at 15:38 +0000, Jan Beulich wrote: >> >>> On 19.12.11 at 16:25, "Lv, Hui" <hui.lv@intel.com> wrote: >> >> Overriding the rate limit by the time slice isn''t the right thing either, > as >> > that >> >> (the way I "read" it) means there''s no rate limiting at all. >> >> What "rate limit" to me means is preventing quickly switching away from a >> >> vCPU recently scheduled without extending its (remaining) time slice, i.e. >> > in any >> >> place a respective evaluation is done the shorter of the two should be > used. >> >> >> >> Jan >> > >> > So the basic thing is to avoid "time slice" < "rate limit", happen. >> > I really don''t understand why people want a 1ms time slice, but set the >> > rate_limit to 5000(us), that is insubstantial. >> >> So if someone set the (global) rate limit value to 5000us and then, >> days or weeks later, migrates a VM with a 3ms time slice to that >> host, why should this be an admin mistake? To me, the rate limit is a >> performance improvement, while the time slice may be (an indirect >> result of) a to be enforced policy. > > Right now the timeslice is effectively global. There''s a per-scheduler > variable, but it''s only set from the boot-time option. Before 4.2 I''m > going to add some code that will allow that to be changed on a scheduler > granularity; but there was never a plan to make it per-VM.Oh, okay, I missed that point. In that case just warning about the two values having a bad relation would seem fine. Sorry for the noise then. Jan