thr3ads.net - Xen devel - [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency [Dec 2011]

If this information is useful, please help other people find it:
Share via:

Lv, Hui

2011-Dec-17 03:24 UTC

[PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency

The delay method for credit scheduler can do as well as SRC patch (previous one)
to gain significant performance boost without obvious drawbacks.

1. Basically, the "delay method" can achieve nearly the same benefits
as my previous SRC patch, 11% overall performance boost for SPECvirt than
original credit scheduler.
2. We have tried 1ms delay and 10ms delay, there is no big difference between
these two configurations. (1ms is enough to achieve a good performance)
3. We have compared different load level response time/latency (low, high,
peak), "delay method" didn''t bring very much response time
increase.
4. 1ms delay can reduce 30% context switch at peak performance, where produces
the benefits. ("int sched_ratelimit_us = 1000" is the recommended
setting)


Signed-off-by: Hui Lv <hui.lv@intel.com<mailto:hui.lv@intel.com>>
Signed-off-by: George Dunlap
<George.Dunlap@eu.citrix.com<mailto:George.Dunlap@eu.citrix.com>>

diff -r 1c58bb664d8d xen/common/sched_credit.c
--- a/xen/common/sched_credit.c Thu Dec 08 17:15:16 2011 +0000
+++ b/xen/common/sched_credit.c Fri Dec 16 15:08:09 2011 -0500
@@ -110,6 +110,9 @@ boolean_param("sched_credit_default_yiel
 static int __read_mostly sched_credit_tslice_ms = CSCHED_DEFAULT_TSLICE_MS;
 integer_param("sched_credit_tslice_ms", sched_credit_tslice_ms);

+/* Scheduler generic parameters
+*/
+extern int sched_ratelimit_us;
 /*
  * Physical CPU
  */
@@ -1297,10 +1300,15 @@ csched_schedule(
     struct csched_private *prv = CSCHED_PRIV(ops);
     struct csched_vcpu *snext;
     struct task_slice ret;
+    s_time_t runtime, tslice;

     CSCHED_STAT_CRANK(schedule);
     CSCHED_VCPU_CHECK(current);

+    runtime = now - current->runstate.state_entry_time;
+    if ( runtime < 0 ) /* Does this ever happen? */
+        runtime = 0;
+
     if ( !is_idle_vcpu(scurr->vcpu) )
     {
         /* Update credits of a non-idle VCPU. */
@@ -1313,6 +1321,41 @@ csched_schedule(
         scurr->pri = CSCHED_PRI_IDLE;
     }

+    /* Choices, choices:
+     * - If we have a tasklet, we need to run the idle vcpu no matter what.
+     * - If sched rate limiting is in effect, and the current vcpu has
+     *   run for less than that amount of time, continue the current one,
+     *   but with a shorter timeslice and return it immediately
+     * - Otherwise, chose the one with the highest priority (which may
+     *   be the one currently running)
+     * - If the currently running one is TS_OVER, see if there
+     *   is a higher priority one waiting on the runqueue of another
+     *   cpu and steal it.
+     */
+
+    /* If we have schedule rate limiting enabled, check to see
+     * how long we''ve run for. */
+    if ( sched_ratelimit_us
+         && !tasklet_work_scheduled
+         && vcpu_runnable(current)
+         && !is_idle_vcpu(current)
+         && runtime < MICROSECS(sched_ratelimit_us) )
+    {
+        snext = scurr;
+        snext->start_time += now;
+        perfc_incr(delay_ms);
+        tslice = MICROSECS(sched_ratelimit_us);
+        ret.migrated = 0;
+        goto out;
+    }
+    else
+    {
+        /*
+         * Select next runnable local VCPU (ie top of local runq)
+        */
+        tslice = MILLISECS(prv->tslice_ms);
+    }
+
     /*
      * Select next runnable local VCPU (ie top of local runq)
      */
@@ -1367,11 +1410,12 @@ csched_schedule(
     if ( !is_idle_vcpu(snext->vcpu) )
         snext->start_time += now;

+out:
     /*
      * Return task to run next...
      */
     ret.time = (is_idle_vcpu(snext->vcpu) ?
-                -1 : MILLISECS(prv->tslice_ms));
+                -1 : tslice);
     ret.task = snext->vcpu;

     CSCHED_VCPU_CHECK(ret.task);
diff -r 1c58bb664d8d xen/common/schedule.c
--- a/xen/common/schedule.c     Thu Dec 08 17:15:16 2011 +0000
+++ b/xen/common/schedule.c     Fri Dec 16 15:08:09 2011 -0500
@@ -47,6 +47,10 @@ string_param("sched", opt_sched);
 bool_t sched_smt_power_savings = 0;
 boolean_param("sched_smt_power_savings", sched_smt_power_savings);

+/* Default scheduling rate limit: 1ms */
+int sched_ratelimit_us = 1000;
+integer_param("sched_ratelimit_us", sched_ratelimit_us);
+
 /* Various timer handlers. */
 static void s_timer_fn(void *unused);
 static void vcpu_periodic_timer_fn(void *data);
diff -r 1c58bb664d8d xen/include/xen/perfc_defn.h
--- a/xen/include/xen/perfc_defn.h      Thu Dec 08 17:15:16 2011 +0000
+++ b/xen/include/xen/perfc_defn.h      Fri Dec 16 15:08:09 2011 -0500
@@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq,              "sch
 PERFCOUNTER(sched_run,              "sched: runs through scheduler")
 PERFCOUNTER(sched_ctx,              "sched: context switches")

+PERFCOUNTER(delay_ms,              "csched: delay")
 PERFCOUNTER(vcpu_check,             "csched: vcpu_check")
 PERFCOUNTER(schedule,               "csched: schedule")
 PERFCOUNTER(acct_run,               "csched: acct_run")



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

Jan Beulich

2011-Dec-19 08:24 UTC

head link

Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency

>>> On 17.12.11 at 04:24, "Lv, Hui" <hui.lv@intel.com>
wrote:
> The delay method for credit scheduler can do as well as SRC patch (previous
> one) to gain significant performance boost without obvious drawbacks.
> 
> 1. Basically, the "delay method" can achieve nearly the same
benefits as my
> previous SRC patch, 11% overall performance boost for SPECvirt than
original
> credit scheduler.
> 2. We have tried 1ms delay and 10ms delay, there is no big difference 
> between these two configurations. (1ms is enough to achieve a good 
> performance)
> 3. We have compared different load level response time/latency (low, high, 
> peak), "delay method" didn''t bring very much response
time increase.
> 4. 1ms delay can reduce 30% context switch at peak performance, where 
> produces the benefits. ("int sched_ratelimit_us = 1000" is the
recommended
> setting)
> 
> 
> Signed-off-by: Hui Lv
<hui.lv@intel.com<mailto:hui.lv@intel.com>>
> Signed-off-by: George Dunlap 
>
<George.Dunlap@eu.citrix.com<mailto:George.Dunlap@eu.citrix.com>>
> 
> diff -r 1c58bb664d8d xen/common/sched_credit.c
> --- a/xen/common/sched_credit.c Thu Dec 08 17:15:16 2011 +0000
> +++ b/xen/common/sched_credit.c Fri Dec 16 15:08:09 2011 -0500
> @@ -110,6 +110,9 @@ boolean_param("sched_credit_default_yiel
>  static int __read_mostly sched_credit_tslice_ms =
CSCHED_DEFAULT_TSLICE_MS;
>  integer_param("sched_credit_tslice_ms", sched_credit_tslice_ms);
> 
> +/* Scheduler generic parameters
> +*/
> +extern int sched_ratelimit_us;
>  /*
>   * Physical CPU
>   */
> @@ -1297,10 +1300,15 @@ csched_schedule(
>      struct csched_private *prv = CSCHED_PRIV(ops);
>      struct csched_vcpu *snext;
>      struct task_slice ret;
> +    s_time_t runtime, tslice;
> 
>      CSCHED_STAT_CRANK(schedule);
>      CSCHED_VCPU_CHECK(current);
> 
> +    runtime = now - current->runstate.state_entry_time;
> +    if ( runtime < 0 ) /* Does this ever happen? */
> +        runtime = 0;
> +
>      if ( !is_idle_vcpu(scurr->vcpu) )
>      {
>          /* Update credits of a non-idle VCPU. */
> @@ -1313,6 +1321,41 @@ csched_schedule(
>          scurr->pri = CSCHED_PRI_IDLE;
>      }
> 
> +    /* Choices, choices:
> +     * - If we have a tasklet, we need to run the idle vcpu no matter
what.
> +     * - If sched rate limiting is in effect, and the current vcpu has
> +     *   run for less than that amount of time, continue the current one,
> +     *   but with a shorter timeslice and return it immediately
> +     * - Otherwise, chose the one with the highest priority (which may
> +     *   be the one currently running)
> +     * - If the currently running one is TS_OVER, see if there
> +     *   is a higher priority one waiting on the runqueue of another
> +     *   cpu and steal it.
> +     */
> +
> +    /* If we have schedule rate limiting enabled, check to see
> +     * how long we''ve run for. */
> +    if ( sched_ratelimit_us
> +         && !tasklet_work_scheduled
How about making the checking order match the description above
(which might also be slightly faster given that tasklet_work_scheduled
is a function parameter [in a register or written recently], and [given
the default value you''re picking] you expect sched_ratelimit_us to be
non-zero generally)?
> +         && vcpu_runnable(current)
> +         && !is_idle_vcpu(current)
> +         && runtime < MICROSECS(sched_ratelimit_us) )
> +    {
> +        snext = scurr;
> +        snext->start_time += now;
> +        perfc_incr(delay_ms);
> +        tslice = MICROSECS(sched_ratelimit_us);
So if there happens to be a VM with 

MILLISECS(prv->tslice_ms) < MICROSECS(sched_ratelimit_us)

it''d get *more* time than allowed/intended through this mechanism.

Jan
> +        ret.migrated = 0;
> +        goto out;
> +    }
> +    else
> +    {
> +        /*
> +         * Select next runnable local VCPU (ie top of local runq)
> +        */
> +        tslice = MILLISECS(prv->tslice_ms);
> +    }
> +
>      /*
>       * Select next runnable local VCPU (ie top of local runq)
>       */
> @@ -1367,11 +1410,12 @@ csched_schedule(
>      if ( !is_idle_vcpu(snext->vcpu) )
>          snext->start_time += now;
> 
> +out:
>      /*
>       * Return task to run next...
>       */
>      ret.time = (is_idle_vcpu(snext->vcpu) ?
> -                -1 : MILLISECS(prv->tslice_ms));
> +                -1 : tslice);
>      ret.task = snext->vcpu;
> 
>      CSCHED_VCPU_CHECK(ret.task);
> diff -r 1c58bb664d8d xen/common/schedule.c
> --- a/xen/common/schedule.c     Thu Dec 08 17:15:16 2011 +0000
> +++ b/xen/common/schedule.c     Fri Dec 16 15:08:09 2011 -0500
> @@ -47,6 +47,10 @@ string_param("sched", opt_sched);
>  bool_t sched_smt_power_savings = 0;
>  boolean_param("sched_smt_power_savings",
sched_smt_power_savings);
> 
> +/* Default scheduling rate limit: 1ms */
> +int sched_ratelimit_us = 1000;
> +integer_param("sched_ratelimit_us", sched_ratelimit_us);
> +
>  /* Various timer handlers. */
>  static void s_timer_fn(void *unused);
>  static void vcpu_periodic_timer_fn(void *data);
> diff -r 1c58bb664d8d xen/include/xen/perfc_defn.h
> --- a/xen/include/xen/perfc_defn.h      Thu Dec 08 17:15:16 2011 +0000
> +++ b/xen/include/xen/perfc_defn.h      Fri Dec 16 15:08:09 2011 -0500
> @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq,              "sch
>  PERFCOUNTER(sched_run,              "sched: runs through
scheduler")
>  PERFCOUNTER(sched_ctx,              "sched: context switches")
> 
> +PERFCOUNTER(delay_ms,              "csched: delay")
>  PERFCOUNTER(vcpu_check,             "csched: vcpu_check")
>  PERFCOUNTER(schedule,               "csched: schedule")
>  PERFCOUNTER(acct_run,               "csched: acct_run")

George Dunlap

2011-Dec-19 10:54 UTC

head link

Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency

Hui,

Unfortunately your mailer is mangling the patch:

 static int __read_mostly sched_credit_tslice_ms =3D CSCHED_DEFAULT_TSLICE_MS;

Try using "hg email", or sending it as an attachment.

 -George

On Sat, Dec 17, 2011 at 3:24 AM, Lv, Hui <hui.lv@intel.com>
wrote:> The delay method for credit scheduler can do as well as SRC patch (previous
> one) to gain significant performance boost without obvious drawbacks.
>
> 1. Basically, the "delay method" can achieve nearly the same
benefits as my
> previous SRC patch, 11% overall performance boost for SPECvirt than
original
> credit scheduler.
> 2. We have tried 1ms delay and 10ms delay, there is no big difference
> between these two configurations. (1ms is enough to achieve a good
> performance)
> 3. We have compared different load level response time/latency (low, high,
> peak), "delay method" didn''t bring very much response
time increase.
> 4. 1ms delay can reduce 30% context switch at peak performance, where
> produces the benefits. (“int sched_ratelimit_us = 1000” is the recommended
> setting)
>
>
> Signed-off-by: Hui Lv <hui.lv@intel.com>
> Signed-off-by: George Dunlap <George.Dunlap@eu.citrix.com>
>
> diff -r 1c58bb664d8d xen/common/sched_credit.c
> --- a/xen/common/sched_credit.c Thu Dec 08 17:15:16 2011 +0000
> +++ b/xen/common/sched_credit.c Fri Dec 16 15:08:09 2011 -0500
> @@ -110,6 +110,9 @@ boolean_param("sched_credit_default_yiel
> static int __read_mostly sched_credit_tslice_ms = CSCHED_DEFAULT_TSLICE_MS;
> integer_param("sched_credit_tslice_ms", sched_credit_tslice_ms);
>
> +/* Scheduler generic parameters
> +*/
> +extern int sched_ratelimit_us;
> /*
>   * Physical CPU
>   */
> @@ -1297,10 +1300,15 @@ csched_schedule(
>      struct csched_private *prv = CSCHED_PRIV(ops);
>      struct csched_vcpu *snext;
>      struct task_slice ret;
> +    s_time_t runtime, tslice;
>
>      CSCHED_STAT_CRANK(schedule);
>      CSCHED_VCPU_CHECK(current);
>
> +    runtime = now - current->runstate.state_entry_time;
> +    if ( runtime < 0 ) /* Does this ever happen? */
> +        runtime = 0;
> +
>      if ( !is_idle_vcpu(scurr->vcpu) )
>      {
>          /* Update credits of a non-idle VCPU. */
> @@ -1313,6 +1321,41 @@ csched_schedule(
>          scurr->pri = CSCHED_PRI_IDLE;
>      }
>
> +    /* Choices, choices:
> +     * - If we have a tasklet, we need to run the idle vcpu no matter
what.
> +     * - If sched rate limiting is in effect, and the current vcpu has
> +     *   run for less than that amount of time, continue the current one,
> +     *   but with a shorter timeslice and return it immediately
> +     * - Otherwise, chose the one with the highest priority (which may
> +     *   be the one currently running)
> +     * - If the currently running one is TS_OVER, see if there
> +     *   is a higher priority one waiting on the runqueue of another
> +     *   cpu and steal it.
> +     */
> +
> +    /* If we have schedule rate limiting enabled, check to see
> +     * how long we''ve run for. */
> +    if ( sched_ratelimit_us
> +         && !tasklet_work_scheduled
> +         && vcpu_runnable(current)
> +         && !is_idle_vcpu(current)
> +         && runtime < MICROSECS(sched_ratelimit_us) )
> +    {
> +        snext = scurr;
> +        snext->start_time += now;
> +        perfc_incr(delay_ms);
> +        tslice = MICROSECS(sched_ratelimit_us);
> +        ret.migrated = 0;
> +        goto out;
> +    }
> +    else
> +    {
> +        /*
> +         * Select next runnable local VCPU (ie top of local runq)
> +        */
> +        tslice = MILLISECS(prv->tslice_ms);
> +    }
> +
>      /*
>       * Select next runnable local VCPU (ie top of local runq)
>       */
> @@ -1367,11 +1410,12 @@ csched_schedule(
>      if ( !is_idle_vcpu(snext->vcpu) )
>          snext->start_time += now;
>
> +out:
>      /*
>       * Return task to run next...
>       */
>      ret.time = (is_idle_vcpu(snext->vcpu) ?
> -                -1 : MILLISECS(prv->tslice_ms));
> +                -1 : tslice);
>      ret.task = snext->vcpu;
>
>      CSCHED_VCPU_CHECK(ret.task);
> diff -r 1c58bb664d8d xen/common/schedule.c
> --- a/xen/common/schedule.c     Thu Dec 08 17:15:16 2011 +0000
> +++ b/xen/common/schedule.c     Fri Dec 16 15:08:09 2011 -0500
> @@ -47,6 +47,10 @@ string_param("sched", opt_sched);
> bool_t sched_smt_power_savings = 0;
> boolean_param("sched_smt_power_savings",
sched_smt_power_savings);
>
> +/* Default scheduling rate limit: 1ms */
> +int sched_ratelimit_us = 1000;
> +integer_param("sched_ratelimit_us", sched_ratelimit_us);
> +
> /* Various timer handlers. */
> static void s_timer_fn(void *unused);
> static void vcpu_periodic_timer_fn(void *data);
> diff -r 1c58bb664d8d xen/include/xen/perfc_defn.h
> --- a/xen/include/xen/perfc_defn.h      Thu Dec 08 17:15:16 2011 +0000
> +++ b/xen/include/xen/perfc_defn.h      Fri Dec 16 15:08:09 2011 -0500
> @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq,              "sch
> PERFCOUNTER(sched_run,              "sched: runs through
scheduler")
> PERFCOUNTER(sched_ctx,              "sched: context switches")
>
> +PERFCOUNTER(delay_ms,              "csched: delay")
> PERFCOUNTER(vcpu_check,             "csched: vcpu_check")
> PERFCOUNTER(schedule,               "csched: schedule")
> PERFCOUNTER(acct_run,               "csched: acct_run")
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>

George Dunlap

2011-Dec-19 11:32 UTC

head link

Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency

On Mon, Dec 19, 2011 at 8:24 AM, Jan Beulich <JBeulich@suse.com>
wrote:>> +         && vcpu_runnable(current)
>> +         && !is_idle_vcpu(current)
>> +         && runtime < MICROSECS(sched_ratelimit_us) )
>> +    {
>> +        snext = scurr;
>> +        snext->start_time += now;
>> +        perfc_incr(delay_ms);
>> +        tslice = MICROSECS(sched_ratelimit_us);
>
> So if there happens to be a VM with
>
> MILLISECS(prv->tslice_ms) < MICROSECS(sched_ratelimit_us)
>
> it''d get *more* time than allowed/intended through this mechanism.
Yeah, if you set your default timeslice to 1ms, and then set your
minimum scheduling rate to 5ms, you''re going to get weird results. :-)

The way it stands now, the ratelimit value will override the timeslice
value.  It had to be one way or the other; do you think the timeslice
value should be the priority?

 -George

Jan Beulich

2011-Dec-19 12:05 UTC

head link

Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency

>>> On 19.12.11 at 12:32, George Dunlap
<George.Dunlap@eu.citrix.com> wrote:
> On Mon, Dec 19, 2011 at 8:24 AM, Jan Beulich <JBeulich@suse.com>
wrote:
>>> +         && vcpu_runnable(current)
>>> +         && !is_idle_vcpu(current)
>>> +         && runtime < MICROSECS(sched_ratelimit_us) )
>>> +    {
>>> +        snext = scurr;
>>> +        snext->start_time += now;
>>> +        perfc_incr(delay_ms);
>>> +        tslice = MICROSECS(sched_ratelimit_us);
>>
>> So if there happens to be a VM with
>>
>> MILLISECS(prv->tslice_ms) < MICROSECS(sched_ratelimit_us)
>>
>> it''d get *more* time than allowed/intended through this
mechanism.
> 
> Yeah, if you set your default timeslice to 1ms, and then set your
> minimum scheduling rate to 5ms, you''re going to get weird results.
:-)
> 
> The way it stands now, the ratelimit value will override the timeslice
> value.  It had to be one way or the other; do you think the timeslice
> value should be the priority?
The minimum of both should be used, I would think.

Jan

Lv, Hui

2011-Dec-19 13:56 UTC

head link

Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency

> > +    /* If we have schedule rate limiting enabled, check to see
> > +     * how long we''ve run for. */
> > +    if ( sched_ratelimit_us
> > +         && !tasklet_work_scheduled
> 
> How about making the checking order match the description above (which
> might also be slightly faster given that tasklet_work_scheduled is a
function
> parameter [in a register or written recently], and [given the default value
> you''re picking] you expect sched_ratelimit_us to be non-zero
generally)?
> Thanks, agree this.
> > +         && vcpu_runnable(current)
> > +         && !is_idle_vcpu(current)
> > +         && runtime < MICROSECS(sched_ratelimit_us) )
> > +    {
> > +        snext = scurr;
> > +        snext->start_time += now;
> > +        perfc_incr(delay_ms);
> > +        tslice = MICROSECS(sched_ratelimit_us);
> 
> So if there happens to be a VM with
> 
> MILLISECS(prv->tslice_ms) < MICROSECS(sched_ratelimit_us)
> 
> it''d get *more* time than allowed/intended through this mechanism.
First, it does not make sense to set prv->tslice_ms < sched_ratelimit_us.
If people really need an extreme smaller time slice(1ms, for example), remember
to set sched_ratelimit_us smaller than prv->tslice_ms or zero.
If people forgot to do so, I think sched_ratelimit_us is higher priority to be
considered (instead of minimum of both), although this is not the
"right"xe configuration.

George Dunlap

2011-Dec-19 13:59 UTC

head link

Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency

On Mon, Dec 19, 2011 at 12:05 PM, Jan Beulich <JBeulich@suse.com>
wrote:>> The way it stands now, the ratelimit value will override the timeslice
>> value.  It had to be one way or the other; do you think the timeslice
>> value should be the priority?
>
> The minimum of both should be used, I would think.
What do you mean?  You mean in the assignment above?

That won''t have any effect other than increasing interrupts and trips
through the scheduler.  Suppose the following set of events:
* timeslice is set to 1ms, ratelimit_us to 5000.
* a vcpu V is chosen; it''s set to run with 1ms timeout.
* 1ms later, we go through the scheduler; the ratelimit code
determines that it hasn''t run for long enough (only 1ms), so choses it
to run again (with a 1ms timeslice)
* Repeat until V has run for 5ms.

So although the timeslice is set to 1ms, and interrupts are happening
after 1ms, the ratelimit is overriding the 1ms of the timeslice and
making it 5ms.  Fundamentally, one of the two parameters must override
the other.  With this patch the way it is now, ratelimit will override
timeslice.  if you want the timeslice to override ratelimit, then
there will have to be more code to make that happen.

 -George

Jan Beulich

2011-Dec-19 14:59 UTC

head link

Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency

>>> On 19.12.11 at 14:59, George Dunlap
<George.Dunlap@eu.citrix.com> wrote:
> On Mon, Dec 19, 2011 at 12:05 PM, Jan Beulich <JBeulich@suse.com>
wrote:
>>> The way it stands now, the ratelimit value will override the
timeslice
>>> value.  It had to be one way or the other; do you think the
timeslice
>>> value should be the priority?
>>
>> The minimum of both should be used, I would think.
> 
> What do you mean?  You mean in the assignment above?
Yes, I had thought that this would suffice. But ...
> That won''t have any effect other than increasing interrupts and
trips
> through the scheduler.  Suppose the following set of events:
> * timeslice is set to 1ms, ratelimit_us to 5000.
> * a vcpu V is chosen; it''s set to run with 1ms timeout.
> * 1ms later, we go through the scheduler; the ratelimit code
> determines that it hasn''t run for long enough (only 1ms), so
choses it
> to run again (with a 1ms timeslice)
> * Repeat until V has run for 5ms.
... if that''s what''s happening, the whole thing looks bogus to
me.
Clearly the rate limit code should not force the time slice to be
extended.
> So although the timeslice is set to 1ms, and interrupts are happening
> after 1ms, the ratelimit is overriding the 1ms of the timeslice and
> making it 5ms.  Fundamentally, one of the two parameters must override
> the other.  With this patch the way it is now, ratelimit will override
> timeslice.  if you want the timeslice to override ratelimit, then
> there will have to be more code to make that happen.
Overriding the rate limit by the time slice isn''t the right thing
either,
as that (the way I "read" it) means there''s no rate limiting
at all.
What "rate limit" to me means is preventing quickly switching away
from a vCPU recently scheduled without extending its (remaining)
time slice, i.e. in any place a respective evaluation is done the
shorter of the two should be used.

Jan

Lv, Hui

2011-Dec-19 15:25 UTC

head link

Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency

> Overriding the rate limit by the time slice isn''t the right thing
either, as that
> (the way I "read" it) means there''s no rate limiting at
all.
> What "rate limit" to me means is preventing quickly switching
away from a
> vCPU recently scheduled without extending its (remaining) time slice, i.e.
in any
> place a respective evaluation is done the shorter of the two should be
used.
> 
> Jan
So the basic thing is to avoid "time slice" < "rate
limit", happen.
I really don''t understand why people want a 1ms time slice, but set the
rate_limit to 5000(us), that is insubstantial.

If, this unfortunately happens, I prefer to put "rate_limit" at higher
priority, which means extending the running time slice.
Some warnings should be put before the parameter of sched_ratelimit_us to avoid
this.
Is this reasonable?

Jan Beulich

2011-Dec-19 15:38 UTC

head link

Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency

>>> On 19.12.11 at 16:25, "Lv, Hui" <hui.lv@intel.com>
wrote:
>> Overriding the rate limit by the time slice isn''t the right
thing either, as
> that
>> (the way I "read" it) means there''s no rate limiting
at all.
>> What "rate limit" to me means is preventing quickly switching
away from a
>> vCPU recently scheduled without extending its (remaining) time slice,
i.e.
> in any
>> place a respective evaluation is done the shorter of the two should be
used.
>> 
>> Jan
> 
> So the basic thing is to avoid "time slice" < "rate
limit", happen.
> I really don''t understand why people want a 1ms time slice, but
set the
> rate_limit to 5000(us), that is insubstantial.
So if someone set the (global) rate limit value to 5000us and then,
days or weeks later, migrates a VM with a 3ms time slice to that
host, why should this be an admin mistake? To me, the rate limit is a
performance improvement, while the time slice may be (an indirect
result of) a to be enforced policy.

Jan
> If, this unfortunately happens, I prefer to put "rate_limit" at
higher
> priority, which means extending the running time slice.
> Some warnings should be put before the parameter of sched_ratelimit_us to 
> avoid this.
> Is this reasonable?

George Dunlap

2011-Dec-19 16:48 UTC

head link

Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency

On Mon, 2011-12-19 at 15:38 +0000, Jan Beulich wrote:> >>> On 19.12.11 at 16:25, "Lv, Hui"
<hui.lv@intel.com> wrote:
> >> Overriding the rate limit by the time slice isn''t the
right thing either, as
> > that
> >> (the way I "read" it) means there''s no rate
limiting at all.
> >> What "rate limit" to me means is preventing quickly
switching away from a
> >> vCPU recently scheduled without extending its (remaining) time
slice, i.e.
> > in any
> >> place a respective evaluation is done the shorter of the two
should be used.
> >> 
> >> Jan
> > 
> > So the basic thing is to avoid "time slice" < "rate
limit", happen.
> > I really don''t understand why people want a 1ms time slice,
but set the
> > rate_limit to 5000(us), that is insubstantial.
> 
> So if someone set the (global) rate limit value to 5000us and then,
> days or weeks later, migrates a VM with a 3ms time slice to that
> host, why should this be an admin mistake? To me, the rate limit is a
> performance improvement, while the time slice may be (an indirect
> result of) a to be enforced policy.
Right now the timeslice is effectively global.  There''s a per-scheduler
variable, but it''s only set from the boot-time option.  Before 4.2
I''m
going to add some code that will allow that to be changed on a scheduler
granularity; but there was never a plan to make it per-VM.

It would make sense, in theory, to let the VM run for the rest of its
timeslice; but there''s not an easy way at the moment to get that
information from an already-running vcpu.  The timescales we''re talking
about here are so small that it doesn''t seem worth the extra
complication to me.

We''re really bike-shedding at this point.  I don''t think
functionality-wise it matters either way, and the code is simpler the
way it is.  I think we should just leave it.

 -George

Jan Beulich

2011-Dec-19 17:04 UTC

head link

Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency

>>> On 19.12.11 at 17:48, George Dunlap
<george.dunlap@citrix.com> wrote:
> On Mon, 2011-12-19 at 15:38 +0000, Jan Beulich wrote:
>> >>> On 19.12.11 at 16:25, "Lv, Hui"
<hui.lv@intel.com> wrote:
>> >> Overriding the rate limit by the time slice isn''t the
right thing either,
> as 
>> > that
>> >> (the way I "read" it) means there''s no rate
limiting at all.
>> >> What "rate limit" to me means is preventing quickly
switching away from a
>> >> vCPU recently scheduled without extending its (remaining) time
slice, i.e.
>> > in any
>> >> place a respective evaluation is done the shorter of the two
should be
> used.
>> >> 
>> >> Jan
>> > 
>> > So the basic thing is to avoid "time slice" <
"rate limit", happen.
>> > I really don''t understand why people want a 1ms time
slice, but set the
>> > rate_limit to 5000(us), that is insubstantial.
>> 
>> So if someone set the (global) rate limit value to 5000us and then,
>> days or weeks later, migrates a VM with a 3ms time slice to that
>> host, why should this be an admin mistake? To me, the rate limit is a
>> performance improvement, while the time slice may be (an indirect
>> result of) a to be enforced policy.
> 
> Right now the timeslice is effectively global.  There''s a
per-scheduler
> variable, but it''s only set from the boot-time option.  Before 4.2
I''m
> going to add some code that will allow that to be changed on a scheduler
> granularity; but there was never a plan to make it per-VM.
Oh, okay, I missed that point. In that case just warning about the
two values having a bad relation would seem fine.

Sorry for the noise then.

Jan

Xen devel - Dec 2011 - [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency

[PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency

Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency

Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency

Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency

Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency

Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency

Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency

Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency

Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency

Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency

Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency

Re: [PATCH v2] xen/credit scheduler; Use delay to control scheduling frequency