thr3ads.net - Xen devel - [PATCH] xen/sched_credit: Use delay to control scheduling frequency [Dec 2011]


Hui Lv

2011-Dec-19 22:13 UTC

[PATCH] xen/sched_credit: Use delay to control scheduling frequency

1. Basically, the "delay method" can achieve an 11% overall performance
boost for SPECvirt compared with the original credit scheduler.
2. We have tried a 1ms delay and a 10ms delay; there is no big difference between
these two configurations. (1ms is enough to achieve good performance.)
3. We have compared response time/latency at different load levels (low, high,
peak); the "delay method" did not increase response time very much.
4. A 1ms delay can reduce context switches by 30% at peak performance, which is
where the benefit comes from. (int sched_ratelimit_us = 1000 is the recommended setting.)

Signed-off-by: Hui Lv <hui.lv@intel.com>
Signed-off-by: George Dunlap <George.Dunlap@eu.citrix.com>

diff -r a4bff36780a3 -r 114495d1ff94 xen/common/sched_credit.c
--- a/xen/common/sched_credit.c	Fri Dec 16 18:46:27 2011 +0000
+++ b/xen/common/sched_credit.c	Mon Dec 19 15:58:50 2011 -0500
@@ -110,6 +110,10 @@ boolean_param("sched_credit_default_yiel
 static int __read_mostly sched_credit_tslice_ms = CSCHED_DEFAULT_TSLICE_MS;
 integer_param("sched_credit_tslice_ms", sched_credit_tslice_ms);
 
+/* Scheduler generic parameters
+*/
+extern int sched_ratelimit_us;
+
 /*
  * Physical CPU
  */
@@ -1297,10 +1301,15 @@ csched_schedule(
     struct csched_private *prv = CSCHED_PRIV(ops);
     struct csched_vcpu *snext;
     struct task_slice ret;
+    s_time_t runtime, tslice;
 
     CSCHED_STAT_CRANK(schedule);
     CSCHED_VCPU_CHECK(current);
 
+    runtime = now - current->runstate.state_entry_time;
+    if ( runtime < 0 ) /* Does this ever happen? */
+        runtime = 0;
+
     if ( !is_idle_vcpu(scurr->vcpu) )
     {
         /* Update credits of a non-idle VCPU. */
@@ -1313,6 +1322,41 @@ csched_schedule(
         scurr->pri = CSCHED_PRI_IDLE;
     }
 
+    /* Choices, choices:
+     * - If we have a tasklet, we need to run the idle vcpu no matter what.
+     * - If sched rate limiting is in effect, and the current vcpu has
+     *   run for less than that amount of time, continue the current one,
+     *   but with a shorter timeslice and return it immediately
+     * - Otherwise, chose the one with the highest priority (which may
+     *   be the one currently running)
+     * - If the currently running one is TS_OVER, see if there
+     *   is a higher priority one waiting on the runqueue of another
+     *   cpu and steal it.
+     */
+
+    /* If we have schedule rate limiting enabled, check to see
+     * how long we've run for. */
+    if ( !tasklet_work_scheduled
+         && sched_ratelimit_us
+         && vcpu_runnable(current)
+         && !is_idle_vcpu(current)
+         && runtime < MICROSECS(sched_ratelimit_us) )
+    {
+        snext = scurr;
+        snext->start_time += now;
+        perfc_incr(delay_ms);
+        tslice = MICROSECS(sched_ratelimit_us);
+        ret.migrated = 0;
+        goto out;
+    }
+    else
+    {
+        /*
+         * Select next runnable local VCPU (ie top of local runq)
+        */
+        tslice = MILLISECS(prv->tslice_ms);
+    }
+
     /*
      * Select next runnable local VCPU (ie top of local runq)
      */
@@ -1367,11 +1411,12 @@ csched_schedule(
     if ( !is_idle_vcpu(snext->vcpu) )
         snext->start_time += now;
 
+out:
     /*
      * Return task to run next...
      */
     ret.time = (is_idle_vcpu(snext->vcpu) ?
-                -1 : MILLISECS(prv->tslice_ms));
+                -1 : tslice);
     ret.task = snext->vcpu;
 
     CSCHED_VCPU_CHECK(ret.task);
diff -r a4bff36780a3 -r 114495d1ff94 xen/common/schedule.c
--- a/xen/common/schedule.c	Fri Dec 16 18:46:27 2011 +0000
+++ b/xen/common/schedule.c	Mon Dec 19 15:58:50 2011 -0500
@@ -47,6 +47,12 @@ string_param("sched", opt_sched);
 bool_t sched_smt_power_savings = 0;
 boolean_param("sched_smt_power_savings", sched_smt_power_savings);
 
+/* Default scheduling rate limit: 1ms 
+ * It's not recommended to set this value bigger than "sched_credit_tslice_ms"
+ * otherwise, something weired may happen
+ * */
+int sched_ratelimit_us = 1000;
+integer_param("sched_ratelimit_us", sched_ratelimit_us);
 /* Various timer handlers. */
 static void s_timer_fn(void *unused);
 static void vcpu_periodic_timer_fn(void *data);
diff -r a4bff36780a3 -r 114495d1ff94 xen/include/xen/perfc_defn.h
--- a/xen/include/xen/perfc_defn.h	Fri Dec 16 18:46:27 2011 +0000
+++ b/xen/include/xen/perfc_defn.h	Mon Dec 19 15:58:50 2011 -0500
@@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq,              "sch
 PERFCOUNTER(sched_run,              "sched: runs through scheduler")
 PERFCOUNTER(sched_ctx,              "sched: context switches")
 
+PERFCOUNTER(delay_ms,               "csched: delay")
 PERFCOUNTER(vcpu_check,             "csched: vcpu_check")
 PERFCOUNTER(schedule,               "csched: schedule")
 PERFCOUNTER(acct_run,               "csched: acct_run")
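
As a reading aid, the decision this patch adds to csched_schedule() can be sketched as a standalone function. This is a minimal model and not the Xen code: struct sched_state, ratelimit_keeps_current() and the conversion helpers are hypothetical names, and the time type is assumed to be signed nanoseconds, as with s_time_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: times are signed nanoseconds, like Xen's s_time_t. */
typedef int64_t ns_time_t;

/* Hypothetical snapshot of the state the scheduler consults. */
struct sched_state {
    bool tasklet_work_scheduled; /* a pending tasklet forces the idle vcpu */
    bool current_runnable;       /* the current vcpu could keep running */
    bool current_is_idle;        /* the current vcpu is the idle vcpu */
    ns_time_t now;               /* current time */
    ns_time_t state_entry_time;  /* when the current vcpu started running */
};

static ns_time_t us_to_ns(uint32_t us) { return (ns_time_t)us * 1000; }
static ns_time_t ms_to_ns(uint32_t ms) { return (ns_time_t)ms * 1000000; }

/*
 * Returns true when the current vcpu should simply continue because it has
 * run for less than ratelimit_us. In both cases *tslice_ns receives the
 * timeslice to grant: the rate limit itself when continuing, otherwise the
 * regular timeslice.
 */
static bool ratelimit_keeps_current(const struct sched_state *s,
                                    uint32_t ratelimit_us, uint32_t tslice_ms,
                                    ns_time_t *tslice_ns)
{
    ns_time_t runtime = s->now - s->state_entry_time;

    assert(tslice_ns != NULL);

    /* Guard against a negative interval, as the patch does. */
    if (runtime < 0)
        runtime = 0;

    if (!s->tasklet_work_scheduled
        && ratelimit_us != 0
        && s->current_runnable
        && !s->current_is_idle
        && runtime < us_to_ns(ratelimit_us)) {
        *tslice_ns = us_to_ns(ratelimit_us);
        return true;
    }

    *tslice_ns = ms_to_ns(tslice_ms);
    return false;
}
```

With a 1000us limit and a 30ms timeslice, a vcpu that has run for 500us is kept and given a 1ms slice; once it has run for 1.5ms, normal selection applies with the full 30ms slice. A limit of 0 disables the check entirely.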

Ian Campbell

2011-Dec-20 09:07 UTC

Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency

On Mon, 2011-12-19 at 22:13 +0000, Hui Lv wrote:
> 1. Basically, the "delay method" can achieve 11% overall performance
> boost for SPECvirt than original credit scheduler.
Is it possible to publish the raw numbers?
> 2. We have tried 1ms delay and 10ms delay, there is no big difference
> between these two configurations. (1ms is enough to achieve a good
> performance)
> 3. We have compared different load level response time/latency (low,
> high, peak), "delay method" didn't bring very much response time
> increase.
> 4. 1ms delay can reduce 30% context switch at peak performance, where
> produces the benefits. (int sched_ratelimit_us = 1000 is the
> recommended setting)
> 
> Signed-off-by: Hui Lv <hui.lv@intel.com>
> Signed-off-by: George Dunlap <George.Dunlap@eu.citrix.com>
> 
> diff -r a4bff36780a3 -r 114495d1ff94 xen/common/sched_credit.c
> --- a/xen/common/sched_credit.c	Fri Dec 16 18:46:27 2011 +0000
> +++ b/xen/common/sched_credit.c	Mon Dec 19 15:58:50 2011 -0500
> @@ -110,6 +110,10 @@ boolean_param("sched_credit_default_yiel
>  static int __read_mostly sched_credit_tslice_ms = CSCHED_DEFAULT_TSLICE_MS;
>  integer_param("sched_credit_tslice_ms", sched_credit_tslice_ms);
>  
> +/* Scheduler generic parameters
> +*/
> +extern int sched_ratelimit_us;
If this is generic then it seems like xen/sched.h is the right place for
it.

Is it really generic though if only the credit scheduler applies it?
> +
>  /*
>   * Physical CPU
>   */
> @@ -1297,10 +1301,15 @@ csched_schedule(
>      struct csched_private *prv = CSCHED_PRIV(ops);
>      struct csched_vcpu *snext;
>      struct task_slice ret;
> +    s_time_t runtime, tslice;
>  
>      CSCHED_STAT_CRANK(schedule);
>      CSCHED_VCPU_CHECK(current);
>  
> +    runtime = now - current->runstate.state_entry_time;
> +    if ( runtime < 0 ) /* Does this ever happen? */
> +        runtime = 0;
> +
>      if ( !is_idle_vcpu(scurr->vcpu) )
>      {
>          /* Update credits of a non-idle VCPU. */
> @@ -1313,6 +1322,41 @@ csched_schedule(
>          scurr->pri = CSCHED_PRI_IDLE;
>      }
>  
> +    /* Choices, choices:
> +     * - If we have a tasklet, we need to run the idle vcpu no matter what.
> +     * - If sched rate limiting is in effect, and the current vcpu has
> +     *   run for less than that amount of time, continue the current one,
> +     *   but with a shorter timeslice and return it immediately
> +     * - Otherwise, chose the one with the highest priority (which may
> +     *   be the one currently running)
> +     * - If the currently running one is TS_OVER, see if there
> +     *   is a higher priority one waiting on the runqueue of another
> +     *   cpu and steal it.
> +     */
> +
> +    /* If we have schedule rate limiting enabled, check to see
> +     * how long we've run for. */
> +    if ( !tasklet_work_scheduled
> +         && sched_ratelimit_us
> +         && vcpu_runnable(current)
> +         && !is_idle_vcpu(current)
> +         && runtime < MICROSECS(sched_ratelimit_us) )
> +    {
> +        snext = scurr;
> +        snext->start_time += now;
> +        perfc_incr(delay_ms);
> +        tslice = MICROSECS(sched_ratelimit_us);
> +        ret.migrated = 0;
> +        goto out;
> +    }
> +    else
> +    {
This is the same comment as the next block below, does it really apply
here too?

Since the previous if block ends with a goto the else clause is a bit
redundant.
> +        /*
> +         * Select next runnable local VCPU (ie top of local runq)
> +        */
> +        tslice = MILLISECS(prv->tslice_ms);
> +    }
> +
>      /*
>       * Select next runnable local VCPU (ie top of local runq)
>       */
> @@ -1367,11 +1411,12 @@ csched_schedule(
>      if ( !is_idle_vcpu(snext->vcpu) )
>          snext->start_time += now;
>  
> +out:
>      /*
>       * Return task to run next...
>       */
>      ret.time = (is_idle_vcpu(snext->vcpu) ?
> -                -1 : MILLISECS(prv->tslice_ms));
> +                -1 : tslice);
>      ret.task = snext->vcpu;
>  
>      CSCHED_VCPU_CHECK(ret.task);
> diff -r a4bff36780a3 -r 114495d1ff94 xen/common/schedule.c
> --- a/xen/common/schedule.c	Fri Dec 16 18:46:27 2011 +0000
> +++ b/xen/common/schedule.c	Mon Dec 19 15:58:50 2011 -0500
> @@ -47,6 +47,12 @@ string_param("sched", opt_sched);
>  bool_t sched_smt_power_savings = 0;
>  boolean_param("sched_smt_power_savings", sched_smt_power_savings);
>  
> +/* Default scheduling rate limit: 1ms 
> + * It's not recommended to set this value bigger than "sched_credit_tslice_ms"
> + * otherwise, something weired may happen
                           weird

Is that constraint enforced anywhere?
> + * */
> +int sched_ratelimit_us = 1000;
> +integer_param("sched_ratelimit_us", sched_ratelimit_us);
>  /* Various timer handlers. */
>  static void s_timer_fn(void *unused);
>  static void vcpu_periodic_timer_fn(void *data);
> diff -r a4bff36780a3 -r 114495d1ff94 xen/include/xen/perfc_defn.h
> --- a/xen/include/xen/perfc_defn.h	Fri Dec 16 18:46:27 2011 +0000
> +++ b/xen/include/xen/perfc_defn.h	Mon Dec 19 15:58:50 2011 -0500
> @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq,              "sch
>  PERFCOUNTER(sched_run,              "sched: runs through scheduler")
>  PERFCOUNTER(sched_ctx,              "sched: context switches")
>  
> +PERFCOUNTER(delay_ms,               "csched: delay")
>  PERFCOUNTER(vcpu_check,             "csched: vcpu_check")
>  PERFCOUNTER(schedule,               "csched: schedule")
>  PERFCOUNTER(acct_run,               "csched: acct_run")
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel

Lv, Hui

2011-Dec-20 09:44 UTC

Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency

> Is it possible to publish the raw numbers?
Yes. Actually, I have posted the raw data before.
You can check the attachment for the details.
> If this is generic then it seems like xen/sched.h is the right place for
> it.
> 
> Is it really generic though if only the credit scheduler applies it?
We discussed the "generic" issue in previous mails with George,
Dario Faggioli and some other people.
The conclusion is that, for now, it applies only to the credit scheduler. In the
future, it may be developed into a generic solution.
> This is the same comment as the next block below, does it really apply here
> too?
> 
> Since the previous if block ends with a goto the else clause is a bit
> redundant.
Thanks, I'll remove the redundant comment and else clause.
> Is that constraint enforced anywhere?
Currently, the constraint is not enforced.
I think George gave the explanation in yesterday's discussion.


Best regards,

Lv, Hui






George Dunlap

2011-Dec-20 10:21 UTC

Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency

On Tue, 2011-12-20 at 09:07 +0000, Ian Campbell wrote:
> > +/* Scheduler generic parameters
> > +*/
> > +extern int sched_ratelimit_us;
> 
> If this is generic then it seems like xen/sched.h is the right place for
> it.
Yes, this should be declared in xen/include/xen/sched.h.
> Is it really generic though if only the credit scheduler applies it?
The concept is generic enough that I think saying, "This is the control,
not all schedulers implement it" makes sense.  credit2 *may* in the
future have use for such a control; at which point it would be a bit
asinine to have two switches, named "credit_sched_ratelimit_us" and
"credit2_sched_ratelimit_us".
> > +    /* If we have schedule rate limiting enabled, check to see
> > +     * how long we've run for. */
> > +    if ( !tasklet_work_scheduled
> > +         && sched_ratelimit_us
> > +         && vcpu_runnable(current)
> > +         && !is_idle_vcpu(current)
> > +         && runtime < MICROSECS(sched_ratelimit_us) )
> > +    {
> > +        snext = scurr;
> > +        snext->start_time += now;
> > +        perfc_incr(delay_ms);
> > +        tslice = MICROSECS(sched_ratelimit_us);
> > +        ret.migrated = 0;
> > +        goto out;
> > +    }
> > +    else
> > +    {
> 
> This is the same comment as the next block below, does it really apply
> here too?
> 
> Since the previous if block ends with a goto the else clause is a bit
> redundant.
Yeah, probably better w/o the else.
> > +/* Default scheduling rate limit: 1ms 
> > + * It's not recommended to set this value bigger than "sched_credit_tslice_ms"
> > + * otherwise, something weired may happen
Hui, "something weird may happen" is fine for informal speech, but for
comments we tend to use more formal language.  I might say something
like this:

"The behavior when sched_ratelimit_us is greater than
sched_credit_tslice_ms is undefined."

 -George

George Dunlap

2011-Dec-20 10:26 UTC

Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency

On Tue, 2011-12-20 at 10:09 +0000, Dario Faggioli wrote:
> On Tue, 2011-12-20 at 17:44 +0800, Lv, Hui wrote:
> > > Is that constraint enforced anywhere?
> > 
> > Currently, the constraint is not enforced.
> > I think George gave the explanation in yesterday's discussion.
> > 
> I think that too, but maybe you can print some kind of warning when it
> happens?
I guess the consensus is that we should put in some effort to make the
interface more polished.

How about this:

First, add a ratelimit_us element to csched_priv, just like the current
tslice_ms element.

When the scheduler comes up (in csched_init), it checks to see if
MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms); if
so, it prints a warning, and sets prv->ratelimit_us to
1000*prv->tslice_ms.

In the future, when we implement the domctls to change a scheduler's
ratelimit_us, and tslice_ms, we disallow changes which would violate the
"ratelimit < tslice" rule (returning -EINVAL or something like that).

Thoughts?

 -George
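
The two behaviours proposed above (clamp with a warning at start-up, reject with -EINVAL on later changes) might look roughly like this. It is only a sketch under stated assumptions: struct credit_params, credit_params_init() and credit_params_set() are hypothetical names, not existing Xen interfaces, and equality (ratelimit equal to the timeslice) is treated as allowed because the proposed check uses ">".

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-scheduler settings, standing in for fields in the
 * credit scheduler's private data. */
struct credit_params {
    uint32_t tslice_ms;
    uint32_t ratelimit_us;
};

/*
 * Start-up path: a rate limit longer than the timeslice is clamped to the
 * timeslice and a warning is printed. Returns 1 if the value was clamped,
 * 0 otherwise.
 */
static int credit_params_init(struct credit_params *p,
                              uint32_t tslice_ms, uint32_t ratelimit_us)
{
    uint64_t tslice_us = (uint64_t)tslice_ms * 1000;

    p->tslice_ms = tslice_ms;
    if (ratelimit_us > tslice_us) {
        fprintf(stderr, "WARNING: ratelimit %u us exceeds timeslice %u ms; "
                "clamping\n", (unsigned)ratelimit_us, (unsigned)tslice_ms);
        /* Safe narrowing: tslice_us < ratelimit_us, which fits in 32 bits. */
        p->ratelimit_us = (uint32_t)tslice_us;
        return 1;
    }
    p->ratelimit_us = ratelimit_us;
    return 0;
}

/*
 * Run-time path: a change that would violate the rule is refused and the
 * stored settings are left untouched.
 */
static int credit_params_set(struct credit_params *p,
                             uint32_t tslice_ms, uint32_t ratelimit_us)
{
    if (tslice_ms == 0 || ratelimit_us > (uint64_t)tslice_ms * 1000)
        return -EINVAL;

    p->tslice_ms = tslice_ms;
    p->ratelimit_us = ratelimit_us;
    return 0;
}
```

Clamping at boot keeps a mistyped command-line value from stopping the system from coming up, while a run-time caller gets an explicit error and can retry with a valid pair.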

Lv, Hui

2011-Dec-20 12:14 UTC

Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency

> How about this:
> 
> First, add a ratelimit_us element to csched_priv, just like the current
> tslice_ms element.
> 
> When the scheduler comes up (in csched_init), it checks to see if
> MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms); if
> so, it prints a warning, and sets prv->ratelimit_us to 1000*prv->tslice_ms.
I think it makes sense to do so.

Hui Lv

2011-Dec-26 08:46 UTC

[PATCH] xen/sched_credit: Use delay to control scheduling frequency

Some modifications to this patch:
1. Based on George's proposal, added a ratelimit_us element to
csched_priv and constrained prv->ratelimit_us <= 1000 * prv->tslice_ms in
csched_init.
2. Moved the declaration of sched_ratelimit_us out of sched_credit.c into a
header (xen/include/xen/sched-if.h in the diff below).
3. Removed the redundant "else" clause.

Please let me know if you have any comments on these changes.


This patch can improve Xen performance:
1. Basically, the "delay method" can achieve an 11% overall performance
boost for SPECvirt compared with the original credit scheduler.
2. We have tried a 1ms delay and a 10ms delay; there is no big difference between
these two configurations. (1ms is enough to achieve good performance.)
3. We have compared response time/latency at different load levels (low, high,
peak); the "delay method" did not increase response time very much.
4. A 1ms delay can reduce context switches by 30% at peak performance, which is
where the benefit comes from. (int sched_ratelimit_us = 1000 is the recommended setting.)

Signed-off-by: Hui Lv <hui.lv@intel.com>
Signed-off-by: George Dunlap <George.Dunlap@eu.citrix.com>

diff -r a4bff36780a3 -r 18f40e1d0274 xen/common/sched_credit.c
--- a/xen/common/sched_credit.c	Fri Dec 16 18:46:27 2011 +0000
+++ b/xen/common/sched_credit.c	Mon Dec 26 03:44:39 2011 -0500
@@ -172,6 +172,7 @@ struct csched_private {
     uint32_t credit;
     int credit_balance;
     uint32_t runq_sort;
+    unsigned ratelimit_us;
     /* Period of master and tick in milliseconds */
     unsigned tslice_ms, tick_period_us, ticks_per_tslice;
     unsigned credits_per_tslice;
@@ -1297,10 +1298,15 @@ csched_schedule(
     struct csched_private *prv = CSCHED_PRIV(ops);
     struct csched_vcpu *snext;
     struct task_slice ret;
+    s_time_t runtime, tslice;
 
     CSCHED_STAT_CRANK(schedule);
     CSCHED_VCPU_CHECK(current);
 
+    runtime = now - current->runstate.state_entry_time;
+    if ( runtime < 0 ) /* Does this ever happen? */
+        runtime = 0;
+
     if ( !is_idle_vcpu(scurr->vcpu) )
     {
         /* Update credits of a non-idle VCPU. */
@@ -1313,6 +1319,35 @@ csched_schedule(
         scurr->pri = CSCHED_PRI_IDLE;
     }
 
+    /* Choices, choices:
+     * - If we have a tasklet, we need to run the idle vcpu no matter what.
+     * - If sched rate limiting is in effect, and the current vcpu has
+     *   run for less than that amount of time, continue the current one,
+     *   but with a shorter timeslice and return it immediately
+     * - Otherwise, chose the one with the highest priority (which may
+     *   be the one currently running)
+     * - If the currently running one is TS_OVER, see if there
+     *   is a higher priority one waiting on the runqueue of another
+     *   cpu and steal it.
+     */
+
+    /* If we have schedule rate limiting enabled, check to see
+     * how long we've run for. */
+    if ( !tasklet_work_scheduled
+         && prv->ratelimit_us
+         && vcpu_runnable(current)
+         && !is_idle_vcpu(current)
+         && runtime < MICROSECS(prv->ratelimit_us) )
+    {
+        snext = scurr;
+        snext->start_time += now;
+        perfc_incr(delay_ms);
+        tslice = MICROSECS(prv->ratelimit_us);
+        ret.migrated = 0;
+        goto out;
+    }
+    tslice = MILLISECS(prv->tslice_ms);
+
     /*
      * Select next runnable local VCPU (ie top of local runq)
      */
@@ -1367,11 +1402,12 @@ csched_schedule(
     if ( !is_idle_vcpu(snext->vcpu) )
         snext->start_time += now;
 
+out:
     /*
      * Return task to run next...
      */
     ret.time = (is_idle_vcpu(snext->vcpu) ?
-                -1 : MILLISECS(prv->tslice_ms));
+                -1 : tslice);
     ret.task = snext->vcpu;
 
     CSCHED_VCPU_CHECK(ret.task);
@@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops)
     prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice;
     prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms;
 
+    if ( MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms) )
+    {
+        printk("WARNING: sched_ratelimit_us >" 
+               "sched_credit_tslice_ms is undefined\n"
+               "ratelimit_us is set to 1000 * tslice_ms forcely\n");
+        prv->ratelimit_us = 1000 * prv->tslice_ms;
+    }
+    else
+        prv->ratelimit_us = sched_ratelimit_us;
     return 0;
 }
 
diff -r a4bff36780a3 -r 18f40e1d0274 xen/common/schedule.c
--- a/xen/common/schedule.c	Fri Dec 16 18:46:27 2011 +0000
+++ b/xen/common/schedule.c	Mon Dec 26 03:44:39 2011 -0500
@@ -47,6 +47,11 @@ string_param("sched", opt_sched);
 bool_t sched_smt_power_savings = 0;
 boolean_param("sched_smt_power_savings", sched_smt_power_savings);
 
+/* Default scheduling rate limit: 1ms 
+ * The behavior when sched_ratelimit_us is greater than sched_credit_tslice_ms is undefined
+ * */
+int sched_ratelimit_us = 1000;
+integer_param("sched_ratelimit_us", sched_ratelimit_us);
 /* Various timer handlers. */
 static void s_timer_fn(void *unused);
 static void vcpu_periodic_timer_fn(void *data);
diff -r a4bff36780a3 -r 18f40e1d0274 xen/include/xen/perfc_defn.h
--- a/xen/include/xen/perfc_defn.h	Fri Dec 16 18:46:27 2011 +0000
+++ b/xen/include/xen/perfc_defn.h	Mon Dec 26 03:44:39 2011 -0500
@@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq,              "sch
 PERFCOUNTER(sched_run,              "sched: runs through scheduler")
 PERFCOUNTER(sched_ctx,              "sched: context switches")
 
+PERFCOUNTER(delay_ms,               "csched: delay")
 PERFCOUNTER(vcpu_check,             "csched: vcpu_check")
 PERFCOUNTER(schedule,               "csched: schedule")
 PERFCOUNTER(acct_run,               "csched: acct_run")
diff -r a4bff36780a3 -r 18f40e1d0274 xen/include/xen/sched-if.h
--- a/xen/include/xen/sched-if.h	Fri Dec 16 18:46:27 2011 +0000
+++ b/xen/include/xen/sched-if.h	Mon Dec 26 03:44:39 2011 -0500
@@ -16,6 +16,11 @@ extern struct cpupool *cpupool0;
 /* cpus currently in no cpupool */
 extern cpumask_t cpupool_free_cpus;
 
+/* Scheduler generic parameters
+ * */
+extern int sched_ratelimit_us;
+
+
 /*
  * In order to allow a scheduler to remap the lock->cpu mapping,
  * we have a per-cpu pointer, along with a pre-allocated set of

Jan Beulich

2012-Jan-02 08:19 UTC

Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency

>>> On 26.12.11 at 09:46, Hui Lv <hui.lv@intel.com> wrote:
> Some modifications for this patch.
> 1.Based on George's proposal, added a ratelimit_us element to csched_priv to
> constrain prv->ratelimit_us <= 1000* prv->tslice_ms in csched_init.
I do not see why you need a per-scheduler-instance variable for
issuing the warning and correcting the value - one warning (during
the first initialization) is entirely sufficient. (Otoh the already existing
tslice_ms field is pointless currently too, as it never gets changed
after being initialized from sched_credit_tslice_ms - George?)

Jan
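
One possible reading of this suggestion, sketched under assumptions: if the global value itself is corrected during the first initialization, later scheduler instances see an already valid value, so the warning is printed only once and no per-instance copy is needed. The variables below are plain stand-ins for the boot parameters, and credit_init_instance() and warnings_printed are hypothetical names used only for this illustration.

```c
#include <stdio.h>

/* Stand-ins for the boot parameters discussed in the thread. */
static int sched_ratelimit_us = 1000;
static int sched_credit_tslice_ms = 30;

/* Demonstration counter: how many times the warning was issued. */
static unsigned int warnings_printed;

/*
 * Called once per scheduler instance. Assumes a positive timeslice.
 * The first call that finds an out-of-range value warns and corrects the
 * global; subsequent calls find nothing to correct.
 */
static void credit_init_instance(void)
{
    long long max_us = 1000LL * sched_credit_tslice_ms;

    if (sched_ratelimit_us > max_us) {
        fprintf(stderr, "WARNING: sched_ratelimit_us > sched_credit_tslice_ms; "
                "clamping ratelimit to %lld us\n", max_us);
        warnings_printed++;
        sched_ratelimit_us = (int)max_us;
    }
}
```

The trade-off is that every instance then shares one setting, which matches the observation above that tslice_ms is also never changed after initialization.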
> 2.Move the definition of sched_ratelimit_us to xen/sched.h
> 3.Remove the redundant "else" structure
> 
> See if you have any comment for such changes.
> 
> 
> This patch can improve Xen performance:
> 1. Basically, the "delay method" can achieve 11% overall performance boost
> for SPECvirt than original credit scheduler.
> 2. We have tried 1ms delay and 10ms delay, there is no big difference 
> between these two configurations. (1ms is enough to achieve a good 
> performance)
> 3. We have compared different load level response time/latency (low, high, 
> peak), "delay method" didn't bring very much response time increase.
> 4. 1ms delay can reduce 30% context switch at peak performance, where 
> produces the benefits. (int sched_ratelimit_us = 1000 is the recommended 
> setting)
> 
> Signed-off-by: Hui Lv <hui.lv@intel.com>
> Signed-off-by: George Dunlap <George.Dunlap@eu.citrix.com>
> 
> diff -r a4bff36780a3 -r 18f40e1d0274 xen/common/sched_credit.c
> --- a/xen/common/sched_credit.c	Fri Dec 16 18:46:27 2011 +0000
> +++ b/xen/common/sched_credit.c	Mon Dec 26 03:44:39 2011 -0500
> @@ -172,6 +172,7 @@ struct csched_private {
>      uint32_t credit;
>      int credit_balance;
>      uint32_t runq_sort;
> +    unsigned ratelimit_us;
>      /* Period of master and tick in milliseconds */
>      unsigned tslice_ms, tick_period_us, ticks_per_tslice;
>      unsigned credits_per_tslice;
> @@ -1297,10 +1298,15 @@ csched_schedule(
>      struct csched_private *prv = CSCHED_PRIV(ops);
>      struct csched_vcpu *snext;
>      struct task_slice ret;
> +    s_time_t runtime, tslice;
>  
>      CSCHED_STAT_CRANK(schedule);
>      CSCHED_VCPU_CHECK(current);
>  
> +    runtime = now - current->runstate.state_entry_time;
> +    if ( runtime < 0 ) /* Does this ever happen? */
> +        runtime = 0;
> +
>      if ( !is_idle_vcpu(scurr->vcpu) )
>      {
>          /* Update credits of a non-idle VCPU. */
> @@ -1313,6 +1319,35 @@ csched_schedule(
>          scurr->pri = CSCHED_PRI_IDLE;
>      }
>  
> +    /* Choices, choices:
> +     * - If we have a tasklet, we need to run the idle vcpu no matter what.
> +     * - If sched rate limiting is in effect, and the current vcpu has
> +     *   run for less than that amount of time, continue the current one,
> +     *   but with a shorter timeslice and return it immediately
> +     * - Otherwise, chose the one with the highest priority (which may
> +     *   be the one currently running)
> +     * - If the currently running one is TS_OVER, see if there
> +     *   is a higher priority one waiting on the runqueue of another
> +     *   cpu and steal it.
> +     */
> +
> +    /* If we have schedule rate limiting enabled, check to see
> +     * how long we've run for. */
> +    if ( !tasklet_work_scheduled
> +         && prv->ratelimit_us
> +         && vcpu_runnable(current)
> +         && !is_idle_vcpu(current)
> +         && runtime < MICROSECS(prv->ratelimit_us) )
> +    {
> +        snext = scurr;
> +        snext->start_time += now;
> +        perfc_incr(delay_ms);
> +        tslice = MICROSECS(prv->ratelimit_us);
> +        ret.migrated = 0;
> +        goto out;
> +    }
> +    tslice = MILLISECS(prv->tslice_ms);
> +
>      /*
>       * Select next runnable local VCPU (ie top of local runq)
>       */
> @@ -1367,11 +1402,12 @@ csched_schedule(
>      if ( !is_idle_vcpu(snext->vcpu) )
>          snext->start_time += now;
>  
> +out:
>      /*
>       * Return task to run next...
>       */
>      ret.time = (is_idle_vcpu(snext->vcpu) ?
> -                -1 : MILLISECS(prv->tslice_ms));
> +                -1 : tslice);
>      ret.task = snext->vcpu;
>  
>      CSCHED_VCPU_CHECK(ret.task);
> @@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops)
>      prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice;
>      prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms;
>  
> +    if ( MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms) )
> +    {
> +        printk("WARNING: sched_ratelimit_us >"
> +               "sched_credit_tslice_ms is undefined\n"
> +               "ratelimit_us is set to 1000 * tslice_ms forcely\n");
> +        prv->ratelimit_us = 1000 * prv->tslice_ms;
> +    }
> +    else
> +        prv->ratelimit_us = sched_ratelimit_us;
>      return 0;
>  }
>  
> diff -r a4bff36780a3 -r 18f40e1d0274 xen/common/schedule.c
> --- a/xen/common/schedule.c	Fri Dec 16 18:46:27 2011 +0000
> +++ b/xen/common/schedule.c	Mon Dec 26 03:44:39 2011 -0500
> @@ -47,6 +47,11 @@ string_param("sched", opt_sched);
>  bool_t sched_smt_power_savings = 0;
>  boolean_param("sched_smt_power_savings", sched_smt_power_savings);
>  
> +/* Default scheduling rate limit: 1ms 
> + * The behavior when sched_ratelimit_us is greater than sched_credit_tslice_ms is undefined
> + * */
> +int sched_ratelimit_us = 1000;
> +integer_param("sched_ratelimit_us", sched_ratelimit_us);
>  /* Various timer handlers. */
>  static void s_timer_fn(void *unused);
>  static void vcpu_periodic_timer_fn(void *data);
> diff -r a4bff36780a3 -r 18f40e1d0274 xen/include/xen/perfc_defn.h
> --- a/xen/include/xen/perfc_defn.h	Fri Dec 16 18:46:27 2011 +0000
> +++ b/xen/include/xen/perfc_defn.h	Mon Dec 26 03:44:39 2011 -0500
> @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq,              "sch
>  PERFCOUNTER(sched_run,              "sched: runs through scheduler")
>  PERFCOUNTER(sched_ctx,              "sched: context switches")
>  
> +PERFCOUNTER(delay_ms,               "csched: delay")
>  PERFCOUNTER(vcpu_check,             "csched: vcpu_check")
>  PERFCOUNTER(schedule,               "csched: schedule")
>  PERFCOUNTER(acct_run,               "csched: acct_run")
> diff -r a4bff36780a3 -r 18f40e1d0274 xen/include/xen/sched-if.h
> --- a/xen/include/xen/sched-if.h	Fri Dec 16 18:46:27 2011 +0000
> +++ b/xen/include/xen/sched-if.h	Mon Dec 26 03:44:39 2011 -0500
> @@ -16,6 +16,11 @@ extern struct cpupool *cpupool0;
>  /* cpus currently in no cpupool */
>  extern cpumask_t cpupool_free_cpus;
>  
> +/* Scheduler generic parameters
> + * */
> +extern int sched_ratelimit_us;
> +
> +
>  /*
>   * In order to allow a scheduler to remap the lock->cpu mapping,
>   * we have a per-cpu pointer, along with a pre-allocated set of

Lv, Hui

2012-Jan-05 04:08 UTC

Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency

George,

Any comments on this?

Is this the right way to issue the warning?

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Monday, January 02, 2012 4:19 PM
> To: Lv, Hui
> Cc: Ian.Campbell@citrix.com; George.Dunlap@eu.citrix.com;
> raistlin@linux.it; xen-devel@lists.xensource.com
> Subject: Re: [PATCH] xen/sched_credit: Use delay to control scheduling
> frequency
> 
> >>> On 26.12.11 at 09:46, Hui Lv <hui.lv@intel.com> wrote:
> > Some modifications for this patch.
> > 1.Based on George's proposal, added a ratelimit_us element to csched_priv to
> > constrain prv->ratelimit_us <= 1000* prv->tslice_ms in csched_init.
> 
> I do not see why you need a per-scheduler-instance variable for
> issuing the warning and correcting the value - one warning (during
> the first initialization) is entirely sufficient. (Otoh the already
> existing
> tslice_ms field is pointless currently too, as it never gets changed
> after being initialized from sched_credit_tslice_ms - George?)
> 
> Jan
> 
> > 2.Move the definition of sched_ratelimit_us to xen/sched.h
> > 3.Remove the redundant "else" structure
> >
> > See if you have any comment for such changes.
> >
> >
> > This patch can improve Xen performance:
> > 1. Basically, the "delay method" can achieve 11% overall performance boost
> > for SPECvirt than original credit scheduler.
> > 2. We have tried 1ms delay and 10ms delay, there is no big difference
> > between these two configurations. (1ms is enough to achieve a good
> > performance)
> > 3. We have compared different load level response time/latency (low, high,
> > peak), "delay method" didn't bring very much response time increase.
> > 4. 1ms delay can reduce 30% context switch at peak performance, where
> > produces the benefits. (int sched_ratelimit_us = 1000 is the recommended
> > setting)
> >
> > Signed-off-by: Hui Lv <hui.lv@intel.com>
> > Signed-off-by: George Dunlap <George.Dunlap@eu.citrix.com>
> >
> > diff -r a4bff36780a3 -r 18f40e1d0274 xen/common/sched_credit.c
> > --- a/xen/common/sched_credit.c	Fri Dec 16 18:46:27 2011 +0000
> > +++ b/xen/common/sched_credit.c	Mon Dec 26 03:44:39 2011 -0500
> > @@ -172,6 +172,7 @@ struct csched_private {
> >      uint32_t credit;
> >      int credit_balance;
> >      uint32_t runq_sort;
> > +    unsigned ratelimit_us;
> >      /* Period of master and tick in milliseconds */
> >      unsigned tslice_ms, tick_period_us, ticks_per_tslice;
> >      unsigned credits_per_tslice;
> > @@ -1297,10 +1298,15 @@ csched_schedule(
> >      struct csched_private *prv = CSCHED_PRIV(ops);
> >      struct csched_vcpu *snext;
> >      struct task_slice ret;
> > +    s_time_t runtime, tslice;
> >
> >      CSCHED_STAT_CRANK(schedule);
> >      CSCHED_VCPU_CHECK(current);
> >
> > +    runtime = now - current->runstate.state_entry_time;
> > +    if ( runtime < 0 ) /* Does this ever happen? */
> > +        runtime = 0;
> > +
> >      if ( !is_idle_vcpu(scurr->vcpu) )
> >      {
> >          /* Update credits of a non-idle VCPU. */
> > @@ -1313,6 +1319,35 @@ csched_schedule(
> >          scurr->pri = CSCHED_PRI_IDLE;
> >      }
> >
> > +    /* Choices, choices:
> > +     * - If we have a tasklet, we need to run the idle vcpu no matter what.
> > +     * - If sched rate limiting is in effect, and the current vcpu has
> > +     *   run for less than that amount of time, continue the current one,
> > +     *   but with a shorter timeslice and return it immediately
> > +     * - Otherwise, chose the one with the highest priority (which may
> > +     *   be the one currently running)
> > +     * - If the currently running one is TS_OVER, see if there
> > +     *   is a higher priority one waiting on the runqueue of another
> > +     *   cpu and steal it.
> > +     */
> > +
> > +    /* If we have schedule rate limiting enabled, check to see
> > +     * how long we've run for. */
> > +    if ( !tasklet_work_scheduled
> > +         && prv->ratelimit_us
> > +         && vcpu_runnable(current)
> > +         && !is_idle_vcpu(current)
> > +         && runtime < MICROSECS(prv->ratelimit_us) )
> > +    {
> > +        snext = scurr;
> > +        snext->start_time += now;
> > +        perfc_incr(delay_ms);
> > +        tslice = MICROSECS(prv->ratelimit_us);
> > +        ret.migrated = 0;
> > +        goto out;
> > +    }
> > +    tslice = MILLISECS(prv->tslice_ms);
> > +
> >      /*
> >       * Select next runnable local VCPU (ie top of local runq)
> >       */
> > @@ -1367,11 +1402,12 @@ csched_schedule(
> >      if ( !is_idle_vcpu(snext->vcpu) )
> >          snext->start_time += now;
> >
> > +out:
> >      /*
> >       * Return task to run next...
> >       */
> >      ret.time = (is_idle_vcpu(snext->vcpu) ?
> > -                -1 : MILLISECS(prv->tslice_ms));
> > +                -1 : tslice);
> >      ret.task = snext->vcpu;
> >
> >      CSCHED_VCPU_CHECK(ret.task);
> > @@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops)
> >      prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice;
> >      prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms;
> >
> > +    if ( MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms) )
> > +    {
> > +        printk("WARNING: sched_ratelimit_us >"
> > +               "sched_credit_tslice_ms is undefined\n"
> > +               "ratelimit_us is set to 1000 * tslice_ms forcely\n");
> > +        prv->ratelimit_us = 1000 * prv->tslice_ms;
> > +    }
> > +    else
> > +        prv->ratelimit_us = sched_ratelimit_us;
> >      return 0;
> >  }
> >
> > diff -r a4bff36780a3 -r 18f40e1d0274 xen/common/schedule.c
> > --- a/xen/common/schedule.c	Fri Dec 16 18:46:27 2011 +0000
> > +++ b/xen/common/schedule.c	Mon Dec 26 03:44:39 2011 -0500
> > @@ -47,6 +47,11 @@ string_param("sched", opt_sched);
> >  bool_t sched_smt_power_savings = 0;
> >  boolean_param("sched_smt_power_savings", sched_smt_power_savings);
> >
> > +/* Default scheduling rate limit: 1ms
> > + * The behavior when sched_ratelimit_us is greater than sched_credit_tslice_ms is undefined
> > + * */
> > +int sched_ratelimit_us = 1000;
> > +integer_param("sched_ratelimit_us", sched_ratelimit_us);
> >  /* Various timer handlers. */
> >  static void s_timer_fn(void *unused);
> >  static void vcpu_periodic_timer_fn(void *data);
> > diff -r a4bff36780a3 -r 18f40e1d0274 xen/include/xen/perfc_defn.h
> > --- a/xen/include/xen/perfc_defn.h	Fri Dec 16 18:46:27 2011 +0000
> > +++ b/xen/include/xen/perfc_defn.h	Mon Dec 26 03:44:39 2011 -0500
> > @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq,              "sch
> >  PERFCOUNTER(sched_run,              "sched: runs through scheduler")
> >  PERFCOUNTER(sched_ctx,              "sched: context switches")
> >
> > +PERFCOUNTER(delay_ms,               "csched: delay")
> >  PERFCOUNTER(vcpu_check,             "csched: vcpu_check")
> >  PERFCOUNTER(schedule,               "csched: schedule")
> >  PERFCOUNTER(acct_run,               "csched: acct_run")
> > diff -r a4bff36780a3 -r 18f40e1d0274 xen/include/xen/sched-if.h
> > --- a/xen/include/xen/sched-if.h	Fri Dec 16 18:46:27 2011 +0000
> > +++ b/xen/include/xen/sched-if.h	Mon Dec 26 03:44:39 2011 -0500
> > @@ -16,6 +16,11 @@ extern struct cpupool *cpupool0;
> >  /* cpus currently in no cpupool */
> >  extern cpumask_t cpupool_free_cpus;
> >
> > +/* Scheduler generic parameters
> > + * */
> > +extern int sched_ratelimit_us;
> > +
> > +
> >  /*
> >   * In order to allow a scheduler to remap the lock->cpu mapping,
> >   * we have a per-cpu pointer, along with a pre-allocated set of
>

George Dunlap

2012-Jan-06 19:50 UTC

Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency

On 02/01/12 03:19, Jan Beulich wrote:
>>>> On 26.12.11 at 09:46, Hui Lv<hui.lv@intel.com>  wrote:
>> Some modifications for this patch.
>> 1.Based on George's proposal, added a ratelimit_us element to csched_priv to
>> constrain prv->ratelimit_us<= 1000* prv->tslice_ms in csched_init.
> I do not see why you need a per-scheduler-instance variable for
> issuing the warning and correcting the value - one warning (during
> the first initialization) is entirely sufficient. (Otoh the already existing
> tslice_ms field is pointless currently too, as it never gets changed
> after being initialized from sched_credit_tslice_ms - George?)

Before the 4.2 release, I plan to add SYSCTL_scheduler_op calls for the
credit scheduler for both the timeslice and ratelimiting.  But I don't
think it's necessary for this patch to be admitted.

  -George

George Dunlap

2012-Jan-06 19:56 UTC

Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency

Sorry for the delay; just catching up after the Christmas holidays.

On 26/12/11 03:46, Hui Lv wrote:
> @@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops)
>       prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice;
>       prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms;
>
> +    if ( MICROSECS(sched_ratelimit_us)> MILLISECS(sched_credit_tslice_ms) )
> +    {
> +        printk("WARNING: sched_ratelimit_us>"
> +               "sched_credit_tslice_ms is undefined\n"
> +               "ratelimit_us is set to 1000 * tslice_ms forcely\n")

The standard idiom for this kind of message would be:
  WARNING [what's wrong]
  [What you're doing about it]

So the last sentence of the warning should be:
   Setting ratelimit_us to 1000 * tslice_ms

(Grammatically, you could say "Forcing ratelimit..." but I think "force"
is too strong in this case.)

Other than that, I'm happy with it, if everyone else is:

Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
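
For illustration, the clamp in csched_init with the two-part warning could be
sketched as a standalone C function. This is only a sketch under stated
assumptions, not the committed code: clamp_ratelimit_us is a hypothetical
helper, printf stands in for printk, and the MICROSECS/MILLISECS macros are
local stand-ins assumed to convert to nanoseconds. The sketch also puts a
space after ">" so the adjacent string literals do not run together when
concatenated.

```c
#include <assert.h>
#include <stdio.h>

/* Local stand-ins (assumption): both convert their argument to nanoseconds. */
#define MICROSECS(us) ((long long)(us) * 1000LL)
#define MILLISECS(ms) ((long long)(ms) * 1000000LL)

/*
 * Hypothetical helper: return the rate limit, in microseconds, clamped so
 * that it never exceeds the timeslice.  When clamping, the warning first
 * says what is wrong and then what is being done about it.
 */
static unsigned clamp_ratelimit_us(int ratelimit_us, int tslice_ms)
{
    assert(ratelimit_us >= 0);
    assert(tslice_ms > 0);

    if ( MICROSECS(ratelimit_us) > MILLISECS(tslice_ms) )
    {
        printf("WARNING: sched_ratelimit_us > "
               "sched_credit_tslice_ms is undefined\n"
               "Setting ratelimit_us to 1000 * tslice_ms\n");
        return 1000u * (unsigned)tslice_ms;
    }

    return (unsigned)ratelimit_us;
}
```

With a 30ms timeslice, clamp_ratelimit_us(1000, 30) returns the requested
1000, while clamp_ratelimit_us(50000, 30) prints the warning and returns
30000.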

Lv, Hui

2012-Jan-08 12:03 UTC

Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency

Thanks, George.
Should I send a revised version?
Can it be checked in?

George Dunlap

2012-Jan-09 09:49 UTC

Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency

On Sun, 2012-01-08 at 12:03 +0000, Lv, Hui wrote:
> Thanks, George.
> Should I send a revised version?
> Can it be checked in?

Yes, please send a revised version.

It will be checked in if one of the maintainers thinks there's enough
consensus (probably if no one has any more comments in the next day or
two).

Thanks,
 -George

Hui Lv

2012-Jan-09 10:22 UTC

[PATCH] xen/sched_credit: Use delay to control scheduling frequency

Updated the warning sentence for ratelimit_us.

This patch can improve Xen performance:
1. Basically, the "delay method" can achieve 11% overall performance
boost for SPECvirt than original credit scheduler.
2. We have tried 1ms delay and 10ms delay, there is no big difference between
these two configurations. (1ms is enough to achieve a good performance)
3. We have compared different load level response time/latency (low, high,
peak), "delay method" didn't bring very much response time increase.
4. 1ms delay can reduce 30% context switch at peak performance, where produces
the benefits. (int sched_ratelimit_us = 1000 is the recommended setting)

Signed-off-by: Hui Lv <hui.lv@intel.com>
Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>

Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/sched_credit.c
--- a/xen/common/sched_credit.c	Fri Dec 16 18:46:27 2011 +0000
+++ b/xen/common/sched_credit.c	Mon Jan 09 05:21:35 2012 -0500
@@ -172,6 +172,7 @@ struct csched_private {
     uint32_t credit;
     int credit_balance;
     uint32_t runq_sort;
+    unsigned ratelimit_us;
     /* Period of master and tick in milliseconds */
     unsigned tslice_ms, tick_period_us, ticks_per_tslice;
     unsigned credits_per_tslice;
@@ -1297,10 +1298,15 @@ csched_schedule(
     struct csched_private *prv = CSCHED_PRIV(ops);
     struct csched_vcpu *snext;
     struct task_slice ret;
+    s_time_t runtime, tslice;
 
     CSCHED_STAT_CRANK(schedule);
     CSCHED_VCPU_CHECK(current);
 
+    runtime = now - current->runstate.state_entry_time;
+    if ( runtime < 0 ) /* Does this ever happen? */
+        runtime = 0;
+
     if ( !is_idle_vcpu(scurr->vcpu) )
     {
         /* Update credits of a non-idle VCPU. */
@@ -1313,6 +1319,35 @@ csched_schedule(
         scurr->pri = CSCHED_PRI_IDLE;
     }
 
+    /* Choices, choices:
+     * - If we have a tasklet, we need to run the idle vcpu no matter what.
+     * - If sched rate limiting is in effect, and the current vcpu has
+     *   run for less than that amount of time, continue the current one,
+     *   but with a shorter timeslice and return it immediately
+     * - Otherwise, chose the one with the highest priority (which may
+     *   be the one currently running)
+     * - If the currently running one is TS_OVER, see if there
+     *   is a higher priority one waiting on the runqueue of another
+     *   cpu and steal it.
+     */
+
+    /* If we have schedule rate limiting enabled, check to see
+     * how long we've run for. */
+    if ( !tasklet_work_scheduled
+         && prv->ratelimit_us
+         && vcpu_runnable(current)
+         && !is_idle_vcpu(current)
+         && runtime < MICROSECS(prv->ratelimit_us) )
+    {
+        snext = scurr;
+        snext->start_time += now;
+        perfc_incr(delay_ms);
+        tslice = MICROSECS(prv->ratelimit_us);
+        ret.migrated = 0;
+        goto out;
+    }
+    tslice = MILLISECS(prv->tslice_ms);
+
     /*
      * Select next runnable local VCPU (ie top of local runq)
      */
@@ -1367,11 +1402,12 @@ csched_schedule(
     if ( !is_idle_vcpu(snext->vcpu) )
         snext->start_time += now;
 
+out:
     /*
      * Return task to run next...
      */
     ret.time = (is_idle_vcpu(snext->vcpu) ?
-                -1 : MILLISECS(prv->tslice_ms));
+                -1 : tslice);
     ret.task = snext->vcpu;
 
     CSCHED_VCPU_CHECK(ret.task);
@@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops)
     prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice;
     prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms;
 
+    if ( MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms) )
+    {
+        printk("WARNING: sched_ratelimit_us >" 
+               "sched_credit_tslice_ms is undefined\n"
+               "Setting ratelimit_us to 1000 * tslice_ms\n");
+        prv->ratelimit_us = 1000 * prv->tslice_ms;
+    }
+    else
+        prv->ratelimit_us = sched_ratelimit_us;
     return 0;
 }
 
diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/schedule.c
--- a/xen/common/schedule.c	Fri Dec 16 18:46:27 2011 +0000
+++ b/xen/common/schedule.c	Mon Jan 09 05:21:35 2012 -0500
@@ -47,6 +47,11 @@ string_param("sched", opt_sched);
 bool_t sched_smt_power_savings = 0;
 boolean_param("sched_smt_power_savings", sched_smt_power_savings);
 
+/* Default scheduling rate limit: 1ms
+ * The behavior when sched_ratelimit_us is greater than sched_credit_tslice_ms is undefined
+ * */
+int sched_ratelimit_us = 1000;
+integer_param("sched_ratelimit_us", sched_ratelimit_us);
 /* Various timer handlers. */
 static void s_timer_fn(void *unused);
 static void vcpu_periodic_timer_fn(void *data);
diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/perfc_defn.h
--- a/xen/include/xen/perfc_defn.h	Fri Dec 16 18:46:27 2011 +0000
+++ b/xen/include/xen/perfc_defn.h	Mon Jan 09 05:21:35 2012 -0500
@@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq,              "sch
 PERFCOUNTER(sched_run,              "sched: runs through scheduler")
 PERFCOUNTER(sched_ctx,              "sched: context switches")
 
+PERFCOUNTER(delay_ms,               "csched: delay")
 PERFCOUNTER(vcpu_check,             "csched: vcpu_check")
 PERFCOUNTER(schedule,               "csched: schedule")
 PERFCOUNTER(acct_run,               "csched: acct_run")
diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/sched-if.h
--- a/xen/include/xen/sched-if.h	Fri Dec 16 18:46:27 2011 +0000
+++ b/xen/include/xen/sched-if.h	Mon Jan 09 05:21:35 2012 -0500
@@ -16,6 +16,11 @@ extern struct cpupool *cpupool0;
 /* cpus currently in no cpupool */
 extern cpumask_t cpupool_free_cpus;
 
+/* Scheduler generic parameters
+ * */
+extern int sched_ratelimit_us;
+
+
 /*
  * In order to allow a scheduler to remap the lock->cpu mapping,
  * we have a per-cpu pointer, along with a pre-allocated set of
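
The rate-limit check that the patch adds to csched_schedule() can be sketched
in isolation as below. This is a simplified model under stated assumptions,
not the scheduler itself: struct sched_inputs, struct sched_decision and
ratelimit_decide are hypothetical names, the booleans stand in for
tasklet_work_scheduled, vcpu_runnable(current) and is_idle_vcpu(current), and
the time macros are local stand-ins assumed to convert to nanoseconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef int64_t s_time_t;                 /* assumed: time in nanoseconds */
#define MICROSECS(us) ((s_time_t)(us) * 1000)
#define MILLISECS(ms) ((s_time_t)(ms) * 1000000)

/* Hypothetical, simplified view of what the check consults. */
struct sched_inputs {
    s_time_t now;                 /* current time */
    s_time_t state_entry_time;    /* when the current vcpu entered its state */
    bool tasklet_work_scheduled;  /* a tasklet must run on the idle vcpu */
    bool current_runnable;
    bool current_is_idle;
    unsigned ratelimit_us;        /* 0 disables rate limiting */
    unsigned tslice_ms;
};

struct sched_decision {
    bool keep_current;  /* true: keep running current, skip runq selection */
    s_time_t tslice;    /* timeslice handed back to the generic scheduler */
};

static struct sched_decision ratelimit_decide(const struct sched_inputs *in)
{
    struct sched_decision d;
    s_time_t runtime;

    assert(in != 0);

    /* How long the current vcpu has run; guard against a negative delta. */
    runtime = in->now - in->state_entry_time;
    if ( runtime < 0 )
        runtime = 0;

    d.keep_current = !in->tasklet_work_scheduled
                     && in->ratelimit_us
                     && in->current_runnable
                     && !in->current_is_idle
                     && runtime < MICROSECS(in->ratelimit_us);

    /* As in the patch, a continued vcpu gets a ratelimit-sized slice. */
    d.tslice = d.keep_current ? MICROSECS(in->ratelimit_us)
                              : MILLISECS(in->tslice_ms);
    return d;
}
```

With ratelimit_us = 1000 and tslice_ms = 30, a vcpu that has run for 500us is
kept with a 1ms slice, while one that has run for 1500us falls through to
normal selection with the full 30ms slice.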

George Dunlap

2012-Jan-10 10:05 UTC

Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency

On Mon, Jan 9, 2012 at 10:22 AM, Hui Lv <hui.lv@intel.com> wrote:
> Updated the warning sentence for ratelimit_us.
>
> This patch can improve Xen performance:
> 1. Basically, the "delay method" can achieve 11% overall performance boost
> for SPECvirt than original credit scheduler.
> 2. We have tried 1ms delay and 10ms delay, there is no big difference between
> these two configurations. (1ms is enough to achieve a good performance)
> 3. We have compared different load level response time/latency (low, high,
> peak), "delay method" didn't bring very much response time increase.
> 4. 1ms delay can reduce 30% context switch at peak performance, where produces
> the benefits. (int sched_ratelimit_us = 1000 is the recommended setting)
>
> Signed-off-by: Hui Lv <hui.lv@intel.com>
> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
>
> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

Just confirming this:
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>

Thanks, Hui.
 -George

George Dunlap

2012-Jan-24 12:01 UTC

Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency

Ping?

On Tue, Jan 10, 2012 at 10:05 AM, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> On Mon, Jan 9, 2012 at 10:22 AM, Hui Lv <hui.lv@intel.com> wrote:
>> Updated the warning sentence for ratelimit_us.
>>
>> This patch can improve Xen performance:
>> 1. Basically, the "delay method" can achieve 11% overall performance boost
>> for SPECvirt than original credit scheduler.
>> 2. We have tried 1ms delay and 10ms delay, there is no big difference between
>> these two configurations. (1ms is enough to achieve a good performance)
>> 3. We have compared different load level response time/latency (low, high,
>> peak), "delay method" didn't bring very much response time increase.
>> 4. 1ms delay can reduce 30% context switch at peak performance, where produces
>> the benefits. (int sched_ratelimit_us = 1000 is the recommended setting)
>>
>> Signed-off-by: Hui Lv <hui.lv@intel.com>
>> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
>>
>> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
>
> Just confirming this:
> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
>
> Thanks, Hui.
>  -George

Keir Fraser

2012-Jan-24 14:17 UTC

Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency

On 24/01/2012 12:01, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:
> Ping?

It's been in the tree for a week.

> On Tue, Jan 10, 2012 at 10:05 AM, George Dunlap
> <George.Dunlap@eu.citrix.com> wrote:
>> On Mon, Jan 9, 2012 at 10:22 AM, Hui Lv <hui.lv@intel.com> wrote:
>>> Updated the warning sentence for ratelimit_us.
>>> 
>>> This patch can improve Xen performance:
>>> 1. Basically, the "delay method" can achieve 11% overall performance boost
>>> for SPECvirt than original credit scheduler.
>>> 2. We have tried 1ms delay and 10ms delay, there is no big difference
>>> between these two configurations. (1ms is enough to achieve a good
>>> performance)
>>> 3. We have compared different load level response time/latency (low, high,
>>> peak), "delay method" didn't bring very much response time increase.
>>> 4. 1ms delay can reduce 30% context switch at peak performance, where
>>> produces the benefits. (int sched_ratelimit_us = 1000 is the recommended
>>> setting)
>>> 
>>> Signed-off-by: Hui Lv <hui.lv@intel.com>
>>> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
>>> 
>>> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
>> 
>> Just confirming this:
>> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
>> 
>> Thanks, Hui.
>>  -George
>> 
>>> 
>>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/sched_credit.c
>>> --- a/xen/common/sched_credit.c Fri Dec 16 18:46:27 2011 +0000
>>> +++ b/xen/common/sched_credit.c Mon Jan 09 05:21:35 2012 -0500
>>> @@ -172,6 +172,7 @@ struct csched_private {
>>>     uint32_t credit;
>>>     int credit_balance;
>>>     uint32_t runq_sort;
>>> +    unsigned ratelimit_us;
>>>     /* Period of master and tick in milliseconds */
>>>     unsigned tslice_ms, tick_period_us, ticks_per_tslice;
>>>     unsigned credits_per_tslice;
>>> @@ -1297,10 +1298,15 @@ csched_schedule(
>>>     struct csched_private *prv = CSCHED_PRIV(ops);
>>>     struct csched_vcpu *snext;
>>>     struct task_slice ret;
>>> +    s_time_t runtime, tslice;
>>> 
>>>     CSCHED_STAT_CRANK(schedule);
>>>     CSCHED_VCPU_CHECK(current);
>>> 
>>> +    runtime = now - current->runstate.state_entry_time;
>>> +    if ( runtime < 0 ) /* Does this ever happen? */
>>> +        runtime = 0;
>>> +
>>>     if ( !is_idle_vcpu(scurr->vcpu) )
>>>     {
>>>         /* Update credits of a non-idle VCPU. */
>>> @@ -1313,6 +1319,35 @@ csched_schedule(
>>>         scurr->pri = CSCHED_PRI_IDLE;
>>>     }
>>> 
>>> +    /* Choices, choices:
>>> +     * - If we have a tasklet, we need to run the idle vcpu no matter what.
>>> +     * - If sched rate limiting is in effect, and the current vcpu has
>>> +     *   run for less than that amount of time, continue the current one,
>>> +     *   but with a shorter timeslice and return it immediately
>>> +     * - Otherwise, chose the one with the highest priority (which may
>>> +     *   be the one currently running)
>>> +     * - If the currently running one is TS_OVER, see if there
>>> +     *   is a higher priority one waiting on the runqueue of another
>>> +     *   cpu and steal it.
>>> +     */
>>> +
>>> +    /* If we have schedule rate limiting enabled, check to see
>>> +     * how long we've run for. */
>>> +    if ( !tasklet_work_scheduled
>>> +         && prv->ratelimit_us
>>> +         && vcpu_runnable(current)
>>> +         && !is_idle_vcpu(current)
>>> +         && runtime < MICROSECS(prv->ratelimit_us) )
>>> +    {
>>> +        snext = scurr;
>>> +        snext->start_time += now;
>>> +        perfc_incr(delay_ms);
>>> +        tslice = MICROSECS(prv->ratelimit_us);
>>> +        ret.migrated = 0;
>>> +        goto out;
>>> +    }
>>> +    tslice = MILLISECS(prv->tslice_ms);
>>> +
>>>     /*
>>>      * Select next runnable local VCPU (ie top of local runq)
>>>      */
>>> @@ -1367,11 +1402,12 @@ csched_schedule(
>>>     if ( !is_idle_vcpu(snext->vcpu) )
>>>         snext->start_time += now;
>>> 
>>> +out:
>>>     /*
>>>      * Return task to run next...
>>>      */
>>>     ret.time = (is_idle_vcpu(snext->vcpu) ?
>>> -                -1 : MILLISECS(prv->tslice_ms));
>>> +                -1 : tslice);
>>>     ret.task = snext->vcpu;
>>> 
>>>     CSCHED_VCPU_CHECK(ret.task);
>>> @@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops)
>>>     prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice;
>>>     prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms;
>>> 
>>> +    if ( MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms) )
>>> +    {
>>> +        printk("WARNING: sched_ratelimit_us >"
>>> +               "sched_credit_tslice_ms is undefined\n"
>>> +               "Setting ratelimit_us to 1000 * tslice_ms\n");
>>> +        prv->ratelimit_us = 1000 * prv->tslice_ms;
>>> +    }
>>> +    else
>>> +        prv->ratelimit_us = sched_ratelimit_us;
>>>     return 0;
>>>  }
>>> 
>>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/schedule.c
>>> --- a/xen/common/schedule.c     Fri Dec 16 18:46:27 2011 +0000
>>> +++ b/xen/common/schedule.c     Mon Jan 09 05:21:35 2012 -0500
>>> @@ -47,6 +47,11 @@ string_param("sched", opt_sched);
>>>  bool_t sched_smt_power_savings = 0;
>>>  boolean_param("sched_smt_power_savings", sched_smt_power_savings);
>>> 
>>> +/* Default scheduling rate limit: 1ms
>>> + * The behavior when sched_ratelimit_us is greater than sched_credit_tslice_ms is undefined
>>> + * */
>>> +int sched_ratelimit_us = 1000;
>>> +integer_param("sched_ratelimit_us", sched_ratelimit_us);
>>>  /* Various timer handlers. */
>>>  static void s_timer_fn(void *unused);
>>>  static void vcpu_periodic_timer_fn(void *data);
>>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/perfc_defn.h
>>> --- a/xen/include/xen/perfc_defn.h      Fri Dec 16 18:46:27 2011 +0000
>>> +++ b/xen/include/xen/perfc_defn.h      Mon Jan 09 05:21:35 2012 -0500
>>> @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq,              "sch
>>>  PERFCOUNTER(sched_run,              "sched: runs through scheduler")
>>>  PERFCOUNTER(sched_ctx,              "sched: context switches")
>>> 
>>> +PERFCOUNTER(delay_ms,               "csched: delay")
>>>  PERFCOUNTER(vcpu_check,             "csched: vcpu_check")
>>>  PERFCOUNTER(schedule,               "csched: schedule")
>>>  PERFCOUNTER(acct_run,               "csched: acct_run")
>>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/sched-if.h
>>> --- a/xen/include/xen/sched-if.h        Fri Dec 16 18:46:27 2011 +0000
>>> +++ b/xen/include/xen/sched-if.h        Mon Jan 09 05:21:35 2012 -0500
>>> @@ -16,6 +16,11 @@ extern struct cpupool *cpupool0;
>>>  /* cpus currently in no cpupool */
>>>  extern cpumask_t cpupool_free_cpus;
>>> 
>>> +/* Scheduler generic parameters
>>> + * */
>>> +extern int sched_ratelimit_us;
>>> +
>>> +
>>>  /*
>>>  * In order to allow a scheduler to remap the lock->cpu mapping,
>>>  * we have a per-cpu pointer, along with a pre-allocated set of
>>> 
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel@lists.xensource.com
>>> http://lists.xensource.com/xen-devel
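
The rate-limit check added to csched_schedule() in the patch above can be read in isolation as a small decision function. The following is only a minimal standalone sketch of that logic, not the Xen code itself: struct sched_state, its fields, and keep_current() are hypothetical names made up for illustration, and times are plain microsecond integers rather than s_time_t.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the state the patch consults (not a Xen type). */
struct sched_state {
    int64_t  runtime_us;      /* how long the current vcpu has run so far */
    uint32_t ratelimit_us;    /* 0 disables rate limiting */
    uint32_t tslice_ms;       /* normal credit timeslice */
    bool     tasklet_pending; /* tasklet work means the idle vcpu must run */
    bool     runnable;        /* current vcpu is still runnable */
    bool     is_idle;         /* current vcpu is the idle vcpu */
};

/*
 * Returns true when the current vcpu should simply keep running, and
 * stores the timeslice to grant (in microseconds) in *tslice_us.
 */
static bool keep_current(const struct sched_state *s, int64_t *tslice_us)
{
    /* Clamp a negative runtime to zero, as the patch does. */
    int64_t runtime = s->runtime_us < 0 ? 0 : s->runtime_us;

    if ( !s->tasklet_pending
         && s->ratelimit_us
         && s->runnable
         && !s->is_idle
         && runtime < (int64_t)s->ratelimit_us )
    {
        /* Rate limited: continue the current vcpu with a short slice. */
        *tslice_us = s->ratelimit_us;
        return true;
    }

    /* Otherwise fall through to normal selection with the full slice. */
    *tslice_us = (int64_t)s->tslice_ms * 1000;
    return false;
}
```

With the default sched_ratelimit_us of 1000 and an assumed 30ms timeslice, a vcpu that has run for 300us keeps the CPU for a further 1000us slice, while one that has already run for 1500us goes through normal runqueue selection.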

George Dunlap

2012-Jan-24 14:31 UTC

Re: [PATCH] xen/sched_credit: Use delay to control scheduling frequency

On Tue, 2012-01-24 at 14:17 +0000, Keir Fraser wrote:
> On 24/01/2012 12:01, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:
> 
> > Ping?
> 
> It's been in the tree for a week.

Sorry -- I was sure I looked for it and didn't see it; but now I look
again, there it is. :-/

 -George
> 
> > On Tue, Jan 10, 2012 at 10:05 AM, George Dunlap
> > <George.Dunlap@eu.citrix.com> wrote:
> >> On Mon, Jan 9, 2012 at 10:22 AM, Hui Lv <hui.lv@intel.com> wrote:
> >>> Updated the warning sentence for ratelimit_us.
> >>> 
> >>> This patch can improve Xen performance:
> >>> 1. Basically, the "delay method" can achieve 11% overall performance boost
> >>> for SPECvirt than original credit scheduler.
> >>> 2. We have tried 1ms delay and 10ms delay, there is no big difference
> >>> between these two configurations. (1ms is enough to achieve a good
> >>> performance)
> >>> 3. We have compared different load level response time/latency (low, high,
> >>> peak), "delay method" didn't bring very much response time increase.
> >>> 4. 1ms delay can reduce 30% context switch at peak performance, where
> >>> produces the benefits. (int sched_ratelimit_us = 1000 is the recommended
> >>> setting)
> >>> 
> >>> Signed-off-by: Hui Lv <hui.lv@intel.com>
> >>> Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
> >>> 
> >>> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
> >> 
> >> Just confirming this:
> >> Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
> >> 
> >> Thanks, Hui.
> >>  -George
> >> 
> >>> 
> >>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/sched_credit.c
> >>> --- a/xen/common/sched_credit.c Fri Dec 16 18:46:27 2011 +0000
> >>> +++ b/xen/common/sched_credit.c Mon Jan 09 05:21:35 2012 -0500
> >>> @@ -172,6 +172,7 @@ struct csched_private {
> >>>     uint32_t credit;
> >>>     int credit_balance;
> >>>     uint32_t runq_sort;
> >>> +    unsigned ratelimit_us;
> >>>     /* Period of master and tick in milliseconds */
> >>>     unsigned tslice_ms, tick_period_us, ticks_per_tslice;
> >>>     unsigned credits_per_tslice;
> >>> @@ -1297,10 +1298,15 @@ csched_schedule(
> >>>     struct csched_private *prv = CSCHED_PRIV(ops);
> >>>     struct csched_vcpu *snext;
> >>>     struct task_slice ret;
> >>> +    s_time_t runtime, tslice;
> >>> 
> >>>     CSCHED_STAT_CRANK(schedule);
> >>>     CSCHED_VCPU_CHECK(current);
> >>> 
> >>> +    runtime = now - current->runstate.state_entry_time;
> >>> +    if ( runtime < 0 ) /* Does this ever happen? */
> >>> +        runtime = 0;
> >>> +
> >>>     if ( !is_idle_vcpu(scurr->vcpu) )
> >>>     {
> >>>         /* Update credits of a non-idle VCPU. */
> >>> @@ -1313,6 +1319,35 @@ csched_schedule(
> >>>         scurr->pri = CSCHED_PRI_IDLE;
> >>>     }
> >>> 
> >>> +    /* Choices, choices:
> >>> +     * - If we have a tasklet, we need to run the idle vcpu no matter what.
> >>> +     * - If sched rate limiting is in effect, and the current vcpu has
> >>> +     *   run for less than that amount of time, continue the current one,
> >>> +     *   but with a shorter timeslice and return it immediately
> >>> +     * - Otherwise, chose the one with the highest priority (which may
> >>> +     *   be the one currently running)
> >>> +     * - If the currently running one is TS_OVER, see if there
> >>> +     *   is a higher priority one waiting on the runqueue of another
> >>> +     *   cpu and steal it.
> >>> +     */
> >>> +
> >>> +    /* If we have schedule rate limiting enabled, check to see
> >>> +     * how long we've run for. */
> >>> +    if ( !tasklet_work_scheduled
> >>> +         && prv->ratelimit_us
> >>> +         && vcpu_runnable(current)
> >>> +         && !is_idle_vcpu(current)
> >>> +         && runtime < MICROSECS(prv->ratelimit_us) )
> >>> +    {
> >>> +        snext = scurr;
> >>> +        snext->start_time += now;
> >>> +        perfc_incr(delay_ms);
> >>> +        tslice = MICROSECS(prv->ratelimit_us);
> >>> +        ret.migrated = 0;
> >>> +        goto out;
> >>> +    }
> >>> +    tslice = MILLISECS(prv->tslice_ms);
> >>> +
> >>>     /*
> >>>      * Select next runnable local VCPU (ie top of local runq)
> >>>      */
> >>> @@ -1367,11 +1402,12 @@ csched_schedule(
> >>>     if ( !is_idle_vcpu(snext->vcpu) )
> >>>         snext->start_time += now;
> >>> 
> >>> +out:
> >>>     /*
> >>>      * Return task to run next...
> >>>      */
> >>>     ret.time = (is_idle_vcpu(snext->vcpu) ?
> >>> -                -1 : MILLISECS(prv->tslice_ms));
> >>> +                -1 : tslice);
> >>>     ret.task = snext->vcpu;
> >>> 
> >>>     CSCHED_VCPU_CHECK(ret.task);
> >>> @@ -1533,6 +1569,15 @@ csched_init(struct scheduler *ops)
> >>>     prv->tick_period_us = prv->tslice_ms * 1000 / prv->ticks_per_tslice;
> >>>     prv->credits_per_tslice = CSCHED_CREDITS_PER_MSEC * prv->tslice_ms;
> >>> 
> >>> +    if ( MICROSECS(sched_ratelimit_us) > MILLISECS(sched_credit_tslice_ms) )
> >>> +    {
> >>> +        printk("WARNING: sched_ratelimit_us >"
> >>> +               "sched_credit_tslice_ms is undefined\n"
> >>> +               "Setting ratelimit_us to 1000 * tslice_ms\n");
> >>> +        prv->ratelimit_us = 1000 * prv->tslice_ms;
> >>> +    }
> >>> +    else
> >>> +        prv->ratelimit_us = sched_ratelimit_us;
> >>>     return 0;
> >>>  }
> >>> 
> >>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/common/schedule.c
> >>> --- a/xen/common/schedule.c     Fri Dec 16 18:46:27 2011 +0000
> >>> +++ b/xen/common/schedule.c     Mon Jan 09 05:21:35 2012 -0500
> >>> @@ -47,6 +47,11 @@ string_param("sched", opt_sched);
> >>>  bool_t sched_smt_power_savings = 0;
> >>>  boolean_param("sched_smt_power_savings", sched_smt_power_savings);
> >>> 
> >>> +/* Default scheduling rate limit: 1ms
> >>> + * The behavior when sched_ratelimit_us is greater than sched_credit_tslice_ms is undefined
> >>> + * */
> >>> +int sched_ratelimit_us = 1000;
> >>> +integer_param("sched_ratelimit_us", sched_ratelimit_us);
> >>>  /* Various timer handlers. */
> >>>  static void s_timer_fn(void *unused);
> >>>  static void vcpu_periodic_timer_fn(void *data);
> >>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/perfc_defn.h
> >>> --- a/xen/include/xen/perfc_defn.h      Fri Dec 16 18:46:27 2011 +0000
> >>> +++ b/xen/include/xen/perfc_defn.h      Mon Jan 09 05:21:35 2012 -0500
> >>> @@ -16,6 +16,7 @@ PERFCOUNTER(sched_irq,              "sch
> >>>  PERFCOUNTER(sched_run,              "sched: runs through scheduler")
> >>>  PERFCOUNTER(sched_ctx,              "sched: context switches")
> >>> 
> >>> +PERFCOUNTER(delay_ms,               "csched: delay")
> >>>  PERFCOUNTER(vcpu_check,             "csched: vcpu_check")
> >>>  PERFCOUNTER(schedule,               "csched: schedule")
> >>>  PERFCOUNTER(acct_run,               "csched: acct_run")
> >>> diff -r a4bff36780a3 -r fe8d0ca867aa xen/include/xen/sched-if.h
> >>> --- a/xen/include/xen/sched-if.h        Fri Dec 16 18:46:27 2011 +0000
> >>> +++ b/xen/include/xen/sched-if.h        Mon Jan 09 05:21:35 2012 -0500
> >>> @@ -16,6 +16,11 @@ extern struct cpupool *cpupool0;
> >>>  /* cpus currently in no cpupool */
> >>>  extern cpumask_t cpupool_free_cpus;
> >>> 
> >>> +/* Scheduler generic parameters
> >>> + * */
> >>> +extern int sched_ratelimit_us;
> >>> +
> >>> +
> >>>  /*
> >>>  * In order to allow a scheduler to remap the lock->cpu mapping,
> >>>  * we have a per-cpu pointer, along with a pre-allocated set of
> >>> 
> >>> _______________________________________________
> >>> Xen-devel mailing list
> >>> Xen-devel@lists.xensource.com
> >>> http://lists.xensource.com/xen-devel
> 
>
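
The csched_init() hunk quoted above caps the rate limit at the timeslice length when the boot parameter asks for more. Below is a rough standalone sketch of that clamping, assuming plain unsigned integers; effective_ratelimit_us() is a hypothetical helper name, and fprintf() stands in for the hypervisor's printk().

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: returns the rate limit (in microseconds) that the
 * scheduler would actually use, given the requested limit and timeslice.
 */
static uint32_t effective_ratelimit_us(uint32_t ratelimit_us, uint32_t tslice_ms)
{
    /* Compare in 64 bits so tslice_ms * 1000 cannot overflow. */
    uint64_t tslice_us = (uint64_t)tslice_ms * 1000;

    if ( (uint64_t)ratelimit_us > tslice_us )
    {
        /* A limit longer than the timeslice is capped at the timeslice. */
        fprintf(stderr, "WARNING: ratelimit_us > tslice; "
                        "setting ratelimit_us to 1000 * tslice_ms\n");
        return (uint32_t)tslice_us;
    }

    return ratelimit_us;
}
```

For example, a requested limit of 50000us with an assumed 30ms timeslice would be capped to 30000us, while the default of 1000us passes through unchanged. A value of 0 also passes through, which leaves rate limiting disabled.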
