As one of the topics presented at Xen Summit 2011 in Santa Clara, we proposed a scheduler rate controller (SRC) to limit excessively frequent scheduling under certain conditions. You can find the slides at
http://www.slideshare.net/xen_com_mgr/9-hui-lvtacklingthemanagementchallengesofserverconsolidationonmulticoresystems

We have tested it on a 2-socket multi-core system over many rounds and got positive results: it improves performance greatly, both with the consolidation workload SPECvirt_sc2010 and with smaller workloads such as sysbench and SPECjbb. So I am posting it here for review.

In the Xen scheduling mechanism, the hypervisor kicks the related VCPUs by raising the schedule softirq while processing external interrupts. Therefore, if the number of IRQs is very large, scheduling happens more frequently. Frequent scheduling
1) brings more overhead for the hypervisor and
2) increases the cache miss rate.

In our consolidation workload, SPECvirt_sc2010, SR-IOV and an iSCSI solution are adopted to bypass software emulation, but they bring heavy network traffic. Correspondingly, about 15k schedules happened per second on each physical core, which means the average running time is very short -- only 60us. We propose SRC in Xen to mitigate this problem. The performance benefit brought by this patch is very large at peak throughput, with no influence when system load is low.

SRC improved SPECvirt performance by 14%.
1) It reduced CPU utilization, which allows more load to be added.
2) Response time (QoS) became better at the same CPU %.
3) The better response time allowed us to push the CPU % at peak performance to an even higher level (the CPU was not saturated in SPECvirt).
SRC reduced the context switch rate significantly, which resulted in
1) a smaller path length,
2) fewer cache misses and thus a lower CPI, and
3) better performance on both the guest and hypervisor sides.

With this patch, from our SPECvirt_sc2010 results, the performance of Xen catches up with the other open-source hypervisor.

Signed-off-by: Hui Lv <hui.lv@intel.com>

diff -ruNp xen.org/common/schedule.c xen/common/schedule.c
--- xen.org/common/schedule.c	2011-10-20 03:29:44.000000000 -0400
+++ xen/common/schedule.c	2011-10-23 21:41:14.000000000 -0400
@@ -98,6 +98,31 @@ static inline void trace_runstate_change
     __trace_var(event, 1/*tsc*/, sizeof(d), &d);
 }
 
+/*
+ * opt_sched_rate_control: parameter to turn the scheduler rate controller (SRC) on/off.
+ * opt_sched_rate_high: scheduling frequency threshold; the default value is 50.
+ *
+ * We suggest setting opt_sched_rate_high to a value larger than 50.
+ * If the number of schedules counted during SCHED_SRC_INTERVAL (default 10
+ * milliseconds) is larger than opt_sched_rate_high, SRC kicks in.
+ */
+bool_t opt_sched_rate_control = 0;
+unsigned int opt_sched_rate_high = 50;
+boolean_param("sched_rate_control", opt_sched_rate_control);
+integer_param("sched_rate_high", opt_sched_rate_high);
+
+/*
+ * src_controller() is the scheduling rate controller (SRC). It is triggered
+ * when the scheduling frequency is excessively high (larger than opt_sched_rate_high).
+ *
+ * Rules to control the scheduling frequency:
+ * 1) If the number of schedules (sd->s_csnum), counted over a period of
+ *    SCHED_SRC_INTERVAL, is larger than the threshold opt_sched_rate_high,
+ *    SRC is enabled by setting sd->s_src_control = 1.
+ * 2) While SRC is enabled, the previous vcpu is returned directly if it is
+ *    still runnable and is not the idle vcpu. This decreases the scheduling
+ *    frequency when it is excessive.
+ */
+void src_controller(struct schedule_data *sd, struct vcpu *prev, s_time_t now);
+
 static inline void trace_continue_running(struct vcpu *v)
 {
@@ -1033,6 +1058,29 @@ static void vcpu_periodic_timer_work(str
     set_timer(&v->periodic_timer, periodic_next_event);
 }
 
+void src_controller(struct schedule_data *sd, struct vcpu *prev, s_time_t now)
+{
+    sd->s_csnum++;
+    if ( (now - sd->s_src_loop_begin) >= MILLISECS(SCHED_SRC_INTERVAL) )
+    {
+        if ( sd->s_csnum >= opt_sched_rate_high )
+            sd->s_src_control = 1;
+        else
+            sd->s_src_control = 0;
+        sd->s_src_loop_begin = now;
+        sd->s_csnum = 0;
+    }
+    if ( sd->s_src_control )
+    {
+        if ( !is_idle_vcpu(prev) && vcpu_runnable(prev) )
+        {
+            perfc_incr(sched_src);
+            return continue_running(prev);
+        }
+        perfc_incr(sched_nosrc);
+    }
+}
+
 /*
  * The main function
  * - deschedule the current domain (scheduler independent).
@@ -1054,6 +1102,8 @@ static void schedule(void)
 
     sd = &this_cpu(schedule_data);
 
+    if ( opt_sched_rate_control )
+        src_controller(sd, prev, now);
     /* Update tasklet scheduling status. */
     switch ( *tasklet_work )
     {
@@ -1197,6 +1247,9 @@ static int cpu_schedule_up(unsigned int
     sd->curr = idle_vcpu[cpu];
     init_timer(&sd->s_timer, s_timer_fn, NULL, cpu);
     atomic_set(&sd->urgent_count, 0);
+    sd->s_csnum = 0;
+    sd->s_src_loop_begin = NOW();
+    sd->s_src_control = 0;
 
     /* Boot CPU is dealt with later in schedule_init(). */
     if ( cpu == 0 )
diff -ruNp xen.org/include/xen/perfc_defn.h xen/include/xen/perfc_defn.h
--- xen.org/include/xen/perfc_defn.h	2011-10-20 03:29:44.000000000 -0400
+++ xen/include/xen/perfc_defn.h	2011-10-23 21:08:28.000000000 -0400
@@ -15,6 +15,8 @@ PERFCOUNTER(ipis,                   "#IP
 PERFCOUNTER(sched_irq,              "sched: timer")
 PERFCOUNTER(sched_run,              "sched: runs through scheduler")
 PERFCOUNTER(sched_ctx,              "sched: context switches")
+PERFCOUNTER(sched_src,              "sched: src triggered")
+PERFCOUNTER(sched_nosrc,            "sched: src not triggered")
 
 PERFCOUNTER(vcpu_check,             "csched: vcpu_check")
 PERFCOUNTER(schedule,               "csched: schedule")
diff -ruNp xen.org/include/xen/sched-if.h xen/include/xen/sched-if.h
--- xen.org/include/xen/sched-if.h	2011-10-20 03:29:44.000000000 -0400
+++ xen/include/xen/sched-if.h	2011-10-23 21:20:57.000000000 -0400
@@ -15,6 +15,11 @@ extern struct cpupool *cpupool0;
 /* cpus currently in no cpupool */
 extern cpumask_t cpupool_free_cpus;
 
+/* SRC decides whether to trigger the scheduling controller by comparing the
+ * scheduling frequency, counted during SCHED_SRC_INTERVAL, against the
+ * threshold opt_sched_rate_high. The suggested SCHED_SRC_INTERVAL is 10 (ms).
+ */
+#define SCHED_SRC_INTERVAL 10
 
 /*
  * In order to allow a scheduler to remap the lock->cpu mapping,
@@ -32,6 +37,9 @@ struct schedule_data {
     struct vcpu        *curr;             /* current task */
     void               *sched_priv;
     struct timer        s_timer;          /* scheduling timer */
+    int                 s_csnum;          /* number of schedules in the last period */
+    s_time_t            s_src_loop_begin; /* SRC counting start point */
+    bool_t              s_src_control;    /* whether SRC should be triggered */
     atomic_t            urgent_count;     /* how many urgent vcpus */
 };
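For anyone who wants to try it, the two knobs registered above would be enabled from the Xen boot line roughly as follows. This is only an illustrative sketch: the threshold of 100 is an arbitrary example value, and the exact bootloader stanza will differ per installation.

    kernel /boot/xen.gz sched_rate_control=1 sched_rate_high=100

With a perf-counter-enabled build, the new sched_src / sched_nosrc counters can then be inspected (for example via the xenperf utility) to see how often the controller actually fires.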
On Mon, Oct 24, 2011 at 4:36 AM, Lv, Hui <hui.lv@intel.com> wrote:
> As one of the topics presented at Xen Summit 2011 in Santa Clara, we proposed a scheduler rate controller (SRC) to limit excessively frequent scheduling under certain conditions.
> [...]
> With this patch, from our SPECvirt_sc2010 results, the performance of Xen catches up with the other open-source hypervisor.

Hui,

Thanks for the patch, and the work you've done testing it. There are a couple of things to discuss.

* I'm not sure I like the idea of doing this at the generic level rather than at the specific scheduler level -- e.g., inside of credit1. For better or for worse, all aspects of scheduling work together, and even small changes tend to have a significant effect on the emergent behavior. I understand why you'd want this in the generic scheduling code; but it seems like it would be better for each scheduler to implement a rate control independently.

* The actual algorithm you use here isn't described. It seems to be as follows (please correct me if I've made a mistake reverse-engineering the algorithm): Every 10ms, check to see if there have been more than 50 schedules. If so, disable pre-emption entirely for 10ms, allowing processes to run without being interrupted (unless they yield).

It seems like we should be able to do better. For one, it means in the general case you will flip back and forth between really frequent schedules and less frequent schedules. For two, turning off preemption entirely will mean that whatever vcpu happens to be running could, if it wished, run for the full 10ms; and which one got elected to do that would be really random. This may work well for SPECvirt, but it's the kind of algorithm that is likely to have some workloads on which it works very poorly.

Finally, there's the chance that this algorithm could be "gamed" -- i.e., if a rogue VM knew that most other VMs yielded frequently, it might be able to arrange that there would always be more than 50 context switches per 10ms interval, while it runs without preemption and takes up more than its fair share.

Have you tried just making it give each vcpu a minimum amount of scheduling time, say, 500us or 1ms?

Now a couple of stylistic comments:
* src tends to make me think of "source". I think sched_rate[_*] would fit the existing naming convention better.
* src_controller() shouldn't call continue_running() directly. Instead, schedule() should call src_controller(); and only call sched->do_schedule() if src_controller() returns false (or something like that).
* Whatever the algorithm is should have comments describing what it does and how it's supposed to work.
* Your patch is malformed; you need to have it apply at the top level, not from within the xen/ subdirectory. The easiest way to get a patch is to use either mercurial queues, or "hg diff". There are some good suggestions for making and posting patches here: http://wiki.xensource.com/xenwiki/SubmittingXenPatches

Thanks again for all your work on this -- we definitely want Xen to beat the other open-source hypervisor. :-)

 -George
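For what it's worth, the structural change suggested above (a rate-control helper acting as a pure predicate, with schedule() deciding what to do) might look roughly like the sketch below. The helper name, the 1ms constant and the use of runstate.state_entry_time are illustrative assumptions rather than anything posted in this thread, and locking and tracing are omitted.

    /* Return 1 if prev should simply keep running, 0 if a full schedule is needed. */
    static bool_t sched_rate_limited(struct vcpu *prev, s_time_t now)
    {
        s_time_t ran = now - prev->runstate.state_entry_time;  /* how long prev has run */

        if ( is_idle_vcpu(prev) || !vcpu_runnable(prev) )
            return 0;

        return ran < MICROSECS(1000);  /* illustrative 1ms minimum timeslice */
    }

    /* In schedule(), before asking the per-scheduler do_schedule() for a decision: */
    if ( opt_sched_rate_control && sched_rate_limited(prev, now) )
    {
        set_timer(&sd->s_timer, now + MICROSECS(1000));  /* revisit once the minimum elapses */
        return continue_running(prev);
    }
    next_slice = sched->do_schedule(sched, now, tasklet_work_scheduled);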
On 24/10/2011 17:17, "George Dunlap" <George.Dunlap@eu.citrix.com> wrote:
> * I'm not sure I like the idea of doing this at the generic level rather
> than at the specific scheduler level -- e.g., inside of credit1. For
> better or for worse, all aspects of scheduling work together, and even
> small changes tend to have a significant effect on the emergent
> behavior. I understand why you'd want this in the generic scheduling
> code; but it seems like it would be better for each scheduler to
> implement a rate control independently.

Yes, this doesn't belong in schedule.c.

 -- Keir
> -----Original Message-----
> From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of George Dunlap
> Sent: Tuesday, October 25, 2011 12:17 AM
> To: Lv, Hui
> Cc: xen-devel@lists.xensource.com; Duan, Jiangang; Tian, Kevin; keir@xen.org; Dong, Eddie
> Subject: Re: [Xen-devel] [PATCH] scheduler rate controller
>
> [...]
>
> * The actual algorithm you use here isn't described. It seems to be
> as follows (please correct me if I've made a mistake
> reverse-engineering the algorithm):
>
> Every 10ms, check to see if there have been more than 50 schedules.
> If so, disable pre-emption entirely for 10ms, allowing processes to
> run without being interrupted (unless they yield).

Sorry for the lack of description. You are right about the control process.

> It seems like we should be able to do better. For one, it means in
> the general case you will flip back and forth between really frequent
> schedules and less frequent schedules. For two, turning off
> preemption entirely will mean that whatever vcpu happens to be running
> could, if it wished, run for the full 10ms; and which one got elected
> to do that would be really random. This may work well for SPECvirt,
> but it's the kind of algorithm that is likely to have some workloads
> on which it works very poorly. Finally, there's the chance that this
> algorithm could be "gamed" -- i.e., if a rogue VM knew that most other
> VMs yielded frequently, it might be able to arrange that there would
> always be more than 50 context switches per 10ms interval, while it runs
> without preemption and takes up more than its fair share.

Yes, I agree there is more to do to make a more refined solution in the next step. For example, we could consider per-VM status when deciding whether to turn the control on or off, to make it fairer -- as in your third point. However, as a first step, the current solution is straightforward and effective.

1) Most importantly, it only kicks in when the scheduling frequency is excessive. The user can decide what counts as excessive by setting the parameter "opt_sched_rate_high" (default 50). If the system is crucial for latency-sensitive tasks, you can choose a higher value so that this patch has little impact on them; users can decide which value is good for their environment. In our experience, though, when the scheduling frequency is excessive it also impairs the QoS of latency-sensitive tasks, because they are frequently interrupted by other VMs.

2) Under the excessive-scheduling condition, preemption is turned off entirely. If the currently running vcpu yields frequently, it cannot run for the full 10ms. If it does not yield frequently, it can possibly run for up to 10ms. That means this algorithm roughly protects a non-yielding vcpu so that it can run a long time slice without preemption. This is something similar to your third point, but in a rough way. :)

3) Finally, this patch aims to solve the problem when the scheduling frequency is excessive, without influencing the normal (lower-frequency) case. We should treat these two cases separately, since the excessive-scheduling case can guarantee neither performance nor QoS.

> Have you tried just making it give each vcpu a minimum amount of
> scheduling time, say, 500us or 1ms?
>
> Now a couple of stylistic comments:
> * src tends to make me think of "source". I think sched_rate[_*]
> would fit the existing naming convention better.
> * src_controller() shouldn't call continue_running() directly.
> Instead, schedule() should call src_controller(); and only call
> sched->do_schedule() if src_controller() returns false (or something
> like that).
> * Whatever the algorithm is should have comments describing what it
> does and how it's supposed to work.
> * Your patch is malformed; you need to have it apply at the top level,
> not from within the xen/ subdirectory. The easiest way to get a patch
> is to use either mercurial queues, or "hg diff". There are some good
> suggestions for making and posting patches here:
> http://wiki.xensource.com/xenwiki/SubmittingXenPatches

Thanks for the kind information. I think the next version will be better. :)

> Thanks again for all your work on this -- we definitely want Xen to
> beat the other open-source hypervisor. :-)
>
> -George
Thanks, Dario, for your helpful comments.

> Just something crossed my mind reading the patch and the comments:
> would it make sense to rate-limit the calls coming from (non-timer)
> interrupt exit paths while still letting the tick trigger a
> scheduling decision? This just to be sure that at least the time slice
> enforcing (if any) happens as expected... Could it make sense?

Yes, it makes sense. But currently we lack that knowledge in the scheduler -- what caused the schedule, a timer or an interrupt. Can we get at that?

> More generally speaking, I see how this feature can be useful, and I
> also think it could live in the generic schedule.c code, but (as George
> was saying) the algorithm by which rate-limiting is happening needs to
> be well known, documented and exposed to the user (more than by means
> of a couple of perf-counters).

One question: what is the right place to document such information? I'd like to make it as clear as possible to users.

> For example this might completely destroy the time guarantees a
> scheduler like sEDF would give, and in such case it must be easy
> enough to figure out what's going on and why the scheduler is not
> behaving as expected!
>
> For that reason, although again, this could be made general enough to
> be sensible and meaningful for all the various schedulers, it might be
> worthwhile to have it inside credit1 for now, where we know it will
> probably yield the most of its benefits.

I think I get your point. More care should be taken to avoid disasters for any of the existing schedulers. I'm fine with moving it into credit1 at the current stage. :)

> Just my 2 cents. :-)
>
> Thanks and Regards,
> Dario
>
> --
> <<This happens because I choose it to happen!>> (Raistlin Majere)
> ----------------------------------------------------------------------
> Dario Faggioli, http://retis.sssup.it/people/faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
> PhD Candidate, ReTiS Lab, Scuola Superiore Sant'Anna, Pisa (Italy)
On Fri, 2011-10-28 at 11:09 +0100, Dario Faggioli wrote:
> Not sure yet, I can imagine it's tricky and I need to dig a bit more in
> the code, but I'll let you know if I find a way of doing that...

There are lots of reasons why the SCHEDULE_SOFTIRQ gets raised. But I think we want to focus on the scheduler itself raising it as a result of the .wake() callback. Whether the .wake() happens as a result of a HW interrupt or something else, I don't think really matters.

Dario and Hui, neither of you have commented on my idea, which is simply: don't preempt a VM if it has run for less than some amount of time (say, 500us or 1ms). If a higher-priority VM is woken up, see how long the current VM has run. If it's less than 1ms, set a 1ms timer and call schedule() then.

> > > [...]
> >
> > One question: what is the right place to document such information? I'd like to make it
> > as clear as possible to users.
> >
> Well, I don't know; maybe a WARN (a WARN_ONCE-alike thing would probably
> be better), or in general something that leaves a footstep in the logs,
> so that one can find out by means of `xl dmesg' or related. Obviously,
> I'm not suggesting printk-ing each suppressed schedule invocation, or
> the overhead would get even worse... :-P
>
> I'm thinking of something that happens the very first time the limiting
> fires, or maybe once per some period/number of suppressions, just to remind
> the user that he's getting weird behaviour because _he_enabled_
> rate-limiting. Hopefully, that might also be useful for the user itself
> to fine tune the limiting parameters, although I think the perf-counters
> are already quite well suited for this.

As much as possible, we want the system to Just Work. Under normal circumstances it wouldn't be too unusual for a VM to have a several-ms delay between receiving a physical interrupt and being scheduled; I think that if the 1ms delay works, having it on all the time would probably be the best solution. That's another reason I'm in favor of trying it -- it's simple and easy to understand, and doesn't require detecting when to "turn it on".

 -George
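As a rough illustration of the wake-side idea above (not code from any posted patch): when a wakeup would normally preempt the running vcpu, the scheduler could check how long that vcpu has run and, if it is under the minimum, arm the per-cpu scheduling timer instead of raising the softirq immediately. The prev_start field below is a hypothetical addition; cpu_raise_softirq(), set_timer() and per_cpu(schedule_data, cpu) are existing Xen interfaces.

    /* Sketch only: preempt now, or defer the preemption until 1ms has elapsed. */
    static void tickle_cpu_ratelimited(unsigned int cpu, s_time_t now)
    {
        /* 'prev_start' = when the currently running vcpu went in (hypothetical field) */
        s_time_t ran = now - per_cpu(schedule_data, cpu).prev_start;

        if ( ran >= MICROSECS(1000) )
            cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);        /* preempt immediately */
        else
            set_timer(&per_cpu(schedule_data, cpu).s_timer,  /* preempt once 1ms is up */
                      now + (MICROSECS(1000) - ran));
    }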
I have tried an approach very similar to your idea:
1) Check whether the currently running vcpu has run for less than 1ms; if so, return the current vcpu directly without preemption.
It tries to guarantee that a vcpu can run for as long as 1ms if it wants to. It can reduce the scheduling frequency to some degree, but not very significantly, because 1ms is too weak compared with the 10ms window the SRC patch uses.

As you said, if we apply the several-ms delay, it applies whether the system is in the normal or the excessive-frequency state. That can have the consequences that 1) under the normal condition it produces worse QoS than without such a delay, and 2) under the excessive-frequency condition the mitigation effect of a 1ms delay may be too weak. In addition, your idea is to delay scheduling instead of reducing it, which means the total number of schedules would probably not change.

I think one possible solution is to make the 1ms delay value adaptive according to the system status (low load or high load). In that sense, the SRC patch just covers the excessive condition currently :). That's why I mentioned treating the normal and excessive conditions separately and influencing the normal case as little as possible -- we never know the consequences without a large amount of testing work. :)

Some of my stupid thinking :)

Best regards,

Lv, Hui


-----Original Message-----
From: George Dunlap [mailto:george.dunlap@citrix.com]
Sent: Saturday, October 29, 2011 12:19 AM
To: Dario Faggioli
Cc: Lv, Hui; George Dunlap; Duan, Jiangang; Tian, Kevin; xen-devel@lists.xensource.com; Keir (Xen.org); Dong, Eddie
Subject: RE: [Xen-devel] [PATCH] scheduler rate controller

[...]
On Sat, Oct 29, 2011 at 11:05 AM, Lv, Hui <hui.lv@intel.com> wrote:
> I have tried an approach very similar to your idea:
> 1) Check whether the currently running vcpu has run for less than 1ms; if so, return the current vcpu directly without preemption.
> It tries to guarantee that a vcpu can run for as long as 1ms if it wants to. It can reduce the scheduling frequency to some degree, but not very significantly, because 1ms is too weak compared with the 10ms window the SRC patch uses.

Hey Hui,

Sorry for the delay in response -- FYI I'm at the XenSummit Korea now, and I'll be on holiday next week.

Do you have the patch that you wrote for the 1ms delay handy, and any numbers that you ran? I'm a bit surprised that a 1ms delay didn't have much effect; but in any case, it seems dialing that up should have a similar effect -- e.g., if we changed that to 10ms, then it should have a similar effect to the patch that you sent before.

> As you said, if we apply the several-ms delay, it applies whether the system is in the normal or the excessive-frequency state. That can have the consequences that 1) under the normal condition it produces worse QoS than without such a delay,

Perhaps; but the current credit scheduler may already allow a VM to run exclusively for 30ms, so I don't think that overall it should have a big influence.

> 2) under the excessive-frequency condition the mitigation effect of a 1ms delay may be too weak. In addition, your idea is to delay scheduling instead of reducing it, which means the total number of schedules would probably not change.

Well, it will prevent preemption; so as long as at least one VM does not yield, it will reduce the number of schedule events to 1000 times per second. If all VMs yield, then you can't really reduce the number of scheduling events anyway (even with your preemption-disable patch).

> I think one possible solution is to make the 1ms delay value adaptive according to the system status (low load or high load). In that sense, the SRC patch just covers the excessive condition currently :). That's why I mentioned treating the normal and excessive conditions separately and influencing the normal case as little as possible -- we never know the consequences without a large amount of testing work. :)

Yes, exactly. :-)

> Some of my stupid thinking :)

Well, you've obviously done a lot more looking recently than I have. :-)

I'm attaching a prototype minimum timeslice patch that I threw together last week. It currently hangs during boot, but it will give you the idea of what I was thinking of.

Hui, can you let me know what you think of the idea, and if you find it interesting, could you try to fix it up, and test it? Testing it with bigger values like 5ms would be really interesting.

 -George

> [...]
> Hey Hui, Sorry for the delay in response -- FYI I'm at the XenSummit Korea
> now, and I'll be on holiday next week.

Have a good trip in Korea and enjoy the holiday that follows! And say hi to everyone there. :)

> I'm attaching a prototype minimum timeslice patch that I threw together last
> week. It currently hangs during boot, but it will give you the idea of what I
> was thinking of.
>
> Hui, can you let me know what you think of the idea, and if you find it
> interesting, could you try to fix it up, and test it? Testing it with bigger values
> like 5ms would be really interesting.

I agree that this idea seems more natural and proper, if it can solve the two problems I addressed above. We need data to prove or disprove it. As you mentioned, this method is supposed to give a similar result to the patch I sent when the delay value is set to 10ms under the excessive condition.

So an idea came to me that may strengthen your proposal:
1. We still count the number of schedules during each period (for example 10ms).
2. This count is used to adaptively decide the delay value. For example, if the number of schedules is very excessive, we can set a longer delay time, such as 5ms or 10ms. If the number is small, we can set a small delay time, such as 1ms, 500us or even zero. In this way, the delay value is decided adaptively.

I think it can solve the possible problems I addressed above. George, what do you think of this?

I'd like to try this and see the result, and maybe compare the results of the different solutions. As you know, the SPECvirt workload is complex enough that I need some time to produce this :). We also have a set of small workloads for quick testing.
On Fri, Nov 4, 2011 at 2:08 PM, Lv, Hui <hui.lv@intel.com> wrote:
> So an idea came to me that may strengthen your proposal:
> 1. We still count the number of schedules during each period (for example 10ms).
> 2. This count is used to adaptively decide the delay value.
> For example, if the number of schedules is very excessive, we can set a longer delay time, such as 5ms or 10ms. If the number is small, we can set a small delay time, such as 1ms, 500us or even zero. In this way, the delay value is decided adaptively.

Setting the value adaptively is good, but only if it's adapting to the right thing. :-) For instance, adapting to the number of cache misses, or to the latency requirements of guests, seems like a good idea. But adapting to the number of scheduling events in the last period doesn't seem very useful -- especially since our whole goal is to change the number of scheduling events to be fewer. :-)

> I'd like to try this and see the result, and maybe compare the results of the different solutions. As you know, the SPECvirt workload is complex enough that I need some time to produce this :).

I've heard that; thanks for doing the work.

> We also have a set of small workloads for quick testing.

What kinds of workloads are these? Our performance team here is also trying to develop a lighter-weight (i.e., easier to set up and run) benchmark for scalability / consolidation. Hopefully once they get that up and running I can test the scheduling characteristics as well.

Peace,
 -George
On Fri, Nov 4, 2011 at 2:08 PM, Lv, Hui <hui.lv@intel.com> wrote:
> I'd like to try this and see the result, and maybe compare the results of the different solutions. As you know, the SPECvirt workload is complex enough that I need some time to produce this :).
> We also have a set of small workloads for quick testing.

BTW, did you take any traces when running SPECvirt for your previous tests? If you could take some traces of SPECvirt whenever you get to doing the new tests, that would be really helpful in understanding both how the SPECvirt benchmark behaves, and how we can best tweak the scheduler so that it runs better.

Thanks,
 -George
Hi George,

Sorry for the late reply. I also met issues booting Xen with your patch, which is the same as the credit_3.patch that I attached. So I modified it into credit_1.patch and credit_2.patch, both of which work well.
1) credit_1 adopts "scheduling frequency counting" to decide the value of sched_ratelimit_us, which makes it adaptive.
2) credit_2 adopts a constant sched_ratelimit_us value of 1000.

Although the performance comparison data is still in progress, I want to hear some feedback from you first. I should be able to share the data very soon, once the system becomes stable.

Best regards,

Lv, Hui


-----Original Message-----
From: dunlapg@gmail.com [mailto:dunlapg@gmail.com] On Behalf Of George Dunlap
Sent: Thursday, November 03, 2011 12:29 PM
To: Lv, Hui
Cc: George Dunlap; Dario Faggioli; Tian, Kevin; xen-devel@lists.xensource.com; Keir (Xen.org); Dong, Eddie; Duan, Jiangang
Subject: Re: [Xen-devel] [PATCH] scheduler rate controller

[...]
On Mon, Nov 28, 2011 at 5:31 PM, Lv, Hui <hui.lv@intel.com> wrote:
> Sorry for the late reply. I also met issues booting Xen with your patch, which is the same as the credit_3.patch that I attached.

Thanks Hui; debugging someone else's buggy code is going beyond expectations. :-)

> So I modified it into credit_1.patch and credit_2.patch, both of which work well.

Standard patches for OSS development need to be -p1, not -p0; they need to work if, in the toplevel of the directory (foo.hg), you type "patch -p1 < bar.patch". The easiest way to make one of these patches is to use "hg diff"; the best way (if you're using Mercurial) is to use Mercurial queues.

> 2) credit_2 adopts a constant sched_ratelimit_us value of 1000.

Looks fine overall. One issue with the patch is that it will not only fail to preempt for a higher-priority vcpu, it will also fail to preempt for tasklets. Tasklet work must be done immediately. Perhaps we can add "!tasklet_work_scheduled" to the list of conditions for taking the ratelimit path.

Why did you change "MICROSECS" to "MILLISECS" when calculating the timeslice? In this case, it will set the timeslice to a full second! Not what we want...

From a software maintenance perspective, I'm not a fan of early returns from functions. I think it's too easy not to notice that there's a different return path. In this case, I think I'd prefer adding a label, and using "goto out;" instead.

> 1) credit_1 adopts "scheduling frequency counting" to decide the value of sched_ratelimit_us, which makes it adaptive.

If you were using mercurial queues, you could put this after the last one, and it would be easier to see the proposed "adaptive" part of the code. :-)

Hypervisors are very complicated; it's best to keep things as absolutely simple as possible. This kind of mechanism is exactly the sort of thing that makes it very hard to predict what will happen. I think unless you can show that it adds a significant benefit, it's better just to use the min timeslice.

Regarding this particular code, a couple of items, just for feedback:
* All of the ratelimiting code and data structures should be in the pluggable scheduler, not in common code.
* This code hard-codes '1000' as the value it sets the global variable to, overriding whatever the user may have entered on the command line.
* Furthermore, the global variable is shared by all of the cpus; meaning, you may have one cpu enabling it one moment based on its own conditions, and another cpu disabling it almost immediately afterwards, based on conditions on that cpu. If you're testing with this at the moment, you might as well stop -- you're going to get a pretty random result. If you really wanted this to be an adaptive solution, you'd need to make a per-cpu variable with the per-cpu rate scheduling; and then set it to the global variable (which is the user configuration).
* Finally, this patch doesn't distinguish between schedules that happen due to a yield, and schedules that happen due to preemption. The only schedules we have any control over are schedules that happen due to preemption. If adaptivity has any value, it should pay attention to what it can control.

I've taken your two patches, given them the proper formatting, and made them into a patch series (the first adding the ratelimiting, the second adding the adaptive bit); they are attached. You should be able to easily pull them into a mercurial patchqueue using "hg qimport /path/to/patches/*.diff". In the future, I will not look at any patches which do not apply using either "patch -p1" or "hg qimport."

Thanks again for all your work on this -- I hope we can get a simple, effective solution in place soon.

 -George
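To make the shape of that feedback concrete, the rate-limit check inside credit1's csched_schedule() would presumably end up looking something like the sketch below. This illustrates the review points (tasklet check, MICROSECS rather than MILLISECS, a single exit via a label) and is not the contents of the attached patches; prv->ratelimit_us, scurr, runtime, tslice and the out label are assumed to exist in the surrounding function.

    /* Early in csched_schedule(), once 'runtime' for the current vcpu is known: */
    if ( !tasklet_work_scheduled                     /* tasklet work must still preempt */
         && prv->ratelimit_us                        /* 0 disables rate limiting */
         && vcpu_runnable(current)
         && !is_idle_vcpu(current)
         && runtime < MICROSECS(prv->ratelimit_us) ) /* MICROSECS, not MILLISECS */
    {
        snext = scurr;                               /* keep running the current vcpu */
        tslice = MICROSECS(prv->ratelimit_us) - runtime;
        goto out;                                    /* one exit path, no early return */
    }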
Hi George,

Thank you very much for your feedback. I have finished the measurement work based on the delay method. From the comparable results, a 1ms delay can do as well as the SRC patch: a significant performance boost without obvious drawbacks.

1. Basically, the "delay method" can achieve nearly the same benefits as my previous SRC patch: an 11% overall performance boost for SPECvirt over the original credit scheduler.
2. I have tried a 1ms delay and a 10ms delay; there is no big difference between these two configurations. (1ms is enough to achieve good performance.)
3. I have compared the response time/latency at different load levels (low, high, peak); the "delay method" didn't bring much QoS increase.
4. The 1ms delay can reduce context switches by 30% at peak performance, which is where the benefits come from.

You can find the raw data in the attached Excel file. The attached credit_1.diff patch works stably on my side.

> Looks fine overall. One issue with the patch is that it will not only fail to
> preempt for a higher-priority vcpu, it will also fail to preempt for tasklets.
> Tasklet work must be done immediately. Perhaps we can add
> "!tasklet_work_scheduled" to the list of conditions for taking the ratelimit path.

Yes, I added "!tasklet_work_scheduled". I have run experiments comparing with and without it; there is no big difference.

> Why did you change "MICROSECS" to "MILLISECS" when calculating the timeslice?
> In this case, it will set the timeslice to a full second!
> Not what we want...

Sorry, it was my typo; I have changed it.

> From a software maintenance perspective, I'm not a fan of early returns from
> functions. I think it's too easy not to notice that there's a different return
> path. In this case, I think I'd prefer adding a label, and using "goto out;"
> instead.

Followed this code style.

> If you were using mercurial queues, you could put this after the last one, and it
> would be easier to see the proposed "adaptive" part of the code. :-)
>
> Hypervisors are very complicated; it's best to keep things as absolutely simple
> as possible. This kind of mechanism is exactly the sort of thing that makes it
> very hard to predict what will happen. I think unless you can show that it adds
> a significant benefit, it's better just to use the min timeslice.

In fact, I have tested with a 1ms delay and a 10ms delay, and there is no significant performance improvement between them. That means that even if we chose the adaptive method, there would be no significant performance boost compared with 1ms. 1ms is good enough so far.

> Regarding this particular code, a couple of items, just for feedback:
> * All of the ratelimiting code and data structures should be in the pluggable
> scheduler, not in common code.

Agreed.

> If you really wanted this to be an adaptive solution, you'd need to make a per-cpu
> variable with the per-cpu rate scheduling; and then set it to the global variable
> (which is the user configuration).

It seems there is no need to make it adaptive now. :)

> I've taken your two patches, given them the proper formatting, and made them
> into a patch series (the first adding the ratelimiting, the second adding the
> adaptive bit); they are attached. You should be able to easily pull them into a
> mercurial patchqueue using "hg qimport /path/to/patches/*.diff". In the
> future, I will not look at any patches which do not apply using either "patch -p1"
> or "hg qimport."

Thanks for the coaching on submitting patches the right way.

> -George
On Sun, Dec 11, 2011 at 3:27 PM, Lv, Hui <hui.lv@intel.com> wrote:
> Hi George,
>
> Thank you very much for your feedback. I have finished the measurement work based on the delay method. From the comparable results, a 1ms delay can do as well as the SRC patch: a significant performance boost without obvious drawbacks.
> 1. Basically, the "delay method" can achieve nearly the same benefits as my previous SRC patch: an 11% overall performance boost for SPECvirt over the original credit scheduler.
> 2. I have tried a 1ms delay and a 10ms delay; there is no big difference between these two configurations. (1ms is enough to achieve good performance.)
> 3. I have compared the response time/latency at different load levels (low, high, peak); the "delay method" didn't bring much QoS increase.

Thanks Hui, those are good results. Just one question: What's QoS supposed to measure? Is this a metric that SPECvirt reports? Is higher or lower better?

The patch looks good, but there's one last nitpick: Several of your lines have hard tab characters in them; tabs are officially verboten in the Xen code. Please replace them with spaces. After that, I think we're ready to check it in.

One more small request: would it be possible to get some short xen traces of SPECvirt running, at least with the 1ms-delay patch, and if possible without it? I'd like to have a better understanding of what kind of scheduling workload SPECvirt creates, and how the 1ms delay affects it. If you have other priorities, don't worry, I'll wait until our performance team here gets it set up. If you do have time, the command to use is as follows:

xentrace -D -e 0x21000 -T 30 /path/to/file.trace

This will take *just* scheduling traces for 30 seconds. If you could run it when the benchmark is going full throttle, that should help me get an idea what the scheduling looks like.

Thanks,
 -George
> Thanks Hui, those are good results. Just one question: What's QoS
> supposed to measure? Is this a metric that SPECvirt reports? Is
> higher or lower better?

Yes, it is reported by SPECvirt. The lower, the better.

> The patch looks good, but there's one last nitpick: Several of your
> lines have hard tab characters in them; tabs are officially verboten
> in the Xen code. Please replace them with spaces. After that, I
> think we're ready to check it in.

Great!!

> the command to use is as follows:
>
> xentrace -D -e 0x21000 -T 30 /path/to/file.trace
>
> This will take *just* scheduling traces for 30 seconds. If you could
> run it when the benchmark is going full throttle, that should help me
> get an idea what the scheduling looks like.

Yes, I'd like to do this. I hope the data size is not too large: at full throttle the Xen hypervisor takes a large share of CPU time (around 20%), so the trace data gets very big. I'll let you know when the data is ready.