NISHIGUCHI Naoki
2008-Dec-03 08:54 UTC
[Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
Hi all,

This patch implements the improvement to the credit scheduler that I spoke about at Xen Summit Tokyo. My presentation is now available at http://www.xen.org/xensummit/xensummit_fall_2008.html.

When using the Xen hypervisor in a client virtualization environment, especially with VT-d enabled and some devices passed through to a domain, I think it is necessary to reduce the time that the domain's vcpu waits for its turn on a run queue. My approach is to keep the vcpu's priority at BOOST and to switch to another vcpu at short intervals when there are several vcpus at BOOST priority.

The changes to the credit scheduler are the following:

- Improve the precision of credit
  There are three changes. The first is to subtract credit for consumed cpu time accurately. The second is to preserve the value of credit when a vcpu's credit exceeds the upper bound (currently 300). The third is to shorten the cpu time per credit (experimentally, 30000 credits per 30ms).
- Shorten the time allocated to a vcpu in BOOST priority
  The allocated time is experimentally changed from 30ms to 2ms.
- Balance credits across the vcpus of a domain
- Introduce boost credit
  Boost credit is a new kind of credit used to keep a vcpu's priority at BOOST. While the value of boost credit is 1 or more, the vcpu's priority is set to BOOST. Moreover, to avoid a fall in priority due to abrupt cpu consumption by the vcpu, an upper bound on boost credit can be set.

How to use:

This patch adds the bcredit scheduler (boost credit scheduler) as a third scheduler. To use it, add the "sched=bcredit" option to xen.gz in grub.conf. Then, to boost a domain, enable boost credit for that domain. There are two methods:

1. Using the xm command, set the upper bound of the domain's boost credit. It is specified not as a credit value but in milliseconds, and is called the max boost period.
   e.g. domain 0, max boost period 100ms:
   xm sched-bcredit -d 0 -m 100

2. Using the xm command, set the upper bound of the domain's boost credit and also set a boost ratio. The boost ratio is the ratio of one CPU that is used for distributing boost credit; boost credit corresponding to the boost ratio is distributed in place of ordinary credit. Because it is a ratio of one CPU, it is not affected by other domains.
   e.g. domain 0, max boost period 500ms, boost ratio 80 (80% of one CPU):
   xm sched-bcredit -d 0 -m 500 -r 80

Please review this patch. Any comments are appreciated.

Best regards,
Naoki Nishiguchi
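[For illustration, a rough sketch of the boost credit mechanism described above. It is not code from the patch; the priority names mirror the credit scheduler's BOOST/UNDER/OVER, but the struct fields, constants and helper functions are assumptions made for the example.]

/* Illustrative sketch only, not code from the bcredit patch. */

enum bcredit_pri { BPRI_OVER, BPRI_UNDER, BPRI_BOOST };

#define BOOST_TSLICE_MS   2    /* shortened slice while at BOOST (from the mail) */
#define NORMAL_TSLICE_MS 30    /* ordinary credit scheduler slice */

struct bcredit_vcpu {
    int credit;            /* ordinary credit */
    int boost_credit;      /* new: credit that keeps the vcpu at BOOST */
    int boost_credit_max;  /* new: upper bound, set via "max boost period" */
    enum bcredit_pri pri;
};

/* Accounting: charge boost credit for the cpu time actually consumed and
 * clamp it to the configured upper bound; while any boost credit remains,
 * the vcpu keeps (or regains) BOOST priority. */
static void bcredit_account(struct bcredit_vcpu *v, int consumed)
{
    v->boost_credit -= consumed;
    if (v->boost_credit > v->boost_credit_max)
        v->boost_credit = v->boost_credit_max;
    if (v->boost_credit >= 1)
        v->pri = BPRI_BOOST;
}

/* Time slice selection: 2ms while at BOOST, 30ms otherwise. */
static int bcredit_tslice_ms(const struct bcredit_vcpu *v)
{
    return (v->pri == BPRI_BOOST) ? BOOST_TSLICE_MS : NORMAL_TSLICE_MS;
}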
Keir Fraser
2008-Dec-03 09:16 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
On 03/12/2008 08:54, "NISHIGUCHI Naoki" <nisiguti@jp.fujitsu.com> wrote:

> Please review this patch.
> Any comments are appreciated.

Don't hack it into the existing sched_credit.c unless you are really sharing significant amounts of stuff (which it looks like you aren't?). sched_bcredit.c would be a cleaner name if there's no sharing. Is a new scheduler necessary -- could the existing credit scheduler be generalised with your boost mechanism to be suitable for both client and server?

The issue with multiple schedulers is that it's most likely the non-default will not be tested, used or maintained. The default credit scheduler gets little enough love as it is, and it's really the only sensible scheduler to choose now (SEDF is not great -- good example of a rotten non-default scheduler).

 -- Keir
George Dunlap
2008-Dec-03 12:46 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
On Wed, Dec 3, 2008 at 9:16 AM, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> Don't hack it into the existing sched_credit.c unless you are really sharing
> significant amounts of stuff (which it looks like you aren't?).
> sched_bcredit.c would be a cleaner name if there's no sharing. Is a new
> scheduler necessary -- could the existing credit scheduler be generalised
> with your boost mechanism to be suitable for both client and server?

I think we ought to be able to work this out; the functionality doesn't sound that different, and as you say, keeping two schedulers around is only an invitation to bitrot.

The more accurate credit scheduling and vcpu credit "balancing" seem like good ideas. For the other changes, it's probably worth measuring on a battery of tests to see what kinds of effects we get, especially on network throughput.

Nishiguchi-san, (I hope that's right!) as I understood from your presentation, you haven't tested this on a server workload, but you predict that the "boost" scheduling of 2ms will cause unnecessary overhead for server workloads. Is that correct?

Couldn't we avoid the overhead this way: if a vcpu has 5 or more "boost" credits, we simply set the next timer to 10ms. If the vcpu yields before then, we subtract the amount of "boost" credits actually used. If not, we subtract 5. That way we're not interrupting any more frequently than we were before.

Come to think of it: won't the effect of setting the "boost" time to 2ms be basically counteracted by giving domains boost credits? I thought the purpose of reducing the boost time was to allow other domains to run more quickly? But if a domain has more than 5 "boost" credits, it will run for a full 10ms anyway. Is that not so?

Could you test your video latency measurement with all the other optimizations, but with the "boost" time set to 10ms instead of 2ms? If it works well, it's probably worth simply merging the bulk of your changes in and testing with server workloads.

 -George
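[A minimal sketch of the accounting George proposes above, assuming one "boost" credit corresponds to roughly 2ms of cpu time. All names and constants are illustrative assumptions, not identifiers from any patch.]

#define MS_PER_BOOST_CREDIT 2

/* Decide the next-timer value when a boosted vcpu is scheduled in:
 * with 5 or more credits, let it run a normal 10ms slice instead of 2ms. */
static int boosted_timer_ms(int boost_credits)
{
    return (boost_credits >= 5) ? 10 : boost_credits * MS_PER_BOOST_CREDIT;
}

/* Charge boost credits when the vcpu is descheduled. ran_ms is how long it
 * actually ran before yielding or being preempted; never charge more than
 * the 5 credits that back the 10ms slice. */
static int charge_boost_credits(int boost_credits, int ran_ms)
{
    int used = ran_ms / MS_PER_BOOST_CREDIT;
    if (used > 5)
        used = 5;
    return boost_credits - used;
}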
NISHIGUCHI Naoki
2008-Dec-04 07:45 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
Thank you for your comment.
I'll try to make it suitable for both server and client.

Regards,
Naoki Nishiguchi
NISHIGUCHI Naoki
2008-Dec-04 07:51 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
Thank you for your suggestions.

George Dunlap wrote:
> On Wed, Dec 3, 2008 at 9:16 AM, Keir Fraser <keir.fraser@eu.citrix.com> wrote:
>> Don't hack it into the existing sched_credit.c unless you are really sharing
>> significant amounts of stuff (which it looks like you aren't?).
>> sched_bcredit.c would be a cleaner name if there's no sharing. Is a new
>> scheduler necessary -- could the existing credit scheduler be generalised
>> with your boost mechanism to be suitable for both client and server?
>
> I think we ought to be able to work this out; the functionality
> doesn't sound that different, and as you say, keeping two schedulers
> around is only an invitation to bitrot.

I had thought that a separate scheduler for the client would be needed, because this modification could influence server workloads. In order to minimize the changes, the bcredit scheduler was implemented by wrapping the current credit scheduler, adding only the differences between the original and bcredit. But as a result, almost all of the functions ended up being written anew. Now I agree that a single scheduler is best.

> The more accurate credit scheduling and vcpu credit "balancing" seem
> like good ideas. For the other changes, it's probably worth measuring
> on a battery of tests to see what kinds of effects we get, especially
> on network throughput.

I didn't think about the battery and the performance.

> Nishiguchi-san, (I hope that's right!) as I understood from your
> presentation, you haven't tested this on a server workload, but you
> predict that the "boost" scheduling of 2ms will cause unnecessary
> overhead for server workloads. Is that correct?

Yes, you are correct. I answered that in the Q&A.

> Couldn't we avoid the overhead this way: if a vcpu has 5 or more
> "boost" credits, we simply set the next timer to 10ms. If the vcpu
> yields before then, we subtract the amount of "boost" credits actually
> used. If not, we subtract 5. That way we're not interrupting any
> more frequently than we were before.

I set the next-timer to 2ms for any vcpu holding "boost" credits, since every vcpu holding "boost" credits needs to be run equally at short intervals. If there are vcpus holding "boost" credits and the next-timer of a vcpu is set to 10ms, the other vcpus will have to wait for up to 10ms. At present, I am thinking that if the other vcpus don't have "boost" credits, we may set the next-timer to 30ms.

> Come to think of it: won't the effect of setting the "boost" time to
> 2ms be basically counteracted by giving domains boost credits? I
> thought the purpose of reducing the boost time was to allow other domains
> to run more quickly? But if a domain has more than 5 "boost" credits,
> it will run for a full 10ms anyway. Is that not so?

Suppose there are two domains given "boost" credits. One domain runs for 2ms, then the other domain runs for 2ms, then the first runs for 2ms, then the other runs for 2ms, and so on. I think this is needed so that the waiting time of both domains is the same.

> Could you test your video latency measurement with all the other
> optimizations, but with the "boost" time set to 10ms instead of 2ms? If
> it works well, it's probably worth simply merging the bulk of your
> changes in and testing with server workloads.

I tested the video latency measurement with the "boost" time set to 10ms, but regrettably it did not work well. As I mentioned above, the vcpu occasionally had to wait for 10ms.

On my patch, the "boost" time is tunable. How about making the default "boost" time 30ms and setting a shorter "boost" time only when necessary? Is that acceptable?

In order to lengthen the "boost" time as much as possible, I will think about computing the length of the next-timer for a vcpu holding "boost" credits.

I'll try to revise the patch. And thanks again.

Best regards,
Naoki Nishiguchi
George Dunlap
2008-Dec-04 12:21 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
On Thu, Dec 4, 2008 at 7:51 AM, NISHIGUCHI Naoki <nisiguti@jp.fujitsu.com> wrote:
>> The more accurate credit scheduling and vcpu credit "balancing" seem
>> like good ideas. For the other changes, it's probably worth measuring
>> on a battery of tests to see what kinds of effects we get, especially
>> on network throughput.
>
> I didn't think about the battery and the performance.

I'm sorry, I used an uncommon definition of the word "battery"; I should have been more careful. :-)

In this context, "a battery of tests" means "a combination of several different kinds of tests." I meant some disk-intensive tests, some network-intensive tests, some cpu-intensive tests, and some combination of all three. I can run some of these, and you can make sure that the "client" tests still work well. It would probably be helpful to have other people volunteer to do some testing as well, just to make sure we have our bases covered.

> I set the next-timer to 2ms for any vcpu holding "boost" credits, since every
> vcpu holding "boost" credits needs to be run equally at short intervals. If
> there are vcpus holding "boost" credits and the next-timer of a vcpu is set
> to 10ms, the other vcpus will have to wait for up to 10ms.
>
> At present, I am thinking that if the other vcpus don't have "boost" credits,
> we may set the next-timer to 30ms.

I see -- the current setup is good if there's only one "boosted" VM (per cpu) at a time; but if there are two "boosted" VMs, they're back to taking turns at 30ms. Your 2ms patch allows several latency-sensitive VMs to share the "low latency" boost. That makes sense. I agree with your suggestion: we can set the timer to 2ms only if the next waiting vcpu on the queue is also at BOOST.

> I tested the video latency measurement with the "boost" time set to 10ms,
> but regrettably it did not work well. As I mentioned above, the vcpu
> occasionally had to wait for 10ms.

OK, good to know.

> On my patch, the "boost" time is tunable. How about making the default
> "boost" time 30ms and setting a shorter "boost" time only when necessary?
> Is that acceptable?

I suspect that latency-sensitive workloads such as network, especially network servers that do very little computation, may also benefit from short boost times.

> In order to lengthen the "boost" time as much as possible, I will think
> about computing the length of the next-timer for a vcpu holding "boost"
> credits.

If it makes things simpler, we could just stick with 10ms timeslices when there are no waiting vcpus with BOOST priority, and 2ms if there is a vcpu at BOOST priority. I don't think there's a particular need to give a VM only (say) 8ms instead of 10ms if there are no latency-sensitive VMs waiting.

> I'll try to revise the patch.

I suggest:
 * Modify the credit scheduler directly, rather than having an extra scheduler
 * Break down your changes into patches that make individual changes, i.e. (from your first post):
   + A patch to subtract consumed credit accurately
   + A patch to preserve the value of cpu credit when the vcpu is over the upper bound
   + A patch to shorten cpu time per credit
   + A patch to balance credits across the vcpus of a domain
   + A patch to introduce BOOST credit (both Xen and tool components)
   + A patch to shorten the allocated time in BOOST priority if the next vcpu on the runqueue is also at BOOST

Then we can evaluate each change individually.

Thanks for your work!

 -George
George Dunlap
2008-Dec-04 12:37 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
On Thu, Dec 4, 2008 at 12:21 PM, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
> I see -- the current setup is good if there's only one "boosted" VM
> (per cpu) at a time; but if there are two "boosted" VMs, they're back
> to taking turns at 30ms. Your 2ms patch allows several
> latency-sensitive VMs to share the "low latency" boost. That makes
> sense. I agree with your suggestion: we can set the timer to 2ms only
> if the next waiting vcpu on the queue is also at BOOST.

There was a paper earlier this year about scheduling and I/O performance:
http://www.cs.rice.edu/CS/Architecture/docs/ongaro-vee08.pdf

One of the things he noted was that if a driver domain is accepting network packets for multiple VMs, we sometimes get the following pattern:
 * The driver domain wakes up, starts processing packets. Because it's in "over", it doesn't get boosted.
 * It passes a packet to VM 1, waking it up. VM 1 runs in "boost", preempting the (now lower-priority) driver domain.
 * Other packets (possibly even for VM 1) sit in the driver domain's queue, waiting for it to get cpu time.

Their tests, for 3 networking guests and 3 cpu-intensive guests, showed a 40% degradation in performance due to this problem. While we're thinking about the scheduler, it might be worth seeing if we can solve this.

 -George
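[For reference, the pattern described above comes from the credit scheduler's wake-up path: as the mail says, a waking vcpu is only promoted to BOOST if it still has credit (is at "under"), so a driver domain that has burned its credit sits at "over" and never gets the boost. A simplified paraphrase, with names and values chosen for illustration rather than taken from sched_credit.c:]

enum pri { PRI_OVER = -1, PRI_UNDER = 0, PRI_BOOST = 1 };

struct vcpu_sched {
    enum pri pri;
};

static void credit_vcpu_wake(struct vcpu_sched *svc)
{
    /* Only vcpus that still have credit (UNDER) are promoted to BOOST on
     * wake-up. A busy driver domain that has fallen to OVER keeps its low
     * priority, so freshly-boosted guests preempt it and its packet queue
     * backs up -- the pattern described in the paper above. */
    if (svc->pri == PRI_UNDER)
        svc->pri = PRI_BOOST;

    /* ...then insert on the runqueue and tickle an idle/lower-priority cpu. */
}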
NISHIGUCHI Naoki
2008-Dec-05 02:47 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
Hi,

Thank you for your comments and suggestions.

George Dunlap wrote:
>> I didn't think about the battery and the performance.
>
> I'm sorry, I used an uncommon definition of the word "battery"; I
> should have been more careful. :-)
>
> In this context, "a battery of tests" means "a combination of several
> different kinds of tests." I meant some disk-intensive tests, some
> network-intensive tests, some cpu-intensive tests, and some
> combination of all three. I can run some of these, and you can make
> sure that the "client" tests still work well. It would probably be
> helpful to have other people volunteer to do some testing as well,
> just to make sure we have our bases covered.

Oh, I misread the word "battery". I understand now what "a battery of tests" means.

By the way, what tests do you actually run? I am not familiar with these tests.

>> I set the next-timer to 2ms for any vcpu holding "boost" credits, since every
>> vcpu holding "boost" credits needs to be run equally at short intervals. If
>> there are vcpus holding "boost" credits and the next-timer of a vcpu is set
>> to 10ms, the other vcpus will have to wait for up to 10ms.
>
>> At present, I am thinking that if the other vcpus don't have "boost" credits,
>> we may set the next-timer to 30ms.
>
> I see -- the current setup is good if there's only one "boosted" VM
> (per cpu) at a time; but if there are two "boosted" VMs, they're back
> to taking turns at 30ms. Your 2ms patch allows several
> latency-sensitive VMs to share the "low latency" boost. That makes
> sense. I agree with your suggestion: we can set the timer to 2ms only
> if the next waiting vcpu on the queue is also at BOOST.

OK. We must also consider sleeping vcpus. A sleeping vcpu is added to the queue when it wakes up. So we should set the timer to 2ms only if the next waiting vcpu on the queue, or a sleeping vcpu, is also at BOOST.

My thinking about 2ms is this: the period within which a boosted vcpu gets to run again should be 2ms. Therefore the time slice of each vcpu changes according to the number of existing boosted vcpus; in other words, we may set the timer to 2ms or less. But I think the number of such vcpus will not be very large. Is this supposition wrong? And how about a time slice of 2ms or less?

>> On my patch, the "boost" time is tunable. How about making the default
>> "boost" time 30ms and setting a shorter "boost" time only when necessary?
>> Is that acceptable?
>
> I suspect that latency-sensitive workloads such as network, especially
> network servers that do very little computation, may also benefit from
> short boost times.

I think so, too.

>> In order to lengthen the "boost" time as much as possible, I will think
>> about computing the length of the next-timer for a vcpu holding "boost"
>> credits.
>
> If it makes things simpler, we could just stick with 10ms timeslices
> when there are no waiting vcpus with BOOST priority, and 2ms if there
> is a vcpu at BOOST priority. I don't think there's a particular need to
> give a VM only (say) 8ms instead of 10ms if there are no
> latency-sensitive VMs waiting.

I agree.

>> I'll try to revise the patch.
>
> I suggest:
>  * Modify the credit scheduler directly, rather than having an extra scheduler
>  * Break down your changes into patches that make individual changes, i.e. (from your first post):
>    + A patch to subtract consumed credit accurately
>    + A patch to preserve the value of cpu credit when the vcpu is over the upper bound
>    + A patch to shorten cpu time per credit
>    + A patch to balance credits across the vcpus of a domain
>    + A patch to introduce BOOST credit (both Xen and tool components)
>    + A patch to shorten the allocated time in BOOST priority if the next vcpu on the runqueue is also at BOOST
>
> Then we can evaluate each change individually.

OK. I'll separate the individual changes out of the current patch and post each one.

Best regards,
Naoki Nishiguchi
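[A rough illustration of the "2ms period" idea above: if the goal is that every boosted vcpu runs again within about a 2ms period, the per-vcpu slice shrinks as more boosted vcpus queue up. The constants, the minimum-slice clamp and the function name are assumptions for illustration, not part of any patch.]

#define BOOST_PERIOD_US 2000   /* target re-execution period (assumed) */
#define MIN_SLICE_US     500   /* don't slice arbitrarily thin (assumed) */

static unsigned int boost_slice_us(unsigned int nr_boosted_vcpus)
{
    unsigned int slice;

    if (nr_boosted_vcpus == 0)
        return BOOST_PERIOD_US;

    /* Share the 2ms period among all vcpus currently at BOOST. */
    slice = BOOST_PERIOD_US / nr_boosted_vcpus;
    return (slice < MIN_SLICE_US) ? MIN_SLICE_US : slice;
}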
NISHIGUCHI Naoki
2008-Dec-05 03:17 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
Thanks for the information.

George Dunlap wrote:
> There was a paper earlier this year about scheduling and I/O performance:
> http://www.cs.rice.edu/CS/Architecture/docs/ongaro-vee08.pdf
>
> One of the things he noted was that if a driver domain is accepting
> network packets for multiple VMs, we sometimes get the following
> pattern:
>  * The driver domain wakes up, starts processing packets. Because it's in
>    "over", it doesn't get boosted.
>  * It passes a packet to VM 1, waking it up. VM 1 runs in "boost",
>    preempting the (now lower-priority) driver domain.
>  * Other packets (possibly even for VM 1) sit in the driver domain's
>    queue, waiting for it to get cpu time.

I haven't read the paper yet, but I think our approach is effective against this problem. However, if the driver domain consumes too much cpu time, we cannot prevent it from falling to "over" priority. Otherwise, we can keep it at "under" or "boost" priority.

> Their tests, for 3 networking guests and 3 cpu-intensive guests,
> showed a 40% degradation in performance due to this problem. While
> we're thinking about the scheduler, it might be worth seeing if we can
> solve this.

First, I'd like to read the paper.

Regards,
Naoki Nishiguchi
George Dunlap
2008-Dec-05 11:37 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
On Fri, Dec 5, 2008 at 2:47 AM, NISHIGUCHI Naoki <nisiguti@jp.fujitsu.com> wrote:
> Oh, I misread the word "battery". I understand now what "a battery of tests"
> means.
> By the way, what tests do you actually run? I am not familiar with these
> tests.

For basic workload tests, a couple are pretty handy. vConsolidate is a good test, but pretty hard to set up; I should be able to manage it with our infrastructure here, though. Other tests include:
 * kernel-build (i.e., time how long it takes to build the Linux kernel) and/or ddk-build (Windows equivalent)
 * specjbb (a cpu-intensive workload)
 * netperf (for networks)

For testing its effect on the network, the paper I mentioned has three workloads that it combines in different ways:
 * cpu (just busy spinning)
 * sustained network (netbench): throughput
 * network ping: latency

> OK. We must also consider sleeping vcpus. A sleeping vcpu is added to the
> queue when it wakes up. So we should set the timer to 2ms only if the next
> waiting vcpu on the queue, or a sleeping vcpu, is also at BOOST.
>
> My thinking about 2ms is this: the period within which a boosted vcpu gets to
> run again should be 2ms. Therefore the time slice of each vcpu changes
> according to the number of existing boosted vcpus; in other words, we may set
> the timer to 2ms or less. But I think the number of such vcpus will not be
> very large. Is this supposition wrong? And how about a time slice of 2ms or
> less?

I think I understand you to mean: if we set the timer for 10ms, and in the meantime another vcpu wakes up and is set at BOOST, then it won't get a chance to run for another 10ms. And you're suggesting that we run the scheduler at 2ms if there are any vcpus that *may* wake up and be at BOOST, just in case; and you don't think this situation will happen very often. Is that correct?

Unfortunately, in consolidated server workloads you're pretty likely to have more vcpus than physical cpus, so I think this case would come up pretty often. Furthermore, 2ms is really too short a scheduling quantum for normal use, especially for HVM domains, which have to take a vmexit/vmenter cycle to handle every interrupt. (I did some tests back when we were using the SEDF scheduler, and the scheduling alone was a 4-5% overhead for HVM domains.)

But I don't think we actually have a problem here: if a vcpu wakes up and is promoted to BOOST, won't it "tickle" the runqueues to find somewhere for it to run? At the very least the current cpu should be able to run it, or if it's already running a vcpu at BOOST, it can set its own timer to 2ms. In any case, I think handling this corner case with some extra code is preferable to running a 2ms timer any time it *might* happen.

> OK. I'll separate the individual changes out of the current patch and post
> each one.

Thanks! I'll take them for a spin today.

 -George
NISHIGUCHI Naoki
2008-Dec-08 08:37 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
George Dunlap wrote:
> For basic workload tests, a couple are pretty handy. vConsolidate is
> a good test, but pretty hard to set up; I should be able to manage it
> with our infrastructure here, though. Other tests include:
>  * kernel-build (i.e., time how long it takes to build the Linux
>    kernel) and/or ddk-build (Windows equivalent)
>  * specjbb (a cpu-intensive workload)
>  * netperf (for networks)
>
> For testing its effect on the network, the paper I mentioned has three
> workloads that it combines in different ways:
>  * cpu (just busy spinning)
>  * sustained network (netbench): throughput
>  * network ping: latency

Thanks! I'll try to prepare them.

> I think I understand you to mean: if we set the timer for 10ms, and in
> the meantime another vcpu wakes up and is set at BOOST, then it won't
> get a chance to run for another 10ms. And you're suggesting that we
> run the scheduler at 2ms if there are any vcpus that *may* wake up and
> be at BOOST, just in case; and you don't think this situation will
> happen very often. Is that correct?

That is almost correct. I had thought that we would run the scheduler at 2ms only if there are vcpus that have boost credit and are already at BOOST. But I don't think so any more.

> Unfortunately, in consolidated server workloads you're pretty likely
> to have more vcpus than physical cpus, so I think this case would come
> up pretty often. Furthermore, 2ms is really too short a scheduling
> quantum for normal use, especially for HVM domains, which have to take
> a vmexit/vmenter cycle to handle every interrupt. (I did some tests
> back when we were using the SEDF scheduler, and the scheduling alone
> was a 4-5% overhead for HVM domains.)

I see.

> But I don't think we actually have a problem here: if a vcpu wakes up
> and is promoted to BOOST, won't it "tickle" the runqueues to find
> somewhere for it to run? At the very least the current cpu should be
> able to run it, or if it's already running a vcpu at BOOST, it can set
> its own timer to 2ms. In any case, I think handling this corner case
> with some extra code is preferable to running a 2ms timer any time it
> *might* happen.

OK. I implemented it as follows:

- If the next running vcpu is at BOOST and the first vcpu on the run queue is at BOOST, set the timer to 2ms.
- If the next running vcpu is at BOOST and the first vcpu on the run queue is not at BOOST, set the timer to 10ms.
- If the next running vcpu is not at BOOST, set the timer to 30ms.
- When a vcpu wakes up, if the vcpu has boost credit, send scheduler interrupts to at least one CPU.

In my test environment, it works well. I'll post the latest patch today.

Thanks,
Naoki
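[A minimal sketch of the four rules above. The names (peek at the run-queue head, boost_credit field, the *_TSLICE constants) are illustrative assumptions, not identifiers from the actual patch.]

#include <stdbool.h>

#define TSLICE_BOOST_CONTEND_MS  2   /* boosted vcpu, boosted vcpu waiting    */
#define TSLICE_BOOST_ALONE_MS   10   /* boosted vcpu, no boosted vcpu waiting */
#define TSLICE_NORMAL_MS        30   /* not boosted                           */

enum pri { PRI_OVER, PRI_UNDER, PRI_BOOST };

struct sched_vcpu {
    enum pri pri;
    int boost_credit;
};

/* Rules 1-3: pick the next-timer value when 'next' is scheduled in, given
 * the vcpu currently at the head of the run queue (may be NULL if empty). */
static int next_timer_ms(const struct sched_vcpu *next,
                         const struct sched_vcpu *runq_head)
{
    if (next->pri != PRI_BOOST)
        return TSLICE_NORMAL_MS;
    if (runq_head != NULL && runq_head->pri == PRI_BOOST)
        return TSLICE_BOOST_CONTEND_MS;
    return TSLICE_BOOST_ALONE_MS;
}

/* Rule 4: on wake-up, a vcpu holding boost credit kicks the scheduler
 * (in Xen this would mean raising a scheduler interrupt on at least one cpu). */
static bool should_tickle_on_wake(const struct sched_vcpu *woken)
{
    return woken->boost_credit > 0;
}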
NISHIGUCHI Naoki
2008-Dec-18 02:49 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
Hi all,

In almost the same environment as the paper, I experimented with the credit scheduler (original and modified versions). The results are described below.

Unfortunately, good results were not obtained with my previous patches. I found that there were some problems in them, so I revised the patches and ran the experiments again. With the revised patches, good results were obtained. In particular, please look at the result of ex7: with the revised version, I/O bandwidth per guest grows correctly according to dom0's weight.

I'll post the revised patches later.

Thanks,
Naoki Nishiguchi

---------- results ----------
Experimental environment:
  HP dc7800 US/CT (Core2 Duo E6550 2.33GHz)
  Multi-processor: disabled
  Xen: xen 3.3.0 release
  dom0: CentOS 5.2

I used the following experiments from among the paper's experiments:
  ex3: burn x7, ping x1
  ex5: stream x7, ping x1
  ex7: stream x3, burn x3, ping x1
  ex8: stream x3, ping+burn x1, burn x3

Original credit scheduler
  ex3
    burn(%): 14 14 14 14 14 14 14
    ping(ms): 19.7 (average), 0.1 - 359
  ex5
    stream(Mbps): 144.05 141.19 137.81 137.01 137.30 138.76 142.21
    ping(ms): 8.2 (average), 7.84 - 8.63
  ex7
    stream(Mbps): 33.74 27.74 34.70
    burn(%): 28 28 28 (by guess)
    ping(ms): 238 (average), 1.78 - 485
  ex7 (xm sched-credit -d 0 -w 512)
    There was no change in the result.
  ex8
    stream(Mbps): 9.98 11.32 10.61
    ping+burn: 264.9ms (average), 20.3 - 547; 24%
    burn(%): 24 24 24

Modified version (previous patches)
  ex3
    burn(%): 14 14 14 14 14 14 14
    ping(ms): 0.17 (average), 0.136 - 0.202
  ex5
    stream(Mbps): 143.90 141.79 137.15 138.43 138.37 130.33 143.36
    ping(ms): 7.2 (average), 4.85 - 8.95
  ex7
    stream(Mbps): 2.33 2.18 1.87
    burn(%): 32 32 32 (by guess)
    ping(ms): 373.7 (average), 68.0 - 589
  ex7 (xm sched-credit -d 0 -w 512)
    There was no change in the result.
  ex7 (xm sched-credit -d 0 -m 100 -r 20)
    stream(Mbps): 114.49 117.59 115.76
    burn(%): 24 24 24
    ping(ms): 1.2 (average), 0.158 - 65.1
  ex8
    stream(Mbps): 1.31 1.09 1.92
    ping+burn: 387.7ms (average), 92.6 - 676; 24% (by guess)
    burn(%): 24 24 24 (by guess)

Revised version
  ex3
    burn(%): 14 14 14 14 14 14 14
    ping(ms): 0.18 (average), 0.140 - 0.238
  ex5
    stream(Mbps): 142.57 139.03 137.50 136.77 137.61 138.95 142.63
    ping(ms): 8.2 (average), 7.86 - 8.71
  ex7
    stream(Mbps): 143.63 132.13 131.77
    burn(%): 24 24 24
    ping(ms): 32.2 (average), 1.73 - 173
  ex7 (xm sched-credit -d 0 -w 512)
    stream(Mbps): 240.06 204.85 229.23
    burn(%): 18 18 18
    ping(ms): 7.0 (average), 0.412 - 73.9
  ex7 (xm sched-credit -d 0 -m 100 -r 20)
    stream(Mbps): 139.74 134.95 135.18
    burn(%): 23 23 23
    ping(ms): 15.1 (average), 1.87 - 95.4
  ex8
    stream(Mbps): 118.15 106.71 116.37
    ping+burn: 68.8ms (average), 1.86 - 319; 19%
    burn(%): 19 19 19
----------
George Dunlap
2008-Dec-18 10:21 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
Naoki,

Thank you for your work! The results look really good.

Overall, I think the scheduler as a whole needs some design work before these can go in. No one at this point fully understands the principles on which it's supposed to run. I've been taking a close look at the unmodified scheduler (notably trying to understand the anomalies pointed out by Atsushi), and I think it's clear that there are some flaws in the logic. Before making a large change like this, I think we should do several things:
 * Try to describe exactly what the scheduler is currently doing, and why
 * If there are some inconsistencies, change them
 * Modify the description to include your proposed changes to the boost scheduler

Your changes, although proven effective, make the scheduler much more complicated. If no one understands it now, it will be even harder to understand with your changes, unless we set down some very clear documentation of how the algorithm is supposed to work. Namely, we need to document:
 * What factors different workloads need, i.e.:
   + Long enough timeslices for cpu-bound workloads to warm up the cache effectively
   + Fast responsiveness for "latency-sensitive" workloads, especially in the face of multiple latency-sensitive workloads
   + Fairness with respect to weight
 * At a high level, what we'd like to see happen
 * How individual mechanisms work:
   + Credits: when they are added / subtracted
   + Priorities: when they are changed and why
   + Preemption: when a cpu-bound process gets preempted
   + Active / passive status: when and why a vcpu is switched from one to the other

I've been intending to do this for a couple of weeks now, but I've got some other patches I need to get cleaned up and submitted first. Hopefully those will be finished by the end of the week. This is my very next priority.

Once I have the "design" document, I can describe your changes in reference to it, and we can discuss them at a design level.

I have a couple of specific comments on your patches that I'll put inline in other e-mails.

Thank you for your work, and your patience.

 -George
George Dunlap
2009-Jan-21 10:35 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
Naoki,

I'm working on revising the scheduler right now, so it's probably best if you hold off on patches for a little while.

I'm also trying to understand the minimum that your client workloads actually need to run well. There were components of the "boost" patch series that helped your workload:
 (a) minimum cpu time
 (b) shortened time slices (2ms)
 (c) "boosted" priority for multimedia domains

Is it possible that having (a) and (b), possibly with some other combinations, could work well without adding (c)?

At any rate, I'm going to start with a revised system that has a minimum cpu time, but no "high priority", and see if we can get things to work OK without it.

Thanks for your work, BTW -- the scheduler has needed some attention for a long time, but I don't think it would have gotten it if you hadn't introduced these patches.

Peace,
 -George

On Wed, Jan 21, 2009 at 3:00 AM, NISHIGUCHI Naoki <nisiguti@jp.fujitsu.com> wrote:
> Hi George,
>
> George Dunlap wrote:
>>
>> Sorry, didn't finish my thoughts before sending...
>>
>>> The original meaning of the "boost" priority was a priority given to
>>> domains when waking up, so that latency-sensitive workloads could
>>> achieve low latency when competing with cpu-intensive workloads, while
>>> maintaining weight. I think this meaning of "boost" (and the
>>> mechanism) is still important, especially for server-style workloads.
>>
>> ...so, I think we need to maintain the old "boost" mechanism (or
>> something like it), and come up with a new name for this "priority cpu
>> time" feature.
>
> I believe that the old "boost" mechanism remains after applying my patches.
> But now I think that the "priority cpu time" feature needs a new name, as
> you said.
>
> Because I wanted to avoid changing the existing functionality of the credit
> scheduler while achieving continuous high priority for a domain, I decided
> to use the "boost" mechanism, especially the boost priority. In my rev2
> patches, the old "boost" mechanism and the "boost credit" I introduced were
> integrated strongly, and good results were obtained. But, as you said and
> as I wrote above, I think that the "boost" mechanism and "boost credit"
> should be separated. I'll try to achieve this by introducing a new priority
> for the "priority cpu time" feature.
>
> Regards,
> Naoki
NISHIGUCHI Naoki
2009-Jan-22 06:15 UTC
Re: [Xen-devel] [RFC][PATCH] scheduler: credit scheduler for client virtualization
Hi George,

George Dunlap wrote:
> I'm working on revising the scheduler right now, so it's probably best
> if you hold off on patches for a little while.

OK. I'll wait for your work to finish.

> I'm also trying to understand the minimum that your client workloads
> actually need to run well. There were components of the "boost"
> patch series that helped your workload:
>  (a) minimum cpu time
>  (b) shortened time slices (2ms)
>  (c) "boosted" priority for multimedia domains
>
> Is it possible that having (a) and (b), possibly with some other
> combinations, could work well without adding (c)?

Yes, it is possible. I divided the rev2 "boost" patch as follows, without (c):

(1) minimum cpu time (a): boost_1.patch + boost_1_tools.patch
(2) shortened time slices (b): boost_2.patch
(3) alternative "boost" mechanism using boost credit: boost_3.patch

These patches work in the following combinations: (1), (1)+(2), (1)+(2)+(3). Please apply them in numerical order. Note that without (3), the problem described in the paper you showed me is not solved.

Are these what you want?

> At any rate, I'm going to start with a revised system that has a
> minimum cpu time, but no "high priority", and see if we can get things
> to work OK without it.
>
> Thanks for your work, BTW -- the scheduler has needed some attention
> for a long time, but I don't think it would have gotten it if you
> hadn't introduced these patches.

Thanks.

Best regards,
Naoki