Is there a plan to make the credit based scheduler the default scheduler? Also, is there any performance data available for this scheduler?

Regards,
K. Y
Ky Srinivasan wrote:
> Is there a plan to make the credit based scheduler the default
> scheduler? Also, is there any performance data available for this
> scheduler?

It is already the default in -unstable as of a few days ago... I can't wait until it gets put into the -testing tree..

Thanks,
Matt Ayres
> Is there a plan to make the credit based scheduler the default
> scheduler? Also, is there any performance data available for this
> scheduler?

It has been made the default from yesterday (xen-unstable cset 10459:a31f3bff4f76) so that we can get more widespread usage before calling 3.0.3.

Performance is "pretty good", and it certainly fixes some of the previous behaviour with sedf, but I don't have any specific numbers to hand. Emmanuel may have more details?

cheers,

S.
Thanks Matt.

K. Y

>>> On Wed, Jun 21, 2006 at 10:26 AM, in message <44995709.9000802@tektonic.net>, Matt Ayres <matta@tektonic.net> wrote:
>
> Ky Srinivasan wrote:
>> Is there a plan to make the credit based scheduler the default
>> scheduler? Also, is there any performance data available for this
>> scheduler?
>
> It is already the default in -unstable as of a few days ago... I can't
> wait until it gets put into the -testing tree..
>
> Thanks,
> Matt Ayres
On 21 Jun 2006, at 15:26, Matt Ayres wrote:
>> Is there a plan to make the credit based scheduler the default
>> scheduler? Also, is there any performance data available for this
>> scheduler?
>
> It is already the default in -unstable as of a few days ago... I can't
> wait until it gets put into the -testing tree..

That won't happen until 3.0.3 (which will be a sweep-thru of unstable into a new branch of testing). Probably a few weeks away.

 -- Keir
On Wed, Jun 21, 2006 at 03:26:48PM +0100, Steven Hand wrote:
> > Is there a plan to make the credit based scheduler the default
> > scheduler? Also, is there any performance data available for this
> > scheduler?
>
> It has been made the default from yesterday (xen-unstable cset
> 10459:a31f3bff4f76) so that we can get more widespread usage
> before calling 3.0.3.
>
> Performance is "pretty good", and it certainly fixes some of the
> previous behaviour with sedf, but I don't have any specific
> numbers to hand. Emmanuel may have more details?

I responded to a similar enquiry from Anthony a few weeks ago:
http://lists.xensource.com/archives/html/xen-devel/2006-05/msg01346.html

Short summary: I've seen improvements over the other schedulers with everything I have tried. Let's not forget, though: raw performance numbers on various benchmarks are useful, but the scheduler's responsibilities also include defining and enforcing fairness among all active guests.

I'll add the following about latency and throughput philosophy:

The credit scheduler is an SMP CPU scheduler. It is designed such that no physical CPU will ever idle if there is a waiting runnable VCPU on the host. We call this being work conserving. This is in contrast to other schedulers, which are UP CPU schedulers glued together with higher-level tools to migrate VCPUs between physical CPUs. For example, consider a 4-CPU server hosting 32 concurrent web server guests. Being work conserving across all physical CPUs, the credit scheduler is going to give you a lot more total system throughput (more web server guests per physical host) and better per-guest response times (HTTP ops/sec) too.

Additionally, the credit scheduler ensures that all guests throughout the system get a fair share of all host CPU resources: 3 active UP guests on a 2-CPU host each get 33% of the total CPU available; the credit scheduler takes care of migrating guest VCPUs across physical CPUs to achieve this transparently. If you are in the business of consolidating servers onto one physical box, this is pretty important.

Finally, there is the question of when to preempt a running domain to run another runnable one. The credit scheduler runs VCPUs in 30-millisecond time-slices by default (that's about the time it takes the human eye to focus, or to notice queueing latency). A VCPU being woken up will preempt the time-slice of a running VCPU only if the latter has already run more than its fair share and the former has not. A VCPU is assigned a fair share of the system if it is "active". To be considered "active", a VCPU must be runnable at least once in the time it takes it to fairly "earn" one time-slice.

Consider two competing VCPUs on a UP host: A is spinning; B is doing I/O (it sleeps and wakes up much more often than every 30 milliseconds). Both A and B are considered "active". If B is not able to consume its fair share of the CPU because it is constantly waiting for I/O, it will at least preempt A every time it becomes runnable. B gets good service in the way of short latencies from being runnable to actually running. On the other hand, if B only rarely does an I/O operation but consumes the CPU otherwise, it will not always preempt other running VCPUs when it wakes up. That would be "unfair" to the other VCPUs. The credit scheduler uses preemption to provide good service to latency-sensitive guests, but in a fair way: I/O-bound VCPUs will not starve out compute-bound ones by constantly preempting them.
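As a minimal illustration of the wake-up preemption rule described above, here is a short C sketch. All of the names (vcpu_sched, credit, should_preempt) are invented for this example and are not the actual Xen sched_credit.c code; the sketch assumes a credit is simply CPU time earned minus CPU time consumed.

/*
 * Minimal sketch of the wake-up preemption rule described above.
 * All names here are invented for illustration; they are not the
 * actual Xen sched_credit.c interfaces.
 */
#include <stdbool.h>
#include <stdio.h>

struct vcpu_sched {
    int credit;   /* CPU time earned minus CPU time consumed (ms) */
};

/* A VCPU is over its fair share once it has consumed more than it earned. */
static bool over_fair_share(const struct vcpu_sched *v)
{
    return v->credit < 0;
}

/*
 * A VCPU waking up preempts the currently running VCPU only if the
 * running VCPU has already exceeded its fair share and the waking
 * VCPU has not -- otherwise the waker just goes on the run queue.
 */
static bool should_preempt(const struct vcpu_sched *waking,
                           const struct vcpu_sched *running)
{
    return !over_fair_share(waking) && over_fair_share(running);
}

int main(void)
{
    struct vcpu_sched a = { .credit = -10 };  /* A: spinning, over its share   */
    struct vcpu_sched b = { .credit =  25 };  /* B: I/O bound, under its share */

    printf("B wakes while A runs: preempt? %s\n",
           should_preempt(&b, &a) ? "yes" : "no");   /* yes */
    printf("A wakes while B runs: preempt? %s\n",
           should_preempt(&a, &b) ? "yes" : "no");   /* no  */
    return 0;
}

The real scheduler earns and burns credits through periodic accounting across all active VCPUs; the sketch only captures the preemption decision itself.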
Emmanuel Ackaouy wrote:
> The credit scheduler is an SMP CPU scheduler. It is designed such
> that no physical CPU will ever idle if there is a waiting
> runnable VCPU on the host. We call this being work conserving.

> 3 active UP guests on a 2-CPU host each get 33% of the total CPU
> available; the credit scheduler takes care of migrating guest
> VCPUs across physical CPUs to achieve this transparently.

Migrating 1 VCPU across 2 CPUs, or swapping all 3 VCPUs around in a ring-fashion?

Sounds like this algorithm is an evil killer of CPU cache performance to me :-).
Emmanuel,

I agree with your analysis. As I think more about this, affinity-based scheduling in the guest would run into trouble even without changing the mapping between vcpu and pcpu! (Just running other domains on the pcpu may invalidate the cache state.) We may need to look at how best to preserve the cache state for domains that need it. Some form of exclusive VCPU pinning may be the answer here (the VCPU will run on the pcpu it is pinned on, and furthermore the pcpu will only run the vcpu that has the exclusive binding). It might also be useful to notify the guest when the vcpu-pcpu binding changes (via the shared page).

Regards,
K. Y

>>> On Wed, Jun 21, 2006 at 2:13 PM, in message <20060621181356.GA20321@cockermouth.uk.xensource.com>, Emmanuel Ackaouy <ack@xensource.com> wrote:
> On Wed, Jun 21, 2006 at 11:41:14AM -0600, Ky Srinivasan wrote:
>> with sedf. Have you looked at the implication of load balancing in the
>> hypervisor on scheduling policies implemented in the guest os? For
>> instance if the guest is implementing CPU affinity and as part of vcpu
>> load balancing we decide to change the mapping between the vcpu and the
>> physical cpu in the hypervisor, the scheduling decisions taken in the
>> guest would be bogus.
>
> True. There is a tradeoff between keeping a VCPU waiting to
> run on a particular physical CPU and running it elsewhere
> right away.
>
> I think the only case we would prefer not to move a waiting
> VCPU from the physical CPU it last ran on to an idle one is:
>
> The VCPU very recently stopped running on said CPU, and
> It has warmed its cache considerably, and
> Before the VCPU gets to run on the CPU again,
> Very little will have run on the CPU, and
> The cache will not have been significantly blown.
>
> Basically, this says: It's bad to move a VCPU if it has a
> physical CPU pretty much to itself.
>
> But if a VCPU has a PCPU pretty much to itself, it's very
> unlikely it will end up sitting on that PCPU's runq long
> enough to be picked up by another PCPU.
>
> I think the simple thing to do here and a good rule of thumb
> in general is not to allow cycles to go idle when there is
> runnable work. If you can think of a counter example though,
> I'd love to consider it and perhaps make some changes.
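To make the exclusive-pinning proposal in the message above concrete, here is a rough, purely hypothetical C sketch of the check such a scheme might perform when deciding whether a VCPU may run on a PCPU. None of these names exist in Xen; this only illustrates the proposed idea, not an implementation.

/*
 * Purely hypothetical sketch of the "exclusive VCPU pinning" idea
 * discussed above: a pinned VCPU runs only on its PCPU, and that PCPU
 * runs only the VCPU pinned to it.  None of these names exist in Xen.
 */
#include <stdbool.h>
#include <stdio.h>

#define NO_PIN (-1)

struct vcpu_pin {
    int id;
    int pinned_pcpu;     /* PCPU this VCPU is exclusively bound to, or NO_PIN */
};

struct pcpu_pin {
    int id;
    int exclusive_vcpu;  /* VCPU exclusively bound to this PCPU, or NO_PIN */
};

/* May this VCPU be scheduled on this PCPU under exclusive pinning? */
static bool may_run_on(const struct vcpu_pin *v, const struct pcpu_pin *p)
{
    if (v->pinned_pcpu != NO_PIN && v->pinned_pcpu != p->id)
        return false;    /* the VCPU is bound to a different PCPU     */
    if (p->exclusive_vcpu != NO_PIN && p->exclusive_vcpu != v->id)
        return false;    /* the PCPU is reserved for a different VCPU */
    return true;
}

int main(void)
{
    struct pcpu_pin cpu0 = { .id = 0, .exclusive_vcpu = 7 };
    struct vcpu_pin v7   = { .id = 7, .pinned_pcpu = 0 };       /* pinned pair */
    struct vcpu_pin v9   = { .id = 9, .pinned_pcpu = NO_PIN };  /* unpinned    */

    printf("v7 on cpu0: %s\n", may_run_on(&v7, &cpu0) ? "ok" : "no");  /* ok */
    printf("v9 on cpu0: %s\n", may_run_on(&v9, &cpu0) ? "ok" : "no");  /* no */
    return 0;
}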
On Wed, Jun 21, 2006 at 09:39:17PM +0200, Molle Bestefich wrote:
> Emmanuel Ackaouy wrote:
> Sounds like this algorithm is an evil killer of CPU cache performance to me
> :-).

I don't agree.

First of all, if you were to time-slice two VCPUs on one PCPU and dedicate the other PCPU to the 3rd VCPU, you'd still be time-slicing on one PCPU and you'd pay context switching and cache sharing costs there.

Second, migration or not, once you start time-slicing, you have to deal with cache warming costs. The important thing is to run long enough time-slices to take advantage of the cache.

Third, there are many applications where latency to run is more important than wasting idle CPU cycles in order to run on a warm cache.

Fourth, in the case we were talking about of 3 active VCPUs on a 2-PCPU host: without migration, 2 VCPUs time-slice and get 25% of the total host CPU resources each, while 1 VCPU has 50% and does not time-slice. With migration, all 3 VCPUs time-slice and get 33% of the total CPU resources. Two out of three VCPUs do better with migration, and you end up with fair sharing of system-wide CPU resources too.
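The arithmetic behind that fourth point can be checked with a few lines of C; the numbers are exactly those quoted above (3 CPU-bound VCPUs on a 2-PCPU host), and the program itself is just a worked example, not anything from Xen.

/* Worked example of the fairness arithmetic above: 3 CPU-bound VCPUs
 * sharing a 2-PCPU host. */
#include <stdio.h>

int main(void)
{
    const double pcpus = 2.0, vcpus = 3.0;

    /* Work conserving with migration: total CPU is split evenly, so each
     * VCPU gets 2/3 of a PCPU, i.e. 33% of the whole host. */
    double share_of_pcpu = pcpus / vcpus;
    printf("with migration:    each VCPU gets %.0f%% of one PCPU "
           "(%.0f%% of the host)\n",
           100.0 * share_of_pcpu, 100.0 * share_of_pcpu / pcpus);

    /* Without migration: VCPUs A and B time-slice PCPU0 (half a PCPU each,
     * 25% of the host), while VCPU C has PCPU1 to itself (50% of the host). */
    printf("without migration: A and B each get 25%% of the host, "
           "C gets 50%%\n");
    return 0;
}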
On 21 Jun 2006, at 22:26, Emmanuel Ackaouy wrote:
> Second, migration or not, once you start time-slicing, you have
> to deal with cache warming costs. The important thing is to run
> long enough time-slices to take advantage of the cache.

Yep, locality across timeslices doesn't really help. If you've run a different VCPU between timeslices then the caches will be cold. Even if there is a small cache advantage, it's very unlikely to balance the cost of deliberately leaving CPUs idle.

Things may be more interesting in NUMA environments, but there's always the per-VCPU affinity map to restrict cross-CPU scheduling to within a NUMA node.

 -- Keir
Emmanuel Ackaouy wrote:
> Let me know if you have any other questions.
> It's useful to discuss design philosophy on the list.

Well, if you don't mind! :-)...

The first sentence was actually a question too:

> Migrating 1 VCPU across 2 CPUs, or
> swapping all 3 VCPUs around in a ring-fashion?

To explain myself a bit more, I was wondering whether the scheduler conceptually can be said to work a little like this (VCPUs mapped to CPUs in a "ring-fashion"):

(VCPUs = A, B, C)

 Time    CPU1   CPU2
 0ms     A      B
 30ms    B      C
 60ms    C      A
 90ms    A      B
 120ms   B      C
 ...etc...

This would give 30ms slices with a time-to-wait for scheduling of 30ms.

Or perhaps more like this:

 Time    CPU1   CPU2
 0ms     A      B
 30ms    A      C
 60ms    B      C
 90ms    B      A
 120ms   C      A
 ...etc...

Which gives 60ms slices (warmer cache, yummy) but still with a time-to-wait for scheduling of 30ms.

The latter is obviously the better algorithm, cache-wise...

Hope these aren't stupid questions :-).
On Thu, Jun 22, 2006 at 12:26:02PM +0200, Molle Bestefich wrote:
> To explain myself a bit more, I was wondering whether the scheduler
> conceptually can be said to work a little like this (VCPUs mapped to
> CPUs in a "ring-fashion"):
>
> (VCPUs = A, B, C)
>
>  Time    CPU1   CPU2
>  0ms     A      B
>  30ms    B      C
>  60ms    C      A
>  90ms    A      B
>  120ms   B      C
>  ...etc...
>
> This would give 30ms slices with a time-to-wait for scheduling of 30ms.

Here, you're moving something that is already running from one CPU to another. The credit scheduler won't do this. It moves something that is on a CPU's runq but hasn't had a chance to run on that CPU.

> Or perhaps more like this:
>
>  Time    CPU1   CPU2
>  0ms     A      B
>  30ms    A      C
>  60ms    B      C
>  90ms    B      A
>  120ms   C      A
>  ...etc...
>
> Which gives 60ms slices (warmer cache, yummy) but still with a
> time-to-wait for scheduling of 30ms.
>
> The latter is obviously the better algorithm, cache-wise...

The credit scheduler behaves like this latter example: basically, once a VCPU has run two consecutive time-slices, its physical CPU goes and looks for a VCPU on the other CPU's runq which hasn't yet run its fair share. It's the very act of running two consecutive time-slices on a PCPU, while two other VCPUs are time-slicing on the other PCPU, which causes the fair-share imbalance which then causes the migration.
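As a rough sketch only (invented names, not the real sched_credit.c code), the balancing step described in that last paragraph might be pictured like this: after its current VCPU has completed two consecutive time-slices, a PCPU scans the peer PCPU's run queue for a waiting VCPU that is still under its fair share and pulls that one over instead.

/*
 * Rough sketch of the balancing step described above.  The names are
 * invented for illustration and are not the real Xen sched_credit.c
 * code: after its current VCPU has run two consecutive time-slices,
 * a PCPU looks at the peer PCPU's run queue for a waiting VCPU that
 * has not yet had its fair share, and runs that one instead.
 */
#include <stdio.h>

struct vcpu_sched {
    const char *name;
    int credit;                   /* >= 0 means still under its fair share */
};

struct pcpu_runq {
    struct vcpu_sched **waiting;  /* VCPUs queued on this PCPU, not running */
    int len;
};

/*
 * Called at the end of a time-slice.  Keep running the local VCPU unless
 * it has just finished its second consecutive slice and a peer PCPU has
 * a queued VCPU that is still under its fair share -- in that case the
 * queued VCPU is migrated here and run next.
 */
static struct vcpu_sched *pick_next(struct vcpu_sched *local,
                                    int consecutive_slices,
                                    struct pcpu_runq *peer)
{
    if (consecutive_slices < 2)
        return local;

    for (int i = 0; i < peer->len; i++)
        if (peer->waiting[i]->credit >= 0)
            return peer->waiting[i];      /* steal the waiting "under" VCPU */

    return local;
}

int main(void)
{
    struct vcpu_sched a = { "A", -5 };    /* ran a lot locally, over its share */
    struct vcpu_sched c = { "C", 10 };    /* queued on the peer, still under   */
    struct vcpu_sched *q[] = { &c };
    struct pcpu_runq peer = { q, 1 };

    printf("next after 1 slice:  %s\n", pick_next(&a, 1, &peer)->name); /* A */
    printf("next after 2 slices: %s\n", pick_next(&a, 2, &peer)->name); /* C */
    return 0;
}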