Is there a plan to make the credit based scheduler the default scheduler? Also, is there any performance data available for this scheduler?

Regards,
K. Y
Ky Srinivasan wrote:
> Is there a plan to make the credit based scheduler the default
> scheduler? Also, is there any performance data available for this
> scheduler?

It is already the default in -unstable as of a few days ago... I can't wait until it gets put into the -testing tree..

Thanks,
Matt Ayres
> Is there a plan to make the credit based scheduler the default
> scheduler? Also, is there any performance data available for this
> scheduler?

It has been made the default from yesterday (xen-unstable cset 10459:a31f3bff4f76) so that we can get more widespread usage before calling 3.0.3.

Performance is "pretty good", and it certainly fixes some of the previous behaviour with sedf, but I don't have any specific numbers to hand. Emmanuel may have more details?

cheers,

S.
Thanks Matt.

K. Y

>>> On Wed, Jun 21, 2006 at 10:26 AM, in message <44995709.9000802@tektonic.net>, Matt Ayres <matta@tektonic.net> wrote:
>
> Ky Srinivasan wrote:
>> Is there a plan to make the credit based scheduler the default
>> scheduler? Also, is there any performance data available for this
>> scheduler?
>
> It is already the default in -unstable as of a few days ago... I can't
> wait until it gets put into the -testing tree..
>
> Thanks,
> Matt Ayres
On 21 Jun 2006, at 15:26, Matt Ayres wrote:
>> Is there a plan to make the credit based scheduler the default
>> scheduler? Also, is there any performance data available for this
>> scheduler?
>
> It is already the default in -unstable as of a few days ago... I can't
> wait until it gets put into the -testing tree..

That won't happen until 3.0.3 (which will be a sweep-thru of unstable into a new branch of testing). Probably a few weeks away.

 -- Keir
On Wed, Jun 21, 2006 at 03:26:48PM +0100, Steven Hand wrote:
> > Is there a plan to make the credit based scheduler the default
> > scheduler? Also, is there any performance data available for this
> > scheduler?
>
> It has been made the default from yesterday (xen-unstable cset
> 10459:a31f3bff4f76) so that we can get more widespread usage
> before calling 3.0.3.
>
> Performance is "pretty good", and it certainly fixes some of the
> previous behaviour with sedf, but I don't have any specific
> numbers to hand. Emmanuel may have more details?

I responded to a similar enquiry from Anthony a few weeks ago:
http://lists.xensource.com/archives/html/xen-devel/2006-05/msg01346.html

Short summary: I've seen improvements over the other schedulers with everything I have tried. Let's not forget, though: raw performance numbers on various benchmarks are useful, but the scheduler's responsibilities also include defining and enforcing fairness among all active guests.

I'll add the following about latency and throughput philosophy:

The credit scheduler is an SMP CPU scheduler. It is designed such that no physical CPU will ever idle if there is a waiting runnable VCPU on the host. We call this being work conserving. This is in contrast to other schedulers, which are UP CPU schedulers glued together with higher-level tools to migrate VCPUs between physical CPUs. For example, consider a 4-CPU server hosting 32 concurrent web server guests. Being work conserving across all physical CPUs, the credit scheduler is going to give you a lot more total system throughput (more web server guests per physical host) and better per-guest response times (HTTP ops/sec) too.

Additionally, the credit scheduler ensures that all guests throughout the system get a fair share of all host CPU resources: 3 active UP guests on a 2-CPU host each get 33% of the total CPU available; the credit scheduler takes care of migrating guest VCPUs across physical CPUs to achieve this transparently. If you are in the business of consolidating servers onto one physical box, this is pretty important.

Finally, there is the question of when to preempt a running domain to run another runnable one. The credit scheduler runs VCPUs in 30-millisecond time-slices by default (that's about the time it takes the human eye to focus, or to notice queueing latency). A VCPU being woken up will preempt the time-slice of a running VCPU only if the latter has already run more than its fair share and the former has not. A VCPU is assigned a fair share of the system if it is "active". To be considered "active", a VCPU must be runnable at least once in the time it takes it to fairly "earn" one time-slice.

Consider two competing VCPUs on a UP host: A is spinning; B is doing I/O (it sleeps and wakes up much more often than every 30 milliseconds). Both A and B are considered "active". If B is not able to consume its fair share of the CPU because it is constantly waiting for I/O, it will at least preempt A every time it becomes runnable. B gets good service in the way of short latencies from being runnable to actually running. On the other hand, if B only rarely does an I/O operation but consumes the CPU otherwise, it will not always preempt other running VCPUs when it wakes up. That would be "unfair" to the other VCPUs. The credit scheduler uses preemption to provide good service to latency-sensitive guests, but in a fair way: I/O-bound VCPUs will not starve out compute-bound ones by constantly preempting them.
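As a minimal illustration of the wake-up preemption rule described above, here is a short C sketch. All of the names (vcpu_sched, credit, should_preempt) are invented for this example and are not the actual Xen sched_credit.c code; the sketch assumes a credit is simply CPU time earned minus CPU time consumed.

/*
 * Minimal sketch of the wake-up preemption rule described above.
 * All names here are invented for illustration; they are not the
 * actual Xen sched_credit.c interfaces.
 */
#include <stdbool.h>
#include <stdio.h>

struct vcpu_sched {
    int credit;   /* CPU time earned minus CPU time consumed (ms) */
};

/* A VCPU is over its fair share once it has consumed more than it earned. */
static bool over_fair_share(const struct vcpu_sched *v)
{
    return v->credit < 0;
}

/*
 * A VCPU waking up preempts the currently running VCPU only if the
 * running VCPU has already exceeded its fair share and the waking
 * VCPU has not -- otherwise the waker just goes on the run queue.
 */
static bool should_preempt(const struct vcpu_sched *waking,
                           const struct vcpu_sched *running)
{
    return !over_fair_share(waking) && over_fair_share(running);
}

int main(void)
{
    struct vcpu_sched a = { .credit = -10 };  /* A: spinning, over its share   */
    struct vcpu_sched b = { .credit =  25 };  /* B: I/O bound, under its share */

    printf("B wakes while A runs: preempt? %s\n",
           should_preempt(&b, &a) ? "yes" : "no");   /* yes */
    printf("A wakes while B runs: preempt? %s\n",
           should_preempt(&a, &b) ? "yes" : "no");   /* no  */
    return 0;
}

The real scheduler earns and burns credits through periodic accounting across all active VCPUs; the sketch only captures the preemption decision itself.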
Emmanuel Ackaouy wrote:
> The credit scheduler is an SMP CPU scheduler. It is designed such
> that no physical CPU will ever idle if there is a waiting
> runnable VCPU on the host. We call this being work conserving.

> 3 active UP guests on a 2-CPU host each get 33% of the total CPU
> available; the credit scheduler takes care of migrating guest
> VCPUs across physical CPUs to achieve this transparently.

Migrating 1 VCPU across 2 CPUs, or swapping all 3 VCPUs around in a ring-fashion?

Sounds like this algorithm is an evil killer of CPU cache performance to me :-).
Emmanuel,

I agree with your analysis. As I think more about this, affinity-based scheduling in the guest would run into trouble even without changing the mapping between vcpu and pcpu! (Just running other domains on the pcpu may invalidate the cache state.) We may need to look at how best to preserve the cache state for domains that need it. Some form of exclusive VCPU pinning may be the answer here (the VCPU will run on the pcpu it is pinned on, and furthermore the pcpu will only run the vcpu that has the exclusive binding). It might also be useful to notify the guest when the vcpu-pcpu binding changes (via the shared page).

Regards,
K. Y

>>> On Wed, Jun 21, 2006 at 2:13 PM, in message <20060621181356.GA20321@cockermouth.uk.xensource.com>, Emmanuel Ackaouy <ack@xensource.com> wrote:
> On Wed, Jun 21, 2006 at 11:41:14AM -0600, Ky Srinivasan wrote:
>> with sedf. Have you looked at the implication of load balancing in the
>> hypervisor on scheduling policies implemented in the guest os? For
>> instance if the guest is implementing CPU affinity and as part of vcpu
>> load balancing we decide to change the mapping between the vcpu and the
>> physical cpu in the hypervisor, the scheduling decisions taken in the
>> guest would be bogus.
>
> True. There is a tradeoff between keeping a VCPU waiting to
> run on a particular physical CPU and running it elsewhere
> right away.
>
> I think the only case we would prefer not to move a waiting
> VCPU from the physical CPU it last ran on to an idle one is:
>
> The VCPU very recently stopped running on said CPU, and
> It has warmed its cache considerably, and
> Before the VCPU gets to run on the CPU again,
> Very little will have run on the CPU, and
> The cache will not have been significantly blown.
>
> Basically, this says: It's bad to move a VCPU if it has a
> physical CPU pretty much to itself.
>
> But if a VCPU has a PCPU pretty much to itself, it's very
> unlikely it will end up sitting on that PCPU's runq long
> enough to be picked up by another PCPU.
>
> I think the simple thing to do here and a good rule of thumb
> in general is not to allow cycles to go idle when there is
> runnable work. If you can think of a counter example though,
> I'd love to consider it and perhaps make some changes.
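To make the exclusive-pinning proposal in the message above concrete, here is a rough, purely hypothetical C sketch of the check such a scheme might perform when deciding whether a VCPU may run on a PCPU. None of these names exist in Xen; this only illustrates the proposed idea, not an implementation.

/*
 * Purely hypothetical sketch of the "exclusive VCPU pinning" idea
 * discussed above: a pinned VCPU runs only on its PCPU, and that PCPU
 * runs only the VCPU pinned to it.  None of these names exist in Xen.
 */
#include <stdbool.h>
#include <stdio.h>

#define NO_PIN (-1)

struct vcpu_pin {
    int id;
    int pinned_pcpu;     /* PCPU this VCPU is exclusively bound to, or NO_PIN */
};

struct pcpu_pin {
    int id;
    int exclusive_vcpu;  /* VCPU exclusively bound to this PCPU, or NO_PIN */
};

/* May this VCPU be scheduled on this PCPU under exclusive pinning? */
static bool may_run_on(const struct vcpu_pin *v, const struct pcpu_pin *p)
{
    if (v->pinned_pcpu != NO_PIN && v->pinned_pcpu != p->id)
        return false;    /* the VCPU is bound to a different PCPU     */
    if (p->exclusive_vcpu != NO_PIN && p->exclusive_vcpu != v->id)
        return false;    /* the PCPU is reserved for a different VCPU */
    return true;
}

int main(void)
{
    struct pcpu_pin cpu0 = { .id = 0, .exclusive_vcpu = 7 };
    struct vcpu_pin v7   = { .id = 7, .pinned_pcpu = 0 };       /* pinned pair */
    struct vcpu_pin v9   = { .id = 9, .pinned_pcpu = NO_PIN };  /* unpinned    */

    printf("v7 on cpu0: %s\n", may_run_on(&v7, &cpu0) ? "ok" : "no");  /* ok */
    printf("v9 on cpu0: %s\n", may_run_on(&v9, &cpu0) ? "ok" : "no");  /* no */
    return 0;
}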
On Wed, Jun 21, 2006 at 09:39:17PM +0200, Molle Bestefich wrote:
> Emmanuel Ackaouy wrote:
> Sounds like this algorithm is an evil killer of CPU cache performance to me
> :-).

I don't agree.

First of all, if you were to time-slice two VCPUs on one PCPU and dedicate the other PCPU to the 3rd VCPU, you'd still be time-slicing on one PCPU and you'd pay context switching and cache sharing costs there.

Second, migration or not, once you start time-slicing, you have to deal with cache warming costs. The important thing is to run long enough time-slices to take advantage of the cache.

Third, there are many applications where latency to run is more important than wasting idle CPU cycles in order to run on a warm cache.

Fourth, in the case we were talking about of 3 active VCPUs on a 2-PCPU host: without migration, 2 VCPUs time-slice and get 25% of the total host CPU resources each, while 1 VCPU has 50% and does not time-slice. With migration, all 3 VCPUs time-slice and get 33% of the total CPU resources. Two out of three VCPUs do better with migration, and you end up with fair sharing of system-wide CPU resources too.
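The arithmetic behind that fourth point can be checked with a few lines of C; the numbers are exactly those quoted above (3 CPU-bound VCPUs on a 2-PCPU host), and the program itself is just a worked example, not anything from Xen.

/* Worked example of the fairness arithmetic above: 3 CPU-bound VCPUs
 * sharing a 2-PCPU host. */
#include <stdio.h>

int main(void)
{
    const double pcpus = 2.0, vcpus = 3.0;

    /* Work conserving with migration: total CPU is split evenly, so each
     * VCPU gets 2/3 of a PCPU, i.e. 33% of the whole host. */
    double share_of_pcpu = pcpus / vcpus;
    printf("with migration:    each VCPU gets %.0f%% of one PCPU "
           "(%.0f%% of the host)\n",
           100.0 * share_of_pcpu, 100.0 * share_of_pcpu / pcpus);

    /* Without migration: VCPUs A and B time-slice PCPU0 (half a PCPU each,
     * 25% of the host), while VCPU C has PCPU1 to itself (50% of the host). */
    printf("without migration: A and B each get 25%% of the host, "
           "C gets 50%%\n");
    return 0;
}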
On 21 Jun 2006, at 22:26, Emmanuel Ackaouy wrote:
> Second, migration or not, once you start time-slicing, you have
> to deal with cache warming costs. The important thing is to run
> long enough time-slices to take advantage of the cache.

Yep, locality across timeslices doesn't really help. If you've run a different VCPU between timeslices then the caches will be cold. Even if there is a small cache advantage, it's very unlikely to balance the cost of deliberately leaving CPUs idle.

Things may be more interesting in NUMA environments, but there's always the per-VCPU affinity map to restrict cross-CPU scheduling to within a NUMA node.

 -- Keir
Emmanuel Ackaouy wrote:
> Let me know if you have any other questions.
> It's useful to discuss design philosophy on the list.

Well, if you don't mind! :-)...

The first sentence was actually a question too:

> Migrating 1 VCPU across 2 CPUs, or
> swapping all 3 VCPUs around in a ring-fashion?

To explain myself a bit more, I was wondering whether the scheduler conceptually can be said to work a little like this (VCPUs mapped to CPUs in a "ring-fashion"):

(VCPUs = A, B, C)

 Time    CPU1   CPU2
 0ms     A      B
 30ms    B      C
 60ms    C      A
 90ms    A      B
 120ms   B      C
 ...etc...

This would give 30ms slices with a time-to-wait for scheduling of 30ms.

Or perhaps more like this:

 Time    CPU1   CPU2
 0ms     A      B
 30ms    A      C
 60ms    B      C
 90ms    B      A
 120ms   C      A
 ...etc...

Which gives 60ms slices (warmer cache, yummy) but still with a time-to-wait for scheduling of 30ms.

The latter is obviously the better algorithm, cache-wise...

Hope these aren't stupid questions :-).
On Thu, Jun 22, 2006 at 12:26:02PM +0200, Molle Bestefich wrote:
> To explain myself a bit more, I was wondering whether the scheduler
> conceptually can be said to work a little like this (VCPUs mapped to
> CPUs in a "ring-fashion"):
>
> (VCPUs = A, B, C)
>
>  Time    CPU1   CPU2
>  0ms     A      B
>  30ms    B      C
>  60ms    C      A
>  90ms    A      B
>  120ms   B      C
>  ...etc...
>
> This would give 30ms slices with a time-to-wait for scheduling of 30ms.

Here, you're moving something that is already running from one CPU to another. The credit scheduler won't do this. It moves something that is on a CPU's runq but hasn't had a chance to run on that CPU.

> Or perhaps more like this:
>
>  Time    CPU1   CPU2
>  0ms     A      B
>  30ms    A      C
>  60ms    B      C
>  90ms    B      A
>  120ms   C      A
>  ...etc...
>
> Which gives 60ms slices (warmer cache, yummy) but still with a
> time-to-wait for scheduling of 30ms.
>
> The latter is obviously the better algorithm, cache-wise...

The credit scheduler behaves like this latter example: basically, once a VCPU has run two consecutive time-slices, its physical CPU goes and looks for a VCPU on the other CPU's runq which hasn't yet run its fair share. It's the very act of running two consecutive time-slices on a PCPU, while two other VCPUs are time-slicing on the other PCPU, which causes the fair-share imbalance which then causes the migration.
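As a rough sketch only (invented names, not the real sched_credit.c code), the balancing step described in that last paragraph might be pictured like this: after its current VCPU has completed two consecutive time-slices, a PCPU scans the peer PCPU's run queue for a waiting VCPU that is still under its fair share and pulls that one over instead.

/*
 * Rough sketch of the balancing step described above.  The names are
 * invented for illustration and are not the real Xen sched_credit.c
 * code: after its current VCPU has run two consecutive time-slices,
 * a PCPU looks at the peer PCPU's run queue for a waiting VCPU that
 * has not yet had its fair share, and runs that one instead.
 */
#include <stdio.h>

struct vcpu_sched {
    const char *name;
    int credit;                   /* >= 0 means still under its fair share */
};

struct pcpu_runq {
    struct vcpu_sched **waiting;  /* VCPUs queued on this PCPU, not running */
    int len;
};

/*
 * Called at the end of a time-slice.  Keep running the local VCPU unless
 * it has just finished its second consecutive slice and a peer PCPU has
 * a queued VCPU that is still under its fair share -- in that case the
 * queued VCPU is migrated here and run next.
 */
static struct vcpu_sched *pick_next(struct vcpu_sched *local,
                                    int consecutive_slices,
                                    struct pcpu_runq *peer)
{
    if (consecutive_slices < 2)
        return local;

    for (int i = 0; i < peer->len; i++)
        if (peer->waiting[i]->credit >= 0)
            return peer->waiting[i];      /* steal the waiting "under" VCPU */

    return local;
}

int main(void)
{
    struct vcpu_sched a = { "A", -5 };    /* ran a lot locally, over its share */
    struct vcpu_sched c = { "C", 10 };    /* queued on the peer, still under   */
    struct vcpu_sched *q[] = { &c };
    struct pcpu_runq peer = { q, 1 };

    printf("next after 1 slice:  %s\n", pick_next(&a, 1, &peer)->name); /* A */
    printf("next after 2 slices: %s\n", pick_next(&a, 2, &peer)->name); /* C */
    return 0;
}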