The existing credit scheduler is not power aware. To achieve better power savings with negligible performance impact, the following areas may be tweaked; they are listed here for comments first. The goal is not to save power at the expense of performance, e.g. we don't want to prevent migration while there are free cpus and vcpus still pending on runqueues. But when the free computing power exceeds the current requirement, a power-aware policy can step in and choose a less power-intrusive decision. Even in the latter case it is of course controllable via a scheduler parameter like csched_private.power, exposed to the user.

----
a) when there are more idle cpus than required

a.1) csched_cpu_pick
    The existing policy is to pick the cpu with the most idle neighbours, to avoid shared-resource contention among cores or threads. From a power point of view, however, a package C-state saves much more power than a per-core C-state. From that angle it might be better to keep an idle package continuously idle, and instead pick idle cores/threads whose neighbours are already busy, if csched_private.power is set. The performance/watt ratio improves, though absolute performance takes a small hit (a rough sketch is appended as a P.S. below).

a.2) csched_vcpu_wake
    Similar to the above: instead of blindly kicking all idle cpus in a rush, idle cpus can be kicked selectively with the power factor taken into account.

----
b) when a physical cpu resides in an idle C-state
    Avoid unnecessary work, to prolong C-state residency. For example, the accounting process (the tick timer, more specifically) can be stopped before C-state entry and resumed after wake-up. The point is that no accounting is required while the current cpu is idle, and any runqueue change triggered from another cpu incurs an IPI to this cpu, which brings it back to C0 with accounting resumed. Since the residency period may be longer than the accounting period (30ms), csched_tick should be aware of the resume event so it can adjust the elapsed credits.

----
c) when the cpu's frequency is scaled dynamically
    When cpufreq/Px is enabled, the cpu's frequency is adjusted across different operating points by an on-demand governor. So csched_acct may need to take the frequency differences among cpus into consideration, and the total available credit won't be a simple 300 * online cpu_number.

----
There are of course plenty of research areas for adding more power factors into the scheduler policy. But the above is the fundamental stuff which we believe would help the scheduler understand power requirements without hurting performance/watt. Comments are appreciated.

Thanks,
Kevin
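P.S. A minimal sketch of the power-biased pick in a.1. The topology model, the idle[] map, idle_neighbours(), pick_cpu() and the power_aware flag standing in for csched_private.power are all invented for illustration; this is not the existing csched_cpu_pick() code.

/*
 * Sketch only -- not the real csched_cpu_pick().  Topology is reduced to
 * "CPUS_PER_PKG consecutive cpus share a package"; idle[] is the current
 * per-cpu idle map.
 */
#include <stdbool.h>

#define NR_CPUS       8
#define CPUS_PER_PKG  4

static int idle_neighbours(const bool idle[], int cpu)
{
    int first = (cpu / CPUS_PER_PKG) * CPUS_PER_PKG, n = 0;

    for (int peer = first; peer < first + CPUS_PER_PKG; peer++)
        if (peer != cpu && idle[peer])
            n++;
    return n;
}

/*
 * Default policy: pick the idle cpu with the MOST idle neighbours (least
 * cache/bus contention).  Power policy (power_aware set): pick the idle
 * cpu with the FEWEST idle neighbours, so that completely idle packages
 * are left alone and can stay in a deep package C-state.
 */
static int pick_cpu(const bool idle[], bool power_aware)
{
    int best = -1, best_score = power_aware ? CPUS_PER_PKG : -1;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (!idle[cpu])
            continue;
        int score = idle_neighbours(idle, cpu);
        if (( power_aware && score < best_score) ||
            (!power_aware && score > best_score)) {
            best = cpu;
            best_score = score;
        }
    }
    return best;   /* -1: no idle cpu at all, fall back to normal placement */
}

The only difference from the default is the direction of the comparison: the power policy consolidates work onto already-woken packages instead of spreading it across idle ones.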
>From: Tian, Kevin
>Sent: 19 June 2008 12:52
>----
>
>c) when the cpu's frequency is scaled dynamically
>    When cpufreq/Px is enabled, the cpu's frequency is adjusted
>across different operating points by an on-demand governor. So
>csched_acct may need to take the frequency differences among cpus
>into consideration, and the total available credit won't be a
>simple 300 * online cpu_number.
>

The above is not accurate. The credit scheduler cannot anticipate the cpu frequency in the next 30ms accounting phase, so it can still only assume a total credit budget of 300 * online cpu_number for allocation. The real question is whether the credit subtracted in the 10ms per-vcpu accounting tick should be multiplied by a frequency ratio. Two issues come with that approach:

a) the total budget would then be counted inconsistently with the per-vcpu accounting, which may give the credit scheduler an inaccurate picture for decisions like balancing;

b) it is not easy to get an accurate frequency ratio for the target cpu. Some cpus may not support it, the on-demand governor runs asynchronously with the credit tick timer, and querying a remote cpu normally incurs inter-cpu traffic. Perhaps the on-demand governor could be aligned with the credit tick timer at the same interval, which would solve the frequency-query issue.

It looks more complex than initially thought. Also, since the on-demand governor will scale the frequency back up immediately when there is more real work to be done, this may not have a real impact in practice. We'll keep an eye on it in future tuning to see whether it matters. :-)

Thanks,
Kevin
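P.S. For concreteness, the option weighed (and deferred) above would amount to something like the following at each 10ms per-vcpu accounting tick. The structure, the CREDITS_PER_TICK constant and the freq_ratio_pct() helper are placeholders invented for the sketch, not the actual credit scheduler code.

/* Illustrative only: frequency-scaled credit debit at the 10ms vcpu tick. */

#define CREDITS_PER_TICK  100   /* placeholder: 300 credits over 3 ticks */

struct sched_vcpu {
    int credit;
};

/* Placeholder: current P-state frequency as a percentage of max. */
static int freq_ratio_pct(int cpu)
{
    (void)cpu;
    return 100;   /* stub; a real version would have to query cpufreq */
}

static void vcpu_acct_tick(struct sched_vcpu *svc, int cpu, int freq_aware)
{
    int debit = CREDITS_PER_TICK;

    /*
     * Option under discussion: a vcpu that ran on a down-scaled cpu got
     * less real work done, so charge it proportionally less credit.
     * Issue (a): the 300 * online cpu_number budget is still handed out
     * at "100%", so debits and budget become inconsistent.
     */
    if (freq_aware)
        debit = debit * freq_ratio_pct(cpu) / 100;

    svc->credit -= debit;
}

Issue (b) is exactly why freq_ratio_pct() is hard to do well: the on-demand governor changes the operating point asynchronously, and reading a remote cpu's current frequency normally costs inter-cpu traffic.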
On 19/6/08 05:51, "Tian, Kevin" <kevin.tian@intel.com> wrote:

> b) when a physical cpu resides in an idle C-state
>     Avoid unnecessary work, to prolong C-state residency. For
> example, the accounting process (the tick timer, more specifically)
> can be stopped before C-state entry and resumed after wake-up. The
> point is that no accounting is required while the current cpu is
> idle, and any runqueue change triggered from another cpu incurs an
> IPI to this cpu, which brings it back to C0 with accounting resumed.
> Since the residency period may be longer than the accounting period
> (30ms), csched_tick should be aware of the resume event so it can
> adjust the elapsed credits.

Yes, this should be easy low-hanging fruit to fix.

> c) when the cpu's frequency is scaled dynamically
>     When cpufreq/Px is enabled, the cpu's frequency is adjusted
> across different operating points by an on-demand governor. So
> csched_acct may need to take the frequency differences among cpus
> into consideration, and the total available credit won't be a
> simple 300 * online cpu_number.

Not sure. I think the current governor runs frequently enough to react to the scheduler (i.e., it tries to keep the CPU non-idle by downscaling frequency, and upscales frequency when the CPU gets busy, both over sub-second timescales). Does it then make sense to have the scheduler react to the governor? Sounds like it could be a weird feedback loop.

 -- Keir
>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>Sent: 19 June 2008 15:32
>
>> c) when the cpu's frequency is scaled dynamically
>>     When cpufreq/Px is enabled, the cpu's frequency is adjusted
>> across different operating points by an on-demand governor. So
>> csched_acct may need to take the frequency differences among cpus
>> into consideration, and the total available credit won't be a
>> simple 300 * online cpu_number.
>
>Not sure. I think the current governor runs frequently enough to
>react to the scheduler (i.e., it tries to keep the CPU non-idle by
>downscaling frequency, and upscales frequency when the CPU gets
>busy, both over sub-second timescales).

Yes, normally it works at the 20ms level.

>Does it then make sense to have the scheduler react to the
>governor? Sounds like it could be a weird feedback loop.
>

Good suggestion. We're considering adding more inputs from key components into the on-demand governor, instead of simply polling the busy ratio at a fixed interval to decide frequency changes. For example, when one cpu pulls a vcpu from another runqueue, that is an indicator that its current frequency may not be sufficient, and it would be better to scale to the maximum immediately instead of waiting for the next 20ms check timer. Other indicators could be interrupts, events, etc. You raise a good point. :-)

Thanks,
Kevin
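P.S. A rough illustration of the kind of hook we have in mind. All names are invented for the sketch; none of this is the existing cpufreq/governor interface.

/* Sketch only: event-driven hints from the scheduler to the governor. */

enum governor_hint {
    HINT_VCPU_MIGRATED_IN,   /* work was just pulled onto this cpu       */
    HINT_IRQ_STORM,          /* other indicators: interrupts, events ... */
};

struct governor_state {
    int cur_freq_pct;        /* current operating point, % of max freq   */
};

/*
 * Instead of waiting for the next 20ms busy-ratio sample, certain
 * scheduler events request the maximum operating point right away.  The
 * periodic governor still scales the frequency back down later if the
 * expected load does not materialise.
 */
static void governor_hint(struct governor_state *g, enum governor_hint h)
{
    switch (h) {
    case HINT_VCPU_MIGRATED_IN:
    case HINT_IRQ_STORM:
        g->cur_freq_pct = 100;   /* jump to P0 / max frequency now */
        break;
    }
}

/* Called from the (hypothetical) load-balance path after stealing a vcpu. */
static void on_vcpu_pulled(struct governor_state *g)
{
    governor_hint(g, HINT_VCPU_MIGRATED_IN);
}

Whether such event-driven hints actually beat plain 20ms busy-ratio polling is what the experiments need to show.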
On 19/6/08 09:03, "Tian, Kevin" <kevin.tian@intel.com> wrote:

>> Does it then make sense to have the scheduler react to the
>> governor? Sounds like it could be a weird feedback loop.
>
> Good suggestion. We're considering adding more inputs from key
> components into the on-demand governor, instead of simply polling
> the busy ratio at a fixed interval to decide frequency changes. For
> example, when one cpu pulls a vcpu from another runqueue, that is
> an indicator that its current frequency may not be sufficient, and
> it would be better to scale to the maximum immediately instead of
> waiting for the next 20ms check timer. Other indicators could be
> interrupts, events, etc. You raise a good point. :-)

I see. This specific example doesn't sound unreasonable. I suppose experimental data will show what works and what doesn't.

 -- Keir
>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
>Sent: 19 June 2008 16:10
>On 19/6/08 09:03, "Tian, Kevin" <kevin.tian@intel.com> wrote:
>
>>> Does it then make sense to have the scheduler react to the
>>> governor? Sounds like it could be a weird feedback loop.
>>
>> Good suggestion. We're considering adding more inputs from key
>> components into the on-demand governor, instead of simply polling
>> the busy ratio at a fixed interval to decide frequency changes.
>> For example, when one cpu pulls a vcpu from another runqueue, that
>> is an indicator that its current frequency may not be sufficient,
>> and it would be better to scale to the maximum immediately instead
>> of waiting for the next 20ms check timer. Other indicators could
>> be interrupts, events, etc. You raise a good point. :-)
>
>I see. This specific example doesn't sound unreasonable. I suppose
>experimental data will show what works and what doesn't.
>

Yes, and we'll start some experiments soon. We'll let you know once we have some concrete data.

Thanks,
Kevin
> c) when the cpu's frequency is scaled dynamically
>     When cpufreq/Px is enabled, the cpu's frequency is adjusted
> across different operating points by an on-demand governor. So
> csched_acct may need to take the frequency differences among cpus
> into consideration, and the total available credit won't be a
> simple 300 * online cpu_number.

We should also adjust the accounting of the credits consumed in light of hyperthreading: the credit we subtract should be scaled in proportion to how much of the period was spent competing with another VCPU running on the sibling hyperthread (we can tell this by seeing how much time the idle thread spent running on the other thread).

We can then scale the accounting according to some rough notion of the expected throughput of two hyperthreads. For example, experience on P4 CPUs suggests that a single VCPU will typically receive something like 65% of its normal throughput when competing against another thread (total throughput 130%). We would thus scale the amount of credit subtracted between 65% and 100% depending on how much time was spent competing; see the sketch below.

There's an argument that we should at least have an option to prevent VCPUs from different guests running against each other on adjacent threads. This would introduce a simple kind of gang scheduling.

Ian
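To make the 65%-100% scaling concrete, here is a minimal sketch. It assumes we already know what fraction of the accounting period the sibling hyperthread was busy; the function name, the integer-percentage style and the constant are illustrative, not actual credit-scheduler code.

/*
 * Scales the credit debited for one accounting period between 100%
 * (sibling thread idle the whole time) and 65% (sibling busy the whole
 * time), linearly with the time spent competing.
 */
#define HT_COMPETING_THROUGHPUT_PCT  65   /* rough P4-era figure */

/*
 * debit_full:    credit that would be subtracted on a dedicated core
 * competing_pct: % of the period the sibling hyperthread was non-idle
 */
static int ht_scaled_debit(int debit_full, int competing_pct)
{
    /* 100% when competing_pct == 0, 65% when competing_pct == 100. */
    int scale_pct = 100 -
        (100 - HT_COMPETING_THROUGHPUT_PCT) * competing_pct / 100;

    return debit_full * scale_pct / 100;
}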
>From: Ian Pratt [mailto:Ian.Pratt@eu.citrix.com]
>Sent: 19 June 2008 17:14
>
>> c) when the cpu's frequency is scaled dynamically
>>     When cpufreq/Px is enabled, the cpu's frequency is adjusted
>> across different operating points by an on-demand governor. So
>> csched_acct may need to take the frequency differences among cpus
>> into consideration, and the total available credit won't be a
>> simple 300 * online cpu_number.
>
>We should also adjust the accounting of the credits consumed in
>light of hyperthreading: the credit we subtract should be scaled in
>proportion to how much of the period was spent competing with
>another VCPU running on the sibling hyperthread (we can tell this
>by seeing how much time the idle thread spent running on the other
>thread).
>
>We can then scale the accounting according to some rough notion of
>the expected throughput of two hyperthreads. For example, experience
>on P4 CPUs suggests that a single VCPU will typically receive
>something like 65% of its normal throughput when competing against
>another thread (total throughput 130%). We would thus scale the
>amount of credit subtracted between 65% and 100% depending on how
>much time was spent competing.
>
>There's an argument that we should at least have an option to
>prevent VCPUs from different guests running against each other on
>adjacent threads. This would introduce a simple kind of gang
>scheduling.
>

Such scaling can and perhaps should be applied to other facets, but the original proposal on the frequency side doesn't hold, as I noted in my reply to myself in another mail: the credit scheduler cannot anticipate the frequency distribution in the next accounting phase unless frequency changes are fully controlled by the scheduler. We will, however, experiment with feeding scheduler input into the frequency governor, as discussed with Keir. :-)

BTW, once such scaling takes more factors into account as mentioned above, the original tick-based accounting looks even more awkward, since there is no longer a direct mapping between a tick and a credit...

Thanks,
Kevin
Hi Kevin.

I'm glad you're looking at this. There are a bunch of interesting areas to look at to improve scheduling on large hierarchical systems. The idle loop is at the center of most of them.

On Jun 19, 2008, at 6:51, Tian, Kevin wrote:

> a) when there are more idle cpus than required
>
> a.1) csched_cpu_pick
>     The existing policy is to pick the cpu with the most idle
> neighbours, to avoid shared-resource contention among cores or
> threads. From a power point of view, however, a package C-state
> saves much more power than a per-core C-state. From that angle it
> might be better to keep an idle package continuously idle, and
> instead pick idle cores/threads whose neighbours are already busy,
> if csched_private.power is set. The performance/watt ratio
> improves, though absolute performance takes a small hit.

Regardless of any new knobs, a good default behavior might be to only take a package out of its C-state when another non-idle package has had more than one VCPU active on it over some reasonable amount of time.

By default, putting multiple VCPUs on the same physical package when other packages are idle is obviously not always going to be optimal. Maybe it's not a bad default for VCPUs that are related (same VM or qemu)? I think Ian P hinted at this. But it frightens me that you would always do this by default for any set of VCPUs. Power saving is good, but so is memory bandwidth.

> a.2) csched_vcpu_wake
>     Similar to the above: instead of blindly kicking all idle cpus
> in a rush, idle cpus can be kicked selectively with the power
> factor taken into account.

Yeah, you will need to rewrite the idle kick code. This can be tricky because a CPU's idle state might change by the time it processes a "scheduling IPI", and you need to be careful that a runnable VCPU doesn't sit on a runqueue when there is at least one idle CPU in the system.
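A rough sketch of that default policy, under the simplifying assumption that we keep a small per-package activity record; every name here is invented for illustration and none of it is existing Xen code.

/*
 * An idle package is only woken when some non-idle package has been
 * running more than one VCPU for a sustained period.
 */
#include <stdbool.h>

#define NR_PKGS                4
#define OVERLOAD_THRESHOLD_MS  30   /* the "reasonable amount of time" */

struct pkg_stats {
    bool fully_idle;       /* every core/thread of the package is in C-state */
    int  active_vcpus;     /* VCPUs recently runnable on this package        */
    int  overloaded_ms;    /* how long active_vcpus has exceeded 1           */
};

/* Would waking an idle package actually relieve sustained overload? */
static bool should_wake_idle_package(const struct pkg_stats pkg[])
{
    bool have_idle_pkg = false, sustained_overload = false;

    for (int p = 0; p < NR_PKGS; p++) {
        if (pkg[p].fully_idle)
            have_idle_pkg = true;
        else if (pkg[p].active_vcpus > 1 &&
                 pkg[p].overloaded_ms >= OVERLOAD_THRESHOLD_MS)
            sustained_overload = true;
    }

    return have_idle_pkg && sustained_overload;
}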
> Such scaling can and perhaps should be applied to other facets, but
> the original proposal on the frequency side doesn't hold, as I noted
> in my reply to myself in another mail: the credit scheduler cannot
> anticipate the frequency distribution in the next accounting phase
> unless frequency changes are fully controlled by the scheduler. We
> will, however, experiment with feeding scheduler input into the
> frequency governor, as discussed with Keir. :-)

That's OK -- it's fine to account in arrears, and doing so will have the right influence on how we schedule things in the future. That's why it's important to move from tick accounting to absolute.

Ian

> BTW, once such scaling takes more factors into account as mentioned
> above, the original tick-based accounting looks even more awkward,
> since there is no longer a direct mapping between a tick and a
> credit...
>
> Thanks,
> Kevin
>From: Emmanuel Ackaouy [mailto:ackaouy@gmail.com]
>Sent: 19 June 2008 21:09
>
>Hi Kevin.
>
>I'm glad you're looking at this. There are a bunch of interesting
>areas to look at to improve scheduling on large hierarchical
>systems. The idle loop is at the center of most of them.

Agree.

>
>On Jun 19, 2008, at 6:51, Tian, Kevin wrote:
>> a) when there are more idle cpus than required
>>
>> a.1) csched_cpu_pick
>>     The existing policy is to pick the cpu with the most idle
>> neighbours, to avoid shared-resource contention among cores or
>> threads. From a power point of view, however, a package C-state
>> saves much more power than a per-core C-state. From that angle it
>> might be better to keep an idle package continuously idle, and
>> instead pick idle cores/threads whose neighbours are already busy,
>> if csched_private.power is set. The performance/watt ratio
>> improves, though absolute performance takes a small hit.
>
>Regardless of any new knobs, a good default behavior might be
>to only take a package out of its C-state when another non-idle
>package has had more than one VCPU active on it over some
>reasonable amount of time.
>
>By default, putting multiple VCPUs on the same physical package
>when other packages are idle is obviously not always going to
>be optimal. Maybe it's not a bad default for VCPUs that are
>related (same VM or qemu)? I think Ian P hinted at this. But it
>frightens me that you would always do this by default for any set
>of VCPUs. Power saving is good, but so is memory bandwidth.

Enabling this feature depends on a control command from the system administrator, who knows the trade-off. From an absolute-performance point of view I agree it is not optimal. Looking at it from the performance/watt (power-efficiency) angle, however, the power saved by package-level idling may outweigh the performance impact of concentrating activity in the other package. Memory latency should of course also be considered on NUMA systems, as you mention.

Note that we will never keep one package idle while another package already has a vcpu pending on a runqueue. Even when the power-aware feature is configured, it only takes effect when the number of cpus is larger than the number of runnable vcpus. It is just like the power profiles prevalent OSes let users choose... :-)

>
>> a.2) csched_vcpu_wake
>>     Similar to the above: instead of blindly kicking all idle cpus
>> in a rush, idle cpus can be kicked selectively with the power
>> factor taken into account.
>
>Yeah, you will need to rewrite the idle kick code. This can be
>tricky because a CPU's idle state might change by the time it
>processes a "scheduling IPI", and you need to be careful that a
>runnable VCPU doesn't sit on a runqueue when there is at least one
>idle CPU in the system.
>

I understand those caveats, but I'm not sure I see exactly how they relate to the proposed change. Could you elaborate a bit? How are those concerns handled in the current logic?

Thanks,
Kevin
>From: Ian Pratt [mailto:Ian.Pratt@eu.citrix.com]
>Sent: 19 June 2008 21:31
>
>> Such scaling can and perhaps should be applied to other facets,
>> but the original proposal on the frequency side doesn't hold, as I
>> noted in my reply to myself in another mail: the credit scheduler
>> cannot anticipate the frequency distribution in the next
>> accounting phase unless frequency changes are fully controlled by
>> the scheduler. We will, however, experiment with feeding scheduler
>> input into the frequency governor, as discussed with Keir. :-)
>
>That's OK -- it's fine to account in arrears, and doing so will have
>the right influence on how we schedule things in the future. That's
>why it's important to move from tick accounting to absolute.
>

OK, then there will be mutual inputs between the scheduler and the frequency governor...

Thanks,
Kevin
On Jun 19, 2008, at 15:32, Tian, Kevin wrote:

>> Regardless of any new knobs, a good default behavior might be
>> to only take a package out of its C-state when another non-idle
>> package has had more than one VCPU active on it over some
>> reasonable amount of time.
>>
>> By default, putting multiple VCPUs on the same physical package
>> when other packages are idle is obviously not always going to
>> be optimal. Maybe it's not a bad default for VCPUs that are
>> related (same VM or qemu)? I think Ian P hinted at this. But it
>> frightens me that you would always do this by default for any set
>> of VCPUs. Power saving is good, but so is memory bandwidth.
>
> Enabling this feature depends on a control command from the system
> administrator, who knows the trade-off. From an absolute-performance
> point of view I agree it is not optimal. Looking at it from the
> performance/watt (power-efficiency) angle, however, the power saved
> by package-level idling may outweigh the performance impact of
> concentrating activity in the other package. Memory latency should
> of course also be considered on NUMA systems, as you mention.

I'm saying something can be done to improve power saving in the current system without adding a knob. Perhaps you can give the admin even more power-saving ability with a knob, but it makes sense to save power when performance is not impacted, regardless of any knob position.

Also, note I mentioned memory BANDWIDTH and not latency. It's not the same thing. And I wasn't just thinking about NUMA systems.
On Jun 19, 2008, at 15:30, Ian Pratt wrote:

> That's OK -- it's fine to account in arrears, and doing so will
> have the right influence on how we schedule things in the future.
> That's why it's important to move from tick accounting to absolute.

I actually still don't agree that it's important to move from tick accounting to absolute. CPU wall-clock time is an approximation of service to start with. From the point of view of basic short-term fairness and load balancing, tick-based accounting works well and is simple to scale.

Accounting for the shared resources of physical CPUs makes sense, be it caches or memory buses (or the pipeline in the hyperthread case). But you can't really do that precisely: two CPUs may share a memory bus, but perhaps one of them is compute-bound out of its L1 cache. What is the point of precisely measuring wall-clock CPU time if you're then going to multiply that number by some constant that may or may not reflect the real impact of resource sharing in that case?

IMO, the more pressing problem is to approximately account for shared physical resources and to scale the cpu_pick() and cpu_kick() mechanisms to improve efficiency on medium and large hierarchical systems. It's probably OK to approximate the cost of sharing physical resources using reasonable constants (i.e. 0.65 when co-scheduled on hyperthreads).
>From: Emmanuel Ackaouy [mailto:ackaouy@gmail.com]
>Sent: 19 June 2008 22:38
>
>On Jun 19, 2008, at 15:32, Tian, Kevin wrote:
>>> Regardless of any new knobs, a good default behavior might be
>>> to only take a package out of its C-state when another non-idle
>>> package has had more than one VCPU active on it over some
>>> reasonable amount of time.
>>>
>>> By default, putting multiple VCPUs on the same physical package
>>> when other packages are idle is obviously not always going to
>>> be optimal. Maybe it's not a bad default for VCPUs that are
>>> related (same VM or qemu)? I think Ian P hinted at this. But it
>>> frightens me that you would always do this by default for any
>>> set of VCPUs. Power saving is good, but so is memory bandwidth.
>>
>> Enabling this feature depends on a control command from the
>> system administrator, who knows the trade-off. From an
>> absolute-performance point of view I agree it is not optimal.
>> Looking at it from the performance/watt (power-efficiency) angle,
>> however, the power saved by package-level idling may outweigh the
>> performance impact of concentrating activity in the other package.
>> Memory latency should of course also be considered on NUMA
>> systems, as you mention.
>
>I'm saying something can be done to improve power saving in
>the current system without adding a knob. Perhaps you can give
>the admin even more power-saving ability with a knob, but it
>makes sense to save power when performance is not impacted,
>regardless of any knob position.

Then I agree. It is always good to improve one side while leaving the other unharmed, or to fix things that hinder both first. We will also compare whether a knob can deliver a clearly better result.

>
>Also, note I mentioned memory BANDWIDTH and not latency.
>It's not the same thing. And I wasn't just thinking about NUMA
>systems.
>

Thanks for pointing that out; I read too fast. But I'm not sure how memory bandwidth is affected by vcpu scheduling. Do you mean more memory traffic on the bus due to shared-cache contention when multiple vcpus run in the same package? That would be workload specific, and other workloads may not be affected to the same extent. Still, it's a good hint: we'll include such workloads in the experiments when making the change. Considering the vcpu/domain relationship is also something we can try. The basic direction will be to go simple first and see the effect.

Thanks,
Kevin
>From: Emmanuel Ackaouy [mailto:ackaouy@gmail.com]
>Sent: 19 June 2008 23:40
>
>On Jun 19, 2008, at 15:30, Ian Pratt wrote:
>> That's OK -- it's fine to account in arrears, and doing so will
>> have the right influence on how we schedule things in the future.
>> That's why it's important to move from tick accounting to
>> absolute.
>
>I actually still don't agree that it's important to move from tick
>accounting to absolute. CPU wall-clock time is an approximation of
>service to start with. From the point of view of basic short-term
>fairness and load balancing, tick-based accounting works well and
>is simple to scale.
>
>Accounting for the shared resources of physical CPUs makes sense,
>be it caches or memory buses (or the pipeline in the hyperthread
>case). But you can't really do that precisely: two CPUs may share a
>memory bus, but perhaps one of them is compute-bound out of its L1
>cache. What is the point of precisely measuring wall-clock CPU time
>if you're then going to multiply that number by some constant that
>may or may not reflect the real impact of resource sharing in that
>case?
>

I'm not sure how fairness is ensured with tick-based accounting in the example I posted in my first mail. Perhaps long-term fairness is still achieved approximately, on average, but at the micro-accounting level it may not perform well, which hurts guests with that requirement. The effect of accounting precisely and then multiplying is hard to judge without experiments to prove it. Still, accounting absolutely, without the multiplication, seems the more natural way to go, IMO.

>IMO, the more pressing problem is to approximately account for
>shared physical resources and to scale the cpu_pick() and
>cpu_kick() mechanisms to improve efficiency on medium and large
>hierarchical systems. It's probably OK to approximate the cost of
>sharing physical resources using reasonable constants (i.e. 0.65
>when co-scheduled on hyperthreads).
>

This is a good point.

Thanks,
Kevin