Atsushi SAKAI
2008-Dec-08 06:34 UTC
[Xen-devel] [PATCH] Accurate vcpu weighting for credit scheduler
Hi,

This patch is intended to make vcpu weighting accurate for CPU-intensive jobs.

The cause of the problem is that, in the vcpu round-robin queue, vcpus with large weights get blocked behind vcpus with small weights.

For example, assume the following case on a 2-pcpu environment, with 4 domains (each domain has 2 vcpus):

dom1 vcpu0,1 w128 credit 4
dom2 vcpu0,1 w128 credit 4
dom3 vcpu0,1 w256 credit 8
dom4 vcpu0,1 w512 credit 15

d4v0 gets 15ms of credit each time, but if 3 vcpus are blocking it (d1v0, d2v0, d3v0), d4v0's credit grows past 30ms (45ms = 15 x 3 times). Then d4v0's credit is cleared. This makes d4v0 use less credit than expected. The same problem occurs for d4v1 (blocked by d1v1, d2v1, d3v1).

In my case, xentop shows the following % for each domain:

dom1 27
dom2 28
dom3 53
dom4 88

After this patch is applied, each domain shows the following %:

dom1 25
dom2 25
dom3 49
dom4 99

This patch adds the condition that the "credit clear" should only happen when the vcpu is not runnable.

 sched_credit.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Signed-off-by: Atsushi SAKAI <sakaia@jp.fujitsu.com>

Thanks
Atsushi SAKAI
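[For reference, a minimal sketch of the kind of change being described. This is not the actual hunk; the identifiers -- csched_acct(), CSCHED_CREDITS_PER_TSLICE, __csched_vcpu_acct_stop_locked(), vcpu_runnable() -- are assumptions based on the credit scheduler source of that era.]

/* Sketch only -- inside csched_acct()'s per-vcpu loop.  The upstream
 * code resets any vcpu whose credit exceeds one full time slice; the
 * patch described above additionally requires that the vcpu is not
 * runnable, so a vcpu that is merely waiting on the runqueue behind
 * lighter vcpus keeps the credit it has earned. */
if ( credit > CSCHED_CREDITS_PER_TSLICE &&
     !vcpu_runnable(svc->vcpu) )
{
    __csched_vcpu_acct_stop_locked(svc);
    credit = 0;
    atomic_set(&svc->credit, credit);
}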
George Dunlap
2008-Dec-08 11:01 UTC
Re: [Xen-devel] [PATCH] Accurate vcpu weighting for credit scheduler
Below, when you say "blocking", I assume you don't mean it in the "do_block()" sense (i.e., taken off the runqueue), but in a more conventional sense; i.e., d[1-3]v0 are ahead of d4v0 in the runqueue, and are thus "blocking" d4v0 from running?

Hmm... I can see how the "credit reset" is hurting the proportional fairness. The problem is I can't quite see what the credit reset was there for in the first place, so I can't tell if this is going to screw up some other corner case that it was designed to solve. Why not, for instance, just get rid of the conditional altogether?

Emmanuel, would you mind commenting?

 -George
Atsushi SAKAI
2008-Dec-09 07:33 UTC
Re: [Xen-devel] [PATCH] Accurate vcpu weighting for credit scheduler
Hi, George

"George Dunlap" <George.Dunlap@eu.citrix.com> wrote:
> Below, when you say "blocking", I assume you don't mean it in the
> "do_block()" sense (i.e., taken off the runqueue), but in a more
> conventional sense; i.e., d[1-3]v0 are ahead of d4v0 in the
> runqueue, and are thus "blocking" d4v0 from running?

Yes, you are right.

> Hmm... I can see how the "credit reset" is hurting the proportional
> fairness. The problem is I can't quite see what the credit reset was
> there for in the first place, so I can't tell if this is going to
> screw up some other corner case that it was designed to solve.

I ran various test sets for CPU-intensive jobs. Almost all of the tests behave properly, but one test set failed. This patch fixes that case without affecting the other test sets.

> Why not, for instance, just get rid of the conditional altogether?

You mean we should get rid of the "credit reset" entirely? In any case, we should wait for Emmanuel's comments.

Thanks
Atsushi SAKAI
George Dunlap
2008-Dec-09 10:25 UTC
Re: [Xen-devel] [PATCH] Accurate vcpu weighting for credit scheduler
On Tue, Dec 9, 2008 at 7:33 AM, Atsushi SAKAI <sakaia@jp.fujitsu.com> wrote:
> You mean we should get rid of the "credit reset" entirely?

Yes, that's exactly what I was thinking. Removing the check for vcpus on the runqueue may actually be functionally equivalent to removing the check altogether.

I guess part of the thought might be that we don't want a single vcpu running for a really long time without other vcpus getting a chance to run at all. But in order for "weight" to have any meaning, sometimes a vcpu will have to get a long time to run.

What workload are you running -- is it just while(1); loops?
Atsushi SAKAI
2008-Dec-10 02:45 UTC
Re: [Xen-devel] [PATCH] Accurate vcpu weighting for credit scheduler
Hi, Emmanuel

1) Rounding error for credit

This problem goes beyond rounding error, so I do not think we need to consider that effect here. If you think otherwise, could you suggest your patch? It seems that changing CSCHED_TICKS_PER_ACCT alone is not enough.

2) Effect on I/O-intensive jobs

I did not change the code for BOOST priority; I only changed the "credit reset" condition, so there should be no effect on I/O-intensive workloads (but I have not measured it). If it is needed, I will test it. Which test is best for this change? (A simple I/O test is not enough for this case; I think a complex domain I/O configuration is needed to prove the effect of this patch.)

3) vcpu allocation measurement

At first I used
http://weather.ou.edu/~apw/projects/stress/
  stress --cpu xx --timeout xx --verbose
and then a simpler test (since each domain has 2 vcpus):
  yes > /dev/null &
  yes > /dev/null &

Testing now with the suggested method, the result is:

        original  w/ patch
dom1      27        25
dom2      27        25
dom3      53        50
dom4      91        98

Thanks
Atsushi SAKAI

Emmanuel Ackaouy <ackaouy@gmail.com> wrote:
> On Dec 9, 2008, at 2:25, George Dunlap wrote:
> > On Tue, Dec 9, 2008 at 7:33 AM, Atsushi SAKAI
> > <sakaia@jp.fujitsu.com> wrote:
> >> You mean we should get rid of the "credit reset" entirely?
> >
> > Yes, that's exactly what I was thinking. Removing the check for vcpus
> > on the runqueue may actually be functionally equivalent to removing
> > the check altogether.
>
> Essentially, this code is there as a safeguard against rounding errors
> and other oddball cases. In theory, a runnable VCPU should seldom
> accumulate more than one time slice's worth of credits.
>
> The problem with your change is that a VCPU that is not a spinner
> but instead runs and sleeps may not be removed from the accounting
> list when it should be, because it will not always be running when
> accounting and the check in question are performed. Potentially this
> will do very bad things for VCPUs that are I/O intensive or otherwise
> yield or sleep for a short time before consuming a full time slice.
>
> One thing that may help here is to make the credit calculations less
> prone to rounding errors. One thing I had wanted to do while at
> XenSource but never got around to was to change the arithmetic
> so that instead of 30 credits representing a time slice, we would
> make this a much bigger number.
>
> In this case for example, you would get credit allocations that had
> less significant rounding errors if you used 30000 instead of 30
> credits per time slice:
>
> dom1 vcpu0,1 w128 credit 3750
> dom2 vcpu0,1 w128 credit 3750
> dom3 vcpu0,1 w256 credit 7500
> dom4 vcpu0,1 w512 credit 15000
>
> I suspect this would get rid of a large number of cases such as the
> one you are reporting, where a runnable VCPU's credit exceeds
> one entire time slice. This type of change would improve accuracy
> and not screw up credit computation for I/O intensive and other
> non spinning domains.
>
> What do you think?
>
> Also please confirm that your VCPUs are indeed doing simple
> "while(1);" loops.
>
> Cheers,
> Emmanuel.
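[To make the rounding point above concrete, here is a small standalone calculation using the weights from this thread. It is illustrative only and not the scheduler's actual arithmetic; the scheduler's own rounding may differ slightly from plain truncation.]

#include <stdio.h>

int main(void)
{
    /* Weights and topology from the example in this thread:
     * 4 domains, 2 vcpus each, 2 pcpus. */
    int weights[] = { 128, 128, 256, 512 };   /* dom1..dom4 */
    int nvcpus = 2, npcpus = 2, total_weight = 0;

    for (int i = 0; i < 4; i++)
        total_weight += weights[i] * nvcpus;  /* 2048 */

    for (int scale = 30; scale <= 30000; scale *= 1000) {
        printf("credits per time slice per pcpu = %d\n", scale);
        for (int i = 0; i < 4; i++) {
            /* A vcpu's share of one time slice across both pcpus,
             * truncated by integer division. */
            int credit = weights[i] * scale * npcpus / total_weight;
            printf("  dom%d: %d credits per vcpu\n", i + 1, credit);
        }
    }
    return 0;
}

With 30 credits per time slice this prints 3/3/7/15 credits per vcpu (a sizeable fraction of dom1's and dom3's share lost to truncation); with 30000 it prints 3750/3750/7500/15000, matching the table quoted above.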
George Dunlap
2008-Dec-11 10:34 UTC
Re: [Xen-devel] [PATCH] Accurate vcpu weighting for credit scheduler
OK, I've grueled through an example by hand and think I see what's going on.

So the idea of the credit scheduler is that we have a certain number of "credits" per accounting period, and each of these credits represents a certain amount of time. The scheduler gives out credits according to weight, so theoretically, each accounting period, if all vcpus are active, each should consume all of its credits. Based on that assumption, if a vcpu has run and accumulated more than one full accounting period of credits, it's probably idle and we can leave it be.

The problem in this situation isn't so much rounding errors as *scheduling granularity*. In the example given:

d1: weight 128
d2: weight 128
d3: weight 256
d4: weight 512

If each domain has 2 vcpus, and there are 2 cores, then the credits will be divided thus:

d1: 37 credits / vcpu
d2: 37 credits / vcpu
d3: 75 credits / vcpu
d4: 150 credits / vcpu

But scheduling and accounting only happen every "tick", and every "tick" is 100 credits. So each vcpu of d{1,2}, instead of consuming 37 credits, consumes 100; the same goes for each vcpu of d3. At the end of the first accounting period, d{1,2,3} have each gotten to run for 100 credits' worth of time, but d4 hasn't gotten to run at all.

In short, the fact that we have a 100-credit scheduling granularity breaks the assumption that every VM has had a chance to run each accounting period when there are really long runqueues.

I can think of a couple of solutions. The simplest one might be to sort the runqueue by number of credits -- at least every accounting period. In that case, d4 would always get to run every accounting period; d{1,2} might not run in a given accounting period, but the next time around it would have twice the number of credits, &c.

Others might include extending accounting periods when we have long runqueues, or doing the credit limit during accounting only if the vcpu is not on the runqueue (Sakai-san's idea) *combined* with a check when the vcpu blocks. That would catch vcpus that are only moderately active, but just happen to be on the runqueue for several accounting periods in a row.

Sakai-san, would you be willing to try to implement a simple "runqueue sort" patch, and see if it also solves your scheduling issue?

 -George
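[A sketch of what such a runqueue sort could look like. This is illustrative only, not a real patch: it assumes sched_credit.c's internal types (struct csched_vcpu with a runq_elem list_head and an atomic credit field) and the Linux-style list/atomic helpers used in Xen, and it assumes the caller already holds the lock protecting the runqueue.]

/* Re-order one pcpu's runqueue so that vcpus with more credits sit
 * closer to the head.  Simple insertion sort, descending by credit;
 * credit-scheduler runqueues are short, so O(n^2) is acceptable. */
static void runq_sort_by_credit(struct list_head *runq)
{
    struct list_head *cur, *prev, *next;

    for ( cur = runq->next->next; cur != runq; cur = next )
    {
        struct csched_vcpu *svc = list_entry(cur, struct csched_vcpu, runq_elem);

        next = cur->next;
        prev = cur->prev;

        /* Walk backwards past entries with fewer credits than svc. */
        while ( prev != runq )
        {
            struct csched_vcpu *p = list_entry(prev, struct csched_vcpu, runq_elem);

            if ( atomic_read(&p->credit) >= atomic_read(&svc->credit) )
                break;
            prev = prev->prev;
        }

        /* Re-insert svc right after the first entry that outranks it. */
        if ( prev != cur->prev )
        {
            list_del(cur);
            list_add(cur, prev);
        }
    }
}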
Atsushi SAKAI
2008-Dec-11 11:33 UTC
Re: [Xen-devel] [PATCH] Accurate vcpu weighting for credit scheduler
Hi, George

I will make a simple runqueue-sort version. By the way, sorting adds some overhead, even if only a small number of vcpus are on a pcpu's runqueue.

Thanks
Atsushi SAKAI
Atsushi SAKAI
2008-Dec-15 06:53 UTC
Re: [Xen-devel] [PATCH] Accurate vcpu weighting for credit scheduler
Hi, George

Sorry for the delay.

With this type of change, the CPU% shows the following:
dom1 26
dom2 26
dom3 51
dom4 96

Thanks
Atsushi SAKAI
NISHIGUCHI Naoki
2008-Dec-18 03:20 UTC
Re: [Xen-devel] [PATCH] Accurate vcpu weighting for credit scheduler
Hi, Atsushi

After applying my patches, I tested similarly. The CPU% shows the following:
dom0 25
dom1 25
dom2 50
dom3 100

What do you think about my patches?

Regards,
Naoki Nishiguchi
Atsushi SAKAI
2008-Dec-18 03:31 UTC
Re: [Xen-devel] [PATCH] Accurate vcpu weighting for credit scheduler
Hi, Naoki

Please ask Emmanuel and George first, since I am not maintaining the scheduler.

By the way, I would like to see the xentrace data for the original case. (Adding the vcpu priority and credit to the trace output would be helpful.) Your problem looks like vcpu priority mishandling somewhere.

Thanks
Atsushi SAKAI
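[A rough sketch of the kind of trace point meant here, illustrative only: TRC_CSCHED_ACCT and its value are made up for this example, and the field names are assumptions following the sched_credit.c conventions of that era. A real patch would pick an unused value in the TRC_SCHED_* range in xen/include/public/trace.h.]

/* Hypothetical event number -- choose an unused TRC_SCHED_* value. */
#define TRC_CSCHED_ACCT  0x00021011

/* Somewhere in the accounting path, for each vcpu being accounted,
 * record domain, vcpu, priority and current credit for xentrace: */
TRACE_4D(TRC_CSCHED_ACCT,
         svc->vcpu->domain->domain_id,
         svc->vcpu->vcpu_id,
         svc->pri,
         atomic_read(&svc->credit));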
NISHIGUCHI Naoki
2008-Dec-18 04:23 UTC
Re: [Xen-devel] [PATCH] Accurate vcpu weighting for credit scheduler
Hi, Atsushi

Atsushi SAKAI wrote:
> Please ask Emmanuel and George first,
> since I am not maintaining the scheduler.

Sorry.

> By the way, I would like to see the xentrace data for the original case.
> (Adding the vcpu priority and credit to the trace output would be helpful.)
> Your problem looks like vcpu priority mishandling somewhere.

Thanks for your advice. I'll try to use xentrace.

Regards,
Naoki Nishiguchi