Jiang, Yunhong
2010-Nov-05 07:06 UTC
[Xen-devel] The calculation of the credit in credit_scheduler
While reading the credit scheduler code and running experiments, I noticed something interesting about the current credit scheduler. Consider the following situation:

Hardware:
A powerful system with 64 CPUs.

Xen environment:
Dom0 with 8 vCPUs bound to CPUs (0, 16~24).
3 HVM domains, each with 2 vCPUs, all bound as vcpu0->pcpu1, vcpu1->pcpu2. Among them, 2 are CPU intensive while 1 is I/O intensive.

The result shows that the I/O-intensive domain occupies more than 100% CPU, while the two CPU-intensive domains each occupy 50%. IMHO it should be 66% for all three domains.

The reason is how the credit is calculated. Although the 3 HVM domains are pinned to 2 pCPUs and share those 2 CPUs, they each still get 2 * 300 credits at credit accounting time. That means the I/O-intensive HVM domain will never go under credit, so it preempts the CPU-intensive domains whenever it is boosted (i.e. after an I/O access to QEMU); it is set to TS_UNDER only at tick time, and then boosted again.

I'm not sure whether this is a meaningful usage model that needs fixing, but I think it is helpful to show it to the list.

I didn't try credit2, so I have no idea whether this also happens with credit2.

Thanks
--jyh
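To make the imbalance concrete, here is a small standalone C sketch. It is a toy model of the accounting just described, not Xen code: the 300-credits-per-vCPU-per-accounting-period figure and the observed CPU shares are taken from the numbers above, everything else is simplified. It compares the grant described above (rule A: each domain gets credit for both of its vCPUs regardless of pinning) with a pinning-aware split of the two pCPUs' credit (rule B).

    /* Toy model (not Xen source): three 2-vCPU domains pinned to 2 pCPUs.
     * Observed shares of the 2-pCPU capacity: ~25% per CPU-bound domain,
     * ~50% for the boosted I/O-bound domain. */
    #include <stdio.h>

    #define CREDITS_PER_ACCT 300
    #define NPCPUS           2
    #define NDOMS            3

    int main(void)
    {
        const double share[NDOMS] = { 0.25, 0.25, 0.50 };
        int credit_a[NDOMS] = { 0 }, credit_b[NDOMS] = { 0 };

        for (int period = 1; period <= 3; period++) {
            for (int d = 0; d < NDOMS; d++) {
                int burn = (int)(share[d] * NPCPUS * CREDITS_PER_ACCT);

                /* Rule A: grant 2 vCPUs * 300 credits, regardless of pinning. */
                credit_a[d] += 2 * CREDITS_PER_ACCT - burn;
                /* Rule B: split the 2 pinned pCPUs' credit evenly instead.   */
                credit_b[d] += NPCPUS * CREDITS_PER_ACCT / NDOMS - burn;
            }
            printf("period %d:  A = %+5d %+5d %+5d   B = %+5d %+5d %+5d\n",
                   period, credit_a[0], credit_a[1], credit_a[2],
                   credit_b[0], credit_b[1], credit_b[2]);
        }
        return 0;
    }

Under rule A the I/O-bound domain (third column) never goes negative, so it is never demoted to TS_OVER and keeps winning via BOOST; under rule B its credit goes negative after the first period and it would be throttled.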
Zhang, Xiantao
2010-Nov-05 07:26 UTC
[Xen-devel] RE: The calculation of the credit in credit_scheduler
Maybe idlers shouldn't produce credits at the accounting points. I did an experiment before; if idlers do not produce credit, the unfairness is reduced.

Apart from this issue, I also have some findings I want to share to get more input about the credit scheduler.

1. Interrupt delivery for assigned devices is done in a tasklet, and the tasklet runs in the idle vCPU's context, but the scheduler's behavior when scheduling the idle vCPU looks very strange. Ideally, when we switch to the idle vCPU to execute the tasklet, the previous vCPU should be switched back after the tasklet is done, but the current policy is to choose another vCPU from the runqueue. That is to say, when one interrupt happens on a CPU, the CPU may do a real task switch; this may not be acceptable when the interrupt frequency is high, and it also introduces some performance bugs according to our experiments. Even if we could switch the previous vCPU back after executing the tasklet, how to determine its timeslice for its next run is also a key issue, and this is not addressed. If we still give it 30ms for its restarted run, it may trigger some fairness issues, I think.

2. Another issue was found during our experiments, and it is a very interesting one (likely a bug). In the experiment we first pinned three guests (two CPU-intensive and one I/O-intensive) on two logical processors, each guest configured with two virtual CPUs; the CPU utilization share was ~90% for each CPU-intensive guest and ~20% for the I/O-intensive guest. But the magic thing happens after we introduce an additional idle guest which does no real workload and just idles: the CPU utilization shares change to ~50% for each CPU-intensive guest and ~100% for the I/O-intensive guest. After analyzing the scheduling data, we found the change comes from virtual timer interrupt delivery to the idle guest. Although the guest is idle, there are still 1000 timer interrupts per second for each vCPU. The current credit scheduler boosts the idle vCPU out of the blocked state and triggers 1000 schedule events per second on the target physical processor, and the I/O-intensive guest apparently benefits from the frequent schedule events and gets a larger CPU utilization share. The even more magic thing is that after 'xm pause' and 'xm unpause' of the idle guest, each of the three guests is allocated ~66% CPU share.

   This finding tells us some facts: (1) the current credit scheduler is not fair to I/O-intensive guests; (2) I/O-intensive guests are able to acquire a fair CPU share when competing with CPU-intensive guests; (3) the current timeslice (30ms) is meaningless, since the average timeslice is far smaller than 1ms under real workloads (this may bring performance issues); (4) the boost mechanism is too aggressive, and an idle guest shouldn't be boosted when it is woken from the halted state; (5) there is no policy in credit to determine how long a boosted vCPU may run, nor how to handle the preempted vCPU.

3. Credit is not really used for the key scheduling decisions. For example, when choosing a candidate task, credit is not really used to evaluate the tasks' priority, and this may be unfair to I/O-intensive guests. Additionally, a task's priority is not recalculated promptly; it is only updated every 30ms. So even if a task's credit is negative, its priority may still be TS_UNDER or TS_BOOST because of the delayed update; maybe the vCPU's priority should be updated after the credit change when the vCPU is scheduled out. In addition, when a boosted vCPU is scheduled out, its priority is always set back to TS_UNDER, and credit is not considered there either. If the credit has become negative, it might be better to set the priority to TS_OVER.

Any comments?

Xiantao
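A minimal sketch of the change suggested in point 3, i.e. demoting a descheduled vCPU according to the sign of its credit. The field and constant names are modelled on xen/common/sched_credit.c but simplified so the snippet stands alone; this illustrates the idea, it is not a patch.

    #include <stdatomic.h>

    /* Simplified stand-ins for the credit scheduler's priority classes and
     * per-vCPU state (modelled on sched_credit.c, not copied from it). */
    #define CSCHED_PRI_TS_BOOST   0
    #define CSCHED_PRI_TS_UNDER  -1
    #define CSCHED_PRI_TS_OVER   -2

    struct csched_vcpu {
        atomic_int credit;   /* signed credit balance */
        int        pri;      /* current priority class */
    };

    /* Suggested behaviour: when a (possibly boosted) vCPU is scheduled out,
     * choose its next priority from its credit instead of always resetting
     * it to TS_UNDER. */
    void csched_vcpu_descheduled(struct csched_vcpu *svc)
    {
        svc->pri = (atomic_load(&svc->credit) < 0) ? CSCHED_PRI_TS_OVER
                                                   : CSCHED_PRI_TS_UNDER;
    }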
Keir Fraser
2010-Nov-05 08:07 UTC
Re: [Xen-devel] RE: The calculation of the credit in credit_scheduler
On 05/11/2010 07:26, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote:

> 1. Interrupt delivery for assigned devices is done in a tasklet, and the
> tasklet runs in the idle vCPU's context, but the scheduler's behavior when
> scheduling the idle vCPU looks very strange. Ideally, when we switch to the
> idle vCPU to execute the tasklet, the previous vCPU should be switched back
> after the tasklet is done, but the current policy is to choose another vCPU
> from the runqueue. That is to say, when one interrupt happens on a CPU, the
> CPU may do a real task switch; this may not be acceptable when the interrupt
> frequency is high, and it also introduces some performance bugs according to
> our experiments. Even if we could switch the previous vCPU back after
> executing the tasklet, how to determine its timeslice for its next run is
> also a key issue, and this is not addressed. If we still give it 30ms for
> its restarted run, it may trigger some fairness issues, I think.

Interrupt delivery is a victim of us switching the tasklet implementation to work in idle-VCPU context instead of in softirq context. It might be sensible to make use of softirqs directly from the interrupt-delivery logic, or introduce a second type of tasklet (built on softirqs), or perhaps we can think of a way to structure interrupt delivery that doesn't need softirq context at all -- that would be nice! What did we need softirq context for in the first place?

 -- Keir
Zhang, Xiantao
2010-Nov-05 09:33 UTC
RE: [Xen-devel] RE: The calculation of the credit in credit_scheduler
Keir Fraser wrote:

> It might be sensible to make use of softirqs directly from the
> interrupt-delivery logic, or introduce a second type of tasklet (built on
> softirqs), or perhaps we can think of a way to structure interrupt delivery
> that doesn't need softirq context at all -- that would be nice! What did we
> need softirq context for in the first place?

A dedicated softirq may be a good choice. In that case it needs a per-CPU list structure to record the pending interrupts' destinations (which domains). In the softirq callback, it can walk the list and deliver the interrupts to the guests one by one.

Xiantao
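A rough sketch of the per-CPU pending list described above, drained from a dedicated softirq. DPCI_IRQ_SOFTIRQ (an assumed new entry in the softirq enum), struct pending_dpci, and hvm_dirq_deliver() are hypothetical names used only for illustration; the open_softirq()/raise_softirq() and per-CPU idioms follow Xen's, but this is a sketch, not an implementation, and the CPU-hotplug handling discussed below is omitted.

    /* Sketch only: deliver passthrough interrupts to guests from a dedicated
     * softirq instead of a tasklet running in the idle vCPU's context. */
    #include <xen/list.h>
    #include <xen/percpu.h>
    #include <xen/softirq.h>
    #include <xen/spinlock.h>

    /* Assumed for this sketch: DPCI_IRQ_SOFTIRQ added to the softirq enum,
     * and a delivery helper that injects one interrupt into one guest. */
    void hvm_dirq_deliver(struct domain *d, unsigned int guest_irq);

    struct pending_dpci {
        struct list_head list;
        struct domain   *d;          /* destination guest */
        unsigned int     guest_irq;  /* guest-visible interrupt to inject */
    };

    static DEFINE_PER_CPU(struct list_head, dpci_pending);
    static DEFINE_PER_CPU(spinlock_t, dpci_pending_lock);

    /* Called from the physical interrupt handler: queue the work for this
     * CPU and make sure the softirq runs before returning to guest context. */
    void dpci_queue(struct pending_dpci *p)
    {
        unsigned long flags;

        spin_lock_irqsave(&this_cpu(dpci_pending_lock), flags);
        list_add_tail(&p->list, &this_cpu(dpci_pending));
        spin_unlock_irqrestore(&this_cpu(dpci_pending_lock), flags);

        raise_softirq(DPCI_IRQ_SOFTIRQ);
    }

    /* Softirq handler: drain this CPU's list and deliver each interrupt. */
    static void dpci_softirq(void)
    {
        struct pending_dpci *p;
        unsigned long flags;

        for ( ; ; )
        {
            spin_lock_irqsave(&this_cpu(dpci_pending_lock), flags);
            if ( list_empty(&this_cpu(dpci_pending)) )
            {
                spin_unlock_irqrestore(&this_cpu(dpci_pending_lock), flags);
                break;
            }
            p = list_first_entry(&this_cpu(dpci_pending),
                                 struct pending_dpci, list);
            list_del(&p->list);
            spin_unlock_irqrestore(&this_cpu(dpci_pending_lock), flags);

            hvm_dirq_deliver(p->d, p->guest_irq);  /* hypothetical call */
        }
    }

    void dpci_softirq_init(void)
    {
        unsigned int cpu;

        for_each_online_cpu ( cpu )
        {
            INIT_LIST_HEAD(&per_cpu(dpci_pending, cpu));
            spin_lock_init(&per_cpu(dpci_pending_lock, cpu));
        }
        open_softirq(DPCI_IRQ_SOFTIRQ, dpci_softirq);
    }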
Keir Fraser
2010-Nov-05 09:45 UTC
Re: [Xen-devel] RE: The calculation of the credit in credit_scheduler
On 05/11/2010 09:33, "Zhang, Xiantao" <xiantao.zhang@intel.com> wrote:

> A dedicated softirq may be a good choice. In that case it needs a per-CPU
> list structure to record the pending interrupts' destinations (which
> domains). In the softirq callback, it can walk the list and deliver the
> interrupts to the guests one by one.

Feel free to code it up. Don't forget the CPU hotplug callback, so that pending interrupt work can be done immediately, or farmed off to another CPU, when a CPU is taken offline.

I'd be perfectly happy to implement softirq tasklets as an alternative; I could do that bit of the implementation. Which do you think would result in neater code in the interrupt-delivery subsystem?

 -- Keir
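A sketch of the hotplug callback Keir mentions, building on the hypothetical per-CPU list from the previous sketch: when a CPU goes offline, splice its pending entries onto another online CPU and kick the softirq there so no queued interrupt is lost. The notifier plumbing (register_cpu_notifier(), CPU_DEAD) and the cpumask helpers are shown only for shape and should be checked against the tree.

    /* Sketch only: companion to the per-CPU dpci_pending list above. */
    #include <xen/cpu.h>
    #include <xen/cpumask.h>
    #include <xen/notifier.h>
    #include <xen/softirq.h>

    static int dpci_cpu_callback(struct notifier_block *nfb,
                                 unsigned long action, void *hcpu)
    {
        unsigned int dead = (unsigned long)hcpu;
        unsigned int target = cpumask_first(&cpu_online_map);
        unsigned long flags;

        if ( action == CPU_DEAD )
        {
            /* The dead CPU is no longer running, so only the target's lock
             * needs to be held while splicing its list across. */
            spin_lock_irqsave(&per_cpu(dpci_pending_lock, target), flags);
            list_splice_init(&per_cpu(dpci_pending, dead),
                             &per_cpu(dpci_pending, target));
            spin_unlock_irqrestore(&per_cpu(dpci_pending_lock, target), flags);

            cpu_raise_softirq(target, DPCI_IRQ_SOFTIRQ);
        }
        return NOTIFY_DONE;
    }

    static struct notifier_block dpci_cpu_nfb = {
        .notifier_call = dpci_cpu_callback
    };

    void dpci_hotplug_init(void)
    {
        register_cpu_notifier(&dpci_cpu_nfb);
    }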
George Dunlap
2010-Nov-09 14:16 UTC
[Xen-devel] Re: The calculation of the credit in credit_scheduler
Xiantao,

Thanks for your comments. All of the things you pointed out are things I'm trying to address in credit2. In fact, a huge number of them can be attributed to the fact that credit1 divides tasks into 3 priorities (OVER, UNDER, and BOOST) and schedules tasks round-robin within each priority. Round-robin is known to discriminate against tasks which yield (such as tasks that do frequent I/O) in favor of tasks that don't yield (such as CPU "burners").

In credit2, I hope to address these issues in a couple of ways:
* Always sort the runqueue by order of credit. This addresses issues in all of 1, 2, and 3.
* When a VM wakes up, update the credit of all the running VMs to see if any of them should be preempted (addressing #3).
* When selecting how long to run, I have a mechanism to look at the next VM in the runqueue and calculate how long it would take for the current VM's credit to equal the next VM's credit. I.e., if the one chosen to run has 10ms of credit, and the next one on the runqueue has 7ms of credit, set the schedule time to 3ms. This is limited by a "minimum schedule time" (currently 500us) and a "maximum schedule time" (currently 10ms). This could probably use some more tweaking, but it seems to work pretty well.

It's not clear to me how to address a lot of the issues you bring up without doing a big redesign -- which is what I'm already working on.

If you're interested in helping test / develop credit2, let me know, I'd love some help. :-)

 -George
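For illustration, a small sketch of the runtime-selection rule described above. Credit is expressed directly in microseconds and the names are made up for the example; this is not credit2's actual code.

    /* Run the chosen vCPU until its credit would drop to the level of the
     * next vCPU in the credit-sorted runqueue, clamped to [500us, 10ms]. */
    #include <stdint.h>

    #define MIN_TIMESLICE_US    500      /* "minimum schedule time" */
    #define MAX_TIMESLICE_US  10000      /* "maximum schedule time" */

    int64_t credit2_runtime_us(int64_t cur_credit_us, int64_t next_credit_us)
    {
        int64_t slice = cur_credit_us - next_credit_us;

        if ( slice < MIN_TIMESLICE_US )
            slice = MIN_TIMESLICE_US;
        else if ( slice > MAX_TIMESLICE_US )
            slice = MAX_TIMESLICE_US;

        return slice;
    }

For the 10ms/7ms example above, credit2_runtime_us(10000, 7000) returns 3000 (i.e. 3ms).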
George Dunlap
2010-Nov-09 14:26 UTC
[Xen-devel] Re: The calculation of the credit in credit_scheduler
On 05/11/10 07:06, Jiang, Yunhong wrote:

> The reason is how the credit is calculated. Although the 3 HVM domains are
> pinned to 2 pCPUs and share those 2 CPUs, they each still get 2 * 300
> credits at credit accounting time. That means the I/O-intensive HVM domain
> will never go under credit, so it preempts the CPU-intensive domains
> whenever it is boosted (i.e. after an I/O access to QEMU); it is set to
> TS_UNDER only at tick time, and then boosted again.

I suspect that the real reason you're having trouble is that pinning and the credit mechanism don't work very well together. Instead of pinning, have you tried using the cpupools interface to make a 2-CPU pool to put the VMs into? That should allow the credit to be divided appropriately.

> I didn't try credit2, so I have no idea whether this also happens with
> credit2.

Credit2 may do better at dividing. However, it doesn't implement pinning (it just ignores it), so you couldn't do your test unless you used cpupools or limited Xen to 2 CPUs via cpus=2 on its command line. Also, credit2 isn't yet designed to handle 64 CPUs, so it may not work very well on a system with 64 cores.

 -George
Jiang, Yunhong
2010-Nov-10 02:39 UTC
[Xen-devel] RE: The calculation of the credit in credit_scheduler
George Dunlap wrote:

> I suspect that the real reason you're having trouble is that pinning and
> the credit mechanism don't work very well together. Instead of pinning,
> have you tried using the cpupools interface to make a 2-CPU pool to put
> the VMs into? That should allow the credit to be divided appropriately.

I had a quick look at the code, and it seems a cpupool should not help in this situation. The cpupool only restricts the CPUs a domain can be scheduled on; it does not affect the credit calculation. I will do the experiment later.

Thanks
--jyh
Juergen Gross
2010-Nov-10 05:46 UTC
Re: [Xen-devel] RE: The calculation of the credit in credit_scheduler
On 11/10/10 03:39, Jiang, Yunhong wrote:

> I had a quick look at the code, and it seems a cpupool should not help in
> this situation. The cpupool only restricts the CPUs a domain can be
> scheduled on; it does not affect the credit calculation.

With cpupools you avoid the pinning. This will result in better credit calculation results.

Juergen
Jiang, Yunhong
2010-Nov-10 05:55 UTC
RE: [Xen-devel] RE: The calculation of the credit in credit_scheduler
Juergen Gross wrote:

> With cpupools you avoid the pinning. This will result in better credit
> calculation results.

My system is busy with testing, so I can't do the experiment right now, but I'm not sure the cpupool will help the credit calculation.

From the code in csched_acct() in common/sched_credit.c, credit_fair is calculated as follows, and credit_fair's starting value (credit_total) is calculated by summing all pCPUs' credit, without considering the cpupool:

    credit_fair = ( ( credit_total
                      * sdom->weight
                      * sdom->active_vcpu_count )
                    + (weight_total - 1)
                  ) / weight_total;

Or did I miss anything?

Thanks
--jyh
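For illustration, a rough worked example of this formula. The assumptions are mine: equal weights of 256, credit_total proportional to the number of pCPUs being accounted over at 300 credits per pCPU per period (the figure quoted earlier in the thread), and only the three test domains active; the real csched_acct() has additional caps and Dom0 in the mix.

    /* Rough worked example of credit_fair under the stated assumptions. */
    #include <stdio.h>

    static long credit_fair(long credit_total, long weight, long active_vcpus,
                            long weight_total)
    {
        return ((credit_total * weight * active_vcpus) + (weight_total - 1))
               / weight_total;
    }

    int main(void)
    {
        long weight = 256, vcpus_per_dom = 2, ndoms = 3;
        long weight_total = weight * vcpus_per_dom * ndoms;

        /* Whole 64-pCPU host accounted together vs. a 2-pCPU cpupool.
         * Prints 6400 credits/domain vs. 200 credits/domain per period. */
        printf("64-pCPU pool: %ld credits/domain\n",
               credit_fair(64 * 300, weight, vcpus_per_dom, weight_total));
        printf(" 2-pCPU pool: %ld credits/domain\n",
               credit_fair( 2 * 300, weight, vcpus_per_dom, weight_total));
        return 0;
    }

With the whole host in one pool, each pinned domain is granted far more credit per period than the two shared pCPUs can ever burn, so nobody goes under; with a 2-pCPU pool the grant shrinks to what those two pCPUs can actually provide.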
Juergen Gross
2010-Nov-10 06:03 UTC
Re: [Xen-devel] RE: The calculation of the credit in credit_scheduler
On 11/10/10 06:55, Jiang, Yunhong wrote:

> From the code in csched_acct() in common/sched_credit.c, credit_fair is
> calculated as follows, and credit_fair's starting value (credit_total) is
> calculated by summing all pCPUs' credit, without considering the cpupool:
>
>     credit_fair = ( ( credit_total
>                       * sdom->weight
>                       * sdom->active_vcpu_count )
>                     + (weight_total - 1)
>                   ) / weight_total;
>
> Or did I miss anything?

The scheduler sees only the pCPUs and domains in the pool, as it is cpupool-specific. BTW: the credit scheduler's problem with CPU pinning was the main reason for introducing cpupools.

Juergen
Jiang, Yunhong
2010-Nov-10 10:53 UTC
RE: [Xen-devel] RE: The calculation of the credit in credit_scheduler
Yes, this works. Thanks very much!

--jyh
Zhang, Xiantao
2010-Nov-15 04:29 UTC
[Xen-devel] RE: The calculation of the credit in credit_scheduler
George Dunlap wrote:

> It's not clear to me how to address a lot of the issues you bring up
> without doing a big redesign -- which is what I'm already working on.
>
> If you're interested in helping test / develop credit2, let me know, I'd
> love some help. :-)

Hi George,

Sorry for the late reply! I'm glad to see you are addressing these issues in credit2. I also think the scheduler is a very performance-critical component of Xen; it affects the whole system's scalability and performance, especially on large systems. We are also working on solving these issues, and we would be glad to work with you and the community to make the scheduler better and more friendly to large systems.

Xiantao