Following up on the strange behavior I've seen, here is a summary and then my most recent experiment. I've been having a discussion with Emmanuel and he has urged me to post to the list. Up to now we have talked about what could be going wrong, whether there was blocking or not, etc.

Basically, when I run 2-4 2-vcpu domUs, one of them gets "stalled" at 0.5-0.6 CPU % after a little while. At first they all balance nicely (as shown by xm top), then something shifts. I have even had TWO domUs get "stalled" at 0.5-0.6% while the third or fourth happily run on, dividing up the leftover CPU. When I run xm sched-credit -C 30, which limits each domain to 30% of a CPU, the lock-out phenomenon STILL occurs! So even though all (three) of the domUs can easily "fit" (I have a 2-way ia64 box), they are still somehow getting disorganized. (Note I can easily run the benchmark like this by itself on one domU or on dom0.)

Yesterday I was able to do some more experiments. Using xm sched-credit I capped each domU to only 30% of a real processor (thus two vcpus consigned to only 30% of a real processor), and then I used vcpu-pin to pin each vcpu to a real processor. I did this with 3 domUs for my experiment, pinning each vcpu to a different "real" CPU. When I ran this configuration I STILL got a "stalled" domU.

Looking at vcpu-list, I found that the "stalled" domU had been blocked in a few samples but not all. The strange thing I saw was that even after it had been "unblocked", its time on the processor was not updated. So a vcpu appeared to be blocked, was then unblocked for a significant amount of time, yet did not appear to get any more "time." I am working on some other things to figure this out.
Sometimes it runs beautifully; other times we get a "stall". (Note that a "stall" is not the death of the domU; it's a situation where that domU is getting around 0.5-0.6 CPU % and the software running on it is "zombie-fied". Once you ctrl-C on that domU *after the others are done*, you can "wake" it up and it runs.)

Any input/ideas? What could be happening? Thanks!

Here's my configuration:
- 4-way-capable 2-way ia64 Madison 1.2GHz processor server running RHEL-4AS
- Xen-unstable snapshot from Oct 29
- 24GB of memory; each guest has 2GB of memory assigned
- Each guest has its own fibre channel 140GB standalone disk

As far as my application: it's SDET, an ancient SPEC system benchmark. Thanks!

Donna "searching" Ott

_______________________________________________
Xen-users mailing list
Xen-users@lists.xensource.com
http://lists.xensource.com/xen-users
On Sat, Nov 18, 2006 at 01:42:15PM -0500, Ott, Donna E wrote:
> Any input/ideas? What could be happening?

Hi Donna. Again, you need to find out if the VCPUs are blocked in the kernel or runnable but not being scheduled. The easiest way to do this is to run 2 spinner processes in the guest after it "stalls". That will tell you if it's the application that has stalled or if it's the guest OS that's runnable but not getting any CPU time.

Running 3 competing 2-vcpu guests on a 2-CPU host may cause some interesting problems, because while the OS is written to assume that its physical CPUs all exist at the same time, the same is not necessarily true in a virtual environment. Your guest OS or benchmark could be timing out due to timeouts on spinlocks or something like that.

The way to make progress on this is:
1. Verify that if your vcpus are runnable they run: do this by running spinners on top of your benchmark, or once the benchmark stalls.
2. Verify that the problem goes away with single-CPU guests.
3. Collect scheduler traces on all CPUs.

In general, the best way to deal with SMP guests which have fewer CPU resources than their number of VCPUs is to "fold" the guest down using the CPU hotplug mechanism. There are other alternatives as well that we can look at. Before we do so, let's try to reduce this problem a bit so we can verify whether this is or isn't a virtual SMP issue.

Emmanuel.
> You need to find out if the VCPUs are blocked in the kernel
> or runnable but not being scheduled.
>
> The easiest way to do this is to run 2 spinner processes in
> the guest after it "stalls".

Well, I did find that if I wait a bit and then hit ctrl-C and/or type into the "stalled" domain, it will start up again but will never get much CPU time relative to what it "had". It will then complete the benchmark, but with errors, obviously.

> That will tell you if it's the application that has stalled
> or if it's the guest OS that's runnable but not getting any CPU time.

Could you explain the details?

> Running 3 competing 2vcpu guests on a 2cpu host may cause
> some interesting problems because while the OS is written to
> assume that its physical CPUs all exist at the same time, the
> same is not necessarly true in a virtual environment.
> Your guest OS or benchmark could be timing out due to time
> outs on spinlocks or something like that.

I have now run them as 1-cpu guests as well. (Once again, I think it unlikely that my benchmark is timing out, etc.; it's well known, well used, even by me, and has NEVER behaved this way on other OSes or virtualization software. That said, anything is possible in software/hardware land!)

> The way to make progress on this is:
> 1- verify that if your vcpus are runnable they run: do this
> by running spinners on top of ur benchmark or once the
> benchmark stalls.

Not sure what you mean by this. Once the benchmark stalls, it is still there, and typing into the domain will make it start to run again, sort of right where it had "paused".

> 2- verify that the problem goes away with single CPU guests.

It does NOT go away with single-cpu guests. Shockingly, it can even occur with a single guest and a large load: say, an "xm create newguest" will stall out the "single guest".

It is particularly easy to see on the first run with the three guests, or even two.
Just create them, set up the benchmark, run it in each guest (by hand), and in moments a "stall" will occur. After the first run, it is harder to make it happen, but the first time is fairly repeatable. Though it does seem to be less prevalent with single guests, it can STILL happen.

> 3- collect scheduler traces on all CPUs.

OK, please explain how to do this. I am running out of time to debug this. I may soon have to leave this as it is and just go with the results I have (sadly, as I am so impressed with it when it runs well).

> In general, the best way to deal with SMP guests which have
> less CPU resources than their number of VCPUs is to "fold"
> the guest down using the CPU hotplug mechanism. There are
> other alternatives as well that we can look at. Before we do
> so, let's try to reduce this problem a bit so we can verify
> if this is or isn't a virtual SMP issue.

Sounds great to me. Hope this latest data is helpful. I wish I had more time!

Cheers,
Donna "thankful for what I found that worked well" Ott
It's a bit frustrating that we're not making progress isolating the problem here. We still don't have any concrete evidence showing that the benchmark user processes or domain VCPUs are or aren't runnable when you notice the "stall". It's also not clear what the simplest scenario is under which the problem can be reproduced. I tried reading your explanation about what happens with UP guests, but I can't understand it. Can you clarify exactly what you are doing and in what order?

I threw around some ideas to get more data points and help debug this:
- run spinners on the guest that consume CPU:
  "int main() { while (1); return 0; }" and run X copies (for X vcpus)
- take scheduler traces (man xentrace)

You could also monitor the benchmark from inside the guest, using a variety of means, to check whether its processes are blocked for any reason and why. You need to isolate the problem further. We just don't have anything to go on right now to even say whether this is a Xen problem or not, much less a scheduler or other type of issue.

Emmanuel.

On Wed, Nov 22, 2006 at 01:29:04PM -0500, Ott, Donna E wrote:
> > You need to find out if the VCPUs are blocked in the kernel
> > or runnable but not being scheduled.
> >
> > The easiest way to do this is to run 2 spinner processes in
> > the guest after it "stalls".
>
> Well, I did find that if I wait a bit and then hit ctrl-C and/or type
> into the "stalled" domain, it will start up again but will never get
> much CPU time relative to what it "had". It will then complete the
> benchmark, but with errors, obviously.
>
> > That will tell you if it's the application that has stalled
> > or if it's the guest OS that's runnable but not getting any CPU time.
>
> Could you explain the details?
On 23 Nov 2006 at 11:33, Emmanuel Ackaouy wrote:
> It's a bit frustrating that we're not making progess isolating
> the problem here.
>
> We still don't have any concrete evidence showing that the
> benchmark user processes or domain VCPU are or aren't runnable
> when you notice the "stall".

Stupid idea: just write a program that outputs the time of day at high resolution to some file. Then run at least one such program in each domain (preferably with synced clocks), and inspect the output for discontinuities.

> It's also not clear what is the simplest scenario under which
> the problem can be reproduced. I tried reading your explanation
> about what happens with UP guests but I can't understand it.
> Can you clarify exactly what you are doing and in what order?

If it's not CPU-bound (the example uses little I/O), you could try something like copying as much data as 90% of your RAM from one file to another. Logically, if a process in Dom0 is blocked on I/O, the scheduler can do little to help DomUs waiting for completion of that I/O. I suspect...

> I threw around some ideas to get more data points and help
> debug this:
> - run spinners on the guest that consume CPU
> "int main() { while (1); return 0; }" and run X copies (for X vcpus)
> - take scheduler traces (man xentrace)

A bad example, because it will either be optimized away or will execute entirely in the CPU's L1 cache, thus causing about no load at all (assuming an SMP system). What about "find /usr -type f | xargs cat | gzip >/dev/null" instead? If you have plenty of room, output to some real file. That's some load.

Regards,
Ulrich
Emmanuel Ackaouy
2006-Nov-24 09:06 UTC
Re: [Xen-users] Re: Xen Scheduler: Credit Scheduler ?
On Fri, Nov 24, 2006 at 08:23:54AM +0100, Ulrich Windl wrote:
> > I threw around some ideas to get more data points and help
> > debug this:
> > - run spinners on the guest that consume CPU
> > "int main() { while (1); return 0; }" and run X copies (for X vcpus)
> > - take scheduler traces (man xentrace)
>
> A bad example, because that will either be optimized that it will
> execute in CPU's L1 cache, thus causing about no load at all
> (assuming a SMP system).
>
> What about "find /usr -type f | xargs cat | gzip >/dev/null" instead.
> If you have plenty of room output to some real file. That's some load.

Why would you want to generate cache misses or file system I/O in this case? We just want to keep a domain artificially runnable to verify that Xen schedules its VCPUs appropriately. An infinite loop does just that.