Hi All,

I am using Xen 3.3.2 for some of my experiments, and have been consistently observing sub-optimal results when latency-sensitive, I/O-intensive VMs and compute-bound VMs run together on a physical machine. We see latency issues even in cases with enough CPU resources for both VMs to co-exist well, and even if we give a much higher weight to the latency-sensitive VM. I suspect it is due to the way the Xen credit scheduler works. In this context, I have some questions regarding the scheduler:

1) In the sched_acct function, the credit cap is set to 300, enough to survive one time slice. But if some VCPU crosses that cap, its credit is set to 0, and it is marked inactive. Why is there no concept of a ceiling (like the floor for VCPUs going over the credit line), i.e. why is the credit not left at 300? Is there some fundamental reason for setting it to 0? I believe this results in many cases where our latency-sensitive VCPUs have to wait for perhaps a full time slice when they could run immediately. This might happen if they run with BOOST priority and get interrupted by a timer tick, which takes that priority away.

2) Why is the runq sorted by just priority (which is very coarse grained: BOOST, UNDER and OVER), and not by credit? This can result in VCPUs with higher credit getting starved for CPU if we have batch and latency-sensitive VCPUs in the system.

3) Is there some patch that makes the current credit scheduler fairer to latency-sensitive VCPUs? I see that the sched_credit2 scheduler addresses these issues, but right now it has just one global runq and no load-balancing features.

Any advice/inputs here will be extremely valuable!

Thanks in advance,
-Gaurav
George Dunlap
2010-Jul-09 12:13 UTC
Re: [Xen-devel] Questions regarding Xen Credit Scheduler
Gaurav,

I've identified a lot of the problems you mention here (you may want to see my paper and talk from XenSummit Asia 2009 [1]), but I haven't done anything to address them in credit1 because I thought it really just needed to be scrapped and started over. However, that process is taking a lot longer than I'd hoped, so I think it may make sense to do some work to patch up the current scheduler to keep it running until the new one can replace it. Are you willing to help out with some investigation / testing in this process?

Regarding specific things:

One thing you didn't catch is that credits before 4.0 are debited probabilistically, a full 10ms at a time, by the very timer tick that moves a vcpu from "inactive" to "active"; so when you make the switch from "active" to "inactive", you don't start out at 0, but at -10ms. It turns out that's not only bad for latency-sensitive processes, but it's also a security bug; so there's a patch in 4.0 (not sure whether it's been backported to 3.4) to do accurate accounting based on RDTSC reads instead of probabilistic accounting based on timer ticks.

#1: Setting the credits to 0 is part of the "reset condition" I mention in my paper. The basic idea is that accumulated credit needs to be discarded somehow. I have a patch that, instead of setting it to 0, will divide it by 2. This should balance between discarding credits and not starting too far "behind".

#2: AFAICT, the reason for choosing to sort by priority was that it allowed a simple O(n) sorting algorithm. However, the effect is that within a given priority, scheduling is round-robin. Round-robin scheduling is known to discriminate against processes that voluntarily block in favor of those that use up their entire timeslice. Diego et al [2] did some experiments with sorting by credit and found that it helped latency-sensitive workloads.

So the answer to #3 is:
* The "accurate credit" patch is in 4.0, maybe 3.4. That should help somewhat.
* I have a patch that will change the "reset condition"; I'm considering submitting it. I'd appreciate testing / feedback. (I'll send this in a separate e-mail.)
* There is no patch yet that will fix the sort-by-priority, but it should be simple and straightforward to implement. I'll support putting it in once I'm reasonably convinced that it helps and doesn't hurt too much. If you were to help out with the implementation and testing, that will happen a lot faster. :-)

Peace,
-George

Refs:
[1] http://www.xen.org/xensummit/xensummit_fall_2009.html -- search for my name under "Topics"
[2] Diego Ongaro, Alan L. Cox, Scott Rixner. "Scheduling I/O in virtual machine monitors", Proceedings of the fourth ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, March 05-07, 2008, Seattle, WA, USA.
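(For illustration only, a minimal pseudo-code sketch of the two "reset condition" policies being discussed -- the constants and helpers below are placeholders, not the actual csched_acct() code in sched_credit.c:)

    /* credit1 reset condition, roughly as described above: */
    if ( credit > CSCHED_CREDITS_PER_TSLICE )   /* the "cap" of 300 */
    {
        credit = 0;                  /* throw away everything earned */
        mark_inactive(vcpu);         /* vcpu stops earning credit */
    }

    /* proposed change: discard only half of the accumulated credit,
     * so a latency-sensitive vcpu does not restart so far "behind": */
    if ( credit > CSCHED_CREDITS_PER_TSLICE )
        credit /= 2;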
Gaurav Dhiman
2010-Jul-10 09:21 UTC
Re: [Xen-devel] Questions regarding Xen Credit Scheduler
Hi George,

Thanks for your reply. I am in the process of fixing some of these issues. This is what I have in mind:

1. __runq_insert: Insert according to credit as well as priority. The current code just looks at the priority, which is very coarse.
2. __runq_tickle: Tickle the CPU even if the new VCPU has the same priority but a higher amount of credit left. The current code just looks at the priority.
3. csched_runq_sort: Sort according to credit.
4. csched_acct: If the credit of a VCPU crosses 300, then set it to 300, not 0. I am still not sure why the VCPU is being marked as inactive. Can't I just update the credit and let it stay active?
5. csched_schedule: Always call csched_load_balance. In the csched_load_balance and csched_runq_steal functions, change the logic to grab a VCPU with higher credit. The current code just works on priority.

Do you think these ideas make sense? Am I missing something?

> Regarding specific things:
>
> One thing you didn't catch is that credits before 4.0 are debited
> probabilistically, a full 10ms at a time, by the very timer tick that
> moves a vcpu from "inactive" to "active"; so when you make the switch
> from "active" to "inactive", you don't start out at 0, but at -10ms.

Yes, I noticed this; point 4 above tries to address this. As I mentioned above, I am not sure why it is being marked inactive in the first place.

> It turns out that's not only bad for latency-sensitive processes, but
> it's also a security bug; so there's a patch in 4.0 (not sure whether
> it's been backported to 3.4) to do accurate accounting based on RDTSC
> reads instead of probabilistic accounting based on timer ticks.

Yes, I have seen the Xen 4.0 code; it does deterministic accounting by recording the amount of time a VCPU spends on the CPU.

> So the answer to #3 is:
> * The "accurate credit" patch is in 4.0, maybe 3.4. That should help somewhat.
> * I have a patch that will change the "reset condition"; I'm
> considering submitting it. I'd appreciate testing / feedback. (I'll
> send this in a separate e-mail.)

Please do send this.

> * There is no patch yet that will fix the sort-by-priority, but it
> should be simple and straightforward to implement. I'll support
> putting it in once I'm reasonably convinced that it helps and doesn't
> hurt too much. If you were to help out with the implementation and
> testing, that will happen a lot faster. :-)

I am trying to implement the ideas I mentioned above. Your feedback would be very helpful.

Thanks,
-Gaurav
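(For illustration, a rough sketch of what the credit-aware insertion proposed in point 1 above might look like -- hypothetical code, not the existing sched_credit.c implementation; type and field names only approximate it:)

    /* Keep the runq ordered by priority first and, within the same
     * priority, by remaining credit, instead of plain round-robin
     * within a priority. */
    static void runq_insert_by_credit(struct list_head *runq,
                                      struct csched_vcpu *svc)
    {
        struct list_head *iter;

        list_for_each( iter, runq )
        {
            const struct csched_vcpu *cur =
                list_entry(iter, struct csched_vcpu, runq_elem);

            if ( svc->pri > cur->pri ||
                 (svc->pri == cur->pri &&
                  atomic_read(&svc->credit) > atomic_read(&cur->credit)) )
                break;
        }

        /* list_add_tail() inserts svc just before iter, i.e. behind
         * every vcpu that should still run ahead of it. */
        list_add_tail(&svc->runq_elem, iter);
    }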
George Dunlap
2010-Jul-12 11:05 UTC
Re: [Xen-devel] Questions regarding Xen Credit Scheduler
On Sat, Jul 10, 2010 at 10:21 AM, Gaurav Dhiman <dimanuec@gmail.com> wrote:
> 1. __runq_insert: Insert according to credit as well as priority.
> Current code just looks at the priority, which is very coarse.
[snip]
> 3. csched_runq_sort: Sort according to credit.

I think these (both related to a sorted runqueue) are probably good ideas. The main thing to pay attention to is overhead: run some tests with long runqueues, and see if there's any performance degradation.

> 2. __runq_tickle: Tickle the CPU even if the new VCPU has same
> priority but higher amount of credits left. Current code just looks at
> the priority.
[snip]
> 5. csched_schedule: Always call csched_load_balance. In the
> csched_load_balance and csched_runq_steal functions, change the logic
> to grab a VCPU with higher credit. Current code just works on
> priority.

I'm much more wary of these ideas. The problem here is that doing runqueue tickling and load balancing isn't free -- IPIs can be expensive, especially if your VMs are running with hardware virtualization. In fact, with the current scheduler, you get a sort of n^2 effect, where the time the system spends doing IPIs due to load balancing grows with the square of the number of schedulable entities. In addition, frequent migration will reduce cache effectiveness and increase congestion on the memory bus.

I presume you want to do this to decrease the latency? Lee et al [1] actually found that *decreasing* the cpu migrations of their soft real-time workload led to an overall improvement in quality. The paper doesn't delve deeply into why, but it seems reasonable to conclude that although the vcpus may have been able to start their task sooner (and even that's not guaranteed -- it may have taken longer to migrate than to get to the front of the runqueue), they ended their task later, presumably due to cpu stalls on cacheline misses and so on.

I think a much better approach would be:
* To have long-term effective placement, if possible: i.e., distribute latency-sensitive vcpus
* If two latency-sensitive vcpus are sharing a cpu, do shorter time-slices.

But I think those need more research; it would be better to put that effort into the new scheduler.

> 4. csched_acct: If credit of a VCPU crosses 300, then set it to 300,
> not 0. I am still not sure why the VCPU is being marked as inactive?
> Can't I just update the credit and let it be active?

Did you read the whitepaper that I linked to, and/or watch my presentation? It has a lot of information about the logic behind the algorithm: specifically, the tendency of this "credit-like" class of algorithms toward credit divergence. Please read it and let me know if you have any questions.

The active / inactive distinction has to do with who gets credits. If you just divided credits equally among everyone, then eventually VMs that weren't using credits would gain a lot (or be capped out, as you're suggesting). Because we allow VMs to use "extra" cpu time if it's not being used, those VMs will by definition burn more credits than they earn, and will tend to go off to negative.

So what credit1 does is assume that all workloads fall into two categories: "active" VMs, which consume as much cpu as they can, and "inactive" (or "I/O-bound") VMs, which use almost no cpu. "Inactive" VMs essentially run at BOOST priority, and run whenever they want to. Then the credit for each timeslice is divided among the "active" VMs. This way the ones that are consuming cpu don't get too far behind.

The problem, of course, is that most server workloads fall in the middle: they spend a significant time processing, but also a significant time waiting for more network packets.

I looked at the idea of "capping" credit, as you say; but the steady state when I worked out the algorithms by hand was that all the VMs were at their cap all the time, which screwed up other aspects of the algorithm. Credits need to be thrown away; my proposal was to divide the credits by 2, rather than setting them to 0. This should be a good mid-way.

These things are actually really subtle. I've spent hours and hours with pencil and paper, working out different algorithms by hand, to see exactly what effect the different changes would have. I even wrote a discrete event simulator, to make the process a bit faster. (But of course, to understand why things look the way they do, you still have to trace through the algorithm manually.) If you're really keen, I can tar it up and send it to you. :-)

So in summary:
* Please do post a sort-runq-by-credit patch, preferably along with some benchmarks showing a lack of performance impact.
* Don't think increased load balancing is the right approach. Won't scale, and probably won't even make throughput faster. I wouldn't approve these without significant large-scale testing.
* Think the "reset condition" could use revising. My sense is that leaving credit at the cap isn't the best approach. If you can convince me that it works OK (including test results as well as posting graphs of consumed credit, &c), I'll consider it; but I think dividing in half will be better. I'll post a patch later today.
* Let me know if you want my hacked-up scheduler simulator to play with. :-)

Thanks again for your help,
-George

[1] Min Lee, A. S. Krishnakumar, P. Krishnan, Navjot Singh, Shalini Yajnik. "Supporting Soft Real-Time Tasks in the Xen Hypervisor", VEE 2010, Pittsburgh, PA, March 17-19, 2010.
Gaurav Dhiman
2010-Jul-16 00:41 UTC
Re: [Xen-devel] Questions regarding Xen Credit Scheduler
On Mon, Jul 12, 2010 at 4:05 AM, George Dunlap <George.Dunlap@eu.citrix.com> wrote:
>> 2. __runq_tickle: Tickle the CPU even if the new VCPU has same
>> priority but higher amount of credits left. Current code just looks at
>> the priority.
> [snip]
>> 5. csched_schedule: Always call csched_load_balance. In the
>> csched_load_balance and csched_runq_steal functions, change the logic
>> to grab a VCPU with higher credit. Current code just works on
>> priority.
>
> I'm much more wary of these ideas. The problem here is that doing
> runqueue tickling and load balancing isn't free -- IPIs can be
> expensive, especially if your VMs are running with hardware
> virtualization. [snip]
>
> I presume you want to do this to decrease the latency? Lee et al [1]
> actually found that *decreasing* the cpu migrations of their soft
> real-time workload led to an overall improvement in quality. [snip]

Thanks for this paper. It gives a very interesting analysis of what can go wrong with applications that fall in the middle (need CPU, but are latency sensitive as well). In my experiments, I see some servers like MySQL database servers fall into this category. And as expected, they do not do well with CPU-intensive jobs in the background, even if I give them the highest possible weight (65535). I guess very aggressive migration might not be a good idea, but there needs to be some way to guarantee that such apps get their fair share at the right time.

> I think a much better approach would be:
> * To have long-term effective placement, if possible: i.e., distribute
> latency-sensitive vcpus
> * If two latency-sensitive vcpus are sharing a cpu, do shorter time-slices.

These are very interesting ideas indeed.

> So what credit1 does is assume that all workloads fall into two
> categories: "active" VMs, which consume as much cpu as they can, and
> "inactive" (or "I/O-bound") VMs, which use almost no cpu. [snip]
>
> The problem, of course, is that most server workloads fall in the
> middle: they spend a significant time processing, but also a
> significant time waiting for more network packets.

This is precisely the problem we are facing.

> I looked at the idea of "capping" credit, as you say; but the
> steady state when I worked out the algorithms by hand was that all the
> VMs were at their cap all the time, which screwed up other aspects of
> the algorithm. Credits need to be thrown away; my proposal was to
> divide the credits by 2, rather than setting them to 0. This should be
> a good mid-way.

Sure, dividing by 2 could be a good middle ground. We can additionally not mark them inactive as well?

> These things are actually really subtle. I've spent hours and hours
> with pencil and paper, working out different algorithms by hand, to
> see exactly what effect the different changes would have. I even
> wrote a discrete event simulator, to make the process a bit faster.
> [snip] If you're really keen, I can tar it up and send it to you. :-)

I am just figuring out how non-trivial these apparently small problems are :-) It would be great if you could share your simulator!

I will keep you posted on my changes and tests.

Thanks,
-Gaurav
George Dunlap
2010-Jul-16 09:13 UTC
Re: [Xen-devel] Questions regarding Xen Credit Scheduler
On Fri, Jul 16, 2010 at 1:41 AM, Gaurav Dhiman <dimanuec@gmail.com> wrote:
> Sure, dividing by 2 could be a good middle ground. We can additionally
> not mark them inactive as well?

Think through the implications of your policy if we have the following situation:
* 2 "burn" VMs, one with weight 100, one with weight 200
* 10 mostly idle VMs, using 1% of the cpu each, with a weight of 100.

Think about what the ideal scheduler would do in this situation. You want the idle VMs to run whenever they want; that leaves 90% for the two "burn" VMs. We want one "burn" VM to run 30% of the time, and the other to run 60% of the time (because of the weights).

Now consider what would happen if we use the algorithm you describe. Credit1 divides all credits by weight among "active" VMs. With your modification, we're not marking any VMs "inactive", so we're dividing it among all VMs. That means each accounting period, the "idle" VMs are each getting about 7.7% of the credit (1/13), the 100-weight "burn" VM is getting 7.7% of the credit, and the 200-weight "burn" VM is getting 15.4% of the credit (2/13).

Now what happens? The "burn" VMs are guaranteed to burn more than their credits, so they're continually negative. The 200-weight VM only has 7.7% of cpu time more credit added per accounting period than the 100-weight VM, so even if we sort by credits, it's likely that the split will be 10% idle VMs / 49% 200-weight / 41% 100-weight (i.e., the 200-weight VM gets 7.7% of total cpu time more, rather than twice as much). If we don't set a "floor" for credits, then the credit of the "burn" VMs will continue to go negative into oblivion; if we do set a floor, the steady state will be for all VMs to be either at the ceiling (if they're not using their "fair share") or at the floor (if they are).

(I encourage you to work out your algorithm by hand, or set up a simulator and go over the results with a fine-tooth comb, to understand why this is the case. It's a real grind, but it will give you a really solid foundation for understanding scheduling problems. I've spent hours and hours doing just that.)

Credit1 solves this by using the "active / inactive" designation. The 100-weight VM gets 33% of the credits, the 200-weight VM gets 66% of the credits, and the idle VMs are usually in the "inactive" state, running at BOOST priority; only occasionally flipping into "active" for a short time, before flipping back to "inactive".

It's far from ideal, as you've seen, but it usually works not too badly. Changing the credits to divide by 2 (but still marking them "active") is a patch-up. But a more fundamental change in the algorithm needs to be made to avoid this; and that's what credit2 is for.

BTW, what are you using to do your analysis of the live scheduler? Xen has a tracing mechanism that I've found indispensable for understanding what the algorithm was actually doing. I've got the basic tool I use to analyze the output here:

http://xenbits.xensource.com/ext/xenalyze.hg

I don't have the patches used to analyze the scheduler stuff public (since they normally go through a lot of churn, and are interesting almost exclusively to developers), but I'll see if I can dig some of them up for you.

-George
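(To make the arithmetic in that example easy to reproduce, here is a toy, self-contained back-of-the-envelope model -- not the real csched_acct() accounting; the 41% / 49% consumption figures are simply taken from the discussion above:)

    #include <stdio.h>

    int main(void)
    {
        /* 10 mostly idle VMs (weight 100) + two "burn" VMs (100, 200). */
        double total_weight = 10 * 100 + 100 + 200;   /* = 1300 */

        double idle_share = 100.0 / total_weight;     /* ~7.7% of credit */
        double burn100    = 100.0 / total_weight;     /* ~7.7% */
        double burn200    = 200.0 / total_weight;     /* ~15.4% */

        printf("credit per period: idle %.1f%%, burn100 %.1f%%, burn200 %.1f%%\n",
               idle_share * 100, burn100 * 100, burn200 * 100);

        /* The burn VMs consume roughly the 41% / 49% of cpu predicted
         * above, so both burn more than they earn and drift further
         * negative every accounting period -- and at almost the same
         * rate, despite the 2:1 weights. */
        double c100 = 0.0, c200 = 0.0;
        for (int period = 1; period <= 5; period++) {
            c100 += burn100 - 0.41;
            c200 += burn200 - 0.49;
            printf("period %d: burn100 credit %+.2f, burn200 credit %+.2f\n",
                   period, c100, c200);
        }
        return 0;
    }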
George Dunlap
2010-Jul-16 11:04 UTC
Re: [Xen-devel] Questions regarding Xen Credit Scheduler
I've uploaded my scheduler simulator here:

http://xenbits.xensource.com/people/gdunlap/sched-sim.hg

It's a little hacky still, but I think you should be able to adapt it to your needs. There are "workloads" defined in workloads.c, and a generic scheduling interface. The hg repository includes an example "round-robin" scheduler, and versions of credit2 with different features added (to be able to compare their effects on different simulated workloads). There's a script, run.sh, which will run a series of simulations and visualize them with "ygraph".

It needs some way to specify scheduler-specific things in the workload definitions; at the moment versions 01 and 02 don't use weights, and 03 has a hard-coded list. :-) If you want to use it to experiment with tweaks to credit1, you'll have to implement that yourself.

Feel free to send patches for improvements / fixes.

-George
Zhiyuan Shao
2010-Dec-17 14:23 UTC
[Xen-devel] Does Xen-4.0 + pvops kernel still supports PV guest?
Hi all,

I installed Xen 4.0 on my Ubuntu desktop (Maverick Meerkat, 10.10) according to https://help.ubuntu.com/community/Xen, although I had to replace GRUB 2 with the old version to make everything work.

After rebooting, I wanted to create a PV guest with the old method I had used with Xen 3.4.2. However, I got the following messages, and the creation failed:

zhiyuan@zhiyuan-ThinkPad-R400:~/xen_work/images/rhel51-pv$ sudo xm cr -c 1-pv.cfg
[sudo] password for zhiyuan:
Using config file "./1-pv.cfg".
Error: (4, 'Out of memory', 'xc_dom_alloc_segment: segment ramdisk too large (0x11575 > 0x4000 - 0x1b8f pages)\n')
zhiyuan@zhiyuan-ThinkPad-R400:~/xen_work/images/rhel51-pv$

The configuration file is attached with this email. BTW, I booted the system with xen_commandline: dom0_mem=1024M sched=credit, and my box has 2GB memory and an Intel(R) Core(TM)2 Duo CPU P8600 @ 2.40GHz processor.

Thanks in advance!

Zhiyuan
Zhiyuan Shao
2010-Dec-17 14:25 UTC
Re: [Xen-devel] Questions regarding Xen Credit Scheduler
In your case, I think you can try Boost-Credit. Your I/O-intensive domain will benefit from that.

Best,
Zhiyuan

On 07/09/2010 05:14 AM, Gaurav Dhiman wrote:
> [snip]
Ian Campbell
2010-Dec-17 14:36 UTC
Re: [Xen-devel] Does Xen-4.0 + pvops kernel still supports PV guest?
On Fri, 2010-12-17 at 14:23 +0000, Zhiyuan Shao wrote:
> Hi all,
>
> I installed Xen 4.0 on my Ubuntu desktop (Maverick Meerkat, 10.10)
> according to https://help.ubuntu.com/community/Xen
> [snip]
> zhiyuan@zhiyuan-ThinkPad-R400:~/xen_work/images/rhel51-pv$ sudo xm cr -c 1-pv.cfg
> [sudo] password for zhiyuan:
> Using config file "./1-pv.cfg".
> Error: (4, 'Out of memory', 'xc_dom_alloc_segment: segment ramdisk too
> large (0x11575 > 0x4000 - 0x1b8f pages)\n')

0x11575 pages is a 277M ramdisk, which, as the message says, is really quite large. Especially compared with the 64M of RAM which you have configured the guest with.

Ian.
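(For reference, the numbers in that error message can be unpacked assuming the usual 4 KiB x86 page size -- a small stand-alone calculation, with the interpretation of 0x1b8f as already-used pages being an assumption:)

    #include <stdio.h>

    int main(void)
    {
        const unsigned long page_size = 4096;      /* 4 KiB x86 pages */

        unsigned long ramdisk_pages = 0x11575;     /* segment being loaded */
        unsigned long guest_pages   = 0x4000;      /* 64M guest allocation */
        unsigned long used_pages    = 0x1b8f;      /* presumably already used
                                                      (kernel image etc.) */

        printf("ramdisk:   %lu pages = %lu MiB\n",
               ramdisk_pages, ramdisk_pages * page_size >> 20);   /* 277 MiB */
        printf("guest RAM: %lu pages = %lu MiB\n",
               guest_pages, guest_pages * page_size >> 20);       /* 64 MiB */
        printf("free:      %lu pages = %lu MiB\n",
               guest_pages - used_pages,
               (guest_pages - used_pages) * page_size >> 20);     /* ~36 MiB */
        return 0;
    }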
Zhiyuan Shao
2010-Dec-17 14:51 UTC
Re: [Xen-devel] Does Xen-4.0 + pvops kernel still supports PV guest?
On 12/17/2010 10:36 PM, Ian Campbell wrote:
> On Fri, 2010-12-17 at 14:23 +0000, Zhiyuan Shao wrote:
> [snip]
> 0x11575 pages is a 277M ramdisk, which, as the message says, is really
> quite large. Especially compared with the 64M of RAM which you have
> configured the guest with.

OK, I changed the "memory" line to 800, but it still does not work:

zhiyuan@zhiyuan-ThinkPad-R400:~/xen_work/images/rhel51-pv$ sudo xm cr -c 1-pv.cfg
Using config file "./1-pv.cfg".
zhiyuan@zhiyuan-ThinkPad-R400:~/xen_work/images/rhel51-pv$ Error: Device 2049 (vbd) could not be connected. Path closed or removed during hotplug add: backend/vbd/1/2049 state: 1

and it fails quite silently.

But I am very sure the disk image path is correct. I also tried changing "sda" to "hda"; the result is the same.

I also attach the .config file for the pvops kernel with this email. Should I compile another kernel for the PV domUs?

Thanks!

Zhiyuan
Ian Campbell
2010-Dec-17 15:07 UTC
Re: [Xen-devel] Does Xen-4.0 + pvops kernel still supports PV guest?
On Fri, 2010-12-17 at 14:51 +0000, Zhiyuan Shao wrote:
> OK, I changed the "memory" line to 800, but it still does not work:
> [snip]
> zhiyuan@zhiyuan-ThinkPad-R400:~/xen_work/images/rhel51-pv$ Error: Device
> 2049 (vbd) could not be connected. Path closed or removed during hotplug
> add: backend/vbd/1/2049 state: 1
>
> But I am very sure the disk image path is correct. I also tried changing
> "sda" to "hda"; the result is the same.

pvops kernels require you to use xvda, not sd* or hd*.

Ian.
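(For illustration, the disk line in the guest config would then look something like the following -- the image filename and root device below are only placeholders, since the actual contents of 1-pv.cfg are not shown here:)

    # 1-pv.cfg (fragment) -- hypothetical paths, adjust to the real image
    disk = [ 'file:/home/zhiyuan/xen_work/images/rhel51-pv/disk.img,xvda,w' ]
    root = '/dev/xvda1 ro'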