Hello,

I'm currently testing Xen-3.0.4-rc1 and its new credit scheduler to see how it fits my latency needs. For me, latencies of up to 1-5 ms are OK. Latencies below 1 ms would be better; so far I have achieved those with the bvt scheduler and a quantum of .2 ms.

My test setup is as follows: Xen running on a single-core Athlon64 3000+ and reachable via 192.168.1.34. Three domUs on 192.168.1.35, .36 and .37. Two of the domUs are always spinning (python -c "while True: pass") and the third is idle. If not mentioned otherwise, all have the default weight of 256 and no cap.

First of all, I find it interesting that VCPUs are rescheduled after 30 ms when the PCPU is under full load, but if a domain doesn't use much PCPU, then the credit scheduler will happily interrupt the currently running domain almost whenever needed, e.g. at an interval of 5 ms:

| ping -c 500 -i .005 192.168.1.34
| ...
| --- 192.168.1.34 ping statistics ---
| 500 packets transmitted, 500 received, 0% packet loss, time 2495ms
| rtt min/avg/max/mdev = 0.055/0.062/2.605/0.113 ms

(dom0 is idle and being pinged; as described above, there are two spinning domUs and one idle domU.)

Average response time is 0.062 ms, mean deviation is 0.113 ms. In this light, my current plans to force the scheduler to reschedule more often (as formerly with bvt; see below) don't seem that bad to me :)

Next, I checked how ping latencies to dom0 depend on dom0's CPU usage. I used a script which sleeps and then tries to spin for a certain amount of time (based on wall clock). These are the results:

| dom0         sleep (ms)  spin (ms)  ping avg (ms)  ping mdev (ms)
| idle         -           -          0.099          0.024
| idle         -           -          0.091          0.029
| idle         -           -          0.087          0.031
| 25% (.2)     4           1          0.084          0.026
| 25% (.2)     8           2          0.084          0.026
| 25% (.2)     40          10         0.088          0.030
|
| 38% (.3)     1.5         3.5        0.084          0.025
|
| 44% (.35)    1.75        3.75       0.075          0.023
| 44% (.35)    3.5         6.5        0.271          1.445
| 30% (.35)    17.5        32.5       6.685          14.633
|
| 34.5% (.4)   2           3          11.003         17.638
|
| 45.6% (.9)   0.2         1.8        0.111          0.238
|
| === domain0 with weight 3072, capped @ .2 ===
| 25% (.2)     4           1          0.101          0.031
| 20% (.2)     40          10         10.698         18.643
| 36% (.3)     1.5         3.5        0.061          0.713

The first column shows the CPU usage reported by xentop and, in parentheses, the fraction of time the script was spinning. Next are the lengths of one sleeping and one spinning interval, followed by the latency results. (Measured with ping -i .2 192.168.1.34 -c 120.)

It seems that a domain/VCPU can in some cases use more than its fair share of PCPU and still interrupt other VCPUs, as long as it sleeps frequently enough.

If a domain/VCPU spins for long enough stretches, it indeed no longer interrupts other VCPUs, with a direct effect on the latency I measured.

The results with capping enabled are also interesting (a domain may use more than its cap if it sleeps frequently enough), but caps are not a solution for my needs.

Therefore I will try reducing the rescheduling interval from 30 ms to 10 ms (should be possible?) and to 1 ms (which may break the credit accounting code completely? I haven't fully understood in which way it needs the timer interrupt).

I'd be happy about any advice :)

Regards,
Milan

PS: Would it be easily possible to use bvt with 3.0.4-rc1? I know it has been dropped...
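The sleep/spin load script used for the table above is not included in the thread. The following is only a minimal stand-in sketch of the behaviour described (the program name, arguments and defaults are invented): sleep for a given number of milliseconds, then busy-spin on the wall clock for a given number of milliseconds, in an endless loop.

    /* spinsleep.c -- illustrative only; the script actually used in the
     * measurements above is not part of the thread.
     * Usage: ./spinsleep <sleep_ms> <spin_ms>                            */
    #include <stdlib.h>
    #include <sys/time.h>
    #include <unistd.h>

    static double now_ms(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
    }

    int main(int argc, char **argv)
    {
        double sleep_ms = (argc > 1) ? atof(argv[1]) : 4.0;
        double spin_ms  = (argc > 2) ? atof(argv[2]) : 1.0;
        double deadline;

        for (;;) {
            usleep((useconds_t)(sleep_ms * 1000.0));  /* give up the CPU */
            deadline = now_ms() + spin_ms;
            while (now_ms() < deadline)
                ;   /* burn CPU until the wall-clock deadline */
        }
    }

Running something like "./spinsleep 4 1" in dom0 would correspond to the "sleep 4 ms / spin 1 ms" rows of the table above.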
On Thu, 14 Dec 2006 18:24:43 +0100 Milan Holzäpfel <listen@mjh.name> wrote:

> It seems that a domain/VCPU can in some cases use more than its fair
> share of PCPU and still interrupt other VCPUs, as long as it sleeps
> frequently enough.
>
> If a domain/VCPU spins for long enough stretches, it indeed no longer
> interrupts other VCPUs, with a direct effect on the latency I measured.

Thinking about it once again, the problem I'm having with this behaviour is that a domain cannot do I/O with low latency (which by itself might not require much CPU time) as soon as it starts consuming lots of CPU time, e.g. because an archiving process has just started.

Maybe it would be better to always interrupt the currently running domain for a limited amount of time when another domain receives I/O? Yet I think I would still need a smaller scheduler quantum...

Regards,
Milan
Hi Milan,

This is interesting data.

As you noted, the credit scheduler runs 30ms time slices by default. It will, however, preempt the CPU for a VCPU which is waking up and isn't consuming its fair share of CPU resources (as calculated by the proportional weighted method). The idea is to give good performance for many standard workloads without requiring manual tuning.

I'm quite surprised that you managed to get one of three CPU hogs to get more than 33.3% of the CPU! This is not expected behaviour. I'll look into it.

It is however expected behaviour that once a VCPU consumes its fair share of CPU resources, it will no longer preempt others and will have to wait its turn for a time slice. If we didn't do that, the VCPU in question could just hog the CPU.

The way to increase the share a VCPU can use and still preempt others when waking up is to raise the fair share of the domain in question (i.e. its weight), so that it is constantly using less than that share. But then this domain will also have the ability to actually consume that much CPU.

The credit scheduler doesn't have a good mechanism to guarantee a sub-ms wake-to-run latency for VCPUs whose CPU usage it must also restrict. The assumption is that if you require good wake-to-run latencies, then you are not a CPU hog. This assumption may not be valid in all workloads.

Short of recompiling the source, there is currently no way to change the default time slice, I'm afraid. And if you recompile, you're indeed exploring uncharted territory.

Caps aren't what you're looking for. They limit the total CPU a domain can actually get hold of regardless of the availability of idle resources, but VCPUs still run 30ms time slices.

Are you trying to guarantee wake-to-run latencies for one or more domains which also hog CPU resources if left to run unchecked?

In 3.0.4, you could try to use SEDF, which basically seems to run 1ms time slices. I can also add whatever mechanisms you require to the credit scheduler, but depending on what is required, that may not happen for a while, and likely not in 3.0.4.

Cheers,
Emmanuel.

On Thu, Dec 14, 2006 at 06:24:43PM +0100, Milan Holzäpfel wrote:
> [...]
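To make the wake-up rule described above concrete, here is a deliberately simplified toy model (not the actual Xen code; the struct and function names are invented): a waking VCPU preempts the currently running one only while it is still under its fair share, i.e. while it still holds credit; once its credit is gone it has to wait for a regular time slice.

    #include <stdio.h>

    /* Toy model only -- not Xen's sched_credit.c. */
    struct vcpu {
        const char *name;
        int credit;   /* > 0: used less than its fair share; <= 0: used it up */
    };

    /* A waking VCPU may preempt the running one only while it is still
     * under its fair share and the running one is not.                   */
    static int should_preempt(const struct vcpu *waking,
                              const struct vcpu *running)
    {
        return waking->credit > 0 && running->credit <= 0;
    }

    int main(void)
    {
        struct vcpu idle_domu = { "idle domU",     150 };  /* sleeps, keeps credit */
        struct vcpu cpu_hog   = { "spinning domU", -80 };  /* burned its credit    */

        if (should_preempt(&idle_domu, &cpu_hog))
            printf("%s preempts %s on wake-up\n", idle_domu.name, cpu_hog.name);
        else
            printf("%s must wait for the next time slice\n", idle_domu.name);
        return 0;
    }

This is the boundary the measurements above keep crossing: while dom0's sleep/spin script stays under its share, pings return in well under a millisecond; once it doesn't, they wait out other domains' time slices.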
On Thu, 14 Dec 2006 19:10:53 +0000 Emmanuel Ackaouy <ack@xensource.com> wrote:

> Hi Milan,
>
> This is interesting data.

Glad that it is useful.

> I'm quite surprised that you managed to get one of three
> CPU hogs to get more than 33.3% of the CPU! This is not
> expected behaviour. I'll look into it.

Great.

> It is however expected behaviour that once a VCPU consumes
> its fair share of CPU resources, it will no longer preempt
> others and will have to wait its turn for a time slice.
> If we didn't do that, the VCPU in question could just hog
> the CPU.

Yes, I probably should have mentioned clearly that I read something like that in the archives.

> The way to increase the share a VCPU can use and still
> preempt others when waking up is to raise the fair share of
> the domain in question (i.e. its weight), so that it is
> constantly using less than that share. But then this domain
> will also have the ability to actually consume that much CPU.

Yes, but see below...

> The credit scheduler doesn't have a good mechanism to
> guarantee a sub-ms wake-to-run latency for VCPUs whose
> CPU usage it must also restrict. The assumption is that
> if you require good wake-to-run latencies, then you are
> not a CPU hog. This assumption may not be valid in all
> workloads.

I do not want to make this assumption in my case, as it can become false. (E.g. running an I/O-based workload and then logging in via SSH (usually a short CPU hog), or doing some other work (like a nice'd bzip2).)

> Short of recompiling the source, there is currently no way
> to change the default time slice, I'm afraid. And if you
> recompile, you're indeed exploring uncharted territory.

Yes. I have already read part of the source, but I don't know everything about the scheduler API and the credit scheduler in particular yet.

Can you say whether it is possible / feasible to change the time slice / scheduler quantum to less than 10 ms (the time between two 100-Hz-based timer interrupts)? I think I will try out 10 ms and then 1 ms in the next days.

> [...]
>
> Are you trying to guarantee wake-to-run latencies for
> one or more domains which also hog CPU resources if left
> to run unchecked?

Yes, that would be the ideal case: good wake-to-run latencies and generally reliable CPU limiting at the same time.

> In 3.0.4, you could try to use SEDF, which basically seems
> to run 1ms time slices.

The last time I checked SEDF was with 3.0.2. IIRC, I can assign fixed slices of CPU time to each domain, and I can specify whether each domain may consume extra CPU time. I *think* extra CPU time didn't work back then. I guess I'll have a look at SEDF again too.

> I can also add whatever mechanisms you require to the
> credit scheduler, but depending on what is required, that
> may not happen for a while, and likely not in 3.0.4.

I guess that something like allowing any domain to interrupt at any time while still distributing CPU usage fairly doesn't quite fit into the current credit scheduler's concept..? I will tell you if I have more concrete ideas for the credit scheduler.

> Cheers,
> Emmanuel.

Regards,
Milan

PS: I'm subscribed, so you can also send mail only to the list.

> On Thu, Dec 14, 2006 at 06:24:43PM +0100, Milan Holzäpfel wrote:
> [...]
On Thu, Dec 14, 2006 at 08:35:30PM +0100, Milan Holzäpfel wrote:
> Can you say whether it is possible / feasible to change the time
> slice / scheduler quantum to less than 10 ms (the time between two
> 100-Hz-based timer interrupts)?

Going to 10ms is easy: change CSCHED_TICKS_PER_TSLICE from 3 to 1 in common/sched_credit.c.

Going below that is a little more tricky... You may be able to change CSCHED_MSECS_PER_TSLICE to simply be defined to 1 (for 1ms). That may cause some accounting issues though, because the accounting work will still be done every 30ms.
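For orientation, the constants involved sit near the top of xen/common/sched_credit.c. The excerpt below is reproduced only approximately from a 3.0.4-era tree (the exact values, and the CSCHED_TICKS_PER_ACCT name in particular, should be checked against your source); the comments mark the two changes discussed above.

    /* Approximate excerpt from xen/common/sched_credit.c (3.0.4-era) */
    #define CSCHED_MSECS_PER_TICK       10   /* 100 Hz scheduler tick          */
    #define CSCHED_TICKS_PER_TSLICE     3    /* change to 1 for a 10ms slice   */
    #define CSCHED_TICKS_PER_ACCT       3    /* credit accounting period: 30ms */
    #define CSCHED_MSECS_PER_TSLICE     \
        (CSCHED_MSECS_PER_TICK * CSCHED_TICKS_PER_TSLICE)
    /* For a 1ms slice, define CSCHED_MSECS_PER_TSLICE directly to 1 instead;
     * note that the accounting above still runs on the 30ms period, which is
     * the likely source of the accounting issues mentioned above.            */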