Hello,

I'm currently testing Xen-3.0.4-rc1 and its new credit scheduler to see how it fits my latency needs. For me, latencies of up to 1-5 ms are OK. Latencies below 1 ms would be better; so far I have achieved those with the bvt scheduler and a quantum of .2 ms.

My test setup is as follows: Xen running on a single-core Athlon64 3000+ and reachable via 192.168.1.34. Three domUs on 192.168.1.35, .36 and .37. Two of the domUs are always spinning (python -c "while True: pass") and the third is idle. If not mentioned otherwise, all have the default weight of 256 and no cap.

First of all, I find it interesting that VCPUs are rescheduled after 30 ms when the PCPU is under full load, but if a domain doesn't use much PCPU, then the credit scheduler will happily interrupt the currently running domain almost whenever needed, e.g. at an interval of 5 ms:

| ping -c 500 -i .005 192.168.1.34
| ...
| --- 192.168.1.34 ping statistics ---
| 500 packets transmitted, 500 received, 0% packet loss, time 2495ms
| rtt min/avg/max/mdev = 0.055/0.062/2.605/0.113 ms

(dom0 is idle and being pinged; as described above, there are two spinning domUs and one idle domU.)

Average response time is 0.062 ms, mean deviation is 0.113 ms. In this light, my current plans to force the scheduler to reschedule more often (as formerly with bvt; see below) don't seem that bad to me :)

Next, I checked how ping latencies to dom0 depend on dom0's CPU usage. I used a script which sleeps and then tries to spin for a certain amount of time (based on wall clock). These are the results:

| dom0         sleep (ms)  spin (ms)  ping avg (ms)  ping mdev (ms)
| idle         -           -          0.099          0.024
| idle         -           -          0.091          0.029
| idle         -           -          0.087          0.031
| 25% (.2)     4           1          0.084          0.026
| 25% (.2)     8           2          0.084          0.026
| 25% (.2)     40          10         0.088          0.030
|
| 38% (.3)     1.5         3.5        0.084          0.025
|
| 44% (.35)    1.75        3.75       0.075          0.023
| 44% (.35)    3.5         6.5        0.271          1.445
| 30% (.35)    17.5        32.5       6.685          14.633
|
| 34.5% (.4)   2           3          11.003         17.638
|
| 45.6% (.9)   0.2         1.8        0.111          0.238
|
| === domain0 with weight 3072, capped @ .2 ===
| 25% (.2)     4           1          0.101          0.031
| 20% (.2)     40          10         10.698         18.643
| 36% (.3)     1.5         3.5        0.061          0.713

The first column shows the CPU usage reported by xentop and, in parentheses, the fraction of time the script was spinning. Next are the lengths of one sleeping and one spinning interval, followed by the latency results. (Measured with ping -i .2 192.168.1.34 -c 120.)

It seems that a domain/VCPU can in some cases use more than its fair share of PCPU and still interrupt other VCPUs, as long as it sleeps frequently enough.

If a domain/VCPU spins for long enough stretches, it indeed no longer interrupts other VCPUs, with a direct effect on the latency I measured.

The results with capping enabled are also interesting (a domain may use more than its cap if it sleeps frequently enough), but caps are not a solution for my needs.

Therefore I will try reducing the rescheduling interval from 30 ms to 10 ms (should be possible?) and to 1 ms (which may break the credit accounting code completely? I haven't fully understood in which way it needs the timer interrupt).

I'd be happy about any advice :)

Regards,
Milan

PS: Would it be easily possible to use bvt with 3.0.4-rc1? I know it has been dropped...
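The sleep/spin load script used for the table above is not included in the thread. The following is only a minimal stand-in sketch of the behaviour described (the program name, arguments and defaults are invented): sleep for a given number of milliseconds, then busy-spin on the wall clock for a given number of milliseconds, in an endless loop.

    /* spinsleep.c -- illustrative only; the script actually used in the
     * measurements above is not part of the thread.
     * Usage: ./spinsleep <sleep_ms> <spin_ms>                            */
    #include <stdlib.h>
    #include <sys/time.h>
    #include <unistd.h>

    static double now_ms(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec * 1000.0 + tv.tv_usec / 1000.0;
    }

    int main(int argc, char **argv)
    {
        double sleep_ms = (argc > 1) ? atof(argv[1]) : 4.0;
        double spin_ms  = (argc > 2) ? atof(argv[2]) : 1.0;
        double deadline;

        for (;;) {
            usleep((useconds_t)(sleep_ms * 1000.0));  /* give up the CPU */
            deadline = now_ms() + spin_ms;
            while (now_ms() < deadline)
                ;   /* burn CPU until the wall-clock deadline */
        }
    }

Running something like "./spinsleep 4 1" in dom0 would correspond to the "sleep 4 ms / spin 1 ms" rows of the table above.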
On Thu, 14 Dec 2006 18:24:43 +0100 Milan Holzäpfel <listen@mjh.name> wrote:

> It seems that a domain/VCPU can in some cases use more than its fair
> share of PCPU and still interrupt other VCPUs, as long as it sleeps
> frequently enough.
>
> If a domain/VCPU spins for long enough stretches, it indeed no longer
> interrupts other VCPUs, with a direct effect on the latency I measured.

Thinking about it once again, the problem I'm having with this behaviour is that a domain cannot do I/O with low latency (which by itself might not require much CPU time) as soon as it starts consuming lots of CPU time, e.g. because an archiving process has just started.

Maybe it would be better to always interrupt the currently running domain for a limited amount of time when another domain receives I/O? Yet I think I would still need a smaller scheduler quantum...

Regards,
Milan
Hi Milan,

This is interesting data.

As you noted, the credit scheduler runs 30ms time slices by default. It will, however, preempt the CPU for a VCPU which is waking up and isn't consuming its fair share of CPU resources (as calculated by the proportional weighted method). The idea is to give good performance for many standard workloads without requiring manual tuning.

I'm quite surprised that you managed to get one of three CPU hogs to get more than 33.3% of the CPU! This is not expected behaviour. I'll look into it.

It is however expected behaviour that once a VCPU consumes its fair share of CPU resources, it will no longer preempt others and will have to wait its turn for a time slice. If we didn't do that, the VCPU in question could just hog the CPU.

The way to increase the share a VCPU can use and still preempt others when waking up is to raise the fair share of the domain in question (i.e. its weight), so that it is constantly using less than that share. But then this domain will also have the ability to actually consume that much CPU.

The credit scheduler doesn't have a good mechanism to guarantee a sub-ms wake-to-run latency for VCPUs whose CPU usage it must also restrict. The assumption is that if you require good wake-to-run latencies, then you are not a CPU hog. This assumption may not be valid in all workloads.

Short of recompiling the source, there is currently no way to change the default time slice, I'm afraid. And if you recompile, you're indeed exploring uncharted territory.

Caps aren't what you're looking for. They limit the total CPU a domain can actually get hold of regardless of the availability of idle resources, but VCPUs still run 30ms time slices.

Are you trying to guarantee wake-to-run latencies for one or more domains which also hog CPU resources if left to run unchecked?

In 3.0.4, you could try to use SEDF, which basically seems to run 1ms time slices. I can also add whatever mechanisms you require to the credit scheduler, but depending on what is required, that may not happen for a while, and likely not in 3.0.4.

Cheers,
Emmanuel.

On Thu, Dec 14, 2006 at 06:24:43PM +0100, Milan Holzäpfel wrote:
> [...]
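To make the wake-up rule described above concrete, here is a deliberately simplified toy model (not the actual Xen code; the struct and function names are invented): a waking VCPU preempts the currently running one only while it is still under its fair share, i.e. while it still holds credit; once its credit is gone it has to wait for a regular time slice.

    #include <stdio.h>

    /* Toy model only -- not Xen's sched_credit.c. */
    struct vcpu {
        const char *name;
        int credit;   /* > 0: used less than its fair share; <= 0: used it up */
    };

    /* A waking VCPU may preempt the running one only while it is still
     * under its fair share and the running one is not.                   */
    static int should_preempt(const struct vcpu *waking,
                              const struct vcpu *running)
    {
        return waking->credit > 0 && running->credit <= 0;
    }

    int main(void)
    {
        struct vcpu idle_domu = { "idle domU",     150 };  /* sleeps, keeps credit */
        struct vcpu cpu_hog   = { "spinning domU", -80 };  /* burned its credit    */

        if (should_preempt(&idle_domu, &cpu_hog))
            printf("%s preempts %s on wake-up\n", idle_domu.name, cpu_hog.name);
        else
            printf("%s must wait for the next time slice\n", idle_domu.name);
        return 0;
    }

This is the boundary the measurements above keep crossing: while dom0's sleep/spin script stays under its share, pings return in well under a millisecond; once it doesn't, they wait out other domains' time slices.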
On Thu, 14 Dec 2006 19:10:53 +0000 Emmanuel Ackaouy <ack@xensource.com> wrote:

> Hi Milan,
>
> This is interesting data.

Glad that it is useful.

> I'm quite surprised that you managed to get one of three
> CPU hogs to get more than 33.3% of the CPU! This is not
> expected behaviour. I'll look into it.

Great.

> It is however expected behaviour that once a VCPU consumes
> its fair share of CPU resources, it will no longer preempt
> others and will have to wait its turn for a time slice.
> If we didn't do that, the VCPU in question could just hog
> the CPU.

Yes, I probably should have mentioned clearly that I read something like that in the archives.

> The way to increase the share a VCPU can use and still
> preempt others when waking up is to raise the fair share of
> the domain in question (i.e. its weight), so that it is
> constantly using less than that share. But then this domain
> will also have the ability to actually consume that much CPU.

Yes, but see below...

> The credit scheduler doesn't have a good mechanism to
> guarantee a sub-ms wake-to-run latency for VCPUs whose
> CPU usage it must also restrict. The assumption is that
> if you require good wake-to-run latencies, then you are
> not a CPU hog. This assumption may not be valid in all
> workloads.

I do not want to make this assumption in my case, as it can become false. (E.g. running an I/O-based workload and then logging in via SSH (usually a short CPU hog), or doing some other work (like a nice'd bzip2).)

> Short of recompiling the source, there is currently no way
> to change the default time slice, I'm afraid. And if you
> recompile, you're indeed exploring uncharted territory.

Yes. I have already read part of the source, but I don't know everything about the scheduler API and the credit scheduler in particular yet.

Can you say whether it is possible / feasible to change the time slice / scheduler quantum to less than 10 ms (the time between two 100-Hz-based timer interrupts)? I think I will try out 10 ms and then 1 ms in the next days.

> [...]
>
> Are you trying to guarantee wake-to-run latencies for
> one or more domains which also hog CPU resources if left
> to run unchecked?

Yes, that would be the ideal case: good wake-to-run latencies and generally reliable CPU limiting at the same time.

> In 3.0.4, you could try to use SEDF, which basically seems
> to run 1ms time slices.

The last time I checked SEDF was with 3.0.2. IIRC, I can assign fixed slices of CPU time to each domain, and I can specify whether each domain may consume extra CPU time. I *think* extra CPU time didn't work back then. I guess I'll have a look at SEDF again too.

> I can also add whatever mechanisms you require to the
> credit scheduler, but depending on what is required, that
> may not happen for a while, and likely not in 3.0.4.

I guess that something like allowing any domain to interrupt at any time while still distributing CPU usage fairly doesn't quite fit into the current credit scheduler's concept..? I will tell you if I have more concrete ideas for the credit scheduler.

> Cheers,
> Emmanuel.

Regards,
Milan

PS: I'm subscribed, so you can also send mail only to the list.

> On Thu, Dec 14, 2006 at 06:24:43PM +0100, Milan Holzäpfel wrote:
> [...]
On Thu, Dec 14, 2006 at 08:35:30PM +0100, Milan Holzäpfel wrote:
> Can you say whether it is possible / feasible to change the time
> slice / scheduler quantum to less than 10 ms (the time between two
> 100-Hz-based timer interrupts)?

Going to 10ms is easy: change CSCHED_TICKS_PER_TSLICE from 3 to 1 in common/sched_credit.c.

Going below that is a little more tricky... You may be able to change CSCHED_MSECS_PER_TSLICE to simply be defined to 1 (for 1ms). That may cause some accounting issues though, because the accounting work will still be done every 30ms.
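For orientation, the constants involved sit near the top of xen/common/sched_credit.c. The excerpt below is reproduced only approximately from a 3.0.4-era tree (the exact values, and the CSCHED_TICKS_PER_ACCT name in particular, should be checked against your source); the comments mark the two changes discussed above.

    /* Approximate excerpt from xen/common/sched_credit.c (3.0.4-era) */
    #define CSCHED_MSECS_PER_TICK       10   /* 100 Hz scheduler tick          */
    #define CSCHED_TICKS_PER_TSLICE     3    /* change to 1 for a 10ms slice   */
    #define CSCHED_TICKS_PER_ACCT       3    /* credit accounting period: 30ms */
    #define CSCHED_MSECS_PER_TSLICE     \
        (CSCHED_MSECS_PER_TICK * CSCHED_TICKS_PER_TSLICE)
    /* For a 1ms slice, define CSCHED_MSECS_PER_TSLICE directly to 1 instead;
     * note that the accounting above still runs on the 30ms period, which is
     * the likely source of the accounting issues mentioned above.            */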