Hi,

while experimenting a bit with time.c we found a bug in time accounting. Basically, /proc/stat counts idle time twice for PV guests running a pvops kernel.

To reproduce, try this command in an unloaded guest:

    grep cpu0 /proc/stat; sleep 20; grep cpu0 /proc/stat

and see the fourth number in /proc/stat (idle) increase by approximately 4000 for a kernel with USER_HZ == 100, i.e. twice the 2000 ticks that 20 seconds should account for. If you instead try these commands (you need an otherwise unloaded machine for these):

    grep cpu0 /proc/stat; timeout 20s yes > /dev/null; grep cpu0 /proc/stat
    grep cpu0 /proc/stat; timeout 20s dd if=/dev/urandom > /dev/null; grep cpu0 /proc/stat

the first and third numbers in /proc/stat (user and system, respectively) increase by only 2000.

The reason for this seems to be that in xen_timer_interrupt Linux's normal timer accounting is called (via evt->event_handler) and this calls account_idle_time. However, idle ticks are also added from do_stolen_accounting, so that overall they're counted twice.

Related to this, it looks like stolen tick accounting is subtly wrong. Even if only part of a tick is stolen by the hypervisor, Linux's time accounting will add a whole tick to the user/system/idle time. In a dynticks kernel (or maybe even if the scheduling quanta have some kind of resonance with the guest's timer interrupt?) the sum of the four components user+sys+idle+steal will then be larger than the wall time. In fact, I found experimentally that steal time is usually 20% off from wall-user-sys-idle when the machine is under moderate load (e.g. 5 domains at 100% CPU usage, on a 4-CPU machine). Of course I used the correct, divided-by-2 idle time to do this computation.

Paolo

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
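The check described in the report above can be sketched in a few lines of Python. This is only an illustration, not anything posted to the list: the helper names (`parse_cpu_line`, `check_accounting`), the 10% tolerance, and the two sample `cpu0` lines are made up for the example, and USER_HZ is assumed to be 100 as in the report. It takes two `cpu0` lines sampled a known number of seconds apart and compares the accounted ticks with the wall-clock ticks.

```python
USER_HZ = 100  # assumed, as in the report; a live system can query os.sysconf("SC_CLK_TCK")

# Order of the first eight counters on a "cpuN" line in /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse one 'cpuN ...' line from /proc/stat into a dict of tick counters."""
    parts = line.split()
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def check_accounting(before, after, wall_seconds, user_hz=USER_HZ, tolerance=0.10):
    """Compare the ticks accounted between two samples with wall-clock ticks."""
    b, a = parse_cpu_line(before), parse_cpu_line(after)
    delta = {name: a[name] - b[name] for name in a}
    wall_ticks = wall_seconds * user_hz
    accounted = sum(delta.values())
    return {
        "delta": delta,
        "wall_ticks": wall_ticks,
        "accounted": accounted,
        # Flag the sample if noticeably more ticks were accounted than elapsed.
        "overcounted": accounted > wall_ticks * (1 + tolerance),
    }


# Hypothetical samples taken 20 s apart, mimicking the reported symptom.
before = "cpu0 100 0 50 10000 0 0 0 0"
after = "cpu0 101 0 51 14000 0 0 0 2"
report = check_accounting(before, after, wall_seconds=20)
print(report["delta"]["idle"], report["wall_ticks"], report["overcounted"])
```

On a real guest the two lines would come from reading /proc/stat before and after the 20-second sleep. An idle delta near 4000 against 2000 wall ticks is the double counting described in the report.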
Jeremy Fitzhardinge
2010-Aug-17 22:51 UTC
Re: [Xen-devel] time accounting problem in pvops kernel
On 08/17/2010 10:29 AM, Paolo Bonzini wrote:
> Hi,
>
> while experimenting a bit with time.c we found a bug in time
> accounting. Basically, /proc/stat counts idle time twice for PV guests
> running a pvops kernel

What version? Upstream and stable kernels contain the changeset "xen: drop xen_sched_clock in favour of using plain wallclock time" which should fix a lot of timekeeping/scheduling problems.

Thanks,
J
Paolo Bonzini
2010-Aug-18 07:49 UTC
Re: [Xen-devel] time accounting problem in pvops kernel
On 08/18/2010 12:51 AM, Jeremy Fitzhardinge wrote:
> On 08/17/2010 10:29 AM, Paolo Bonzini wrote:
>> Hi,
>>
>> while experimenting a bit with time.c we found a bug in time
>> accounting. Basically, /proc/stat counts idle time twice for PV guests
>> running a pvops kernel
>
> What version?

I was using the latest RHEL6 snapshot + the 16-patch blkfront series (i.e. without the patch you pointed out).

> Upstream and stable kernels contain the changeset "xen:
> drop xen_sched_clock in favour of using plain wallclock time" which
> should fix a lot of timekeeping/scheduling problems.

I'll try this patch; however, offhand I don't see how it fixes the problem of calling account_idle_ticks twice.

Paolo
Paolo Bonzini
2010-Aug-18 14:15 UTC
[Xen-devel] Re: time accounting problem in pvops kernel
On 08/18/2010 09:49 AM, Paolo Bonzini wrote:
>> Upstream and stable kernels contain the changeset "xen:
>> drop xen_sched_clock in favour of using plain wallclock time" which
>> should fix a lot of timekeeping/scheduling problems.
>
> I'll try this patch; however, offhand I don't see how it fixes the
> problem of calling account_idle_ticks twice.

It doesn't. :)

Paolo
On Aug 18, 2010, at 3:49 AM, Paolo Bonzini wrote:
> On 08/18/2010 12:51 AM, Jeremy Fitzhardinge wrote:
>> Upstream and stable kernels contain the changeset "xen:
>> drop xen_sched_clock in favour of using plain wallclock time" which
>> should fix a lot of timekeeping/scheduling problems.
>
> I'll try this patch; however, offhand I don't see how it fixes the problem of calling account_idle_ticks twice.

I saw this too, even with said patch applied.

To avoid this being simply a 'me too!' message, I noticed that it aggravated Munin quite a bit. The CPU plugin detects 800% of idle on a 4-core machine, but only idle time is off.

Regards,

Jed Smith
Systems Administrator
Linode, LLC
+1 (609) 593-7103 x1209
jed@linode.com
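The 800% figure is what one would expect if a monitoring tool turns per-CPU idle tick deltas into percentages and sums them over all CPUs. The sketch below only illustrates that arithmetic. It is not Munin's plugin code, and the function names, the 20-second interval and USER_HZ == 100 are assumptions made for the example.

```python
def idle_percent(idle_ticks, wall_seconds, user_hz=100):
    """Idle share of one CPU over the interval, in percent."""
    return 100.0 * idle_ticks / (wall_seconds * user_hz)


def total_idle_percent(per_cpu_idle_ticks, wall_seconds, user_hz=100):
    """Sum of per-CPU idle percentages, so N fully idle CPUs give N * 100%."""
    return sum(idle_percent(t, wall_seconds, user_hz) for t in per_cpu_idle_ticks)


# Four idle CPUs over 20 s: 2000 idle ticks each if accounted once,
# 4000 each if every idle tick is counted twice.
correct = total_idle_percent([2000] * 4, wall_seconds=20)
doubled = total_idle_percent([4000] * 4, wall_seconds=20)
print(correct, doubled)
```

With correct accounting, four idle cores sum to 400%. With idle ticks counted twice, the same machine sums to 800% while the other fields stay unaffected, which matches the observation above.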
Jeremy Fitzhardinge
2010-Aug-18 16:06 UTC
[Xen-devel] Re: time accounting problem in pvops kernel
On 08/18/2010 07:15 AM, Paolo Bonzini wrote:
> On 08/18/2010 09:49 AM, Paolo Bonzini wrote:
>>> Upstream and stable kernels contain the changeset "xen:
>>> drop xen_sched_clock in favour of using plain wallclock time" which
>>> should fix a lot of timekeeping/scheduling problems.
>>
>> I'll try this patch; however, offhand I don't see how it fixes the
>> problem of calling account_idle_ticks twice.
>
> It doesn't. :)

OK. To be honest, I didn't look at the detail of your report. I just wanted to make sure it wasn't something we'd already addressed.

J