Jed Smith
2010-Jul-06 16:32 UTC
[Xen-devel] Runaway real/sys time in newer paravirt domUs?
Good morning,

We've had a few reports from domU customers[1] - confirmed by myself - that CPU
time accounting is very inaccurate in certain circumstances. This issue seems
to be limited to x86_64 domUs, starting around the 2.6.32 family (but I can't
be sure of that).

The symptoms of the flaw include top reporting hours and days of CPU consumed
by a task which has been running for mere seconds of wall time, as well as the
time(1) utility reporting hundreds of years in some cases. Counterintuitively,
the /proc/stat timers on all four VCPUs increment at roughly the expected rate.
Needless to say, this is puzzling.

A test case which highlights the failure was brought to our attention by
Ævar Arnfjörð Bjarmason: a simple Perl script[2] that forks and executes
numerous dig(1) processes. At the end of his script, time(1) reports
268659840m0.951s of user and 38524003m13.072s of system time consumed. I am
able to confirm this demonstration using:

 - Xen 3.4.1 on dom0 2.6.18.8-931-2
 - Debian Lenny on domU 2.6.32.12-x86_64-linode12 [3]

Running Ævar's test case looks like this, in that domU:

> real    0m30.741s
> user    307399002m50.773s
> sys     46724m44.192s

However, a quick busyloop in Python seems to report the correct time:

> li21-66:~# cat doit.py
> for i in xrange(10000000):
>     a = i ** 5
>
> li21-66:~# time python doit.py
>
> real    0m16.600s
> user    0m16.593s
> sys     0m0.006s

I rebooted the domU, and the problem no longer exists. It seems to be transient
in nature, and difficult to isolate. /proc/stat seems to increment normally:

> li21-66:/proc# cat stat | grep "cpu " && sleep 1 && cat stat | grep "cpu "
> cpu  3742 0 1560 700180 1326 0 27 1282 0
> cpu  3742 0 1562 700983 1326 0 27 1282 0

I'm not sure where to begin with this one - any thoughts?

[1]: http://www.linode.com/forums/viewtopic.php?p=30715
[2]: git://gist.github.com/449825.git
[3]: Source: http://www.linode.com/src/2.6.32.12-x86_64-linode12.tar.bz2
     Config: http://jedsmith.org/tmp/2.6.32.12-x86_64-linode12.txt

Thanks for the assistance,

Jed Smith
Systems Administrator
Linode, LLC
+1 (609) 593-7103 x1209
jed@linode.com
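The Perl gist itself is linked at [2]. As a rough sketch of the same kind of
workload (not Ævar's actual script, and with /bin/true standing in for dig(1)),
anything that forks and reaps many short-lived children and then reads back the
accumulated child CPU time exercises the same accounting that feeds the
user/sys figures time(1) prints:

/* fork_repro.c: fork and reap many short-lived children, then print the
 * accumulated child CPU time from getrusage(RUSAGE_CHILDREN), which is
 * essentially the accounting time(1) reports as user/sys.  On an affected
 * domU these totals can come out absurdly large even though the wall-clock
 * run is only a few seconds. */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/resource.h>

int main(void)
{
	struct rusage ru;
	int i;

	for (i = 0; i < 500; i++) {
		pid_t pid = fork();

		if (pid < 0) {
			perror("fork");
			return 1;
		}
		if (pid == 0) {
			/* Child: exec a short-lived external command, as the
			 * Perl test does with dig(1); /bin/true keeps this
			 * self-contained. */
			execl("/bin/true", "true", (char *)NULL);
			_exit(127);
		}
		waitpid(pid, NULL, 0);
	}

	if (getrusage(RUSAGE_CHILDREN, &ru) == 0)
		printf("children: user %ld.%06lds  sys %ld.%06lds\n",
		       (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
		       (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
	return 0;
}

On a healthy guest the printed totals should be on the order of the wall-clock
run time, not minutes or years.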
Jeremy Fitzhardinge
2010-Jul-06 19:05 UTC
Re: [Xen-devel] Runaway real/sys time in newer paravirt domUs?
On 07/06/2010 09:32 AM, Jed Smith wrote:
> We've had a few reports from domU customers[1] - confirmed by myself - that
> CPU time accounting is very inaccurate in certain circumstances. This issue
> seems to be limited to x86_64 domUs, starting around the 2.6.32 family (but
> I can't be sure of that).
> [...]
> I'm not sure where to begin with this one - any thoughts?

It would be helpful to identify what kernel version the change of behaviour
started in (ideally a git bisect down to a particular change, but a pair of
versions would be close enough).

I think this is the same problem as
https://bugzilla.kernel.org/show_bug.cgi?id=16314

Thanks,
    J
Jed Smith
2010-Jul-06 20:20 UTC
Re: [Xen-devel] Runaway real/sys time in newer paravirt domUs?
On Jul 6, 2010, at 3:05 PM, Jeremy Fitzhardinge wrote:

> I think this is the same problem as
> https://bugzilla.kernel.org/show_bug.cgi?id=16314

Which fits, because "OverlordQ" in that bug entry has posted on the forum entry
we have about it, too. I'll see if I can bisect kernels later this evening or
tomorrow - you think .31 is a safe guess for a left side?

Regards,

Jed Smith
Systems Administrator
Linode, LLC
+1 (609) 593-7103 x1209
jed@linode.com
Jed Smith
2010-Jul-08 16:38 UTC
Re: [Xen-devel] Runaway real/sys time in newer paravirt domUs?
On Jul 6, 2010, at 3:05 PM, Jeremy Fitzhardinge wrote:

> I think this is the same problem as
> https://bugzilla.kernel.org/show_bug.cgi?id=16314

If it is, it goes further back than 2.6.31 as that bug report suggests. I was
able to reproduce it once in v2.6.30 as tagged, and am still laboring to find
an A-B point to bisect. Failure rate is significantly lower in 2.6.30, but it
did happen once.

So far, Ubuntu linux-image-ec2 (2.6.32-30{5,7}-ec2), our 2.6.32.12, vanilla
2.6.31, and vanilla 2.6.30 have demonstrated this issue in my testing
environment. All x86_64. I am unable to reproduce on i386 to this point.

Regards,

Jed Smith
Systems Administrator
Linode, LLC
+1 (609) 593-7103 x1209
jed@linode.com
Jan Beulich
2010-Jul-09 08:46 UTC
Re: [Xen-devel] Runaway real/sys time in newer paravirt domUs?
>>> On 08.07.10 at 18:38, Jed Smith <jed@linode.com> wrote:
> On Jul 6, 2010, at 3:05 PM, Jeremy Fitzhardinge wrote:
>
>> I think this is the same problem as
>> https://bugzilla.kernel.org/show_bug.cgi?id=16314
>
> If it is, it goes further back than 2.6.31 as that bug report suggests. I was
> able to reproduce it once in v2.6.30 as tagged, and am still laboring to find
> an A-B point to bisect. Failure rate is significantly lower in 2.6.30, but
> it did happen once.
>
> So far, Ubuntu linux-image-ec2 (2.6.32-30{5,7}-ec2), our 2.6.32.12, vanilla
> 2.6.31, and vanilla 2.6.30 have demonstrated this issue in my testing
> environment. All x86_64. I am unable to reproduce on i386 to this point.

Is this perhaps also dependent on the CPU make/model?

Jan
Jed Smith
2010-Jul-09 14:57 UTC
Re: [Xen-devel] Runaway real/sys time in newer paravirt domUs?
On Jul 9, 2010, at 4:46 AM, Jan Beulich wrote:

> Is this perhaps also dependent on the CPU make/model?

I have a mostly homogeneous environment, but I'll see what I can do. I have
one box in mind, actually...

Since I spoke with Jeremy yesterday, I've become suspicious that the issue I
am reporting here is likely the same one that I've seen for some time. A quick
background: in our domU environment, Munin incorrectly reports idle time. All
Linode domUs have 4 VCPUs, so user/sys travel freely to 400%. Idle, however,
has always reported 800% under paravirtualized kernels. This led me to
investigate a bit, and this is what I deduced:

 - In /proc/uptime, idle time outruns system uptime significantly.
 - This ratio is seemingly affected by the number of VCPUs the domU is
   configured for. With only one VCPU, the ratio is roughly 2.0, leading me
   to think that idle ticks are counted twice per VCPU.
 - The ratio between idle/system is inconclusive after a lengthy uptime.
 - In a 2.6.29 domU, two things happen:
   a) The original bug does not manifest after 50 attempts.
   b) In /proc/uptime, idle time is always precisely 0.00. It never counts.
 - Between 2.6.29 and 2.6.30, /proc/uptime behaves much differently, and the
   bug then exposes itself. Something changed there.

I have seen the /proc/uptime behavior on i386. In fact, a personal domU:

10:51 jsmith@undertow% cat /proc/uptime
1984022.43 2954870.51
10:51 jsmith@undertow% awk '{printf("%f\n", $2 / $1)}' /proc/uptime
1.489342
10:51 jsmith@undertow% uname -a
Linux undertow.jedsmith.org 2.6.32.12-linode25 #1 SMP Wed Apr 28 19:25:11 UTC 2010 i686 GNU/Linux

I am not sure if the fact that I can only make the original bug appear on
x86_64 is a red herring. Maybe the timer {ov,und}erflow display is different
depending on word size, and this is all the same issue? I will see if I can
try this on a vastly different CPU later today.

Regards,

Jed Smith
Systems Administrator
Linode, LLC
+1 (609) 593-7103 x1209
jed@linode.com
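To put numbers on the idle-versus-uptime observation above, a minimal sketch
(assuming only the standard /proc/uptime and /proc/stat formats shown in this
thread) can print both idle accountings side by side:

/* idle_check.c: print system uptime, the idle figure from /proc/uptime, and
 * the idle column of the aggregate "cpu " line in /proc/stat (converted from
 * USER_HZ ticks to seconds), so the two accountings can be compared. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	double up, idle_uptime, idle_stat;
	unsigned long long cpu_user, cpu_nice, cpu_sys, cpu_idle;
	FILE *f;

	f = fopen("/proc/uptime", "r");
	if (!f || fscanf(f, "%lf %lf", &up, &idle_uptime) != 2)
		return 1;
	fclose(f);

	f = fopen("/proc/stat", "r");
	if (!f || fscanf(f, "cpu %llu %llu %llu %llu",
			 &cpu_user, &cpu_nice, &cpu_sys, &cpu_idle) != 4)
		return 1;
	fclose(f);

	idle_stat = (double)cpu_idle / sysconf(_SC_CLK_TCK);

	printf("uptime:              %12.2f s\n", up);
	printf("idle (/proc/uptime): %12.2f s  (%.2fx uptime)\n",
	       idle_uptime, idle_uptime / up);
	printf("idle (/proc/stat):   %12.2f s  (%.2fx uptime)\n",
	       idle_stat, idle_stat / up);
	return 0;
}

Comparing the two directly should make it obvious whether only the
/proc/uptime figure is running away, as the observations above suggest.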
Jan Beulich
2010-Jul-09 16:00 UTC
Re: [Xen-devel] Runaway real/sys time in newer paravirt domUs?
>>> On 09.07.10 at 16:57, Jed Smith <jed@linode.com> wrote:
> On Jul 9, 2010, at 4:46 AM, Jan Beulich wrote:
>> Is this perhaps also dependent on the CPU make/model?
> - Between 2.6.29 and 2.6.30, /proc/uptime behaves much differently, and the
>   bug then exposes itself. Something changed there.

Others as well as me had seen similar misbehavior (outside of pv-ops),
but only on reasonably new Intel systems. sched_clock_stable getting
set to one in arch/x86/kernel/cpu/intel.c turned out to be the problem.

Jan
Jed Smith
2010-Jul-13 21:10 UTC
Re: [Xen-devel] Runaway real/sys time in newer paravirt domUs?
On Jul 9, 2010, at 12:00 PM, Jan Beulich wrote:

>>>> On 09.07.10 at 16:57, Jed Smith <jed@linode.com> wrote:
>> On Jul 9, 2010, at 4:46 AM, Jan Beulich wrote:
>>> Is this perhaps also dependent on the CPU make/model?
>> - Between 2.6.29 and 2.6.30, /proc/uptime behaves much differently, and the
>>   bug then exposes itself. Something changed there.
>
> Others as well as me had seen similar misbehavior (outside of pv-ops),
> but only on reasonably new Intel systems. sched_clock_stable getting
> set to one in arch/x86/kernel/cpu/intel.c turned out to be the problem.

Is there a way to test this that I could perform for you? We're getting more
reports of the same issue in our newer kernels.

Do you think CONFIG_NO_HZ makes a difference either way?

Regards,

Jed Smith
Systems Administrator
Linode, LLC
+1 (609) 593-7103 x1209
jed@linode.com
Jed Smith
2010-Jul-14 16:31 UTC
Re: [Xen-devel] Runaway real/sys time in newer paravirt domUs?
On Jul 9, 2010, at 12:00 PM, Jan Beulich wrote:

>>>> On 09.07.10 at 16:57, Jed Smith <jed@linode.com> wrote:
>> On Jul 9, 2010, at 4:46 AM, Jan Beulich wrote:
>>> Is this perhaps also dependent on the CPU make/model?
>> - Between 2.6.29 and 2.6.30, /proc/uptime behaves much differently, and the
>>   bug then exposes itself. Something changed there.
>
> Others as well as me had seen similar misbehavior (outside of pv-ops),
> but only on reasonably new Intel systems. sched_clock_stable getting
> set to one in arch/x86/kernel/cpu/intel.c turned out to be the problem.

Like so? This change seems to fix the issue, but I'm going to continue to
hammer on it for the remainder of the day.

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 40e1835..e362517 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -71,7 +71,14 @@ static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
 		set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
 		set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
 		set_cpu_cap(c, X86_FEATURE_TSC_RELIABLE);
+#ifndef CONFIG_XEN
+		/*
+		 * Considering the clock stable in paravirt domUs under Xen
+		 * causes timing instability on certain Intel CPUs.
+		 */
 		sched_clock_stable = 1;
+#endif /* CONFIG_XEN */
 	}

 	/*

Regards,

Jed Smith
Systems Administrator
Linode, LLC
+1 (609) 593-7103 x1209
jed@linode.com
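One possible variation on this, sketched below and untested: because the guard
above is a compile-time #ifndef, a kernel built with CONFIG_XEN also loses the
stable-sched_clock path when it boots on bare metal. A runtime check would
only skip it inside an actual Xen guest. This assumes the xen_domain() helper
is available in this tree (it lives in asm/xen/hypervisor.h or xen/xen.h
depending on the kernel version, and intel.c may need the corresponding
#include):

--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ ... @@ static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
 		set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
 		set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
 		set_cpu_cap(c, X86_FEATURE_TSC_RELIABLE);
-		sched_clock_stable = 1;
+		/*
+		 * A paravirtualized guest's sched_clock is not necessarily
+		 * stable, so only claim stability when running natively.
+		 */
+		if (!xen_domain())
+			sched_clock_stable = 1;
 	}

Whether the compile-time or the runtime form is preferable mostly depends on
whether the same kernel image is expected to boot both natively and as a domU.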