Hi Jeremy -- In trying to understand some strange behavior I was seeing on a RHEL6b2 guest, I ran across an interesting anomaly, and it seems to be true on different upstream (pvops) kernels. At first I thought it was a result of the xen_sched_clock->xen_clocksource_read patch you recently posted, but after some testing this appears to be unrelated. The number of rdtsc/second goes up dramatically when there is CPU-intensive load on an upstream kernel! I know we both observed some cases where rdtsc/sec was very high, but I don''t think we ever were able to reproduce this consistently. First, this is a single vcpu, 64-bit 2.6.32 (RHEL6b2) kernel. I am observing ~300 rdtsc/sec on an idle VM. When I run a load of: main() {while(1);} I am observing about 10000 rdtsc/sec!! This is a CONFIG_HZ_1000=y kernel, so I would expect 1000 rdtsc/sec, or maybe 2000 rdtsc/sec, but 10000 makes me wonder if there is some hidden bug. Changing to vcpus=2, the above numbers roughly double... ~600 goes up to 20000 rdtsc/sec. So clearly it is some kind of per-vcpu activity. Any idea what kernel parts might be doing this? I can instrument the xen rdtsc code to collect some more information, but thought I''d get ideas from you first. Thanks, Dan P.S. Lest anyone panic about this, a rate of 10000 rdtsc/sec/vcpu should translate to ~1% performance. Not a catastrophe but still worth understanding. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2010-Jul-19 23:34 UTC
[Xen-devel] Re: rdtsc strangeness on upstream kernel
On 07/19/2010 04:19 PM, Dan Magenheimer wrote:> In trying to understand some strange behavior I was > seeing on a RHEL6b2 guest, I ran across an interesting > anomaly, and it seems to be true on different upstream > (pvops) kernels. At first I thought it was a result > of the xen_sched_clock->xen_clocksource_read patch > you recently posted, but after some testing this > appears to be unrelated. > > The number of rdtsc/second goes up dramatically when > there is CPU-intensive load on an upstream kernel! >Are you looking at rdtsc emulation traps when running a PV domain?> I know we both observed some cases where rdtsc/sec > was very high, but I don''t think we ever were able to > reproduce this consistently. >It would be interesting to compare that to the context switch rate (cs column in vmstat output) to see if they correlate. Also, how does it relate to timer interrupts?> First, this is a single vcpu, 64-bit 2.6.32 (RHEL6b2) > kernel. > > I am observing ~300 rdtsc/sec on an idle VM. When > I run a load of: > > main() {while(1);} > > I am observing about 10000 rdtsc/sec!! > > This is a CONFIG_HZ_1000=y kernel, so I would > expect 1000 rdtsc/sec, or maybe 2000 rdtsc/sec, > but 10000 makes me wonder if there is some hidden > bug. >Do you have preemption running? And why HZ=1000? Thanks, J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Dan Magenheimer
2010-Jul-19 23:51 UTC
[Xen-devel] RE: rdtsc strangeness on upstream kernel
> > The number of rdtsc/second goes up dramatically when > > there is CPU-intensive load on an upstream kernel! > > Are you looking at rdtsc emulation traps when running a PV domain?Yes, with "for i in {0..99}; do xm debug-key s; sleep 1; done".> > I know we both observed some cases where rdtsc/sec > > was very high, but I don''t think we ever were able to > > reproduce this consistently. > > It would be interesting to compare that to the context switch rate (cs > column in vmstat output) to see if they correlate.Doesn''t appear to be related. Vmstat continues to show cs at about 40/sec while the rdtsc/sec is very high.> Also, how does it relate to timer interrupts?With vcpus=2, timer0 goes up to about 2000 irq/sec on CPU0 and timer1 on CPU1 is the same while rdtsc/sec is about 20000. Is it possible that the timer interrupt is using rdtsc five times each interrupt?> > First, this is a single vcpu, 64-bit 2.6.32 (RHEL6b2) > > kernel. > > > > I am observing ~300 rdtsc/sec on an idle VM. When > > I run a load of: > > > > main() {while(1);} > > > > I am observing about 10000 rdtsc/sec!! > > > > This is a CONFIG_HZ_1000=y kernel, so I would > > expect 1000 rdtsc/sec, or maybe 2000 rdtsc/sec, > > but 10000 makes me wonder if there is some hidden > > bug. > > Do you have preemption running? And why HZ=1000?CONFIG_PREEMPT is off, though there are some other PREEMPT-related configs that are on. Are any others relevant? HZ=1000 is because this is a straight RHEL6b2 kernel, but running as a PV guest. Dan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Dan Magenheimer
2010-Jul-19 23:55 UTC
[Xen-devel] RE: rdtsc strangeness on upstream kernel
> With vcpus=2, timer0 goes up to about 2000 irq/sec on CPU0 > and timer1 on CPU1 is the same while rdtsc/sec is about 20000.No, wait, correct that...timer0 and timer1 are 1000 irq/sec. (I was using "watch -d" which defaults to updating every two seconds, not every second.)> Is it possible that the timer interrupt is using rdtsc > five times each interrupt?TEN times each interrupt.> -----Original Message----- > From: Dan Magenheimer > Sent: Monday, July 19, 2010 5:51 PM > To: Jeremy Fitzhardinge > Cc: xen-devel@lists.xensource.com > Subject: RE: rdtsc strangeness on upstream kernel > > > > The number of rdtsc/second goes up dramatically when > > > there is CPU-intensive load on an upstream kernel! > > > > Are you looking at rdtsc emulation traps when running a PV domain? > > Yes, with "for i in {0..99}; do xm debug-key s; sleep 1; done". > > > > I know we both observed some cases where rdtsc/sec > > > was very high, but I don''t think we ever were able to > > > reproduce this consistently. > > > > It would be interesting to compare that to the context switch rate > (cs > > column in vmstat output) to see if they correlate. > > Doesn''t appear to be related. Vmstat continues to show > cs at about 40/sec while the rdtsc/sec is very high. > > > Also, how does it relate to timer interrupts? > > With vcpus=2, timer0 goes up to about 2000 irq/sec on CPU0 > and timer1 on CPU1 is the same while rdtsc/sec is about 20000. > Is it possible that the timer interrupt is using rdtsc > five times each interrupt? > > > > First, this is a single vcpu, 64-bit 2.6.32 (RHEL6b2) > > > kernel. > > > > > > I am observing ~300 rdtsc/sec on an idle VM. When > > > I run a load of: > > > > > > main() {while(1);} > > > > > > I am observing about 10000 rdtsc/sec!! > > > > > > This is a CONFIG_HZ_1000=y kernel, so I would > > > expect 1000 rdtsc/sec, or maybe 2000 rdtsc/sec, > > > but 10000 makes me wonder if there is some hidden > > > bug. > > > > Do you have preemption running? And why HZ=1000? > > CONFIG_PREEMPT is off, though there are some other > PREEMPT-related configs that are on. Are any others > relevant? > > HZ=1000 is because this is a straight RHEL6b2 kernel, > but running as a PV guest. > > Dan_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2010-Jul-20 00:34 UTC
Re: [Xen-devel] RE: rdtsc strangeness on upstream kernel
On 07/19/2010 04:55 PM, Dan Magenheimer wrote:>> With vcpus=2, timer0 goes up to about 2000 irq/sec on CPU0 >> and timer1 on CPU1 is the same while rdtsc/sec is about 20000. >> > No, wait, correct that...timer0 and timer1 are 1000 irq/sec. > (I was using "watch -d" which defaults to updating every > two seconds, not every second.) > > >> Is it possible that the timer interrupt is using rdtsc >> five times each interrupt? >> > TEN times each interrupt. >Do you have any other debugging/tracing enabled? CONFIG_XEN_DEBUG_FS for example? 10 sounds a little high, but not necessarily completely unreasonable, especially if there''s tracing/debugging on. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Dan Magenheimer
2010-Jul-20 00:42 UTC
RE: [Xen-devel] RE: rdtsc strangeness on upstream kernel
> On 07/19/2010 04:55 PM, Dan Magenheimer wrote: > >> With vcpus=2, timer0 goes up to about 2000 irq/sec on CPU0 > >> and timer1 on CPU1 is the same while rdtsc/sec is about 20000. > >> > > No, wait, correct that...timer0 and timer1 are 1000 irq/sec. > > (I was using "watch -d" which defaults to updating every > > two seconds, not every second.) > > > > > >> Is it possible that the timer interrupt is using rdtsc > >> five times each interrupt? > >> > > TEN times each interrupt. > > > > Do you have any other debugging/tracing enabled? CONFIG_XEN_DEBUG_FS > for example? > > 10 sounds a little high, but not necessarily completely unreasonable, > especially if there''s tracing/debugging on.Yes, CONFIG_XEN_DEBUG_FS is on. I dug up some instrumentation I used in the hypervisor last fall which should identify the kernel locations of the rdtsc''s. I''ll plug that in and post the results tomorrow. Dan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2010-Jul-20 00:44 UTC
Re: [Xen-devel] RE: rdtsc strangeness on upstream kernel
On 07/19/2010 05:42 PM, Dan Magenheimer wrote:> Yes, CONFIG_XEN_DEBUG_FS is on. I dug up some instrumentation > I used in the hypervisor last fall which should identify the > kernel locations of the rdtsc''s. I''ll plug that in and > post the results tomorrow. >They''re probably all in the same place, so you''ll need to get some calling stack to make sense of it. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Dan Magenheimer
2010-Jul-23 16:47 UTC
RE: [Xen-devel] RE: rdtsc strangeness on upstream kernel
> On 07/19/2010 05:42 PM, Dan Magenheimer wrote: > > Yes, CONFIG_XEN_DEBUG_FS is on. I dug up some instrumentation > > I used in the hypervisor last fall which should identify the > > kernel locations of the rdtsc''s. I''ll plug that in and > > post the results tomorrow. > > They''re probably all in the same place, so you''ll need to get some > calling stack to make sense of it. > > JSorry this took awhile. I''ve been hacking and instrumenting and measuring and the results are not very good. I''m cc''ing Andrew Jones since things look worse on stock RHEL6b2. First, I''ve measured on both PV 2.6.35-rc4 ("upstream") and PV 2.6.32-37 ("rhel6b2"). I''ve attached the patch for rhel6b2 that I used for measurement; the same code fragments were used for upstream. Rhel6b2 had HZ=1000 and had vcpus=2. Upstream had HZ=250 and vcpus=1. Both had the "xen_sched_clock" call changed to xen_clocksource_read. The workload is simply "main() {while(1); }" run in background. On vcpus=2, I ran two in background. Both had CONFIG_XEN_FS_DEBUG set, but the workload should encounter few or no kernel spinlocks and the measurements confirmed this. Rhel6b2 had HIGH_RES_TIMERS and SCHED_HRTICK set; they were not set on upstream... don''t know if this matters. BOTTOM LINE: A cpu-loaded vcpus=2 HZ=1000 rhel6b2 guest had close to 20000 rdtsc/sec. Unloaded, the same guest had about 600 rdtsc/sec. A cpu-loaded HZ=250 upstream guest had about 4000 rdtsc/sec, regardless of whether it was loaded or not. Breakdown: I instrumented several of the primary sources of calls to native_read_tsc. No rdtsc instructions occurred, except the one in native_read_tsc. (Some small number are binary translated to call native_read_tsc.) The most frequent users of rdtsc: (A) sched_clock via xen_clocksource_read->pvclock_clocksource_read-> pvclock_nsec_offset->native_read_tsc (B) clock->read via xen_clocksource_read (C) set_next_event via xen_timerop_clockevent I measured (running the workload overnight) how frequently each of these paths resulted in a call to native_read_tsc. Each of these functions go through this path exactly once so this is also a frequency (per second) count for the callers. ALL 4672..18689 (A) 2149...8428 (~8*HZ) (B) 1350...9456 (~(4*HZ+?)*vcpus) (C) 524...2102 (~2*HZ) The measurement code outputs frequency every 2**20 samples and I did a cursory review to verify that the frequency samples were consistent. There is some small variance, which accounts for the fact that the numbers above (which were from one sample) don''t quite add up. So please consider them "trend indicators" rather than exact measurements. I also did verify the total rdtsc usage counts against the "xm debug-key s" output from the Xen console. Conclusions/questions: 1) Frequency of rdtsc use is much higher than expected, especially when HZ=1000. Sadly, the divider= kernel parameter may still be necessary for RHEL6b2 on older machines where Xen must do rdtsc emulation... and for any guest that has migrated (or will migrate?)... unless this high frequency identifies a fixable bug. 2) There is some fix in rhel6b2 that hides this when vcpus are inactive. Should this go upstream so idle guests don''t consume so much CPU? 3) Why is sched_clock called at 8*HZ?!? Why is clock->read called at 4*HZ (on every processor)? !? Why is set_next_event called at 2*HZ?!? Even with hardware rdtsc, this seems like a potential (though admittedly small) performance issue. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel