Dan Magenheimer
2009-Oct-29 20:28 UTC
[Xen-devel] tsc_scale/cpu_khz imprecise and need fixing?
I observed with the attached patch (on a machine with invariant TSC) that cstate_restore_tsc() has very poor precision; the value that is written to TSC on C3 recovery seems like it should be within a handful of cycles of being accurate (which on an invariant TSC can be precisely compared). Instead, it was off by 200,000 cycles or more!

This was counter-intuitive so I dug through the code a bit to see if there is an obvious bug. I *think* the reason is that tsc_scale, which I believe is set only once per processor at startup on machines with constant/invariant TSC, is set imprecisely using init_pit_and_calibrate_tsc(). I suspect the imprecision is compounded through the reciprocal operation. AND I wonder if an ill-timed power management event might render tsc_scale not just imprecise, but just plain wrong!

tsc_scale -- and cpu_khz which is tsc_scale/1000 -- are used in other places as well; one of interest to me is in hvm_gtsc_need_scale()... for TSC to work properly across certain migrations, this test needs to be very precise. There may be others.

Anyway, I wonder if there is a more precise way of determining the exact TSC Hz rate, particularly on machines with constant/invariant TSC? I found one such method in Examples 9-5 and 9-6 in:

http://www.intel.com/Assets/PDF/appnote/241618.pdf

Or maybe there's a better way using ACPI tables or cpufreq? And hopefully there's a method that can work for both AMD and Intel processors? If nothing else, we should probably pick up the latest Linux native_calibrate_tsc() code, which has grown considerably more complicated.

Thanks,
Dan

P.S. Note that to reproduce my test, the hpetbroadcast Xen boot parameter must be set, else no C3 events occur.

P.P.S. Yes, I realize that the write_tsc() on all processors in time_calibration_tsc_rendezvous() is intended to fix any synchronization lost by an imprecise setting in cstate_restore_tsc(). I was testing to see if the rendezvous'd write was really necessary (and perhaps counterproductive!) when I discovered this imprecision problem.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2009-Oct-29 22:13 UTC
[Xen-devel] Re: tsc_scale/cpu_khz imprecise and need fixing?
On 29/10/2009 20:28, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> I *think* the reason is that tsc_scale, which
> I believe is set only once per processor at startup
> on machines with constant/invariant TSC, is set
> imprecisely using init_pit_and_calibrate_tsc().
> I suspect the imprecision is compounded through
> the reciprocal operation. AND I wonder if an ill-timed
> power management event might render tsc_scale not
> just imprecise, but just plain wrong!

The 50ms calibration period may not be long enough, we could put the PIT in square-wave mode instead and count 10 50ms periods...

However this may not improve matters since the PIT may tick at quite a different rate than the stated frequency. A crystal can easily be 100ppm off from what's stamped on the can. Really we should calibrate the TSC to the platform timer that we choose to use. Perhaps we should update tsc_scale even for invariant tsc, just to fold in extra precision after boot. E.g.,

    tsc_scale = alpha*old_tsc_scale + (1-alpha)*(new_sys_time-old_sys_time)/(new_tsc-old_tsc)

and increase alpha towards 1 over time, as we develop trust in the value of tsc_scale. Where new/old tsc and system-time values would be across a calibration rendezvous period.

 -- Keir
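The weighted update Keir describes could be sketched roughly as follows. This is only an illustration, not Xen code: `struct tsc_cal`, `tsc_cal_update()`, `ALPHA_MAX` and the rule for raising alpha are all hypothetical, and a hypervisor would more likely use fixed-point arithmetic than `double`.

```c
#include <stdint.h>

/* Hypothetical per-CPU calibration state; not Xen's actual structures. */
struct tsc_cal {
    double scale_ns_per_tick; /* current estimate: nanoseconds per TSC tick */
    double alpha;             /* weight given to the old estimate, in [0, 1) */
};

/* Assumed cap so that new samples always keep a small influence. */
#define ALPHA_MAX 0.99

/*
 * Fold one calibration interval into the running estimate:
 *   scale = alpha*old_scale + (1-alpha)*(delta_sys_time/delta_tsc)
 * then move alpha part of the way towards ALPHA_MAX (an assumed schedule).
 */
static void tsc_cal_update(struct tsc_cal *c,
                           uint64_t old_sys_ns, uint64_t new_sys_ns,
                           uint64_t old_tsc, uint64_t new_tsc)
{
    /* Ignore degenerate samples rather than divide by zero. */
    if (new_tsc <= old_tsc || new_sys_ns <= old_sys_ns)
        return;

    double sample = (double)(new_sys_ns - old_sys_ns) /
                    (double)(new_tsc - old_tsc);

    c->scale_ns_per_tick = c->alpha * c->scale_ns_per_tick +
                           (1.0 - c->alpha) * sample;

    /* Halve the remaining distance to ALPHA_MAX on every update. */
    c->alpha += (ALPHA_MAX - c->alpha) * 0.5;
}
```

With alpha capped below 1 the estimate never fully stops tracking the platform timer; whether that is desirable, and how quickly alpha should grow, are open design choices.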
Dan Magenheimer
2009-Oct-29 23:00 UTC
RE: [Xen-devel] Re: tsc_scale/cpu_khz imprecise and need fixing?
> > I *think* the reason is that tsc_scale, which
> > I believe is set only once per processor at startup
> > on machines with constant/invariant TSC, is set
> > imprecisely using init_pit_and_calibrate_tsc().
> > I suspect the imprecision is compounded through
> > the reciprocal operation. AND I wonder if an ill-timed
> > power management event might render tsc_scale not
> > just imprecise, but just plain wrong!
>
> The 50ms calibration period may not be long enough, we could
> put the PIT in square-wave mode instead and count 10 50ms periods...

To support this, it appears that the value returned by init_pit_and_calibrate_tsc(), which is essentially "cpu_hz", varies by about 20K or more from boot to boot on the same hardware.

> However this may not improve matters since the PIT may tick at quite a
> different rate than the stated frequency. A crystal can easily be 100ppm off
> from what's stamped on the can. Really we should calibrate the TSC to the
> platform timer that we choose to use.

I suppose this is probably true for the crystal driving TSC as well. Which lowers any expectation of "matching" cpu_khz across a migration (in Xiantao's HVM approach).

> Perhaps we should update tsc_scale even for invariant tsc, just to fold
> in extra precision after boot. E.g.,
>   tsc_scale = alpha*old_tsc_scale +
>     (1-alpha)*(new_sys_time-old_sys_time)/(new_tsc-old_tsc)
> and increase alpha towards 1 over time, as we develop trust in the value of
> tsc_scale. Where new/old tsc and system-time values would be across a
> calibration rendezvous period.

Interesting. How do you "develop trust"? And is this guaranteed to converge? If it "flutters", it might make matters worse.

You may want to look at the new Linux code as it tries to choose the best result from a number of methods, and even tries to weed out SMIs:

http://lxr.linux.no/linux+v2.6.31/arch/x86/kernel/tsc.c#L399
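To put the figures in this exchange on a common scale, here is a small arithmetic sketch. The helper names are hypothetical and the 2.9 GHz reference frequency is only an assumed example value; on that assumption a 20,000 Hz boot-to-boot variation works out to roughly 6.9 ppm, well inside the 100ppm crystal tolerance mentioned above.

```c
#include <stdint.h>

/* Relative deviation of a measured frequency from a reference, in ppm. */
static double hz_error_ppm(uint64_t measured_hz, uint64_t reference_hz)
{
    double diff = (double)measured_hz - (double)reference_hz;
    return diff / (double)reference_hz * 1e6;
}

/*
 * TSC cycles of error accumulated when an interval of elapsed_ns is
 * converted using a frequency that is wrong by error_hz.
 */
static double cycles_error(double error_hz, uint64_t elapsed_ns)
{
    return error_hz * (double)elapsed_ns / 1e9;
}
```

For instance, `cycles_error(20000.0, 1000000000ULL)` gives 20,000 cycles for a one-second interval. This only shows how a frequency error scales with the interval being converted; it does not by itself account for the 200,000-cycle discrepancy reported at the start of the thread.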
Zhang, Xiantao
2009-Oct-30 01:55 UTC
[Xen-devel] RE: tsc_scale/cpu_khz imprecise and need fixing?
> Tsc_scale -- and cpu_khz which is tsc_scale/1000 --
> are used in other places as well; one of interest to
> me is in hvm_gtsc_need_scale()... for TSC to work
> properly across certain migrations, this test needs
> to be very precise. There may be others.

I remember Keir raised the same question before. The answer is that we can't use a more precise comparison for this decision, given real computing environments. We just use cpu_khz/1000 to determine whether TSC scaling is needed for migration. Processor vendors shouldn't ship two processors whose frequencies differ by less than 1 MHz. If we use a more precise comparison, it may lead to an incorrect decision. For example, if one machine's tsc_khz = 2900123 and another's tsc_khz = 2900124, I believe their real frequencies should be the same, and no TSC soft scaling is needed when migrating between them. But if the unit is kHz, we may get an incorrect decision due to calibration warp, leading to unnecessary performance loss. That is why we use the less precise MHz rather than kHz as the unit here. Make sense? :)

Xiantao
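The MHz-granularity comparison described here might look roughly like this. It is a sketch under assumptions: `tsc_need_scale()` is a hypothetical stand-in, and the real hvm_gtsc_need_scale() in Xen may take different arguments and differ in detail.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether a guest's TSC needs soft scaling
 * on this host by comparing the two frequencies truncated to whole MHz.
 */
static bool tsc_need_scale(uint32_t guest_khz, uint32_t host_khz)
{
    return (guest_khz / 1000) != (host_khz / 1000);
}
```

With the values from the example, `tsc_need_scale(2900123, 2900124)` is false, so no scaling is applied. One caveat of truncating division: two calibration results that straddle a MHz boundary, such as 2899999 and 2900000, still compare as different even though they are only 1 kHz apart.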
Zhang, Xiantao
2009-Oct-30 02:04 UTC
RE: [Xen-devel] Re: tsc_scale/cpu_khz imprecise and need fixing?
Dan Magenheimer wrote:
>>> I *think* the reason is that tsc_scale, which
>>> I believe is set only once per processor at startup
>>> on machines with constant/invariant TSC, is set
>>> imprecisely using init_pit_and_calibrate_tsc().
>>> I suspect the imprecision is compounded through
>>> the reciprocal operation. AND I wonder if an ill-timed
>>> power management event might render tsc_scale not
>>> just imprecise, but just plain wrong!
>>
>> The 50ms calibration period may not be long enough, we could
>> put the PIT in square-wave mode instead and count 10 50ms periods...
>
> To support this, it appears that the value returned
> by init_pit_and_calibrate_tsc(), which is essentially
> "cpu_hz", varies by about 20K or more from boot to boot
> on the same hardware.

That explains why we use cpu_khz/1000 (MHz) to determine whether TSC soft scaling is needed for migration between two machines. That is to say, even if two machines' tsc_khz values differ at kHz granularity (the calibration result), their real frequencies may be the same in the real world. :)

Xiantao