Adjust time init sequence percpu timer init for BSP happens within init_xen_time, while CMOS access at end may take up to 1s. This may make 1st trigger to calibration timer happens >1s interval and elapsed local stime for BSP also exceed 1s. However percpu timer init happens after that point for APs, which will then have a <1s elapsed local stime at 1st calibration. That leads to distinct mul_frac among cores, which can cause up to 6ms system time skew in the start. This is not big issue, since gradually xen calibration framework closes that gap. But it''s still better to avoid that big skew in early boot phase. Signed-off-by Kevin Tian <kevin.tian@intel.com> diff -r 85b2f4aafea4 xen/arch/x86/time.c --- a/xen/arch/x86/time.c Tue Dec 09 20:56:23 2008 -0500 +++ b/xen/arch/x86/time.c Tue Dec 09 22:21:07 2008 -0500 @@ -1095,7 +1095,7 @@ local_irq_save(flags); rdtscll(t->local_tsc_stamp); - now = !plt_src.read_counter ? 0 : read_platform_stime(); + now = read_platform_stime(); local_irq_restore(flags); t->stime_master_stamp = now; @@ -1118,12 +1118,13 @@ open_softirq(TIME_CALIBRATE_SOFTIRQ, local_time_calibration); - init_percpu_time(); + do_settime(get_cmos_time(), 0, 0); stime_platform_stamp = 0; init_platform_timer(); - do_settime(get_cmos_time(), 0, NOW()); + /* init bsp percpu time after platform timer initialized, similar as AP */ + init_percpu_time(); return 0; } _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Is it really safe to use NOW() before init_percpu_time()? Seems dodgy. Since calibration points are sync''ed across all CPUs on Xen 3.3 and unstable, trying to ensure that the first calibration period across all CPUs is equal is a lost cause isn''t it? -- Keir On 10/12/2008 13:10, "Tian, Kevin" <kevin.tian@intel.com> wrote:> Adjust time init sequence > > percpu timer init for BSP happens within init_xen_time, > while CMOS access at end may take up to 1s. This may > make 1st trigger to calibration timer happens >1s interval > and elapsed local stime for BSP also exceed 1s. However > percpu timer init happens after that point for APs, which > will then have a <1s elapsed local stime at 1st calibration. > That leads to distinct mul_frac among cores, which can > cause up to 6ms system time skew in the start. This is > not big issue, since gradually xen calibration framework > closes that gap. But it''s still better to avoid that big > skew in early boot phase. > > Signed-off-by Kevin Tian <kevin.tian@intel.com> > > diff -r 85b2f4aafea4 xen/arch/x86/time.c > --- a/xen/arch/x86/time.c Tue Dec 09 20:56:23 2008 -0500 > +++ b/xen/arch/x86/time.c Tue Dec 09 22:21:07 2008 -0500 > @@ -1095,7 +1095,7 @@ > > local_irq_save(flags); > rdtscll(t->local_tsc_stamp); > - now = !plt_src.read_counter ? 0 : read_platform_stime(); > + now = read_platform_stime(); > local_irq_restore(flags); > > t->stime_master_stamp = now; > @@ -1118,12 +1118,13 @@ > > open_softirq(TIME_CALIBRATE_SOFTIRQ, local_time_calibration); > > - init_percpu_time(); > + do_settime(get_cmos_time(), 0, 0); > > stime_platform_stamp = 0; > init_platform_timer(); > > - do_settime(get_cmos_time(), 0, NOW()); > + /* init bsp percpu time after platform timer initialized, similar as AP > */ > + init_percpu_time(); > > return 0; > } > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com] >Sent: Wednesday, December 10, 2008 9:49 PM > >Is it really safe to use NOW() before init_percpu_time()? Seems dodgy.Where did you mean by using NOW before init_percpu_time? I moved do_settime earlier but with a zero system stamp now which matches the line behind to init stime_platform_time to zero. To me there''s no difference to initialize wallclock at zero point or sometime after with a NOW() drift, which should cause similar result to wc_sec/wc_nsec.> >Since calibration points are sync''ed across all CPUs on Xen 3.3 and >unstable, trying to ensure that the first calibration period >across all CPUs >is equal is a lost cause isn''t it?Yes, it''s not a strong cause as I also mentioned in patch description. But if it can be addressed without hurting others, I think it still welcomed. Especially considering that only 2nd or 3rd calibration fully sync across all CPUs, that normally means dom0 will suffer a short bump in 1-2s at early boot phase. It''s better to eliminish such unconformtable factor as we don''t know whether some corner case is not considered. :-) Thanks, Kevin> > -- Keir > >On 10/12/2008 13:10, "Tian, Kevin" <kevin.tian@intel.com> wrote: > >> Adjust time init sequence >> >> percpu timer init for BSP happens within init_xen_time, >> while CMOS access at end may take up to 1s. This may >> make 1st trigger to calibration timer happens >1s interval >> and elapsed local stime for BSP also exceed 1s. However >> percpu timer init happens after that point for APs, which >> will then have a <1s elapsed local stime at 1st calibration. >> That leads to distinct mul_frac among cores, which can >> cause up to 6ms system time skew in the start. This is >> not big issue, since gradually xen calibration framework >> closes that gap. But it''s still better to avoid that big >> skew in early boot phase. >> >> Signed-off-by Kevin Tian <kevin.tian@intel.com> >> >> diff -r 85b2f4aafea4 xen/arch/x86/time.c >> --- a/xen/arch/x86/time.c Tue Dec 09 20:56:23 2008 -0500 >> +++ b/xen/arch/x86/time.c Tue Dec 09 22:21:07 2008 -0500 >> @@ -1095,7 +1095,7 @@ >> >> local_irq_save(flags); >> rdtscll(t->local_tsc_stamp); >> - now = !plt_src.read_counter ? 0 : read_platform_stime(); >> + now = read_platform_stime(); >> local_irq_restore(flags); >> >> t->stime_master_stamp = now; >> @@ -1118,12 +1118,13 @@ >> >> open_softirq(TIME_CALIBRATE_SOFTIRQ, local_time_calibration); >> >> - init_percpu_time(); >> + do_settime(get_cmos_time(), 0, 0); >> >> stime_platform_stamp = 0; >> init_platform_timer(); >> >> - do_settime(get_cmos_time(), 0, NOW()); >> + /* init bsp percpu time after platform timer >initialized, similar as AP >> */ >> + init_percpu_time(); >> >> return 0; >> } >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xensource.com >> http://lists.xensource.com/xen-devel > > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 11/12/2008 00:23, "Tian, Kevin" <kevin.tian@intel.com> wrote:>> Is it really safe to use NOW() before init_percpu_time()? Seems dodgy. > > Where did you mean by using NOW before init_percpu_time? > I moved do_settime earlier but with a zero system stamp now > which matches the line behind to init stime_platform_time to zero. > To me there''s no difference to initialize wallclock at zero point > or sometime after with a NOW() drift, which should cause similar > result to wc_sec/wc_nsec.init_platform_time() -> plt_overflow() -> NOW() Perhaps the above is safe though? Will NOW() return zero for an uninitialised per-cpu time sstructure (since stime_local_stamp and tsc_scale are both zero)? -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com] >Sent: Thursday, December 11, 2008 8:45 AM > >On 11/12/2008 00:23, "Tian, Kevin" <kevin.tian@intel.com> wrote: > >>> Is it really safe to use NOW() before init_percpu_time()? >Seems dodgy. >> >> Where did you mean by using NOW before init_percpu_time? >> I moved do_settime earlier but with a zero system stamp now >> which matches the line behind to init stime_platform_time to zero. >> To me there''s no difference to initialize wallclock at zero point >> or sometime after with a NOW() drift, which should cause similar >> result to wc_sec/wc_nsec. > >init_platform_time() -> plt_overflow() -> NOW() > >Perhaps the above is safe though? Will NOW() return zero for an >uninitialised per-cpu time sstructure (since stime_local_stamp >and tsc_scale >are both zero)? >I guess not, due to same reason as why I sent out 1st patch idle vcpu state entry. The point is the current TSC value, which count from power on and is translated to a dozens of seconds for elapsed time upon a zero tsc stamp. :-( I didn''t realize that point in the start... Thanks, Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>From: Tian, Kevin >Sent: Thursday, December 11, 2008 8:47 AM >> >>init_platform_time() -> plt_overflow() -> NOW() >> >>Perhaps the above is safe though? Will NOW() return zero for an >>uninitialised per-cpu time sstructure (since stime_local_stamp >>and tsc_scale >>are both zero)? >> > >I guess not, due to same reason as why I sent out 1st patch idle >vcpu state entry. The point is the current TSC value, which count >from power on and is translated to a dozens of seconds for elapsed >time upon a zero tsc stamp. :-( I didn''t realize that point in >the start... >The net effect of first plt_overflow is to take current count of platform time as init stamps, and thus following simple change IMO would be acceptable. Of course this leaves to you to decide whether it''s worthy change. :-) ---- Adjust time init sequence percpu timer init for BSP happens within init_xen_time, while CMOS access at end may take up to 1s. This may make 1st trigger to calibration timer happens >1s interval and elapsed local stime for BSP also exceed 1s. However percpu timer init happens after that point for APs, which will then have a <1s elapsed local stime at 1st calibration. That leads to distinct mul_frac among cores, which can cause up to 6ms system time skew in the start. This is not big issue, since gradually xen calibration framework closes that gap. But it''s still better to avoid that big skew in early boot phase. Signed-off-by Kevin Tian <kevin.tian@intel.com> diff -r f8ccab7036c7 xen/arch/x86/time.c --- a/xen/arch/x86/time.c Wed Dec 10 16:48:10 2008 -0500 +++ b/xen/arch/x86/time.c Wed Dec 10 17:39:27 2008 -0500 @@ -656,10 +656,9 @@ 1ull << (pts->counter_bits-1), &plt_scale); init_timer(&plt_overflow_timer, plt_overflow, NULL, 0); plt_src = *pts; - plt_overflow(NULL); - platform_timer_stamp = plt_stamp64; - + plt_stamp = pts->read_counter(); + platform_timer_stamp = plt_stamp64 = (plt_stamp & plt_mask); printk("Platform timer is %s %s\n", freq_string(pts->frequency), pts->name); } @@ -1109,7 +1108,7 @@ local_irq_save(flags); rdtscll(t->local_tsc_stamp); - now = !plt_src.read_counter ? 0 : read_platform_stime(); + now = read_platform_stime(); local_irq_restore(flags); t->stime_master_stamp = now; @@ -1132,13 +1131,15 @@ open_softirq(TIME_CALIBRATE_SOFTIRQ, local_time_calibration); - init_percpu_time(); + do_settime(get_cmos_time(), 0, 0); stime_platform_stamp = 0; init_platform_timer(); - do_settime(get_cmos_time(), 0, NOW()); + init_percpu_time(); + /* Make sure NOW used after percpu time init */ + set_timer(&plt_overflow_timer, NOW() + plt_overflow_period); return 0; } _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 11/12/2008 00:47, "Tian, Kevin" <kevin.tian@intel.com> wrote:> I guess not, due to same reason as why I sent out 1st patch idle > vcpu state entry. The point is the current TSC value, which count > from power on and is translated to a dozens of seconds for elapsed > time upon a zero tsc stamp. :-( I didn''t realize that point in the start...Ah, because it''s set up in early_time_init(). By the way, instead of avoiding NOW() early on, could we just set local_tsc_stamp in early_time_init()? Then we could use NOW() when initialising idle VCPUs, and also early on in init_xen_time()? We could set stime_platform_stamp = NOW() too, so that platform time is kicked off following BP''s time. I could send a patch which I find tasteful if you think this could work? :-) -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com] >Sent: Thursday, December 11, 2008 5:05 PM > >On 11/12/2008 00:47, "Tian, Kevin" <kevin.tian@intel.com> wrote: > >> I guess not, due to same reason as why I sent out 1st patch idle >> vcpu state entry. The point is the current TSC value, which count >> from power on and is translated to a dozens of seconds for elapsed >> time upon a zero tsc stamp. :-( I didn''t realize that point >in the start... > >Ah, because it''s set up in early_time_init(). > >By the way, instead of avoiding NOW() early on, could we just set >local_tsc_stamp in early_time_init()? Then we could use NOW() when >initialising idle VCPUs, and also early on in init_xen_time()? > >We could set stime_platform_stamp = NOW() too, so that platform time is >kicked off following BP''s time. > >I could send a patch which I find tasteful if you think this >could work? :-) >Sure, I think that should work too. :-) Thanks, Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 11/12/2008 09:04, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:> By the way, instead of avoiding NOW() early on, could we just set > local_tsc_stamp in early_time_init()? Then we could use NOW() when > initialising idle VCPUs, and also early on in init_xen_time()? > > We could set stime_platform_stamp = NOW() too, so that platform time is > kicked off following BP''s time. > > I could send a patch which I find tasteful if you think this could work? :-)It turned out pretty simple, so I checked it in as c/s 18909. Let me know how that works for you. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com] >Sent: Thursday, December 11, 2008 9:12 PM > >On 11/12/2008 09:04, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote: > >> By the way, instead of avoiding NOW() early on, could we just set >> local_tsc_stamp in early_time_init()? Then we could use NOW() when >> initialising idle VCPUs, and also early on in init_xen_time()? >> >> We could set stime_platform_stamp = NOW() too, so that >platform time is >> kicked off following BP''s time. >> >> I could send a patch which I find tasteful if you think this >could work? :-) > >It turned out pretty simple, so I checked it in as c/s 18909. >Let me know >how that works for you. >Unfortunately it doesn''t work, and gave me a TOD to year 2185. After checking the code, record tsc stamp in early time init is nullified by later synchronize_tsc_bp which reset TSC to zero. Then later invocation to get_s_time gets a negative value which is converted to an extremely large system time. A temp workaround is to move rdtscll(t->local_tsc_stamp) into calibrate_tsc_bp, which gives a sane NOW() before percpu time init. But then the purpose behind is ambiguous... why would we want to count system time from this point? If we can''t count system time starting from power on, it looks clearer to tag system time as 0 when initializing wc_sec/wc_nsec by do_settime. Actually the starting point of xen system time is not that critical since it''s mostly used by relative progress. Then my previous updated patch can be used? :-) Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 12/12/2008 03:40, "Tian, Kevin" <kevin.tian@intel.com> wrote:> A temp workaround is to move rdtscll(t->local_tsc_stamp) into > calibrate_tsc_bp, which gives a sane NOW() before percpu time > init. But then the purpose behind is ambiguous... why would we > want to count system time from this point? If we can''t count > system time starting from power on, it looks clearer to tag system > time as 0 when initializing wc_sec/wc_nsec by do_settime. > Actually the starting point of xen system time is not that critical > since it''s mostly used by relative progress. Then my previous > updated patch can be used? :-)How about rdtscll(t->local_tsc_stamp) at the top of init_xen_time(), and remove the one I added to early_time_init()? That would allow NOW() usage at least in init_xen_time(), and be later than the TSC reset. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>>> Keir Fraser <keir.fraser@eu.citrix.com> 12.12.08 10:08 >>> >On 12/12/2008 03:40, "Tian, Kevin" <kevin.tian@intel.com> wrote: > >> A temp workaround is to move rdtscll(t->local_tsc_stamp) into >> calibrate_tsc_bp, which gives a sane NOW() before percpu time >> init. But then the purpose behind is ambiguous... why would we >> want to count system time from this point? If we can''t count >> system time starting from power on, it looks clearer to tag system >> time as 0 when initializing wc_sec/wc_nsec by do_settime. >> Actually the starting point of xen system time is not that critical >> since it''s mostly used by relative progress. Then my previous >> updated patch can be used? :-) > >How about rdtscll(t->local_tsc_stamp) at the top of init_xen_time(), and >remove the one I added to early_time_init()? That would allow NOW() usage at >least in init_xen_time(), and be later than the TSC reset.Perhaps it should be considered to switch to the non-resetting synchronization logic in x86-64 Linux up to 2.6.20 (according to the comments there derived from ia64), or even the current version that doesn''t re-write the TSC at all? Jan _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 12/12/2008 10:10, "Jan Beulich" <jbeulich@novell.com> wrote:>> How about rdtscll(t->local_tsc_stamp) at the top of init_xen_time(), and >> remove the one I added to early_time_init()? That would allow NOW() usage at >> least in init_xen_time(), and be later than the TSC reset. > > Perhaps it should be considered to switch to the non-resetting > synchronization logic in x86-64 Linux up to 2.6.20 (according to the > comments there derived from ia64), or even the current version that > doesn''t re-write the TSC at all?Quite agreeable, albeit a bigger patch. ;-) -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
>From: Keir Fraser [mailto:keir.fraser@eu.citrix.com] >Sent: Friday, December 12, 2008 6:38 PM > >On 12/12/2008 10:10, "Jan Beulich" <jbeulich@novell.com> wrote: > >>> How about rdtscll(t->local_tsc_stamp) at the top of >init_xen_time(), and >>> remove the one I added to early_time_init()? That would >allow NOW() usage at >>> least in init_xen_time(), and be later than the TSC reset. >> >> Perhaps it should be considered to switch to the non-resetting >> synchronization logic in x86-64 Linux up to 2.6.20 (according to the >> comments there derived from ia64), or even the current version that >> doesn''t re-write the TSC at all? > >Quite agreeable, albeit a bigger patch. ;-) >Current Linux doesn''t re-write TSC but marking it as unstable and then may use alternative source if available. For Xen, tsc is one basic wheel to drive both pv and hvm time virtualization. By not re-writing TSCs on those platforms with unsynchrozed TSC, it may cause time issue since currently hvm guest doesn''t adjust guest TSC offset across vcpu migration. Then if above change is pursued, first we''d better add guest TSC offset adjustment to avoid cause regression on those platforms? Also do you know the reason why Linux doesn''t re-write TSC nowadays? I recalled some thread seen before that such re-write may cause time warp on some platforms, but not sure about the detail. Did we ever see similar issue caused by re-write in Xen before? Thanks Kevin _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel