> On 26/3/07 19:50, "Ian Pratt" <Ian.Pratt@xxxxxxxxxxxx> wrote:
>
>> On your system it appears to be a couple of microseconds out, which is
>> on the high side of what we've observed. Normally you only see that
>> kind of mismatch on systems with TSCs running off different crystals.
>
> More likely a jittery chipset timer -- we've observed less-than-ideal
> stability from some chipset timers, which can throw us off a bit when
> independently sync'ing the TSCs (which each CPU does for its TSC
> independently every couple of seconds).
>
> -- Keir

Sorry, a little slow on responding here; it only took a year ;-)

Where is the code that does this independent TSC sync'ing? I see code in
smpboot.c that seems to do this at startup (though I admit I haven't yet
figured out exactly how... it looks like some kind of rendezvous loop
triggered by the BP?). But I don't see where/how this gets called "every
couple of seconds", nor do I see any writing to the TSC (except setting
the BP and each AP to zero at startup).

Thanks,
Dan

==================================
If Xen could save time in a bottle / then clocks wouldn't virtually skew /
It would save every tick / for VMs that aren't quick /
and Xen then would send them anew
(with apologies to the late great Jim Croce)
On 8/4/08 17:34, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> Sorry, a little slow on responding here; it only took a year ;-)
>
> Where is the code that does this independent TSC sync'ing? I see
> code in smpboot.c that seems to do this at startup (though I admit
> I haven't yet figured out exactly how... it looks like some kind of
> rendezvous loop triggered by the BP?). But I don't see where/how
> this gets called "every couple of seconds", nor do I see any writing
> to the TSC (except setting the BP and each AP to zero at startup).

arch/x86/time.c:local_time_calibration()

 -- Keir
> > Where is the code that does this independent TSC sync'ing? I see
> > code in smpboot.c that seems to do this at startup (though I admit
> > I haven't yet figured out exactly how... it looks like some kind of
> > rendezvous loop triggered by the BP?). But I don't see where/how
> > this gets called "every couple of seconds", nor do I see any writing
> > to the TSC (except setting the BP and each AP to zero at startup).
>
> arch/x86/time.c:local_time_calibration()

OK, thanks. If I read the code correctly, Xen goes through this effort
to ensure that the TSCs are synchronized, but maintains this
synchronization in a data structure and doesn't actually change each
processor's physical TSC. Correct?

This is of course just fine for the hypervisor's timer needs (and thus
indirectly for paravirtualized domains).

But I also observe that all of the hvm platform timer (pit, hpet, and
pmtimer) code is built on top of the physical TSC plus the vmx/svm
tsc_offset, which doesn't seem to be affected by the Xen TSC
synchronization. True?

So, assuming the above isn't mistaken, hvm domain reads of the platform
timer on an SMP system lacking hardware-synchronized TSCs may suffer
from non-monotonicity. Correct?

Thanks,
Dan
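A minimal sketch of the mechanism, using illustrative names rather than
Xen's exact fields: each CPU keeps a calibration record -- a TSC stamp,
the system time at that stamp, and a TSC-to-nanoseconds scale refreshed
by local_time_calibration() -- and local time is interpolated from the
record; the hardware TSC itself is never rewritten after boot.

    /* Sketch only -- illustrative names, not Xen's exact structures.
     * Assumes a rdtsc() helper returning the raw 64-bit counter. */
    struct cpu_time {
        uint64_t tsc_stamp;       /* TSC at last calibration */
        uint64_t stime_stamp;     /* system time (ns) at last calibration */
        uint32_t tsc_to_ns_mul;   /* scale refreshed each calibration pass */
        int      tsc_to_ns_shift;
    };

    static uint64_t local_system_time(const struct cpu_time *t)
    {
        uint64_t delta = rdtsc() - t->tsc_stamp;
        /* Xen does this multiply with a 128-bit intermediate to avoid
         * overflow; elided here for brevity. */
        return t->stime_stamp +
               ((delta * t->tsc_to_ns_mul) >> t->tsc_to_ns_shift);
    }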
> From: Dan Magenheimer
> Sent: April 9, 2008 1:40
>
> But I also observe that all of the hvm platform timer (pit, hpet, and
> pmtimer) code is built on top of the physical TSC plus the vmx/svm
> tsc_offset, which doesn't seem to be affected by the Xen TSC
> synchronization. True?

For cpus on the same system bus driven by one crystal, the TSC drift
among cpus may be just dozens of cycles after boot-time sync, which is
negligible compared to migration overhead, and thus an HVM guest is
unlikely to observe non-monotonic behavior when it resumes after such a
migration.

The issue comes with cpus running at different frequencies, e.g. driven
by multiple crystals, or with on-demand frequency changes that affect
the TSC too. An HVM guest can be configured to avoid migrating among
cpus with different TSC frequencies, e.g. by limiting its cpu affinity
to cpus on the same system bus. Or you have to configure the HVM guest
not to trust the TSC...

Thanks,
Kevin
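For reference, the pinning Kevin describes can be done from the domain
config file; a hedged example -- the exact cpu numbers are of course
machine-specific:

    # keep all of this guest's vcpus on pcpus 0-3 (e.g. one socket/bus)
    cpus = "0-3"

or at runtime with "xm vcpu-pin <domain> <vcpu|all> <cpus>".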
> > But I also observe that all of the hvm platform timer (pit, hpet, and
> > pmtimer) code is built on top of the physical TSC plus the vmx/svm
> > tsc_offset, which doesn't seem to be affected by the Xen TSC
> > synchronization. True?
>
> For cpus on the same system bus driven by one crystal, the TSC drift
> among cpus may be just dozens of cycles after boot-time sync, which is
> negligible compared to migration overhead, and thus an HVM guest is
> unlikely to observe non-monotonic behavior when it resumes after such
> a migration.

I agree this case is not much of a problem.

> The issue comes with cpus running at different frequencies, e.g. driven
> by multiple crystals, or with on-demand frequency changes that affect
> the TSC too. An HVM guest can be configured to avoid migrating among
> cpus with different TSC frequencies, e.g. by limiting its cpu affinity
> to cpus on the same system bus.

These are the cases I am worried about. The linux kernel seems to have
a number of cases that mark the TSC as unstable, but Xen does not, nor
(I think) does Xen expose this information anywhere. So it seems SMP
guests need to be pinned to physical CPUs that are measured to have
sync'ed TSCs to guarantee that the (virtual) platform timer is
monotonic.

> Or you have to configure the HVM guest not to trust the TSC...

Yes, that's what I'm thinking... like Linux, Xen could/should build
virtual platform timers on a physical clocksource other than the tsc if
all of the potential vcpu->pcpu mappings are not on sync'd-TSC pcpus.

I assume this problem is worse with multi-socket HyperTransport and
future Intel QPI boxes? Or is the TSC (and frequency changing)
synchronized on such systems?

Thanks,
Dan
> From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
> Sent: April 9, 2008 9:55
>
> > Or you have to configure the HVM guest not to trust the TSC...
>
> Yes, that's what I'm thinking... like Linux, Xen could/should build
> virtual platform timers on a physical clocksource other than the tsc
> if all of the potential vcpu->pcpu mappings are not on sync'd-TSC
> pcpus.

Virtual platform timers are only one area. The most important is the
TSC itself, which guests use frequently to calculate relative
offsets...

> I assume this problem is worse with multi-socket HyperTransport and
> future Intel QPI boxes? Or is the TSC (and frequency changing)
> synchronized on such systems?

For the same-crystal case, Intel processors with VT-x support all have
the constant-TSC feature, under which the TSC rate is not bound to
frequency changes; it can be detected by CPUID. But for the
multiple-crystals case, Xen may then need to tackle affinity.

Thanks,
Kevin
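A minimal detection sketch for the feature Kevin mentions -- the
invariant ("constant") TSC indication lives in CPUID leaf 0x80000007,
EDX bit 8; a robust version would first confirm that leaf is supported
via leaf 0x80000000:

    #include <stdint.h>

    /* Sketch: detect invariant TSC via CPUID.80000007H:EDX[8]. */
    static int has_invariant_tsc(void)
    {
        uint32_t eax, ebx, ecx, edx;
        asm volatile ("cpuid"
                      : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
                      : "a" (0x80000007u));
        return (edx >> 8) & 1;
    }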
> > The issue comes with cpus running at different frequencies, e.g.
> > driven by multiple crystals, or with on-demand frequency changes
> > that affect the TSC too. An HVM guest can be configured to avoid
> > migrating among cpus with different TSC frequencies, e.g. by
> > limiting its cpu affinity to cpus on the same system bus.
>
> These are the cases I am worried about. The linux kernel seems to have
> a number of cases that mark the TSC as unstable, but Xen does not, nor
> (I think) does Xen expose this information anywhere. So it seems SMP
> guests need to be pinned to physical CPUs that are measured to have
> sync'ed TSCs to guarantee that the (virtual) platform timer is
> monotonic.

Xen itself copes fine with CPUs running from entirely independent clock
sources. It calibrates the TSCs' frequencies against a global clock
(e.g. the hpet).

> > Or you have to configure the HVM guest not to trust the TSC...
>
> Yes, that's what I'm thinking... like Linux, Xen could/should build
> virtual platform timers on a physical clocksource other than the tsc
> if all of the potential vcpu->pcpu mappings are not on sync'd-TSC
> pcpus.

Although Xen is fine, guests can get confused if they're relying on the
TSC. Fortunately, Windows doesn't rely on the TSC, and most folk run
Linux PV, which also works fine.

If you want to make Linux work HVM on such a system you need to either
convince it not to use the TSC, or arrange for TSC reads to trap to Xen
and then compute the result based on Xen's time base. If you're doing
the latter, better hope that TSC reads aren't frequent...

Ian
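To make the trap-and-emulate option concrete, a rough sketch of the
exit-handler side; hvm_get_guest_time() and advance_guest_rip() here
stand in for whatever helpers the real HVM code provides:

    /* Sketch: on an RDTSC vmexit, answer from Xen's time base
     * instead of the raw hardware TSC. */
    void handle_rdtsc_exit(struct vcpu *v, struct cpu_user_regs *regs)
    {
        uint64_t t = hvm_get_guest_time(v); /* derived from system time */
        regs->eax = (uint32_t)t;
        regs->edx = (uint32_t)(t >> 32);
        advance_guest_rip(v);               /* skip the 2-byte rdtsc */
    }

Every guest rdtsc then costs a full vmexit -- thousands of cycles --
which is exactly the "better hope TSC reads aren't frequent" caveat.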
> Although Xen is fine, guests can get confused if they're relying on
> the TSC. Fortunately, Windows doesn't rely on the TSC, and most folk
> run Linux PV, which also works fine.
>
> If you want to make Linux work HVM on such a system you need to either
> convince it not to use the TSC, or arrange for TSC reads to trap to
> Xen and then compute the result based on Xen's time base. If you're
> doing the latter, better hope that TSC reads aren't frequent...

Hi Ian --

Let me clarify... unless my reading of the code is wrong, ALL hvm
guests that rely on ANY (virtual) platform timer are UNKNOWINGLY
relying on the physical TSCs. Thus if the underlying physical system
has unsynchronized TSCs, different vcpus in an SMP HVM guest (or even
the SAME vcpu when rescheduled on another pcpu) may find that
consecutive reads of ANY (virtual) platform timer are unexpectedly
non-monotonic, which violates the whole purpose of using a PLATFORM
timer.

I suspect this is unintended and bad?

Thanks,
Dan
On 9/4/08 15:25, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> Let me clarify... unless my reading of the code is wrong, ALL hvm
> guests that rely on ANY (virtual) platform timer are UNKNOWINGLY
> relying on the physical TSCs. Thus if the underlying physical system
> has unsynchronized TSCs, different vcpus in an SMP HVM guest (or even
> the SAME vcpu when rescheduled on another pcpu) may find that
> consecutive reads of ANY (virtual) platform timer are unexpectedly
> non-monotonic, which violates the whole purpose of using a PLATFORM
> timer.

This is all true. The logic in vpt.c should be fixed to use Xen's
concept of system time, and everything, guest TSC included, should be
derived from that.

 -- Keir
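Sketched out, that fix amounts to giving each domain an offset against
Xen system time and deriving every virtual timer from that one value;
names here are illustrative:

    /* Sketch: all virtual platform timers tick off Xen system time. */
    static uint64_t hvm_guest_time(struct vcpu *v)
    {
        /* NOW() is Xen's calibrated system time in nanoseconds. */
        return NOW() + v->domain->guest_time_offset;
    }

A virtual HPET/PIT/PM-timer read then scales this single value to its
own frequency, rather than doing rdtsc() plus the vmx/svm tsc_offset.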
> > Let me clarify... unless my reading of the code is wrong, ALL hvm
> > guests that rely on ANY (virtual) platform timer are UNKNOWINGLY
> > relying on the physical TSCs. Thus if the underlying physical
> > system has unsynchronized TSCs, different vcpus in an SMP HVM guest
> > (or even the SAME vcpu when rescheduled on another pcpu) may find
> > that consecutive reads of ANY (virtual) platform timer are
> > unexpectedly non-monotonic, which violates the whole purpose of
> > using a PLATFORM timer.
>
> This is all true. The logic in vpt.c should be fixed to use Xen's
> concept of system time, and everything, guest TSC included, should be
> derived from that.

Does Xen's concept of system time have sufficient resolution and
continuity to ensure both monotonicity and a reasonable guest timer
granularity? I'm thinking not; some form of interpolation will probably
be necessary, which will require reading a physical platform timer**
(i.e. other than the tsc).

Since a guest that is presented with a (virtual) platform timer of a
given resolution may come to rely on both the monotonicity AND
resolution of that timer, I'm beginning to understand why "that other
virtualization company" doesn't virtualize HPET.

Dan

** Lest anyone say "well then just read the d**n platform timer", be
aware that it must be done judiciously, as it can be very expensive: on
one recent-vintage box I have, I measured reading the HPET at about
10000 cycles and reading the PIT at about 50000! So if every vcpu on
every guest reads the (virtual) platform timer at 1000Hz, things can
get ugly fast.
On 9/4/08 17:33, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>> This is all true. The logic in vpt.c should be fixed to use Xen's
>> concept of system time, and everything, guest TSC included, should
>> be derived from that.
>
> Does Xen's concept of system time have sufficient resolution and
> continuity to ensure both monotonicity and a reasonable guest timer
> granularity? I'm thinking not; some form of interpolation will
> probably be necessary, which will require reading a physical platform
> timer** (i.e. other than the tsc).

Xen's system time provides nanosecond precision and is intended to be
as accurate as the underlying platform timer (over long periods) and as
granular and accurate as the TSC over sub-second periods. It's quite
good enough for any guest purposes.

> Since a guest that is presented with a (virtual) platform timer of a
> given resolution may come to rely on both the monotonicity AND
> resolution of that timer, I'm beginning to understand why "that other
> virtualization company" doesn't virtualize HPET.

The HPET is a good example of the difference between precision and
accuracy. It may report its period in picoseconds, but the spec allows
drift of 100s of ppm.

 -- Keir
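To put rough numbers on that accuracy point, skew accumulates as the
frequency error times the free-running interval:

    skew = rate_error x interval
    e.g.    1 ppm x 1 s =   1 us
          100 ppm x 1 s = 100 us

So over a one-second free-running period, a good crystal wanders by
about a microsecond, while a platform timer at the "100s of ppm" end of
the spec can wander by 100us or more.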
> >> This is all true. The logic in vpt.c should be fixed to use Xen's
> >> concept of system time, and everything, guest TSC included, should
> >> be derived from that.
> >
> > Does Xen's concept of system time have sufficient resolution and
> > continuity to ensure both monotonicity and a reasonable guest timer
> > granularity? I'm thinking not; some form of interpolation will
> > probably be necessary, which will require reading a physical
> > platform timer** (i.e. other than the tsc).
>
> Xen's system time provides nanosecond precision and is intended to be
> as accurate as the underlying platform timer (over long periods) and
> as granular and accurate as the TSC over sub-second periods. It's
> quite good enough for any guest purposes.

OK, as long as the maximum uncorrected drift between physical TSCs does
not exceed the guest-expected granularity of its virtual platform
timer, I agree it's good enough.

It appears that the TSC drift for each pcpu is corrected by Xen once
per second. Any idea, for real systems out there, what the maximum
drift (per second) is? Will this be affected by existing or future
power-saving designs (e.g. is it possible for the TSCs in one socket to
be slowed down while the TSCs in another socket are not)? If so, as
Kevin points out, some kind of affinity enforcement might be necessary
for time-sensitive VMs.
On 9/4/08 19:36, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> OK, as long as the maximum uncorrected drift between physical TSCs
> does not exceed the guest-expected granularity of its virtual platform
> timer, I agree it's good enough.

Ignoring power-saving events, TSCs are crystal-driven and hence we can
expect a specified tolerance of a few ppm across temperature extremes,
and in practice over few-second periods I would expect tolerance of
better than 1ppm. *However*, I have seen platform timers (which also
should be crystal-driven) which inexplicably exhibit much worse
behaviour.

> It appears that the TSC drift for each pcpu is corrected by Xen once
> per second. Any idea, for real systems out there, what the maximum
> drift (per second) is? Will this be affected by existing or future
> power-saving designs (e.g. is it possible for the TSCs in one socket
> to be slowed down while the TSCs in another socket are not)? If so,
> as Kevin points out, some kind of affinity enforcement might be
> necessary for time-sensitive VMs.

Xen is notified of P-state changes, so we can re-sync the local TSC
immediately. The tricky ones are unannounced thermal events, because
software does not get informed about those. On some systems we can turn
them off; on others (new Intel platforms) the TSC is constant-rate
regardless. In a normal running system thermal events are rare.

 -- Keir
> > OK, as long as the maximum uncorrected drift between physical TSCs
> > does not exceed the guest-expected granularity of its virtual
> > platform timer, I agree it's good enough.
>
> Ignoring power-saving events, TSCs are crystal-driven and hence we can
> expect a specified tolerance of a few ppm across temperature extremes,
> and in practice over few-second periods I would expect tolerance of
> better than 1ppm. *However*, I have seen platform timers (which also
> should be crystal-driven) which inexplicably exhibit much worse
> behaviour.

OK... back to monotonicity for a moment: regardless of ppms and thermal
and P-state events and drifts, are you confident that the current
corrected-tsc mechanism will never see time going backwards in the
following test? (Apologies for the pseudo-code, but I hope you get the
drift... pun intended.)

    volatile uint64_t val1;
    volatile int proceed = 0;
    spinlock_t lock;

    /* Guest thread 1 */
    spin_lock(&lock);
    val1 = read_hpet();
    proceed = 1;
    spin_unlock(&lock);

    /* Guest thread 2 */
    uint64_t val2;
    while (!proceed)
        ;                        /* wait for val1 to be published */
    spin_unlock_wait(&lock);     /* let thread 1 drop the lock first */
    val2 = read_hpet();
    if (val2 < val1)
        PANIC();                 /* the (virtual) hpet went backwards */

If you are not confident that this will be OK on existing and
(within-reason) future Xen platforms, perhaps the hvm virtual platform
timers should (at least optionally) be built on physical platform
timers (Dave Winchell cc'ed), which would ensure time never goes
backwards.

> > It appears that the TSC drift for each pcpu is corrected by Xen once
> > per second. Any idea, for real systems out there, what the maximum
> > drift (per second) is? Will this be affected by existing or future
> > power-saving designs (e.g. is it possible for the TSCs in one socket
> > to be slowed down while the TSCs in another socket are not)? If so,
> > as Kevin points out, some kind of affinity enforcement might be
> > necessary for time-sensitive VMs.
>
> Xen is notified of P-state changes, so we can re-sync the local TSC
> immediately. The tricky ones are unannounced thermal events, because
> software does not get informed about those. On some systems we can
> turn them off; on others (new Intel platforms) the TSC is
> constant-rate regardless. In a normal running system thermal events
> are rare.

If it is possible to write code that can determine at boot-time (or at
hotplug cpu_online) which CPUs are guaranteed-sync'ed with which other
CPUs, it would be nice if this information were exported by Xen so that
tools can manage very-time-sensitive guests appropriately. Personally,
I think this code should be provided by the CPU vendors ;-)

Dan
On 10/4/08 22:27, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> If you are not confident that this will be OK on existing and
> (within-reason) future Xen platforms, perhaps the hvm virtual platform
> timers should (at least optionally) be built on physical platform
> timers (Dave Winchell cc'ed), which would ensure time never goes
> backwards.

If we wanted to be more certain we could maintain a last_system_time
field per VCPU and, whenever using system time to compute the current
value of a virtual timer for an HVM VCPU, actually use max(system time,
last_system_time). This would mean we were 100% sure that time didn't
go backwards, by turning small backwards deltas into very short periods
of stalled time.

As it is: no, since system time 'free runs' on each CPU over one-second
periods, there can be drift between CPUs if they are driven by
different oscillators. Also there are tolerances in our software
calibration code to consider. This is why Linux guests implement
max(curr time, last time) in their gettimeofday() code. It would be
quite reasonable to do the same, inside Xen, for HVM guests. We can at
least be pretty certain that any drifts across CPUs/VCPUs will be on
the order of less than 100us.

 -- Keir
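A sketch of that clamp, with an illustrative field name:

    /* Sketch: never let an HVM VCPU observe time moving backwards;
     * small negative deltas become brief stalls instead. */
    static uint64_t hvm_monotonic_time(struct vcpu *v)
    {
        uint64_t now = NOW();           /* Xen system time, ns */
        if ( now < v->last_system_time )
            now = v->last_system_time;  /* stall briefly */
        else
            v->last_system_time = now;
        return now;
    }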
> If we wanted to be more certain we could maintain a last_system_time
> field per VCPU and...

If you mean per VCPU *and* per guest, this seems like a good idea.

> ...backwards, by turning small backwards deltas into very short
> periods of stalled time.

The stalled time may be a problem, but only if the tsc skew between
processors is "bad". Your estimate of 100us seems like it could be
unacceptable for some applications.

Any idea how expensive arch/x86/time.c:local_time_calibration() is? If
it's not too bad, one option might be to add a xen boot parameter
"calibratehz" to calibrate more frequently. Then systems running
time-sensitive guests could be instructed to increase the parameter
accordingly to ensure the tsc skew is small enough.
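For concreteness, a hedged sketch of what such a knob might look like;
integer_param(), set_timer() and MILLISECS() are Xen's existing
helpers, while "calibratehz" itself is hypothetical:

    /* Hypothetical boot parameter: calibrate N times per second
     * instead of once per second. */
    static unsigned int opt_calibratehz = 1;
    integer_param("calibratehz", opt_calibratehz);

    /* ...then when rearming the per-cpu calibration timer at the end
     * of local_time_calibration(), use the faster rate (t being the
     * per-cpu time state): */
    set_timer(&t->calibration_timer,
              NOW() + MILLISECS(1000 / opt_calibratehz));

A sanity clamp on the parameter (say 1..1000) would be needed to avoid
a divide-by-zero or pathological timer rates.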