Dan Magenheimer
2009-Jul-20 17:05 UTC
[Xen-devel] TSC scaling and softtsc reprise, and PROPOSAL
While at Linux Symposium last week, I heard a rumor that VMware ESX always traps and emulates all rdtsc instructions. (Can anyone confirm or deny this?) This reminded me that I'm not sure we came to any conclusion on proper handling of TSC in Xen, though I think the scaling patch was taken into xen-unstable, meaning that some users will unknowingly be using softtsc (all rdtsc instructions fully emulated) when live migrating between machines with different Hz rates. This could lead to the bizarre situation where a time-sensitive SMP app might fail in cryptic ways if it has never migrated, but work fine if it has. (Here's the last discussion, I think: http://lists.xensource.com/archives/html/xen-devel/2009-06/msg00980.html)

I dug up some old measurements from when we first implemented softtsc that I think showed that emulating TSC averages around one microsecond on my Conroe box. John Levon's measurements showed that Solaris' mstate accounting was doing rdtsc at a frequency of about 3000/sec (per processor on an idle system), which would translate to a fraction of a percent of CPU time, even for this very excessive use of rdtsc.

While I'd like to see my measurement independently confirmed (and on a wider variety of old and new systems), plus some better (heavy-workload) data on mstate accounting tsc frequency, plus a rerun of the oltp workload that showed poor (10%?) results to prove that this number is real and not just apocryphal, this raw data leads me to the following:

PROPOSAL:

The default mode for all xen systems should be that all rdtsc instructions should be emulated by xen using xen system time as the timestamp counter (i.e. nanosecond frequency).

The no-softtsc Xen boot option remains available to force the non-trapping mechanism if desired. It might make sense to add a per-guest config option to override per guest.

The Xen CPU info emulation should reflect that tsc is constant and safe to use on an SMP.

Comments? I think someone at Intel (Eddie?) was studying the TSC emulation path to see if it could be faster, but I'm not sure where that ended up.

Thanks,
Dan
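For concreteness, here is a minimal userspace sketch (Linux/x86-64, illustrative only -- not Xen code) of what "emulate every rdtsc using nanosecond system time" means: rdtsc is made to fault, and the fault handler hands back nanoseconds in EDX:EAX. In this sketch prctl(PR_SET_TSC, PR_TSC_SIGSEGV) stands in for the hypervisor's rdtsc intercept and CLOCK_MONOTONIC stands in for Xen system time; the names are illustrative assumptions.

#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/prctl.h>
#include <time.h>
#include <ucontext.h>

/* Fault handler: stand-in for the hypervisor's rdtsc emulation.
 * Returns nanoseconds (CLOCK_MONOTONIC here, Xen system time in the
 * real proposal) in EDX:EAX and skips the 2-byte rdtsc opcode. */
static void emulate_rdtsc(int sig, siginfo_t *si, void *ctx)
{
    (void)sig; (void)si;
    ucontext_t *uc = ctx;
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    uint64_t ns = (uint64_t)ts.tv_sec * 1000000000ull + ts.tv_nsec;

    uc->uc_mcontext.gregs[REG_RAX] = ns & 0xffffffffu;
    uc->uc_mcontext.gregs[REG_RDX] = ns >> 32;
    uc->uc_mcontext.gregs[REG_RIP] += 2;   /* rdtsc encodes as 0F 31 */
}

int main(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = emulate_rdtsc;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    /* Ask the kernel to make rdtsc fault for this task. */
    prctl(PR_SET_TSC, PR_TSC_SIGSEGV, 0, 0, 0);

    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    printf("emulated TSC = %llu ns\n",
           (unsigned long long)(((uint64_t)hi << 32) | lo));
    return 0;
}

Built with gcc -O2, the printed value advances at nanosecond (1GHz) rate regardless of the host TSC frequency, which is the behaviour the proposal describes.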
Keir Fraser
2009-Jul-20 17:14 UTC
[Xen-devel] Re: TSC scaling and softtsc reprise, and PROPOSAL
On 20/07/2009 18:05, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> The default mode for all xen systems should be that all rdtsc
> instructions should be emulated by xen using xen system time
> as the timestamp counter (i.e. nanosecond frequency).
>
> The no-softtsc Xen boot option remains available to force the
> non-trapping mechanism if desired. It might make sense to
> add a per-guest config option to override per guest.
>
> The Xen CPU info emulation should reflect that tsc is constant
> and safe to use on an SMP.
>
> Comments? I think someone at Intel (Eddie?) was studying the
> TSC emulation path to see if it could be faster, but I'm not
> sure where that ended up.

Defaults which slow things down are never popular. The slowdown on a non-idle Solaris guest, for example, could be significant. It is a correctness/accuracy vs performance tradeoff though. But I don't think there are many real-world complaints about the TSC accuracy now -- I think the default is set appropriately.

-- Keir
Dan Magenheimer
2009-Jul-20 20:02 UTC
[Xen-devel] RE: TSC scaling and softtsc reprise, and PROPOSAL
> > The default mode for all xen systems should be that all rdtsc
> > instructions should be emulated by xen using xen system time
> > as the timestamp counter (i.e. nanosecond frequency).
> >
> > The no-softtsc Xen boot option remains available to force the
> > non-trapping mechanism if desired. It might make sense to
> > add a per-guest config option to override per guest.
> >
> > The Xen CPU info emulation should reflect that tsc is constant
> > and safe to use on an SMP.
> >
> > Comments? I think someone at Intel (Eddie?) was studying the
> > TSC emulation path to see if it could be faster, but I'm not
> > sure where that ended up.
>
> Defaults which slow things down are never popular. The slowdown on a
> non-idle Solaris guest, for example, could be significant. It is a
> correctness/accuracy vs performance tradeoff though. But I don't think
> there are many real-world complaints about the TSC accuracy now --
> I think the default is set appropriately.

Just wondering... are there other known cases in Xen where a correctness-vs-performance tradeoff has been made in favor of performance?

I agree that if the performance is *really bad*, the default should not change. But I think we are still flying on rumors of data collected years ago in a very different world, and the performance data should be re-collected to prove that it is still *really bad*. If the degradation is a fraction of a percent even in worst case analysis, I think the default should be changed so that correctness prevails.

Why now? Because more and more real-world applications are built on top of multi-core platforms where TSC is reliable and (by far) the best timesource. And I think(?) we all agree now that softtsc is the only way to guarantee correctness in a virtual environment.
Keir Fraser
2009-Jul-20 21:02 UTC
[Xen-devel] Re: TSC scaling and softtsc reprise, and PROPOSAL
On 20/07/2009 21:02, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> I agree that if the performance is *really bad*, the default
> should not change. But I think we are still flying on rumors
> of data collected years ago in a very different world, and
> the performance data should be re-collected to prove that
> it is still *really bad*. If the degradation is a fraction
> of a percent even in worst case analysis, I think the default
> should be changed so that correctness prevails.
>
> Why now? Because more and more real-world applications are
> built on top of multi-core platforms where TSC is reliable
> and (by far) the best timesource. And I think(?) we all agree
> now that softtsc is the only way to guarantee correctness
> in a virtual environment.

So how bad is the non-softtsc default mode anyway? Our default timer_mode has guest TSCs track host TSC (plus a fixed per-vcpu offset that defaults to having all vcpus of a domain aligned to vcpu0 boot = zero tsc).

Looking at the email thread you cited, all I see is someone from Intel saying something about how their code to improve TSC consistency across migration avoids RDTSC exiting where possible (which I do not see -- if the TSC rates across the hosts do not match closely then RDTSC exiting is enabled forever for that domain), and, most bizarrely, that their 'solution' may have a tsc drift >10^5 cycles. Where did this huge number come from? What solution is being talked about, and under what conditions might the claim hold? Who knows!

I don't think we have really solid data on either the performance or the accuracy side of the debate. And that means we don't have much to argue over.

-- Keir
Dan Magenheimer
2009-Jul-20 23:52 UTC
[Xen-devel] RE: TSC scaling and softtsc reprise, and PROPOSAL
> So how bad is the non-softtsc default mode anyway?

A fair question. To me, "bad" means that TSC going backwards can be detected by an application that samples TSC in different threads that have been synchronized through some simple "ordering" semaphore. I admit this is a difficult goal to achieve and few applications in reality will depend on this exactly, but it is certainly feasible for a database or a tracing tool to timestamp ordered events this way and expect to be able to replay them in timestamp order. (For the sake of any further discussion, let's call this tsc-epsilon... if the skew exceeds tsc-epsilon then the app might observe time going backwards.)

> Our default timer_mode has guest TSCs track host TSC (plus a fixed
> per-vcpu offset that defaults to having all vcpus of a domain aligned
> to vcpu0 boot = zero tsc).

Are you referring to c/s 19506? It looks like this code only runs on a physical machine on which tsc is already well-behaved. Is this because the X86_FEATURE_CONSTANT_TSC bit is passed through unchanged to the guest, so that you are assuming guests "know" whether they can trust TSC or not? AFAIK, this bit is not particularly reliable (it reflects the socket, not the system) and not well-exposed to applications.

> Looking at the email thread you cited, all I see is someone from Intel
> saying something about how their code to improve TSC consistency across
> migration avoids RDTSC exiting where possible (which I do not see -- if
> the TSC rates across the hosts do not match closely then RDTSC exiting is
> enabled forever for that domain), and, most bizarrely, that their
> 'solution' may have a tsc drift >10^5 cycles. Where did this huge number
> come from?

Yes, I don't know where that number comes from either.

> What solution is being talked about, and under what conditions might the
> claim hold? Who knows!
>
> I don't think we have really solid data on either the performance or the
> accuracy side of the debate. And that means we don't have much to argue
> over.

I'm concerned with correctness. Although sufficient accuracy provides correctness, I don't think we are anywhere near tsc-epsilon. So the only way to guarantee correctness is via softtsc on all vcpus.
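A rough sketch of the detector described above, assuming x86 and pthreads (names are illustrative): threads take an "ordering" lock, read TSC, and check that no reading is ever older than the last value handed out on any CPU. If per-CPU TSCs are skewed by more than tsc-epsilon, the backwards counter goes nonzero.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static uint64_t last_tsc;    /* last timestamp handed out, on any CPU */
static uint64_t backwards;   /* how often a later reading was smaller */

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000000; i++) {
        pthread_mutex_lock(&lock);      /* the "ordering" semaphore */
        uint64_t now = rdtsc();
        if (now < last_tsc)
            backwards++;                /* time appeared to go backwards */
        last_tsc = now;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    printf("backwards steps observed: %llu\n",
           (unsigned long long)backwards);
    return 0;
}

(Build with gcc -O2 -pthread. Note rdtsc is not a serializing instruction, so this is only an approximation of the ordering a database or tracing tool would rely on.)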
Zhang, Xiantao
2009-Jul-22 05:05 UTC
[Xen-devel] RE: TSC scaling and softtsc reprise, and PROPOSAL
Keir Fraser wrote:
> On 20/07/2009 21:02, "Dan Magenheimer" <dan.magenheimer@oracle.com>
> wrote:
>
>> I agree that if the performance is *really bad*, the default
>> should not change. But I think we are still flying on rumors
>> of data collected years ago in a very different world, and
>> the performance data should be re-collected to prove that
>> it is still *really bad*. If the degradation is a fraction
>> of a percent even in worst case analysis, I think the default
>> should be changed so that correctness prevails.
>>
>> Why now? Because more and more real-world applications are
>> built on top of multi-core platforms where TSC is reliable
>> and (by far) the best timesource. And I think(?) we all agree
>> now that softtsc is the only way to guarantee correctness
>> in a virtual environment.
>
> So how bad is the non-softtsc default mode anyway? Our default
> timer_mode has guest TSCs track host TSC (plus a fixed per-vcpu
> offset that defaults to having all vcpus of a domain aligned to vcpu0
> boot = zero tsc).
>
> Looking at the email thread you cited, all I see is someone from Intel
> saying something about how their code to improve TSC consistency
> across migration avoids RDTSC exiting where possible (which I do not
> see -- if the TSC rates across the hosts do not match closely then
> RDTSC exiting is enabled forever for that domain), and, most
> bizarrely, that their 'solution' may have a tsc drift >10^5 cycles.
> Where did this huge number come from? What solution is being talked
> about, and under what conditions might the claim hold? Who knows!

We did an experiment to measure the performance impact of softtsc using an OLTP workload, and we saw ~10% performance loss when the rdtsc rate is more than 120,000/second. We also did some other tests, and the results show that every 10,000 rdtsc instructions per second cause ~1% performance loss. So if the rdtsc rate is not that high (around 10,000/second or less), the performance impact can be ignored.

We also introduced some performance optimization solutions, but as we claimed before, they may bring some TSC drift (10^5~10^6 cycles) between virtual processors in SMP cases. One solution is described below. For example, suppose the guest is migrated from a low-TSC-frequency machine (low_freq) to a high-TSC-frequency one (high_freq). The low frequency is the guest's expected frequency (exp_freq), and in any optimization solution we should let the guest believe it is running on a machine with an exp_freq TSC, to avoid possible issues caused by a faster TSC.

1. In this solution, we only guarantee that the guest's TSC increases monotonically and that its average frequency equals the guest's expected frequency (exp_freq) over a fixed time slot (e.g. ~1ms).

2. To keep it simple, let the guest run with the high_freq TSC (using the hardware TSC offset feature, no performance loss) for 1ms, and then enable rdtsc exiting and use the trap-and-emulate method (which suffers performance loss) to let the guest run with a *VERY VERY* low-frequency TSC (e.g. 0.2GHz) for some time. The specific value can be calculated with the following formula to guarantee that the average TSC frequency == exp_freq:
   time = (high_freq - low_freq) / (low_freq - 0.2).

3. If the guest migrates from a 2.4GHz machine to a 3.0GHz machine, the guest has to suffer performance loss only for (3.0-2.4)/(2.4-0.2) == ~0.273ms out of the total time of 1ms+0.273ms. That is to say, most of the time the guest can leverage the hardware's TSC offset feature to reduce performance loss.

4. Over the 1.273ms, we can say the guest's TSC frequency is emulated to its expected one through hardware and software co-emulation, and the performance loss is very minor compared with a purely softtsc solution.

5. But at the same time, since each vcpu's TSC is emulated independently for an SMP guest, a drift may be generated between vcpus, and the drift value's range should be 10^5~10^6 cycles. We don't know whether such drift between vcpus can bring other side-effects. At least one side-effect case we can figure out: an application running on one vcpu may see a backward TSC value after migrating to another vcpu. Not sure this is a real problem, but it should exist in theory.

Attached the draft patch to implement the solution based on an old #Cset19591.

Xiantao
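A quick arithmetic check of the formula in step 2 above, using the example numbers from step 3 (3.0GHz host, 2.4GHz expected frequency, 0.2GHz emulated rate, 1ms fast slot); the helper name is illustrative:

#include <stdio.h>

/* Length of the slow (trap-and-emulate) period needed so that the average
 * guest TSC frequency over one fast+slow cycle equals the expected one. */
static double slow_period_ms(double high_ghz, double exp_ghz,
                             double slow_ghz, double fast_ms)
{
    return fast_ms * (high_ghz - exp_ghz) / (exp_ghz - slow_ghz);
}

int main(void)
{
    double fast = 1.0;                                 /* ms at native rate */
    double slow = slow_period_ms(3.0, 2.4, 0.2, fast); /* ms while trapping */
    printf("trap for %.3f ms out of every %.3f ms (%.1f%% of the time)\n",
           slow, fast + slow, 100.0 * slow / (fast + slow));
    return 0;
}

This prints ~0.273ms of trap-and-emulate per 1.273ms period, i.e. roughly 21% of wall time spent at the slow emulated rate, matching the figures in step 3.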
Dan Magenheimer
2009-Jul-23 13:24 UTC
[Xen-devel] RE: TSC scaling and softtsc reprise, and PROPOSAL
Hi Xiantao --

Sorry for delayed response. A few comments/questions:

Thanks very much for the additional detail on the 10% performance loss. What is this oltp benchmark? Is it available for others to run? Also is the rdtsc rate 120000/sec on EACH processor?

Assuming a 3GHz machine, your results seem to show that emulating a rdtsc with softtsc takes about 2500 cycles. This agrees with my approximation of about 1 usec. Have you analyzed where this 2500 cycles is being used? My suggestion about performance optimization was not to try a different algorithm but to see if it is possible to code the existing algorithm much faster using a special trap path and assembly code. (We called this a "fast path" on Xen/ia64.) Even if the 2500 cycles can be cut in half, that would be a big win.

Am I correct in reading that your patch is ONLY for HVM guests? If so, since some (maybe most) workloads that rely on tsc for transaction timestamps will be PV, your patch doesn't solve the whole problem.

Can someone at Intel confirm or deny that VMware ESX always traps rdtsc? If so, it is probably not hard to write an application that works on VMware ESX (on certain hardware) but fails on Xen.

Thanks,
Dan
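The 2500-cycle figure above is back-of-the-envelope arithmetic; a small check using the numbers reported earlier in the thread (assumed: 3GHz host, ~10% throughput loss at 120,000 traps/second):

#include <stdio.h>

int main(void)
{
    double host_hz   = 3.0e9;    /* assumed 3GHz host               */
    double trap_rate = 120000.0; /* emulated rdtsc traps per second */
    double overhead  = 0.10;     /* ~10% throughput loss reported   */

    /* cycles burned per trap = (overhead * total cycles) / traps */
    double cycles = overhead * host_hz / trap_rate;
    printf("~%.0f cycles (%.2f us) per emulated rdtsc\n",
           cycles, cycles / host_hz * 1e6);
    return 0;
}

This gives ~2500 cycles, or a bit under a microsecond, consistent with the "about 1 usec" estimate above.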
Ian Pratt
2009-Jul-23 14:54 UTC
[Xen-devel] RE: TSC scaling and softtsc reprise, and PROPOSAL
> Am I correct in reading that your patch is ONLY for
> HVM guests? If so, since some (maybe most) workloads
> that rely on tsc for transaction timestamps will be
> PV, your patch doesn't solve the whole problem.

pre-VT it wasn't possible to trap RDTSC, so this can't help PV guests.

> Can someone at Intel confirm or deny that VMware ESX
> always traps rdtsc? If so, it is probably not hard
> to write an application that works on VMware ESX (on
> certain hardware) but fails on Xen.

I'd be rather surprised if VMware trapped RDTSC. From what I gather, ESX3 doesn't make a great deal of use of VT for 32b guests, so at the very least it would be tricky to do anything about user space use of rdtsc.

I've informally heard that certain versions of the JVM and Oracle Db have a habit of pounding rdtsc hard from user space, but I don't know at what rates.

Ian
Dan Magenheimer
2009-Jul-23 15:18 UTC
[Xen-devel] RE: TSC scaling and softtsc reprise, and PROPOSAL
> From: Ian Pratt [mailto:Ian.Pratt@eu.citrix.com]
>
> pre-VT it wasn't possible to trap RDTSC, so this can't help PV guests.

For PV guests, CR4.TSD would always be set, generating a general protection fault for every rdtsc. (Or perhaps I am missing some x86 architectural subtlety? This is how it is done on ia64.)

> I'd be rather surprised if VMware trapped RDTSC. From what I
> gather, ESX3 doesn't make a great deal of use of VT for 32b
> guests, so at the very least it would be tricky to do
> anything about user space use of rdtsc.

I had not heard it before, so am very interested in independent confirmation (or denial). Given that it is impossible (I think) to guarantee correct SMP behavior without it, and given VMware's attention to correctness details, I guess it doesn't surprise me.

> I've informally heard that certain versions of the JVM and
> Oracle Db have a habit of pounding rdtsc hard from user
> space, but I don't know at what rates.

Indeed they do, and they use it for timestamping events/transactions, so these are the very same apps that need to guarantee SMP timestamp ordering.

I realize this is an ugly problem and am searching for the best middle ground. For example, if tsc emulation can be made "fast enough", that's a good answer.
Keir Fraser
2009-Jul-23 15:29 UTC
[Xen-devel] Re: TSC scaling and softtsc reprise, and PROPOSAL
On 23/07/2009 16:18, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>> I've informally heard that certain versions of the JVM and
>> Oracle Db have a habit of pounding rdtsc hard from user
>> space, but I don't know at what rates.
>
> Indeed they do, and they use it for timestamping
> events/transactions, so these are the very same
> apps that need to guarantee SMP timestamp ordering.

Why would you expect host TSC consistency running on Xen to be worse than when running on a native OS?

-- Keir
Keir Fraser
2009-Jul-23 15:45 UTC
[Xen-devel] Re: TSC scaling and softtsc reprise, and PROPOSAL
On 23/07/2009 16:18, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>> pre-VT it wasn't possible to trap RDTSC, so this can't help PV guests.
>
> For PV guests, CR4.TSD would always be set, generating
> a general protection fault for every rdtsc. (Or perhaps
> I am missing some x86 architectural subtlety? This is
> how it is done on ia64.)

Forgot about CR4.TSD. Of course it's not going to be a super fast path, since #GP(0) will need to be demuxed by decoding the faulting instruction in the hypervisor, via a not-so-short path.

-- Keir
Dan Magenheimer
2009-Jul-23 16:39 UTC
[Xen-devel] RE: TSC scaling and softtsc reprise, and PROPOSAL
>>> I've informally heard that certain versions of the JVM and
>>> Oracle Db have a habit of pounding rdtsc hard from user
>>> space, but I don't know at what rates.
>>
>> Indeed they do, and they use it for timestamping
>> events/transactions, so these are the very same
>> apps that need to guarantee SMP timestamp ordering.
>
> Why would you expect host TSC consistency running on Xen to be worse than
> when running on a native OS?

In short, it is because a new class of machine is emerging in the virtualization space that is really a NUMA machine, tries to look like a SMP (non-NUMA) machine by making memory access fast enough that NUMA-ness can be ignored, but for the purposes of time, is still a NUMA machine.

Let's consider three physical platforms:

SMALL = single socket (multi-core)
MEDIUM = multiple sockets, same motherboard
LARGE = multiple sockets, multiple motherboards

The LARGE is becoming more widely available (e.g. HP DL785) because multiple motherboards are very convenient for field upgradeability (which has a major impact on support costs). They also make a very nice consolidation target for virtualizing a bunch of SMALL machines. However, SMALL and MEDIUM are much less expensive so much more prevalent (especially as development machines!).

On SMALL, TSC is always consistent between cores (at least on all but the first dual-core processors). On MEDIUM, some claim that TSC is always consistent between cores on different sockets because the sockets share a motherboard crystal. I don't know if this is true; if it is true, MEDIUM can be considered the same as SMALL, if not MEDIUM can be considered the same as LARGE. So ignore MEDIUM as a subcase of one of the others.

On LARGE, the motherboards are connected by HT or QPI, but neither has any form of clock synchronization. So, from a clock perspective, LARGE needs to be "partitioned"; OR there has to be sophisticated system software that does its best to synchronize TSC across all of the cores (which enterprise OS's like HP-UX and AIX have, Linux is working on, and Xen has... though it remains to be seen if any of these work "good enough"); OR TSC has to be abandoned altogether by all software that relies on it (OR TSC needs to be emulated).

This problem on LARGE machines is obscure enough that software is developed (on SMALL machines) that has a hidden timebomb if TSC is not perfectly consistent. Admittedly, all such software should have a switch that abandons TSC altogether in favor of an OS "gettimeofday", but this either depends on TSC as well or on a verrryyy sllloooowwww platform timer that if used frequently probably has as bad or worse a performance impact as emulating TSC.

So what is "good enough"? If Xen's existing algorithm works poorly on LARGE systems (or even on older SMALL systems), applications should abandon TSC. If Xen's existing algorithm works "well", then applications can and should use TSC. But unless "good enough" can be carefully defined and agreed upon between Xen and the applications AND Xen can communicate "YES this platform is good enough or NOT" to any software that cares, we are caught in a gray area. Unfortunately, neither is true: "good enough" is not defined, AND there is no clean way to communicate it even if it were. And living in the gray area means some very infrequent, very bizarre bugs can arise because sometimes, unbeknownst to that application, rarely and irreproducibly, time will appear to go backwards. And if timestamps are used, for example, to replay transactions, data corruption occurs.

So the choices are:

1) Ignore the problem and hope it never happens (or if it does that Xen doesn't get blamed)
2) Tell all Xen users that TSC should not be used as a timestamp. (In other words, fix your apps or always turn on the app's TSC-is-bad option when running virtualized on a "bad" physical machine.)
3) Always emulate TSC and let the heavy TSC users pay the performance cost.

Last, as Intel has pointed out, a related kind of issue occurs when live migration moves a running VM from a machine with one TSC rate to another machine with a different TSC rate (or if TSC rate varies on the same machine, i.e. for power-savings reasons). It would be nice if our choice (above) solves this problem too.
Dan Magenheimer
2009-Jul-23 16:45 UTC
[Xen-devel] RE: TSC scaling and softtsc reprise, and PROPOSAL
>>> pre-VT it wasn't possible to trap RDTSC, so this can't help PV guests.
>>
>> For PV guests, CR4.TSD would always be set, generating
>> a general protection fault for every rdtsc. (Or perhaps
>> I am missing some x86 architectural subtlety? This is
>> how it is done on ia64.)
>
> Forgot about CR4.TSD. Of course it's not going to be a super fast path,
> since #GP(0) will need to be demuxed by decoding the faulting instruction
> in the hypervisor, via a not-so-short path.

And demuxing/decoding can never be "fast enough"? Even if this particular situation is special-cased? I'm not saying it will be pretty, or maintainable, just wondering what the best possible could be?
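One way to picture the special case being asked about, as a standalone, hypothetical sketch (the function names and calling convention are made up, not Xen's actual trap code): before invoking the general #GP(0) instruction decoder, peek at the two bytes at the faulting instruction pointer and short-circuit when they are the rdtsc opcode (0F 31).

#include <stdint.h>
#include <stdio.h>

/* rdtsc encodes as the two bytes 0F 31 */
static int is_rdtsc(const uint8_t *ip)
{
    return ip[0] == 0x0f && ip[1] == 0x31;
}

/* Hypothetical fast path: called with the guest instruction bytes before
 * the general-purpose #GP(0) decoder is invoked. Returns nonzero if the
 * fault was handled here. */
static int gp_fast_path(const uint8_t *ip, uint64_t now_ns,
                        uint64_t *rax, uint64_t *rdx, uint64_t *rip_adjust)
{
    if (!is_rdtsc(ip))
        return 0;              /* fall back to the full emulator */
    *rax = now_ns & 0xffffffff;
    *rdx = now_ns >> 32;
    *rip_adjust = 2;           /* skip the two-byte instruction  */
    return 1;
}

int main(void)
{
    uint8_t insn[2] = { 0x0f, 0x31 };
    uint64_t rax, rdx, adj;
    if (gp_fast_path(insn, 123456789ull, &rax, &rdx, &adj))
        printf("fast path: eax=%#llx edx=%#llx, rip += %llu\n",
               (unsigned long long)rax, (unsigned long long)rdx,
               (unsigned long long)adj);
    return 0;
}

Whether this actually shortens Xen's #GP path enough to matter is exactly the open question in the exchange above.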
Keir Fraser
2009-Jul-24 08:04 UTC
[Xen-devel] Re: TSC scaling and softtsc reprise, and PROPOSAL
On 23/07/2009 17:39, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>> Why would you expect host TSC consistency running on Xen to be worse than
>> when running on a native OS?
>
> In short, it is because a new class of machine
> is emerging in the virtualization space that
> is really a NUMA machine, tries to look like
> a SMP (non-NUMA) machine by making memory access
> fast enough that NUMA-ness can be ignored,
> but for the purposes of time, is still a
> NUMA machine.

Okay, so the issue you are worried about is not specific to Xen. So how is native Linux tackling this, for example?

-- Keir
Dan Magenheimer
2009-Jul-24 14:47 UTC
[Xen-devel] RE: TSC scaling and softtsc reprise, and PROPOSAL
>>> Why would you expect host TSC consistency running on Xen to be worse than
>>> when running on a native OS?
>>
>> In short, it is because a new class of machine
>> is emerging in the virtualization space that
>> is really a NUMA machine, tries to look like
>> a SMP (non-NUMA) machine by making memory access
>> fast enough that NUMA-ness can be ignored,
>> but for the purposes of time, is still a
>> NUMA machine.
>
> Okay, so the issue you are worried about is not specific to Xen. So how is
> native Linux tackling this, for example?

I'm not sure that it is, though I'll look into it.

But the difference is that, in a virtual environment, sometimes it is "safe" to use TSC and sometimes it is not and, on a LARGE machine, this changes dynamically. Further, a guest may "originate" on a physical machine where it is safe and migrate to a physical machine where it is not. OS's may ask "is TSC safe", but do so once at startup, and unfortunately the method to ask is ill-defined. Applications have no way of asking "is TSC safe" so either use a one-time startup configuration option or depend on the OS to make the determination by always using something like gettimeofdayns().

So if Xen ever responds to an OS asking "is TSC safe", it should answer it for the whole datacenter (which itself is not static, as new machines might be added while a VM is live). As a result, Xen's response must always be NO. (Unless softtsc is the default, in which case the answer can be YES.) If Xen's response is always NO, apps must use, indirectly through the OS, a platform timer (which is probably a lot slower than softtsc!). So, in the end, to guarantee correctness, high-frequency-time-stamping apps are going to have slow access anyway.

And so my conclusion is that we should always trap TSC, which can guarantee a fixed-frequency monotonically-increasing timestamp source across all machines of all frequencies, whether an app or OS asks "is TSC safe" or not.
Dan Magenheimer
2009-Jul-27 14:47 UTC
[Xen-devel] RE: TSC scaling and softtsc reprise, and PROPOSAL
>> Can someone at Intel confirm or deny that VMware ESX
>> always traps rdtsc? If so, it is probably not hard
>> to write an application that works on VMware ESX (on
>> certain hardware) but fails on Xen.
>
> I'd be rather surprised if VMware trapped RDTSC. From what I
> gather, ESX3 doesn't make a great deal of use of VT for 32b
> guests, so at the very least it would be tricky to do
> anything about user space use of rdtsc.

Some googling and reading provides evidence that VMware does indeed virtualize the TSC. The timekeeping paper http://www.vmware.com/pdf/vmware_timekeeping.pdf tells how to turn vTSC off, but says that turning it off is no longer recommended. The ASPLOS paper http://www.vmware.com/pdf/asplos235_adams.pdf uses rdtsc as an example of how binary translation is much faster than emulation or callout (though their BT version fetches a stale TSC which afaict doesn't solve the ordering problem).

Also, Avi Kivity tells me that the KVM folks have also recently come to the conclusion that it is necessary to emulate TSC, though KVM currently does not.

Dan
Keir Fraser
2009-Jul-27 14:55 UTC
[Xen-devel] Re: TSC scaling and softtsc reprise, and PROPOSAL
On 27/07/2009 15:47, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> Some googling and reading provides evidence that VMware
> does indeed virtualize the TSC. The timekeeping paper
> http://www.vmware.com/pdf/vmware_timekeeping.pdf
> tells how to turn vTSC off, but says that turning it
> off is no longer recommended.

I believe this affects the guest OS executing RDTSC, not guest apps, and is only to delay the TSC to not 'run past' pending timer ticks (typically where they have been delayed due to the guest being preempted).

-- Keir
Dan Magenheimer
2009-Jul-27 17:25 UTC
[Xen-devel] RE: TSC scaling and softtsc reprise, and PROPOSAL
>> I believe this affects the guest OS executing RDTSC, not
>> guest apps, and is
>> only to delay the TSC to not 'run past' pending timer ticks
>> (typically where
>> they have been delayed due to the guest being preempted).
>>
>> -- Keir

Could be. The text would lead me to believe otherwise though. Read the section on "Virtual TSC" in the above PDF. Specifically, the Virtual TSC "advances even when the virtual CPU is not running" and "In the past, this feature had sometimes been recommended to improve performance of APPLICATIONS that read the TSC frequently..." (my emphasis)
Keir Fraser
2009-Jul-27 19:55 UTC
[Xen-devel] Re: TSC scaling and softtsc reprise, and PROPOSAL
On 27/07/2009 18:25, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>> I believe this affects the guest OS executing RDTSC, not
>> guest apps, and is
>> only to delay the TSC to not 'run past' pending timer ticks
>> (typically where
>> they have been delayed due to the guest being preempted).
>>
>> -- Keir
>
> Could be. The text would lead me to believe otherwise
> though. Read the section on "Virtual TSC" in the
> above PDF. Specifically, the Virtual TSC "advances even
> when the virtual CPU is not running" and "In the
> past, this feature had sometimes been recommended to
> improve performance of APPLICATIONS that read the
> TSC frequently..." (my emphasis)

Yes, then it sounds like they virtualise it for apps too. Also there is an option to virtualise the TSC at a specified frequency -- that would be pretty weird if it applied only to guest-OS RDTSCs but not guest-app RDTSCs.

Interesting...

-- Keir
Dan Magenheimer
2009-Jul-27 22:14 UTC
[Xen-devel] RE: TSC scaling and softtsc reprise, and PROPOSAL
>>> I believe this affects the guest OS executing RDTSC, not
>>> guest apps, and is
>>> only to delay the TSC to not 'run past' pending timer ticks
>>> (typically where
>>> they have been delayed due to the guest being preempted).
>>
>> Could be. The text would lead me to believe otherwise
>> though. Read the section on "Virtual TSC" in the
>> above PDF. Specifically, the Virtual TSC "advances even
>> when the virtual CPU is not running" and "In the
>> past, this feature had sometimes been recommended to
>> improve performance of APPLICATIONS that read the
>> TSC frequently..." (my emphasis)
>
> Yes, then it sounds like they virtualise it for apps too.
> Also there is an
> option to virtualise the TSC at a specified frequency -- that would be
> pretty weird if it applied only to guest-OS RDTSCs but not
> guest-app RDTSCs.
>
> Interesting...
>
> -- Keir

And further, the frequency is "sticky" across migration, with the frequency set to whatever machine the VM originated on.

I'd be inclined to just use Xen system time and thus the TSC frequency would be always 1GHz on all systems.
Keir Fraser
2009-Jul-27 22:39 UTC
[Xen-devel] Re: TSC scaling and softtsc reprise, and PROPOSAL
On 27/07/2009 23:14, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>> Yes, then it sounds like they virtualise it for apps too.
>> Also there is an
>> option to virtualise the TSC at a specified frequency -- that would be
>> pretty weird if it applied only to guest-OS RDTSCs but not
>> guest-app RDTSCs.
>>
>> Interesting...
>
> And further, the frequency is "sticky" across migration, with
> the frequency set to whatever machine the VM originated on.
>
> I'd be inclined to just use Xen system time and thus
> the TSC frequency would be always 1GHz on all systems.

Well that is what softtsc does already, albeit only for HVM guests so far.

-- Keir
Zhang, Xiantao
2009-Jul-28 00:55 UTC
[Xen-devel] RE: TSC scaling and softtsc reprise, and PROPOSAL
Hi, Dan

Sorry for the late reply! See my comments below.

> Thanks very much for the additional detail on the 10%
> performance loss. What is this oltp benchmark? Is
> it available for others to run? Also is the rdtsc
> rate 120000/sec on EACH processor?

The OLTP benchmark is a test case of sysbench, and you can get it through the following link: http://sysbench.sourceforge.net/
We only configured one virtual processor for the VM, and I don't know whether oltp can use two virtual processors.

> Assuming a 3GHz machine, your results seem to show that
> emulating a rdtsc with softtsc takes about 2500 cycles.
> This agrees with my approximation of about 1 usec.
>
> Have you analyzed where this 2500 cycles is being used?
> My suggestion about performance optimization was not
> to try a different algorithm but to see if it is possible
> to code the existing algorithm much faster using a
> special trap path and assembly code. (We called this
> a "fast path" on Xen/ia64.) Even if the 2500 cycles
> can be cut in half, that would be a big win.

There is no fast path for emulating rdtsc on the x86 side, and the main cost should come from the hardware context switch. Since I was using an old machine when running this benchmark, the cost should be reduced sharply on the latest processors, where I haven't done the test.

> Am I correct in reading that your patch is ONLY for
> HVM guests? If so, since some (maybe most) workloads
> that rely on tsc for transaction timestamps will be
> PV, your patch doesn't solve the whole problem.

Yes, this patch is only for HVM guests, because only an HVM guest can use the TSC offset feature (one of the VT features), and also I don't think PV guests need it.

> Can someone at Intel confirm or deny that VMware ESX
> always traps rdtsc? If so, it is probably not hard
> to write an application that works on VMware ESX (on
> certain hardware) but fails on Xen.
Zhang, Xiantao
2009-Jul-28 01:46 UTC
[Xen-devel] RE: TSC scaling and softtsc reprise, and PROPOSAL
Dan Magenheimer wrote:
>>> Can someone at Intel confirm or deny that VMware ESX
>>> always traps rdtsc? If so, it is probably not hard
>>> to write an application that works on VMware ESX (on
>>> certain hardware) but fails on Xen.
>>
>> I'd be rather surprised if VMware trapped RDTSC. From what I
>> gather, ESX3 doesn't make a great deal of use of VT for 32b
>> guests, so at the very least it would be tricky to do
>> anything about user space use of rdtsc.
>
> Some googling and reading provides evidence that VMware
> does indeed virtualize the TSC. The timekeeping paper
> http://www.vmware.com/pdf/vmware_timekeeping.pdf
> tells how to turn vTSC off, but says that turning it
> off is no longer recommended. The ASPLOS paper
> http://www.vmware.com/pdf/asplos235_adams.pdf
> uses rdtsc as an example of how binary translation
> is much faster than emulation or callout (though
> their BT version fetches a stale TSC which afaict
> doesn't solve the ordering problem).
> Also, Avi Kivity tells me that the KVM folks have
> also recently come to the conclusion that it is necessary
> to emulate TSC, though KVM currently does not.

Hi, Dan

I am still confused about why emulating rdtsc is necessary. Even if we
emulate it in software, we still need to find a stable time source,
right? If you think the TSC is not stable on an SMP system, then the
issue also exists for a native OS that depends on the TSC as a time
source; it is not a Xen-specific issue. If the host's TSC is stable
enough, I think the hardware's TSC-offset feature should be the right
way to go.

I have a proposal on this. If Xen finds that the hardware's TSC is not
reliable, it can tell the guest at boot time, and the guest should use
other time sources (e.g. HPET) instead of the TSC. And if the TSC is
reliable in hardware, I think we should let Xen do its best to use the
hardware feature and just leave the current implementation as it is.
If users know from their own knowledge that the hardware's TSC is not
reliable, they can set softtsc to solve the possible issue manually.
So maybe we only need to create a way for the Xen hypervisor to tell
the guest the TSC's status.

Xiantao

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
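[Editorial note: the per-package "hardware says its TSC is reliable" signal this proposal would key off is the invariant-TSC bit, CPUID.80000007H:EDX[8]. Below is a minimal userspace probe, a sketch using GCC's cpuid.h helpers; as Dan points out in the next message, this bit only describes the local socket, not every host in a migration pool.]

```c
/*
 * Probe the invariant-TSC bit: CPUID leaf 0x80000007, EDX bit 8.
 * Illustration only -- a hypervisor would read this per package and
 * still could not say anything about other hosts in the pool.
 */
#include <stdio.h>
#include <cpuid.h>

static int tsc_is_invariant(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* Make sure the extended leaf exists before querying it. */
    if (__get_cpuid_max(0x80000000u, NULL) < 0x80000007u)
        return 0;
    __cpuid(0x80000007u, eax, ebx, ecx, edx);
    return (edx >> 8) & 1;
}

int main(void)
{
    printf("invariant TSC: %s\n", tsc_is_invariant() ? "yes" : "no");
    return 0;
}
```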
Dan Magenheimer
2009-Jul-28 14:45 UTC
[Xen-devel] RE: TSC scaling and softtsc reprise, and PROPOSAL
> I am still confused about why emulating rdtsc is necessary.
> Even if we emulate it in software, we still need to find a
> stable time source, right? If you think the TSC is not stable
> on an SMP system, then the issue also exists for a native OS
> that depends on the TSC as a time source; it is not a
> Xen-specific issue. If the host's TSC is stable enough, I think
> the hardware's TSC-offset feature should be the right way to go.
>
> I have a proposal on this. If Xen finds that the hardware's TSC
> is not reliable, it can tell the guest at boot time, and the
> guest should use other time sources (e.g. HPET) instead of the
> TSC. And if the TSC is reliable in hardware, I think we should
> let Xen do its best to use the hardware feature and just leave
> the current implementation as it is. If users know from their
> own knowledge that the hardware's TSC is not reliable, they can
> set softtsc to solve the possible issue manually. So maybe we
> only need to create a way for the Xen hypervisor to tell the
> guest the TSC's status.

Hi Xiantao --

Thanks for the info in your previous reply.

The issue is that there's no easy way for Xen to determine for sure
whether the hardware has a reliable TSC. The TSC_CONSTANT indication
only applies to the socket, not to the entire system. Even if it can
be determined for one box, it is not possible to determine it for the
whole data center (to handle live migration). Even if it were possible
for the whole data center, new machines that behave differently may be
live-added to the data center.

So no SMP application in a virtualized data center can assume the TSC
is monotonic. But SMP applications designed on smaller multi-core
physical systems CAN assume the TSC is monotonic; when these apps are
moved to virtual systems, problems will occur (including possible data
corruption).

As you've pointed out, there are also other issues with migration and
power management where the TSC frequency changes. Unless/until there
is a TSC-scaling feature as well as a TSC-offset feature, frequency
changes will have to be handled in software.

A virtual TSC (always trapping all rdtsc instructions) allows us to
guarantee monotonicity and provide a constant rate (Xen time*, 1GHz)
across all processors in a machine and all machines in a data center.
There is a performance impact for applications that execute RDTSC at a
high frequency, but I hope that we can reduce this penalty somewhat.

I am proposing that the virtual TSC be the default. We should provide
a per-VM option and a Xen boot option to allow VMs to NOT trap rdtsc,
but this should have a warning that it is not recommended and may
result in data corruption in some apps.

* Xen time by itself is not monotonic across multiple processors but
can be supplemented with a global variable to always provide
"last_tsc + 1" to enforce monotonicity.

Dan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
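[Editorial note: a toy model of the read path the footnote describes, a sketch only; clock_gettime() stands in for Xen system time and this is userspace C rather than hypervisor code. The point is that one shared compare-and-swap is enough to make the 1GHz virtual TSC globally monotonic across vcpus.]

```c
/*
 * Toy model of the proposed softtsc read path: the virtual TSC is Xen
 * system time in nanoseconds (i.e. a 1GHz counter), and a shared
 * last_tsc value enforces a single monotonic sequence across vcpus.
 */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static uint64_t system_time_ns(void)     /* stand-in for Xen system time */
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

static volatile uint64_t last_tsc;       /* shared across all vcpus */

static uint64_t emulate_rdtsc(void)
{
    uint64_t now = system_time_ns();
    uint64_t prev, next;

    /* If another vcpu already handed out a value >= now, step past it. */
    do {
        prev = last_tsc;
        next = (now > prev) ? now : prev + 1;
    } while (!__sync_bool_compare_and_swap(&last_tsc, prev, next));

    return next;
}

int main(void)
{
    uint64_t a = emulate_rdtsc(), b = emulate_rdtsc();

    printf("%llu -> %llu (monotonic: %s)\n",
           (unsigned long long)a, (unsigned long long)b,
           b > a ? "yes" : "no");
    return 0;
}
```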
Keir Fraser
2009-Jul-28 15:00 UTC
[Xen-devel] Re: TSC scaling and softtsc reprise, and PROPOSAL
On 28/07/2009 15:45, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> I am proposing that the virtual TSC be the default.
> We should provide a per-VM option and a Xen boot
> option to allow VMs to NOT trap rdtsc, but this
> should have a warning that it is not recommended
> and may result in data corruption in some apps.

This I can agree with. The softtsc boot option is just lazy. This
should properly be a per-VM option, for both HVM and PV guests. For
example, tsc_freq=x sets a virtual TSC of x MHz. And not specifying
tsc_freq gets you the current behaviour (pass through the host TSC
where possible). That I could quite happily live with, although I'm
not planning to implement it myself.

-- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Dan Magenheimer
2009-Jul-28 15:46 UTC
[Xen-devel] RE: TSC scaling and softtsc reprise, and PROPOSAL
> And not specifying tsc_freq gets you the current
> behaviour (pass through the host TSC where possible).

I fear that unless the default is changed, it will not be possible to
sufficiently explain the problem to users/administrators, and the
option will not get turned on. In which case, it might as well not be
done at all... just one more obscure option that nobody understands or
uses.

Given that correctness is at stake (and given that Xen's primary
competitors are choosing correctness over performance), I see this as
a Xen developer problem to fix, not one to pawn off on harried system
admins.

Savvy system admins (who know every app in their data center and/or
are willing to take the risk for better performance) should still be
able to easily disable softtsc, either on all servers with a Xen boot
option or on a per-VM basis.

We could quibble about details (maybe softtsc should only be
automatically enabled on SMP guests, or on 64-bit SMP guests, or ??)
but I suspect that just creates a mess, and IMHO we should just bite
the bullet.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2009-Jul-28 15:58 UTC
[Xen-devel] Re: TSC scaling and softtsc reprise, and PROPOSAL
On 28/07/2009 16:46, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> Savvy system admins (who know every app in their data center and/or
> are willing to take the risk for better performance) should still be
> able to easily disable softtsc, either on all servers with a Xen boot
> option or on a per-VM basis.
>
> We could quibble about details (maybe softtsc should only be
> automatically enabled on SMP guests, or on 64-bit SMP guests, or ??)
> but I suspect that just creates a mess, and IMHO we should just bite
> the bullet.

I can live with that, if it is driven from the xend toolstack. It will
have to default off in the hypervisor for compatibility with old saved
images.

-- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Dan Magenheimer
2009-Jul-28 18:15 UTC
[Xen-devel] RE: TSC scaling and softtsc reprise, and PROPOSAL
> > Savvy system admins (who know every app in their data center and/or
> > are willing to take the risk for better performance) should still be
> > able to easily disable softtsc, either on all servers with a Xen boot
> > option or on a per-VM basis.
> >
> > We could quibble about details (maybe softtsc should only be
> > automatically enabled on SMP guests, or on 64-bit SMP guests, or ??)
> > but I suspect that just creates a mess, and IMHO we should just bite
> > the bullet.
>
> I can live with that, if it is driven from the xend toolstack. It will
> have to default off in the hypervisor for compatibility with old saved
> images.
>
> -- Keir

Hmmm... one could argue that with the current model, any VM using the
TSC is "at your own peril", and there are certainly cases of restore
that will break whatever assumptions the VM is making about pre-save
TSC values. So while I'm a believer in compatibility, I'd suggest
defaulting to ON in the hypervisor plus a new restore option that
force-overrides the softtsc boot-time default for any VM being
restored.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2009-Jul-28 18:43 UTC
[Xen-devel] Re: TSC scaling and softtsc reprise, and PROPOSAL
On 28/07/2009 19:15, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>> I can live with that, if it is driven from the xend toolstack. It will
>> have to default off in the hypervisor for compatibility with old saved
>> images.
>
> Hmmm... one could argue that with the current model, any VM using the
> TSC is "at your own peril", and there are certainly cases of restore
> that will break whatever assumptions the VM is making about pre-save
> TSC values. So while I'm a believer in compatibility, I'd suggest
> defaulting to ON in the hypervisor plus a new restore option that
> force-overrides the softtsc boot-time default for any VM being
> restored.

It would be defaulted on by the toolstack for all newly created guests.
That's quite sufficient, I think.

-- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Keir Fraser
2009-Jul-28 18:48 UTC
[Xen-devel] Re: TSC scaling and softtsc reprise, and PROPOSAL
On 28/07/2009 19:15, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> a new restore option that force-overrides the
> softtsc boot-time default for any VM being restored.

Not sure if you mean guest boot-time or host boot-time here, by the
way. If you mean the latter, I should point out that a per-VM option
for this would supplant the softtsc boot parameter (which would be
completely removed).

-- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Dan Magenheimer
2009-Jul-28 19:10 UTC
[Xen-devel] RE: TSC scaling and softtsc reprise, and PROPOSAL
>>> I can live with that, if it is driven from the xend toolstack. It
>>> will have to default off in the hypervisor for compatibility with
>>> old saved images.
>>
>> Hmmm... one could argue that with the current model, any VM using the
>> TSC is "at your own peril", and there are certainly cases of restore
>> that will break whatever assumptions the VM is making about pre-save
>> TSC values. So while I'm a believer in compatibility, I'd suggest
>> defaulting to ON in the hypervisor plus a new restore option that
>> force-overrides the softtsc boot-time default for any VM being
>> restored.
>
> It would be defaulted on by the toolstack for all newly created guests.
> That's quite sufficient, I think.

I guess I'm concerned that there are many toolstacks that will need to
be fixed, but there is one hypervisor. Defaulting to softtsc in the
hypervisor essentially fixes the problem for the future and makes it
clear that the Xen developers have made a decision; waiting for various
vendor toolstacks to enforce a default (not to mention going through
the argument to convince each vendor) presents a mixed message,
prolongs the agony, and almost guarantees chaos for years to come.

This is a subtle but fundamental change in the way Xen works, necessary
for correctness. I think we should bite the bullet and do it right.

Can the hypervisor itself tell whether a domain is being created vs.
restored? I think not, but if it can, that might be a good compromise.

Dan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Dan Magenheimer
2009-Aug-03 20:19 UTC
[Xen-devel] RE: TSC scaling and softtsc reprise, and PROPOSAL
FYI, I have confirmed with a VMware expert that the TSC is always
emulated (unless a flag is set).

> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Monday, July 27, 2009 1:55 PM
> To: Dan Magenheimer; Ian Pratt; Zhang, Xiantao; Xen-Devel (E-mail)
> Cc: John Levon; Dong, Eddie
> Subject: Re: TSC scaling and softtsc reprise, and PROPOSAL
>
> On 27/07/2009 18:25, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
>
>>> I believe this affects the guest OS executing RDTSC, not
>>> guest apps, and is only to delay the TSC to not 'run past'
>>> pending timer ticks (typically where they have been delayed
>>> due to the guest being preempted).
>>>
>>> -- Keir
>>
>> Could be. The text would lead me to believe otherwise
>> though. Read the section on "Virtual TSC" in the
>> above PDF. Specifically, the Virtual TSC "advances even
>> when the virtual CPU is not running" and "In the
>> past, this feature had sometimes been recommended to
>> improve performance of APPLICATIONS that read the
>> TSC frequently..." (my emphasis)
>
> Yes, then it sounds like they virtualise it for apps too.
> Also there is an option to virtualise the TSC at a specified
> frequency -- that would be pretty weird if it applied only to
> guest-OS RDTSCs but not guest-app RDTSCs.
>
> Interesting...
>
> -- Keir

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2009-Aug-05 00:05 UTC
Re: [Xen-devel] Re: TSC scaling and softtsc reprise, and PROPOSAL
On 07/24/09 01:04, Keir Fraser wrote:
> Okay, so the issue you are worried about is not specific to Xen. So how is
> native Linux tackling this, for example?

Linux will use the tsc where possible, but regularly assesses its
perceived accuracy and will move to a different clocksource if the tsc
appears to be playing up. I don't think it ever assumes the tsc is
synced between CPUs/cores.

It allows rdtsc from usermode, but that is generally considered to be
very buggy and ill-defined behaviour. It makes no attempt to make
usermode rdtsc in any way meaningful. The exception is the
vgettimeofday vsyscall, which does Xen-like timekeeping: it gets the
(tsc, cpu) tuple atomically, then scales it with timing parameters from
the kernel.

    J

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
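[Editorial note: for readers unfamiliar with that vsyscall path, the "scales it with timing parameters" step has roughly the following shape. The field names loosely follow the pvclock/Xen time-info layout, but this is only an illustration of the arithmetic, not the kernel source.]

```c
/*
 * Convert a raw TSC reading into nanoseconds using per-cpu parameters
 * published by the kernel/hypervisor: a reference TSC value, the system
 * time at that reference, and a fixed-point scale factor.
 */
#include <stdint.h>
#include <stdio.h>

struct time_params {                 /* published per-cpu, read atomically */
    uint64_t tsc_timestamp;          /* TSC value when params were written */
    uint64_t system_time;            /* ns at that same instant */
    uint32_t tsc_to_system_mul;      /* ns per 2^32 (shifted) ticks */
    int8_t   tsc_shift;              /* pre-shift applied to the delta */
};

static uint64_t tsc_to_ns(const struct time_params *p, uint64_t tsc)
{
    uint64_t delta = tsc - p->tsc_timestamp;

    if (p->tsc_shift >= 0)
        delta <<= p->tsc_shift;
    else
        delta >>= -p->tsc_shift;

    /* 64x32 multiply, keeping the high part: (delta * mul) >> 32. */
    return p->system_time +
           (uint64_t)(((__uint128_t)delta * p->tsc_to_system_mul) >> 32);
}

int main(void)
{
    /* Illustrative parameters for a 2.4GHz TSC: mul = 2^32 * 1e9/2.4e9. */
    struct time_params p = { 0, 0, 1789569707u, 0 };

    /* One second's worth of 2.4GHz ticks should come out as ~1e9 ns. */
    printf("1s of ticks -> %llu ns\n",
           (unsigned long long)tsc_to_ns(&p, 2400000000ull));
    return 0;
}
```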
Tian, Kevin
2009-Aug-05 05:35 UTC
RE: [Xen-devel] Re: TSC scaling and softtsc reprise, and PROPOSAL
> From: Jeremy Fitzhardinge
> Sent: August 5, 2009 8:06
>
> On 07/24/09 01:04, Keir Fraser wrote:
>> Okay, so the issue you are worried about is not specific to Xen. So how is
>> native Linux tackling this, for example?
>
> Linux will use the tsc where possible, but regularly assesses its
> perceived accuracy and will move to a different clocksource if the tsc
> appears to be playing up. I don't think it ever assumes the tsc is
> synced between CPUs/cores.

It cares. See tsc_sync.c under the x86 arch, where unsynced warps mark
the tsc as unstable.

Thanks,
Kevin

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
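[Editorial note: the check Kevin refers to has CPUs take turns reading the TSC under a lock and treats any reading that steps backwards relative to another CPU as a warp. Below is a rough userspace imitation, a sketch only: the CPU numbers, iteration count, and pinning are arbitrary, and the real tsc_sync.c test runs in the kernel with interrupts disabled.]

```c
/* Two threads pinned to different CPUs look for TSC warps. */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static uint64_t last_tsc;
static long warps;

static void *check(void *arg)
{
    int cpu = (int)(long)arg;
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&lock);
        uint64_t now = __rdtsc();
        if (now < last_tsc)          /* went backwards vs. the other CPU */
            warps++;
        else
            last_tsc = now;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t0, t1;

    pthread_create(&t0, NULL, check, (void *)0L);
    pthread_create(&t1, NULL, check, (void *)1L);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    printf("observed %ld TSC warps\n", warps);
    return 0;
}
```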
Dan Magenheimer
2009-Aug-06 21:13 UTC
RE: [Xen-devel] Re: TSC scaling and softtsc reprise, and PROPOSAL
Well actually, "how Linux handles this" is subject to a dizzying matrix of hardware-dependent, CONFIG_-dependent, linux-boot-parameter-dependent choices that have evolved/changed at nearly every Linux kernel version. While it might be useful to steal some recent Linux code to help determine if it is safe to build Xen system time on top of TSC on some processors, I don''t know if Linux is of much use as a design guide for how to expose TSC to guests/apps, especially when said guests/apps may be moving back and forth between hardware with widely varying TSC characteristics. But, yes, as Kevin points out, on some recent versions of Linux on some hardware with some CONFIG/boot-params, the kernel does indeed try to use TSC as a reliable foundation for delivering ticks and gettimeofday-ish services.> -----Original Message----- > From: Tian, Kevin [mailto:kevin.tian@intel.com] > Sent: Tuesday, August 04, 2009 11:36 PM > To: Jeremy Fitzhardinge; Keir Fraser > Cc: Dan Magenheimer; Xen-Devel (E-mail); Dong, Eddie; John > Levon; Ian Pratt; Zhang, Xiantao > Subject: RE: [Xen-devel] Re: TSC scaling and softtsc reprise, > and PROPOSAL > > > >From: Jeremy Fitzhardinge > >Sent: 2009?8?5? 8:06 > > > >On 07/24/09 01:04, Keir Fraser wrote: > >> Okay, so the issue you are worried about is not specific to > >Xen. So how is > >> native Linux tackling this, for example? > >> > > > >Linux will use the tsc where possible, but regularly assesses its > >perceived accuracy and will move to a different clocksource > if the tsc > >appears to the playing up. I don''t think it ever assumes the tsc is > >synced between CPU/cores. > > It cares. See tsc_sync.c under x86 arch, where unsynced warps > mark tsc as unstable. > > Thanks, > Kevin > > > > >It allows rdtsc from usermode, but it is generally considered > >to be very > >buggy and ill-defined behaviour. It makes no attempt to > make usermode > >rdtsc in any way meaningful. The exception is the vgettimeofday > >vsyscall which does Xen-like timekeeping, in which it gets > the tsc,cpu > >tuple atomically, then scales it with timing parameters from > >the kernel. > > > > J > > > >_______________________________________________ > >Xen-devel mailing list > >Xen-devel@lists.xensource.com > >http://lists.xensource.com/xen-devel > >_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Dan Magenheimer
2009-Aug-06 21:41 UTC
RE: [Xen-devel] Re: TSC scaling and softtsc reprise, and PROPOSAL
Oops, forgot to add...

>>> Linux will use the tsc where possible, but regularly assesses its
>>> perceived accuracy

"Regularly assesses" is a bit misleading... according to my reading of
the 2.6.30 code, it checks for "good synchronization" once at boot,
then after that only ensures that things haven't gone completely whacko
by checking that multiple TSCs haven't diverged by more than
~60msec(?). While I suppose this will catch the most likely divergent
hardware cases, I suspect Xen's periodically-attempt-to-sync-the-TSC
code might lull the Linux kernel into complacency (and 60msec accuracy
is just not good enough for applications).

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel