Dan Magenheimer
2008-Aug-03 16:50 UTC
[Xen-devel] [PATCH] rendezvous-based local time calibration WOW!
The synchronization of local_time_calibration (l_t_c) via round-to-nearest-epoch provided some improvement, but I was still seeing skew of 16usec and higher. I measured the temporal distance between the rounded epoch and when l_t_c was actually running, to make sure there wasn't some kind of bug, and found that l_t_c was running up to 150us after the rounded epoch and sometimes up to 50us before it. I guess this is the granularity of setting a Xen timer. While it seemed that +/- 100us shouldn't cause that much skew, I finally decided to try synchronization-via-rendezvous, as suggested by Ian here:

http://lists.xensource.com/archives/html/xen-devel/2008-07/msg01074.html
http://lists.xensource.com/archives/html/xen-devel/2008-07/msg01080.html

The result is phenomenal... using this approach (in the attached patch), I have yet to see a skew exceed 1usec! That is about a 10-fold improvement in accuracy over the rounded-epoch method and about 20-fold over the one-epoch-from-NOW() method.

The platform time is now read once for all processors rather than once per processor. (Actually, it is read once again in platform_time_calibration(); by "inlining" that routine into master_local_time_calibration(), that extra read can be -- and probably should be -- avoided too.)

It may be too late to get this into 3.3.0 but, if so, please consider it asap for 3.3.1 rather than just xen-unstable/3.4.

Dan

==================================
Thanks... for the memory
I really could use more / My throughput's on the floor
The balloon is flat / My swap disk's fat / I've OOM's in store
Overcommitted so much
(with apologies to the late great Bob Hope)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
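[Editor's note: the rendezvous idea is easy to sketch in userspace. This is illustrative only -- the actual patch runs inside Xen via smp_call_function() on physical CPUs, not pthreads, and the variable names below are invented. Every "CPU" blocks at a common barrier and samples its clock the instant the barrier releases, so all samples share one epoch instead of each CPU picking its own.]

```python
import threading
import time

NCPUS = 4
barrier = threading.Barrier(NCPUS)
samples = [0] * NCPUS   # per-CPU local clock samples
master = {}             # platform time, read once for everyone

def calibrate(cpu):
    # Phase 1: rendezvous -- all CPUs spin here until the last arrives,
    # so they are all released at (nearly) the same instant.
    barrier.wait()
    # Phase 2: each CPU samples its local clock immediately on release.
    samples[cpu] = time.monotonic_ns()
    # Only one CPU reads the shared platform clock, once for all.
    if cpu == 0:
        master["platform_ns"] = time.monotonic_ns()

threads = [threading.Thread(target=calibrate, args=(i,)) for i in range(NCPUS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print("max inter-CPU sample spread:", max(samples) - min(samples), "ns")
```

The in-hypervisor version does better than this userspace toy, since the rendezvous there is not at the mercy of the OS scheduler.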
Keir Fraser
2008-Aug-03 17:24 UTC
[Xen-devel] Re: [PATCH] rendezvous-based local time calibration WOW!
It's not safe to poke a new timestamp record from an interrupt handler (which is what the smp_call_function() callback functions are). Users of the timestamp records (e.g., get_s_time) need local_irq_save/restore() or an equivalent of the Linux seqlock. The latter is likely faster.

I'm dubious about update_vcpu_system_time() from an interrupt handler too. It needs thought about how it might race with a context switch (change of 'current') or if it interrupts an existing invocation of update_vcpu_system_time().

 -- Keir

On 3/8/08 17:50, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> The synchronization of local_time_calibration (l_t_c) via
> round-to-nearest-epoch provided some improvement, but I was
> still seeing skew of 16usec and higher.
> [...]
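[Editor's note: the seqlock discipline Keir mentions can be sketched as follows. This is a deliberately simplified single-writer version; the real Linux/Xen primitive adds memory barriers and architecture-specific atomics, which are omitted here.]

```python
import threading

class SeqTimestamp:
    """Writer bumps `seq` to odd before updating and back to even after;
    readers retry whenever they observe an odd or changed `seq`.
    Readers never block, which keeps get_s_time()-style paths cheap."""

    def __init__(self):
        self.seq = 0
        self.stamp = (0, 0)   # e.g. (local_tsc_stamp, stime_local_stamp)

    def write(self, stamp):
        self.seq += 1         # odd: update in progress
        self.stamp = stamp
        self.seq += 1         # even: update complete

    def read(self):
        while True:
            s = self.seq
            if s & 1:
                continue      # writer mid-update; retry
            stamp = self.stamp
            if self.seq == s: # nothing changed while we copied
                return stamp

ts = SeqTimestamp()
ts.write((1000, 2000))
print(ts.read())  # -> (1000, 2000)
```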
Dan Magenheimer
2008-Aug-04 15:24 UTC
[Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
OK, how about this version. The rendezvous only collects the key per-cpu time data, then sets up a per-cpu 1ms timer to later update the timestamp record and vcpu system time, so neither should have racing issues.

I've only run it for about an hour but still haven't seen any skew over 600nsec, so apparently it is the collection of the key time data that must be closely synchronized (probably to ensure the slope is correct), while exact synchronization of setting the timestamp records is less important.

Note that I'm not positive I got the clocksource=tsc part correct... but I am interested in your opinion on whether clocksource=tsc can now be eliminated anyway (as the main reason I pushed for it was unacceptable skew, which with this patch appears to be fixed).

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>

> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Sunday, August 03, 2008 11:25 AM
> To: dan.magenheimer@oracle.com; Xen-Devel (E-mail)
> Cc: Ian Pratt; Dave Winchell
> Subject: Re: [PATCH] rendezvous-based local time calibration WOW!
>
> It's not safe to poke a new timestamp record from an interrupt handler
> (which is what the smp_call_function() callback functions are).
> Users of the timestamp records (e.g., get_s_time) need
> local_irq_save/restore() or an equivalent of the Linux seqlock.
> [...]
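[Editor's note: the two-phase structure Dan describes -- a tightly synchronized capture followed by a deferred per-CPU apply -- might be sketched like so. This is a userspace illustration; threading.Timer stands in for Xen's per-cpu 1ms timer, and all names are invented.]

```python
import threading
import time

NCPUS = 2
barrier = threading.Barrier(NCPUS)
captured = {}                 # phase 1 output, filled at the rendezvous
applied = {}                  # phase 2 output, filled ~1ms later
done = threading.Semaphore(0)

def apply_stamps(cpu):
    # Deferred phase: runs outside the "interrupt" context, so updating
    # the timestamp record / vcpu system time cannot race the capture.
    applied[cpu] = captured[cpu]
    done.release()

def rendezvous(cpu):
    barrier.wait()                        # all CPUs release together
    captured[cpu] = time.monotonic_ns()   # capture only -- nothing racy
    threading.Timer(0.001, apply_stamps, args=(cpu,)).start()

for i in range(NCPUS):
    threading.Thread(target=rendezvous, args=(i,)).start()
for _ in range(NCPUS):
    done.acquire()            # wait for both deferred updates

print("captured == applied:", captured == applied)
```

The point of the split is that only the capture needs sub-microsecond alignment; the apply just needs to happen soon and safely.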
Keir Fraser
2008-Aug-04 15:36 UTC
[Xen-devel] Re: [PATCH] rendezvous-based local time calibration WOW!
I'll take a look and see if it can be worked out for 3.3.0. It'd be nicer than clocksource=tsc.

 -- Keir

On 4/8/08 16:24, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> OK, how about this version. The rendezvous only collects
> the key per-cpu time data, then sets up a per-cpu 1ms timer
> to later update the timestamp record and vcpu system time,
> so neither should have racing issues.
> [...]
Keir Fraser
2008-Aug-04 17:10 UTC
[Xen-devel] Re: [PATCH] rendezvous-based local time calibration WOW!
Applied as c/s 18229. I rewrote it quite a bit, although the principle remains the same.

 -- Keir

On 4/8/08 16:24, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> OK, how about this version. The rendezvous only collects
> the key per-cpu time data, then sets up a per-cpu 1ms timer
> to later update the timestamp record and vcpu system time,
> so neither should have racing issues.
> [...]
Dan Magenheimer
2008-Aug-04 17:37 UTC
[Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
Looks good to me (and much cleaner). I've booted it and will leave it running for a few hours.

Thanks!
Dan

> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Monday, August 04, 2008 11:10 AM
> To: dan.magenheimer@oracle.com; Xen-Devel (E-mail)
> Cc: Ian Pratt; Dave Winchell
> Subject: Re: [PATCH] rendezvous-based local time calibration WOW!
>
> Applied as c/s 18229. I rewrote it quite a bit, although the principle
> remains the same.
> [...]
Dan Magenheimer
2008-Aug-04 19:40 UTC
[Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
After two hours of constant samples with c/s 18229, max skew is at 251ns! That's 70-150x better than I was measuring just a couple of weeks ago. YMMV, of course.

If you are looking for another marketing-speak bullet for the 4.0 release announcement, you can call this:

* Greatly improved precision for time-sensitive SMP VMs

or, as I am subject to American hyperbole:

* Dramatically improved precision for time-sensitive SMP VMs

Thanks again!
Dan

> -----Original Message-----
> From: Dan Magenheimer [mailto:dan.magenheimer@oracle.com]
> Sent: Monday, August 04, 2008 11:37 AM
> To: 'Keir Fraser'; 'Xen-Devel (E-mail)'
> Cc: 'Ian Pratt'; 'Dave Winchell'
> Subject: RE: [PATCH] rendezvous-based local time calibration WOW!
>
> Looks good to me (and much cleaner). I've booted it and
> will leave it running for a few hours.
> [...]
Keir Fraser
2008-Aug-04 19:47 UTC
[Xen-devel] Re: [PATCH] rendezvous-based local time calibration WOW!
Thanks, Dan! Of course, there are new features since 3.2 that I did not include in my version-number-change announcement email. I'll make a suitably updated list for the actual 4.0 release announcement.

 -- Keir

On 4/8/08 20:40, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> After two hours of constant samples with c/s 18229, max
> skew is at 251ns! That's 70-150x better than I was
> measuring just a couple of weeks ago. YMMV, of course.
> [...]
John Levon
2008-Aug-05 18:56 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On Mon, Aug 04, 2008 at 01:40:06PM -0600, Dan Magenheimer wrote:

> * Greatly improved precision for time-sensitive SMP VMs

I wonder if we could get a more detailed summary of all the changes
that have been made here?

Will this let us stop taking a global lock in our PV time routine to
ensure monotonicity?

regards
john
Dan Magenheimer
2008-Aug-05 20:49 UTC
RE: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
The algorithm used to compute the timestamp information that's passed
up to a PV domain has been reworked to yield much lower inter-CPU skew.
The old algorithm had a worst case of 10us to 40us (depending on how it
was measured). The new algorithm appears to have a sub-microsecond
worst case, though it needs more exposure and hasn't been tested on a
wide variety of boxes. To measure it on your box, in domain0, run the
following (or equivalent) for a few hours:

watch "xm debug-key t; xm dmesg | tail -2"

However, it's still not perfect and so is not guaranteed to be
monotonic across two CPUs, though it might be good enough to be
effectively monotonic in many environments. I'm not sure it's possible
to guarantee monotonicity in PV domains (without a global lock) except
by doing a trap or hypercall at each "get time".

I've thought about implementing softtsc for PV domains for this
reason. (Softtsc was just added at 4.0 for hvm domains and causes all
hvm tsc reads to trap.) Would this be of interest?

> -----Original Message-----
> From: John Levon [mailto:levon@movementarian.org]
> Sent: Tuesday, August 05, 2008 12:57 PM
> To: Dan Magenheimer
> Cc: Keir Fraser; Xen-Devel (E-mail); Ian Pratt; Dave Winchell
> Subject: Re: [Xen-devel] RE: [PATCH] rendezvous-based local time
> calibration WOW!
>
> On Mon, Aug 04, 2008 at 01:40:06PM -0600, Dan Magenheimer wrote:
>
> > * Greatly improved precision for time-sensitive SMP VMs
>
> I wonder if we could get a more detailed summary of all the
> changes that have been made here?
>
> Will this let us stop taking a global lock in our PV time routine to
> ensure monotonicity?
>
> regards
> john
John Levon
2008-Aug-05 21:12 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On Tue, Aug 05, 2008 at 02:49:25PM -0600, Dan Magenheimer wrote:

> The algorithm used to compute the timestamp information

Thanks.

> I'm not sure it's possible to guarantee monotonicity in
> PV domains (without a global lock) except by doing a trap
> or hypercall at each "get time".

That's a shame.

> I've thought about implementing softtsc for PV domains for
> this reason. (Softtsc was just added at 4.0 for hvm domains
> and causes all hvm tsc reads to trap.) Would this be of
> interest?

No, as it would be incredibly slow on Solaris (I dread to imagine).

regards,
john
Dan Magenheimer
2008-Aug-05 21:27 UTC
RE: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
> No, as it would be incredibly slow on Solaris (I dread to imagine).

Could be. On my box (Conroe), trapping tsc in an hvm is faster
than reading pit or hpet in the hypervisor or in a native OS.

> -----Original Message-----
> From: John Levon [mailto:levon@movementarian.org]
> Sent: Tuesday, August 05, 2008 3:13 PM
> To: Dan Magenheimer
> Cc: Ian Pratt; Xen-Devel (E-mail); Dave Winchell; Keir Fraser
> Subject: Re: [Xen-devel] RE: [PATCH] rendezvous-based local time
> calibration WOW!
>
> On Tue, Aug 05, 2008 at 02:49:25PM -0600, Dan Magenheimer wrote:
>
> > The algorithm used to compute the timestamp information
>
> Thanks.
>
> > I'm not sure it's possible to guarantee monotonicity in
> > PV domains (without a global lock) except by doing a trap
> > or hypercall at each "get time".
>
> That's a shame.
>
> > I've thought about implementing softtsc for PV domains for
> > this reason. (Softtsc was just added at 4.0 for hvm domains
> > and causes all hvm tsc reads to trap.) Would this be of
> > interest?
>
> No, as it would be incredibly slow on Solaris (I dread to imagine).
>
> regards,
> john
Keir Fraser
2008-Aug-05 21:43 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On 5/8/08 22:27, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>> No, as it would be incredibly slow on Solaris (I dread to imagine).
>
> Could be. On my box (Conroe), trapping tsc in an hvm is faster
> than reading pit or hpet in the hypervisor or in a native OS.

For a PV guest it only punts the monotonicity problem into the
hypervisor, of course. You still need to access a shared counter, or
use a lock (i.e., communication/synchronisation between processors),
or be guaranteed that local counters (TSCs) are driven by a common
clock signal with negligible skew.

-- Keir
Dan Magenheimer
2008-Aug-06 13:25 UTC
RE: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
> > I'm not sure it's possible to guarantee monotonicity in
> > PV domains (without a global lock) except by doing a trap
> > or hypercall at each "get time".
>
> That's a shame.

Further followup on this...

I'd encourage you to put some test code in your lock to see if time
ever measurably goes backwards. It may never, or it may only on some
ill-behaved-tsc machines or when cpufreq changes occur... needs
testing. Even if it does, it may be by a smaller delta than all but
the most sophisticated SMP applications can detect. Why?...

On my (admittedly well-behaved-tsc) machine, I've now run a
quarter-million samples on the new code. The "xm debug-key t" code now
prints out both stime skew and tsc. The results (TSC scaled for easier
reading):

stime: max 349ns avg 114ns
TSC:   max 342ns avg  89ns

This is a dual-core Conroe so the TSC is supposedly synchronized; the
differences are probably due more to inter-CPU cache synchronization
in the measurement code than to actual skew.

My currently running test code also records the distribution of stime
skew. 99% of the samples are less than 200ns, 0.9% are 200ns-300ns,
and 0.01% are greater than 300ns (and less than the max of 349ns).
This compares to the previous algorithm, in which I measured ~2%
greater than 1us and a few greater than 10us. The old code was also
sensitive to load, with average skew increasing when domains were
busy. The new code should be insensitive to load.

So still no guarantees, but I do think this qualifies as "greatly
improved" and may also meet your needs.
John Levon
2008-Aug-06 13:38 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On Wed, Aug 06, 2008 at 07:25:50AM -0600, Dan Magenheimer wrote:

> > > I'm not sure it's possible to guarantee monotonicity in
> > > PV domains (without a global lock) except by doing a trap
> > > or hypercall at each "get time".
> >
> > That's a shame.
>
> Further followup on this...
>
> I'd encourage you to put some test code in your lock to
> see if time ever measurably goes backwards. It may never,
> or it may only on some ill-behaved-tsc machines or when
> cpufreq changes occur... needs testing. Even if it
> does, it may be by a smaller delta than all but the
> most sophisticated SMP applications can detect.

I believe the normal (metal) Solaris algorithm expects any inter-CPU
TSC differences to remain static (that is, no drift), so any machine
that breaks that is problematic:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/i86pc/os/timestamp.c

(Compare:
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/i86xpv/os/xpv_timestamp.c )

The presumption is that gethrtimef() is monotonically increasing,
which at least Xen 3.0.4 regularly broke. If the hypervisor has been
fixed to give as many guarantees as we got already, then great. A
monotonic gethrtime() is part of the ABI, so I'm not sure we can avoid
a lock even on well-behaved machines if Xen isn't correct.

I wonder if we couldn't do something when we know that we're
scheduling a VCPU onto a different CPU to ensure time can't go
backwards.

Anyway, some more testing sounds like it would be interesting.

regards
john
Dan Magenheimer
2008-Aug-06 15:09 UTC
RE: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
> I wonder if we couldn't do something when we know that we're
> scheduling a VCPU onto a different CPU to ensure time can't go
> backwards.

Again no guarantees, but I think we are now under the magic threshold
where the skew is smaller than the time required for scheduling a VCPU
onto a different CPU. If so, consecutive gethrtime's by the same
thread in a domain should always be monotonic.

The overhead of measuring the inter-CPU stime skew is too large to do
at every cross-PCPU-schedule, so doing any kind of adjustment would be
difficult. But it might make sense for the Xen scheduler to do a
get_s_time() before and after a cross-PCPU-schedule to detect the
problem and printk if it occurs (possibly rate-limited in case it
happens a lot on some badly-behaved machine).

> -----Original Message-----
> From: John Levon [mailto:levon@movementarian.org]
> Sent: Wednesday, August 06, 2008 7:38 AM
> To: Dan Magenheimer
> Cc: Ian Pratt; Xen-Devel (E-mail); Dave Winchell; Keir Fraser
> Subject: Re: [Xen-devel] RE: [PATCH] rendezvous-based local time
> calibration WOW!
>
> On Wed, Aug 06, 2008 at 07:25:50AM -0600, Dan Magenheimer wrote:
>
> > > > I'm not sure it's possible to guarantee monotonicity in
> > > > PV domains (without a global lock) except by doing a trap
> > > > or hypercall at each "get time".
> > >
> > > That's a shame.
> >
> > Further followup on this...
> >
> > I'd encourage you to put some test code in your lock to
> > see if time ever measurably goes backwards. It may never,
> > or it may only on some ill-behaved-tsc machines or when
> > cpufreq changes occur... needs testing. Even if it
> > does, it may be by a smaller delta than all but the
> > most sophisticated SMP applications can detect.
>
> I believe the normal (metal) Solaris algorithm expects any inter-CPU
> TSC differences to remain static (that is, no drift), so any machine
> that breaks that is problematic:
>
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/i86pc/os/timestamp.c
>
> (Compare:
> http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/i86xpv/os/xpv_timestamp.c )
>
> The presumption is that gethrtimef() is monotonically increasing,
> which at least Xen 3.0.4 regularly broke. If the hypervisor has been
> fixed to give as many guarantees as we got already, then great. A
> monotonic gethrtime() is part of the ABI, so I'm not sure we can
> avoid a lock even on well-behaved machines if Xen isn't correct.
>
> I wonder if we couldn't do something when we know that we're
> scheduling a VCPU onto a different CPU to ensure time can't go
> backwards.
>
> Anyway, some more testing sounds like it would be interesting.
>
> regards
> john
John Levon
2008-Aug-06 15:21 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On Wed, Aug 06, 2008 at 09:09:06AM -0600, Dan Magenheimer wrote:

> Again no guarantees but I think we are now under the magic
> threshold where the skew is smaller than the time required
> for scheduling a VCPU onto a different CPU. If so,
> consecutive gethrtime's by the same thread in a domain
> should always be monotonic.

Right! That sounds positive.

> The overhead of measuring the inter-CPU stime skew is
> too large to do at every cross-PCPU-schedule so doing
> any kind of adjustment would be difficult.
> But it might make sense for the Xen scheduler to do a
> get_s_time() before and after a cross-PCPU-schedule
> to detect the problem and printk if it occurs
> (possibly rate-limited in case it happens a lot on
> some badly-behaved machine).

If we're doing a get_s_time() before the schedule, don't we merely*
have to ensure that the new s_time is after the last recorded one on
the previous CPU? (Yes, I'm handwaving terribly.)

regards
john
Dan Magenheimer
2008-Aug-06 15:34 UTC
RE: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
> > The overhead of measuring the inter-CPU stime skew is
> > too large to do at every cross-PCPU-schedule so doing
> > any kind of adjustment would be difficult.
> > But it might make sense for the Xen scheduler to do a
> > get_s_time() before and after a cross-PCPU-schedule
> > to detect the problem and printk if it occurs
> > (possibly rate-limited in case it happens a lot on
> > some badly-behaved machine).
>
> If we're doing a get_s_time() before the schedule, don't we merely*
> have to ensure that the new s_time is after the last recorded one on
> the previous CPU? (Yes, I'm handwaving terribly.)

Yes, that detects the problem so it can be printk'd. But what can be
done to reliably adjust for it? Adding a fixed offset to the new cpu's
stime doesn't work because stime computation is adapted independently
and dynamically on each cpu, so inter-CPU skew "jitters" and adding a
constant may just make the max skew worse.

I'm not saying it can't be done, but I'm pretty sure it will be messy,
so let's make sure it needs to be fixed before trying to fix it.
Nils Nieuwejaar
2008-Aug-09 14:47 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On Wed, Aug 6, 2008 at 11:21 AM, John Levon <levon@movementarian.org> wrote:

> On Wed, Aug 06, 2008 at 09:09:06AM -0600, Dan Magenheimer wrote:
>
>> Again no guarantees but I think we are now under the magic
>> threshold where the skew is smaller than the time required
>> for scheduling a VCPU onto a different CPU. If so,
>> consecutive gethrtime's by the same thread in a domain
>> should always be monotonic.
>
> Right! That sounds positive.

It's an improvement, but I'm pretty sure it's still not sufficient for
Solaris. If I understand the change correctly, it seems to solve the
problem for single-vcpu guests on an SMP, but not for multi-vcpu
guests on an SMP. It sounds like the OS could reschedule a thread from
VCPU 0 to VCPU 1 and consecutive calls to gethrtime() could still
return non-monotonic results.

Nils
Dan Magenheimer
2008-Aug-09 20:55 UTC
RE: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
> On Wed, Aug 6, 2008 at 11:21 AM, John Levon
> <levon@movementarian.org> wrote:
> > On Wed, Aug 06, 2008 at 09:09:06AM -0600, Dan Magenheimer wrote:
> >
> >> Again no guarantees but I think we are now under the magic
> >> threshold where the skew is smaller than the time required
> >> for scheduling a VCPU onto a different CPU. If so,
> >> consecutive gethrtime's by the same thread in a domain
> >> should always be monotonic.
> >
> > Right! That sounds positive.
>
> It's an improvement, but I'm pretty sure it's still not sufficient for
> Solaris. If I understand the change correctly, it seems to solve the
> problem for single-vcpu guests on an SMP, but not for multi-vcpu
> guests on an SMP. It sounds like the OS could reschedule a thread
> from VCPU 0 to VCPU 1 and consecutive calls to gethrtime() could still
> return non-monotonic results.

How long does it take for Solaris to reschedule a thread from VCPU0 to
VCPU1? It's certainly not zero time (and you also need to add the
overhead of gethrtime).

But, yes, the same "no guarantees" applies to this situation... if a
Solaris thread continuously calls gethrtime(), there is a non-zero
probability that, if the thread changes physical CPUs and the thread
rescheduling code is "very fast", two consecutive calls could observe
time going backwards.

But that's true with much recent-vintage hardware, because TSCs
sometimes skew, and so most OS's with high-res timers are able to deal
with this.

True of Solaris, John?

Dan
John Levon
2008-Aug-11 14:37 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On Sat, Aug 09, 2008 at 02:55:33PM -0600, Dan Magenheimer wrote:

> > >> Again no guarantees but I think we are now under the magic
> > >> threshold where the skew is smaller than the time required
> > >> for scheduling a VCPU onto a different CPU. If so,
> > >> consecutive gethrtime's by the same thread in a domain
> > >> should always be monotonic.
> > >
> > > Right! That sounds positive.
> >
> > It's an improvement, but I'm pretty sure it's still not sufficient for
> > Solaris. If I understand the change correctly, it seems to solve the
> > problem for single-vcpu guests on an SMP, but not for multi-vcpu
> > guests on an SMP. It sounds like the OS could reschedule a thread
> > from VCPU 0 to VCPU 1 and consecutive calls to gethrtime() could still
> > return non-monotonic results.
>
> How long does it take for Solaris to reschedule a thread from
> VCPU0 to VCPU1? It's certainly not zero time (and you also need
> to add the overhead of gethrtime).
>
> But, yes, the same "no guarantees" applies to this situation...
> if a Solaris thread continuously calls gethrtime(), there is a
> non-zero probability that, if the thread changes physical CPUs
> and the thread rescheduling code is "very fast",
> two consecutive calls could observe time going backwards.

It's only non-zero if we can indeed reschedule fast enough. If it's
now below the threshold, then we can consider it effectively fixed.
Only testing can really tell us that.

> But that's true with much recent-vintage hardware, because TSCs
> sometimes skew, and so most OS's with high-res timers are able to
> deal with this.
>
> True of Solaris, John?

I'm not an expert on the relevant code, but I believe the solution to
TSC drift (as Solaris calls what I think you call skew) is to set
'tsc_gethrtime_enable' to zero, so we don't use the TSC for this
purpose.

regards
john
Keir Fraser
2008-Aug-11 14:38 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On 11/8/08 15:37, "John Levon" <levon@movementarian.org> wrote:

> It's only non-zero if we can indeed reschedule fast enough. If it's
> now below the threshold, then we can consider it effectively fixed.
> Only testing can really tell us that.

Depending on how critical this guarantee is, I wouldn't rely on Xen to
perform perfectly. Probably you should keep your lock.

-- Keir
John Levon
2008-Aug-11 14:43 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On Mon, Aug 11, 2008 at 03:38:31PM +0100, Keir Fraser wrote:

> > It's only non-zero if we can indeed reschedule fast enough. If it's
> > now below the threshold, then we can consider it effectively fixed.
> > Only testing can really tell us that.
>
> Depending on how critical this guarantee is, I wouldn't rely on Xen to
> perform perfectly. Probably you should keep your lock.

Or maybe make it optional, and let people turn it on when VCPUs are
pinned (VCPU migration doesn't make much sense to me for server
workloads AFAICS).

That lock is *painful*.

john
Keir Fraser
2008-Aug-11 14:46 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On 11/8/08 15:43, "John Levon" <levon@movementarian.org> wrote:

>> Depending on how critical this guarantee is, I wouldn't rely on Xen to
>> perform perfectly. Probably you should keep your lock.
>
> Or maybe make it optional, and let people turn it on when VCPUs are
> pinned (VCPU migration doesn't make much sense to me for server
> workloads AFAICS).
>
> That lock is *painful*.

What guarantee are you providing? Per-thread, per-address-space, or
global monotonicity?

-- Keir
John Levon
2008-Aug-11 14:49 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On Mon, Aug 11, 2008 at 03:46:10PM +0100, Keir Fraser wrote:

> >> Depending on how critical this guarantee is, I wouldn't rely on Xen to
> >> perform perfectly. Probably you should keep your lock.
> >
> > Or maybe make it optional, and let people turn it on when VCPUs are
> > pinned (VCPU migration doesn't make much sense to me for server
> > workloads AFAICS).
> >
> > That lock is *painful*.
>
> What guarantee are you providing? Per-thread, per-address-space, or
> global monotonicity?

Per-thread non-strict monotonicity.

john
Keir Fraser
2008-Aug-11 14:50 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On 11/8/08 15:49, "John Levon" <levon@movementarian.org> wrote:

>> What guarantee are you providing? Per-thread, per-address-space, or
>> global monotonicity?
>
> Per-thread non-strict monotonicity.

Doesn't this just require thread-local storage and no lock?

-- Keir
John Levon
2008-Aug-11 18:41 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On Mon, Aug 11, 2008 at 03:50:57PM +0100, Keir Fraser wrote:

> >> What guarantee are you providing? Per-thread, per-address-space, or
> >> global monotonicity?
> >
> > Per-thread non-strict monotonicity.
>
> Doesn't this just require thread-local storage and no lock?

The above is what we guarantee, but it's not how it's implemented. All
of that is based upon the per-CPU hrtime, so we need the lock (or a
wholesale rework of how hrtime is managed in the Solaris kernel:
that's not going to happen :)

regards
john