Dan Magenheimer
2008-Aug-03  16:50 UTC
[Xen-devel] [PATCH] rendezvous-based local time calibration WOW!
The synchronization of local_time_calibration (l_t_c) via round-to-nearest-epoch provided some improvement, but I was still seeing skew up to 16usec and higher. I measured the temporal distance between the rounded-epoch vs when ltc was actually running to ensure there wasn't some kind of bug and found that l_t_c was running up to 150us after the round-epoch and sometimes up to 50us before. I guess this is the granularity of setting a Xen timer. While it seemed that +/- 100us shouldn't cause that much skew, I finally decided to try synchronization-via-rendezvous, as suggested by Ian here:

http://lists.xensource.com/archives/html/xen-devel/2008-07/msg01074.html
http://lists.xensource.com/archives/html/xen-devel/2008-07/msg01080.html

The result is phenomenal... using this approach (in attached patch), I have yet to see a skew exceed 1usec!!! So this is about a 10-fold increase in accuracy vs the rounded-epoch method and about 20-fold over the one-epoch-from-NOW() method.

The platform time is now read once for all processors rather than once per processor. (Actually, it is read once again in platform_time_calibration()... by "inlining" that routine into master_local_time_calibration() that extra read can be -- and probably should be -- avoided too.)

It may be too late to get this into 3.3.0 but, if so, please consider it asap for 3.3.1 rather than just xen-unstable/3.4.

Dan

==================================
Thanks... for the memory
I really could use more / My throughput's on the floor
The balloon is flat / My swap disk's fat / I've OOM's in store
Overcommitted so much
(with apologies to the late great Bob Hope)

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
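The rendezvous idea in the message above can be illustrated with a small sketch. This is not the attached patch and not Xen's code: the struct, the function names, and the use of a double for the slope are all assumptions made for illustration (a hypervisor would more likely use fixed-point scaling). The point it shows is that every CPU pairs its own TSC sample with the same single platform-time reading, so the per-CPU slopes are computed over an identical interval.

```c
#include <stdint.h>

/* Hypothetical per-CPU calibration record; not Xen's actual structure. */
struct cpu_calib {
    uint64_t tsc_stamp;     /* local TSC value at the last rendezvous */
    uint64_t sys_stamp_ns;  /* shared platform time at the last rendezvous */
    double   ns_per_tick;   /* slope used to extrapolate between rendezvous */
};

/*
 * One rendezvous: tsc_now[i] is the TSC sampled by CPU i at the rendezvous,
 * and platform_now_ns is the single platform-time reading shared by all CPUs.
 */
void rendezvous_calibrate(struct cpu_calib calib[], const uint64_t tsc_now[],
                          uint64_t platform_now_ns, int nr_cpus)
{
    for (int i = 0; i < nr_cpus; i++) {
        uint64_t tsc_delta = tsc_now[i] - calib[i].tsc_stamp;
        uint64_t ns_delta  = platform_now_ns - calib[i].sys_stamp_ns;

        /* Keep the previous slope if no ticks elapsed (avoids dividing by zero). */
        if (tsc_delta != 0)
            calib[i].ns_per_tick = (double)ns_delta / (double)tsc_delta;

        calib[i].tsc_stamp    = tsc_now[i];
        calib[i].sys_stamp_ns = platform_now_ns;
    }
}

/* Extrapolate system time on one CPU from its local TSC reading. */
uint64_t cpu_time_ns(const struct cpu_calib *c, uint64_t tsc)
{
    return c->sys_stamp_ns +
           (uint64_t)((double)(tsc - c->tsc_stamp) * c->ns_per_tick);
}
```

In this model, two CPUs whose TSCs run at the same rate but with different offsets extrapolate to the same time, because both records share one platform timestamp. With per-CPU platform reads taken up to 150us apart, the stamps and slopes would differ slightly from CPU to CPU.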
Keir Fraser
2008-Aug-03  17:24 UTC
[Xen-devel] Re: [PATCH] rendezvous-based local time calibration WOW!
It's not safe to poke a new timestamp record from an interrupt handler (which is what the smp_call_function() callback functions are). Users of the timestamp records (e.g., get_s_time) need local_irq_save/restore() or an equivalent of the Linux seqlock. The latter is likely faster.

I'm dubious about update_vcpu_system_time() from an interrupt handler too. It needs thought about how it might race with a context switch (change of 'current') or if it interrupts an existing invocation of update_vcpu_system_time().

 -- Keir
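For readers unfamiliar with the seqlock pattern Keir refers to, here is a minimal single-writer sketch using C11 atomics. It is an illustration only, not Xen or Linux code; the struct and function names are hypothetical. The writer makes the sequence counter odd while it updates the record, and a reader retries whenever it saw an odd counter or the counter changed during its read.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical timestamp record protected by a sequence counter. */
struct stamp_rec {
    atomic_uint seq;                /* odd while an update is in progress */
    _Atomic uint64_t tsc_stamp;
    _Atomic uint64_t sys_stamp_ns;
};

/* Writer side. Assumes exactly one writer per record. */
void stamp_write(struct stamp_rec *r, uint64_t tsc, uint64_t ns)
{
    unsigned s = atomic_load_explicit(&r->seq, memory_order_relaxed);

    atomic_store_explicit(&r->seq, s + 1, memory_order_relaxed);  /* now odd */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&r->tsc_stamp, tsc, memory_order_relaxed);
    atomic_store_explicit(&r->sys_stamp_ns, ns, memory_order_relaxed);
    atomic_store_explicit(&r->seq, s + 2, memory_order_release);  /* even again */
}

/* Reader side: loops until it obtains a consistent pair of fields. */
void stamp_read(struct stamp_rec *r, uint64_t *tsc, uint64_t *ns)
{
    unsigned s1, s2;

    do {
        s1 = atomic_load_explicit(&r->seq, memory_order_acquire);
        *tsc = atomic_load_explicit(&r->tsc_stamp, memory_order_relaxed);
        *ns  = atomic_load_explicit(&r->sys_stamp_ns, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);
        s2 = atomic_load_explicit(&r->seq, memory_order_relaxed);
    } while (s1 != s2 || (s1 & 1u));
}
```

This fits the case Keir describes, where the writer is the interrupt handler: a reader that gets interrupted mid-read notices the changed counter and retries. The reverse arrangement, a reader running in an interrupt that preempts the writer on the same CPU, would spin forever on the odd counter, so the pattern only works if that cannot happen.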
Dan Magenheimer
2008-Aug-04  15:24 UTC
[Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
OK, how about this version. The rendezvous only collects the key per-cpu time data then sets up a per-cpu 1ms timer to later update the timestamp record and vcpu system time, so neither should have racing issues.

I've only run it for about an hour but still haven't seen any skew over 600nsec so apparently it is the collection of the key time data that must be closely synchronized (probably to ensure the slope is correct) while exact synchronization of setting the timestamp records is less important.

Note that I'm not positive I got the clocksource=tsc part correct... but am interested in your opinion on whether clocksource=tsc can now be eliminated anyway (as the main reason I pushed for it was because of unacceptable skew which with this patch appears to be fixed).

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
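The two-phase structure described in this message (collect in the rendezvous, apply later from a timer) might be sketched as follows. Every name here is hypothetical and the timer itself is not modelled; the sketch only separates the step that is safe in interrupt context (storing raw samples) from the step that modifies the record readers depend on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical raw sample captured at the rendezvous. */
struct calib_sample {
    uint64_t tsc;          /* local TSC at the rendezvous */
    uint64_t platform_ns;  /* shared platform time at the rendezvous */
    bool     pending;      /* set by phase 1, cleared by phase 2 */
};

/* Hypothetical timestamp record consumed by time readers. */
struct time_record {
    uint64_t tsc_stamp;
    uint64_t sys_stamp_ns;
    unsigned updates;      /* counts applied calibrations */
};

/* Phase 1: runs in the rendezvous (interrupt context). Stores only. */
void rendezvous_collect(struct calib_sample *s, uint64_t tsc,
                        uint64_t platform_ns)
{
    s->tsc = tsc;
    s->platform_ns = platform_ns;
    s->pending = true;
}

/*
 * Phase 2: runs later from the per-CPU timer, outside the interrupt
 * handler. Returns true if a pending sample was applied.
 */
bool deferred_apply(struct time_record *rec, struct calib_sample *s)
{
    if (!s->pending)
        return false;

    rec->tsc_stamp    = s->tsc;
    rec->sys_stamp_ns = s->platform_ns;
    rec->updates++;
    s->pending = false;
    return true;
}
```

The record is stamped with the values captured at the rendezvous, not with the time at which the timer fires, which is consistent with the observation above that the collection must be closely synchronized while the exact moment of the update matters less.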
Keir Fraser
2008-Aug-04  15:36 UTC
[Xen-devel] Re: [PATCH] rendezvous-based local time calibration WOW!
I'll take a look and see if it can be worked out for 3.3.0. It'd be nicer than clocksource=tsc.

 -- Keir
Keir Fraser
2008-Aug-04  17:10 UTC
[Xen-devel] Re: [PATCH] rendezvous-based local time calibration WOW!
Applied as c/s 18229. I rewrote it quite a bit, although the principle remains the same.

 -- Keir
Dan Magenheimer
2008-Aug-04  17:37 UTC
[Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
Looks good to me (and much cleaner). I've booted it and will leave it running for a few hours.

Thanks!
Dan
Dan Magenheimer
2008-Aug-04  19:40 UTC
[Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
After two hours of constant samples with c/s 18229, max skew is at 251ns! That's 70-150x better than I was measuring just a couple of weeks ago. YMMV of course.

If you are looking for another marketing-speak bullet for the 4.0 release announcement, you can call this:

* Greatly improved precision for time-sensitive SMP VMs

or as I am subject to American hyperbole:

* Dramatically improved precision for time-sensitive SMP VMs

Thanks again!
Dan
Keir Fraser
2008-Aug-04  19:47 UTC
[Xen-devel] Re: [PATCH] rendezvous-based local time calibration WOW!
Thanks, Dan! Of course, there are new features since 3.2 that I did not include in my version-number-change announcement email. I'll make a suitably updated list for the actual 4.0 release announcement.

 -- Keir
John Levon
2008-Aug-05  18:56 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On Mon, Aug 04, 2008 at 01:40:06PM -0600, Dan Magenheimer wrote:

> * Greatly improved precision for time-sensitive SMP VMs

I wonder if we could get a more detailed summary of all the changes that have been made here?

Will this let us stop taking a global lock in our PV time routine to ensure monotonicity?

regards
john
Dan Magenheimer
2008-Aug-05  20:49 UTC
RE: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
The algorithm used to compute the timestamp information that's passed up to a PV domain has been reworked to give a much lower inter-CPU skew. The old algorithm had a worst case of 10us to 40us (depending on how it was measured). The new algorithm appears to have a sub-microsecond worst case, though it needs more exposure and hasn't been tested on a wide variety of boxes. To measure it on your box, run the following (or equivalent) in domain0 for a few hours:

    watch "xm debug-key t; xm dmesg | tail -2"

However, it's still not perfect and so is not guaranteed to be monotonic across two CPUs, though it might be good enough to be effectively monotonic in many environments.

I'm not sure it's possible to guarantee monotonicity in PV domains (without a global lock) except by doing a trap or hypercall at each "get time". I've thought about implementing softtsc for PV domains for this reason. (Softtsc was just added at 4.0 for hvm domains and causes all hvm tsc reads to trap.) Would this be of interest?
John Levon
2008-Aug-05  21:12 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On Tue, Aug 05, 2008 at 02:49:25PM -0600, Dan Magenheimer wrote:

> The algorithm used to compute the timestamp information

Thanks.

> I'm not sure it's possible to guarantee monotonicity in
> PV domains (without a global lock) except by doing a trap
> or hypercall at each "get time".

That's a shame.

> I've thought about implementing softtsc for PV domains for
> this reason. (Softtsc was just added at 4.0 for hvm domains
> and causes all hvm tsc reads to trap.) Would this be of
> interest?

No, as it would be incredibly slow on Solaris (I dread to imagine).

regards,
john
Dan Magenheimer
2008-Aug-05  21:27 UTC
RE: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
> No, as it would be incredibly slow on Solaris (I dread to imagine).

Could be. On my box (Conroe), trapping tsc in an hvm is faster than reading pit or hpet in the hypervisor or in a native OS.
Keir Fraser
2008-Aug-05  21:43 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On 5/8/08 22:27, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

>> No, as it would be incredibly slow on Solaris (I dread to imagine).
>
> Could be. On my box (Conroe), trapping tsc in an hvm is faster
> than reading pit or hpet in the hypervisor or in a native OS.

For a PV guest it only punts the monotonicity problem into the hypervisor of course. You still need to access a shared counter, or use a lock (i.e., communication/synchronisation between processors), or be guaranteed that local counters (TSCs) are driven by a common clock signal with negligible skew.

-- Keir
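As a rough illustration of the "use a lock" option mentioned above, here is a minimal Python model (not Xen or Solaris code) of clamping readings from skewed per-CPU clocks to a shared high-water mark. The class name, the `read()` method and the sample readings are all hypothetical, chosen only for this sketch.

```python
import threading


class GloballyMonotonicClock:
    """Clamp readings from skewed per-CPU clocks to a shared high-water mark."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last_ns = 0

    def read(self, local_ns):
        # local_ns is the reading from whichever (possibly skewed) per-CPU
        # clock the caller is running on. The lock is the cross-CPU
        # synchronisation that makes the result globally non-decreasing.
        with self._lock:
            if local_ns > self._last_ns:
                self._last_ns = local_ns
            return self._last_ns


clock = GloballyMonotonicClock()
# Hypothetical readings: the third comes from a CPU that is 300 ns behind.
readings = [clock.read(t) for t in (1000, 1500, 1200, 1600)]
```

Every caller on every CPU contends for the same lock, which is the cost being discussed in this thread.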
Dan Magenheimer
2008-Aug-06  13:25 UTC
RE: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
> > I'm not sure it's possible to guarantee monotonicity in
> > PV domains (without a global lock) except by doing a trap
> > or hypercall at each "get time".
>
> That's a shame.

Further followup on this...

I'd encourage you to put some test code in your lock to see if time ever measurably goes backwards. It may never, or it may only on some ill-behaved-tsc machines or when cpufreq changes occur... needs testing. Even if it does, it may be by a smaller delta than all but the most sophisticated SMP applications can detect. Why?...

On my (admittedly well-behaved-tsc) machine, I've now run a quarter-million samples on the new code. The "xm debug-key t" code now prints out both stime skew and tsc. The results (TSC scaled for easier reading):

    stime: max 349ns avg 114ns
    TSC:   max 342ns avg 89ns

This is a dual-core Conroe so the TSC is supposedly synchronized; so the differences are probably more due to inter-CPU cache synchronization in the measurement code than actual skew.

My currently running test code also records distribution for stime skew. 99% of the samples are less than 200ns, 0.9% are 200ns-300ns, and 0.01% are greater than 300ns (and less than the max of 349ns).

This compares to the previous algorithm in which I measured ~2% greater than 1us and a few greater than 10us. The old code was also sensitive to load, with average skew increasing when domains were busy. The new code should be insensitive to load.

So still no guarantees, but I do think this qualifies as "greatly improved" and may also meet your needs.
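For anyone wanting to produce the same kind of summary from their own skew samples, a small Python sketch of the bucketing follows. The function name is hypothetical, the bucket edges follow the 200ns/300ns boundaries quoted above, and the sample values are synthetic placeholders, not measurements.

```python
def summarize_skew(samples_ns, edges_ns=(200, 300)):
    """Return max, average and bucket fractions for a list of skew samples."""
    low, high = edges_ns
    n = len(samples_ns)
    under = sum(1 for s in samples_ns if s < low)
    middle = sum(1 for s in samples_ns if low <= s < high)
    over = n - under - middle
    return {
        "max_ns": max(samples_ns),
        "avg_ns": sum(samples_ns) / n,
        "frac_under_low": under / n,
        "frac_middle": middle / n,
        "frac_over_high": over / n,
    }


# Synthetic samples for illustration only; real input would be the skew
# values printed by the debug key.
summary = summarize_skew([50, 120, 180, 199, 210, 290, 310, 349])
```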
John Levon
2008-Aug-06  13:38 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On Wed, Aug 06, 2008 at 07:25:50AM -0600, Dan Magenheimer wrote:

> > > I'm not sure it's possible to guarantee monotonicity in
> > > PV domains (without a global lock) except by doing a trap
> > > or hypercall at each "get time".
> >
> > That's a shame.
>
> Further followup on this...
>
> I'd encourage you to put some test code in your lock to
> see if time ever measurably goes backwards. It may never,
> or it may only on some ill-behaved-tsc machines or when
> cpufreq changes occur... needs testing. Even if it
> does, it may be by a smaller delta than all but the
> most sophisticated SMP applications can detect.

I believe the normal (metal) Solaris algorithm expects any inter-CPU TSC differences to remain static (that is, no drift), so any machine that breaks that is problematic:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/i86pc/os/timestamp.c

(Compare:

http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/i86xpv/os/xpv_timestamp.c
)

The presumption is that gethrtimef() is monotonically increasing, which at least Xen 3.0.4 regularly broke. If the hypervisor has been fixed to give as many guarantees as we got already then great. A monotonic gethrtime() is part of the ABI so I'm not sure we can avoid a lock even on well-behaved machines if Xen isn't correct.

I wonder if we couldn't do something when we know that we're scheduling a VCPU onto a different CPU to ensure time can't go backwards.

Anyway, some more testing sounds like it would be interesting.

regards
john
Dan Magenheimer
2008-Aug-06  15:09 UTC
RE: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
> I wonder if we couldn't do something when we know that we're scheduling
> a VCPU onto a different CPU to ensure time can't go backwards.

Again no guarantees but I think we are now under the magic threshold where the skew is smaller than the time required for scheduling a VCPU onto a different CPU. If so, consecutive gethrtime's by the same thread in a domain should always be monotonic.

The overhead of measuring the inter-CPU stime skew is too large to do at every cross-PCPU-schedule so doing any kind of adjustment would be difficult. But it might make sense for the Xen scheduler to do a get_s_time() before and after a cross-PCPU-schedule to detect the problem and printk if it occurs (possibly rate-limited in case it happens a lot on some badly-behaved machine).
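The threshold argument can be written down as a tiny model. This is a Python sketch with hypothetical function names: a thread reads the clock on one CPU, is moved, and reads again on another whose clock lags by some skew. The 349ns and 16us skews are figures quoted in this thread; the 2000ns migration latency is purely an assumed value for illustration.

```python
def observed_delta_ns(skew_ns, migrate_ns):
    """Difference between two consecutive readings by one thread that is
    moved from CPU A to CPU B in between.

    skew_ns: how far CPU B's clock lags CPU A's (positive means B is behind).
    migrate_ns: real time that elapses between the two readings.
    """
    return migrate_ns - skew_ns


def can_go_backwards(max_skew_ns, min_migrate_ns):
    # Time appears to go backwards only when the skew exceeds the real time
    # that passed while the thread was being moved.
    return observed_delta_ns(max_skew_ns, min_migrate_ns) < 0


new_algorithm = can_go_backwards(max_skew_ns=349, min_migrate_ns=2000)
old_algorithm = can_go_backwards(max_skew_ns=16000, min_migrate_ns=2000)
```

Under these assumed numbers only the older, larger skew could be observed as time going backwards; whether a real migration is ever faster than the real skew is exactly what needs testing.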
John Levon
2008-Aug-06  15:21 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On Wed, Aug 06, 2008 at 09:09:06AM -0600, Dan Magenheimer wrote:

> Again no guarantees but I think we are now under the magic
> threshold where the skew is smaller than the time required
> for scheduling a VCPU onto a different CPU. If so,
> consecutive gethrtime's by the same thread in a domain
> should always be monotonic.

Right! That sounds positive.

> The overhead of measuring the inter-CPU stime skew is
> too large to do at every cross-PCPU-schedule so doing
> any kind of adjustment would be difficult.
> But it might make sense for the Xen scheduler to do a
> get_s_time() before and after a cross-PCPU-schedule
> to detect the problem and printk if it occurs
> (possibly rate-limited in case it happens a lot on
> some badly-behaved machine).

If we're doing a get_s_time() before the schedule, don't we merely* have to ensure that the new s_time is after the last recorded one on the previous CPU? (Yes, I'm handwaving terribly)

regards
john
Dan Magenheimer
2008-Aug-06  15:34 UTC
RE: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
> > The overhead of measuring the inter-CPU stime skew is
> > too large to do at every cross-PCPU-schedule so doing
> > any kind of adjustment would be difficult.
> > But it might make sense for the Xen scheduler to do a
> > get_s_time() before and after a cross-PCPU-schedule
> > to detect the problem and printk if it occurs
> > (possibly rate-limited in case it happens a lot on
> > some badly-behaved machine).
>
> If we're doing a get_s_time() before the schedule, don't we
> merely* have
> to ensure that the new s_time is after the last recorded one on the
> previous CPU? (Yes, I'm handwaving terribly)

Yes, that detects the problem so it can be printk'd. But what can be done to reliably adjust for it? Adding a fixed offset to the new cpu's stime doesn't work because stime computation is adapted independently and dynamically on each cpu, so inter-CPU skew "jitters" and adding a constant may just make the max skew worse.

I'm not saying it can't be done, but I'm pretty sure it will be messy, so let's make sure it needs to be fixed before trying to fix it.
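The detect-and-report idea (as opposed to adjusting) might look roughly like the following Python model. It is only a sketch: the class, its fields and the "report one in N" rate limit are hypothetical, and the two timestamps stand in for system-time reads taken on the old and new CPU around a cross-CPU schedule.

```python
class BackwardsTimeDetector:
    """Count, and occasionally report, time going backwards across a migration."""

    def __init__(self, report_every=100):
        self.report_every = report_every  # report 1 in N events (rate limit)
        self.events = 0
        self.reports = []                 # stands in for rate-limited logging

    def check(self, before_ns, after_ns):
        # before_ns: system time read on the old CPU just before the move.
        # after_ns: system time read on the new CPU just after the move.
        if after_ns >= before_ns:
            return False
        self.events += 1
        if (self.events - 1) % self.report_every == 0:
            self.reports.append(before_ns - after_ns)
        return True


detector = BackwardsTimeDetector(report_every=2)
results = [
    detector.check(1000, 1100),  # fine: time moved forwards
    detector.check(2000, 1900),  # backwards by 100 ns, reported
    detector.check(3000, 2950),  # backwards by 50 ns, suppressed
    detector.check(4000, 3800),  # backwards by 200 ns, reported
]
```

This only observes the problem; it does nothing to correct it, which is the harder part.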
Nils Nieuwejaar
2008-Aug-09  14:47 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On Wed, Aug 6, 2008 at 11:21 AM, John Levon <levon@movementarian.org> wrote:

> On Wed, Aug 06, 2008 at 09:09:06AM -0600, Dan Magenheimer wrote:
>
>> Again no guarantees but I think we are now under the magic
>> threshold where the skew is smaller than the time required
>> for scheduling a VCPU onto a different CPU. If so,
>> consecutive gethrtime's by the same thread in a domain
>> should always be monotonic.
>
> Right! That sounds positive.

It's an improvement, but I'm pretty sure it's still not sufficient for Solaris. If I understand the change correctly, it seems to solve the problem for single-vcpu guests on an SMP, but not for multi-vcpu guests on an SMP. It sounds like the OS could reschedule a thread from VCPU 0 to VCPU 1 and consecutive calls to gethrtime() could still return non-monotonic results.

Nils
Dan Magenheimer
2008-Aug-09  20:55 UTC
RE: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
> On Wed, Aug 6, 2008 at 11:21 AM, John Levon
> <levon@movementarian.org> wrote:
> > On Wed, Aug 06, 2008 at 09:09:06AM -0600, Dan Magenheimer wrote:
> >
> >> Again no guarantees but I think we are now under the magic
> >> threshold where the skew is smaller than the time required
> >> for scheduling a VCPU onto a different CPU. If so,
> >> consecutive gethrtime's by the same thread in a domain
> >> should always be monotonic.
> >
> > Right! That sounds positive.
>
> It's an improvement, but I'm pretty sure it's still not sufficient for
> Solaris. If I understand the change correctly, it seems to solve the
> problem for single-vcpu guests on an SMP, but not for multi-vcpu
> guests on an SMP. It sounds like the OS could reschedule a thread
> from VCPU 0 to VCPU 1 and consecutive calls to gethrtime() could still
> return non-monotonic results.

How long does it take for Solaris to reschedule a thread from VCPU0 to VCPU1? It's certainly not zero time (and you also need to add the overhead of gethrtime).

But, yes, the same "no guarantees" applies to this situation... if a Solaris thread continuously calls gethrtime(), there is a non-zero probability that, if the thread changes physical CPUs and the thread rescheduling code is "very fast", two consecutive calls could observe time going backwards.

But that's true with much recent vintage hardware because TSCs sometimes skew, and so most OS's with high-res timers are able to deal with this.

True of Solaris, John?

Dan
John Levon
2008-Aug-11  14:37 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On Sat, Aug 09, 2008 at 02:55:33PM -0600, Dan Magenheimer wrote:

> > >> Again no guarantees but I think we are now under the magic
> > >> threshold where the skew is smaller than the time required
> > >> for scheduling a VCPU onto a different CPU. If so,
> > >> consecutive gethrtime's by the same thread in a domain
> > >> should always be monotonic.
> > >
> > > Right! That sounds positive.
> >
> > It's an improvement, but I'm pretty sure it's still not sufficient for
> > Solaris. If I understand the change correctly, it seems to solve the
> > problem for single-vcpu guests on an SMP, but not for multi-vcpu
> > guests on an SMP. It sounds like the OS could reschedule a thread
> > from VCPU 0 to VCPU 1 and consecutive calls to gethrtime() could still
> > return non-monotonic results.
>
> How long does it take for Solaris to reschedule a thread from
> VCPU0 to VCPU1? It's certainly not zero time (and you also need
> to add the overhead of gethrtime).
>
> But, yes, the same "no guarantees" applies to this situation...
> if a Solaris thread continuously calls gethrtime(), there is a
> non-zero probability that, if the thread changes physical CPUs
> and the thread rescheduling code is "very fast",
> two consecutive calls could observe time going backwards.

It's only non-zero if we can indeed reschedule fast enough. If it's now below the threshold, then we can consider it effectively fixed. Only testing can really tell us that.

> But that's true with much recent vintage hardware because TSCs
> sometimes skew, and so most OS's with high-res timers are able to
> deal with this.
>
> True of Solaris, John?

I'm not an expert on the relevant code, but I believe the solution to TSC drift (as Solaris calls what I think you call skew) is to set 'tsc_gethrtime_enable' to zero, so we don't use the TSC for this purpose.

regards
john
Keir Fraser
2008-Aug-11  14:38 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On 11/8/08 15:37, "John Levon" <levon@movementarian.org> wrote:

> It's only non-zero if we can indeed reschedule fast enough. If it's now
> below the threshold, then we can consider it effectively fixed. Only
> testing can really tell us that.

Depending on how critical this guarantee is, I wouldn't rely on Xen to perform perfectly. Probably you should keep your lock.

-- Keir
John Levon
2008-Aug-11  14:43 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On Mon, Aug 11, 2008 at 03:38:31PM +0100, Keir Fraser wrote:

> > It's only non-zero if we can indeed reschedule fast enough. If it's now
> > below the threshold, then we can consider it effectively fixed. Only
> > testing can really tell us that.
>
> Depending on how critical this guarantee is, I wouldn't rely on Xen to
> perform perfectly. Probably you should keep your lock.

Or maybe make it optional, and let people turn it on when VCPUs are pinned (VCPU migration doesn't make much sense to me for server workloads AFAICS).

That lock is *painful*.

john
Keir Fraser
2008-Aug-11  14:46 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On 11/8/08 15:43, "John Levon" <levon@movementarian.org> wrote:

>> Depending on how critical this guarantee is, I wouldn't rely on Xen to
>> perform perfectly. Probably you should keep your lock.
>
> Or maybe make it optional, and let people turn it on when VCPUs are
> pinned (VCPU migration doesn't make much sense to me for server
> workloads AFAICS).
>
> That lock is *painful*.

What guarantee are you providing? Per thread, per address space, or global monotonicity?

-- Keir
John Levon
2008-Aug-11  14:49 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On Mon, Aug 11, 2008 at 03:46:10PM +0100, Keir Fraser wrote:

> >> Depending on how critical this guarantee is, I wouldn't rely on Xen to
> >> perform perfectly. Probably you should keep your lock.
> >
> > Or maybe make it optional, and let people turn it on when VCPUs are
> > pinned (VCPU migration doesn't make much sense to me for server
> > workloads AFAICS).
> >
> > That lock is *painful*.
>
> What guarantee are you providing? Per thread, per address space, or global
> monotonicity?

Per thread non-strict monotonicity.

john
Keir Fraser
2008-Aug-11  14:50 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On 11/8/08 15:49, "John Levon" <levon@movementarian.org> wrote:

>> What guarantee are you providing? Per thread, per address space, or global
>> monotonicity?
>
> Per thread non-strict monotonicity.

Doesn't this just require thread-local storage and no lock?

-- Keir
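A minimal Python sketch of what the thread-local suggestion could mean in principle: each thread remembers the last value it was given and never returns less. This illustrates the idea only and is not how Solaris implements gethrtime(); the function name and the readings are hypothetical.

```python
import threading

_tls = threading.local()


def thread_monotonic(raw_ns):
    """Return a per-thread non-decreasing view of raw_ns with no shared lock."""
    last = getattr(_tls, "last_ns", 0)
    if raw_ns > last:
        last = raw_ns
    _tls.last_ns = last  # visible only to the calling thread
    return last


# Hypothetical readings: the third is taken after moving to a CPU whose
# clock is 50 ns behind, so it is clamped to the previous value.
seen = [thread_monotonic(t) for t in (500, 700, 650, 900)]

# A second thread keeps its own high-water mark and is not affected.
other = []
worker = threading.Thread(target=lambda: other.append(thread_monotonic(100)))
worker.start()
worker.join()
```

The result is non-strict (repeated values are possible) and per thread only; two different threads can still disagree about ordering.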
John Levon
2008-Aug-11  18:41 UTC
Re: [Xen-devel] RE: [PATCH] rendezvous-based local time calibration WOW!
On Mon, Aug 11, 2008 at 03:50:57PM +0100, Keir Fraser wrote:

> >> What guarantee are you providing? Per thread, per address space, or global
> >> monotonicity?
> >
> > Per thread non-strict monotonicity.
>
> Doesn't this just require thread-local storage and no lock?

The above is what we guarantee but it's not how it's implemented. All of that is based upon the per-CPU hrtime, so we need the lock (or a wholesale rework of how hrtime is managed in the Solaris kernel: that's not going to happen :)

regards
john