David Vrabel
2013-May-30 14:25 UTC
[PATCH 1/2] x86/xen: sync the wallclock when the system time changes
From: David Vrabel <david.vrabel@citrix.com> Currently the Xen wallclock is only updated every 11 minutes if NTP is synchronized to its clock source. If a guest is started before NTP is synchronized it may see an incorrect wallclock time. Use the pvclock_gtod notifier chain to receive a notification when the system time has changed and update the wallclock to match. This chain is called on every timer tick and we want to avoid an extra (expensive) hypercall on every tick. Because dom0 has historically never provided a very accurate wallclock and guests do not expect one, we can do this simply. The wallclock is only updated if the difference between now and the last update is more than 0.5 s. Signed-off-by: David Vrabel <david.vrabel@citrix.com> --- arch/x86/xen/time.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ 1 files changed, 46 insertions(+), 0 deletions(-) diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c index a1947ac..60b7d1f 100644 --- a/arch/x86/xen/time.c +++ b/arch/x86/xen/time.c @@ -14,6 +14,7 @@ #include <linux/kernel_stat.h> #include <linux/math64.h> #include <linux/gfp.h> +#include <linux/pvclock_gtod.h> #include <asm/pvclock.h> #include <asm/xen/hypervisor.h> @@ -212,6 +213,48 @@ static int xen_set_wallclock(const struct timespec *now) return HYPERVISOR_dom0_op(&op); } +static int xen_pvclock_gtod_notify(struct notifier_block *nb, unsigned long unused, + void *priv) +{ + static struct timespec last, next; + struct timespec now; + struct xen_platform_op op; + int ret; + + /* + * Set the Xen wallclock from Linux system time. + * + * dom0 hasn''t historically maintained a very accurate + * wallclock so guests don''t expect it. We can therefore + * reduce the number of expensive hypercalls by only updating + * the wallclock every 0.5 s. + */ + + now = __current_kernel_time(); + + if (timespec_compare(&now, &last) > 0 + && timespec_compare(&now, &next) < 0) + return 0; + + op.cmd = XENPF_settime; + op.u.settime.secs = now.tv_sec; + op.u.settime.nsecs = now.tv_nsec; + op.u.settime.system_time = xen_clocksource_read(); + + ret = HYPERVISOR_dom0_op(&op); + if (ret) + return 0; + + last = now; + next = timespec_add(now, ns_to_timespec(NSEC_PER_SEC / 2)); + + return 0; +} + +static struct notifier_block xen_pvclock_gtod_notifier = { + .notifier_call = xen_pvclock_gtod_notify, +}; + static struct clocksource xen_clocksource __read_mostly = { .name = "xen", .rating = 400, @@ -473,6 +516,9 @@ static void __init xen_time_init(void) xen_setup_runstate_info(cpu); xen_setup_timer(cpu); xen_setup_cpu_clockevents(); + + if (xen_initial_domain()) + pvclock_gtod_register_notifier(&xen_pvclock_gtod_notifier); } void __init xen_init_time_ops(void) -- 1.7.2.5
John Stultz
2013-May-31 00:30 UTC
Re: [PATCH 1/2] x86/xen: sync the wallclock when the system time changes
On 05/30/2013 07:25 AM, David Vrabel wrote:> From: David Vrabel <david.vrabel@citrix.com> > > Currently the Xen wallclock is only updated every 11 minutes if NTP is > synchronized to its clock source. If a guest is started before NTP is > synchronized it may see an incorrect wallclock time.Ok.. So this is maybe starting to make a little more sense. I was under the mistaken impression domN guests referenced dom0''s system time when they called xen_get_wallclock(), but looking at this a bit closer, its starting to make a bit more sense that xen_get_wallclock() is just shared hypervisor data that is updated by dom0, and guests can access this data without interacting with dom0. Thus I can finally see the 11 minute update interval for dom0 to update the hypervisor wallclock data causes guests to get invalid time values when they initialize, reading the unset by dom0 hypervisor wallclock data. And thus I finally see the need for dom0 to more frequently update the hypervisor wallclock data. So first, sorry for being so dense through all of this. But this really could use a clearer explanation as to the nature of the issue. Few minor issues below.> Use the pvclock_gtod notifier chain to receive a notification when the > system time has changed and update the wallclock to match. > > This chain is called on every timer tick and we want to avoid an extra > (expensive) hypercall on every tick. Because dom0 has historically > never provided a very accurate wallclock and guests do not expect one, > we can do this simply. The wallclock is only updated if the > difference between now and the last update is more than 0.5 s.So given (at least I think ) I get why this is needed, is there a reason you''re using the notifier chain instead of a regular timer in dom0 to update the hypervisor''s wallclock data?> > Signed-off-by: David Vrabel <david.vrabel@citrix.com> > --- > arch/x86/xen/time.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++ > 1 files changed, 46 insertions(+), 0 deletions(-) > > diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c > index a1947ac..60b7d1f 100644 > --- a/arch/x86/xen/time.c > +++ b/arch/x86/xen/time.c > @@ -14,6 +14,7 @@ > #include <linux/kernel_stat.h> > #include <linux/math64.h> > #include <linux/gfp.h> > +#include <linux/pvclock_gtod.h> > > #include <asm/pvclock.h> > #include <asm/xen/hypervisor.h> > @@ -212,6 +213,48 @@ static int xen_set_wallclock(const struct timespec *now) > return HYPERVISOR_dom0_op(&op); > } > > +static int xen_pvclock_gtod_notify(struct notifier_block *nb, unsigned long unused, > + void *priv) > +{ > + static struct timespec last, next; > + struct timespec now; > + struct xen_platform_op op; > + int ret; > + > + /* > + * Set the Xen wallclock from Linux system time. > + * > + * dom0 hasn''t historically maintained a very accurate > + * wallclock so guests don''t expect it. We can therefore > + * reduce the number of expensive hypercalls by only updating > + * the wallclock every 0.5 s.This comment needs some improvement. It doesn''t explain why Xen needs to set the virtual RTC so frequently, but then goes on to say it can be done every half second because guests don''t really expect it anyway.> + */ > + > + now = __current_kernel_time();You don''t seem to be holding the timekeeping lock here, so why are you calling the internal __ prefixed current_kernel_time() accessor?> + > + if (timespec_compare(&now, &last) > 0Not sure I understand why you''re bothering with the last value? Aren''t you just wanting to trigger when now is greater then next? So again, apologies for some of the runaround in the discussion! Lets sort out the above minor issues and I''ll be fine to queue this (given Xen maintainer acks) without grumbling. thanks -john
David Vrabel
2013-May-31 09:49 UTC
Re: [PATCH 1/2] x86/xen: sync the wallclock when the system time changes
On 31/05/13 01:30, John Stultz wrote:> On 05/30/2013 07:25 AM, David Vrabel wrote: >> From: David Vrabel <david.vrabel@citrix.com> >> >> Currently the Xen wallclock is only updated every 11 minutes if NTP is >> synchronized to its clock source. If a guest is started before NTP is >> synchronized it may see an incorrect wallclock time. > > Ok.. So this is maybe starting to make a little more sense. > > I was under the mistaken impression domN guests referenced dom0''s system > time when they called xen_get_wallclock(), but looking at this a bit > closer, its starting to make a bit more sense that xen_get_wallclock() > is just shared hypervisor data that is updated by dom0, and guests can > access this data without interacting with dom0. > > Thus I can finally see the 11 minute update interval for dom0 to update > the hypervisor wallclock data causes guests to get invalid time values > when they initialize, reading the unset by dom0 hypervisor wallclock > data. And thus I finally see the need for dom0 to more frequently update > the hypervisor wallclock data.This is correct. I''ll add an explanatory paragraph about the Xen wallclock to the changelog.>> Use the pvclock_gtod notifier chain to receive a notification when the >> system time has changed and update the wallclock to match. >> >> This chain is called on every timer tick and we want to avoid an extra >> (expensive) hypercall on every tick. Because dom0 has historically >> never provided a very accurate wallclock and guests do not expect one, >> we can do this simply. The wallclock is only updated if the >> difference between now and the last update is more than 0.5 s. > > > So given (at least I think ) I get why this is needed, is there a reason > you''re using the notifier chain instead of a regular timer in dom0 to > update the hypervisor''s wallclock data?Using the notifier allows step changes to be noticed immediately, using just a timer would leave a window after any step change where the Xen wallclock is wrong. Ideally, I would like a notification of step changes and a long period timer (to correct for drift). Can you think of a good way to do this?>> --- a/arch/x86/xen/time.c >> +++ b/arch/x86/xen/time.c >> @@ -212,6 +213,48 @@ static int xen_set_wallclock(const struct >> timespec *now) >> return HYPERVISOR_dom0_op(&op); >> } >> +static int xen_pvclock_gtod_notify(struct notifier_block *nb, >> unsigned long unused, >> + void *priv) >> +{ >> + static struct timespec last, next; >> + struct timespec now; >> + struct xen_platform_op op; >> + int ret; >> + >> + /* >> + * Set the Xen wallclock from Linux system time. >> + * >> + * dom0 hasn''t historically maintained a very accurate >> + * wallclock so guests don''t expect it. We can therefore >> + * reduce the number of expensive hypercalls by only updating >> + * the wallclock every 0.5 s. > > This comment needs some improvement. It doesn''t explain why Xen needs to > set the virtual RTC so frequently, but then goes on to say it can be > done every half second because guests don''t really expect it anyway.This would probably be better done as: if abs(current_wallclock - current_kernel_time) > threshold) update_wallclock(); i.e., we''re correcting the wallclock if it is wrong.>> + */ >> + >> + now = __current_kernel_time(); > > You don''t seem to be holding the timekeeping lock here, so why are you > calling the internal __ prefixed current_kernel_time() accessor?The notifier chain is called from timekeeping_update() which is called in a write_seqcount_begin/end(&timekeeper_seq) block.>> + >> + if (timespec_compare(&now, &last) > 0 > > Not sure I understand why you''re bothering with the last value? Aren''t > you just wanting to trigger when now is greater then next?This is to handle step changes that go backwards.> So again, apologies for some of the runaround in the discussion! Lets > sort out the above minor issues and I''ll be fine to queue this (given > Xen maintainer acks) without grumbling.No problem. David
John Stultz
2013-May-31 20:21 UTC
Re: [PATCH 1/2] x86/xen: sync the wallclock when the system time changes
On 05/31/2013 02:49 AM, David Vrabel wrote:> On 31/05/13 01:30, John Stultz wrote: >> On 05/30/2013 07:25 AM, David Vrabel wrote: >>> From: David Vrabel <david.vrabel@citrix.com> >>> >>> Currently the Xen wallclock is only updated every 11 minutes if NTP is >>> synchronized to its clock source. If a guest is started before NTP is >>> synchronized it may see an incorrect wallclock time. >> Ok.. So this is maybe starting to make a little more sense. >> >> I was under the mistaken impression domN guests referenced dom0''s system >> time when they called xen_get_wallclock(), but looking at this a bit >> closer, its starting to make a bit more sense that xen_get_wallclock() >> is just shared hypervisor data that is updated by dom0, and guests can >> access this data without interacting with dom0. >> >> Thus I can finally see the 11 minute update interval for dom0 to update >> the hypervisor wallclock data causes guests to get invalid time values >> when they initialize, reading the unset by dom0 hypervisor wallclock >> data. And thus I finally see the need for dom0 to more frequently update >> the hypervisor wallclock data. > This is correct. I''ll add an explanatory paragraph about the Xen > wallclock to the changelog.Thanks! I appreciate it!> >>> Use the pvclock_gtod notifier chain to receive a notification when the >>> system time has changed and update the wallclock to match. >>> >>> This chain is called on every timer tick and we want to avoid an extra >>> (expensive) hypercall on every tick. Because dom0 has historically >>> never provided a very accurate wallclock and guests do not expect one, >>> we can do this simply. The wallclock is only updated if the >>> difference between now and the last update is more than 0.5 s. >> >> So given (at least I think ) I get why this is needed, is there a reason >> you''re using the notifier chain instead of a regular timer in dom0 to >> update the hypervisor''s wallclock data? > Using the notifier allows step changes to be noticed immediately, using > just a timer would leave a window after any step change where the Xen > wallclock is wrong. > > Ideally, I would like a notification of step changes and a long period > timer (to correct for drift). Can you think of a good way to do this?So we have the clock_was_set() hook that we use to notify the hrtimer code and we use that for the timerfd notification as well (which allows userland to detect changes to CLOCK_REALTIME). Maybe that hook should get extended for this use?> >>> --- a/arch/x86/xen/time.c >>> +++ b/arch/x86/xen/time.c >>> @@ -212,6 +213,48 @@ static int xen_set_wallclock(const struct >>> timespec *now) >>> return HYPERVISOR_dom0_op(&op); >>> } >>> +static int xen_pvclock_gtod_notify(struct notifier_block *nb, >>> unsigned long unused, >>> + void *priv) >>> +{ >>> + static struct timespec last, next; >>> + struct timespec now; >>> + struct xen_platform_op op; >>> + int ret; >>> + >>> + /* >>> + * Set the Xen wallclock from Linux system time. >>> + * >>> + * dom0 hasn''t historically maintained a very accurate >>> + * wallclock so guests don''t expect it. We can therefore >>> + * reduce the number of expensive hypercalls by only updating >>> + * the wallclock every 0.5 s. >> This comment needs some improvement. It doesn''t explain why Xen needs to >> set the virtual RTC so frequently, but then goes on to say it can be >> done every half second because guests don''t really expect it anyway. > This would probably be better done as: > > if abs(current_wallclock - current_kernel_time) > threshold) > update_wallclock(); > > i.e., we''re correcting the wallclock if it is wrong.Yea, this makes more sense (though reading the current_wallclock may be too expensive each time?).>>> + */ >>> + >>> + now = __current_kernel_time(); >> You don''t seem to be holding the timekeeping lock here, so why are you >> calling the internal __ prefixed current_kernel_time() accessor? > The notifier chain is called from timekeeping_update() which is called > in a write_seqcount_begin/end(&timekeeper_seq) block.Ok. Please add a comment just to be clear. While I was ok with it when it was merged, calling the pvclock notifier chain while holding the timekeeping locks is striking me as not the smartest approach. So this may need to change in the future.>>> + >>> + if (timespec_compare(&now, &last) > 0 >> Not sure I understand why you''re bothering with the last value? Aren''t >> you just wanting to trigger when now is greater then next? > This is to handle step changes that go backwards.Ok, thanks for the clarification. Send me the next revision and we can get it queued up unless you want to look at doing something with clock_was_set instead. thanks -john