David Vrabel
2013-Jun-19 15:25 UTC
[PATCH 2/4] time: add a notifier chain for when the system time is stepped
From: David Vrabel <david.vrabel@citrix.com> The high resolution timer code gets notified of step changes to the system time with clock_was_set() or clock_was_set_delayed() calls. If other parts of the kernel require similar notification there is no clear place to hook into. Add a clock_was_set atomic notifier chain (clock_was_set_notifier_list) and call this in place of clock_was_set(). If the timekeeping locks are held, the calls are deferred to a new tasklet. The hrtimer code adds a notifier block to this chain and uses it to call (the now internal) clock_was_set(). Since the timekeeping code does not call the chain from the timer irq clock_was_set_delayed() and associated code can be removed. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Thomas Gleixner <tglx@linutronix.de> --- include/linux/hrtimer.h | 7 ------- include/linux/time.h | 5 +++++ kernel/hrtimer.c | 33 ++++++++++++++------------------- kernel/time/timekeeping.c | 34 +++++++++++++++++++++++++--------- 4 files changed, 44 insertions(+), 35 deletions(-) diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h index d19a5c2..6da7439 100644 --- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -166,7 +166,6 @@ enum hrtimer_base_type { * @lock: lock protecting the base and associated clock bases * and timers * @active_bases: Bitfield to mark bases with active timers - * @clock_was_set: Indicates that clock was set from irq context. * @expires_next: absolute time of the next event which was scheduled * via clock_set_next_event() * @hres_active: State of high resolution mode @@ -180,7 +179,6 @@ enum hrtimer_base_type { struct hrtimer_cpu_base { raw_spinlock_t lock; unsigned int active_bases; - unsigned int clock_was_set; #ifdef CONFIG_HIGH_RES_TIMERS ktime_t expires_next; int hres_active; @@ -289,8 +287,6 @@ extern void hrtimer_peek_ahead_timers(void); # define MONOTONIC_RES_NSEC HIGH_RES_NSEC # define KTIME_MONOTONIC_RES KTIME_HIGH_RES -extern void clock_was_set_delayed(void); - #else # define MONOTONIC_RES_NSEC LOW_RES_NSEC @@ -312,11 +308,8 @@ static inline int hrtimer_is_hres_active(struct hrtimer *timer) return 0; } -static inline void clock_was_set_delayed(void) { } - #endif -extern void clock_was_set(void); #ifdef CONFIG_TIMERFD extern void timerfd_clock_was_set(void); #else diff --git a/include/linux/time.h b/include/linux/time.h index d5d229b..75bca39 100644 --- a/include/linux/time.h +++ b/include/linux/time.h @@ -185,6 +185,11 @@ struct tms; extern void do_sys_times(struct tms *); /* + * Notifier chain called when system time is stepped. + */ +extern struct atomic_notifier_head clock_was_set_notifier_list; + +/* * Similar to the struct tm in userspace <time.h>, but it needs to be here so * that the kernel source is self contained. */ diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c index fd4b13b..6e475d5 100644 --- a/kernel/hrtimer.c +++ b/kernel/hrtimer.c @@ -721,19 +721,6 @@ static int hrtimer_switch_to_hres(void) return 1; } -/* - * Called from timekeeping code to reprogramm the hrtimer interrupt - * device. If called from the timer interrupt context we defer it to - * softirq context. - */ -void clock_was_set_delayed(void) -{ - struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases); - - cpu_base->clock_was_set = 1; - __raise_softirq_irqoff(HRTIMER_SOFTIRQ); -} - #else static inline int hrtimer_hres_active(void) { return 0; } @@ -762,7 +749,7 @@ static inline void retrigger_next_event(void *arg) { } * resolution timer interrupts. On UP we just disable interrupts and * call the high resolution interrupt code. */ -void clock_was_set(void) +static void clock_was_set(void) { #ifdef CONFIG_HIGH_RES_TIMERS /* Retrigger the CPU local events everywhere */ @@ -1434,11 +1421,6 @@ static void run_hrtimer_softirq(struct softirq_action *h) { struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases); - if (cpu_base->clock_was_set) { - cpu_base->clock_was_set = 0; - clock_was_set(); - } - hrtimer_peek_ahead_timers(); } @@ -1776,11 +1758,24 @@ static struct notifier_block __cpuinitdata hrtimers_nb = { .notifier_call = hrtimer_cpu_notify, }; +static int hrtimer_clock_was_set_notify(struct notifier_block *self, + unsigned long action, void *data) +{ + clock_was_set(); + return NOTIFY_OK; +} + +static struct notifier_block hrtimers_clock_was_set_nb = { + .notifier_call = hrtimer_clock_was_set_notify, +}; + void __init hrtimers_init(void) { hrtimer_cpu_notify(&hrtimers_nb, (unsigned long)CPU_UP_PREPARE, (void *)(long)smp_processor_id()); register_cpu_notifier(&hrtimers_nb); + atomic_notifier_chain_register(&clock_was_set_notifier_list, + &hrtimers_clock_was_set_nb); #ifdef CONFIG_HIGH_RES_TIMERS open_softirq(HRTIMER_SOFTIRQ, run_hrtimer_softirq); #endif diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index baeeb5c..852b880 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -198,6 +198,25 @@ static inline s64 timekeeping_get_ns_raw(struct timekeeper *tk) return nsec + get_arch_timeoffset(); } +ATOMIC_NOTIFIER_HEAD(clock_was_set_notifier_list); +EXPORT_SYMBOL_GPL(clock_was_set_notifier_list); + +static void timekeeping_clock_was_set(void) +{ + atomic_notifier_call_chain(&clock_was_set_notifier_list, 0, NULL); +} + +static void timekeeping_clock_was_set_task(unsigned long d) +{ + timekeeping_clock_was_set(); +} +DECLARE_TASKLET(clock_was_set_tasklet, timekeeping_clock_was_set_task, 0); + +static void timekeeping_clock_was_set_delayed(void) +{ + tasklet_schedule(&clock_was_set_tasklet); +} + static RAW_NOTIFIER_HEAD(pvclock_gtod_chain); static void update_pvclock_gtod(struct timekeeper *tk) @@ -513,8 +532,7 @@ int do_settimeofday(const struct timespec *tv) write_seqcount_end(&timekeeper_seq); raw_spin_unlock_irqrestore(&timekeeper_lock, flags); - /* signal hrtimers about time change */ - clock_was_set(); + timekeeping_clock_was_set(); return 0; } @@ -557,8 +575,7 @@ error: /* even if we error out, we forwarded the time, so call update */ write_seqcount_end(&timekeeper_seq); raw_spin_unlock_irqrestore(&timekeeper_lock, flags); - /* signal hrtimers about time change */ - clock_was_set(); + timekeeping_clock_was_set(); return ret; } @@ -607,7 +624,7 @@ void timekeeping_set_tai_offset(s32 tai_offset) __timekeeping_set_tai_offset(tk, tai_offset); write_seqcount_end(&timekeeper_seq); raw_spin_unlock_irqrestore(&timekeeper_lock, flags); - clock_was_set(); + timekeeping_clock_was_set(); } /** @@ -877,8 +894,7 @@ void timekeeping_inject_sleeptime(struct timespec *delta) write_seqcount_end(&timekeeper_seq); raw_spin_unlock_irqrestore(&timekeeper_lock, flags); - /* signal hrtimers about time change */ - clock_was_set(); + timekeeping_clock_was_set(); } /** @@ -1260,7 +1276,7 @@ static inline void accumulate_nsecs_to_secs(struct timekeeper *tk) __timekeeping_set_tai_offset(tk, tk->tai_offset - leap); - clock_was_set_delayed(); + timekeeping_clock_was_set_delayed(); } } } @@ -1677,7 +1693,7 @@ int do_adjtimex(struct timex *txc) if (tai != orig_tai) { __timekeeping_set_tai_offset(tk, tai); - clock_was_set_delayed(); + timekeeping_clock_was_set_delayed(); } write_seqcount_end(&timekeeper_seq); raw_spin_unlock_irqrestore(&timekeeper_lock, flags); -- 1.7.2.5
John Stultz
2013-Jun-19 16:52 UTC
Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
On 06/19/2013 08:25 AM, David Vrabel wrote:> From: David Vrabel <david.vrabel@citrix.com> > > The high resolution timer code gets notified of step changes to the > system time with clock_was_set() or clock_was_set_delayed() calls. If > other parts of the kernel require similar notification there is no > clear place to hook into. > > Add a clock_was_set atomic notifier chain > (clock_was_set_notifier_list) and call this in place of > clock_was_set(). If the timekeeping locks are held, the calls are > deferred to a new tasklet. > > The hrtimer code adds a notifier block to this chain and uses it to > call (the now internal) clock_was_set(). Since the timekeeping code > does not call the chain from the timer irq clock_was_set_delayed() and > associated code can be removed.So on my initial quick review, this *looks* pretty reasonable. I get a little worried about interface abuse (ie: random drivers trying to hook into clock_was_set_notifier_list), but we can move that into timekeeper_internal.h or something similar to limit that. The other issue here is we''ve been burned pretty badly in the past with changes to clock_was_set(), as its key to keeping timers in line with timekeeping. So this will need a fair amount of testing and run time before this gets merged, so 3.12 is what we''d be targeting at the earliest (its getting a bit late for taking changes for 3.11 anyway). If you want to try to push patch 1/4 in for 3.11 via the Xen tree, I''ll see about queuing the other three for hopefully 3.12. thanks -john
Konrad Rzeszutek Wilk
2013-Jun-19 17:13 UTC
Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
On Wed, Jun 19, 2013 at 09:52:06AM -0700, John Stultz wrote:> On 06/19/2013 08:25 AM, David Vrabel wrote: > >From: David Vrabel <david.vrabel@citrix.com> > > > >The high resolution timer code gets notified of step changes to the > >system time with clock_was_set() or clock_was_set_delayed() calls. If > >other parts of the kernel require similar notification there is no > >clear place to hook into. > > > >Add a clock_was_set atomic notifier chain > >(clock_was_set_notifier_list) and call this in place of > >clock_was_set(). If the timekeeping locks are held, the calls are > >deferred to a new tasklet. > > > >The hrtimer code adds a notifier block to this chain and uses it to > >call (the now internal) clock_was_set(). Since the timekeeping code > >does not call the chain from the timer irq clock_was_set_delayed() and > >associated code can be removed. > > So on my initial quick review, this *looks* pretty reasonable. I get > a little worried about interface abuse (ie: random drivers trying to > hook into clock_was_set_notifier_list), but we can move that into > timekeeper_internal.h or something similar to limit that. > > The other issue here is we''ve been burned pretty badly in the past > with changes to clock_was_set(), as its key to keeping timers in > line with timekeeping. So this will need a fair amount of testing > and run time before this gets merged, so 3.12 is what we''d be > targeting at the earliest (its getting a bit late for taking changes > for 3.11 anyway).I think we have four weeks left? Or you think Linus is going to release at the end of the month?> > If you want to try to push patch 1/4 in for 3.11 via the Xen tree,Done.> I''ll see about queuing the other three for hopefully 3.12.OK, let me run them through the testing gauntleet to have the warm fuzzy feeling.> > thanks > -john >
John Stultz
2013-Jun-19 17:38 UTC
Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
On 06/19/2013 10:13 AM, Konrad Rzeszutek Wilk wrote:> On Wed, Jun 19, 2013 at 09:52:06AM -0700, John Stultz wrote: >> On 06/19/2013 08:25 AM, David Vrabel wrote: >>> From: David Vrabel <david.vrabel@citrix.com> >>> >>> The high resolution timer code gets notified of step changes to the >>> system time with clock_was_set() or clock_was_set_delayed() calls. If >>> other parts of the kernel require similar notification there is no >>> clear place to hook into. >>> >>> Add a clock_was_set atomic notifier chain >>> (clock_was_set_notifier_list) and call this in place of >>> clock_was_set(). If the timekeeping locks are held, the calls are >>> deferred to a new tasklet. >>> >>> The hrtimer code adds a notifier block to this chain and uses it to >>> call (the now internal) clock_was_set(). Since the timekeeping code >>> does not call the chain from the timer irq clock_was_set_delayed() and >>> associated code can be removed. >> So on my initial quick review, this *looks* pretty reasonable. I get >> a little worried about interface abuse (ie: random drivers trying to >> hook into clock_was_set_notifier_list), but we can move that into >> timekeeper_internal.h or something similar to limit that. >> >> The other issue here is we''ve been burned pretty badly in the past >> with changes to clock_was_set(), as its key to keeping timers in >> line with timekeeping. So this will need a fair amount of testing >> and run time before this gets merged, so 3.12 is what we''d be >> targeting at the earliest (its getting a bit late for taking changes >> for 3.11 anyway). > I think we have four weeks left? Or you think Linus is going to release > at the end of the month?Well, I have to queue it, and then get Thomas to pull it from me, and that can take some time. And after -rc6 there''s just not a lot of time for things to get testing in -next before being pushed to Linus. Its just a bit rushed for me.>> If you want to try to push patch 1/4 in for 3.11 via the Xen tree, > Done. >> I''ll see about queuing the other three for hopefully 3.12. > OK, let me run them through the testing gauntleet to have the > warm fuzzy feeling.Thanks. I''ll be running it through my test cases as well. -john
David Vrabel
2013-Jun-20 10:50 UTC
Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
On 19/06/13 17:52, John Stultz wrote:> On 06/19/2013 08:25 AM, David Vrabel wrote: >> From: David Vrabel <david.vrabel@citrix.com> >> >> The high resolution timer code gets notified of step changes to the >> system time with clock_was_set() or clock_was_set_delayed() calls. If >> other parts of the kernel require similar notification there is no >> clear place to hook into. >> >> Add a clock_was_set atomic notifier chain >> (clock_was_set_notifier_list) and call this in place of >> clock_was_set(). If the timekeeping locks are held, the calls are >> deferred to a new tasklet. >> >> The hrtimer code adds a notifier block to this chain and uses it to >> call (the now internal) clock_was_set(). Since the timekeeping code >> does not call the chain from the timer irq clock_was_set_delayed() and >> associated code can be removed. > > So on my initial quick review, this *looks* pretty reasonable. I get a > little worried about interface abuse (ie: random drivers trying to hook > into clock_was_set_notifier_list), but we can move that into > timekeeper_internal.h or something similar to limit that.I can move the actual list to time.c but we still need to provide a register_clock_was_set_notifier() call in linux/time.h as the two current users (Xen and hrtimers) don''t include and probably should not include timekeeper_internal.h.> The other issue here is we''ve been burned pretty badly in the past with > changes to clock_was_set(), as its key to keeping timers in line with > timekeeping. So this will need a fair amount of testing and run time > before this gets merged, so 3.12 is what we''d be targeting at the > earliest (its getting a bit late for taking changes for 3.11 anyway).hrtimer''s clock_was_set() is called at the same point in the non-delayed case. For the delayed case, it used to be called at the beginning of the hrtimer softirq and now it is called from a tasklet, but if I understand this correctly, the tasklet softirq will be called before the hrtimer one so this should give the same behaviour.> If you want to try to push patch 1/4 in for 3.11 via the Xen tree, I''ll > see about queuing the other three for hopefully 3.12.3.12 is fine. I''d prefer to have a correct and well-tested fix merged. David
David Vrabel
2013-Jun-20 19:16 UTC
[PATCH 2/4] time: add a notifier chain for when the system time is stepped
From: David Vrabel <david.vrabel@citrix.com> The high resolution timer code gets notified of step changes to the system time with clock_was_set() or clock_was_set_delayed() calls. If other parts of the kernel require similar notification there is no clear place to hook into. Add a clock_was_set atomic notifier chain (clock_was_set_notifier_list) and call this in place of clock_was_set(). If the timekeeping locks are held, the calls are deferred to a new tasklet. The hrtimer code adds a notifier block to this chain and uses it to call (the now internal) clock_was_set(). Since the timekeeping code does not call the chain from the timer irq clock_was_set_delayed() and associated code can be removed. For the delayed case, clock_was_set() used to be called at the beginning of the hrtimer softirq and now it is called from a tasklet. The tasklet softirq will be called before the hrtimer one so this should give the same behaviour. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Cc: Thomas Gleixner <tglx@linutronix.de> --- include/linux/hrtimer.h | 7 ------- include/linux/time.h | 7 +++++++ kernel/hrtimer.c | 34 +++++++++++++--------------------- kernel/time/timekeeping.c | 39 ++++++++++++++++++++++++++++++--------- 4 files changed, 50 insertions(+), 37 deletions(-) diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h index 13df0fa..45e30f6 100644 --- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -166,7 +166,6 @@ enum hrtimer_base_type { * @lock: lock protecting the base and associated clock bases * and timers * @active_bases: Bitfield to mark bases with active timers - * @clock_was_set: Indicates that clock was set from irq context. * @expires_next: absolute time of the next event which was scheduled * via clock_set_next_event() * @hres_active: State of high resolution mode @@ -180,7 +179,6 @@ enum hrtimer_base_type { struct hrtimer_cpu_base { raw_spinlock_t lock; unsigned int active_bases; - unsigned int clock_was_set; #ifdef CONFIG_HIGH_RES_TIMERS ktime_t expires_next; int hres_active; @@ -289,8 +287,6 @@ extern void hrtimer_peek_ahead_timers(void); # define MONOTONIC_RES_NSEC HIGH_RES_NSEC # define KTIME_MONOTONIC_RES KTIME_HIGH_RES -extern void clock_was_set_delayed(void); - #else # define MONOTONIC_RES_NSEC LOW_RES_NSEC @@ -312,11 +308,8 @@ static inline int hrtimer_is_hres_active(struct hrtimer *timer) return 0; } -static inline void clock_was_set_delayed(void) { } - #endif -extern void clock_was_set(void); #ifdef CONFIG_TIMERFD extern void timerfd_clock_was_set(void); #else diff --git a/include/linux/time.h b/include/linux/time.h index d5d229b..a0c08a7 100644 --- a/include/linux/time.h +++ b/include/linux/time.h @@ -184,6 +184,13 @@ extern void timekeeping_clocktai(struct timespec *ts); struct tms; extern void do_sys_times(struct tms *); +struct notifier_block; + +/* + * Notifier chain called when system time is stepped. + */ +extern int register_clock_was_set_notifier(struct notifier_block *nb); + /* * Similar to the struct tm in userspace <time.h>, but it needs to be here so * that the kernel source is self contained. diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c index 34384b4..a853f9b 100644 --- a/kernel/hrtimer.c +++ b/kernel/hrtimer.c @@ -721,19 +721,6 @@ static int hrtimer_switch_to_hres(void) return 1; } -/* - * Called from timekeeping code to reprogramm the hrtimer interrupt - * device. If called from the timer interrupt context we defer it to - * softirq context. - */ -void clock_was_set_delayed(void) -{ - struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases); - - cpu_base->clock_was_set = 1; - __raise_softirq_irqoff(HRTIMER_SOFTIRQ); -} - #else static inline int hrtimer_hres_active(void) { return 0; } @@ -762,7 +749,7 @@ static inline void retrigger_next_event(void *arg) { } * resolution timer interrupts. On UP we just disable interrupts and * call the high resolution interrupt code. */ -void clock_was_set(void) +static void clock_was_set(void) { #ifdef CONFIG_HIGH_RES_TIMERS /* Retrigger the CPU local events everywhere */ @@ -1441,13 +1428,6 @@ void hrtimer_peek_ahead_timers(void) static void run_hrtimer_softirq(struct softirq_action *h) { - struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases); - - if (cpu_base->clock_was_set) { - cpu_base->clock_was_set = 0; - clock_was_set(); - } - hrtimer_peek_ahead_timers(); } @@ -1785,11 +1765,23 @@ static struct notifier_block __cpuinitdata hrtimers_nb = { .notifier_call = hrtimer_cpu_notify, }; +static int hrtimer_clock_was_set_notify(struct notifier_block *self, + unsigned long action, void *data) +{ + clock_was_set(); + return NOTIFY_OK; +} + +static struct notifier_block hrtimers_clock_was_set_nb = { + .notifier_call = hrtimer_clock_was_set_notify, +}; + void __init hrtimers_init(void) { hrtimer_cpu_notify(&hrtimers_nb, (unsigned long)CPU_UP_PREPARE, (void *)(long)smp_processor_id()); register_cpu_notifier(&hrtimers_nb); + register_clock_was_set_notifier(&hrtimers_clock_was_set_nb); #ifdef CONFIG_HIGH_RES_TIMERS open_softirq(HRTIMER_SOFTIRQ, run_hrtimer_softirq); #endif diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index baeeb5c..96c5c8e 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -198,6 +198,30 @@ static inline s64 timekeeping_get_ns_raw(struct timekeeper *tk) return nsec + get_arch_timeoffset(); } +static ATOMIC_NOTIFIER_HEAD(clock_was_set_notifier_list); + +int register_clock_was_set_notifier(struct notifier_block *nb) +{ + return atomic_notifier_chain_register(&clock_was_set_notifier_list, nb); +} +EXPORT_SYMBOL_GPL(register_clock_was_set_notifier); + +static void timekeeping_clock_was_set(void) +{ + atomic_notifier_call_chain(&clock_was_set_notifier_list, 0, NULL); +} + +static void timekeeping_clock_was_set_task(unsigned long d) +{ + timekeeping_clock_was_set(); +} +DECLARE_TASKLET(clock_was_set_tasklet, timekeeping_clock_was_set_task, 0); + +static void timekeeping_clock_was_set_delayed(void) +{ + tasklet_schedule(&clock_was_set_tasklet); +} + static RAW_NOTIFIER_HEAD(pvclock_gtod_chain); static void update_pvclock_gtod(struct timekeeper *tk) @@ -513,8 +537,7 @@ int do_settimeofday(const struct timespec *tv) write_seqcount_end(&timekeeper_seq); raw_spin_unlock_irqrestore(&timekeeper_lock, flags); - /* signal hrtimers about time change */ - clock_was_set(); + timekeeping_clock_was_set(); return 0; } @@ -557,8 +580,7 @@ error: /* even if we error out, we forwarded the time, so call update */ write_seqcount_end(&timekeeper_seq); raw_spin_unlock_irqrestore(&timekeeper_lock, flags); - /* signal hrtimers about time change */ - clock_was_set(); + timekeeping_clock_was_set(); return ret; } @@ -607,7 +629,7 @@ void timekeeping_set_tai_offset(s32 tai_offset) __timekeeping_set_tai_offset(tk, tai_offset); write_seqcount_end(&timekeeper_seq); raw_spin_unlock_irqrestore(&timekeeper_lock, flags); - clock_was_set(); + timekeeping_clock_was_set(); } /** @@ -877,8 +899,7 @@ void timekeeping_inject_sleeptime(struct timespec *delta) write_seqcount_end(&timekeeper_seq); raw_spin_unlock_irqrestore(&timekeeper_lock, flags); - /* signal hrtimers about time change */ - clock_was_set(); + timekeeping_clock_was_set(); } /** @@ -1260,7 +1281,7 @@ static inline void accumulate_nsecs_to_secs(struct timekeeper *tk) __timekeeping_set_tai_offset(tk, tk->tai_offset - leap); - clock_was_set_delayed(); + timekeeping_clock_was_set_delayed(); } } } @@ -1677,7 +1698,7 @@ int do_adjtimex(struct timex *txc) if (tai != orig_tai) { __timekeeping_set_tai_offset(tk, tai); - clock_was_set_delayed(); + timekeeping_clock_was_set_delayed(); } write_seqcount_end(&timekeeper_seq); raw_spin_unlock_irqrestore(&timekeeper_lock, flags); -- 1.7.2.5
Thomas Gleixner
2013-Jun-21 07:57 UTC
Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
On Thu, 20 Jun 2013, David Vrabel wrote:> From: David Vrabel <david.vrabel@citrix.com> > > The high resolution timer code gets notified of step changes to the > system time with clock_was_set() or clock_was_set_delayed() calls. If > other parts of the kernel require similar notification there is no > clear place to hook into.You fail to explain why any other part of the kernel requires a notification. We went great length to confine timekeeping inside the core code and now you add random notifiers along with totally ugly tasklet constructs. What for? Without a reasonable explanation of the problem you are trying to solve this is going nowhere near the tree. Thanks, tglx
David Vrabel
2013-Jun-21 12:41 UTC
Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
On 21/06/13 08:57, Thomas Gleixner wrote:> On Thu, 20 Jun 2013, David Vrabel wrote: > >> From: David Vrabel <david.vrabel@citrix.com> >> >> The high resolution timer code gets notified of step changes to the >> system time with clock_was_set() or clock_was_set_delayed() calls. If >> other parts of the kernel require similar notification there is no >> clear place to hook into. > > You fail to explain why any other part of the kernel requires a > notification.This is needed by patch 3 in this series. "The Xen wallclock is a software only clock within the Xen hypervisor that is used by: a) PV guests as the equivalent of a hardware RTC; and b) the hypervisor as the clock source for the emulated RTC provided to HVM guests. Currently the Xen wallclock is only updated every 11 minutes if NTP is synchronized to its clock source. If a guest is started before NTP is synchronized it may see an incorrect wallclock time. Use the clock_was_set notifier chain to receive a notification when the system time is stepped and update the wallclock to match the current system time."> We went great length to confine timekeeping inside the core code and > now you add random notifiers along with totally ugly tasklet > constructs.I''m not sure I understand your objection to the use of a tasklet. Using the hrtimer softirq for something that is no longer hrtimer-specific did not seem like the correct thing to do. David
John Stultz
2013-Jun-21 16:22 UTC
Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
On 06/21/2013 12:57 AM, Thomas Gleixner wrote:> On Thu, 20 Jun 2013, David Vrabel wrote: > >> From: David Vrabel <david.vrabel@citrix.com> >> >> The high resolution timer code gets notified of step changes to the >> system time with clock_was_set() or clock_was_set_delayed() calls. If >> other parts of the kernel require similar notification there is no >> clear place to hook into. > You fail to explain why any other part of the kernel requires a > notification. > > We went great length to confine timekeeping inside the core code and > now you add random notifiers along with totally ugly tasklet > constructs. What for? > > Without a reasonable explanation of the problem you are trying to > solve this is going nowhere near the tree.It took awhile for me to get why it was needed as well. My understanding is that on Xen, they want the hypervisor''s virtual RTC needs to be aligned with Dom0s system time (so that DomN guests boot with the correct time). Thus changes to the system time need to cause Dom0 to update the virtual RTC. Its not terribly unlike the timerfd notification we do to userspace, but instead is done for the Dom0 Xen management code. I do agree we need to keep users of the notification on a short leash (hopefully keeping the interface behind some sort of internal.h header file) so its easy to see when folks start trying to use it. thanks -john
Thomas Gleixner
2013-Jun-21 23:06 UTC
Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
On Fri, 21 Jun 2013, David Vrabel wrote:> On 21/06/13 08:57, Thomas Gleixner wrote: > > On Thu, 20 Jun 2013, David Vrabel wrote: > > > >> From: David Vrabel <david.vrabel@citrix.com> > >> > >> The high resolution timer code gets notified of step changes to the > >> system time with clock_was_set() or clock_was_set_delayed() calls. If > >> other parts of the kernel require similar notification there is no > >> clear place to hook into. > > > > You fail to explain why any other part of the kernel requires a > > notification. > > This is needed by patch 3 in this series. > > "The Xen wallclock is a software only clock within the Xen hypervisor > that is used by: a) PV guests as the equivalent of a hardware RTC; and > b) the hypervisor as the clock source for the emulated RTC provided to > HVM guests. > > Currently the Xen wallclock is only updated every 11 minutes if NTP is > synchronized to its clock source. If a guest is started before NTP is > synchronized it may see an incorrect wallclock time.What you are saying is, that you are fixing Xens failure to implement a proper RTC emulation by hacking a notifier into the core code. You can''t be serious about that. According to your changelog: Currently the Xen wallclock is only updated every 11 minutes if NTP is synchronized to its clocksource. How is that related to clock_was_set() ? clock_was_set*() is invoked from: do_settimeofday() timekeeping_inject_offset() timekeeping_set_tai_offset() timekeeping_inject_sleeptime() update_wall_time() do_adjtimex() The only function which calls clock_was_set() and can affect RTC is do_adjtimex(). Though you claim that the natural place to add a notifier is clock_was_set(). So you went the other way round this time. In the hrtimers case you tried to fix shortcomings of the core code in some random Xen code. With this patch you try to fix Xen nonsense in the core code. Can you please provide a proper explanation of the problem you are trying to solve? This means that you should explain the semantics of the desired XEN RTC emulation and not the desired workarounds to fix the shortcommings current implementation. Thanks, tglx
David Vrabel
2013-Jun-24 10:51 UTC
Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
On 22/06/13 00:06, Thomas Gleixner wrote:> On Fri, 21 Jun 2013, David Vrabel wrote: >> On 21/06/13 08:57, Thomas Gleixner wrote: >>> On Thu, 20 Jun 2013, David Vrabel wrote: >>> >>>> From: David Vrabel <david.vrabel@citrix.com> >>>> >>>> The high resolution timer code gets notified of step changes to the >>>> system time with clock_was_set() or clock_was_set_delayed() calls. If >>>> other parts of the kernel require similar notification there is no >>>> clear place to hook into. >>> >>> You fail to explain why any other part of the kernel requires a >>> notification. >> >> This is needed by patch 3 in this series. >> >> "The Xen wallclock is a software only clock within the Xen hypervisor >> that is used by: a) PV guests as the equivalent of a hardware RTC; and >> b) the hypervisor as the clock source for the emulated RTC provided to >> HVM guests. >> >> Currently the Xen wallclock is only updated every 11 minutes if NTP is >> synchronized to its clock source. If a guest is started before NTP is >> synchronized it may see an incorrect wallclock time. > > What you are saying is, that you are fixing Xens failure to implement > a proper RTC emulation by hacking a notifier into the core code. You > can''t be serious about that.Xen does provide proper emulation of an RTC for guests. Both full hardware emulation for fully-virtualized guest (HVM) and a lighter weight interface for paravirtualized guests (PV). As with any emulated RTC, there needs to be underlying clocksource providing the time. Under Xen, this is the Xen wallclock and it is implemented as a record of the date/time timestamped with the corresponding Xen clocksource value[1]. KVM provides an identical wallclock to its guests -- see the common pvclock_read_wallclock() function. Xen has hardware drivers for only the minimal amount of hardware necessary for the scheduling and isolation of guests. It does not have drivers for any hardware RTC nor does it have a network stack or an implementation of NTP. Therefore it has no way to maintain the correctness of the Xen wallclock and relies on the control domain (dom0) to do this. Dom0 updates the Xen wallclock with the XENPF_settime platform_op hypercall. To ensure the correctness of the Xen wallclock it is kept in sync with dom0''s system time (which is assumed to be correct and would typically be corrected by NTP). This requires that the Xen wallclock is both: a) updated on step changes to system time. b) updated periodically to correct for any drift. This behaviour (keeping the wallclock in sync with dom0 system time) is part of the ABI provided by the kernel and changing it would break existing user space. This patch set is fixing the rare case where a guest is started before NTP has synced and thus sees an incorrect wallclock time which may cause the guest to fail to boot.> According to your changelog: > > Currently the Xen wallclock is only updated every 11 minutes if NTP is > synchronized to its clocksource. > > How is that related to clock_was_set() ?It''s not. This is the update_persistent_clock() call from the periodic sync_cmos_clock() work.> clock_was_set*() is invoked from: > > do_settimeofday() > timekeeping_inject_offset() > timekeeping_set_tai_offset() > timekeeping_inject_sleeptime() > update_wall_time() > do_adjtimex() > > The only function which calls clock_was_set() and can affect RTC is > do_adjtimex(). Though you claim that the natural place to add a > notifier is clock_was_set(). > > So you went the other way round this time. In the hrtimers case you > tried to fix shortcomings of the core code in some random Xen > code. With this patch you try to fix Xen nonsense in the core code.KVM uses a very similar mechanism to maintain system time for a guest so guest system time is synchronized with host system time. See the pvclock_gtod notifier chain and its usage in arch/x86/kvm/x86.c. v3 of this series did use this existing notifier but it is called on every timer tick so this is more expensive than necessary to meet the requirements (see (a) and (b) above) for maintaining the Xen wallclock. John suggested hooking into clock_was_set().> Can you please provide a proper explanation of the problem you are > trying to solve? This means that you should explain the semantics of > the desired XEN RTC emulation and not the desired workarounds to fix > the shortcommings current implementation.In summary, both Xen and KVM need to solve similar problems with keeping time synchronized between a host and guests. The key difference between the two hypervisors is that Xen synchronizes the wallclock and KVM synchronizes system time. David [1] The Xen clocksource is monotonic, nanosecond resolution clocksource provided by Xen for use internally and by guests.
Thomas Gleixner
2013-Jun-24 16:30 UTC
Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
On Mon, 24 Jun 2013, David Vrabel wrote:> On 22/06/13 00:06, Thomas Gleixner wrote: > This patch set is fixing the rare case where a guest is started before > NTP has synced and thus sees an incorrect wallclock time which may cause > the guest to fail to boot.You''re not fixing it, you are just making the window smaller. clock_was_set() is called outside of the timekeeper_lock protected regions, so what prevents the guest to start before the notifier is invoked? We already have a synchronous notifier in place and the notifier call itself is not expensive. What''s expensive is the hypercall and there is no way at the moment to figure out whether the update is relevant for you or just a tick. Though that''s trivial information to provide without imposing another notifier including the surrounding mess on the core code. Completely untested patch below. Thanks, tglx --- diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index baeeb5c..6e9f838 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -200,9 +200,9 @@ static inline s64 timekeeping_get_ns_raw(struct timekeeper *tk) static RAW_NOTIFIER_HEAD(pvclock_gtod_chain); -static void update_pvclock_gtod(struct timekeeper *tk) +static void update_pvclock_gtod(struct timekeeper *tk, bool cws) { - raw_notifier_call_chain(&pvclock_gtod_chain, 0, tk); + raw_notifier_call_chain(&pvclock_gtod_chain, cws, tk); } /** @@ -216,7 +216,7 @@ int pvclock_gtod_register_notifier(struct notifier_block *nb) raw_spin_lock_irqsave(&timekeeper_lock, flags); ret = raw_notifier_chain_register(&pvclock_gtod_chain, nb); - update_pvclock_gtod(tk); + update_pvclock_gtod(tk, true); raw_spin_unlock_irqrestore(&timekeeper_lock, flags); return ret; @@ -241,14 +241,15 @@ int pvclock_gtod_unregister_notifier(struct notifier_block *nb) EXPORT_SYMBOL_GPL(pvclock_gtod_unregister_notifier); /* must hold timekeeper_lock */ -static void timekeeping_update(struct timekeeper *tk, bool clearntp, bool mirror) +static void timekeeping_update(struct timekeeper *tk, bool clearntp, + bool mirror, bool cws) { if (clearntp) { tk->ntp_error = 0; ntp_clear(); } update_vsyscall(tk); - update_pvclock_gtod(tk); + update_pvclock_gtod(tk, cws); if (mirror) memcpy(&shadow_timekeeper, &timekeeper, sizeof(timekeeper)); @@ -508,7 +509,7 @@ int do_settimeofday(const struct timespec *tv) tk_set_xtime(tk, tv); - timekeeping_update(tk, true, true); + timekeeping_update(tk, true, true, true); write_seqcount_end(&timekeeper_seq); raw_spin_unlock_irqrestore(&timekeeper_lock, flags); @@ -552,7 +553,7 @@ int timekeeping_inject_offset(struct timespec *ts) tk_set_wall_to_mono(tk, timespec_sub(tk->wall_to_monotonic, *ts)); error: /* even if we error out, we forwarded the time, so call update */ - timekeeping_update(tk, true, true); + timekeeping_update(tk, true, true, true); write_seqcount_end(&timekeeper_seq); raw_spin_unlock_irqrestore(&timekeeper_lock, flags); @@ -633,7 +634,7 @@ static int change_clocksource(void *data) if (old->disable) old->disable(old); } - timekeeping_update(tk, true, true); + timekeeping_update(tk, true, true, true); write_seqcount_end(&timekeeper_seq); raw_spin_unlock_irqrestore(&timekeeper_lock, flags); @@ -872,7 +873,7 @@ void timekeeping_inject_sleeptime(struct timespec *delta) __timekeeping_inject_sleeptime(tk, delta); - timekeeping_update(tk, true, true); + timekeeping_update(tk, true, true, true); write_seqcount_end(&timekeeper_seq); raw_spin_unlock_irqrestore(&timekeeper_lock, flags); @@ -954,7 +955,7 @@ static void timekeeping_resume(void) tk->cycle_last = clock->cycle_last = cycle_now; tk->ntp_error = 0; timekeeping_suspended = 0; - timekeeping_update(tk, false, true); + timekeeping_update(tk, false, true, true); write_seqcount_end(&timekeeper_seq); raw_spin_unlock_irqrestore(&timekeeper_lock, flags); @@ -1236,9 +1237,10 @@ out_adjust: * It also calls into the NTP code to handle leapsecond processing. * */ -static inline void accumulate_nsecs_to_secs(struct timekeeper *tk) +static inline bool accumulate_nsecs_to_secs(struct timekeeper *tk) { u64 nsecps = (u64)NSEC_PER_SEC << tk->shift; + bool ret = false; while (tk->xtime_nsec >= nsecps) { int leap; @@ -1261,8 +1263,10 @@ static inline void accumulate_nsecs_to_secs(struct timekeeper *tk) __timekeeping_set_tai_offset(tk, tk->tai_offset - leap); clock_was_set_delayed(); + ret = true; } } + return ret; } /** @@ -1348,6 +1352,7 @@ static void update_wall_time(void) cycle_t offset; int shift = 0, maxshift; unsigned long flags; + bool cws; raw_spin_lock_irqsave(&timekeeper_lock, flags); @@ -1399,7 +1404,7 @@ static void update_wall_time(void) * Finally, make sure that after the rounding * xtime_nsec isn''t larger than NSEC_PER_SEC */ - accumulate_nsecs_to_secs(tk); + cws = accumulate_nsecs_to_secs(tk); write_seqcount_begin(&timekeeper_seq); /* Update clock->cycle_last with the new value */ @@ -1415,7 +1420,7 @@ static void update_wall_time(void) * updating. */ memcpy(real_tk, tk, sizeof(*tk)); - timekeeping_update(real_tk, false, false); + timekeeping_update(real_tk, false, false, cws); write_seqcount_end(&timekeeper_seq); out: raw_spin_unlock_irqrestore(&timekeeper_lock, flags); @@ -1677,6 +1682,7 @@ int do_adjtimex(struct timex *txc) if (tai != orig_tai) { __timekeeping_set_tai_offset(tk, tai); + update_pvclock_gtod(tk, true); clock_was_set_delayed(); } write_seqcount_end(&timekeeper_seq);
David Vrabel
2013-Jun-24 17:00 UTC
Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
On 24/06/13 17:30, Thomas Gleixner wrote:> > We already have a synchronous notifier in place and the notifier call > itself is not expensive. What''s expensive is the hypercall and there > is no way at the moment to figure out whether the update is relevant > for you or just a tick. Though that''s trivial information to provide > without imposing another notifier including the surrounding mess on > the core code.This looks good, thanks.> --- a/kernel/time/timekeeping.c > +++ b/kernel/time/timekeeping.c[...]> @@ -508,7 +509,7 @@ int do_settimeofday(const struct timespec *tv) > > tk_set_xtime(tk, tv); > > - timekeeping_update(tk, true, true); > + timekeeping_update(tk, true, true, true);These three booleans in a row is getting a bit opaque. How about I also change it to a set of flags? e.g., timekeeping_updated(tk, TK_CLEAR_NTP | TK_MIRROR | TK_CLOCK_WAS_SET); David
John Stultz
2013-Jun-24 17:50 UTC
Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
On 06/24/2013 10:00 AM, David Vrabel wrote:> On 24/06/13 17:30, Thomas Gleixner wrote: >> @@ -508,7 +509,7 @@ int do_settimeofday(const struct timespec *tv) >> >> tk_set_xtime(tk, tv); >> >> - timekeeping_update(tk, true, true); >> + timekeeping_update(tk, true, true, true); > These three booleans in a row is getting a bit opaque. How about I also > change it to a set of flags? e.g., > > timekeeping_updated(tk, TK_CLEAR_NTP | TK_MIRROR | TK_CLOCK_WAS_SET);Yea. I''m not a fan of the bool arguments to functions (which I have to look up every time as which bool is which). The bitflag approach is nicer in my mind, since its a bit more explicit when reading the code. The other approach would be to have different function calls (timekeeping_clear_ntp, timekeeping_mirror, timekeeping_clock_was_set), which call into the same back end logic. thanks -john
Thomas Gleixner
2013-Jun-24 19:55 UTC
Re: [PATCH 2/4] time: add a notifier chain for when the system time is stepped
On Mon, 24 Jun 2013, David Vrabel wrote:> On 24/06/13 17:30, Thomas Gleixner wrote: > > > > We already have a synchronous notifier in place and the notifier call > > itself is not expensive. What''s expensive is the hypercall and there > > is no way at the moment to figure out whether the update is relevant > > for you or just a tick. Though that''s trivial information to provide > > without imposing another notifier including the surrounding mess on > > the core code. > > This looks good, thanks. > > > --- a/kernel/time/timekeeping.c > > +++ b/kernel/time/timekeeping.c > [...] > > @@ -508,7 +509,7 @@ int do_settimeofday(const struct timespec *tv) > > > > tk_set_xtime(tk, tv); > > > > - timekeeping_update(tk, true, true); > > + timekeeping_update(tk, true, true, true); > > These three booleans in a row is getting a bit opaque. How about I also > change it to a set of flags? e.g., > > timekeeping_updated(tk, TK_CLEAR_NTP | TK_MIRROR | TK_CLOCK_WAS_SET);Fair enough. Can you convert the existing booleans first and then put the CLOCK_WAS_SET patch on top of that? Thanks, tglx
Apparently Analagous Threads
- [PATCH 2/5] time: pass flags instead of multiple bools to timekeeping_update()
- [RFC][PATCH 3/3] timekeeping: Fix potential lost pv notification of time change
- [RFC][PATCH 2/5] timekeeping: Fix potential lost pv notification of time change
- Was: Re: [GIT PULL] timer changes for v3.6, Is: Regression introduced by 1e75fa8be9fb61e1af46b5b3b176347a4c958ca1
- [PATCH] xen: use freeze/restore/thaw PM events for suspend/resume/chkpt