Jeremy Fitzhardinge
2007-Apr-18 13:02 UTC
[patch 3/4] Locally disable the softlockup watchdog rather than touching it
If we're about to enter a prolonged period in which we know we're going
to hold off scheduler ticks, then disable the CPU's softlockup watchdog
for the duration.  This avoids having to repeatedly touch the timestamp,
or, conversely, the risk of having the watchdog time out in the middle
of the long operation.

A question about each of these changes is whether the long operation
would appear to lock up just the current CPU, or the whole machine.
touch_softlockup_watchdog() updates the per-cpu timestamp, so I assumed
that it's per-cpu for all these cases.

One semantic change this makes is that softlockup_disable()/enable()
does a preempt_disable()/enable().  I don't think this is a problem,
since if you're worried about triggering the softlockup watchdog,
you're not scheduling.

I haven't really worked out how this should interact with the NMI
watchdog; touch_nmi_watchdog() still ends up calling
touch_softlockup_watchdog(), so there's still some redundancy here.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Chris Lalancette <clalance@redhat.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: John Hawkes <hawkes@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>

---
 arch/ia64/kernel/uncached.c  |    8 ++++++--
 drivers/ide/ide-iops.c       |   25 +++++++++++++++++++------
 drivers/ide/ide-taskfile.c   |    8 +++++++-
 drivers/mtd/nand/nand_base.c |    8 +++++++-
 4 files changed, 39 insertions(+), 10 deletions(-)

==================================================================
--- a/arch/ia64/kernel/uncached.c
+++ b/arch/ia64/kernel/uncached.c
@@ -253,13 +253,17 @@ static int __init uncached_build_memmap(
        int nid = paddr_to_nid(uc_start - __IA64_UNCACHED_OFFSET);
        struct gen_pool *pool = uncached_pools[nid].pool;
        size_t size = uc_end - uc_start;
-
-       touch_softlockup_watchdog();
+       int sl_state;
+
+       sl_state = softlockup_disable();
 
        if (pool != NULL) {
                memset((char *)uc_start, 0, size);
                (void) gen_pool_add(pool, uc_start, size, nid);
        }
+
+       softlockup_enable(sl_state);
+
        return 0;
 }
==================================================================
--- a/drivers/ide/ide-iops.c
+++ b/drivers/ide/ide-iops.c
@@ -1215,6 +1215,12 @@ int ide_wait_not_busy(ide_hwif_t *hwif,
 int ide_wait_not_busy(ide_hwif_t *hwif, unsigned long timeout)
 {
        u8 stat = 0;
+       int ret = -EBUSY;
+       int sl_state;
+
+       /* Tell softlockup this might take a while.
+          XXX should this be global or per-cpu? */
+       sl_state = softlockup_disable();
 
        while(timeout--) {
                /*
@@ -1223,19 +1229,26 @@ int ide_wait_not_busy(ide_hwif_t *hwif,
                 */
                mdelay(1);
                stat = hwif->INB(hwif->io_ports[IDE_STATUS_OFFSET]);
-               if ((stat & BUSY_STAT) == 0)
-                       return 0;
+               if ((stat & BUSY_STAT) == 0) {
+                       ret = 0;
+                       break;
+               }
+
                /*
                 * Assume a value of 0xff means nothing is connected to
                 * the interface and it doesn't implement the pull-down
                 * resistor on D7.
                 */
-               if (stat == 0xff)
-                       return -ENODEV;
-               touch_softlockup_watchdog();
+               if (stat == 0xff) {
+                       ret = -ENODEV;
+                       break;
+               }
                touch_nmi_watchdog();
        }
-       return -EBUSY;
+
+       softlockup_enable(sl_state);
+
+       return ret;
 }
 
 EXPORT_SYMBOL_GPL(ide_wait_not_busy);
==================================================================
--- a/drivers/ide/ide-taskfile.c
+++ b/drivers/ide/ide-taskfile.c
@@ -310,10 +310,14 @@ static void ide_pio_datablock(ide_drive_
 static void ide_pio_datablock(ide_drive_t *drive, struct request *rq,
                                unsigned int write)
 {
+       int sl_state;
+
        if (rq->bio)    /* fs request */
                rq->errors = 0;
 
-       touch_softlockup_watchdog();
+       /* Tell softlockup this might take a while.
+          XXX should this be global or per-cpu? */
+       sl_state = softlockup_disable();
 
        switch (drive->hwif->data_phase) {
        case TASKFILE_MULTI_IN:
@@ -324,6 +328,8 @@ static void ide_pio_datablock(ide_drive_
                ide_pio_sector(drive, write);
                break;
        }
+
+       softlockup_enable(sl_state);
 }
 
 static ide_startstop_t task_error(ide_drive_t *drive, struct request *rq,
==================================================================
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -419,15 +419,21 @@ void nand_wait_ready(struct mtd_info *mt
 {
        struct nand_chip *chip = mtd->priv;
        unsigned long timeo = jiffies + 2;
+       int sl_state;
+
+       /* Tell softlockup this might take a while.
+          XXX should this be global or per-cpu? */
+       sl_state = softlockup_disable();
 
        led_trigger_event(nand_led_trigger, LED_FULL);
        /* wait until command is processed or timeout occures */
        do {
                if (chip->dev_ready(mtd))
                        break;
-               touch_softlockup_watchdog();
        } while (time_before(jiffies, timeo));
        led_trigger_event(nand_led_trigger, LED_OFF);
+
+       softlockup_enable(sl_state);
 }
 EXPORT_SYMBOL_GPL(nand_wait_ready);

--
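
[Editorial sketch, not part of the patch: the calling convention these
conversions follow, assuming patches 2/4 and 3/4 are applied.
my_dev_ready() and MY_TIMEOUT_MS are hypothetical stand-ins for the
device-specific parts.]

#include <linux/sched.h>        /* softlockup_disable()/softlockup_enable() */
#include <linux/delay.h>        /* mdelay() */
#include <linux/errno.h>

#define MY_TIMEOUT_MS 1000

extern int my_dev_ready(void);  /* hypothetical device poll */

static int my_wait_ready(void)
{
        int sl_state, ret = -EBUSY;
        unsigned int ms = MY_TIMEOUT_MS;

        /* Before: touch_softlockup_watchdog() on every loop iteration.
         * After: disable the per-cpu watchdog once, poll, then restore
         * the previous state. */
        sl_state = softlockup_disable();

        while (ms--) {
                mdelay(1);
                if (my_dev_ready()) {
                        ret = 0;
                        break;
                }
        }

        softlockup_enable(sl_state);
        return ret;
}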
Jeremy Fitzhardinge
2007-Apr-18 13:02 UTC
[patch 4/4] Add global disable/enable for softlockup watchdog
Some machine-wide activities can cause spurious softlockup watchdog
warnings, so add a mechanism to allow the watchdog to be disabled.  The
most obvious activity is suspend/resume, but long sysrq output can also
stall the system long enough to cause problems.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Chris Lalancette <clalance@redhat.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>

---
 drivers/char/sysrq.c  |    8 +++++
 include/linux/sched.h |   10 ++++++
 kernel/panic.c        |    3 +-
 kernel/power/swsusp.c |    3 +-
 kernel/softlockup.c   |   72 ++++++++++++++++++++++++++++++++++++++++++-------
 kernel/timer.c        |    4 ++
 6 files changed, 87 insertions(+), 13 deletions(-)

==================================================================
--- a/drivers/char/sysrq.c
+++ b/drivers/char/sysrq.c
@@ -211,7 +211,11 @@ static struct sysrq_key_op sysrq_showreg
 
 static void sysrq_handle_showstate(int key, struct tty_struct *tty)
 {
+       softlockup_global_disable();    /* may take a while */
+
        show_state();
+
+       softlockup_global_enable();
 }
 static struct sysrq_key_op sysrq_showstate_op = {
        .handler        = sysrq_handle_showstate,
@@ -222,7 +226,11 @@ static struct sysrq_key_op sysrq_showsta
 
 static void sysrq_handle_showstate_blocked(int key, struct tty_struct *tty)
 {
+       softlockup_global_disable();    /* may take a while */
+
        show_state_filter(TASK_UNINTERRUPTIBLE);
+
+       softlockup_global_enable();
 }
 static struct sysrq_key_op sysrq_showstate_blocked_op = {
        .handler        = sysrq_handle_showstate_blocked,
==================================================================
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -243,6 +243,10 @@ extern int softlockup_disable(void);
 extern int softlockup_disable(void);
 extern void softlockup_enable(int state);
 
+/* Disable/re-enable softlockup watchdog on all CPUs */
+extern void softlockup_global_disable(void);
+extern void softlockup_global_enable(void);
+
 extern void spawn_softlockup_task(void);
 extern void touch_softlockup_watchdog(void);
 #else
@@ -263,6 +267,12 @@ static inline void softlockup_enable(int
 static inline void softlockup_enable(int state)
 {
        preempt_enable();
+}
+static inline void softlockup_global_enable(void)
+{
+}
+static inline void softlockup_global_disable(void)
+{
 }
 static inline void spawn_softlockup_task(void)
 {
==================================================================
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -131,8 +131,9 @@ NORET_TYPE void panic(const char * fmt,
        disabled_wait(caller);
 #endif
        local_irq_enable();
+       softlockup_global_disable();
+
        for (i = 0;;) {
-               touch_softlockup_watchdog();
                i += panic_blink(i);
                mdelay(1);
                i++;
==================================================================
--- a/kernel/power/swsusp.c
+++ b/kernel/power/swsusp.c
@@ -289,6 +289,7 @@ int swsusp_suspend(void)
         * that suspended with irqs off ... no overall powerup.
         */
        device_power_up();
+       softlockup_global_disable();
 Enable_irqs:
        local_irq_enable();
        return error;
@@ -323,7 +324,7 @@ int swsusp_resume(void)
         */
        swsusp_free();
        restore_processor_state();
-       touch_softlockup_watchdog();
+       softlockup_global_enable();
        device_power_up();
        local_irq_enable();
        return error;
==================================================================
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -17,10 +17,21 @@
 
 static DEFINE_SPINLOCK(print_lock);
 
+/*
+ * Since sched_clock() is inherently per-cpu, its not possible to
+ * update another CPU's timestamp.  To deal with this, we add an extra
+ * state meaning "enabled, but timestamp needs update".
+ */
+enum state {
+       SL_OFF = 0,             /* disabled */
+       SL_UPDATE,              /* enabled, but timestamp old */
+       SL_ON,                  /* enabled */
+};
+
 static DEFINE_PER_CPU(unsigned long, touch_timestamp);
 static DEFINE_PER_CPU(unsigned long, print_timestamp);
 static DEFINE_PER_CPU(struct task_struct *, watchdog_task);
-static DEFINE_PER_CPU(int, enabled);
+static DEFINE_PER_CPU(enum state, softlock_state);
 
 static int did_panic = 0;
 
@@ -48,7 +59,12 @@ static unsigned long get_timestamp(void)
 
 void inline touch_softlockup_watchdog(void)
 {
+       if (__raw_get_cpu_var(softlock_state) == SL_OFF)
+               return;
+
        __raw_get_cpu_var(touch_timestamp) = get_timestamp();
+       barrier();
+       __raw_get_cpu_var(softlock_state) = SL_ON;
 }
 EXPORT_SYMBOL(touch_softlockup_watchdog);
 
@@ -58,7 +74,7 @@ EXPORT_SYMBOL(touch_softlockup_watchdog)
  */
 void inline softlockup_tick_disable(void)
 {
-       __get_cpu_var(enabled) = 0;
+       __get_cpu_var(softlock_state) = SL_OFF;
 }
 
 /*
@@ -73,7 +89,7 @@ int softlockup_disable(void)
 
        preempt_disable();
 
-       ret = __get_cpu_var(enabled);
+       ret = __get_cpu_var(softlock_state) == SL_OFF;
        softlockup_tick_disable();
 
        return ret;
@@ -86,7 +102,7 @@ EXPORT_SYMBOL(softlockup_disable);
  */
 void inline softlockup_tick_enable(void)
 {
-       __get_cpu_var(enabled) = 1;
+       __get_cpu_var(softlock_state) = SL_UPDATE;
 }
 
 /*
@@ -96,15 +112,41 @@ void softlockup_enable(int state)
 void softlockup_enable(int state)
 {
        if (state) {
-               touch_softlockup_watchdog();
-               /* update timestamp before enable */
-               barrier();
                softlockup_tick_enable();
+               touch_softlockup_watchdog();
        }
 
        preempt_enable();
 }
 EXPORT_SYMBOL(softlockup_enable);
+
+/*
+ * Disable softlockup watchdog on all CPUs.  This is useful for
+ * globally disruptive activities, like suspend/resume or large sysrq
+ * debug outputs.
+ */
+void softlockup_global_disable()
+{
+       unsigned cpu;
+
+       for_each_online_cpu(cpu)
+               per_cpu(softlock_state, cpu) = SL_OFF;
+}
+EXPORT_SYMBOL(softlockup_global_disable);
+
+/*
+ * Globally re-enable soft lockups.  This will obviously interfere
+ * with any CPU's local softlockup disable, but with luck that won't
+ * matter.
+ */
+void softlockup_global_enable()
+{
+       unsigned cpu;
+
+       for_each_online_cpu(cpu)
+               per_cpu(softlock_state, cpu) = SL_UPDATE;
+}
+EXPORT_SYMBOL(softlockup_global_enable);
 
 /*
  * This callback runs from the timer interrupt, and checks
@@ -117,9 +159,19 @@ void softlockup_tick(void)
        unsigned long print_timestamp;
        unsigned long now;
 
-       /* return if not enabled */
-       if (!__get_cpu_var(enabled))
-               return;
+       switch(__get_cpu_var(softlock_state)) {
+       case SL_OFF:
+               /* not enabled */
+               return;
+
+       case SL_UPDATE:
+               /* update timestamp */
+               touch_softlockup_watchdog();
+               return;
+
+       case SL_ON:
+               break;
+       }
 
        print_timestamp = __get_cpu_var(print_timestamp);
==================================================================
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1011,7 +1011,7 @@ static int timekeeping_resume(struct sys
        timekeeping_suspended = 0;
        write_sequnlock_irqrestore(&xtime_lock, flags);
 
-       touch_softlockup_watchdog();
+       softlockup_global_enable();
 
        clockevents_notify(CLOCK_EVT_NOTIFY_RESUME, NULL);
 
@@ -1029,6 +1029,8 @@ static int timekeeping_suspend(struct sy
        timekeeping_suspended = 1;
        timekeeping_suspend_time = read_persistent_clock();
        write_sequnlock_irqrestore(&xtime_lock, flags);
+
+       softlockup_global_disable();
 
        clockevents_notify(CLOCK_EVT_NOTIFY_SUSPEND, NULL);

--
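
[Editorial sketch, not part of the patch: a user-space model of the
three-state flag.  A "remote" CPU cannot write another CPU's
sched_clock()-based timestamp, so global enable only marks the state
SL_UPDATE; the owning CPU refreshes its timestamp on its next tick.]

#include <stdio.h>

enum state { SL_OFF = 0, SL_UPDATE, SL_ON };

struct cpu {
        enum state state;
        unsigned long touch_timestamp;  /* seconds, per-cpu clock */
};

static void tick(struct cpu *c, unsigned long now)
{
        switch (c->state) {
        case SL_OFF:
                return;                         /* watchdog disabled */
        case SL_UPDATE:
                c->touch_timestamp = now;       /* refresh locally... */
                c->state = SL_ON;               /* ...then arm the check */
                return;
        case SL_ON:
                if (now > c->touch_timestamp + 10)
                        printf("BUG: soft lockup detected\n");
        }
}

int main(void)
{
        struct cpu c = { .state = SL_ON, .touch_timestamp = 0 };

        /* With the patch: disable before a machine-wide stall
         * (e.g. suspend), re-enable afterwards. */
        c.state = SL_OFF;       /* softlockup_global_disable() */
        c.state = SL_UPDATE;    /* softlockup_global_enable() */
        tick(&c, 30);           /* refreshes timestamp instead of warning */
        tick(&c, 35);           /* 35 <= 30 + 10: no spurious warning */
        return 0;
}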
Jeremy Fitzhardinge
2007-Apr-18 13:02 UTC
[patch 1/4] Ignore stolen time in the softlockup watchdog
The softlockup watchdog is currently a nuisance in a virtual machine,
since the whole system could have the CPU stolen from it for a long
period of time.  While it would be unlikely for a guest domain to be
denied timer interrupts for over 10s, it could happen and any softlockup
message would be completely spurious.

Earlier I proposed that sched_clock() return time in unstolen
nanoseconds, which is how Xen and VMI currently implement it.  If the
softlockup watchdog uses sched_clock() to measure time, it would
automatically ignore stolen time, and therefore only report when the
guest itself locked up.  When running native, sched_clock() returns
real-time nanoseconds, so the behaviour would be unchanged.

Note that sched_clock() used this way is inherently per-cpu, so this
patch makes sure that the per-processor watchdog thread initializes
its own timestamp.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: James Morris <jmorris@namei.org>
Cc: Dan Hecht <dhecht@vmware.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Chris Lalancette <clalance@redhat.com>
Cc: Rick Lindsley <ricklind@us.ibm.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>

---
 kernel/softlockup.c |   37 ++++++++++++++++++++++++++++++-------
 1 file changed, 30 insertions(+), 7 deletions(-)

==================================================================
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -35,9 +35,19 @@ static struct notifier_block panic_block
        .notifier_call = softlock_panic,
 };
 
+/*
+ * Returns seconds, approximately.  We don't need nanosecond
+ * resolution, and we don't need to waste time with a big divide when
+ * 2^30ns == 1.074s.
+ */
+static unsigned long get_timestamp(void)
+{
+       return sched_clock() >> 30;  /* 2^30 ~= 10^9 */
+}
+
 void touch_softlockup_watchdog(void)
 {
-       __raw_get_cpu_var(touch_timestamp) = jiffies;
+       __raw_get_cpu_var(touch_timestamp) = get_timestamp();
 }
 EXPORT_SYMBOL(touch_softlockup_watchdog);
 
@@ -48,10 +58,18 @@ void softlockup_tick(void)
 void softlockup_tick(void)
 {
        int this_cpu = smp_processor_id();
-       unsigned long touch_timestamp = per_cpu(touch_timestamp, this_cpu);
+       unsigned long touch_timestamp = __get_cpu_var(touch_timestamp);
+       unsigned long print_timestamp;
+       unsigned long now;
 
-       /* prevent double reports: */
-       if (per_cpu(print_timestamp, this_cpu) == touch_timestamp ||
+       /* watchdog task hasn't updated timestamp yet */
+       if (touch_timestamp == 0)
+               return;
+
+       print_timestamp = __get_cpu_var(print_timestamp);
+
+       /* report at most once a second */
+       if (print_timestamp < (touch_timestamp + 1) ||
                        did_panic ||
                        !per_cpu(watchdog_task, this_cpu))
                return;
@@ -62,12 +80,14 @@ void softlockup_tick(void)
                return;
        }
 
+       now = get_timestamp();
+
        /* Wake up the high-prio watchdog task every second: */
-       if (time_after(jiffies, touch_timestamp + HZ))
+       if (now > (touch_timestamp + 1))
                wake_up_process(per_cpu(watchdog_task, this_cpu));
 
        /* Warn about unreasonable 10+ seconds delays: */
-       if (time_after(jiffies, touch_timestamp + 10*HZ)) {
+       if (now > (touch_timestamp + 10)) {
                per_cpu(print_timestamp, this_cpu) = touch_timestamp;
 
                spin_lock(&print_lock);
@@ -87,6 +107,9 @@ static int watchdog(void * __bind_cpu)
        sched_setscheduler(current, SCHED_FIFO, &param);
        current->flags |= PF_NOFREEZE;
 
+       /* initialize timestamp */
+       touch_softlockup_watchdog();
+
        /*
         * Run briefly once per second to reset the softlockup timestamp.
@@ -120,7 +143,7 @@ cpu_callback(struct notifier_block *nfb,
                        printk("watchdog for %i failed\n", hotcpu);
                        return NOTIFY_BAD;
                }
-               per_cpu(touch_timestamp, hotcpu) = jiffies;
+               per_cpu(touch_timestamp, hotcpu) = 0;
                per_cpu(watchdog_task, hotcpu) = p;
                kthread_bind(p, hotcpu);
                break;

--
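
[Editorial illustration, not from the patch: a standalone demo of the
timestamp trick.  Shifting nanoseconds right by 30 bits gives
"approximately seconds", since 2^30 ns = 1.073741824 s, without a
64-bit divide.]

#include <stdio.h>
#include <stdint.h>

static unsigned long get_timestamp(uint64_t ns)
{
        return (unsigned long)(ns >> 30);       /* 2^30 ~= 10^9 */
}

int main(void)
{
        uint64_t ns = 12500000000ULL;           /* 12.5 seconds in ns */

        /* Prints 11: 12.5e9 / 2^30 = 11.64, so the watchdog's
         * "10 second" threshold really means roughly 10.7 wall seconds. */
        printf("%lu\n", get_timestamp(ns));
        return 0;
}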
Jeremy Fitzhardinge
2007-Apr-18 13:02 UTC
[patch 2/4] percpu enable flag for softlockup watchdog
On a NO_HZ system, there may be an arbitrarily long delay between ticks
on a CPU.  When we're disabling ticks for a CPU, also disable the
softlockup watchdog timer.

This makes the touch_softlockup_watchdog() interface redundant; if a
piece of code knows it's going to be holding off timer interrupts long
enough to trigger the watchdog, then it may as well simply temporarily
disable the timer.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: James Morris <jmorris@namei.org>
Cc: Dan Hecht <dhecht@vmware.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Chris Lalancette <clalance@redhat.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>

---
 include/linux/sched.h    |   25 +++++++++++++++++
 kernel/softlockup.c      |   67 ++++++++++++++++++++++++++++++++++++++++++----
 kernel/time/tick-sched.c |   34 ++++++++++-------------
 3 files changed, 102 insertions(+), 24 deletions(-)

==================================================================
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -232,12 +232,37 @@ extern void scheduler_tick(void);
 extern void scheduler_tick(void);
 
 #ifdef CONFIG_DETECT_SOFTLOCKUP
+/* A scheduler tick on this CPU */
 extern void softlockup_tick(void);
+
+/* Scheduler is disabling/re-enabling ticks on this CPU */
+extern void softlockup_tick_disable(void);
+extern void softlockup_tick_enable(void);
+
+/* Some code wants to temporarily disable the watchdog */
+extern int softlockup_disable(void);
+extern void softlockup_enable(int state);
+
 extern void spawn_softlockup_task(void);
 extern void touch_softlockup_watchdog(void);
 #else
 static inline void softlockup_tick(void)
 {
+}
+static inline void softlockup_tick_disable(void)
+{
+}
+static inline void softlockup_tick_enable(void)
+{
+}
+static inline int softlockup_disable(void)
+{
+       preempt_disable();
+       return 0;
+}
+static inline void softlockup_enable(int state)
+{
+       preempt_enable();
 }
 static inline void spawn_softlockup_task(void)
 {
==================================================================
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -20,6 +20,7 @@ static DEFINE_PER_CPU(unsigned long, tou
 static DEFINE_PER_CPU(unsigned long, touch_timestamp);
 static DEFINE_PER_CPU(unsigned long, print_timestamp);
 static DEFINE_PER_CPU(struct task_struct *, watchdog_task);
+static DEFINE_PER_CPU(int, enabled);
 
 static int did_panic = 0;
 
@@ -45,11 +46,65 @@ static unsigned long get_timestamp(void)
        return sched_clock() >> 30;  /* 2^30 ~= 10^9 */
 }
 
-void touch_softlockup_watchdog(void)
+void inline touch_softlockup_watchdog(void)
 {
        __raw_get_cpu_var(touch_timestamp) = get_timestamp();
 }
 EXPORT_SYMBOL(touch_softlockup_watchdog);
+
+/*
+ * Disable the watchdog on this CPU.  This is called directly by the
+ * scheduler to tell us it's going tickless.
+ */
+void inline softlockup_tick_disable(void)
+{
+       __get_cpu_var(enabled) = 0;
+}
+
+/*
+ * Disable the watchdog for this CPU, returning the current state to
+ * allow nesting.  Returns with preemption disabled, since we can't
+ * switch CPUs before we re-enable the watchdog (also, if we're
+ * worried about getting watchdog timeouts, we're not scheduling).
+ */
+int softlockup_disable(void)
+{
+       int ret;
+
+       preempt_disable();
+
+       ret = __get_cpu_var(enabled);
+       softlockup_tick_disable();
+
+       return ret;
+}
+EXPORT_SYMBOL(softlockup_disable);
+
+/*
+ * Re-enable the watchdog on this CPU.  Called directly by the
+ * scheduler to tell us ticks are resuming.
+ */
+void inline softlockup_tick_enable(void)
+{
+       __get_cpu_var(enabled) = 1;
+}
+
+/*
+ * Returns softlockup watchdog state to before the last per-cpu
+ * disable.
+ */
+void softlockup_enable(int state)
+{
+       if (state) {
+               touch_softlockup_watchdog();
+               /* update timestamp before enable */
+               barrier();
+               softlockup_tick_enable();
+       }
+
+       preempt_enable();
+}
+EXPORT_SYMBOL(softlockup_enable);
 
 /*
  * This callback runs from the timer interrupt, and checks
@@ -62,8 +117,8 @@ void softlockup_tick(void)
        unsigned long print_timestamp;
        unsigned long now;
 
-       /* watchdog task hasn't updated timestamp yet */
-       if (touch_timestamp == 0)
+       /* return if not enabled */
+       if (!__get_cpu_var(enabled))
                return;
 
        print_timestamp = __get_cpu_var(print_timestamp);
@@ -108,8 +163,8 @@ static int watchdog(void * __bind_cpu)
        sched_setscheduler(current, SCHED_FIFO, &param);
        current->flags |= PF_NOFREEZE;
 
-       /* initialize timestamp */
-       touch_softlockup_watchdog();
+       /* enable on this cpu */
+       softlockup_tick_enable();
 
        /*
         * Run briefly once per second to reset the softlockup timestamp.
@@ -122,6 +177,8 @@ static int watchdog(void * __bind_cpu)
                touch_softlockup_watchdog();
                schedule();
        }
+
+       softlockup_tick_disable();
 
        return 0;
 }
==================================================================
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -228,6 +228,8 @@ void tick_nohz_stop_sched_tick(void)
                        ts->idle_tick = ts->sched_timer.expires;
                        ts->tick_stopped = 1;
                        ts->idle_jiffies = last_jiffies;
+
+                       softlockup_tick_disable();
                }
                /*
                 * calculate the expiry time for the next timer wheel
@@ -255,6 +257,7 @@ void tick_nohz_stop_sched_tick(void)
                        cpu_clear(cpu, nohz_cpu_mask);
                }
                raise_softirq_irqoff(TIMER_SOFTIRQ);
+
 out:
        ts->next_jiffies = next_jiffies;
        ts->last_jiffies = last_jiffies;
@@ -311,6 +314,8 @@ void tick_nohz_restart_sched_tick(void)
        ts->tick_stopped  = 0;
        hrtimer_cancel(&ts->sched_timer);
        ts->sched_timer.expires = ts->idle_tick;
+
+       softlockup_tick_enable();
 
        while (1) {
                /* Forward the time to expire in the future */
@@ -355,17 +360,12 @@ static void tick_nohz_handler(struct clo
        tick_do_update_jiffies64(now);
 
        /*
-        * When we are idle and the tick is stopped, we have to touch
-        * the watchdog as we might not schedule for a really long
-        * time. This happens on complete idle SMP systems while
-        * waiting on the login prompt. We also increment the "start
-        * of idle" jiffy stamp so the idle accounting adjustment we
-        * do when we go busy again does not account too much ticks.
-        */
-       if (ts->tick_stopped) {
-               touch_softlockup_watchdog();
+        * Increment the "start of idle" jiffy stamp so the idle
+        * accounting adjustment we do when we go busy again does not
+        * account too much ticks.
+        */
+       if (ts->tick_stopped)
                ts->idle_jiffies++;
-       }
 
        update_process_times(user_mode(regs));
        profile_tick(CPU_PROFILING);
@@ -450,17 +450,12 @@ static enum hrtimer_restart tick_sched_t
         */
        if (regs) {
                /*
-                * When we are idle and the tick is stopped, we have to touch
-                * the watchdog as we might not schedule for a really long
-                * time. This happens on complete idle SMP systems while
-                * waiting on the login prompt. We also increment the "start of
-                * idle" jiffy stamp so the idle accounting adjustment we do
-                * when we go busy again does not account too much ticks.
+                * Increment the "start of idle" jiffy stamp so the
+                * idle accounting adjustment we do when we go busy
+                * again does not account too much ticks.
                 */
-               if (ts->tick_stopped) {
-                       touch_softlockup_watchdog();
+               if (ts->tick_stopped)
                        ts->idle_jiffies++;
-               }
 
                /*
                 * update_process_times() might take tasklist_lock, hence
                 * drop the base lock. sched-tick hrtimers are per-CPU and
@@ -522,6 +517,7 @@ void tick_cancel_sched_timer(int cpu)
        if (ts->sched_timer.base)
                hrtimer_cancel(&ts->sched_timer);
        ts->tick_stopped = 0;
+       softlockup_tick_enable();
        ts->nohz_mode = NOHZ_MODE_INACTIVE;
 }
 #endif /* HIGH_RES_TIMERS */

--
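
[Editorial illustration, not from the patch: a user-space model of why
softlockup_disable() returns the previous state.  Nested disables then
restore correctly, much like the local_irq_save()/restore() pattern.]

#include <stdio.h>

static int enabled = 1;         /* stand-in for the per-cpu flag */

static int softlockup_disable_model(void)
{
        int prev = enabled;
        enabled = 0;
        return prev;
}

static void softlockup_enable_model(int state)
{
        if (state)
                enabled = 1;    /* only the outermost caller re-enables */
}

int main(void)
{
        int outer = softlockup_disable_model(); /* saves 1 */
        int inner = softlockup_disable_model(); /* saves 0 */

        softlockup_enable_model(inner);         /* still disabled */
        printf("after inner enable: %d\n", enabled);    /* prints 0 */
        softlockup_enable_model(outer);         /* now re-enabled */
        printf("after outer enable: %d\n", enabled);    /* prints 1 */
        return 0;
}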
Jeremy Fitzhardinge
2007-Apr-18 13:02 UTC
[patch 0/4] Revised softlockup watchdog improvement patches
Hi Ingo,

This series of patches implements a number of improvements to the
softlockup watchdog and its users.  They are:

1. Make the watchdog ignore stolen time

   When running under a hypervisor, the kernel may lose an arbitrary
   amount of time as "stolen time".  This may cause the softlockup
   watchdog to trigger spuriously.  Xen and VMI implement sched_clock()
   as measuring unstolen time, so use that as the timebase for the
   softlockup watchdog.  This also removes a dependency on jiffies.

2. Add a per-cpu enable flag for the watchdog

   When the scheduler disables ticks for a specific CPU, this allows
   the watchdog to be disabled as well.  This avoids spurious watchdog
   errors if a CPU has been tickless for a long time.  The existing
   tick-sched.c code tried to address this by periodically touching the
   watchdog, but this seems more direct.

3. Use the per-cpu enable flags to temporarily disable the watchdog

   Some drivers perform long operations which will cause the watchdog
   to time out.  If we know we're about to start such an operation,
   disable the watchdog timer for the interim.  This is more
   straightforward than trying to periodically tickle the watchdog
   timer during the operation, and safer than just doing it once at the
   start and hoping there's enough time.

4. Add a global disable flag

   Sometimes a long operation will affect all CPUs' ability to tickle
   the watchdog timer.  An obvious example is suspend/resume, but it
   can also happen while generating lots of sysrq output, or other
   specialized operations.  Add global enable/disable calls to deal
   with these cases.

--
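
[Editorial illustration for item 1, not from the series: a toy model of
why a stolen-time-aware sched_clock() keeps the watchdog quiet in a
guest.  The numbers are made up.]

#include <stdio.h>

int main(void)
{
        unsigned long wall = 15;        /* seconds of real time elapsed */
        unsigned long stolen = 12;      /* seconds the hypervisor ran other guests */
        unsigned long unstolen = wall - stolen; /* what sched_clock() would report */
        unsigned long touch = 0;        /* last watchdog touch, in each timebase */

        printf("jiffies-based:     %s\n",
               (wall - touch > 10) ? "spurious warning" : "ok");
        printf("sched_clock-based: %s\n",
               (unstolen - touch > 10) ? "warning" : "ok");
        return 0;
}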
Andrew Morton
2007-Apr-23 23:52 UTC
[patch 1/4] Ignore stolen time in the softlockup watchdog
On Tue, 27 Mar 2007 14:49:20 -0700 Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> The softlockup watchdog is currently a nuisance in a virtual machine,
> since the whole system could have the CPU stolen from it for a long
> period of time.  While it would be unlikely for a guest domain to be
> denied timer interrupts for over 10s, it could happen and any softlockup
> message would be completely spurious.
>
> Earlier I proposed that sched_clock() return time in unstolen
> nanoseconds, which is how Xen and VMI currently implement it.  If the
> softlockup watchdog uses sched_clock() to measure time, it would
> automatically ignore stolen time, and therefore only report when the
> guest itself locked up.  When running native, sched_clock() returns
> real-time nanoseconds, so the behaviour would be unchanged.
>
> Note that sched_clock() used this way is inherently per-cpu, so this
> patch makes sure that the per-processor watchdog thread initialized
> its own timestamp.

This patch
(ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/ignore-stolen-time-in-the-softlockup-watchdog.patch)
causes six failures in the locking self-tests, which I must say is
rather clever of it.  Here's the first one:

[17179569.184000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[17179569.184000] ... MAX_LOCKDEP_SUBCLASSES:  8
[17179569.184000] ... MAX_LOCK_DEPTH:          30
[17179569.184000] ... MAX_LOCKDEP_KEYS:        2048
[17179569.184000] ... CLASSHASH_SIZE:          1024
[17179569.184000] ... MAX_LOCKDEP_ENTRIES:     8192
[17179569.184000] ... MAX_LOCKDEP_CHAINS:      16384
[17179569.184000] ... CHAINHASH_SIZE:          8192
[17179569.184000]  memory used by lock dependency info: 992 kB
[17179569.184000]  per task-struct memory footprint: 1200 bytes
[17179569.184000] ------------------------
[17179569.184000] | Locking API testsuite:
[17179569.184000] ----------------------------------------------------------------------------
[17179569.184000]                                  | spin |wlock |rlock |mutex | wsem | rsem |
[17179569.184000]   --------------------------------------------------------------------------
[17179569.184000]                      A-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184000]                  A-B-B-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184000]              A-B-B-C-C-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184001]              A-B-C-A-B-C deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184002]          A-B-B-C-C-D-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184003]          A-B-C-D-B-D-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184004]          A-B-C-D-B-C-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184005]                     double unlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184006]                   initialize held:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184006]                  bad unlock order:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184006]   --------------------------------------------------------------------------
[17179569.184006]               recursive read-lock:             |  ok  |             |  ok  |
[17179569.184006]            recursive read-lock #2:             |  ok  |             |  ok  |
[17179569.184007]             mixed read-write-lock:             |  ok  |             |  ok  |
[17179569.184007]             mixed write-read-lock:             |  ok  |             |  ok  |
[17179569.184007]   --------------------------------------------------------------------------
[17179569.184007]      hard-irqs-on + irq-safe-A/12:  ok  |  ok  |  ok  |
[17179569.184007]      soft-irqs-on + irq-safe-A/12:  ok  |  ok  |  ok  |
[17179569.184007]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[17179569.184007]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[17179569.184007]        sirq-safe-A => hirqs-on/12:  ok  |  ok  |irq event stamp: 458
[17179569.184007] hardirqs last  enabled at (458): [<c01e4116>] irqsafe2A_rlock_12+0x96/0xa3
[17179569.184007] hardirqs last disabled at (457): [<c01095b9>] sched_clock+0x5e/0xe9
[17179569.184007] softirqs last  enabled at (454): [<c01e4101>] irqsafe2A_rlock_12+0x81/0xa3
[17179569.184007] softirqs last disabled at (450): [<c01e408b>] irqsafe2A_rlock_12+0xb/0xa3
[17179569.184007] FAILED| [<c0104cf0>] dump_trace+0x63/0x1ec
[17179569.184007]  [<c0104e93>] show_trace_log_lvl+0x1a/0x30
[17179569.184007]  [<c01059ec>] show_trace+0x12/0x14
[17179569.184007]  [<c0105a45>] dump_stack+0x16/0x18
[17179569.184007]  [<c01e1eb5>] dotest+0x6b/0x3d0
[17179569.184007]  [<c01eb249>] locking_selftest+0x915/0x1a58
[17179569.184007]  [<c048c979>] start_kernel+0x1d0/0x2a2
[17179569.184007]  =======================
[17179569.184007]
[17179569.184007]        sirq-safe-A => hirqs-on/21:irq event stamp: 462
[17179569.184007] hardirqs last  enabled at (462): [<c01e3eb2>] irqsafe2A_spin_21+0x1b/0xa3
[17179569.184007] hardirqs last disabled at (461): [<c01095b9>] sched_clock+0x5e/0xe9
[17179569.184007] softirqs last  enabled at (454): [<c01e4101>] irqsafe2A_rlock_12+0x81/0xa3
[17179569.184007] softirqs last disabled at (450): [<c01e408b>] irqsafe2A_rlock_12+0xb/0xa3
[17179569.184007]   ok  |irq event stamp: 466

It's a challenge to even find the code which corresponds with this
failure, so good luck.

It seems fairly sensitive to .config settings.  See
http://userweb.kernel.org/~akpm/config-sony.txt

Unfortunately this causes lockdep to disable itself, thus hiding all
the other bugs which people have contributed, so I'll drop your patch
for now.