Jeremy Fitzhardinge
2007-Apr-18 13:02 UTC
[patch 3/4] Locally disable the softlockup watchdog rather than touching it
If we're about to enter a prolonged period in which we know we're going
to hold off scheduler ticks, then disable the CPU's softlockup watchdog
for the duration.  This avoids having to repeatedly touch the timestamp,
or, conversely, the risk of having the watchdog time out in the middle
of the long operation.

A question about each of these changes is whether the long operation
would appear to lock up just the current CPU, or the whole machine.
touch_softlockup_watchdog() updates the per-cpu timestamp, so I assumed
that it's per-cpu for all these cases.

One semantic change this makes is that softlockup_disable()/enable()
does a preempt_disable()/enable().  I don't think this is a problem,
since if you're worried about triggering the softlockup watchdog,
you're not scheduling.

I haven't really worked out how this should interact with the NMI
watchdog; touch_nmi_watchdog() still ends up calling
touch_softlockup_watchdog(), so there's still some redundancy here.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Chris Lalancette <clalance@redhat.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>
Cc: John Hawkes <hawkes@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>

---
 arch/ia64/kernel/uncached.c  |    8 ++++++--
 drivers/ide/ide-iops.c       |   25 +++++++++++++++++++------
 drivers/ide/ide-taskfile.c   |    8 +++++++-
 drivers/mtd/nand/nand_base.c |    8 +++++++-
 4 files changed, 39 insertions(+), 10 deletions(-)

==================================================================
--- a/arch/ia64/kernel/uncached.c
+++ b/arch/ia64/kernel/uncached.c
@@ -253,13 +253,17 @@ static int __init uncached_build_memmap(
        int nid = paddr_to_nid(uc_start - __IA64_UNCACHED_OFFSET);
        struct gen_pool *pool = uncached_pools[nid].pool;
        size_t size = uc_end - uc_start;
-
-       touch_softlockup_watchdog();
+       int sl_state;
+
+       sl_state = softlockup_disable();
 
        if (pool != NULL) {
                memset((char *)uc_start, 0, size);
                (void) gen_pool_add(pool, uc_start, size, nid);
        }
+
+       softlockup_enable(sl_state);
+
        return 0;
 }
==================================================================
--- a/drivers/ide/ide-iops.c
+++ b/drivers/ide/ide-iops.c
@@ -1215,6 +1215,12 @@ int ide_wait_not_busy(ide_hwif_t *hwif,
 int ide_wait_not_busy(ide_hwif_t *hwif, unsigned long timeout)
 {
        u8 stat = 0;
+       int ret = -EBUSY;
+       int sl_state;
+
+       /* Tell softlockup this might take a while.
+          XXX should this be global or per-cpu? */
+       sl_state = softlockup_disable();
 
        while(timeout--) {
                /*
@@ -1223,19 +1229,26 @@ int ide_wait_not_busy(ide_hwif_t *hwif,
                 */
                mdelay(1);
                stat = hwif->INB(hwif->io_ports[IDE_STATUS_OFFSET]);
-               if ((stat & BUSY_STAT) == 0)
-                       return 0;
+               if ((stat & BUSY_STAT) == 0) {
+                       ret = 0;
+                       break;
+               }
+
                /*
                 * Assume a value of 0xff means nothing is connected to
                 * the interface and it doesn't implement the pull-down
                 * resistor on D7.
                 */
-               if (stat == 0xff)
-                       return -ENODEV;
-               touch_softlockup_watchdog();
+               if (stat == 0xff) {
+                       ret = -ENODEV;
+                       break;
+               }
                touch_nmi_watchdog();
        }
-       return -EBUSY;
+
+       softlockup_enable(sl_state);
+
+       return ret;
 }
 
 EXPORT_SYMBOL_GPL(ide_wait_not_busy);
==================================================================
--- a/drivers/ide/ide-taskfile.c
+++ b/drivers/ide/ide-taskfile.c
@@ -310,10 +310,14 @@ static void ide_pio_datablock(ide_drive_
 static void ide_pio_datablock(ide_drive_t *drive, struct request *rq,
                                unsigned int write)
 {
+       int sl_state;
+
        if (rq->bio)    /* fs request */
                rq->errors = 0;
 
-       touch_softlockup_watchdog();
+       /* Tell softlockup this might take a while.
+          XXX should this be global or per-cpu? */
+       sl_state = softlockup_disable();
 
        switch (drive->hwif->data_phase) {
        case TASKFILE_MULTI_IN:
@@ -324,6 +328,8 @@ static void ide_pio_datablock(ide_drive_
                ide_pio_sector(drive, write);
                break;
        }
+
+       softlockup_enable(sl_state);
 }
 
 static ide_startstop_t task_error(ide_drive_t *drive, struct request *rq,
==================================================================
--- a/drivers/mtd/nand/nand_base.c
+++ b/drivers/mtd/nand/nand_base.c
@@ -419,15 +419,21 @@ void nand_wait_ready(struct mtd_info *mt
 {
        struct nand_chip *chip = mtd->priv;
        unsigned long timeo = jiffies + 2;
+       int sl_state;
+
+       /* Tell softlockup this might take a while.
+          XXX should this be global or per-cpu? */
+       sl_state = softlockup_disable();
 
        led_trigger_event(nand_led_trigger, LED_FULL);
        /* wait until command is processed or timeout occures */
        do {
                if (chip->dev_ready(mtd))
                        break;
-               touch_softlockup_watchdog();
        } while (time_before(jiffies, timeo));
        led_trigger_event(nand_led_trigger, LED_OFF);
+
+       softlockup_enable(sl_state);
 }
 EXPORT_SYMBOL_GPL(nand_wait_ready);

--
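
[Editorial sketch, not part of the patch: the calling convention these
conversions follow, assuming patches 2/4 and 3/4 are applied.
my_dev_ready() and MY_TIMEOUT_MS are hypothetical stand-ins for the
device-specific parts.]

#include <linux/sched.h>        /* softlockup_disable()/softlockup_enable() */
#include <linux/delay.h>        /* mdelay() */
#include <linux/errno.h>

#define MY_TIMEOUT_MS 1000

extern int my_dev_ready(void);  /* hypothetical device poll */

static int my_wait_ready(void)
{
        int sl_state, ret = -EBUSY;
        unsigned int ms = MY_TIMEOUT_MS;

        /* Before: touch_softlockup_watchdog() on every loop iteration.
         * After: disable the per-cpu watchdog once, poll, then restore
         * the previous state. */
        sl_state = softlockup_disable();

        while (ms--) {
                mdelay(1);
                if (my_dev_ready()) {
                        ret = 0;
                        break;
                }
        }

        softlockup_enable(sl_state);
        return ret;
}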
Jeremy Fitzhardinge
2007-Apr-18 13:02 UTC
[patch 4/4] Add global disable/enable for softlockup watchdog
Some machine-wide activities can cause spurious softlockup watchdog
warnings, so add a mechanism to allow the watchdog to be disabled.  The
most obvious activity is suspend/resume, but long sysrq output can also
stall the system long enough to cause problems.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Chris Lalancette <clalance@redhat.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>

---
 drivers/char/sysrq.c  |    8 +++++
 include/linux/sched.h |   10 ++++++
 kernel/panic.c        |    3 +-
 kernel/power/swsusp.c |    3 +-
 kernel/softlockup.c   |   72 ++++++++++++++++++++++++++++++++++++++++++-------
 kernel/timer.c        |    4 ++
 6 files changed, 87 insertions(+), 13 deletions(-)

==================================================================
--- a/drivers/char/sysrq.c
+++ b/drivers/char/sysrq.c
@@ -211,7 +211,11 @@ static struct sysrq_key_op sysrq_showreg
 
 static void sysrq_handle_showstate(int key, struct tty_struct *tty)
 {
+       softlockup_global_disable();    /* may take a while */
+
        show_state();
+
+       softlockup_global_enable();
 }
 static struct sysrq_key_op sysrq_showstate_op = {
        .handler        = sysrq_handle_showstate,
@@ -222,7 +226,11 @@ static struct sysrq_key_op sysrq_showsta
 
 static void sysrq_handle_showstate_blocked(int key, struct tty_struct *tty)
 {
+       softlockup_global_disable();    /* may take a while */
+
        show_state_filter(TASK_UNINTERRUPTIBLE);
+
+       softlockup_global_enable();
 }
 static struct sysrq_key_op sysrq_showstate_blocked_op = {
        .handler        = sysrq_handle_showstate_blocked,
==================================================================
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -243,6 +243,10 @@ extern int softlockup_disable(void);
 extern int softlockup_disable(void);
 extern void softlockup_enable(int state);
 
+/* Disable/re-enable softlockup watchdog on all CPUs */
+extern void softlockup_global_disable(void);
+extern void softlockup_global_enable(void);
+
 extern void spawn_softlockup_task(void);
 extern void touch_softlockup_watchdog(void);
 #else
@@ -263,6 +267,12 @@ static inline void softlockup_enable(int
 static inline void softlockup_enable(int state)
 {
        preempt_enable();
+}
+static inline void softlockup_global_enable(void)
+{
+}
+static inline void softlockup_global_disable(void)
+{
 }
 static inline void spawn_softlockup_task(void)
 {
==================================================================
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -131,8 +131,9 @@ NORET_TYPE void panic(const char * fmt,
        disabled_wait(caller);
 #endif
        local_irq_enable();
+       softlockup_global_disable();
+
        for (i = 0;;) {
-               touch_softlockup_watchdog();
                i += panic_blink(i);
                mdelay(1);
                i++;
==================================================================
--- a/kernel/power/swsusp.c
+++ b/kernel/power/swsusp.c
@@ -289,6 +289,7 @@ int swsusp_suspend(void)
         * that suspended with irqs off ... no overall powerup.
         */
        device_power_up();
+       softlockup_global_disable();
 Enable_irqs:
        local_irq_enable();
        return error;
@@ -323,7 +324,7 @@ int swsusp_resume(void)
         */
        swsusp_free();
        restore_processor_state();
-       touch_softlockup_watchdog();
+       softlockup_global_enable();
        device_power_up();
        local_irq_enable();
        return error;
==================================================================
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -17,10 +17,21 @@
 
 static DEFINE_SPINLOCK(print_lock);
 
+/*
+ * Since sched_clock() is inherently per-cpu, its not possible to
+ * update another CPU's timestamp.  To deal with this, we add an extra
+ * state meaning "enabled, but timestamp needs update".
+ */
+enum state {
+       SL_OFF = 0,             /* disabled */
+       SL_UPDATE,              /* enabled, but timestamp old */
+       SL_ON,                  /* enabled */
+};
+
 static DEFINE_PER_CPU(unsigned long, touch_timestamp);
 static DEFINE_PER_CPU(unsigned long, print_timestamp);
 static DEFINE_PER_CPU(struct task_struct *, watchdog_task);
-static DEFINE_PER_CPU(int, enabled);
+static DEFINE_PER_CPU(enum state, softlock_state);
 
 static int did_panic = 0;
 
@@ -48,7 +59,12 @@ static unsigned long get_timestamp(void)
 
 void inline touch_softlockup_watchdog(void)
 {
+       if (__raw_get_cpu_var(softlock_state) == SL_OFF)
+               return;
+
        __raw_get_cpu_var(touch_timestamp) = get_timestamp();
+       barrier();
+       __raw_get_cpu_var(softlock_state) = SL_ON;
 }
 EXPORT_SYMBOL(touch_softlockup_watchdog);
 
@@ -58,7 +74,7 @@ EXPORT_SYMBOL(touch_softlockup_watchdog)
  */
 void inline softlockup_tick_disable(void)
 {
-       __get_cpu_var(enabled) = 0;
+       __get_cpu_var(softlock_state) = SL_OFF;
 }
 
 /*
@@ -73,7 +89,7 @@ int softlockup_disable(void)
 
        preempt_disable();
 
-       ret = __get_cpu_var(enabled);
+       ret = __get_cpu_var(softlock_state) == SL_OFF;
        softlockup_tick_disable();
 
        return ret;
@@ -86,7 +102,7 @@ EXPORT_SYMBOL(softlockup_disable);
  */
 void inline softlockup_tick_enable(void)
 {
-       __get_cpu_var(enabled) = 1;
+       __get_cpu_var(softlock_state) = SL_UPDATE;
 }
 
 /*
@@ -96,15 +112,41 @@ void softlockup_enable(int state)
 void softlockup_enable(int state)
 {
        if (state) {
-               touch_softlockup_watchdog();
-               /* update timestamp before enable */
-               barrier();
                softlockup_tick_enable();
+               touch_softlockup_watchdog();
        }
 
        preempt_enable();
 }
 EXPORT_SYMBOL(softlockup_enable);
+
+/*
+ * Disable softlockup watchdog on all CPUs.  This is useful for
+ * globally disruptive activities, like suspend/resume or large sysrq
+ * debug outputs.
+ */
+void softlockup_global_disable()
+{
+       unsigned cpu;
+
+       for_each_online_cpu(cpu)
+               per_cpu(softlock_state, cpu) = SL_OFF;
+}
+EXPORT_SYMBOL(softlockup_global_disable);
+
+/*
+ * Globally re-enable soft lockups.  This will obviously interfere
+ * with any CPU's local softlockup disable, but with luck that won't
+ * matter.
+ */
+void softlockup_global_enable()
+{
+       unsigned cpu;
+
+       for_each_online_cpu(cpu)
+               per_cpu(softlock_state, cpu) = SL_UPDATE;
+}
+EXPORT_SYMBOL(softlockup_global_enable);
 
 /*
  * This callback runs from the timer interrupt, and checks
@@ -117,9 +159,19 @@ void softlockup_tick(void)
        unsigned long print_timestamp;
        unsigned long now;
 
-       /* return if not enabled */
-       if (!__get_cpu_var(enabled))
-               return;
+       switch(__get_cpu_var(softlock_state)) {
+       case SL_OFF:
+               /* not enabled */
+               return;
+
+       case SL_UPDATE:
+               /* update timestamp */
+               touch_softlockup_watchdog();
+               return;
+
+       case SL_ON:
+               break;
+       }
 
        print_timestamp = __get_cpu_var(print_timestamp);
==================================================================
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -1011,7 +1011,7 @@ static int timekeeping_resume(struct sys
        timekeeping_suspended = 0;
        write_sequnlock_irqrestore(&xtime_lock, flags);
 
-       touch_softlockup_watchdog();
+       softlockup_global_enable();
 
        clockevents_notify(CLOCK_EVT_NOTIFY_RESUME, NULL);
 
@@ -1029,6 +1029,8 @@ static int timekeeping_suspend(struct sy
        timekeeping_suspended = 1;
        timekeeping_suspend_time = read_persistent_clock();
        write_sequnlock_irqrestore(&xtime_lock, flags);
+
+       softlockup_global_disable();
 
        clockevents_notify(CLOCK_EVT_NOTIFY_SUSPEND, NULL);

--
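
[Editorial sketch, not part of the patch: a user-space model of the
three-state flag.  A "remote" CPU cannot write another CPU's
sched_clock()-based timestamp, so global enable only marks the state
SL_UPDATE; the owning CPU refreshes its timestamp on its next tick.]

#include <stdio.h>

enum state { SL_OFF = 0, SL_UPDATE, SL_ON };

struct cpu {
        enum state state;
        unsigned long touch_timestamp;  /* seconds, per-cpu clock */
};

static void tick(struct cpu *c, unsigned long now)
{
        switch (c->state) {
        case SL_OFF:
                return;                         /* watchdog disabled */
        case SL_UPDATE:
                c->touch_timestamp = now;       /* refresh locally... */
                c->state = SL_ON;               /* ...then arm the check */
                return;
        case SL_ON:
                if (now > c->touch_timestamp + 10)
                        printf("BUG: soft lockup detected\n");
        }
}

int main(void)
{
        struct cpu c = { .state = SL_ON, .touch_timestamp = 0 };

        /* With the patch: disable before a machine-wide stall
         * (e.g. suspend), re-enable afterwards. */
        c.state = SL_OFF;       /* softlockup_global_disable() */
        c.state = SL_UPDATE;    /* softlockup_global_enable() */
        tick(&c, 30);           /* refreshes timestamp instead of warning */
        tick(&c, 35);           /* 35 <= 30 + 10: no spurious warning */
        return 0;
}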
Jeremy Fitzhardinge
2007-Apr-18 13:02 UTC
[patch 1/4] Ignore stolen time in the softlockup watchdog
The softlockup watchdog is currently a nuisance in a virtual machine,
since the whole system could have the CPU stolen from it for a long
period of time.  While it would be unlikely for a guest domain to be
denied timer interrupts for over 10s, it could happen and any softlockup
message would be completely spurious.

Earlier I proposed that sched_clock() return time in unstolen
nanoseconds, which is how Xen and VMI currently implement it.  If the
softlockup watchdog uses sched_clock() to measure time, it would
automatically ignore stolen time, and therefore only report when the
guest itself locked up.  When running native, sched_clock() returns
real-time nanoseconds, so the behaviour would be unchanged.

Note that sched_clock() used this way is inherently per-cpu, so this
patch makes sure that the per-processor watchdog thread initializes
its own timestamp.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: James Morris <jmorris@namei.org>
Cc: Dan Hecht <dhecht@vmware.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Chris Lalancette <clalance@redhat.com>
Cc: Rick Lindsley <ricklind@us.ibm.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>

---
 kernel/softlockup.c |   37 ++++++++++++++++++++++++++++++-------
 1 file changed, 30 insertions(+), 7 deletions(-)

==================================================================
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -35,9 +35,19 @@ static struct notifier_block panic_block
        .notifier_call = softlock_panic,
 };
 
+/*
+ * Returns seconds, approximately.  We don't need nanosecond
+ * resolution, and we don't need to waste time with a big divide when
+ * 2^30ns == 1.074s.
+ */
+static unsigned long get_timestamp(void)
+{
+       return sched_clock() >> 30;  /* 2^30 ~= 10^9 */
+}
+
 void touch_softlockup_watchdog(void)
 {
-       __raw_get_cpu_var(touch_timestamp) = jiffies;
+       __raw_get_cpu_var(touch_timestamp) = get_timestamp();
 }
 EXPORT_SYMBOL(touch_softlockup_watchdog);
 
@@ -48,10 +58,18 @@ void softlockup_tick(void)
 void softlockup_tick(void)
 {
        int this_cpu = smp_processor_id();
-       unsigned long touch_timestamp = per_cpu(touch_timestamp, this_cpu);
+       unsigned long touch_timestamp = __get_cpu_var(touch_timestamp);
+       unsigned long print_timestamp;
+       unsigned long now;
 
-       /* prevent double reports: */
-       if (per_cpu(print_timestamp, this_cpu) == touch_timestamp ||
+       /* watchdog task hasn't updated timestamp yet */
+       if (touch_timestamp == 0)
+               return;
+
+       print_timestamp = __get_cpu_var(print_timestamp);
+
+       /* report at most once a second */
+       if (print_timestamp < (touch_timestamp + 1) ||
                        did_panic ||
                        !per_cpu(watchdog_task, this_cpu))
                return;
@@ -62,12 +80,14 @@ void softlockup_tick(void)
                return;
        }
 
+       now = get_timestamp();
+
        /* Wake up the high-prio watchdog task every second: */
-       if (time_after(jiffies, touch_timestamp + HZ))
+       if (now > (touch_timestamp + 1))
                wake_up_process(per_cpu(watchdog_task, this_cpu));
 
        /* Warn about unreasonable 10+ seconds delays: */
-       if (time_after(jiffies, touch_timestamp + 10*HZ)) {
+       if (now > (touch_timestamp + 10)) {
                per_cpu(print_timestamp, this_cpu) = touch_timestamp;
 
                spin_lock(&print_lock);
@@ -87,6 +107,9 @@ static int watchdog(void * __bind_cpu)
        sched_setscheduler(current, SCHED_FIFO, &param);
        current->flags |= PF_NOFREEZE;
 
+       /* initialize timestamp */
+       touch_softlockup_watchdog();
+
        /*
         * Run briefly once per second to reset the softlockup timestamp.
@@ -120,7 +143,7 @@ cpu_callback(struct notifier_block *nfb,
                        printk("watchdog for %i failed\n", hotcpu);
                        return NOTIFY_BAD;
                }
-               per_cpu(touch_timestamp, hotcpu) = jiffies;
+               per_cpu(touch_timestamp, hotcpu) = 0;
                per_cpu(watchdog_task, hotcpu) = p;
                kthread_bind(p, hotcpu);
                break;

--
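
[Editorial illustration, not from the patch: a standalone demo of the
timestamp trick.  Shifting nanoseconds right by 30 bits gives
"approximately seconds", since 2^30 ns = 1.073741824 s, without a
64-bit divide.]

#include <stdio.h>
#include <stdint.h>

static unsigned long get_timestamp(uint64_t ns)
{
        return (unsigned long)(ns >> 30);       /* 2^30 ~= 10^9 */
}

int main(void)
{
        uint64_t ns = 12500000000ULL;           /* 12.5 seconds in ns */

        /* Prints 11: 12.5e9 / 2^30 = 11.64, so the watchdog's
         * "10 second" threshold really means roughly 10.7 wall seconds. */
        printf("%lu\n", get_timestamp(ns));
        return 0;
}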
Jeremy Fitzhardinge
2007-Apr-18 13:02 UTC
[patch 2/4] percpu enable flag for softlockup watchdog
On a NO_HZ system, there may be an arbitrarily long delay between ticks
on a CPU.  When we're disabling ticks for a CPU, also disable the
softlockup watchdog timer.

This makes the touch_softlockup_watchdog() interface redundant; if a
piece of code knows it's going to be holding off timer interrupts long
enough to trigger the watchdog, then it may as well simply temporarily
disable the timer.

Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Zachary Amsden <zach@vmware.com>
Cc: James Morris <jmorris@namei.org>
Cc: Dan Hecht <dhecht@vmware.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Chris Lalancette <clalance@redhat.com>
Cc: Eric Dumazet <dada1@cosmosbay.com>

---
 include/linux/sched.h    |   25 +++++++++++++++++
 kernel/softlockup.c      |   67 ++++++++++++++++++++++++++++++++++++++++++----
 kernel/time/tick-sched.c |   34 ++++++++++-------------
 3 files changed, 102 insertions(+), 24 deletions(-)

==================================================================
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -232,12 +232,37 @@ extern void scheduler_tick(void);
 extern void scheduler_tick(void);
 
 #ifdef CONFIG_DETECT_SOFTLOCKUP
+/* A scheduler tick on this CPU */
 extern void softlockup_tick(void);
+
+/* Scheduler is disabling/re-enabling ticks on this CPU */
+extern void softlockup_tick_disable(void);
+extern void softlockup_tick_enable(void);
+
+/* Some code wants to temporarily disable the watchdog */
+extern int softlockup_disable(void);
+extern void softlockup_enable(int state);
+
 extern void spawn_softlockup_task(void);
 extern void touch_softlockup_watchdog(void);
 #else
 static inline void softlockup_tick(void)
 {
+}
+static inline void softlockup_tick_disable(void)
+{
+}
+static inline void softlockup_tick_enable(void)
+{
+}
+static inline int softlockup_disable(void)
+{
+       preempt_disable();
+       return 0;
+}
+static inline void softlockup_enable(int state)
+{
+       preempt_enable();
 }
 static inline void spawn_softlockup_task(void)
 {
==================================================================
--- a/kernel/softlockup.c
+++ b/kernel/softlockup.c
@@ -20,6 +20,7 @@ static DEFINE_PER_CPU(unsigned long, tou
 static DEFINE_PER_CPU(unsigned long, touch_timestamp);
 static DEFINE_PER_CPU(unsigned long, print_timestamp);
 static DEFINE_PER_CPU(struct task_struct *, watchdog_task);
+static DEFINE_PER_CPU(int, enabled);
 
 static int did_panic = 0;
 
@@ -45,11 +46,65 @@ static unsigned long get_timestamp(void)
        return sched_clock() >> 30;  /* 2^30 ~= 10^9 */
 }
 
-void touch_softlockup_watchdog(void)
+void inline touch_softlockup_watchdog(void)
 {
        __raw_get_cpu_var(touch_timestamp) = get_timestamp();
 }
 EXPORT_SYMBOL(touch_softlockup_watchdog);
+
+/*
+ * Disable the watchdog on this CPU.  This is called directly by the
+ * scheduler to tell us it's going tickless.
+ */
+void inline softlockup_tick_disable(void)
+{
+       __get_cpu_var(enabled) = 0;
+}
+
+/*
+ * Disable the watchdog for this CPU, returning the current state to
+ * allow nesting.  Returns with preemption disabled, since we can't
+ * switch CPUs before we re-enable the watchdog (also, if we're
+ * worried about getting watchdog timeouts, we're not scheduling).
+ */
+int softlockup_disable(void)
+{
+       int ret;
+
+       preempt_disable();
+
+       ret = __get_cpu_var(enabled);
+       softlockup_tick_disable();
+
+       return ret;
+}
+EXPORT_SYMBOL(softlockup_disable);
+
+/*
+ * Re-enable the watchdog on this CPU.  Called directly by the
+ * scheduler to tell us ticks are resuming.
+ */
+void inline softlockup_tick_enable(void)
+{
+       __get_cpu_var(enabled) = 1;
+}
+
+/*
+ * Returns softlockup watchdog state to before the last per-cpu
+ * disable.
+ */
+void softlockup_enable(int state)
+{
+       if (state) {
+               touch_softlockup_watchdog();
+               /* update timestamp before enable */
+               barrier();
+               softlockup_tick_enable();
+       }
+
+       preempt_enable();
+}
+EXPORT_SYMBOL(softlockup_enable);
 
 /*
  * This callback runs from the timer interrupt, and checks
@@ -62,8 +117,8 @@ void softlockup_tick(void)
        unsigned long print_timestamp;
        unsigned long now;
 
-       /* watchdog task hasn't updated timestamp yet */
-       if (touch_timestamp == 0)
+       /* return if not enabled */
+       if (!__get_cpu_var(enabled))
                return;
 
        print_timestamp = __get_cpu_var(print_timestamp);
@@ -108,8 +163,8 @@ static int watchdog(void * __bind_cpu)
        sched_setscheduler(current, SCHED_FIFO, &param);
        current->flags |= PF_NOFREEZE;
 
-       /* initialize timestamp */
-       touch_softlockup_watchdog();
+       /* enable on this cpu */
+       softlockup_tick_enable();
 
        /*
         * Run briefly once per second to reset the softlockup timestamp.
@@ -122,6 +177,8 @@ static int watchdog(void * __bind_cpu)
                touch_softlockup_watchdog();
                schedule();
        }
+
+       softlockup_tick_disable();
 
        return 0;
 }
==================================================================
--- a/kernel/time/tick-sched.c
+++ b/kernel/time/tick-sched.c
@@ -228,6 +228,8 @@ void tick_nohz_stop_sched_tick(void)
                        ts->idle_tick = ts->sched_timer.expires;
                        ts->tick_stopped = 1;
                        ts->idle_jiffies = last_jiffies;
+
+                       softlockup_tick_disable();
                }
                /*
                 * calculate the expiry time for the next timer wheel
@@ -255,6 +257,7 @@ void tick_nohz_stop_sched_tick(void)
                        cpu_clear(cpu, nohz_cpu_mask);
                }
                raise_softirq_irqoff(TIMER_SOFTIRQ);
+
 out:
        ts->next_jiffies = next_jiffies;
        ts->last_jiffies = last_jiffies;
@@ -311,6 +314,8 @@ void tick_nohz_restart_sched_tick(void)
        ts->tick_stopped  = 0;
        hrtimer_cancel(&ts->sched_timer);
        ts->sched_timer.expires = ts->idle_tick;
+
+       softlockup_tick_enable();
 
        while (1) {
                /* Forward the time to expire in the future */
@@ -355,17 +360,12 @@ static void tick_nohz_handler(struct clo
        tick_do_update_jiffies64(now);
 
        /*
-        * When we are idle and the tick is stopped, we have to touch
-        * the watchdog as we might not schedule for a really long
-        * time. This happens on complete idle SMP systems while
-        * waiting on the login prompt. We also increment the "start
-        * of idle" jiffy stamp so the idle accounting adjustment we
-        * do when we go busy again does not account too much ticks.
-        */
-       if (ts->tick_stopped) {
-               touch_softlockup_watchdog();
+        * Increment the "start of idle" jiffy stamp so the idle
+        * accounting adjustment we do when we go busy again does not
+        * account too much ticks.
+        */
+       if (ts->tick_stopped)
                ts->idle_jiffies++;
-       }
 
        update_process_times(user_mode(regs));
        profile_tick(CPU_PROFILING);
@@ -450,17 +450,12 @@ static enum hrtimer_restart tick_sched_t
         */
        if (regs) {
                /*
-                * When we are idle and the tick is stopped, we have to touch
-                * the watchdog as we might not schedule for a really long
-                * time. This happens on complete idle SMP systems while
-                * waiting on the login prompt. We also increment the "start of
-                * idle" jiffy stamp so the idle accounting adjustment we do
-                * when we go busy again does not account too much ticks.
+                * Increment the "start of idle" jiffy stamp so the
+                * idle accounting adjustment we do when we go busy
+                * again does not account too much ticks.
                 */
-               if (ts->tick_stopped) {
-                       touch_softlockup_watchdog();
+               if (ts->tick_stopped)
                        ts->idle_jiffies++;
-               }
 
                /*
                 * update_process_times() might take tasklist_lock, hence
                 * drop the base lock. sched-tick hrtimers are per-CPU and
@@ -522,6 +517,7 @@ void tick_cancel_sched_timer(int cpu)
        if (ts->sched_timer.base)
                hrtimer_cancel(&ts->sched_timer);
        ts->tick_stopped = 0;
+       softlockup_tick_enable();
        ts->nohz_mode = NOHZ_MODE_INACTIVE;
 }
 #endif /* HIGH_RES_TIMERS */

--
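
[Editorial illustration, not from the patch: a user-space model of why
softlockup_disable() returns the previous state.  Nested disables then
restore correctly, much like the local_irq_save()/restore() pattern.]

#include <stdio.h>

static int enabled = 1;         /* stand-in for the per-cpu flag */

static int softlockup_disable_model(void)
{
        int prev = enabled;
        enabled = 0;
        return prev;
}

static void softlockup_enable_model(int state)
{
        if (state)
                enabled = 1;    /* only the outermost caller re-enables */
}

int main(void)
{
        int outer = softlockup_disable_model(); /* saves 1 */
        int inner = softlockup_disable_model(); /* saves 0 */

        softlockup_enable_model(inner);         /* still disabled */
        printf("after inner enable: %d\n", enabled);    /* prints 0 */
        softlockup_enable_model(outer);         /* now re-enabled */
        printf("after outer enable: %d\n", enabled);    /* prints 1 */
        return 0;
}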
Jeremy Fitzhardinge
2007-Apr-18 13:02 UTC
[patch 0/4] Revised softlockup watchdog improvement patches
Hi Ingo,

This series of patches implements a number of improvements to the
softlockup watchdog and its users.  They are:

1. Make the watchdog ignore stolen time

   When running under a hypervisor, the kernel may lose an arbitrary
   amount of time as "stolen time".  This may cause the softlockup
   watchdog to trigger spuriously.  Xen and VMI implement sched_clock()
   as measuring unstolen time, so use that as the timebase for the
   softlockup watchdog.  This also removes a dependency on jiffies.

2. Add a per-cpu enable flag for the watchdog

   When the scheduler disables ticks for a specific CPU, this allows
   the watchdog to be disabled as well.  This avoids spurious watchdog
   errors if a CPU has been tickless for a long time.  The existing
   tick-sched.c code tried to address this by periodically touching the
   watchdog, but this seems more direct.

3. Use the per-cpu enable flags to temporarily disable the watchdog

   Some drivers perform long operations which will cause the watchdog
   to time out.  If we know we're about to start such an operation,
   disable the watchdog timer for the interim.  This is more
   straightforward than trying to periodically tickle the watchdog
   timer during the operation, and safer than just doing it once at the
   start and hoping there's enough time.

4. Add a global disable flag

   Sometimes a long operation will affect all CPUs' ability to tickle
   the watchdog timer.  An obvious example is suspend/resume, but it
   can also happen while generating lots of sysrq output, or other
   specialized operations.  Add global enable/disable calls to deal
   with these cases.

--
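
[Editorial illustration for item 1, not from the series: a toy model of
why a stolen-time-aware sched_clock() keeps the watchdog quiet in a
guest.  The numbers are made up.]

#include <stdio.h>

int main(void)
{
        unsigned long wall = 15;        /* seconds of real time elapsed */
        unsigned long stolen = 12;      /* seconds the hypervisor ran other guests */
        unsigned long unstolen = wall - stolen; /* what sched_clock() would report */
        unsigned long touch = 0;        /* last watchdog touch, in each timebase */

        printf("jiffies-based:     %s\n",
               (wall - touch > 10) ? "spurious warning" : "ok");
        printf("sched_clock-based: %s\n",
               (unstolen - touch > 10) ? "warning" : "ok");
        return 0;
}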
Andrew Morton
2007-Apr-23 23:52 UTC
[patch 1/4] Ignore stolen time in the softlockup watchdog
On Tue, 27 Mar 2007 14:49:20 -0700 Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> The softlockup watchdog is currently a nuisance in a virtual machine,
> since the whole system could have the CPU stolen from it for a long
> period of time.  While it would be unlikely for a guest domain to be
> denied timer interrupts for over 10s, it could happen and any softlockup
> message would be completely spurious.
>
> Earlier I proposed that sched_clock() return time in unstolen
> nanoseconds, which is how Xen and VMI currently implement it.  If the
> softlockup watchdog uses sched_clock() to measure time, it would
> automatically ignore stolen time, and therefore only report when the
> guest itself locked up.  When running native, sched_clock() returns
> real-time nanoseconds, so the behaviour would be unchanged.
>
> Note that sched_clock() used this way is inherently per-cpu, so this
> patch makes sure that the per-processor watchdog thread initialized
> its own timestamp.

This patch
(ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc6/2.6.21-rc6-mm1/broken-out/ignore-stolen-time-in-the-softlockup-watchdog.patch)
causes six failures in the locking self-tests, which I must say is
rather clever of it.  Here's the first one:

[17179569.184000] Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
[17179569.184000] ... MAX_LOCKDEP_SUBCLASSES:  8
[17179569.184000] ... MAX_LOCK_DEPTH:          30
[17179569.184000] ... MAX_LOCKDEP_KEYS:        2048
[17179569.184000] ... CLASSHASH_SIZE:          1024
[17179569.184000] ... MAX_LOCKDEP_ENTRIES:     8192
[17179569.184000] ... MAX_LOCKDEP_CHAINS:      16384
[17179569.184000] ... CHAINHASH_SIZE:          8192
[17179569.184000]  memory used by lock dependency info: 992 kB
[17179569.184000]  per task-struct memory footprint: 1200 bytes
[17179569.184000] ------------------------
[17179569.184000] | Locking API testsuite:
[17179569.184000] ----------------------------------------------------------------------------
[17179569.184000]                                  | spin |wlock |rlock |mutex | wsem | rsem |
[17179569.184000]   --------------------------------------------------------------------------
[17179569.184000]                      A-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184000]                  A-B-B-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184000]              A-B-B-C-C-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184001]              A-B-C-A-B-C deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184002]          A-B-B-C-C-D-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184003]          A-B-C-D-B-D-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184004]          A-B-C-D-B-C-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184005]                     double unlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184006]                   initialize held:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184006]                  bad unlock order:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
[17179569.184006]   --------------------------------------------------------------------------
[17179569.184006]               recursive read-lock:             |  ok  |             |  ok  |
[17179569.184006]            recursive read-lock #2:             |  ok  |             |  ok  |
[17179569.184007]             mixed read-write-lock:             |  ok  |             |  ok  |
[17179569.184007]             mixed write-read-lock:             |  ok  |             |  ok  |
[17179569.184007]   --------------------------------------------------------------------------
[17179569.184007]      hard-irqs-on + irq-safe-A/12:  ok  |  ok  |  ok  |
[17179569.184007]      soft-irqs-on + irq-safe-A/12:  ok  |  ok  |  ok  |
[17179569.184007]      hard-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[17179569.184007]      soft-irqs-on + irq-safe-A/21:  ok  |  ok  |  ok  |
[17179569.184007]        sirq-safe-A => hirqs-on/12:  ok  |  ok  |irq event stamp: 458
[17179569.184007] hardirqs last  enabled at (458): [<c01e4116>] irqsafe2A_rlock_12+0x96/0xa3
[17179569.184007] hardirqs last disabled at (457): [<c01095b9>] sched_clock+0x5e/0xe9
[17179569.184007] softirqs last  enabled at (454): [<c01e4101>] irqsafe2A_rlock_12+0x81/0xa3
[17179569.184007] softirqs last disabled at (450): [<c01e408b>] irqsafe2A_rlock_12+0xb/0xa3
[17179569.184007] FAILED| [<c0104cf0>] dump_trace+0x63/0x1ec
[17179569.184007]  [<c0104e93>] show_trace_log_lvl+0x1a/0x30
[17179569.184007]  [<c01059ec>] show_trace+0x12/0x14
[17179569.184007]  [<c0105a45>] dump_stack+0x16/0x18
[17179569.184007]  [<c01e1eb5>] dotest+0x6b/0x3d0
[17179569.184007]  [<c01eb249>] locking_selftest+0x915/0x1a58
[17179569.184007]  [<c048c979>] start_kernel+0x1d0/0x2a2
[17179569.184007]  =======================
[17179569.184007]
[17179569.184007]        sirq-safe-A => hirqs-on/21:irq event stamp: 462
[17179569.184007] hardirqs last  enabled at (462): [<c01e3eb2>] irqsafe2A_spin_21+0x1b/0xa3
[17179569.184007] hardirqs last disabled at (461): [<c01095b9>] sched_clock+0x5e/0xe9
[17179569.184007] softirqs last  enabled at (454): [<c01e4101>] irqsafe2A_rlock_12+0x81/0xa3
[17179569.184007] softirqs last disabled at (450): [<c01e408b>] irqsafe2A_rlock_12+0xb/0xa3
[17179569.184007]   ok  |irq event stamp: 466

It's a challenge to even find the code which corresponds with this
failure, so good luck.

It seems fairly sensitive to .config settings.  See
http://userweb.kernel.org/~akpm/config-sony.txt

Unfortunately this causes lockdep to disable itself, thus hiding all
the other bugs which people have contributed, so I'll drop your patch
for now.