thr3ads.net - Xen devel - [PATCH 0/2] Kexec crash path fixes [Nov 2013]

If this information is useful, please help other people find it:
Share via:

Andrew Cooper

2013-Nov-15 20:32 UTC

[PATCH 0/2] Kexec crash path fixes

Hello,

Here are two fixes for the kexec crash path itself, both from issues XenServer
has encountered.  They should both be considered for backport.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: David Vrabel <david.vrabel@citrix.com>

-- 
1.7.10.4

Andrew Cooper

2013-Nov-15 20:32 UTC

head link

[PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path.

In some cases, such as suffering a queued-invalidation timeout while
performing an iommu_crash_shutdown(), Xen can end up reentering the crash
path. Previously, this would result in a deadlock in one_cpu_only(), as the
test_and_set_bit() would fail.

The crash path is not reentrant, and even if it could be made to be so, it is
almost certain that we would fall over the same reentry condition again.

The new code can distinguish a reentry case from multiple cpus racing down the
crash path.  In the case that a reentry is detected, return back out to the
nested panic() call, which will maybe_reboot() on our behalf.  This requires a
bit of return plumbing back up to kexec_crash().

While fixing this deadlock, also fix up an minor niggle seen recently from a
XenServer crash report.  The report was from a Bank 8 MCE, which had managed
to crash on all cpus at once.  The result was a lot of stack traces with cpus
in kexec_common_shutdown(), which was infact the inlined version of
one_cpu_only().  The kexec crash path is not a hotpath, so we can easily
afford to prevent inlining for the sake of clarity in the stack traces.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Tim Deegan <tim@xen.org>
CC: David Vrabel <david.vrabel@citrix.com>
---
 xen/common/kexec.c |   51 ++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 44 insertions(+), 7 deletions(-)

diff --git a/xen/common/kexec.c b/xen/common/kexec.c
index 17f3ed7..481b0c2 100644
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -233,11 +233,39 @@ void __init set_kexec_crash_area_size(u64 system_ram)
     }
 }
 
-static void one_cpu_only(void)
+/*
+ * Only allow one cpu to continue on the crash path, forcing others to spin.
+ * Racing on the crash path from here will end in misery.  If we reenter,
+ * something has very gone wrong and retrying will (almost certainly) be
+ * futile.  Return up to our nested panic() to try and reboot.
+ *
+ * This is noinline to make it obvious in stack traces which cpus have lost
+ * the race (as opposed to being somewhere in kexec_common_shutdown())
+ */
+static int noinline one_cpu_only(void)
 {
-    /* Only allow the first cpu to continue - force other cpus to spin */
-    if ( test_and_set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags) )
-        for ( ; ; ) ;
+    static unsigned int crashing_cpu = -1;
+    unsigned int cpu = smp_processor_id();
+
+    if ( cmpxchg(&crashing_cpu, -1, cpu) != -1 )
+    {
+        /* Not the first entry into one_cpu_only(). */
+        if ( crashing_cpu == cpu )
+        {
+            printk("Reentered the crash path.  Something is very
broken\n");
+            return -EBUSY;
+        }
+
+        /*
+         * Another cpu has beaten us to this point.  Wait here patiently for
+         * it to kill us.
+         */
+        for ( ; ; )
+            halt();
+    }
+
+    set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags);
+    return 0;
 }
 
 /* Save the registers in the per-cpu crash note buffer. */
@@ -288,13 +316,20 @@ crash_xen_info_t *kexec_crash_save_info(void)
     return out;
 }
 
-static void kexec_common_shutdown(void)
+static int kexec_common_shutdown(void)
 {
+    int ret;
+
+    ret = one_cpu_only();
+    if ( ret )
+        return ret;
+
     watchdog_disable();
     console_start_sync();
     spin_debug_disable();
-    one_cpu_only();
     acpi_dmar_reinstate();
+
+    return 0;
 }
 
 void kexec_crash(void)
@@ -309,7 +344,9 @@ void kexec_crash(void)
 
     kexecing = TRUE;
 
-    kexec_common_shutdown();
+    if ( kexec_common_shutdown() != 0 )
+        return;
+
     kexec_crash_save_cpu();
     machine_crash_shutdown();
     machine_kexec(kexec_image[KEXEC_IMAGE_CRASH_BASE + pos]);
-- 
1.7.10.4

Andrew Cooper

2013-Nov-15 20:32 UTC

head link

[PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.

PVOps Linux as a kexec image shoots itself in the foot otherwise.

On a Core2 system, Linux declares a firmware bug and tries to invert some bits
in the performance counter register.  It ends up setting the number of retired
instructions to generate another NMI to fewer instructions than the NMI
interrupt path itself, and ceases to make any useful progress.

While this is not strictly Xen''s fault, Xen can at least be kind and
leave the
kexec environment with fewer issues to deal with.

From: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Tim Deegan <tim@xen.org>
---
 xen/arch/x86/crash.c       |    1 +
 xen/arch/x86/nmi.c         |    2 +-
 xen/include/asm-x86/apic.h |    1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/crash.c b/xen/arch/x86/crash.c
index 4ef379b..01fd906 100644
--- a/xen/arch/x86/crash.c
+++ b/xen/arch/x86/crash.c
@@ -118,6 +118,7 @@ static void nmi_shootdown_cpus(void)
     unsigned long msecs;
     int i, cpu = smp_processor_id();
 
+    disable_lapic_nmi_watchdog();
     local_irq_disable();
 
     crashing_cpu = cpu;
diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
index 2c16d93..c67a9c3 100644
--- a/xen/arch/x86/nmi.c
+++ b/xen/arch/x86/nmi.c
@@ -165,7 +165,7 @@ static void nmi_timer_fn(void *unused)
     set_timer(&this_cpu(nmi_timer), NOW() + MILLISECS(1000));
 }
 
-static void disable_lapic_nmi_watchdog(void)
+void disable_lapic_nmi_watchdog(void)
 {
     if (nmi_active <= 0)
         return;
diff --git a/xen/include/asm-x86/apic.h b/xen/include/asm-x86/apic.h
index dd528fb..43b39ce 100644
--- a/xen/include/asm-x86/apic.h
+++ b/xen/include/asm-x86/apic.h
@@ -200,6 +200,7 @@ extern void smp_local_timer_interrupt (struct cpu_user_regs
*regs);
 extern void setup_boot_APIC_clock (void);
 extern void setup_secondary_APIC_clock (void);
 extern void setup_apic_nmi_watchdog (void);
+extern void disable_lapic_nmi_watchdog(void);
 extern int reserve_lapic_nmi(void);
 extern void release_lapic_nmi(void);
 extern void self_nmi(void);
-- 
1.7.10.4

David Vrabel

2013-Nov-15 21:01 UTC

head link

Re: [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.

On 15/11/13 20:32, Andrew Cooper wrote:> PVOps Linux as a kexec image shoots itself in the foot otherwise.
> 
> On a Core2 system, Linux declares a firmware bug and tries to invert some
bits
> in the performance counter register.  It ends up setting the number of
retired
> instructions to generate another NMI to fewer instructions than the NMI
> interrupt path itself, and ceases to make any useful progress.
> 
> While this is not strictly Xen''s fault, Xen can at least be kind
and leave the
> kexec environment with fewer issues to deal with.
I don''t appreciate my commit message being rewritten in this way.

My original commit message was:

  "kexec: disable the NMI watchdog during a crash

  nmi_shootdown_cpus() is called during a crash to park all the other
  CPUs.  This changes the NMI trap handlers which means there''s no
point
  in having the watchdog still running.

  This also disables the watchdog before executing any crash kexec image
  and prevents the image from receiving unexpected NMIs."

David

Andrew Cooper

2013-Nov-15 21:09 UTC

head link

Re: [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.

On 15/11/13 21:01, David Vrabel wrote:> On 15/11/13 20:32, Andrew Cooper wrote:
>> PVOps Linux as a kexec image shoots itself in the foot otherwise.
>>
>> On a Core2 system, Linux declares a firmware bug and tries to invert
some bits
>> in the performance counter register.  It ends up setting the number of
retired
>> instructions to generate another NMI to fewer instructions than the NMI
>> interrupt path itself, and ceases to make any useful progress.
>>
>> While this is not strictly Xen''s fault, Xen can at least be
kind and leave the
>> kexec environment with fewer issues to deal with.
> I don''t appreciate my commit message being rewritten in this way.
>
> My original commit message was:
>
>   "kexec: disable the NMI watchdog during a crash
>
>   nmi_shootdown_cpus() is called during a crash to park all the other
>   CPUs.  This changes the NMI trap handlers which means there''s no
point
>   in having the watchdog still running.
>
>   This also disables the watchdog before executing any crash kexec image
>   and prevents the image from receiving unexpected NMIs."
>
> David
Sorry - I took the patch as-was out of the patch queue, which had no
message.  I forgot to check the commit history for the message being
there, and therefore wrote the message myself from scratch.

I am not too fussed which message gets used.

~Andrew

Jan Beulich

2013-Nov-18 09:26 UTC

head link

Re: [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.

>>> On 15.11.13 at 21:32, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> --- a/xen/arch/x86/crash.c
> +++ b/xen/arch/x86/crash.c
> @@ -118,6 +118,7 @@ static void nmi_shootdown_cpus(void)
>      unsigned long msecs;
>      int i, cpu = smp_processor_id();
>  
> +    disable_lapic_nmi_watchdog();
>      local_irq_disable();
>  
>      crashing_cpu = cpu;
_If_ you do this here, I wonder why it''s being done before
disabling interrupts.

But then again I wonder whether it wouldn''t be better to do
this even earlier (i.e. by passing a flag to watchdog_disable()),
as the NMI watchdog becomes useless with that call being done
from kexec_common_shutdown().
> --- a/xen/arch/x86/nmi.c
> +++ b/xen/arch/x86/nmi.c
> @@ -165,7 +165,7 @@ static void nmi_timer_fn(void *unused)
>      set_timer(&this_cpu(nmi_timer), NOW() + MILLISECS(1000));
>  }
>  
> -static void disable_lapic_nmi_watchdog(void)
> +void disable_lapic_nmi_watchdog(void)
The suggested alternative would also make it unnecessary to
make this function non-static...

Jan

Andrew Cooper

2013-Nov-18 10:33 UTC

head link

Re: [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.

On 18/11/13 09:26, Jan Beulich wrote:>>>> On 15.11.13 at 21:32, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>> --- a/xen/arch/x86/crash.c
>> +++ b/xen/arch/x86/crash.c
>> @@ -118,6 +118,7 @@ static void nmi_shootdown_cpus(void)
>>      unsigned long msecs;
>>      int i, cpu = smp_processor_id();
>>  
>> +    disable_lapic_nmi_watchdog();
>>      local_irq_disable();
>>  
>>      crashing_cpu = cpu;
> _If_ you do this here, I wonder why it''s being done before
> disabling interrupts.
>
> But then again I wonder whether it wouldn''t be better to do
> this even earlier (i.e. by passing a flag to watchdog_disable()),
> as the NMI watchdog becomes useless with that call being done
> from kexec_common_shutdown().
Disabling interrupts here is more defensive coding than anything else. 
It is not expected to be able to get here with interrupts enabled, but
in a crash

Putting this in watchdog_disable() would result in a race condition. 
disable_lapic_nmi_watchdog() mutates global state, meaning that it can
only possibly run correctly on a single cpu. In an ideal world with
plenty of time, the lapic watchdog code could be improved.  However,
restricting its use until after one_cpu_only() is the easiest fix.

~Andrew
>
>> --- a/xen/arch/x86/nmi.c
>> +++ b/xen/arch/x86/nmi.c
>> @@ -165,7 +165,7 @@ static void nmi_timer_fn(void *unused)
>>      set_timer(&this_cpu(nmi_timer), NOW() + MILLISECS(1000));
>>  }
>>  
>> -static void disable_lapic_nmi_watchdog(void)
>> +void disable_lapic_nmi_watchdog(void)
> The suggested alternative would also make it unnecessary to
> make this function non-static...
>
> Jan
>

Andrew Cooper

2013-Nov-18 10:35 UTC

head link

Re: [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.

On 18/11/13 10:33, Andrew Cooper wrote:> On 18/11/13 09:26, Jan Beulich wrote:
>>>>> On 15.11.13 at 21:32, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>>> --- a/xen/arch/x86/crash.c
>>> +++ b/xen/arch/x86/crash.c
>>> @@ -118,6 +118,7 @@ static void nmi_shootdown_cpus(void)
>>>      unsigned long msecs;
>>>      int i, cpu = smp_processor_id();
>>>  
>>> +    disable_lapic_nmi_watchdog();
>>>      local_irq_disable();
>>>  
>>>      crashing_cpu = cpu;
>> _If_ you do this here, I wonder why it''s being done before
>> disabling interrupts.
>>
>> But then again I wonder whether it wouldn''t be better to do
>> this even earlier (i.e. by passing a flag to watchdog_disable()),
>> as the NMI watchdog becomes useless with that call being done
>> from kexec_common_shutdown().
> Disabling interrupts here is more defensive coding than anything else. 
> It is not expected to be able to get here with interrupts enabled, but
> in a crash
... we can make no guarantees.
>
> Putting this in watchdog_disable() would result in a race condition. 
> disable_lapic_nmi_watchdog() mutates global state, meaning that it can
> only possibly run correctly on a single cpu. In an ideal world with
> plenty of time, the lapic watchdog code could be improved.  However,
> restricting its use until after one_cpu_only() is the easiest fix.
>
> ~Andrew
>
>>> --- a/xen/arch/x86/nmi.c
>>> +++ b/xen/arch/x86/nmi.c
>>> @@ -165,7 +165,7 @@ static void nmi_timer_fn(void *unused)
>>>      set_timer(&this_cpu(nmi_timer), NOW() + MILLISECS(1000));
>>>  }
>>>  
>>> -static void disable_lapic_nmi_watchdog(void)
>>> +void disable_lapic_nmi_watchdog(void)
>> The suggested alternative would also make it unnecessary to
>> make this function non-static...
>>
>> Jan
>>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel

Jan Beulich

2013-Nov-18 11:04 UTC

head link

Re: [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.

>>> On 18.11.13 at 11:33, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> On 18/11/13 09:26, Jan Beulich wrote:
>>>>> On 15.11.13 at 21:32, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>>> --- a/xen/arch/x86/crash.c
>>> +++ b/xen/arch/x86/crash.c
>>> @@ -118,6 +118,7 @@ static void nmi_shootdown_cpus(void)
>>>      unsigned long msecs;
>>>      int i, cpu = smp_processor_id();
>>>  
>>> +    disable_lapic_nmi_watchdog();
>>>      local_irq_disable();
>>>  
>>>      crashing_cpu = cpu;
>> _If_ you do this here, I wonder why it''s being done before
>> disabling interrupts.
>>
>> But then again I wonder whether it wouldn''t be better to do
>> this even earlier (i.e. by passing a flag to watchdog_disable()),
>> as the NMI watchdog becomes useless with that call being done
>> from kexec_common_shutdown().
> 
> Disabling interrupts here is more defensive coding than anything else. 
> It is not expected to be able to get here with interrupts enabled, but
> in a crash
> 
> Putting this in watchdog_disable() would result in a race condition. 
> disable_lapic_nmi_watchdog() mutates global state, meaning that it can
> only possibly run correctly on a single cpu. In an ideal world with
> plenty of time, the lapic watchdog code could be improved.  However,
> restricting its use until after one_cpu_only() is the easiest fix.
Just looked at disable_lapic_nmi_watchdog() another time, and I
don''t see the race. What I do see though is that you''d need to
run this on each CPU for it to have the full intended effect...

Jan

Andrew Cooper

2013-Nov-18 11:09 UTC

head link

Re: [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.

On 18/11/13 11:04, Jan Beulich wrote:>>>> On 18.11.13 at 11:33, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>> On 18/11/13 09:26, Jan Beulich wrote:
>>>>>> On 15.11.13 at 21:32, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>>>> --- a/xen/arch/x86/crash.c
>>>> +++ b/xen/arch/x86/crash.c
>>>> @@ -118,6 +118,7 @@ static void nmi_shootdown_cpus(void)
>>>>      unsigned long msecs;
>>>>      int i, cpu = smp_processor_id();
>>>>  
>>>> +    disable_lapic_nmi_watchdog();
>>>>      local_irq_disable();
>>>>  
>>>>      crashing_cpu = cpu;
>>> _If_ you do this here, I wonder why it''s being done before
>>> disabling interrupts.
>>>
>>> But then again I wonder whether it wouldn''t be better to
do
>>> this even earlier (i.e. by passing a flag to watchdog_disable()),
>>> as the NMI watchdog becomes useless with that call being done
>>> from kexec_common_shutdown().
>> Disabling interrupts here is more defensive coding than anything else. 
>> It is not expected to be able to get here with interrupts enabled, but
>> in a crash
>>
>> Putting this in watchdog_disable() would result in a race condition. 
>> disable_lapic_nmi_watchdog() mutates global state, meaning that it can
>> only possibly run correctly on a single cpu. In an ideal world with
>> plenty of time, the lapic watchdog code could be improved.  However,
>> restricting its use until after one_cpu_only() is the easiest fix.
> Just looked at disable_lapic_nmi_watchdog() another time, and I
> don''t see the race. What I do see though is that you''d
need to
> run this on each CPU for it to have the full intended effect...
>
> Jan
>
disable_lapic_nmi_watchdog() sets nmi_active to -1, which casues
subsequent calls to exit early.

There are possible crash paths where a first cpu will execute
watchdog_disable() but a subsequent cpu will complete the kexec path. 
Hooking disable_lapic_nmi_watchdog() off watchdog_disable() will result
in certain paths where the wrong cpu ends up with its MSRs properly
disabled.

~Andrew

disable_lapic_nmi_watchdog()

Ian Campbell

2013-Nov-19 10:53 UTC

head link

Re: [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.

On Fri, 2013-11-15 at 20:32 +0000, Andrew Cooper wrote:> PVOps Linux as a kexec image shoots itself in the foot otherwise.
> 
> On a Core2 system, Linux declares a firmware bug and tries to invert some
bits
> in the performance counter register.  It ends up setting the number of
retired
> instructions to generate another NMI to fewer instructions than the NMI
> interrupt path itself, and ceases to make any useful progress.
> 
> While this is not strictly Xen''s fault, Xen can at least be kind
and leave the
> kexec environment with fewer issues to deal with.
> 
> From: David Vrabel <david.vrabel@citrix.com>
This belongs in the first line, such that git am will DTRT. If you "git
commit --author ''David Vrabel <...>'' then git send-email
does TRT for
you.

Ian.

Andrew Cooper

2013-Nov-20 15:08 UTC

head link

[Patch v2 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu

From: David Vrabel <david.vrabel@citrix.com>

nmi_shootdown_cpus() is called during a crash to park all the other
CPUs.  This changes the NMI trap handlers which means there''s no point
in having the watchdog still running.

This also disables the watchdog before executing any crash kexec image
and prevents the image from receiving unexpected NMIs.

Signed-off-by: David Vrabel <david.vrabel@citrix.com>

PVOps Linux as a kexec image shoots itself in the foot otherwise.

On a Core2 system, Linux declares a firmware bug and tries to invert some bits
in the performance counter register.  It ends up setting the number of retired
instructions to generate another NMI to fewer instructions than the NMI
interrupt path itself, and ceases to make any useful progress.

The call to disable_lapic_nmi_watchdog() must be this late into the kexec path
to be sure that this cpu is the one which will execute the kexec image.
Otherwise there are race conditions where the NMIs might be disabled on the
wrong cpu, resulting in the kexec image still receiving NMIs.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Keir Fraser <keir@xen.org>
CC: Jan Beulich <JBeulich@suse.com>
CC: Tim Deegan <tim@xen.org>

---
Changes in v2:
 * Return the original commit message which I originally failed to find in our
   source control, and further clarify the positioning of the call.
---
 xen/arch/x86/crash.c       |    1 +
 xen/arch/x86/nmi.c         |    2 +-
 xen/include/asm-x86/apic.h |    1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/crash.c b/xen/arch/x86/crash.c
index 4ef379b..01fd906 100644
--- a/xen/arch/x86/crash.c
+++ b/xen/arch/x86/crash.c
@@ -118,6 +118,7 @@ static void nmi_shootdown_cpus(void)
     unsigned long msecs;
     int i, cpu = smp_processor_id();
 
+    disable_lapic_nmi_watchdog();
     local_irq_disable();
 
     crashing_cpu = cpu;
diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c
index 2c16d93..c67a9c3 100644
--- a/xen/arch/x86/nmi.c
+++ b/xen/arch/x86/nmi.c
@@ -165,7 +165,7 @@ static void nmi_timer_fn(void *unused)
     set_timer(&this_cpu(nmi_timer), NOW() + MILLISECS(1000));
 }
 
-static void disable_lapic_nmi_watchdog(void)
+void disable_lapic_nmi_watchdog(void)
 {
     if (nmi_active <= 0)
         return;
diff --git a/xen/include/asm-x86/apic.h b/xen/include/asm-x86/apic.h
index dd528fb..43b39ce 100644
--- a/xen/include/asm-x86/apic.h
+++ b/xen/include/asm-x86/apic.h
@@ -200,6 +200,7 @@ extern void smp_local_timer_interrupt (struct cpu_user_regs
*regs);
 extern void setup_boot_APIC_clock (void);
 extern void setup_secondary_APIC_clock (void);
 extern void setup_apic_nmi_watchdog (void);
+extern void disable_lapic_nmi_watchdog(void);
 extern int reserve_lapic_nmi(void);
 extern void release_lapic_nmi(void);
 extern void self_nmi(void);
-- 
1.7.10.4

Andrew Cooper

2013-Nov-22 14:55 UTC

head link

Re: [PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path.

Ping?

On 15/11/13 20:32, Andrew Cooper wrote:> In some cases, such as suffering a queued-invalidation timeout while
> performing an iommu_crash_shutdown(), Xen can end up reentering the crash
> path. Previously, this would result in a deadlock in one_cpu_only(), as the
> test_and_set_bit() would fail.
>
> The crash path is not reentrant, and even if it could be made to be so, it
is
> almost certain that we would fall over the same reentry condition again.
>
> The new code can distinguish a reentry case from multiple cpus racing down
the
> crash path.  In the case that a reentry is detected, return back out to the
> nested panic() call, which will maybe_reboot() on our behalf.  This
requires a
> bit of return plumbing back up to kexec_crash().
>
> While fixing this deadlock, also fix up an minor niggle seen recently from
a
> XenServer crash report.  The report was from a Bank 8 MCE, which had
managed
> to crash on all cpus at once.  The result was a lot of stack traces with
cpus
> in kexec_common_shutdown(), which was infact the inlined version of
> one_cpu_only().  The kexec crash path is not a hotpath, so we can easily
> afford to prevent inlining for the sake of clarity in the stack traces.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Tim Deegan <tim@xen.org>
> CC: David Vrabel <david.vrabel@citrix.com>
> ---
>  xen/common/kexec.c |   51
++++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 44 insertions(+), 7 deletions(-)
>
> diff --git a/xen/common/kexec.c b/xen/common/kexec.c
> index 17f3ed7..481b0c2 100644
> --- a/xen/common/kexec.c
> +++ b/xen/common/kexec.c
> @@ -233,11 +233,39 @@ void __init set_kexec_crash_area_size(u64 system_ram)
>      }
>  }
>  
> -static void one_cpu_only(void)
> +/*
> + * Only allow one cpu to continue on the crash path, forcing others to
spin.
> + * Racing on the crash path from here will end in misery.  If we reenter,
> + * something has very gone wrong and retrying will (almost certainly) be
> + * futile.  Return up to our nested panic() to try and reboot.
> + *
> + * This is noinline to make it obvious in stack traces which cpus have
lost
> + * the race (as opposed to being somewhere in kexec_common_shutdown())
> + */
> +static int noinline one_cpu_only(void)
>  {
> -    /* Only allow the first cpu to continue - force other cpus to spin */
> -    if ( test_and_set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags) )
> -        for ( ; ; ) ;
> +    static unsigned int crashing_cpu = -1;
> +    unsigned int cpu = smp_processor_id();
> +
> +    if ( cmpxchg(&crashing_cpu, -1, cpu) != -1 )
> +    {
> +        /* Not the first entry into one_cpu_only(). */
> +        if ( crashing_cpu == cpu )
> +        {
> +            printk("Reentered the crash path.  Something is very
broken\n");
> +            return -EBUSY;
> +        }
> +
> +        /*
> +         * Another cpu has beaten us to this point.  Wait here patiently
for
> +         * it to kill us.
> +         */
> +        for ( ; ; )
> +            halt();
> +    }
> +
> +    set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags);
> +    return 0;
>  }
>  
>  /* Save the registers in the per-cpu crash note buffer. */
> @@ -288,13 +316,20 @@ crash_xen_info_t *kexec_crash_save_info(void)
>      return out;
>  }
>  
> -static void kexec_common_shutdown(void)
> +static int kexec_common_shutdown(void)
>  {
> +    int ret;
> +
> +    ret = one_cpu_only();
> +    if ( ret )
> +        return ret;
> +
>      watchdog_disable();
>      console_start_sync();
>      spin_debug_disable();
> -    one_cpu_only();
>      acpi_dmar_reinstate();
> +
> +    return 0;
>  }
>  
>  void kexec_crash(void)
> @@ -309,7 +344,9 @@ void kexec_crash(void)
>  
>      kexecing = TRUE;
>  
> -    kexec_common_shutdown();
> +    if ( kexec_common_shutdown() != 0 )
> +        return;
> +
>      kexec_crash_save_cpu();
>      machine_crash_shutdown();
>      machine_kexec(kexec_image[KEXEC_IMAGE_CRASH_BASE + pos]);

Jan Beulich

2013-Nov-25 13:28 UTC

head link

Re: [PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path.

>>> On 15.11.13 at 21:32, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> In some cases, such as suffering a queued-invalidation timeout while
> performing an iommu_crash_shutdown(), Xen can end up reentering the crash
> path. Previously, this would result in a deadlock in one_cpu_only(), as the
> test_and_set_bit() would fail.
> 
> The crash path is not reentrant, and even if it could be made to be so, it
is
> almost certain that we would fall over the same reentry condition again.
> 
> The new code can distinguish a reentry case from multiple cpus racing down
the
> crash path.  In the case that a reentry is detected, return back out to the
> nested panic() call, which will maybe_reboot() on our behalf.  This
requires a
> bit of return plumbing back up to kexec_crash().
> 
> While fixing this deadlock, also fix up an minor niggle seen recently from
a
> XenServer crash report.  The report was from a Bank 8 MCE, which had
managed
> to crash on all cpus at once.  The result was a lot of stack traces with
cpus
> in kexec_common_shutdown(), which was infact the inlined version of
> one_cpu_only().  The kexec crash path is not a hotpath, so we can easily
> afford to prevent inlining for the sake of clarity in the stack traces.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Tim Deegan <tim@xen.org>
> CC: David Vrabel <david.vrabel@citrix.com>
> ---
>  xen/common/kexec.c |   51
++++++++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 44 insertions(+), 7 deletions(-)
David, you being the maintainer of this code now, I don''t think
I''ve
seen a response from you on this patch, despite - iirc - Andrew
having pinged you on it already too.

Jan

Andrew Cooper

2013-Nov-25 13:30 UTC

head link

Re: [PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path.

On 25/11/13 13:28, Jan Beulich wrote:>>>> On 15.11.13 at 21:32, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>> In some cases, such as suffering a queued-invalidation timeout while
>> performing an iommu_crash_shutdown(), Xen can end up reentering the
crash
>> path. Previously, this would result in a deadlock in one_cpu_only(), as
the
>> test_and_set_bit() would fail.
>>
>> The crash path is not reentrant, and even if it could be made to be so,
it is
>> almost certain that we would fall over the same reentry condition
again.
>>
>> The new code can distinguish a reentry case from multiple cpus racing
down the
>> crash path.  In the case that a reentry is detected, return back out to
the
>> nested panic() call, which will maybe_reboot() on our behalf.  This
requires a
>> bit of return plumbing back up to kexec_crash().
>>
>> While fixing this deadlock, also fix up an minor niggle seen recently
from a
>> XenServer crash report.  The report was from a Bank 8 MCE, which had
managed
>> to crash on all cpus at once.  The result was a lot of stack traces
with cpus
>> in kexec_common_shutdown(), which was infact the inlined version of
>> one_cpu_only().  The kexec crash path is not a hotpath, so we can
easily
>> afford to prevent inlining for the sake of clarity in the stack traces.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> CC: Keir Fraser <keir@xen.org>
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Tim Deegan <tim@xen.org>
>> CC: David Vrabel <david.vrabel@citrix.com>
>> ---
>>  xen/common/kexec.c |   51
++++++++++++++++++++++++++++++++++++++++++++-------
>>  1 file changed, 44 insertions(+), 7 deletions(-)
> David, you being the maintainer of this code now, I don''t think
I''ve
> seen a response from you on this patch, despite - iirc - Andrew
> having pinged you on it already too.
>
> Jan
>
David is out of the office for a week on vacation at the moment.

I did get code-review from him before submitting it upstream, but I
guess that doesn''t count for much as a formal ack.

~Andrew

Jan Beulich

2013-Nov-25 13:39 UTC

head link

Re: [PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path.

>>> On 25.11.13 at 14:30, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
> On 25/11/13 13:28, Jan Beulich wrote:
>>>>> On 15.11.13 at 21:32, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>>> In some cases, such as suffering a queued-invalidation timeout
while
>>> performing an iommu_crash_shutdown(), Xen can end up reentering the
crash
>>> path. Previously, this would result in a deadlock in
one_cpu_only(), as the
>>> test_and_set_bit() would fail.
>>>
>>> The crash path is not reentrant, and even if it could be made to be
so, it
> is
>>> almost certain that we would fall over the same reentry condition
again.
>>>
>>> The new code can distinguish a reentry case from multiple cpus
racing down
> the
>>> crash path.  In the case that a reentry is detected, return back
out to the
>>> nested panic() call, which will maybe_reboot() on our behalf.  This
requires
> a
>>> bit of return plumbing back up to kexec_crash().
>>>
>>> While fixing this deadlock, also fix up an minor niggle seen
recently from a
>>> XenServer crash report.  The report was from a Bank 8 MCE, which
had managed
>>> to crash on all cpus at once.  The result was a lot of stack traces
with
> cpus
>>> in kexec_common_shutdown(), which was infact the inlined version of
>>> one_cpu_only().  The kexec crash path is not a hotpath, so we can
easily
>>> afford to prevent inlining for the sake of clarity in the stack
traces.
>>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> CC: Keir Fraser <keir@xen.org>
>>> CC: Jan Beulich <JBeulich@suse.com>
>>> CC: Tim Deegan <tim@xen.org>
>>> CC: David Vrabel <david.vrabel@citrix.com>
>>> ---
>>>  xen/common/kexec.c |   51
++++++++++++++++++++++++++++++++++++++++++++-------
>>>  1 file changed, 44 insertions(+), 7 deletions(-)
>> David, you being the maintainer of this code now, I don''t
think I''ve
>> seen a response from you on this patch, despite - iirc - Andrew
>> having pinged you on it already too.
>>
>> Jan
>>
> 
> David is out of the office for a week on vacation at the moment.
> 
> I did get code-review from him before submitting it upstream, but I
> guess that doesn''t count for much as a formal ack.
Then why did you not include his Reviewed-by with the patch?
That would have sufficed (and put you in a bad light in case he
came back and said he didn''t do any such review - which I trust
he did if you say so).

Jan

Andrew Cooper

2013-Nov-25 15:38 UTC

head link

Re: [PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path.

On 25/11/13 13:39, Jan Beulich wrote:>>>> On 25.11.13 at 14:30, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>> On 25/11/13 13:28, Jan Beulich wrote:
>>>>>> On 15.11.13 at 21:32, Andrew Cooper
<andrew.cooper3@citrix.com> wrote:
>>>> In some cases, such as suffering a queued-invalidation timeout
while
>>>> performing an iommu_crash_shutdown(), Xen can end up reentering
the crash
>>>> path. Previously, this would result in a deadlock in
one_cpu_only(), as the
>>>> test_and_set_bit() would fail.
>>>>
>>>> The crash path is not reentrant, and even if it could be made
to be so, it
>> is
>>>> almost certain that we would fall over the same reentry
condition again.
>>>>
>>>> The new code can distinguish a reentry case from multiple cpus
racing down
>> the
>>>> crash path.  In the case that a reentry is detected, return
back out to the
>>>> nested panic() call, which will maybe_reboot() on our behalf. 
This requires
>> a
>>>> bit of return plumbing back up to kexec_crash().
>>>>
>>>> While fixing this deadlock, also fix up an minor niggle seen
recently from a
>>>> XenServer crash report.  The report was from a Bank 8 MCE,
which had managed
>>>> to crash on all cpus at once.  The result was a lot of stack
traces with
>> cpus
>>>> in kexec_common_shutdown(), which was infact the inlined
version of
>>>> one_cpu_only().  The kexec crash path is not a hotpath, so we
can easily
>>>> afford to prevent inlining for the sake of clarity in the stack
traces.
>>>>
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> CC: Keir Fraser <keir@xen.org>
>>>> CC: Jan Beulich <JBeulich@suse.com>
>>>> CC: Tim Deegan <tim@xen.org>
>>>> CC: David Vrabel <david.vrabel@citrix.com>
>>>> ---
>>>>  xen/common/kexec.c |   51
++++++++++++++++++++++++++++++++++++++++++++-------
>>>>  1 file changed, 44 insertions(+), 7 deletions(-)
>>> David, you being the maintainer of this code now, I don''t
think I''ve
>>> seen a response from you on this patch, despite - iirc - Andrew
>>> having pinged you on it already too.
>>>
>>> Jan
>>>
>> David is out of the office for a week on vacation at the moment.
>>
>> I did get code-review from him before submitting it upstream, but I
>> guess that doesn''t count for much as a formal ack.
> Then why did you not include his Reviewed-by with the patch?
> That would have sufficed (and put you in a bad light in case he
> came back and said he didn''t do any such review - which I trust
> he did if you say so).
>
> Jan
>
I think we had agreed that he would formally review it on-list, but it
was late one evening of a busy week, so I suspect it just got forgotten.

~Andrew

David Vrabel

2013-Nov-27 10:27 UTC

head link

Re: [PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path.

On 15/11/2013 20:32, Andrew Cooper wrote:> In some cases, such as suffering a queued-invalidation timeout while
> performing an iommu_crash_shutdown(), Xen can end up reentering the crash
> path. Previously, this would result in a deadlock in one_cpu_only(), as the
> test_and_set_bit() would fail.
> 
> The crash path is not reentrant, and even if it could be made to be so, it
is
> almost certain that we would fall over the same reentry condition again.
> 
> The new code can distinguish a reentry case from multiple cpus racing down
the
> crash path.  In the case that a reentry is detected, return back out to the
> nested panic() call, which will maybe_reboot() on our behalf.  This
requires a
> bit of return plumbing back up to kexec_crash().
> 
> While fixing this deadlock, also fix up an minor niggle seen recently from
a
> XenServer crash report.  The report was from a Bank 8 MCE, which had
managed
> to crash on all cpus at once.  The result was a lot of stack traces with
cpus
> in kexec_common_shutdown(), which was infact the inlined version of
> one_cpu_only().  The kexec crash path is not a hotpath, so we can easily
> afford to prevent inlining for the sake of clarity in the stack traces.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> CC: Keir Fraser <keir@xen.org>
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Tim Deegan <tim@xen.org>
> CC: David Vrabel <david.vrabel@citrix.com>
Reviewed-by: David Vrabel <david.vrabel@citrix.com>

David

Xen devel - Nov 2013 - [PATCH 0/2] Kexec crash path fixes

[PATCH 0/2] Kexec crash path fixes

[PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path.

[PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.

Re: [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.

Re: [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.

Re: [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.

Re: [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.

Re: [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.

Re: [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.

Re: [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.

Re: [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.

[Patch v2 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu

Re: [PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path.

Re: [PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path.

Re: [PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path.

Re: [PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path.

Re: [PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path.

Re: [PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path.