Hello, Here are two fixes for the kexec crash path itself, both from issues XenServer has encountered. They should both be considered for backport. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Keir Fraser <keir@xen.org> CC: Jan Beulich <JBeulich@suse.com> CC: Tim Deegan <tim@xen.org> CC: David Vrabel <david.vrabel@citrix.com> -- 1.7.10.4
Andrew Cooper
2013-Nov-15 20:32 UTC
[PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path.
In some cases, such as suffering a queued-invalidation timeout while performing an iommu_crash_shutdown(), Xen can end up reentering the crash path. Previously, this would result in a deadlock in one_cpu_only(), as the test_and_set_bit() would fail. The crash path is not reentrant, and even if it could be made to be so, it is almost certain that we would fall over the same reentry condition again. The new code can distinguish a reentry case from multiple cpus racing down the crash path. In the case that a reentry is detected, return back out to the nested panic() call, which will maybe_reboot() on our behalf. This requires a bit of return plumbing back up to kexec_crash(). While fixing this deadlock, also fix up an minor niggle seen recently from a XenServer crash report. The report was from a Bank 8 MCE, which had managed to crash on all cpus at once. The result was a lot of stack traces with cpus in kexec_common_shutdown(), which was infact the inlined version of one_cpu_only(). The kexec crash path is not a hotpath, so we can easily afford to prevent inlining for the sake of clarity in the stack traces. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Keir Fraser <keir@xen.org> CC: Jan Beulich <JBeulich@suse.com> CC: Tim Deegan <tim@xen.org> CC: David Vrabel <david.vrabel@citrix.com> --- xen/common/kexec.c | 51 ++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 44 insertions(+), 7 deletions(-) diff --git a/xen/common/kexec.c b/xen/common/kexec.c index 17f3ed7..481b0c2 100644 --- a/xen/common/kexec.c +++ b/xen/common/kexec.c @@ -233,11 +233,39 @@ void __init set_kexec_crash_area_size(u64 system_ram) } } -static void one_cpu_only(void) +/* + * Only allow one cpu to continue on the crash path, forcing others to spin. + * Racing on the crash path from here will end in misery. If we reenter, + * something has very gone wrong and retrying will (almost certainly) be + * futile. Return up to our nested panic() to try and reboot. + * + * This is noinline to make it obvious in stack traces which cpus have lost + * the race (as opposed to being somewhere in kexec_common_shutdown()) + */ +static int noinline one_cpu_only(void) { - /* Only allow the first cpu to continue - force other cpus to spin */ - if ( test_and_set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags) ) - for ( ; ; ) ; + static unsigned int crashing_cpu = -1; + unsigned int cpu = smp_processor_id(); + + if ( cmpxchg(&crashing_cpu, -1, cpu) != -1 ) + { + /* Not the first entry into one_cpu_only(). */ + if ( crashing_cpu == cpu ) + { + printk("Reentered the crash path. Something is very broken\n"); + return -EBUSY; + } + + /* + * Another cpu has beaten us to this point. Wait here patiently for + * it to kill us. + */ + for ( ; ; ) + halt(); + } + + set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags); + return 0; } /* Save the registers in the per-cpu crash note buffer. */ @@ -288,13 +316,20 @@ crash_xen_info_t *kexec_crash_save_info(void) return out; } -static void kexec_common_shutdown(void) +static int kexec_common_shutdown(void) { + int ret; + + ret = one_cpu_only(); + if ( ret ) + return ret; + watchdog_disable(); console_start_sync(); spin_debug_disable(); - one_cpu_only(); acpi_dmar_reinstate(); + + return 0; } void kexec_crash(void) @@ -309,7 +344,9 @@ void kexec_crash(void) kexecing = TRUE; - kexec_common_shutdown(); + if ( kexec_common_shutdown() != 0 ) + return; + kexec_crash_save_cpu(); machine_crash_shutdown(); machine_kexec(kexec_image[KEXEC_IMAGE_CRASH_BASE + pos]); -- 1.7.10.4
Andrew Cooper
2013-Nov-15 20:32 UTC
[PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.
PVOps Linux as a kexec image shoots itself in the foot otherwise. On a Core2 system, Linux declares a firmware bug and tries to invert some bits in the performance counter register. It ends up setting the number of retired instructions to generate another NMI to fewer instructions than the NMI interrupt path itself, and ceases to make any useful progress. While this is not strictly Xen''s fault, Xen can at least be kind and leave the kexec environment with fewer issues to deal with. From: David Vrabel <david.vrabel@citrix.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Keir Fraser <keir@xen.org> CC: Jan Beulich <JBeulich@suse.com> CC: Tim Deegan <tim@xen.org> --- xen/arch/x86/crash.c | 1 + xen/arch/x86/nmi.c | 2 +- xen/include/asm-x86/apic.h | 1 + 3 files changed, 3 insertions(+), 1 deletion(-) diff --git a/xen/arch/x86/crash.c b/xen/arch/x86/crash.c index 4ef379b..01fd906 100644 --- a/xen/arch/x86/crash.c +++ b/xen/arch/x86/crash.c @@ -118,6 +118,7 @@ static void nmi_shootdown_cpus(void) unsigned long msecs; int i, cpu = smp_processor_id(); + disable_lapic_nmi_watchdog(); local_irq_disable(); crashing_cpu = cpu; diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c index 2c16d93..c67a9c3 100644 --- a/xen/arch/x86/nmi.c +++ b/xen/arch/x86/nmi.c @@ -165,7 +165,7 @@ static void nmi_timer_fn(void *unused) set_timer(&this_cpu(nmi_timer), NOW() + MILLISECS(1000)); } -static void disable_lapic_nmi_watchdog(void) +void disable_lapic_nmi_watchdog(void) { if (nmi_active <= 0) return; diff --git a/xen/include/asm-x86/apic.h b/xen/include/asm-x86/apic.h index dd528fb..43b39ce 100644 --- a/xen/include/asm-x86/apic.h +++ b/xen/include/asm-x86/apic.h @@ -200,6 +200,7 @@ extern void smp_local_timer_interrupt (struct cpu_user_regs *regs); extern void setup_boot_APIC_clock (void); extern void setup_secondary_APIC_clock (void); extern void setup_apic_nmi_watchdog (void); +extern void disable_lapic_nmi_watchdog(void); extern int reserve_lapic_nmi(void); extern void release_lapic_nmi(void); extern void self_nmi(void); -- 1.7.10.4
David Vrabel
2013-Nov-15 21:01 UTC
Re: [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.
On 15/11/13 20:32, Andrew Cooper wrote:> PVOps Linux as a kexec image shoots itself in the foot otherwise. > > On a Core2 system, Linux declares a firmware bug and tries to invert some bits > in the performance counter register. It ends up setting the number of retired > instructions to generate another NMI to fewer instructions than the NMI > interrupt path itself, and ceases to make any useful progress. > > While this is not strictly Xen''s fault, Xen can at least be kind and leave the > kexec environment with fewer issues to deal with.I don''t appreciate my commit message being rewritten in this way. My original commit message was: "kexec: disable the NMI watchdog during a crash nmi_shootdown_cpus() is called during a crash to park all the other CPUs. This changes the NMI trap handlers which means there''s no point in having the watchdog still running. This also disables the watchdog before executing any crash kexec image and prevents the image from receiving unexpected NMIs." David
Andrew Cooper
2013-Nov-15 21:09 UTC
Re: [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.
On 15/11/13 21:01, David Vrabel wrote:> On 15/11/13 20:32, Andrew Cooper wrote: >> PVOps Linux as a kexec image shoots itself in the foot otherwise. >> >> On a Core2 system, Linux declares a firmware bug and tries to invert some bits >> in the performance counter register. It ends up setting the number of retired >> instructions to generate another NMI to fewer instructions than the NMI >> interrupt path itself, and ceases to make any useful progress. >> >> While this is not strictly Xen''s fault, Xen can at least be kind and leave the >> kexec environment with fewer issues to deal with. > I don''t appreciate my commit message being rewritten in this way. > > My original commit message was: > > "kexec: disable the NMI watchdog during a crash > > nmi_shootdown_cpus() is called during a crash to park all the other > CPUs. This changes the NMI trap handlers which means there''s no point > in having the watchdog still running. > > This also disables the watchdog before executing any crash kexec image > and prevents the image from receiving unexpected NMIs." > > DavidSorry - I took the patch as-was out of the patch queue, which had no message. I forgot to check the commit history for the message being there, and therefore wrote the message myself from scratch. I am not too fussed which message gets used. ~Andrew
Jan Beulich
2013-Nov-18 09:26 UTC
Re: [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.
>>> On 15.11.13 at 21:32, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > --- a/xen/arch/x86/crash.c > +++ b/xen/arch/x86/crash.c > @@ -118,6 +118,7 @@ static void nmi_shootdown_cpus(void) > unsigned long msecs; > int i, cpu = smp_processor_id(); > > + disable_lapic_nmi_watchdog(); > local_irq_disable(); > > crashing_cpu = cpu;_If_ you do this here, I wonder why it''s being done before disabling interrupts. But then again I wonder whether it wouldn''t be better to do this even earlier (i.e. by passing a flag to watchdog_disable()), as the NMI watchdog becomes useless with that call being done from kexec_common_shutdown().> --- a/xen/arch/x86/nmi.c > +++ b/xen/arch/x86/nmi.c > @@ -165,7 +165,7 @@ static void nmi_timer_fn(void *unused) > set_timer(&this_cpu(nmi_timer), NOW() + MILLISECS(1000)); > } > > -static void disable_lapic_nmi_watchdog(void) > +void disable_lapic_nmi_watchdog(void)The suggested alternative would also make it unnecessary to make this function non-static... Jan
Andrew Cooper
2013-Nov-18 10:33 UTC
Re: [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.
On 18/11/13 09:26, Jan Beulich wrote:>>>> On 15.11.13 at 21:32, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >> --- a/xen/arch/x86/crash.c >> +++ b/xen/arch/x86/crash.c >> @@ -118,6 +118,7 @@ static void nmi_shootdown_cpus(void) >> unsigned long msecs; >> int i, cpu = smp_processor_id(); >> >> + disable_lapic_nmi_watchdog(); >> local_irq_disable(); >> >> crashing_cpu = cpu; > _If_ you do this here, I wonder why it''s being done before > disabling interrupts. > > But then again I wonder whether it wouldn''t be better to do > this even earlier (i.e. by passing a flag to watchdog_disable()), > as the NMI watchdog becomes useless with that call being done > from kexec_common_shutdown().Disabling interrupts here is more defensive coding than anything else. It is not expected to be able to get here with interrupts enabled, but in a crash Putting this in watchdog_disable() would result in a race condition. disable_lapic_nmi_watchdog() mutates global state, meaning that it can only possibly run correctly on a single cpu. In an ideal world with plenty of time, the lapic watchdog code could be improved. However, restricting its use until after one_cpu_only() is the easiest fix. ~Andrew> >> --- a/xen/arch/x86/nmi.c >> +++ b/xen/arch/x86/nmi.c >> @@ -165,7 +165,7 @@ static void nmi_timer_fn(void *unused) >> set_timer(&this_cpu(nmi_timer), NOW() + MILLISECS(1000)); >> } >> >> -static void disable_lapic_nmi_watchdog(void) >> +void disable_lapic_nmi_watchdog(void) > The suggested alternative would also make it unnecessary to > make this function non-static... > > Jan >
Andrew Cooper
2013-Nov-18 10:35 UTC
Re: [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.
On 18/11/13 10:33, Andrew Cooper wrote:> On 18/11/13 09:26, Jan Beulich wrote: >>>>> On 15.11.13 at 21:32, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >>> --- a/xen/arch/x86/crash.c >>> +++ b/xen/arch/x86/crash.c >>> @@ -118,6 +118,7 @@ static void nmi_shootdown_cpus(void) >>> unsigned long msecs; >>> int i, cpu = smp_processor_id(); >>> >>> + disable_lapic_nmi_watchdog(); >>> local_irq_disable(); >>> >>> crashing_cpu = cpu; >> _If_ you do this here, I wonder why it''s being done before >> disabling interrupts. >> >> But then again I wonder whether it wouldn''t be better to do >> this even earlier (i.e. by passing a flag to watchdog_disable()), >> as the NMI watchdog becomes useless with that call being done >> from kexec_common_shutdown(). > Disabling interrupts here is more defensive coding than anything else. > It is not expected to be able to get here with interrupts enabled, but > in a crash... we can make no guarantees.> > Putting this in watchdog_disable() would result in a race condition. > disable_lapic_nmi_watchdog() mutates global state, meaning that it can > only possibly run correctly on a single cpu. In an ideal world with > plenty of time, the lapic watchdog code could be improved. However, > restricting its use until after one_cpu_only() is the easiest fix. > > ~Andrew > >>> --- a/xen/arch/x86/nmi.c >>> +++ b/xen/arch/x86/nmi.c >>> @@ -165,7 +165,7 @@ static void nmi_timer_fn(void *unused) >>> set_timer(&this_cpu(nmi_timer), NOW() + MILLISECS(1000)); >>> } >>> >>> -static void disable_lapic_nmi_watchdog(void) >>> +void disable_lapic_nmi_watchdog(void) >> The suggested alternative would also make it unnecessary to >> make this function non-static... >> >> Jan >> > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel
Jan Beulich
2013-Nov-18 11:04 UTC
Re: [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.
>>> On 18.11.13 at 11:33, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > On 18/11/13 09:26, Jan Beulich wrote: >>>>> On 15.11.13 at 21:32, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >>> --- a/xen/arch/x86/crash.c >>> +++ b/xen/arch/x86/crash.c >>> @@ -118,6 +118,7 @@ static void nmi_shootdown_cpus(void) >>> unsigned long msecs; >>> int i, cpu = smp_processor_id(); >>> >>> + disable_lapic_nmi_watchdog(); >>> local_irq_disable(); >>> >>> crashing_cpu = cpu; >> _If_ you do this here, I wonder why it''s being done before >> disabling interrupts. >> >> But then again I wonder whether it wouldn''t be better to do >> this even earlier (i.e. by passing a flag to watchdog_disable()), >> as the NMI watchdog becomes useless with that call being done >> from kexec_common_shutdown(). > > Disabling interrupts here is more defensive coding than anything else. > It is not expected to be able to get here with interrupts enabled, but > in a crash > > Putting this in watchdog_disable() would result in a race condition. > disable_lapic_nmi_watchdog() mutates global state, meaning that it can > only possibly run correctly on a single cpu. In an ideal world with > plenty of time, the lapic watchdog code could be improved. However, > restricting its use until after one_cpu_only() is the easiest fix.Just looked at disable_lapic_nmi_watchdog() another time, and I don''t see the race. What I do see though is that you''d need to run this on each CPU for it to have the full intended effect... Jan
Andrew Cooper
2013-Nov-18 11:09 UTC
Re: [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.
On 18/11/13 11:04, Jan Beulich wrote:>>>> On 18.11.13 at 11:33, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >> On 18/11/13 09:26, Jan Beulich wrote: >>>>>> On 15.11.13 at 21:32, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >>>> --- a/xen/arch/x86/crash.c >>>> +++ b/xen/arch/x86/crash.c >>>> @@ -118,6 +118,7 @@ static void nmi_shootdown_cpus(void) >>>> unsigned long msecs; >>>> int i, cpu = smp_processor_id(); >>>> >>>> + disable_lapic_nmi_watchdog(); >>>> local_irq_disable(); >>>> >>>> crashing_cpu = cpu; >>> _If_ you do this here, I wonder why it''s being done before >>> disabling interrupts. >>> >>> But then again I wonder whether it wouldn''t be better to do >>> this even earlier (i.e. by passing a flag to watchdog_disable()), >>> as the NMI watchdog becomes useless with that call being done >>> from kexec_common_shutdown(). >> Disabling interrupts here is more defensive coding than anything else. >> It is not expected to be able to get here with interrupts enabled, but >> in a crash >> >> Putting this in watchdog_disable() would result in a race condition. >> disable_lapic_nmi_watchdog() mutates global state, meaning that it can >> only possibly run correctly on a single cpu. In an ideal world with >> plenty of time, the lapic watchdog code could be improved. However, >> restricting its use until after one_cpu_only() is the easiest fix. > Just looked at disable_lapic_nmi_watchdog() another time, and I > don''t see the race. What I do see though is that you''d need to > run this on each CPU for it to have the full intended effect... > > Jan >disable_lapic_nmi_watchdog() sets nmi_active to -1, which casues subsequent calls to exit early. There are possible crash paths where a first cpu will execute watchdog_disable() but a subsequent cpu will complete the kexec path. Hooking disable_lapic_nmi_watchdog() off watchdog_disable() will result in certain paths where the wrong cpu ends up with its MSRs properly disabled. ~Andrew disable_lapic_nmi_watchdog()
Ian Campbell
2013-Nov-19 10:53 UTC
Re: [PATCH 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu.
On Fri, 2013-11-15 at 20:32 +0000, Andrew Cooper wrote:> PVOps Linux as a kexec image shoots itself in the foot otherwise. > > On a Core2 system, Linux declares a firmware bug and tries to invert some bits > in the performance counter register. It ends up setting the number of retired > instructions to generate another NMI to fewer instructions than the NMI > interrupt path itself, and ceases to make any useful progress. > > While this is not strictly Xen''s fault, Xen can at least be kind and leave the > kexec environment with fewer issues to deal with. > > From: David Vrabel <david.vrabel@citrix.com>This belongs in the first line, such that git am will DTRT. If you "git commit --author ''David Vrabel <...>'' then git send-email does TRT for you. Ian.
Andrew Cooper
2013-Nov-20 15:08 UTC
[Patch v2 2/2] x86/crash: Disable the watchdog NMIs on the crashing cpu
From: David Vrabel <david.vrabel@citrix.com> nmi_shootdown_cpus() is called during a crash to park all the other CPUs. This changes the NMI trap handlers which means there''s no point in having the watchdog still running. This also disables the watchdog before executing any crash kexec image and prevents the image from receiving unexpected NMIs. Signed-off-by: David Vrabel <david.vrabel@citrix.com> PVOps Linux as a kexec image shoots itself in the foot otherwise. On a Core2 system, Linux declares a firmware bug and tries to invert some bits in the performance counter register. It ends up setting the number of retired instructions to generate another NMI to fewer instructions than the NMI interrupt path itself, and ceases to make any useful progress. The call to disable_lapic_nmi_watchdog() must be this late into the kexec path to be sure that this cpu is the one which will execute the kexec image. Otherwise there are race conditions where the NMIs might be disabled on the wrong cpu, resulting in the kexec image still receiving NMIs. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> CC: Keir Fraser <keir@xen.org> CC: Jan Beulich <JBeulich@suse.com> CC: Tim Deegan <tim@xen.org> --- Changes in v2: * Return the original commit message which I originally failed to find in our source control, and further clarify the positioning of the call. --- xen/arch/x86/crash.c | 1 + xen/arch/x86/nmi.c | 2 +- xen/include/asm-x86/apic.h | 1 + 3 files changed, 3 insertions(+), 1 deletion(-) diff --git a/xen/arch/x86/crash.c b/xen/arch/x86/crash.c index 4ef379b..01fd906 100644 --- a/xen/arch/x86/crash.c +++ b/xen/arch/x86/crash.c @@ -118,6 +118,7 @@ static void nmi_shootdown_cpus(void) unsigned long msecs; int i, cpu = smp_processor_id(); + disable_lapic_nmi_watchdog(); local_irq_disable(); crashing_cpu = cpu; diff --git a/xen/arch/x86/nmi.c b/xen/arch/x86/nmi.c index 2c16d93..c67a9c3 100644 --- a/xen/arch/x86/nmi.c +++ b/xen/arch/x86/nmi.c @@ -165,7 +165,7 @@ static void nmi_timer_fn(void *unused) set_timer(&this_cpu(nmi_timer), NOW() + MILLISECS(1000)); } -static void disable_lapic_nmi_watchdog(void) +void disable_lapic_nmi_watchdog(void) { if (nmi_active <= 0) return; diff --git a/xen/include/asm-x86/apic.h b/xen/include/asm-x86/apic.h index dd528fb..43b39ce 100644 --- a/xen/include/asm-x86/apic.h +++ b/xen/include/asm-x86/apic.h @@ -200,6 +200,7 @@ extern void smp_local_timer_interrupt (struct cpu_user_regs *regs); extern void setup_boot_APIC_clock (void); extern void setup_secondary_APIC_clock (void); extern void setup_apic_nmi_watchdog (void); +extern void disable_lapic_nmi_watchdog(void); extern int reserve_lapic_nmi(void); extern void release_lapic_nmi(void); extern void self_nmi(void); -- 1.7.10.4
Andrew Cooper
2013-Nov-22 14:55 UTC
Re: [PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path.
Ping? On 15/11/13 20:32, Andrew Cooper wrote:> In some cases, such as suffering a queued-invalidation timeout while > performing an iommu_crash_shutdown(), Xen can end up reentering the crash > path. Previously, this would result in a deadlock in one_cpu_only(), as the > test_and_set_bit() would fail. > > The crash path is not reentrant, and even if it could be made to be so, it is > almost certain that we would fall over the same reentry condition again. > > The new code can distinguish a reentry case from multiple cpus racing down the > crash path. In the case that a reentry is detected, return back out to the > nested panic() call, which will maybe_reboot() on our behalf. This requires a > bit of return plumbing back up to kexec_crash(). > > While fixing this deadlock, also fix up an minor niggle seen recently from a > XenServer crash report. The report was from a Bank 8 MCE, which had managed > to crash on all cpus at once. The result was a lot of stack traces with cpus > in kexec_common_shutdown(), which was infact the inlined version of > one_cpu_only(). The kexec crash path is not a hotpath, so we can easily > afford to prevent inlining for the sake of clarity in the stack traces. > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > CC: Keir Fraser <keir@xen.org> > CC: Jan Beulich <JBeulich@suse.com> > CC: Tim Deegan <tim@xen.org> > CC: David Vrabel <david.vrabel@citrix.com> > --- > xen/common/kexec.c | 51 ++++++++++++++++++++++++++++++++++++++++++++------- > 1 file changed, 44 insertions(+), 7 deletions(-) > > diff --git a/xen/common/kexec.c b/xen/common/kexec.c > index 17f3ed7..481b0c2 100644 > --- a/xen/common/kexec.c > +++ b/xen/common/kexec.c > @@ -233,11 +233,39 @@ void __init set_kexec_crash_area_size(u64 system_ram) > } > } > > -static void one_cpu_only(void) > +/* > + * Only allow one cpu to continue on the crash path, forcing others to spin. > + * Racing on the crash path from here will end in misery. If we reenter, > + * something has very gone wrong and retrying will (almost certainly) be > + * futile. Return up to our nested panic() to try and reboot. > + * > + * This is noinline to make it obvious in stack traces which cpus have lost > + * the race (as opposed to being somewhere in kexec_common_shutdown()) > + */ > +static int noinline one_cpu_only(void) > { > - /* Only allow the first cpu to continue - force other cpus to spin */ > - if ( test_and_set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags) ) > - for ( ; ; ) ; > + static unsigned int crashing_cpu = -1; > + unsigned int cpu = smp_processor_id(); > + > + if ( cmpxchg(&crashing_cpu, -1, cpu) != -1 ) > + { > + /* Not the first entry into one_cpu_only(). */ > + if ( crashing_cpu == cpu ) > + { > + printk("Reentered the crash path. Something is very broken\n"); > + return -EBUSY; > + } > + > + /* > + * Another cpu has beaten us to this point. Wait here patiently for > + * it to kill us. > + */ > + for ( ; ; ) > + halt(); > + } > + > + set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags); > + return 0; > } > > /* Save the registers in the per-cpu crash note buffer. */ > @@ -288,13 +316,20 @@ crash_xen_info_t *kexec_crash_save_info(void) > return out; > } > > -static void kexec_common_shutdown(void) > +static int kexec_common_shutdown(void) > { > + int ret; > + > + ret = one_cpu_only(); > + if ( ret ) > + return ret; > + > watchdog_disable(); > console_start_sync(); > spin_debug_disable(); > - one_cpu_only(); > acpi_dmar_reinstate(); > + > + return 0; > } > > void kexec_crash(void) > @@ -309,7 +344,9 @@ void kexec_crash(void) > > kexecing = TRUE; > > - kexec_common_shutdown(); > + if ( kexec_common_shutdown() != 0 ) > + return; > + > kexec_crash_save_cpu(); > machine_crash_shutdown(); > machine_kexec(kexec_image[KEXEC_IMAGE_CRASH_BASE + pos]);
Jan Beulich
2013-Nov-25 13:28 UTC
Re: [PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path.
>>> On 15.11.13 at 21:32, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > In some cases, such as suffering a queued-invalidation timeout while > performing an iommu_crash_shutdown(), Xen can end up reentering the crash > path. Previously, this would result in a deadlock in one_cpu_only(), as the > test_and_set_bit() would fail. > > The crash path is not reentrant, and even if it could be made to be so, it is > almost certain that we would fall over the same reentry condition again. > > The new code can distinguish a reentry case from multiple cpus racing down the > crash path. In the case that a reentry is detected, return back out to the > nested panic() call, which will maybe_reboot() on our behalf. This requires a > bit of return plumbing back up to kexec_crash(). > > While fixing this deadlock, also fix up an minor niggle seen recently from a > XenServer crash report. The report was from a Bank 8 MCE, which had managed > to crash on all cpus at once. The result was a lot of stack traces with cpus > in kexec_common_shutdown(), which was infact the inlined version of > one_cpu_only(). The kexec crash path is not a hotpath, so we can easily > afford to prevent inlining for the sake of clarity in the stack traces. > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > CC: Keir Fraser <keir@xen.org> > CC: Jan Beulich <JBeulich@suse.com> > CC: Tim Deegan <tim@xen.org> > CC: David Vrabel <david.vrabel@citrix.com> > --- > xen/common/kexec.c | 51 ++++++++++++++++++++++++++++++++++++++++++++------- > 1 file changed, 44 insertions(+), 7 deletions(-)David, you being the maintainer of this code now, I don''t think I''ve seen a response from you on this patch, despite - iirc - Andrew having pinged you on it already too. Jan
Andrew Cooper
2013-Nov-25 13:30 UTC
Re: [PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path.
On 25/11/13 13:28, Jan Beulich wrote:>>>> On 15.11.13 at 21:32, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >> In some cases, such as suffering a queued-invalidation timeout while >> performing an iommu_crash_shutdown(), Xen can end up reentering the crash >> path. Previously, this would result in a deadlock in one_cpu_only(), as the >> test_and_set_bit() would fail. >> >> The crash path is not reentrant, and even if it could be made to be so, it is >> almost certain that we would fall over the same reentry condition again. >> >> The new code can distinguish a reentry case from multiple cpus racing down the >> crash path. In the case that a reentry is detected, return back out to the >> nested panic() call, which will maybe_reboot() on our behalf. This requires a >> bit of return plumbing back up to kexec_crash(). >> >> While fixing this deadlock, also fix up an minor niggle seen recently from a >> XenServer crash report. The report was from a Bank 8 MCE, which had managed >> to crash on all cpus at once. The result was a lot of stack traces with cpus >> in kexec_common_shutdown(), which was infact the inlined version of >> one_cpu_only(). The kexec crash path is not a hotpath, so we can easily >> afford to prevent inlining for the sake of clarity in the stack traces. >> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >> CC: Keir Fraser <keir@xen.org> >> CC: Jan Beulich <JBeulich@suse.com> >> CC: Tim Deegan <tim@xen.org> >> CC: David Vrabel <david.vrabel@citrix.com> >> --- >> xen/common/kexec.c | 51 ++++++++++++++++++++++++++++++++++++++++++++------- >> 1 file changed, 44 insertions(+), 7 deletions(-) > David, you being the maintainer of this code now, I don''t think I''ve > seen a response from you on this patch, despite - iirc - Andrew > having pinged you on it already too. > > Jan >David is out of the office for a week on vacation at the moment. I did get code-review from him before submitting it upstream, but I guess that doesn''t count for much as a formal ack. ~Andrew
Jan Beulich
2013-Nov-25 13:39 UTC
Re: [PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path.
>>> On 25.11.13 at 14:30, Andrew Cooper <andrew.cooper3@citrix.com> wrote: > On 25/11/13 13:28, Jan Beulich wrote: >>>>> On 15.11.13 at 21:32, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >>> In some cases, such as suffering a queued-invalidation timeout while >>> performing an iommu_crash_shutdown(), Xen can end up reentering the crash >>> path. Previously, this would result in a deadlock in one_cpu_only(), as the >>> test_and_set_bit() would fail. >>> >>> The crash path is not reentrant, and even if it could be made to be so, it > is >>> almost certain that we would fall over the same reentry condition again. >>> >>> The new code can distinguish a reentry case from multiple cpus racing down > the >>> crash path. In the case that a reentry is detected, return back out to the >>> nested panic() call, which will maybe_reboot() on our behalf. This requires > a >>> bit of return plumbing back up to kexec_crash(). >>> >>> While fixing this deadlock, also fix up an minor niggle seen recently from a >>> XenServer crash report. The report was from a Bank 8 MCE, which had managed >>> to crash on all cpus at once. The result was a lot of stack traces with > cpus >>> in kexec_common_shutdown(), which was infact the inlined version of >>> one_cpu_only(). The kexec crash path is not a hotpath, so we can easily >>> afford to prevent inlining for the sake of clarity in the stack traces. >>> >>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >>> CC: Keir Fraser <keir@xen.org> >>> CC: Jan Beulich <JBeulich@suse.com> >>> CC: Tim Deegan <tim@xen.org> >>> CC: David Vrabel <david.vrabel@citrix.com> >>> --- >>> xen/common/kexec.c | 51 ++++++++++++++++++++++++++++++++++++++++++++------- >>> 1 file changed, 44 insertions(+), 7 deletions(-) >> David, you being the maintainer of this code now, I don''t think I''ve >> seen a response from you on this patch, despite - iirc - Andrew >> having pinged you on it already too. >> >> Jan >> > > David is out of the office for a week on vacation at the moment. > > I did get code-review from him before submitting it upstream, but I > guess that doesn''t count for much as a formal ack.Then why did you not include his Reviewed-by with the patch? That would have sufficed (and put you in a bad light in case he came back and said he didn''t do any such review - which I trust he did if you say so). Jan
Andrew Cooper
2013-Nov-25 15:38 UTC
Re: [PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path.
On 25/11/13 13:39, Jan Beulich wrote:>>>> On 25.11.13 at 14:30, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >> On 25/11/13 13:28, Jan Beulich wrote: >>>>>> On 15.11.13 at 21:32, Andrew Cooper <andrew.cooper3@citrix.com> wrote: >>>> In some cases, such as suffering a queued-invalidation timeout while >>>> performing an iommu_crash_shutdown(), Xen can end up reentering the crash >>>> path. Previously, this would result in a deadlock in one_cpu_only(), as the >>>> test_and_set_bit() would fail. >>>> >>>> The crash path is not reentrant, and even if it could be made to be so, it >> is >>>> almost certain that we would fall over the same reentry condition again. >>>> >>>> The new code can distinguish a reentry case from multiple cpus racing down >> the >>>> crash path. In the case that a reentry is detected, return back out to the >>>> nested panic() call, which will maybe_reboot() on our behalf. This requires >> a >>>> bit of return plumbing back up to kexec_crash(). >>>> >>>> While fixing this deadlock, also fix up an minor niggle seen recently from a >>>> XenServer crash report. The report was from a Bank 8 MCE, which had managed >>>> to crash on all cpus at once. The result was a lot of stack traces with >> cpus >>>> in kexec_common_shutdown(), which was infact the inlined version of >>>> one_cpu_only(). The kexec crash path is not a hotpath, so we can easily >>>> afford to prevent inlining for the sake of clarity in the stack traces. >>>> >>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >>>> CC: Keir Fraser <keir@xen.org> >>>> CC: Jan Beulich <JBeulich@suse.com> >>>> CC: Tim Deegan <tim@xen.org> >>>> CC: David Vrabel <david.vrabel@citrix.com> >>>> --- >>>> xen/common/kexec.c | 51 ++++++++++++++++++++++++++++++++++++++++++++------- >>>> 1 file changed, 44 insertions(+), 7 deletions(-) >>> David, you being the maintainer of this code now, I don''t think I''ve >>> seen a response from you on this patch, despite - iirc - Andrew >>> having pinged you on it already too. >>> >>> Jan >>> >> David is out of the office for a week on vacation at the moment. >> >> I did get code-review from him before submitting it upstream, but I >> guess that doesn''t count for much as a formal ack. > Then why did you not include his Reviewed-by with the patch? > That would have sufficed (and put you in a bad light in case he > came back and said he didn''t do any such review - which I trust > he did if you say so). > > Jan >I think we had agreed that he would formally review it on-list, but it was late one evening of a busy week, so I suspect it just got forgotten. ~Andrew
David Vrabel
2013-Nov-27 10:27 UTC
Re: [PATCH 1/2] common/kexec: Prevent deadlock on reentry to the crash path.
On 15/11/2013 20:32, Andrew Cooper wrote:> In some cases, such as suffering a queued-invalidation timeout while > performing an iommu_crash_shutdown(), Xen can end up reentering the crash > path. Previously, this would result in a deadlock in one_cpu_only(), as the > test_and_set_bit() would fail. > > The crash path is not reentrant, and even if it could be made to be so, it is > almost certain that we would fall over the same reentry condition again. > > The new code can distinguish a reentry case from multiple cpus racing down the > crash path. In the case that a reentry is detected, return back out to the > nested panic() call, which will maybe_reboot() on our behalf. This requires a > bit of return plumbing back up to kexec_crash(). > > While fixing this deadlock, also fix up an minor niggle seen recently from a > XenServer crash report. The report was from a Bank 8 MCE, which had managed > to crash on all cpus at once. The result was a lot of stack traces with cpus > in kexec_common_shutdown(), which was infact the inlined version of > one_cpu_only(). The kexec crash path is not a hotpath, so we can easily > afford to prevent inlining for the sake of clarity in the stack traces. > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > CC: Keir Fraser <keir@xen.org> > CC: Jan Beulich <JBeulich@suse.com> > CC: Tim Deegan <tim@xen.org> > CC: David Vrabel <david.vrabel@citrix.com>Reviewed-by: David Vrabel <david.vrabel@citrix.com> David