Andrew Cooper
2013-Apr-05 19:52 UTC
[PATCH] xen/kexec: Prevent deadlock because of one_cpu_only()
During automated testing of the Xen crash path via fatal NMIs, we discovered
the following deadlock:

[22:47:04 UTC] (XEN) ****************************************
[22:47:04 UTC] (XEN) Panic on CPU 0:
[22:47:04 UTC] (XEN) FATAL TRAP: vector = 2 (nmi)
[22:47:04 UTC] (XEN) [error_code=0000] , IN INTERRUPT CONTEXT
[22:47:04 UTC] (XEN) ****************************************
[22:47:04 UTC] (XEN)
[22:47:04 UTC] (XEN) Reboot in five seconds...
[22:47:05 UTC] (XEN) DMAR_IQA_REG = 839b34002
[22:47:05 UTC] (XEN) DMAR_IQH_REG = 0
[22:47:05 UTC] (XEN) DMAR_IQT_REG = 26b0
[22:47:05 UTC] (XEN)
[22:47:05 UTC] (XEN) ****************************************
[22:47:05 UTC] (XEN) Panic on CPU 0:
[22:47:05 UTC] (XEN) queue invalidate wait descriptor was not executed
[22:47:05 UTC] (XEN) ****************************************
[22:47:05 UTC] (XEN)
[22:47:05 UTC] (XEN) Reboot in five seconds...
[23:09:54 UTC] The server is not powered on. The Virtual Serial Port is not
available.

The problem is quite difficult to reproduce (seen twice, unable to reproduce
manually), but inspection of the kexec_crash() path shows a certain deadlock
because of one_cpu_only().

This patch replaces the use of one_cpu_only() with a recursive spinlock which
provides exclusion between different CPUs racing into kexec_crash(), but
prevents deadlock if the same CPU reenters kexec_crash() again.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

diff -r 996fbd7657bb -r a559ec5225fc xen/common/kexec.c
--- a/xen/common/kexec.c
+++ b/xen/common/kexec.c
@@ -234,13 +234,6 @@ void __init set_kexec_crash_area_size(u6
     }
 }
 
-static void one_cpu_only(void)
-{
-    /* Only allow the first cpu to continue - force other cpus to spin */
-    if ( test_and_set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags) )
-        for ( ; ; ) ;
-}
-
 /* Save the registers in the per-cpu crash note buffer. */
 void kexec_crash_save_cpu(void)
 {
@@ -294,19 +287,25 @@ static void kexec_common_shutdown(void)
     watchdog_disable();
     console_start_sync();
     spin_debug_disable();
-    one_cpu_only();
     acpi_dmar_reinstate();
 }
 
 void kexec_crash(void)
 {
+    static DEFINE_SPINLOCK(crash_lock);
     int pos;
 
     pos = (test_bit(KEXEC_FLAG_CRASH_POS, &kexec_flags) != 0);
     if ( !test_bit(KEXEC_IMAGE_CRASH_BASE + pos, &kexec_flags) )
         return;
 
+    /* In a cascade failure, it is possible to re-enter kexec_crash on the same
+     * CPU.  This spinlock is deliberately never unlocked, to allow reentrance
+     * on the same CPU, but provides exclusion from different CPUs racing. */
+    spin_lock_recursive(&crash_lock);
+
     kexecing = TRUE;
+    set_bit(KEXEC_FLAG_IN_PROGRESS, &kexec_flags);
 
     kexec_common_shutdown();
     kexec_crash_save_cpu();
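[A minimal model may make the deadlock concrete: test_and_set_bit() returns
the bit's previous value, so a CPU that re-enters kexec_crash() after having
already set KEXEC_FLAG_IN_PROGRESS looks exactly like a racing second CPU to
one_cpu_only(), and spins forever. The sketch below is illustrative only —
plain single-threaded C with hypothetical names, not Xen's implementation:]

```c
#include <assert.h>

/* Hypothetical single-flag model of one_cpu_only(): a plain flag cannot
 * distinguish "another CPU raced in" from "the same CPU re-entered". */
static unsigned long in_progress;

/* Mimics test_and_set_bit(): sets the flag and returns its previous value. */
static int test_and_set_in_progress(void)
{
    int old = (int)in_progress;
    in_progress = 1;
    return old;
}

/* Returns 1 when one_cpu_only() would spin forever instead of returning. */
static int one_cpu_only_would_spin(void)
{
    return test_and_set_in_progress();
}
```

[On first entry the flag is clear and the CPU proceeds; on reentry the same
CPU sees its own flag and is parked in the `for ( ; ; ) ;` loop, which is
the hang observed in the serial log above.]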
Keir Fraser
2013-Apr-05 21:21 UTC
Re: [PATCH] xen/kexec: Prevent deadlock because of one_cpu_only()
On 05/04/2013 20:52, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote:

> The problem is quite difficult to reproduce (seen twice, unable to reproduce
> manually), but inspection of the kexec_crash() path shows a certain deadlock
> because of one_cpu_only().
>
> This patch replaces the use of one_cpu_only() with a recursive spinlock which
> provides exclusion between different CPUs racing into kexec_crash(), but
> prevents deadlock if the same CPU reenters kexec_crash() again.

Does kexec_crash() not mind being reentered in this way?

 -- Keir
Andrew Cooper
2013-Apr-06 15:03 UTC
Re: [PATCH] xen/kexec: Prevent deadlock because of one_cpu_only()
On 05/04/2013 22:21, Keir Fraser wrote:

> On 05/04/2013 20:52, "Andrew Cooper" <andrew.cooper3@citrix.com> wrote:
>
>> This patch replaces the use of one_cpu_only() with a recursive spinlock
>> which provides exclusion between different CPUs racing into kexec_crash(),
>> but prevents deadlock if the same CPU reenters kexec_crash() again.
>
> Does kexec_crash() not mind being reentered in this way?
>
> -- Keir

Not really. I first experimented with seeing whether it was possible to
detect reentrance and back out, but there are certainly cases with a
reentrant NMI/MCE where you can lose the state required to get back to the
outer instance of kexec_crash().

Furthermore, in the case of a crash it is possible that memory corruption
has occurred, meaning faults are more likely, leading to reentrance back
onto the kexec_crash() path. There are also BUG()s and ASSERT()s in the
current kexec_crash() path. Therefore, the sensible approach is to make
kexec_crash() safe to reenter, even if we make every attempt to minimise
the chances of reentrance.

Looking through the entire kexec_crash() path: console_start_sync() could
deadlock on the serial tx_lock if a fault occurs midway through the
function, and acpi_dmar_reinstate() will leave the DMAR table in a state
which fails the checksum. kexec_crash_save_cpu() might not completely write
the crash notes, but there are no other obvious problems.

I can probably fix the acpi_dmar_reinstate() and kexec_crash_save_cpu()
issues without too much trouble and will look into those in due course, but
my next task is to see whether there was a programmatic reason why the wait
descriptor was not executed successfully.

Given that on the crash path something has already gone wrong, I would
argue this patch on the merit of removing a deadlock without making the
situation any worse. I think it is impossible to have a kexec_crash() path
which is safely reentrant and guaranteed to work without issue; after all,
if we reenter too many times because of faults we will likely triple fault.

~Andrew
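[The behaviour Keir is asking about hinges on the semantics of
spin_lock_recursive(): a recursive lock records its owning CPU and a
recursion count, so re-acquisition by the owner succeeds while other CPUs
spin. A minimal standalone model of those semantics follows — the names are
hypothetical and the code is single-threaded, unlike Xen's real atomic
implementation:]

```c
#include <assert.h>

/* Hypothetical sketch of recursive-spinlock semantics, not Xen's real
 * spinlock: the owner field lets the same CPU re-acquire the lock
 * instead of deadlocking against itself. */
typedef struct {
    int owner;              /* CPU id of the holder, -1 when free */
    unsigned int recurse;   /* recursion depth of the holder */
} crash_lock_t;

#define CRASH_LOCK_INIT { -1, 0 }

/* Returns 1 if 'cpu' acquires (or re-acquires) the lock, 0 if it would
 * have to spin.  A real implementation takes ownership atomically. */
static int crash_lock_try(crash_lock_t *l, int cpu)
{
    if (l->owner == cpu) {      /* reentrance on the owning CPU */
        l->recurse++;
        return 1;
    }
    if (l->owner == -1) {       /* uncontended: take ownership */
        l->owner = cpu;
        l->recurse = 1;
        return 1;
    }
    return 0;                   /* held by another CPU: spin */
}
```

[Because kexec_crash() deliberately never unlocks crash_lock, the first CPU
through can re-enter freely after a nested fault, while any other CPU that
crashes concurrently parks on the lock — the same exclusion one_cpu_only()
provided, without the self-deadlock.]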
Jan Beulich
2013-Apr-08 07:48 UTC
Re: [PATCH] xen/kexec: Prevent deadlock because of one_cpu_only()
>>> On 06.04.13 at 17:03, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> I can probably fix the acpi_dmar_reinstate() and kexec_crash_save_cpu()
> issues without too much trouble and will look into those in due course,
> but my next task is to see whether there was a programmatic reason why
> the wait descriptor was not executed successfully.

Was that perhaps still without the Intel chipset interrupt remapping
errata workaround? These errata can - afaict - have exactly that panic
as a symptom.

Jan
Andrew Cooper
2013-Apr-08 09:47 UTC
Re: [PATCH] xen/kexec: Prevent deadlock because of one_cpu_only()
On 08/04/13 08:48, Jan Beulich wrote:
> Was that perhaps still without the Intel chipset interrupt remapping
> errata workaround? These errata can - afaict - have exactly that panic
> as a symptom.
>
> Jan

This is not from an affected system; it is Sandy Bridge-EP (and we have the
workaround in place anyway), although we did see these symptoms on the
errata-affected systems too. I am double-checking the errata docs, given
the similarity.

~Andrew