Andrew Cooper
2012-Dec-05 18:26 UTC
Recursive locking in Xen (in reference to NMI/MCE path audit)
Hello, While auditing the NMI/MCE paths, I have encountered some issues with recursive locking in Xen, discovered by the misuse of the console_lock intermittently as a regular lock and as a recursive lock. The comment in spinlock.h is unclear as to whether mixing recursive and non recursive calls on the same spinlock is valid. If the calls are genuinely not valid, then surely regular spinlocks and recursive spinlocks should be separate types to let the compiler work for us. If mixing calls are valid, then there appear to be problems with nesting recursive and regular calls, as either ordering of spin_lock and spin_lock_recursive will deadlock. As a result, I am wondering which of the above to fix? There are very few users of recursive locks (domain lock, domain page_alloc lock, mm (pod and paging) locks and console lock). The console and page_alloc locks appear to have mixed callers, while the domain and mm locks appear to have strictly recursive callers. It seems to me that either we need to make the two locks different types, or use ASSERT()s to ensure we dont next spin_lock() and spin_lock_recursive() calls. In addition to the above problems, I find myself needing to implement spin_lock_recursive_irq{,save,restore}() variants. The implementations themselves are not too hard to do, but I did wonder whether we might want to have extra ASSERT()s to remove potential deadlock scenarios from the NMI/MCE paths. The ASSERT()s would have to be along the lines of "assert this is exclusively a recursive lock" or "assert this is a per-cpu regular spinlock which is never referenced outside of this specific NMI/MCE path". The possible implementation of these pseudo-asserts would differ depending on the outcome of the query. Any other comment or queries? -- Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer T: +44 (0)1223 225 900, http://www.citrix.com
Tim Deegan
2012-Dec-06 10:39 UTC
Re: Recursive locking in Xen (in reference to NMI/MCE path audit)
At 18:26 +0000 on 05 Dec (1354731981), Andrew Cooper wrote:> Hello, > > While auditing the NMI/MCE paths, I have encountered some issues with > recursive locking in Xen, discovered by the misuse of the console_lock > intermittently as a regular lock and as a recursive lock. > > The comment in spinlock.h is unclear as to whether mixing recursive and > non recursive calls on the same spinlock is valid. If the calls are > genuinely not valid, then surely regular spinlocks and recursive > spinlocks should be separate types to let the compiler work for us.Seems like a good idea.> If mixing calls are valid, then there appear to be problems with nesting > recursive and regular calls, as either ordering of spin_lock and > spin_lock_recursive will deadlock.Yes. But paths that know they will not need to recurse can safely use the non-recursive lock op. The shadow code used to do this sort of thing (with a better failure mode) to explicitly catch recursive paths that weren''t intended.> As a result, I am wondering which of the above to fix? > > There are very few users of recursive locks (domain lock, domain > page_alloc lock, mm (pod and paging) locks and console lock). The > console and page_alloc locks appear to have mixed callers, while the > domain and mm locks appear to have strictly recursive callers.Please don''t touch the mm locks! Any code that uses them in NMI or MCE handlers is in a bad way already. :)> It seems to me that either we need to make the two locks different > types, or use ASSERT()s to ensure we dont next spin_lock() and > spin_lock_recursive() calls.I''d be happy with different types, unless there are cases where we care about the speed of extra operations on the page_alloc or domain locks.> In addition to the above problems, I find myself needing to implement > spin_lock_recursive_irq{,save,restore}() variants. The implementations > themselves are not too hard to do, but I did wonder whether we might > want to have extra ASSERT()s to remove potential deadlock scenarios from > the NMI/MCE paths. The ASSERT()s would have to be along the lines of > "assert this is exclusively a recursive lock" or "assert this is a > per-cpu regular spinlock which is never referenced outside of this > specific NMI/MCE path". The possible implementation of these > pseudo-asserts would differ depending on the outcome of the query.Have you looked at the existing lock debugging that Keir put in to do exactly this for normal vs IRQ? Can it be trivially extended? Tim.
Possibly Parallel Threads
- Audit of NMI and MCE paths
- [PATCH V5] x86/kexec: Change NMI and MCE handling on kexec path
- [PATCH] x86/hvm: don't give vector callback higher priority than NMI/MCE
- [PATCH V3] vmx/nmi: Do not use self_nmi() in VMEXIT handler
- [PATCH v2] xen/console: buffer and show origin of guest PV writes