I have just started auditing the NMI path and found that the oprofile code calls into a fair amount of common code.

So far, down the first leg of the call graph, I have found several ASSERT()s, a BUG() and many {rd,wr}msr()s. Given that these are common code, and sensible in their places, removing them for the sake of being on the NMI path seems silly.

As an alternative, I suggest that we make ASSERT()s, BUG()s and WARN()s NMI/MCE safe, from a printk spinlock point of view.

Either we can modify the macros to do a console_force_unlock(), which is fine for BUG() and ASSERT(), but problematic for WARN() (and deferring the printing to a tasklet won't work if we want a stack trace). Alternatively, we could change the console lock to be a recursive lock, at which point it is safe from the deadlock point of view. Are there any performance concerns from changing to a recursive lock?

As for spinlocks themselves, as far as I can reason, recursive locks are safe to use, as are per-cpu spinlocks which are used exclusively in the NMI handler or MCE handler (but not both), given the proviso that we have C-level reentrance protection for do_{nmi,mce}().

For the {rd,wr}msr()s, we can assume that the Xen code is good and is not going to fault on access to the MSR, but we certainly can't guarantee this.

As a result, I do not think it is practical or indeed sensible to remove all possibility of faults from the NMI path (and MCE to a lesser extent).

Would it, however, be acceptable to change the console lock to a recursive lock, and rely on the Linux-inspired extended solution, which will correctly deal with some nested cases and panic verbosely in all other cases?

--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
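The recursive-lock option can be sketched as a small userspace model. This is a minimal sketch, not Xen's spinlock implementation: `struct rspinlock`, `rspin_trylock()`, `rspin_lock()`, `rspin_unlock()` and the `RS_*` macros are hypothetical names, and the caller passes its CPU number explicitly instead of reading a per-CPU variable. It also assumes that every nested NMI/MCE context releases whatever it acquired before returning.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical recursive spinlock (not Xen's implementation).
 * Lock word layout: bits 31..16 hold (owner CPU + 1), 0 meaning free;
 * bits 15..0 hold the nesting depth. */
#define RS_OWNER(w)         ((int)((w) >> 16) - 1)
#define RS_DEPTH(w)         ((w) & 0xffffu)
#define RS_MAKE(cpu, depth) (((uint32_t)((cpu) + 1) << 16) | (uint32_t)(depth))

struct rspinlock {
    _Atomic uint32_t word;   /* 0 == unlocked */
};

static inline bool rspin_trylock(struct rspinlock *l, int cpu)
{
    uint32_t w = atomic_load_explicit(&l->word, memory_order_relaxed);

    if (w != 0 && RS_OWNER(w) == cpu) {
        /* Re-entry on the owning CPU, e.g. an NMI that interrupted a
         * printk(). Only the owner writes the word while it holds the
         * lock, and (by assumption) a nested context releases everything
         * it took before returning, so a plain store suffices. */
        atomic_store_explicit(&l->word, RS_MAKE(cpu, RS_DEPTH(w) + 1),
                              memory_order_relaxed);
        return true;
    }

    /* Owner and depth are published by one CAS, so an NMI arriving on
     * this CPU never observes a half-initialised lock. */
    uint32_t expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->word, &expected, RS_MAKE(cpu, 1),
        memory_order_acquire, memory_order_relaxed);
}

static inline void rspin_lock(struct rspinlock *l, int cpu)
{
    while (!rspin_trylock(l, cpu))
        ;   /* spin; real code would add a pause/cpu_relax() hint */
}

static inline void rspin_unlock(struct rspinlock *l, int cpu)
{
    uint32_t w = atomic_load_explicit(&l->word, memory_order_relaxed);

    assert(w != 0 && RS_OWNER(w) == cpu);   /* caller must be the owner */
    if (RS_DEPTH(w) == 1)
        atomic_store_explicit(&l->word, 0, memory_order_release);
    else
        atomic_store_explicit(&l->word, RS_MAKE(cpu, RS_DEPTH(w) - 1),
                              memory_order_relaxed);
}
```

The point of packing owner and depth into one word is the acquisition window: a lock that sets the owner field first and the depth second leaves a moment in which an NMI on the same CPU would see itself as the owner with a stale depth.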
>>> On 04.12.12 at 21:04, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> As an alternative, I suggest that we make ASSERT()s, BUG()s and WARN()s
> NMI/MCE safe, from a printk spinlock point of view.
>
> Either we can modify the macros to do a console_force_unlock(), which is
> fine for BUG() and ASSERT(), but problematic for WARN() (and deferring
> the printing to a tasklet won't work if we want a stack trace).
> Alternatively, we could change the console lock to be a recursive lock,
> at which point it is safe from the deadlock point of view. Are there
> any performance concerns from changing to a recursive lock?

Not really, and the console lock isn't performance critical anyway.

> As for spinlocks themselves, as far as I can reason, recursive locks are
> safe to use, as are per-cpu spinlocks which are used exclusively in the
> NMI handler or MCE handler (but not both), given the proviso that we
> have C-level reentrance protection for do_{nmi,mce}().
>
> For the {rd,wr}msr()s, we can assume that the Xen code is good and is
> not going to fault on access to the MSR, but we certainly can't guarantee
> this.

{rd,wr}msr() are of no concern - if they fault, it's exactly like a #PF or #GP from a bad memory reference: a bug that will bring down the hypervisor. Their _safe counterparts are what need to be looked for, as there the fault is being recovered from (and it's this recovery's side effect of re-enabling NMIs that we don't want).

Jan
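Jan's distinction can be illustrated with a toy model of the x86 rule that, once an NMI is delivered, further NMIs are held off until the next IRET, whichever exception that IRET returns from. The structure and functions below are hypothetical; the model ignores the IST stack switch and the pending-NMI latch, and only tracks whether a second NMI could be taken while the first handler is still running.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one CPU's NMI-blocking state; not Xen code. */
struct cpu_model {
    bool nmi_blocked;   /* set when an NMI is delivered, cleared by any IRET */
    int  nmi_depth;     /* number of live NMI handler frames */
};

/* Hardware delivers an NMI: the handler runs with further NMIs blocked. */
static void deliver_nmi(struct cpu_model *c)
{
    c->nmi_blocked = true;
    c->nmi_depth++;
}

/* Any IRET lifts NMI blocking, not just the NMI handler's own IRET. */
static void iret(struct cpu_model *c)
{
    c->nmi_blocked = false;
}

/* A second NMI is taken only if blocking has already been lifted;
 * otherwise the hardware holds it pending. */
static bool try_deliver_nmi(struct cpu_model *c)
{
    if (c->nmi_blocked)
        return false;
    deliver_nmi(c);
    return true;
}

/* Hypothetical stand-in for a *msr_safe()-style access that faults
 * inside the NMI handler: the #GP handler applies a fixup and returns
 * with IRET, so the NMI handler carries on with NMIs unblocked. */
static int msr_read_safe_faulting(struct cpu_model *c)
{
    assert(c->nmi_depth > 0);   /* model only covers the in-NMI case */
    iret(c);                    /* return from the recovered #GP */
    return -1;                  /* failure reported to the caller */
}
```

A plain {rd,wr}msr() that faults never reaches that IRET in this picture, because the fault is fatal, which is why only the recoverable variants need auditing.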
At 20:04 +0000 on 04 Dec (1354651442), Andrew Cooper wrote:
> I have just started auditing the NMI path and found that the oprofile
> code calls into a fair amount of common code.
>
> So far, down the first leg of the call graph, I have found several
> ASSERT()s, a BUG() and many {rd,wr}msr()s. Given that these are common
> code, and sensible in their places, removing them for the sake of being
> on the NMI path seems silly.
>
> As an alternative, I suggest that we make ASSERT()s, BUG()s and WARN()s
> NMI/MCE safe, from a printk spinlock point of view.

WARN()s would need to be removed, since they involve a non-fatal fault.

> Either we can modify the macros to do a console_force_unlock(), which is
> fine for BUG() and ASSERT(), but problematic for WARN() (and deferring
> the printing to a tasklet won't work if we want a stack trace).
> Alternatively, we could change the console lock to be a recursive lock,
> at which point it is safe from the deadlock point of view.

It's only safe if the console lock is the _only_ lock that can be taken both in NMI/MCE context and in 'normal' IRQ context. Otherwise we'd end up with exactly the class of deadlocks we had before with IRQ/non-IRQ.

> For the {rd,wr}msr()s, we can assume that the Xen code is good and is
> not going to fault on access to the MSR, but we certainly can't guarantee
> this.

As Jan points out, it's *msr_safe() we need to worry about.

> As a result, I do not think it is practical or indeed sensible to remove
> all possibility of faults from the NMI path (and MCE to a lesser
> extent).

I'm not sure what the problem is -- the printk() locking issue is AFAICT unrelated to the nested-NMI one, and will have to be fixed separately from whatever we do for nested NMI. So AFAICT we have to audit for WARN()s and non-fatal printk()s in NMI/MCE code regardless.

Tim.
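Tim's condition can be illustrated with a toy wait-for model. Everything here is hypothetical (`struct model`, `acquire()`, `deadlocked()` and the two lock indices); it tracks only lock ownership and spinning, with recursion allowed as a recursive console lock would allow it. In the scenario, CPU 0 holds the console lock and CPU 1 holds some other lock in normal context, then an NMI on each CPU wants both locks. Recursion lets CPU 0's NMI re-take the console lock, but the two NMIs end up waiting on each other across CPUs, which recursion cannot help with.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS        2
#define NLOCKS       2
#define FREE         (-1)
#define CONSOLE_LOCK 0   /* hypothetical: the (recursive) console lock */
#define OTHER_LOCK   1   /* hypothetical: another lock also taken in NMI context */

/* Toy wait-for model: who owns each lock, and which lock (if any)
 * each CPU is currently spinning on. Not Xen code. */
struct model {
    int owner[NLOCKS];
    int waiting_for[NCPUS];
};

static void model_init(struct model *m)
{
    for (int i = 0; i < NLOCKS; i++)
        m->owner[i] = FREE;
    for (int i = 0; i < NCPUS; i++)
        m->waiting_for[i] = FREE;
}

/* Recursive acquire: succeeds if the lock is free or already held by
 * this CPU; otherwise the CPU starts spinning and the wait is recorded. */
static bool acquire(struct model *m, int cpu, int lock)
{
    if (m->owner[lock] == FREE || m->owner[lock] == cpu) {
        m->owner[lock] = cpu;
        return true;
    }
    m->waiting_for[cpu] = lock;
    return false;
}

/* In this model a spinning CPU never releases what it holds (when it is
 * spinning in an NMI handler, the code it interrupted cannot run to
 * release its locks either). Follow each CPU's chain of waits and
 * report whether it leads back to that CPU. */
static bool deadlocked(const struct model *m)
{
    for (int start = 0; start < NCPUS; start++) {
        int cpu = start;
        for (int steps = 0; steps < NCPUS; steps++) {
            int lock = m->waiting_for[cpu];
            if (lock == FREE)
                break;
            cpu = m->owner[lock];
            if (cpu == FREE)
                break;
            if (cpu == start)
                return true;
        }
    }
    return false;
}
```

With the console lock as the only lock involved, the same model shows no cycle: CPU 1 simply spins until CPU 0 has finished with the lock, which matches the case Tim describes as safe.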