I have just started auditing the NMI path and found that the oprofile
code calls into a fair amount of common code.
So far, down the first leg of the call graph, I have found several
ASSERT()s, a BUG() and many {rd,wr}msr()s. Given that these are in common
code, and sensible in their places, removing them for the sake of being
on the NMI path seems silly.
As an alternative, I suggest that we make ASSERT()s, BUG()s and WARN()s
NMI/MCE safe, from a printk spinlock point of view.
Either we can modify the macros to do a console_force_unlock(), which is
fine for BUG() and ASSERT(), but problematic for WARN() (and deferring
the printing to a tasklet won't work if we want a stack trace).
Alternatively, we could change the console lock to be a recursive lock,
at which point it is safe from the deadlock point of view. Are there
any performance concerns from changing to a recursive lock?
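To make the recursive-lock idea concrete, here is a minimal sketch using C11 atomics. It is not Xen's spinlock implementation; every name in it (rec_lock, rec_lock_acquire() and so on) is hypothetical. Owner and nesting depth are packed into one word so that a handler interrupting the owner always sees a consistent pair.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical recursive lock; names and layout are assumptions.
 * High 16 bits: owning CPU id + 1 (0 means free). Low 16 bits: depth.
 * Depth overflow past 16 bits is not handled in this sketch. */
#define REC_FREE 0u

struct rec_lock {
    _Atomic uint32_t word;
};

static inline uint32_t rec_pack(unsigned int cpu, unsigned int depth)
{
    return ((uint32_t)(cpu + 1) << 16) | depth;
}

static inline int rec_owner_is(uint32_t w, unsigned int cpu)
{
    return (w >> 16) == cpu + 1;
}

static inline unsigned int rec_depth(uint32_t w)
{
    return w & 0xffffu;
}

static inline void rec_lock_init(struct rec_lock *l)
{
    atomic_init(&l->word, REC_FREE);
}

static inline void rec_lock_acquire(struct rec_lock *l, unsigned int cpu)
{
    uint32_t cur = atomic_load_explicit(&l->word, memory_order_relaxed);

    if (rec_owner_is(cur, cpu)) {
        /* Nested acquisition on the owning CPU: only bump the depth.
         * No other CPU can change the word while we own it. */
        atomic_store_explicit(&l->word, cur + 1, memory_order_relaxed);
        return;
    }

    /* First acquisition: spin until the word goes from free to ours. */
    uint32_t expected = REC_FREE;
    while (!atomic_compare_exchange_weak_explicit(&l->word, &expected,
                                                  rec_pack(cpu, 1),
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
        expected = REC_FREE;
}

static inline void rec_lock_release(struct rec_lock *l, unsigned int cpu)
{
    uint32_t cur = atomic_load_explicit(&l->word, memory_order_relaxed);

    assert(rec_owner_is(cur, cpu) && rec_depth(cur) > 0);
    /* Drop one level; the outermost release frees the lock. */
    atomic_store_explicit(&l->word,
                          rec_depth(cur) == 1 ? REC_FREE : cur - 1,
                          memory_order_release);
}
```

In this sketch the first acquisition still costs one compare-and-swap, as with a plain lock, and a nested acquisition costs a load and a store.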
As for spinlocks themselves, as far as I can reason, recursive locks are
safe to use, as are per-cpu spinlocks which are used exclusively in the
NMI handler or MCE handler (but not both), given the proviso that we
have C level reentrance protection for do_{nmi,mce}().
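As a rough illustration of what I mean by C level reentrance protection, the sketch below guards a handler with a per-cpu flag. It is an assumption-laden toy, not the real do_nmi(): NR_CPUS, in_nmi[], do_nmi_sketch() and the test handler are all hypothetical names, and a real implementation would panic or latch the event rather than just count it.

```c
#include <stdbool.h>

#define NR_CPUS 4 /* hypothetical; chosen only for the sketch */

/* Per-cpu "handler is running" flags and a count of rejected re-entries. */
static volatile bool in_nmi[NR_CPUS];
static unsigned int nested_nmi_seen[NR_CPUS];

/* Returns true if this is the outermost entry on this CPU. */
static bool nmi_enter(unsigned int cpu)
{
    if (in_nmi[cpu]) {
        nested_nmi_seen[cpu]++;
        return false;
    }
    in_nmi[cpu] = true;
    return true;
}

static void nmi_exit(unsigned int cpu)
{
    in_nmi[cpu] = false;
}

/* Stand-in for do_nmi(): refuse to run the body re-entrantly. */
static void do_nmi_sketch(unsigned int cpu, void (*handler)(unsigned int))
{
    if (!nmi_enter(cpu))
        return; /* real code would panic verbosely or defer here */
    handler(cpu);
    nmi_exit(cpu);
}

/* Test handler that simulates a nested NMI arriving mid-handler. */
static unsigned int handler_runs;

static void reentering_handler(unsigned int cpu)
{
    handler_runs++;
    do_nmi_sketch(cpu, reentering_handler);
}
```

With this guard in place, a per-cpu lock taken only inside the handler body can never be requested twice on the same CPU.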
For the {rd,wr}msr()s, we can assume that the Xen code is good and is
not going to fault on access to the MSR, but we certainly can't guarantee
this.
As a result, I do not think it is practical or indeed sensible to remove
all possibility of faults from the NMI path (and MCE to a lesser
extent). Would it however be acceptable to change the console lock to a
recursive lock, and rely on the Linux-inspired extended solution which
will correctly deal with some nested cases, and panic verbosely in all
other cases?
--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
>>> On 04.12.12 at 21:04, Andrew Cooper <andrew.cooper3@citrix.com> wrote:
> As an alternative, I suggest that we make ASSERT()s, BUG()s and WARN()s
> NMI/MCE safe, from a printk spinlock point of view.
>
> Either we can modify the macros to do a console_force_unlock(), which is
> fine for BUG() and ASSERT(), but problematic for WARN() (and deferring
> the printing to a tasklet won't work if we want a stack trace).
> Alternatively, we could change the console lock to be a recursive lock,
> at which point it is safe from the deadlock point of view. Are there
> any performance concerns from changing to a recursive lock?

Not really, and the console lock isn't performance critical anyway.

> As for spinlocks themselves, as far as I can reason, recursive locks are
> safe to use, as are per-cpu spinlocks which are used exclusively in the
> NMI handler or MCE handler (but not both), given the proviso that we
> have C level reentrance protection for do_{nmi,mce}().
>
> For the {rd,wr}msr()s, we can assume that the Xen code is good and is
> not going to fault on access to the MSR, but we certainly can't guarantee
> this.

{rd,wr}msr() are of no concern - if they fault it's exactly like a #PF
or #GP from a bad memory reference: a bug that will bring down the
hypervisor. Their _safe counterparts are what needs to be looked for,
as there the fault is being recovered from (and it's this recovery's
side effect of re-enabling NMIs that we don't want).

Jan
At 20:04 +0000 on 04 Dec (1354651442), Andrew Cooper wrote:
> I have just started auditing the NMI path and found that the oprofile
> code calls into a fair amount of common code.
>
> So far, down the first leg of the call graph, I have found several
> ASSERT()s, a BUG() and many {rd,wr}msr()s. Given that these are common
> code, and sensible in their places, removing them for the sake of being
> on the NMI path seems silly.
>
> As an alternative, I suggest that we make ASSERT()s, BUG()s and WARN()s
> NMI/MCE safe, from a printk spinlock point of view.

WARN()s would need to be removed, since they involve a non-fatal fault.

> Either we can modify the macros to do a console_force_unlock(), which is
> fine for BUG() and ASSERT(), but problematic for WARN() (and deferring
> the printing to a tasklet won't work if we want a stack trace).
> Alternatively, we could change the console lock to be a recursive lock,
> at which point it is safe from the deadlock point of view.

It's only safe if the console lock is the _only_ lock that can be taken
both in NMI/MCE context and in 'normal' IRQ context. Otherwise we'd end
up with exactly the class of deadlocks we had before with IRQ/non-IRQ.

> For the {rd,wr}msr()s, we can assume that the Xen code is good and is
> not going to fault on access to the MSR, but we certainly can't guarantee
> this.

As Jan points out, it's *msr_safe() we need to worry about.

> As a result, I do not think it is practical or indeed sensible to remove
> all possibility of faults from the NMI path (and MCE to a lesser
> extent).

I'm not sure what the problem is -- the printk() locking issue is
AFAICT unrelated to the nested-NMI one, and will have to be fixed
separately from whatever we do for nested NMI. So AFAICT we have to
audit for WARN()s and non-fatal printk()s in NMI/MCE code regardless.

Tim.