Jürgen Groß
2020-Aug-11 08:38 UTC
[PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
On 11.08.20 10:12, Peter Zijlstra wrote:> On Tue, Aug 11, 2020 at 09:57:55AM +0200, J?rgen Gro? wrote: >> On 11.08.20 09:41, Peter Zijlstra wrote: >>> On Fri, Aug 07, 2020 at 05:19:03PM +0200, Marco Elver wrote: >>> >>>> My hypothesis here is simply that kvm_wait() may be called in a place >>>> where we get the same case I mentioned to Peter, >>>> >>>> raw_local_irq_save(); /* or other IRQs off without tracing */ >>>> ... >>>> kvm_wait() /* IRQ state tracing gets confused */ >>>> ... >>>> raw_local_irq_restore(); >>>> >>>> and therefore, using raw variants in kvm_wait() works. It's also safe >>>> because it doesn't call any other libraries that would result in corrupt >>> >>> Yes, this is definitely an issue. >>> >>> Tracing, we also musn't call into tracing when using raw_local_irq_*(). >>> Because then we re-intoduce this same issue all over again. >>> >>> Both halt() and safe_halt() are more paravirt calls, but given we're in >>> a KVM paravirt call already, I suppose we can directly use native_*() >>> here. >>> >>> Something like so then... I suppose, but then the Xen variants need TLC >>> too. >> >> Just to be sure I understand you correct: >> >> You mean that xen_qlock_kick() and xen_qlock_wait() and all functions >> called by those should gain the "notrace" attribute, right? >> >> I am not sure why the kick variants need it, though. IMO those are >> called only after the lock has been released, so they should be fine >> without notrace. > > The issue happens when someone uses arch_spinlock_t under > raw_local_irq_*(). > >> And again: we shouldn't forget the Hyper-V variants. > > Bah, my grep failed :/ Also *groan*, that's calling apic->send_IPI().In case you don't want to do it I can send the patch for the Xen variants. Juergen
peterz at infradead.org
2020-Aug-11 09:20 UTC
[PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
On Tue, Aug 11, 2020 at 10:38:50AM +0200, J?rgen Gro? wrote:> In case you don't want to do it I can send the patch for the Xen > variants.I might've opened a whole new can of worms here. I'm not sure we can/want to fix the entire fallout this release :/ Let me ponder this a little, because the more I look at things, the more problems I keep finding... bah bah bah.
peterz at infradead.org
2020-Aug-11 09:46 UTC
[PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
On Tue, Aug 11, 2020 at 11:20:54AM +0200, peterz at infradead.org wrote:> On Tue, Aug 11, 2020 at 10:38:50AM +0200, J?rgen Gro? wrote: > > In case you don't want to do it I can send the patch for the Xen > > variants. > > I might've opened a whole new can of worms here. I'm not sure we > can/want to fix the entire fallout this release :/ > > Let me ponder this a little, because the more I look at things, the more > problems I keep finding... bah bah bah.That is, most of these irq-tracking problem are new because commit: 859d069ee1dd ("lockdep: Prepare for NMI IRQ state tracking") changed irq-tracking to ignore the lockdep recursion count. This then allows: lock_acquire() raw_local_irq_save(); current->lockdep_recursion++; trace_lock_acquire() ... tracing ... #PF under raw_local_irq_*() __lock_acquire() arch_spin_lock(&graph_lock) pv-spinlock-wait() local_irq_save() under raw_local_irq_*() However afaict that just made a bad situation worse. There already were issues, take for example: trace_clock_global() raw_local_irq_save(); arch_spin_lock() pv-spinlock-wait local_irq_save() And that has no lockdep_recursion to 'save' the say. The tracing recursion does however avoid some of the obvious fails there, like trace_clock calling into paravirt which then calls back into tracing. But still, that would've caused IRQ tracking problems even with the old code. And in that respect, this is all the exact same problem as that other set of patches has ( 20200807192336.405068898 at infradead.org ). Now, on the flip side, it does find actual problems, the trace_lock_*() things were using RCU in RCU-disabled code, and here I found that trace_clock_global() thinkg (and I suspect there's more of that). But at this point I'm not entirelty sure how best to proceed... tracing uses arch_spinlock_t, which means all spinlock implementations should be notrace, but then that drops into paravirt and all hell breaks loose because Hyper-V then calls into the APIC code etc.. etc.. At that rate we'll have the entire kernel marked notrace, and I'm fairly sure that's not a solution either. So let me once again see if I can't find a better solution for this all. Clearly it needs one :/
Reasonably Related Threads
- [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
- [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
- [PATCH] xen/events: xen/events: fix RCU warning
- [PATCH] xen/events: xen/events: fix RCU warning
- [Bug 100691] New: [4.10] BUG: KASAN: use-after-free in drm_calc_vbltimestamp_from_scanoutpos+0x625/0x740