peterz at infradead.org
2020-Aug-11 09:46 UTC
[PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
On Tue, Aug 11, 2020 at 11:20:54AM +0200, peterz at infradead.org wrote:
> On Tue, Aug 11, 2020 at 10:38:50AM +0200, Jürgen Groß wrote:
> > In case you don't want to do it I can send the patch for the Xen
> > variants.
>
> I might've opened a whole new can of worms here. I'm not sure we
> can/want to fix the entire fallout this release :/
>
> Let me ponder this a little, because the more I look at things, the more
> problems I keep finding... bah bah bah.

That is, most of these irq-tracking problems are new because commit:

  859d069ee1dd ("lockdep: Prepare for NMI IRQ state tracking")

changed irq-tracking to ignore the lockdep recursion count. This then
allows:

	lock_acquire()
	  raw_local_irq_save();
	  current->lockdep_recursion++;
	  trace_lock_acquire()
	    ... tracing ...
	    #PF under raw_local_irq_*()
	  __lock_acquire()
	    arch_spin_lock(&graph_lock)
	      pv-spinlock-wait()
	        local_irq_save() under raw_local_irq_*()

However, afaict that just made a bad situation worse. There already were
issues, take for example:

	trace_clock_global()
	  raw_local_irq_save();
	  arch_spin_lock()
	    pv-spinlock-wait
	      local_irq_save()

And that has no lockdep_recursion to 'save' the day. The tracing
recursion does however avoid some of the obvious fails there, like
trace_clock calling into paravirt which then calls back into tracing.

But still, that would've caused IRQ tracking problems even with the old
code. And in that respect, this is all the exact same problem as that
other set of patches has (20200807192336.405068898 at infradead.org).

Now, on the flip side, it does find actual problems: the trace_lock_*()
things were using RCU in RCU-disabled code, and here I found that
trace_clock_global() thing (and I suspect there's more of that).

But at this point I'm not entirely sure how best to proceed... tracing
uses arch_spinlock_t, which means all spinlock implementations should be
notrace, but then that drops into paravirt and all hell breaks loose
because Hyper-V then calls into the APIC code etc.. etc..
At that rate we'll have the entire kernel marked notrace, and I'm fairly sure that's not a solution either. So let me once again see if I can't find a better solution for this all. Clearly it needs one :/
peterz at infradead.org
2020-Aug-11 20:17 UTC
[PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
On Tue, Aug 11, 2020 at 11:46:51AM +0200, peterz at infradead.org wrote:
> So let me once again see if I can't find a better solution for this all.
> Clearly it needs one :/

So the below boots without triggering the debug code from Marco -- it
should allow nesting local_irq_save/restore under raw_local_irq_*().

I tried unconditional counting, but there's some _really_ wonky /
asymmetric code that wrecks that and I've not been able to come up with
anything useful.

This one starts counting when local_irq_save() finds it didn't disable
IRQs while lockdep thought it did. At that point, local_irq_restore()
will decrement and enable things again when it reaches 0.

This assumes local_irq_save()/local_irq_restore() are nested sane, which
is mostly true.

This leaves #PF, which I fixed in these other patches, but I realized it
needs fixing for all architectures :-( No bright ideas there yet.

---
 arch/x86/entry/thunk_32.S       |  5 ----
 include/linux/irqflags.h        | 45 +++++++++++++++++++-------------
 init/main.c                     | 16 ++++++++++++
 kernel/locking/lockdep.c        | 58 +++++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace_preemptirq.c | 33 +++++++++++++++++++++++
 5 files changed, 134 insertions(+), 23 deletions(-)

diff --git a/arch/x86/entry/thunk_32.S b/arch/x86/entry/thunk_32.S
index 3a07ce3ec70b..f1f96d4d8cd6 100644
--- a/arch/x86/entry/thunk_32.S
+++ b/arch/x86/entry/thunk_32.S
@@ -29,11 +29,6 @@ SYM_CODE_START_NOALIGN(\name)
 SYM_CODE_END(\name)
 .endm
 
-#ifdef CONFIG_TRACE_IRQFLAGS
-	THUNK trace_hardirqs_on_thunk,trace_hardirqs_on_caller,1
-	THUNK trace_hardirqs_off_thunk,trace_hardirqs_off_caller,1
-#endif
-
 #ifdef CONFIG_PREEMPTION
 	THUNK preempt_schedule_thunk, preempt_schedule
 	THUNK preempt_schedule_notrace_thunk, preempt_schedule_notrace
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index bd5c55755447..67e2a16d3846 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -23,12 +23,16 @@
 extern void lockdep_hardirqs_on_prepare(unsigned long ip);
 extern void lockdep_hardirqs_on(unsigned long ip);
 extern void lockdep_hardirqs_off(unsigned long ip);
+extern void lockdep_hardirqs_save(unsigned long ip, unsigned long flags);
+extern void lockdep_hardirqs_restore(unsigned long ip, unsigned long flags);
 #else
 static inline void lockdep_softirqs_on(unsigned long ip) { }
 static inline void lockdep_softirqs_off(unsigned long ip) { }
 static inline void lockdep_hardirqs_on_prepare(unsigned long ip) { }
 static inline void lockdep_hardirqs_on(unsigned long ip) { }
 static inline void lockdep_hardirqs_off(unsigned long ip) { }
+static inline void lockdep_hardirqs_save(unsigned long ip, unsigned long flags) { }
+static inline void lockdep_hardirqs_restore(unsigned long ip, unsigned long flags) { }
 #endif
 
 #ifdef CONFIG_TRACE_IRQFLAGS
@@ -49,10 +53,13 @@ struct irqtrace_events {
 DECLARE_PER_CPU(int, hardirqs_enabled);
 DECLARE_PER_CPU(int, hardirq_context);
 
-  extern void trace_hardirqs_on_prepare(void);
-  extern void trace_hardirqs_off_finish(void);
-  extern void trace_hardirqs_on(void);
-  extern void trace_hardirqs_off(void);
+extern void trace_hardirqs_on_prepare(void);
+extern void trace_hardirqs_off_finish(void);
+extern void trace_hardirqs_on(void);
+extern void trace_hardirqs_off(void);
+extern void trace_hardirqs_save(unsigned long flags);
+extern void trace_hardirqs_restore(unsigned long flags);
+
 # define lockdep_hardirq_context()	(this_cpu_read(hardirq_context))
 # define lockdep_softirq_context(p)	((p)->softirq_context)
 # define lockdep_hardirqs_enabled()	(this_cpu_read(hardirqs_enabled))
@@ -120,17 +127,19 @@ do {						\
 #else
 # define trace_hardirqs_on_prepare()		do { } while (0)
 # define trace_hardirqs_off_finish()		do { } while (0)
-# define trace_hardirqs_on()		do { } while (0)
-# define trace_hardirqs_off()		do { } while (0)
-# define lockdep_hardirq_context()	0
-# define lockdep_softirq_context(p)	0
-# define lockdep_hardirqs_enabled()	0
-# define lockdep_softirqs_enabled(p)	0
-# define lockdep_hardirq_enter()	do { } while (0)
-# define lockdep_hardirq_threaded()	do { } while (0)
-# define lockdep_hardirq_exit()		do { } while (0)
-# define lockdep_softirq_enter()	do { } while (0)
-# define lockdep_softirq_exit()		do { } while (0)
+# define trace_hardirqs_on()			do { } while (0)
+# define trace_hardirqs_off()			do { } while (0)
+# define trace_hardirqs_save(flags)		do { } while (0)
+# define trace_hardirqs_restore(flags)		do { } while (0)
+# define lockdep_hardirq_context()		0
+# define lockdep_softirq_context(p)		0
+# define lockdep_hardirqs_enabled()		0
+# define lockdep_softirqs_enabled(p)		0
+# define lockdep_hardirq_enter()		do { } while (0)
+# define lockdep_hardirq_threaded()		do { } while (0)
+# define lockdep_hardirq_exit()			do { } while (0)
+# define lockdep_softirq_enter()		do { } while (0)
+# define lockdep_softirq_exit()			do { } while (0)
 # define lockdep_hrtimer_enter(__hrtimer)	false
 # define lockdep_hrtimer_exit(__context)	do { } while (0)
 # define lockdep_posixtimer_enter()		do { } while (0)
@@ -185,18 +194,18 @@ do {						\
 	do { trace_hardirqs_on(); raw_local_irq_enable(); } while (0)
 #define local_irq_disable() \
 	do { raw_local_irq_disable(); trace_hardirqs_off(); } while (0)
+
 #define local_irq_save(flags)				\
 	do {						\
 		raw_local_irq_save(flags);		\
-		trace_hardirqs_off();			\
+		trace_hardirqs_save(flags);		\
 	} while (0)
-
 #define local_irq_restore(flags)			\
 	do {						\
 		if (raw_irqs_disabled_flags(flags)) {	\
 			raw_local_irq_restore(flags);	\
-			trace_hardirqs_off();		\
+			trace_hardirqs_restore(flags);	\
 		} else {				\
 			trace_hardirqs_on();		\
 			raw_local_irq_restore(flags);	\
diff --git a/init/main.c b/init/main.c
index 15bd0efff3df..0873319dcff4 100644
--- a/init/main.c
+++ b/init/main.c
@@ -1041,6 +1041,22 @@ asmlinkage __visible void __init start_kernel(void)
 	sfi_init_late();
 	kcsan_init();
 
+	/* DEBUG CODE */
+	lockdep_assert_irqs_enabled(); /* Pass. */
+	{
+		unsigned long flags1;
+		raw_local_irq_save(flags1);
+		{
+			unsigned long flags2;
+			lockdep_assert_irqs_enabled(); /* Pass - expectedly blind. */
+			local_irq_save(flags2);
+			lockdep_assert_irqs_disabled(); /* Pass. */
+			local_irq_restore(flags2);
+		}
+		raw_local_irq_restore(flags1);
+	}
+	lockdep_assert_irqs_enabled(); /* FAIL! */
+
 	/* Do the rest non-__init'ed, we're now alive */
 	arch_call_rest_init();
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 3617abb08e31..2c88574b817c 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3763,6 +3763,30 @@ void noinstr lockdep_hardirqs_on(unsigned long ip)
 }
 EXPORT_SYMBOL_GPL(lockdep_hardirqs_on);
 
+static DEFINE_PER_CPU(int, hardirqs_disabled);
+
+void lockdep_hardirqs_restore(unsigned long ip, unsigned long flags)
+{
+	if (unlikely(!debug_locks))
+		return;
+
+	if (in_nmi()) {
+		if (!IS_ENABLED(CONFIG_TRACE_IRQFLAGS_NMI))
+			return;
+	} else if (current->lockdep_recursion & LOCKDEP_RECURSION_MASK)
+		return;
+
+	if (__this_cpu_read(hardirqs_disabled) &&
+	    __this_cpu_dec_return(hardirqs_disabled) == 0) {
+
+		lockdep_hardirqs_on_prepare(ip);
+		lockdep_hardirqs_on(ip);
+	} else {
+		lockdep_hardirqs_off(ip);
+	}
+}
+EXPORT_SYMBOL_GPL(lockdep_hardirqs_restore);
+
 /*
  * Hardirqs were disabled:
  */
@@ -3805,6 +3829,40 @@ void noinstr lockdep_hardirqs_off(unsigned long ip)
 }
 EXPORT_SYMBOL_GPL(lockdep_hardirqs_off);
 
+void lockdep_hardirqs_save(unsigned long ip, unsigned long flags)
+{
+	if (unlikely(!debug_locks))
+		return;
+
+	/*
+	 * Matching lockdep_hardirqs_on(), allow NMIs in the middle of lockdep;
+	 * they will restore the software state. This ensures the software
+	 * state is consistent inside NMIs as well.
+	 */
+	if (in_nmi()) {
+		if (!IS_ENABLED(CONFIG_TRACE_IRQFLAGS_NMI))
+			return;
+	} else if (current->lockdep_recursion & LOCKDEP_RECURSION_MASK)
+		return;
+
+	/*
+	 * If IRQs were disabled, but IRQ tracking state said enabled we
+	 * 'missed' an update (or are nested inside raw_local_irq_*()) and
+	 * cannot rely on IRQ state to tell us when we should enable again.
+	 *
+	 * Count our way through this.
+	 */
+	if (raw_irqs_disabled_flags(flags)) {
+		if (__this_cpu_read(hardirqs_enabled)) {
+			WARN_ON_ONCE(__this_cpu_read(hardirqs_disabled) != 0);
+			__this_cpu_write(hardirqs_disabled, 1);
+		} else if (__this_cpu_read(hardirqs_disabled))
+			__this_cpu_inc(hardirqs_disabled);
+	}
+	lockdep_hardirqs_off(ip);
+}
+EXPORT_SYMBOL_GPL(lockdep_hardirqs_save);
+
 /*
  * Softirqs will be enabled:
  */
diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
index f10073e62603..080deaa1d9c9 100644
--- a/kernel/trace/trace_preemptirq.c
+++ b/kernel/trace/trace_preemptirq.c
@@ -85,6 +85,36 @@ void trace_hardirqs_off(void)
 EXPORT_SYMBOL(trace_hardirqs_off);
 NOKPROBE_SYMBOL(trace_hardirqs_off);
 
+void trace_hardirqs_save(unsigned long flags)
+{
+	lockdep_hardirqs_save(CALLER_ADDR0, flags);
+
+	if (!this_cpu_read(tracing_irq_cpu)) {
+		this_cpu_write(tracing_irq_cpu, 1);
+		tracer_hardirqs_off(CALLER_ADDR0, CALLER_ADDR1);
+		if (!in_nmi())
+			trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+	}
+}
+EXPORT_SYMBOL(trace_hardirqs_save);
+NOKPROBE_SYMBOL(trace_hardirqs_save);
+
+void trace_hardirqs_restore(unsigned long flags)
+{
+	if (this_cpu_read(tracing_irq_cpu)) {
+		if (!in_nmi())
+			trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
+		tracer_hardirqs_on(CALLER_ADDR0, CALLER_ADDR1);
+		this_cpu_write(tracing_irq_cpu, 0);
+	}
+
+	lockdep_hardirqs_restore(CALLER_ADDR0, flags);
+}
+EXPORT_SYMBOL(trace_hardirqs_restore);
+NOKPROBE_SYMBOL(trace_hardirqs_restore);
+
+#ifdef __s390__
+
 __visible void trace_hardirqs_on_caller(unsigned long caller_addr)
 {
 	if (this_cpu_read(tracing_irq_cpu)) {
@@ -113,6 +143,9 @@ __visible void trace_hardirqs_off_caller(unsigned long caller_addr)
 }
 EXPORT_SYMBOL(trace_hardirqs_off_caller);
 NOKPROBE_SYMBOL(trace_hardirqs_off_caller);
+
+#endif /* __s390__ */
+
 #endif /* CONFIG_TRACE_IRQFLAGS */
 
 #ifdef CONFIG_TRACE_PREEMPT_TOGGLE
peterz at infradead.org
2020-Aug-12 08:18 UTC
[PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
On Wed, Aug 12, 2020 at 10:06:50AM +0200, Marco Elver wrote:
> On Tue, Aug 11, 2020 at 10:17PM +0200, peterz at infradead.org wrote:
> > On Tue, Aug 11, 2020 at 11:46:51AM +0200, peterz at infradead.org wrote:
> >
> > > So let me once again see if I can't find a better solution for this all.
> > > Clearly it needs one :/
> >
> > So the below boots without triggering the debug code from Marco -- it
> > should allow nesting local_irq_save/restore under raw_local_irq_*().
> >
> > I tried unconditional counting, but there's some _really_ wonky /
> > asymmetric code that wrecks that and I've not been able to come up with
> > anything useful.
> >
> > This one starts counting when local_irq_save() finds it didn't disable
> > IRQs while lockdep thought it did. At that point, local_irq_restore()
> > will decrement and enable things again when it reaches 0.
> >
> > This assumes local_irq_save()/local_irq_restore() are nested sane, which
> > is mostly true.
> >
> > This leaves #PF, which I fixed in these other patches, but I realized it
> > needs fixing for all architectures :-( No bright ideas there yet.
> >
> > ---
> >  arch/x86/entry/thunk_32.S       |  5 ----
> >  include/linux/irqflags.h        | 45 +++++++++++++++++++-------------
> >  init/main.c                     | 16 ++++++++++++
> >  kernel/locking/lockdep.c        | 58 +++++++++++++++++++++++++++++++++++++++++
> >  kernel/trace/trace_preemptirq.c | 33 +++++++++++++++++++++++
> >  5 files changed, 134 insertions(+), 23 deletions(-)
>
> Testing this again with syzkaller produced some new reports:
>
>  BUG: stack guard page was hit in error_entry
>  BUG: stack guard page was hit in exc_int3
>  PANIC: double fault in error_entry
>  PANIC: double fault in exc_int3
>
> Most of them have corrupted reports, but this one might be useful:
>
> BUG: stack guard page was hit at 000000001fab0982 (stack is 00000000063f33dc..00000000bf04b0d8)
> BUG: stack guard page was hit at 00000000ca97ac69 (stack is 00000000af3e6c84..000000001597e1bf)
> kernel stack overflow (double-fault): 0000 [#1] PREEMPT SMP
> CPU: 1 PID: 4709 Comm: kworker/1:1H Not tainted 5.8.0+ #5
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1 04/01/2014
> Workqueue: events_highpri snd_vmidi_output_work
> RIP: 0010:exc_int3+0x5/0xf0 arch/x86/kernel/traps.c:636
> Code: c9 85 4d 89 e8 31 c0 e8 a9 7d 68 fd e9 90 fe ff ff e8 0f 35 00 00 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 55 53 48 89 fb <e8> 76 0e 00 00 85 c0 74 03 5b 5d c3 f6 83 88 00 00 00 03 74 7e 48
> RSP: 0018:ffffc90008114000 EFLAGS: 00010083
> RAX: 0000000084e00e17 RBX: ffffc90008114018 RCX: ffffffff84e00e17
> RDX: 0000000000000000 RSI: ffffffff84e00a39 RDI: ffffc90008114018
> RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
> FS:  0000000000000000(0000) GS:ffff88807dc80000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: ffffc90008113ff8 CR3: 000000002dae4006 CR4: 0000000000770ee0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> PKRU: 00000000
> Call Trace:
>  asm_exc_int3+0x31/0x40 arch/x86/include/asm/idtentry.h:537
> RIP: 0010:arch_static_branch include/trace/events/preemptirq.h:40 [inline]
> RIP: 0010:static_key_false include/linux/jump_label.h:200 [inline]
> RIP: 0010:trace_irq_enable_rcuidle+0xd/0x120 include/trace/events/preemptirq.h:40
> Code: 24 08 48 89 df e8 43 8d ef ff 48 89 df 5b e9 4a 2e 99 03 66 2e 0f 1f 84 00 00 00 00 00 55 41 56 53 48 89 fb e8 84 1a fd ff cc <1f> 44 00 00 5b 41 5e 5d c3 65 8b 05 ab 74 c3 7e 89 c0 31 f6 48 0f
> RSP: 0018:ffffc900081140f8 EFLAGS: 00000093
> RAX: ffffffff813d9e8c RBX: ffffffff81314dd3 RCX: ffff888076ce6000
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff81314dd3
> RBP: 0000000000000000 R08: ffffffff813da3d4 R09: 0000000000000001
> R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
> R13: 0000000000000082 R14: 0000000000000000 R15: ffff888076ce6000
>  trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
>  rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
>  trace_irq_enable_rcuidle+0x87/0x120 include/trace/events/preemptirq.h:40
>  trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
>  rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
>  trace_irq_enable_rcuidle+0x87/0x120 include/trace/events/preemptirq.h:40
>  trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
>  rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
>  trace_irq_enable_rcuidle+0x87/0x120 include/trace/events/preemptirq.h:40
>  trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
>  rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
>
> <... repeated many many times ...>
>
>  trace_irq_enable_rcuidle+0x87/0x120 include/trace/events/preemptirq.h:40
>  trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
>  rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
> Lost 500 message(s)!
> BUG: stack guard page was hit at 00000000cab483ba (stack is 00000000b1442365..00000000c26f9ad3)
> BUG: stack guard page was hit at 00000000318ff8d8 (stack is 00000000fd87d656..0000000058100136)
> ---[ end trace 4157e0bb4a65941a ]---

Wheee... recursion! Let me try and see if I can make something of that.
peterz at infradead.org
2020-Aug-12 08:57 UTC
[PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
On Wed, Aug 12, 2020 at 10:18:32AM +0200, peterz at infradead.org wrote:
> >  trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
> >  rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
> >  trace_irq_enable_rcuidle+0x87/0x120 include/trace/events/preemptirq.h:40
> >  trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
> >  rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
> >  trace_irq_enable_rcuidle+0x87/0x120 include/trace/events/preemptirq.h:40
> >  trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
> >  rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
> >  trace_irq_enable_rcuidle+0x87/0x120 include/trace/events/preemptirq.h:40
> >  trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
> >  rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
> >
> > <... repeated many many times ...>
> >
> >  trace_irq_enable_rcuidle+0x87/0x120 include/trace/events/preemptirq.h:40
> >  trace_hardirqs_restore+0x59/0x80 kernel/trace/trace_preemptirq.c:106
> >  rcu_irq_enter_irqson+0x43/0x70 kernel/rcu/tree.c:1074
> > Lost 500 message(s)!
> > BUG: stack guard page was hit at 00000000cab483ba (stack is 00000000b1442365..00000000c26f9ad3)
> > BUG: stack guard page was hit at 00000000318ff8d8 (stack is 00000000fd87d656..0000000058100136)
> > ---[ end trace 4157e0bb4a65941a ]---
>
> Wheee... recursion! Let me try and see if I can make something of that.

All that's needed is enabling the preemptirq tracepoints. Lemme go fix.