peterz at infradead.org
2020-Aug-05 14:12 UTC
[PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
On Wed, Aug 05, 2020 at 03:59:40PM +0200, Marco Elver wrote:> On Wed, Aug 05, 2020 at 03:42PM +0200, peterz at infradead.org wrote:> > Shouldn't we __always_inline those? They're going to be really small. > > I can send a v2, and you can choose. For reference, though: > > ffffffff86271ee0 <arch_local_save_flags>: > ffffffff86271ee0: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > ffffffff86271ee5: 48 83 3d 43 87 e4 01 cmpq $0x0,0x1e48743(%rip) # ffffffff880ba630 <pv_ops+0x120> > ffffffff86271eec: 00 > ffffffff86271eed: 74 0d je ffffffff86271efc <arch_local_save_flags+0x1c> > ffffffff86271eef: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > ffffffff86271ef4: ff 14 25 30 a6 0b 88 callq *0xffffffff880ba630 > ffffffff86271efb: c3 retq > ffffffff86271efc: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > ffffffff86271f01: 0f 0b ud2> ffffffff86271a90 <arch_local_irq_restore>: > ffffffff86271a90: 53 push %rbx > ffffffff86271a91: 48 89 fb mov %rdi,%rbx > ffffffff86271a94: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > ffffffff86271a99: 48 83 3d 97 8b e4 01 cmpq $0x0,0x1e48b97(%rip) # ffffffff880ba638 <pv_ops+0x128> > ffffffff86271aa0: 00 > ffffffff86271aa1: 74 11 je ffffffff86271ab4 <arch_local_irq_restore+0x24> > ffffffff86271aa3: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > ffffffff86271aa8: 48 89 df mov %rbx,%rdi > ffffffff86271aab: ff 14 25 38 a6 0b 88 callq *0xffffffff880ba638 > ffffffff86271ab2: 5b pop %rbx > ffffffff86271ab3: c3 retq > ffffffff86271ab4: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > ffffffff86271ab9: 0f 0b ud2Blergh, that's abysmall. In part I suspect because you have CONFIG_PARAVIRT_DEBUG, let me try and untangle that PV macro maze.
Jürgen Groß
2020-Aug-05 14:17 UTC
[PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
On 05.08.20 16:12, peterz at infradead.org wrote:> On Wed, Aug 05, 2020 at 03:59:40PM +0200, Marco Elver wrote: >> On Wed, Aug 05, 2020 at 03:42PM +0200, peterz at infradead.org wrote: > >>> Shouldn't we __always_inline those? They're going to be really small. >> >> I can send a v2, and you can choose. For reference, though: >> >> ffffffff86271ee0 <arch_local_save_flags>: >> ffffffff86271ee0: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) >> ffffffff86271ee5: 48 83 3d 43 87 e4 01 cmpq $0x0,0x1e48743(%rip) # ffffffff880ba630 <pv_ops+0x120> >> ffffffff86271eec: 00 >> ffffffff86271eed: 74 0d je ffffffff86271efc <arch_local_save_flags+0x1c> >> ffffffff86271eef: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) >> ffffffff86271ef4: ff 14 25 30 a6 0b 88 callq *0xffffffff880ba630 >> ffffffff86271efb: c3 retq >> ffffffff86271efc: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) >> ffffffff86271f01: 0f 0b ud2 > >> ffffffff86271a90 <arch_local_irq_restore>: >> ffffffff86271a90: 53 push %rbx >> ffffffff86271a91: 48 89 fb mov %rdi,%rbx >> ffffffff86271a94: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) >> ffffffff86271a99: 48 83 3d 97 8b e4 01 cmpq $0x0,0x1e48b97(%rip) # ffffffff880ba638 <pv_ops+0x128> >> ffffffff86271aa0: 00 >> ffffffff86271aa1: 74 11 je ffffffff86271ab4 <arch_local_irq_restore+0x24> >> ffffffff86271aa3: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) >> ffffffff86271aa8: 48 89 df mov %rbx,%rdi >> ffffffff86271aab: ff 14 25 38 a6 0b 88 callq *0xffffffff880ba638 >> ffffffff86271ab2: 5b pop %rbx >> ffffffff86271ab3: c3 retq >> ffffffff86271ab4: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) >> ffffffff86271ab9: 0f 0b ud2 > > > Blergh, that's abysmall. In part I suspect because you have > CONFIG_PARAVIRT_DEBUG, let me try and untangle that PV macro maze. >Probably. I have found the following in my kernel: fffffff81540a5f <arch_local_save_flags>: ffffffff81540a5f: ff 14 25 40 a4 23 82 callq *0xffffffff8223a440 ffffffff81540a66: c3 retq Juergen
peterz at infradead.org
2020-Aug-05 14:17 UTC
[PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
On Wed, Aug 05, 2020 at 04:12:37PM +0200, peterz at infradead.org wrote:> On Wed, Aug 05, 2020 at 03:59:40PM +0200, Marco Elver wrote: > > On Wed, Aug 05, 2020 at 03:42PM +0200, peterz at infradead.org wrote: > > > > Shouldn't we __always_inline those? They're going to be really small. > > > > I can send a v2, and you can choose. For reference, though: > > > > ffffffff86271ee0 <arch_local_save_flags>: > > ffffffff86271ee0: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > > ffffffff86271ee5: 48 83 3d 43 87 e4 01 cmpq $0x0,0x1e48743(%rip) # ffffffff880ba630 <pv_ops+0x120> > > ffffffff86271eec: 00 > > ffffffff86271eed: 74 0d je ffffffff86271efc <arch_local_save_flags+0x1c> > > ffffffff86271eef: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > > ffffffff86271ef4: ff 14 25 30 a6 0b 88 callq *0xffffffff880ba630 > > ffffffff86271efb: c3 retq > > ffffffff86271efc: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > > ffffffff86271f01: 0f 0b ud2 > > > ffffffff86271a90 <arch_local_irq_restore>: > > ffffffff86271a90: 53 push %rbx > > ffffffff86271a91: 48 89 fb mov %rdi,%rbx > > ffffffff86271a94: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > > ffffffff86271a99: 48 83 3d 97 8b e4 01 cmpq $0x0,0x1e48b97(%rip) # ffffffff880ba638 <pv_ops+0x128> > > ffffffff86271aa0: 00 > > ffffffff86271aa1: 74 11 je ffffffff86271ab4 <arch_local_irq_restore+0x24> > > ffffffff86271aa3: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > > ffffffff86271aa8: 48 89 df mov %rbx,%rdi > > ffffffff86271aab: ff 14 25 38 a6 0b 88 callq *0xffffffff880ba638 > > ffffffff86271ab2: 5b pop %rbx > > ffffffff86271ab3: c3 retq > > ffffffff86271ab4: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) > > ffffffff86271ab9: 0f 0b ud2 > > > Blergh, that's abysmall. In part I suspect because you have > CONFIG_PARAVIRT_DEBUG, let me try and untangle that PV macro maze.Yeah, look here: 0000 0000000000462149 <arch_local_save_flags>: 0000 462149: ff 14 25 00 00 00 00 callq *0x0 0003 46214c: R_X86_64_32S pv_ops+0x120 0007 462150: c3 retq That's exactly what I was expecting.
peterz at infradead.org
2020-Aug-06 11:32 UTC
[PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
On Thu, Aug 06, 2020 at 09:47:23AM +0200, Marco Elver wrote:> Testing my hypothesis that raw then nested non-raw > local_irq_save/restore() breaks IRQ state tracking -- see the reproducer > below. This is at least 1 case I can think of that we're bound to hit.Aaargh!> diff --git a/init/main.c b/init/main.c > index 15bd0efff3df..0873319dcff4 100644 > --- a/init/main.c > +++ b/init/main.c > @@ -1041,6 +1041,22 @@ asmlinkage __visible void __init start_kernel(void) > sfi_init_late(); > kcsan_init(); > > + /* DEBUG CODE */ > + lockdep_assert_irqs_enabled(); /* Pass. */ > + { > + unsigned long flags1; > + raw_local_irq_save(flags1);This disables IRQs but doesn't trace..> + { > + unsigned long flags2; > + lockdep_assert_irqs_enabled(); /* Pass - expectedly blind. */Indeed, we didn't trace the above disable, so software state is still on.> + local_irq_save(flags2);So here we save IRQ state, and unconditionally disable IRQs and trace them disabled.> + lockdep_assert_irqs_disabled(); /* Pass. */ > + local_irq_restore(flags2);But here, we restore IRQ state to 'disabled' and explicitly trace it disabled *again* (which is a bit daft, but whatever).> + } > + raw_local_irq_restore(flags1);This then restores the IRQ state to enable, but no tracing.> + } > + lockdep_assert_irqs_enabled(); /* FAIL! */And we're out of sync... :/ /me goes ponder things... How's something like this then? --- include/linux/sched.h | 3 --- kernel/kcsan/core.c | 62 ++++++++++++++++++++++++++++++++++++--------------- 2 files changed, 44 insertions(+), 21 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 06ec60462af0..2f5aef57e687 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1193,9 +1193,6 @@ struct task_struct { #ifdef CONFIG_KCSAN struct kcsan_ctx kcsan_ctx; -#ifdef CONFIG_TRACE_IRQFLAGS - struct irqtrace_events kcsan_save_irqtrace; -#endif #endif #ifdef CONFIG_FUNCTION_GRAPH_TRACER diff --git a/kernel/kcsan/core.c b/kernel/kcsan/core.c index 9147ff6a12e5..9c4436bf0561 100644 --- a/kernel/kcsan/core.c +++ b/kernel/kcsan/core.c @@ -291,17 +291,50 @@ static inline unsigned int get_delay(void) 0); } -void kcsan_save_irqtrace(struct task_struct *task) +/* + * KCSAN hooks are everywhere, which means they're NMI like for interrupt + * tracing. In order to present a 'normal' as possible context to the code + * called by KCSAN when reporting errors we need to update the irq-tracing + * state. + * + * Save and restore the IRQ state trace touched by KCSAN, since KCSAN's + * runtime is entered for every memory access, and potentially useful + * information is lost if dirtied by KCSAN. + */ + +struct kcsan_irq_state { + unsigned long flags; +#ifdef CONFIG_TRACE_IRQFLAGS + int hardirqs; + struct irqtrace_events irqtrace; +#endif +}; + +void kcsan_save_irqtrace(struct kcsan_irq_state *irq_state) { #ifdef CONFIG_TRACE_IRQFLAGS - task->kcsan_save_irqtrace = task->irqtrace; + irq_state->irqtrace = task->irqtrace; + irq_state->hardirq = lockdep_hardirqs_enabled(); #endif + if (!kcsan_interrupt_watcher) { + raw_local_irq_save(irq_state->flags); + lockdep_hardirqs_off(CALLER_ADDR0); + } } -void kcsan_restore_irqtrace(struct task_struct *task) +void kcsan_restore_irqtrace(struct kcsan_irq_state *irq_state) { + if (!kcsan_interrupt_watcher) { +#ifdef CONFIG_TRACE_IRQFLAGS + if (irq_state->hardirqs) { + lockdep_hardirqs_on_prepare(CALLER_ADDR0); + lockdep_hardirqs_on(CALLER_ADDR0); + } +#endif + raw_local_irq_restore(irq_state->flags); + } #ifdef CONFIG_TRACE_IRQFLAGS - task->irqtrace = task->kcsan_save_irqtrace; + task->irqtrace = irq_state->irqtrace; #endif } @@ -350,11 +383,13 @@ static noinline void kcsan_found_watchpoint(const volatile void *ptr, flags = user_access_save(); if (consumed) { - kcsan_save_irqtrace(current); + struct kcsan_irq_state irqstate; + + kcsan_save_irqtrace(&irqstate); kcsan_report(ptr, size, type, KCSAN_VALUE_CHANGE_MAYBE, KCSAN_REPORT_CONSUMED_WATCHPOINT, watchpoint - watchpoints); - kcsan_restore_irqtrace(current); + kcsan_restore_irqtrace(&irqstate); } else { /* * The other thread may not print any diagnostics, as it has @@ -387,7 +422,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type) unsigned long access_mask; enum kcsan_value_change value_change = KCSAN_VALUE_CHANGE_MAYBE; unsigned long ua_flags = user_access_save(); - unsigned long irq_flags = 0; + struct kcsan_irq_state irqstate; /* * Always reset kcsan_skip counter in slow-path to avoid underflow; see @@ -412,14 +447,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type) goto out; } - /* - * Save and restore the IRQ state trace touched by KCSAN, since KCSAN's - * runtime is entered for every memory access, and potentially useful - * information is lost if dirtied by KCSAN. - */ - kcsan_save_irqtrace(current); - if (!kcsan_interrupt_watcher) - local_irq_save(irq_flags); + kcsan_save_irqtrace(&irqstate); watchpoint = insert_watchpoint((unsigned long)ptr, size, is_write); if (watchpoint == NULL) { @@ -559,9 +587,7 @@ kcsan_setup_watchpoint(const volatile void *ptr, size_t size, int type) remove_watchpoint(watchpoint); kcsan_counter_dec(KCSAN_COUNTER_USED_WATCHPOINTS); out_unlock: - if (!kcsan_interrupt_watcher) - local_irq_restore(irq_flags); - kcsan_restore_irqtrace(current); + kcsan_restore_irqtrace(&irqstate); out: user_access_restore(ua_flags); }
Jürgen Groß
2020-Aug-07 09:24 UTC
[PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
On 07.08.20 11:01, Marco Elver wrote:> On Thu, 6 Aug 2020 at 18:06, Marco Elver <elver at google.com> wrote: >> On Thu, 6 Aug 2020 at 15:17, Marco Elver <elver at google.com> wrote: >>> On Thu, Aug 06, 2020 at 01:32PM +0200, peterz at infradead.org wrote: >>>> On Thu, Aug 06, 2020 at 09:47:23AM +0200, Marco Elver wrote: >>>>> Testing my hypothesis that raw then nested non-raw >>>>> local_irq_save/restore() breaks IRQ state tracking -- see the reproducer >>>>> below. This is at least 1 case I can think of that we're bound to hit. >>> ... >>>> >>>> /me goes ponder things... >>>> >>>> How's something like this then? >>>> >>>> --- >>>> include/linux/sched.h | 3 --- >>>> kernel/kcsan/core.c | 62 ++++++++++++++++++++++++++++++++++++--------------- >>>> 2 files changed, 44 insertions(+), 21 deletions(-) >>> >>> Thank you! That approach seems to pass syzbot (also with >>> CONFIG_PARAVIRT) and kcsan-test tests. >>> >>> I had to modify it some, so that report.c's use of the restore logic >>> works and not mess up the IRQ trace printed on KCSAN reports (with >>> CONFIG_KCSAN_VERBOSE). >>> >>> I still need to fully convince myself all is well now and we don't end >>> up with more fixes. :-) If it passes further testing, I'll send it as a >>> real patch (I want to add you as Co-developed-by, but would need your >>> Signed-off-by for the code you pasted, I think.) > > I let it run on syzbot through the night, and it's fine without > PARAVIRT (see below). I have sent the patch (need your Signed-off-by > as it's based on your code, thank you!): > https://lkml.kernel.org/r/20200807090031.3506555-1-elver at google.com > >> With CONFIG_PARAVIRT=y (without the notrace->noinstr patch), I still >> get lockdep DEBUG_LOCKS_WARN_ON(!lockdep_hardirqs_enabled()), although >> it takes longer for syzbot to hit them. But I think that's expected >> because we can still get the recursion that I pointed out, and will >> need that patch. > > Never mind, I get these warnings even if I don't turn on KCSAN > (CONFIG_KCSAN=n). Something else is going on with PARAVIRT=y that > throws off IRQ state tracking. :-/What are the settings of CONFIG_PARAVIRT_XXL and CONFIG_PARAVIRT_SPINLOCKS in this case? Juergen
Jürgen Groß
2020-Aug-07 10:35 UTC
[PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
On 07.08.20 11:50, Marco Elver wrote:> On Fri, Aug 07, 2020 at 11:24AM +0200, J?rgen Gro? wrote: >> On 07.08.20 11:01, Marco Elver wrote: >>> On Thu, 6 Aug 2020 at 18:06, Marco Elver <elver at google.com> wrote: >>>> On Thu, 6 Aug 2020 at 15:17, Marco Elver <elver at google.com> wrote: >>>>> On Thu, Aug 06, 2020 at 01:32PM +0200, peterz at infradead.org wrote: >>>>>> On Thu, Aug 06, 2020 at 09:47:23AM +0200, Marco Elver wrote: >>>>>>> Testing my hypothesis that raw then nested non-raw >>>>>>> local_irq_save/restore() breaks IRQ state tracking -- see the reproducer >>>>>>> below. This is at least 1 case I can think of that we're bound to hit. >>>>> ... >>>>>> >>>>>> /me goes ponder things... >>>>>> >>>>>> How's something like this then? >>>>>> >>>>>> --- >>>>>> include/linux/sched.h | 3 --- >>>>>> kernel/kcsan/core.c | 62 ++++++++++++++++++++++++++++++++++++--------------- >>>>>> 2 files changed, 44 insertions(+), 21 deletions(-) >>>>> >>>>> Thank you! That approach seems to pass syzbot (also with >>>>> CONFIG_PARAVIRT) and kcsan-test tests. >>>>> >>>>> I had to modify it some, so that report.c's use of the restore logic >>>>> works and not mess up the IRQ trace printed on KCSAN reports (with >>>>> CONFIG_KCSAN_VERBOSE). >>>>> >>>>> I still need to fully convince myself all is well now and we don't end >>>>> up with more fixes. :-) If it passes further testing, I'll send it as a >>>>> real patch (I want to add you as Co-developed-by, but would need your >>>>> Signed-off-by for the code you pasted, I think.) >>> >>> I let it run on syzbot through the night, and it's fine without >>> PARAVIRT (see below). I have sent the patch (need your Signed-off-by >>> as it's based on your code, thank you!): >>> https://lkml.kernel.org/r/20200807090031.3506555-1-elver at google.com >>> >>>> With CONFIG_PARAVIRT=y (without the notrace->noinstr patch), I still >>>> get lockdep DEBUG_LOCKS_WARN_ON(!lockdep_hardirqs_enabled()), although >>>> it takes longer for syzbot to hit them. But I think that's expected >>>> because we can still get the recursion that I pointed out, and will >>>> need that patch. >>> >>> Never mind, I get these warnings even if I don't turn on KCSAN >>> (CONFIG_KCSAN=n). Something else is going on with PARAVIRT=y that >>> throws off IRQ state tracking. :-/ >> >> What are the settings of CONFIG_PARAVIRT_XXL and >> CONFIG_PARAVIRT_SPINLOCKS in this case? > > I attached a config. > > $> grep PARAVIRT .config > CONFIG_PARAVIRT=y > CONFIG_PARAVIRT_XXL=y > # CONFIG_PARAVIRT_DEBUG is not set > CONFIG_PARAVIRT_SPINLOCKS=y > # CONFIG_PARAVIRT_TIME_ACCOUNTING is not set > CONFIG_PARAVIRT_CLOCK=yAnything special I need to do to reproduce the problem? Or would you be willing to do some more rounds with different config settings? I think CONFIG_PARAVIRT_XXL shouldn't matter, but I'm not completely sure about that. CONFIG_PARAVIRT_SPINLOCKS would be my primary suspect. Juergen
Jürgen Groß
2020-Aug-07 12:04 UTC
[PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
On 07.08.20 13:38, Marco Elver wrote:> On Fri, Aug 07, 2020 at 12:35PM +0200, J?rgen Gro? wrote: >> On 07.08.20 11:50, Marco Elver wrote: >>> On Fri, Aug 07, 2020 at 11:24AM +0200, J?rgen Gro? wrote: >>>> On 07.08.20 11:01, Marco Elver wrote: >>>>> On Thu, 6 Aug 2020 at 18:06, Marco Elver <elver at google.com> wrote: >>>>>> On Thu, 6 Aug 2020 at 15:17, Marco Elver <elver at google.com> wrote: >>>>>>> On Thu, Aug 06, 2020 at 01:32PM +0200, peterz at infradead.org wrote: >>>>>>>> On Thu, Aug 06, 2020 at 09:47:23AM +0200, Marco Elver wrote: >>>>>>>>> Testing my hypothesis that raw then nested non-raw >>>>>>>>> local_irq_save/restore() breaks IRQ state tracking -- see the reproducer >>>>>>>>> below. This is at least 1 case I can think of that we're bound to hit. >>>>>>> ... >>>>>>>> >>>>>>>> /me goes ponder things... >>>>>>>> >>>>>>>> How's something like this then? >>>>>>>> >>>>>>>> --- >>>>>>>> include/linux/sched.h | 3 --- >>>>>>>> kernel/kcsan/core.c | 62 ++++++++++++++++++++++++++++++++++++--------------- >>>>>>>> 2 files changed, 44 insertions(+), 21 deletions(-) >>>>>>> >>>>>>> Thank you! That approach seems to pass syzbot (also with >>>>>>> CONFIG_PARAVIRT) and kcsan-test tests. >>>>>>> >>>>>>> I had to modify it some, so that report.c's use of the restore logic >>>>>>> works and not mess up the IRQ trace printed on KCSAN reports (with >>>>>>> CONFIG_KCSAN_VERBOSE). >>>>>>> >>>>>>> I still need to fully convince myself all is well now and we don't end >>>>>>> up with more fixes. :-) If it passes further testing, I'll send it as a >>>>>>> real patch (I want to add you as Co-developed-by, but would need your >>>>>>> Signed-off-by for the code you pasted, I think.) >>>>> >>>>> I let it run on syzbot through the night, and it's fine without >>>>> PARAVIRT (see below). I have sent the patch (need your Signed-off-by >>>>> as it's based on your code, thank you!): >>>>> https://lkml.kernel.org/r/20200807090031.3506555-1-elver at google.com >>>>> >>>>>> With CONFIG_PARAVIRT=y (without the notrace->noinstr patch), I still >>>>>> get lockdep DEBUG_LOCKS_WARN_ON(!lockdep_hardirqs_enabled()), although >>>>>> it takes longer for syzbot to hit them. But I think that's expected >>>>>> because we can still get the recursion that I pointed out, and will >>>>>> need that patch. >>>>> >>>>> Never mind, I get these warnings even if I don't turn on KCSAN >>>>> (CONFIG_KCSAN=n). Something else is going on with PARAVIRT=y that >>>>> throws off IRQ state tracking. :-/ >>>> >>>> What are the settings of CONFIG_PARAVIRT_XXL and >>>> CONFIG_PARAVIRT_SPINLOCKS in this case? >>> >>> I attached a config. >>> >>> $> grep PARAVIRT .config >>> CONFIG_PARAVIRT=y >>> CONFIG_PARAVIRT_XXL=y >>> # CONFIG_PARAVIRT_DEBUG is not set >>> CONFIG_PARAVIRT_SPINLOCKS=y >>> # CONFIG_PARAVIRT_TIME_ACCOUNTING is not set >>> CONFIG_PARAVIRT_CLOCK=y >> >> Anything special I need to do to reproduce the problem? Or would you be >> willing to do some more rounds with different config settings? > > I can only test it with syzkaller, but that probably doesn't help if you > don't already have it set up. It can't seem to find a C reproducer. > > I did some more rounds with different configs. > >> I think CONFIG_PARAVIRT_XXL shouldn't matter, but I'm not completely >> sure about that. CONFIG_PARAVIRT_SPINLOCKS would be my primary suspect. > > Yes, PARAVIRT_XXL doesn't make a different. When disabling > PARAVIRT_SPINLOCKS, however, the warnings go away.Thanks for testing! I take it you are doing the tests in a KVM guest? If so I have a gut feeling that the use of local_irq_save() and local_irq_restore() in kvm_wait() might be fishy. I might be completely wrong here, though. BTW, I think Xen's variant of pv spinlocks is fine (no playing with IRQ on/off). Hyper-V seems to do the same as KVM, and kicking another vcpu could be problematic as well, as it is just using IPI. Juergen
Jürgen Groß
2020-Aug-11 07:04 UTC
[PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
On 11.08.20 09:00, Marco Elver wrote:> On Fri, 7 Aug 2020 at 17:19, Marco Elver <elver at google.com> wrote: >> On Fri, Aug 07, 2020 at 02:08PM +0200, Marco Elver wrote: >>> On Fri, 7 Aug 2020 at 14:04, J?rgen Gro? <jgross at suse.com> wrote: >>>> >>>> On 07.08.20 13:38, Marco Elver wrote: >>>>> On Fri, Aug 07, 2020 at 12:35PM +0200, J?rgen Gro? wrote: > ... >>>>>> I think CONFIG_PARAVIRT_XXL shouldn't matter, but I'm not completely >>>>>> sure about that. CONFIG_PARAVIRT_SPINLOCKS would be my primary suspect. >>>>> >>>>> Yes, PARAVIRT_XXL doesn't make a different. When disabling >>>>> PARAVIRT_SPINLOCKS, however, the warnings go away. >>>> >>>> Thanks for testing! >>>> >>>> I take it you are doing the tests in a KVM guest? >>> >>> Yes, correct. >>> >>>> If so I have a gut feeling that the use of local_irq_save() and >>>> local_irq_restore() in kvm_wait() might be fishy. I might be completely >>>> wrong here, though. >>> >>> Happy to help debug more, although I might need patches or pointers >>> what to play with. >>> >>>> BTW, I think Xen's variant of pv spinlocks is fine (no playing with IRQ >>>> on/off). >>>> >>>> Hyper-V seems to do the same as KVM, and kicking another vcpu could be >>>> problematic as well, as it is just using IPI. >> >> I experimented a bit more, and the below patch seems to solve the >> warnings. However, that was based on your pointer about kvm_wait(), and >> I can't quite tell if it is the right solution. >> >> My hypothesis here is simply that kvm_wait() may be called in a place >> where we get the same case I mentioned to Peter, >> >> raw_local_irq_save(); /* or other IRQs off without tracing */ >> ... >> kvm_wait() /* IRQ state tracing gets confused */ >> ... >> raw_local_irq_restore(); >> >> and therefore, using raw variants in kvm_wait() works. It's also safe >> because it doesn't call any other libraries that would result in corrupt >> IRQ state AFAIK. > > Just to follow-up, it'd still be nice to fix this. Suggestions? > > I could send the below as a patch, but can only go off my above > hypothesis and the fact that syzbot is happier, so not entirely > convincing.Peter has told me via IRC he will look soon further into this. Your finding suggests that the pv-lock implementation for Hyper-V needs some tweaking, too. For that purpose I'm adding Wei to Cc. Juergen> > Thanks, > -- Marco > >> ------ >8 ------ >> >> diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c >> index 233c77d056c9..1d412d1466f0 100644 >> --- a/arch/x86/kernel/kvm.c >> +++ b/arch/x86/kernel/kvm.c >> @@ -797,7 +797,7 @@ static void kvm_wait(u8 *ptr, u8 val) >> if (in_nmi()) >> return; >> >> - local_irq_save(flags); >> + raw_local_irq_save(flags); >> >> if (READ_ONCE(*ptr) != val) >> goto out; >> @@ -810,10 +810,10 @@ static void kvm_wait(u8 *ptr, u8 val) >> if (arch_irqs_disabled_flags(flags)) >> halt(); >> else >> - safe_halt(); >> + raw_safe_halt(); >> >> out: >> - local_irq_restore(flags); >> + raw_local_irq_restore(flags); >> } >> >> #ifdef CONFIG_X86_32 >
Peter Zijlstra
2020-Aug-11 07:41 UTC
[PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
On Fri, Aug 07, 2020 at 05:19:03PM +0200, Marco Elver wrote:> My hypothesis here is simply that kvm_wait() may be called in a place > where we get the same case I mentioned to Peter, > > raw_local_irq_save(); /* or other IRQs off without tracing */ > ... > kvm_wait() /* IRQ state tracing gets confused */ > ... > raw_local_irq_restore(); > > and therefore, using raw variants in kvm_wait() works. It's also safe > because it doesn't call any other libraries that would result in corruptYes, this is definitely an issue. Tracing, we also musn't call into tracing when using raw_local_irq_*(). Because then we re-intoduce this same issue all over again. Both halt() and safe_halt() are more paravirt calls, but given we're in a KVM paravirt call already, I suppose we can directly use native_*() here. Something like so then... I suppose, but then the Xen variants need TLC too. --- arch/x86/include/asm/irqflags.h | 4 ++-- arch/x86/include/asm/kvm_para.h | 18 +++++++++--------- arch/x86/kernel/kvm.c | 14 +++++++------- 3 files changed, 18 insertions(+), 18 deletions(-) diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h index 02a0cf547d7b..7c614db25274 100644 --- a/arch/x86/include/asm/irqflags.h +++ b/arch/x86/include/asm/irqflags.h @@ -54,13 +54,13 @@ static __always_inline void native_irq_enable(void) asm volatile("sti": : :"memory"); } -static inline __cpuidle void native_safe_halt(void) +static __always_inline __cpuidle void native_safe_halt(void) { mds_idle_clear_cpu_buffers(); asm volatile("sti; hlt": : :"memory"); } -static inline __cpuidle void native_halt(void) +static __always_inline __cpuidle void native_halt(void) { mds_idle_clear_cpu_buffers(); asm volatile("hlt": : :"memory"); diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index 49d3a9edb06f..90f7ea58ebb0 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -30,7 +30,7 @@ static inline bool kvm_check_and_clear_guest_paused(void) * noted by the particular hypercall. */ -static inline long kvm_hypercall0(unsigned int nr) +static __always_inline long kvm_hypercall0(unsigned int nr) { long ret; asm volatile(KVM_HYPERCALL @@ -40,7 +40,7 @@ static inline long kvm_hypercall0(unsigned int nr) return ret; } -static inline long kvm_hypercall1(unsigned int nr, unsigned long p1) +static __always_inline long kvm_hypercall1(unsigned int nr, unsigned long p1) { long ret; asm volatile(KVM_HYPERCALL @@ -50,8 +50,8 @@ static inline long kvm_hypercall1(unsigned int nr, unsigned long p1) return ret; } -static inline long kvm_hypercall2(unsigned int nr, unsigned long p1, - unsigned long p2) +static __always_inline long kvm_hypercall2(unsigned int nr, unsigned long p1, + unsigned long p2) { long ret; asm volatile(KVM_HYPERCALL @@ -61,8 +61,8 @@ static inline long kvm_hypercall2(unsigned int nr, unsigned long p1, return ret; } -static inline long kvm_hypercall3(unsigned int nr, unsigned long p1, - unsigned long p2, unsigned long p3) +static __always_inline long kvm_hypercall3(unsigned int nr, unsigned long p1, + unsigned long p2, unsigned long p3) { long ret; asm volatile(KVM_HYPERCALL @@ -72,9 +72,9 @@ static inline long kvm_hypercall3(unsigned int nr, unsigned long p1, return ret; } -static inline long kvm_hypercall4(unsigned int nr, unsigned long p1, - unsigned long p2, unsigned long p3, - unsigned long p4) +static __always_inline long kvm_hypercall4(unsigned int nr, unsigned long p1, + unsigned long p2, unsigned long p3, + unsigned long p4) { long ret; asm volatile(KVM_HYPERCALL diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 233c77d056c9..15f8dfd8812d 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -779,7 +779,7 @@ arch_initcall(kvm_alloc_cpumask); #ifdef CONFIG_PARAVIRT_SPINLOCKS /* Kick a cpu by its apicid. Used to wake up a halted vcpu */ -static void kvm_kick_cpu(int cpu) +static notrace kvm_kick_cpu(int cpu) { int apicid; unsigned long flags = 0; @@ -790,14 +790,14 @@ static void kvm_kick_cpu(int cpu) #include <asm/qspinlock.h> -static void kvm_wait(u8 *ptr, u8 val) +static notrace kvm_wait(u8 *ptr, u8 val) { unsigned long flags; if (in_nmi()) return; - local_irq_save(flags); + raw_local_irq_save(flags); if (READ_ONCE(*ptr) != val) goto out; @@ -808,16 +808,16 @@ static void kvm_wait(u8 *ptr, u8 val) * in irq spinlock slowpath and no spurious interrupt occur to save us. */ if (arch_irqs_disabled_flags(flags)) - halt(); + native_halt(); else - safe_halt(); + native_safe_halt(); out: - local_irq_restore(flags); + raw_local_irq_restore(flags); } #ifdef CONFIG_X86_32 -__visible bool __kvm_vcpu_is_preempted(long cpu) +__visible notrace bool __kvm_vcpu_is_preempted(long cpu) { struct kvm_steal_time *src = &per_cpu(steal_time, cpu);
Jürgen Groß
2020-Aug-11 07:57 UTC
[PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
On 11.08.20 09:41, Peter Zijlstra wrote:> On Fri, Aug 07, 2020 at 05:19:03PM +0200, Marco Elver wrote: > >> My hypothesis here is simply that kvm_wait() may be called in a place >> where we get the same case I mentioned to Peter, >> >> raw_local_irq_save(); /* or other IRQs off without tracing */ >> ... >> kvm_wait() /* IRQ state tracing gets confused */ >> ... >> raw_local_irq_restore(); >> >> and therefore, using raw variants in kvm_wait() works. It's also safe >> because it doesn't call any other libraries that would result in corrupt > > Yes, this is definitely an issue. > > Tracing, we also musn't call into tracing when using raw_local_irq_*(). > Because then we re-intoduce this same issue all over again. > > Both halt() and safe_halt() are more paravirt calls, but given we're in > a KVM paravirt call already, I suppose we can directly use native_*() > here. > > Something like so then... I suppose, but then the Xen variants need TLC > too.Just to be sure I understand you correct: You mean that xen_qlock_kick() and xen_qlock_wait() and all functions called by those should gain the "notrace" attribute, right? I am not sure why the kick variants need it, though. IMO those are called only after the lock has been released, so they should be fine without notrace. And again: we shouldn't forget the Hyper-V variants. Juergen
Possibly Parallel Threads
- [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
- [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers
- [RFC/PATCH PV_OPS X86_64 12/17] paravirt_ops - interrupt/exception changes
- [RFC/PATCH PV_OPS X86_64 12/17] paravirt_ops - interrupt/exception changes
- [PATCH] x86/paravirt: Add missing noinstr to arch_local*() helpers