Josh Poimboeuf
2017-Oct-06 14:33 UTC
[PATCH 11/13] x86/paravirt: Add paravirt alternatives infrastructure
On Thu, Oct 05, 2017 at 04:35:03PM -0400, Boris Ostrovsky wrote:
> >  #ifdef CONFIG_PARAVIRT
> > +/*
> > + * Paravirt alternatives are applied much earlier than normal alternatives.
> > + * They are only applied when running on a hypervisor. They replace some
> > + * native instructions with calls to pv ops.
> > + */
> > +void __init apply_pv_alternatives(void)
> > +{
> > +	setup_force_cpu_cap(X86_FEATURE_PV_OPS);
>
> Not for Xen HVM guests.

From what I can tell, HVM guests still use pv_time_ops and
pv_mmu_ops.exit_mmap, right?

> > +	apply_alternatives(__pv_alt_instructions, __pv_alt_instructions_end);
> > +}
>
>
> This is a problem (at least for Xen PV guests):
> apply_alternatives()->text_poke_early()->local_irq_save()->...'cli'->death.

Ah, right.

> It might be possible not to turn off/on the interrupts in this
> particular case since the guest probably won't be able to handle an
> interrupt at this point anyway.

Yeah, that should work. For Xen and for the other hypervisors, this is
called well before irq init, so interrupts can't be handled yet anyway.

> > +
> >  void __init_or_module apply_paravirt(struct paravirt_patch_site *start,
> >  				     struct paravirt_patch_site *end)
> >  {
> > diff --git a/arch/x86/kernel/cpu/hypervisor.c b/arch/x86/kernel/cpu/hypervisor.c
> > index 4fa90006ac68..17243fe0f5ce 100644
> > --- a/arch/x86/kernel/cpu/hypervisor.c
> > +++ b/arch/x86/kernel/cpu/hypervisor.c
> > @@ -71,6 +71,8 @@ void __init init_hypervisor_platform(void)
> >  	if (!x86_hyper)
> >  		return;
> >
> > +	apply_pv_alternatives();
>
> Not for Xen PV guests who have already done this.

I think it would be harmless, but yeah, it's probably best to only write
it once.

Thanks for the review!

--
Josh
Boris Ostrovsky
2017-Oct-06 15:29 UTC
[PATCH 11/13] x86/paravirt: Add paravirt alternatives infrastructure
On 10/06/2017 10:32 AM, Josh Poimboeuf wrote:
> On Thu, Oct 05, 2017 at 04:35:03PM -0400, Boris Ostrovsky wrote:
>>>  #ifdef CONFIG_PARAVIRT
>>> +/*
>>> + * Paravirt alternatives are applied much earlier than normal alternatives.
>>> + * They are only applied when running on a hypervisor. They replace some
>>> + * native instructions with calls to pv ops.
>>> + */
>>> +void __init apply_pv_alternatives(void)
>>> +{
>>> +	setup_force_cpu_cap(X86_FEATURE_PV_OPS);
>> Not for Xen HVM guests.
> From what I can tell, HVM guests still use pv_time_ops and
> pv_mmu_ops.exit_mmap, right?

Right, I forgot about that one.

>>> +
>>>  void __init_or_module apply_paravirt(struct paravirt_patch_site *start,
>>>  				     struct paravirt_patch_site *end)
>>>  {
>>> diff --git a/arch/x86/kernel/cpu/hypervisor.c b/arch/x86/kernel/cpu/hypervisor.c
>>> index 4fa90006ac68..17243fe0f5ce 100644
>>> --- a/arch/x86/kernel/cpu/hypervisor.c
>>> +++ b/arch/x86/kernel/cpu/hypervisor.c
>>> @@ -71,6 +71,8 @@ void __init init_hypervisor_platform(void)
>>>  	if (!x86_hyper)
>>>  		return;
>>>
>>> +	apply_pv_alternatives();
>> Not for Xen PV guests who have already done this.
> I think it would be harmless, but yeah, it's probably best to only write
> it once.

I also wonder whether calling apply_pv_alternatives() here before
x86_hyper->init_platform() will work, since the latter may be setting
those ops. In fact, that's what Xen HVM does for pv_mmu_ops.exit_mmap.

-boris
Josh Poimboeuf
2017-Oct-06 16:30 UTC
[PATCH 11/13] x86/paravirt: Add paravirt alternatives infrastructure
On Fri, Oct 06, 2017 at 11:29:52AM -0400, Boris Ostrovsky wrote:
> >>> +
> >>>  void __init_or_module apply_paravirt(struct paravirt_patch_site *start,
> >>>  				     struct paravirt_patch_site *end)
> >>>  {
> >>> diff --git a/arch/x86/kernel/cpu/hypervisor.c b/arch/x86/kernel/cpu/hypervisor.c
> >>> index 4fa90006ac68..17243fe0f5ce 100644
> >>> --- a/arch/x86/kernel/cpu/hypervisor.c
> >>> +++ b/arch/x86/kernel/cpu/hypervisor.c
> >>> @@ -71,6 +71,8 @@ void __init init_hypervisor_platform(void)
> >>>  	if (!x86_hyper)
> >>>  		return;
> >>>
> >>> +	apply_pv_alternatives();
> >> Not for Xen PV guests who have already done this.
> > I think it would be harmless, but yeah, it's probably best to only write
> > it once.
>
> I also wonder whether calling apply_pv_alternatives() here before
> x86_hyper->init_platform() will work, since the latter may be setting
> those ops. In fact, that's what Xen HVM does for pv_mmu_ops.exit_mmap.

apply_pv_alternatives() changes:

  (native code)

to

  call *pv_whatever_ops.whatever

So apply_pv_alternatives() should be called *before* any of the ops are
set up.

--
Josh
Boris Ostrovsky
2017-Oct-12 19:11 UTC
[PATCH 11/13] x86/paravirt: Add paravirt alternatives infrastructure
On 10/06/2017 10:32 AM, Josh Poimboeuf wrote:
> On Thu, Oct 05, 2017 at 04:35:03PM -0400, Boris Ostrovsky wrote:
>>>  #ifdef CONFIG_PARAVIRT
>>> +/*
>>> + * Paravirt alternatives are applied much earlier than normal alternatives.
>>> + * They are only applied when running on a hypervisor. They replace some
>>> + * native instructions with calls to pv ops.
>>> + */
>>> +void __init apply_pv_alternatives(void)
>>> +{
>>> +	setup_force_cpu_cap(X86_FEATURE_PV_OPS);
>> Not for Xen HVM guests.
> From what I can tell, HVM guests still use pv_time_ops and
> pv_mmu_ops.exit_mmap, right?
>
>>> +	apply_alternatives(__pv_alt_instructions, __pv_alt_instructions_end);
>>> +}
>>
>> This is a problem (at least for Xen PV guests):
>> apply_alternatives()->text_poke_early()->local_irq_save()->...'cli'->death.
> Ah, right.
>
>> It might be possible not to turn off/on the interrupts in this
>> particular case since the guest probably won't be able to handle an
>> interrupt at this point anyway.
> Yeah, that should work. For Xen and for the other hypervisors, this is
> called well before irq init, so interrupts can't be handled yet anyway.

There is also another problem:

[ 1.312425] general protection fault: 0000 [#1] SMP
[ 1.312901] Modules linked in:
[ 1.313389] CPU: 0 PID: 1 Comm: init Not tainted 4.14.0-rc4+ #6
[ 1.313878] task: ffff88003e2c0000 task.stack: ffffc9000038c000
[ 1.314360] RIP: 10000e030:entry_SYSCALL_64_fastpath+0x1/0xa5
[ 1.314854] RSP: e02b:ffffc9000038ff50 EFLAGS: 00010046
[ 1.315336] RAX: 000000000000000c RBX: 000055f550168040 RCX: 00007fcfc959f59a
[ 1.315827] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 1.316315] RBP: 000000000000000a R08: 000000000000037f R09: 0000000000000064
[ 1.316805] R10: 000000001f89cbf5 R11: ffff88003e2c0000 R12: 00007fcfc958ad60
[ 1.317300] R13: 0000000000000000 R14: 000055f550185954 R15: 0000000000001000
[ 1.317801] FS: 0000000000000000(0000) GS:ffff88003f800000(0000) knlGS:0000000000000000
[ 1.318267] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.318750] CR2: 00007fcfc97ab218 CR3: 000000003c88e000 CR4: 0000000000042660
[ 1.319235] Call Trace:
[ 1.319700] Code: 51 50 57 56 52 51 6a da 41 50 41 51 41 52 41 53 48 83 ec 30 65 4c 8b 1c 25 c0 d2 00 00 41 f7 03 df 39 08 90 0f 85 a5 00 00 00 50 <ff> 15 9c 95 d0 ff 58 48 3d 4c 01 00 00 77 0f 4c 89 d1 ff 14 c5
[ 1.321161] RIP: entry_SYSCALL_64_fastpath+0x1/0xa5 RSP: ffffc9000038ff50
[ 1.344255] ---[ end trace d7cb8cd6cd7c294c ]---
[ 1.345009] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b

All code
=======
   0:   51                      push   %rcx
   1:   50                      push   %rax
   2:   57                      push   %rdi
   3:   56                      push   %rsi
   4:   52                      push   %rdx
   5:   51                      push   %rcx
   6:   6a da                   pushq  $0xffffffffffffffda
   8:   41 50                   push   %r8
   a:   41 51                   push   %r9
   c:   41 52                   push   %r10
   e:   41 53                   push   %r11
  10:   48 83 ec 30             sub    $0x30,%rsp
  14:   65 4c 8b 1c 25 c0 d2    mov    %gs:0xd2c0,%r11
  1b:   00 00
  1d:   41 f7 03 df 39 08 90    testl  $0x900839df,(%r11)
  24:   0f 85 a5 00 00 00       jne    0xcf
  2a:   50                      push   %rax
  2b:*  ff 15 9c 95 d0 ff       callq  *-0x2f6a64(%rip)        # 0xffffffffffd095cd    <-- trapping instruction
  31:   58                      pop    %rax
  32:   48 3d 4c 01 00 00       cmp    $0x14c,%rax
  38:   77 0f                   ja     0x49
  3a:   4c 89 d1                mov    %r10,%rcx
  3d:   ff                      .byte 0xff
  3e:   14 c5                   adc    $0xc5,%al

So the original 'cli' was replaced with the pv call, but to me the offset
looks a bit off, no? Shouldn't it always be positive?

-boris
Andrew Cooper
2017-Oct-12 19:27 UTC
[Xen-devel] [PATCH 11/13] x86/paravirt: Add paravirt alternatives infrastructure
On 12/10/17 20:11, Boris Ostrovsky wrote:
> On 10/06/2017 10:32 AM, Josh Poimboeuf wrote:
>> On Thu, Oct 05, 2017 at 04:35:03PM -0400, Boris Ostrovsky wrote:
>>>>  #ifdef CONFIG_PARAVIRT
>>>> +/*
>>>> + * Paravirt alternatives are applied much earlier than normal alternatives.
>>>> + * They are only applied when running on a hypervisor. They replace some
>>>> + * native instructions with calls to pv ops.
>>>> + */
>>>> +void __init apply_pv_alternatives(void)
>>>> +{
>>>> +	setup_force_cpu_cap(X86_FEATURE_PV_OPS);
>>> Not for Xen HVM guests.
>> From what I can tell, HVM guests still use pv_time_ops and
>> pv_mmu_ops.exit_mmap, right?
>>
>>>> +	apply_alternatives(__pv_alt_instructions, __pv_alt_instructions_end);
>>>> +}
>>> This is a problem (at least for Xen PV guests):
>>> apply_alternatives()->text_poke_early()->local_irq_save()->...'cli'->death.
>> Ah, right.
>>
>>> It might be possible not to turn off/on the interrupts in this
>>> particular case since the guest probably won't be able to handle an
>>> interrupt at this point anyway.
>> Yeah, that should work. For Xen and for the other hypervisors, this is
>> called well before irq init, so interrupts can't be handled yet anyway.
> There is also another problem:
>
> [ 1.312425] general protection fault: 0000 [#1] SMP
> [ 1.312901] Modules linked in:
> [ 1.313389] CPU: 0 PID: 1 Comm: init Not tainted 4.14.0-rc4+ #6
> [ 1.313878] task: ffff88003e2c0000 task.stack: ffffc9000038c000
> [ 1.314360] RIP: 10000e030:entry_SYSCALL_64_fastpath+0x1/0xa5
> [ 1.314854] RSP: e02b:ffffc9000038ff50 EFLAGS: 00010046
> [ 1.315336] RAX: 000000000000000c RBX: 000055f550168040 RCX: 00007fcfc959f59a
> [ 1.315827] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [ 1.316315] RBP: 000000000000000a R08: 000000000000037f R09: 0000000000000064
> [ 1.316805] R10: 000000001f89cbf5 R11: ffff88003e2c0000 R12: 00007fcfc958ad60
> [ 1.317300] R13: 0000000000000000 R14: 000055f550185954 R15: 0000000000001000
> [ 1.317801] FS: 0000000000000000(0000) GS:ffff88003f800000(0000) knlGS:0000000000000000
> [ 1.318267] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1.318750] CR2: 00007fcfc97ab218 CR3: 000000003c88e000 CR4: 0000000000042660
> [ 1.319235] Call Trace:
> [ 1.319700] Code: 51 50 57 56 52 51 6a da 41 50 41 51 41 52 41 53 48 83 ec 30 65 4c 8b 1c 25 c0 d2 00 00 41 f7 03 df 39 08 90 0f 85 a5 00 00 00 50 <ff> 15 9c 95 d0 ff 58 48 3d 4c 01 00 00 77 0f 4c 89 d1 ff 14 c5
> [ 1.321161] RIP: entry_SYSCALL_64_fastpath+0x1/0xa5 RSP: ffffc9000038ff50
> [ 1.344255] ---[ end trace d7cb8cd6cd7c294c ]---
> [ 1.345009] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>
>
> All code
> =======
>    0:   51                      push   %rcx
>    1:   50                      push   %rax
>    2:   57                      push   %rdi
>    3:   56                      push   %rsi
>    4:   52                      push   %rdx
>    5:   51                      push   %rcx
>    6:   6a da                   pushq  $0xffffffffffffffda
>    8:   41 50                   push   %r8
>    a:   41 51                   push   %r9
>    c:   41 52                   push   %r10
>    e:   41 53                   push   %r11
>   10:   48 83 ec 30             sub    $0x30,%rsp
>   14:   65 4c 8b 1c 25 c0 d2    mov    %gs:0xd2c0,%r11
>   1b:   00 00
>   1d:   41 f7 03 df 39 08 90    testl  $0x900839df,(%r11)
>   24:   0f 85 a5 00 00 00       jne    0xcf
>   2a:   50                      push   %rax
>   2b:*  ff 15 9c 95 d0 ff       callq  *-0x2f6a64(%rip)        # 0xffffffffffd095cd    <-- trapping instruction
>   31:   58                      pop    %rax
>   32:   48 3d 4c 01 00 00       cmp    $0x14c,%rax
>   38:   77 0f                   ja     0x49
>   3a:   4c 89 d1                mov    %r10,%rcx
>   3d:   ff                      .byte 0xff
>   3e:   14 c5                   adc    $0xc5,%al
>
>
> so the original 'cli' was replaced with the pv call but to me the offset
> looks a bit off, no? Shouldn't it always be positive?

callq takes a 32bit signed displacement, so jumping back by up to 2G is
perfectly legitimate.

The #GP[0] however means that whatever 8 byte value was found at
-0x2f6a64(%rip) was a non-canonical address.

One option is that the pvops structure hasn't been initialised properly,
but an alternative is that the relocation wasn't processed correctly,
and the code is trying to reference something which isn't a function
pointer.

~Andrew