I think I see what's going on, and I agree it looks like a bug. It was too much to hope that later passes weren't going to mess with the PUSH instruction. :( While I was trying to reproduce your problem, I think I found another one that looks like this: $ clang -S -O2 -m32 -mregparm=3 -mretpoline spill_across_rp.cpp -o - | grep _retpoline_push -B2 ... movl %eax, 8(%esp) # 4-byte Spill ... pushl %edi movl 8(%esp), %edi # 4-byte Reload calll __llvm_retpoline_push That's obviously broken, it doesn't account for the SP adjustment in the push. It's weird, because it's basically the opposite of the problem you're having, which looks like LLVM *is* accounting for the push and is trying to adjust its offsets accordingly. Can you send along the full command line used to compile io_apic_b.i? On Fri, Feb 9, 2018 at 11:38 AM, David Woodhouse <dwmw2 at infradead.org> wrote:> On Fri, 2018-02-09 at 11:24 -0800, Reid Kleckner wrote: > > I haven't read the all the emails in full detail, but it seems pretty > > clear that __x86_indirect_thunk and __llvm_retpoline_push do not do > > the same things. It sounds like __llvm_retpoline_push is equivalent > > to __x86_indirect_thunk except first it swaps the two words on the > > top of the stack. > > > > I arranged it this way because the x86 call instruction puts the > > intended return address on the top of the stack, and there's no easy > > way to put it anywhere else. We use this thunk when we want to make > > an indirect call and there are no available scratch registers, i.e. > > 32-bit -mregparm=3 and the call has three or more arguments, which > > happens in Linux. One way to avoid this would be to compile with > > -mregparm=2, but that would pessimize direct calls unnecessarily. > > > > It sounds like Linux is already providing its own thunks, so it might > > be better to for us to go back to the __llvm_retpoline_push external > > thunk name and then get Linux to provide that thunk ifdef __clang__. > > Yes, the thunks are different in this case. That one should indeed be > renamed, and I'll provide the correct one in the kernel. > > However, even if I let LLVM emit the thunk for itself, code generation > seems hosed. It looks like after a call to __llvm_retpoline_push (or > perhaps more to the point after pushing the target address in > preparation for the call?) we're losing track of where %esp points, and > thus everything else that was spilled to the stack. > > Please let me know if you need any more detail on that, than I provided > in my last email. You should have everything you need there to follow > along with my analysis and reproduce it for yourself. Except perhaps > the Grandma-suck-eggs command lines like > > $ qemu-system-i386 -display none -serial stdio -kernel > arch/x86/boot/bzImage -append earlyprintk=ttyS0,keep -s -S > > $ gdb vmlinux > (gdb) tar rem localhost:1234 > (gdb) break mp_register_ioapic > (gdb) c >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180209/0ffda508/attachment.html>
David Woodhouse via llvm-dev
2018-Feb-09 22:28 UTC
[llvm-dev] retpoline mitigation and 6.0
On Fri, 2018-02-09 at 14:23 -0800, Reid Kleckner wrote:> I think I see what's going on, and I agree it looks like a bug. It > was too much to hope that later passes weren't going to mess with the > PUSH instruction. :( > > While I was trying to reproduce your problem, I think I found another > one that looks like this: > $ clang -S -O2 -m32 -mregparm=3 -mretpoline spill_across_rp.cpp -o - > | grep _retpoline_push -B2 > ... > movl %eax, 8(%esp) # 4-byte Spill > ... > pushl %edi > movl 8(%esp), %edi # 4-byte Reload > calll __llvm_retpoline_push > That's obviously broken, it doesn't account for the SP adjustment in > the push.Fun. Isn't that the *opposite* of the problem I was seeing? I thought it *was* accounting for the SP adjustment in the push, even after __llvm_retpoline_push had done its thing and effectively popped that word back off the stack again. I've now filed this as PR36329 and marked it blocking PR35804 (6.0 release).> It's weird, because it's basically the opposite of the problem you're > having, which looks like LLVM *is* accounting for the push and is > trying to adjust its offsets accordingly. > > Can you send along the full command line used to compile io_apic_b.i?/home/dwmw2/git/llvm6/bin/clang -Wp,-MD,arch/x86/kernel/apic/.io_apic_b.o.d -nostdinc -isystem /home/dwmw2/git/llvm6/lib/clang/6.0.0/include -I./arch/x86/include -I./arch/x86/include/generated -I./include -I./arch/x86/include/uapi -I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi -include ./include/linux/kconfig.h -D__KERNEL__ -Qunused-arguments -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common -fshort-wchar -Werror-implicit-function-declaration -Wno-format-security -std=gnu89 -fno-PIE -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m32 -msoft-float -mregparm=3 -freg-struct-return -fno-pic -mstack-alignment=4 -march=i686 -ffreestanding -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_SSSE3=1 -DCONFIG_AS_CRC32=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -DCONFIG_AS_SHA1_NI=1 -DCONFIG_AS_SHA256_NI=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables -mretpoline-external-thunk -DRETPOLINE -O2 -Wframe-larger-than=2048 -fno-stack-protector -Wno-unused-variable -Wno-format-invalid-specifier -Wno-gnu -Wno-address-of-packed-member -Wno-tautological-compare -mno-global-merge -no-integrated-as -fno-omit-frame-pointer -fno-optimize-sibling-calls -g -pg -mfentry -DCC_USING_FENTRY -Wdeclaration-after-statement -Wno-pointer-sign -fno-strict-overflow -fno-stack-check -Werror=implicit-int -Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types -Wno-initializer-overrides -Wno-unused-value -Wno-format -Wno-sign-compare -Wno-format-zero-length -Wno-uninitialized -mno-retpoline-external-thunk -DKBUILD_BASENAME='"io_apic_b"' -DKBUILD_MODNAME='"io_apic_b"' -c -o arch/x86/kernel/apic/.tmp_io_apic_b.o arch/x86/kernel/apic/io_apic_b.c -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5213 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180209/a9ac248e/attachment.bin>
Reid, is this what you fixed with r325049 that's also merged to 6.0? If so, do we have everything that's needed, or is there anything else I should wait for? Chandler, do you have anything else? Also, release notes please :-) On Fri, Feb 9, 2018 at 11:23 PM, Reid Kleckner via llvm-dev <llvm-dev at lists.llvm.org> wrote:> I think I see what's going on, and I agree it looks like a bug. It was too > much to hope that later passes weren't going to mess with the PUSH > instruction. :( > > While I was trying to reproduce your problem, I think I found another one > that looks like this: > > $ clang -S -O2 -m32 -mregparm=3 -mretpoline spill_across_rp.cpp -o - | grep > _retpoline_push -B2 > ... > movl %eax, 8(%esp) # 4-byte Spill > ... > pushl %edi > movl 8(%esp), %edi # 4-byte Reload > calll __llvm_retpoline_push > > That's obviously broken, it doesn't account for the SP adjustment in the > push. > > It's weird, because it's basically the opposite of the problem you're > having, which looks like LLVM *is* accounting for the push and is trying to > adjust its offsets accordingly. > > Can you send along the full command line used to compile io_apic_b.i? > > On Fri, Feb 9, 2018 at 11:38 AM, David Woodhouse <dwmw2 at infradead.org> > wrote: >> >> On Fri, 2018-02-09 at 11:24 -0800, Reid Kleckner wrote: >> > I haven't read the all the emails in full detail, but it seems pretty >> > clear that __x86_indirect_thunk and __llvm_retpoline_push do not do >> > the same things. It sounds like __llvm_retpoline_push is equivalent >> > to __x86_indirect_thunk except first it swaps the two words on the >> > top of the stack. >> > >> > I arranged it this way because the x86 call instruction puts the >> > intended return address on the top of the stack, and there's no easy >> > way to put it anywhere else. We use this thunk when we want to make >> > an indirect call and there are no available scratch registers, i.e. >> > 32-bit -mregparm=3 and the call has three or more arguments, which >> > happens in Linux. One way to avoid this would be to compile with >> > -mregparm=2, but that would pessimize direct calls unnecessarily. >> > >> > It sounds like Linux is already providing its own thunks, so it might >> > be better to for us to go back to the __llvm_retpoline_push external >> > thunk name and then get Linux to provide that thunk ifdef __clang__. >> >> Yes, the thunks are different in this case. That one should indeed be >> renamed, and I'll provide the correct one in the kernel. >> >> However, even if I let LLVM emit the thunk for itself, code generation >> seems hosed. It looks like after a call to __llvm_retpoline_push (or >> perhaps more to the point after pushing the target address in >> preparation for the call?) we're losing track of where %esp points, and >> thus everything else that was spilled to the stack. >> >> Please let me know if you need any more detail on that, than I provided >> in my last email. You should have everything you need there to follow >> along with my analysis and reproduce it for yourself. Except perhaps >> the Grandma-suck-eggs command lines like >> >> $ qemu-system-i386 -display none -serial stdio -kernel >> arch/x86/boot/bzImage -append earlyprintk=ttyS0,keep -s -S >> >> $ gdb vmlinux >> (gdb) tar rem localhost:1234 >> (gdb) break mp_register_ioapic >> (gdb) c > > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >
David Woodhouse via llvm-dev
2018-Feb-19 15:31 UTC
[llvm-dev] retpoline mitigation and 6.0
On Mon, 2018-02-19 at 16:18 +0100, Hans Wennborg wrote:> Reid, is this what you fixed with r325049 that's also merged to 6.0?That's working for me, yes.> If so, do we have everything that's needed, or is there anything else > I should wait for?I wouldn't mind an opinion on https://bugs.llvm.org/show_bug.cgi?id=33587 Clang for x86_32 will happily compile this: long long ret; asm ("movl $5, %0" : "=r" (ret)); // 32-bit reg for 64-bit val is fine. ... but not this: long long ret; asm ("movb $5, %0" : "=q" (ret)); // 8-bit reg for 64-bit val not fine. In practice, those cases go away in optimisation but the problem for us in building Linux is that the 8-bit case causes a build failure *before* the optimiser realises it'd never have to emit it anyway. I can hack that up so that we're pretending to have 8-bit variables: https://lkml.org/lkml/2018/2/9/529 But before I go further with that rather icky approach I'd like to know why it doesn't break for the "r" case, and if it's going to *keep* working. And if we can make the "q" case behave the same... -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/x-pkcs7-signature Size: 5213 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20180219/7d12b7a0/attachment.bin>