thr3ads.net - llvm dev - [llvm-dev] retpoline mitigation and 6.0 [Feb 2018]

If this information is useful, please help other people find it:
Share via:

Reid Kleckner via llvm-dev

2018-Feb-09 22:23 UTC

[llvm-dev] retpoline mitigation and 6.0

I think I see what's going on, and I agree it looks like a bug. It was too
much to hope that later passes weren't going to mess with the PUSH
instruction. :(

While I was trying to reproduce your problem, I think I found another one
that looks like this:

$ clang -S -O2 -m32 -mregparm=3 -mretpoline spill_across_rp.cpp -o - |
grep _retpoline_push -B2
...
        movl    %eax, 8(%esp)           # 4-byte Spill
...
        pushl   %edi
        movl    8(%esp), %edi           # 4-byte Reload
        calll   __llvm_retpoline_push

That's obviously broken, it doesn't account for the SP adjustment in the
push.

It's weird, because it's basically the opposite of the problem
you're
having, which looks like LLVM *is* accounting for the push and is trying to
adjust its offsets accordingly.

Can you send along the full command line used to compile io_apic_b.i?

On Fri, Feb 9, 2018 at 11:38 AM, David Woodhouse <dwmw2 at infradead.org>
wrote:
> On Fri, 2018-02-09 at 11:24 -0800, Reid Kleckner wrote:
> > I haven't read the all the emails in full detail, but it seems
pretty
> > clear that  __x86_indirect_thunk and __llvm_retpoline_push do not do
> > the same things. It sounds like __llvm_retpoline_push is equivalent
> > to __x86_indirect_thunk except first it swaps the two words on the
> > top of the stack.
> >
> > I arranged it this way because the x86 call instruction puts the
> > intended return address on the top of the stack, and there's no
easy
> > way to put it anywhere else. We use this thunk when we want to make
> > an indirect call and there are no available scratch registers, i.e.
> > 32-bit -mregparm=3 and the call has three or more arguments, which
> > happens in Linux. One way to avoid this would be to compile with
> > -mregparm=2, but that would pessimize direct calls unnecessarily.
> >
> > It sounds like Linux is already providing its own thunks, so it might
> > be better to for us to go back to the __llvm_retpoline_push external
> > thunk name and then get Linux to provide that thunk ifdef __clang__.
>
> Yes, the thunks are different in this case. That one should indeed be
> renamed, and I'll provide the correct one in the kernel.
>
> However, even if I let LLVM emit the thunk for itself, code generation
> seems hosed. It looks like after a call to __llvm_retpoline_push (or
> perhaps more to the point after pushing the target address in
> preparation for the call?) we're losing track of where %esp points, and
> thus everything else that was spilled to the stack.
>
> Please let me know if you need any more detail on that, than I provided
> in my last email. You should have everything you need there to follow
> along with my analysis and reproduce it for yourself. Except perhaps
> the Grandma-suck-eggs command lines like
>
> $ qemu-system-i386  -display none -serial stdio -kernel
> arch/x86/boot/bzImage  -append earlyprintk=ttyS0,keep -s -S
>
> $ gdb vmlinux
> (gdb) tar rem localhost:1234
> (gdb) break mp_register_ioapic
> (gdb) c
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180209/0ffda508/attachment.html>

David Woodhouse via llvm-dev

2018-Feb-09 22:28 UTC

head link

[llvm-dev] retpoline mitigation and 6.0

On Fri, 2018-02-09 at 14:23 -0800, Reid Kleckner wrote:> I think I see what's going on, and I agree it looks like a bug. It
> was too much to hope that later passes weren't going to mess with the
> PUSH instruction. :(
> 
> While I was trying to reproduce your problem, I think I found another
> one that looks like this:
> $ clang -S -O2 -m32 -mregparm=3 -mretpoline spill_across_rp.cpp -o -
> | grep _retpoline_push -B2
> ...
>         movl    %eax, 8(%esp)           # 4-byte Spill
> ...
>         pushl   %edi
>         movl    8(%esp), %edi           # 4-byte Reload
>         calll   __llvm_retpoline_push
> That's obviously broken, it doesn't account for the SP adjustment
in
> the push.
Fun. Isn't that the *opposite* of the problem I was seeing? I thought
it *was* accounting for the SP adjustment in the push, even after
__llvm_retpoline_push had done its thing and effectively popped that
word back off the stack again.

I've now filed this as PR36329 and marked it blocking PR35804 (6.0
release).
> It's weird, because it's basically the opposite of the problem
you're
> having, which looks like LLVM *is* accounting for the push and is
> trying to adjust its offsets accordingly.
> 
> Can you send along the full command line used to compile io_apic_b.i?
  /home/dwmw2/git/llvm6/bin/clang
-Wp,-MD,arch/x86/kernel/apic/.io_apic_b.o.d  -nostdinc -isystem
/home/dwmw2/git/llvm6/lib/clang/6.0.0/include -I./arch/x86/include
-I./arch/x86/include/generated  -I./include -I./arch/x86/include/uapi
-I./arch/x86/include/generated/uapi -I./include/uapi -I./include/generated/uapi
-include ./include/linux/kconfig.h -D__KERNEL__ -Qunused-arguments -Wall -Wundef
-Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common
-fshort-wchar -Werror-implicit-function-declaration -Wno-format-security
-std=gnu89 -fno-PIE -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m32
-msoft-float -mregparm=3 -freg-struct-return -fno-pic -mstack-alignment=4
-march=i686 -ffreestanding -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1
-DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_SSSE3=1 -DCONFIG_AS_CRC32=1
-DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -DCONFIG_AS_SHA1_NI=1
-DCONFIG_AS_SHA256_NI=1 -pipe -Wno-sign-compare -fno-asynchronous-unwind-tables
-mretpoline-external-thunk -DRETPOLINE -O2 -Wframe-larger-than=2048
-fno-stack-protector -Wno-unused-variable -Wno-format-invalid-specifier -Wno-gnu
-Wno-address-of-packed-member -Wno-tautological-compare -mno-global-merge
-no-integrated-as -fno-omit-frame-pointer -fno-optimize-sibling-calls -g -pg
-mfentry -DCC_USING_FENTRY -Wdeclaration-after-statement -Wno-pointer-sign
-fno-strict-overflow -fno-stack-check -Werror=implicit-int
-Werror=strict-prototypes -Werror=date-time -Werror=incompatible-pointer-types
-Wno-initializer-overrides -Wno-unused-value -Wno-format -Wno-sign-compare
-Wno-format-zero-length -Wno-uninitialized
-mno-retpoline-external-thunk    -DKBUILD_BASENAME='"io_apic_b"'  -DKBUILD_MODNAME='"io_apic_b"'
-c -o arch/x86/kernel/apic/.tmp_io_apic_b.o arch/x86/kernel/apic/io_apic_b.c
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 5213 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180209/a9ac248e/attachment.bin>

Hans Wennborg via llvm-dev

2018-Feb-19 15:18 UTC

head link

[llvm-dev] retpoline mitigation and 6.0

Reid, is this what you fixed with r325049 that's also merged to 6.0?

If so, do we have everything that's needed, or is there anything else
I should wait for?

Chandler, do you have anything else? Also, release notes please :-)

On Fri, Feb 9, 2018 at 11:23 PM, Reid Kleckner via llvm-dev
<llvm-dev at lists.llvm.org> wrote:> I think I see what's going on, and I agree it looks like a bug. It was
too
> much to hope that later passes weren't going to mess with the PUSH
> instruction. :(
>
> While I was trying to reproduce your problem, I think I found another one
> that looks like this:
>
> $ clang -S -O2 -m32 -mregparm=3 -mretpoline spill_across_rp.cpp -o - | grep
> _retpoline_push -B2
> ...
>         movl    %eax, 8(%esp)           # 4-byte Spill
> ...
>         pushl   %edi
>         movl    8(%esp), %edi           # 4-byte Reload
>         calll   __llvm_retpoline_push
>
> That's obviously broken, it doesn't account for the SP adjustment
in the
> push.
>
> It's weird, because it's basically the opposite of the problem
you're
> having, which looks like LLVM *is* accounting for the push and is trying to
> adjust its offsets accordingly.
>
> Can you send along the full command line used to compile io_apic_b.i?
>
> On Fri, Feb 9, 2018 at 11:38 AM, David Woodhouse <dwmw2 at
infradead.org>
> wrote:
>>
>> On Fri, 2018-02-09 at 11:24 -0800, Reid Kleckner wrote:
>> > I haven't read the all the emails in full detail, but it seems
pretty
>> > clear that  __x86_indirect_thunk and __llvm_retpoline_push do not
do
>> > the same things. It sounds like __llvm_retpoline_push is
equivalent
>> > to __x86_indirect_thunk except first it swaps the two words on the
>> > top of the stack.
>> >
>> > I arranged it this way because the x86 call instruction puts the
>> > intended return address on the top of the stack, and there's
no easy
>> > way to put it anywhere else. We use this thunk when we want to
make
>> > an indirect call and there are no available scratch registers,
i.e.
>> > 32-bit -mregparm=3 and the call has three or more arguments, which
>> > happens in Linux. One way to avoid this would be to compile with
>> > -mregparm=2, but that would pessimize direct calls unnecessarily.
>> >
>> > It sounds like Linux is already providing its own thunks, so it
might
>> > be better to for us to go back to the __llvm_retpoline_push
external
>> > thunk name and then get Linux to provide that thunk ifdef
__clang__.
>>
>> Yes, the thunks are different in this case. That one should indeed be
>> renamed, and I'll provide the correct one in the kernel.
>>
>> However, even if I let LLVM emit the thunk for itself, code generation
>> seems hosed. It looks like after a call to __llvm_retpoline_push (or
>> perhaps more to the point after pushing the target address in
>> preparation for the call?) we're losing track of where %esp points,
and
>> thus everything else that was spilled to the stack.
>>
>> Please let me know if you need any more detail on that, than I provided
>> in my last email. You should have everything you need there to follow
>> along with my analysis and reproduce it for yourself. Except perhaps
>> the Grandma-suck-eggs command lines like
>>
>> $ qemu-system-i386  -display none -serial stdio -kernel
>> arch/x86/boot/bzImage  -append earlyprintk=ttyS0,keep -s -S
>>
>> $ gdb vmlinux
>> (gdb) tar rem localhost:1234
>> (gdb) break mp_register_ioapic
>> (gdb) c
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

David Woodhouse via llvm-dev

2018-Feb-19 15:31 UTC

head link

[llvm-dev] retpoline mitigation and 6.0

On Mon, 2018-02-19 at 16:18 +0100, Hans Wennborg wrote:> Reid, is this what you fixed with r325049 that's also merged to 6.0?
That's working for me, yes.
> If so, do we have everything that's needed, or is there anything else
> I should wait for?
I wouldn't mind an opinion on 
https://bugs.llvm.org/show_bug.cgi?id=33587

Clang for x86_32 will happily compile this:

	long long ret;
	asm ("movl $5, %0" : "=r" (ret)); // 32-bit reg for 64-bit
val is fine.

... but not this:

	long long ret;
	asm ("movb $5, %0" : "=q" (ret)); // 8-bit reg for 64-bit
val not fine.

In practice, those cases go away in optimisation but the problem for us
in building Linux is that the 8-bit case causes a build failure
*before* the optimiser realises it'd never have to emit it anyway.

I can hack that up so that we're pretending to have 8-bit variables:
https://lkml.org/lkml/2018/2/9/529

But before I go further with that rather icky approach I'd like to know
why it doesn't break for the "r" case, and if it's going to
*keep*
working. And if we can make the "q" case behave the same...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 5213 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180219/7d12b7a0/attachment.bin>

Apparently Analagous Threads

Search for more seemingly similar threads

llvm dev - Feb 2018 - retpoline mitigation and 6.0

[llvm-dev] retpoline mitigation and 6.0

[llvm-dev] retpoline mitigation and 6.0

[llvm-dev] retpoline mitigation and 6.0

[llvm-dev] retpoline mitigation and 6.0

Apparently Analagous Threads