Johannes Doerfert via llvm-dev
2021-Feb-18 05:42 UTC
[llvm-dev] Potentially unsafe loop optimization
So I looked at it again because the O3 output in IR has the return for trunk and 11 but for the latter it's not there in assembly, this was looking interesting: https://godbolt.org/z/EqhW4v I think there is UB in the jump based on the memcmp because only one byte of the 3 is initialized. To me it seems there is some type punning going on and the two integers are used as storage for 3 bytes instead of the 3 bytes array. I thought this may lead to the selection of the abort branch and bad consequences. However, on the IR level we do not recognize abort as noreturn as far as I can tell, thus that is not it. I took the IR 11 produced as input and run it again through 11 and trunk, same result. Then I replaced the bcmp with `f` and the return magically appeared for 11 too: https://godbolt.org/z/j4T4hW We are on to something here. bcmp is recognized and we act on it. The UB described earlier is present there too, bcmp compares 3 bytes in the two integer struct instead of the 3 byte array. I can manually do the optimization in the middle end that LLVM 11 does (removed two redundant stores from the entry) but no change: https://godbolt.org/z/7efdqb At this point I would guess the backend knows bcmp and utilizes the UB. Whatever it is, as far as I can tell it is technically correct given the UB in the input. However, understanding what made the difference might be worth while nevertheless. ~ Johannes On 2/17/21 7:01 PM, Craig Topper wrote:> I ran the IR through clang -O2 on godbolt and didn't get a return. So I > think something is happening in the middle end? > > ~Craig > > > On Wed, Feb 17, 2021 at 4:57 PM Johannes Doerfert < > johannesdoerfert at gmail.com> wrote: > >> llc generates the return in all version I tried: >> https://godbolt.org/z/oh3fqh >> >> >> On 2/17/21 6:46 PM, Craig Topper wrote: >>> What version of llvm are you using? Godbolt is showing trunk and llvm 10 >>> have a conditional branch, llvm 11 does not. >>> >>> ~Craig >>> >>> >>> On Wed, Feb 17, 2021 at 4:41 PM Richard Kenner via llvm-dev < >>> llvm-dev at lists.llvm.org> wrote: >>> >>>>> Long story short, from what I can see there is no miscompilation >>>>> or change in semantics for that matter. >>>> So why does the .s file not contain the loop exit test? >>>> >>>> .text >>>> .file "c26006a.adb" >>>> .globl _ada_c26006a # -- Begin function >>>> _ada_c26006a >>>> .p2align 4, 0x90 >>>> .type _ada_c26006a, at function >>>> _ada_c26006a: # @_ada_c26006a >>>> .cfi_startproc >>>> # %bb.0: # %entry >>>> pushq %rbx >>>> .cfi_def_cfa_offset 16 >>>> subq $32, %rsp >>>> .cfi_def_cfa_offset 48 >>>> .cfi_offset %rbx, -16 >>>> movw $8257, 16(%rsp) # imm = 0x2041 >>>> movb $49, 18(%rsp) >>>> movw $8257, (%rsp) # imm = 0x2041 >>>> movb $50, 2(%rsp) >>>> xorl %ebx, %ebx >>>> jmp .LBB0_1 >>>> .p2align 4, 0x90 >>>> .LBB0_3: # %loop.cond.iter >>>> # in Loop: Header=BB0_1 >> Depth=1 >>>> incb %bl >>>> .LBB0_1: # %loop.cond >>>> # =>This Inner Loop Header: >> Depth=1 >>>> movb %bl, 17(%rsp) >>>> movb %bl, 1(%rsp) >>>> movzwl (%rsp), %eax >>>> xorw 16(%rsp), %ax >>>> movzbl 2(%rsp), %ecx >>>> xorb 18(%rsp), %cl >>>> movzbl %cl, %ecx >>>> orw %ax, %cx >>>> jne .LBB0_3 >>>> # %bb.2: # in Loop: Header=BB0_1 >> Depth=1 >>>> callq abort >>>> jmp .LBB0_3 >>>> .Lfunc_end0: >>>> .size _ada_c26006a, .Lfunc_end0-_ada_c26006a >>>> .cfi_endproc >>>> # -- End function >>>> .section ".note.GNU-stack","", at progbits >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> llvm-dev at lists.llvm.org >>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>>
Craig Topper via llvm-dev
2021-Feb-18 06:34 UTC
[llvm-dev] Potentially unsafe loop optimization
Looks like the expand memcmp pass expanded the bcmp. Then ran InstructionSimplify on every basic block in the function because a change was made. That decided that the compare was always false. But I'm not sure it had anything to do with the bcmp expansion. ~Craig On Wed, Feb 17, 2021 at 9:42 PM Johannes Doerfert < johannesdoerfert at gmail.com> wrote:> So I looked at it again because the O3 output in IR has the > return for trunk and 11 but for the latter it's not there in assembly, > this was looking interesting: https://godbolt.org/z/EqhW4v > > I think there is UB in the jump based on the memcmp because only > one byte of the 3 is initialized. To me it seems there is some type > punning going on and the two integers are used as storage for 3 bytes > instead of the 3 bytes array. I thought this may lead to the > selection of the abort branch and bad consequences. However, on > the IR level we do not recognize abort as noreturn as far as I can tell, > thus that is not it. > > I took the IR 11 produced as input and run it again through 11 and trunk, > same result. Then I replaced the bcmp with `f` and the return magically > appeared for 11 too: https://godbolt.org/z/j4T4hW > > We are on to something here. bcmp is recognized and we act on it. The > UB described earlier is present there too, bcmp compares 3 bytes in the two > integer struct instead of the 3 byte array. > > I can manually do the optimization in the middle end that LLVM 11 does > (removed two redundant stores from the entry) but no change: > https://godbolt.org/z/7efdqb > At this point I would guess the backend knows bcmp and utilizes the UB. > > Whatever it is, as far as I can tell it is technically correct given the UB > in the input. However, understanding what made the difference might be > worth > while nevertheless. > > ~ Johannes > > > > On 2/17/21 7:01 PM, Craig Topper wrote: > > I ran the IR through clang -O2 on godbolt and didn't get a return. So I > > think something is happening in the middle end? > > > > ~Craig > > > > > > On Wed, Feb 17, 2021 at 4:57 PM Johannes Doerfert < > > johannesdoerfert at gmail.com> wrote: > > > >> llc generates the return in all version I tried: > >> https://godbolt.org/z/oh3fqh > >> > >> > >> On 2/17/21 6:46 PM, Craig Topper wrote: > >>> What version of llvm are you using? Godbolt is showing trunk and llvm > 10 > >>> have a conditional branch, llvm 11 does not. > >>> > >>> ~Craig > >>> > >>> > >>> On Wed, Feb 17, 2021 at 4:41 PM Richard Kenner via llvm-dev < > >>> llvm-dev at lists.llvm.org> wrote: > >>> > >>>>> Long story short, from what I can see there is no miscompilation > >>>>> or change in semantics for that matter. > >>>> So why does the .s file not contain the loop exit test? > >>>> > >>>> .text > >>>> .file "c26006a.adb" > >>>> .globl _ada_c26006a # -- Begin function > >>>> _ada_c26006a > >>>> .p2align 4, 0x90 > >>>> .type _ada_c26006a, at function > >>>> _ada_c26006a: # @_ada_c26006a > >>>> .cfi_startproc > >>>> # %bb.0: # %entry > >>>> pushq %rbx > >>>> .cfi_def_cfa_offset 16 > >>>> subq $32, %rsp > >>>> .cfi_def_cfa_offset 48 > >>>> .cfi_offset %rbx, -16 > >>>> movw $8257, 16(%rsp) # imm = 0x2041 > >>>> movb $49, 18(%rsp) > >>>> movw $8257, (%rsp) # imm = 0x2041 > >>>> movb $50, 2(%rsp) > >>>> xorl %ebx, %ebx > >>>> jmp .LBB0_1 > >>>> .p2align 4, 0x90 > >>>> .LBB0_3: # %loop.cond.iter > >>>> # in Loop: Header=BB0_1 > >> Depth=1 > >>>> incb %bl > >>>> .LBB0_1: # %loop.cond > >>>> # =>This Inner Loop Header: > >> Depth=1 > >>>> movb %bl, 17(%rsp) > >>>> movb %bl, 1(%rsp) > >>>> movzwl (%rsp), %eax > >>>> xorw 16(%rsp), %ax > >>>> movzbl 2(%rsp), %ecx > >>>> xorb 18(%rsp), %cl > >>>> movzbl %cl, %ecx > >>>> orw %ax, %cx > >>>> jne .LBB0_3 > >>>> # %bb.2: # in Loop: Header=BB0_1 > >> Depth=1 > >>>> callq abort > >>>> jmp .LBB0_3 > >>>> .Lfunc_end0: > >>>> .size _ada_c26006a, .Lfunc_end0-_ada_c26006a > >>>> .cfi_endproc > >>>> # -- End function > >>>> .section ".note.GNU-stack","", at progbits > >>>> _______________________________________________ > >>>> LLVM Developers mailing list > >>>> llvm-dev at lists.llvm.org > >>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >>>> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210217/3ff70b5a/attachment.html>
Richard Kenner via llvm-dev
2021-Feb-18 13:13 UTC
[llvm-dev] Potentially unsafe loop optimization
> I think there is UB in the jump based on the memcmp because only > one byte of the 3 is initialized. To me it seems there is some type > punning going on and the two integers are used as storage for 3 bytes > instead of the 3 bytes array. I thought this may lead to the > selection of the abort branch and bad consequences. However, on > the IR level we do not recognize abort as noreturn as far as I can tell, > thus that is not it.The larger test that this was from didn't have an abort there, but a call to a procedure that printed an error message, so the abort isn't fundamental to the issue. Because of this, I wasn't looking at the comparison, which should have been defined. However, I don't see the UB here. The code does look wrong, because the pointer pun for %0 and %1 should have been a GEP for the second part of the structure (and actually, the allocated variable shouldn't include the two integers), but everything is done with %0 and %1: they're initialized to three characters, the second character is replaced by the loop variable, and the three characters are compared.