Craig Topper via llvm-dev
2021-Feb-18 06:34 UTC
[llvm-dev] Potentially unsafe loop optimization
Looks like the expand memcmp pass expanded the bcmp. Then ran InstructionSimplify on every basic block in the function because a change was made. That decided that the compare was always false. But I'm not sure it had anything to do with the bcmp expansion. ~Craig On Wed, Feb 17, 2021 at 9:42 PM Johannes Doerfert < johannesdoerfert at gmail.com> wrote:> So I looked at it again because the O3 output in IR has the > return for trunk and 11 but for the latter it's not there in assembly, > this was looking interesting: https://godbolt.org/z/EqhW4v > > I think there is UB in the jump based on the memcmp because only > one byte of the 3 is initialized. To me it seems there is some type > punning going on and the two integers are used as storage for 3 bytes > instead of the 3 bytes array. I thought this may lead to the > selection of the abort branch and bad consequences. However, on > the IR level we do not recognize abort as noreturn as far as I can tell, > thus that is not it. > > I took the IR 11 produced as input and run it again through 11 and trunk, > same result. Then I replaced the bcmp with `f` and the return magically > appeared for 11 too: https://godbolt.org/z/j4T4hW > > We are on to something here. bcmp is recognized and we act on it. The > UB described earlier is present there too, bcmp compares 3 bytes in the two > integer struct instead of the 3 byte array. > > I can manually do the optimization in the middle end that LLVM 11 does > (removed two redundant stores from the entry) but no change: > https://godbolt.org/z/7efdqb > At this point I would guess the backend knows bcmp and utilizes the UB. > > Whatever it is, as far as I can tell it is technically correct given the UB > in the input. However, understanding what made the difference might be > worth > while nevertheless. > > ~ Johannes > > > > On 2/17/21 7:01 PM, Craig Topper wrote: > > I ran the IR through clang -O2 on godbolt and didn't get a return. So I > > think something is happening in the middle end? > > > > ~Craig > > > > > > On Wed, Feb 17, 2021 at 4:57 PM Johannes Doerfert < > > johannesdoerfert at gmail.com> wrote: > > > >> llc generates the return in all version I tried: > >> https://godbolt.org/z/oh3fqh > >> > >> > >> On 2/17/21 6:46 PM, Craig Topper wrote: > >>> What version of llvm are you using? Godbolt is showing trunk and llvm > 10 > >>> have a conditional branch, llvm 11 does not. > >>> > >>> ~Craig > >>> > >>> > >>> On Wed, Feb 17, 2021 at 4:41 PM Richard Kenner via llvm-dev < > >>> llvm-dev at lists.llvm.org> wrote: > >>> > >>>>> Long story short, from what I can see there is no miscompilation > >>>>> or change in semantics for that matter. > >>>> So why does the .s file not contain the loop exit test? > >>>> > >>>> .text > >>>> .file "c26006a.adb" > >>>> .globl _ada_c26006a # -- Begin function > >>>> _ada_c26006a > >>>> .p2align 4, 0x90 > >>>> .type _ada_c26006a, at function > >>>> _ada_c26006a: # @_ada_c26006a > >>>> .cfi_startproc > >>>> # %bb.0: # %entry > >>>> pushq %rbx > >>>> .cfi_def_cfa_offset 16 > >>>> subq $32, %rsp > >>>> .cfi_def_cfa_offset 48 > >>>> .cfi_offset %rbx, -16 > >>>> movw $8257, 16(%rsp) # imm = 0x2041 > >>>> movb $49, 18(%rsp) > >>>> movw $8257, (%rsp) # imm = 0x2041 > >>>> movb $50, 2(%rsp) > >>>> xorl %ebx, %ebx > >>>> jmp .LBB0_1 > >>>> .p2align 4, 0x90 > >>>> .LBB0_3: # %loop.cond.iter > >>>> # in Loop: Header=BB0_1 > >> Depth=1 > >>>> incb %bl > >>>> .LBB0_1: # %loop.cond > >>>> # =>This Inner Loop Header: > >> Depth=1 > >>>> movb %bl, 17(%rsp) > >>>> movb %bl, 1(%rsp) > >>>> movzwl (%rsp), %eax > >>>> xorw 16(%rsp), %ax > >>>> movzbl 2(%rsp), %ecx > >>>> xorb 18(%rsp), %cl > >>>> movzbl %cl, %ecx > >>>> orw %ax, %cx > >>>> jne .LBB0_3 > >>>> # %bb.2: # in Loop: Header=BB0_1 > >> Depth=1 > >>>> callq abort > >>>> jmp .LBB0_3 > >>>> .Lfunc_end0: > >>>> .size _ada_c26006a, .Lfunc_end0-_ada_c26006a > >>>> .cfi_endproc > >>>> # -- End function > >>>> .section ".note.GNU-stack","", at progbits > >>>> _______________________________________________ > >>>> LLVM Developers mailing list > >>>> llvm-dev at lists.llvm.org > >>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >>>> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210217/3ff70b5a/attachment.html>
Craig Topper via llvm-dev
2021-Feb-18 06:59 UTC
[llvm-dev] Potentially unsafe loop optimization
Adding Nikita. I think this might have been fixed by 835104a1141a06ae7821fe2b642b9603e00aa17b [LSR] Drop potentially invalid nowrap flags when switching to post-inc IV (PR46943) ~Craig On Wed, Feb 17, 2021 at 10:34 PM Craig Topper <craig.topper at gmail.com> wrote:> Looks like the expand memcmp pass expanded the bcmp. Then ran > InstructionSimplify on every basic block in the function because a change > was made. That decided that the compare was always false. But I'm not sure > it had anything to do with the bcmp expansion. > > ~Craig > > > On Wed, Feb 17, 2021 at 9:42 PM Johannes Doerfert < > johannesdoerfert at gmail.com> wrote: > >> So I looked at it again because the O3 output in IR has the >> return for trunk and 11 but for the latter it's not there in assembly, >> this was looking interesting: https://godbolt.org/z/EqhW4v >> >> I think there is UB in the jump based on the memcmp because only >> one byte of the 3 is initialized. To me it seems there is some type >> punning going on and the two integers are used as storage for 3 bytes >> instead of the 3 bytes array. I thought this may lead to the >> selection of the abort branch and bad consequences. However, on >> the IR level we do not recognize abort as noreturn as far as I can tell, >> thus that is not it. >> >> I took the IR 11 produced as input and run it again through 11 and trunk, >> same result. Then I replaced the bcmp with `f` and the return magically >> appeared for 11 too: https://godbolt.org/z/j4T4hW >> >> We are on to something here. bcmp is recognized and we act on it. The >> UB described earlier is present there too, bcmp compares 3 bytes in the >> two >> integer struct instead of the 3 byte array. >> >> I can manually do the optimization in the middle end that LLVM 11 does >> (removed two redundant stores from the entry) but no change: >> https://godbolt.org/z/7efdqb >> At this point I would guess the backend knows bcmp and utilizes the UB. >> >> Whatever it is, as far as I can tell it is technically correct given the >> UB >> in the input. However, understanding what made the difference might be >> worth >> while nevertheless. >> >> ~ Johannes >> >> >> >> On 2/17/21 7:01 PM, Craig Topper wrote: >> > I ran the IR through clang -O2 on godbolt and didn't get a return. So I >> > think something is happening in the middle end? >> > >> > ~Craig >> > >> > >> > On Wed, Feb 17, 2021 at 4:57 PM Johannes Doerfert < >> > johannesdoerfert at gmail.com> wrote: >> > >> >> llc generates the return in all version I tried: >> >> https://godbolt.org/z/oh3fqh >> >> >> >> >> >> On 2/17/21 6:46 PM, Craig Topper wrote: >> >>> What version of llvm are you using? Godbolt is showing trunk and llvm >> 10 >> >>> have a conditional branch, llvm 11 does not. >> >>> >> >>> ~Craig >> >>> >> >>> >> >>> On Wed, Feb 17, 2021 at 4:41 PM Richard Kenner via llvm-dev < >> >>> llvm-dev at lists.llvm.org> wrote: >> >>> >> >>>>> Long story short, from what I can see there is no miscompilation >> >>>>> or change in semantics for that matter. >> >>>> So why does the .s file not contain the loop exit test? >> >>>> >> >>>> .text >> >>>> .file "c26006a.adb" >> >>>> .globl _ada_c26006a # -- Begin function >> >>>> _ada_c26006a >> >>>> .p2align 4, 0x90 >> >>>> .type _ada_c26006a, at function >> >>>> _ada_c26006a: # @_ada_c26006a >> >>>> .cfi_startproc >> >>>> # %bb.0: # %entry >> >>>> pushq %rbx >> >>>> .cfi_def_cfa_offset 16 >> >>>> subq $32, %rsp >> >>>> .cfi_def_cfa_offset 48 >> >>>> .cfi_offset %rbx, -16 >> >>>> movw $8257, 16(%rsp) # imm = 0x2041 >> >>>> movb $49, 18(%rsp) >> >>>> movw $8257, (%rsp) # imm = 0x2041 >> >>>> movb $50, 2(%rsp) >> >>>> xorl %ebx, %ebx >> >>>> jmp .LBB0_1 >> >>>> .p2align 4, 0x90 >> >>>> .LBB0_3: # %loop.cond.iter >> >>>> # in Loop: Header=BB0_1 >> >> Depth=1 >> >>>> incb %bl >> >>>> .LBB0_1: # %loop.cond >> >>>> # =>This Inner Loop Header: >> >> Depth=1 >> >>>> movb %bl, 17(%rsp) >> >>>> movb %bl, 1(%rsp) >> >>>> movzwl (%rsp), %eax >> >>>> xorw 16(%rsp), %ax >> >>>> movzbl 2(%rsp), %ecx >> >>>> xorb 18(%rsp), %cl >> >>>> movzbl %cl, %ecx >> >>>> orw %ax, %cx >> >>>> jne .LBB0_3 >> >>>> # %bb.2: # in Loop: Header=BB0_1 >> >> Depth=1 >> >>>> callq abort >> >>>> jmp .LBB0_3 >> >>>> .Lfunc_end0: >> >>>> .size _ada_c26006a, .Lfunc_end0-_ada_c26006a >> >>>> .cfi_endproc >> >>>> # -- End function >> >>>> .section ".note.GNU-stack","", at progbits >> >>>> _______________________________________________ >> >>>> LLVM Developers mailing list >> >>>> llvm-dev at lists.llvm.org >> >>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >>>> >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210217/e25fe119/attachment-0001.html>
Richard Kenner via llvm-dev
2021-Feb-18 13:16 UTC
[llvm-dev] Potentially unsafe loop optimization
> Looks like the expand memcmp pass expanded the bcmp. Then ran > InstructionSimplify on every basic block in the function because a > change was made. That decided that the compare was always false. But > I'm not sure it had anything to do with the bcmp expansion.But that's not the relevant compare. That compare is actually provably false (abort is never called). The compare that's at issue is the loop end compare.