Krzysztof Parzyszek via llvm-dev
2017-Jul-28 18:08 UTC
[llvm-dev] Tail merging "undef" with a defined register: wrong code
I've looked into that and it's not going to be simple, unfortunately. Here's the original example again: --- name: fred tracksRegLiveness: true body: | bb.0: successors: %bb.1, %bb.2 J2_jumpt undef %p0, %bb.2, implicit-def %pc J2_jump %bb.1, implicit-def %pc bb.1: successors: %bb.3 %r0 = L2_loadruh_io undef %r0, 0 PS_storerhabs 0, killed %r0 J2_jump %bb.3, implicit-def %pc bb.2: successors: %bb.3 PS_storerhabs 0, undef %r0 J2_jump %bb.3, implicit-def %pc bb.3: PS_jmpret killed %r31, implicit-def %pc ... If the merging process produced this intermediate code (I stripped some unimportant pieces): bb.1: %r0 = L2_loadruh_io undef %r0, 0 J2_jump %bb.tail bb.2: J2_jump %bb.tail bb.tail: liveins: %r0 PS_storerhabs 0, killed %r0 J2_jump %bb.3, implicit-def %pc Then we could insert IMPLICIT_DEF into bb.2 and everything would be fine. This is not what happens, though. In this case, the entire bb.2 will be used as the new tail leaving us without a good point to put the implicit def at. In this particular testcase, the implicit def could go in bb.0 (which was not involved in the tail merging at all), but in general that may not be an option. -Krzysztof On 7/27/2017 2:20 PM, Quentin Colombet wrote:> Yeah, that’s want I was expecting from the verifier. > I’d say we should add an implicit_def (assuming it gets kill at code emission) in the part where it is undef. > >> On Jul 27, 2017, at 11:52 AM, Krzysztof Parzyszek <kparzysz at codeaurora.org> wrote: >> >> Yes, immediately after branch folding the code would still behave the same as the original. At the same time, any subsequent optimization may "exploit" the incorrect liveness information to do something bad. If you add -run-pass if-converter, you'll get: >> >> # After If Converter >> # Machine code for function fred: IsSSA, NoPHIs, TracksLiveness, NoVRegs >> >> BB#0: >> %R0<def> = L2_ploadruhf_io %P0<undef>, %R0<undef>, 0, %R0<imp-use> >> PS_storerhabs 0, %R0 >> PS_jmpret %R31<kill>, %PC<imp-def> >> >> # End machine code for function fred. >> >> *** Bad machine code: Using an undefined physical register *** >> - function: fred >> - basic block: BB#0 (0x411e8e8) >> - instruction: %R0<def> = L2_ploadruhf_io >> - operand 4: %R0<imp-use> >> LLVM ERROR: Found 1 machine code errors. >> >> -Krzysztof >> >> >> On 7/27/2017 1:47 PM, Quentin Colombet wrote: >>> Why do you say the code is wrong? >>> I can see why the MachineVerifier would complain, but the code seems to do the same thing as it was doing before the transformation, isn’t it? >>>> On Jul 27, 2017, at 11:40 AM, Krzysztof Parzyszek via llvm-dev <llvm-dev at lists.llvm.org> wrote: >>>> >>>> The comment in test/CodeGen/X86/branchfolding-undef.mir states that such merging is legal, however doing so can actually generate wrong code: >>>> >>>> Consider this (valid code): >>>> >>>> --- >>>> name: fred >>>> tracksRegLiveness: true >>>> >>>> body: | >>>> bb.0: >>>> successors: %bb.1, %bb.2 >>>> J2_jumpt undef %p0, %bb.2, implicit-def %pc >>>> J2_jump %bb.1, implicit-def %pc >>>> >>>> bb.1: >>>> successors: %bb.3 >>>> %r0 = L2_loadruh_io undef %r0, 0 >>>> PS_storerhabs 0, killed %r0 >>>> J2_jump %bb.3, implicit-def %pc >>>> >>>> bb.2: >>>> successors: %bb.3 >>>> PS_storerhabs 0, undef %r0 >>>> J2_jump %bb.3, implicit-def %pc >>>> >>>> bb.3: >>>> PS_jmpret killed %r31, implicit-def %pc >>>> ... >>>> >>>> >>>> Run: >>>> llc -march=hexagon -run-pass branch-folder test.mir -o - >>>> >>>> >>>> We get an invalid code: >>>> >>>> body: | >>>> bb.0: >>>> successors: %bb.1(0x40000000), %bb.2(0x40000000) >>>> J2_jumpt undef %p0, %bb.2, implicit-def %pc >>>> >>>> bb.1: >>>> successors: %bb.2(0x80000000) >>>> %r0 = L2_loadruh_io undef %r0, 0 >>>> >>>> bb.2: >>>> liveins: %r0 ; Invalid: r0 is not live out of bb.0 >>>> PS_storerhabs 0, %r0 >>>> PS_jmpret killed %r31, implicit-def %pc >>>> ... >>>> >>>> >>>> Should we disallow merging undef with non-undef? >>>> >>>> -Krzysztof >>>> >>>> >>>> -- >>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> llvm-dev at lists.llvm.org >>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> -- >> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation >-- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Quentin Colombet via llvm-dev
2017-Jul-28 18:59 UTC
[llvm-dev] Tail merging "undef" with a defined register: wrong code
Hi Krzysztof, Thanks for digging into this.> On Jul 28, 2017, at 11:08 AM, Krzysztof Parzyszek <kparzysz at codeaurora.org> wrote: > > I've looked into that and it's not going to be simple, unfortunately. > > Here's the original example again: > > --- > name: fred > tracksRegLiveness: true > > body: | > bb.0: > successors: %bb.1, %bb.2 > J2_jumpt undef %p0, %bb.2, implicit-def %pc > J2_jump %bb.1, implicit-def %pc > > bb.1: > successors: %bb.3 > %r0 = L2_loadruh_io undef %r0, 0 > PS_storerhabs 0, killed %r0 > J2_jump %bb.3, implicit-def %pc > > bb.2: > successors: %bb.3 > PS_storerhabs 0, undef %r0 > J2_jump %bb.3, implicit-def %pc > > bb.3: > PS_jmpret killed %r31, implicit-def %pc > ... > > > If the merging process produced this intermediate code (I stripped some unimportant pieces): > > bb.1: > %r0 = L2_loadruh_io undef %r0, 0 > J2_jump %bb.tail > > bb.2: > J2_jump %bb.tail > > bb.tail: > liveins: %r0 > PS_storerhabs 0, killed %r0 > J2_jump %bb.3, implicit-def %pc > > Then we could insert IMPLICIT_DEF into bb.2 and everything would be fine. This is not what happens, though. In this case, the entire bb.2 will be used as the new tail leaving us without a good point to put the implicit def at.We should need to put the implicit_def only if the related register is dead (as in not live-out). Thus, for this case we wouldn’t need to do anything, right? I am not saying it is easy though :). Cheers, -Quentin> > In this particular testcase, the implicit def could go in bb.0 (which was not involved in the tail merging at all), but in general that may not be an option. > > -Krzysztof > > > > On 7/27/2017 2:20 PM, Quentin Colombet wrote: >> Yeah, that’s want I was expecting from the verifier. >> I’d say we should add an implicit_def (assuming it gets kill at code emission) in the part where it is undef. >>> On Jul 27, 2017, at 11:52 AM, Krzysztof Parzyszek <kparzysz at codeaurora.org> wrote: >>> >>> Yes, immediately after branch folding the code would still behave the same as the original. At the same time, any subsequent optimization may "exploit" the incorrect liveness information to do something bad. If you add -run-pass if-converter, you'll get: >>> >>> # After If Converter >>> # Machine code for function fred: IsSSA, NoPHIs, TracksLiveness, NoVRegs >>> >>> BB#0: >>> %R0<def> = L2_ploadruhf_io %P0<undef>, %R0<undef>, 0, %R0<imp-use> >>> PS_storerhabs 0, %R0 >>> PS_jmpret %R31<kill>, %PC<imp-def> >>> >>> # End machine code for function fred. >>> >>> *** Bad machine code: Using an undefined physical register *** >>> - function: fred >>> - basic block: BB#0 (0x411e8e8) >>> - instruction: %R0<def> = L2_ploadruhf_io >>> - operand 4: %R0<imp-use> >>> LLVM ERROR: Found 1 machine code errors. >>> >>> -Krzysztof >>> >>> >>> On 7/27/2017 1:47 PM, Quentin Colombet wrote: >>>> Why do you say the code is wrong? >>>> I can see why the MachineVerifier would complain, but the code seems to do the same thing as it was doing before the transformation, isn’t it? >>>>> On Jul 27, 2017, at 11:40 AM, Krzysztof Parzyszek via llvm-dev <llvm-dev at lists.llvm.org> wrote: >>>>> >>>>> The comment in test/CodeGen/X86/branchfolding-undef.mir states that such merging is legal, however doing so can actually generate wrong code: >>>>> >>>>> Consider this (valid code): >>>>> >>>>> --- >>>>> name: fred >>>>> tracksRegLiveness: true >>>>> >>>>> body: | >>>>> bb.0: >>>>> successors: %bb.1, %bb.2 >>>>> J2_jumpt undef %p0, %bb.2, implicit-def %pc >>>>> J2_jump %bb.1, implicit-def %pc >>>>> >>>>> bb.1: >>>>> successors: %bb.3 >>>>> %r0 = L2_loadruh_io undef %r0, 0 >>>>> PS_storerhabs 0, killed %r0 >>>>> J2_jump %bb.3, implicit-def %pc >>>>> >>>>> bb.2: >>>>> successors: %bb.3 >>>>> PS_storerhabs 0, undef %r0 >>>>> J2_jump %bb.3, implicit-def %pc >>>>> >>>>> bb.3: >>>>> PS_jmpret killed %r31, implicit-def %pc >>>>> ... >>>>> >>>>> >>>>> Run: >>>>> llc -march=hexagon -run-pass branch-folder test.mir -o - >>>>> >>>>> >>>>> We get an invalid code: >>>>> >>>>> body: | >>>>> bb.0: >>>>> successors: %bb.1(0x40000000), %bb.2(0x40000000) >>>>> J2_jumpt undef %p0, %bb.2, implicit-def %pc >>>>> >>>>> bb.1: >>>>> successors: %bb.2(0x80000000) >>>>> %r0 = L2_loadruh_io undef %r0, 0 >>>>> >>>>> bb.2: >>>>> liveins: %r0 ; Invalid: r0 is not live out of bb.0 >>>>> PS_storerhabs 0, %r0 >>>>> PS_jmpret killed %r31, implicit-def %pc >>>>> ... >>>>> >>>>> >>>>> Should we disallow merging undef with non-undef? >>>>> >>>>> -Krzysztof >>>>> >>>>> >>>>> -- >>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation >>>>> _______________________________________________ >>>>> LLVM Developers mailing list >>>>> llvm-dev at lists.llvm.org >>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >>> -- >>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation > > -- > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Krzysztof Parzyszek via llvm-dev
2017-Jul-28 19:18 UTC
[llvm-dev] Tail merging "undef" with a defined register: wrong code
On 7/28/2017 1:59 PM, Quentin Colombet wrote:> Hi Krzysztof, > > Thanks for digging into this. > >> On Jul 28, 2017, at 11:08 AM, Krzysztof Parzyszek <kparzysz at codeaurora.org> wrote: >> >> I've looked into that and it's not going to be simple, unfortunately. >> >> Here's the original example again: >> >> --- >> name: fred >> tracksRegLiveness: true >> >> body: | >> bb.0: >> successors: %bb.1, %bb.2 >> J2_jumpt undef %p0, %bb.2, implicit-def %pc >> J2_jump %bb.1, implicit-def %pc >> >> bb.1: >> successors: %bb.3 >> %r0 = L2_loadruh_io undef %r0, 0 >> PS_storerhabs 0, killed %r0 >> J2_jump %bb.3, implicit-def %pc >> >> bb.2: >> successors: %bb.3 >> PS_storerhabs 0, undef %r0 >> J2_jump %bb.3, implicit-def %pc >> >> bb.3: >> PS_jmpret killed %r31, implicit-def %pc >> ... >> >> >> If the merging process produced this intermediate code (I stripped some unimportant pieces): >> >> bb.1: >> %r0 = L2_loadruh_io undef %r0, 0 >> J2_jump %bb.tail >> >> bb.2: >> J2_jump %bb.tail >> >> bb.tail: >> liveins: %r0 >> PS_storerhabs 0, killed %r0 >> J2_jump %bb.3, implicit-def %pc >> >> Then we could insert IMPLICIT_DEF into bb.2 and everything would be fine. This is not what happens, though. In this case, the entire bb.2 will be used as the new tail leaving us without a good point to put the implicit def at. > > We should need to put the implicit_def only if the related register is dead (as in not live-out).Yes.> Thus, for this case we wouldn’t need to do anything, right?Why not? This is the same testcase as in the beginning of the thread. In the actual output produced by tail merging, r0 is not live out of bb.0, but is live on entry to one of its successors. Even with the hypothetical code, r0 is live on entry to bb.tail, but not live on exit from bb.2. Both cases need some fixes. Since I've removed the old quotes for brevity, this is the current output: body: | bb.0: successors: %bb.1, %bb.2 J2_jumpt undef %p0, %bb.2, implicit-def %pc bb.1: successors: %bb.2 %r0 = L2_loadruh_io undef %r0, 0 bb.2: liveins: %r0 PS_storerhabs 0, %r0 PS_jmpret killed %r31, implicit-def %pc ... -Krzysztof -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Apparently Analagous Threads
- Tail merging "undef" with a defined register: wrong code
- Tail merging "undef" with a defined register: wrong code
- Tail merging "undef" with a defined register: wrong code
- [hexagon][PowerPC] code regression (sub-optimal code) on LLVM 9 when generating hardware loops, and the "llvm.uadd" intrinsic.
- Reaching definitions on Machine IR post register allocation