Chuang-Yu Cheng via llvm-dev
2021-Sep-29 08:52 UTC
[llvm-dev] Instcombine-code-sinking increases the value’s live range
Hi,

By default, InstCombinePass tries to sink an instruction into a successor basic block when possible, so that the instruction isn't executed on a path where its result isn't needed. But doing so can also increase a value's live range. For example:

```llvm
entry:
  ..
  %6 = load float, ..
  %s.0 = load float, ..
  %mul22 = fmul float %6, %s.0
  %add23 = fadd float %mul22, zeroinitializer

  %7 = load float, ..
  %s.1 = load float, ..
  %mul26 = fmul float %7, %s.1
  %add27 = fadd float %add23, %mul26

  ..
  br i1 %cmp, label %cleanup, label %if.end1

if.end1:
  %15 = load float, ..
  %add67 = fadd float %add27, %15
  store float %add67, ..
  br label %cleanup

cleanup:
  return
```

In the original input, only %add27 has a long live range (it is live across the branch into %if.end1). But after InstCombine with instcombine-code-sinking=true (the default), %6, %s.0, %7, and %s.1 have long live ranges instead:

```llvm
entry:
  ..
  %6 = load float, ..
  %s.0 = load float, ..

  %7 = load float, ..
  %s.1 = load float, ..

  ..
  br i1 %cmp, label %cleanup, label %if.end1

if.end1:
  %mul22 = fmul float %6, %s.0
  %add23 = fadd float %mul22, zeroinitializer

  %mul26 = fmul float %7, %s.1
  %add27 = fadd float %add23, %mul26

  %15 = load float, ..
  %add67 = fadd float %add27, %15
  store float %add67, ..
  br label %cleanup

cleanup:
  return
```

This causes our custom register allocator to keep values such as %6, %s.0, %7, and %s.1 in registers for a long time.

My questions are:

Does LLVM expect the backend's instruction scheduler and register allocator to handle this properly?

Can this be solved by LLVM's GlobalISel?

Thank you!
CY
Chuang-Yu Cheng via llvm-dev
2021-Oct-13 09:20 UTC
Answering my own question :P

The original input pattern is as below:

```c
// local memory
for (...) {
  // a function (with side effects) which copies from global to local memory
  // access data in local memory and compute
}
if (...)
  return;
// store the computed result back
```

If the for loop is fully unrolled and the computation is sunk into the basic block that stores the computed result back, then the backend has to find somewhere (registers or memory) to hold all of the copied data until that block. I've tested with aarch64 and amdgcn, and for this test pattern both targets spill the data to memory.

If the for loop copies the data directly instead of calling a copy function, both targets generate better basic block layouts (aarch64: the "Machine code sinking (machine-sink)" pass; amdgcn: the "Code sinking (sink)" pass).
Amara Emerson via llvm-dev
2021-Oct-14 05:02 UTC
> On Sep 29, 2021, at 1:52 AM, Chuang-Yu Cheng via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Can this be solved by llvm’s GlobalISel?

GlobalISel’s function-scope optimization doesn’t really help in these cases unless the target can somehow fold expressions into simpler instructions.
If that’s not possible, the generated code should be fairly similar to that of SelectionDAG.