Gerolf Hoflehner via llvm-dev
2015-Oct-02 17:40 UTC
[llvm-dev] Register Spill Caused by the Reassociation pass
This conflict exists with many optimizations, including copy propagation, coalescing, hoisting, etc. Each could increase register pressure, with similar impact. Attempts to control register pressure locally (within an optimization pass) tend to get hard to tune and maintain. Would a better way be to describe, e.g. in metadata, how to undo an optimization? Optimizations that attempt to reduce pressure, like splitting or remat, could be hooked up and call an undo routine based on a cost model.

I think there is time to do something longer term. This particular instance can only be an issue under -fast-math.

Cheers
Gerolf

> On Oct 1, 2015, at 9:27 AM, Sanjay Patel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Hi Haicheng,
>
> We need to prevent the transform if it causes spilling, but I'm not sure yet what mechanism/heuristic we can use to do that.
> Can you file a bug report with a reduced test case?
>
> Thanks!
>
> On Thu, Oct 1, 2015 at 9:10 AM, Haicheng Wu <haicheng at codeaurora.com> wrote:
> > Hi Sanjay,
> >
> > I observed some extra register spills when applying the reassociation pass on spec2006 benchmarks and I would like to hear your advice.
> >
> > For example, function get_new_point_on_quad() of tria_boundary.cc in spec2006/dealII has a sequence of code like this
> >
> >     …
> >     X=a+b
> >     …
> >     Y=X+c
> >     …
> >     Z=Y+d
> >     …
> >
> > There are many other instructions between these float adds. The reassociation pass first swaps a and c when checking the second add, and then swaps a and d when checking the third add. The transformed code looks like
> >
> >     …
> >     X=c+b
> >     …
> >     Y=X+d
> >     …
> >     Z=Y+a
> >
> > a is pushed all the way down to the bottom and its live range is much larger now.
> >
> > Best,
> >
> > Haicheng
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
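For readers who want to see the live-range effect Haicheng describes in concrete terms, here is a minimal sketch that is not part of the original thread. It records the last instruction that reads each value in the two orderings. The FILLER entries are hypothetical stand-ins for the "…" gaps, and the model assumes a, b, c and d are all available on entry to the block, which the original message does not state.

```python
from typing import Dict, List, Tuple

# One instruction: (name it defines, names it reads).
Instr = Tuple[str, Tuple[str, ...]]


def last_uses(block: List[Instr]) -> Dict[str, int]:
    """Map each value to the index of the last instruction that reads it."""
    last: Dict[str, int] = {}
    for index, (_dest, uses) in enumerate(block):
        for name in uses:
            last[name] = index
    return last


# Hypothetical unrelated work standing in for the "..." between the adds.
FILLER: Instr = ("tmp", ())

# X=a+b ... Y=X+c ... Z=Y+d
original: List[Instr] = [
    ("X", ("a", "b")), FILLER, FILLER,
    ("Y", ("X", "c")), FILLER, FILLER,
    ("Z", ("Y", "d")),
]

# X=c+b ... Y=X+d ... Z=Y+a
reassociated: List[Instr] = [
    ("X", ("c", "b")), FILLER, FILLER,
    ("Y", ("X", "d")), FILLER, FILLER,
    ("Z", ("Y", "a")),
]

before = last_uses(original)
after = last_uses(reassociated)
print("last use of a:", before["a"], "->", after["a"])
```

In this toy model, a stays live from the first add down to the last one after the swap. Whether that actually costs a spill depends on everything else that is live across the gaps, which the sketch does not model.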
Sanjay Patel via llvm-dev
2015-Oct-02 23:09 UTC
[llvm-dev] Register Spill Caused by the Reassociation pass
The test case in the bug report exposes at least one problem, but it's not the presumed problem of spilling.

Reduced example based on the PR attachment:

    define double @foo_calls_bar_4_times_and_sums_the_results() {
      %a = call double @bar()
      %b = call double @bar()
      %t0 = fadd double %a, %b
      %c = call double @bar()
      %t1 = fadd double %t0, %c
      %d = call double @bar()
      %t2 = fadd double %t1, %d
      ret double %t2
    }

I don't think we're ever going to induce any extra spilling in a case like this. The default (any?) x86-64 ABI requires spilling because no SSE registers are preserved across function calls. So we get 3 spills regardless of any reassociation of the adds:

    $ ./llc -mtriple=x86_64-unknown-unknown -mcpu=x86-64 -mattr=avx -o - 25016.ll
        callq   bar
        vmovsd  %xmm0, (%rsp)           # 8-byte Spill
        callq   bar
        vaddsd  (%rsp), %xmm0, %xmm0    # 8-byte Folded Reload
        vmovsd  %xmm0, (%rsp)           # 8-byte Spill
        callq   bar
        vaddsd  (%rsp), %xmm0, %xmm0    # 8-byte Folded Reload
        vmovsd  %xmm0, (%rsp)           # 8-byte Spill
        callq   bar
        vaddsd  (%rsp), %xmm0, %xmm0    # 8-byte Folded Reload

If we enable reassociation via -enable-unsafe-fp-math, we still have 3 spills:

        callq   bar
        vmovsd  %xmm0, 16(%rsp)         # 8-byte Spill
        callq   bar
        vmovsd  %xmm0, 8(%rsp)          # 8-byte Spill
        callq   bar
        vaddsd  8(%rsp), %xmm0, %xmm0   # 8-byte Folded Reload
        vmovsd  %xmm0, 8(%rsp)          # 8-byte Spill
        callq   bar
        vaddsd  8(%rsp), %xmm0, %xmm0   # 8-byte Folded Reload
        vaddsd  16(%rsp), %xmm0, %xmm0  # 8-byte Folded Reload

This looks like what is described in the original problem: the adds got reassociated for no benefit (and possibly some harm, although it may be out-of-scope for the MachineCombiner pass).

We wanted to add the results of the first 2 function calls, add the results of the last 2 function calls, and then add those 2 results to reduce the critical path. Instead, we got:

    ((b + c) + d) + a

This shows that either the cost calculation in the MachineCombiner is wrong or the results coming back from MachineTraceMetrics are wrong. Or maybe MachineCombiner should be bailing out of a situation like this in the first place - are we even allowed to move instructions around those function calls?

Here's where it gets worse - if the adds are already arranged to reduce the critical path:

    define double @foo4_reassociated() {
      %a = call double @bar()
      %b = call double @bar()
      %c = call double @bar()
      %d = call double @bar()
      %t0 = fadd double %a, %b
      %t1 = fadd double %c, %d
      %t2 = fadd double %t0, %t1
      ret double %t2
    }

The MachineCombiner is *increasing* the critical path by reassociating the operands:

        callq   bar
        vmovsd  %xmm0, 16(%rsp)         # 8-byte Spill
        callq   bar
        vmovsd  %xmm0, 8(%rsp)          # 8-byte Spill
        callq   bar
        vmovsd  %xmm0, (%rsp)           # 8-byte Spill
        callq   bar
        vaddsd  (%rsp), %xmm0, %xmm0    # 8-byte Folded Reload
        vaddsd  8(%rsp), %xmm0, %xmm0   # 8-byte Folded Reload
        vaddsd  16(%rsp), %xmm0, %xmm0  # 8-byte Folded Reload

    (a + b) + (c + d) --> ((d + c) + b) + a

I think this is a problem calculating and/or using the "instruction slack" in MachineTraceMetrics.
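To make the critical-path comparison in Sanjay's message concrete, here is a minimal sketch that is not part of the original thread. It computes when the final sum becomes available for a given association of the adds. The unit add latency and the "all operands ready at time 0" table are illustrative assumptions only. This is not how MachineTraceMetrics computes depth or slack, and real vaddsd latency is several cycles.

```python
from typing import Dict, Tuple, Union

# An expression is a leaf name, or an add of two sub-expressions.
Expr = Union[str, Tuple["Expr", "Expr"]]


def finish_time(expr: Expr, ready: Dict[str, int], add_latency: int = 1) -> int:
    """Earliest time at which the value of expr is available.

    ready gives the time at which each leaf operand becomes available.
    add_latency is an assumed, uniform latency for every add.
    """
    if isinstance(expr, str):
        return ready[expr]
    lhs, rhs = expr
    operands_ready = max(
        finish_time(lhs, ready, add_latency),
        finish_time(rhs, ready, add_latency),
    )
    return operands_ready + add_latency


# Assumption: all four call results are already available at time 0.
all_ready = {"a": 0, "b": 0, "c": 0, "d": 0}

balanced = (("a", "b"), ("c", "d"))   # (a + b) + (c + d)
chain = ((("d", "c"), "b"), "a")      # ((d + c) + b) + a

print("balanced:", finish_time(balanced, all_ready))  # 2
print("chain:   ", finish_time(chain, all_ready))     # 3
```

Under these assumptions the balanced form finishes one add latency earlier than the serial chain, which matches the direction of the regression reported for @foo4_reassociated. Changing the ready table to stagger the operands (for example, to mimic calls that complete one after another) changes which shape wins, so the numbers here should not be read as a statement about the actual generated code.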
Gerolf Hoflehner via llvm-dev
2015-Oct-05 17:22 UTC
[llvm-dev] Register Spill Caused by the Reassociation pass
The machine combiner does not see spills. Perhaps there is a phase ordering issue. From the analysis here I don’t see an explanation for a performance loss (the potential increase in register pressure did make sense to me, though).

-Gerolf