Hal Finkel
2015-Mar-16 17:09 UTC
[LLVMdev] Question about shouldMergeGEPs in InstructionCombining
----- Original Message -----> From: "Jingyue Wu" <jingyue at google.com> > To: "Daniel Berlin" <dberlin at dberlin.org>, "Mark Heffernan" <meheff at google.com>, "Hal Finkel" <hfinkel at anl.gov> > Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu> > Sent: Friday, March 13, 2015 1:31:59 PM > Subject: Re: [LLVMdev] Question about shouldMergeGEPs in InstructionCombining > > > Hi Daniel, > > Thanks! I wonder if there's a way to reuse some code in > Reassociation. Looks like most of the logic we want to implement is > in Reassociate.cpp already. But the entire pass seems too expensive > to run with CGP or after each instcombine. >I don't think that the algorithm itself is too expensive for CGP (and we already do dominance-aware GEP splitting in CGP), we don't do any of this at -O0 anyway, but we need something that specifically handles GEPs. For most operations (adds, etc.) reassociating in a way likely to enable LICM is, I believe, already our canonical IR form, and so we do that as part of the normal optimization pipeline. For GEPs, it is anti-canonical, and so GEPs are the special case that needs handling in CGP where we perform these kinds of anti-canonical transformations. -Hal> > Jingyue > > On Fri, Mar 13, 2015 at 10:43 AM Daniel Berlin < dberlin at dberlin.org > > wrote: > > > > > On Fri, Mar 13, 2015 at 10:16 AM Mark Heffernan < meheff at google.com > > wrote: > > > > > > On Thu, Mar 12, 2015 at 2:34 PM, Hal Finkel < hfinkel at anl.gov > > wrote: > > > It is not clear to me at all that preventing the merging is the right > solution. There are a large number of analysis, including alias > analysis, and optimizations that use GetUnderlyingObject, and > related routines to search back through GEPs. They only do this up > to some small finite depth (six, IIRC). So reducing the GEP depth is > likely the right solution for InstCombine (which has the job of > canonicalizing the IR). > > We should, however, pull these apart somewhere, and probably in some > way that is address-mode aware. I'd recommend trying to split > non-free (via the addressing-mode) loop-invariant parts of GEPs out > of loops in CodeGenPrep. > > > > > > > Thanks, Hal. I'll have a look at CGP to see how this might be done. > It's a little more complicated than just pulling the GEP apart, > there needs to be a loop-invariant-aware reassociation to undo the > damage done by the initial merge. > > > > > So, this is in fact, just using the ranks reassociation would > normally use, to do splitting :) > > > That is, even outside of geps, you end up with chains of operations > where, if you moved the operands around, you would expose > loop-invariant calculation. > > > > Reassociation builds a rank map ordered by RPO traversal of the CFG, > and uses it to place operations at the same rank into the same > expression. This guarantees deeper loops have higher ranks. > For your purposes, if you have it calculated already, you could just > use loop depth instead of RPO ordering, since you only care about > invariantness (the downside to not using RPO here is that you may > increase register pressure. The upside is it's easier to reason > about the ranks for your purposes). > > > Anyway, reassociate tries to place operations with the lowest ranks > into leaf computations, since those are "the most loop invariant". > > > You want to do exactly the same thing, with likely the same > algorithm, splitting geps as necessary into "leaf geps" to place > low-ranking operations in the same gep. > > > The only heuristic part is "what is the lowest rank you want to split > for". > If you have stuff at rank 0 and stuff not at rank 0, you always want > to split out the rank 0 stuff. > rank 1 and rank 2, well, if you are using loop depth, it can only be > invariant in a loop of rank 2, etc. > You don't need to split if rank == highest rank in functions, since > you are guaranteed it is not invariant in any loop in that case > ______________________________ _________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/ mailman/listinfo/llvmdev >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
Mark Heffernan
2015-Mar-30 18:08 UTC
[LLVMdev] Question about shouldMergeGEPs in InstructionCombining
I implemented this optimization within CGP and it helped several of our benchmarks as well as addressing Francois' original issue for powerpc. It's pretty general and shouldn't negatively impact the ability to take advantage of free address computation in the fancy addressing modes of some processors. However, when writing it I realized a better place to put this might be what is now called the "SeparateConstOffsetFromGEP" pass. It performs two related GEP optimizations late in compilation very close to CGP: reassociating constants in GEP expressions and splittling GEPs. This new pass would be able to share a good chunk of existing code as well (hoisting values out index expressions). Sound reasonable? Mark On Mon, Mar 16, 2015 at 10:09 AM, Hal Finkel <hfinkel at anl.gov> wrote:> ----- Original Message ----- > > From: "Jingyue Wu" <jingyue at google.com> > > To: "Daniel Berlin" <dberlin at dberlin.org>, "Mark Heffernan" < > meheff at google.com>, "Hal Finkel" <hfinkel at anl.gov> > > Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu> > > Sent: Friday, March 13, 2015 1:31:59 PM > > Subject: Re: [LLVMdev] Question about shouldMergeGEPs in > InstructionCombining > > > > > > Hi Daniel, > > > > Thanks! I wonder if there's a way to reuse some code in > > Reassociation. Looks like most of the logic we want to implement is > > in Reassociate.cpp already. But the entire pass seems too expensive > > to run with CGP or after each instcombine. > > > > I don't think that the algorithm itself is too expensive for CGP (and we > already do dominance-aware GEP splitting in CGP), we don't do any of this > at -O0 anyway, but we need something that specifically handles GEPs. For > most operations (adds, etc.) reassociating in a way likely to enable LICM > is, I believe, already our canonical IR form, and so we do that as part of > the normal optimization pipeline. For GEPs, it is anti-canonical, and so > GEPs are the special case that needs handling in CGP where we perform these > kinds of anti-canonical transformations. > > -Hal > > > > > Jingyue > > > > On Fri, Mar 13, 2015 at 10:43 AM Daniel Berlin < dberlin at dberlin.org > > > wrote: > > > > > > > > > > On Fri, Mar 13, 2015 at 10:16 AM Mark Heffernan < meheff at google.com > > > wrote: > > > > > > > > > > > > On Thu, Mar 12, 2015 at 2:34 PM, Hal Finkel < hfinkel at anl.gov > > > wrote: > > > > > > It is not clear to me at all that preventing the merging is the right > > solution. There are a large number of analysis, including alias > > analysis, and optimizations that use GetUnderlyingObject, and > > related routines to search back through GEPs. They only do this up > > to some small finite depth (six, IIRC). So reducing the GEP depth is > > likely the right solution for InstCombine (which has the job of > > canonicalizing the IR). > > > > We should, however, pull these apart somewhere, and probably in some > > way that is address-mode aware. I'd recommend trying to split > > non-free (via the addressing-mode) loop-invariant parts of GEPs out > > of loops in CodeGenPrep. > > > > > > > > > > > > > > Thanks, Hal. I'll have a look at CGP to see how this might be done. > > It's a little more complicated than just pulling the GEP apart, > > there needs to be a loop-invariant-aware reassociation to undo the > > damage done by the initial merge. > > > > > > > > > > So, this is in fact, just using the ranks reassociation would > > normally use, to do splitting :) > > > > > > That is, even outside of geps, you end up with chains of operations > > where, if you moved the operands around, you would expose > > loop-invariant calculation. > > > > > > > > Reassociation builds a rank map ordered by RPO traversal of the CFG, > > and uses it to place operations at the same rank into the same > > expression. This guarantees deeper loops have higher ranks. > > For your purposes, if you have it calculated already, you could just > > use loop depth instead of RPO ordering, since you only care about > > invariantness (the downside to not using RPO here is that you may > > increase register pressure. The upside is it's easier to reason > > about the ranks for your purposes). > > > > > > Anyway, reassociate tries to place operations with the lowest ranks > > into leaf computations, since those are "the most loop invariant". > > > > > > You want to do exactly the same thing, with likely the same > > algorithm, splitting geps as necessary into "leaf geps" to place > > low-ranking operations in the same gep. > > > > > > The only heuristic part is "what is the lowest rank you want to split > > for". > > If you have stuff at rank 0 and stuff not at rank 0, you always want > > to split out the rank 0 stuff. > > rank 1 and rank 2, well, if you are using loop depth, it can only be > > invariant in a loop of rank 2, etc. > > You don't need to split if rank == highest rank in functions, since > > you are guaranteed it is not invariant in any loop in that case > > ______________________________ _________________ > > LLVM Developers mailing list > > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > > http://lists.cs.uiuc.edu/ mailman/listinfo/llvmdev > > > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150330/b47c061c/attachment.html>
Jingyue Wu
2015-Mar-30 18:26 UTC
[LLVMdev] Question about shouldMergeGEPs in InstructionCombining
SGTM. I think SeparateConstOffsetFromGEP is a nice place to be. The part that reassociates constants seems subsumed by your new optimization which AFAIK ranks nodes in expression trees and reorganizes them per the ranking. Also, this pass should be renamed to something like GEPReassociate after your optimization gets in. Jingyue On Mon, Mar 30, 2015 at 11:08 AM Mark Heffernan <meheff at google.com> wrote:> I implemented this optimization within CGP and it helped several of our > benchmarks as well as addressing Francois' original issue for powerpc. > It's pretty general and shouldn't negatively impact the ability to take > advantage of free address computation in the fancy addressing modes of some > processors. > > However, when writing it I realized a better place to put this might be > what is now called the "SeparateConstOffsetFromGEP" pass. It performs two > related GEP optimizations late in compilation very close to CGP: > reassociating constants in GEP expressions and splittling GEPs. This new > pass would be able to share a good chunk of existing code as well (hoisting > values out index expressions). Sound reasonable? > > Mark > > > > On Mon, Mar 16, 2015 at 10:09 AM, Hal Finkel <hfinkel at anl.gov> wrote: > >> ----- Original Message ----- >> > From: "Jingyue Wu" <jingyue at google.com> >> > To: "Daniel Berlin" <dberlin at dberlin.org>, "Mark Heffernan" < >> meheff at google.com>, "Hal Finkel" <hfinkel at anl.gov> >> > Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu> >> > Sent: Friday, March 13, 2015 1:31:59 PM >> > Subject: Re: [LLVMdev] Question about shouldMergeGEPs in >> InstructionCombining >> > >> > >> > Hi Daniel, >> > >> > Thanks! I wonder if there's a way to reuse some code in >> > Reassociation. Looks like most of the logic we want to implement is >> > in Reassociate.cpp already. But the entire pass seems too expensive >> > to run with CGP or after each instcombine. >> > >> >> I don't think that the algorithm itself is too expensive for CGP (and we >> already do dominance-aware GEP splitting in CGP), we don't do any of this >> at -O0 anyway, but we need something that specifically handles GEPs. For >> most operations (adds, etc.) reassociating in a way likely to enable LICM >> is, I believe, already our canonical IR form, and so we do that as part of >> the normal optimization pipeline. For GEPs, it is anti-canonical, and so >> GEPs are the special case that needs handling in CGP where we perform these >> kinds of anti-canonical transformations. >> >> -Hal >> >> > >> > Jingyue >> > >> > On Fri, Mar 13, 2015 at 10:43 AM Daniel Berlin < dberlin at dberlin.org >> > > wrote: >> > >> > >> > >> > >> > On Fri, Mar 13, 2015 at 10:16 AM Mark Heffernan < meheff at google.com > >> > wrote: >> > >> > >> > >> > >> > >> > On Thu, Mar 12, 2015 at 2:34 PM, Hal Finkel < hfinkel at anl.gov > >> > wrote: >> > >> > >> > It is not clear to me at all that preventing the merging is the right >> > solution. There are a large number of analysis, including alias >> > analysis, and optimizations that use GetUnderlyingObject, and >> > related routines to search back through GEPs. They only do this up >> > to some small finite depth (six, IIRC). So reducing the GEP depth is >> > likely the right solution for InstCombine (which has the job of >> > canonicalizing the IR). >> > >> > We should, however, pull these apart somewhere, and probably in some >> > way that is address-mode aware. I'd recommend trying to split >> > non-free (via the addressing-mode) loop-invariant parts of GEPs out >> > of loops in CodeGenPrep. >> > >> > >> > >> > >> > >> > >> > Thanks, Hal. I'll have a look at CGP to see how this might be done. >> > It's a little more complicated than just pulling the GEP apart, >> > there needs to be a loop-invariant-aware reassociation to undo the >> > damage done by the initial merge. >> > >> > >> > >> > >> > So, this is in fact, just using the ranks reassociation would >> > normally use, to do splitting :) >> > >> > >> > That is, even outside of geps, you end up with chains of operations >> > where, if you moved the operands around, you would expose >> > loop-invariant calculation. >> > >> > >> > >> > Reassociation builds a rank map ordered by RPO traversal of the CFG, >> > and uses it to place operations at the same rank into the same >> > expression. This guarantees deeper loops have higher ranks. >> > For your purposes, if you have it calculated already, you could just >> > use loop depth instead of RPO ordering, since you only care about >> > invariantness (the downside to not using RPO here is that you may >> > increase register pressure. The upside is it's easier to reason >> > about the ranks for your purposes). >> > >> > >> > Anyway, reassociate tries to place operations with the lowest ranks >> > into leaf computations, since those are "the most loop invariant". >> > >> > >> > You want to do exactly the same thing, with likely the same >> > algorithm, splitting geps as necessary into "leaf geps" to place >> > low-ranking operations in the same gep. >> > >> > >> > The only heuristic part is "what is the lowest rank you want to split >> > for". >> > If you have stuff at rank 0 and stuff not at rank 0, you always want >> > to split out the rank 0 stuff. >> > rank 1 and rank 2, well, if you are using loop depth, it can only be >> > invariant in a loop of rank 2, etc. >> > You don't need to split if rank == highest rank in functions, since >> > you are guaranteed it is not invariant in any loop in that case >> > ______________________________ _________________ >> > LLVM Developers mailing list >> > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu >> > http://lists.cs.uiuc.edu/ mailman/listinfo/llvmdev >> > >> >> -- >> Hal Finkel >> Assistant Computational Scientist >> Leadership Computing Facility >> Argonne National Laboratory >> > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150330/1c110625/attachment.html>