Hal Finkel
2015-Feb-25 21:41 UTC
[LLVMdev] Question about shouldMergeGEPs in InstructionCombining
----- Original Message -----> From: "Hal Finkel" <hfinkel at anl.gov> > To: "Francois Pichet" <pichet2000 at gmail.com> > Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>, "chandlerc" <chandlerc at gmail.com> > Sent: Tuesday, February 24, 2015 11:27:43 PM > Subject: Re: [LLVMdev] Question about shouldMergeGEPs in InstructionCombining > > w----- Original Message ----- > > From: "Francois Pichet" <pichet2000 at gmail.com> > > To: "Hal Finkel" <hfinkel at anl.gov> > > Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu> > > Sent: Tuesday, February 24, 2015 6:17:22 AM > > Subject: Re: [LLVMdev] Question about shouldMergeGEPs in > > InstructionCombining > > > > On Mon, Feb 23, 2015 at 2:17 PM, Hal Finkel < hfinkel at anl.gov > > > wrote: > > > > > > ----- Original Message ----- > > > From: "Francois Pichet" < pichet2000 at gmail.com > > > > To: "LLVM Developers Mailing List" < llvmdev at cs.uiuc.edu > > > > Sent: Sunday, February 22, 2015 5:34:11 PM > > > Subject: [LLVMdev] Question about shouldMergeGEPs in > > > InstructionCombining > > > > > > Hello > > > > > > I am not sure I understand the logic for merging GEPs in > > > InstructionCombining.cpp: > > > > > > static bool shouldMergeGEPs (GEPOperator &GEP, GEPOperator &Src) > > > { > > > // If this GEP has only 0 indices, it is the same pointer as > > > // Src. If Src is not a trivial GEP too, don't combine > > > // the indices. > > > if (GEP.hasAllZeroIndices() && !Src.hasAllZeroIndices() && > > > !Src.hasOneUse()) > > > return false; > > > return true; > > > } > > > > > > > > > > > > I have a case where > > > GEP: %arrayidx7 = getelementptr inbounds i32* %arrayidx, i32 > > > %shl6 > > > Src: %arrayidx = getelementptr inbounds [4096 x i32]* > > > @phasor_4096, > > > i32 0, i32 %shl2 > > > > > > > > > GEP. hasAllZeroIndices() will return false and the merge will > > > occur > > > Why do we want to combine these 2 getelementptr? > > > > > > > > > On my out of tree target, combining these 2 GetElementPtr create > > > a > > > performance regression because since GEP is in a loop (Src is out > > > of > > > loop), GEP will lower to a more complicated address for a > > > subsequent > > > load. (the complicated address needs to be calculated over and > > > over > > > in the loop) > > > > > > > There are a couple of issues here. One, InstCombine's job is the > > move > > the IR toward a reasonable canonical form. It is doing that here, > > and I think that's the right thing to do. However, the problem you > > point out is a general one. Can you please explain why the > > MachineLICM pass is not able to hoist the redundant parts of the > > addressing computation out of the loop? We might also want to > > adjust > > for this in CodeGenPrep (CGP already has addressing-mode aware GEP > > splitting logic, although not quite for this case). > > > > > > > > > > Hi Hal, > > > > > > MachineLICM is not able to hoist anything because the address mode > > is > > not loop invariant. > > > > > > Here is a reduction of the code I am talking about. > > > > > > > > extern const unsigned phasor[4096]; > > void test(unsigned* out , unsigned step_size) > > { > > unsigned big_step_size = step_size<<2; > > int *phasor_ptr_temp_1 = &phasor[big_step_size]; > > for (int i = 0 ; i < 1020 ; i+=4) > > out[i] = phasor_ptr_temp_1[i<<step_size]; > > } > > > > > > I am getting slightly better code on my target (Octasic's Opus) if > > I > > return false for shouldMergeGEPs. > > I just tried with ppc64 and x86-64 and I am also getting better > > code > > without the GEP merging in InstructionCombining. I am not sure what > > the solution is yet but I think we are too aggressive when merging > > GEPs in InstructionCombining. > > > > > > Here is the details for ppc64: http://pastie.org/9978022 > > > > > > First, thanks for posting this, it is quite useful. LLVM trunk's > PowerPC backend does a slightly better job now, but for an > irrelevant reason, so to stick with the code you generated, we have: > > .LBB0_1: # %for.body > # =>This Inner Loop Header: > Depth=1 > slw 8, 5, 4 > ld 9, .LC1 at toc@l(7) > addi 5, 5, 4 > add 8, 8, 6 > extsw 8, 8 > sldi 8, 8, 2 > lwzx 8, 9, 8 > addi 9, 3, 16 > stw 8, 0(3) > mr 3, 9 > bdnz .LBB0_1 > > there are two things wrong here, first: > > ld 9, .LC1 at toc@l(7) > > this load is loop invariant, and should be hoisted (but was not).The problem of the non-hoisted TOC load was a backend deficiency, now fixed in r230553. Thanks for providing this example. I'll now give some more thought to the GEP splitting issue. -Hal> But > more to your point, this: > > add 8, 8, 6 > > which represents this in your C code: > > int *phasor_ptr_temp_1 = &phasor[big_step_size]; > > has remained inside the loop. As you point out, the programmer even > hoisted it for us, and we sunk it into the loop ourselves. The > question is really this: How should we fix this? Generically, I > believe that using fewer GEPs makes a cleaner canonical form for the > IR as it passes through the mid-level optimizers, and we should > split the GEPs to hoist invariant parts out of loops later in the > pipeline (in or near CodeGenPrep, LSR, etc.) where we normally do > addressing-mode-sensitive transformations. > > -Hal > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
Mark Heffernan
2015-Mar-12 20:14 UTC
[LLVMdev] Question about shouldMergeGEPs in InstructionCombining
Coincidentally, I just ran into this same issue on some of our benchmarks for the NVPTX backend. You have something like this before instcombine: %tmp = getelementptr inbounds i32, i32* %input, i64 %offset loop: %loop_variant = ... %ptr = getelementptr inbounds i32, i32* %tmp, i64 %loop_variant Which gets transformed to: loop: %loop_variant = ... %sum = add nsw i64 %loop_variant, %offset %ptr = getelementptr inbounds i32, i32* %input, i64 %sum The merge essentially reassociates the loop-variant term (%loop_variant) and loop-invariant terms (%input and %offset) in such a way that LICM can't remove it. One idea is to only perform this style of gep merge if at least one of the following conditions is true: (1) both index terms in the GEP are constant. In this case no new add instruction is created, instead the constants are folded. (2) the GEPs are in the same BB. (3) LoopInfo is available, and we know we're not creating a new instruction in a (deeper) loop. What do you think? Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150312/47af6f01/attachment.html>
Francois Pichet
2015-Mar-12 21:03 UTC
[LLVMdev] Question about shouldMergeGEPs in InstructionCombining
I think it would make sense for (1) and (2). I am not sure if (3) is feasible in instcombine. (I am not too familiar with LoopInfo) For the Octasic's Opus platform, I modified shouldMergeGEPs in our fork to: if (GEP.hasAllZeroIndices() && !Src.hasAllZeroIndices() && !Src.hasOneUse()) return false; return Src.hasAllConstantIndices(); // was return false; Following that change, I noticed some performance gain for a few specific tests and no regression at all in our (admittedly limited) benchmarks suite. Regards, Francois Pichet, Octasic. On Thu, Mar 12, 2015 at 4:14 PM, Mark Heffernan <meheff at google.com> wrote:> Coincidentally, I just ran into this same issue on some of our benchmarks > for the NVPTX backend. You have something like this before instcombine: > > %tmp = getelementptr inbounds i32, i32* %input, i64 %offset > loop: > %loop_variant = ... > %ptr = getelementptr inbounds i32, i32* %tmp, i64 %loop_variant > > Which gets transformed to: > > loop: > %loop_variant = ... > %sum = add nsw i64 %loop_variant, %offset > %ptr = getelementptr inbounds i32, i32* %input, i64 %sum > > The merge essentially reassociates the loop-variant term (%loop_variant) > and loop-invariant terms (%input and %offset) in such a way that LICM can't > remove it. > > One idea is to only perform this style of gep merge if at least one of the > following conditions is true: > (1) both index terms in the GEP are constant. In this case no new add > instruction is created, instead the constants are folded. > (2) the GEPs are in the same BB. > (3) LoopInfo is available, and we know we're not creating a new > instruction in a (deeper) loop. > > What do you think? > > Mark > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150312/76632ccb/attachment.html>