Francois Pichet
2015-Feb-22 23:34 UTC
[LLVMdev] Question about shouldMergeGEPs in InstructionCombining
Hello I am not sure I understand the logic for merging GEPs in InstructionCombining.cpp: static bool shouldMergeGEPs(GEPOperator &GEP, GEPOperator &Src) { // If this GEP has only 0 indices, it is the same pointer as // Src. If Src is not a trivial GEP too, don't combine // the indices. if (GEP.hasAllZeroIndices() && !Src.hasAllZeroIndices() && !Src.hasOneUse()) return false; return true; } I have a case where GEP: %arrayidx7 = getelementptr inbounds i32* %arrayidx, i32 %shl6 Src: %arrayidx = getelementptr inbounds [4096 x i32]* @phasor_4096, i32 0, i32 %shl2 GEP.hasAllZeroIndices() will return false and the merge will occur Why do we want to combine these 2 getelementptr? On my out of tree target, combining these 2 GetElementPtr create a performance regression because since GEP is in a loop (Src is out of loop), GEP will lower to a more complicated address for a subsequent load. (the complicated address needs to be calculated over and over in the loop) Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150222/010b0393/attachment.html>
Hal Finkel
2015-Feb-23 19:17 UTC
[LLVMdev] Question about shouldMergeGEPs in InstructionCombining
----- Original Message -----> From: "Francois Pichet" <pichet2000 at gmail.com> > To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu> > Sent: Sunday, February 22, 2015 5:34:11 PM > Subject: [LLVMdev] Question about shouldMergeGEPs in InstructionCombining > > Hello > > I am not sure I understand the logic for merging GEPs in > InstructionCombining.cpp: > > static bool shouldMergeGEPs (GEPOperator &GEP, GEPOperator &Src) { > // If this GEP has only 0 indices, it is the same pointer as > // Src. If Src is not a trivial GEP too, don't combine > // the indices. > if (GEP.hasAllZeroIndices() && !Src.hasAllZeroIndices() && > !Src.hasOneUse()) > return false; > return true; > } > > > > I have a case where > GEP: %arrayidx7 = getelementptr inbounds i32* %arrayidx, i32 %shl6 > Src: %arrayidx = getelementptr inbounds [4096 x i32]* @phasor_4096, > i32 0, i32 %shl2 > > > GEP. hasAllZeroIndices() will return false and the merge will occur > Why do we want to combine these 2 getelementptr? > > > On my out of tree target, combining these 2 GetElementPtr create a > performance regression because since GEP is in a loop (Src is out of > loop), GEP will lower to a more complicated address for a subsequent > load. (the complicated address needs to be calculated over and over > in the loop) >There are a couple of issues here. One, InstCombine's job is the move the IR toward a reasonable canonical form. It is doing that here, and I think that's the right thing to do. However, the problem you point out is a general one. Can you please explain why the MachineLICM pass is not able to hoist the redundant parts of the addressing computation out of the loop? We might also want to adjust for this in CodeGenPrep (CGP already has addressing-mode aware GEP splitting logic, although not quite for this case). -Hal> > Thanks. > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
Francois Pichet
2015-Feb-24 12:17 UTC
[LLVMdev] Question about shouldMergeGEPs in InstructionCombining
On Mon, Feb 23, 2015 at 2:17 PM, Hal Finkel <hfinkel at anl.gov> wrote:> ----- Original Message ----- > > From: "Francois Pichet" <pichet2000 at gmail.com> > > To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu> > > Sent: Sunday, February 22, 2015 5:34:11 PM > > Subject: [LLVMdev] Question about shouldMergeGEPs in InstructionCombining > > > > Hello > > > > I am not sure I understand the logic for merging GEPs in > > InstructionCombining.cpp: > > > > static bool shouldMergeGEPs (GEPOperator &GEP, GEPOperator &Src) { > > // If this GEP has only 0 indices, it is the same pointer as > > // Src. If Src is not a trivial GEP too, don't combine > > // the indices. > > if (GEP.hasAllZeroIndices() && !Src.hasAllZeroIndices() && > > !Src.hasOneUse()) > > return false; > > return true; > > } > > > > > > > > I have a case where > > GEP: %arrayidx7 = getelementptr inbounds i32* %arrayidx, i32 %shl6 > > Src: %arrayidx = getelementptr inbounds [4096 x i32]* @phasor_4096, > > i32 0, i32 %shl2 > > > > > > GEP. hasAllZeroIndices() will return false and the merge will occur > > Why do we want to combine these 2 getelementptr? > > > > > > On my out of tree target, combining these 2 GetElementPtr create a > > performance regression because since GEP is in a loop (Src is out of > > loop), GEP will lower to a more complicated address for a subsequent > > load. (the complicated address needs to be calculated over and over > > in the loop) > > > > There are a couple of issues here. One, InstCombine's job is the move the > IR toward a reasonable canonical form. It is doing that here, and I think > that's the right thing to do. However, the problem you point out is a > general one. Can you please explain why the MachineLICM pass is not able to > hoist the redundant parts of the addressing computation out of the loop? We > might also want to adjust for this in CodeGenPrep (CGP already has > addressing-mode aware GEP splitting logic, although not quite for this > case). > >Hi Hal, MachineLICM is not able to hoist anything because the address mode is not loop invariant. Here is a reduction of the code I am talking about. extern const unsigned phasor[4096]; void test(unsigned* out , unsigned step_size) { unsigned big_step_size = step_size<<2; int *phasor_ptr_temp_1 = &phasor[big_step_size]; for (int i = 0 ; i < 1020 ; i+=4) out[i] = phasor_ptr_temp_1[i<<step_size]; } I am getting slightly better code on my target (Octasic's Opus) if I return false for shouldMergeGEPs. I just tried with ppc64 and x86-64 and I am also getting better code without the GEP merging in InstructionCombining. I am not sure what the solution is yet but I think we are too aggressive when merging GEPs in InstructionCombining. Here is the details for ppc64: http://pastie.org/9978022 -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150224/606b7b39/attachment.html>
Possibly Parallel Threads
- [LLVMdev] Question about shouldMergeGEPs in InstructionCombining
- [LLVMdev] Question about shouldMergeGEPs in InstructionCombining
- [LLVMdev] Question about shouldMergeGEPs in InstructionCombining
- [LLVMdev] Question about shouldMergeGEPs in InstructionCombining
- [LLVMdev] Question about shouldMergeGEPs in InstructionCombining