thr3ads.net - llvm dev - [LLVMdev] Question about shouldMergeGEPs in InstructionCombining [Mar 2015]

If this information is useful, please help other people find it:
Share via:

Francois Pichet

2015-Mar-12 21:03 UTC

[LLVMdev] Question about shouldMergeGEPs in InstructionCombining

I think it would make sense for (1) and (2). I am not sure if (3) is
feasible in instcombine. (I am not too familiar with LoopInfo)

For the Octasic's Opus platform, I modified shouldMergeGEPs in our fork to:

  if (GEP.hasAllZeroIndices() && !Src.hasAllZeroIndices() &&
      !Src.hasOneUse())
    return false;

  return Src.hasAllConstantIndices(); // was return false;

Following that change, I noticed some performance gain for a few specific
tests and no regression at all in our (admittedly limited) benchmarks suite.

Regards,
Francois Pichet, Octasic.

On Thu, Mar 12, 2015 at 4:14 PM, Mark Heffernan <meheff at google.com>
wrote:
> Coincidentally, I just ran into this same issue on some of our benchmarks
> for the NVPTX backend.  You have something like this before instcombine:
>
>   %tmp = getelementptr inbounds i32, i32* %input, i64 %offset
> loop:
>   %loop_variant = ...
>   %ptr = getelementptr inbounds i32, i32* %tmp, i64 %loop_variant
>
> Which gets transformed to:
>
> loop:
>   %loop_variant = ...
>   %sum = add nsw i64 %loop_variant, %offset
>   %ptr = getelementptr inbounds i32, i32* %input, i64 %sum
>
> The merge essentially reassociates the loop-variant term (%loop_variant)
> and loop-invariant terms (%input and %offset) in such a way that LICM
can't
> remove it.
>
> One idea is to only perform this style of gep merge if at least one of the
> following conditions is true:
> (1) both index terms in the GEP  are constant.  In this case no new add
> instruction is created, instead the constants are folded.
> (2) the GEPs are in the same BB.
> (3) LoopInfo is available, and we know we're not creating a new
> instruction in a (deeper) loop.
>
> What do you think?
>
> Mark
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150312/76632ccb/attachment.html>

Francois Pichet

2015-Mar-12 21:06 UTC

head link

[LLVMdev] Question about shouldMergeGEPs in InstructionCombining

On Thu, Mar 12, 2015 at 5:03 PM, Francois Pichet <pichet2000 at gmail.com>
wrote:
> I think it would make sense for (1) and (2). I am not sure if (3) is
> feasible in instcombine. (I am not too familiar with LoopInfo)
>
> For the Octasic's Opus platform, I modified shouldMergeGEPs in our fork
to:
>
>   if (GEP.hasAllZeroIndices() && !Src.hasAllZeroIndices()
&&
>       !Src.hasOneUse())
>     return false;
>
>   return Src.hasAllConstantIndices(); // was return false;
>
Sorry I meant            // was return true;
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150312/aa250419/attachment.html>

Mark Heffernan

2015-Mar-12 21:19 UTC

head link

[LLVMdev] Question about shouldMergeGEPs in InstructionCombining

On Thu, Mar 12, 2015 at 2:03 PM, Francois Pichet <pichet2000 at gmail.com>
wrote:
> I think it would make sense for (1) and (2). I am not sure if (3) is
> feasible in instcombine. (I am not too familiar with LoopInfo)
>
LoopInfo should be available for at least one of the instcombine
invocations.  In the -O2 pipeline it looks like instcombine is called 7
times and it should have loopinfo for the third invocation (called
immediate after some loop passes).  Here's output from
--debug-pass=Structure:

Pass Arguments:  -tti -no-aa -tbaa -scoped-noalias
-assumption-cache-tracker -targetlibinfo -basicaa -verify -simplifycfg
-domtree -sroa -early-cse -lower-expect
Target Transform Information
No Alias Analysis (always returns 'may' alias)
Type-Based Alias Analysis
Scoped NoAlias Alias Analysis
Assumption Cache Tracker
Target Library Information
Basic Alias Analysis (stateless AA impl)
  FunctionPass Manager
    Module Verifier
    Simplify the CFG
    Dominator Tree Construction
    SROA
    Early CSE
    Lower 'expect' Intrinsics
Pass Arguments:  -targetlibinfo -tti -no-aa -tbaa -scoped-noalias
-assumption-cache-tracker -basicaa -verify-di -ipsccp -globalopt
-deadargelim -domtree -instcombine -simplifycfg -basiccg -prune-eh
-inline-cost -i
nline -functionattrs -sroa -domtree -early-cse -lazy-value-info
-jump-threading -correlated-propagation -simplifycfg -domtree -instcombine
-tailcallelim -simplifycfg -reassociate -domtree -loops -loop-simplify -lc
ssa -loop-rotate -licm -loop-unswitch -instcombine -scalar-evolution
-loop-simplify -lcssa -indvars -loop-idiom -loop-deletion -loop-unroll
-memdep -mldst-motion -domtree -memdep -gvn -memdep -memcpyopt -sccp -dom
tree -bdce -instcombine -lazy-value-info -jump-threading
-correlated-propagation -domtree -memdep -dse -loops -loop-simplify -lcssa
-licm -adce -simplifycfg -domtree -instcombine -barrier -domtree -loops
-loop-sim
plify -lcssa -loop-rotate -branch-prob -block-freq -scalar-evolution
-loop-accesses -loop-vectorize -instcombine -scalar-evolution
-slp-vectorizer -simplifycfg -domtree -instcombine -loops -loop-simplify
-lcssa -s
calar-evolution -loop-unroll -alignment-from-assumptions
-strip-dead-prototypes -globaldce -constmerge -verify -verify-di
Target Library Information
Target Transform Information
No Alias Analysis (always returns 'may' alias)
Type-Based Alias Analysis
Scoped NoAlias Alias Analysis
Assumption Cache Tracker
Basic Alias Analysis (stateless AA impl)
  ModulePass Manager
    Debug Info Verifier
    Interprocedural Sparse Conditional Constant Propagation
    Global Variable Optimizer
    Dead Argument Elimination
    FunctionPass Manager
      Dominator Tree Construction
      *Combine redundant instructions*
      Simplify the CFG
    CallGraph Construction
    Call Graph SCC Pass Manager
      Remove unused exception handling info
      Inline Cost Analysis
      Function Integration/Inlining
      Deduce function attributes
      FunctionPass Manager
        SROA
        Dominator Tree Construction
        Early CSE
        Lazy Value Information Analysis
        Jump Threading
        Value Propagation
        Simplify the CFG
        Dominator Tree Construction
        *Combine redundant instructions*
        Tail Call Elimination
        Simplify the CFG
        Reassociate expressions
        Dominator Tree Construction
        Natural Loop Information
        Canonicalize natural loops
        Loop-Closed SSA Form Pass
        Loop Pass Manager
          Rotate Loops
          Loop Invariant Code Motion
          Unswitch loops
        *Combine redundant instructions*
        Scalar Evolution Analysis
        Canonicalize natural loops
        Loop-Closed SSA Form Pass
        Loop Pass Manager
          Induction Variable Simplification
          Recognize loop idioms
          Delete dead loops
          Unroll loops
        Memory Dependence Analysis
        MergedLoadStoreMotion
        Dominator Tree Construction
        Memory Dependence Analysis
        Global Value Numbering
        Memory Dependence Analysis
        MemCpy Optimization
        Sparse Conditional Constant Propagation
        Dominator Tree Construction
        Bit-Tracking Dead Code Elimination
        *Combine redundant instructions*
        Lazy Value Information Analysis
        Jump Threading
        Value Propagation
        Dominator Tree Construction
        Memory Dependence Analysis
        Dead Store Elimination
        Natural Loop Information
        Canonicalize natural loops
        Loop-Closed SSA Form Pass
        Loop Pass Manager
          Loop Invariant Code Motion
        Aggressive Dead Code Elimination
        Simplify the CFG
        Dominator Tree Construction
        *Combine redundant instructions*
    A No-Op Barrier Pass
    FunctionPass Manager
      Dominator Tree Construction
      Natural Loop Information
      Canonicalize natural loops
      Loop-Closed SSA Form Pass
      Loop Pass Manager
        Rotate Loops
      Branch Probability Analysis
      Block Frequency Analysis
      Scalar Evolution Analysis
      Loop Access Analysis
      Loop Vectorization
      *Combine redundant instructions*
      Scalar Evolution Analysis
      SLP Vectorizer
      Simplify the CFG
      Dominator Tree Construction
      *Combine redundant instructions*
      Natural Loop Information
      Canonicalize natural loops
      Loop-Closed SSA Form Pass
      Scalar Evolution Analysis
      Loop Pass Manager
        Unroll loops
      Alignment from assumptions
    Strip Unused Function Prototypes
    Dead Global Elimination
    Merge Duplicate Global Constants
    FunctionPass Manager
      Module Verifier
    Debug Info Verifier
    Bitcode Writer



Mark


For the Octasic's Opus platform, I modified shouldMergeGEPs in our fork
to:>
>   if (GEP.hasAllZeroIndices() && !Src.hasAllZeroIndices()
&&
>       !Src.hasOneUse())
>     return false;
>
>   return Src.hasAllConstantIndices(); // was return false;
>
> Following that change, I noticed some performance gain for a few specific
> tests and no regression at all in our (admittedly limited) benchmarks
suite.
>
> Regards,
> Francois Pichet, Octasic.
>
> On Thu, Mar 12, 2015 at 4:14 PM, Mark Heffernan <meheff at
google.com> wrote:
>
>> Coincidentally, I just ran into this same issue on some of our
benchmarks
>> for the NVPTX backend.  You have something like this before
instcombine:
>>
>>   %tmp = getelementptr inbounds i32, i32* %input, i64 %offset
>> loop:
>>   %loop_variant = ...
>>   %ptr = getelementptr inbounds i32, i32* %tmp, i64 %loop_variant
>>
>> Which gets transformed to:
>>
>> loop:
>>   %loop_variant = ...
>>   %sum = add nsw i64 %loop_variant, %offset
>>   %ptr = getelementptr inbounds i32, i32* %input, i64 %sum
>>
>> The merge essentially reassociates the loop-variant term
(%loop_variant)
>> and loop-invariant terms (%input and %offset) in such a way that LICM
can't
>> remove it.
>>
>> One idea is to only perform this style of gep merge if at least one of
>> the following conditions is true:
>> (1) both index terms in the GEP  are constant.  In this case no new add
>> instruction is created, instead the constants are folded.
>> (2) the GEPs are in the same BB.
>> (3) LoopInfo is available, and we know we're not creating a new
>> instruction in a (deeper) loop.
>>
>> What do you think?
>>
>> Mark
>>
>>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150312/72a0fbf4/attachment.html>

Hal Finkel

2015-Mar-12 21:34 UTC

head link

[LLVMdev] Question about shouldMergeGEPs in InstructionCombining

Hi Mark,

It is not clear to me at all that preventing the merging is the right solution.
There are a large number of analysis, including alias analysis, and
optimizations that use GetUnderlyingObject, and related routines to search back
through GEPs. They only do this up to some small finite depth (six, IIRC). So
reducing the GEP depth is likely the right solution for InstCombine (which has
the job of canonicalizing the IR).

We should, however, pull these apart somewhere, and probably in some way that is
address-mode aware. I'd recommend trying to split non-free (via the
addressing-mode) loop-invariant parts of GEPs out of loops in CodeGenPrep.

 -Hal

----- Original Message -----> From: "Mark Heffernan" <meheff at google.com>
> To: "Francois Pichet" <pichet2000 at gmail.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "LLVM
Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Thursday, March 12, 2015 4:19:56 PM
> Subject: Re: [LLVMdev] Question about shouldMergeGEPs in
InstructionCombining
> 
> 
> 
> 
> On Thu, Mar 12, 2015 at 2:03 PM, Francois Pichet <
> pichet2000 at gmail.com > wrote:
> 
> 
> 
> 
> I think it would make sense for (1) and (2). I am not sure if (3) is
> feasible in instcombine . (I am not too familiar with LoopInfo)
> 
> 
> LoopInfo should be available for at least one of the instcombine
> invocations. In the -O2 pipeline it looks like instcombine is called
> 7 times and it should have loopinfo for the third invocation (called
> immediate after some loop passes). Here's output from
> --debug-pass=Structure:
> 
> 
> 
> Pass Arguments: -tti -no- aa -tbaa -scoped-noalias
> -assumption-cache-tracker -targetlibinfo -basicaa -verify
> -simplifycfg -domtree -sroa -early-cse -lower-expect
> Target Transform Information
> No Alias Analysis (always returns 'may' alias)
> Type-Based Alias Analysis
> Scoped NoAlias Alias Analysis
> Assumption Cache Tracker
> Target Library Information
> Basic Alias Analysis (stateless AA impl)
> FunctionPass Manager
> Module Verifier
> Simplify the CFG
> Dominator Tree Construction
> SROA
> Early CSE
> Lower 'expect' Intrinsics
> Pass Arguments: -targetlibinfo -tti -no-aa -tbaa -scoped-noalias
> -assumption-cache-tracker -basicaa -verify-di -ipsccp -globalopt
> -deadargelim -domtree -instcombine -simplifycfg -basiccg -prune-eh
> -inline-cost -i
> nline -functionattrs -sroa -domtree -early-cse -lazy-value-info
> -jump-threadi ng -correlated-propagation -simplifycfg -domtree
> -instcombine -tailcallelim -simplifycfg -reassociate -domtree -loops
> -loop-simplify -lc
> ssa -loop-rotate -licm -loop-unswitch -instcombine -scalar-evolution
> -loop-simplify -lcssa -indvars -loop-idiom -loop-deletion
> -loop-unroll -memdep -mldst-motion -domtree -memdep -gvn -memdep
> -memcpyopt -sccp -dom
> tree -bdce -instcombine -lazy-value-info -jump-threading
> -correlated-propagation -domtree -memdep -dse -loops -loop-simplify
> -lcssa -licm -adce -simplifycfg -domtree -instcombine -barrier
> -domtree -loops -loop-sim
> plify -lcssa -loop-rotate -branch-prob -block-freq -scalar-evolution
> -loop-accesses -loop-vectorize -instcombine -scalar-evolution
> -slp-vectorizer -simplifycfg -domtree -instcombine -loops
> -loop-simplify -lcssa -s
> calar-evolution -loop-unroll -alignment-from-assumptions
> -strip-dead-prototypes -globaldce -constmerge -verify -verify-di
> Target Library Information
> Tar get Transform Information
> No Alias Analysis (always returns 'may' alias)
> Type-Based Alias Analysis
> Scoped NoAlias Alias Analysis
> Assumption Cache Tracker
> Basic Alias Analysis (stateless AA impl)
> ModulePass Manager
> Debug Info Verifier
> Interprocedural Sparse Conditional Constant Propagation
> Global Variable Optimizer
> Dead Argument Elimination
> FunctionPass Manager
> Dominator Tree Construction
> Combine redundant instructions
> Simplify the CFG
> CallGraph Construction
> Call Graph SCC Pass Manager
> Remove unused exception handling info
> Inline Cost Analysis
> Function Integration/Inlining
> Deduce function attributes
> FunctionPass Manager
> SROA
> Dominator Tree Construction
> Early CSE
> Lazy Value Information Analysis
> Jump Threading
> Value Propagation
> Simplify the CFG
> Dominator Tree Construction
> Combine redundant instructions
> Tail Call Elimination
> Simplify the CFG
> Reassociate expressions
> Dominator Tree Construction
> Natural Loop Information
> Canonicalize natural loops
> Loop-Closed SSA Form Pass
> Loop Pass Manager
> Rotate Loops
> Loop Invariant Code Motion
> Unswitch loops
> 
> Combine redundant instructions
> Scalar Evolution Analysis
> Canonicalize natural loops
> Loop-Closed SSA Form Pass
> Loop Pass Manager
> Induction Variable Simplification
> Recognize loop idioms
> Delete dead loops
> Unroll loops
> Memory Dependence Analysis
> MergedLoadStoreMotion
> Dominator Tree Construction
> Memory Dependence Analysis
> Global Value Numbering
> Memory Dependence Analysis
> MemCpy Optimization
> Sparse Conditional Constant Propagation
> Dominator Tree Construction
> Bit-Tracking Dead Code Elimination
> Combine redundant instructions
> Lazy Value Information Analysis
> Jump Threading
> Value Propagation
> Dominator Tree Construction
> Memory Dependence Analysis
> Dead Store Elimination
> Natural Loop Information
> Canonicalize natural loops
> Loop-Closed SSA Form Pass
> Loop Pass Manager
> Loop Invariant Code Motion
> Aggressive Dead Code Elimination
> Simplify the CFG
> Dominator Tree Construction
> Combine redundant instructions
> A No-Op Barrier Pass
> FunctionPass Manager
> Dominator Tree Construction
> Natural Loop Information
> Canonicalize natural loops
> Loop-Closed SSA Form Pass
> Loop Pass Manager
> Rotate Loops
> Branch Probability Analysis
> Block Frequency Analysis
> Scalar Evolution Analysis
> Loop Access Analysis
> Loop Vectorization
> Combine redundant instructions
> Scalar Evolution Analysis
> SLP Vectorizer
> Simplify the CFG
> Dominator Tree Construction
> Combine redundant instructions
> Natural Loop Information
> Canonicalize natural loops
> Loop-Closed SSA Form Pass
> Scalar Evolution Analysis
> Loop Pass Manager
> Unroll loops
> Alignment from assumptions
> 
> Strip Unused Function Prototypes
> Dead Global Elimination
> Merge Duplicate Global Constants
> FunctionPass Manager
> Module Verifier
> Debug Info Verifier Bitcode Writer
> 
> 
> 
> 
> 
> 
> Mark
> 
> 
> 
> 
> 
> 
> 
> For the Octasic's Opus platform, I modified shouldMergeGEPs in our
> fork to:
> 
> 
> 
> if (GEP.hasAllZeroIndices() && !Src.hasAllZeroIndices() &&
> !Src.hasOneUse())
> return false;
> 
> 
> return Src.hasAllConstantIndices(); // was return false;
> 
> 
> 
> Following that change, I noticed some performance gain for a few
> specific tests and no regression at all in our (admittedly limited)
> benchmarks suite.
> 
> 
> Regards,
> Francois Pichet, Octasic.
> 
> 
> 
> 
> On Thu, Mar 12, 2015 at 4:14 PM, Mark Heffernan < meheff at google.com
>
> wrote:
> 
> 
> 
> Coincidentally, I just ran into this same issue on some of our
> benchmarks for the NVPTX backend. You have something like this
> before instcombine:
> 
> 
> %tmp = getelementptr inbounds i32, i32* %input, i64 %offset
> 
> loop:
> 
> %loop_variant = ...
> %ptr = getelementptr inbounds i32, i32* %tmp, i64 %loop_variant
> 
> 
> Which gets transformed to:
> 
> 
> 
> loop:
> 
> %loop_variant = ...
> %sum = add nsw i64 %loop_variant, %offset
> 
> %ptr = getelementptr inbounds i32, i32* %input, i64 %sum
> 
> 
> 
> The merge essentially reassociates the loop-variant term
> (%loop_variant) and loop-invariant terms (%input and %offset) in
> such a way that LICM can't remove it.
> 
> 
> 
> 
> One idea is to only perform this style of gep merge if at least one
> of the following conditions is true:
> (1) both index terms in the GEP are constant. In this case no new add
> instruction is created, instead the constants are folded.
> (2) the GEPs are in the same BB.
> (3) LoopInfo is available, and we know we're not creating a new
> instruction in a (deeper) loop.
> 
> 
> 
> What do you think?
> 
> 
> Mark
> 
> 
> 
> 
-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Reasonably Related Threads

Search for more possibly parallel threads

llvm dev - Mar 2015 - [LLVMdev] Question about shouldMergeGEPs in InstructionCombining

[LLVMdev] Question about shouldMergeGEPs in InstructionCombining

[LLVMdev] Question about shouldMergeGEPs in InstructionCombining

[LLVMdev] Question about shouldMergeGEPs in InstructionCombining

[LLVMdev] Question about shouldMergeGEPs in InstructionCombining

Reasonably Related Threads