thr3ads.net - llvm dev - [LLVMdev] RFC: Loop versioning for LICM [Mar 2015]

Adam Nemet

2015-Mar-04 05:08 UTC

[LLVMdev] RFC: Loop versioning for LICM

> On Mar 3, 2015, at 1:29 AM, Nema, Ashutosh <Ashutosh.Nema at amd.com> wrote:
> 
> Hi Adam,
>  
> Thanks for looking into the LoopVersioning work.
> 
> I have gone through the recent LoopAccessAnalysis changes and found that some of
> it overlaps (i.e. runtime memory checks, loop access analysis etc.).
> LoopVersioning can use some of the things from LAA.
> 
> LoopVersioning is a memory-check-based multi-versioning optimization; it simply
> creates an aggressive-alias version of the loop preceded by a memory check. It is
> not concerned with the order of instructions or the detailed dependency checks
> that LoopVectorizer does. It does some basic loop structure checks, loop
> instruction checks & memory checks.
> 
> In general I found the LAA work is more inclined towards LoopVectorizer.
> Some of the possibly reusable functions are biased towards LoopVectorizer;
> they have specific condition checks for it.
I am about to post the patches to make LAA suitable for Loop Distribution.  As
you will hopefully find this will make the LAA more generic.  I will cc you on
the patches.
> It’s good to make some of the classes & functions more generic and reusable.
> I will cover some of the points in this mail.
>  
> RuntimeCheckEmitter
> “RuntimeCheckEmitter::addRuntimeCheck”
> While creating the runtime check, I found that some pointer pairs are not
> considered:
> 1) No need to check if two read-only pointers intersect.
> 2) Only need to check pointers between two different dependency sets.
> 3) Only need to check pointers in the same alias set.
> 
> I’m sure that if we want this to be used by other optimizations, not all of
> them will appreciate the above checks. Specifically, LoopVersioning does not
> care about them; it expects all the pointers in a loop to be considered for a
> memory check. It also does not care about different dependency sets &
> different alias sets.
> 
> I suggest we make these checks optional, and give users of this class the
> flexibility to set them.
I am not sure I follow.  The logic is meant to reduce the number of memory
checks necessary to ensure the independence of may-alias accesses.  We can omit
1-3 but then we would end up with unnecessary checks that could unnecessarily
slow things down.
> For this I suggest the following change:
> 1) 
> class RuntimeCheckEmitter {
>    …………
>    …………
>   /// Consider read-only pointer intersection in memcheck.
>   bool CheckReadOnlyPointersIntersection;
>   /// Consider pointers in the same dependency set for memcheck.
>   bool CheckPointersInSameDependencySet;
>   /// Consider pointers in different alias sets for memcheck.
>   bool CheckPointersInDifferentAliasSet;
> 
> 
> Add the above 3 variables to the class, and allow users of this class to set
> them.
> 
> 
> 2) 
> In "RuntimeCheckEmitter::addRuntimeCheck" the following 3 conditions need to
> be controlled by the above variables.
>  
> a> 
>    Change the following check:
>       // No need to check if two readonly pointers intersect.
>       if (!PtrRtCheck->IsWritePtr[i] && !PtrRtCheck->IsWritePtr[j])
>         continue;
>    To:
>       // No need to check if two readonly pointers intersect.
>       if (!CheckReadOnlyPointersIntersection &&
>           !PtrRtCheck->IsWritePtr[i] && !PtrRtCheck->IsWritePtr[j])
>         continue;
> 
> b>
>    Change the following check:
>       // Only need to check pointers between two different dependency sets.
>       if (PtrRtCheck->DependencySetId[i] == PtrRtCheck->DependencySetId[j])
>         continue;
>    To:
>       // Only need to check pointers between two different dependency sets.
>       if (!CheckPointersInSameDependencySet &&
>           PtrRtCheck->DependencySetId[i] == PtrRtCheck->DependencySetId[j])
>         continue;
> 
> c>
>    Change the following check:
>       // Only need to check pointers in the same alias set.
>       if (PtrRtCheck->AliasSetId[i] != PtrRtCheck->AliasSetId[j])
>         continue;
>    To:
>       // Only need to check pointers in the same alias set.
>       if (!CheckPointersInDifferentAliasSet &&
>           PtrRtCheck->AliasSetId[i] != PtrRtCheck->AliasSetId[j])
>         continue;
> 
> This makes RuntimeCheckEmitter more flexible and gives users more control
> over how it is used.
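
For readers following along, here is a minimal standalone sketch of the pair filtering discussed above, with the three proposed switches. It is not LLVM code: `PointerInfo`, `CheckOptions`, `pairsNeedingCheck` and `samplePointers` are hypothetical names, and the per-pointer fields only mirror the `IsWritePtr`, `DependencySetId` and `AliasSetId` arrays from the quoted snippets.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for the per-pointer data that the quoted
// code reads through PtrRtCheck; this is not the real LLVM data structure.
struct PointerInfo {
  bool IsWritePtr;
  unsigned DependencySetId;
  unsigned AliasSetId;
};

// The three proposed switches. All false reproduces the existing filtering.
struct CheckOptions {
  bool CheckReadOnlyPointersIntersection = false;
  bool CheckPointersInSameDependencySet = false;
  bool CheckPointersInDifferentAliasSet = false;
};

using PointerPair = std::pair<std::size_t, std::size_t>;

// Returns the index pairs (i, j), i < j, that would receive a runtime check.
std::vector<PointerPair>
pairsNeedingCheck(const std::vector<PointerInfo> &Ptrs,
                  const CheckOptions &Opts = CheckOptions()) {
  std::vector<PointerPair> Pairs;
  for (std::size_t i = 0; i < Ptrs.size(); ++i) {
    for (std::size_t j = i + 1; j < Ptrs.size(); ++j) {
      // 1) Two read-only pointers are skipped unless the switch is set.
      if (!Opts.CheckReadOnlyPointersIntersection &&
          !Ptrs[i].IsWritePtr && !Ptrs[j].IsWritePtr)
        continue;
      // 2) Pointers in the same dependency set are skipped unless requested.
      if (!Opts.CheckPointersInSameDependencySet &&
          Ptrs[i].DependencySetId == Ptrs[j].DependencySetId)
        continue;
      // 3) Pointers in different alias sets are skipped unless requested.
      if (!Opts.CheckPointersInDifferentAliasSet &&
          Ptrs[i].AliasSetId != Ptrs[j].AliasSetId)
        continue;
      Pairs.push_back({i, j});
    }
  }
  return Pairs;
}

// Sample input: one store (index 0) and two loads (1, 2) in alias set 0,
// plus one store (3) in alias set 1. Every pointer has its own dependency set.
std::vector<PointerInfo> samplePointers() {
  return {{true, 0, 0}, {false, 1, 0}, {false, 2, 0}, {true, 3, 1}};
}
```

With the default options the sample yields only the two store/load pairs in alias set 0; with all three switches enabled every one of the six pairs is checked, which is the extra runtime cost the reply above refers to.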
>  
>  
> LoopAccessAnalysis::analyzeLoop
> Here again it’s very specific to LoopVectorizer.
> The way it handles stores & loads may not suit other optimizations that
> expect different treatment. I suggest we think about letting users override
> load & store handling. We could provide virtual methods for load & store
> handling (i.e. analyzeLoads & analyzeStores). Also, some optimizations may
> not accept call instructions, or may want to analyze calls further. We should
> make some provision along those lines too.
Can you please elaborate how you want to analyze function calls?
> AccessAnalysis & LoopAccessAnalysis are tied to the dependency check. If some
> analysis needs the same functionality except the dependency check, there
> should be a provision for it. E.g. LoopVersioning needs similar functionality
> except dependency analysis; for now the only possibility is to extend &
> rewrite the functions, removing the dependency checks.
I actually consider the coupling of dependency analysis and memcheck generation
as a feature.  Since dependence analysis can further disambiguate memory
accesses (as in the DependencySet except you mentioned above) it can reduce the
number of run-time memory checks necessary.

Adam
>  
> Regards,
> Ashutosh
>  
>  
> From: Adam Nemet [mailto:anemet at apple.com]
> Sent: Friday, February 27, 2015 12:40 AM
> To: Nema, Ashutosh
> Cc: llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] RFC: Loop versioning for LICM
>  
> Hi Ashutosh,
>  
> Have you been following the recent Loop Access Analysis work?  LAA was split
> out from the Loop Vectorizer, which has been performing the kind of loop
> versioning that you describe.  The main reason was to be able to share this
> functionality with other passes.
> 
> Loop Access Analysis is an analysis pass that computes basic memory dependence
> and the runtime checks.  The versioning decision and then performing the
> transformation are left to the transform passes using this analysis.
> 
> If we decide that a stand-alone memcheck-based loop-versioning is desired we
> should probably use this analysis and possibly extend it instead of
> duplicating the code.
>  
> Adam
>  
> On Feb 26, 2015, at 2:31 AM, Nema, Ashutosh <Ashutosh.Nema at amd.com> wrote:
>  
> I would like to propose a new loop multi-versioning optimization for LICM.
> For now I have kept this for LICM only, but it can be used in multiple places.
> The main motivation is to unblock optimizations that are stuck because of
> memory alias dependencies. Most of the time, when alias analysis is unsure
> about a memory access, it says may-alias. This uncertainty from alias analysis
> prevents some memory-based optimizations from proceeding further.
> We observed some cases with LICM, where things are beyond aliasing.
> In cases where alias analysis is unsure, we would like to use loop versioning
> as an alternative.
> 
> Loop versioning creates one version of the loop with aggressive aliasing
> assumptions and another with conservative (default) aliasing assumptions. The
> aggressive-alias version of the loop has all its memory accesses marked as
> no-alias. These two versions of the loop are preceded by a runtime memory
> check. This runtime check consists of bounds checks for all unique memory
> accessed in the loop, and it determines whether that memory aliases. Based on
> the result of this check at runtime, one of the loops gets executed: if the
> memory does not alias, the aggressive-alias loop is executed; otherwise the
> conservative version is executed.
> 
> Setting no-alias on the memory accessed in the aggressive-alias version of the
> loop enables other optimizations to proceed.
>  
> The top-level steps are:
> 
> 1) Perform the loop versioning feasibility check.
> 2) If the loop is a candidate for versioning, create a memory bound check
>      covering all the memory accesses in the loop body.
> 3) Clone the original loop and mark all memory accesses as no-alias in the
>      new loop.
> 4) Set the original loop & the versioned loop as the branch targets of the
>      runtime check result.
> 5) Call LICM on the aggressive-alias version of the loop (for now LICM is
>      scheduled later and not called directly from the LoopVersioning pass).
>  
> Consider following test:
>  
>      1  int foo(int * var1, int * var2, int * var3, unsigned itr) {
>      2    unsigned i = 0, j = 0;
>      3    for(; i < itr; i++) {
>      4      for(; j < itr; j++) {
>      5        var1[j] = itr + i;
>      6        var3[i] = var1[j] + var3[i];
>      7      }
>      8    }
>      9  }
>  
> At line #6 the store to var3 can be moved out by LICM
> (promoteLoopAccessesToScalars), but because alias analysis is unsure about the
> memory accesses, LICM is unable to move it out.
>  
> After Loop versioning IR:
>  
> <Versioned Loop>
> for.body3.loopVersion:                            ; preds = %for.body3.loopVersion.preheader, %for.body3.loopVersion
>   %indvars.iv.loopVersion = phi i64 [ %indvars.iv.next.loopVersion, %for.body3.loopVersion ], [ %2, %for.body3.loopVersion.preheader ]
>   %arrayidx.loopVersion = getelementptr inbounds i32* %var1, i64 %indvars.iv.loopVersion
>   store i32 %add, i32* %arrayidx.loopVersion, align 4, !tbaa !1, !alias.scope !11, !noalias !11
>   %indvars.iv.next.loopVersion = add nuw nsw i64 %indvars.iv.loopVersion, 1
>   %lftr.wideiv.loopVersion = trunc i64 %indvars.iv.loopVersion to i32
>   %exitcond.loopVersion = icmp eq i32 %lftr.wideiv.loopVersion, %0
>   br i1 %exitcond.loopVersion, label %for.inc11.loopexit38, label %for.body3.loopVersion
>  
> <Original Loop>
> for.body3:                                        ; preds = %for.body3.lr.ph, %for.body3
>   %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ %2, %for.body3.lr.ph ]
>   %arrayidx = getelementptr inbounds i32* %var1, i64 %indvars.iv
>   store i32 %add, i32* %arrayidx, align 4, !tbaa !1
>   %8 = load i32* %arrayidx7, align 4, !tbaa !1
>   %add8 = add nsw i32 %8, %add
>   store i32 %add8, i32* %arrayidx7, align 4, !tbaa !1
>   %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
>   %lftr.wideiv = trunc i64 %indvars.iv to i32
>   %exitcond = icmp eq i32 %lftr.wideiv, %0
>   br i1 %exitcond, label %for.inc11, label %for.body3
>  
> The difference is visible in the versioned loop: one store has been moved out.
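
As a rough source-level analogue of the IR above (a sketch only, not what the patch emits), the versioned inner loop can be pictured as follows. `rangesOverlap`, `innerLoopOriginal` and `innerLoopVersioned` are hypothetical names; `Sum` stands in for `var3[i]`, `Out` for `var1`, and `Value` for the loop-invariant `itr + i`. The overlap test compares addresses as integers, which is an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if the ranges [A, A + NA) and [B, B + NB) of int
// elements overlap. Addresses are compared as integers.
static bool rangesOverlap(const int *A, std::size_t NA,
                          const int *B, std::size_t NB) {
  auto ABegin = reinterpret_cast<std::uintptr_t>(A);
  auto BBegin = reinterpret_cast<std::uintptr_t>(B);
  std::uintptr_t AEnd = ABegin + NA * sizeof(int);
  std::uintptr_t BEnd = BBegin + NB * sizeof(int);
  return ABegin < BEnd && BBegin < AEnd;
}

// Conservative version: *Sum may alias Out, so it is reloaded and stored on
// every iteration.
void innerLoopOriginal(int *Out, int *Sum, unsigned N, int Value) {
  for (unsigned J = 0; J < N; ++J) {
    Out[J] = Value;
    *Sum = Out[J] + *Sum;
  }
}

// Versioned form: a runtime bounds check selects between the two loops.
void innerLoopVersioned(int *Out, int *Sum, unsigned N, int Value) {
  if (N == 0)
    return; // Nothing to do; avoids touching *Sum when the loop never runs.
  if (!rangesOverlap(Out, N, Sum, 1)) {
    // Aggressive version: no aliasing, so *Sum is kept in a local and the
    // store is performed once after the loop.
    int Acc = *Sum;
    for (unsigned J = 0; J < N; ++J) {
      Out[J] = Value;
      Acc += Out[J];
    }
    *Sum = Acc;
  } else {
    innerLoopOriginal(Out, Sum, N, Value);
  }
}
```

Both functions produce the same memory contents for any input; the versioned one only avoids the per-iteration load and store when the check proves the accesses are disjoint.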
>  
> Following are some high-level details about the current implementation:
> 
> - LoopVersioning
> LoopVersioning is the main class, which holds the multi-versioning
> functionality.
> 
> - LoopVersioning::isVersioningBeneficial
> A member of ‘LoopVersioning’.
> Does the feasibility check for loop versioning:
> a) Checks the layout of the loop.
> b) Instruction-level checks.
> c) Memory checks.
> 
> - LoopVersioning::versionizeLoop
> a) Clones the original loop.
> b) Creates a runtime memory check.
> c) Adds both loops as targets of the runtime check result.
> 
> - RuntimeMemoryCheck
> This class takes care of the runtime memory check.
> 
> - RuntimeMemoryCheck::createRuntimeCheck
> It creates the runtime memory check.
> 
> This patch uses a maximum loop nest threshold of 2 and a maximum of 5
> pointers in the runtime memory check.
> 
> Later I would like to make this a utility so others can use it.
> 
> Please go through the patch for the detailed approach.
> The patch is available at http://reviews.llvm.org/D7900
> 
> Suggestions and comments are welcome.
>  
> Regards,
> Ashutosh
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Nema, Ashutosh

2015-Mar-06 06:33 UTC


[LLVMdev] RFC: Loop versioning for LICM

I am about to post the patches to make LAA suitable for Loop Distribution.  As
you will hopefully find this will make the LAA more generic.  I will cc you on
the patches.

Sure Adam.

RuntimeCheckEmitter
“RuntimeCheckEmitter::addRuntimeCheck”
While creating the runtime check, I found that some pointer pairs are not
considered:
1) No need to check if two read-only pointers intersect.
2) Only need to check pointers between two different dependency sets.
3) Only need to check pointers in the same alias set.

I’m sure that if we want this to be used by other optimizations, not all of
them will appreciate the above checks. Specifically, LoopVersioning does not
care about them; it expects all the pointers in a loop to be considered for a
memory check. It also does not care about different dependency sets &
different alias sets.

I suggest we make these checks optional, and give users of this class the
flexibility to set them.

I am not sure I follow.  The logic is meant to reduce the number of memory
checks necessary to ensure the independence of may-alias accesses.  We can omit
1-3 but then we would end up with unnecessary checks that could unnecessarily
slow things down.


I just wanted to keep some flexibility open.
We can give LoopVersioning a try while keeping the checks in points 1 & 3, but
I’m not sure about point 2.
I will give your upcoming patch a try.



LoopAccessAnalysis::analyzeLoop
Here again it’s very specific to LoopVectorizer.
The way it handles stores & loads may not suit other optimizations that
expect different treatment. I suggest we think about letting users override
load & store handling. We could provide virtual methods for load & store
handling (i.e. analyzeLoads & analyzeStores). Also, some optimizations may
not accept call instructions, or may want to analyze calls further. We should
make some provision along those lines too.

Can you please elaborate how you want to analyze function calls?
For calls I am expecting an overridable method, giving the user the
flexibility to redefine the behavior.
There are a few possible cases: a user
may not accept any call in the loop body, may accept only a few specific calls,
or may not want any pointer/address passed out of the loop by calls.
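
One way to picture the overridable call handling described here is a small policy hierarchy. This is only an illustrative sketch: none of these names (`CallSite`, `CallPolicy`, `AllowListPolicy`, `NoPointerEscapePolicy`, `callsAreAcceptable`) exist in LoopAccessAnalysis, and a real implementation would inspect LLVM call instructions rather than this simplified record.

```cpp
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified description of a call found in a loop body.
struct CallSite {
  std::string Callee;
  bool PassesPointer; // true if any argument is a pointer/address
};

// Base policy, case 1: reject every call in the loop body.
class CallPolicy {
public:
  virtual ~CallPolicy() = default;
  virtual bool isCallAllowed(const CallSite &) const { return false; }
};

// Case 2: accept only a few specific callees.
class AllowListPolicy : public CallPolicy {
  std::set<std::string> Allowed;

public:
  explicit AllowListPolicy(std::set<std::string> Names)
      : Allowed(std::move(Names)) {}
  bool isCallAllowed(const CallSite &CS) const override {
    return Allowed.count(CS.Callee) != 0;
  }
};

// Case 3: accept any call that does not pass a pointer out of the loop.
class NoPointerEscapePolicy : public CallPolicy {
public:
  bool isCallAllowed(const CallSite &CS) const override {
    return !CS.PassesPointer;
  }
};

// The analysis asks the client-supplied policy about each call it encounters.
bool callsAreAcceptable(const std::vector<CallSite> &Calls,
                        const CallPolicy &Policy) {
  for (const CallSite &CS : Calls)
    if (!Policy.isCallAllowed(CS))
      return false;
  return true;
}
```

A client pass would pick or subclass a policy and hand it to the analysis, so the shared code stays the same while each optimization decides which calls it tolerates.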

AccessAnalysis & LoopAccessAnalysis are tied to the dependency check. If some
analysis needs the same functionality except the dependency check, there
should be a provision for it. E.g. LoopVersioning needs similar functionality
except dependency analysis; for now the only possibility is to extend &
rewrite the functions, removing the dependency checks.

I actually consider the coupling of dependency analysis and memcheck generation
as a feature.  Since dependence analysis can further disambiguate memory
accesses (as in the DependencySet except you mentioned above) it can reduce the
number of run-time memory checks necessary.
As mentioned above I’m not sure this will be useful for LoopVersioning,
but I would like to give it a try with your upcoming patch.


Thanks,
Ashutosh


Adam Nemet

2015-Mar-11 05:18 UTC


[LLVMdev] RFC: Loop versioning for LICM

> On Mar 5, 2015, at 10:33 PM, Nema, Ashutosh <Ashutosh.Nema at amd.com> wrote:
> 
>  
> I am about to post the patches to make LAA suitable for Loop Distribution.  As
> you will hopefully find this will make the LAA more generic.  I will cc you on
> the patches.
>  
> Sure Adam.
>  
> RuntimeCheckEmitter
> “RuntimeCheckEmitter::addRuntimeCheck”
> While creating the runtime check, I found that some pointer pairs are not
> considered:
> 1) No need to check if two read-only pointers intersect.
> 2) Only need to check pointers between two different dependency sets.
> 3) Only need to check pointers in the same alias set.
> 
> I’m sure that if we want this to be used by other optimizations, not all of
> them will appreciate the above checks. Specifically, LoopVersioning does not
> care about them; it expects all the pointers in a loop to be considered for a
> memory check. It also does not care about different dependency sets &
> different alias sets.
> 
> I suggest we make these checks optional, and give users of this class the
> flexibility to set them.
>  
> I am not sure I follow.  The logic is meant to reduce the number of memory
> checks necessary to ensure the independence of may-alias accesses.  We can omit
> 1-3 but then we would end up with unnecessary checks that could unnecessarily
> slow things down.
>  
>  
> I just wanted to keep some flexibility open.
> We can give LoopVersioning a try while keeping the checks in points 1 & 3, but
> I’m not sure about point 2.
> I will give your upcoming patch a try.
>  
>  
>  
> LoopAccessAnalysis::analyzeLoop
> Here again it’s very specific to LoopVectorizer.
> The way it handles stores & loads may not suit other optimizations that
> expect different treatment. I suggest we think about letting users override
> load & store handling. We could provide virtual methods for load & store
> handling (i.e. analyzeLoads & analyzeStores). Also, some optimizations may
> not accept call instructions, or may want to analyze calls further. We should
> make some provision along those lines too.
>  
> Can you please elaborate how you want to analyze function calls?
> For calls I am expecting an overridable method, giving the user the
> flexibility to redefine the behavior.
> There are a few possible cases: a user
> may not accept any call in the loop body, may accept only a few specific calls,
> or may not want any pointer/address passed out of the loop by calls.
>  
> AccessAnalysis & LoopAccessAnalysis are tied to the dependency check. If some
> analysis needs the same functionality except the dependency check, there
> should be a provision for it. E.g. LoopVersioning needs similar functionality
> except dependency analysis; for now the only possibility is to extend &
> rewrite the functions, removing the dependency checks.
>  
> I actually consider the coupling of dependency analysis and memcheck generation
> as a feature.  Since dependence analysis can further disambiguate memory
> accesses (as in the DependencySet except you mentioned above) it can reduce the
> number of run-time memory checks necessary.
> As mentioned above I’m not sure this will be useful for LoopVersioning,
> but I would like to give it a try with your upcoming patch.
Hi Ashutosh,

My changes are committed now.  LoopAccessAnalysis is an analysis pass, so it has
the advantage that the result of the analysis is cached until it gets
invalidated (i.e. when the loop changes).

For an example of how to use it, you can look at either the loop-vectorizer in
the tree or the WIP patch for the loop-distribution pass in
http://reviews.llvm.org/D6930.

Please let me know if you have any questions.

Adam
>  
> 
> Thanks,
> Ashutosh
