I like to propose a new loop multi versioning optimization for LICM. For now I kept this for LICM only, but it can be used in multiple places. The main motivation is to allow optimizations stuck because of memory alias dependencies. Most of the time when alias analysis is unsure about memory access and it says may-alias. This un surety from alias analysis restrict some of the memory based optimizations to proceed further. We observed some cases with LICM, where things are beyond aliasing. In cases where alias analysis is unsure we like to use loop versioning as an alternative. Loop Versioning will creates version of the loop with aggressive alias and the other with conservative (default) alias. Aggressive alias version of loop will have all the memory access marked as no-alias. These two version of loop will be preceded by a memory runtime check. This runtime check consists of bound checks for all unique memory accessed in loop, and it ensures aliasing of memory. Based on this check result at runtime any of the loops gets executed, if memory is non aliased then aggressive aliasing loop gets executed, else when memory is aliased then non aggressive aliased version gets executed. By setting no-alias to memory accessed in aggressive alias version of loop, enable other optimization to continue further. Following are the top level steps: 1) Perform loop do versioning feasibility check. 2) If loop is a candidate for versioning then create a memory bound check, by considering all the memory access in loop body. 3) Clone original loop and set all memory access as no-alias in new loop. 4) Set original loop & versioned loop as a branch target of runtime check result. 5) Call LICM on aggressive alias versioned of loop(For now LICM is scheduled later and not directly called from LoopVersioning pass). Consider following test: 1 int foo(int * var1, int * var2, int * var3, unsigned itr) { 2 unsigned i = 0, j = 0; 3 for(; i < itr; i++) { 4 for(; j < itr; j++) { 5 var1[j] = itr + i; 6 var3[i] = var1[j] + var3[i]; 7 } 8 } 9 } At line #6 store to var3 can be moved out by LICM(promoteLoopAccessesToScalars) but because of alias analysis un surety about memory access it unable to move it out. After Loop versioning IR: <Versioned Loop> for.body3.loopVersion: ; preds = %for.body3.loopVersion.preheader, %for.body3.loopVersion %indvars.iv.loopVersion = phi i64 [ %indvars.iv.next.loopVersion, %for.body3.loopVersion ], [ %2, %for.body3.loopVersion.preheader ] %arrayidx.loopVersion = getelementptr inbounds i32* %var1, i64 %indvars.iv.loopVersion store i32 %add, i32* %arrayidx.loopVersion, align 4, !tbaa !1, !alias.scope !11, !noalias !11 %indvars.iv.next.loopVersion = add nuw nsw i64 %indvars.iv.loopVersion, 1 %lftr.wideiv.loopVersion = trunc i64 %indvars.iv.loopVersion to i32 %exitcond.loopVersion = icmp eq i32 %lftr.wideiv.loopVersion, %0 br i1 %exitcond.loopVersion, label %for.inc11.loopexit38, label %for.body3.loopVersion <Original Loop> for.body3: ; preds = %for.body3.lr.ph, %for.body3 %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ %2, %for.body3.lr.ph ] %arrayidx = getelementptr inbounds i32* %var1, i64 %indvars.iv store i32 %add, i32* %arrayidx, align 4, !tbaa !1 %8 = load i32* %arrayidx7, align 4, !tbaa !1 %add8 = add nsw i32 %8, %add store i32 %add8, i32* %arrayidx7, align 4, !tbaa !1 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 %lftr.wideiv = trunc i64 %indvars.iv to i32 %exitcond = icmp eq i32 %lftr.wideiv, %0 br i1 %exitcond, label %for.inc11, label %for.body3 In versioned loop difference is visible, 1 store has moved out. Following are some high level details about current implementation: - LoopVersioning LoopVersioning is main class which holds multi versioning functionality. - LoopVersioning :: isVersioningBeneficial Its member to 'LoopVersioning' Does feasibility check for loop versioning. a) Checks layout of loop. b) Instruction level check. c) memory checks. - LoopVersioning :: versionizeLoop a) Clone original loo b) Create a runtime memory check. c) Add both loops under runtime check results target. - RuntimeMemoryCheck This class take cares runtime memory check. - RuntimeMemoryCheck ::createRuntimeCheck It creates runtime memory check. In this patch used maximum loop nest threshold as 2, and maximum number of pointers in runtime memory check as 5. Later I like to make this as a utility so others can use it. Requesting to go through patch for detailed approach. Patch available at http://reviews.llvm.org/D7900 Suggestions are comments are welcome. Regards, Ashutosh -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150226/a5f0ecdb/attachment.html>
Hi Ashutosh, Have you been following the recent Loop Access Analysis work? LAA was split out from the Loop Vectorizer that have been performing the kind of loop versioning that you describe. The main reason was to be able to share this functionality with other passes. Loop Access Analysis is an analysis pass that computes basic memory dependence and the runtime checks. The versioning decision and then performing the transformation are left to the transform passes using this analysis. If we decide that a stand-alone memcheck-based loop-versioning is desired we should probably use this analysis and possibly extend it instead of duplicating the code. Adam> On Feb 26, 2015, at 2:31 AM, Nema, Ashutosh <Ashutosh.Nema at amd.com> wrote: > > I like to propose a new loop multi versioning optimization for LICM. > For now I kept this for LICM only, but it can be used in multiple places. > The main motivation is to allow optimizations stuck because of memory > alias dependencies. Most of the time when alias analysis is unsure about > memory access and it says may-alias. This un surety from alias analysis restrict > some of the memory based optimizations to proceed further. > We observed some cases with LICM, where things are beyond aliasing. > In cases where alias analysis is unsure we like to use loop versioning as an alternative. > > Loop Versioning will creates version of the loop with aggressive alias and the other > with conservative (default) alias. Aggressive alias version of loop will have all the > memory access marked as no-alias. These two version of loop will be preceded by a > memory runtime check. This runtime check consists of bound checks for all unique memory > accessed in loop, and it ensures aliasing of memory. Based on this check result at runtime > any of the loops gets executed, if memory is non aliased then aggressive aliasing loop > gets executed, else when memory is aliased then non aggressive aliased version gets executed. > > By setting no-alias to memory accessed in aggressive alias version of loop, enable other > optimization to continue further. > > Following are the top level steps: > > 1) Perform loop do versioning feasibility check. > 2) If loop is a candidate for versioning then create a memory bound check, by considering > all the memory access in loop body. > 3) Clone original loop and set all memory access as no-alias in new loop. > 4) Set original loop & versioned loop as a branch target of runtime check result. > 5) Call LICM on aggressive alias versioned of loop(For now LICM is scheduled later and not directly > called from LoopVersioning pass). > > Consider following test: > > 1 int foo(int * var1, int * var2, int * var3, unsigned itr) { > 2 unsigned i = 0, j = 0; > 3 for(; i < itr; i++) { > 4 for(; j < itr; j++) { > 5 var1[j] = itr + i; > 6 var3[i] = var1[j] + var3[i]; > 7 } > 8 } > 9 } > > At line #6 store to var3 can be moved out by LICM(promoteLoopAccessesToScalars) > but because of alias analysis un surety about memory access it unable to move it out. > > After Loop versioning IR: > > <Versioned Loop> > for.body3.loopVersion: ; preds = %for.body3.loopVersion.preheader, %for.body3.loopVersion > %indvars.iv.loopVersion = phi i64 [ %indvars.iv.next.loopVersion, %for.body3.loopVersion ], [ %2, %for.body3.loopVersion.preheader ] > %arrayidx.loopVersion = getelementptr inbounds i32* %var1, i64 %indvars.iv.loopVersion > store i32 %add, i32* %arrayidx.loopVersion, align 4, !tbaa !1, !alias.scope !11, !noalias !11 > %indvars.iv.next.loopVersion = add nuw nsw i64 %indvars.iv.loopVersion, 1 > %lftr.wideiv.loopVersion = trunc i64 %indvars.iv.loopVersion to i32 > %exitcond.loopVersion = icmp eq i32 %lftr.wideiv.loopVersion, %0 > br i1 %exitcond.loopVersion, label %for.inc11.loopexit38, label %for.body3.loopVersion > > <Original Loop> > for.body3: ; preds = %for.body3.lr.ph, %for.body3 > %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ %2, %for.body3.lr.ph ] > %arrayidx = getelementptr inbounds i32* %var1, i64 %indvars.iv > store i32 %add, i32* %arrayidx, align 4, !tbaa !1 > %8 = load i32* %arrayidx7, align 4, !tbaa !1 > %add8 = add nsw i32 %8, %add > store i32 %add8, i32* %arrayidx7, align 4, !tbaa !1 > %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 > %lftr.wideiv = trunc i64 %indvars.iv to i32 > %exitcond = icmp eq i32 %lftr.wideiv, %0 > br i1 %exitcond, label %for.inc11, label %for.body3 > > In versioned loop difference is visible, 1 store has moved out. > > Following are some high level details about current implementation: > > - LoopVersioning > LoopVersioning is main class which holds multi versioning functionality. > > - LoopVersioning :: isVersioningBeneficial > Its member to ‘LoopVersioning’ > Does feasibility check for loop versioning. > a) Checks layout of loop. > b) Instruction level check. > c) memory checks. > > - LoopVersioning :: versionizeLoop > a) Clone original loo > b) Create a runtime memory check. > c) Add both loops under runtime check results target. > > - RuntimeMemoryCheck > This class take cares runtime memory check. > > - RuntimeMemoryCheck ::createRuntimeCheck > It creates runtime memory check. > > In this patch used maximum loop nest threshold as 2, and maximum number > of pointers in runtime memory check as 5. > > Later I like to make this as a utility so others can use it. > > Requesting to go through patch for detailed approach. > Patch available at http://reviews.llvm.org/D7900 <http://reviews.llvm.org/D7900> > > Suggestions are comments are welcome. > > Regards, > Ashutosh > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150226/59f41bec/attachment.html>
Hi Adam, Thanks for looking into LoopVersioning work. I have gone through recent LoopAccessAnalysis changes and found some of the stuff overlaps (i.e. runtime memory check, loop access analysis etc.). LoopVersioning can use some of the things from LAA. LoopVersioning is a memory check based multi versioning optimization, it simply creates aggressive alias version of loop preceded by a memory check. It’s not concerned about the order of instructions and detailed dependency check that LoopVectorizer does. It does some basic loop structure check, loop instruction checks & memory checks. In general found LAA work is more inclined towards LoopVectorizer. Found some of the possible reusable functions are biased towards LoopVectorizer, they has specific condition checks for it. It’s good to make some of the classes & function more generic and reusable. Will be covering some of the points in this mail. RuntimeCheckEmitter “RuntimeCheckEmitter::addRuntimeCheck” While creating runtime check I have found, some of the things are not getting considered. 1) No need to check if two read only pointers intersect. 2) Only need to check pointers between two different dependency sets. 3) Only need to check pointers in the same alias set I’m sure if we like this to be used by other optimization then not all optimization appreciate above checks. Specifically LoopVersioning does not care about this, it expects all the pointers in a loop should be considered for a memory check. Also it does not care about different dependency set & different alias sets. I suggest we can make these checks optional, and give flexibility to users of this class to set it. For the same suggesting following change: 1) class RuntimeCheckEmitter { ………… ………… /// Consider readonly pointer intersection in memcheck bool CheckReadOnlyPointersIntersection; /// Consider pointers in same dependency sets for memcheck. bool CheckPointersInSameDependencySet; /// Consider pointers in different Alias sets for memcheck bool CheckPointersInDifferentAliasSet Add the above 3 variables to class, and allow users of this class to set it. 2) In "RuntimeCheckEmitter::addRuntimeCheck" following 3 condition needs to controlled by above conditional variables. a> Change Following Check: // No need to check if two readonly pointers intersect. if (!PtrRtCheck->IsWritePtr[i] && !PtrRtCheck->IsWritePtr[j]) continue; To: // No need to check if two readonly pointers intersect. if (!CheckReadOnlyPointersIntersection && !PtrRtCheck->IsWritePtr[i] && !PtrRtCheck->IsWritePtr[j]) continue; b> Change Following Check: // Only need to check pointers between two different dependency sets. if (PtrRtCheck->DependencySetId[i] == PtrRtCheck->DependencySetId[j]) continue; To: // Only need to check pointers between two different dependency sets. if (!CheckPointersInSameDependencySet && PtrRtCheck->DependencySetId[i] == PtrRtCheck->DependencySetId[j]) continue; c> Change Following Check: // Only need to check pointers in the same alias set. if (PtrRtCheck->AliasSetId[i] != PtrRtCheck->AliasSetId[j]) continue; To: // Only need to check pointers in the same alias set. if (!CheckPointersInDifferentAliasSet && PtrRtCheck->AliasSetId[i] != PtrRtCheck->AliasSetId[j]) continue; By this we allowing RuntimeCheckEmitter as more flexible and providing user more control to use it. LoopAccessAnalysis::analyzeLoop Here again its very specific to LoopVectorizer. The way it handles stores & loads may not be appreciated by other optimization expecting other treatment. I suggest we should think on flexibility for user to override load & store handling. We can provide virtual methods for load & store handling (i.e. analyzeLoads & analyzeStores). Also some of the optimization may not like call instruction, or further they like to analyze call. We should also think on those lines to make some provision. AccessAnalysis & LoopAccessAnalysis are tied up dependency check, If some analysis needs same functionality except dependency check then there should be provision available. i.e. LoopVersioning needs similar stuff except dependency analysis, for now possibility is extend & rewrite functions by removing dependency checks. Regards, Ashutosh From: Adam Nemet [mailto:anemet at apple.com] Sent: Friday, February 27, 2015 12:40 AM To: Nema, Ashutosh Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] RFC: Loop versioning for LICM Hi Ashutosh, Have you been following the recent Loop Access Analysis work? LAA was split out from the Loop Vectorizer that have been performing the kind of loop versioning that you describe. The main reason was to be able to share this functionality with other passes. Loop Access Analysis is an analysis pass that computes basic memory dependence and the runtime checks. The versioning decision and then performing the transformation are left to the transform passes using this analysis. If we decide that a stand-alone memcheck-based loop-versioning is desired we should probably use this analysis and possibly extend it instead of duplicating the code. Adam On Feb 26, 2015, at 2:31 AM, Nema, Ashutosh <Ashutosh.Nema at amd.com<mailto:Ashutosh.Nema at amd.com>> wrote: I like to propose a new loop multi versioning optimization for LICM. For now I kept this for LICM only, but it can be used in multiple places. The main motivation is to allow optimizations stuck because of memory alias dependencies. Most of the time when alias analysis is unsure about memory access and it says may-alias. This un surety from alias analysis restrict some of the memory based optimizations to proceed further. We observed some cases with LICM, where things are beyond aliasing. In cases where alias analysis is unsure we like to use loop versioning as an alternative. Loop Versioning will creates version of the loop with aggressive alias and the other with conservative (default) alias. Aggressive alias version of loop will have all the memory access marked as no-alias. These two version of loop will be preceded by a memory runtime check. This runtime check consists of bound checks for all unique memory accessed in loop, and it ensures aliasing of memory. Based on this check result at runtime any of the loops gets executed, if memory is non aliased then aggressive aliasing loop gets executed, else when memory is aliased then non aggressive aliased version gets executed. By setting no-alias to memory accessed in aggressive alias version of loop, enable other optimization to continue further. Following are the top level steps: 1) Perform loop do versioning feasibility check. 2) If loop is a candidate for versioning then create a memory bound check, by considering all the memory access in loop body. 3) Clone original loop and set all memory access as no-alias in new loop. 4) Set original loop & versioned loop as a branch target of runtime check result. 5) Call LICM on aggressive alias versioned of loop(For now LICM is scheduled later and not directly called from LoopVersioning pass). Consider following test: 1 int foo(int * var1, int * var2, int * var3, unsigned itr) { 2 unsigned i = 0, j = 0; 3 for(; i < itr; i++) { 4 for(; j < itr; j++) { 5 var1[j] = itr + i; 6 var3[i] = var1[j] + var3[i]; 7 } 8 } 9 } At line #6 store to var3 can be moved out by LICM(promoteLoopAccessesToScalars) but because of alias analysis un surety about memory access it unable to move it out. After Loop versioning IR: <Versioned Loop> for.body3.loopVersion: ; preds = %for.body3.loopVersion.preheader, %for.body3.loopVersion %indvars.iv.loopVersion = phi i64 [ %indvars.iv.next.loopVersion, %for.body3.loopVersion ], [ %2, %for.body3.loopVersion.preheader ] %arrayidx.loopVersion = getelementptr inbounds i32* %var1, i64 %indvars.iv.loopVersion store i32 %add, i32* %arrayidx.loopVersion, align 4, !tbaa !1, !alias.scope !11, !noalias !11 %indvars.iv.next.loopVersion = add nuw nsw i64 %indvars.iv.loopVersion, 1 %lftr.wideiv.loopVersion = trunc i64 %indvars.iv.loopVersion to i32 %exitcond.loopVersion = icmp eq i32 %lftr.wideiv.loopVersion, %0 br i1 %exitcond.loopVersion, label %for.inc11.loopexit38, label %for.body3.loopVersion <Original Loop> for.body3: ; preds = %for.body3.lr.ph, %for.body3 %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ %2, %for.body3.lr.ph ] %arrayidx = getelementptr inbounds i32* %var1, i64 %indvars.iv store i32 %add, i32* %arrayidx, align 4, !tbaa !1 %8 = load i32* %arrayidx7, align 4, !tbaa !1 %add8 = add nsw i32 %8, %add store i32 %add8, i32* %arrayidx7, align 4, !tbaa !1 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 %lftr.wideiv = trunc i64 %indvars.iv to i32 %exitcond = icmp eq i32 %lftr.wideiv, %0 br i1 %exitcond, label %for.inc11, label %for.body3 In versioned loop difference is visible, 1 store has moved out. Following are some high level details about current implementation: - LoopVersioning LoopVersioning is main class which holds multi versioning functionality. - LoopVersioning :: isVersioningBeneficial Its member to ‘LoopVersioning’ Does feasibility check for loop versioning. a) Checks layout of loop. b) Instruction level check. c) memory checks. - LoopVersioning :: versionizeLoop a) Clone original loo b) Create a runtime memory check. c) Add both loops under runtime check results target. - RuntimeMemoryCheck This class take cares runtime memory check. - RuntimeMemoryCheck ::createRuntimeCheck It creates runtime memory check. In this patch used maximum loop nest threshold as 2, and maximum number of pointers in runtime memory check as 5. Later I like to make this as a utility so others can use it. Requesting to go through patch for detailed approach. Patch available at http://reviews.llvm.org/D7900 Suggestions are comments are welcome. Regards, Ashutosh _______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu<mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu<http://llvm.cs.uiuc.edu/> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150303/b10feddf/attachment.html>
On 02/26/2015 02:31 AM, Nema, Ashutosh wrote:> > I like to propose a new loop multi versioning optimization for LICM. > > For now I kept this for LICM only, but it can be used in multiple places. > > The main motivation is to allow optimizations stuck because of memory > > alias dependencies. Most of the time when alias analysis is unsure about > > memory access and it says may-alias. This un surety from alias > analysis restrict > > some of the memory based optimizations to proceed further. > > We observed some cases with LICM, where things are beyond aliasing. > > In cases where alias analysis is unsure we like to use loop versioning > as an alternative. >I'm concerned about this approach. LICM is deliberately somewhat conservative about aliasing to avoid compile time impact. In practice, we defer some of the harder cases to GVN in the form of full redundancy and partial redundancy elimination. It's not clear to me that versioning a loop in LICM is a good use of either compile time or code size. In particular, I'm concerned about the code size increase that can result from recursive application of loop versioning. I think this is an interesting idea, but the profitability heuristics will need to be incredibly carefully tuned. It would be really really easy to have a net negative impact on the quality of generated code while doing something that seems like a good idea within LICM. Can you spell out the profitability heuristics you're planning on using? You might want to look at what Sanjoy has been doing with IRCE. This is a specialized form of Index Set Splitting with the roughly the same problem to solve. Have you looked at trying to make LICM more precise with regards to aliasing? I was looking at this a while back and it seemed like there was plenty of room for cheap changes with payoff here. (One example: a read only call in a loop collapses all alias sets. Oops.) (It's not clear to me whether you're proposing a new pass, or extending LICM. If the former, much of my concern disappears. If the later, you'll need to really justify the complexity.)> Loop Versioning will creates version of the loop with aggressive alias > and the other > > with conservative (default) alias. Aggressive alias version of loop > will have all the > > memory access marked as no-alias. These two version of loop will be > preceded by a > > memory runtime check. This runtime check consists of bound checks for > all unique memory > > accessed in loop, and it ensures aliasing of memory. Based on this > check result at runtime > > any of the loops gets executed, if memory is non aliased then > aggressive aliasing loop > > gets executed, else when memory is aliased then non aggressive aliased > version gets executed. > > By setting no-alias to memory accessed in aggressive alias version of > loop, enable other > > optimization to continue further. > > Following are the top level steps: > > 1) Perform loop do versioning feasibility check. > > 2) If loop is a candidate for versioning then create a memory bound > check, by considering > > all the memory access in loop body. > > 3) Clone original loop and set all memory access as no-alias in new loop. > > 4) Set original loop & versioned loop as a branch target of runtime > check result. > > 5) Call LICM on aggressive alias versioned of loop(For now LICM is > scheduled later and not directly > > called from LoopVersioning pass). > > Consider following test: > > 1 int foo(int * var1, int * var2, int * var3, unsigned itr) { > > 2 unsigned i = 0, j = 0; > > 3 for(; i < itr; i++) { > > 4 for(; j < itr; j++) { > > 5 var1[j] = itr + i; > > 6 var3[i] = var1[j] + var3[i]; > > 7 } > > 8 } > > 9 } > > At line #6 store to var3 can be moved out by > LICM(promoteLoopAccessesToScalars) > > but because of alias analysis un surety about memory access it unable > to move it out. > > After Loop versioning IR: > > *<Versioned Loop>* > > for.body3.loopVersion: ; preds = %for.body3.loopVersion.preheader, > %for.body3.loopVersion > > %indvars.iv.loopVersion = phi i64 [ %indvars.iv.next.loopVersion, > %for.body3.loopVersion ], [ %2, %for.body3.loopVersion.preheader ] > > %arrayidx.loopVersion = getelementptr inbounds i32* %var1, i64 > %indvars.iv.loopVersion > > store i32 %add, i32* %arrayidx.loopVersion, align 4, !tbaa !1, > !alias.scope !11, !noalias !11 > > %indvars.iv.next.loopVersion = add nuw nsw i64 > %indvars.iv.loopVersion, 1 > > %lftr.wideiv.loopVersion = trunc i64 %indvars.iv.loopVersion to i32 > > %exitcond.loopVersion = icmp eq i32 %lftr.wideiv.loopVersion, %0 > > br i1 %exitcond.loopVersion, label %for.inc11.loopexit38, label > %for.body3.loopVersion > > *<Original Loop>* > > for.body3: ; preds = %for.body3.lr.ph, %for.body3 > > %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ %2, > %for.body3.lr.ph ] > > %arrayidx = getelementptr inbounds i32* %var1, i64 %indvars.iv > > store i32 %add, i32* %arrayidx, align 4, !tbaa !1 > > %8 = load i32* %arrayidx7, align 4, !tbaa !1 > > %add8 = add nsw i32 %8, %add > > store i32 %add8, i32* %arrayidx7, align 4, !tbaa !1 > > %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 > > %lftr.wideiv = trunc i64 %indvars.iv to i32 > > %exitcond = icmp eq i32 %lftr.wideiv, %0 > > br i1 %exitcond, label %for.inc11, label %for.body3 > > In versioned loop difference is visible, 1 store has moved out. > > Following are some high level details about current implementation: > > - LoopVersioning > > LoopVersioning is main class which holds multi versioning functionality. > > - LoopVersioning :: isVersioningBeneficial > > Its member to ‘LoopVersioning’ > > Does feasibility check for loop versioning. > > a) Checks layout of loop. > > b) Instruction level check. > > c) memory checks. > > - LoopVersioning :: versionizeLoop > > a) Clone original loo > > b) Create a runtime memory check. > > c) Add both loops under runtime check results target. > > - RuntimeMemoryCheck > > This class take cares runtime memory check. > > - RuntimeMemoryCheck ::createRuntimeCheck > > It creates runtime memory check. > > In this patch used maximum loop nest threshold as 2, and maximum number > > of pointers in runtime memory check as 5. > > Later I like to make this as a utility so others can use it. > > Requesting to go through patch for detailed approach. > > Patch available at http://reviews.llvm.org/D7900 >If you're expecting review, this needs to go to llvm-commits. I have not really looked at it, but one high level comment. You've got a couple of really long functions that need to be broken up into semantic pieces. Reviewing in the current form would be hard.> > Suggestions are comments are welcome. > > Regards, > > Ashutosh > > > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150303/ce5bbe53/attachment.html>
Thanks Philip for looking into LoopVersioning.> It's not clear to me whether you're proposing a new pass, or extending LICMIt's a new pass, and we are not changing existing LICM functionality. LoopVersioning is a memory check based multi versioning optimization. It will creates version of the loop with aggressive alias and the other with conservative (default) alias. Aggressive alias version of loop will have all the memory access marked as no-alias. Because of aggressive aliasing, alias analysis will be sure about memory accesses and it will mark no-alias for all memory accesses in alias sets. This will help some of the optimization to proceed further. One case is LICM PromoteAliasSets (recently refactored as promoteLoopAccessesToScalars). Post multi versioning in same pass we are planning to call "promoteLoopAccessesToScalars" on aggressive alias versioned loop.> In particular, I'm concerned about the code size increase that can result from > recursive application of loop versioning. > I think this is an interesting idea, but the profitability heuristics will need to be > incredibly carefully tuned.We are also concerned about profitability heuristics, and we are trying to minimize the negative impact of this pass. Code size increase is a concern, but we are planning to touch the cases where gain by this optimization will be good enough to justify. For now we made this pass optional. Profitability analysis contains basic loop structure analysis, loop instructions analysis and loop memory access analysis. We are also trying to ensure the new versioned loop will get benefit by LICM. Apart from this there are few thresholds: 1) Maximum loop nest threshold allowed is 2(Planning to make this configurable). 2) Maximum number of pointers threshold in memcheck allowed is 5(Planning to make this configurable). I understand profitability plays a vital role here, in current implementation checks added may not be enough for a right profitability analysis. Accepting a possible scope of improvement, as suggested will look into Sanjoy work for IRCE.> I have not really looked at it, but one high level comment. You've got a couple of really long > functions that need to be broken up into semantic pieces. Reviewing in the current form would be hard.At this point I'm not expecting a formal review, just emphasizing on the functionality as a part of RFC. Sure will split out long functions. Regards, Ashutosh From: Philip Reames [mailto:listmail at philipreames.com] Sent: Wednesday, March 04, 2015 12:16 AM To: Nema, Ashutosh; llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] RFC: Loop versioning for LICM On 02/26/2015 02:31 AM, Nema, Ashutosh wrote: I like to propose a new loop multi versioning optimization for LICM. For now I kept this for LICM only, but it can be used in multiple places. The main motivation is to allow optimizations stuck because of memory alias dependencies. Most of the time when alias analysis is unsure about memory access and it says may-alias. This un surety from alias analysis restrict some of the memory based optimizations to proceed further. We observed some cases with LICM, where things are beyond aliasing. In cases where alias analysis is unsure we like to use loop versioning as an alternative. I'm concerned about this approach. LICM is deliberately somewhat conservative about aliasing to avoid compile time impact. In practice, we defer some of the harder cases to GVN in the form of full redundancy and partial redundancy elimination. It's not clear to me that versioning a loop in LICM is a good use of either compile time or code size. In particular, I'm concerned about the code size increase that can result from recursive application of loop versioning. I think this is an interesting idea, but the profitability heuristics will need to be incredibly carefully tuned. It would be really really easy to have a net negative impact on the quality of generated code while doing something that seems like a good idea within LICM. Can you spell out the profitability heuristics you're planning on using? You might want to look at what Sanjoy has been doing with IRCE. This is a specialized form of Index Set Splitting with the roughly the same problem to solve. Have you looked at trying to make LICM more precise with regards to aliasing? I was looking at this a while back and it seemed like there was plenty of room for cheap changes with payoff here. (One example: a read only call in a loop collapses all alias sets. Oops.) (It's not clear to me whether you're proposing a new pass, or extending LICM. If the former, much of my concern disappears. If the later, you'll need to really justify the complexity.) Loop Versioning will creates version of the loop with aggressive alias and the other with conservative (default) alias. Aggressive alias version of loop will have all the memory access marked as no-alias. These two version of loop will be preceded by a memory runtime check. This runtime check consists of bound checks for all unique memory accessed in loop, and it ensures aliasing of memory. Based on this check result at runtime any of the loops gets executed, if memory is non aliased then aggressive aliasing loop gets executed, else when memory is aliased then non aggressive aliased version gets executed. By setting no-alias to memory accessed in aggressive alias version of loop, enable other optimization to continue further. Following are the top level steps: 1) Perform loop do versioning feasibility check. 2) If loop is a candidate for versioning then create a memory bound check, by considering all the memory access in loop body. 3) Clone original loop and set all memory access as no-alias in new loop. 4) Set original loop & versioned loop as a branch target of runtime check result. 5) Call LICM on aggressive alias versioned of loop(For now LICM is scheduled later and not directly called from LoopVersioning pass). Consider following test: 1 int foo(int * var1, int * var2, int * var3, unsigned itr) { 2 unsigned i = 0, j = 0; 3 for(; i < itr; i++) { 4 for(; j < itr; j++) { 5 var1[j] = itr + i; 6 var3[i] = var1[j] + var3[i]; 7 } 8 } 9 } At line #6 store to var3 can be moved out by LICM(promoteLoopAccessesToScalars) but because of alias analysis un surety about memory access it unable to move it out. After Loop versioning IR: <Versioned Loop> for.body3.loopVersion: ; preds = %for.body3.loopVersion.preheader, %for.body3.loopVersion %indvars.iv.loopVersion = phi i64 [ %indvars.iv.next.loopVersion, %for.body3.loopVersion ], [ %2, %for.body3.loopVersion.preheader ] %arrayidx.loopVersion = getelementptr inbounds i32* %var1, i64 %indvars.iv.loopVersion store i32 %add, i32* %arrayidx.loopVersion, align 4, !tbaa !1, !alias.scope !11, !noalias !11 %indvars.iv.next.loopVersion = add nuw nsw i64 %indvars.iv.loopVersion, 1 %lftr.wideiv.loopVersion = trunc i64 %indvars.iv.loopVersion to i32 %exitcond.loopVersion = icmp eq i32 %lftr.wideiv.loopVersion, %0 br i1 %exitcond.loopVersion, label %for.inc11.loopexit38, label %for.body3.loopVersion <Original Loop> for.body3: ; preds = %for.body3.lr.ph, %for.body3 %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ %2, %for.body3.lr.ph ] %arrayidx = getelementptr inbounds i32* %var1, i64 %indvars.iv store i32 %add, i32* %arrayidx, align 4, !tbaa !1 %8 = load i32* %arrayidx7, align 4, !tbaa !1 %add8 = add nsw i32 %8, %add store i32 %add8, i32* %arrayidx7, align 4, !tbaa !1 %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 %lftr.wideiv = trunc i64 %indvars.iv to i32 %exitcond = icmp eq i32 %lftr.wideiv, %0 br i1 %exitcond, label %for.inc11, label %for.body3 In versioned loop difference is visible, 1 store has moved out. Following are some high level details about current implementation: - LoopVersioning LoopVersioning is main class which holds multi versioning functionality. - LoopVersioning :: isVersioningBeneficial Its member to 'LoopVersioning' Does feasibility check for loop versioning. a) Checks layout of loop. b) Instruction level check. c) memory checks. - LoopVersioning :: versionizeLoop a) Clone original loo b) Create a runtime memory check. c) Add both loops under runtime check results target. - RuntimeMemoryCheck This class take cares runtime memory check. - RuntimeMemoryCheck ::createRuntimeCheck It creates runtime memory check. In this patch used maximum loop nest threshold as 2, and maximum number of pointers in runtime memory check as 5. Later I like to make this as a utility so others can use it. Requesting to go through patch for detailed approach. Patch available at http://reviews.llvm.org/D7900 If you're expecting review, this needs to go to llvm-commits. I have not really looked at it, but one high level comment. You've got a couple of really long functions that need to be broken up into semantic pieces. Reviewing in the current form would be hard. Suggestions are comments are welcome. Regards, Ashutosh _______________________________________________ LLVM Developers mailing list LLVMdev at cs.uiuc.edu<mailto:LLVMdev at cs.uiuc.edu> http://llvm.cs.uiuc.edu http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150304/5cf17d6d/attachment.html>
----- Original Message -----> From: "Ashutosh Nema" <Ashutosh.Nema at amd.com> > To: llvmdev at cs.uiuc.edu > Sent: Thursday, February 26, 2015 4:31:18 AM > Subject: [LLVMdev] RFC: Loop versioning for LICM> I like to propose a new loop multi versioning optimization for LICM. > For now I kept this for LICM only, but it can be used in multiple > places. > The main motivation is to allow optimizations stuck because of memory > alias dependencies. Most of the time when alias analysis is unsure > about > memory access and it says may-alias. This un surety from alias > analysis restrict > some of the memory based optimizations to proceed further. > We observed some cases with LICM, where things are beyond aliasing. > In cases where alias analysis is unsure we like to use loop > versioning as an alternative.> Loop Versioning will creates version of the loop with aggressive > alias and the other > with conservative (default) alias. Aggressive alias version of loop > will have all the > memory access marked as no-alias. These two version of loop will be > preceded by a > memory runtime check. This runtime check consists of bound checks for > all unique memory > accessed in loop, and it ensures aliasing of memory. Based on this > check result at runtime > any of the loops gets executed, if memory is non aliased then > aggressive aliasing loop > gets executed, else when memory is aliased then non aggressive > aliased version gets executed.Hi Ashutosh, I think this is a really interesting idea, and I'd like to encourage you to continue working on it. Regarding profitability, I think you'll want to check that you'll be able to hoist/sink a significant number of the memory accesses inside the new "versioned" loop. If a loop has a large number of accesses, and the new loop body only differs by the hosting/sinking of a few, then you're unlikely to see the difference. -Hal> By setting no-alias to memory accessed in aggressive alias version of > loop, enable other > optimization to continue further.> Following are the top level steps:> 1) Perform loop do versioning feasibility check. > 2) If loop is a candidate for versioning then create a memory bound > check, by considering > all the memory access in loop body. > 3) Clone original loop and set all memory access as no-alias in new > loop. > 4) Set original loop & versioned loop as a branch target of runtime > check result. > 5) Call LICM on aggressive alias versioned of loop(For now LICM is > scheduled later and not directly > called from LoopVersioning pass).> Consider following test:> 1 int foo(int * var1, int * var2, int * var3, unsigned itr) { > 2 unsigned i = 0, j = 0; > 3 for(; i < itr; i++) { > 4 for(; j < itr; j++) { > 5 var1[j] = itr + i; > 6 var3[i] = var1[j] + var3[i]; > 7 } > 8 } > 9 }> At line #6 store to var3 can be moved out by > LICM(promoteLoopAccessesToScalars) > but because of alias analysis un surety about memory access it unable > to move it out.> After Loop versioning IR:> <Versioned Loop> > for.body3.loopVersion: ; preds = %for.body3.loopVersion.preheader, > %for.body3.loopVersion > %indvars.iv.loopVersion = phi i64 [ %indvars.iv.next.loopVersion, > %for.body3.loopVersion ], [ %2, %for.body3.loopVersion.preheader ] > %arrayidx.loopVersion = getelementptr inbounds i32* %var1, i64 > %indvars.iv.loopVersion > store i32 %add, i32* %arrayidx.loopVersion, align 4, !tbaa !1, > !alias.scope !11, !noalias !11 > %indvars.iv.next.loopVersion = add nuw nsw i64 > %indvars.iv.loopVersion, 1 > %lftr.wideiv.loopVersion = trunc i64 %indvars.iv.loopVersion to i32 > %exitcond.loopVersion = icmp eq i32 %lftr.wideiv.loopVersion, %0 > br i1 %exitcond.loopVersion, label %for.inc11.loopexit38, label > %for.body3.loopVersion> <Original Loop> > for.body3: ; preds = %for.body3.lr.ph, %for.body3 > %indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ %2, > %for.body3.lr.ph ] > %arrayidx = getelementptr inbounds i32* %var1, i64 %indvars.iv > store i32 %add, i32* %arrayidx, align 4, !tbaa !1 > %8 = load i32* %arrayidx7, align 4, !tbaa !1 > %add8 = add nsw i32 %8, %add > store i32 %add8, i32* %arrayidx7, align 4, !tbaa !1 > %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1 > %lftr.wideiv = trunc i64 %indvars.iv to i32 > %exitcond = icmp eq i32 %lftr.wideiv, %0 > br i1 %exitcond, label %for.inc11, label %for.body3> In versioned loop difference is visible, 1 store has moved out.> Following are some high level details about current implementation:> - LoopVersioning > LoopVersioning is main class which holds multi versioning > functionality.> - LoopVersioning :: isVersioningBeneficial > Its member to ‘LoopVersioning’ > Does feasibility check for loop versioning. > a) Checks layout of loop. > b) Instruction level check. > c) memory checks.> - LoopVersioning :: versionizeLoop > a) Clone original loo > b) Create a runtime memory check. > c) Add both loops under runtime check results target.> - RuntimeMemoryCheck > This class take cares runtime memory check.> - RuntimeMemoryCheck ::createRuntimeCheck > It creates runtime memory check.> In this patch used maximum loop nest threshold as 2, and maximum > number > of pointers in runtime memory check as 5.> Later I like to make this as a utility so others can use it.> Requesting to go through patch for detailed approach. > Patch available at http://reviews.llvm.org/D7900> Suggestions are comments are welcome.> Regards, > Ashutosh > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150305/e675f705/attachment.html>
From: Hal Finkel [mailto:hfinkel at anl.gov] Sent: Friday, March 06, 2015 10:47 AM To: Nema, Ashutosh Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] RFC: Loop versioning for LICM I like to propose a new loop multi versioning optimization for LICM. For now I kept this for LICM only, but it can be used in multiple places. The main motivation is to allow optimizations stuck because of memory alias dependencies. Most of the time when alias analysis is unsure about memory access and it says may-alias. This un surety from alias analysis restrict some of the memory based optimizations to proceed further. We observed some cases with LICM, where things are beyond aliasing. In cases where alias analysis is unsure we like to use loop versioning as an alternative. Loop Versioning will creates version of the loop with aggressive alias and the other with conservative (default) alias. Aggressive alias version of loop will have all the memory access marked as no-alias. These two version of loop will be preceded by a memory runtime check. This runtime check consists of bound checks for all unique memory accessed in loop, and it ensures aliasing of memory. Based on this check result at runtime any of the loops gets executed, if memory is non aliased then aggressive aliasing loop gets executed, else when memory is aliased then non aggressive aliased version gets executed. Hi Ashutosh, I think this is a really interesting idea, and I'd like to encourage you to continue working on it. Thanks Hal. Regarding profitability, I think you'll want to check that you'll be able to hoist/sink a significant number of the memory accesses inside the new "versioned" loop. If a loop has a large number of accesses, and the new loop body only differs by the hosting/sinking of a few, then you're unlikely to see the difference. Yes it’s a good point, we can add this to profitability. Some part of this change overlaps with loop access analysis work by Adam Nemet. Will reuse his changes once gets accepted. Regards, Ashutosh -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150310/d4f82b65/attachment.html>