Tobias Grosser
2011-Dec-29 14:00 UTC
[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
On 12/14/2011 01:25 AM, Hal Finkel wrote:> Tobias, > > I've attached an updated copy of the patch. I believe that I accounted > for all of your suggestions except for: > > 1. You said that I could make AA a member of the class and initialize it > for each basic block. I suppose that I'd need to make it a pointer, but > more generally, what is the thread-safely model that I should have in > mind for the analysis passes (will multiple threads always use distinct > instances of the passes)? Furthermore, if I am going to make AA a class > variable that is re-initialized on every call to runOnBasicBlock, then I > should probably do the same with all of the top-level maps, etc. Do you > agree?I am actually not sure about the thread safety conventions in LLVM. I personally always assumed that a transformation class will only be instantiated and used within a single thread and in case of multiple threads, multiple class instances would be created. The pattern of assigning analysis results to object variables is common (e.g. lib/Analysis/IVUsers.cpp). So we should be able to use it. I personally do not have any strong opinion about the other top-level maps. Just decide yourself what is easier to read.> 2. I have not (yet) changed the types of the maps from holding Value* to > Instruction*. Doing so would eliminate a few casts, but would cause > even-more casts (and maybe checks) in computePairsConnectedTo. Even so, > it might be worthwhile to make the change for clarity (conceptually, > those maps do hold only pointers to instructions).I believe if code relies on the assumption that some data structures only objects of a certain type, we should use that type to define the data structures. Though, I do not think this change is required to commit the patch. It can be addressed after the code was committed. One thing that I would still like to have is a test case where bb-vectorize-search-limit is needed to avoid exponential compile time growth and another test case that is not optimized, if bb-vectorize-search-limit is set to a value less than 4000. I think those cases are very valuable to understand the performance behavior of this code. Especially, as I am not yet sure why we need a value as high as 4000. This is my last issue on this patch, that should be addressed before committing it. After we discussed this, I propose to post a final patch stating that you will commit after three days to give other people a chance to veto. Tobi
Hal Finkel
2011-Dec-29 17:32 UTC
[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
On Thu, 2011-12-29 at 15:00 +0100, Tobias Grosser wrote:> On 12/14/2011 01:25 AM, Hal Finkel wrote: > > Tobias, > > > > I've attached an updated copy of the patch. I believe that I accounted > > for all of your suggestions except for: > > > > 1. You said that I could make AA a member of the class and initialize it > > for each basic block. I suppose that I'd need to make it a pointer, but > > more generally, what is the thread-safely model that I should have in > > mind for the analysis passes (will multiple threads always use distinct > > instances of the passes)? Furthermore, if I am going to make AA a class > > variable that is re-initialized on every call to runOnBasicBlock, then I > > should probably do the same with all of the top-level maps, etc. Do you > > agree? > > I am actually not sure about the thread safety conventions in LLVM. I > personally always assumed that a transformation class will only be > instantiated and used within a single thread and in case of multiple > threads, multiple class instances would be created. > > The pattern of assigning analysis results to object variables is common > (e.g. lib/Analysis/IVUsers.cpp). So we should be able to use it.Fair enough; I'll make the change.> I > personally do not have any strong opinion about the other top-level > maps. Just decide yourself what is easier to read. > > > 2. I have not (yet) changed the types of the maps from holding Value* to > > Instruction*. Doing so would eliminate a few casts, but would cause > > even-more casts (and maybe checks) in computePairsConnectedTo. Even so, > > it might be worthwhile to make the change for clarity (conceptually, > > those maps do hold only pointers to instructions). > > I believe if code relies on the assumption that some data structures > only objects of a certain type, we should use that type to define the > data structures. Though, I do not think this change is required to > commit the patch. It can be addressed after the code was committed.This makes sense, I'll probably change it prior to commit.> > One thing that I would still like to have is a test case where > bb-vectorize-search-limit is needed to avoid exponential compile time > growth and another test case that is not optimized, if > bb-vectorize-search-limit is set to a value less than 4000. I think > those cases are very valuable to understand the performance behavior of > this code.Good idea, I'll add these test cases.> Especially, as I am not yet sure why we need a value as high > as 4000.I am not exactly sure why that turned out to be the best number, but I'll try this again in combination with my load/store reordering patch and see if such a large value still seems best.> > This is my last issue on this patch, that should be addressed before > committing it. After we discussed this, I propose to post a final patch > stating that you will commit after three days to give other people a > chance to veto.Sounds like a good plan. I had actually introduced some bugs during refactoring in the last patch I had posted; I'm writing test cases for the fixes and then I'll post an updated patch. Thanks again, Hal> > Tobi-- Hal Finkel Postdoctoral Appointee Leadership Computing Facility Argonne National Laboratory
Tobias Grosser
2011-Dec-30 09:09 UTC
[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
On 12/29/2011 06:32 PM, Hal Finkel wrote:> On Thu, 2011-12-29 at 15:00 +0100, Tobias Grosser wrote: >> On 12/14/2011 01:25 AM, Hal Finkel wrote: >> One thing that I would still like to have is a test case where >> bb-vectorize-search-limit is needed to avoid exponential compile time >> growth and another test case that is not optimized, if >> bb-vectorize-search-limit is set to a value less than 4000. I think >> those cases are very valuable to understand the performance behavior of >> this code. > > Good idea, I'll add these test cases. > >> Especially, as I am not yet sure why we need a value as high >> as 4000. > > I am not exactly sure why that turned out to be the best number, but > I'll try this again in combination with my load/store reordering patch > and see if such a large value still seems best.They reason why I am surprised about this value, is that I believe partial loop unrolling would not yield bbs of this size. Code size limits should prevent size. However, loop unrolling seems to be the major reason why two accesses to adjacent memory may be placed far away. Without loop unrolling, at a distance of 4000 the fact that two instructions access adjacent memory locations seems to be completely random and the probability that the following instructions perform the same calculations seems low. Also, I believe at 4000 the compile time should already be significant higher. As it seems my intuition is wrong, I am very eager to see and understand an example where a search limit of 4000 is really needed. Cheers Tobi
Apparently Analagous Threads
- [LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
- [LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
- [LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its
- [LLVMdev] [llvm] r184698 - Add a flag to defer vectorization into a phase after the inliner and its
- [LLVMdev] loop vectorizer