I'm current facing an issue related to AliasSetTracker/LICM: the transitive closure of the alias sets is materially more conservative than individual aliasing queries and this conservatism leads to generally worse optimization in LICM. For instance, consider this module: declare i32 @only_reads() readonly declare void @escape(i32*, i32*, i32*) define void @f(i32 %count) { entry: %a = alloca i32 %b = alloca i32 %c = alloca i32 call void @escape(i32* %a, i32* %b, i32* %c) %enter = icmp sle i32 0, %count br i1 %enter, label %loop, label %exit loop: %idx = phi i32 [ 0, %entry ], [ %idx.inc, %loop ] %idx.inc = add i32 %idx, 1 %take.be = icmp slt i32 %idx, %count %a.load = load i32, i32* %a store i32 %a.load, i32* %b %v = call i32 @only_reads() store i32 %v, i32* %c br i1 %take.be, label %loop, label %exit exit: ret void } BasicAA knows that %a, %b and %c are all pairwise NoAlias, and given that, LICM should be able to hoist `%a.load` out of the loop. That does not happen because the read only call to `@only_reads` collapses the alias sets for %a, %b and %c into one and so as far as LICM is concerned, they all alias each other. The store to `%b` now "clobbers" `%a.load`. One potential fix is to do N^2 alias queries (every mem operation in the loop with every other mem operation) before hoisting a load, but I suspect AliasSetTracker exists mainly to make LICM faster than N^2, so I'm ruling this solution out. The solution I like: have LICM create two AliasSetTracker objects. The first one tracks all memory operations, as the alias set tracker object does now. The second one only tracks mods. When hoisting a load we check if any alias set in the mod tracker aliases the load location -- if not, the load is okay to hoist to the preheader from the data dependence point of view. In the worst case, this is 2x more (than now) work for building the alias sets and no more extra work when checking if we can hoist a load. Before I dive into implementing this, I have a few questions: * is this even the right problem to solve? Or should I look at other passes for this optimization? * do you think this will be too expensive? If so, does it make sense to do this only on -O3?` -- Sanjoy
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Sanjoy Das > Subject: [LLVMdev] alias set collapse and LICM> declare i32 @only_reads() readonly > declare void @escape(i32*, i32*, i32*) > define void @f(i32 %count) { > entry: > %a = alloca i32 > %b = alloca i32 > %c = alloca i32 > call void @escape(i32* %a, i32* %b, i32* %c) > %enter = icmp sle i32 0, %count > br i1 %enter, label %loop, label %exit > loop: > %idx = phi i32 [ 0, %entry ], [ %idx.inc, %loop ] > %idx.inc = add i32 %idx, 1 > %take.be = icmp slt i32 %idx, %count > %a.load = load i32, i32* %a > store i32 %a.load, i32* %b > %v = call i32 @only_reads() > store i32 %v, i32* %c > br i1 %take.be, label %loop, label %exit > exit: > ret void > }> BasicAA knows that %a, %b and %c are all pairwise NoAlias, and given > that, LICM should be able to hoist `%a.load` out of the loop. That > does not happen because the read only call to `@only_reads` collapses > the alias sets for %a, %b and %c into oneCan you explain why the alias sets are collapsed into one here? Can that collapsing simply be avoided for read-only calls without creating a second alias set? - Chuck
> Can you explain why the alias sets are collapsed into one here?If I'm reading the code correctly, it is because the readonly call aliases all of %a, %b and %c. Since two pointers can be in two different alias sets only if they do not alias, %a has to be in the same alias set as the read only call and so does %b and %c. Therefore, all of them have to be in the same alias set.> Can that collapsing simply be avoided for read-only calls without creating a second alias set?I have not yet come up with a simple scheme to solve that problem within AliasSetTracker. Obviously, that does not mean that there isn't one. -- Sanjoy
[+Danny] ----- Original Message -----> From: "Sanjoy Das" <sanjoy at playingwithpointers.com> > To: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu> > Cc: "Philip Reames" <listmail at philipreames.com>, "Hal Finkel" <hfinkel at anl.gov>, "Chandler Carruth" > <chandlerc at google.com> > Sent: Monday, April 27, 2015 4:58:09 PM > Subject: alias set collapse and LICM > > I'm current facing an issue related to AliasSetTracker/LICM: the > transitive closure of the alias sets is materially more conservative > than individual aliasing queries and this conservatism leads to > generally worse optimization in LICM. > > For instance, consider this module: > > declare i32 @only_reads() readonly > declare void @escape(i32*, i32*, i32*) > > define void @f(i32 %count) { > entry: > %a = alloca i32 > %b = alloca i32 > %c = alloca i32 > call void @escape(i32* %a, i32* %b, i32* %c) > %enter = icmp sle i32 0, %count > br i1 %enter, label %loop, label %exit > > loop: > %idx = phi i32 [ 0, %entry ], [ %idx.inc, %loop ] > %idx.inc = add i32 %idx, 1 > %take.be = icmp slt i32 %idx, %count > > %a.load = load i32, i32* %a > store i32 %a.load, i32* %b > > %v = call i32 @only_reads() > store i32 %v, i32* %c > > br i1 %take.be, label %loop, label %exit > > exit: > ret void > } > > BasicAA knows that %a, %b and %c are all pairwise NoAlias, and given > that, LICM should be able to hoist `%a.load` out of the loop. That > does not happen because the read only call to `@only_reads` collapses > the alias sets for %a, %b and %c into one and so as far as LICM is > concerned, they all alias each other. The store to `%b` now > "clobbers" `%a.load`.Each alias set currently keeps track of its AccessType (Ref/Mod/ModRef). Maybe we should not collapse Ref access sets with Mod ones? -Hal> > One potential fix is to do N^2 alias queries (every mem operation in > the loop with every other mem operation) before hoisting a load, but > I > suspect AliasSetTracker exists mainly to make LICM faster than N^2, > so > I'm ruling this solution out. > > The solution I like: have LICM create two AliasSetTracker > objects. The first one tracks all memory operations, as the alias > set > tracker object does now. The second one only tracks mods. When > hoisting a load we check if any alias set in the mod tracker aliases > the load location -- if not, the load is okay to hoist to the > preheader from the data dependence point of view. In the worst case, > this is 2x more (than now) work for building the alias sets and no > more > extra work when checking if we can hoist a load. > > Before I dive into implementing this, I have a few questions: > > * is this even the right problem to solve? Or should I look at > other > passes for this optimization? > > * do you think this will be too expensive? If so, does it make > sense > to do this only on -O3?` > > -- Sanjoy >-- Hal Finkel Assistant Computational Scientist Leadership Computing Facility Argonne National Laboratory
> Each alias set currently keeps track of its AccessType (Ref/Mod/ModRef). Maybe we should not collapse Ref access sets with Mod ones?AFAICT, then the read only call will have to be in two alias sets -- a Ref alias set (also containing %a) and a Mod alias set (also containing %b and %c). The collapse is driven by the fact that a alias sets are disjoint -- to not collapse I think we'll have to break that invariant. -- Sanjoy> > -Hal > >> >> One potential fix is to do N^2 alias queries (every mem operation in >> the loop with every other mem operation) before hoisting a load, but >> I >> suspect AliasSetTracker exists mainly to make LICM faster than N^2, >> so >> I'm ruling this solution out. >> >> The solution I like: have LICM create two AliasSetTracker >> objects. The first one tracks all memory operations, as the alias >> set >> tracker object does now. The second one only tracks mods. When >> hoisting a load we check if any alias set in the mod tracker aliases >> the load location -- if not, the load is okay to hoist to the >> preheader from the data dependence point of view. In the worst case, >> this is 2x more (than now) work for building the alias sets and no >> more >> extra work when checking if we can hoist a load. >> >> Before I dive into implementing this, I have a few questions: >> >> * is this even the right problem to solve? Or should I look at >> other >> passes for this optimization? >> >> * do you think this will be too expensive? If so, does it make >> sense >> to do this only on -O3?` >> >> -- Sanjoy >> > > -- > Hal Finkel > Assistant Computational Scientist > Leadership Computing Facility > Argonne National Laboratory
Possibly Parallel Threads
- [LICM][MemorySSA] Converting LICM pass to use MemorySSA to avoid AliasSet collapse issue
- [LLVMdev] alias set collapse and LICM
- RFC: Replace usage of Alias Set Tracker with MemorySSA in LICM
- Hoisting in the presence of volatile loads.
- [LLVMdev] alias set collapse and LICM