On Mon, Apr 27, 2015 at 5:32 PM, Sanjoy Das
<sanjoy at playingwithpointers.com> wrote:
> On Mon, Apr 27, 2015 at 4:21 PM, Daniel Berlin <dberlin at dberlin.org> wrote:
>> You can't win here (believe me, I've tried, and better people than me have
>> tried, for years :P).
>> No matter what you do, the partitioning will never be 100% precise. The
>> only way to solve that in general is to pairwise query over the
>> partitioning.
>>
>> Your basic problem is that the alias analyses do not all come up with the
>> same partitioning of the heap.
>>
>> Worse, some are not even transitive, or even symmetric.
>
> Am I correct in saying that the lack of transitivity fundamentally
> cannot be helped -- a MayAlias composed with a MayAlias is not
> necessarily a MayAlias.
Correct. I also imagine you can abuse noalias to get a noalias b, b
noalias c, a mayalias c.
(But I haven't thought too hard about it.)
> The simplest example of this I can come up with is {A ,
> C?A:B , B} where A `NoAlias` B and C is a runtime value that cannot be
> computed at compile-time. Then A `MayAlias` C?A:B and C?A:B
> `MayAlias` B but A `NoAlias` B.
Yes.
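In IR, that example looks something like the following (a minimal sketch;
the function and value names are made up, and the query results in the
comments are what you would expect from BasicAA given the noalias
arguments):

define void @example(double* noalias %A, double* noalias %B, i1 %C) {
entry:
; %P is %A or %B depending on a value only known at runtime.
%P = select i1 %C, double* %A, double* %B
; Expected results:
;   alias(%A, %B) -> NoAlias   (distinct noalias arguments)
;   alias(%A, %P) -> MayAlias
;   alias(%P, %B) -> MayAlias
; i.e. MayAlias composed with MayAlias does not give MayAlias here.
%v = load double, double* %P, align 8
store double %v, double* %A, align 8
ret void
}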
>
>> This means one single variable may be in multiple disjoint alias sets in
>> reality.
>
> Agreed. I did not propose solving the problem within ASTracker for
> this very reason.
>
>> (For example, GEP, PHI will give you different AA results than PHI, GEP in
>> some cases. That is mostly a FIXME, but there are plenty of cases you can
>> come up with where the info tells you something about one variable and its
>> relationship to another, but not to a third. The non-symmetric case is
>> rarer, but again, possible.)
>>
>> Let's say you modify AliasSetTracker to not collapse, and handle multiple
>> disjoint partitions at once.
>>
>> (Here is where I will draw the equivalence between this and what we've done
>> before in GCC :) )
>>
>> The only way to maintain correctness (in general) and not collapse is to
>> maintain multiple alias sets, and say which are affected by which
>> instructions.
>>
>> (IE you have 1 instruction, and it may MOD 5 different alias sets, each of
>> which represents a different partitioning of the heap).
>>
>> You will discover very quickly that, for most partitionings you can think of,
>> the cost of maintaining the alias sets over a set of instructions is
>> greater than the cost of the additional queries you may come up with in the
>> first place.
>>
>> In other words, depending on how precise your partitioning is, you may have
>> variables that clobber 10 or 20 or 30 of the disjoint alias sets. Handling
>> this is a lot more expensive than saying "hey, I want to move this one load
>> up past another instruction. Does it really conflict?". This is because you
>> really only care about individual aliases in the alias set at any given
>> time, but unless you go through all the addresses ahead of time, and have a
>> way to say "okay alias set tracker, I only care about variables x, y, z, and
>> even then, only x in block a, y in block b, etc.", you are paying the cost of
>> doing whatever queries are necessary to prove something about *all*
>> variables in an alias set.
>>
>> MemorySSA was essentially built with this problem in mind (and thus tries
>> to provide chains that you can query on). You ask it about a load, and it
>> can tell you in O(1) the nearest dominating thing that MODs it (IE the
>> nearest dominating instruction for which getModRefInfo(Instruction,
>> Location) says Mod).
>
> So my proposal is somewhat related to what you're suggesting --
> tracking the stores through a separate ASTracker lets me basically ask
> a conservative version of getModRefInfo over the entire loop body.
Yes.
MemorySSA just lets you do this per-load instead.
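For a trivial straight-line case, the per-load annotations look roughly like
this (a made-up sketch, using the same annotation style as the example
further down):

define i32 @sketch(i32* noalias %p, i32* noalias %q) {
entry:
; 1 = MemoryDef(liveOnEntry)
store i32 1, i32* %p
; 2 = MemoryDef(1)
store i32 2, i32* %q
; MemoryUse(1) -- the store to %q does not Mod %p, so the nearest
; clobbering access for this load is the first store.
%v = load i32, i32* %p
ret i32 %v
}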
> And given that the loop body will be strongly connected, I think the
> list of clobbering stores reported by MemorySSA will be exactly equal
> to the list of stores reported by AST; control flow should not matter
> since everything in the loop body reaches everything else.
Depends how good AST is.
Given a loop like this:
; Function Attrs: nounwind ssp
define void @test10(i32 %N, double* nocapture %G) #0 {
entry:
%0 = add i32 %N, -1
%1 = icmp sgt i32 %0, 1
br i1 %1, label %bb.nph, label %return
bb.nph: ; preds = %entry
%tmp = sext i32 %0 to i64
%tmp8 = add i64 %tmp, -1
br label %bb
bb: ; preds = %bb, %bb.nph
%indvar = phi i64 [ 0, %bb.nph ], [ %tmp11, %bb ]
%scevgep = getelementptr double, double* %G, i64 %indvar
%tmp9 = add i64 %indvar, 2
%scevgep10 = getelementptr double, double* %G, i64 %tmp9
%tmp11 = add i64 %indvar, 1
%scevgep12 = getelementptr double, double* %G, i64 %tmp11
%2 = load double, double* %scevgep12, align 8
%3 = load double, double* %scevgep10, align 8
%4 = fadd double %2, %3
%5 = load double, double* %scevgep, align 8
%6 = fadd double %4, %5
store double %6, double* %scevgep12, align 8
%exitcond = icmp eq i64 %tmp11, %tmp8
br i1 %exitcond, label %return, label %bb
return: ; preds = %bb, %entry
ret void
}
MemorySSA will return that the load of scevgep12 is not clobbered in the loop
(on one path through the phi it leads all the way back to the entry
block; on the other path it also leads all the way back to the entry block,
because scevgep12 becomes scevgep10 along that path), that the load of
scevgep10 gets back to live-on-entry and is not clobbered by the loop, and
that the load of scevgep is clobbered by the store below it.
IE:
; Function Attrs: nounwind ssp
define void @test10(i32 %N, double* nocapture %G) #0 {
entry:
%0 = add i32 %N, -1
%1 = icmp sgt i32 %0, 1
br i1 %1, label %bb.nph, label %return
bb.nph: ; preds = %entry
%tmp = sext i32 %0 to i64
%tmp8 = add i64 %tmp, -1
br label %bb
bb: ; preds = %bb, %bb.nph
; 3 = MemoryPhi({bb.nph,liveOnEntry},{bb,1})
%indvar = phi i64 [ 0, %bb.nph ], [ %tmp11, %bb ]
%scevgep = getelementptr double, double* %G, i64 %indvar
%tmp9 = add i64 %indvar, 2
%scevgep10 = getelementptr double, double* %G, i64 %tmp9
%tmp11 = add i64 %indvar, 1
%scevgep12 = getelementptr double, double* %G, i64 %tmp11
; MemoryUse(liveOnEntry)
%2 = load double, double* %scevgep12, align 8
; MemoryUse(liveOnEntry)
%3 = load double, double* %scevgep10, align 8
%4 = fadd double %2, %3
; MemoryUse(3)
%5 = load double, double* %scevgep, align 8
%6 = fadd double %4, %5
; 1 = MemoryDef(3)
store double %6, double* %scevgep12, align 8
%exitcond = icmp eq i64 %tmp11, %tmp8
br i1 %exitcond, label %return, label %bb
return: ; preds = %bb, %entry
; 2 = MemoryPhi({entry,liveOnEntry},{bb,1})
ret void
}
>
> -- Sanjoy