On Mon, May 4, 2015 at 2:19 PM, Nicholas White <n.j.white at gmail.com>
wrote:>> It's not quite the same testcase.
> Yes - it's an extension of the first test case that I'd expect to
be
> optimised out in the same way as my earlier example (i.e., store a
> value, read it back and branch on it). If you miss out the
> "check.first.array.element" block (changing the branch that jumps
to
> it to go to the "abort" label instead) like this:
>
> define void @func2(i8* %mem) {
> %1 = icmp eq i8* %mem, null
> br i1 %1, label %check.zero, label %stash.zero
>
> stash.zero:
> %2 = bitcast i8* %mem to %struct.my_s*
> %3 = getelementptr inbounds i8, i8* %mem, i64 4
> %4 = bitcast i8* %3 to i32*
> store i32 0, i32* %4, align 4
> br label %check.zero
>
> check.zero:
> %.0.i = phi %struct.my_s* [ %2, %stash.zero ], [ null, %0 ]
> %5 = getelementptr inbounds %struct.my_s, %struct.my_s* %.0.i, i64 0, i32
1
> %6 = load i32, i32* %5, align 4
> %7 = icmp eq i32 %6, 0
> br i1 %7, label %success, label %abort
>
> abort:
> tail call void @__assert_rtn()
> unreachable
>
> success:
> ret void
> }
>
> Then opt -O3 does optimize it down to:
>
> ; Function Attrs: nounwind
> define void @func2(i8* nocapture %mem) #0 {
> stash.zero:
> %0 = getelementptr inbounds i8, i8* %mem, i64 4
> %1 = bitcast i8* %0 to i32*
> store i32 0, i32* %1, align 4
> ret void
> }
>
> ...so something about the "check.first.array.element" block
confuses
> whatever analysis opt used to determine %6 was zero in func2.
>
>> Can you walk me through the below testcase and epxlain what you expect
> to ahppen?
> Definitely:
>
>> %struct.my_s = type { i32, i32, [0 x i8*] }
> We only read and write to the first i32, although a code branch never
> taken will read first element of the the variable length array.
>
>> ; Function Attrs: noreturn
>> declare void @__assert_rtn()
> basically any noreturn function
>
>> define void @func(i8* %mem) {
>> %1 = icmp eq i8* %mem, null
>> br i1 %1, label %check.zero, label %stash.zero
> Checks the input pointer to see if it's null - the C code this is
> originally derived from didn't check this return value from malloc.
>
>> stash.zero:
>> %2 = bitcast i8* %mem to %struct.my_s*
>> %3 = getelementptr inbounds i8, i8* %mem, i64 4
> get a pointer to the 4th byte of memory, i.e. the second i32 member of
> the struct
>> %4 = bitcast i8* %3 to i32*
>> store i32 0, i32* %4, align 4
> and put a zero in it - nb. this branch is only taken when %mem is not null
>> br label %check.zero
>
>>check.zero:
>> %.0.i = phi %struct.my_s* [ %2, %stash.zero ], [ null, %0 ]
>> %5 = getelementptr inbounds %struct.my_s, %struct.my_s* %.0.i, i64 0,
i32 1
> get a pointer to the second element of the struct a different way, but
> because the control flow from both exits of block %0 end up here the
> base pointer is actually always %mem, but we may know whether it's
> null or not
Sure
>> %6 = load i32, i32* %5, align 4
> the C code loads the value from the second i32 of the struct,
> regardless of whether the pointer's null or not. Opt correctly assumes
> %5 therefore can't be null.
GVN, at least, does not in fact, determine the equality that %5 is not null.
This is because it only propagates equalities from edge conditions,
not from inbounds conditions.
This is the cause of your next failure (at least, insofar as GVN's
ability to optimize it).
>> %7 = icmp eq i32 %6, 0
> compare the value of the second element to zero; we stored zero here
> (%4) in the previous block (as we can't have taken the null path from
> %0 and still got this far).
It does not understand the null path is unreachable.
> Opt sometimes seems to deduce this (ie %7
> == i1 1) as in the func2 example above
If it knew the GEP's were the same, and knew that it was not null, it
would be able to determine the value of this.
So, the fix depends on how aggressive you want to be.
First, propagateEquality in GVN really is not meant to understand
non-edge conditions. You could teach it that anything that compares
it against null are false after an inbounds:
IE
// If the GEP is inbounds, and this is address space 0, we know the pointer is
// not null.
if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(I))
if (GEP->isInBounds() && GEP->getAddressSpace() == 0) {
// Need a propagateInequality that replaces any equality checks against
// null with false
}
However, you want to go further here:
What you really want to do in this case is teach GVN that any inbounds
gep that post-dominates a comparison of that pointer against null,
must result in that pointer comparison coming up with non-null.
IE In the above, you would follow through all getPointerOperand->uses,
and if they are icmps against null and post-dominated by the inbounds
GEP block, replace them with false :)
You need to use phitransaddr to translate the pointers backwards, to
get that %2 is%mem. In this case, it will work, and is easy, because
the predecessor is the thing you post-dominate. The path walking to
get a correct translation is tricky in some cases.
IE
// If the GEP is inbounds, and this is address space 0, we know the pointer is
// not null.
if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(I))
if (GEP->isInBounds() && GEP->getAddressSpace() == 0) {
for (auto Use : GEP->getPointerOperand->uses())
{
PHITransAddr AddressTranslator(...)
if (!AddressTranslator.PHITranslateValue(GEP->getPointerOperand,
predecessor))
for (auto User : AddressTranslator.getAddr())
// If this is an icmp eq against null, declare victory and
replace the result with false.
}
This is easier in NewGVN (and not N^2 like the above).
You also could make a pass that is identical to SCCP, but walks in the
reverse direction (IE operands instead of uses), and walks up the
post-dom tree.
Propagate inbounds assumptions upwards.
That would be O(n)
(because by the definition of post domination, every path from the
icmp to the exit block must go through the inbounds check, which would
result in the gep having an undef value. If it has an undef value, we
can say it's not-null anyway :P)