In tracking down an alignment bug, I think I found a problem in
ComputeMaskedBits. But I am not very well versed in this area of LLVM and
need some more eyes. This is in the 2.3 release, though it looks like the
relevant pieces operate the same way in trunk.
I have the following add recurrence:
%r849 = select i1 %r848, i64 0, i64 %r847 ; <i64> [#uses=10]
%r862 = shl i64 %r849, 3 ; <i64> [#uses=3]
%"$SR_S112.0" = phi i64 [ %r862, %"file alignment.f, line 79, in
reduction
loop at depth 0, bb64" ], [ %r1874, %"file alignment.f, line 79, in
loop at
depth 1, bb77" ] ; <i64> [#uses=2]
%r1874 = add i64 %r851, %"$SR_S112.0" ; <i64> [#uses=1]
%r849 = select i1 %r848, i64 0, i64 %r847 ; <i64> [#uses=10]
%r851 = shl i64 %r849, 6 ; <i64> [#uses=16]
It's unimportant what the values from the two selects are -- we are deep
enough into ComputeMaskedBits at this point that the Depth threshold takes
hold and we return "unknown" for those instructions.
So the recurrence looks like:
SR_S112.0 = phi(r849 << 3, (r849 << 6) + SR_S112.0)
ComputeMaskedBits correctly identifies the recurrence and sets
L = %r851 = shl i64 %r849, 6 ; <i64> [#uses=16]
R = %r862 = shl i64 %r849, 3
ComputeMaskedBits on R returns KnownZero as 0x7, as expected.
The PHI-analysis code in ComputeMaskedBits sets Mask2 to 0x7. It then calls
itself recursively on L passing Mask2 as the mask.
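To make sure I'm reading that step right, here's a toy model of it with
uint64_t standing in for APInt (a sketch of my reading, not the actual
LLVM source; the names are mine):

  #include <cstdint>
  #include <cstdio>

  // Sketch of how the PHI code derives Mask2 in this example: analyze R
  // first, then build the mask for L from R's trailing known-zero bits.
  int main() {
    uint64_t KnownZeroR = 0x7;   // ComputeMaskedBits on R (%r849 << 3)
    unsigned Trailing = 0;       // KnownZero2.countTrailingOnes()
    for (uint64_t V = KnownZeroR; V & 1; V >>= 1)
      ++Trailing;
    uint64_t Mask2 = (1ULL << Trailing) - 1; // getLowBitsSet(64, 3) == 0x7
    printf("Mask2 = 0x%llx\n", (unsigned long long)Mask2);
    // ComputeMaskedBits is then called recursively on L with Mask2.
    return 0;
  }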
Here's where the problem creeps in. ComputeMaskedBits returns KnownZero as
0x3f for L. Shouldn't the passed Mask limit the result to 0x7, given that
the comment at the top of ComputeMaskedBits says:
"This code only analyzes bits in Mask, in order to short-circuit
processing."
The Shl code in ComputeMaskedBits shifts the incoming Mask to the right by
the (constant) left-shift amount. Presumably it does this because those bits
are known to be zero after the Shl, so it can analyze fewer operand bits.
Nothing is known about the bits of this Shl's operand (or the other one's).
So ComputeMaskedBits reports that the output has its lower six bits set to
zero, which is correct.
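For concreteness, here's the same kind of toy model for the Shl handling as
just described (again uint64_t in place of APInt; a sketch, not the actual
source):

  #include <cstdint>
  #include <cstdio>

  // Model of the Shl case: the incoming Mask is shifted right by the
  // constant shift amount for the recursive operand query, and the low
  // bits of the result are marked known-zero. Note that nothing limits
  // the final KnownZero to the original Mask.
  uint64_t knownZeroForShl(uint64_t Mask, unsigned ShiftAmt,
                           uint64_t OperandKnownZero) {
    uint64_t Mask2 = Mask >> ShiftAmt;   // analyze fewer operand bits
    (void)Mask2;                         // would go to the recursive call
    uint64_t KnownZero = OperandKnownZero << ShiftAmt;
    KnownZero |= (1ULL << ShiftAmt) - 1; // shl zero-fills the low bits
    return KnownZero;
  }

  int main() {
    // %r851 = shl i64 %r849, 6, nothing known about %r849, and the PHI
    // code's Mask2 = 0x7 passed in as the mask:
    uint64_t KZ = knownZeroForShl(0x7, 6, 0);
    printf("KnownZero = 0x%llx\n", (unsigned long long)KZ); // 0x3f
    return 0;
  }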
The PHI code for ComputeMaskedBits then does this:
  KnownZero = Mask &
    APInt::getLowBitsSet(BitWidth, KnownZero2.countTrailingOnes());
KnownZero2 is the output of the second call to ComputeMaskedBits (the Shl by
six in this case). Mask is the original mask coming into ComputeMaskedBits
with the PHI, which is all ones in this case.
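Plugging in the numbers from this example:

  KnownZero2 = 0x3f                        (from the shl by six)
  KnownZero2.countTrailingOnes() = 6
  APInt::getLowBitsSet(64, 6) = 0x3f
  KnownZero = ~0 & 0x3f = 0x3f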
So the PHI ends up producing 0x3f as KnownZero.
I think this is wrong. We've got a PHI where one operand is known to have
three low zero bits and the other is known to have six low zero bits. I
contend that after the PHI we only know three low zero bits for sure.
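A concrete instance bears this out. Take r849 = 1 (an arbitrary choice; the
analysis knows nothing about r849, so the result has to be safe for this
value too):

  #include <cstdint>
  #include <cstdio>

  // Run the recurrence with %r849 = 1. If KnownZero were really 0x3f,
  // every value of the PHI would have its low six bits clear; in fact
  // only the low three are.
  int main() {
    uint64_t R849 = 1;
    uint64_t R862 = R849 << 3;   // first PHI input: 8
    uint64_t R851 = R849 << 6;   // loop increment: 64
    uint64_t SR = R862;          // %"$SR_S112.0"
    for (int I = 0; I < 4; ++I) {
      printf("SR = %3llu  low six bits = 0x%02llx  low three bits = 0x%llx\n",
             (unsigned long long)SR,
             (unsigned long long)(SR & 0x3f),
             (unsigned long long)(SR & 0x7));
      SR += R851;                // %r1874 = add i64 %r851, %"$SR_S112.0"
    }
    // Prints SR = 8, 72, 136, 200: the low six bits are always 0x08,
    // contradicting KnownZero = 0x3f, while the low three bits are
    // always zero, matching min(3, 6) = 3.
    return 0;
  }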
Is my analysis correct? If so, is the PHI code the culprit (for not returning
the min of the two operands' known low zero bits), or is the Shl code the
culprit (for not paying attention to the Mask passed in; it only right-shifts
it)?
If my analysis is incorrect, can someone explain how we can know we have six
low zero bits coming out of this?
Thanks!
-Dave