thr3ads.net - llvm dev - [llvm-dev] "trunc"s generated by LSR cause problem for SCEV [Jul 2016]


Ehsan Amiri via llvm-dev

2016-Jul-22 15:57 UTC

[llvm-dev] "trunc"s generated by LSR cause problem for SCEV

Hi

I am working on a bug that is caused by Scalar Evolution not being able to
compute the iteration count of an unrolled loop (PR 28363). While I believe
there is enough information for SCEV to do its job, I think the code that
is generated by earlier transformations can be simpler. There is one bug in
IndVarSimplify, for which Sanjoy Das suggested a fix. With that fix applied,
the problem goes away if I disable loop strength reduction (LSR). Below I
have copied the code before and after loop strength reduction.

For this code pattern, it is possible to prove that truncs generated by LSR
can be avoided (see bottom of the email). Andy Trick says that LSR
generally thinks that trunc is free, but there might be ways to work around
it or improve LSR target hooks.

1- Does anyone have any suggestions on how to fix this in LSR?
2- Is there any reason that we should not fix LSR, and instead focus on
Scalar Evolution so it can handle more complicated code patterns properly?


*Before LSR:*

*for.body.preheader*:
%xtraiter = and i32 %m, 7

*for.body.preheader.new:*
%unroll_iter = sub i32 %m, %xtraiter

*for.body:*
%niter = phi i32 [ %unroll_iter, %for.body.preheader.new ], [ %niter.nsub.7, %for.body ]
%indvars.iv = phi i64 [ 0, %for.body.preheader.new ], [ %indvars.iv.next.7, %for.body ]
%indvars.iv.next.7 = add nsw i64 %indvars.iv, 8
%niter.nsub.7 = add nsw i32 %niter, -8
%niter.ncmp.7 = icmp eq i32 %niter.nsub.7, 0


*After LSR:*

*for.body.preheader:*
%xtraiter = and i32 %m, 7

*for.body.preheader.new:*
%unroll_iter = sub i32 %m, %xtraiter
%2 = zext i32 %unroll_iter to i64

*for.body:*
%indvars.iv = phi i64 [ 0, %for.body.preheader.new ], [ %indvars.iv.next.7, %for.body ]
%indvars.iv.next.7 = add nsw i64 %indvars.iv, 8
%tmp = trunc i64 %indvars.iv.next.7 to i32
%tmp80 = trunc i64 %2 to i32
%niter.ncmp.7 = icmp eq i32 %tmp80, %tmp
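
To make the two listings easier to compare, here is a minimal Python sketch that models each loop's exit test and counts iterations. The function names are hypothetical, the models cover only the instructions shown above, and they assume %m >= 8 so that %unroll_iter is non-zero when the unrolled loop is entered (the guard is not shown in the listings).

```python
MASK32 = (1 << 32) - 1  # models i32 wraparound and trunc i64 -> i32


def unroll_iter_of(m):
    # %xtraiter = and i32 %m, 7 ; %unroll_iter = sub i32 %m, %xtraiter
    return (m - (m & 7)) & MASK32


def trip_count_before_lsr(m):
    """Pre-LSR loop: an i32 counter counts down from %unroll_iter by 8."""
    niter = unroll_iter_of(m)
    trips = 0
    while True:
        trips += 1
        niter = (niter - 8) & MASK32   # %niter.nsub.7 = add nsw i32 %niter, -8
        if niter == 0:                 # icmp eq i32 %niter.nsub.7, 0
            return trips


def trip_count_after_lsr(m):
    """Post-LSR loop: the i64 IV is truncated and compared with the bound."""
    wide_bound = unroll_iter_of(m)     # %2 = zext i32 %unroll_iter to i64
    iv = 0
    trips = 0
    while True:
        trips += 1
        iv += 8                        # %indvars.iv.next.7 = add nsw i64 %indvars.iv, 8
        tmp = iv & MASK32              # %tmp = trunc i64 %indvars.iv.next.7 to i32
        tmp80 = wide_bound & MASK32    # %tmp80 = trunc i64 %2 to i32
        if tmp80 == tmp:               # icmp eq i32 %tmp80, %tmp
            return trips


if __name__ == "__main__":
    for m in (8, 15, 16, 100, 1000):
        print(m, trip_count_before_lsr(m), trip_count_after_lsr(m))
```

Under these assumptions both models exit after %unroll_iter / 8 iterations, which is the count one would hope SCEV can recover from either form.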

*Why trunc is not needed:* %indvars.iv starts from 0 and increments by 8.
%2 is divisible by 8. If %indvars.iv.next.7 ever reaches a value that has
a non-zero bit in its upper 32 bits, it will repeat that pattern until it
overflows. But the definition of %indvars.iv.next.7 is marked nsw, so it
must not overflow.

Ehsan Amiri via llvm-dev

2016-Jul-22 20:35 UTC

[llvm-dev] "trunc"s generated by LSR cause problem for SCEV

Adding a couple of points just to make sure I have been clear:

1- Without any trunc the code after LSR will directly compare %2 and
%indvars.iv.next.7 in the loop control logic.
2- The argument for why trunc is not needed basically says that if we
compare %2 and %indvars.iv.next.7, the loop will finish while the upper 32
bits of %indvars.iv.next.7 are still all zero. So the behavior remains the
same as the current behavior.
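
The two points above can be checked by brute force on a scaled-down model. The sketch below is only an illustration under stated assumptions: it uses 8-bit and 16-bit widths in place of i32 and i64 so that every bound can be enumerated, it assumes the bound is a non-zero multiple of 8 (i.e. the unrolled loop is only entered when %m >= 8), and all names are hypothetical.

```python
NARROW_BITS = 8    # stands in for i32
WIDE_BITS = 16     # stands in for i64
NARROW_MASK = (1 << NARROW_BITS) - 1
SIGNED_WIDE_MAX = (1 << (WIDE_BITS - 1)) - 1


def exit_with_trunc(bound):
    """Exit test as generated by LSR: compare the truncated IV with the bound."""
    iv = 0
    trips = 0
    while True:
        trips += 1
        iv += 8
        if iv > SIGNED_WIDE_MAX:       # would violate nsw on the wide add
            raise OverflowError("wide IV overflowed")
        if (iv & NARROW_MASK) == bound:
            return trips, iv


def exit_without_trunc(bound):
    """Hypothetical trunc-free exit test: compare the wide IV with zext(bound)."""
    iv = 0
    trips = 0
    while True:
        trips += 1
        iv += 8
        if iv > SIGNED_WIDE_MAX:       # would violate nsw on the wide add
            raise OverflowError("wide IV overflowed")
        if iv == bound:
            return trips, iv


def check_all_bounds():
    for m in range(8, 1 << NARROW_BITS):
        bound = m - (m & 7)            # non-zero multiple of 8
        with_trunc = exit_with_trunc(bound)
        without_trunc = exit_without_trunc(bound)
        # Same trip count and same IV value at exit.
        assert with_trunc == without_trunc
        # The upper bits of the wide IV are still zero when the loop exits.
        assert with_trunc[1] >> NARROW_BITS == 0
    return True


if __name__ == "__main__":
    print(check_all_bounds())
```

A bound of zero is deliberately excluded: in this model the truncated compare would first match only after the wide IV has grown past the narrow range, so the equivalence relies on the loop being guarded.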

I am going to look into LSR a little bit to see if I can teach it not to
generate those truncs. If those truncs are needed for some reason, please
let me know.


