Hi,
An update on my experiment with the first loop:
For the first loop, if I change the pragma to "#pragma clang loop
vectorize_width(4) interleave_count(2)", and force the legality check
in isStridedPtr() to succeed, the loop gets vectorized and runs faster too.
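
For anyone who wants to try the pragma, here is a minimal sketch of how it
attaches to a loop. The loop body (a stride-2 sum over a short array) and the
name sum_even_elements are assumptions for illustration only; the loop from
the original report is not reproduced in this message. The pragma is a
request to the vectorizer: the loop must still pass the legality analysis.

```c
/* Hypothetical strided loop; not the loop from the original report. */
int sum_even_elements(const short *a, int n)
{
    int sum = 0;
    /* Ask clang for a vector width of 4 and an interleave count of 2.
       Other compilers may ignore this pragma (possibly with a warning). */
#pragma clang loop vectorize_width(4) interleave_count(2)
    for (int j = 0; j < n; j++)
        sum += a[2 * j];   /* stride-2 access: reads a[0], a[2], a[4], ... */
    return sum;
}
```

The caller has to supply at least 2*n - 1 elements when n > 0, since the
last element read is a[2*(n-1)].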
So, in summary, the issue with vectorizing the first loop seems to be (1) a
legality check that is too strict, because it does not understand that the
index cannot really overflow, and (2) a cost computation that says it's not
profitable to vectorize the loop.
Thanks,
- Vaivaswatha
On Thursday, 23 April 2015 11:05 AM, Vaivaswatha N <vaivaswatha at
yahoo.co.in> wrote:
Thank you Sanjoy for the explanation. Is it worth filing a bug over this at
this point?
Hi James,
> Your first example is similar to the strided loops that Hao is
> working on vectorizing with his indexed load intrinsics.
I'm curious. For the example I mentioned, the legality check fails because
the corresponding SCEV doesn't have nsw set and hence isStridedPtr() returns
false. In reality the induction variable has a statically known bound and it
cannot overflow, so it is really legal to vectorize the loop. Did you face
this problem (and solve it)?
Thanks everyone for your response and clarification.
- Vaivaswatha
_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
On Thursday, 23 April 2015 12:22 AM, Sanjoy Das <sanjoy at
playingwithpointers.com> wrote:
> I expect SCEV treats them differently because of MAX_INT handling.
> Look at the definedness of both if n == MAX_INT. The first has
> undefined behavior, the second does not.
> If you change the second into the first, you introduce undefined behavior.
> (or maybe it's implementation defined, but whatever)
To elaborate a little further on this:
In the first loop, you can never enter the loop with "j == INT_SMAX"
since INT_SMAX will never be < anything. This means j + 1 cannot
overflow. In the second loop you /can/ enter the loop with
"j == INT_SMAX" if "n == INT_SMAX", so j + 1 can potentially overflow.
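
The argument above can be sketched in C. The two loop shapes used here
("j < n" for the first loop and "j <= n" for the second) are an assumption
inferred from the explanation, since the loops themselves are not quoted in
this message; the helper names are hypothetical, and C's INT_MAX plays the
role of INT_SMAX.

```c
#include <limits.h>
#include <stdbool.h>

/* Largest j that can reach the body of: for (j = 0; j < n; j++), n > 0 */
int max_j_with_lt(int n) { return n - 1; }

/* Largest j that can reach the body of: for (j = 0; j <= n; j++), n >= 0 */
int max_j_with_le(int n) { return n; }

/* j + 1 overflows a signed int exactly when j == INT_MAX. */
bool increment_overflows(int j) { return j == INT_MAX; }
```

With n == INT_MAX, increment_overflows(max_j_with_lt(n)) is false while
increment_overflows(max_j_with_le(n)) is true, which is the difference
between the two loops.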
Ideally SCEV should be able to infer the nsw'ness of the additions
from the nsw bits in the source IR; but that's more complex than it
sounds since SCEV does not have a notion of control flow within the
loop and it hashes SCEVs by the operands and not by the nsw/nuw bits.
Crude example:
define void @x(i32 %a, i32 %b, i1 %c) {
entry:
  %m = add i32 %a, %b
  br i1 %c, label %do, label %dont
do:
  %m1 = add nsw i32 %a, %b
  br label %dont
dont:
  ret void
}
Both %m and %m1 get mapped to the *same* SCEV, and you cannot mark
that SCEV as nsw even though %m1 is nsw.
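
A toy model of that uniquing behaviour, written in C. This is not LLVM's
implementation: the struct, the fixed-size table, and the function
get_add_expr are all hypothetical, and operands are reduced to integer ids.
It only illustrates why a flag stored on a node that is keyed by operands
alone ends up shared by every value that maps to it.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy "add" expression; operands are just integer ids here. */
struct add_expr {
    int lhs, rhs;
    bool nsw;      /* one flag shared by every IR value mapped to this node */
};

#define MAX_EXPRS 16
static struct add_expr table[MAX_EXPRS];
static size_t table_len = 0;

/* Return the unique node for (lhs, rhs). The lookup key deliberately
   ignores nsw, mirroring the operand-only hashing described above. */
struct add_expr *get_add_expr(int lhs, int rhs)
{
    for (size_t i = 0; i < table_len; i++)
        if (table[i].lhs == lhs && table[i].rhs == rhs)
            return &table[i];
    if (table_len == MAX_EXPRS)
        return NULL;                 /* toy table is full */
    table[table_len] = (struct add_expr){ lhs, rhs, false };
    return &table[table_len++];
}
```

Two calls to get_add_expr with the same operands return the same pointer, so
setting nsw through one of them would also assert it for the other, which is
only safe if the flag holds for every user of the node.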
-- Sanjoy
>
>
> This is the:
> if (!getUnsignedRange(RHS).getUnsignedMax().isMaxValue()) {
>
> check in that function simplify.
>
> But you should file a bug anyway.