thr3ads.net - llvm dev - [llvm-dev] Loop Unrolling Fail in Simple Vectorized loop [Oct 2016]

If this information is useful, please help other people find it:
Share via:

Alexandre Isoard via llvm-dev

2016-Oct-13 16:30 UTC

[llvm-dev] Loop Unrolling Fail in Simple Vectorized loop

If count > MAX_UINT-4 your loop loops indefinitely with an increment of 4,
I think.

On Thu, Oct 13, 2016 at 4:42 PM, Charith Mendis via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> So, I tried unrolling the following simple loop.
>
> int unroll(unsigned * a, unsigned * b, unsigned *c, unsigned count){
>
>   for(unsigned i=0; i<count; i++){
>
>     a[i] = b[i]*c[i-1];
>
>   }
>
>   return 0;
>
> }
>
>
> Then, the unroller is able to unroll it by 2 even though it doesn't
know
> about the range of count. SCEV of backedge taken count is (-1 + %count)
>
>
> But, when I change the increment to 4, as in
>
>
> int unroll(unsigned * a, unsigned * b, unsigned *c, unsigned count){
>
>   for(unsigned i=0; i<count; i+=4){
>
>     a[i] = b[i]*c[i-1];
>
>   }
>
>   return 0;
>
> }
>
>
>  The unroller cannot compute the backedge taken count. Therefore, it seems
> like the problem is not with the range of "count", can't the
unroller
> compute it as (- 1 + %count / 4)?
>
> On Wed, Oct 12, 2016 at 11:28 PM, Charith Mendis <
> char.mendis1989 at gmail.com> wrote:
>
>> Thanks for the explanation. But I am a little confused with the
following
>> fact. Can't LLVM keep vectorizable_elements as a symbolic value and
convert
>> the loop to say;
>>
>> for(unsigned i = 0; i < vectorizable_elements  ; i += 2){
>>     //main loop
>> }
>>
>> for(unsigned i=0 ; i < vectorizable_elements % 2; i++){
>>    //fix up
>> }
>>
>> Why does it have to reason about the range of vectorizable_elements?
Even
>> if vectorizable_elements == SIZE_MAX the above decomposition would
work?
>>
>> On Wed, Oct 12, 2016 at 8:25 PM, Friedman, Eli <efriedma at
codeaurora.org>
>> wrote:
>>
>>> On 10/12/2016 4:35 PM, Charith Mendis via llvm-dev wrote:
>>>
>>> Hi all,
>>>
>>> Attached herewith is a simple vectorized function with loops
performing
>>> a simple shuffle.
>>>
>>> I want all loops (inner and outer) to be unrolled by 2 and as such
used
>>> -unroll-count=2
>>> The inner loops(with k as the induction variable and having
constant
>>> trip counts) unroll fully, but the outer loop with (j) fails to
unroll.
>>>
>>> The llvm code is also attached with inner loops fully unrolled.
>>>
>>> To inspect further, I added the following to the
PassManagerBuilder.cpp
>>> to run some canonicalization routines and redo unrolling again. I
have set
>>> partial unrolling on + have a huge threshold + allows expensive
loop trip
>>> counts. Still it didn't unroll by 2.
>>>
>>> MPM.add(createLoopUnrollPass());
>>>
>>> MPM.add(createCFGSimplificationPass());
>>>
>>> MPM.add(createLoopSimplifyPass());
>>>
>>> MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1));
>>>
>>> MPM.add(createLCSSAPass());
>>>
>>> MPM.add(createIndVarSimplifyPass());        // Canonicalize indvars
>>>
>>> MPM.add(createLoopUnrollPass());
>>>
>>>
>>> Digging deeper I found, that it fails in UnrollRuntimeLoopRemainder
>>> function, where it is unable to calculate the BackEdge taken
amount.
>>>
>>> Can anybody explain what is need to get the outer loop unrolled by
2? It
>>> would be a great help.
>>>
>>>
>>> Well, I can at least explain what is happening... runtime unrolling
>>> needs to be able to symbolically compute the trip count to avoid
inserting
>>> a branch after every iteration.  SCEV isn't able to prove that
your loop
>>> isn't an infinite loop (consider the case of
vectorizable_elements==SIZE_MAX),
>>> therefore it can't compute the trip count.  Therefore, we
don't unroll.
>>>
>>> There's a few different angles you could use to attack this:
you could
>>> teach the unroller to unroll loops with an uncomputable trip count,
or you
>>> can make the trip count of your loop computable somehow.  Changing
the
>>> unroller is probably straightforward (see the recently committed
r284044).
>>> Making the trip count computable is more complicated... it's
probably
>>> possible to teach SCEV to reason about the overflow in the pointer
>>> computation, or maybe you could version the loop.
>>>
>>> -Eli
>>>
>>> --
>>> Employee of Qualcomm Innovation Center, Inc.
>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
>>>
>>>
>>
>>
>> --
>> Kind regards,
>> Charith Mendis
>>
>> Graduate Student,
>> CSAIL,
>> Massachusetts Institute of Technology
>>
>
>
>
> --
> Kind regards,
> Charith Mendis
>
> Graduate Student,
> CSAIL,
> Massachusetts Institute of Technology
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>

-- 
*Alexandre Isoard*
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161013/355c93cc/attachment.html>

Charith Mendis via llvm-dev

2016-Oct-13 17:57 UTC

head link

[llvm-dev] Loop Unrolling Fail in Simple Vectorized loop

Oh I see, the original loop may end normally, but by unrolling it may
induce an infinite loop.

On Thursday, October 13, 2016, Alexandre Isoard <alexandre.isoard at
gmail.com>
wrote:
> If count > MAX_UINT-4 your loop loops indefinitely with an increment of
4,
> I think.
>
> On Thu, Oct 13, 2016 at 4:42 PM, Charith Mendis via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> So, I tried unrolling the following simple loop.
>>
>> int unroll(unsigned * a, unsigned * b, unsigned *c, unsigned count){
>>
>>   for(unsigned i=0; i<count; i++){
>>
>>     a[i] = b[i]*c[i-1];
>>
>>   }
>>
>>   return 0;
>>
>> }
>>
>>
>> Then, the unroller is able to unroll it by 2 even though it doesn't
know
>> about the range of count. SCEV of backedge taken count is (-1 + %count)
>>
>>
>> But, when I change the increment to 4, as in
>>
>>
>> int unroll(unsigned * a, unsigned * b, unsigned *c, unsigned count){
>>
>>   for(unsigned i=0; i<count; i+=4){
>>
>>     a[i] = b[i]*c[i-1];
>>
>>   }
>>
>>   return 0;
>>
>> }
>>
>>
>>  The unroller cannot compute the backedge taken count. Therefore, it
>> seems like the problem is not with the range of "count",
can't the unroller
>> compute it as (- 1 + %count / 4)?
>>
>> On Wed, Oct 12, 2016 at 11:28 PM, Charith Mendis <
>> char.mendis1989 at gmail.com> wrote:
>>
>>> Thanks for the explanation. But I am a little confused with the
>>> following fact. Can't LLVM keep vectorizable_elements as a
symbolic value
>>> and convert the loop to say;
>>>
>>> for(unsigned i = 0; i < vectorizable_elements  ; i += 2){
>>>     //main loop
>>> }
>>>
>>> for(unsigned i=0 ; i < vectorizable_elements % 2; i++){
>>>    //fix up
>>> }
>>>
>>> Why does it have to reason about the range of
vectorizable_elements?
>>> Even if vectorizable_elements == SIZE_MAX the above decomposition
would
>>> work?
>>>
>>> On Wed, Oct 12, 2016 at 8:25 PM, Friedman, Eli <efriedma at
codeaurora.org>
>>> wrote:
>>>
>>>> On 10/12/2016 4:35 PM, Charith Mendis via llvm-dev wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Attached herewith is a simple vectorized function with loops
performing
>>>> a simple shuffle.
>>>>
>>>> I want all loops (inner and outer) to be unrolled by 2 and as
such used
>>>> -unroll-count=2
>>>> The inner loops(with k as the induction variable and having
constant
>>>> trip counts) unroll fully, but the outer loop with (j) fails to
unroll.
>>>>
>>>> The llvm code is also attached with inner loops fully unrolled.
>>>>
>>>> To inspect further, I added the following to the
PassManagerBuilder.cpp
>>>> to run some canonicalization routines and redo unrolling again.
I have set
>>>> partial unrolling on + have a huge threshold + allows expensive
loop trip
>>>> counts. Still it didn't unroll by 2.
>>>>
>>>> MPM.add(createLoopUnrollPass());
>>>>
>>>> MPM.add(createCFGSimplificationPass());
>>>>
>>>> MPM.add(createLoopSimplifyPass());
>>>>
>>>> MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1));
>>>>
>>>> MPM.add(createLCSSAPass());
>>>>
>>>> MPM.add(createIndVarSimplifyPass());        // Canonicalize
indvars
>>>>
>>>> MPM.add(createLoopUnrollPass());
>>>>
>>>>
>>>> Digging deeper I found, that it fails in
UnrollRuntimeLoopRemainder
>>>> function, where it is unable to calculate the BackEdge taken
amount.
>>>>
>>>> Can anybody explain what is need to get the outer loop unrolled
by 2?
>>>> It would be a great help.
>>>>
>>>>
>>>> Well, I can at least explain what is happening... runtime
unrolling
>>>> needs to be able to symbolically compute the trip count to
avoid inserting
>>>> a branch after every iteration.  SCEV isn't able to prove
that your loop
>>>> isn't an infinite loop (consider the case of
vectorizable_elements==SIZE_MAX),
>>>> therefore it can't compute the trip count.  Therefore, we
don't unroll.
>>>>
>>>> There's a few different angles you could use to attack
this: you could
>>>> teach the unroller to unroll loops with an uncomputable trip
count, or you
>>>> can make the trip count of your loop computable somehow. 
Changing the
>>>> unroller is probably straightforward (see the recently
committed r284044).
>>>> Making the trip count computable is more complicated...
it's probably
>>>> possible to teach SCEV to reason about the overflow in the
pointer
>>>> computation, or maybe you could version the loop.
>>>>
>>>> -Eli
>>>>
>>>> --
>>>> Employee of Qualcomm Innovation Center, Inc.
>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora
Forum, a Linux Foundation Collaborative Project
>>>>
>>>>
>>>
>>>
>>> --
>>> Kind regards,
>>> Charith Mendis
>>>
>>> Graduate Student,
>>> CSAIL,
>>> Massachusetts Institute of Technology
>>>
>>
>>
>>
>> --
>> Kind regards,
>> Charith Mendis
>>
>> Graduate Student,
>> CSAIL,
>> Massachusetts Institute of Technology
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>>
>
>
> --
> *Alexandre Isoard*
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161013/532e1991/attachment.html>

Ehsan Amiri via llvm-dev

2016-Oct-13 21:50 UTC

head link

[llvm-dev] Loop Unrolling Fail in Simple Vectorized loop

On Thu, Oct 13, 2016 at 1:57 PM, Charith Mendis via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Oh I see, the original loop may end normally, but by unrolling it may
> induce an infinite loop.
>
>No. The problem is that step of the original loop is not 1.

  for(unsigned i=0; i<count; i+=4){
    a[i] = b[i]*c[i-1];
  }

Let's assume unsigned is a 32 bit integer. Then maximum unsigned number
will be 2^32 - 1. Let count = 2^32 - 1. When the loop iterates at some
point  we will have i = 2^32 - 4. What is the value of i in the next
iteration? 2^32 cannot be represented in a 32 bit integer, so what happens
is that you will wrap around and in the next iteration you will have i = 0.
So your original loop may be infinte. You can confirm this by compiling and
running the following program

#include <climits>
#include <iostream>

using namespace std;

int main() {

  for (int i = 0; i < UINT_MAX; i += 4) {
    if (i == 0)
      cout << "i == 0!" << endl;
  }

  return 0;
}








>
> On Thursday, October 13, 2016, Alexandre Isoard <
> alexandre.isoard at gmail.com> wrote:
>
>> If count > MAX_UINT-4 your loop loops indefinitely with an increment
of
>> 4, I think.
>>
>> On Thu, Oct 13, 2016 at 4:42 PM, Charith Mendis via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> So, I tried unrolling the following simple loop.
>>>
>>> int unroll(unsigned * a, unsigned * b, unsigned *c, unsigned
count){
>>>
>>>   for(unsigned i=0; i<count; i++){
>>>
>>>     a[i] = b[i]*c[i-1];
>>>
>>>   }
>>>
>>>   return 0;
>>>
>>> }
>>>
>>>
>>> Then, the unroller is able to unroll it by 2 even though it
doesn't know
>>> about the range of count. SCEV of backedge taken count is (-1 +
%count)
>>>
>>>
>>> But, when I change the increment to 4, as in
>>>
>>>
>>> int unroll(unsigned * a, unsigned * b, unsigned *c, unsigned
count){
>>>
>>>   for(unsigned i=0; i<count; i+=4){
>>>
>>>     a[i] = b[i]*c[i-1];
>>>
>>>   }
>>>
>>>   return 0;
>>>
>>> }
>>>
>>>
>>>  The unroller cannot compute the backedge taken count. Therefore,
it
>>> seems like the problem is not with the range of "count",
can't the unroller
>>> compute it as (- 1 + %count / 4)?
>>>
>>> On Wed, Oct 12, 2016 at 11:28 PM, Charith Mendis <
>>> char.mendis1989 at gmail.com> wrote:
>>>
>>>> Thanks for the explanation. But I am a little confused with the
>>>> following fact. Can't LLVM keep vectorizable_elements as a
symbolic value
>>>> and convert the loop to say;
>>>>
>>>> for(unsigned i = 0; i < vectorizable_elements  ; i += 2){
>>>>     //main loop
>>>> }
>>>>
>>>> for(unsigned i=0 ; i < vectorizable_elements % 2; i++){
>>>>    //fix up
>>>> }
>>>>
>>>> Why does it have to reason about the range of
vectorizable_elements?
>>>> Even if vectorizable_elements == SIZE_MAX the above
decomposition would
>>>> work?
>>>>
>>>> On Wed, Oct 12, 2016 at 8:25 PM, Friedman, Eli <efriedma at
codeaurora.org
>>>> > wrote:
>>>>
>>>>> On 10/12/2016 4:35 PM, Charith Mendis via llvm-dev wrote:
>>>>>
>>>>> Hi all,
>>>>>
>>>>> Attached herewith is a simple vectorized function with
loops
>>>>> performing a simple shuffle.
>>>>>
>>>>> I want all loops (inner and outer) to be unrolled by 2 and
as such
>>>>> used -unroll-count=2
>>>>> The inner loops(with k as the induction variable and having
constant
>>>>> trip counts) unroll fully, but the outer loop with (j)
fails to unroll.
>>>>>
>>>>> The llvm code is also attached with inner loops fully
unrolled.
>>>>>
>>>>> To inspect further, I added the following to the
>>>>> PassManagerBuilder.cpp to run some canonicalization
routines and redo
>>>>> unrolling again. I have set partial unrolling on + have a
huge threshold +
>>>>> allows expensive loop trip counts. Still it didn't
unroll by 2.
>>>>>
>>>>> MPM.add(createLoopUnrollPass());
>>>>>
>>>>> MPM.add(createCFGSimplificationPass());
>>>>>
>>>>> MPM.add(createLoopSimplifyPass());
>>>>>
>>>>> MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1));
>>>>>
>>>>> MPM.add(createLCSSAPass());
>>>>>
>>>>> MPM.add(createIndVarSimplifyPass());        // Canonicalize
indvars
>>>>>
>>>>> MPM.add(createLoopUnrollPass());
>>>>>
>>>>>
>>>>> Digging deeper I found, that it fails in
UnrollRuntimeLoopRemainder
>>>>> function, where it is unable to calculate the BackEdge
taken amount.
>>>>>
>>>>> Can anybody explain what is need to get the outer loop
unrolled by 2?
>>>>> It would be a great help.
>>>>>
>>>>>
>>>>> Well, I can at least explain what is happening... runtime
unrolling
>>>>> needs to be able to symbolically compute the trip count to
avoid inserting
>>>>> a branch after every iteration.  SCEV isn't able to
prove that your loop
>>>>> isn't an infinite loop (consider the case of
vectorizable_elements==SIZE_MAX),
>>>>> therefore it can't compute the trip count.  Therefore,
we don't unroll.
>>>>>
>>>>> There's a few different angles you could use to attack
this: you could
>>>>> teach the unroller to unroll loops with an uncomputable
trip count, or you
>>>>> can make the trip count of your loop computable somehow. 
Changing the
>>>>> unroller is probably straightforward (see the recently
committed r284044).
>>>>> Making the trip count computable is more complicated...
it's probably
>>>>> possible to teach SCEV to reason about the overflow in the
pointer
>>>>> computation, or maybe you could version the loop.
>>>>>
>>>>> -Eli
>>>>>
>>>>> --
>>>>> Employee of Qualcomm Innovation Center, Inc.
>>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora
Forum, a Linux Foundation Collaborative Project
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Kind regards,
>>>> Charith Mendis
>>>>
>>>> Graduate Student,
>>>> CSAIL,
>>>> Massachusetts Institute of Technology
>>>>
>>>
>>>
>>>
>>> --
>>> Kind regards,
>>> Charith Mendis
>>>
>>> Graduate Student,
>>> CSAIL,
>>> Massachusetts Institute of Technology
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>>
>>
>>
>> --
>> *Alexandre Isoard*
>>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161013/a8e3f5c5/attachment.html>

llvm dev - Oct 2016 - Loop Unrolling Fail in Simple Vectorized loop

[llvm-dev] Loop Unrolling Fail in Simple Vectorized loop

[llvm-dev] Loop Unrolling Fail in Simple Vectorized loop

[llvm-dev] Loop Unrolling Fail in Simple Vectorized loop