thr3ads.net - llvm dev - [llvm-dev] Loop Unrolling Fail in Simple Vectorized loop [Oct 2016]

Charith Mendis via llvm-dev

2016-Oct-12 23:35 UTC

[llvm-dev] Loop Unrolling Fail in Simple Vectorized loop

Hi all,

Attached is a simple vectorized function whose loops perform a simple
shuffle.

I want all loops (inner and outer) to be unrolled by 2, so I used
-unroll-count=2. The inner loops (with k as the induction variable and
constant trip counts) unroll fully, but the outer loop (with j as the
induction variable) fails to unroll.

The LLVM IR is also attached, with the inner loops fully unrolled.

To investigate further, I added the following to PassManagerBuilder.cpp to
run some canonicalization passes and then redo unrolling. I have partial
unrolling turned on, a huge threshold set, and expensive loop trip counts
allowed. It still didn't unroll by 2.

MPM.add(createLoopUnrollPass());
MPM.add(createCFGSimplificationPass());
MPM.add(createLoopSimplifyPass());
MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1));
MPM.add(createLCSSAPass());
MPM.add(createIndVarSimplifyPass());        // Canonicalize indvars
MPM.add(createLoopUnrollPass());


Digging deeper, I found that it fails in the UnrollRuntimeLoopRemainder
function, where it is unable to compute the backedge-taken count.

Can anybody explain what is needed to get the outer loop unrolled by 2? It
would be a great help.

Thanks.
-- 
Kind regards,
Charith Mendis

Graduate Student,
CSAIL,
Massachusetts Institute of Technology
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test_unroll_sse2.c
Type: text/x-csrc
Size: 1431 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161012/1efd6500/attachment.c>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: out.ll
Type: application/octet-stream
Size: 6090 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20161012/1efd6500/attachment.obj>

Friedman, Eli via llvm-dev

2016-Oct-13 00:25 UTC

[llvm-dev] Loop Unrolling Fail in Simple Vectorized loop

On 10/12/2016 4:35 PM, Charith Mendis via llvm-dev wrote:
> Hi all,
>
> Can anybody explain what is needed to get the outer loop unrolled by 2? 
> It would be a great help.
Well, I can at least explain what is happening... runtime unrolling 
needs to be able to symbolically compute the trip count to avoid 
inserting a branch after every iteration.  SCEV isn't able to prove that 
your loop isn't an infinite loop (consider the case of 
vectorizable_elements==SIZE_MAX), therefore it can't compute the trip 
count.  Therefore, we don't unroll.
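
To illustrate the SIZE_MAX case, here is a minimal sketch that uses an 8-bit counter so the wraparound is cheap to observe. The function name loop_terminates, the step of 2, and the iteration cap are assumptions made for illustration; this is not the attached loop, only a loop of the same general shape (an unsigned counter advanced by a stride greater than 1 and compared with `<`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the outer loop: an 8-bit unsigned counter j
 * advances by `step` while j < n. Returns true if the loop exits within
 * `cap` iterations, false if it is still running after that many. */
bool loop_terminates(uint8_t n, uint8_t step, unsigned cap) {
    assert(step != 0);
    unsigned iters = 0;
    for (uint8_t j = 0; j < n; j = (uint8_t)(j + step)) {
        if (++iters > cap) {
            return false; /* counter wrapped past n and started over */
        }
    }
    return true;
}
```

With n == 254 and a step of 2 the counter lands exactly on 254 and the loop exits. With n == 255 (the 8-bit analogue of SIZE_MAX) the counter goes from 254 to 0 without ever being >= n, so the loop never exits. An analysis that has to be correct for every possible n cannot give a finite trip count for this shape unless it can rule out the wrapping case.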

There are a few different angles you could use to attack this: you could 
teach the unroller to unroll loops with an uncomputable trip count, or 
you could make the trip count of your loop computable somehow.  Changing 
the unroller is probably straightforward (see the recently committed 
r284044).  Making the trip count computable is more complicated: it's 
probably possible to teach SCEV to reason about the overflow in the 
pointer computation, or maybe you could version the loop.
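
One source-level way to make the trip count computable might look like the sketch below. It is an assumption-laden illustration, not a rewrite of the attached test_unroll_sse2.c: the function shuffle_pairs, the STEP constant, and the pair-swapping body are all hypothetical. The idea is to compute the number of iterations with a division before the loop and then count with a unit stride, so the counter itself cannot wrap.

```c
#include <assert.h>
#include <stddef.h>

enum { STEP = 2 }; /* hypothetical: elements handled per outer iteration */

/* Swap adjacent pairs of src into dst (a stand-in for a simple shuffle).
 * Elements after the last full pair are left untouched in dst. */
void shuffle_pairs(int *dst, const int *src, size_t n) {
    assert(dst != NULL && src != NULL);
    /* n / STEP cannot overflow, so the trip count is known up front. */
    size_t trips = n / STEP;
    for (size_t t = 0; t < trips; ++t) {
        size_t j = t * STEP; /* j + 1 <= n - 1, so no wraparound here */
        dst[j] = src[j + 1];
        dst[j + 1] = src[j];
    }
}
```

Whether the unroller then handles the attached loop has not been checked here; the sketch only shows a loop shape in which the iteration count is a plain loop-invariant value.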

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project


Charith Mendis via llvm-dev

2016-Oct-13 03:28 UTC

[llvm-dev] Loop Unrolling Fail in Simple Vectorized loop

Thanks for the explanation, but I am still a little confused. Can't LLVM
keep vectorizable_elements as a symbolic value and convert the loop to,
say:

for (unsigned i = 0; i < vectorizable_elements; i += 2) {
    // main loop
}

for (unsigned i = 0; i < vectorizable_elements % 2; i++) {
    // fix up
}

Why does it have to reason about the range of vectorizable_elements?
Wouldn't the above decomposition work even if
vectorizable_elements == SIZE_MAX?
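
For reference, the decomposition asked about above can be sketched as follows, assuming a unit-stride loop whose iteration count n is available before the loop starts. The function run_unrolled_by_2 and its counter are hypothetical; the body is reduced to a counter increment so the sketch can check that every original iteration runs exactly once.

```c
#include <assert.h>
#include <stddef.h>

/* Runs a hypothetical n-iteration, unit-stride loop unrolled by 2 with a
 * remainder loop, and returns how many times the body executed. */
size_t run_unrolled_by_2(size_t n) {
    size_t body_runs = 0;
    size_t main_trips = n / 2; /* both values are derived from the */
    size_t remainder = n % 2;  /* trip count n before the loop runs */

    for (size_t t = 0; t < main_trips; ++t) {
        body_runs++; /* body for original iteration 2*t */
        body_runs++; /* body for original iteration 2*t + 1 */
    }
    for (size_t r = 0; r < remainder; ++r) {
        body_runs++; /* fix-up iteration */
    }
    assert(body_runs == n);
    return body_runs;
}
```

Note that both main_trips and remainder are computed from the trip count. In this sketch the trip count is simply n because the stride is 1; for a loop whose counter advances by a larger stride and may wrap, the trip count is the quantity Eli's reply says cannot be computed.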


-- 
Kind regards,
Charith Mendis

Graduate Student,
CSAIL,
Massachusetts Institute of Technology
