thr3ads.net - llvm dev - [LLVMdev] Partial loop unrolling [Jul 2014]

If this information is useful, please help other people find it:
Share via:

Mahesha S

2014-Jul-15 11:34 UTC

[LLVMdev] Partial loop unrolling

Hi,

PS: It is a generic question related to partial loop unrolling, and nothing
specific to LLVM.

As far as partial loop unrolling is concerned, I could see following three
different possibilities. Assume that unroll factor is 3.

Original loop:
      for (i = 0; i < 10; i++)
      {
           do_foo(i);
      }

1. First possibility
      i = 0;
      do_foo(i++);
      do_foo(i++);
      do_foo(i++);
      for (; i < 10; i++)
      {
           do_foo(i);
      }

2. Second possibility
      for (i = 0; i < 7; i++)
      {
           do_foo(i);
      }
      do_foo(i++);
      do_foo(i++);
      do_foo(i++);

3. Third possibility
      for (i = 0; i < 10;)
      {
           do_foo(i++);
           do_foo(i++);
           do_foo(i++);
      }

My questions are:

a. In case, if we get any performance improvement due to partial loop
unrolling, are all the three possibilities give almost same performance
improvement?

b. If answer to question 'a' is 'no', then which one of these
three
possibilities is ideal for generic partial unrolling implementation? and in
general, which one of these is implemented in production compilers? and in
particular, which one of these is implemented in LLVM compiler?


-- 
mahesha
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140715/3e3819bd/attachment.html>

Cristianno Martins

2014-Jul-15 12:08 UTC

head link

[LLVMdev] Partial loop unrolling

Hello Mahesha,

Well, if I had to guess, I would say the third option is the best one. You
see, loop unrolling's main goal (as I see it) is not to generate a better
code, but to expose the potential of the code to other kinds of
optimizations within the compiler* -- increasing the size of the code in
the process. If you consider that, the first two options ended up
increasing the size of the program, but any new simplifications or
optimizations to the code will be applied to a part of it that runs only
once -- notice that the code inside the loop is, potentially, the one that
executes most times, but the only difference between the original code and
those ones is found outside the loop.

* There is, in fact, another interesting function for unrolling: if the
upper limit of the loop is known during compilation-time and it is a very
small value, it could be interesting to substitute the whole loop for all
the necessary calls to do_foo -- this way, you can remove all the
loop-related code from your program (conditions, variables, operations over
the induction variable, etc), and still have new opportunities to optimize
the code.

Hope I could help,

--
Cristianno Martins
PhD Student of Computer Science
University of Campinas
cmartins at ic.unicamp.br
 <cristiannomartins at hotmail.com>

On Tue, Jul 15, 2014 at 8:34 AM, Mahesha S <mahesha.llvm at gmail.com>
wrote:
> Hi,
>
> PS: It is a generic question related to partial loop unrolling, and
> nothing specific to LLVM.
>
> As far as partial loop unrolling is concerned, I could see following three
> different possibilities. Assume that unroll factor is 3.
>
> Original loop:
>       for (i = 0; i < 10; i++)
>       {
>            do_foo(i);
>       }
>
> 1. First possibility
>       i = 0;
>       do_foo(i++);
>       do_foo(i++);
>       do_foo(i++);
>       for (; i < 10; i++)
>       {
>            do_foo(i);
>       }
>
> 2. Second possibility
>       for (i = 0; i < 7; i++)
>       {
>            do_foo(i);
>       }
>       do_foo(i++);
>       do_foo(i++);
>       do_foo(i++);
>
> 3. Third possibility
>       for (i = 0; i < 10;)
>       {
>            do_foo(i++);
>            do_foo(i++);
>            do_foo(i++);
>       }
>
> My questions are:
>
> a. In case, if we get any performance improvement due to partial loop
> unrolling, are all the three possibilities give almost same performance
> improvement?
>
> b. If answer to question 'a' is 'no', then which one of
these three
> possibilities is ideal for generic partial unrolling implementation? and in
> general, which one of these is implemented in production compilers? and in
> particular, which one of these is implemented in LLVM compiler?
>
>
> --
> mahesha
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140715/870bf54b/attachment.html>

Renato Golin

2014-Jul-15 12:37 UTC

head link

[LLVMdev] Partial loop unrolling

On 15 July 2014 13:08, Cristianno Martins <cristiannomartins at gmail.com>
wrote:> * There is, in fact, another interesting function for unrolling: if the
> upper limit of the loop is known during compilation-time and it is a very
> small value, it could be interesting to substitute the whole loop for all
> the necessary calls to do_foo
This is also the case for re-rolling, where the loop is unrolled in
the "wrong way", and re-rolling, than unrolling will expose
paralellism.

but there are others:

3. join all loads and stores into a block and hide the delays in
between the loop cycles.

Reading a large block of data is almost as efficient as reading a
small one, so a totally rolled loop will have read-op-write while a
partially unrolled loop will have read-op-op-op-write, saving two
reads and two writes every three.

4. Align vectorized loops

Vectorization will often partially unroll the loop (like your 1 and 3
examples) to make the loop aligned to the memory constraints (ex, if
the pointer or if the loop count is unaligned, etc).

AFAIK, all vectorizing compilers, including LLVM, do all of them. It
depends more on what you want and how's your original loop than
generic goodness.

cheers,
--renato

Krzysztof Parzyszek

2014-Jul-16 17:30 UTC

head link

[LLVMdev] Partial loop unrolling

On 7/15/2014 6:34 AM, Mahesha S wrote:>
> As far as partial loop unrolling is concerned, I could see following
> three different possibilities. Assume that unroll factor is 3.
>
> Original loop:
>        for (i = 0; i < 10; i++)
>        {
>             do_foo(i);
>        }
>
> 1. First possibility
>        i = 0;
>        do_foo(i++);
>        do_foo(i++);
>        do_foo(i++);
>        for (; i < 10; i++)
>        {
>             do_foo(i);
>        }
This is really peeling, not unrolling.

> 2. Second possibility
>        for (i = 0; i < 7; i++)
>        {
>             do_foo(i);
>        }
>        do_foo(i++);
>        do_foo(i++);
>        do_foo(i++);
Same as above.

> 3. Third possibility
>        for (i = 0; i < 10;)
>        {
>             do_foo(i++);
>             do_foo(i++);
>             do_foo(i++);
>        }
This one is unrolled, but incorrectly.


-Krzysztof


-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
hosted by The Linux Foundation

Mahesha S

2014-Jul-17 09:25 UTC

head link

[LLVMdev] Partial loop unrolling

On Wed, Jul 16, 2014 at 11:00 PM, Krzysztof Parzyszek <
kparzysz at codeaurora.org> wrote:
> On 7/15/2014 6:34 AM, Mahesha S wrote:
>
>>
>> As far as partial loop unrolling is concerned, I could see following
>> three different possibilities. Assume that unroll factor is 3.
>>
>> Original loop:
>>        for (i = 0; i < 10; i++)
>>        {
>>             do_foo(i);
>>        }
>>
>> 1. First possibility
>>        i = 0;
>>        do_foo(i++);
>>        do_foo(i++);
>>        do_foo(i++);
>>        for (; i < 10; i++)
>>        {
>>             do_foo(i);
>>        }
>>
>
> This is really peeling, not unrolling.
>
>
>
>  2. Second possibility
>>        for (i = 0; i < 7; i++)
>>        {
>>             do_foo(i);
>>        }
>>        do_foo(i++);
>>        do_foo(i++);
>>        do_foo(i++);
>>
>
> Same as above.


I understand. I must agree that, I am newbee when it comes to compiler
optimization techniques and nomenclature
>
>
>
>  3. Third possibility
>>        for (i = 0; i < 10;)
>>        {
>>             do_foo(i++);
>>             do_foo(i++);
>>             do_foo(i++);
>>        }
>>
>
> This one is unrolled, but incorrectly.


I think I understood the mistake here. In fact, I noticed it just after
shooting my initial mail. One of the correct ways may be below one.


3. Third possibility
       for (i = 0; i < 10;)
       {
            do_foo(i++);
            do_foo(i++);
            do_foo(i++);
            do_foo(i++);
            do_foo(i++);
       }


--
mahesha


>
>
> -Krzysztof
>
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted
> by The Linux Foundation
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>


-- 
mahesha
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140717/47acdaca/attachment.html>

llvm dev - Jul 2014 - [LLVMdev] Partial loop unrolling

[LLVMdev] Partial loop unrolling

[LLVMdev] Partial loop unrolling

[LLVMdev] Partial loop unrolling

[LLVMdev] Partial loop unrolling

[LLVMdev] Partial loop unrolling