Hi, PS: It is a generic question related to partial loop unrolling, and nothing specific to LLVM. As far as partial loop unrolling is concerned, I could see following three different possibilities. Assume that unroll factor is 3. Original loop: for (i = 0; i < 10; i++) { do_foo(i); } 1. First possibility i = 0; do_foo(i++); do_foo(i++); do_foo(i++); for (; i < 10; i++) { do_foo(i); } 2. Second possibility for (i = 0; i < 7; i++) { do_foo(i); } do_foo(i++); do_foo(i++); do_foo(i++); 3. Third possibility for (i = 0; i < 10;) { do_foo(i++); do_foo(i++); do_foo(i++); } My questions are: a. In case, if we get any performance improvement due to partial loop unrolling, are all the three possibilities give almost same performance improvement? b. If answer to question 'a' is 'no', then which one of these three possibilities is ideal for generic partial unrolling implementation? and in general, which one of these is implemented in production compilers? and in particular, which one of these is implemented in LLVM compiler? -- mahesha -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140715/3e3819bd/attachment.html>
Hello Mahesha, Well, if I had to guess, I would say the third option is the best one. You see, loop unrolling's main goal (as I see it) is not to generate a better code, but to expose the potential of the code to other kinds of optimizations within the compiler* -- increasing the size of the code in the process. If you consider that, the first two options ended up increasing the size of the program, but any new simplifications or optimizations to the code will be applied to a part of it that runs only once -- notice that the code inside the loop is, potentially, the one that executes most times, but the only difference between the original code and those ones is found outside the loop. * There is, in fact, another interesting function for unrolling: if the upper limit of the loop is known during compilation-time and it is a very small value, it could be interesting to substitute the whole loop for all the necessary calls to do_foo -- this way, you can remove all the loop-related code from your program (conditions, variables, operations over the induction variable, etc), and still have new opportunities to optimize the code. Hope I could help, -- Cristianno Martins PhD Student of Computer Science University of Campinas cmartins at ic.unicamp.br <cristiannomartins at hotmail.com> On Tue, Jul 15, 2014 at 8:34 AM, Mahesha S <mahesha.llvm at gmail.com> wrote:> Hi, > > PS: It is a generic question related to partial loop unrolling, and > nothing specific to LLVM. > > As far as partial loop unrolling is concerned, I could see following three > different possibilities. Assume that unroll factor is 3. > > Original loop: > for (i = 0; i < 10; i++) > { > do_foo(i); > } > > 1. First possibility > i = 0; > do_foo(i++); > do_foo(i++); > do_foo(i++); > for (; i < 10; i++) > { > do_foo(i); > } > > 2. Second possibility > for (i = 0; i < 7; i++) > { > do_foo(i); > } > do_foo(i++); > do_foo(i++); > do_foo(i++); > > 3. Third possibility > for (i = 0; i < 10;) > { > do_foo(i++); > do_foo(i++); > do_foo(i++); > } > > My questions are: > > a. In case, if we get any performance improvement due to partial loop > unrolling, are all the three possibilities give almost same performance > improvement? > > b. If answer to question 'a' is 'no', then which one of these three > possibilities is ideal for generic partial unrolling implementation? and in > general, which one of these is implemented in production compilers? and in > particular, which one of these is implemented in LLVM compiler? > > > -- > mahesha > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140715/870bf54b/attachment.html>
On 15 July 2014 13:08, Cristianno Martins <cristiannomartins at gmail.com> wrote:> * There is, in fact, another interesting function for unrolling: if the > upper limit of the loop is known during compilation-time and it is a very > small value, it could be interesting to substitute the whole loop for all > the necessary calls to do_fooThis is also the case for re-rolling, where the loop is unrolled in the "wrong way", and re-rolling, than unrolling will expose paralellism. but there are others: 3. join all loads and stores into a block and hide the delays in between the loop cycles. Reading a large block of data is almost as efficient as reading a small one, so a totally rolled loop will have read-op-write while a partially unrolled loop will have read-op-op-op-write, saving two reads and two writes every three. 4. Align vectorized loops Vectorization will often partially unroll the loop (like your 1 and 3 examples) to make the loop aligned to the memory constraints (ex, if the pointer or if the loop count is unaligned, etc). AFAIK, all vectorizing compilers, including LLVM, do all of them. It depends more on what you want and how's your original loop than generic goodness. cheers, --renato
On 7/15/2014 6:34 AM, Mahesha S wrote:> > As far as partial loop unrolling is concerned, I could see following > three different possibilities. Assume that unroll factor is 3. > > Original loop: > for (i = 0; i < 10; i++) > { > do_foo(i); > } > > 1. First possibility > i = 0; > do_foo(i++); > do_foo(i++); > do_foo(i++); > for (; i < 10; i++) > { > do_foo(i); > }This is really peeling, not unrolling.> 2. Second possibility > for (i = 0; i < 7; i++) > { > do_foo(i); > } > do_foo(i++); > do_foo(i++); > do_foo(i++);Same as above.> 3. Third possibility > for (i = 0; i < 10;) > { > do_foo(i++); > do_foo(i++); > do_foo(i++); > }This one is unrolled, but incorrectly. -Krzysztof -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
On Wed, Jul 16, 2014 at 11:00 PM, Krzysztof Parzyszek < kparzysz at codeaurora.org> wrote:> On 7/15/2014 6:34 AM, Mahesha S wrote: > >> >> As far as partial loop unrolling is concerned, I could see following >> three different possibilities. Assume that unroll factor is 3. >> >> Original loop: >> for (i = 0; i < 10; i++) >> { >> do_foo(i); >> } >> >> 1. First possibility >> i = 0; >> do_foo(i++); >> do_foo(i++); >> do_foo(i++); >> for (; i < 10; i++) >> { >> do_foo(i); >> } >> > > This is really peeling, not unrolling. > > > > 2. Second possibility >> for (i = 0; i < 7; i++) >> { >> do_foo(i); >> } >> do_foo(i++); >> do_foo(i++); >> do_foo(i++); >> > > Same as above.I understand. I must agree that, I am newbee when it comes to compiler optimization techniques and nomenclature> > > > 3. Third possibility >> for (i = 0; i < 10;) >> { >> do_foo(i++); >> do_foo(i++); >> do_foo(i++); >> } >> > > This one is unrolled, but incorrectly.I think I understood the mistake here. In fact, I noticed it just after shooting my initial mail. One of the correct ways may be below one. 3. Third possibility for (i = 0; i < 10;) { do_foo(i++); do_foo(i++); do_foo(i++); do_foo(i++); do_foo(i++); } -- mahesha> > > -Krzysztof > > > -- > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted > by The Linux Foundation > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev >-- mahesha -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140717/47acdaca/attachment.html>