Charith Mendis via llvm-dev
2016-Oct-13 03:28 UTC
[llvm-dev] Loop Unrolling Fail in Simple Vectorized loop
Thanks for the explanation. But I am a little confused with the following fact. Can't LLVM keep vectorizable_elements as a symbolic value and convert the loop to say; for(unsigned i = 0; i < vectorizable_elements ; i += 2){ //main loop } for(unsigned i=0 ; i < vectorizable_elements % 2; i++){ //fix up } Why does it have to reason about the range of vectorizable_elements? Even if vectorizable_elements == SIZE_MAX the above decomposition would work? On Wed, Oct 12, 2016 at 8:25 PM, Friedman, Eli <efriedma at codeaurora.org> wrote:> On 10/12/2016 4:35 PM, Charith Mendis via llvm-dev wrote: > > Hi all, > > Attached herewith is a simple vectorized function with loops performing a > simple shuffle. > > I want all loops (inner and outer) to be unrolled by 2 and as such used > -unroll-count=2 > The inner loops(with k as the induction variable and having constant trip > counts) unroll fully, but the outer loop with (j) fails to unroll. > > The llvm code is also attached with inner loops fully unrolled. > > To inspect further, I added the following to the PassManagerBuilder.cpp to > run some canonicalization routines and redo unrolling again. I have set > partial unrolling on + have a huge threshold + allows expensive loop trip > counts. Still it didn't unroll by 2. > > MPM.add(createLoopUnrollPass()); > > MPM.add(createCFGSimplificationPass()); > > MPM.add(createLoopSimplifyPass()); > > MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1)); > > MPM.add(createLCSSAPass()); > > MPM.add(createIndVarSimplifyPass()); // Canonicalize indvars > > MPM.add(createLoopUnrollPass()); > > > Digging deeper I found, that it fails in UnrollRuntimeLoopRemainder > function, where it is unable to calculate the BackEdge taken amount. > > Can anybody explain what is need to get the outer loop unrolled by 2? It > would be a great help. > > > Well, I can at least explain what is happening... runtime unrolling needs > to be able to symbolically compute the trip count to avoid inserting a > branch after every iteration. SCEV isn't able to prove that your loop > isn't an infinite loop (consider the case of vectorizable_elements==SIZE_MAX), > therefore it can't compute the trip count. Therefore, we don't unroll. > > There's a few different angles you could use to attack this: you could > teach the unroller to unroll loops with an uncomputable trip count, or you > can make the trip count of your loop computable somehow. Changing the > unroller is probably straightforward (see the recently committed r284044). > Making the trip count computable is more complicated... it's probably > possible to teach SCEV to reason about the overflow in the pointer > computation, or maybe you could version the loop. > > -Eli > > -- > Employee of Qualcomm Innovation Center, Inc. > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project > >-- Kind regards, Charith Mendis Graduate Student, CSAIL, Massachusetts Institute of Technology -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161012/13e89dc9/attachment.html>
Charith Mendis via llvm-dev
2016-Oct-13 15:42 UTC
[llvm-dev] Loop Unrolling Fail in Simple Vectorized loop
So, I tried unrolling the following simple loop. int unroll(unsigned * a, unsigned * b, unsigned *c, unsigned count){ for(unsigned i=0; i<count; i++){ a[i] = b[i]*c[i-1]; } return 0; } Then, the unroller is able to unroll it by 2 even though it doesn't know about the range of count. SCEV of backedge taken count is (-1 + %count) But, when I change the increment to 4, as in int unroll(unsigned * a, unsigned * b, unsigned *c, unsigned count){ for(unsigned i=0; i<count; i+=4){ a[i] = b[i]*c[i-1]; } return 0; } The unroller cannot compute the backedge taken count. Therefore, it seems like the problem is not with the range of "count", can't the unroller compute it as (- 1 + %count / 4)? On Wed, Oct 12, 2016 at 11:28 PM, Charith Mendis <char.mendis1989 at gmail.com> wrote:> Thanks for the explanation. But I am a little confused with the following > fact. Can't LLVM keep vectorizable_elements as a symbolic value and convert > the loop to say; > > for(unsigned i = 0; i < vectorizable_elements ; i += 2){ > //main loop > } > > for(unsigned i=0 ; i < vectorizable_elements % 2; i++){ > //fix up > } > > Why does it have to reason about the range of vectorizable_elements? Even > if vectorizable_elements == SIZE_MAX the above decomposition would work? > > On Wed, Oct 12, 2016 at 8:25 PM, Friedman, Eli <efriedma at codeaurora.org> > wrote: > >> On 10/12/2016 4:35 PM, Charith Mendis via llvm-dev wrote: >> >> Hi all, >> >> Attached herewith is a simple vectorized function with loops performing a >> simple shuffle. >> >> I want all loops (inner and outer) to be unrolled by 2 and as such used >> -unroll-count=2 >> The inner loops(with k as the induction variable and having constant trip >> counts) unroll fully, but the outer loop with (j) fails to unroll. >> >> The llvm code is also attached with inner loops fully unrolled. >> >> To inspect further, I added the following to the PassManagerBuilder.cpp >> to run some canonicalization routines and redo unrolling again. I have set >> partial unrolling on + have a huge threshold + allows expensive loop trip >> counts. Still it didn't unroll by 2. >> >> MPM.add(createLoopUnrollPass()); >> >> MPM.add(createCFGSimplificationPass()); >> >> MPM.add(createLoopSimplifyPass()); >> >> MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1)); >> >> MPM.add(createLCSSAPass()); >> >> MPM.add(createIndVarSimplifyPass()); // Canonicalize indvars >> >> MPM.add(createLoopUnrollPass()); >> >> >> Digging deeper I found, that it fails in UnrollRuntimeLoopRemainder >> function, where it is unable to calculate the BackEdge taken amount. >> >> Can anybody explain what is need to get the outer loop unrolled by 2? It >> would be a great help. >> >> >> Well, I can at least explain what is happening... runtime unrolling needs >> to be able to symbolically compute the trip count to avoid inserting a >> branch after every iteration. SCEV isn't able to prove that your loop >> isn't an infinite loop (consider the case of vectorizable_elements==SIZE_MAX), >> therefore it can't compute the trip count. Therefore, we don't unroll. >> >> There's a few different angles you could use to attack this: you could >> teach the unroller to unroll loops with an uncomputable trip count, or you >> can make the trip count of your loop computable somehow. Changing the >> unroller is probably straightforward (see the recently committed r284044). >> Making the trip count computable is more complicated... it's probably >> possible to teach SCEV to reason about the overflow in the pointer >> computation, or maybe you could version the loop. >> >> -Eli >> >> -- >> Employee of Qualcomm Innovation Center, Inc. >> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project >> >> > > > -- > Kind regards, > Charith Mendis > > Graduate Student, > CSAIL, > Massachusetts Institute of Technology >-- Kind regards, Charith Mendis Graduate Student, CSAIL, Massachusetts Institute of Technology -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161013/24c23768/attachment.html>
Alexandre Isoard via llvm-dev
2016-Oct-13 16:30 UTC
[llvm-dev] Loop Unrolling Fail in Simple Vectorized loop
If count > MAX_UINT-4 your loop loops indefinitely with an increment of 4, I think. On Thu, Oct 13, 2016 at 4:42 PM, Charith Mendis via llvm-dev < llvm-dev at lists.llvm.org> wrote:> So, I tried unrolling the following simple loop. > > int unroll(unsigned * a, unsigned * b, unsigned *c, unsigned count){ > > for(unsigned i=0; i<count; i++){ > > a[i] = b[i]*c[i-1]; > > } > > return 0; > > } > > > Then, the unroller is able to unroll it by 2 even though it doesn't know > about the range of count. SCEV of backedge taken count is (-1 + %count) > > > But, when I change the increment to 4, as in > > > int unroll(unsigned * a, unsigned * b, unsigned *c, unsigned count){ > > for(unsigned i=0; i<count; i+=4){ > > a[i] = b[i]*c[i-1]; > > } > > return 0; > > } > > > The unroller cannot compute the backedge taken count. Therefore, it seems > like the problem is not with the range of "count", can't the unroller > compute it as (- 1 + %count / 4)? > > On Wed, Oct 12, 2016 at 11:28 PM, Charith Mendis < > char.mendis1989 at gmail.com> wrote: > >> Thanks for the explanation. But I am a little confused with the following >> fact. Can't LLVM keep vectorizable_elements as a symbolic value and convert >> the loop to say; >> >> for(unsigned i = 0; i < vectorizable_elements ; i += 2){ >> //main loop >> } >> >> for(unsigned i=0 ; i < vectorizable_elements % 2; i++){ >> //fix up >> } >> >> Why does it have to reason about the range of vectorizable_elements? Even >> if vectorizable_elements == SIZE_MAX the above decomposition would work? >> >> On Wed, Oct 12, 2016 at 8:25 PM, Friedman, Eli <efriedma at codeaurora.org> >> wrote: >> >>> On 10/12/2016 4:35 PM, Charith Mendis via llvm-dev wrote: >>> >>> Hi all, >>> >>> Attached herewith is a simple vectorized function with loops performing >>> a simple shuffle. >>> >>> I want all loops (inner and outer) to be unrolled by 2 and as such used >>> -unroll-count=2 >>> The inner loops(with k as the induction variable and having constant >>> trip counts) unroll fully, but the outer loop with (j) fails to unroll. >>> >>> The llvm code is also attached with inner loops fully unrolled. >>> >>> To inspect further, I added the following to the PassManagerBuilder.cpp >>> to run some canonicalization routines and redo unrolling again. I have set >>> partial unrolling on + have a huge threshold + allows expensive loop trip >>> counts. Still it didn't unroll by 2. >>> >>> MPM.add(createLoopUnrollPass()); >>> >>> MPM.add(createCFGSimplificationPass()); >>> >>> MPM.add(createLoopSimplifyPass()); >>> >>> MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1)); >>> >>> MPM.add(createLCSSAPass()); >>> >>> MPM.add(createIndVarSimplifyPass()); // Canonicalize indvars >>> >>> MPM.add(createLoopUnrollPass()); >>> >>> >>> Digging deeper I found, that it fails in UnrollRuntimeLoopRemainder >>> function, where it is unable to calculate the BackEdge taken amount. >>> >>> Can anybody explain what is need to get the outer loop unrolled by 2? It >>> would be a great help. >>> >>> >>> Well, I can at least explain what is happening... runtime unrolling >>> needs to be able to symbolically compute the trip count to avoid inserting >>> a branch after every iteration. SCEV isn't able to prove that your loop >>> isn't an infinite loop (consider the case of vectorizable_elements==SIZE_MAX), >>> therefore it can't compute the trip count. Therefore, we don't unroll. >>> >>> There's a few different angles you could use to attack this: you could >>> teach the unroller to unroll loops with an uncomputable trip count, or you >>> can make the trip count of your loop computable somehow. Changing the >>> unroller is probably straightforward (see the recently committed r284044). >>> Making the trip count computable is more complicated... it's probably >>> possible to teach SCEV to reason about the overflow in the pointer >>> computation, or maybe you could version the loop. >>> >>> -Eli >>> >>> -- >>> Employee of Qualcomm Innovation Center, Inc. >>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project >>> >>> >> >> >> -- >> Kind regards, >> Charith Mendis >> >> Graduate Student, >> CSAIL, >> Massachusetts Institute of Technology >> > > > > -- > Kind regards, > Charith Mendis > > Graduate Student, > CSAIL, > Massachusetts Institute of Technology > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-- *Alexandre Isoard* -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161013/355c93cc/attachment.html>