Alexandre Isoard via llvm-dev
2016-Oct-13 16:30 UTC
[llvm-dev] Loop Unrolling Fail in Simple Vectorized loop
If count > MAX_UINT-4 your loop loops indefinitely with an increment of 4, I think. On Thu, Oct 13, 2016 at 4:42 PM, Charith Mendis via llvm-dev < llvm-dev at lists.llvm.org> wrote:> So, I tried unrolling the following simple loop. > > int unroll(unsigned * a, unsigned * b, unsigned *c, unsigned count){ > > for(unsigned i=0; i<count; i++){ > > a[i] = b[i]*c[i-1]; > > } > > return 0; > > } > > > Then, the unroller is able to unroll it by 2 even though it doesn't know > about the range of count. SCEV of backedge taken count is (-1 + %count) > > > But, when I change the increment to 4, as in > > > int unroll(unsigned * a, unsigned * b, unsigned *c, unsigned count){ > > for(unsigned i=0; i<count; i+=4){ > > a[i] = b[i]*c[i-1]; > > } > > return 0; > > } > > > The unroller cannot compute the backedge taken count. Therefore, it seems > like the problem is not with the range of "count", can't the unroller > compute it as (- 1 + %count / 4)? > > On Wed, Oct 12, 2016 at 11:28 PM, Charith Mendis < > char.mendis1989 at gmail.com> wrote: > >> Thanks for the explanation. But I am a little confused with the following >> fact. Can't LLVM keep vectorizable_elements as a symbolic value and convert >> the loop to say; >> >> for(unsigned i = 0; i < vectorizable_elements ; i += 2){ >> //main loop >> } >> >> for(unsigned i=0 ; i < vectorizable_elements % 2; i++){ >> //fix up >> } >> >> Why does it have to reason about the range of vectorizable_elements? Even >> if vectorizable_elements == SIZE_MAX the above decomposition would work? >> >> On Wed, Oct 12, 2016 at 8:25 PM, Friedman, Eli <efriedma at codeaurora.org> >> wrote: >> >>> On 10/12/2016 4:35 PM, Charith Mendis via llvm-dev wrote: >>> >>> Hi all, >>> >>> Attached herewith is a simple vectorized function with loops performing >>> a simple shuffle. >>> >>> I want all loops (inner and outer) to be unrolled by 2 and as such used >>> -unroll-count=2 >>> The inner loops(with k as the induction variable and having constant >>> trip counts) unroll fully, but the outer loop with (j) fails to unroll. >>> >>> The llvm code is also attached with inner loops fully unrolled. >>> >>> To inspect further, I added the following to the PassManagerBuilder.cpp >>> to run some canonicalization routines and redo unrolling again. I have set >>> partial unrolling on + have a huge threshold + allows expensive loop trip >>> counts. Still it didn't unroll by 2. >>> >>> MPM.add(createLoopUnrollPass()); >>> >>> MPM.add(createCFGSimplificationPass()); >>> >>> MPM.add(createLoopSimplifyPass()); >>> >>> MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1)); >>> >>> MPM.add(createLCSSAPass()); >>> >>> MPM.add(createIndVarSimplifyPass()); // Canonicalize indvars >>> >>> MPM.add(createLoopUnrollPass()); >>> >>> >>> Digging deeper I found, that it fails in UnrollRuntimeLoopRemainder >>> function, where it is unable to calculate the BackEdge taken amount. >>> >>> Can anybody explain what is need to get the outer loop unrolled by 2? It >>> would be a great help. >>> >>> >>> Well, I can at least explain what is happening... runtime unrolling >>> needs to be able to symbolically compute the trip count to avoid inserting >>> a branch after every iteration. SCEV isn't able to prove that your loop >>> isn't an infinite loop (consider the case of vectorizable_elements==SIZE_MAX), >>> therefore it can't compute the trip count. Therefore, we don't unroll. >>> >>> There's a few different angles you could use to attack this: you could >>> teach the unroller to unroll loops with an uncomputable trip count, or you >>> can make the trip count of your loop computable somehow. Changing the >>> unroller is probably straightforward (see the recently committed r284044). >>> Making the trip count computable is more complicated... it's probably >>> possible to teach SCEV to reason about the overflow in the pointer >>> computation, or maybe you could version the loop. >>> >>> -Eli >>> >>> -- >>> Employee of Qualcomm Innovation Center, Inc. >>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project >>> >>> >> >> >> -- >> Kind regards, >> Charith Mendis >> >> Graduate Student, >> CSAIL, >> Massachusetts Institute of Technology >> > > > > -- > Kind regards, > Charith Mendis > > Graduate Student, > CSAIL, > Massachusetts Institute of Technology > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-- *Alexandre Isoard* -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161013/355c93cc/attachment.html>
Charith Mendis via llvm-dev
2016-Oct-13 17:57 UTC
[llvm-dev] Loop Unrolling Fail in Simple Vectorized loop
Oh I see, the original loop may end normally, but by unrolling it may induce an infinite loop. On Thursday, October 13, 2016, Alexandre Isoard <alexandre.isoard at gmail.com> wrote:> If count > MAX_UINT-4 your loop loops indefinitely with an increment of 4, > I think. > > On Thu, Oct 13, 2016 at 4:42 PM, Charith Mendis via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> So, I tried unrolling the following simple loop. >> >> int unroll(unsigned * a, unsigned * b, unsigned *c, unsigned count){ >> >> for(unsigned i=0; i<count; i++){ >> >> a[i] = b[i]*c[i-1]; >> >> } >> >> return 0; >> >> } >> >> >> Then, the unroller is able to unroll it by 2 even though it doesn't know >> about the range of count. SCEV of backedge taken count is (-1 + %count) >> >> >> But, when I change the increment to 4, as in >> >> >> int unroll(unsigned * a, unsigned * b, unsigned *c, unsigned count){ >> >> for(unsigned i=0; i<count; i+=4){ >> >> a[i] = b[i]*c[i-1]; >> >> } >> >> return 0; >> >> } >> >> >> The unroller cannot compute the backedge taken count. Therefore, it >> seems like the problem is not with the range of "count", can't the unroller >> compute it as (- 1 + %count / 4)? >> >> On Wed, Oct 12, 2016 at 11:28 PM, Charith Mendis < >> char.mendis1989 at gmail.com> wrote: >> >>> Thanks for the explanation. But I am a little confused with the >>> following fact. Can't LLVM keep vectorizable_elements as a symbolic value >>> and convert the loop to say; >>> >>> for(unsigned i = 0; i < vectorizable_elements ; i += 2){ >>> //main loop >>> } >>> >>> for(unsigned i=0 ; i < vectorizable_elements % 2; i++){ >>> //fix up >>> } >>> >>> Why does it have to reason about the range of vectorizable_elements? >>> Even if vectorizable_elements == SIZE_MAX the above decomposition would >>> work? >>> >>> On Wed, Oct 12, 2016 at 8:25 PM, Friedman, Eli <efriedma at codeaurora.org> >>> wrote: >>> >>>> On 10/12/2016 4:35 PM, Charith Mendis via llvm-dev wrote: >>>> >>>> Hi all, >>>> >>>> Attached herewith is a simple vectorized function with loops performing >>>> a simple shuffle. >>>> >>>> I want all loops (inner and outer) to be unrolled by 2 and as such used >>>> -unroll-count=2 >>>> The inner loops(with k as the induction variable and having constant >>>> trip counts) unroll fully, but the outer loop with (j) fails to unroll. >>>> >>>> The llvm code is also attached with inner loops fully unrolled. >>>> >>>> To inspect further, I added the following to the PassManagerBuilder.cpp >>>> to run some canonicalization routines and redo unrolling again. I have set >>>> partial unrolling on + have a huge threshold + allows expensive loop trip >>>> counts. Still it didn't unroll by 2. >>>> >>>> MPM.add(createLoopUnrollPass()); >>>> >>>> MPM.add(createCFGSimplificationPass()); >>>> >>>> MPM.add(createLoopSimplifyPass()); >>>> >>>> MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1)); >>>> >>>> MPM.add(createLCSSAPass()); >>>> >>>> MPM.add(createIndVarSimplifyPass()); // Canonicalize indvars >>>> >>>> MPM.add(createLoopUnrollPass()); >>>> >>>> >>>> Digging deeper I found, that it fails in UnrollRuntimeLoopRemainder >>>> function, where it is unable to calculate the BackEdge taken amount. >>>> >>>> Can anybody explain what is need to get the outer loop unrolled by 2? >>>> It would be a great help. >>>> >>>> >>>> Well, I can at least explain what is happening... runtime unrolling >>>> needs to be able to symbolically compute the trip count to avoid inserting >>>> a branch after every iteration. SCEV isn't able to prove that your loop >>>> isn't an infinite loop (consider the case of vectorizable_elements==SIZE_MAX), >>>> therefore it can't compute the trip count. Therefore, we don't unroll. >>>> >>>> There's a few different angles you could use to attack this: you could >>>> teach the unroller to unroll loops with an uncomputable trip count, or you >>>> can make the trip count of your loop computable somehow. Changing the >>>> unroller is probably straightforward (see the recently committed r284044). >>>> Making the trip count computable is more complicated... it's probably >>>> possible to teach SCEV to reason about the overflow in the pointer >>>> computation, or maybe you could version the loop. >>>> >>>> -Eli >>>> >>>> -- >>>> Employee of Qualcomm Innovation Center, Inc. >>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project >>>> >>>> >>> >>> >>> -- >>> Kind regards, >>> Charith Mendis >>> >>> Graduate Student, >>> CSAIL, >>> Massachusetts Institute of Technology >>> >> >> >> >> -- >> Kind regards, >> Charith Mendis >> >> Graduate Student, >> CSAIL, >> Massachusetts Institute of Technology >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >> > > > -- > *Alexandre Isoard* >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161013/532e1991/attachment.html>
Ehsan Amiri via llvm-dev
2016-Oct-13 21:50 UTC
[llvm-dev] Loop Unrolling Fail in Simple Vectorized loop
On Thu, Oct 13, 2016 at 1:57 PM, Charith Mendis via llvm-dev < llvm-dev at lists.llvm.org> wrote:> Oh I see, the original loop may end normally, but by unrolling it may > induce an infinite loop. > >No. The problem is that step of the original loop is not 1. for(unsigned i=0; i<count; i+=4){ a[i] = b[i]*c[i-1]; } Let's assume unsigned is a 32 bit integer. Then maximum unsigned number will be 2^32 - 1. Let count = 2^32 - 1. When the loop iterates at some point we will have i = 2^32 - 4. What is the value of i in the next iteration? 2^32 cannot be represented in a 32 bit integer, so what happens is that you will wrap around and in the next iteration you will have i = 0. So your original loop may be infinte. You can confirm this by compiling and running the following program #include <climits> #include <iostream> using namespace std; int main() { for (int i = 0; i < UINT_MAX; i += 4) { if (i == 0) cout << "i == 0!" << endl; } return 0; }> > On Thursday, October 13, 2016, Alexandre Isoard < > alexandre.isoard at gmail.com> wrote: > >> If count > MAX_UINT-4 your loop loops indefinitely with an increment of >> 4, I think. >> >> On Thu, Oct 13, 2016 at 4:42 PM, Charith Mendis via llvm-dev < >> llvm-dev at lists.llvm.org> wrote: >> >>> So, I tried unrolling the following simple loop. >>> >>> int unroll(unsigned * a, unsigned * b, unsigned *c, unsigned count){ >>> >>> for(unsigned i=0; i<count; i++){ >>> >>> a[i] = b[i]*c[i-1]; >>> >>> } >>> >>> return 0; >>> >>> } >>> >>> >>> Then, the unroller is able to unroll it by 2 even though it doesn't know >>> about the range of count. SCEV of backedge taken count is (-1 + %count) >>> >>> >>> But, when I change the increment to 4, as in >>> >>> >>> int unroll(unsigned * a, unsigned * b, unsigned *c, unsigned count){ >>> >>> for(unsigned i=0; i<count; i+=4){ >>> >>> a[i] = b[i]*c[i-1]; >>> >>> } >>> >>> return 0; >>> >>> } >>> >>> >>> The unroller cannot compute the backedge taken count. Therefore, it >>> seems like the problem is not with the range of "count", can't the unroller >>> compute it as (- 1 + %count / 4)? >>> >>> On Wed, Oct 12, 2016 at 11:28 PM, Charith Mendis < >>> char.mendis1989 at gmail.com> wrote: >>> >>>> Thanks for the explanation. But I am a little confused with the >>>> following fact. Can't LLVM keep vectorizable_elements as a symbolic value >>>> and convert the loop to say; >>>> >>>> for(unsigned i = 0; i < vectorizable_elements ; i += 2){ >>>> //main loop >>>> } >>>> >>>> for(unsigned i=0 ; i < vectorizable_elements % 2; i++){ >>>> //fix up >>>> } >>>> >>>> Why does it have to reason about the range of vectorizable_elements? >>>> Even if vectorizable_elements == SIZE_MAX the above decomposition would >>>> work? >>>> >>>> On Wed, Oct 12, 2016 at 8:25 PM, Friedman, Eli <efriedma at codeaurora.org >>>> > wrote: >>>> >>>>> On 10/12/2016 4:35 PM, Charith Mendis via llvm-dev wrote: >>>>> >>>>> Hi all, >>>>> >>>>> Attached herewith is a simple vectorized function with loops >>>>> performing a simple shuffle. >>>>> >>>>> I want all loops (inner and outer) to be unrolled by 2 and as such >>>>> used -unroll-count=2 >>>>> The inner loops(with k as the induction variable and having constant >>>>> trip counts) unroll fully, but the outer loop with (j) fails to unroll. >>>>> >>>>> The llvm code is also attached with inner loops fully unrolled. >>>>> >>>>> To inspect further, I added the following to the >>>>> PassManagerBuilder.cpp to run some canonicalization routines and redo >>>>> unrolling again. I have set partial unrolling on + have a huge threshold + >>>>> allows expensive loop trip counts. Still it didn't unroll by 2. >>>>> >>>>> MPM.add(createLoopUnrollPass()); >>>>> >>>>> MPM.add(createCFGSimplificationPass()); >>>>> >>>>> MPM.add(createLoopSimplifyPass()); >>>>> >>>>> MPM.add(createLoopRotatePass(SizeLevel == 2 ? 0 : -1)); >>>>> >>>>> MPM.add(createLCSSAPass()); >>>>> >>>>> MPM.add(createIndVarSimplifyPass()); // Canonicalize indvars >>>>> >>>>> MPM.add(createLoopUnrollPass()); >>>>> >>>>> >>>>> Digging deeper I found, that it fails in UnrollRuntimeLoopRemainder >>>>> function, where it is unable to calculate the BackEdge taken amount. >>>>> >>>>> Can anybody explain what is need to get the outer loop unrolled by 2? >>>>> It would be a great help. >>>>> >>>>> >>>>> Well, I can at least explain what is happening... runtime unrolling >>>>> needs to be able to symbolically compute the trip count to avoid inserting >>>>> a branch after every iteration. SCEV isn't able to prove that your loop >>>>> isn't an infinite loop (consider the case of vectorizable_elements==SIZE_MAX), >>>>> therefore it can't compute the trip count. Therefore, we don't unroll. >>>>> >>>>> There's a few different angles you could use to attack this: you could >>>>> teach the unroller to unroll loops with an uncomputable trip count, or you >>>>> can make the trip count of your loop computable somehow. Changing the >>>>> unroller is probably straightforward (see the recently committed r284044). >>>>> Making the trip count computable is more complicated... it's probably >>>>> possible to teach SCEV to reason about the overflow in the pointer >>>>> computation, or maybe you could version the loop. >>>>> >>>>> -Eli >>>>> >>>>> -- >>>>> Employee of Qualcomm Innovation Center, Inc. >>>>> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project >>>>> >>>>> >>>> >>>> >>>> -- >>>> Kind regards, >>>> Charith Mendis >>>> >>>> Graduate Student, >>>> CSAIL, >>>> Massachusetts Institute of Technology >>>> >>> >>> >>> >>> -- >>> Kind regards, >>> Charith Mendis >>> >>> Graduate Student, >>> CSAIL, >>> Massachusetts Institute of Technology >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >>> >> >> >> -- >> *Alexandre Isoard* >> > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev > >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20161013/a8e3f5c5/attachment.html>