Whitney T Tsang via llvm-dev
2020-Jan-02 19:00 UTC
[llvm-dev] [RFC] Changing LoopUnrollAndJamPass to a function pass.
<div class="socmaildefaultfont" dir="ltr" style="font-family:Arial, Helvetica, sans-serif;font-size:10pt" ><div dir="ltr" ><font face="AppleSystemUIFont" size="3" >LoopUnrollAndJamPass is currently a loop pass. It is added in a LPM with only itself.</font><br><font face="AppleSystemUIFont" size="3" >`OptimizePM.addPass(createFunctionToLoopPassAdaptor(LoopUnrollAndJamPass(Level)));`</font><br><font face="AppleSystemUIFont" size="3" >Notice that loops are traversed in an inner to outer order in a LPM.</font><br><br><font face="AppleSystemUIFont" size="3" >The current implementation of LoopUnrollAndJamPass supports only loop nest with one inner loop (L->getSubLoops().size() == 1). </font><br><font face="AppleSystemUIFont" size="3" >Consider the example below:</font><br><font face="AppleSystemUIFont" size="3" >Before loop unroll and jam:</font><br><font face="AppleSystemUIFont" size="3" >```</font><br><font face="AppleSystemUIFont" size="3" >for i</font><br><font face="AppleSystemUIFont" size="3" > for j</font><br><font face="AppleSystemUIFont" size="3" > for k</font><br><font face="AppleSystemUIFont" size="3" > A[I][j][k] = 0;</font><br><font face="AppleSystemUIFont" size="3" >```</font><br><font face="AppleSystemUIFont" size="3" >After loop unroll and jam loop-j with a factor of 2:</font><br><font face="AppleSystemUIFont" size="3" >```</font><br><font face="AppleSystemUIFont" size="3" >for i</font><br><font face="AppleSystemUIFont" size="3" > for j += 2</font><br><font face="AppleSystemUIFont" size="3" > for k</font><br><font face="AppleSystemUIFont" size="3" > A[I][j][k] = 0;</font><br><font face="AppleSystemUIFont" size="3" > A[I][j+1][k] = 0;</font><br><font face="AppleSystemUIFont" size="3" > for j’=j</font><br><font face="AppleSystemUIFont" size="3" > for k</font><br><font face="AppleSystemUIFont" size="3" > A[I][j][k] = 0;</font><br><font face="AppleSystemUIFont" size="3" >```</font><br><font face="AppleSystemUIFont" size="3" >Notice that LoopUnrollAndJamPass can no longer unroll and jam loop-i at the next invocation of LoopUnrollAndJamPass, since there exists two inner loops in loop-i.</font><br><font face="AppleSystemUIFont" size="3" >If LoopUnrollAndJamPass is a function pass, then it can control the order of the loops being considered. By doing the transformation from outer to inner, both loop-i and loop-j can be unroll and jammed. </font><br><br><font face="AppleSystemUIFont" size="3" >In conclusion, I propose to change LoopUnrollAndJamPass from loop to function pass, with the reasons below:</font><br><font face="AppleSystemUIFont" size="3" >1. There is no obvious reason why LoopUnrollAndJamPass need to be a loop pass</font><br><font face="AppleSystemUIFont" size="3" >2. More loops can be transformed by traversing in a outer to inter order</font><br><font face="AppleSystemUIFont" size="3" >3. Less remaining loops are needed if we consider the whole loop nest together</font><br><font face="AppleSystemUIFont" size="3" >4. Better cost model can be created by considering the whole loop nest together</font><br><br><font face="AppleSystemUIFont" size="3" >Regards,</font><br><font face="AppleSystemUIFont" size="3" >Whitney Tsang</font></div></div><BR>
Doerfert, Johannes via llvm-dev
2020-Jan-02 22:48 UTC
[llvm-dev] [RFC] Changing LoopUnrollAndJamPass to a function pass.
This makes sense to me. -- Johannes On 01/02, Whitney T Tsang via llvm-dev wrote:> LoopUnrollAndJamPass is currently a loop pass. It is added in a LPM with only > itself. > `OptimizePM.addPass(createFunctionToLoopPassAdaptor(LoopUnrollAndJamPass > (Level)));` > Notice that loops are traversed in an inner to outer order in a LPM. > > The current implementation of LoopUnrollAndJamPass supports only loop nest with > one inner loop (L->getSubLoops().size() == 1). > Consider the example below: > Before loop unroll and jam: > ``` > for i > for j > for k > A[I][j][k] = 0; > ``` > After loop unroll and jam loop-j with a factor of 2: > ``` > for i > for j += 2 > for k > A[I][j][k] = 0; > A[I][j+1][k] = 0; > for j’=j > for k > A[I][j][k] = 0; > ``` > Notice that LoopUnrollAndJamPass can no longer unroll and jam loop-i at the > next invocation of LoopUnrollAndJamPass, since there exists two inner loops in > loop-i. > If LoopUnrollAndJamPass is a function pass, then it can control the order of > the loops being considered. By doing the transformation from outer to inner, > both loop-i and loop-j can be unroll and jammed. > > In conclusion, I propose to change LoopUnrollAndJamPass from loop to function > pass, with the reasons below: > 1. There is no obvious reason why LoopUnrollAndJamPass need to be a loop pass > 2. More loops can be transformed by traversing in a outer to inter order > 3. Less remaining loops are needed if we consider the whole loop nest together > 4. Better cost model can be created by considering the whole loop nest together > > Regards, > Whitney Tsang >> _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-- Johannes Doerfert Researcher Argonne National Laboratory Lemont, IL 60439, USA jdoerfert at anl.gov -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 228 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200102/97242597/attachment.sig>
David Green via llvm-dev
2020-Jan-03 13:42 UTC
[llvm-dev] [RFC] Changing LoopUnrollAndJamPass to a function pass.
Hello Sounds interesting, and it would be great to see Unroll and Jam get some love. I'm maybe being unimaginative, but I don't know of many cases where Unroll-and-Jamming the outer loops would be more beneficial than the code bloat it gave (a lot like normal unrolling). But making this a function pass sounds useful anyway, it would avoid the awkward case of accidentally unrolling the inner loop before the outer loop has been UnJ'd! > 3. Less remaining loops are needed if we consider the whole loop nest together This one is interesting. UnJ was written for things like matrix multiply on microcontroller's without vectorization available. It helps quite a lot there. In our experience, we often end up unroll-and-jamming the middle(j) iteration, then unrolling the inner(k) one, creating a tile of sorts. It can create quite large code, but that code is quick on these little cpus. Oh, and IIRC UnJ was written in a way that it would potentially be expanded to handle multiple inner loops. Or maybe I just had the idea that it could be expanded like that, providing that you can work out all the control flow and dependencies correctly. This might help with unroll-and-jamming vectorized code too (although if outer loop vectorization and interleaving is something that can already be handled in the vectorizer, it may be better to leave it to do that job and have it included in all it's cost modelling. Reverse-engineering that the runtime checks are loop-invariant doesn't sound very reliable. If the vectorizer can already do Unroll and Jam, let it do it on its own). I look forward to seeing patches! Dave On 02/01/2020 19:00, Whitney T Tsang via llvm-dev wrote:> LoopUnrollAndJamPass is currently a loop pass. It is added in a LPM with > only itself. > `OptimizePM.addPass(createFunctionToLoopPassAdaptor(LoopUnrollAndJamPass(Level)));` > Notice that loops are traversed in an inner to outer order in a LPM. > > The current implementation of LoopUnrollAndJamPass supports only loop > nest with one inner loop (L->getSubLoops().size() == 1). > Consider the example below: > Before loop unroll and jam: > ``` > for i > for j > for k > A[I][j][k] = 0; > ``` > After loop unroll and jam loop-j with a factor of 2: > ``` > for i > for j += 2 > for k > A[I][j][k] = 0; > A[I][j+1][k] = 0; > for j’=j > for k > A[I][j][k] = 0; > ``` > Notice that LoopUnrollAndJamPass can no longer unroll and jam loop-i at > the next invocation of LoopUnrollAndJamPass, since there exists two > inner loops in loop-i. > If LoopUnrollAndJamPass is a function pass, then it can control the > order of the loops being considered. By doing the transformation from > outer to inner, both loop-i and loop-j can be unroll and jammed. > > In conclusion, I propose to change LoopUnrollAndJamPass from loop to > function pass, with the reasons below: > 1. There is no obvious reason why LoopUnrollAndJamPass need to be a loop > pass > 2. More loops can be transformed by traversing in a outer to inter order > 3. Less remaining loops are needed if we consider the whole loop nest > together > 4. Better cost model can be created by considering the whole loop nest > together > > Regards, > Whitney Tsang > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >
Whitney T Tsang via llvm-dev
2020-Jan-03 14:28 UTC
[llvm-dev] [RFC] Changing LoopUnrollAndJamPass to a function pass.
I am happy to see positive feedback from multiple parties. I will start working on my first NFCI patch which changes LoopUnrollAndJamPass to a function pass, but keeps traversing the loops from inner to outer. Regards, Whitney Tsang From: David Green <David.Green at arm.com> To: Whitney T Tsang <whitneyt at ca.ibm.com>, "llvm-dev at lists.llvm.org" <llvm-dev at lists.llvm.org> Cc: nd <nd at arm.com> Date: 2020/01/03 08:42 AM Subject: [EXTERNAL] Re: [llvm-dev] [RFC] Changing LoopUnrollAndJamPass to a function pass. Hello Sounds interesting, and it would be great to see Unroll and Jam get some love. I'm maybe being unimaginative, but I don't know of many cases where Unroll-and-Jamming the outer loops would be more beneficial than the code bloat it gave (a lot like normal unrolling). But making this a function pass sounds useful anyway, it would avoid the awkward case of accidentally unrolling the inner loop before the outer loop has been UnJ'd! > 3. Less remaining loops are needed if we consider the whole loop nest together This one is interesting. UnJ was written for things like matrix multiply on microcontroller's without vectorization available. It helps quite a lot there. In our experience, we often end up unroll-and-jamming the middle(j) iteration, then unrolling the inner(k) one, creating a tile of sorts. It can create quite large code, but that code is quick on these little cpus. Oh, and IIRC UnJ was written in a way that it would potentially be expanded to handle multiple inner loops. Or maybe I just had the idea that it could be expanded like that, providing that you can work out all the control flow and dependencies correctly. This might help with unroll-and-jamming vectorized code too (although if outer loop vectorization and interleaving is something that can already be handled in the vectorizer, it may be better to leave it to do that job and have it included in all it's cost modelling. Reverse-engineering that the runtime checks are loop-invariant doesn't sound very reliable. If the vectorizer can already do Unroll and Jam, let it do it on its own). I look forward to seeing patches! Dave On 02/01/2020 19:00, Whitney T Tsang via llvm-dev wrote:> LoopUnrollAndJamPass is currently a loop pass. It is added in a LPM with > only itself. > `OptimizePM.addPass(createFunctionToLoopPassAdaptor(LoopUnrollAndJamPass(Level)));`> Notice that loops are traversed in an inner to outer order in a LPM. > > The current implementation of LoopUnrollAndJamPass supports only loop > nest with one inner loop (L->getSubLoops().size() == 1). > Consider the example below: > Before loop unroll and jam: > ``` > for i > for j > for k > A[I][j][k] = 0; > ``` > After loop unroll and jam loop-j with a factor of 2: > ``` > for i > for j += 2 > for k > A[I][j][k] = 0; > A[I][j+1][k] = 0; > for j’=j > for k > A[I][j][k] = 0; > ``` > Notice that LoopUnrollAndJamPass can no longer unroll and jam loop-i at > the next invocation of LoopUnrollAndJamPass, since there exists two > inner loops in loop-i. > If LoopUnrollAndJamPass is a function pass, then it can control the > order of the loops being considered. By doing the transformation from > outer to inner, both loop-i and loop-j can be unroll and jammed. > > In conclusion, I propose to change LoopUnrollAndJamPass from loop to > function pass, with the reasons below: > 1. There is no obvious reason why LoopUnrollAndJamPass need to be a loop > pass > 2. More loops can be transformed by traversing in a outer to inter order > 3. Less remaining loops are needed if we consider the whole loop nest > together > 4. Better cost model can be created by considering the whole loop nest > together > > Regards, > Whitney Tsang > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org >https://urldefense.proofpoint.com/v2/url?u=https-3A__lists.llvm.org_cgi-2Dbin_mailman_listinfo_llvm-2Ddev&d=DwIF-g&c=jf_iaSHvJObTbx-siA1ZOg&r=p0DGcdtx8-l1bvwJTLSk1zBTXpb78Y1slqHKTsTpRTE&m=a_6w_QR2DwJjPBgugHgy00aautUSck-NjdyLUHSbc-E&s=WVb-88fLx3EZzSvSDVlrUrb48piebm6YPolmbobmuVk&e>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200103/20e03fa1/attachment-0001.html> -------------- next part -------------- A non-text attachment was scrubbed... Name: graycol.gif Type: image/gif Size: 105 bytes Desc: not available URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20200103/20e03fa1/attachment-0001.gif>