Denis Bakhvalov via llvm-dev
2017-Nov-18 15:26 UTC
[llvm-dev] Is llvm capable of doing loop interchange optimization?
Hello, I've been playing around with the really simple example of not cache friendly loop like this: #define N 100 void foo(int** __restrict__ a, int** __restrict__ b) { for (int i = 0; i < N; ++i) for (int j = 0; j < N; ++j) a[j][i] += b[j][i]; } link to compiler explorer: https://gcc.godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(j:1,source:'%23define+N+100%0A%0Avoid+foo(int**+__restrict__+a,+%0A+++++++++int**+__restrict__+b)%0A%7B%0A++++for+(int+i+%3D+0%3B+i+%3C+N%3B+%2B%2Bi)%0A++++++++for+(int+j+%3D+0%3B+j+%3C+N%3B+%2B%2Bj)%0A++++++++++++a%5Bj%5D%5Bi%5D+%2B%3D+b%5Bj%5D%5Bi%5D%3B%0A%7D%0A%0Avoid+bar(int**+__restrict__+a,+%0A+++++++++int**+__restrict__+b)%0A%7B%0A++++for+(int+i+%3D+0%3B+i+%3C+N%3B+%2B%2Bi)%0A++++++++for+(int+j+%3D+0%3B+j+%3C+N%3B+%2B%2Bj)%0A++++++++++++a%5Bi%5D%5Bj%5D+%2B%3D+b%5Bi%5D%5Bj%5D%3B%0A%7D'),l:'5',n:'0',o:'C%2B%2B+source+%231',t:'0')),k:49.999182365253795,l:'4',m:100,n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:clang500,filters:(b:'0',binary:'1',commentOnly:'0',demangle:'0',directives:'0',execute:'1',intel:'0',trim:'0'),libs:!(),options:'-O3+-funroll-loops',source:1),l:'5',n:'0',o:'x86-64+clang+5.0.0+(Editor+%231,+Compiler+%231)',t:'0')),k:50.000817634746205,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4 I was thinking that modern compilers are optimizing such cases very aggressively. I quickly checked gcc, there is special option -floop-interchange, but it requires properly configured gcc compiler with Graphite infrastructure enabled. I didn't have that at hand, so I just stopped right there. Loop has no side effects and arrays do not alias, so from my point of view it's safe to do interchange here. I tried different march options as well with no success. I'm interested if llvm is capable of doing loop interchange in general, at least the most simple cases like the mentioned one? -- Best regards, Denis.
Florian Hahn via llvm-dev
2017-Nov-20 12:10 UTC
[llvm-dev] Is llvm capable of doing loop interchange optimization?
Hi Denis, On 18/11/2017 15:26, Denis Bakhvalov via llvm-dev wrote:> Hello, > > I've been playing around with the really simple example of not cache > friendly loop like this: > > #define N 100 > > void foo(int** __restrict__ a, > int** __restrict__ b) > { > for (int i = 0; i < N; ++i) > for (int j = 0; j < N; ++j) > a[j][i] += b[j][i]; > } > > link to compiler explorer: > https://gcc.godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(j:1,source:'%23define+N+100%0A%0Avoid+foo(int**+__restrict__+a,+%0A+++++++++int**+__restrict__+b)%0A%7B%0A++++for+(int+i+%3D+0%3B+i+%3C+N%3B+%2B%2Bi)%0A++++++++for+(int+j+%3D+0%3B+j+%3C+N%3B+%2B%2Bj)%0A++++++++++++a%5Bj%5D%5Bi%5D+%2B%3D+b%5Bj%5D%5Bi%5D%3B%0A%7D%0A%0Avoid+bar(int**+__restrict__+a,+%0A+++++++++int**+__restrict__+b)%0A%7B%0A++++for+(int+i+%3D+0%3B+i+%3C+N%3B+%2B%2Bi)%0A++++++++for+(int+j+%3D+0%3B+j+%3C+N%3B+%2B%2Bj)%0A++++++++++++a%5Bi%5D%5Bj%5D+%2B%3D+b%5Bi%5D%5Bj%5D%3B%0A%7D'),l:'5',n:'0',o:'C%2B%2B+source+%231',t:'0')),k:49.999182365253795,l:'4',m:100,n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:clang500,filters:(b:'0',binary:'1',commentOnly:'0',demangle:'0',directives:'0',execute:'1',intel:'0',trim:'0'),libs:!(),options:'-O3+-funroll-loops',source:1),l:'5',n:'0',o:'x86-64+clang+5.0.0+(Editor+%231,+Compiler+%231)',t:'0')),k:50.000817634746205,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4 > > I was thinking that modern compilers are optimizing such cases very > aggressively. I quickly checked gcc, there is special option > -floop-interchange, but it requires properly configured gcc compiler > with Graphite infrastructure enabled. I didn't have that at hand, so I > just stopped right there. > > Loop has no side effects and arrays do not alias, so from my point of > view it's safe to do interchange here. I tried different march options > as well with no success. > I'm interested if llvm is capable of doing loop interchange in > general, at least the most simple cases like the mentioned one? >There is a (relatively basic) loop interchange pass in LLVM. It is not enabled by default however, but you can enable it by passing `-mllvm -enable-loopinterchange` to Clang. LoopInterchange in Clang does not trigger in your example, because a[j] and b[i] may alias. I am no expert on restrict, but it seems both GCC and Clang cannot prove that the access do not alias with the annotations you provided (also `int *restrict *restrict` does not do the trick). In the code below, accesses to aa and bb cannot alias, and both GCC (with -O3) and Clang (recent master build, with `-O3 -mllvm -enable-loopinterchange`) interchange the 2 loops. int aa[N][N]; int bb[N][N]; void foo1() { for (int i = 0; i < N; ++i) for (int j = 0; j < N; ++j) aa[j][i] += bb[j][i]; } Cheers, Florian
Denis Bakhvalov via llvm-dev
2017-Nov-20 12:35 UTC
[llvm-dev] Is llvm capable of doing loop interchange optimization?
On 20/11/2017, Florian Hahn <florian.hahn at arm.com> wrote:> Hi Denis, > > On 18/11/2017 15:26, Denis Bakhvalov via llvm-dev wrote: >> Hello, >> >> I've been playing around with the really simple example of not cache >> friendly loop like this: >> >> #define N 100 >> >> void foo(int** __restrict__ a, >> int** __restrict__ b) >> { >> for (int i = 0; i < N; ++i) >> for (int j = 0; j < N; ++j) >> a[j][i] += b[j][i]; >> } >> >> link to compiler explorer: >> https://gcc.godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(j:1,source:'%23define+N+100%0A%0Avoid+foo(int**+__restrict__+a,+%0A+++++++++int**+__restrict__+b)%0A%7B%0A++++for+(int+i+%3D+0%3B+i+%3C+N%3B+%2B%2Bi)%0A++++++++for+(int+j+%3D+0%3B+j+%3C+N%3B+%2B%2Bj)%0A++++++++++++a%5Bj%5D%5Bi%5D+%2B%3D+b%5Bj%5D%5Bi%5D%3B%0A%7D%0A%0Avoid+bar(int**+__restrict__+a,+%0A+++++++++int**+__restrict__+b)%0A%7B%0A++++for+(int+i+%3D+0%3B+i+%3C+N%3B+%2B%2Bi)%0A++++++++for+(int+j+%3D+0%3B+j+%3C+N%3B+%2B%2Bj)%0A++++++++++++a%5Bi%5D%5Bj%5D+%2B%3D+b%5Bi%5D%5Bj%5D%3B%0A%7D'),l:'5',n:'0',o:'C%2B%2B+source+%231',t:'0')),k:49.999182365253795,l:'4',m:100,n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:clang500,filters:(b:'0',binary:'1',commentOnly:'0',demangle:'0',directives:'0',execute:'1',intel:'0',trim:'0'),libs:!(),options:'-O3+-funroll-loops',source:1),l:'5',n:'0',o:'x86-64+clang+5.0.0+(Editor+%231,+Compiler+%231)',t:'0')),k:50.000817634746205,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4 >> >> I was thinking that modern compilers are optimizing such cases very >> aggressively. I quickly checked gcc, there is special option >> -floop-interchange, but it requires properly configured gcc compiler >> with Graphite infrastructure enabled. I didn't have that at hand, so I >> just stopped right there. >> >> Loop has no side effects and arrays do not alias, so from my point of >> view it's safe to do interchange here. I tried different march options >> as well with no success. >> I'm interested if llvm is capable of doing loop interchange in >> general, at least the most simple cases like the mentioned one? >> > > There is a (relatively basic) loop interchange pass in LLVM. It is not > enabled by default however, but you can enable it by passing `-mllvm > -enable-loopinterchange` to Clang. > > LoopInterchange in Clang does not trigger in your example, because a[j] > and b[i] may alias. I am no expert on restrict, but it seems both GCC > and Clang cannot prove that the access do not alias with the annotations > you provided (also `int *restrict *restrict` does not do the trick). > > In the code below, accesses to aa and bb cannot alias, and both GCC > (with -O3) and Clang (recent master build, with `-O3 -mllvm > -enable-loopinterchange`) interchange the 2 loops. > > > int aa[N][N]; > int bb[N][N]; > > void foo1() { > for (int i = 0; i < N; ++i) > for (int j = 0; j < N; ++j) > aa[j][i] += bb[j][i]; > } > > > > Cheers, > Florian >Hi Florian, Thank you very much for answering the question. Looks like for now gcc does a better job with the example you provided: https://godbolt.org/g/Z5Cgu3 -- Best regards, Denis.