thr3ads.net - llvm dev - [llvm-dev] Loop invariant not being optimized [Nov 2016]

Phil Tomson via llvm-dev

2016-Nov-18 18:00 UTC

[llvm-dev] Loop invariant not being optimized

I tried changing 'noalias' to 'restrict' in the code and I get:

fma.c:17:12: warning: 'restrict' attribute only applies to return values that are pointers

It seems like 'noalias' would be the correct attribute here, from the
article you linked:

"if a function is annotated as noalias, the optimizer can assume that, in
addition to the parameters themselves, only first-level indirections of
pointer parameters are referenced or modified inside the function. The
visible global state is the set of all data that is not defined or
referenced outside of the compilation scope, and their address is not
taken."

Phil
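
A note on the warning above: it is consistent with `restrict` having been written in the `__declspec(...)` position, where it describes the pointer a function returns rather than its parameters. The C99 `restrict` keyword is instead a type qualifier placed on each pointer parameter. A minimal sketch, using a hypothetical helper `scale_by_first` that is not from the thread:

```c
#include <stddef.h>

/* Hypothetical helper, not from the thread.
   `restrict` qualifies each pointer parameter: the caller promises that
   dst and src do not overlap, so the compiler is allowed to load src[0]
   once instead of once per iteration. */
void scale_by_first(double *restrict dst, const double *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] *= src[0];   /* src[0] is loop-invariant */
}
```

Whether a given compiler actually hoists the load still depends on the optimization level and target; the qualifier only makes the transformation legal.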


On Thu, Nov 17, 2016 at 9:50 PM, Nema, Ashutosh <Ashutosh.Nema at amd.com>
wrote:
> If I understood it correctly, __declspec(noalias) is not the same as
> specifying restrict on each parameter.
>
>
>
> It means in the mentioned example a, b & c don't modify or reference any
> global state, but they are free to alias one another.
>
>
>
> You could specify restrict on each one to indicate that they do not alias
> each other.
>
>
>
> For more details refer:
> https://msdn.microsoft.com/en-us/library/k649tyc7.aspx
>
>
>
> Regards,
>
> Ashutosh
>
>
>
> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf Of* Phil
> Tomson via llvm-dev
> *Sent:* Friday, November 18, 2016 12:23 AM
> *To:* LLVM Developers Mailing List <llvm-dev at lists.llvm.org>
> *Subject:* [llvm-dev] Loop invariant not being optimized
>
>
>
> I've got an example where I think that there should be some loop-invariant
> optimization happening, but it's not.  Here's the C code:
>
> #define DIM 8
> #define UNROLL_DIM DIM
> typedef double InArray[DIM][DIM];
>
> __declspec(noalias) void f1( InArray c, const InArray a, const InArray b )
> {
>
> #pragma clang loop unroll_count(UNROLL_DIM)
>     for( int i=0;i<DIM;i++)
> #pragma clang loop unroll_count(UNROLL_DIM)
>         for( int j=0;j<DIM;j++)
> #pragma clang loop  unroll_count(UNROLL_DIM)
>             for( int k=0;k<DIM;k++) {
>                 c[i][k] = c[i][k] + a[i][j]*b[j][k];
>             }
> }
>
> The "a[i][j]" there is invariant in that inner loop. I've unrolled the
> loops with the unroll pragma to make the assembly easier to read; here's
> what I see (LLVM 3.9, compiling with: clang -fms-compatibility
> -funroll-loops -O3   -c fma.c -o fma.o )
>
>
> 0000000000000000 <f1>:
>        0: 29580c0000000000  load  r3,r0,0x0,64
>        8: 2958100200000000  load  r4,r1,0x0,64 #r4 <- a[0][0]
>       10: 2958140400000000  load  r5,r2,0x0,64
>       18: c0580c0805018000  fmaf  r3,r4,r5,r3,64
>       20: 79b80c0000000000  store r3,r0,0x0,64
>       28: 2958100000000008  load  r4,r0,0x8,64
>       30: 2958140200000000  load  r5,r1,0x0,64 #r5 <- a[0][0]
>       38: 2958180400000008  load  r6,r2,0x8,64
>       40: c058100a06020000  fmaf  r4,r5,r6,r4,64
>       48: 79b8100000000008  store r4,r0,0x8,64
>       50: 2958140000000010  load  r5,r0,0x10,64
>       58: 2958180200000000  load  r6,r1,0x0,64 #r6 <- a[0][0]
>       60: 29581c0400000010  load  r7,r2,0x10,64
>       68: c058140c07028000  fmaf  r5,r6,r7,r5,64
>       70: 79b8140000000010  store r5,r0,0x10,64
>       78: 2958180000000018  load  r6,r0,0x18,64
>       80: 29581c0200000000  load  r7,r1,0x0,64 #r7 <- a[0][0]
>       88: 2958200400000018  load  r8,r2,0x18,64
>       90: c058180e08030000  fmaf  r6,r7,r8,r6,64
> ...
>
> (fmaf semantics are: fmaf r1,r2,r3,r4, SIZE    r1 <- r2*r3+r4 )
>
> (load semantics are: load r1,r2,imm, SIZE     r1<- mem[r2+imm] )
>
>
>
> All three of the addresses are loaded in every loop. Only two need to be
> reloaded in the inner loop. I added the 'noalias' declspec in the C code
> above thinking that it would indicate that the pointers going into the
> function are not aliased and that that would allow the optimization, but it
> didn't make any difference.
>
> Of course it's easy to rewrite the example code to avoid this extra
> load/inner loop, but I would have thought this would be a fairly
> straightforward optimization for the optimizer. Am I missing something?
>
> Phil
>
>
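
The manual rewrite mentioned in the original post might look like the following sketch. It assumes the same `DIM` and `InArray` definitions as the post; the function name `f1_hoisted` and the local `aij` are made up here, and the unroll pragmas are left out for brevity.

```c
#define DIM 8
typedef double InArray[DIM][DIM];

/* Same computation as f1 in the post, but a[i][j] is read once per (i, j)
   pair. The name f1_hoisted is an assumption made for this sketch. */
void f1_hoisted(InArray c, const InArray a, const InArray b)
{
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++) {
            const double aij = a[i][j];   /* loop-invariant load, hoisted by hand */
            for (int k = 0; k < DIM; k++)
                c[i][k] = c[i][k] + aij * b[j][k];
        }
}
```

Reading `a[i][j]` into a local makes the single load explicit in the source, so it no longer depends on the optimizer proving that stores to `c` leave `a` untouched.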

Hal Finkel via llvm-dev

2016-Nov-18 23:29 UTC

Hi Phil,

I'm not sure whether we do anything with __declspec(noalias), but if I had
to guess, when you used restrict, you did not do it correctly. You can see
http://en.cppreference.com/w/c/language/restrict for some additional usage
examples.

 -Hal

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Phil Tomson via llvm-dev

2016-Nov-19 00:36 UTC

Oh, I see. Yes, this works:

__declspec(noalias)
void f1(       double c[restrict DIM][DIM],
         const double a[restrict DIM][DIM],
         const double b[restrict DIM][DIM] )
{

#pragma clang loop unroll_count(UNROLL_DIM)
    for( int i=0;i<DIM;i++)

#pragma clang loop unroll_count(UNROLL_DIM)
        for( int j=0;j<DIM;j++)

#pragma clang loop  unroll_count(UNROLL_DIM)
            for( int k=0;k<DIM;k++) {
                c[i][k] = c[i][k] + a[i][j]*b[j][k];
            }
}

...works as in the invariants are optimized.  Thanks.

Phil
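
Why `restrict` makes the difference can be seen with a small experiment, sketched here under the assumption that a caller passes the same array as both `c` and `a`. The names `mm_reload` and `mm_hoisted` are hypothetical, and `const` is dropped from the parameters only so that one array can be passed twice.

```c
#define DIM 8
typedef double InArray[DIM][DIM];

/* Reads a[i][j] on every inner iteration, as the C source in the thread does. */
void mm_reload(InArray c, InArray a, InArray b)
{
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            for (int k = 0; k < DIM; k++)
                c[i][k] = c[i][k] + a[i][j] * b[j][k];
}

/* Reads a[i][j] once per (i, j) pair. This matches mm_reload only when
   stores to c cannot change a, i.e. when the two do not overlap. */
void mm_hoisted(InArray c, InArray a, InArray b)
{
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++) {
            const double aij = a[i][j];
            for (int k = 0; k < DIM; k++)
                c[i][k] = c[i][k] + aij * b[j][k];
        }
}
```

With distinct arrays the two functions agree. When `c` and `a` are the same array, a store to `c[i][k]` can change `a[i][j]` in the middle of the inner loop, and the results differ. Without `restrict` the compiler has to allow for that case, so it reloads; with `restrict` such a call is undefined behavior and the load may be hoisted.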

