thr3ads.net - llvm dev - [LLVMdev] Another missed optimization opportunity? [Apr 2013]

If this information is useful, please help other people find it:
Share via:

Scott Pakin

2013-Apr-24 17:40 UTC

[LLVMdev] Another missed optimization opportunity?

I was suprised to find that some bitcode I'm generating isn't getting
optimized.  Here, I'm doing the equivalent of "myarray[5]++" (on
an
"extern int *myarray"), repeated three times:

     @myarray = external global i32*

     define void @update_array() #0 {
       %1 = load i32** @myarray, align 8
       %2 = getelementptr inbounds i32* %1, i64 5
       %3 = load i32* %2, align 4
       %4 = add nsw i32 %3, 1
       store i32 %4, i32* %2, align 4
       %5 = load i32** @myarray, align 8
       %6 = getelementptr inbounds i32* %5, i64 5
       %7 = load i32* %6, align 4
       %8 = add nsw i32 %7, 1
       store i32 %8, i32* %6, align 4
       %9 = load i32** @myarray, align 8
       %10 = getelementptr inbounds i32* %9, i64 5
       %11 = load i32* %10, align 4
       %12 = add nsw i32 %11, 1
       store i32 %12, i32* %10, align 4
       ret void
     }

Running "opt -std-compile-opts" or even "opt -O3"
doesn't seem to
change the bitcode any.  I had expected the three increments by 1 to
be collapsed into a single increment by 3:

     @myarray = external global i32*

     define void @update_array() #0 {
       %1 = load i32** @myarray, align 8
       %2 = load i32* %1, align 4
       %3 = add nsw i32 %2, 3
       store i32 %3, i32* %1, align 4
       ret void
     }

Even the (x86-64) code generator doesn't do any last-minute
optimizations:

     movq    myarray(%rip), %rax
     incl    20(%rax)
     movq    myarray(%rip), %rax
     incl    20(%rax)
     movq    myarray(%rip), %rax
     incl    20(%rax)

This is with LLVM revision 180116.

Is there some semantic reason that the increments aren't allowed to be
combined, or is this a missed optimization opportunity in LLVM?

Thanks,
-- Scott

Cameron McInally

2013-Apr-24 18:35 UTC

head link

[LLVMdev] Another missed optimization opportunity?

Hey Scott,

On Wed, Apr 24, 2013 at 1:40 PM, Scott Pakin <pakin at lanl.gov> wrote:
...
>
> Is there some semantic reason that the increments aren't allowed to be
> combined, or is this a missed optimization opportunity in LLVM?
>
>I believe that the wildcard is the extern keyword.

Since the external symbol isn't resolved until link time, I suspect that it
would be a legal C program to do something like (maybe the language lawyers
know better though):

XXX> cat test.c
extern int x;

int kung( ) {
  return x;
}

XXX> cat foo.c
extern int kung();

volatile int x;

int main() {
  x = 0;
  return kung();
}

Ugly programming, but I see no way of the linker having enough information
to determine that the extern should be volatile and issue an warning/error.

One more reason to use explicit barriers instead of volatile. ;)

-Cameron
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130424/1eca8f0d/attachment.html>

Krzysztof Parzyszek

2013-Apr-24 18:48 UTC

head link

[LLVMdev] Another missed optimization opportunity?

On 4/24/2013 1:35 PM, Cameron McInally wrote:>
> I believe that the wildcard is the extern keyword.
>
> Since the external symbol isn't resolved until link time, I suspect
that
> it would be a legal C program to do something like (maybe the language
> lawyers know better though):
>
> XXX> cat test.c
> extern int x;
>
> int kung( ) {
>    return x;
> }
>
> XXX> cat foo.c
> extern int kung();
>
> volatile int x;
>
> int main() {
>    x = 0;
>    return kung();
> }
>
> Ugly programming, but I see no way of the linker having enough
> information to determine that the extern should be volatile and issue an
> warning/error.
6.7.3 Type qualifiers

5 If an attempt is made to modify an object defined with a 
const-qualified type through use of an lvalue with non-const-qualified 
type, the behavior is undefined. If an attempt is made to refer to an 
object defined with a volatile-qualified type through use of an lvalue
with non-volatile-qualified type, the behavior is undefined.

-Krzysztof

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, 
hosted by The Linux Foundation

Caldarale, Charles R

2013-Apr-24 19:29 UTC

head link

[LLVMdev] Another missed optimization opportunity?

> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at
cs.uiuc.edu]
> On Behalf Of Scott Pakin
> Subject: [LLVMdev] Another missed optimization opportunity?
> I'm doing the equivalent of "myarray[5]++" (on an
> "extern int *myarray"), repeated three times:
> I had expected the three increments by 1 to
> be collapsed into a single increment by 3:
> Is there some semantic reason that the increments aren't allowed to be
> combined, or is this a missed optimization opportunity in LLVM?
Is this a potential aliasing effect?  Since myarray is defined as a pointer, not
an array, it's theoretically possible that the address therein refers to the
same memory location as the pointer itself.

 - Chuck


THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY
MATERIAL and is thus for use only by the intended recipient. If you received
this in error, please contact the sender and delete the e-mail and its
attachments from all computers.

Scott Pakin

2013-Apr-24 20:01 UTC

head link

[LLVMdev] Another missed optimization opportunity?

On 04/24/2013 01:29 PM, Caldarale, Charles R wrote:> Is this a potential aliasing effect?  Since myarray is defined as a
pointer, not an array, it's theoretically possible that the address therein
refers to the same memory location as the pointer itself.
I was thinking along those lines, but I haven't been able to come up
with a specific instance of what could possibly be aliased.

Here are some additional observations that might be helpful in solving
this mystery:

   1) The increments are in fact coalesced when myarray is declared as
      an array instead of a pointer:

           @myarray = external global [10 x i32]

           define void @update_array() #0 {
             %1 = load i32* getelementptr inbounds ([10 x i32]* @myarray, i64 0,
i64 5), align 4
             %2 = add nsw i32 %1, 3
             store i32 %2, i32* getelementptr inbounds ([10 x i32]* @myarray,
i64 0, i64 5), align 4
             ret void
           }

   2) Both clang and gcc optimize the analogous C code into an
      increment by 3:

           extern int *myarray;

           void update_array (void)
           {
             myarray[5]++;
             myarray[5]++;
             myarray[5]++;
           }

      This would imply that the issue is not related to semantics.

   3) opt *does* optimize the code when myarray is declared as a
      constant pointer to variable data as opposed to a variable
      pointer to variable data:

           @myarray = external constant i32*

      This would imply that the issue *is* related to semantics.  Sigh.

-- Scott

Duncan Sands

2013-Apr-24 20:14 UTC

head link

[LLVMdev] Another missed optimization opportunity?

Hi Scott,

On 24/04/13 19:40, Scott Pakin wrote:> I was suprised to find that some bitcode I'm generating isn't
getting
> optimized.  Here, I'm doing the equivalent of "myarray[5]++"
(on an
> "extern int *myarray"), repeated three times:
does your bitcode contain data layout information?

Ciao, Duncan.
>
>      @myarray = external global i32*
>
>      define void @update_array() #0 {
>        %1 = load i32** @myarray, align 8
>        %2 = getelementptr inbounds i32* %1, i64 5
>        %3 = load i32* %2, align 4
>        %4 = add nsw i32 %3, 1
>        store i32 %4, i32* %2, align 4
>        %5 = load i32** @myarray, align 8
>        %6 = getelementptr inbounds i32* %5, i64 5
>        %7 = load i32* %6, align 4
>        %8 = add nsw i32 %7, 1
>        store i32 %8, i32* %6, align 4
>        %9 = load i32** @myarray, align 8
>        %10 = getelementptr inbounds i32* %9, i64 5
>        %11 = load i32* %10, align 4
>        %12 = add nsw i32 %11, 1
>        store i32 %12, i32* %10, align 4
>        ret void
>      }
>
> Running "opt -std-compile-opts" or even "opt -O3"
doesn't seem to
> change the bitcode any.  I had expected the three increments by 1 to
> be collapsed into a single increment by 3:
>
>      @myarray = external global i32*
>
>      define void @update_array() #0 {
>        %1 = load i32** @myarray, align 8
>        %2 = load i32* %1, align 4
>        %3 = add nsw i32 %2, 3
>        store i32 %3, i32* %1, align 4
>        ret void
>      }
>
> Even the (x86-64) code generator doesn't do any last-minute
> optimizations:
>
>      movq    myarray(%rip), %rax
>      incl    20(%rax)
>      movq    myarray(%rip), %rax
>      incl    20(%rax)
>      movq    myarray(%rip), %rax
>      incl    20(%rax)
>
> This is with LLVM revision 180116.
>
> Is there some semantic reason that the increments aren't allowed to be
> combined, or is this a missed optimization opportunity in LLVM?
>
> Thanks,
> -- Scott
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Dan Gohman

2013-Apr-24 20:31 UTC

head link

[LLVMdev] Another missed optimization opportunity?

The semantic reason is that the optimizer is required to assume that the
i32 stores could be storing to the storage of myarray. LLVM IR does not
permit optimizers to optimize based on the nominal types of memory objects
or memory accesses.

This gets optimized in C, because the C compiler adds special TBAA metadata
annotations to the loads and stores which say that the stores of "int"
do
not interfere with the loads of "pointer". It also gets optimized if
myarray is const, because the optimizer knows that const memory is not
modified by stores. It also gets optimized if myarray is an actual array,
because then the address of the array is constant, rather than being a
value loaded from memory.

Dan


On Wed, Apr 24, 2013 at 10:40 AM, Scott Pakin <pakin at lanl.gov> wrote:
> I was suprised to find that some bitcode I'm generating isn't
getting
> optimized.  Here, I'm doing the equivalent of "myarray[5]++"
(on an
> "extern int *myarray"), repeated three times:
>
>     @myarray = external global i32*
>
>     define void @update_array() #0 {
>       %1 = load i32** @myarray, align 8
>       %2 = getelementptr inbounds i32* %1, i64 5
>       %3 = load i32* %2, align 4
>       %4 = add nsw i32 %3, 1
>       store i32 %4, i32* %2, align 4
>       %5 = load i32** @myarray, align 8
>       %6 = getelementptr inbounds i32* %5, i64 5
>       %7 = load i32* %6, align 4
>       %8 = add nsw i32 %7, 1
>       store i32 %8, i32* %6, align 4
>       %9 = load i32** @myarray, align 8
>       %10 = getelementptr inbounds i32* %9, i64 5
>       %11 = load i32* %10, align 4
>       %12 = add nsw i32 %11, 1
>       store i32 %12, i32* %10, align 4
>       ret void
>     }
>
> Running "opt -std-compile-opts" or even "opt -O3"
doesn't seem to
> change the bitcode any.  I had expected the three increments by 1 to
> be collapsed into a single increment by 3:
>
>     @myarray = external global i32*
>
>     define void @update_array() #0 {
>       %1 = load i32** @myarray, align 8
>       %2 = load i32* %1, align 4
>       %3 = add nsw i32 %2, 3
>       store i32 %3, i32* %1, align 4
>       ret void
>     }
>
> Even the (x86-64) code generator doesn't do any last-minute
> optimizations:
>
>     movq    myarray(%rip), %rax
>     incl    20(%rax)
>     movq    myarray(%rip), %rax
>     incl    20(%rax)
>     movq    myarray(%rip), %rax
>     incl    20(%rax)
>
> This is with LLVM revision 180116.
>
> Is there some semantic reason that the increments aren't allowed to be
> combined, or is this a missed optimization opportunity in LLVM?
>
> Thanks,
> -- Scott
> ______________________________**_________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>
http://lists.cs.uiuc.edu/**mailman/listinfo/llvmdev<http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20130424/32e1523f/attachment.html>

Scott Pakin

2013-Apr-24 21:19 UTC

head link

[LLVMdev] Another missed optimization opportunity?

On 04/24/2013 02:14 PM, Duncan Sands wrote:> does your bitcode contain data layout information?
Yes:

     ; ModuleID = 'junk.c'
     target datalayout =
"e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
     target triple = "x86_64-unknown-linux-gnu"

     @myarray = external global i32*

     ; Function Attrs: nounwind uwtable
     define void @update_array() #0 {
       %1 = load i32** @myarray, align 8
       %2 = getelementptr inbounds i32* %1, i64 5
       %3 = load i32* %2, align 4
       %4 = add nsw i32 %3, 1
       store i32 %4, i32* %2, align 4
       %5 = load i32** @myarray, align 8
       %6 = getelementptr inbounds i32* %5, i64 5
       %7 = load i32* %6, align 4
       %8 = add nsw i32 %7, 1
       store i32 %8, i32* %6, align 4
       %9 = load i32** @myarray, align 8
       %10 = getelementptr inbounds i32* %9, i64 5
       %11 = load i32* %10, align 4
       %12 = add nsw i32 %11, 1
       store i32 %12, i32* %10, align 4
       ret void
     }

     attributes #0 = { nounwind uwtable
"less-precise-fpmad"="false"
"no-frame-pointer-elim"="true"
"no-frame-pointer-elim-non-leaf"="true"
"no-infs-fp-math"="false"
"no-nans-fp-math"="false"
"unsafe-fp-math"="false"
"use-soft-float"="false" }

-- Scott

Scott Pakin

2013-Apr-24 21:21 UTC

head link

[LLVMdev] Another missed optimization opportunity?

On 04/24/2013 02:31 PM, Dan Gohman wrote:> The semantic reason is that the optimizer is required to assume that the
i32 stores could be storing to the storage of myarray. LLVM IR does not permit
optimizers to optimize based on the nominal types of memory objects or memory
accesses.
>
> This gets optimized in C, because the C compiler adds special TBAA metadata
annotations to the loads and stores which say that the stores of "int"
do not interfere with the loads of "pointer". It also gets optimized
if myarray is const, because the optimizer knows that const memory is not
modified by stores. It also gets optimized if myarray is an actual array,
because then the address of the array is constant, rather than being a value
loaded from memory.
Ah.  Very helpful answer; thanks.

-- Scott

Apparently Analagous Threads

Search for more possibly parallel threads

llvm dev - Apr 2013 - [LLVMdev] Another missed optimization opportunity?

[LLVMdev] Another missed optimization opportunity?

[LLVMdev] Another missed optimization opportunity?

[LLVMdev] Another missed optimization opportunity?

[LLVMdev] Another missed optimization opportunity?

[LLVMdev] Another missed optimization opportunity?

[LLVMdev] Another missed optimization opportunity?

[LLVMdev] Another missed optimization opportunity?

[LLVMdev] Another missed optimization opportunity?

[LLVMdev] Another missed optimization opportunity?

Apparently Analagous Threads