I was suprised to find that some bitcode I'm generating isn't getting optimized. Here, I'm doing the equivalent of "myarray[5]++" (on an "extern int *myarray"), repeated three times: @myarray = external global i32* define void @update_array() #0 { %1 = load i32** @myarray, align 8 %2 = getelementptr inbounds i32* %1, i64 5 %3 = load i32* %2, align 4 %4 = add nsw i32 %3, 1 store i32 %4, i32* %2, align 4 %5 = load i32** @myarray, align 8 %6 = getelementptr inbounds i32* %5, i64 5 %7 = load i32* %6, align 4 %8 = add nsw i32 %7, 1 store i32 %8, i32* %6, align 4 %9 = load i32** @myarray, align 8 %10 = getelementptr inbounds i32* %9, i64 5 %11 = load i32* %10, align 4 %12 = add nsw i32 %11, 1 store i32 %12, i32* %10, align 4 ret void } Running "opt -std-compile-opts" or even "opt -O3" doesn't seem to change the bitcode any. I had expected the three increments by 1 to be collapsed into a single increment by 3: @myarray = external global i32* define void @update_array() #0 { %1 = load i32** @myarray, align 8 %2 = load i32* %1, align 4 %3 = add nsw i32 %2, 3 store i32 %3, i32* %1, align 4 ret void } Even the (x86-64) code generator doesn't do any last-minute optimizations: movq myarray(%rip), %rax incl 20(%rax) movq myarray(%rip), %rax incl 20(%rax) movq myarray(%rip), %rax incl 20(%rax) This is with LLVM revision 180116. Is there some semantic reason that the increments aren't allowed to be combined, or is this a missed optimization opportunity in LLVM? Thanks, -- Scott
Hey Scott, On Wed, Apr 24, 2013 at 1:40 PM, Scott Pakin <pakin at lanl.gov> wrote: ...> > Is there some semantic reason that the increments aren't allowed to be > combined, or is this a missed optimization opportunity in LLVM? > >I believe that the wildcard is the extern keyword. Since the external symbol isn't resolved until link time, I suspect that it would be a legal C program to do something like (maybe the language lawyers know better though): XXX> cat test.c extern int x; int kung( ) { return x; } XXX> cat foo.c extern int kung(); volatile int x; int main() { x = 0; return kung(); } Ugly programming, but I see no way of the linker having enough information to determine that the extern should be volatile and issue an warning/error. One more reason to use explicit barriers instead of volatile. ;) -Cameron -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130424/1eca8f0d/attachment.html>
Krzysztof Parzyszek
2013-Apr-24 18:48 UTC
[LLVMdev] Another missed optimization opportunity?
On 4/24/2013 1:35 PM, Cameron McInally wrote:> > I believe that the wildcard is the extern keyword. > > Since the external symbol isn't resolved until link time, I suspect that > it would be a legal C program to do something like (maybe the language > lawyers know better though): > > XXX> cat test.c > extern int x; > > int kung( ) { > return x; > } > > XXX> cat foo.c > extern int kung(); > > volatile int x; > > int main() { > x = 0; > return kung(); > } > > Ugly programming, but I see no way of the linker having enough > information to determine that the extern should be volatile and issue an > warning/error.6.7.3 Type qualifiers 5 If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined. If an attempt is made to refer to an object defined with a volatile-qualified type through use of an lvalue with non-volatile-qualified type, the behavior is undefined. -Krzysztof -- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Caldarale, Charles R
2013-Apr-24 19:29 UTC
[LLVMdev] Another missed optimization opportunity?
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Scott Pakin > Subject: [LLVMdev] Another missed optimization opportunity?> I'm doing the equivalent of "myarray[5]++" (on an > "extern int *myarray"), repeated three times:> I had expected the three increments by 1 to > be collapsed into a single increment by 3:> Is there some semantic reason that the increments aren't allowed to be > combined, or is this a missed optimization opportunity in LLVM?Is this a potential aliasing effect? Since myarray is defined as a pointer, not an array, it's theoretically possible that the address therein refers to the same memory location as the pointer itself. - Chuck THIS COMMUNICATION MAY CONTAIN CONFIDENTIAL AND/OR OTHERWISE PROPRIETARY MATERIAL and is thus for use only by the intended recipient. If you received this in error, please contact the sender and delete the e-mail and its attachments from all computers.
On 04/24/2013 01:29 PM, Caldarale, Charles R wrote:> Is this a potential aliasing effect? Since myarray is defined as a pointer, not an array, it's theoretically possible that the address therein refers to the same memory location as the pointer itself.I was thinking along those lines, but I haven't been able to come up with a specific instance of what could possibly be aliased. Here are some additional observations that might be helpful in solving this mystery: 1) The increments are in fact coalesced when myarray is declared as an array instead of a pointer: @myarray = external global [10 x i32] define void @update_array() #0 { %1 = load i32* getelementptr inbounds ([10 x i32]* @myarray, i64 0, i64 5), align 4 %2 = add nsw i32 %1, 3 store i32 %2, i32* getelementptr inbounds ([10 x i32]* @myarray, i64 0, i64 5), align 4 ret void } 2) Both clang and gcc optimize the analogous C code into an increment by 3: extern int *myarray; void update_array (void) { myarray[5]++; myarray[5]++; myarray[5]++; } This would imply that the issue is not related to semantics. 3) opt *does* optimize the code when myarray is declared as a constant pointer to variable data as opposed to a variable pointer to variable data: @myarray = external constant i32* This would imply that the issue *is* related to semantics. Sigh. -- Scott
Hi Scott, On 24/04/13 19:40, Scott Pakin wrote:> I was suprised to find that some bitcode I'm generating isn't getting > optimized. Here, I'm doing the equivalent of "myarray[5]++" (on an > "extern int *myarray"), repeated three times:does your bitcode contain data layout information? Ciao, Duncan.> > @myarray = external global i32* > > define void @update_array() #0 { > %1 = load i32** @myarray, align 8 > %2 = getelementptr inbounds i32* %1, i64 5 > %3 = load i32* %2, align 4 > %4 = add nsw i32 %3, 1 > store i32 %4, i32* %2, align 4 > %5 = load i32** @myarray, align 8 > %6 = getelementptr inbounds i32* %5, i64 5 > %7 = load i32* %6, align 4 > %8 = add nsw i32 %7, 1 > store i32 %8, i32* %6, align 4 > %9 = load i32** @myarray, align 8 > %10 = getelementptr inbounds i32* %9, i64 5 > %11 = load i32* %10, align 4 > %12 = add nsw i32 %11, 1 > store i32 %12, i32* %10, align 4 > ret void > } > > Running "opt -std-compile-opts" or even "opt -O3" doesn't seem to > change the bitcode any. I had expected the three increments by 1 to > be collapsed into a single increment by 3: > > @myarray = external global i32* > > define void @update_array() #0 { > %1 = load i32** @myarray, align 8 > %2 = load i32* %1, align 4 > %3 = add nsw i32 %2, 3 > store i32 %3, i32* %1, align 4 > ret void > } > > Even the (x86-64) code generator doesn't do any last-minute > optimizations: > > movq myarray(%rip), %rax > incl 20(%rax) > movq myarray(%rip), %rax > incl 20(%rax) > movq myarray(%rip), %rax > incl 20(%rax) > > This is with LLVM revision 180116. > > Is there some semantic reason that the increments aren't allowed to be > combined, or is this a missed optimization opportunity in LLVM? > > Thanks, > -- Scott > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
The semantic reason is that the optimizer is required to assume that the i32 stores could be storing to the storage of myarray. LLVM IR does not permit optimizers to optimize based on the nominal types of memory objects or memory accesses. This gets optimized in C, because the C compiler adds special TBAA metadata annotations to the loads and stores which say that the stores of "int" do not interfere with the loads of "pointer". It also gets optimized if myarray is const, because the optimizer knows that const memory is not modified by stores. It also gets optimized if myarray is an actual array, because then the address of the array is constant, rather than being a value loaded from memory. Dan On Wed, Apr 24, 2013 at 10:40 AM, Scott Pakin <pakin at lanl.gov> wrote:> I was suprised to find that some bitcode I'm generating isn't getting > optimized. Here, I'm doing the equivalent of "myarray[5]++" (on an > "extern int *myarray"), repeated three times: > > @myarray = external global i32* > > define void @update_array() #0 { > %1 = load i32** @myarray, align 8 > %2 = getelementptr inbounds i32* %1, i64 5 > %3 = load i32* %2, align 4 > %4 = add nsw i32 %3, 1 > store i32 %4, i32* %2, align 4 > %5 = load i32** @myarray, align 8 > %6 = getelementptr inbounds i32* %5, i64 5 > %7 = load i32* %6, align 4 > %8 = add nsw i32 %7, 1 > store i32 %8, i32* %6, align 4 > %9 = load i32** @myarray, align 8 > %10 = getelementptr inbounds i32* %9, i64 5 > %11 = load i32* %10, align 4 > %12 = add nsw i32 %11, 1 > store i32 %12, i32* %10, align 4 > ret void > } > > Running "opt -std-compile-opts" or even "opt -O3" doesn't seem to > change the bitcode any. I had expected the three increments by 1 to > be collapsed into a single increment by 3: > > @myarray = external global i32* > > define void @update_array() #0 { > %1 = load i32** @myarray, align 8 > %2 = load i32* %1, align 4 > %3 = add nsw i32 %2, 3 > store i32 %3, i32* %1, align 4 > ret void > } > > Even the (x86-64) code generator doesn't do any last-minute > optimizations: > > movq myarray(%rip), %rax > incl 20(%rax) > movq myarray(%rip), %rax > incl 20(%rax) > movq myarray(%rip), %rax > incl 20(%rax) > > This is with LLVM revision 180116. > > Is there some semantic reason that the increments aren't allowed to be > combined, or is this a missed optimization opportunity in LLVM? > > Thanks, > -- Scott > ______________________________**_________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/**mailman/listinfo/llvmdev<http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130424/32e1523f/attachment.html>
On 04/24/2013 02:14 PM, Duncan Sands wrote:> does your bitcode contain data layout information?Yes: ; ModuleID = 'junk.c' target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" @myarray = external global i32* ; Function Attrs: nounwind uwtable define void @update_array() #0 { %1 = load i32** @myarray, align 8 %2 = getelementptr inbounds i32* %1, i64 5 %3 = load i32* %2, align 4 %4 = add nsw i32 %3, 1 store i32 %4, i32* %2, align 4 %5 = load i32** @myarray, align 8 %6 = getelementptr inbounds i32* %5, i64 5 %7 = load i32* %6, align 4 %8 = add nsw i32 %7, 1 store i32 %8, i32* %6, align 4 %9 = load i32** @myarray, align 8 %10 = getelementptr inbounds i32* %9, i64 5 %11 = load i32* %10, align 4 %12 = add nsw i32 %11, 1 store i32 %12, i32* %10, align 4 ret void } attributes #0 = { nounwind uwtable "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"="true" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "unsafe-fp-math"="false" "use-soft-float"="false" } -- Scott
On 04/24/2013 02:31 PM, Dan Gohman wrote:> The semantic reason is that the optimizer is required to assume that the i32 stores could be storing to the storage of myarray. LLVM IR does not permit optimizers to optimize based on the nominal types of memory objects or memory accesses. > > This gets optimized in C, because the C compiler adds special TBAA metadata annotations to the loads and stores which say that the stores of "int" do not interfere with the loads of "pointer". It also gets optimized if myarray is const, because the optimizer knows that const memory is not modified by stores. It also gets optimized if myarray is an actual array, because then the address of the array is constant, rather than being a value loaded from memory.Ah. Very helpful answer; thanks. -- Scott