Caldarale, Charles R wrote:
> You have not provided us with the declaration for f(). Unless its
> argument is marked with the nocapture attribute, the compilation of g()
> cannot assume that f() has not retained a pointer to the x struct and is
> using it in the second call.

Thanks a lot for the input. Yes, I forgot to do that. The C function
declaration would have been

   void f( struct a_b *p);

which compiled into

   declare void @f(%struct.a_b*) #2

with

   attributes #2 = { "disable-tail-calls"="false"
   "less-precise-fpmad"="false" "no-frame-pointer-elim"="true"
   "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false"
   "no-nans-fp-math"="false" "stack-protector-buffer-size"="8"
   "target-cpu"="core2" "target-features"="+cx16,+sse,+sse2,+sse3,+ssse3"
   "unsafe-fp-math"="false" "use-soft-float"="false" }

---

I could not figure out how to decorate my C code to emit the nocapture
attribute; __attribute(( nocapture)) is unknown. So I tried to modify the IR
code by hand to read:

   declare void @f(%struct.a_b* nocapture) #1

But in the end it didn't make a difference. When I compiled it with

   ../llvm-build.d/bin/llc -O3 -o test-combine-alloca.s test-combine-alloca.ir

it still used two allocas.

From a C perspective, I find it weird that it should concern the caller if
the called function "mistakenly" holds onto an alloca buffer that will be
invalid soon anyway. But I guess that's C++ magic somehow :)

Ciao
   Nat!

----
; ModuleID = 'test-combine-alloca.c'
target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.10.0"

%struct.a_b = type { i32, i32 }

declare void @f(%struct.a_b* nocapture) #1

; Function Attrs: nounwind ssp uwtable
define void @g() #0 {
entry:
  %x = alloca %struct.a_b, align 4
  %y = alloca %struct.a_b, align 4
  %a = getelementptr inbounds %struct.a_b, %struct.a_b* %x, i32 0, i32 0
  store i32 1, i32* %a, align 4
  %b = getelementptr inbounds %struct.a_b, %struct.a_b* %x, i32 0, i32 1
  store i32 2, i32* %b, align 4
  call void @f(%struct.a_b* %x)
  %a1 = getelementptr inbounds %struct.a_b, %struct.a_b* %y, i32 0, i32 0
  store i32 1, i32* %a1, align 4
  %b2 = getelementptr inbounds %struct.a_b, %struct.a_b* %y, i32 0, i32 1
  store i32 3, i32* %b2, align 4
  call void @f(%struct.a_b* %y)
  ret void
}

attributes #0 = { nounwind ssp uwtable "disable-tail-calls"="false"
"less-precise-fpmad"="false" "no-frame-pointer-elim"="true"
"no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false"
"no-nans-fp-math"="false" "stack-protector-buffer-size"="8"
"target-cpu"="core2" "target-features"="+cx16,+sse,+sse2,+sse3,+ssse3"
"unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "disable-tail-calls"="false" "less-precise-fpmad"="false"
"no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"
"no-infs-fp-math"="false" "no-nans-fp-math"="false"
"stack-protector-buffer-size"="8" "target-cpu"="core2"
"target-features"="+cx16,+sse,+sse2,+sse3,+ssse3" "unsafe-fp-math"="false"
"use-soft-float"="false" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"PIC Level", i32 2}
!1 = !{!"clang version 3.7.0 (http://llvm.org/git/clang.git 36ba449caa88f710520cdce148457e5a75e9dabc) (http://llvm.org/git/llvm.git dccade93466c50834dbaa5f4dabb81e90d768c40)"}
----

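Charles's point about a retained pointer can be made concrete with a small C sketch. It assumes a hypothetical capturing f() (the thread never shows f's body) and a struct with two int fields, matching the { i32, i32 } type in the IR. The names retained and b_seen_via_retained are invented for illustration. Because x and y both have function scope in g(), x is still alive during the second call, so reading it through the retained pointer is valid C and must yield 2. If the two slots were merged, the same read would see y's value instead.

```c
#include <assert.h>
#include <stddef.h>

struct a_b { int a; int b; };

/* Hypothetical capturing callee: remembers the pointer from its first call. */
static const struct a_b *retained = NULL;
static int b_seen_via_retained = -1;

void f(struct a_b *p)
{
    if (retained == NULL) {
        retained = p;                       /* first call: capture &x */
    } else {
        /* Second call: x is still alive in g(), so this read is valid. */
        b_seen_via_retained = retained->b;
        assert(retained != p);              /* x and y are distinct live objects */
    }
}

void g(void)
{
    struct a_b x;
    struct a_b y;

    x.a = 1;
    x.b = 2;
    f(&x);

    y.a = 1;
    y.b = 3;
    f(&y);
}
```

After a call to g(), b_seen_via_retained holds 2, the value stored in x.b. That observable result is what stops the compiler from overlapping x and y unless it knows f() does not keep the pointer or that x's lifetime has ended.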
Björn Steinbrink via llvm-dev
2015-Aug-31 13:32 UTC
[llvm-dev] alloca combining, not (yet) possible ?
Hi Nat,

LLVM currently only performs stack coloring to merge allocas if you use
lifetime intrinsics to tell it exactly where the lifetimes of the allocas
start and end. With your code, the lifetimes of both x and y cover the
entire function.

Introducing a lexical scope to limit the lifetime of x gives clang the
necessary information to emit the lifetime.end intrinsic, and declaring y
after that scope makes it emit the lifetime.start intrinsic appropriately
as well.

struct a_b {
  long a;
  long b;
};

void f(struct a_b*);

void g(void) {
  { // Lifetime of x starts here
    struct a_b x;
    x.a = 1;
    x.b = 2;
    f(&x);
  } // Lifetime of x ends here

  // Lifetime of y starts here
  struct a_b y;
  y.a = 1;
  y.b = 3;
  f(&y);
  // Lifetime of y ends here
}

It would be nice if LLVM could do this for non-escaping allocas without
the need for those intrinsics, but currently, this is the way to go.

Cheers,
Björn

2015-08-31 15:21 GMT+02:00 Nat! via llvm-dev <llvm-dev at lists.llvm.org>:
> But in the end, it didn't make a difference, when I compiled it with
>
> ../llvm-build.d/bin/llc -O3 -o test-combine-alloca.s test-combine-alloca.ir
>
> it still used two allocas.

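One way to check whether the scoped version really shares a stack slot is to let f() record the address it receives. The following is a minimal sketch, not part of the original message: the f() body and the helper names seen_addr, seen_b, calls and slots_were_shared are assumptions made for illustration. Whether the two addresses coincide depends on the compiler, its version and the optimization level, so the helper only reports what happened and guarantees nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct a_b { long a; long b; };

/* Hypothetical bookkeeping, filled in by f() on each call. */
static uintptr_t seen_addr[2];
static long seen_b[2];
static int calls = 0;

void f(struct a_b *p)
{
    assert(p != NULL);
    if (calls < 2) {
        seen_addr[calls] = (uintptr_t)p;   /* converted while *p is alive */
        seen_b[calls] = p->b;
    }
    calls++;
}

void g(void)
{
    {   /* x lives only inside this block */
        struct a_b x;
        x.a = 1;
        x.b = 2;
        f(&x);
    }

    struct a_b y;                          /* declared after x's scope ends */
    y.a = 1;
    y.b = 3;
    f(&y);
}

/* Returns 1 if both calls received the same stack address, 0 otherwise. */
int slots_were_shared(void)
{
    return calls == 2 && seen_addr[0] == seen_addr[1];
}
```

Calling g() and then slots_were_shared() from a test program, once at -O0 and once with optimization enabled, shows whether the merge happened for a given toolchain.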
Björn Steinbrink wrote:
> ...
>   // Lifetime of y starts here
>   struct a_b y;
>   y.a = 1;
>   y.b = 3;
>   f(&y);
>   // Lifetime of y ends here
> }

Nice, thanks very much. This does the alloca combining (even without
having to specify "nocapture"). Wrapping my clang output with lifetime
calls shouldn't be a problem. The code that does that optimization is, I
assume:

   http://www.llvm.org/docs/doxygen/html/StackColoring_8cpp_source.html

I would like to take the alloca combining a step further still: combining
allocas across functions, at least on tail calls.

My current idea would be to

* invent an attribute to mark my parameter. Let's say "reusealloca"
* at the beginning of the optimization pass, collect all parameters
  marked reusealloca and place them in the alloca map with lifetimes
  ending before the tail call (figure out how to find it)

---
void h( struct a_b *p);

void g( struct a_b __attribute((reusealloca)) *x)
{
   struct a_b y;   // unneeded, use space provided by x

   y.a = 18;
   y.b = x->b;     // unneeded, &y.b == &x->b

   h( &y);
}

void f( void)
{
   struct a_b x;

   x.a = 1;
   x.b = 3;

   g( &x);
}
---

Does that sound feasible?

Ciao
   Nat!
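For comparison, here is what the proposed transformation would amount to if applied by hand. This is only a sketch under assumptions: "reusealloca" is the hypothetical attribute from the proposal above and exists in neither clang nor LLVM, the struct is assumed to have two int fields, and the h() body and the h_saw_* variables are invented to make the effect observable. The rewrite is only legal if the caller never reads x again after calling g(), which is presumably what the attribute would have to promise.

```c
#include <assert.h>
#include <stddef.h>

struct a_b { int a; int b; };

/* Hypothetical observer state, filled in by h(). */
static int h_saw_a;
static int h_saw_b;
static const struct a_b *h_saw_ptr;
static int h_used_callers_slot;

void h(struct a_b *p)
{
    assert(p != NULL);
    h_saw_a = p->a;
    h_saw_b = p->b;
    h_saw_ptr = p;
}

/* Hand-transformed g(): no local y, the caller's storage is reused. */
void g(struct a_b *x)
{
    x->a = 18;   /* was: y.a = 18 */
                 /* y.b = x->b disappears, since &y.b == &x->b */
    h(x);        /* tail call; x still points into f()'s live frame */
}

void f(void)
{
    struct a_b x;

    x.a = 1;
    x.b = 3;
    g(&x);

    /* Compare while x is still alive: did h() get f()'s own slot? */
    h_used_callers_slot = (h_saw_ptr == &x);
}
```

After f() runs, h() has seen a == 18 and b == 3 through the very slot that f() allocated, so g() needs no stack storage of its own for y.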