Paul Peet via llvm-dev
2016-Feb-08 14:54 UTC
[llvm-dev] Memory Store/Load Optimization Issue (Emulating stack)
Hello,

I am trying to emulate the "stack" as on x86 when using push/pop, so that afterwards I can use LLVM's optimizer passes to simplify the code (reduce junk).

The LLVM IR code:

define { i32, i32, i32 } @test(i32 %foo, i32 %bar, i32 %sp) {
  ; push foo (On "stack")
  %sp_1 = sub i32 %sp, 4
  %sp_1_ptr = inttoptr i32 %sp_1 to i32*
  store i32 %foo, i32* %sp_1_ptr, align 4

  ; push bar
  %sp_2 = sub i32 %sp_1, 4
  %sp_2_ptr = inttoptr i32 %sp_2 to i32*
  store i32 %bar, i32* %sp_2_ptr, align 4

  ; val1 = pop (val1 = bar)
  %sp_3_ptr = inttoptr i32 %sp_2 to i32*
  %val1 = load i32, i32* %sp_3_ptr, align 4
  %sp_3 = add i32 %sp_2, 4

  ; val2 = pop (val2 = foo)
  %sp_4_ptr = inttoptr i32 %sp_3 to i32*
  %val2 = load i32, i32* %sp_4_ptr, align 4
  %sp_4 = add i32 %sp_3, 4

  %ret_1 = insertvalue { i32, i32, i32 } undef, i32 %val1, 0
  %ret_2 = insertvalue { i32, i32, i32 } %ret_1, i32 %val2, 1
  %ret_3 = insertvalue { i32, i32, i32 } %ret_2, i32 %sp_4, 2

  ret { i32, i32, i32 } %ret_3
}

This code "pushes" two values onto the stack and pops them in reverse order, so that "foo" and "bar" are swapped and returned.

After running this through "opt -O2 ./test.ll", I am getting this:

define { i32, i32, i32 } @test(i32 %foo, i32 %bar, i32 %sp) #0 {
  %sp_1 = add i32 %sp, -4
  %1 = zext i32 %sp_1 to i64
  %sp_1_ptr = inttoptr i64 %1 to i32*
  store i32 %foo, i32* %sp_1_ptr, align 4
  %sp_2 = add i32 %sp, -8
  %2 = zext i32 %sp_2 to i64
  %sp_2_ptr = inttoptr i64 %2 to i32*
  store i32 %bar, i32* %sp_2_ptr, align 4
  %val2 = load i32, i32* %sp_1_ptr, align 4
  %ret_1 = insertvalue { i32, i32, i32 } undef, i32 %bar, 0 ; Swapped
  %ret_2 = insertvalue { i32, i32, i32 } %ret_1, i32 %val2, 1 ; Not Swapped (Not optimized; Should be %foo)
  %ret_3 = insertvalue { i32, i32, i32 } %ret_2, i32 %sp, 2
  ret { i32, i32, i32 } %ret_3
}

As you can see, the IR has gained additional code, e.g. the zext. But the main problem here is that val2 hasn't been optimized.
Could anyone give me some hints about what is preventing the second value from being optimized? (My guess would be the zext, because I am using %sp as a 32-bit pointer although the "target" is 64-bit.)

Regards,
Paul
Philip Reames via llvm-dev
2016-Feb-09 23:26 UTC
[llvm-dev] Memory Store/Load Optimization Issue (Emulating stack)
Two points:
- Using inttoptr is a mistake here. GEPs are strongly preferred and provide strictly more aliasing information to the optimizer.
- The zext is a bit weird. I'm not sure where that came from, but I wouldn't bother looking into it until the preceding point is addressed.

In general, you may find these docs useful:
http://llvm.org/docs/Frontend/PerformanceTips.html

Philip
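
On the zext: the LLVM Language Reference specifies that inttoptr zero-extends an integer that is narrower than the pointer size, and a module without a "target datalayout" line falls back to the default layout, in which pointers are 64 bits wide. InstCombine appears to make that extension explicit, which would account for the zext in the -O2 output. The following is a minimal sketch, assuming the emulated machine is 32-bit. The datalayout string is an assumption chosen for illustration and does not come from the original post, and the function name @to_ptr is hypothetical.

```llvm
; Assumption: declare 32-bit pointers ("p:32:32") so that an i32 already has
; pointer width and inttoptr needs no zero extension.
target datalayout = "e-p:32:32"

; Hypothetical helper: the address computation of the first "push" in the thread.
define i32* @to_ptr(i32 %sp) {
  %sp_1 = sub i32 %sp, 4
  %sp_1_ptr = inttoptr i32 %sp_1 to i32*
  ret i32* %sp_1_ptr
}
```

This only concerns the extra zext. It does not by itself give the optimizer the aliasing information that GEPs provide. The typed-pointer syntax (i32*) matches the LLVM version used in this thread.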
Paul Peet via llvm-dev
2016-Feb-10 20:18 UTC
[llvm-dev] Memory Store/Load Optimization Issue (Emulating stack)
Thank you for the hint. I adjusted the code and it works.

The code after replacing inttoptr with getelementptr:

define { i32, i32, i8* } @test(i32 %foo, i32 %bar, i8* %sp) {
entry:
  ; push foo (On "stack")
  %sp_1 = getelementptr i8, i8* %sp, i32 -4
  %sp_1_ptr = bitcast i8* %sp_1 to i32*
  store i32 %foo, i32* %sp_1_ptr, align 4

  ; push bar
  %sp_2 = getelementptr i8, i8* %sp_1, i32 -4
  %sp_2_ptr = bitcast i8* %sp_2 to i32*
  store i32 %bar, i32* %sp_2_ptr, align 4

  ; val1 = pop (val1 = bar)
  %sp_3_ptr = bitcast i8* %sp_2 to i32*
  %val1 = load i32, i32* %sp_3_ptr, align 4
  %sp_3 = getelementptr i8, i8* %sp_2, i32 4

  ; val2 = pop (val2 = foo)
  %sp_4_ptr = bitcast i8* %sp_3 to i32*
  %val2 = load i32, i32* %sp_4_ptr, align 4
  %sp_4 = getelementptr i8, i8* %sp_3, i32 4

  %ret_1 = insertvalue { i32, i32, i8* } undef, i32 %val1, 0
  %ret_2 = insertvalue { i32, i32, i8* } %ret_1, i32 %val2, 1
  %ret_3 = insertvalue { i32, i32, i8* } %ret_2, i8* %sp_4, 2

  ret { i32, i32, i8* } %ret_3
}

After optimization ("opt -instcombine ./code.ll -S"):

define { i32, i32, i8* } @test(i32 %foo, i32 %bar, i8* %sp) {
entry:
  %sp_1 = getelementptr i8, i8* %sp, i64 -4
  %sp_1_ptr = bitcast i8* %sp_1 to i32*
  store i32 %foo, i32* %sp_1_ptr, align 4
  %sp_2 = getelementptr i8, i8* %sp, i64 -8
  %sp_2_ptr = bitcast i8* %sp_2 to i32*
  store i32 %bar, i32* %sp_2_ptr, align 4
  %ret_1 = insertvalue { i32, i32, i8* } undef, i32 %bar, 0
  %ret_2 = insertvalue { i32, i32, i8* } %ret_1, i32 %foo, 1
  %ret_3 = insertvalue { i32, i32, i8* } %ret_2, i8* %sp, 2
  ret { i32, i32, i8* } %ret_3
}

My only questions now are:
- Why can't inttoptr provide the alias information needed to optimize that store/load pair away?
- Would it be possible to make inttoptr provide such alias information?
- I came across MemorySSA while browsing through the LLVM source. Is it possible to use MemorySSA to do this kind of optimization without alias analysis?
- Where in the source should I look for the code that performs this kind of optimization? (Is it instcombine, which uses lib/Analysis/Loads.cpp?)

Regards,
Paul
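
For readers on a current LLVM release: typed pointers such as i8* and i32* have been replaced by the single opaque "ptr" type, so the bitcasts in the GEP version earlier in this message are no longer needed. The following is a sketch of the same push/pop sequence in that syntax, assuming an LLVM version that uses opaque pointers. It is a hand transcription of Paul's function, not output from any tool.

```llvm
; Assumption: an LLVM release with opaque pointers ("ptr").
define { i32, i32, ptr } @test(i32 %foo, i32 %bar, ptr %sp) {
entry:
  ; push foo (On "stack")
  %sp_1 = getelementptr i8, ptr %sp, i32 -4
  store i32 %foo, ptr %sp_1, align 4

  ; push bar
  %sp_2 = getelementptr i8, ptr %sp_1, i32 -4
  store i32 %bar, ptr %sp_2, align 4

  ; val1 = pop (val1 = bar)
  %val1 = load i32, ptr %sp_2, align 4
  %sp_3 = getelementptr i8, ptr %sp_2, i32 4

  ; val2 = pop (val2 = foo)
  %val2 = load i32, ptr %sp_3, align 4
  %sp_4 = getelementptr i8, ptr %sp_3, i32 4

  %ret_1 = insertvalue { i32, i32, ptr } undef, i32 %val1, 0
  %ret_2 = insertvalue { i32, i32, ptr } %ret_1, i32 %val2, 1
  %ret_3 = insertvalue { i32, i32, ptr } %ret_2, ptr %sp_4, 2

  ret { i32, i32, ptr } %ret_3
}
```

Because every address here is derived from %sp through getelementptr with constant offsets, the optimizer still sees the pointer relationships that made the store/load forwarding possible in the typed-pointer version.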