Dear all,

I am doing a few experiments to understand optimization phase interactions. Here is a brief description of my experiments.

1. I picked the list of machine-independent optimizations acting on LLVM IR (those that are enabled at -O3).
2. For each optimization in the optimization list:
   a) Compiled the program using 'clang -c -O0 -flto program.c'
   b) opt -optimization program.o -o optprogram.o
   c) llc optprogram.o
   d) gcc optprogram.o.s
   e) Measured the performance of the generated executable.
3. For each optimization pair [opt1, opt2] from the optimization list:
   a) Compiled the program using 'clang -c -O0 -flto program.c'
   b) opt -opt1 -opt2 program.o -o optprogram.o
   c) llc optprogram.o
   d) gcc optprogram.o.s
   e) Measured the performance of the generated executable.

My intention is to understand or model phase interactions by observing this data together with the corresponding program's static/dynamic features. However, I couldn't glean much information from this data: in almost all cases there is no change in runtime compared to -O0, except for a few programs where gvn and loop-rotate improved performance to some extent. The 'scalarrepl' optimization is an exception: it almost consistently improved program performance, and in fact it nearly matches the program's -O3 performance.

Can someone enlighten me about what is happening? Is there anything wrong with my experimental setup?

Thank you
-Suresh
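For reference, a minimal shell sketch of the per-pass measurement loop in step 2 might look like the following. It assumes a 2011-era LLVM toolchain in which 'clang -c -O0 -flto' emits LLVM bitcode, uses an illustrative subset of passes, and relies on a hypothetical ./time_run.sh timing harness; all file names are placeholders.

#!/bin/bash
# Per-pass measurement loop for step 2 (sketch only).
# Assumes 'clang -c -O0 -flto' emits LLVM bitcode, and that
# ./time_run.sh is a hypothetical harness that runs and times a binary.

clang -c -O0 -flto program.c -o program.bc

for pass in -gvn -loop-rotate -scalarrepl -licm; do   # illustrative subset of the -O3 passes
    name=${pass#-}                                    # strip the leading dash, e.g. "gvn"
    opt $pass program.bc -o program.$name.bc          # apply a single IR pass
    llc program.$name.bc -o program.$name.s           # lower bitcode to assembly
    gcc program.$name.s -o program.$name              # assemble and link
    ./time_run.sh ./program.$name                     # measure the executable
done

The pairwise runs in step 3 would follow the same pattern, with two pass flags passed to the single opt invocation.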
On 19 June 2011 14:44, Suresh Purini <suresh.purini at gmail.com> wrote:
> Can someone enlighten me about what is happening? Is there anything
> wrong with my experimental setup?

In short: it doesn't really make sense to run most of the optimizations before running -scalarrepl (or -mem2reg), and it makes even less sense to leave it out entirely.

Almost all optimizations assume that -scalarrepl (or the less aggressive -mem2reg) has been run first. If neither has been run, then all variables are still stored on the stack, meaning they have to be loaded before each use and stored when they change. That makes it hard for other optimizations to see what's really happening, because they usually consider every load to be a separate value.

A better setup might be to always run -scalarrepl (or -mem2reg) before 2b/3b. Running it in a separate opt invocation allows you to save some cycles by pre-calculating it once.
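To make the suggested setup concrete, a sketch of the adjusted loop might look like this (again with placeholder file names and an illustrative pass list; -scalarrepl could be swapped for -mem2reg):

# Canonicalize once with -scalarrepl, then run each pass under study
# on the pre-cleaned bitcode (file names are placeholders).
clang -c -O0 -flto program.c -o program.bc
opt -scalarrepl program.bc -o program.base.bc      # done once, reused below

for pass in -gvn -loop-rotate -licm; do
    name=${pass#-}                                 # e.g. "gvn"
    opt $pass program.base.bc -o program.$name.bc
    llc program.$name.bc -o program.$name.s
    gcc program.$name.s -o program.$name
    # ... time ./program.$name as before ...
done

This way every measured configuration starts from the same promoted-to-registers baseline, so differences between passes reflect the passes themselves rather than the stack-heavy -O0 code.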
How are you collecting the static/dynamic features?

On Sun, Jun 19, 2011 at 6:08 AM, Frits van Bommel <fvbommel at gmail.com> wrote:
> A better setup might be to always run -scalarrepl (or -mem2reg) before
> 2b/3b. Running it in a separate opt invocation allows you to save some
> cycles by pre-calculating it once.
> [...]

--
Sameer Kulkarni
My present email load: http://courteous.ly/Ok2EKh
Work: www.cis.udel.edu/~skulkarn/