Hi,

It was suggested that I post my question regarding an LLVM 3.4 performance
regression to this mailing list rather than Stack Overflow. So here is
the link:

https://stackoverflow.com/questions/22902034/llvm-3-4-performance-regressed

Thanks :)
Jens

--
Jens Tröger
http://savage.light-speed.de/
Hi,

One reason for the regression might be that the SROA pass is now used
instead of mem2reg; consider replacing mem2reg with SROA in your pipeline.

I also think it's meaningless to talk about performance without actually
enabling any optimization passes. You mention "Inlining failed?" but don't
enable any passes that would inline functions. If performance matters to
you, consider using -O3 or a similar flag.

Best,
Jonas

On Mon, Apr 7, 2014 at 4:05 AM, Jens Tröger <jens.troeger at light-speed.de> wrote:
> Hi,
>
> It was suggested that I post my question regarding a LLVM 3.4 performance
> regression to this mailing list, rather than stackoverflow. So here is
> the link:
>
> https://stackoverflow.com/questions/22902034/llvm-3-4-performance-regressed
>
> Thanks :)
> Jens
>
> --
> Jens Tröger
> http://savage.light-speed.de/
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
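For illustration, the two suggestions above (swap mem2reg for SROA, or enable a full optimization level) could be tried side by side roughly as follows. This is only a sketch: the file names are hypothetical, the pass flags use the legacy opt syntax of the LLVM 3.x era (newer releases spell this -passes=...), and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
# Set DRY_RUN=0 to execute (assumes an LLVM 3.x opt is on PATH).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

IN=merged.bc   # hypothetical input: the merged bitcode module

# optimize OUTPUT FLAGS...: run opt on $IN with the given flags.
optimize() {
    out=$1
    shift
    run opt "$@" "$IN" -o "$out"
}

optimize mem2reg.bc -simplifycfg -mem2reg   # mem2reg-based pipeline
optimize sroa.bc    -simplifycfg -sroa      # same pipeline with SROA swapped in
optimize o3.bc      -O3                     # standard -O3 pipeline (includes inlining)
```

Timing the three resulting modules against each other would show whether the mem2reg/SROA difference accounts for the regression or whether the missing optimization level dominates.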
Hi,

(adding llvm-dev again)

> I did enable optimization, but that didn't have an effect on the runtime
> performance numbers.

Can you elaborate? For a program such as bzip2, I'd expect the program to be
at least twice as fast with -O3 as with -O0.

I also noticed that you use llc in the final step. An alternative that works
well for me is to use the gold linker plugin
<http://llvm.org/docs/GoldPlugin.html>. This way, you can link bitcode
files directly into the program by using

    clang -flto $(LDFLAGS) <bitcode files> -o <output file> $(LDLIBS)

If your bitcode files have the extension .o, clang will only run LTO
optimizations, code generation, and linking. If they have the extension .bc,
it will run a full set of compilation passes (in which case you might want
to add $(CFLAGS) to the command line).

Hope this helps,
Jonas
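As a rough sketch, the LTO link command above could be wrapped in a small script like this one. The variable values and the file names a.o and b.o are placeholders, and the script assumes clang and the gold plugin are installed when run with DRY_RUN=0; by default it only prints the command.

```shell
#!/bin/sh
# Dry run by default: print the command instead of executing it.
# Set DRY_RUN=0 to execute (assumes clang with the gold plugin set up).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Placeholder values; substitute your project's settings.
LDFLAGS=""
LDLIBS=""
OUTPUT=bzip2

# lto_link FILES...: link bitcode files directly into an executable.
# $LDFLAGS and $LDLIBS are left unquoted on purpose so they split into words.
lto_link() {
    run clang -flto $LDFLAGS "$@" -o "$OUTPUT" $LDLIBS
}

lto_link a.o b.o
```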
On Tue, Apr 8, 2014 at 1:05 AM, Jonas Wagner <jonas.wagner at epfl.ch> wrote:
> Hi,
>
> (adding llvm-dev again)
>
>> I did enable optimization, but that didn't have an effect on the runtime
>> performance numbers.
>
> Can you elaborate? For a program such as bzip2, I'd expect the program to be
> at least twice as fast with -O3 than with -O0.
>
> I also noticed that you use LLC in the final step.

I'd find this suspect as well: the correct sequence of passes changes from
time to time (certainly over several revisions). I'd be more inclined to
compare against a normal clang build (all the way to object files, or LTO as
described below); that would rule out any dated pass sequencing you might
have.

> An alternative that works
> well for me is to use the gold linker plugin. This way, you can link bitcode
> files directly into the program by using
>
> clang -flto $(LDFLAGS) <bitcode files> -o <output file> $(LDLIBS)
>
> If your bitcode files have the extension .o, clang will only run LTO
> optimizations, code generation, and linking. If they have the extension .bc,
> it will run a full set of compilation passes (in which case you might want
> to add $(CFLAGS) to the command line).
>
> Hope this helps,
> Jonas
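A baseline of the kind suggested here, letting the clang driver pick the pass sequence itself, might look like the sketch below. The source list and flags are placeholders rather than the real benchmark configuration, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
# Set DRY_RUN=0 to execute (assumes clang is on PATH).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Placeholder, abbreviated source list and flags.
SRCS="spec.c blocksort.c bzip2.c"
CFLAGS="-std=c89 -D_GNU_SOURCE -DSPEC_CPU -DNDEBUG"

# baseline LEVEL: compile and link with the clang driver at -LEVEL,
# so the driver (not a hand-written opt/llc sequence) chooses the passes.
baseline() {
    run clang "-$1" $CFLAGS $SRCS -o "bzip2.$1"
}

baseline O0
baseline O3
```

Running both binaries on the same input with each LLVM version would separate a genuine compiler regression from an outdated hand-rolled pass sequence.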
Thanks Jonas,

I wasn't aware of the gold linker plugin. Here's what I do in my current
workflow. First, I use clang to compile each .c file (e.g. for the bzip2
benchmark, or any other) into an LLVM bitcode file (named .o here):

  specmake clean 2> make.clean.err | tee make.clean.out
  rm -rf bzip2 bzip2.exe *.o *.fppized.f*
  find . \( -name \*.o -o -name '*.fppized.f*' \) -print | xargs rm -rf
  rm -rf core
  specmake build 2> make.err | tee make.out
  clang -g -std=c89 -D_GNU_SOURCE -c -emit-llvm -c -o spec.o -DSPEC_CPU -DNDEBUG spec.c
  clang -g -std=c89 -D_GNU_SOURCE -c -emit-llvm -c -o blocksort.o -DSPEC_CPU -DNDEBUG blocksort.c
  clang -g -std=c89 -D_GNU_SOURCE -c -emit-llvm -c -o bzip2.o -DSPEC_CPU -DNDEBUG bzip2.c
  bzip2.c:487:27: warning: incompatible pointer to integer conversion assigning to 'int' from 'void *' [-Wint-conversion]
     outputHandleJustInCase = NULL;
                            ^ ~~~~
  bzip2.c:614:27: warning: incompatible pointer to integer conversion assigning to 'int' from 'void *' [-Wint-conversion]
     outputHandleJustInCase = NULL;
                            ^ ~~~~
  2 warnings generated.
  clang -g -std=c89 -D_GNU_SOURCE -c -emit-llvm -c -o bzlib.o -DSPEC_CPU -DNDEBUG bzlib.c
  clang -g -std=c89 -D_GNU_SOURCE -c -emit-llvm -c -o compress.o -DSPEC_CPU -DNDEBUG compress.c
  clang -g -std=c89 -D_GNU_SOURCE -c -emit-llvm -c -o crctable.o -DSPEC_CPU -DNDEBUG crctable.c
  clang -g -std=c89 -D_GNU_SOURCE -c -emit-llvm -c -o decompress.o -DSPEC_CPU -DNDEBUG decompress.c
  clang -g -std=c89 -D_GNU_SOURCE -c -emit-llvm -c -o huffman.o -DSPEC_CPU -DNDEBUG huffman.c
  clang -g -std=c89 -D_GNU_SOURCE -c -emit-llvm -c -o randtable.o -DSPEC_CPU -DNDEBUG randtable.c

Once that's done, the SPEC "linker" actually calls a script of mine which
uses llvm-link to merge all bitcode files into one, and then calls opt.
Ordinarily this opt call would use

  -simplifycfg -mem2reg <my-passes>

At that point I played around with various optimization switches to find out
how I can get my performance back to that of LLVM 3.1 compiled code.
Using just plain -std-compile-opts to replace my command line didn't work.
Once opt has produced an optimized bitcode file, I call llc to lower it.

Cheers,
Jens

--
Jens Tröger
http://savage.light-speed.de/
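The workflow described in this message (merge with llvm-link, optimize with opt, lower with llc) can be summarized in a short script. This is a reconstruction under assumptions, not the actual script: the intermediate file names are made up, the custom passes are omitted, and the final clang step that assembles and links the llc output is assumed, since the message does not say how that is done. By default it only prints the commands.

```shell
#!/bin/sh
# Dry run by default: print each command instead of executing it.
# Set DRY_RUN=0 to execute (assumes LLVM 3.x tools and clang on PATH).
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Legacy opt pass flags from the message, without the custom passes.
PASSES="-simplifycfg -mem2reg"

# pipeline FILES...: merge bitcode, optimize, lower, then link.
pipeline() {
    run llvm-link "$@" -o merged.bc             # merge all bitcode files
    run opt $PASSES merged.bc -o merged.opt.bc  # run the chosen passes
    run llc merged.opt.bc -o merged.s           # lower to assembly
    run clang merged.s -o bzip2                 # assumed: assemble and link
}

pipeline spec.o blocksort.o bzip2.o
```

Changing PASSES to "-simplifycfg -sroa" or "-O3" gives the variants discussed earlier in the thread.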