The Polyhedron 2005 benchmark results for dragonegg svn at r141492
using FSF gcc 4.6.2svn measured on x86_64-apple-darwin11 are listed below.
The benchmarks used the optimization flags...

-msse4 -ffast-math -funroll-loops -O3

in all cases. The use of -fplugin-arg-dragonegg-enable-gcc-optzns to allow
for autovectorization from the FSF gcc front-end only produces a single run-time
regression, fatigue, which is PR10892.

Run time

Benchmark   gfortran  dragonegg  dragonegg+optnz
------------------------------------------------
ac              8.81      10.83       8.89
aermod         18.21      16.77      15.79
air             5.51       7.12       6.66
capacita       32.59      42.30      36.53
channel         1.84       2.52       1.95
doduc          26.78      30.24      27.95
fatigue         8.49       9.25       0.01 *
gas_dyn         4.27      11.68       4.44
induct         13.66      24.02      12.19
linpk          15.24      15.56      15.77
mdbx           11.21      12.00      11.75
nf             28.01      28.78      29.27
protein        32.64      38.29      36.43
rnflow         24.01      31.96      26.46
test_fpu        8.03      11.49       9.35
tfft            1.87       1.92       1.93

Compile time

Benchmark   gfortran  dragonegg  dragonegg+optnz
------------------------------------------------
ac              1.19       0.28       1.52
aermod         37.94      20.05      26.46
air             2.18       1.02       1.46
capacita        1.75       0.50       0.92
channel         0.60       0.22       0.40
doduc           5.25       1.57       3.07
fatigue         1.74       0.89       1.22
gas_dyn         2.97       0.66       1.19
induct          3.91       1.73       2.89
linpk           0.77       0.19       0.45
mdbx            1.81       0.61       1.13
nf              1.80       0.33       0.72
protein         3.97       0.95       1.76
rnflow          5.43       1.26       2.62
test_fpu        4.32       0.96       2.06
tfft            0.54       0.18       0.31
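To make the run-time comparison above easier to read, here is a minimal sketch that turns the posted run times into ratios relative to gfortran. The numbers are copied from the run-time table; the names RUN_TIMES and ratios_to_gfortran are illustrative only and are not part of dragonegg or the Polyhedron suite. The fatigue row is left out because its dragonegg+optnz entry is flagged with an asterisk in the table.

```python
# Run times copied from the "Run time" table in the message above:
# benchmark -> (gfortran, dragonegg, dragonegg+optnz).
# fatigue is omitted because its dragonegg+optnz entry is flagged with "*".
RUN_TIMES = {
    "ac": (8.81, 10.83, 8.89),
    "aermod": (18.21, 16.77, 15.79),
    "air": (5.51, 7.12, 6.66),
    "capacita": (32.59, 42.30, 36.53),
    "channel": (1.84, 2.52, 1.95),
    "doduc": (26.78, 30.24, 27.95),
    "gas_dyn": (4.27, 11.68, 4.44),
    "induct": (13.66, 24.02, 12.19),
    "linpk": (15.24, 15.56, 15.77),
    "mdbx": (11.21, 12.00, 11.75),
    "nf": (28.01, 28.78, 29.27),
    "protein": (32.64, 38.29, 36.43),
    "rnflow": (24.01, 31.96, 26.46),
    "test_fpu": (8.03, 11.49, 9.35),
    "tfft": (1.87, 1.92, 1.93),
}


def ratios_to_gfortran(times):
    """Return benchmark -> (dragonegg / gfortran, dragonegg+optnz / gfortran)."""
    return {
        name: (plain / ref, optzns / ref)
        for name, (ref, plain, optzns) in times.items()
    }


if __name__ == "__main__":
    # Print one line per benchmark: name, then the two ratios.
    for name, (plain, optzns) in sorted(ratios_to_gfortran(RUN_TIMES).items()):
        print(f"{name:10s} {plain:5.2f} {optzns:5.2f}")
```

A ratio above 1.0 means the dragonegg build took longer to run than the gfortran build of the same benchmark; a ratio below 1.0 means it was faster.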
Hi Jack,

> The Polyhedron 2005 benchmark results for dragonegg svn at r141492
> using FSF gcc 4.6.2svn measured on x86_64-apple-darwin11 are listed below.
> The benchmarks used the optimization flags...
>
> -msse4 -ffast-math -funroll-loops -O3
>
> in all cases. The use of -fplugin-arg-dragonegg-enable-gcc-optzns to allow
> for autovectorization from the FSF gcc front-end only produces a single run-time
> regression, fatigue, which is PR10892.

thanks for these numbers. I suggest you also try -O4. This does heavier LLVM
optimization when used with -fplugin-arg-dragonegg-enable-gcc-optzns, and
typically seems to result in faster code. You can also use -O6, which does even
more LLVM optimizing, but seems to slow things down (I haven't analysed why yet).

Ciao, Duncan.

PS: With -fplugin-arg-dragonegg-enable-gcc-optzns the LLVM optimizers are run at
the following levels:

Command line option    LLVM optimizers run at
-------------------    ----------------------
-O1                    tiny amount of optimization
-O2 or -O3             -O1
-O4 or -O5             -O2
-O6 or better          -O3
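The mapping in the PS above can be written down as a small lookup. This is only a sketch of the table as posted, not dragonegg's actual code: the function name llvm_level_for is hypothetical, and because the table says nothing about levels below -O1, the sketch simply rejects them.

```python
def llvm_level_for(gcc_opt_level: int) -> str:
    """Map a command line -O<n> level to the LLVM level listed in the PS table.

    Applies only when -fplugin-arg-dragonegg-enable-gcc-optzns is in use.
    Levels below 1 are not covered by the table, so they are rejected here
    (an assumption of this sketch, not documented behaviour).
    """
    if gcc_opt_level < 1:
        raise ValueError("the table only covers -O1 and above")
    if gcc_opt_level == 1:
        return "tiny amount of optimization"
    if gcc_opt_level <= 3:   # -O2 or -O3
        return "-O1"
    if gcc_opt_level <= 5:   # -O4 or -O5
        return "-O2"
    return "-O3"             # -O6 or better


if __name__ == "__main__":
    for level in range(1, 8):
        print(f"-O{level}: {llvm_level_for(level)}")
```

Read this way, the -O3 used for the benchmark runs corresponds to LLVM's -O1, which is why -O4 is suggested as the next thing to try.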
On Oct 8, 2011, at 12:05 PM, Duncan Sands wrote:

> PS: With -fplugin-arg-dragonegg-enable-gcc-optzns the LLVM optimizers are run at
> the following levels:
>
> Command line option    LLVM optimizers run at
> -------------------    ----------------------
> -O1                    tiny amount of optimization
> -O2 or -O3             -O1
> -O4 or -O5             -O2
> -O6 or better          -O3

Hi Duncan,

Out of curiosity, why do you follow this approach? People generally use -O2 or
-O3. I'd recommend switching dragonegg to line those up with whatever you want
people to use.

-Chris
Duncan, et al.,

I am interested in getting dragonegg to work on PowerPC. Obviously the
stuff in src/x86 needs to be replaced/replicated for PowerPC, but if you
have a few minutes, can you provide your thoughts on what has to be
changed between x86 and PPC?

Thanks in advance,
Hal

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
Hi Hal,

> I am interested in getting dragonegg to work on PowerPC. Obviously the
> stuff in src/x86 needs to be replaced/replicated for PowerPC, but if you
> have a few minutes, can you provide your thoughts on what has to be
> changed between x86 and PPC.

you should probably start by doing this: copy gcc/config/rs6000/llvm-rs6000.cpp
to (in the dragonegg source) src/ppc/Target.cpp. Extract the LLVM bits of
rs6000.h into include/ppc/dragonegg/Target.h. Be inspired by the corresponding
x86 Target.cpp and Target.h. Try to compile.

Ciao, Duncan.