Attached are the Polyhedron 2005 benchmark results for current llvm/dragonegg svn on x86_64-apple-darwin11 built against Xcode 4.3.2 and FSF gcc 4.6.3. The benchmarks for -msse3 and -msse4 appear identical (at least for degg+optnz). This is fortunate since there seems to be a bug in -msse4 on 2.33 GHz (T7600) Intel Core 2 Duo Merom (http://llvm.org/bugs/show_bug.cgi?id=12434).

Jack

llvm/dragonegg r153877

dragonegg:
de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 %n.f90 -o %n

degg+vectorize:
de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 -fplugin-arg-dragonegg-llvm-option=-vectorize %n.f90 -o %n

degg+optnz:
de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 -fplugin-arg-dragonegg-enable-gcc-optzns %n.f90 -o %n

gfortran:
gfortran-fsf-4.6 -msse3 -ffast-math -funroll-loops -O3 %n.f90 -o %n

Ave Run (secs)
                dragonegg  degg+vectorize  degg+optnz  gfortran
ac                  12.45           12.45        8.85      8.80
aermod              16.15           16.05       14.80     17.48
air                  7.10            7.11        6.46      5.50
capacita            40.00           39.96       37.72     32.62
channel              2.16            2.15        1.99      1.84
doduc               29.13           28.41       27.48     26.74
fatigue              8.75            9.03        8.11      8.44
gas_dyn             11.72           11.80        4.47      4.26
induct              24.02           24.91       12.08     13.65
linpk               15.40           15.78       15.74     15.45
mdbx                11.80           12.22       11.86     11.20
nf                  28.45           28.50       29.25     27.91
protein             38.15           39.26       37.87     32.49
rnflow              32.25           32.35       26.47     24.06
test_fpu            11.34           11.35        9.31      8.04
tftt                 1.91            1.92        1.93      1.87

Geometric Mean      13.50           13.62       11.34     10.87

Compile (secs)
                dragonegg  degg+vectorize  degg+optnz  gfortran
ac                   0.33            0.38        0.72      1.27
aermod              25.91           27.58       32.34     43.91
air                  1.07            1.25        1.52      2.25
capacita             0.49            0.52        0.89      1.71
channel              0.29            0.36        0.50      0.62
doduc                1.71            4.50        3.25      5.34
fatigue              0.84            0.97        1.19      1.76
gas_dyn              0.67            0.68        1.20      3.02
induct               1.60            2.14        2.82      3.99
linpk                0.22            0.24        0.47      0.78
mdbx                 0.63            0.77        1.16      1.85
nf                   0.37            0.40        0.70      1.66
protein              0.93            1.02        1.75      4.01
rnflow               1.20            1.25        2.63      5.44
test_fpu             0.88            0.92        2.13      4.39
tftt                 0.21            0.24        0.34      0.56

Executable (bytes)
                dragonegg  degg+vectorize  degg+optnz  gfortran
ac                  26856           26856       39120     50968
aermod            1043700         1055988     1046288   1265640
air                 62004           62004       53740     73988
capacita            41416           41416       45552     73896
channel             22808           22808       26768     34784
doduc              128448          128448      136996    197240
fatigue             69824           69824       69840     86080
gas_dyn             59112           59112       67416    119744
induct             163152          167248      167344    174976
linpk               18752           18752       27056     38648
mdbx                53692           53692       57884     82112
nf                  23960           23960       32104     71800
protein             75032           75032       87208    132040
rnflow              71896           71896       96632    181120
test_fpu            54272           54272       78776    155072
tftt                18640           18640       18488     30768
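As a cross-check on the "Geometric Mean" row of the run-time table, here is a minimal Python sketch that recomputes it from the per-benchmark averages. The names `run_secs`, `COLUMNS` and `geometric_mean` are illustrative only and are not part of the Polyhedron harness; the numbers are transcribed from the table above.

```python
import math

# Column order used for every tuple below.
COLUMNS = ("dragonegg", "degg+vectorize", "degg+optnz", "gfortran")

# Average run times in seconds, transcribed from the "Ave Run (secs)" table.
run_secs = {
    "ac":       (12.45, 12.45, 8.85, 8.80),
    "aermod":   (16.15, 16.05, 14.80, 17.48),
    "air":      (7.10, 7.11, 6.46, 5.50),
    "capacita": (40.00, 39.96, 37.72, 32.62),
    "channel":  (2.16, 2.15, 1.99, 1.84),
    "doduc":    (29.13, 28.41, 27.48, 26.74),
    "fatigue":  (8.75, 9.03, 8.11, 8.44),
    "gas_dyn":  (11.72, 11.80, 4.47, 4.26),
    "induct":   (24.02, 24.91, 12.08, 13.65),
    "linpk":    (15.40, 15.78, 15.74, 15.45),
    "mdbx":     (11.80, 12.22, 11.86, 11.20),
    "nf":       (28.45, 28.50, 29.25, 27.91),
    "protein":  (38.15, 39.26, 37.87, 32.49),
    "rnflow":   (32.25, 32.35, 26.47, 24.06),
    "test_fpu": (11.34, 11.35, 9.31, 8.04),
    "tftt":     (1.91, 1.92, 1.93, 1.87),
}


def geometric_mean(values):
    """Geometric mean computed as exp of the mean of the logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# One geometric mean per compiler configuration (per table column).
means = {
    name: geometric_mean(row[i] for row in run_secs.values())
    for i, name in enumerate(COLUMNS)
}

for name in COLUMNS:
    print(f"{name:>15}: {means[name]:.2f}")
```

The geometric mean is the usual choice here because the benchmarks differ widely in absolute run time (about 2 secs for tftt against about 40 secs for capacita), and it weights relative changes equally across them.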
Anton Korobeynikov
2012-Apr-03 07:03 UTC
[LLVMdev] pb05 results for current llvm/dragonegg
Hi Jack

>                 dragonegg  degg+vectorize  degg+optnz  gfortran
> ac                  12.45           12.45        8.85      8.80
> gas_dyn             11.72           11.80        4.47      4.26
> induct              24.02           24.91       12.08     13.65
> rnflow              32.25           32.35       26.47     24.06

Any idea what might cause such differences here?

--
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University
Hi Jack,

> Attached are the Polyhedron 2005 benchmark results for current llvm/dragonegg svn
> on x86_64-apple-darwin11 built against Xcode 4.3.2 and FSF gcc 4.6.3.

thanks for the numbers. How does this compare to LLVM 3.0 - were there any regressions?

Ciao, Duncan.
Hi Anton,

>>                 dragonegg  degg+vectorize  degg+optnz  gfortran
>> ac                  12.45           12.45        8.85      8.80
>> gas_dyn             11.72           11.80        4.47      4.26
>> induct              24.02           24.91       12.08     13.65
>> rnflow              32.25           32.35       26.47     24.06
> Any idea what might cause such differences here?

I haven't analysed these, but as a general remark: if "degg+optnz" does much better than "dragonegg" then that indicates a weakness in LLVM's IR level optimizers, while if "gfortran" does much better than "degg+optnz" then that indicates a weakness in LLVM's codegen. Applying this to the above suggests that most of the differences are coming from LLVM's IR level optimizers not doing a good job somewhere.

Ciao, Duncan.
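The rule of thumb in this message can be sketched as a small Python check over the four quoted rows. The 1.15 cutoff for "much better" is an assumption made for illustration (the message gives no number), and `diagnose` is a hypothetical helper name.

```python
# (dragonegg, degg+optnz, gfortran) run times in seconds, from the quoted rows.
quoted = {
    "ac":      (12.45, 8.85, 8.80),
    "gas_dyn": (11.72, 4.47, 4.26),
    "induct":  (24.02, 12.08, 13.65),
    "rnflow":  (32.25, 26.47, 24.06),
}

# Assumed cutoff for "does much better"; not taken from the thread.
THRESHOLD = 1.15


def diagnose(dragonegg, degg_optnz, gfortran, threshold=THRESHOLD):
    """Apply the heuristic: which LLVM stage looks weak for one benchmark?"""
    findings = []
    # GCC optimizers + LLVM codegen beats LLVM optimizers + LLVM codegen:
    # the gap is attributed to LLVM's IR level optimizers.
    if dragonegg / degg_optnz > threshold:
        findings.append("IR optimizers")
    # Pure GCC beats GCC optimizers + LLVM codegen:
    # the gap is attributed to LLVM's codegen.
    if degg_optnz / gfortran > threshold:
        findings.append("codegen")
    return findings


diagnosis = {name: diagnose(*times) for name, times in quoted.items()}

for name, findings in diagnosis.items():
    print(f"{name:8s} {', '.join(findings) or 'no large gap'}")
```

With this cutoff, all four benchmarks are flagged for the IR level optimizers and none for codegen, which matches the conclusion drawn in the message. A different cutoff could change the verdict for rnflow, where both gaps are comparatively small.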
On Tue, Apr 03, 2012 at 09:26:38AM +0200, Duncan Sands wrote:
> Hi Jack,
>
>> Attached are the Polyhedron 2005 benchmark results for current llvm/dragonegg svn
>> on x86_64-apple-darwin11 built against Xcode 4.3.2 and FSF gcc 4.6.3.
>
> thanks for the numbers. How does this compare to LLVM 3.0 - were there any
> regressions?

The results from just before llvm/dragonegg 3.0 was released are at

http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/044091.html

It does look as if the ac benchmark has regressed from 10.80 sec in llvm/dragonegg 3.0 to 12.45 sec in llvm/dragonegg 3.1. The two runs used slightly different FSF gcc 4.6 releases (4.6.2svn vs 4.6.3), but I would be shocked if that was the origin of the performance regression. The results for -fplugin-arg-dragonegg-enable-gcc-optzns don't seem much improved in llvm 3.1, so I assume little progress was made in eliminating the scalarization of vectorizations in this release. Did we even get any code added to llvm that would allow us to identify instances of these scalarizations through a compiler warning?

Also, the current -fplugin-arg-dragonegg-llvm-option=-vectorize option seems to do almost nothing in terms of vectorization. Do we need to pass any additional flags to actually achieve autovectorization via llvm (in the absence of -ftree-vectorize and -fplugin-arg-dragonegg-enable-gcc-optzns)?

Jack
Hi Anton,

>>                 dragonegg  degg+vectorize  degg+optnz  gfortran
>> ac                  12.45           12.45        8.85      8.80
>> gas_dyn             11.72           11.80        4.47      4.26
>> induct              24.02           24.91       12.08     13.65
>> rnflow              32.25           32.35       26.47     24.06
> Any idea what might cause such differences here?

if I'm reading Jack's latest numbers right, for gas_dyn and induct the difference is mainly due to GCC's vectorizer:

with GCC's vectorizer and other optimizations:
  gas_dyn   4.47
  induct   12.08

without GCC's vectorizer but with GCC's other optimizations:
  gas_dyn  10.02
  induct   20.54

without any GCC optimizations, only LLVM's optimizers:
  gas_dyn  11.72
  induct   24.02

So even without vectorization GCC is doing a better job, but not hugely better.

Ciao, Duncan.
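The three timings per benchmark allow a rough split of the total gain into a vectorizer part and an "other GCC optimizations" part. The Python sketch below does that arithmetic; the key names (`llvm_only`, `gcc_no_vect`, `gcc_full`) and the helper `split_gain` are made up for illustration, and the split assumes the two effects simply add, which is a simplification.

```python
# Run times in seconds from the message above.
timings = {
    "gas_dyn": {"llvm_only": 11.72, "gcc_no_vect": 10.02, "gcc_full": 4.47},
    "induct":  {"llvm_only": 24.02, "gcc_no_vect": 20.54, "gcc_full": 12.08},
}


def split_gain(t):
    """Fraction of the total time saved that each stage accounts for."""
    total = t["llvm_only"] - t["gcc_full"]      # everything GCC's optimizers save
    other = t["llvm_only"] - t["gcc_no_vect"]   # saved without the vectorizer
    vect = t["gcc_no_vect"] - t["gcc_full"]     # additionally saved by vectorizing
    return {"other": other / total, "vectorizer": vect / total}


shares = {name: split_gain(t) for name, t in timings.items()}

for name, s in shares.items():
    print(f"{name:8s} vectorizer {s['vectorizer']:.0%}, other {s['other']:.0%}")
```

On these numbers the vectorizer accounts for the larger share of the saved time in both benchmarks, consistent with the reading that the non-vectorizing GCC optimizations help, but not hugely.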
On 03/04/12 09:03, Anton Korobeynikov wrote:
> Hi Jack
>
>>                 dragonegg  degg+vectorize  degg+optnz  gfortran
>> ac                  12.45           12.45        8.85      8.80
>> gas_dyn             11.72           11.80        4.47      4.26
>> induct              24.02           24.91       12.08     13.65
>> rnflow              32.25           32.35       26.47     24.06
> Any idea what might cause such differences here?
>

With the attached patch to turn x/c into x*(1.0/c) in the code generators if -ffast-math is enabled, "ac" with LLVM optimizers goes from 40% slower to 5% slower when compared to "ac" compiled with the GCC optimizers. Currently LLVM does very little in the way of -ffast-math optimizations. There's clearly a lot of room for improvement here.

Ciao, Duncan.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: recip.diff
Type: text/x-patch
Size: 1238 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120406/14e208d9/attachment.bin>
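The contents of recip.diff are not shown in the archive, so the following is not the patch. It is only a small Python sketch of why rewriting x/c as x*(1.0/c) is gated on -ffast-math: with IEEE double arithmetic the two forms can round differently unless the reciprocal of c is exactly representable. The function names are illustrative.

```python
def divide(x, c):
    """Reference form: a single correctly rounded division."""
    return x / c


def multiply_by_reciprocal(x, c):
    """Rewritten form: the reciprocal is rounded once, then the product again."""
    recip = 1.0 / c  # for a constant c a compiler could fold this at compile time
    return x * recip


# c = 4.0 is a power of two, so 1.0 / c is exact and both forms agree.
exact_case = (divide(3.0, 4.0), multiply_by_reciprocal(3.0, 4.0))

# c = 10.0 has no exact binary reciprocal, so the two roundings can differ.
inexact_case = (divide(3.0, 10.0), multiply_by_reciprocal(3.0, 10.0))

print("c = 4.0 :", exact_case, exact_case[0] == exact_case[1])
print("c = 10.0:", inexact_case, inexact_case[0] == inexact_case[1])
```

The multiplication form is attractive because floating-point division is typically much slower than multiplication on hardware of this era, but since the result can differ in the last bit it is only valid when the user has opted into relaxed floating-point semantics.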