Duncan,
With the commit from
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20121203/158488.html
the Polyhedron 2005 benchmarks complete again on x86_64-apple-darwin12. The results are similar to what
was seen with FSF gcc 4.6.2svn and llvm/dragonegg 3.0 (which was the last release that passed pb05):
http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/044091.html
Jack

ps Has an exhaustive effort been made yet to ensure that llvm/dragonegg isn't still unnecessarily scalarizing
the vector code generated by FSF gcc? If that issue were completely solved, llvm/dragonegg might become faster
than vanilla FSF gcc.

FSF gcc 4.7.2 with llvm/dragonegg 3.2 branch

a) de-gfortran47 -msse3 -ffast-math -funroll-loops -O3 %n.f90 -o %n
b) de-gfortran47 -msse3 -ffast-math -funroll-loops -O3 -fplugin-arg-dragonegg-enable-gcc-optzns %n.f90 -o %n
c) gfortran-fsf-4.7 -msse3 -ffast-math -funroll-loops -O3 %n.f90 -o %n

Run time (secs)

Benchmark   de-gfortran47   de-gfortran47+optzns   gfortran47
ac                  12.28                   8.02         8.17
aermod              15.88                  14.54        16.49
air                  7.02                   5.42         5.80
capacita            39.97                  34.93        32.53
channel              2.08                   2.10         1.83
doduc               27.17                  27.59        25.71
fatigue              8.75                   7.80         8.31
gas_dyn             12.11                   4.64         3.98
induct              24.03                  11.86        12.11
linpk               15.49                  15.47        15.46
mdbx                11.90                  11.31        11.18
nf                  29.34                  29.67        28.01
protein             36.31                  35.33        31.98
rnflow              27.27                  26.74        24.67
test_fpu            11.31                   9.13         7.91
tfft                 1.93                   1.94         1.86

Geom. Mean          13.27                  11.02        10.64

Compile time (secs)

Benchmark   de-gfortran47   de-gfortran47+optzns   gfortran47
ac                   0.33                   1.63         1.72
aermod              21.20                  29.47        42.25
air                  1.13                   2.66         4.38
capacita             0.51                   1.00         1.85
channel              0.32                   0.52         0.64
doduc                1.79                   3.74         5.84
fatigue              0.91                   1.29         1.93
gas_dyn              0.65                   1.32         3.34
induct               1.73                   2.81         3.93
linpk                0.22                   0.51         0.91
mdbx                 0.64                   1.28         2.09
nf                   0.39                   0.79         2.07
protein              1.11                   1.95         4.30
rnflow               1.25                   2.87         6.32
test_fpu             0.87                   2.25         5.14
tfft                 0.21                   0.35         0.58

Executable (bytes)

Benchmark   de-gfortran47   de-gfortran47+optzns   gfortran47
ac                  26768                  47144        59104
aermod            1039416                1065048      1396928
air                 61924                  65948       110752
capacita            41328                  45424        77904
channel             22720                  26680        34688
doduc              128360                 140564       205304
fatigue             69736                  69800        90224
gas_dyn             58936                  67232       123664
induct             163072                 167296       179064
linpk               18664                  26976        42624
mdbx                53580                  57684        90216
nf                  23864                  36176        84056
protein             74944                  87128       131960
rnflow              71784                  92344       205576
test_fpu            54088                  74520       179448
tfft                18552                  18400        30664
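For readers who want to re-derive the summary row, here is a minimal Python sketch that recomputes the geometric mean of each run-time column and its ratio to the gfortran47 column. The numbers are copied from the run-time table in the message above; the function names (`geometric_mean`, `summarize`) are hypothetical helpers written for this illustration and are not part of any benchmark harness.

```python
import math

# Run times in seconds, copied from the run-time table, in row order
# ac, aermod, air, capacita, channel, doduc, fatigue, gas_dyn, induct,
# linpk, mdbx, nf, protein, rnflow, test_fpu, tfft.
RUN_TIMES = {
    "de-gfortran47": [12.28, 15.88, 7.02, 39.97, 2.08, 27.17, 8.75, 12.11,
                      24.03, 15.49, 11.90, 29.34, 36.31, 27.27, 11.31, 1.93],
    "de-gfortran47+optzns": [8.02, 14.54, 5.42, 34.93, 2.10, 27.59, 7.80, 4.64,
                             11.86, 15.47, 11.31, 29.67, 35.33, 26.74, 9.13, 1.94],
    "gfortran47": [8.17, 16.49, 5.80, 32.53, 1.83, 25.71, 8.31, 3.98,
                   12.11, 15.46, 11.18, 28.01, 31.98, 24.67, 7.91, 1.86],
}


def geometric_mean(values):
    """Geometric mean computed via logarithms to avoid overflow in the product."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def summarize(table, baseline="gfortran47"):
    """Map each column name to (geometric mean, ratio to the baseline column)."""
    means = {name: geometric_mean(vals) for name, vals in table.items()}
    base = means[baseline]
    return {name: (gm, gm / base) for name, gm in means.items()}


if __name__ == "__main__":
    for name, (gm, ratio) in summarize(RUN_TIMES).items():
        print(f"{name:22s} geomean {gm:6.2f}  ratio vs gfortran47 {ratio:5.2f}")
```

A ratio above 1 means the column is slower than plain gfortran on this suite. The same two functions can be applied to the compile-time or executable-size columns.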
Hi Jack, thanks for these numbers.

> ps Has an exhaustive effort been made yet to ensure that llvm/dragonegg isn't still unnecessarily scalarizing
> the vector code generated by FSF gcc?

As far as I know, no effort has been made at all.

> If that issue were completely solved, llvm/dragonegg might become faster
> than vanilla FSF gcc.

Another issue is that, until recently, LLVM didn't have much in the way of
fast-math optimizations. It should be better in 3.3.

Ciao, Duncan.
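As background to the fast-math remark: GCC's -ffast-math permits, among other relaxations, reassociating floating-point operations, which strict IEEE semantics forbid because floating-point addition is not associative. The Python sketch below only illustrates that non-associativity. It does not model what GCC or LLVM actually do, and the helper name `sum_left_to_right` is made up for this example.

```python
def sum_left_to_right(values):
    """Add the values strictly in the given order, as strict FP semantics require."""
    total = 0.0
    for v in values:
        total += v
    return total


# The same three numbers, summed in two different orders.
original_order = [1e20, -1e20, 1.0]
reassociated_order = [1e20, 1.0, -1e20]

if __name__ == "__main__":
    # In the first order the large terms cancel before 1.0 is added.
    print(sum_left_to_right(original_order))
    # In the second order 1.0 is absorbed by 1e20 and lost before the cancellation.
    print(sum_left_to_right(reassociated_order))
```

A compiler that may reorder such sums has more freedom to vectorize and unroll reductions, which is one reason fast-math support can matter for numeric Fortran benchmarks.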
On Mon, Dec 10, 2012 at 05:20:31AM +0100, Duncan Sands wrote:
> Hi Jack, thanks for these numbers.
>
>> ps Has an exhaustive effort been made yet to ensure that llvm/dragonegg isn't still unnecessarily scalarizing
>> the vector code generated by FSF gcc?
>
> As far as I know, no effort has been made at all.

Duncan,
Could you propose a testing patch, for use with llvm/dragonegg trunk, that would emit a warning on each
instance of vector scalarization? I would be happy to file PRs for any such instances found when
compiling the pb05 testsuite with it.
Jack
On Mon, Dec 10, 2012 at 05:20:31AM +0100, Duncan Sands wrote:
> Hi Jack, thanks for these numbers.
>
>> ps Has an exhaustive effort been made yet to ensure that llvm/dragonegg isn't still unnecessarily scalarizing
>> the vector code generated by FSF gcc?
>
> As far as I know, no effort has been made at all.

Duncan,
I tried adding...

Index: lib/Transforms/Vectorize/LoopVectorize.cpp
===================================================================
--- lib/Transforms/Vectorize/LoopVectorize.cpp (revision 169738)
+++ lib/Transforms/Vectorize/LoopVectorize.cpp (working copy)
@@ -714,6 +714,8 @@ void InnerLoopVectorizer::scalarizeInstr
       Cloned->setOperand(op, Op);
     }

+    DEBUG(dbgs() << "LV: scalarizing vector.\n");
+
     // Place the cloned scalar in the new loop.
     Builder.Insert(Cloned);

to llvm svn for a debug build using...

-DCMAKE_BUILD_TYPE=Debug

However, this doesn't produce any output when compiling with dragonegg using...

de-gfortran47 -msse3 -ffast-math -funroll-loops -O3 -fplugin-arg-dragonegg-enable-gcc-optzns aermod.f90 -v -o aermod

Is there something I have to pass through dragonegg in order to trigger the debug output from llvm?
Thanks in advance for any clarification.
Jack
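Some context for the question above: output wrapped in LLVM's DEBUG() macro is normally printed only when debug output is switched on at run time (LLVM's own tools expose this as -debug and -debug-only in assertion-enabled builds). Whether and how dragonegg forwards such options is not settled in this thread. Assuming the message does eventually reach the compiler's stderr and that stream is captured per benchmark, the tallying Jack describes could be sketched in Python as below. The log text and the helper names (`count_scalarizations`, `tally`) are hypothetical and shown only for illustration.

```python
# Marker string taken from the DEBUG() line in the patch above.
MARKER = "LV: scalarizing vector."


def count_scalarizations(log_text, marker=MARKER):
    """Count the lines of captured compiler output that contain the marker."""
    return sum(1 for line in log_text.splitlines() if marker in line)


def tally(logs):
    """Map each benchmark name to its number of scalarization messages.

    `logs` maps a benchmark name to the captured stderr text of its compile.
    """
    return {name: count_scalarizations(text) for name, text in logs.items()}


if __name__ == "__main__":
    # Hypothetical captured output; real logs would come from the compiler runs.
    sample_logs = {
        "aermod": "LV: scalarizing vector.\nother output\nLV: scalarizing vector.\n",
        "tfft": "other output\n",
    }
    for name, count in sorted(tally(sample_logs).items()):
        print(f"{name}: {count} scalarization message(s)")
```

Benchmarks with a non-zero count would be the candidates for the PRs mentioned earlier in the thread.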