On Tue, Apr 03, 2012 at 09:26:38AM +0200, Duncan Sands wrote:
> Hi Jack,
>
>> Attached are the Polyhedron 2005 benchmark results for current llvm/dragonegg svn
>> on x86_64-apple-darwin11 built against Xcode 4.3.2 and FSF gcc 4.6.3.
>
> thanks for the numbers. How does this compare to LLVM 3.0 - were there any
> regressions?

The results from just before llvm/dragonegg 3.0 was released are at...

http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/044091.html

It does look as if the ac benchmark has regressed from 10.80 sec in
llvm/dragonegg 3.0 to 12.45 sec in llvm/dragonegg 3.1. These are slightly
different FSF gcc 4.6 releases (4.6.2svn vs 4.6.3), but I would be shocked
if that was the origin of the performance regression.
   The results for -fplugin-arg-dragonegg-enable-gcc-optzns don't seem much
improved in llvm 3.1, so I assume this means little progress was made in
eliminating the scalarization of vectorizations in this release. Did we even
get any code added to llvm that would allow us to identify instances of these
scalarizations through a compiler warning?
   Also, the current -fplugin-arg-dragonegg-llvm-option=-vectorize option seems
to do almost nothing in terms of vectorization. Do we need to pass any additional
flags to actually achieve autovectorization via llvm (in the absence of
-ftree-vectorize and -fplugin-arg-dragonegg-enable-gcc-optzns)?
            Jack

>
> Ciao, Duncan.
>
>> The benchmarks
>> for -msse3 and -msse4 appear identical (at least for degg+optnz). This is fortunate
>> since there seems to be a bug in -msse4 on 2.33 GHz (T7600) Intel Core 2 Duo Merom
>> (http://llvm.org/bugs/show_bug.cgi?id=12434).
>>               Jack
>>
>> llvm/dragonegg r153877
>>
>> dragonegg:
>> de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 %n.f90 -o %n
>>
>> degg+vectorize:
>> de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 -fplugin-arg-dragonegg-llvm-option=-vectorize %n.f90 -o %n
>>
>> degg+optnz:
>> de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 -fplugin-arg-dragonegg-enable-gcc-optzns %n.f90 -o %n
>>
>> gfortran:
>> gfortran-fsf-4.6 -msse3 -ffast-math -funroll-loops -O3 %n.f90 -o %n
>>
>> Ave Run (secs)
>>                  dragonegg  degg+vectorize  degg+optnz  gfortran
>> ac                   12.45           12.45        8.85      8.80
>> aermod               16.15           16.05       14.80     17.48
>> air                   7.10            7.11        6.46      5.50
>> capacita             40.00           39.96       37.72     32.62
>> channel               2.16            2.15        1.99      1.84
>> doduc                29.13           28.41       27.48     26.74
>> fatigue               8.75            9.03        8.11      8.44
>> gas_dyn              11.72           11.80        4.47      4.26
>> induct               24.02           24.91       12.08     13.65
>> linpk                15.40           15.78       15.74     15.45
>> mdbx                 11.80           12.22       11.86     11.20
>> nf                   28.45           28.50       29.25     27.91
>> protein              38.15           39.26       37.87     32.49
>> rnflow               32.25           32.35       26.47     24.06
>> test_fpu             11.34           11.35        9.31      8.04
>> tftt                  1.91            1.92        1.93      1.87
>>
>> Geometric Mean       13.50           13.62       11.34     10.87
>>
>> Compile (secs)
>>                  dragonegg  degg+vectorize  degg+optnz  gfortran
>> ac                    0.33            0.38        0.72      1.27
>> aermod               25.91           27.58       32.34     43.91
>> air                   1.07            1.25        1.52      2.25
>> capacita              0.49            0.52        0.89      1.71
>> channel               0.29            0.36        0.50      0.62
>> doduc                 1.71            4.50        3.25      5.34
>> fatigue               0.84            0.97        1.19      1.76
>> gas_dyn               0.67            0.68        1.20      3.02
>> induct                1.60            2.14        2.82      3.99
>> linpk                 0.22            0.24        0.47      0.78
>> mdbx                  0.63            0.77        1.16      1.85
>> nf                    0.37            0.40        0.70      1.66
>> protein               0.93            1.02        1.75      4.01
>> rnflow                1.20            1.25        2.63      5.44
>> test_fpu              0.88            0.92        2.13      4.39
>> tftt                  0.21            0.24        0.34      0.56
>>
>> Executable (bytes)
>>                  dragonegg  degg+vectorize  degg+optnz  gfortran
>> ac                   26856           26856       39120     50968
>> aermod             1043700         1055988     1046288   1265640
>> air                  62004           62004       53740     73988
>> capacita             41416           41416       45552     73896
>> channel              22808           22808       26768     34784
>> doduc               128448          128448      136996    197240
>> fatigue              69824           69824       69840     86080
>> gas_dyn              59112           59112       67416    119744
>> induct              163152          167248      167344    174976
>> linpk                18752           18752       27056     38648
>> mdbx                 53692           53692       57884     82112
>> nf                   23960           23960       32104     71800
>> protein              75032           75032       87208    132040
>> rnflow               71896           71896       96632    181120
>> test_fpu             54272           54272       78776    155072
>> tftt                 18640           18640       18488     30768
>>
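For a sense of scale, the ac slowdown quoted in the reply above can be expressed as a percentage. This is only a minimal sketch: the helper name `slowdown_percent` is made up for illustration, and the only data taken from the thread are the two ac timings.

```python
def slowdown_percent(old_secs, new_secs):
    """Return how much slower new_secs is than old_secs, in percent."""
    if old_secs <= 0:
        raise ValueError("old_secs must be positive")
    return (new_secs - old_secs) / old_secs * 100.0


# ac run times reported in the thread: 10.80 s (3.0) and 12.45 s (3.1).
ac_slowdown = slowdown_percent(10.80, 12.45)
print(f"ac is {ac_slowdown:.1f}% slower")
```

A negative result from the same helper would indicate a speedup rather than a regression.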
On Tue, 3 Apr 2012 08:57:51 -0400
Jack Howarth <howarth at bromo.med.uc.edu> wrote:

> Also, the current
> -fplugin-arg-dragonegg-llvm-option=-vectorize option seems to do
> almost nothing in terms of vectorization. Do we need to pass any
> additional flags to actually achieve autovectorization via llvm

Currently, we only have basic-block vectorization, so to get
autovectorization of loops (which is probably what we want here), the
loops need to be unrolled. I see that all categories include
-funroll-loops; does that do anything if we're not using gcc's
optimizations?

I generally run with both -unroll-allow-partial and -unroll-runtime so
that llvm's unroller will do as much as it can. Also, in many of these
cases, it looks like the vectorization is doing *something*, just not
anything overly helpful ;) -vectorize is new, so it is helpful to
get feedback on what is actually useful.

You might try including -bb-vectorize-aligned-only (sse3 does not
actually have unaligned load/stores, right?). Other things to try
include -bb-vectorize-no-ints (determining when to vectorize integer
ops may be trickier than floating-point ops) and setting the required
chain depth to something less than the current default of 6 (for
example, -bb-vectorize-req-chain-depth=3), which will cause a lot more
vectorization.

 -Hal

> (in absence of -ftree-vectorize and
> -fplugin-arg-dragonegg-enable-gcc-optzns)?
>             Jack

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
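To illustrate why a basic-block vectorizer depends on unrolling, here is a hand-written sketch (not LLVM output) of an element-wise add unrolled by four, with a remainder loop of the kind needed when the trip count is only known at run time. The function name and the unroll factor of 4 are assumptions chosen for illustration.

```python
def add_unrolled_by_4(a, b):
    """Element-wise a[i] + b[i], manually unrolled by a factor of 4."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    n = len(a)
    out = [0.0] * n
    main_end = n - (n % 4)

    # Unrolled body: four independent, adjacent operations sit in one
    # straight-line block, which is the shape a basic-block vectorizer
    # can pair up into vector operations.
    for i in range(0, main_end, 4):
        out[i] = a[i] + b[i]
        out[i + 1] = a[i + 1] + b[i + 1]
        out[i + 2] = a[i + 2] + b[i + 2]
        out[i + 3] = a[i + 3] + b[i + 3]

    # Remainder loop: handles the last n % 4 elements one at a time.
    for i in range(main_end, n):
        out[i] = a[i] + b[i]
    return out


print(add_unrolled_by_4([1, 2, 3, 4, 5, 6], [10, 20, 30, 40, 50, 60]))
```

Without the unrolled body, each iteration holds a single add, so there is nothing adjacent inside the block to combine.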
On Tue, Apr 03, 2012 at 08:33:33AM -0500, Hal Finkel wrote:
> I generally run with both -unroll-allow-partial and -unroll-runtime so
> that llvm's unroller will do as much as it can.
>
> You might try including -bb-vectorize-aligned-only (sse3 does not
> actually have unaligned load/stores, right?). Other things to try
> include -bb-vectorize-no-ints (determining when to vectorize integer
> ops may be trickier than floating-point ops) and setting the required
> chain depth to something less than the current default of 6 (for
> example, -bb-vectorize-req-chain-depth=3), which will cause a lot more
> vectorization.

So these need to be passed in their own instances of
-fplugin-arg-dragonegg-llvm-option, I guess. I'll try...

de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 -fplugin-arg-dragonegg-llvm-option=-vectorize -fplugin-arg-dragonegg-llvm-option=-unroll-allow-partial -fplugin-arg-dragonegg-llvm-option=-unroll-runtime -fplugin-arg-dragonegg-llvm-option=-bb-vectorize-aligned-only -fplugin-arg-dragonegg-llvm-option=-bb-vectorize-no-ints %n.f90 -o %n

Unfortunately it doesn't seem that dragonegg can currently parse something like...

-fplugin-arg-dragonegg-llvm-option=-bb-vectorize-req-chain-depth=3

% de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 -fplugin-arg-dragonegg-llvm-option=-vectorize -fplugin-arg-dragonegg-llvm-option=-unroll-allow-partial -fplugin-arg-dragonegg-llvm-option=-unroll-runtime -fplugin-arg-dragonegg-llvm-option=-bb-vectorize-aligned-only -fplugin-arg-dragonegg-llvm-option=-bb-vectorize-no-ints -fplugin-arg-dragonegg-llvm-option=-bb-vectorize-req-chain-depth=3 ac.f90 -o ac
f951: error: malformed option -fplugin-arg-dragonegg-llvm-option=-bb-vectorize-req-chain-depth=3 (multiple '=' signs)

Duncan, any idea how to work around that for passing -bb-vectorize-req-chain-depth=3?
              Jack
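The flag-wrapping pattern from the message above (one -fplugin-arg-dragonegg-llvm-option= prefix per LLVM option) could be scripted roughly as follows. This is a sketch under assumptions: `wrap_llvm_options` is a hypothetical helper, not part of dragonegg, and setting aside options that contain '=' only mirrors the f951 error reported in the thread rather than any documented rule.

```python
PLUGIN_PREFIX = "-fplugin-arg-dragonegg-llvm-option="


def wrap_llvm_options(llvm_options):
    """Split LLVM options into wrapped plugin arguments and rejected ones.

    Options containing '=' are set aside, because the thread reports that
    f951 rejects them with "multiple '=' signs" once the prefix is added.
    """
    wrapped = []
    rejected = []
    for opt in llvm_options:
        if "=" in opt:
            rejected.append(opt)
        else:
            wrapped.append(PLUGIN_PREFIX + opt)
    return wrapped, rejected


options = [
    "-vectorize",
    "-unroll-allow-partial",
    "-unroll-runtime",
    "-bb-vectorize-aligned-only",
    "-bb-vectorize-no-ints",
    "-bb-vectorize-req-chain-depth=3",
]
wrapped, rejected = wrap_llvm_options(options)
print(" ".join(wrapped))
print("not passable this way:", rejected)
```

Reporting the rejected options separately keeps the failure visible before the compiler is ever invoked.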
Attached are the Polyhedron 2005 benchmark results for current llvm/dragonegg svn
on x86_64-apple-darwin11 built against Xcode 4.3.2 and FSF gcc 4.6.3. The benchmarks
for -msse3 and -msse4 appear identical (at least for degg+optnz). This is fortunate
since there seems to be a bug in -msse4 on 2.33 GHz (T7600) Intel Core 2 Duo Merom
(http://llvm.org/bugs/show_bug.cgi?id=12434). I've added two additional entries to
the table. The first, degg+novect+optnz, should show the optimizations achieved by
-fplugin-arg-dragonegg-enable-gcc-optzns in the absence of autovectorization by
FSF gcc. This shows the optimization opportunities that LLVM is missing at the IR
level outside of autovectorization. The second entry is for the new LLVM
autovectorization option with all of its related options set. This shows mixed
results, with some benchmarks improved over the simple
-fplugin-arg-dragonegg-llvm-option=-vectorize and some worsened in performance.
              Jack
llvm/dragonegg r153877

dragonegg:
de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 %n.f90 -o %n

degg+vectorize:
de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 -fplugin-arg-dragonegg-llvm-option=-vectorize %n.f90 -o %n

degg+optnz:
de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 -fplugin-arg-dragonegg-enable-gcc-optzns %n.f90 -o %n

gfortran:
gfortran-fsf-4.6 -msse3 -ffast-math -funroll-loops -O3 %n.f90 -o %n

degg+novect+optnz:
de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 -fno-tree-vectorize -fplugin-arg-dragonegg-enable-gcc-optzns %n.f90 -o %n

degg+fullvect+optnz:
de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 -fno-tree-vectorize -fplugin-arg-dragonegg-llvm-option=-vectorize -fplugin-arg-dragonegg-llvm-option=-unroll-allow-partial -fplugin-arg-dragonegg-llvm-option=-unroll-runtime -fplugin-arg-dragonegg-llvm-option=-bb-vectorize-aligned-only -fplugin-arg-dragonegg-llvm-option=-bb-vectorize-no-ints %n.f90 -o %n

Ave Run (secs)
                 dragonegg  degg+vectorize  degg+optnz  gfortran  degg+novect+optnz  degg+fullvect+optnz
ac                   12.45           12.45        8.85      8.80               8.90                10.89
aermod               16.15           16.05       14.80     17.48              14.12                15.84
air                   7.10            7.11        6.46      5.50               6.46                 8.15
capacita             40.00           39.96       37.72     32.62              39.38                39.94
channel               2.16            2.15        1.99      1.84               2.15                 2.56
doduc                29.13           28.41       27.48     26.74              28.27                29.05
fatigue               8.75            9.03        8.11      8.44               7.28                10.49
gas_dyn              11.72           11.80        4.47      4.26              10.02                11.63
induct               24.02           24.91       12.08     13.65              20.54                24.68
linpk                15.40           15.78       15.74     15.45              15.39                15.46
mdbx                 11.80           12.22       11.86     11.20              11.82                11.50
nf                   28.45           28.50       29.25     27.91              29.17                28.16
protein              38.15           39.26       37.87     32.49              39.08                38.62
rnflow               32.25           32.35       26.47     24.06              28.75                31.05
test_fpu             11.34           11.35        9.31      8.04              10.88                10.19
tftt                  1.91            1.92        1.93      1.87               1.94                 1.90

Geometric Mean       13.50           13.62       11.34     10.87              12.53                13.65

Compile (secs)
                 dragonegg  degg+vectorize  degg+optnz  gfortran  degg+novect+optnz  degg+fullvect+optnz
ac                    0.33            0.38        0.72      1.27               0.71                 0.39
aermod               25.91           27.58       32.34     43.91              25.13                23.62
air                   1.07            1.25        1.52      2.25               1.36                 1.34
capacita              0.49            0.52        0.89      1.71               0.71                 0.98
channel               0.29            0.36        0.50      0.62               0.42                 0.49
doduc                 1.71            4.50        3.25      5.34               2.75                 5.42
fatigue               0.84            0.97        1.19      1.76               1.00                 1.24
gas_dyn               0.67            0.68        1.20      3.02               0.90                 1.81
induct                1.60            2.14        2.82      3.99               2.53                 2.15
linpk                 0.22            0.24        0.47      0.78               0.30                 0.46
mdbx                  0.63            0.77        1.16      1.85               0.99                 1.12
nf                    0.37            0.40        0.70      1.66               0.42                 1.22
protein               0.93            1.02        1.75      4.01               1.40                 2.73
rnflow                1.20            1.25        2.63      5.44               1.72                 2.85
test_fpu              0.88            0.92        2.13      4.39               1.26                 2.38
tftt                  0.21            0.24        0.34      0.56               0.30                 0.27

Executable (bytes)
                 dragonegg  degg+vectorize  degg+optnz  gfortran  degg+novect+optnz  degg+fullvect+optnz
ac                   26856           26856       39120     50968              39120                35144
aermod             1043700         1055988     1046288   1265640            1013488              1146196
air                  62004           62004       53740     73988              53740                78392
capacita             41416           41416       45552     73896              41416                70096
channel              22808           22808       26768     34784              22672                34984
doduc               128448          128448      136996    197240             128868               173512
fatigue              69824           69824       69840     86080              65712                78016
gas_dyn              59112           59112       67416    119744              59160                91952
induct              163152          167248      167344    174976             176696               179552
linpk                18752           18752       27056     38648              18904                31200
mdbx                 53692           53692       57884     82112              53788                70080
nf                   23960           23960       32104     71800              23912                48568
protein              75032           75032       87208    132040              78912               132376
rnflow               71896           71896       96632    181120              67928               137528
test_fpu             54272           54272       78776    155072              50144               111640
tftt                 18640           18640       18488     30768              18488                22744
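
The "Geometric Mean" row in the run-time table can be recomputed from the per-benchmark timings. The following is a minimal sketch, assuming the usual definition (the exponential of the mean of the logarithms); the function name is illustrative, and the list holds the dragonegg column of the Ave Run table as posted.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive numbers, computed via the mean of logs."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# "dragonegg" column of the Ave Run (secs) table, ac through tftt.
dragonegg_run_secs = [
    12.45, 16.15, 7.10, 40.00, 2.16, 29.13, 8.75, 11.72,
    24.02, 15.40, 11.80, 28.45, 38.15, 32.25, 11.34, 1.91,
]
print(f"{geometric_mean(dragonegg_run_secs):.2f}")
```

The result should land close to the 13.50 reported in the table. Summing logarithms rather than multiplying the raw values avoids overflow when the same helper is applied to the executable sizes.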