thr3ads.net - llvm dev - [LLVMdev] pb05 results for current llvm/dragonegg [Apr 2012]

Jack Howarth

2012-Apr-03 12:57 UTC

[LLVMdev] pb05 results for current llvm/dragonegg

On Tue, Apr 03, 2012 at 09:26:38AM +0200, Duncan Sands wrote:
> Hi Jack,
>
>>    Attached are the Polyhedron 2005 benchmark results for current
>> llvm/dragonegg svn on x86_64-apple-darwin11 built against Xcode 4.3.2
>> and FSF gcc 4.6.3.
>
> thanks for the numbers.  How does this compare to LLVM 3.0 - were there any
> regressions?
The results from just before llvm/dragonegg 3.0 was released are at...

http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/044091.html

It does look as if the ac benchmark has regressed from 10.80 sec
in llvm/dragonegg 3.0 to 12.45 sec in llvm/dragonegg 3.1. These runs used
slightly different FSF gcc 4.6 releases (4.6.2svn vs 4.6.3), but I would
be shocked if that were the origin of the performance regression.
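
To put the ac regression in relative terms, here is a small sketch of the arithmetic. Python is used only as a calculator; the function name is made up for illustration, and the two timings are the ones quoted above.

```python
def percent_slowdown(old_secs, new_secs):
    """Return how much slower new_secs is than old_secs, in percent."""
    return (new_secs - old_secs) / old_secs * 100.0

# ac run times quoted above: llvm/dragonegg 3.0 vs. current 3.1 svn
ac_slowdown = percent_slowdown(10.80, 12.45)
print(f"ac is {ac_slowdown:.1f}% slower")
```

That works out to a slowdown of roughly 15% on ac.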
   The results for -fplugin-arg-dragonegg-enable-gcc-optzns don't seem
much improved in llvm 3.1, so I assume this means little progress was made
in eliminating the scalarization of vectorized code in this release. Did
we even get any code added to llvm that would allow us to identify instances
of these scalarizations through a compiler warning? Also, the current
-fplugin-arg-dragonegg-llvm-option=-vectorize option seems to do almost
nothing in terms of vectorization. Do we need to pass any additional flags
to actually achieve autovectorization via llvm (in the absence of
-ftree-vectorize and -fplugin-arg-dragonegg-enable-gcc-optzns)?
                 Jack
>
> Ciao, Duncan.
>
>> The benchmarks
>> for -msse3 and -msse4 appear identical (at least for degg+optnz). This
>> is fortunate since there seems to be a bug in -msse4 on 2.33 GHz (T7600)
>> Intel Core 2 Duo Merom
>> (http://llvm.org/bugs/show_bug.cgi?id=12434).
>>                     Jack
>>
>> llvm/dragonegg r153877
>>
>> dragonegg:
>> de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 %n.f90 -o %n
>>
>> degg+vectorize:
>> de-gfortran46 -msse3 -ffast-math -funroll-loops -O3
>> -fplugin-arg-dragonegg-llvm-option=-vectorize %n.f90 -o %n
>>
>> degg+optnz:
>> de-gfortran46 -msse3 -ffast-math -funroll-loops -O3
>> -fplugin-arg-dragonegg-enable-gcc-optzns %n.f90 -o %n
>>
>> gfortran:
>> gfortran-fsf-4.6 -msse3 -ffast-math -funroll-loops -O3 %n.f90 -o %n
>>
>> Ave Run (secs)
>>                 dragonegg degg+vectorize degg+optnz  gfortran
>> ac               12.45       12.45         8.85       8.80
>> aermod           16.15       16.05        14.80      17.48
>> air               7.10        7.11         6.46       5.50
>> capacita         40.00       39.96        37.72      32.62
>> channel           2.16        2.15         1.99       1.84
>> doduc            29.13       28.41        27.48      26.74
>> fatigue           8.75        9.03         8.11       8.44
>> gas_dyn          11.72       11.80         4.47       4.26
>> induct           24.02       24.91        12.08      13.65
>> linpk            15.40       15.78        15.74      15.45
>> mdbx             11.80       12.22        11.86      11.20
>> nf               28.45       28.50        29.25      27.91
>> protein          38.15       39.26        37.87      32.49
>> rnflow           32.25       32.35        26.47      24.06
>> test_fpu         11.34       11.35         9.31       8.04
>> tftt              1.91        1.92         1.93       1.87
>>
>> Geometric Mean   13.50       13.62        11.34      10.87
>>
>> Compile (secs)
>>                 dragonegg degg+vectorize degg+optnz  gfortran
>> ac                0.33        0.38         0.72       1.27
>> aermod           25.91       27.58        32.34      43.91
>> air               1.07        1.25         1.52       2.25
>> capacita          0.49        0.52         0.89       1.71
>> channel           0.29        0.36         0.50       0.62
>> doduc             1.71        4.50         3.25       5.34
>> fatigue           0.84        0.97         1.19       1.76
>> gas_dyn           0.67        0.68         1.20       3.02
>> induct            1.60        2.14         2.82       3.99
>> linpk             0.22        0.24         0.47       0.78
>> mdbx              0.63        0.77         1.16       1.85
>> nf                0.37        0.40         0.70       1.66
>> protein           0.93        1.02         1.75       4.01
>> rnflow            1.20        1.25         2.63       5.44
>> test_fpu          0.88        0.92         2.13       4.39
>> tftt              0.21        0.24         0.34       0.56
>>
>> Executable (bytes)
>>                 dragonegg degg+vectorize  degg+optnz  gfortran
>> ac                26856       26856        39120      50968
>> aermod          1043700     1055988      1046288    1265640
>> air               62004       62004        53740      73988
>> capacita          41416       41416        45552      73896
>> channel           22808       22808        26768      34784
>> doduc            128448      128448       136996     197240
>> fatigue           69824       69824        69840      86080
>> gas_dyn           59112       59112        67416     119744
>> induct           163152      167248       167344     174976
>> linpk             18752       18752        27056      38648
>> mdbx              53692       53692        57884      82112
>> nf                23960       23960        32104      71800
>> protein           75032       75032        87208     132040
>> rnflow            71896       71896        96632     181120
>> test_fpu          54272       54272        78776     155072
>> tftt              18640       18640        18488      30768
>>

Hal Finkel

2012-Apr-03 13:33 UTC

[LLVMdev] pb05 results for current llvm/dragonegg

On Tue, 3 Apr 2012 08:57:51 -0400
Jack Howarth <howarth at bromo.med.uc.edu> wrote:
> On Tue, Apr 03, 2012 at 09:26:38AM +0200, Duncan Sands wrote:
> > Hi Jack,
> >
> >>    Attached are the Polyhedron 2005 benchmark results for current
> >> llvm/dragonegg svn on x86_64-apple-darwin11 built against Xcode
> >> 4.3.2 and FSF gcc 4.6.3.
> >
> > thanks for the numbers.  How does this compare to LLVM 3.0 - were
> > there any regressions?
> 
> The results from just before llvm/dragonegg 3.0 was released are at...
> 
> http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/044091.html
> 
> It does look as if the ac benchmark has been regressed from 10.80 sec
> in llvm/dragonegg 3.0 to 12.45 sec in llvm/dragonegg 3.1. These are
> slightly different FSF gcc 4.6 releases (4.6.2svn vs 4.6.3 but I would
> be shocked if that was the origin of the performance regression).
>    The results for -fplugin-arg-dragonegg-enable-gcc-optzns doesn't
> seem much improved in llvm 3.1 so I assume this means little progress
> was made in eliminating the scalarization of vectorizations in this
> release. Did we even get any code added to llvm that would allow us
> to identify instances of these scalarizations through a compiler
> warning? Also, the current
> -fplugin-arg-dragonegg-llvm-option=-vectorize option seems to do
> almost nothing in terms of vectorization. Do we need to pass any
> additional flags to actually achieve autovectorization via llvm 
Currently, we only have basic-block vectorization, so to get
autovectorization of loops (which is probably what we want here), the
loops need to be unrolled. I see that all categories include
-funroll-loops, does that do anything if we're not using gcc's
optimizations?

I generally run with both -unroll-allow-partial and -unroll-runtime so
that llvm's unroller will do as much as it can. Also, in many of these
cases, it looks like the vectorization is doing *something*, just not
anything overly helpful ;) -vectorize is new, so it is helpful to
get feedback on what is actually useful.

You might try including -bb-vectorize-aligned-only (sse3 does not
actually have unaligned load/stores, right?). Other things to try
include -bb-vectorize-no-ints (determining when to vectorize integer
ops may be trickier than floating-point ops) and setting the required
chain depth to something less than the current default of 6 (for
example, -bb-vectorize-req-chain-depth=3), which will cause a lot more
vectorization.

 -Hal

> (in absence of -ftree-vectorize and
> -fplugin-arg-dragonegg-enable-gcc-optzns)? Jack


-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory

Jack Howarth

2012-Apr-03 14:01 UTC

[LLVMdev] pb05 results for current llvm/dragonegg

On Tue, Apr 03, 2012 at 08:33:33AM -0500, Hal Finkel wrote:
> On Tue, 3 Apr 2012 08:57:51 -0400
> Jack Howarth <howarth at bromo.med.uc.edu> wrote:
> 
> > On Tue, Apr 03, 2012 at 09:26:38AM +0200, Duncan Sands wrote:
> > > Hi Jack,
> > >
> > >>    Attached are the Polyhedron 2005 benchmark results for current
> > >> llvm/dragonegg svn on x86_64-apple-darwin11 built against Xcode
> > >> 4.3.2 and FSF gcc 4.6.3.
> > >
> > > thanks for the numbers.  How does this compare to LLVM 3.0 - were
> > > there any regressions?
> > 
> > The results from just before llvm/dragonegg 3.0 was released are at...
> > 
> > http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/044091.html
> > 
> > It does look as if the ac benchmark has been regressed from 10.80 sec
> > in llvm/dragonegg 3.0 to 12.45 sec in llvm/dragonegg 3.1. These are
> > slightly different FSF gcc 4.6 releases (4.6.2svn vs 4.6.3 but I would
> > be shocked if that was the origin of the performance regression).
> >    The results for -fplugin-arg-dragonegg-enable-gcc-optzns doesn't
> > seem much improved in llvm 3.1 so I assume this means little progress
> > was made in eliminating the scalarization of vectorizations in this
> > release. Did we even get any code added to llvm that would allow us
> > to identify instances of these scalarizations through a compiler
> > warning? Also, the current
> > -fplugin-arg-dragonegg-llvm-option=-vectorize option seems to do
> > almost nothing in terms of vectorization. Do we need to pass any
> > additional flags to actually achieve autovectorization via llvm 
> 
> Currently, we only have basic-block vectorization, so to get
> autovectorization of loops (which is probably what we want here), the
> loops need to be unrolled. I see that all categories include
> -funroll-loops, does that do anything if we're not using gcc's
> optimizations?
> 
> I generally run with both -unroll-allow-partial and -unroll-runtime so
> that llvm's unroller will do as much as it can. Also, in many of these
> cases, it looks like the vectorization is doing *something*, just not
> anything overly helpful ;) -vectorize is new, so it is helpful to
> get feedback on what is actually useful.
> 
> You might try including -bb-vectorize-aligned-only (sse3 does not
> actually have unaligned load/stores, right?). Other things to try
> include -bb-vectorize-no-ints (determining when to vectorize integer
> ops may be trickier than floating-point ops) and setting the required
> chain depth to something less than the current default of 6 (for
> example, -bb-vectorize-req-chain-depth=3) will cause a lot more
> vectorization.
So these need to be passed in their own instances of
-fplugin-arg-dragonegg-llvm-option, I guess. I'll try...

de-gfortran46 -msse3 -ffast-math -funroll-loops -O3
-fplugin-arg-dragonegg-llvm-option=-vectorize
-fplugin-arg-dragonegg-llvm-option=-unroll-allow-partial
-fplugin-arg-dragonegg-llvm-option=-unroll-runtime
-fplugin-arg-dragonegg-llvm-option=-bb-vectorize-aligned-only
-fplugin-arg-dragonegg-llvm-option=-bb-vectorize-no-ints %n.f90 -o %n

Unfortunately it doesn't seem that dragonegg can currently parse something
like...

-fplugin-arg-dragonegg-llvm-option=-bb-vectorize-req-chain-depth=3

% de-gfortran46 -msse3 -ffast-math -funroll-loops -O3
-fplugin-arg-dragonegg-llvm-option=-vectorize
-fplugin-arg-dragonegg-llvm-option=-unroll-allow-partial
-fplugin-arg-dragonegg-llvm-option=-unroll-runtime
-fplugin-arg-dragonegg-llvm-option=-bb-vectorize-aligned-only
-fplugin-arg-dragonegg-llvm-option=-bb-vectorize-no-ints
-fplugin-arg-dragonegg-llvm-option=-bb-vectorize-req-chain-depth=3 ac.f90 -o ac
f951: error: malformed option
-fplugin-arg-dragonegg-llvm-option=-bb-vectorize-req-chain-depth=3 (multiple
'=' signs)

Duncan, any idea how to work around that for passing
-bb-vectorize-req-chain-depth=3?
          Jack
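
The failure above can be anticipated before invoking the compiler. Below is a minimal sketch, assuming only what the error message shows: at this revision, a plugin argument whose value itself contains '=' is rejected. The helper name and its return shape are hypothetical and are not part of dragonegg or gcc.

```python
PLUGIN_PREFIX = "-fplugin-arg-dragonegg-llvm-option="

def wrap_llvm_options(llvm_options):
    """Wrap each LLVM option in its own dragonegg plugin argument.

    Returns (accepted, rejected). Options containing '=' are set aside,
    since the transcript above shows them failing with
    "malformed option ... (multiple '=' signs)".
    """
    accepted, rejected = [], []
    for opt in llvm_options:
        if "=" in opt:
            rejected.append(opt)
        else:
            accepted.append(PLUGIN_PREFIX + opt)
    return accepted, rejected

# The LLVM options tried in this message
options = [
    "-vectorize",
    "-unroll-allow-partial",
    "-unroll-runtime",
    "-bb-vectorize-aligned-only",
    "-bb-vectorize-no-ints",
    "-bb-vectorize-req-chain-depth=3",
]
accepted, rejected = wrap_llvm_options(options)
command = ["de-gfortran46", "-msse3", "-ffast-math", "-funroll-loops", "-O3",
           *accepted, "ac.f90", "-o", "ac"]
print(" ".join(command))
print("cannot be passed this way:", rejected)
```

This only separates the options that trip the parser; it does not provide a way to pass -bb-vectorize-req-chain-depth=3, which is the open question here.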

Jack Howarth

2012-Apr-03 20:50 UTC

[LLVMdev] pb05 results for current llvm/dragonegg

Attached are the Polyhedron 2005 benchmark results for current llvm/dragonegg
svn on x86_64-apple-darwin11 built against Xcode 4.3.2 and FSF gcc 4.6.3. The
benchmarks for -msse3 and -msse4 appear identical (at least for degg+optnz).
This is fortunate since there seems to be a bug in -msse4 on 2.33 GHz (T7600)
Intel Core 2 Duo Merom (http://llvm.org/bugs/show_bug.cgi?id=12434). I've
added two additional entries to the table. The first, degg+novect+optnz,
should show the optimizations achieved by
-fplugin-arg-dragonegg-enable-gcc-optzns in the absence of autovectorization
by FSF gcc. This shows the optimization opportunities that LLVM is missing at
the IR level outside of autovectorization. The second entry,
degg+fullvect+optnz, is for the new LLVM autovectorization option with all of
its related options set. This shows mixed results, with some benchmarks
improved over the simple -fplugin-arg-dragonegg-llvm-option=-vectorize and
some worsened in performance.
                   Jack

llvm/dragonegg r153877

dragonegg:
de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 %n.f90 -o %n

degg+vectorize:
de-gfortran46 -msse3 -ffast-math -funroll-loops -O3
-fplugin-arg-dragonegg-llvm-option=-vectorize %n.f90 -o %n

degg+optnz:
de-gfortran46 -msse3 -ffast-math -funroll-loops -O3
-fplugin-arg-dragonegg-enable-gcc-optzns %n.f90 -o %n

gfortran:
gfortran-fsf-4.6 -msse3 -ffast-math -funroll-loops -O3 %n.f90 -o %n

degg+novect+optnz:
de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 -fno-tree-vectorize
-fplugin-arg-dragonegg-enable-gcc-optzns %n.f90 -o %n

degg+fullvect+optnz:
de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 -fno-tree-vectorize
-fplugin-arg-dragonegg-llvm-option=-vectorize
-fplugin-arg-dragonegg-llvm-option=-unroll-allow-partial
-fplugin-arg-dragonegg-llvm-option=-unroll-runtime
-fplugin-arg-dragonegg-llvm-option=-bb-vectorize-aligned-only
-fplugin-arg-dragonegg-llvm-option=-bb-vectorize-no-ints %n.f90 -o %n

Ave Run (secs)
                dragonegg degg+vectorize degg+optnz  gfortran degg+novect+optnz degg+fullvect+optnz
ac                  12.45          12.45       8.85      8.80              8.90               10.89
aermod              16.15          16.05      14.80     17.48             14.12               15.84
air                  7.10           7.11       6.46      5.50              6.46                8.15
capacita            40.00          39.96      37.72     32.62             39.38               39.94
channel              2.16           2.15       1.99      1.84              2.15                2.56
doduc               29.13          28.41      27.48     26.74             28.27               29.05
fatigue              8.75           9.03       8.11      8.44              7.28               10.49
gas_dyn             11.72          11.80       4.47      4.26             10.02               11.63
induct              24.02          24.91      12.08     13.65             20.54               24.68
linpk               15.40          15.78      15.74     15.45             15.39               15.46
mdbx                11.80          12.22      11.86     11.20             11.82               11.50
nf                  28.45          28.50      29.25     27.91             29.17               28.16
protein             38.15          39.26      37.87     32.49             39.08               38.62
rnflow              32.25          32.35      26.47     24.06             28.75               31.05
test_fpu            11.34          11.35       9.31      8.04             10.88               10.19
tftt                 1.91           1.92       1.93      1.87              1.94                1.90

Geometric Mean      13.50          13.62      11.34     10.87             12.53               13.65
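
The Geometric Mean row can be cross-checked with a few lines of Python. This is a sketch that assumes the row is the plain geometric mean of the sixteen per-benchmark run times; the thread does not say how it was computed, and the function name is illustrative.

```python
import math

def geometric_mean(values):
    """Geometric mean via the mean of logarithms (values must be > 0)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Ave Run (secs) for the plain dragonegg column, ac through tftt
dragonegg_runs = [12.45, 16.15, 7.10, 40.00, 2.16, 29.13, 8.75, 11.72,
                  24.02, 15.40, 11.80, 28.45, 38.15, 32.25, 11.34, 1.91]

print(f"{geometric_mean(dragonegg_runs):.2f}")
```

For the dragonegg column the result comes out very close to the 13.50 reported in the table; the other columns can be checked the same way.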

Compile (secs)
                dragonegg degg+vectorize degg+optnz  gfortran degg+novect+optnz degg+fullvect+optnz
ac                   0.33           0.38       0.72      1.27              0.71                0.39
aermod              25.91          27.58      32.34     43.91             25.13               23.62
air                  1.07           1.25       1.52      2.25              1.36                1.34
capacita             0.49           0.52       0.89      1.71              0.71                0.98
channel              0.29           0.36       0.50      0.62              0.42                0.49
doduc                1.71           4.50       3.25      5.34              2.75                5.42
fatigue              0.84           0.97       1.19      1.76              1.00                1.24
gas_dyn              0.67           0.68       1.20      3.02              0.90                1.81
induct               1.60           2.14       2.82      3.99              2.53                2.15
linpk                0.22           0.24       0.47      0.78              0.30                0.46
mdbx                 0.63           0.77       1.16      1.85              0.99                1.12
nf                   0.37           0.40       0.70      1.66              0.42                1.22
protein              0.93           1.02       1.75      4.01              1.40                2.73
rnflow               1.20           1.25       2.63      5.44              1.72                2.85
test_fpu             0.88           0.92       2.13      4.39              1.26                2.38
tftt                 0.21           0.24       0.34      0.56              0.30                0.27

Executable (bytes)
                dragonegg degg+vectorize degg+optnz  gfortran degg+novect+optnz degg+fullvect+optnz
ac                  26856          26856      39120     50968             39120               35144
aermod            1043700        1055988    1046288   1265640           1013488             1146196
air                 62004          62004      53740     73988             53740               78392
capacita            41416          41416      45552     73896             41416               70096
channel             22808          22808      26768     34784             22672               34984
doduc              128448         128448     136996    197240            128868              173512
fatigue             69824          69824      69840     86080             65712               78016
gas_dyn             59112          59112      67416    119744             59160               91952
induct             163152         167248     167344    174976            176696              179552
linpk               18752          18752      27056     38648             18904               31200
mdbx                53692          53692      57884     82112             53788               70080
nf                  23960          23960      32104     71800             23912               48568
protein             75032          75032      87208    132040             78912              132376
rnflow              71896          71896      96632    181120             67928              137528
test_fpu            54272          54272      78776    155072             50144              111640
tftt                18640          18640      18488     30768             18488               22744
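
The "mixed results" for the full set of vectorizer options can be made concrete by comparing two columns of the Ave Run table in this message. The sketch below copies the degg+vectorize and degg+fullvect+optnz run times by hand; the variable names are illustrative, and any strictly lower time counts as an improvement, with no allowance for run-to-run noise.

```python
# Ave Run (secs): (degg+vectorize, degg+fullvect+optnz), copied from the table
runs = {
    "ac": (12.45, 10.89), "aermod": (16.05, 15.84), "air": (7.11, 8.15),
    "capacita": (39.96, 39.94), "channel": (2.15, 2.56),
    "doduc": (28.41, 29.05), "fatigue": (9.03, 10.49),
    "gas_dyn": (11.80, 11.63), "induct": (24.91, 24.68),
    "linpk": (15.78, 15.46), "mdbx": (12.22, 11.50), "nf": (28.50, 28.16),
    "protein": (39.26, 38.62), "rnflow": (32.35, 31.05),
    "test_fpu": (11.35, 10.19), "tftt": (1.92, 1.90),
}

# Lower run time is better
faster = sorted(n for n, (plain, full) in runs.items() if full < plain)
slower = sorted(n for n, (plain, full) in runs.items() if full > plain)

print("improved with the full option set:", ", ".join(faster))
print("worsened with the full option set:", ", ".join(slower))
```

Several of the improvements (capacita and tftt, for example) are only a few hundredths of a second, so they may well be within measurement noise.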
