thr3ads.net - llvm dev - [LLVMdev] pb05 results for current llvm/dragonegg [Apr 2012]

Jack Howarth

2012-Apr-02 23:34 UTC

[LLVMdev] pb05 results for current llvm/dragonegg

Attached are the Polyhedron 2005 benchmark results for current llvm/dragonegg svn
on x86_64-apple-darwin11 built against Xcode 4.3.2 and FSF gcc 4.6.3. The benchmarks
for -msse3 and -msse4 appear identical (at least for degg+optnz). This is fortunate
since there seems to be a bug in -msse4 on 2.33 GHz (T7600) Intel Core 2 Duo Merom
(llvm.org/bugs/show_bug.cgi?id=12434).
                   Jack

llvm/dragonegg r153877

dragonegg:
de-gfortran46 -msse3 -ffast-math -funroll-loops -O3 %n.f90 -o %n

degg+vectorize:
de-gfortran46 -msse3 -ffast-math -funroll-loops -O3
-fplugin-arg-dragonegg-llvm-option=-vectorize %n.f90 -o %n

degg+optnz:
de-gfortran46 -msse3 -ffast-math -funroll-loops -O3
-fplugin-arg-dragonegg-enable-gcc-optzns %n.f90 -o %n

gfortran:
gfortran-fsf-4.6 -msse3 -ffast-math -funroll-loops -O3 %n.f90 -o %n

Ave Run (secs)
               dragonegg degg+vectorize degg+optnz  gfortran
ac               12.45       12.45         8.85       8.80 
aermod           16.15       16.05        14.80      17.48
air               7.10        7.11         6.46       5.50 
capacita         40.00       39.96        37.72      32.62
channel           2.16        2.15         1.99       1.84 
doduc            29.13       28.41        27.48      26.74 
fatigue           8.75        9.03         8.11       8.44 
gas_dyn          11.72       11.80         4.47       4.26
induct           24.02       24.91        12.08      13.65
linpk            15.40       15.78        15.74      15.45 
mdbx             11.80       12.22        11.86      11.20  
nf               28.45       28.50        29.25      27.91 
protein          38.15       39.26        37.87      32.49
rnflow           32.25       32.35        26.47      24.06
test_fpu         11.34       11.35         9.31       8.04    
tftt              1.91        1.92         1.93       1.87 

Geometric Mean   13.50       13.62        11.34      10.87 
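
As a cross-check on the Geometric Mean row, here is a minimal Python sketch that recomputes it for two of the columns from the Ave Run figures in the table. The list and function names are illustrative only and are not part of the pb05 harness; the results should come out close to the posted 13.50 and 10.87.

```python
import math

# Ave Run (secs) columns copied from the table, in benchmark order ac .. tftt.
dragonegg = [12.45, 16.15, 7.10, 40.00, 2.16, 29.13, 8.75, 11.72,
             24.02, 15.40, 11.80, 28.45, 38.15, 32.25, 11.34, 1.91]
gfortran = [8.80, 17.48, 5.50, 32.62, 1.84, 26.74, 8.44, 4.26,
            13.65, 15.45, 11.20, 27.91, 32.49, 24.06, 8.04, 1.87]


def geometric_mean(values):
    """Return exp(mean(log(v))), which equals the n-th root of the product."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


gm_dragonegg = geometric_mean(dragonegg)
gm_gfortran = geometric_mean(gfortran)
print(f"dragonegg {gm_dragonegg:.2f}  gfortran {gm_gfortran:.2f}")
```

The geometric mean is used here rather than the arithmetic mean so that long-running benchmarks such as capacita and protein do not dominate the summary.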

Compile (secs)
               dragonegg degg+vectorize degg+optnz  gfortran
ac                0.33        0.38         0.72       1.27
aermod           25.91       27.58        32.34      43.91 
air               1.07        1.25         1.52       2.25 
capacita          0.49        0.52         0.89       1.71
channel           0.29        0.36         0.50       0.62 
doduc             1.71        4.50         3.25       5.34 
fatigue           0.84        0.97         1.19       1.76 
gas_dyn           0.67        0.68         1.20       3.02 
induct            1.60        2.14         2.82       3.99
linpk             0.22        0.24         0.47       0.78  
mdbx              0.63        0.77         1.16       1.85
nf                0.37        0.40         0.70       1.66 
protein           0.93        1.02         1.75       4.01
rnflow            1.20        1.25         2.63       5.44 
test_fpu          0.88        0.92         2.13       4.39
tftt              0.21        0.24         0.34       0.56

Executable (bytes)
               dragonegg degg+vectorize  degg+optnz  gfortran
ac                26856       26856        39120      50968 
aermod          1043700     1055988      1046288    1265640
air               62004       62004        53740      73988
capacita          41416       41416        45552      73896
channel           22808       22808        26768      34784 
doduc            128448      128448       136996     197240
fatigue           69824       69824        69840      86080 
gas_dyn           59112       59112        67416     119744 
induct           163152      167248       167344     174976
linpk             18752       18752        27056      38648  
mdbx              53692       53692        57884      82112  
nf                23960       23960        32104      71800
protein           75032       75032        87208     132040
rnflow            71896       71896        96632     181120
test_fpu          54272       54272        78776     155072 
tftt              18640       18640        18488      30768

Anton Korobeynikov

2012-Apr-03 07:03 UTC

[LLVMdev] pb05 results for current llvm/dragonegg

Hi Jack
>               dragonegg degg+vectorize degg+optnz  gfortran
> ac               12.45       12.45         8.85       8.80
> gas_dyn          11.72       11.80         4.47       4.26
> induct           24.02       24.91        12.08      13.65
> rnflow           32.25       32.35        26.47      24.06

Any idea what might cause such differences here?

-- 
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University

Duncan Sands

2012-Apr-03 07:26 UTC

[LLVMdev] pb05 results for current llvm/dragonegg

Hi Jack,
>    Attached are the Polyhedron 2005 benchmark results for current llvm/dragonegg svn
> on x86_64-apple-darwin11 built against Xcode 4.3.2 and FSF gcc 4.6.3.

thanks for the numbers.  How does this compare to LLVM 3.0 - were there any
regressions?

Ciao, Duncan.


Duncan Sands

2012-Apr-03 08:13 UTC

[LLVMdev] pb05 results for current llvm/dragonegg

Hi Anton,
>>                dragonegg degg+vectorize degg+optnz  gfortran
>> ac               12.45       12.45         8.85       8.80
>> gas_dyn          11.72       11.80         4.47       4.26
>> induct           24.02       24.91        12.08      13.65
>> rnflow           32.25       32.35        26.47      24.06
> Any idea what might cause such differences here?
I haven't analysed these, but as a general remark: if "degg+optnz" does much
better than "dragonegg" then that indicates a weakness in LLVM's IR level
optimizers, while if "gfortran" does much better than "degg+optnz" then that
indicates a weakness in LLVM's codegen.  Applying this to the above suggests
that most of the differences are coming from LLVM's IR level optimizers not
doing a good job somewhere.

Ciao, Duncan.
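
The rule of thumb in this reply can be sketched in Python against the four quoted rows. The 15% cut-off for "much better" is an assumption made for illustration and is not a figure from the thread, and `diagnose` is a hypothetical helper name.

```python
# Ave Run (secs) for the four quoted benchmarks:
# (dragonegg, degg+optnz, gfortran)
runs = {
    "ac": (12.45, 8.85, 8.80),
    "gas_dyn": (11.72, 4.47, 4.26),
    "induct": (24.02, 12.08, 13.65),
    "rnflow": (32.25, 26.47, 24.06),
}

THRESHOLD = 1.15  # assumption: "much better" means at least 15% faster


def diagnose(dragonegg, degg_optnz, gfortran, threshold=THRESHOLD):
    """Apply the heuristic: compare adjacent configurations pairwise."""
    suspects = []
    # GCC's optimizers feeding LLVM's codegen beat LLVM's own optimizers.
    if dragonegg / degg_optnz > threshold:
        suspects.append("IR optimizers")
    # Same GCC optimizers, but GCC's backend beats LLVM's backend.
    if degg_optnz / gfortran > threshold:
        suspects.append("codegen")
    return suspects


diagnosis = {name: diagnose(*times) for name, times in runs.items()}
for name, suspects in diagnosis.items():
    print(name, suspects)
```

With this cut-off all four rows are attributed to the IR-level optimizers and none to codegen, which matches the conclusion drawn in the message. A tighter threshold would start flagging rnflow's gap between degg+optnz and gfortran as well.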

Jack Howarth

2012-Apr-03 12:57 UTC

[LLVMdev] pb05 results for current llvm/dragonegg

On Tue, Apr 03, 2012 at 09:26:38AM +0200, Duncan Sands wrote:
> Hi Jack,
>
>>    Attached are the Polyhedron 2005 benchmark results for current llvm/dragonegg svn
>> on x86_64-apple-darwin11 built against Xcode 4.3.2 and FSF gcc 4.6.3.
>
> thanks for the numbers.  How does this compare to LLVM 3.0 - were there any
> regressions?

The results from just before llvm/dragonegg 3.0 was released are at...

lists.cs.uiuc.edu/pipermail/llvmdev/2011-October/044091.html

It does look as if the ac benchmark has regressed from 10.80 sec
in llvm/dragonegg 3.0 to 12.45 sec in llvm/dragonegg 3.1. The two runs
used slightly different FSF gcc 4.6 releases (4.6.2svn vs 4.6.3), but I
would be shocked if that was the origin of the performance regression.
   The results for -fplugin-arg-dragonegg-enable-gcc-optzns don't seem
much improved in llvm 3.1, so I assume this means little progress was made
in eliminating the scalarization of vectorizations in this release. Did
we even get any code added to llvm that would allow us to identify instances
of these scalarizations through a compiler warning? Also, the current
-fplugin-arg-dragonegg-llvm-option=-vectorize option seems to do almost
nothing in terms of vectorization. Do we need to pass any additional flags
to actually achieve autovectorization via llvm (in the absence of
-ftree-vectorize and -fplugin-arg-dragonegg-enable-gcc-optzns)?
                 Jack

Duncan Sands

2012-Apr-06 09:38 UTC

[LLVMdev] pb05 results for current llvm/dragonegg

Hi Anton,
>>                dragonegg degg+vectorize degg+optnz  gfortran
>> ac               12.45       12.45         8.85       8.80
>> gas_dyn          11.72       11.80         4.47       4.26
>> induct           24.02       24.91        12.08      13.65
>> rnflow           32.25       32.35        26.47      24.06
> Any idea what might cause such differences here?
if I'm reading Jack's latest numbers right, for gas_dyn and induct the
difference is mainly due to GCC's vectorizer:

with GCC's vectorizer and other optimizations:

gas_dyn 4.47
induct  12.08

without GCC's vectorizer but with GCC's other optimizations:

gas_dyn 10.02
induct  20.54

without any GCC optimizations, only LLVM's optimizers:

gas_dyn 11.72
induct  24.02

So even without vectorization GCC is doing a better job, but not hugely
better.

Ciao, Duncan.
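
The three-way breakdown above can be turned into a rough share of the gap attributable to GCC's vectorizer. This is a small Python sketch using only the timings quoted in this message; the function name is made up for illustration.

```python
# Ave Run (secs):
# (LLVM optimizers only, GCC opts without vectorizer, GCC opts with vectorizer)
times = {
    "gas_dyn": (11.72, 10.02, 4.47),
    "induct": (24.02, 20.54, 12.08),
}


def vectorizer_share(llvm_only, gcc_no_vec, gcc_vec):
    """Fraction of the LLVM-vs-GCC run time gap that disappears only
    once GCC's vectorizer is switched on."""
    total_gap = llvm_only - gcc_vec
    vectorizer_gap = gcc_no_vec - gcc_vec
    return vectorizer_gap / total_gap


shares = {name: vectorizer_share(*t) for name, t in times.items()}
for name, share in shares.items():
    print(f"{name}: {share:.1%} of the gap comes from the vectorizer")
```

On these numbers the vectorizer accounts for roughly three quarters of the gap for gas_dyn and about 70% for induct, with the remainder coming from GCC's other optimizations.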

Duncan Sands

2012-Apr-06 13:22 UTC

[LLVMdev] pb05 results for current llvm/dragonegg

On 03/04/12 09:03, Anton Korobeynikov wrote:
> Hi Jack
>
>>                dragonegg degg+vectorize degg+optnz  gfortran
>> ac               12.45       12.45         8.85       8.80
>> gas_dyn          11.72       11.80         4.47       4.26
>> induct           24.02       24.91        12.08      13.65
>> rnflow           32.25       32.35        26.47      24.06
> Any idea what might cause such differences here?
>
With the attached patch to turn x/c into x*(1.0/c) in the code generators
if -ffast-math is enabled, "ac" with LLVM optimizers goes from 40% slower
to 5% slower when compared to "ac" compiled with the GCC optimizers.

Currently LLVM does very little in the way of -ffast-math optimizations.
There's clearly a lot of room for improvement here.
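
To illustrate why a rewrite of x/c into x*(1.0/c) would be gated on -ffast-math, the following Python sketch (not taken from recip.diff, whose contents are not shown here) counts how often the two forms give different IEEE double results. The helper name and the sample range are arbitrary choices.

```python
def count_mismatches(c, n=1000):
    """Count integers 1..n (as doubles) where x / c and x * (1.0 / c)
    round to different results."""
    recip = 1.0 / c  # computed once, as a compiler would hoist it
    return sum(1 for i in range(1, n + 1)
               if float(i) / c != float(i) * recip)


mismatch_3 = count_mismatches(3.0)  # 1/3 is not exactly representable
mismatch_4 = count_mismatches(4.0)  # 1/4 is exact, so both forms agree

print("c = 3.0:", mismatch_3, "mismatches")
print("c = 4.0:", mismatch_4, "mismatches")
```

When 1.0/c is itself rounded, the multiply rounds a second time and can differ from the correctly rounded division in the last bit, so the transformation is only value-preserving for divisors such as powers of two. Multiplication is typically much cheaper than division, which is consistent with the speedup reported for "ac".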

Ciao, Duncan.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: recip.diff
Type: text/x-patch
Size: 1238 bytes
Desc: not available
URL:
<lists.llvm.org/pipermail/llvm-dev/attachments/20120406/14e208d9/attachment.bin>
