thr3ads.net - llvm dev - [LLVMdev] Polyhedron 2005 results for dragonegg 3.3svn [Jun 2013]

Duncan Sands

2013-Jun-01 04:45 UTC

[LLVMdev] Polyhedron 2005 results for dragonegg 3.3svn

Hi Jack,

On 29/05/13 22:04, Jack Howarth wrote:
> On Wed, May 29, 2013 at 03:25:30PM +0200, Duncan Sands wrote:
>> Hi Jack, I pulled the loop vectorizer and fast math changes into the 3.3 branch,
>> so hopefully they will be part of 3.3 rc3 (and 3.3 final!).  It would be great
>> if you could redo the benchmarks with rc3.
>>
>
> Duncan,
>      As requested, appended are the updated Polyhedron 2005 benchmark
> results with both RC1 and RC3 llvm 3.3 testing.

Thanks for doing this.  As rc3 hasn't been tagged yet, I assume you used the
latest 3.3svn?

> There is a small improvement in the dragonegg results (without
> -fplugin-arg-dragonegg-enable-gcc-optzns) in RC3. I assume
> we still only have partial coverage of all of the -ffast-math optimizations
> performed by FSF gcc in llvm's fast-math
> support, correct?

These results are very disappointing: I was hoping to see a big improvement
somewhere, instead of no real improvement anywhere (except for gas_dyn) or a
regression (e.g. mdbx).  I think LLVM now has a reasonable array of fast-math
optimizations.  I will try to find time to poke at gas_dyn and induct: since
turning on gcc's optimizations there halves the run-time, LLVM's IR optimizers
are clearly missing something important.

Ciao, Duncan.
>                        Jack
>
> Tested on x86_apple-darwin12
>
> Compile Flags: -ffast-math -funroll-loops -O3
>
> de-gfc47: /sw/lib/gcc4.7/bin/gfortran -fplugin=/sw/lib/gcc4.7/lib/dragonegg.so -specs=/sw/lib/gcc4.7/lib/integrated-as.specs
> de-gfc48: /sw/lib/gcc4.8/bin/gfortran -fplugin=/sw/lib/gcc4.8/lib/dragonegg.so -specs=/sw/lib/gcc4.8/lib/integrated-as.specs
> de-gfc47+optzns: /sw/lib/gcc4.7/bin/gfortran -fplugin=/sw/lib/gcc4.7/lib/dragonegg.so -specs=/sw/lib/gcc4.7/lib/integrated-as.specs -fplugin-arg-dragonegg-enable-gcc-optzns
> de-gfc48+optzns: /sw/lib/gcc4.8/bin/gfortran -fplugin=/sw/lib/gcc4.8/lib/dragonegg.so -specs=/sw/lib/gcc4.8/lib/integrated-as.specs -fplugin-arg-dragonegg-enable-gcc-optzns
> gfortran47: /sw/bin/gfortran-fsf-4.7
> gfortran48: /sw/bin/gfortran-fsf-4.8
>
> Run time (secs)
>
> Benchmark     de-gfc47  de-gfc47  de-gfc48  de-gfc48  de-gfc47  de-gfc47  de-gfc48  de-gfc48  gfortran47  gfortran48
>                                                        +optzns   +optzns   +optzns   +optzns
>                    RC1       RC3       RC1       RC3       RC1       RC3       RC1       RC3
> ac               11.39     11.66     11.39     11.58      8.09      8.07      8.14      8.14        8.18        8.05
> aermod           16.35     16.47     16.00     16.44     14.50     14.61     15.28     14.43       16.45       16.23
> air               6.88      6.87      6.77      6.77      5.42      5.42      5.28      5.27        5.83        5.73
> capacita         39.85     37.80     39.83     37.86     34.71     34.81     33.47     33.53       32.51       33.02
> channel           2.05      2.06      2.05      2.06      2.15      2.15      1.99      1.99        1.83        1.83
> doduc            27.10     27.43     27.37     27.39     26.75     27.03     26.31     26.24       25.91       25.76
> fatigue           8.85      8.84      8.81      8.88      7.72      7.75      5.60      5.42        8.26        5.60
> gas_dyn          11.76      8.25     11.50      7.94      4.51      4.52      4.21      4.20        3.88        3.59
> induct           24.01     24.45     24.04     24.04     11.86     11.90     11.85     11.85       12.08       12.21
> linpk            15.43     15.48     15.48     15.49     15.40     15.47     15.83     15.81       15.37       15.64
> mdbx             11.92     12.14     11.91     12.15     11.30     11.29     11.27     11.27       11.18       11.42
> nf               29.57     30.08     30.04     30.11     29.50     29.82     29.59     29.86       27.21       27.25
> protein          36.15     36.15     35.21     35.17     35.93     36.02     34.16     34.06       31.88       31.81
> rnflow           27.02     27.08     25.92     26.12     26.77     26.83     22.20     22.21       24.67       21.21
> test_fpu         11.49     11.55     11.47     11.52      9.11      9.11      9.30      9.30        7.90        8.01
> tfft              1.92      1.94      1.92      1.92      1.92      1.92      1.89      1.90        1.86        1.90
>
> Geom. Mean       13.19     12.95     13.10     12.83     10.99     11.02     10.52     10.47       10.60       10.22
>
> Compile time (secs)
>
> Benchmark     de-gfc47  de-gfc47  de-gfc48  de-gfc48  de-gfc47  de-gfc47  de-gfc48  de-gfc48  gfortran47  gfortran48
>                                                        +optzns   +optzns   +optzns   +optzns
>                    RC1       RC3       RC1       RC3       RC1       RC3       RC1       RC3
> ac                0.62      1.63      0.29      0.93      2.20      1.02      0.71      0.73        2.88        2.08
> aermod           35.19     35.57     20.44     35.86     43.50     43.39     42.90     43.08       42.75       55.97
> air               1.16      1.23      1.11      1.26      2.72      2.68      2.40      2.35        4.48        4.28
> capacita          0.52      0.60      0.52      0.62      1.02      0.94      1.04      0.96        1.90        1.89
> channel           0.26      0.28      0.23      0.30      0.47      0.45      0.50      0.47        0.65        0.75
> doduc             1.74      1.89      1.74      1.91      3.78      3.71      3.53      3.55        6.03        5.68
> fatigue           0.91      0.91      0.87      0.91      1.33      1.30      1.49      1.49        1.97        2.04
> gas_dyn           0.70      0.87      0.63      0.88      1.40      1.37      1.39      1.39        3.39        2.44
> induct            1.95      1.83      1.77      1.83      2.87      2.81      2.99      3.02        4.08        4.42
> linpk             0.25      0.32      0.21      0.32      0.53      0.52      0.72      0.73        0.92        1.25
> mdbx              0.66      0.73      0.61      0.75      1.30      1.26      1.24      1.15        2.16        1.90
> nf                0.39      0.55      0.35      0.55      0.80      0.80      0.74      0.74        2.12        1.67
> protein           1.12      1.18      1.03      1.20      2.01      1.99      1.79      1.77        4.39        3.62
> rnflow            1.26      1.55      1.19      1.55      2.93      2.84      2.72      2.73        6.43        5.47
> test_fpu          0.91      1.12      0.85      1.13      2.27      5.06      2.22      2.23        5.28        4.26
> tfft              0.22      0.24      0.18      0.22      0.39      0.40      0.46      0.46        0.59        0.78
>
> Executable (bytes)
>
> Benchmark     de-gfc47  de-gfc47  de-gfc48  de-gfc48  de-gfc47  de-gfc47  de-gfc48  de-gfc48  gfortran47  gfortran48
>                                                        +optzns   +optzns   +optzns   +optzns
>                    RC1       RC3       RC1       RC3       RC1       RC3       RC1       RC3
> ac               26776     30896     26792     30912     47160     47160     34928     34928       59120       42784
> aermod         1023024   1035312   1023064   1031248   1052728   1052728   1031576   1031568     1392840     1286136
> air              61940     61940     61948     61948     65964     65964     61876     61876      110768      106680
> capaci           41344     45440     41144     41144     45440     45440     45040     45040       77920       73248
> channe           22736     22600     22744     22608     26696     22600     22552     22552       34704       34656
> doduc           128376    120188    128384    120196    140580    140580    136296    136296      205320      189040
> fatigu           65648     69744     65640     69736     69808     69808     73848     73848       90240       82040
> gas_dy           54840     58936     54936     59032     63144     63144     71304     71304      123680       99184
> induct          163064    163064    158792    162888    163192    167288    166920    171024      179080      170872
> linpk            18680     22896     18688     22904     22896     22896     34920     34920       42640       50936
> mdbx             49492     57684     49508     57700     57692     57692     53604     53604       90232       78032
> nf               23880     32080     23888     27984     32088     32088     32104     32104       84072       67744
> protei           74960     79056     75048     79144     87144     87144     83128     83128      131976      115688
> rnflow           67704     79992     67712     80000     88248     88248     96152     96152      205584      176912
> test_f           50000     62296     50008     62304     70440     70440     78456     78456      179464      142608
> tfft             18568     18568     18576     18576     18416     18416     22544     22544       30680       34832
>
>

Jack Howarth

2013-Jun-01 19:34 UTC

On Sat, Jun 01, 2013 at 06:45:48AM +0200, Duncan Sands wrote:
>
> These results are very disappointing, I was hoping to see a big improvement
> somewhere instead of no real improvement anywhere (except for gas_dyn) or a
> regression (eg: mdbx).  I think LLVM now has a reasonable array of fast-math
> optimizations.  I will try to find time to poke at gas_dyn and induct: since
> turning on gcc's optimizations there halve the run-time, LLVM's IR optimizers
> are clearly missing something important.
>
> Ciao, Duncan.
Duncan,
   Appended is another set of benchmark runs where I attempted to decouple the
fast math optimizations from the vectorization by passing -fno-tree-vectorize.
I am not sure whether dragonegg really honors -fno-tree-vectorize to disable the
llvm vectorization.

Tested on x86_apple-darwin12

Compile Flags: -ffast-math -funroll-loops -O3 -fno-tree-vectorize

de-gfc48: /sw/lib/gcc4.8/bin/gfortran -fplugin=/sw/lib/gcc4.8/lib/dragonegg.so -specs=/sw/lib/gcc4.8/lib/integrated-as.specs
de-gfc48+optzns: /sw/lib/gcc4.8/bin/gfortran -fplugin=/sw/lib/gcc4.8/lib/dragonegg.so -specs=/sw/lib/gcc4.8/lib/integrated-as.specs -fplugin-arg-dragonegg-enable-gcc-optzns
gfortran48: /sw/bin/gfortran-fsf-4.8

Run time (secs)

Benchmark     de-gfc48  de-gfc48   gfortran48
                        +optzns 

ac             11.33      8.10       8.02 
aermod         16.03     14.45      16.13
air             6.80      5.28       5.73
capacita       39.89     35.21      34.96
channel         2.06      2.29       2.69 
doduc          27.35     26.13      25.74
fatigue         8.83      4.82       4.67
gas_dyn        11.41      9.79       9.60
induct         23.95     21.75      21.14
linpk          15.49     15.48      15.69
mdbx           11.91     11.28      11.39
nf             29.92     29.57      27.99
protein        36.34     33.94      31.91
rnflow         25.97     25.27      22.78
test_fpu       11.48     10.91       9.64
tfft            1.92      1.91       1.91 

Geom. Mean     13.12     11.70      11.64
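
For anyone wanting to check the summary row, the geometric mean can be recomputed from the per-benchmark times. The following is a minimal sketch using the de-gfc48 column of the table above; the function name is only illustrative.

```python
import math

# de-gfc48 run times in seconds, copied from the table above
# (ac, aermod, air, capacita, channel, doduc, fatigue, gas_dyn,
#  induct, linpk, mdbx, nf, protein, rnflow, test_fpu, tfft).
de_gfc48 = [11.33, 16.03, 6.80, 39.89, 2.06, 27.35, 8.83, 11.41,
            23.95, 15.49, 11.91, 29.92, 36.34, 25.97, 11.48, 1.92]


def geometric_mean(values):
    """Geometric mean via the mean of logarithms (avoids a huge product)."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


print(f"Geom. Mean: {geometric_mean(de_gfc48):.2f}")
```

The geometric mean is used rather than the arithmetic mean so that a long-running benchmark such as capacita does not dominate the summary.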

Assuming that the de-gfc48+optzns run really has disabled the llvm vectorization,
I am hoping that additional benchmarking of de-gfc48+optzns with individual
-ffast-math optimizations disabled (such as passing -fno-unsafe-math-optimizations)
may give us a clue as to the origin of the performance delta between the stock
dragonegg results with -ffast-math and those with
-fplugin-arg-dragonegg-enable-gcc-optzns.
      Jack

Jack Howarth

2013-Jun-01 23:13 UTC

On Sat, Jun 01, 2013 at 06:45:48AM +0200, Duncan Sands wrote:
>
> These results are very disappointing, I was hoping to see a big improvement
> somewhere instead of no real improvement anywhere (except for gas_dyn) or a
> regression (eg: mdbx).  I think LLVM now has a reasonable array of fast-math
> optimizations.  I will try to find time to poke at gas_dyn and induct: since
> turning on gcc's optimizations there halve the run-time, LLVM's IR optimizers
> are clearly missing something important.
>
> Ciao, Duncan.
>
Duncan,
   In case it helps, I benchmarked disabling individual -ffast-math
optimizations (with partial results appended). The most important optimization
for the benchmark runtimes seems to be -funsafe-math-optimizations (as can be
seen from the runtime regression caused by -fno-unsafe-math-optimizations).
Does llvm currently support all of the features of FSF gcc's
-funsafe-math-optimizations?
            Jack
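
As background on why this flag matters so much: GCC documents -funsafe-math-optimizations as enabling, among other things, -fassociative-math, which lets the compiler reorder floating-point additions. The sketch below is plain Python rather than anything from the benchmarks; it only illustrates that such reordering changes results, which is why it is not done by default.

```python
# Floating-point addition is not associative, so a compiler may only
# reorder it when told that the change in rounding is acceptable.
big = 1e16

# Evaluated left to right: the 1.0 is lost when added to 1e16.
strict_order = (big + 1.0) - big

# Reassociated so that the large terms cancel first.
reassociated = (big - big) + 1.0

print(strict_order, reassociated)
```

Reassociation is what allows a reduction loop to be split into several independent partial sums, so a vectorizer generally needs this freedom for floating-point reductions.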

Tested on x86_apple-darwin12

Compile Flags: -ffast-math -funroll-loops -O3 -fno-tree-vectorize

de-gfc48: /sw/lib/gcc4.8/bin/gfortran -fplugin=/sw/lib/gcc4.8/lib/dragonegg.so -specs=/sw/lib/gcc4.8/lib/integrated-as.specs
de-gfc48+optzns: /sw/lib/gcc4.8/bin/gfortran -fplugin=/sw/lib/gcc4.8/lib/dragonegg.so -specs=/sw/lib/gcc4.8/lib/integrated-as.specs -fplugin-arg-dragonegg-enable-gcc-optzns
gfortran48: /sw/bin/gfortran-fsf-4.8
de-gfc48+nounsafe+optzns: /sw/lib/gcc4.8/bin/gfortran -fplugin=/sw/lib/gcc4.8/lib/dragonegg.so -specs=/sw/lib/gcc4.8/lib/integrated-as.specs -fplugin-arg-dragonegg-enable-gcc-optzns -fno-unsafe-math-optimizations
de-gfc48+math-errno+optzns: /sw/lib/gcc4.8/bin/gfortran -fplugin=/sw/lib/gcc4.8/lib/dragonegg.so -specs=/sw/lib/gcc4.8/lib/integrated-as.specs -fplugin-arg-dragonegg-enable-gcc-optzns -fmath-errno
de-gfc48+math-signans+optzns: /sw/lib/gcc4.8/bin/gfortran -fplugin=/sw/lib/gcc4.8/lib/dragonegg.so -specs=/sw/lib/gcc4.8/lib/integrated-as.specs -fplugin-arg-dragonegg-enable-gcc-optzns -fsignaling-nans
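
The dragonegg configurations above differ only in the trailing flags, so they can be generated rather than typed by hand. This is a rough sketch, not the script actually used for these runs: the paths and flags are the ones listed above, while the source file name `ac.f90`, the output name and the helper `build_command` are made up for illustration.

```python
import shlex

GFORTRAN = "/sw/lib/gcc4.8/bin/gfortran"
DRAGONEGG = [
    "-fplugin=/sw/lib/gcc4.8/lib/dragonegg.so",
    "-specs=/sw/lib/gcc4.8/lib/integrated-as.specs",
]
COMMON_FLAGS = ["-ffast-math", "-funroll-loops", "-O3", "-fno-tree-vectorize"]
OPTZNS = "-fplugin-arg-dragonegg-enable-gcc-optzns"

# Extra flags per configuration; they are placed after -ffast-math so that
# the later flag overrides the setting implied by -ffast-math.
VARIANTS = {
    "de-gfc48": [],
    "de-gfc48+optzns": [OPTZNS],
    "de-gfc48+nounsafe+optzns": [OPTZNS, "-fno-unsafe-math-optimizations"],
    "de-gfc48+math-errno+optzns": [OPTZNS, "-fmath-errno"],
    "de-gfc48+math-signans+optzns": [OPTZNS, "-fsignaling-nans"],
}


def build_command(variant, source, output):
    """Return the argv list for compiling `source` with the given variant."""
    return ([GFORTRAN] + DRAGONEGG + COMMON_FLAGS + VARIANTS[variant]
            + [source, "-o", output])


for name in VARIANTS:
    # ac.f90 is a hypothetical file name used only for this example.
    print(name, "->", shlex.join(build_command(name, "ac.f90", "ac")))
```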

Run time (secs)

Benchmark     de-gfc48  de-gfc48  gfortran48  de-gfc48+nounsafe  de-gfc48+math-errno  de-gfc48+math-signans
                         +optzns                        +optzns              +optzns                +optzns

ac               11.33      8.10        8.02               9.20                 8.10                   8.10
aermod           16.03     14.45       16.13              14.83                14.20                  14.51
air               6.80      5.28        5.73               6.84                 5.26                   5.31
capacita         39.89     35.21       34.96              36.72                35.21                  35.51
channel           2.06      2.29        2.69               2.30                 2.29                   2.30
doduc            27.35     26.13       25.74              29.90                26.42                  26.99
fatigue           8.83      4.82        4.67               5.60                 4.87                   4.82
gas_dyn          11.41      9.79        9.60              12.97                10.56                  12.13
induct           23.95     21.75       21.14              22.34                21.39                  21.91
linpk            15.49     15.48       15.69              15.49                15.49                  15.52
mdbx             11.91     11.28       11.39              11.85                11.27                  11.83
nf               29.92     29.57       27.99              29.67                29.67                  29.47
protein          36.34     33.94       31.91              34.23                33.62                  33.97
rnflow           25.97     25.27       22.78              27.99                28.00                  28.00
test_fpu         11.48     10.91        9.64              10.95                10.94                  10.93
tfft              1.92      1.91        1.91               1.91                 1.90                   1.91

Geom. Mean       13.12     11.70       11.64              12.62                11.82                  12.01
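
To see which benchmarks depend most on -funsafe-math-optimizations, the de-gfc48+optzns and de-gfc48+nounsafe+optzns columns above can be compared directly. A small sketch along those lines (the numbers are copied from the table; the function and variable names are only illustrative):

```python
# Run times in seconds from the table above.
optzns = {
    "ac": 8.10, "aermod": 14.45, "air": 5.28, "capacita": 35.21,
    "channel": 2.29, "doduc": 26.13, "fatigue": 4.82, "gas_dyn": 9.79,
    "induct": 21.75, "linpk": 15.48, "mdbx": 11.28, "nf": 29.57,
    "protein": 33.94, "rnflow": 25.27, "test_fpu": 10.91, "tfft": 1.91,
}
nounsafe = {
    "ac": 9.20, "aermod": 14.83, "air": 6.84, "capacita": 36.72,
    "channel": 2.30, "doduc": 29.90, "fatigue": 5.60, "gas_dyn": 12.97,
    "induct": 22.34, "linpk": 15.49, "mdbx": 11.85, "nf": 29.67,
    "protein": 34.23, "rnflow": 27.99, "test_fpu": 10.95, "tfft": 1.91,
}


def slowdown_percent(base, variant):
    """Percentage by which `variant` is slower than `base` (negative = faster)."""
    return (variant - base) / base * 100.0


# Sort benchmarks by how much they regress with -fno-unsafe-math-optimizations.
ranked = sorted(
    ((name, slowdown_percent(optzns[name], nounsafe[name])) for name in optzns),
    key=lambda item: item[1],
    reverse=True,
)

for name, pct in ranked[:5]:
    print(f"{name:10s} {pct:+6.1f}%")
```

By this measure gas_dyn and air regress the most, each by roughly 30%.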

Duncan Sands

2013-Jun-02 08:27 UTC

Hi Jack, thanks for splitting out what the effects of LLVM's / GCC's vectorizers
are.

On 01/06/13 21:34, Jack Howarth wrote:
> On Sat, Jun 01, 2013 at 06:45:48AM +0200, Duncan Sands wrote:
>>
>> These results are very disappointing, I was hoping to see a big improvement
>> somewhere instead of no real improvement anywhere (except for gas_dyn) or a
>> regression (eg: mdbx).  I think LLVM now has a reasonable array of fast-math
>> optimizations.  I will try to find time to poke at gas_dyn and induct: since
>> turning on gcc's optimizations there halve the run-time, LLVM's IR optimizers
>> are clearly missing something important.
>>
>> Ciao, Duncan.
>
> Duncan,
>     Appended are another set of benchmark runs where I attempted to decouple the
> fast math optimizations from the vectorization by passing -fno-tree-vectorize.
> I am unclear if dragonegg really honors -fno-tree-vectorize to disable the llvm
> vectorization.
Yes, it does disable LLVM vectorization.
>
> Tested on x86_apple-darwin12
>
> Compile Flags: -ffast-math -funroll-loops -O3 -fno-tree-vectorize
Maybe -march=native would be a good addition.
>
> de-gfc48: /sw/lib/gcc4.8/bin/gfortran -fplugin=/sw/lib/gcc4.8/lib/dragonegg.so -specs=/sw/lib/gcc4.8/lib/integrated-as.specs
> de-gfc48+optzns: /sw/lib/gcc4.8/bin/gfortran -fplugin=/sw/lib/gcc4.8/lib/dragonegg.so -specs=/sw/lib/gcc4.8/lib/integrated-as.specs -fplugin-arg-dragonegg-enable-gcc-optzns
> gfortran48: /sw/bin/gfortran-fsf-4.8
>
> Run time (secs)
What is the standard deviation for each benchmark?  If each run varies by +-5%,
then the changes in runtime of around 3% measured below don't mean anything.
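
One way to answer this would be to repeat each benchmark several times and compare the difference in means against the run-to-run noise. The sketch below shows the idea; the timings in it are invented purely for illustration (the tables in this thread give a single number per benchmark), and the "2 standard errors" threshold is a crude rule of thumb, not a proper statistical test.

```python
import math
import statistics


def standard_error_of_difference(times_a, times_b):
    """Standard error of the difference between the two sample means."""
    var_a = statistics.variance(times_a)  # sample variance (n - 1)
    var_b = statistics.variance(times_b)
    return math.sqrt(var_a / len(times_a) + var_b / len(times_b))


def looks_significant(times_a, times_b, k=2.0):
    """True if the means differ by more than k standard errors."""
    diff = abs(statistics.mean(times_a) - statistics.mean(times_b))
    return diff > k * standard_error_of_difference(times_a, times_b)


# Hypothetical repeated timings (seconds) for one benchmark, two compilers.
noisy_a = [11.30, 11.90, 10.80, 11.60, 11.10]
noisy_b = [11.60, 11.00, 12.00, 11.40, 11.70]
quiet_a = [11.33, 11.34, 11.32, 11.33, 11.34]
quiet_b = [11.57, 11.58, 11.59, 11.58, 11.57]

print("noisy runs differ:", looks_significant(noisy_a, noisy_b))
print("quiet runs differ:", looks_significant(quiet_a, quiet_b))
```

Both pairs differ by about 2% in their means, but only the pair with small run-to-run variation supports the conclusion that one build is really slower.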


Comparing with your previous benchmarks, I see:
>
> Benchmark     de-gfc48  de-gfc48   gfortran48
>                          +optzns
>
> ac             11.33      8.10       8.02
Turning on LLVM's vectorizer gives a 2% slowdown.
> aermod         16.03     14.45      16.13
Turning on LLVM's vectorizer gives a 2.5% slowdown.
> air             6.80      5.28       5.73
> capacita       39.89     35.21      34.96
Turning on LLVM's vectorizer gives a 5% speedup.  GCC gets a 5.5% speedup from
its vectorizer.
> channel         2.06      2.29       2.69
GCC gets a 30% speedup from its vectorizer which LLVM doesn't get.  On the
other hand, without vectorization LLVM's version runs 23% faster than GCC's, so
while GCC's vectorizer leaps GCC into the lead, the final speed difference is
more in the order of GCC being 10% faster.
> doduc          27.35     26.13      25.74
> fatigue         8.83      4.82       4.67
GCC gets a 17% speedup from its vectorizer which LLVM doesn't get.
This is a good one to look at, because all the difference between GCC
and LLVM is coming from the mid-level optimizers: turning on GCC optzns
in dragonegg speeds up the program to GCC levels, so it is possible to
get LLVM IR with and without the effect of GCC optimizations, which should
make it fairly easy to understand what GCC is doing right here.
> gas_dyn        11.41      9.79       9.60
Turning on LLVM's vectorizer gives a 30% speedup.  GCC gets a comparable
speedup from its vectorizer.
> induct         23.95     21.75      21.14
GCC gets a 40% speedup from its vectorizer which LLVM doesn't get.  Like
fatigue, this is a case where we can get IR showing all the improvements that
the GCC optimizers made.
> linpk          15.49     15.48      15.69
> mdbx           11.91     11.28      11.39
Turning on LLVM's vectorizer gives a 2% slowdown.
> nf             29.92     29.57      27.99
> protein        36.34     33.94      31.91
Turning on LLVM's vectorizer gives a 3% speedup.
> rnflow         25.97     25.27      22.78
GCC gets a 7% speedup from its vectorizer which LLVM doesn't get.
> test_fpu       11.48     10.91       9.64
GCC gets a 17% speedup from its vectorizer which LLVM doesn't get.
> tfft            1.92      1.91       1.91
>
> Geom. Mean     13.12     11.70      11.64
Ciao, Duncan.
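
The percentages above come from setting the run times with -fno-tree-vectorize against those from the earlier run with vectorization enabled. A sketch of that arithmetic for a few of the benchmarks follows; it assumes the vectorized figures are the de-gfc48 RC3 and gfortran48 columns of the earlier table, and the function name is only illustrative.

```python
def speedup_percent(without_vec, with_vec):
    """Percent run-time reduction from vectorizing (negative = slowdown)."""
    return (without_vec - with_vec) / without_vec * 100.0


# benchmark: (de-gfc48 no-vec, de-gfc48 vec, gfortran48 no-vec, gfortran48 vec)
# "no-vec" times are from the -fno-tree-vectorize run quoted here; "vec" times
# are assumed to be the de-gfc48 RC3 and gfortran48 columns of the earlier run.
runs = {
    "ac":       (11.33, 11.58,  8.02,  8.05),
    "capacita": (39.89, 37.86, 34.96, 33.02),
    "channel":  ( 2.06,  2.06,  2.69,  1.83),
    "induct":   (23.95, 24.04, 21.14, 12.21),
}

for name, (llvm_off, llvm_on, gcc_off, gcc_on) in runs.items():
    llvm = speedup_percent(llvm_off, llvm_on)
    gcc = speedup_percent(gcc_off, gcc_on)
    print(f"{name:10s} LLVM {llvm:+6.1f}%   GCC {gcc:+6.1f}%")
```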
>
> Assuming that the de-gfc48+optzns run really has disabled the llvm vectorization,
> I am hoping that additional benchmarking of de-gfc48+optzns with individual
> -ffast-math optimizations disabled (such as passing -fno-unsafe-math-optimizations)
> may give us a clue as the the origin of the performance delta between the stock
> dragonegg results with -ffast-math and those with
> -fplugin-arg-dragonegg-enable-gcc-optzns.
>        Jack
>
