Jack Howarth
2011-Jun-09  05:12 UTC
[LLVMdev] -fplugin-arg-dragonegg-enable-gcc-optzns status
Current dragonegg svn has all of the -fplugin-arg-dragonegg-enable-gcc-optzns
bugs for usage with -ffast-math -O3 addressed except for those related to
PR2314. Using the -fno-tree-vectorize option, we can evaluate the current state
of -fplugin-arg-dragonegg-enable-gcc-optzns with the Polyhedron 2005 benchmarks
compared to stock dragonegg and stock gcc 4.5.4. The runtime benchmarks below
show that we average slightly faster than stock gcc 4.5.4 and significantly
faster than stock dragonegg through the use of
-fplugin-arg-dragonegg-enable-gcc-optzns.
x86_64 darwin 
A) gcc 4.5.4svn using -msse3 -ffast-math -O3 -fno-tree-vectorize
B) gcc 4.5.4svn/dragonegg using -msse3 -ffast-math -O3 -fno-tree-vectorize -fplugin-arg-dragonegg-enable-gcc-optzns
C) gcc 4.5.4svn/dragonegg using -msse3 -ffast-math -O3 -fno-tree-vectorize
Benchmark     A) stock     B) gcc 4.5.4/       C) gcc 4.5.4/
              gcc 4.5.4    dragonegg/optzns    dragonegg
ac               9.58          9.13            12.30
aermod          20.88         16.10            17.62
air              6.16          6.59             7.70
capacita        35.68         39.94            46.22
channel          2.03          2.04             1.96
doduc           28.28         28.43            30.41
fatigue          8.13          7.19            10.40 
gas_dyn         10.10          9.83            11.73
induct          20.17         20.76            48.76
linpk           15.42         15.65            15.69
mdbx            11.42         11.73            12.07
nf              27.99         28.60            29.39
protein         38.36         39.08            39.98
rnflow          27.28         28.19            31.90
test_fpu        11.43         11.17            11.50
tfft             1.91          1.95             2.16 
Geom. mean      12.72         12.62            14.71
Once vector_select() is implemented we can retest without -fno-tree-vectorize.
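The mean row of the runtime table is not an arithmetic average of each column; for column A, at least, it matches the geometric mean of the sixteen timings. A minimal Python sketch of that check follows, with the column A values copied from the table. The variable and function names are illustrative only and are not part of any benchmark tooling.

```python
import math

# Column A (stock gcc 4.5.4) run times copied from the table above.
runtimes_a = [9.58, 20.88, 6.16, 35.68, 2.03, 28.28, 8.13, 10.10,
              20.17, 15.42, 11.42, 27.99, 38.36, 27.28, 11.43, 1.91]

def geometric_mean(values):
    """Return the n-th root of the product, computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def arithmetic_mean(values):
    """Return the ordinary average of the values."""
    return sum(values) / len(values)

geo_a = geometric_mean(runtimes_a)
arith_a = arithmetic_mean(runtimes_a)
print(f"geometric mean:  {geo_a:.2f}")
print(f"arithmetic mean: {arith_a:.2f}")
```

A geometric mean keeps long-running benchmarks such as capacita and protein from dominating the summary, which is presumably why it is used here.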
Rotem, Nadav
2011-Jun-09  05:58 UTC
[LLVMdev] -fplugin-arg-dragonegg-enable-gcc-optzns status
Hi, 
Here's a quick update regarding the vector-select. I started committing my
vector-select patch[1] little by little. The general approach is to implement
Integer-Promotions legalization on vectors (rather than vector-widening). This
enables the widening of <4 x i1> masks into <4 x i32> masks, which
are used by the SIMD instruction set.
I started with some type-legalization refactoring. Next, I added a new flag to
enable the new kind of type-legalization and a few tests. After that, I added
the LegalizeTypes implementation of PromoteInteger for the new vector SDNodes
(buildvector, extract, etc.) and the changes to copyFromParts/copyToParts (needed
for argument passing and inter-basic-block variables). I added some tests for
arithmetic vector code.
My next patch is going to augment the load/store code to handle loading and
storing of the modified vectors. A <4 x i8> vector is promoted to <4 x i32> in
registers, but still needs to be saved as <4 x i8> in memory.
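As a rough illustration of the promotion described above, the following Python sketch models vector lanes as plain integers. A <4 x i8> value is widened lane by lane to 32 bits for use in registers and truncated back to one byte per lane when stored, and a <4 x i1> mask is widened to 32-bit lanes. This is only a model of the idea: the function names are hypothetical and do not correspond to LLVM APIs, and the choice of zero extension for data and all-ones lanes for true mask bits is an assumption made for the example.

```python
import struct

LANE_MASK = 0xFFFFFFFF  # each promoted lane is modelled as an unsigned 32-bit value

def promote_i8_lanes(lanes):
    """Model <4 x i8> -> <4 x i32>: keep the low 8 bits of each lane (zero-extended here)."""
    return [lane & 0xFF for lane in lanes]

def store_as_i8(promoted):
    """Model a truncating store: only the low byte of each 32-bit lane reaches memory."""
    return struct.pack("4B", *(lane & 0xFF for lane in promoted))

def load_as_i8(raw):
    """Model an extending load: read 4 bytes and widen each to a 32-bit lane."""
    return list(struct.unpack("4B", raw))

def widen_mask(mask_bits):
    """Model <4 x i1> -> <4 x i32>: each true bit becomes an all-ones 32-bit lane."""
    return [LANE_MASK if bit else 0 for bit in mask_bits]

def vector_select(mask, a, b):
    """Lane-wise select with a widened mask: (a & m) | (b & ~m)."""
    return [(x & m) | (y & ~m & LANE_MASK) for m, x, y in zip(mask, a, b)]

regs = promote_i8_lanes([1, 2, 250, 255])
regs = [(lane + 10) & LANE_MASK for lane in regs]  # arithmetic happens in 32-bit lanes
memory = store_as_i8(regs)                           # 4 bytes in memory, not 16
reloaded = load_as_i8(memory)
selected = vector_select(widen_mask([1, 0, 1, 0]), [1, 2, 3, 4], [5, 6, 7, 8])
print(len(memory), reloaded, selected)
```

The point of the model is that the in-register width and the in-memory width differ, so loads and stores have to extend and truncate rather than copy the promoted lanes verbatim.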
After this patch goes in, we can do two things. First, we can consider removing
the special flag and enabling the new legalization strategy for all code.
Second, we can implement the vector select. The vector select part would be
easy. I am not sure how long it would take me to finish this patch, because I am
only working on this in the late evenings.
Cheers, 
Nadav
[1] http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20110502/120445.html
_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
Duncan Sands
2011-Jun-09  07:51 UTC
[LLVMdev] -fplugin-arg-dragonegg-enable-gcc-optzns status
Hi Jack, thanks for these numbers. Can you also please measure compile times?
I'm thinking of enabling gcc optimizations by default, but I don't want to
increase compile times, which means choosing a value for the
-fplugin-arg-dragonegg-llvm-ir-optimize option that is low enough to get good
compile times, yet high enough to get fast code. It would be great if you could
play around with this to find a good choice.
Best wishes, Duncan.
Jack Howarth
2011-Jun-09  13:18 UTC
[LLVMdev] -fplugin-arg-dragonegg-enable-gcc-optzns status
Duncan,
    Below are the tabulated compile times and executable sizes.
A) gcc 4.5.4svn using -msse3 -ffast-math -O3 -fno-tree-vectorize
B) gcc 4.5.4svn/dragonegg using -msse3 -ffast-math -O3 -fno-tree-vectorize -fplugin-arg-dragonegg-enable-gcc-optzns
C) gcc 4.5.4svn/dragonegg using -msse3 -ffast-math -O3 -fno-tree-vectorize
Compile time (seconds)
Benchmark     A) stock     B) gcc 4.5.4/       C) gcc 4.5.4/
              gcc 4.5.4    dragonegg/optzns    dragonegg
ac                0.61        1.65           0.32
aermod           31.24       25.83          21.02 
air               1.74        1.49           0.81
capacita          0.83        0.80           0.44
channel           0.34        0.33           0.25
doduc             3.09        2.63           1.63
fatigue           1.04        1.08           0.84
gas_dyn           0.91        0.95           0.75
induct            3.18        2.57           1.73
linpk             0.34        0.30           0.21
mdbx              1.08        1.01           0.59
nf                0.39        0.41           0.28 
protein           1.55        1.29           0.97
rnflow            1.76        1.73           1.26
test_fpu          1.38        1.40           1.05
tfft              0.31        0.28           0.19
Mean              3.11        2.73           2.02
Executable size (bytes)
Benchmark     A) stock     B) gcc 4.5.4/       C) gcc 4.5.4/
              gcc 4.5.4    dragonegg/optzns    dragonegg
ac              26344        30896           26704
aermod        1145924      1043816         1052056
air             57404        57700           53532
capacita        40864        41008           37064
channel         22448        22664           22664
doduc          127340       124108          120124
fatigue         61152        65352           65664
gas_dyn        647864        58768 !!!       59024
induct         162360       180440          175312
linpk           18112        18848           18864
mdbx            53464        57652           49516
nf              22560        23784           24080 
protein         74320        74440           74816
rnflow          66040        71488           71648
test_fpu        52624        58224           58320
tfft            18416        18456           18600
Based on the mean compile times, dragonegg with optzns takes about 35% longer
to compile than stock dragonegg (put the other way, stock dragonegg needs about
26% less time) but about 12% less time than stock gcc 4.5.4. The most
interesting executable size difference is gas_dyn, which is fastest with optzns
but is 11x larger with stock gcc 4.5.4 than with either stock dragonegg or
dragonegg with optzns. This is likely much improved in gcc 4.6 with the new
-fwhole-file default.
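The percentages quoted for compile time depend on which configuration is taken as the baseline. A small Python sketch, using only the mean compile times and the gas_dyn sizes from the tables above, shows both conventions. The function and variable names are illustrative and not taken from any existing tool.

```python
# Mean compile times in seconds, copied from the compile-time table.
mean_gcc = 3.11        # A) stock gcc 4.5.4
mean_optzns = 2.73     # B) dragonegg with enable-gcc-optzns
mean_dragonegg = 2.02  # C) stock dragonegg

def percent_less_time(faster, slower):
    """How much less time `faster` takes, as a percentage of `slower`."""
    return 100.0 * (slower - faster) / slower

def percent_more_time(slower, faster):
    """How much more time `slower` takes, as a percentage of `faster`."""
    return 100.0 * (slower - faster) / faster

dragonegg_saving = percent_less_time(mean_dragonegg, mean_optzns)
optzns_overhead = percent_more_time(mean_optzns, mean_dragonegg)
optzns_saving_vs_gcc = percent_less_time(mean_optzns, mean_gcc)

# gas_dyn executable sizes in bytes: stock gcc versus dragonegg with optzns.
gas_dyn_ratio = 647864 / 58768

print(f"stock dragonegg takes {dragonegg_saving:.0f}% less time than optzns")
print(f"optzns takes {optzns_overhead:.0f}% more time than stock dragonegg")
print(f"optzns takes {optzns_saving_vs_gcc:.0f}% less time than stock gcc")
print(f"gas_dyn is {gas_dyn_ratio:.1f}x larger with stock gcc")
```

Stating the baseline explicitly avoids the ambiguity between "26% less time" and "35% more time", which describe the same pair of measurements.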