Jack Howarth
2011-Jun-09 05:12 UTC
[LLVMdev] -fplugin-arg-dragonegg-enable-gcc-optzns status
Current dragonegg svn has all of the -fplugin-arg-dragonegg-enable-gcc-optzns bugs for usage with -ffast-math -O3 addressed except for those related to PR2314. Using the -fno-tree-vectorize option, we can evaluate the current state of -fplugin-arg-dragonegg-enable-gcc-optzns with the Polyhedron 2005 benchmarks compared to stock dragonegg and stock gcc 4.5.4. The runtime benchmarks below show that we average slightly faster than stock gcc 4.5.4 and significantly faster than stock dragonegg through the use of -fplugin-arg-dragonegg-enable-gcc-optzns.

x86_64 darwin

A) gcc 4.5.4svn using -msse3 -ffast-math -O3 -fno-tree-vectorize
B) gcc 4.5.4svn/dragonegg using -msse3 -ffast-math -O3 -fno-tree-vectorize -fplugin-arg-dragonegg-enable-gcc-optzns
C) gcc 4.5.4svn/dragonegg using -msse3 -ffast-math -O3 -fno-tree-vectorize

Benchmark    A) stock     B) gcc 4.5.4/      C) gcc 4.5.4/
             gcc 4.5.4    dragonegg/optzns   dragonegg

ac              9.58         9.13              12.30
aermod         20.88        16.10              17.62
air             6.16         6.59               7.70
capacita       35.68        39.94              46.22
channel         2.03         2.04               1.96
doduc          28.28        28.43              30.41
fatigue         8.13         7.19              10.40
gas_dyn        10.10         9.83              11.73
induct         20.17        20.76              48.76
linpk          15.42        15.65              15.69
mdbx           11.42        11.73              12.07
nf             27.99        28.60              29.39
protein        38.36        39.08              39.98
rnflow         27.28        28.19              31.90
test_fpu       11.43        11.17              11.50
tfft            1.91         1.95               2.16

Mean           12.72        12.62              14.71

Once vector_select() is implemented we can retest without -fno-tree-vectorize.
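The Mean row in the runtime table is not the arithmetic average of the columns (that would be roughly 17 for column A). It is consistent with a geometric mean. A minimal Python sketch to recheck it is below; the table values are copied from the message, while the names RUNTIMES, geometric_mean and column_means are made up for this illustration.

```python
import math

# Runtimes copied from the table in the message, in column order A, B, C.
RUNTIMES = {
    "ac": (9.58, 9.13, 12.30),
    "aermod": (20.88, 16.10, 17.62),
    "air": (6.16, 6.59, 7.70),
    "capacita": (35.68, 39.94, 46.22),
    "channel": (2.03, 2.04, 1.96),
    "doduc": (28.28, 28.43, 30.41),
    "fatigue": (8.13, 7.19, 10.40),
    "gas_dyn": (10.10, 9.83, 11.73),
    "induct": (20.17, 20.76, 48.76),
    "linpk": (15.42, 15.65, 15.69),
    "mdbx": (11.42, 11.73, 12.07),
    "nf": (27.99, 28.60, 29.39),
    "protein": (38.36, 39.08, 39.98),
    "rnflow": (27.28, 28.19, 31.90),
    "test_fpu": (11.43, 11.17, 11.50),
    "tfft": (1.91, 1.95, 2.16),
}


def geometric_mean(values):
    """Return the geometric mean of a sequence of positive numbers."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def column_means(table):
    """Return one geometric mean per column (A, B, C)."""
    return [geometric_mean(column) for column in zip(*table.values())]


if __name__ == "__main__":
    for label, mean in zip("ABC", column_means(RUNTIMES)):
        print(f"{label}: {mean:.2f}")
```

Because a geometric mean weights relative changes equally, the large induct regression in column C (48.76 against about 20) counts as one ratio among sixteen rather than dominating the summary.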
Rotem, Nadav
2011-Jun-09 05:58 UTC
[LLVMdev] -fplugin-arg-dragonegg-enable-gcc-optzns status
Hi,

Here's a quick update regarding the vector-select. I started committing my vector-select patch [1] little by little. The general approach is to implement integer-promotion legalization on vectors (rather than vector-widening). This enables the widening of <4 x i1> masks into <4 x i32> masks, which are used by the SIMD instruction set.

I started with some type-legalization refactoring. Next, I added a new flag to enable the new kind of type-legalization and a few tests. After that, I added the LegalizeTypes implementation of PromoteInteger for the new vector SDNodes (buildvector, extract, etc.) and the changes to copyFromParts/copyToParts (needed for argument passing and inter-basic-block variables). I added some tests for arithmetic vector code.

My next patch is going to be augmenting the load/store code for loading and storing of the modified vectors. A <4 x i8> vector is promoted to <4 x i32> in registers, but still needs to be saved as <4 x i8> in memory.

After this patch goes in, we can do two things. First, we can consider removing the special flag and enabling the new legalization strategy for all code. Second, we can implement the vector select. The vector select part would be easy.

I am not sure how long it would take me to finish this patch, because I am only working on this in the late evenings.

Cheers,
Nadav

[1] - http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20110502/120445.html

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
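To make the mask widening in Nadav's message concrete, here is a small Python model of the idea. It is only a sketch and not LLVM code: it assumes the <4 x i1> mask is sign-extended so that a true lane becomes all ones, which is what allows a bitwise blend, and the helper names (widen_mask, vector_select, truncate_to_i8) are hypothetical.

```python
LANE_MASK = 0xFFFFFFFF  # all ones in a 32-bit lane


def widen_mask(mask_i1):
    """Model promoting a <4 x i1> mask to <4 x i32>.

    Assumption: each lane is sign-extended, so 1 -> 0xFFFFFFFF and 0 -> 0.
    """
    return [LANE_MASK if bit else 0 for bit in mask_i1]


def vector_select(mask_i32, a, b):
    """Lane-wise select done as a bitwise blend: (mask & a) | (~mask & b)."""
    return [
        (m & x) | ((~m & LANE_MASK) & y)
        for m, x, y in zip(mask_i32, a, b)
    ]


def truncate_to_i8(lanes_i32):
    """Model storing promoted <4 x i32> lanes back to memory as <4 x i8>."""
    return [v & 0xFF for v in lanes_i32]


if __name__ == "__main__":
    mask = widen_mask([1, 0, 1, 0])
    print(vector_select(mask, [1, 2, 3, 4], [10, 20, 30, 40]))
    print(truncate_to_i8([300, 7, 255, 256]))
```

The truncation step mirrors the point about memory layout: a value computed in a promoted 32-bit lane only keeps its low 8 bits when written back as i8.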
Duncan Sands
2011-Jun-09 07:51 UTC
[LLVMdev] -fplugin-arg-dragonegg-enable-gcc-optzns status
Hi Jack, thanks for these numbers. Can you also please measure compile times? I'm thinking of enabling gcc optimizations by default, but I don't want to increase compile times, which means choosing a value for the -fplugin-arg-dragonegg-llvm-ir-optimize option that is low enough to get good compile times, yet high enough to get fast code. It would be great if you could play around with this to find a good choice.

Best wishes, Duncan.
Jack Howarth
2011-Jun-09 13:18 UTC
[LLVMdev] -fplugin-arg-dragonegg-enable-gcc-optzns status
Duncan,

Below are the tabulated compile times and executable sizes.

A) gcc 4.5.4svn using -msse3 -ffast-math -O3 -fno-tree-vectorize
B) gcc 4.5.4svn/dragonegg using -msse3 -ffast-math -O3 -fno-tree-vectorize -fplugin-arg-dragonegg-enable-gcc-optzns
C) gcc 4.5.4svn/dragonegg using -msse3 -ffast-math -O3 -fno-tree-vectorize

Compile time (seconds)

Benchmark    A) stock     B) gcc 4.5.4/      C) gcc 4.5.4/
             gcc 4.5.4    dragonegg/optzns   dragonegg

ac              0.61         1.65               0.32
aermod         31.24        25.83              21.02
air             1.74         1.49               0.81
capacita        0.83         0.80               0.44
channel         0.34         0.33               0.25
doduc           3.09         2.63               1.63
fatigue         1.04         1.08               0.84
gas_dyn         0.91         0.95               0.75
induct          3.18         2.57               1.73
linpk           0.34         0.30               0.21
mdbx            1.08         1.01               0.59
nf              0.39         0.41               0.28
protein         1.55         1.29               0.97
rnflow          1.76         1.73               1.26
test_fpu        1.38         1.40               1.05
tfft            0.31         0.28               0.19

mean            3.11         2.73               2.02

Executable size (bytes)

Benchmark    A) stock     B) gcc 4.5.4/      C) gcc 4.5.4/
             gcc 4.5.4    dragonegg/optzns   dragonegg

ac             26344        30896              26704
aermod       1145924      1043816            1052056
air            57404        57700              53532
capacita       40864        41008              37064
channel        22448        22664              22664
doduc         127340       124108             120124
fatigue        61152        65352              65664
gas_dyn       647864        58768 !!!          59024
induct        162360       180440             175312
linpk          18112        18848              18864
mdbx           53464        57652              49516
nf             22560        23784              24080
protein        74320        74440              74816
rnflow         66040        71488              71648
test_fpu       52624        58224              58320
tfft           18416        18456              18600

On the means, compiling with optzns takes about 35% longer than stock dragonegg (2.73 vs. 2.02 seconds; put the other way, stock dragonegg needs 26% less time) but is 12% faster than stock gcc 4.5.4 (3.11 seconds). The most interesting executable size difference is gas_dyn, which is fastest with optzns but whose executable is 11x larger with stock gcc 4.5.4 than with either stock dragonegg or dragonegg with optzns. This is likely much improved in gcc 4.6 with the new -fwhole-file default.
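The relative compile-time figures and the gas_dyn size ratio in Jack's last message depend on which configuration is taken as the baseline, so it may help to recompute them from the tabulated means. The following Python sketch does that; the numbers are copied from the tables, and the function names are invented for this illustration.

```python
# Mean compile times (seconds) copied from the compile-time table.
MEAN_GCC = 3.11        # A) stock gcc 4.5.4
MEAN_OPTZNS = 2.73     # B) dragonegg with enable-gcc-optzns
MEAN_DRAGONEGG = 2.02  # C) stock dragonegg

# gas_dyn executable sizes (bytes) copied from the size table.
GAS_DYN_GCC = 647864
GAS_DYN_OPTZNS = 58768
GAS_DYN_DRAGONEGG = 59024


def percent_longer(time, baseline):
    """How much longer `time` is than `baseline`, as a percentage of baseline."""
    return (time / baseline - 1.0) * 100.0


def percent_shorter(time, baseline):
    """How much shorter `time` is than `baseline`, as a percentage of baseline."""
    return (1.0 - time / baseline) * 100.0


if __name__ == "__main__":
    # optzns relative to stock dragonegg: roughly 35% longer.
    print(f"{percent_longer(MEAN_OPTZNS, MEAN_DRAGONEGG):.1f}% longer")
    # stock dragonegg relative to optzns: roughly 26% shorter.
    print(f"{percent_shorter(MEAN_DRAGONEGG, MEAN_OPTZNS):.1f}% shorter")
    # optzns relative to stock gcc: roughly 12% shorter.
    print(f"{percent_shorter(MEAN_OPTZNS, MEAN_GCC):.1f}% shorter")
    # gas_dyn size ratio, stock gcc vs. the two dragonegg builds: about 11x.
    print(f"{GAS_DYN_GCC / GAS_DYN_OPTZNS:.1f}x, {GAS_DYN_GCC / GAS_DYN_DRAGONEGG:.1f}x")
```

The 26% figure comes out when the optzns time is the baseline, while measuring against stock dragonegg gives about 35%; both describe the same 2.73 vs. 2.02 second gap.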