Ahmed Bougacha
2015-Feb-26 00:57 UTC
[LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?
Hi all,

I've started looking at the GlobalMerge pass, enabled by default on ARM and AArch64. I think we should reconsider that, at least for AArch64.

As is, the pass just merges all globals together, in groups of 4KB (AArch64, 128B on ARM).

At the time it was enabled, the general thinking was "it's almost free, it doesn't affect performance much, we might as well use it". Now, it's preventing some link-time optimizations (as acknowledged in one of the FIXMEs).

-- Performance impact
Overall, it isn't that profitable on the test-suite, and actually degrades performance on a lot of other - "non-benchmark" - projects I tried (where the main reason to use a global is file- or function- static variables, only accessed through a single getter function).

Across several runs on the entire test-suite, when disabling the pass, I measured:
without LTO, a -0.19% geomean improvement
with LTO, a +0.11% geomean regression.

As for just SPEC2006, there are two big regressions: 400.perlbench (10.6% w/ LTO, 2.7% w/o) and 471.omnetpp (2.3% w/, 3.9% w/o).

Numbers are attached.

-- A way forward
One obvious way to improve it is: look at uses of globals, and try to form sets of globals commonly used together. The tricky part is to define heuristics for "commonly". Also, the pass then becomes much more expensive. I'm currently looking into improving it, and will report if I come up with a good solution. But this shouldn't stop us from disabling it, for now.

Also, the pass seems like a good candidate for -O3/CodeGenOpt::Aggressive. However, the latter is implied by LTO, which IMO shouldn't include these not-always-profitable optimizations. That's another problem though.

Right now, I think we should disable the pass by default, until it's deemed profitable enough.

-Ahmed
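To make the "sets of globals commonly used together" idea a bit more concrete, here is a minimal sketch in Python. It is not the GlobalMerge pass and not a proposed implementation. The function name `co_use_groups`, the `min_shared` threshold, and the input format (a map from function name to the globals it references) are all assumptions made up for illustration. The threshold is just one possible reading of "commonly"; a real heuristic would presumably also weigh size limits and access frequency.

```python
from collections import Counter
from itertools import combinations


def co_use_groups(uses_by_function, min_shared=2):
    """Group globals referenced together in at least `min_shared` functions.

    `uses_by_function` maps a (hypothetical) function name to the globals it
    references. Globals that never reach the threshold with any other global
    are left out, i.e. they would stay unmerged.
    """
    # Count, for every pair of globals, how many functions use both.
    pair_counts = Counter()
    for used in uses_by_function.values():
        for a, b in combinations(sorted(set(used)), 2):
            pair_counts[(a, b)] += 1

    # Union-find over globals: join pairs that are co-used often enough.
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), count in pair_counts.items():
        if count >= min_shared:
            parent[find(a)] = find(b)

    # Collect the resulting sets, dropping singletons.
    groups = {}
    for g in list(parent):
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values() if len(members) > 1)


# Hypothetical module: three globals share users, two sit behind a single getter.
uses = {
    "parse": ["buf", "pos", "len"],
    "lex": ["buf", "pos"],
    "emit": ["buf", "len"],
    "get_cache": ["cache"],
    "log": ["verbose"],
}
print(co_use_groups(uses))  # [['buf', 'len', 'pos']]
```

In this toy input, `cache` and `verbose` are each touched by one function only, which is the "single getter" pattern mentioned above, so they end up in no group. Counting pairs is quadratic in the number of globals a function uses, which hints at why such a pass gets more expensive.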
Ahmed Bougacha
2015-Feb-26 01:12 UTC
[LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?
With the numbers!

-Ahmed
-------------- next part --------------
Name Prev Current % σ
Geometric_Mean 7.3031 7.3114 0.11% -
External/SPEC/CINT2006/400_perlbench/400_perlbench 13.8190 15.2806 10.58% 0.0295
MultiSource/Benchmarks/nbench/nbench 25.8387 28.1101 8.79% 1.6962
MultiSource/Benchmarks/Ptrdist/yacr2/yacr2 1.5316 1.6158 5.50% 0.0014
External/SPEC/CINT2000/164_gzip/164_gzip 21.9532 22.8344 4.01% 0.0074
External/SPEC/CFP2000/183_equake/183_equake 11.2462 11.6433 3.53% 0.0444
MultiSource/Applications/siod/siod 4.3631 4.5159 3.50% 0.0090
MultiSource/Applications/aha/aha 3.4522 3.5723 3.48% 0.0016
MultiSource/Benchmarks/sim/sim 19.7369 20.4045 3.38% 0.0068
External/SPEC/CINT2006/471_omnetpp/471_omnetpp 1.2566 1.2855 2.30% 0.0027
SingleSource/Benchmarks/Polybench/linear-algebra/solvers/gramschmidt/gramschmidt 19.8913 20.2556 1.83% 0.0200
MultiSource/Benchmarks/MallocBench/espresso/espresso 1.4505 1.4696 1.32% 0.0049
SingleSource/Benchmarks/Misc-C++/Large/ray 5.5605 5.6181 1.04% 0.0064
MultiSource/Benchmarks/VersaBench/dbms/dbms 8.6059 8.6775 0.83% 0.0171
External/SPEC/CINT2000/175_vpr/175_vpr 5.4864 5.5294 0.78% 0.0121
SingleSource/Benchmarks/BenchmarkGame/Large/fasta 3.9053 3.9339 0.73% 0.0205
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/2mm/2mm 62.4370 62.8789 0.71% 0.0465
MultiSource/Benchmarks/Ptrdist/anagram/anagram 2.1400 2.1536 0.64% 0.0037
SingleSource/Benchmarks/Misc/fbench 3.8568 3.8805 0.61% 0.0012
MultiSource/Applications/minisat/minisat 17.7843 17.8932 0.61% 0.0186
SingleSource/Benchmarks/Misc/ffbench 2.3597 2.3737 0.59% 0.0102
SingleSource/Benchmarks/McGill/chomp 2.5608 2.5755 0.57% 0.0097
SingleSource/Benchmarks/Misc-C++-EH/spirit 79.2880 79.7056 0.53% 0.2736
External/SPEC/CFP2000/179_art/179_art 1.8351 1.8438 0.47% 0.0125
MultiSource/Benchmarks/Prolangs-C++/life/life 6.4227 6.4505 0.43% 0.0025
MultiSource/Benchmarks/Ptrdist/ks/ks 2.1927 2.2014 0.40% 0.0011
SingleSource/Benchmarks/Shootout/heapsort 5.9351 5.9578 0.38% 0.0050
MultiSource/Benchmarks/MiBench/automotive-basicmath/automotive-basicmath 1.2302 1.2347 0.37% 0.0034
External/SPEC/CINT2000/253_perlbmk/253_perlbmk 13.0019 13.0500 0.37% 0.0367
MultiSource/Benchmarks/MallocBench/cfrac/cfrac 4.9374 4.9549 0.35% 0.0041
External/SPEC/CINT2000/181_mcf/181_mcf 16.0710 16.1268 0.35% 0.0175
MultiSource/Applications/SPASS/SPASS 24.9780 25.0625 0.34% 0.0538
SingleSource/Benchmarks/Shootout-C++/hash 2.0198 2.0262 0.32% 0.0041
External/SPEC/CINT2006/403_gcc/403_gcc 3.5535 3.5646 0.31% 0.0029
MultiSource/Benchmarks/Olden/tsp/tsp 2.8420 2.8503 0.29% 0.0115
SingleSource/Benchmarks/Shootout-C++/heapsort 5.9312 5.9463 0.25% 0.0062
MultiSource/Applications/lua/lua 46.1691 46.2676 0.21% 0.0345
MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset 1.1586 1.1608 0.19% 0.0019
MultiSource/Applications/lemon/lemon 5.0917 5.1007 0.18% 0.0105
External/SPEC/CINT2000/254_gap/254_gap 3.9326 3.9384 0.15% 0.0016
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/doitgen/doitgen 44.8336 44.8965 0.14% 0.0153
External/SPEC/CFP2000/177_mesa/177_mesa 5.0751 5.0824 0.14% 0.0065
SingleSource/Benchmarks/Shootout/fib2 5.1268 5.1335 0.13% 0.0009
SingleSource/Benchmarks/Polybench/stencils/seidel-2d/seidel-2d 21.4724 21.5013 0.13% 0.0111
SingleSource/Benchmarks/BenchmarkGame/spectral-norm 1.2974 1.2989 0.12% 0.0012
MultiSource/Applications/spiff/spiff 7.9347 7.9442 0.12% 0.0045
External/SPEC/CINT2006/401_bzip2/401_bzip2 5.4607 5.4674 0.12% 0.0129
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gemm/gemm 42.9808 43.0270 0.11% 0.0458
SingleSource/Benchmarks/CoyoteBench/almabench 14.9431 14.9590 0.11% 0.0016
MultiSource/Benchmarks/mafft/pairlocalalign 43.3389 43.3860 0.11% 0.0189
MultiSource/Benchmarks/VersaBench/beamformer/beamformer 2.0441 2.0464 0.11% 0.0016
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trmm/trmm 25.7529 25.7798 0.10% 0.0232
SingleSource/Benchmarks/Misc-C++/Large/sphereflake 15.7138 15.7301 0.10% 0.0021
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/syrk/syrk 28.8875 28.9144 0.09% 0.0158
SingleSource/Benchmarks/Misc/perlin 3.6677 3.6709 0.09% 0.0012
SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant 5.2340 5.2388 0.09% 0.0009
MultiSource/Benchmarks/Trimaran/enc-md5/enc-md5 3.3993 3.4025 0.09% 0.0009
MultiSource/Benchmarks/TSVC/ControlFlow-flt/ControlFlow-flt 7.5897 7.5965 0.09% 0.0011
SingleSource/Benchmarks/Shootout-C++/ary3 3.4389 3.4417 0.08% 0.0034
MultiSource/Applications/viterbi/viterbi 3.3764 3.3791 0.08% 0.0124
SingleSource/Benchmarks/Misc/dt 2.1518 2.1533 0.07% 0.0004
MultiSource/Benchmarks/Olden/bh/bh 5.2680 5.2718 0.07% 0.0022
MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk 12.5760 12.5847 0.07% 0.0054
MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk 27.7489 27.7697 0.07% 0.0262
SingleSource/Benchmarks/Shootout/matrix 8.1274 8.1321 0.06% 0.0064
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/cholesky/cholesky 22.5187 22.5329 0.06% 0.0009
SingleSource/Benchmarks/Misc-C++/bigfib 1.0933 1.0940 0.06% 0.0025
MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt 3.0828 3.0847 0.06% 0.0029
SingleSource/Benchmarks/Shootout-C++/fibo 5.1254 5.1281 0.05% 0.0043
SingleSource/Benchmarks/Polybench/stencils/fdtd-2d/fdtd-2d 60.3410 60.3728 0.05% 0.0186
SingleSource/Benchmarks/Polybench/datamining/correlation/correlation 28.9262 28.9399 0.05% 0.0249
SingleSource/Benchmarks/Misc/oourafft 7.3429 7.3463 0.05% 0.0023
External/SPEC/CINT2000/255_vortex/255_vortex 4.0073 4.0095 0.05% 0.0167
SingleSource/UnitTests/Vector/build2 4.8375 4.8393 0.04% 0.0026
MultiSource/Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDataFlow-flt 4.3792 4.3808 0.04% 0.0030
MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt 5.9010 5.9035 0.04% 0.0031
SingleSource/UnitTests/Vectorizer/gcc-loops 7.0147 7.0165 0.03% 0.0018
SingleSource/Benchmarks/Polybench/stencils/adi/adi 28.9532 28.9609 0.03% 0.0031
SingleSource/Benchmarks/CoyoteBench/huffbench 30.0285 30.0367 0.03% 0.0053
SingleSource/Benchmarks/CoyoteBench/fftbench 2.7896 2.7905 0.03% 0.0167
SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding 7.3225 7.3247 0.03% 0.0025
MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt 8.9059 8.9088 0.03% 0.0031
MultiSource/Benchmarks/TSVC/InductionVariable-dbl/InductionVariable-dbl 12.0051 12.0085 0.03% 0.0076
MultiSource/Benchmarks/SciMark2-C/scimark2 93.3103 93.3358 0.03% 0.1591
MultiSource/Benchmarks/7zip/7zip-benchmark 20.6589 20.6641 0.03% 0.0124
External/SPEC/CFP2000/188_ammp/188_ammp 37.1753 37.1857 0.03% 0.0110
SingleSource/Benchmarks/Shootout/hash 11.4575 11.4599 0.02% 0.0386
SingleSource/Benchmarks/Misc/himenobmtxpa 4.1632 4.1642 0.02% 0.0076
SingleSource/Benchmarks/Misc/flops-6 2.8494 2.8499 0.02% 0.0013
MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt 11.7863 11.7886 0.02% 0.0015
MultiSource/Benchmarks/TSVC/NodeSplitting-dbl/NodeSplitting-dbl 10.9588 10.9613 0.02% 0.0022
MultiSource/Benchmarks/TSVC/GlobalDataFlow-dbl/GlobalDataFlow-dbl 8.2816 8.2834 0.02% 0.0035
SingleSource/Benchmarks/Shootout/ary3 3.4459 3.4461 0.01% 0.0009
MultiSource/Benchmarks/TSVC/ControlLoops-dbl/ControlLoops-dbl 7.8174 7.8180 0.01% 0.0049
External/SPEC/CFP2006/444_namd/444_namd 24.6662 24.6697 0.01% 0.0101
SingleSource/Benchmarks/Polybench/stencils/jacobi-2d-imper/jacobi-2d-imper 20.6729 20.6731 0.00% 0.0168
SingleSource/Benchmarks/Misc/flops-8 4.4988 4.4986 -0.00% 0.0013
SingleSource/Benchmarks/Misc/flops-4 2.3163 2.3162 -0.00% 0.0006
SingleSource/Benchmarks/Misc/flops-3 5.1341 5.1340 -0.00% 0.0023
MultiSource/Benchmarks/TSVC/Searching-flt/Searching-flt 7.0536 7.0533 -0.00% 0.0016
MultiSource/Benchmarks/TSVC/LoopRerolling-flt/LoopRerolling-flt 9.4046 9.4042 -0.00% 0.0038
MultiSource/Applications/SIBsim4/SIBsim4 5.7070 5.7068 -0.00% 0.0337
SingleSource/Benchmarks/Shootout/random 6.4056 6.4050 -0.01% 0.0018
SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog/dynprog 1.5024 1.5023 -0.01% 0.0025
SingleSource/Benchmarks/Misc/fp-convert 6.3689 6.3682 -0.01% 0.0020
SingleSource/Benchmarks/SmallPT/smallpt 17.3723 17.3689 -0.02% 0.0185
SingleSource/Benchmarks/Shootout-C++/random 6.4061 6.4048 -0.02% 0.0038
SingleSource/Benchmarks/Misc-C++/stepanov_v1p2 31.6586 31.6512 -0.02% 0.0062
MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes 5.7215 5.7205 -0.02% 0.0050
MultiSource/Benchmarks/VersaBench/8b10b/8b10b 4.5487 4.5477 -0.02% 0.0033
MultiSource/Benchmarks/Trimaran/enc-rc4/enc-rc4 1.8434 1.8430 -0.02% 0.0008
MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl 11.7529 11.7501 -0.02% 0.0009
MultiSource/Benchmarks/TSVC/Expansion-flt/Expansion-flt 5.2616 5.2606 -0.02% 0.0017
MultiSource/Benchmarks/Olden/bisort/bisort 1.5467 1.5464 -0.02% 0.0067
MultiSource/Benchmarks/NPB-serial/is/is 23.2431 23.2382 -0.02% 0.0090
MultiSource/Benchmarks/MiBench/telecomm-CRC32/telecomm-CRC32 8.5092 8.5075 -0.02% 0.0033
External/SPEC/CINT2000/197_parser/197_parser 5.1147 5.1137 -0.02% 0.0015
SingleSource/Benchmarks/Shootout/methcall 6.4458 6.4437 -0.03% 0.0075
SingleSource/Benchmarks/Misc/lowercase 13.5456 13.5417 -0.03% 0.0066
SingleSource/Benchmarks/Misc/flops-1 2.9235 2.9227 -0.03% 0.0031
SingleSource/Benchmarks/Misc/flops 11.4648 11.4617 -0.03% 0.0030
MultiSource/Benchmarks/TSVC/NodeSplitting-flt/NodeSplitting-flt 8.2453 8.2432 -0.03% 0.0060
MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt 7.3856 7.3834 -0.03% 0.0055
MultiSource/Benchmarks/TSVC/Expansion-dbl/Expansion-dbl 8.1743 8.1720 -0.03% 0.0047
MultiSource/Benchmarks/TSVC/Equivalencing-flt/Equivalencing-flt 4.1145 4.1134 -0.03% 0.0039
MultiSource/Benchmarks/Olden/em3d/em3d 4.7882 4.7869 -0.03% 0.0074
MultiSource/Benchmarks/TSVC/Symbolics-dbl/Symbolics-dbl 6.1398 6.1372 -0.04% 0.0019
MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl 15.4533 15.4465 -0.04% 0.0078
MultiSource/Benchmarks/TSVC/Equivalencing-dbl/Equivalencing-dbl 4.9759 4.9740 -0.04% 0.0022
External/SPEC/CFP2006/447_dealII/447_dealII 46.2282 46.2074 -0.04% 0.0252
SingleSource/Benchmarks/Shootout-C++/lists 36.0278 36.0094 -0.05% 0.0632
SingleSource/Benchmarks/Polybench/datamining/covariance/covariance 31.2489 31.2321 -0.05% 0.0126
SingleSource/Benchmarks/Misc-C++/mandel-text 5.2266 5.2240 -0.05% 0.0007
SingleSource/Benchmarks/BenchmarkGame/n-body 2.5714 2.5700 -0.05% 0.0043
SingleSource/Benchmarks/Adobe-C++/functionobjects 6.7637 6.7604 -0.05% 0.0103
MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl 7.0214 7.0177 -0.05% 0.0039
MultiSource/Benchmarks/TSVC/Recurrences-flt/Recurrences-flt 13.4103 13.4037 -0.05% 0.0046
MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl 8.2158 8.2116 -0.05% 0.0027
MultiSource/Benchmarks/TSVC/LoopRerolling-dbl/LoopRerolling-dbl 11.7576 11.7519 -0.05% 0.0049
MultiSource/Benchmarks/TSVC/ControlLoops-flt/ControlLoops-flt 5.2467 5.2439 -0.05% 0.0063
SingleSource/Benchmarks/Misc/flops-5 3.1762 3.1743 -0.06% 0.0007
SingleSource/Benchmarks/Dhrystone/fldry 1.1614 1.1607 -0.06% 0.0005
SingleSource/Benchmarks/BenchmarkGame/nsieve-bits 1.7750 1.7740 -0.06% 0.0017
External/SPEC/CFP2006/433_milc/433_milc 33.8436 33.8237 -0.06% 0.0363
SingleSource/Benchmarks/Shootout-C++/matrix 8.9047 8.8989 -0.07% 0.0030
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/syr2k/syr2k 34.5594 34.5358 -0.07% 0.0247
MultiSource/Benchmarks/TSVC/IndirectAddressing-flt/IndirectAddressing-flt 8.8731 8.8669 -0.07% 0.0070
MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl 8.4914 8.4855 -0.07% 0.0100
MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk 16.6098 16.5980 -0.07% 0.0228
MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000 6.0691 6.0649 -0.07% 0.0045
External/SPEC/CINT2000/256_bzip2/256_bzip2 16.4431 16.4322 -0.07% 0.0243
SingleSource/Benchmarks/Polybench/medley/floyd-warshall/floyd-warshall 24.8368 24.8162 -0.08% 0.0274
SingleSource/Benchmarks/Misc/whetstone 3.3019 3.2992 -0.08% 0.0021
MultiSource/Benchmarks/TSVC/LinearDependence-dbl/LinearDependence-dbl 8.4315 8.4250 -0.08% 0.0036
MultiSource/Benchmarks/TSVC/IndirectAddressing-dbl/IndirectAddressing-dbl 10.9391 10.9305 -0.08% 0.0118
SingleSource/Benchmarks/Misc/flops-2 2.0506 2.0486 -0.10% 0.0009
MultiSource/Benchmarks/TSVC/StatementReordering-flt/StatementReordering-flt 7.2996 7.2923 -0.10% 0.0048
MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt 7.6079 7.6006 -0.10% 0.0060
SingleSource/Benchmarks/Shootout/sieve 10.2211 10.2097 -0.11% 0.0022
SingleSource/Benchmarks/Misc-C++/stepanov_container 12.4799 12.4667 -0.11% 0.0238
MultiSource/Benchmarks/TSVC/LoopRestructuring-dbl/LoopRestructuring-dbl 8.5261 8.5170 -0.11% 0.0169
SingleSource/Benchmarks/Stanford/FloatMM 1.0081 1.0069 -0.12% 0.0009
SingleSource/Benchmarks/Linpack/linpack-pc 10.4873 10.4747 -0.12% 0.0072
SingleSource/Benchmarks/Adobe-C++/stepanov_abstraction 15.8164 15.7968 -0.12% 0.0128
MultiSource/Benchmarks/Trimaran/netbench-url/netbench-url 8.8264 8.8162 -0.12% 0.0134
MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl 9.0248 9.0136 -0.12% 0.0070
SingleSource/Benchmarks/Shootout-C++/methcall 10.6864 10.6724 -0.13% 0.0362
SingleSource/Benchmarks/Misc/flops-7 3.7619 3.7570 -0.13% 0.0033
MultiSource/Benchmarks/Trimaran/netbench-crc/netbench-crc 2.2602 2.2572 -0.13% 0.0017
SingleSource/Benchmarks/CoyoteBench/lpbench 7.5007 7.4897 -0.15% 0.0128
MultiSource/Benchmarks/Trimaran/enc-3des/enc-3des 3.7517 3.7460 -0.15% 0.0033
SingleSource/Benchmarks/Shootout/lists 13.1471 13.1267 -0.16% 0.0551
SingleSource/Benchmarks/Shootout-C++/hash2 4.7540 4.7462 -0.16% 0.0034
SingleSource/Benchmarks/BenchmarkGame/fannkuch 6.0614 6.0518 -0.16% 0.0045
SingleSource/Benchmarks/Adobe-C++/stepanov_vector 8.0509 8.0378 -0.16% 0.0083
MultiSource/Benchmarks/PAQ8p/paq8p 149.6876 149.4524 -0.16% 0.0519
External/SPEC/CINT2006/429_mcf/429_mcf 6.9057 6.8944 -0.16% 0.0380
External/SPEC/CINT2006/458_sjeng/458_sjeng 9.0737 9.0581 -0.17% 0.0019
SingleSource/Benchmarks/Misc/pi 1.6726 1.6696 -0.18% 0.0005
MultiSource/Benchmarks/VersaBench/bmm/bmm 4.1654 4.1581 -0.18% 0.0039
MultiSource/Benchmarks/TSVC/StatementReordering-dbl/StatementReordering-dbl 9.0921 9.0753 -0.18% 0.0112
MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt 6.6077 6.5958 -0.18% 0.0046
MultiSource/Benchmarks/Ptrdist/bc/bc 2.1649 2.1610 -0.18% 0.0093
MultiSource/Benchmarks/Olden/health/health 1.1067 1.1046 -0.19% 0.0059
MultiSource/Benchmarks/Bullet/bullet 12.9644 12.9401 -0.19% 0.0022
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm 48.4500 48.3520 -0.20% 0.1228
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/3mm/3mm 81.3587 81.1940 -0.20% 0.1381
SingleSource/Benchmarks/Polybench/stencils/fdtd-apml/fdtd-apml 21.9017 21.8559 -0.21% 0.0250
External/SPEC/CINT2000/300_twolf/300_twolf 6.7851 6.7709 -0.21% 0.0233
MultiSource/Applications/hexxagon/hexxagon 11.6343 11.6038 -0.26% 0.0055
SingleSource/Benchmarks/Adobe-C++/loop_unroll 4.8459 4.8323 -0.28% 0.0053
MultiSource/Benchmarks/BitBench/five11/five11 16.4992 16.4471 -0.32% 0.0301
External/SPEC/CINT2006/456_hmmer/456_hmmer 6.2090 6.1893 -0.32% 0.0400
MultiSource/Benchmarks/Ptrdist/ft/ft 2.7223 2.7127 -0.35% 0.0117
MultiSource/Applications/kimwitu++/kc 1.2440 1.2397 -0.35% 0.0035
External/Povray/povray 6.2752 6.2534 -0.35% 0.0092
External/SPEC/CINT2006/473_astar/473_astar 18.3398 18.2675 -0.39% 0.0330
External/SPEC/CFP2006/470_lbm/470_lbm 8.7915 8.7450 -0.53% 0.0142
SingleSource/Benchmarks/Misc/salsa20 12.1149 12.0498 -0.54% 0.0042
MultiSource/Applications/sqlite3/sqlite3 9.6281 9.5603 -0.70% 0.0071
SingleSource/Benchmarks/Shootout-C++/sieve 3.8675 3.8400 -0.71% 0.0052
External/SPEC/CINT2006/464_h264ref/464_h264ref 29.3124 29.0909 -0.76% 0.0209
SingleSource/Benchmarks/BenchmarkGame/recursive 2.0749 2.0584 -0.80% 0.0087
MultiSource/Benchmarks/llubenchmark/llu 16.9753 16.8235 -0.89% 0.6042
SingleSource/Benchmarks/Shootout-C++/lists1 1.5553 1.5377 -1.13% 0.0525
External/SPEC/CINT2000/176_gcc/176_gcc 2.3017 2.2752 -1.15% 0.0046
MultiSource/Applications/lambda-0_1_3/lambda 8.5209 8.4160 -1.23% 0.0298
SingleSource/Benchmarks/Misc/richards_benchmark 2.0727 2.0469 -1.24% 0.0023
SingleSource/Benchmarks/Misc/ReedSolomon 11.1675 11.0251 -1.28% 0.0116
MultiSource/Benchmarks/Fhourstones/fhourstones 2.6764 2.6422 -1.28% 0.0013
MultiSource/Applications/JM/lencod/lencod 11.6660 11.5088 -1.35% 0.0380
MultiSource/Benchmarks/Olden/power/power 3.3588 3.3125 -1.38% 0.0090
SingleSource/Benchmarks/McGill/queens 4.3032 4.2279 -1.75% 0.0017
MultiSource/Benchmarks/Fhourstones-3_1/fhourstones3_1 2.7329 2.6841 -1.79% 0.0136
SingleSource/Benchmarks/Shootout-C++/ackermann 1.7911 1.7417 -2.76% 0.0535
External/SPEC/CINT2000/186_crafty/186_crafty 9.2726 8.9953 -2.99% 0.0307
MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1 1.9760 1.8928 -4.21% 0.0041

-------------- next part --------------
Name Prev Current % σ
Geometric_Mean 10.5869 10.5672 -0.19% -
External/SPEC/CFP2006/447_dealII/447_dealII 67.2395 70.2412 4.46% 0.0157
External/SPEC/CINT2006/471_omnetpp/471_omnetpp 2.5765 2.6774 3.92% 0.0060
MultiSource/Applications/viterbi/viterbi 6.4184 6.6424 3.49% 0.0140
External/SPEC/CINT2006/400_perlbench/400_perlbench 28.2597 29.0212 2.69% 0.0086
External/SPEC/CINT2000/176_gcc/176_gcc 4.2152 4.2763 1.45% 0.0018
SingleSource/Benchmarks/BenchmarkGame/Large/fasta 7.2906 7.3895 1.36% 0.0126
MultiSource/Benchmarks/MiBench/consumer-typeset/consumer-typeset 2.2512 2.2805 1.30% 0.0018
SingleSource/Benchmarks/Misc/fbench 7.3290 7.4000 0.97% 0.0001
MultiSource/Benchmarks/Olden/power/power 4.0953 4.1303 0.85% 0.0055
External/Povray/povray 11.2566 11.3514 0.84% 0.0275
SingleSource/Benchmarks/Shootout-C++/ary3 2.5008 2.5194 0.74% 0.0000
MultiSource/Benchmarks/Fhourstones-3_1/fhourstones3_1 4.2135 4.2448 0.74% 0.0001
MultiSource/Benchmarks/Ptrdist/bc/bc 4.3017 4.3329 0.73% 0.0839
MultiSource/Benchmarks/Ptrdist/yacr2/yacr2 2.5940 2.6120 0.69% 0.0004
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gesummv/gesummv 1.2450 1.2527 0.62% 0.0298
External/SPEC/CINT2000/164_gzip/164_gzip 42.6990 42.9624 0.62% 0.0020
SingleSource/Benchmarks/Shootout-C++/hash 3.1169 3.1326 0.50% 0.0026
MultiSource/Benchmarks/Olden/em3d/em3d 9.0344 9.0743 0.44% 0.0029
MultiSource/Benchmarks/PAQ8p/paq8p 203.8303 204.5563 0.36% 0.0777
MultiSource/Benchmarks/FreeBench/distray/distray 1.0886 1.0923 0.34% 0.0011
SingleSource/Benchmarks/Shootout/strcat 1.0068 1.0100 0.32% 0.0054
MultiSource/Benchmarks/Fhourstones/fhourstones 4.2599 4.2737 0.32% 0.0077
SingleSource/Benchmarks/Polybench/linear-algebra/solvers/durbin/durbin 1.5734 1.5778 0.28% 0.0007
External/SPEC/CINT2006/401_bzip2/401_bzip2 9.6590 9.6849 0.27% 0.0037
SingleSource/Benchmarks/Shootout/hash 18.5884 18.6371 0.26% 0.0870
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gemver/gemver 1.2968 1.2999 0.24% 0.0017
SingleSource/Benchmarks/Misc/richards_benchmark 3.7366 3.7455 0.24% 0.0011
MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl 16.6973 16.7363 0.23% 0.0034
MultiSource/Benchmarks/Olden/health/health 1.6507 1.6543 0.22% 0.0098
SingleSource/Benchmarks/Polybench/medley/floyd-warshall/floyd-warshall 45.8621 45.9599 0.21% 0.0023
MultiSource/Applications/minisat/minisat 26.0747 26.1273 0.20% 0.0457
External/SPEC/CINT2000/175_vpr/175_vpr 10.7064 10.7270 0.19% 0.0080
MultiSource/Benchmarks/Olden/treeadd/treeadd 1.6336 1.6365 0.18% 0.0005
MultiSource/Benchmarks/TSVC/NodeSplitting-flt/NodeSplitting-flt 14.5233 14.5480 0.17% 0.0027
MultiSource/Applications/JM/lencod/lencod 21.3973 21.4331 0.17% 0.0097
MultiSource/Benchmarks/Olden/voronoi/voronoi 1.5845 1.5871 0.16% 0.0001
MultiSource/Applications/lua/lua 84.3032 84.4422 0.16% 0.0119
SingleSource/Benchmarks/Misc/whetstone 6.3760 6.3857 0.15% 0.0040
SingleSource/Benchmarks/Misc/dt 4.0528 4.0589 0.15% 0.0032
External/SPEC/CFP2006/433_milc/433_milc 44.9927 45.0582 0.15% 0.0760
SingleSource/Benchmarks/Polybench/stencils/adi/adi 52.6780 52.7451 0.13% 0.0699
MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt 16.0171 16.0341 0.11% 0.0036
MultiSource/Benchmarks/TSVC/Expansion-flt/Expansion-flt 9.8227 9.8332 0.11% 0.0002
MultiSource/Benchmarks/TSVC/Expansion-dbl/Expansion-dbl 14.4773 14.4928 0.11% 0.0003
SingleSource/Benchmarks/Misc/flops-3 9.6567 9.6664 0.10% 0.0008
MultiSource/Benchmarks/Bullet/bullet 21.4226 21.4438 0.10% 0.0202
SingleSource/Benchmarks/Misc/flops-6 8.1478 8.1552 0.09% 0.0002
MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl 13.2342 13.2460 0.09% 0.0028
MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl 27.1836 27.2068 0.09% 0.0015
MultiSource/Benchmarks/MiBench/telecomm-CRC32/telecomm-CRC32 16.1273 16.1416 0.09% 0.0038
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/symm/symm 84.6112 84.6768 0.08% 0.1243
MultiSource/Benchmarks/Trimaran/enc-rc4/enc-rc4 3.4261 3.4289 0.08% 0.0012
SingleSource/Benchmarks/Shootout/fib2 9.6467 9.6536 0.07% 0.0022
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/syr2k/syr2k 64.3661 64.4095 0.07% 0.0219
SingleSource/Benchmarks/Misc/flops 19.0371 19.0512 0.07% 0.0056
MultiSource/Benchmarks/TSVC/IndirectAddressing-dbl/IndirectAddressing-dbl 20.1492 20.1639 0.07% 0.0101
MultiSource/Benchmarks/TSVC/ControlLoops-flt/ControlLoops-flt 13.1223 13.1310 0.07% 0.0263
MultiSource/Benchmarks/TSVC/ControlFlow-flt/ControlFlow-flt 14.3561 14.3666 0.07% 0.0050
SingleSource/Benchmarks/Polybench/datamining/covariance/covariance 58.2370 58.2718 0.06% 0.0053
SingleSource/Benchmarks/Misc-C++/Large/sphereflake 27.9345 27.9508 0.06% 0.0086
SingleSource/Benchmarks/CoyoteBench/lpbench 8.8731 8.8780 0.06% 0.0019
MultiSource/Benchmarks/VersaBench/beamformer/beamformer 3.8410 3.8432 0.06% 0.0015
MultiSource/Benchmarks/TSVC/Symbolics-dbl/Symbolics-dbl 12.5129 12.5201 0.06% 0.0010
SingleSource/Benchmarks/Shootout/methcall 12.1446 12.1503 0.05% 0.0044
SingleSource/Benchmarks/Misc/fp-convert 11.8017 11.8073 0.05% 0.0042
SingleSource/Benchmarks/Dhrystone/fldry 2.1822 2.1832 0.05% 0.0011
MultiSource/Applications/hexxagon/hexxagon 38.8084 38.8262 0.05% 0.0040
External/SPEC/CFP2006/444_namd/444_namd 46.4356 46.4600 0.05% 0.0151
SingleSource/Benchmarks/Shootout-C++/methcall 20.1204 20.1293 0.04% 0.0442
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/syrk/syrk 54.0330 54.0550 0.04% 0.0073
SingleSource/Benchmarks/Misc/flops-4 4.3561 4.3580 0.04% 0.0021
SingleSource/Benchmarks/BenchmarkGame/n-body 4.7856 4.7875 0.04% 0.0027
SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant 9.2258 9.2291 0.04% 0.0077
MultiSource/Benchmarks/VersaBench/8b10b/8b10b 24.2272 24.2367 0.04% 0.0095
MultiSource/Benchmarks/TSVC/Equivalencing-dbl/Equivalencing-dbl 9.0187 9.0221 0.04% 0.0014
MultiSource/Benchmarks/TSVC/CrossingThresholds-dbl/CrossingThresholds-dbl 15.1459 15.1525 0.04% 0.0050
MultiSource/Benchmarks/Prolangs-C++/life/life 12.2587 12.2631 0.04% 0.0068
SingleSource/Benchmarks/Shootout/sieve 19.2302 19.2352 0.03% 0.0006
SingleSource/Benchmarks/CoyoteBench/huffbench 59.0514 59.0703 0.03% 0.0402
MultiSource/Benchmarks/Trimaran/enc-3des/enc-3des 7.0538 7.0557 0.03% 0.0055
MultiSource/Benchmarks/TSVC/NodeSplitting-dbl/NodeSplitting-dbl 18.6089 18.6152 0.03% 0.0138
MultiSource/Benchmarks/Olden/bh/bh 7.1381 7.1403 0.03% 0.0017
MultiSource/Benchmarks/TSVC/InductionVariable-dbl/InductionVariable-dbl 20.6407 20.6458 0.02% 0.0005
MultiSource/Benchmarks/ASC_Sequoia/CrystalMk/CrystalMk 23.7325 23.7384 0.02% 0.0030
External/SPEC/CFP2000/179_art/179_art 3.2527 3.2535 0.02% 0.0020
SingleSource/UnitTests/Vectorizer/gcc-loops 11.5070 11.5087 0.01% 0.0025
SingleSource/Benchmarks/Misc/flops-7 7.0714 7.0721 0.01% 0.0013
SingleSource/Benchmarks/Misc/flops-2 3.8555 3.8558 0.01% 0.0020
SingleSource/Benchmarks/Dhrystone/dry 1.8572 1.8574 0.01% 0.0016
MultiSource/Benchmarks/VersaBench/bmm/bmm 7.8049 7.8059 0.01% 0.0042
MultiSource/Benchmarks/TSVC/StatementReordering-dbl/StatementReordering-dbl 16.6236 16.6253 0.01% 0.0034
MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk 27.6958 27.6980 0.01% 0.0297
External/SPEC/CINT2006/473_astar/473_astar 34.3679 34.3704 0.01% 0.0584
External/SPEC/CINT2000/256_bzip2/256_bzip2 29.7645 29.7685 0.01% 0.0033
SingleSource/Benchmarks/Shootout-C++/random 12.0664 12.0667 0.00% 0.0035
SingleSource/Benchmarks/Misc/matmul_f64_4x4 3.8693 3.8693 0.00% 0.0033
SingleSource/Benchmarks/Adobe-C++/stepanov_abstraction 29.4658 29.4649 -0.00% 0.0041
SingleSource/Benchmarks/Adobe-C++/functionobjects 13.3742 13.3739 -0.00% 0.0015
MultiSource/Benchmarks/Trimaran/netbench-crc/netbench-crc 4.1236 4.1236 0.00% 0.0052
MultiSource/Benchmarks/Trimaran/enc-md5/enc-md5 6.4056 6.4053 -0.00% 0.0067
MultiSource/Benchmarks/TSVC/LoopRestructuring-dbl/LoopRestructuring-dbl 15.8606 15.8609 0.00% 0.0052
MultiSource/Benchmarks/TSVC/LoopRerolling-dbl/LoopRerolling-dbl 22.0044 22.0051 0.00% 0.0007
MultiSource/Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDataFlow-flt 7.3534 7.3534 0.00% 0.0046
MultiSource/Benchmarks/7zip/7zip-benchmark 36.0418 36.0428 0.00% 0.0088
External/SPEC/CINT2000/181_mcf/181_mcf 24.8176 24.8179 0.00% 0.0247
SingleSource/Benchmarks/Shootout/heapsort 9.0304 9.0296 -0.01% 0.0023
SingleSource/Benchmarks/Shootout-C++/matrix 16.7011 16.6988 -0.01% 0.0013
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/trmm/trmm 47.8234 47.8176 -0.01% 0.0084
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/2mm/2mm 102.9151 102.9050 -0.01% 0.0121
External/SPEC/CFP2006/470_lbm/470_lbm 12.9028 12.9016 -0.01% 0.0095
SingleSource/Benchmarks/Misc-C++/stepanov_v1p2 59.7152 59.7029 -0.02% 0.0129
SingleSource/Benchmarks/CoyoteBench/almabench 28.7084 28.7021 -0.02% 0.0037
SingleSource/Benchmarks/Adobe-C++/loop_unroll 7.0497 7.0481 -0.02% 0.0029
MultiSource/Benchmarks/TSVC/StatementReordering-flt/StatementReordering-flt 13.7407 13.7376 -0.02% 0.0007
MultiSource/Benchmarks/TSVC/LinearDependence-dbl/LinearDependence-dbl 14.8833 14.8804 -0.02% 0.0117
External/SPEC/CINT2000/254_gap/254_gap 6.8229 6.8214 -0.02% 0.0114
SingleSource/UnitTests/Vector/build2 9.1135 9.1106 -0.03% 0.0009
SingleSource/Benchmarks/Misc/salsa20 21.5502 21.5445 -0.03% 0.0065
SingleSource/Benchmarks/Misc/pi 3.1478 3.1468 -0.03% 0.0006
SingleSource/Benchmarks/Misc/himenobmtxpa 8.4782 8.4758 -0.03% 0.0004
SingleSource/Benchmarks/Adobe-C++/stepanov_vector 15.0206 15.0165 -0.03% 0.0012
SingleSource/Benchmarks/Adobe-C++/simple_types_constant_folding 13.5083 13.5042 -0.03% 0.0046
MultiSource/Benchmarks/Trimaran/netbench-url/netbench-url 16.5930 16.5881 -0.03% 0.0027
MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl 14.7625 14.7582 -0.03% 0.0063
SingleSource/Benchmarks/Shootout-C++/heapsort 9.0328 9.0292 -0.04% 0.0046
MultiSource/Benchmarks/VersaBench/ecbdes/ecbdes 11.2640 11.2600 -0.04% 0.0047
MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt 14.0157 14.0107 -0.04% 0.0062
External/SPEC/CINT2006/456_hmmer/456_hmmer 11.2880 11.2839 -0.04% 0.0255
SingleSource/Benchmarks/Shootout/matrix 7.1550 7.1517 -0.05% 0.0008
SingleSource/Benchmarks/Misc/flops-8 8.4762 8.4720 -0.05% 0.0002
SingleSource/Benchmarks/Misc/flops-5 8.2136 8.2098 -0.05% 0.0071
SingleSource/Benchmarks/Misc-C++/bigfib 1.9124 1.9115 -0.05% 0.0036
SingleSource/Benchmarks/Misc-C++/Large/ray 9.2812 9.2769 -0.05% 0.0088
SingleSource/Benchmarks/BenchmarkGame/spectral-norm 2.4673 2.4660 -0.05% 0.0046
MultiSource/Benchmarks/TSVC/ControlLoops-dbl/ControlLoops-dbl 18.4625 18.4531 -0.05% 0.0033
External/SPEC/CINT2006/458_sjeng/458_sjeng 15.8751 15.8668 -0.05% 0.0054
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/3mm/3mm 131.4519 131.3674 -0.06% 0.0614
MultiSource/Benchmarks/TSVC/Recurrences-flt/Recurrences-flt 25.3504 25.3364 -0.06% 0.0017
MultiSource/Benchmarks/TSVC/GlobalDataFlow-dbl/GlobalDataFlow-dbl 13.0418 13.0344 -0.06% 0.0058
SingleSource/Benchmarks/Misc/flops-1 5.5064 5.5023 -0.07% 0.0057
SingleSource/Benchmarks/Misc-C++/stepanov_container 20.8460 20.8313 -0.07% 0.0138
MultiSource/Benchmarks/TSVC/Searching-flt/Searching-flt 13.2449 13.2354 -0.07% 0.0099
MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl 29.2329 29.2124 -0.07% 0.0084
SingleSource/Benchmarks/Shootout/random 12.0630 12.0532 -0.08% 0.0049
SingleSource/Benchmarks/Polybench/stencils/seidel-2d/seidel-2d 40.1327 40.1015 -0.08% 0.0044
SingleSource/Benchmarks/Polybench/stencils/fdtd-apml/fdtd-apml 40.9448 40.9136 -0.08% 0.0271
MultiSource/Benchmarks/Ptrdist/ks/ks 4.1315 4.1280 -0.08% 0.0003
MultiSource/Benchmarks/MiBench/automotive-basicmath/automotive-basicmath 2.2459 2.2442 -0.08% 0.0061
SingleSource/UnitTests/Vector/multiplies 1.4981 1.4968 -0.09% 0.0008
SingleSource/Benchmarks/Shootout-C++/hash2 8.9958 8.9875 -0.09% 0.0063
MultiSource/Benchmarks/TSVC/Reductions-flt/Reductions-flt 26.4367 26.4125 -0.09% 0.0131
MultiSource/Benchmarks/NPB-serial/is/is 39.7440 39.7065 -0.09% 0.0185
MultiSource/Benchmarks/TSVC/IndirectAddressing-flt/IndirectAddressing-flt 16.8347 16.8171 -0.10% 0.0003
SingleSource/Benchmarks/BenchmarkGame/nsieve-bits 2.9601 2.9569 -0.11% 0.0040
External/SPEC/CINT2000/197_parser/197_parser 9.1718 9.1619 -0.11% 0.0088
External/SPEC/CFP2000/177_mesa/177_mesa 8.6173 8.6080 -0.11% 0.0132
SingleSource/Benchmarks/Polybench/linear-algebra/solvers/dynprog/dynprog 2.8281 2.8246 -0.12% 0.0001
External/SPEC/CINT2006/403_gcc/403_gcc 6.5736 6.5655 -0.12% 0.0046
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/doitgen/doitgen 83.9004 83.7948 -0.13% 0.0978
SingleSource/Benchmarks/McGill/queens 7.9352 7.9249 -0.13% 0.0072
MultiSource/Benchmarks/SciMark2-C/scimark2 197.3006 197.0500 -0.13% 0.9087
MultiSource/Benchmarks/ASC_Sequoia/AMGmk/AMGmk 51.3858 51.3187 -0.13% 0.1052
SingleSource/Benchmarks/Misc-C++/mandel-text 9.8456 9.8317 -0.14% 0.0046
MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt 12.8381 12.8206 -0.14% 0.0079
MultiSource/Benchmarks/TSVC/Symbolics-flt/Symbolics-flt 7.3114 7.3004 -0.15% 0.0065
MultiSource/Benchmarks/TSVC/LoopRerolling-flt/LoopRerolling-flt 17.0470 17.0217 -0.15% 0.0126
SingleSource/Benchmarks/Shootout-C++/fibo 9.6620 9.6468 -0.16% 0.0069
MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt 12.9542 12.9316 -0.17% 0.0129
MultiSource/Benchmarks/TSVC/Equivalencing-flt/Equivalencing-flt 7.1249 7.1126 -0.17% 0.0010
MultiSource/Benchmarks/Olden/tsp/tsp 4.3922 4.3849 -0.17% 0.0000
SingleSource/Benchmarks/Shootout/lists 24.9095 24.8635 -0.18% 0.1084
SingleSource/Benchmarks/Stanford/FloatMM 1.8683 1.8648 -0.19% 0.0026
SingleSource/Benchmarks/SmallPT/smallpt 32.5939 32.5323 -0.19% 0.0220
SingleSource/Benchmarks/BenchmarkGame/recursive 3.9489 3.9413 -0.19% 0.0047
MultiSource/Applications/kimwitu++/kc 2.3180 2.3135 -0.19% 0.0033
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/bicg/bicg 1.1623 1.1600 -0.20% 0.0033
SingleSource/Benchmarks/Misc/oourafft 14.0678 14.0391 -0.20% 0.0016
SingleSource/Benchmarks/Polybench/stencils/jacobi-2d-imper/jacobi-2d-imper 38.5470 38.4656 -0.21% 0.0298
MultiSource/Benchmarks/TSVC/CrossingThresholds-flt/CrossingThresholds-flt 11.1176 11.0940 -0.21% 0.0076
MultiSource/Applications/sqlite3/sqlite3 15.5954 15.5630 -0.21% 0.0004
External/SPEC/CINT2000/253_perlbmk/253_perlbmk 25.2330 25.1798 -0.21% 0.0496
SingleSource/Benchmarks/Misc-C++/oopack_v1p8 1.0764 1.0740 -0.22% 0.0006
MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4 1.4849 1.4817 -0.22% 0.0051
MultiSource/Applications/spiff/spiff 14.4858 14.4545 -0.22% 0.0574
MultiSource/Benchmarks/Olden/perimeter/perimeter 1.2419 1.2391 -0.23% 0.0077
SingleSource/Benchmarks/Polybench/datamining/correlation/correlation 54.0652 53.9381 -0.24% 0.0382
SingleSource/Benchmarks/Misc/ReedSolomon 20.9796 20.9287 -0.24% 0.0067
MultiSource/Benchmarks/BitBench/five11/five11 31.1810 31.1057 -0.24% 0.0457
MultiSource/Benchmarks/Ptrdist/anagram/anagram 4.2150 4.2045 -0.25% 0.0036
MultiSource/Benchmarks/Prolangs-C++/primes/primes 1.2353 1.2321 -0.26% 0.0031
External/SPEC/CINT95/132_ijpeg/132_ijpeg 1.0627 1.0599 -0.26% 0.0002
SingleSource/Benchmarks/BenchmarkGame/partialsums 1.1037 1.1007 -0.27% 0.0005
MultiSource/Benchmarks/Trimaran/enc-pc1/enc-pc1 3.1418 3.1326 -0.29% 0.0045
MultiSource/Benchmarks/McCat/12-IOtest/iotest 1.1610 1.1576 -0.29% 0.0086
MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000 9.5529 9.5252 -0.29% 0.0118
SingleSource/Benchmarks/Misc/perlin 6.8606 6.8402 -0.30% 0.0062
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/cholesky/cholesky 41.9528 41.8198 -0.32% 0.0167
SingleSource/Benchmarks/BenchmarkGame/fannkuch 11.7015 11.6642 -0.32% 0.0188
External/SPEC/CFP2000/188_ammp/188_ammp 58.6433 58.4554 -0.32% 0.1549
MultiSource/Benchmarks/Ptrdist/ft/ft 4.5580 4.5429 -0.33% 0.0054
SingleSource/Benchmarks/Shootout-C++/lists 67.9327 67.6826 -0.37% 0.2186
SingleSource/Benchmarks/Polybench/stencils/fdtd-2d/fdtd-2d 112.8710 112.4582 -0.37% 0.3840
SingleSource/Benchmarks/Polybench/linear-algebra/kernels/gemm/gemm 72.8211 72.5548 -0.37% 0.2036
MultiSource/Applications/aha/aha 8.0554 8.0258 -0.37% 0.0124
MultiSource/Benchmarks/Olden/bisort/bisort 2.4604 2.4507 -0.39% 0.0024
SingleSource/Benchmarks/BenchmarkGame/puzzle 1.2789 1.2736 -0.41% 0.0004
MultiSource/Benchmarks/mafft/pairlocalalign 117.3061 116.8098 -0.42% 0.4370
MultiSource/Benchmarks/MallocBench/espresso/espresso 2.6729 2.6616 -0.42% 0.0042
External/SPEC/CINT2000/255_vortex/255_vortex 9.5583 9.5180 -0.42% 0.0146
MultiSource/Applications/lemon/lemon 9.3665 9.3263 -0.43% 0.0143
MultiSource/Applications/SIBsim4/SIBsim4 10.3535 10.3081 -0.44% 0.0232
SingleSource/Benchmarks/Shootout-C++/sieve 7.1308 7.0986 -0.45% 0.0037
SingleSource/Benchmarks/Misc/ffbench 4.4971 4.4769 -0.45% 0.0027
MultiSource/Benchmarks/MallocBench/cfrac/cfrac 9.7051 9.6584 -0.48% 0.0130
SingleSource/Benchmarks/Shootout/ary3 2.5090 2.4968 -0.49% 0.0004
External/SPEC/CINT2006/429_mcf/429_mcf 10.8008 10.7476
-0.49% 0.0219 SingleSource/Benchmarks/Polybench/linear-algebra/kernels/mvt/mvt 1.3281 1.3215 -0.50% 0.0078 External/SPEC/CINT2006/464_h264ref/464_h264ref 51.1022 50.8415 -0.51% 0.0348 MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm 1.4626 1.4550 -0.52% 0.0002 External/SPEC/CINT95/147_vortex/147_vortex 1.6119 1.6028 -0.56% 0.0056 External/SPEC/CINT95/099_go/099_go 1.1234 1.1171 -0.56% 0.0051 SingleSource/Benchmarks/Misc-C++-EH/spirit 149.3092 148.4177 -0.60% 0.4042 SingleSource/Benchmarks/Misc/lowercase 10.3197 10.2445 -0.73% 0.0181 SingleSource/Benchmarks/Polybench/linear-algebra/solvers/gramschmidt/gramschmidt 37.1810 36.8926 -0.78% 0.1427 SingleSource/Benchmarks/CoyoteBench/fftbench 4.3798 4.3458 -0.78% 0.0187 MultiSource/Benchmarks/VersaBench/dbms/dbms 15.8694 15.7439 -0.79% 0.1101 External/SPEC/CINT2000/186_crafty/186_crafty 17.4896 17.3513 -0.79% 0.0155 SingleSource/Benchmarks/McGill/chomp 4.8088 4.7626 -0.96% 0.0173 External/SPEC/CINT2000/300_twolf/300_twolf 13.5453 13.4135 -0.97% 0.0221 MultiSource/Applications/SPASS/SPASS 45.1814 44.6753 -1.12% 0.3057 MultiSource/Benchmarks/sim/sim 40.6488 40.1364 -1.26% 0.0127 External/SPEC/CFP2000/183_equake/183_equake 18.2451 17.9627 -1.55% 0.1728 MultiSource/Applications/siod/siod 7.2801 7.1457 -1.85% 0.0022 SingleSource/Benchmarks/Shootout-C++/lists1 2.9642 2.8584 -3.57% 0.0336 MultiSource/Applications/lambda-0_1_3/lambda 16.3782 15.7493 -3.84% 0.0309 MultiSource/Benchmarks/llubenchmark/llu 28.5323 27.3668 -4.08% 0.8212 SingleSource/Benchmarks/Linpack/linpack-pc 11.7001 11.1437 -4.76% 0.0122 SingleSource/Benchmarks/Shootout-C++/ackermann 3.6398 3.3878 -6.92% 0.0678 MultiSource/Benchmarks/nbench/nbench 42.8617 34.7371 -18.96% 3.9575
Kristof Beyls
2015-Feb-26 10:33 UTC
[LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?
Hi Ahmed,

Did you run these experiments on a platform with a linker that makes use of the information produced by the AArch64CollectLOH pass?

I'm guessing that the AArch64CollectLOH information, combined with a linker that makes use of it, could affect the profitability of the GlobalMerge pass.

Thanks,
Kristof
Renato Golin
2015-Feb-26 12:09 UTC
[LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?
On 26 February 2015 at 00:57, Ahmed Bougacha <ahmed.bougacha at gmail.com> wrote:
> -- A way forward
> One obvious way to improve it is: look at uses of globals, and try to
> form sets of globals commonly used together. The tricky part is to
> define heuristics for "commonly". Also, the pass then becomes much
> more expensive. I'm currently looking into improving it, and will
> report if I come up with a good solution. But this shouldn't stop us
> from disabling it, for now.

Hi Ahmed,

Before "moving forward", it would be good to understand what in GlobalMerge is impacting what in LTO.

With LTO becoming more important nowadays, I agree we have to balance the compiler optimisations to work well with it, but by turning things off we might be impacting unknown code in an unknown way.

We'll never know how unknown code behaves, but if at least we understand what of GM affects what of LTO, then people using unknown code will have a more informed view on what to disable, when.

cheers,
--renato
Jiangning Liu
2015-Feb-26 21:13 UTC
[LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?
Hi Ahmed,

Yes, I share Kristof's and Renato's concerns: the impact of, and dependence on, link-time tools should be clarified before disabling this pass.

On the other hand, tests on our hardware show that disabling this pass, without LTO, causes big regressions on some SPEC benchmarks (positive is bad):

spec.cpu2000.ref.253_perlbmk 3.27%
spec.cpu2000.ref.254_gap 3.18%

although I do see some improvements, like the ones below (negative is good):

spec.cpu2006.ref.400_perlbench -1.90%
spec.cpu2006.ref.471_omnetpp -1.64%
spec.cpu2006.ref.482_sphinx3 -1.03%

Thanks,
-Jiangning
Ahmed Bougacha
2015-Feb-27 22:03 UTC
[LLVMdev] [RFC] AArch64: Should we disable GlobalMerge?
On Thu, Feb 26, 2015 at 4:09 AM, Renato Golin <renato.golin at linaro.org> wrote:
> Before "moving forward", it would be good to understand what in
> GlobalMerge is impacting what in LTO.
>
> With LTO becoming more important nowadays, I agree we have to balance
> the compiler optimisations to work well with it, but by turning things
> off we might be impacting unknown code in an unknown way.
>
> We'll never know how unknown code behaves, but if at least we
> understand what of GM affects what of LTO, then people using unknown
> code will have a more informed view on what to disable, when.

Fair enough.

First, a couple of things to note:
- GlobalMerge runs as a pre-ISel pass, so very late in the mid-level pipeline.
- GlobalMerge (by default) only looks at internal globals.

Internal globals come up with file- or function-static variables. In LTO, all module-level globals are internalized, and are eligible for merging.

So, we can generally group global usage into a few categories:
- a function that uses a local static variable (say, llvm::outs())
- a function that uses several globals at once. For instance, 400.perlbench's interpreter has a bunch of those, as does its parser/lexer.
- a set of functions that share a few common globals (say, an inlined reference to a function-local static variable), but otherwise each use several other globals (again, perl's interpreter).

GlobalMerge is only ever a win if we are able to share base pointers. This requires:
- several globals being referenced
- the references being close enough (otherwise we'll just rematerialize the base, or worse, increase register pressure)

There is one obvious special case for the first requirement: if a global is only ever used alone, there's no point in merging it anywhere (this is improvement #1).

Once we can determine the set of used globals for each function, we can try to merge those sets only (#2).

We can try to better handle the second requirement by having some more precise metric for distance between uses. One trivially available such metric is grouping used sets by parent basic block rather than by function (#3).

Experimentally, #1 catches a lot of the singleton-ish globals out there, which are the majority in some of the more "modern" code I've looked at. It leaves the legitimate merging in perl alone.

#2 (and even more so #3) is actually too aggressive, and misses most of the profitable cases in perl. Consider:
- a "g_log" global (or, say, LLVM's outs/dbgs/errs), used pretty much everywhere
- several sets of globals, used in different parts of the program (perl's interpreter vs parser)

You'd pick one of the latter sets, and add the "g_log" global to it. Now you've made it more expensive everywhere you use "g_log", without the benefit of base sharing in all the other functions.

So you need to be smart when picking the sets. You can combine some of them, using some cost metric (#4). This is where it gets complicated.

I'll try measuring some of those, and see what happens on benchmarks.

Again, that shouldn't stop us from enabling GlobalMerge less often. Hopefully it's clear that the pass isn't always a win, so enabling it only at -O3 should be OK. I'm less comfortable with disabling it on Darwin only, but that seems like the obvious next step.

Thanks for the feedback!
-Ahmed
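
Improvements #1 and #2 from the message above can be illustrated with a small sketch. This is only a toy model in Python, not the GlobalMerge implementation: the `USES` map, the function and global names, and the greedy ordering are all assumptions made up for illustration. It also assumes that each global can be placed in only one merged block.

```python
# Hypothetical input: for each function, the internal globals it references.
USES = {
    "log_msg": ["g_log"],
    "get_instance": ["instance"],
    "interp_run": ["g_log", "pc", "stack_top"],
    "parse": ["g_log", "tok", "lex_pos"],
}


def merge_candidates(uses):
    """#1: keep only globals referenced alongside another global somewhere."""
    candidates = set()
    for globals_used in uses.values():
        used = set(globals_used)
        if len(used) >= 2:
            candidates |= used
    return candidates


def greedy_groups(uses):
    """#2 (naive): one merge group per function's used set.

    A global can live in only one merged block, so each used set claims
    whatever globals earlier sets have not already taken.
    """
    used_sets = {frozenset(g) for g in uses.values() if len(set(g)) >= 2}
    claimed = set()
    groups = []
    # Assumed ordering: larger sets first, names as a deterministic tie-break.
    for used in sorted(used_sets, key=lambda s: (-len(s), sorted(s))):
        fresh = sorted(used - claimed)
        if len(fresh) >= 2:  # a lone leftover gains nothing from merging
            groups.append(fresh)
            claimed.update(fresh)
    return groups
```

With this hypothetical input, `merge_candidates` leaves `instance` out, since it is only ever used alone. `greedy_groups` places `g_log` with the parser globals only, so `interp_run` needs a second base to reach `g_log`: the cost described for #2, and the reason a smarter cost metric (#4) would be needed.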