thr3ads.net - search: "dealii"

(RFC) Adjusting default loop fully unroll threshold

2017 Jan 30

4

...ller. This seems conservative because unlike dynamic/partial unrolling, fully unrolling will not affect LSD/ICache performance. In https://reviews.llvm.org/D28368, I proposed to double the threshold for loop fully unroller. This will change the codegen of several SPECCPU benchmarks:

Code size:
447.dealII 0.50%
453.povray 0.42%
433.milc 0.20%
445.gobmk 0.32%
403.gcc 0.05%
464.h264ref 3.62%

Compile Time:
447.dealII 0.22%
453.povray -0.16%
433.milc 0.09%
445.gobmk -2.43%
403.gcc 0.06%
464.h264ref 3.21%

Performance (on intel sandybridge):
447.dealII +0.07%
453.povray +1.79%
433.milc +1.02%
445.gobmk...

(RFC) Adjusting default loop fully unroll threshold

2017 Jan 30

0

...rtial unrolling, fully unrolling will not affect LSD/ICache performance. In https://reviews.llvm.org/D28368, I proposed to double the threshold for loop fully unroller. This will change the codegen of several SPECCPU benchmarks:
>
> Code size:
> 447.dealII 0.50%
> 453.povray 0.42%
> 433.milc 0.20%
> 445.gobmk 0.32%
> 403.gcc 0.05%
> 464.h264ref 3.62%
>
> Compile Time:
> 447.dealII 0.22%
> 453.povray -0.16%
> 433.milc 0.09%
> 445.gobmk -2.43%
> 403.gcc 0.06%
> 464.h264ref 3.21%
>
> Performance (on intel s...

(RFC) Adjusting default loop fully unroll threshold

2017 Jan 30

2

...use
> unlike dynamic/partial unrolling, fully unrolling will not affect
> LSD/ICache performance. In https://reviews.llvm.org/D28368, I proposed to
> double the threshold for loop fully unroller. This will change the codegen
> of several SPECCPU benchmarks:
>
> Code size:
> 447.dealII 0.50%
> 453.povray 0.42%
> 433.milc 0.20%
> 445.gobmk 0.32%
> 403.gcc 0.05%
> 464.h264ref 3.62%
>
> Compile Time:
> 447.dealII 0.22%
> 453.povray -0.16%
> 433.milc 0.09%
> 445.gobmk -2.43%
> 403.gcc 0.06%
> 464.h264ref 3.21%
>
> Performance (on intel san...

(RFC) Adjusting default loop fully unroll threshold

2017 Jan 31

0

...partial unrolling, fully unrolling will not affect
>> LSD/ICache performance. In https://reviews.llvm.org/D28368, I proposed
>> to double the threshold for loop fully unroller. This will change the
>> codegen of several SPECCPU benchmarks:
>>
>> Code size:
>> 447.dealII 0.50%
>> 453.povray 0.42%
>> 433.milc 0.20%
>> 445.gobmk 0.32%
>> 403.gcc 0.05%
>> 464.h264ref 3.62%
>>
>> Compile Time:
>> 447.dealII 0.22%
>> 453.povray -0.16%
>> 433.milc 0.09%
>> 445.gobmk -2.43%
>> 403.gcc 0.06%
>> 4...

New and more general Function Merging optimization for code size

2018 Aug 02

2

...n the final executable file over the baseline: 5.55% compared to 0.49% of the identical merge. Average reduction in the total number of instructions over the baseline: 7.04% compared to 0.47% of the identical merge. The highest reduction on the executable file is of about 20% (both 429.mcf and 447.dealII) and the highest reduction on the total number of instructions is of about 37% (447.dealII). It has an average slowdown of about 1%, but having no statistical difference from the baseline in most of the benchmarks in the SPEC'06 suite. Because this new function merging technique is able to m...

[LLVMdev] Measurements of the new inlinehint attribute

2010 Feb 15

0

...83.equake 0.00% -1.85% 3.54% 0.00%
SPEC/CFP2000/188.ammp/188.ammp 0.28% -0.18% 48.68% 3.10%
SPEC/CFP2006/433.milc/433.milc 0.00% -0.14% 20.31% 2.68%
SPEC/CFP2006/444.namd/444.namd 0.04% 0.44% 3.28% 1.40%
SPEC/CFP2006/447.dealII/447.dealII 10.61% 13.06% 35.52% 15.01%
SPEC/CFP2006/450.soplex/450.soplex 0.30% 0.00% 22.47% 0.00%
SPEC/CFP2006/470.lbm/470.lbm 0.00% 0.00% 4.91% 0.30%
SPEC/CINT2000/164.gzip/164.gzip 0.00% 0.17% 32.44% -4.93%
SPEC/CINT2000/1...

[LLVMdev] Greedy register allocation

2011 Apr 30

2

...4ref
+6.7% 177.mesa

With more registers and out-of-order execution hiding the cost of spilling, x86-64 is more mixed. I suspect this architecture is more sensitive to code layout issues than to register allocation:

Targeting x86-64:
-6.4% 464.h264ref
-6.1% 256.bzip2
-5.2% 183.equake
-4.8% 447.dealII
-3.9% 400.perlbench
-3.5% 401.bzip2
-3.3% 255.vortex
+3.8% 186.crafty
+5.0% 462.libquantum
+8.0% 471.omnetpp

Finally, armv7/thumb2 running on a Cortex-A9 CPU does quite well:

Targeting armv7:
-6.2% 447.dealII
-4.4% 183.equake
-4.1% 462.libquantum
-3.5% 401.bzip2

Clang builds llvm+clang...

(RFC) Adjusting default loop fully unroll threshold

2017 Jan 31

3

...ing, fully unrolling will not affect LSD/ICache performance. In https://reviews.llvm.org/D28368, I proposed to double the threshold for loop fully unroller. This will change the codegen of several SPECCPU benchmarks:
>>
>> Code size:
>> 447.dealII 0.50%
>> 453.povray 0.42%
>> 433.milc 0.20%
>> 445.gobmk 0.32%
>> 403.gcc 0.05%
>> 464.h264ref 3.62%
>>
>> Compile Time:
>> 447.dealII 0.22%
>> 453.povray -0.16%
>> 433.milc 0.09%
>> 445.gobmk -2.43%
>> 403.gcc 0.06%
>>...

Register Spill Caused by the Reassociation pass

2015 Oct 01

2

Hi Sanjay, I observed some extra register spills when applying the reassociation pass to spec2006 benchmarks and I would like to hear your advice. For example, function get_new_point_on_quad() of tria_boundary.cc in spec2006/dealII has a sequence of code like this:

X=a+b
Y=X+c
Z=Y+d

There are many other instructions between these float adds. The reassociation pass first swaps a and c when checking the second add, and then swaps a and d when checking the third add. The transformed code looks like ...

Enable vectorizer-maximize-bandwidth by default?

2017 May 18

6

...help performance. I've tested the performance impact on Intel sandybridge machine with speccpu benchmarks:

Benchmark Base:Reference (1)
-------------------------------------------------------
spec/2006/fp/C++/444.namd 26.84 -0.31%
spec/2006/fp/C++/447.dealII 46.19 +0.89%
spec/2006/fp/C++/450.soplex 42.92 -0.44%
spec/2006/fp/C++/453.povray 38.57 -2.25%
spec/2006/fp/C/433.milc 24.54 -0.76%
spec/2006/fp/C/470.lbm 41.08 +0.26%
spec/2006/fp/C/482.sphinx3 47.58...

Fwd: cfl-aa

2016 Aug 30

2

...0 | 450.soplex | 72
2472234 | 401.bzip2 | 229
2574217 | 456.hmmer | 1833
3492577 | 445.gobmk | 8480
3685838 | 444.namd | 616
12943554 | 471.omnetpp | 422
20068605 | 464.h264ref | 8593
23849576 | 400.perlbench | 99316
37779455 | 447.dealII | 11204
186008992 | 403.gcc | 404828

I am finding these results weird because I was expecting a larger number of no-alias responses. For instance, I got only 404,828 responses out of 186,008,992 queries. Has anyone gotten similar, or different results? Regards, Vitor Mendes Paisan...
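The alias-query table above reads most naturally as queries | benchmark | NoAlias responses. That column interpretation is an assumption, based only on the 403.gcc row matching the "404,828 responses out of 186,008,992 queries" remark. Under that assumption, a minimal Python sketch turns the counts into NoAlias rates (the truncated 450.soplex row is left out):

```python
# Rows transcribed from the cfl-aa table: (queries, benchmark, no-alias responses).
# ASSUMPTION: column meaning inferred from the 403.gcc row; 450.soplex is truncated
# in the snippet and omitted.
AA_ROWS = [
    (2472234, "401.bzip2", 229),
    (2574217, "456.hmmer", 1833),
    (3492577, "445.gobmk", 8480),
    (3685838, "444.namd", 616),
    (12943554, "471.omnetpp", 422),
    (20068605, "464.h264ref", 8593),
    (23849576, "400.perlbench", 99316),
    (37779455, "447.dealII", 11204),
    (186008992, "403.gcc", 404828),
]


def no_alias_rate(queries: int, no_alias: int) -> float:
    """Percentage of alias queries that were answered with NoAlias."""
    return 100.0 * no_alias / queries


rates = {name: no_alias_rate(q, n) for q, name, n in AA_ROWS}

# Print benchmarks from lowest to highest NoAlias rate.
for name, rate in sorted(rates.items(), key=lambda kv: kv[1]):
    print(f"{name:15s} {rate:.4f}%")
```

Under the same assumption, every complete row answers fewer than 0.5% of queries with NoAlias (400.perlbench is highest at roughly 0.42%, 403.gcc is about 0.22%), which puts a number on the "only" in the post.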

[CodeGen] CodeSize - TailMerging and BlockPlacement

2016 Mar 29

2

...%).

473.astar -7
401.bzip2 -110
403.gcc -13,006
445.gobmk -1,716
464.h264ref -684
456.hmmer -391
462.libquantum -4
429.mcf -4
471.omnetpp -1,980
400.perlbench -4,176
458.sjeng -338
450.soplex -395
483.xalancbmk -4,183
447.dealII -186
433.milc -34
444.namd -104
453.povray -1,785
482.sphinx3 -112

I propose to factor out the relevant code from BranchFolding into a utility, and call it from BlockPlacement whenever the layout is changed. It is similar to D18226 and D18411 which factor tail...

New and more general Function Merging optimization for code size

2018 Aug 02

2

...ver the baseline: 5.55%
> compared to 0.49% of the identical merge.
> Average reduction in the total number of instructions over the baseline:
> 7.04% compared to 0.47% of the identical merge.
>
> The highest reduction on the executable file is of about 20% (both 429.mcf
> and 447.dealII) and the highest reduction on the total number of
> instructions is of about 37% (447.dealII).
>
> It has an average slowdown of about 1%, but having no statistical
> difference from the baseline in most of the benchmarks in the SPEC'06 suite.
>
>
> Because this new functio...

[LLVMdev] fp Question

2010 Jul 22

0

On Jul 22, 2010, at 4:18 PM PDT, Reza Yazdani wrote:
> Hi,
>
> I ran Spec2006 with -O4. All integer benchmarks passed, but only 8
> out of 17 floating point benchmarks passed. Is this normal or I
> made a mistake in my build?

Hi Reza. Somebody on Linux should answer, but I don't think it's normal. You may have checked out the source at a moment when it had a bug.

[LLVMdev] fp Question

2010 Jul 22

3

Hi, I ran Spec2006 with -O4. All integer benchmarks passed, but only 8 out of 17 floating point benchmarks passed. Is this normal or I made a mistake in my build? Reza

[LLVMdev] fp Question

2010 Jul 23

3

...0.00146 RE
435.gromacs -- 0.00138 RE
436.cactusADM -- 0.00135 RE
437.leslie3d -- 0.00141 RE
444.namd -- 19.4 -- S
447.dealII -- 19.7 -- S
450.soplex -- 0.0380 -- S
453.povray -- 2.49 -- S
454.calculix -- 0.00135 RE
459.Gems...

[LLVMdev] [PATCH] use-diet for review

2008 Apr 29

0

Hi Gabor, Thanks for posting the memory savings. 13% less memory usage in 447.dealII is very impressive. I haven't taken more than a very brief peek at this patch, but I have a few questions already. Is there a header missing? I don't see DECLARE_TRANSPARENT_OPERAND_ACCESSORS defined anywhere. Also, what effect does this macro have on doxygen? In User.h:

+public:
+ te...

[LLVMdev] LLVM's Pre-allocation Scheduler Tested against a Branch-and-Bound Scheduler

2012 Sep 29

7

...xalancbmk 21.9 21.9 0.00%
GEOMEAN 19.0929865 19.00588287 0.46%

410.bwaves 15.2 15.2 0.00%
416.gamess CE CE #VALUE!
433.milc 19 18.6 2.15%
434.zeusmp 14.2 14.2 0.00%
435.gromacs 11.6 11.3 2.65%
436.cactusADM 8.31 7.89 5.32%
437.leslie3d 11 11 0.00%
444.namd 16 16 0.00%
447.dealII 25.4 25.4 0.00%
450.soplex 26.1 26.1 0.00%
453.povray 20.5 20.5 0.00%
454.calculix 8.44 8.3 1.69%
459.GemsFDTD 10.7 10.7 0.00%
465.tonto CE CE #VALUE!
470.lbm 38.1 31.5 20.95%
481.wrf 11.6 11.6 0.00%
482.sphinx3 28.2 26.9 4.83%
GEOMEAN 15.91486307 15.54419555 2.38%

Precise Latencies...
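The % column in the scheduler table above can be reproduced from the two numeric columns: each row's value equals (first - second) / second, e.g. 470.lbm gives (38.1 - 31.5) / 31.5 ≈ 20.95%. The sketch below is a reconstruction, not code from the thread. It recomputes the CFP2006 GEOMEAN row under the assumption that the two CE rows were simply excluded; which scheduler each column belongs to is not shown in the snippet, so the columns are just called "first" and "second".

```python
import math

# CFP2006 rows transcribed from the table: (benchmark, first column, second column).
# ASSUMPTION: the CE rows (416.gamess, 465.tonto) are excluded from the geomean.
FP_ROWS = [
    ("410.bwaves", 15.2, 15.2),
    ("433.milc", 19.0, 18.6),
    ("434.zeusmp", 14.2, 14.2),
    ("435.gromacs", 11.6, 11.3),
    ("436.cactusADM", 8.31, 7.89),
    ("437.leslie3d", 11.0, 11.0),
    ("444.namd", 16.0, 16.0),
    ("447.dealII", 25.4, 25.4),
    ("450.soplex", 26.1, 26.1),
    ("453.povray", 20.5, 20.5),
    ("454.calculix", 8.44, 8.3),
    ("459.GemsFDTD", 10.7, 10.7),
    ("470.lbm", 38.1, 31.5),
    ("481.wrf", 11.6, 11.6),
    ("482.sphinx3", 28.2, 26.9),
]


def geomean(values):
    """Geometric mean, computed as exp of the mean of the logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def pct_diff(first, second):
    """Difference relative to the second column, in percent (matches the % column)."""
    return 100.0 * (first - second) / second


first_gm = geomean(row[1] for row in FP_ROWS)
second_gm = geomean(row[2] for row in FP_ROWS)
print(f"GEOMEAN {first_gm:.8f} {second_gm:.8f} {pct_diff(first_gm, second_gm):.2f}%")
```

With those 15 rows, the recomputed geometric means agree with the reported 15.91486307 and 15.54419555, which supports the CE-exclusion assumption.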

(RFC) Encoding code duplication factor in discriminator

2016 Oct 27

2

The impact on debug_line is actually not small. I only implemented part 1 (encoding duplication factor) for loop unrolling and loop vectorization. The debug_line size overhead for "-O2 -g1" binaries of speccpu C/C++ benchmarks:

433.milc 23.59%
444.namd 6.25%
447.dealII 8.43%
450.soplex 2.41%
453.povray 5.40%
470.lbm 0.00%
482.sphinx3 7.10%
400.perlbench 2.77%
401.bzip2 9.62%
403.gcc 2.67%
429.mcf 9.54%
445.gobmk 7.40%
456.hmmer 9.79%
458.sjeng 9.98%
462.libquantum 10.90%
464.h264ref 30.21%
471.omnetpp 0.52%
473.astar 5.67%
483.xalancbmk 1.46%
mean 7.86%

Dehao

On...
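The "mean 7.86%" in the debug_line list above is not the arithmetic mean of the 19 overheads (that would be about 8.09%). It is consistent with a geometric mean of the size ratios (1 + overhead). That reading is a reconstruction from the numbers, not something the post states. A minimal Python sketch of both calculations:

```python
import math

# debug_line size overhead per benchmark, in percent, transcribed from the list.
OVERHEAD_PCT = {
    "433.milc": 23.59, "444.namd": 6.25, "447.dealII": 8.43,
    "450.soplex": 2.41, "453.povray": 5.40, "470.lbm": 0.00,
    "482.sphinx3": 7.10, "400.perlbench": 2.77, "401.bzip2": 9.62,
    "403.gcc": 2.67, "429.mcf": 9.54, "445.gobmk": 7.40,
    "456.hmmer": 9.79, "458.sjeng": 9.98, "462.libquantum": 10.90,
    "464.h264ref": 30.21, "471.omnetpp": 0.52, "473.astar": 5.67,
    "483.xalancbmk": 1.46,
}


def arithmetic_mean(pcts):
    """Plain average of the percentages."""
    return sum(pcts) / len(pcts)


def geomean_overhead(pcts):
    """Geometric mean of the size ratios (1 + p/100), reported back as a percentage."""
    log_sum = sum(math.log1p(p / 100.0) for p in pcts)
    return 100.0 * (math.exp(log_sum / len(pcts)) - 1.0)


values = list(OVERHEAD_PCT.values())
print(f"arithmetic mean: {arithmetic_mean(values):.2f}%")
print(f"geometric mean:  {geomean_overhead(values):.2f}%")
```

The geometric mean damps the two large outliers (433.milc and 464.h264ref), which is why it lands below the arithmetic mean.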

[RFC] Switching to MemorySSA-backed Dead Store Elimination (aka cross-bb DSE)

2020 Aug 18

7

...99.00 1351.00 69.1%
test-suite.../CINT2000/176.gcc/176.gcc.test 412.00 668.00 62.1%
test-suite...nsumer-lame/consumer-lame.test 111.00 175.00 57.7%
test-suite...marks/7zip/7zip-benchmark.test 1069.00 1683.00 57.4%
test-suite...006/447.dealII/447.dealII.test 1715.00 2689.00 56.8%

There are a few existing missed optimization issues MemorySSA-backed DSE addresses, e.g.
https://bugs.llvm.org/show_bug.cgi?id=46847
https://bugs.llvm.org/show_bug.cgi?id=40527

And some that should be relatively straight-forward to addre...
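The snippet does not show what the two numeric columns in the DSE table count, but the percentage column matches the relative increase of the second value over the first, (second - first) / first; for 447.dealII, (2689 - 1715) / 1715 ≈ 56.8%. A small Python check, treating the column meanings as unknown and leaving out the truncated first row:

```python
# Rows transcribed from the DSE table: (test, first value, second value, reported %).
# Column meanings are unknown from the snippet; the truncated first row is omitted.
DSE_ROWS = [
    ("176.gcc", 412.00, 668.00, 62.1),
    ("consumer-lame", 111.00, 175.00, 57.7),
    ("7zip-benchmark", 1069.00, 1683.00, 57.4),
    ("447.dealII", 1715.00, 2689.00, 56.8),
]


def pct_increase(first, second):
    """Relative increase of the second value over the first, in percent."""
    return 100.0 * (second - first) / first


for name, first, second, reported in DSE_ROWS:
    print(f"{name:15s} {pct_increase(first, second):5.1f}% (reported {reported}%)")
```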
