search for: dealii

Displaying 20 results from an estimated 43 matches for "dealii".

2017 Jan 30
4
(RFC) Adjusting default loop fully unroll threshold
...ller. This seems conservative because unlike dynamic/partial unrolling, fully unrolling will not affect LSD/ICache performance. In https://reviews.llvm.org/D28368, I proposed to double the threshold for loop fully unroller. This will change the codegen of several SPECCPU benchmarks: Code size: 447.dealII 0.50% 453.povray 0.42% 433.milc 0.20% 445.gobmk 0.32% 403.gcc 0.05% 464.h264ref 3.62% Compile Time: 447.dealII 0.22% 453.povray -0.16% 433.milc 0.09% 445.gobmk -2.43% 403.gcc 0.06% 464.h264ref 3.21% Performance (on intel sandybridge): 447.dealII +0.07% 453.povray +1.79% 433.milc +1.02% 445.gobmk...
2017 Jan 30
0
(RFC) Adjusting default loop fully unroll threshold
...rtial unrolling, fully unrolling will not affect LSD/ICache performance. In https://reviews.llvm.org/D28368 <https://reviews.llvm.org/D28368>, I proposed to double the threshold for loop fully unroller. This will change the codegen of several SPECCPU benchmarks: > > Code size: > 447.dealII 0.50% > 453.povray 0.42% > 433.milc 0.20% > 445.gobmk 0.32% > 403.gcc 0.05% > 464.h264ref 3.62% > > Compile Time: > 447.dealII 0.22% > 453.povray -0.16% > 433.milc 0.09% > 445.gobmk -2.43% > 403.gcc 0.06% > 464.h264ref 3.21% > > Performance (on intel s...
2017 Jan 30
2
(RFC) Adjusting default loop fully unroll threshold
...use > unlike dynamic/partial unrolling, fully unrolling will not affect > LSD/ICache performance. In https://reviews.llvm.org/D28368, I proposed to > double the threshold for loop fully unroller. This will change the codegen > of several SPECCPU benchmarks: > > Code size: > 447.dealII 0.50% > 453.povray 0.42% > 433.milc 0.20% > 445.gobmk 0.32% > 403.gcc 0.05% > 464.h264ref 3.62% > > Compile Time: > 447.dealII 0.22% > 453.povray -0.16% > 433.milc 0.09% > 445.gobmk -2.43% > 403.gcc 0.06% > 464.h264ref 3.21% > > Performance (on intel san...
2017 Jan 31
0
(RFC) Adjusting default loop fully unroll threshold
...partial unrolling, fully unrolling will not affect >> LSD/ICache performance. In https://reviews.llvm.org/D28368, I proposed >> to double the threshold for loop fully unroller. This will change the >> codegen of several SPECCPU benchmarks: >> >> Code size: >> 447.dealII 0.50% >> 453.povray 0.42% >> 433.milc 0.20% >> 445.gobmk 0.32% >> 403.gcc 0.05% >> 464.h264ref 3.62% >> >> Compile Time: >> 447.dealII 0.22% >> 453.povray -0.16% >> 433.milc 0.09% >> 445.gobmk -2.43% >> 403.gcc 0.06% >> 4...
2018 Aug 02
2
New and more general Function Merging optimization for code size
...n the final executable file over the baseline: 5.55% compared to 0.49% of the identical merge. Average reduction in the total number of instructions over the baseline: 7.04% compared to 0.47% of the identical merge. The highest reduction on the executable file is of about 20% (both 429.mcf and 447.dealII) and the highest reduction on the total number of instructions is of about 37% (447.dealII). It has an average slowdown of about 1%, but having no statistical difference from the baseline in most of the benchmarks in the SPEC'06 suite. Because this new function merging technique is able to m...
2010 Feb 15
0
[LLVMdev] Measurements of the new inlinehint attribute
...83.equake 0.00% -1.85% 3.54% 0.00% SPEC/CFP2000/188.ammp/188.ammp 0.28% -0.18% 48.68% 3.10% SPEC/CFP2006/433.milc/433.milc 0.00% -0.14% 20.31% 2.68% SPEC/CFP2006/444.namd/444.namd 0.04% 0.44% 3.28% 1.40% SPEC/CFP2006/447.dealII/447.dealII 10.61% 13.06% 35.52% 15.01% SPEC/CFP2006/450.soplex/450.soplex 0.30% 0.00% 22.47% 0.00% SPEC/CFP2006/470.lbm/470.lbm 0.00% 0.00% 4.91% 0.30% SPEC/CINT2000/164.gzip/164.gzip 0.00% 0.17% 32.44% -4.93% SPEC/CINT2000/1...
2011 Apr 30
2
[LLVMdev] Greedy register allocation
...4ref +6.7% 177.mesa With more registers and out-of-order execution hiding the cost of spilling, x86-64 is more mixed. I suspect this architecture is more sensitive to code layout issues than to register allocation: Targeting x86-64: -6.4% 464.h264ref -6.1% 256.bzip2 -5.2% 183.equake -4.8% 447.dealII -3.9% 400.perlbench -3.5% 401.bzip2 -3.3% 255.vortex +3.8% 186.crafty +5.0% 462.libquantum +8.0% 471.omnetpp Finally, armv7/thumb2 running on a Cortex-A9 CPU does quite well: Targeting armv7: -6.2% 447.dealII -4.4% 183.equake -4.1% 462.libquantum -3.5% 401.bzip2 Clang builds llvm+clang...
2017 Jan 31
3
(RFC) Adjusting default loop fully unroll threshold
...ing, fully unrolling will not affect LSD/ICache performance. In https://reviews.llvm.org/D28368 <https://reviews.llvm.org/D28368>, I proposed to double the threshold for loop fully unroller. This will change the codegen of several SPECCPU benchmarks: >> >> Code size: >> 447.dealII 0.50% >> 453.povray 0.42% >> 433.milc 0.20% >> 445.gobmk 0.32% >> 403.gcc 0.05% >> 464.h264ref 3.62% >> >> Compile Time: >> 447.dealII 0.22% >> 453.povray -0.16% >> 433.milc 0.09% >> 445.gobmk -2.43% >> 403.gcc 0.06% >>...
2015 Oct 01
2
Register Spill Caused by the Reassociation pass
Hi Sanjay, I observed some extra register spills when applying the reassociation pass on spec2006 benchmarks and I would like your advice. For example, function get_new_point_on_quad() of tria_boundary.cc in spec2006/dealII has a sequence of code like this: X=a+b; Y=X+c; Z=Y+d. There are many other instructions between these float adds. The reassociation pass first swaps a and c when checking the second add, and then swaps a and d when checking the third add. The transformed code looks like ...
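The operand swaps described in this message can be traced by hand. The sketch below models the add chain only as the order in which its leaves (a, b, c, d) are consumed, applies the two swaps exactly as the message describes them, and reports which add uses `a` before and after. The snippet is cut off before the transformed code, and this is not how LLVM's Reassociate pass is implemented; the helper names are hypothetical. The link to spills (that `a` now stays live across all the unrelated instructions between the adds) is an inference, not something the snippet states.

```python
# Hypothetical model of the add chain from the message:
#   X = a + b; Y = X + c; Z = Y + d
# Each add consumes one new leaf, so the chain is represented by the
# order in which leaves are consumed (add 1 consumes the first two).

def apply_swaps(order, swaps):
    """Swap pairs of leaves in the consumption order, one pair at a time."""
    order = list(order)
    for x, y in swaps:
        i, j = order.index(x), order.index(y)
        order[i], order[j] = order[j], order[i]
    return order


def use_step(order):
    """Map each leaf to the (1-based) add that consumes it."""
    # Leaves at positions 0 and 1 are both operands of add 1.
    return {leaf: max(pos, 1) for pos, leaf in enumerate(order)}


original = ["a", "b", "c", "d"]
# Swaps as described: a<->c at the second add, then a<->d at the third add.
transformed = apply_swaps(original, [("a", "c"), ("a", "d")])

print(transformed)                                          # ['c', 'b', 'd', 'a']
print(use_step(original)["a"], use_step(transformed)["a"])  # 1 3
```

In the original order `a` dies at the first add. After both swaps it is consumed by the last add, so its live range stretches over everything between the first and third adds.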
2017 May 18
6
Enable vectorizer-maximize-bandwidth by default?
...help performance. I've tested the performance impact on Intel sandybridge machine with speccpu benchmarks: Benchmark Base:Reference (1) ------------------------------------------------------- spec/2006/fp/C++/444.namd 26.84 -0.31% spec/2006/fp/C++/447.dealII 46.19 +0.89% spec/2006/fp/C++/450.soplex 42.92 -0.44% spec/2006/fp/C++/453.povray 38.57 -2.25% spec/2006/fp/C/433.milc 24.54 -0.76% spec/2006/fp/C/470.lbm 41.08 +0.26% spec/2006/fp/C/482.sphinx3 47.58...
2016 Aug 30
2
Fwd: cfl-aa
...0 | 450.soplex | 72
2472234 | 401.bzip2 | 229
2574217 | 456.hmmer | 1833
3492577 | 445.gobmk | 8480
3685838 | 444.namd | 616
12943554 | 471.omnetpp | 422
20068605 | 464.h264ref | 8593
23849576 | 400.perlbench | 99316
37779455 | 447.dealII | 11204
186008992 | 403.gcc | 404828
I am finding these results weird because I was expecting a larger number of no-alias responses. For instance, I got only 404,828 responses out of 186,008,992 queries. Has anyone gotten similar, or different results? Regards, Vitor Mendes Paisan...
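To see how small these fractions are, the no-alias rate per benchmark can be computed from the rows. This sketch assumes the column order is (queries, benchmark, no-alias responses), which matches the message's own reading of the 403.gcc row (404,828 responses out of 186,008,992 queries). The 450.soplex row is left out because its query count is truncated in the snippet.

```python
# Rows as they appear in the snippet: (queries, benchmark, no-alias responses).
# 450.soplex is omitted because its query count is cut off.
rows = [
    (2472234, "401.bzip2", 229),
    (2574217, "456.hmmer", 1833),
    (3492577, "445.gobmk", 8480),
    (3685838, "444.namd", 616),
    (12943554, "471.omnetpp", 422),
    (20068605, "464.h264ref", 8593),
    (23849576, "400.perlbench", 99316),
    (37779455, "447.dealII", 11204),
    (186008992, "403.gcc", 404828),
]


def no_alias_rate(queries, no_alias):
    """Percentage of alias queries that were answered no-alias."""
    return 100.0 * no_alias / queries


rates = {name: no_alias_rate(q, n) for q, name, n in rows}

# Print from lowest to highest rate.
for name, pct in sorted(rates.items(), key=lambda kv: kv[1]):
    print(f"{name:15s} {pct:.4f}%")
```

Under that reading, 403.gcc answers roughly 0.22% of its queries with no-alias, 447.dealII roughly 0.03%, and no listed benchmark gets above half a percent.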
2016 Mar 29
2
[CodeGen] CodeSize - TailMerging and BlockPlacement
...%). 473.astar -7 401.bzip2 -110 403.gcc -13,006 445.gobmk -1,716 464.h264ref -684 456.hmmer -391 462.libquantum -4 429.mcf -4 471.omnetpp -1,980 400.perlbench -4,176 458.sjeng -338 450.soplex -395 483.xalancbmk -4,183 447.dealII -186 433.milc -34 444.namd -104 453.povray -1,785 482.sphinx3 -112 I propose to factor out the relevant code from BranchFolding into a utility, and call it from BlockPlacement whenever the layout is changed. It is similar to D18226 and D18411 which factor tail...
2018 Aug 02
2
New and more general Function Merging optimization for code size
...ver the baseline: 5.55% > compared to 0.49% of the identical merge. > Average reduction in the total number of instructions over the baseline: > 7.04% compared to 0.47% of the identical merge. > > The highest reduction on the executable file is of about 20% (both 429.mcf > and 447.dealII) and the highest reduction on the total number of > instructions is of about 37% (447.dealII). > > It has an average slowdown of about 1%, but having no statistical > difference from the baseline in most of the benchmarks in the SPEC'06 suite. > > > Because this new functio...
2010 Jul 22
0
[LLVMdev] fp Question
On Jul 22, 2010, at 4:18 PM PDT, Reza Yazdani wrote: > Hi, > > I ran Spec2006 with -O4. All integer benchmarks passed, but only 8 > out of 17 floating point benchmarks passed. Is this normal, or did I > make a mistake in my build? Hi Reza. Somebody on Linux should answer, but I don't think it's normal. You may have checked out the source at a moment when it had a bug
2010 Jul 22
3
[LLVMdev] fp Question
Hi, I ran Spec2006 with -O4. All integer benchmarks passed, but only 8 out of 17 floating point benchmarks passed. Is this normal, or did I make a mistake in my build? Reza
2010 Jul 23
3
[LLVMdev] fp Question
...0.00146 RE 435.gromacs -- 0.00138 RE 436.cactusADM -- 0.00135 RE 437.leslie3d -- 0.00141 RE 444.namd -- 19.4 -- S 447.dealII -- 19.7 -- S 450.soplex -- 0.0380 -- S 453.povray -- 2.49 -- S 454.calculix -- 0.00135 RE 459.Gems...
2008 Apr 29
0
[LLVMdev] [PATCH] use-diet for review
Hi Gabor, Thanks for posting the memory savings. 13% less memory usage in 447.dealII is very impressive. I haven't taken more than a very brief peek at this patch, but I have a few questions already. Is there a header missing? I don't see DECLARE_TRANSPARENT_OPERAND_ACCESSORS defined anywhere. Also, what effect does this macro have on doxygen? In User.h: +public: + te...
2012 Sep 29
7
[LLVMdev] LLVM's Pre-allocation Scheduler Tested against a Branch-and-Bound Scheduler
...xalancbmk 21.9 21.9 0.00% GEOMEAN 19.0929865 19.00588287 0.46% 410.bwaves 15.2 15.2 0.00% 416.gamess CE CE n/a 433.milc 19 18.6 2.15% 434.zeusmp 14.2 14.2 0.00% 435.gromacs 11.6 11.3 2.65% 436.cactusADM 8.31 7.89 5.32% 437.leslie3d 11 11 0.00% 444.namd 16 16 0.00% 447.dealII 25.4 25.4 0.00% 450.soplex 26.1 26.1 0.00% 453.povray 20.5 20.5 0.00% 454.calculix 8.44 8.3 1.69% 459.GemsFDTD 10.7 10.7 0.00% 465.tonto CE CE n/a 470.lbm 38.1 31.5 20.95% 481.wrf 11.6 11.6 0.00% 482.sphinx3 28.2 26.9 4.83% GEOMEAN 15.91486307 15.54419555 2.38% Precise Latencies...
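In every row checked here, the percentage column equals (first score − second score) / second score. That includes the GEOMEAN row. The snippet does not say which scheduler each score column belongs to, so the sketch below only reproduces the arithmetic. It also returns `None` instead of a number for rows where a benchmark failed to compile ("CE"). That is the case the spreadsheet had rendered as `#VALUE!`. The function name is hypothetical.

```python
def relative_gain(first, second):
    """Percent by which `first` exceeds `second`, relative to `second`.

    Returns None when either run is a compile error ("CE") rather than
    producing a spreadsheet-style #VALUE! placeholder.
    """
    if first == "CE" or second == "CE":
        return None
    return 100.0 * (first - second) / second


# Score pairs copied from the snippet.
samples = {
    "433.milc": (19, 18.6),
    "435.gromacs": (11.6, 11.3),
    "436.cactusADM": (8.31, 7.89),
    "470.lbm": (38.1, 31.5),
    "482.sphinx3": (28.2, 26.9),
    "416.gamess": ("CE", "CE"),
    "GEOMEAN": (15.91486307, 15.54419555),
}

for name, (a, b) in samples.items():
    gain = relative_gain(a, b)
    print(name, "n/a" if gain is None else f"{gain:.2f}%")
```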
2016 Oct 27
2
(RFC) Encoding code duplication factor in discriminator
The impact to debug_line is actually not small. I only implemented the part 1 (encoding duplication factor) for loop unrolling and loop vectorization. The debug_line size overhead for "-O2 -g1" binary of speccpu C/C++ benchmarks: 433.milc 23.59% 444.namd 6.25% 447.dealII 8.43% 450.soplex 2.41% 453.povray 5.40% 470.lbm 0.00% 482.sphinx3 7.10% 400.perlbench 2.77% 401.bzip2 9.62% 403.gcc 2.67% 429.mcf 9.54% 445.gobmk 7.40% 456.hmmer 9.79% 458.sjeng 9.98% 462.libquantum 10.90% 464.h264ref 30.21% 471.omnetpp 0.52% 473.astar 5.67% 483.xalancbmk 1.46% mean 7.86% Dehao On...
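The message reports "mean 7.86%" without saying which mean it is. The arithmetic mean of the 19 listed overheads is 8.09%. The geometric mean of the size ratios (1 + p/100), converted back to a percentage, comes out at 7.86%, so that appears to be the figure reported. This is an inference from the numbers, not something the message states. The sketch computes both.

```python
import math

# debug_line size overhead (%) for "-O2 -g1", as listed in the message.
overhead_pct = {
    "433.milc": 23.59, "444.namd": 6.25, "447.dealII": 8.43,
    "450.soplex": 2.41, "453.povray": 5.40, "470.lbm": 0.00,
    "482.sphinx3": 7.10, "400.perlbench": 2.77, "401.bzip2": 9.62,
    "403.gcc": 2.67, "429.mcf": 9.54, "445.gobmk": 7.40,
    "456.hmmer": 9.79, "458.sjeng": 9.98, "462.libquantum": 10.90,
    "464.h264ref": 30.21, "471.omnetpp": 0.52, "473.astar": 5.67,
    "483.xalancbmk": 1.46,
}


def geomean_overhead(pcts):
    """Geometric mean of the size ratios (1 + p/100), returned as a percentage."""
    logs = [math.log1p(p / 100.0) for p in pcts]
    return 100.0 * math.expm1(sum(logs) / len(logs))


def arithmetic_mean(pcts):
    """Plain average of the percentages."""
    return sum(pcts) / len(pcts)


values = list(overhead_pct.values())
print(f"geometric:  {geomean_overhead(values):.2f}%")  # 7.86%
print(f"arithmetic: {arithmetic_mean(values):.2f}%")   # 8.09%
```

Averaging the ratios geometrically is the usual convention for SPEC-style summaries. It also keeps 470.lbm's 0.00% usable, since its ratio is 1.0 rather than a zero that would collapse a geometric mean of raw percentages.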
2020 Aug 18
7
[RFC] Switching to MemorySSA-backed Dead Store Elimination (aka cross-bb DSE)
...99.00 1351.00 69.1% test-suite.../CINT2000/176.gcc/176.gcc.test 412.00 668.00 62.1% test-suite...nsumer-lame/consumer-lame.test 111.00 175.00 57.7% test-suite...marks/7zip/7zip-benchmark.test 1069.00 1683.00 57.4% test-suite...006/447.dealII/447.dealII.test 1715.00 2689.00 56.8% There are a few existing missed optimization issues MemorySSA-backed DSE addresses, e.g. https://bugs.llvm.org/show_bug.cgi?id=46847 https://bugs.llvm.org/show_bug.cgi?id=40527 And some that should be relatively straight-forward to addre...