Displaying 20 results from an estimated 43 matches for "dealii".
2017 Jan 30
4
(RFC) Adjusting default loop fully unroll threshold
...ller. This seems conservative because
unlike dynamic/partial unrolling, fully unrolling will not affect
LSD/ICache performance. In https://reviews.llvm.org/D28368, I proposed to
double the threshold for loop fully unroller. This will change the codegen
of several SPECCPU benchmarks:
Code size:
447.dealII 0.50%
453.povray 0.42%
433.milc 0.20%
445.gobmk 0.32%
403.gcc 0.05%
464.h264ref 3.62%
Compile Time:
447.dealII 0.22%
453.povray -0.16%
433.milc 0.09%
445.gobmk -2.43%
403.gcc 0.06%
464.h264ref 3.21%
Performance (on intel sandybridge):
447.dealII +0.07%
453.povray +1.79%
433.milc +1.02%
445.gobmk...
2017 Jan 30
0
(RFC) Adjusting default loop fully unroll threshold
...rtial unrolling, fully unrolling will not affect LSD/ICache performance. In https://reviews.llvm.org/D28368, I proposed to double the threshold for loop fully unroller. This will change the codegen of several SPECCPU benchmarks:
>
> Code size:
> 447.dealII 0.50%
> 453.povray 0.42%
> 433.milc 0.20%
> 445.gobmk 0.32%
> 403.gcc 0.05%
> 464.h264ref 3.62%
>
> Compile Time:
> 447.dealII 0.22%
> 453.povray -0.16%
> 433.milc 0.09%
> 445.gobmk -2.43%
> 403.gcc 0.06%
> 464.h264ref 3.21%
>
> Performance (on intel s...
2017 Jan 30
2
(RFC) Adjusting default loop fully unroll threshold
...use
> unlike dynamic/partial unrolling, fully unrolling will not affect
> LSD/ICache performance. In https://reviews.llvm.org/D28368, I proposed to
> double the threshold for loop fully unroller. This will change the codegen
> of several SPECCPU benchmarks:
>
> Code size:
> 447.dealII 0.50%
> 453.povray 0.42%
> 433.milc 0.20%
> 445.gobmk 0.32%
> 403.gcc 0.05%
> 464.h264ref 3.62%
>
> Compile Time:
> 447.dealII 0.22%
> 453.povray -0.16%
> 433.milc 0.09%
> 445.gobmk -2.43%
> 403.gcc 0.06%
> 464.h264ref 3.21%
>
> Performance (on intel san...
2017 Jan 31
0
(RFC) Adjusting default loop fully unroll threshold
...partial unrolling, fully unrolling will not affect
>> LSD/ICache performance. In https://reviews.llvm.org/D28368, I proposed
>> to double the threshold for loop fully unroller. This will change the
>> codegen of several SPECCPU benchmarks:
>>
>> Code size:
>> 447.dealII 0.50%
>> 453.povray 0.42%
>> 433.milc 0.20%
>> 445.gobmk 0.32%
>> 403.gcc 0.05%
>> 464.h264ref 3.62%
>>
>> Compile Time:
>> 447.dealII 0.22%
>> 453.povray -0.16%
>> 433.milc 0.09%
>> 445.gobmk -2.43%
>> 403.gcc 0.06%
>> 4...
2018 Aug 02
2
New and more general Function Merging optimization for code size
...n the final executable file over the baseline: 5.55%
compared to 0.49% of the identical merge.
Average reduction in the total number of instructions over the baseline:
7.04% compared to 0.47% of the identical merge.
The highest reduction on the executable file is of about 20% (both 429.mcf
and 447.dealII) and the highest reduction on the total number of
instructions is of about 37% (447.dealII).
It has an average slowdown of about 1%, but shows no statistical
difference from the baseline in most of the benchmarks in the SPEC'06 suite.
Because this new function merging technique is able to m...
2010 Feb 15
0
[LLVMdev] Measurements of the new inlinehint attribute
...83.equake 0.00% -1.85% 3.54% 0.00%
SPEC/CFP2000/188.ammp/188.ammp 0.28% -0.18% 48.68% 3.10%
SPEC/CFP2006/433.milc/433.milc 0.00% -0.14% 20.31% 2.68%
SPEC/CFP2006/444.namd/444.namd 0.04% 0.44% 3.28% 1.40%
SPEC/CFP2006/447.dealII/447.dealII 10.61% 13.06% 35.52% 15.01%
SPEC/CFP2006/450.soplex/450.soplex 0.30% 0.00% 22.47% 0.00%
SPEC/CFP2006/470.lbm/470.lbm 0.00% 0.00% 4.91% 0.30%
SPEC/CINT2000/164.gzip/164.gzip 0.00% 0.17% 32.44% -4.93%
SPEC/CINT2000/1...
2011 Apr 30
2
[LLVMdev] Greedy register allocation
...4ref
+6.7% 177.mesa
With more registers and out-of-order execution hiding the cost of spilling, x86-64 is more mixed. I suspect this architecture is more sensitive to code layout issues than to register allocation:
Targeting x86-64:
-6.4% 464.h264ref
-6.1% 256.bzip2
-5.2% 183.equake
-4.8% 447.dealII
-3.9% 400.perlbench
-3.5% 401.bzip2
-3.3% 255.vortex
+3.8% 186.crafty
+5.0% 462.libquantum
+8.0% 471.omnetpp
Finally, armv7/thumb2 running on a Cortex-A9 CPU does quite well:
Targeting armv7:
-6.2% 447.dealII
-4.4% 183.equake
-4.1% 462.libquantum
-3.5% 401.bzip2
Clang builds llvm+clang...
2017 Jan 31
3
(RFC) Adjusting default loop fully unroll threshold
...ing, fully unrolling will not affect LSD/ICache performance. In https://reviews.llvm.org/D28368, I proposed to double the threshold for loop fully unroller. This will change the codegen of several SPECCPU benchmarks:
>>
>> Code size:
>> 447.dealII 0.50%
>> 453.povray 0.42%
>> 433.milc 0.20%
>> 445.gobmk 0.32%
>> 403.gcc 0.05%
>> 464.h264ref 3.62%
>>
>> Compile Time:
>> 447.dealII 0.22%
>> 453.povray -0.16%
>> 433.milc 0.09%
>> 445.gobmk -2.43%
>> 403.gcc 0.06%
>>...
2015 Oct 01
2
Register Spill Caused by the Reassociation pass
Hi Sanjay,
I observed some extra register spills when applying the reassociation pass
on spec2006 benchmarks and I would like to hear your advice.
For example, function get_new_point_on_quad() of tria_boundary.cc in
spec2006/dealII has a sequence of code like this
.
X=a+b
.
Y=X+c
.
Z=Y+d
.
There are many other instructions between these float adds. The
reassociation pass first swaps a and c when checking the second add, and
then swaps a and d when checking the third add. The transformed code looks
like
....
2017 May 18
6
Enable vectorizer-maximize-bandwidth by default?
...help performance.
I've tested the performance impact on Intel sandybridge machine with
speccpu benchmarks:
Benchmark Base:Reference (1)
-------------------------------------------------------
spec/2006/fp/C++/444.namd 26.84 -0.31%
spec/2006/fp/C++/447.dealII 46.19 +0.89%
spec/2006/fp/C++/450.soplex 42.92 -0.44%
spec/2006/fp/C++/453.povray 38.57 -2.25%
spec/2006/fp/C/433.milc 24.54 -0.76%
spec/2006/fp/C/470.lbm 41.08 +0.26%
spec/2006/fp/C/482.sphinx3 47.58...
2016 Aug 30
2
Fwd: cfl-aa
...0 | 450.soplex | 72
2472234 | 401.bzip2 | 229
2574217 | 456.hmmer | 1833
3492577 | 445.gobmk | 8480
3685838 | 444.namd | 616
12943554 | 471.omnetpp | 422
20068605 | 464.h264ref | 8593
23849576 | 400.perlbench | 99316
37779455 | 447.dealII | 11204
186008992 | 403.gcc | 404828
I am finding these results weird because I was expecting a larger
number of no-alias responses. For instance, I got only 404,828 responses
out of 186,008,992 queries. Has anyone gotten similar, or different results?
Regards,
Vitor Mendes Paisan...
2016 Mar 29
2
[CodeGen] CodeSize - TailMerging and BlockPlacement
...%).
473.astar -7
401.bzip2 -110
403.gcc -13,006
445.gobmk -1,716
464.h264ref -684
456.hmmer -391
462.libquantum -4
429.mcf -4
471.omnetpp -1,980
400.perlbench -4,176
458.sjeng -338
450.soplex -395
483.xalancbmk -4,183
447.dealII -186
433.milc -34
444.namd -104
453.povray -1,785
482.sphinx3 -112
I propose to factor out the relevant code from BranchFolding into a
utility, and call it from BlockPlacement whenever the layout is changed.
It is similar to D18226 and D18411 which factor tail...
2018 Aug 02
2
New and more general Function Merging optimization for code size
...ver the baseline: 5.55%
> compared to 0.49% of the identical merge.
> Average reduction in the total number of instructions over the baseline:
> 7.04% compared to 0.47% of the identical merge.
>
> The highest reduction on the executable file is of about 20% (both 429.mcf
> and 447.dealII) and the highest reduction on the total number of
> instructions is of about 37% (447.dealII).
>
> It has an average slowdown of about 1%, but shows no statistical
> difference from the baseline in most of the benchmarks in the SPEC'06 suite.
>
>
> Because this new functio...
2010 Jul 22
0
[LLVMdev] fp Question
On Jul 22, 2010, at 4:18 PM PDT, Reza Yazdani wrote:
> Hi,
>
> I ran Spec2006 with -O4. All integer benchmarks passed, but only 8
> out of 17 floating point benchmarks passed. Is this normal, or did I
> make a mistake in my build?
Hi Reza. Somebody on Linux should answer, but I don't think it's
normal. You may have checked out the source at a moment when it had a
bug
2010 Jul 22
3
[LLVMdev] fp Question
Hi,
I ran Spec2006 with -O4. All integer benchmarks passed, but only 8 out of 17
floating point benchmarks passed. Is this normal, or did I make a mistake in
my build?
Reza
2010 Jul 23
3
[LLVMdev] fp Question
...0.00146
RE
435.gromacs -- 0.00138
RE
436.cactusADM -- 0.00135
RE
437.leslie3d -- 0.00141
RE
444.namd -- 19.4 --
S
447.dealII -- 19.7 --
S
450.soplex -- 0.0380 --
S
453.povray -- 2.49 --
S
454.calculix -- 0.00135
RE
459.Gems...
2008 Apr 29
0
[LLVMdev] [PATCH] use-diet for review
Hi Gabor,
Thanks for posting the memory savings. 13% less memory usage
in 447.dealII is very impressive.
I haven't taken more than a very brief peek at this patch, but I
have a few questions already.
Is there a header missing? I don't see
DECLARE_TRANSPARENT_OPERAND_ACCESSORS
defined anywhere.
Also, what effect does this macro have on doxygen?
In User.h:
+public:
+ te...
2012 Sep 29
7
[LLVMdev] LLVM's Pre-allocation Scheduler Tested against a Branch-and-Bound Scheduler
...xalancbmk 21.9 21.9 0.00%
GEOMEAN 19.0929865 19.00588287 0.46%
410.bwaves 15.2 15.2 0.00%
416.gamess CE CE #VALUE!
433.milc 19 18.6 2.15%
434.zeusmp 14.2 14.2 0.00%
435.gromacs 11.6 11.3 2.65%
436.cactusADM 8.31 7.89 5.32%
437.leslie3d 11 11 0.00%
444.namd 16 16 0.00%
447.dealII 25.4 25.4 0.00%
450.soplex 26.1 26.1 0.00%
453.povray 20.5 20.5 0.00%
454.calculix 8.44 8.3 1.69%
459.GemsFDTD 10.7 10.7 0.00%
465.tonto CE CE #VALUE!
470.lbm 38.1 31.5 20.95%
481.wrf 11.6 11.6 0.00%
482.sphinx3 28.2 26.9 4.83%
GEOMEAN 15.91486307 15.54419555 2.38%
Precise Latencies...
2016 Oct 27
2
(RFC) Encoding code duplication factor in discriminator
The impact to debug_line is actually not small. I only implemented the part
1 (encoding duplication factor) for loop unrolling and loop vectorization.
The debug_line size overhead for "-O2 -g1" binary of speccpu C/C++
benchmarks:
433.milc 23.59%
444.namd 6.25%
447.dealII 8.43%
450.soplex 2.41%
453.povray 5.40%
470.lbm 0.00%
482.sphinx3 7.10%
400.perlbench 2.77%
401.bzip2 9.62%
403.gcc 2.67%
429.mcf 9.54%
445.gobmk 7.40%
456.hmmer 9.79%
458.sjeng 9.98%
462.libquantum 10.90%
464.h264ref 30.21%
471.omnetpp 0.52%
473.astar 5.67%
483.xalancbmk 1.46%
mean 7.86%
Dehao
On...
2020 Aug 18
7
[RFC] Switching to MemorySSA-backed Dead Store Elimination (aka cross-bb DSE)
...99.00 1351.00 69.1%
test-suite.../CINT2000/176.gcc/176.gcc.test 412.00 668.00 62.1%
test-suite...nsumer-lame/consumer-lame.test 111.00 175.00 57.7%
test-suite...marks/7zip/7zip-benchmark.test 1069.00 1683.00 57.4%
test-suite...006/447.dealII/447.dealII.test 1715.00 2689.00 56.8%
There are a few existing missed optimization issues MemorySSA-backed DSE addresses, e.g.
https://bugs.llvm.org/show_bug.cgi?id=46847
https://bugs.llvm.org/show_bug.cgi?id=40527
And some that should be relatively straight-forward to addre...