search for: speccpu

Displaying 19 results from an estimated 19 matches for "speccpu".

2017 May 18
6
Enable vectorizer-maximize-bandwidth by default?
Hi, I'm proposing to make vectorizer-maximize-bandwidth on by default for the loop vectorizer because it should generally help performance. I've tested the performance impact on an Intel Sandy Bridge machine with the speccpu benchmarks:
Benchmark                    Base:Reference   (1)
-------------------------------------------------------
spec/2006/fp/C++/444.namd    26.84            -0.31%
spec/2006/fp/C++/447.dealII  46.19            +0.89%
spec/2006/fp/C++/450.soplex  42.92            -0.44%
spec/200...
2017 Jan 30
4
(RFC) Adjusting default loop fully unroll threshold
...op dynamic unroller and partial unroller. This seems conservative because, unlike dynamic/partial unrolling, fully unrolling will not affect LSD/ICache performance. In https://reviews.llvm.org/D28368, I proposed to double the threshold for loop fully unroller. This will change the codegen of several SPECCPU benchmarks:
Code size:
447.dealII 0.50%
453.povray 0.42%
433.milc 0.20%
445.gobmk 0.32%
403.gcc 0.05%
464.h264ref 3.62%
Compile Time:
447.dealII 0.22%
453.povray -0.16%
433.milc 0.09%
445.gobmk -2.43%
403.gcc 0.06%
464.h264ref 3.21%
Performance (on Intel Sandy Bridge):
447.dealII +0.07%
453.povra...
2016 Oct 07
7
Debug info interacting with optimization and code generation
...t a couple of careless bugs to fix. But it looks like there are many more issues than I expected. So I'm calling on the community for help:
* Is there anyone else who also cares about codegen consistency?
* Any volunteers to help fix codegen consistency issues? (It is easy to find issues: just build speccpu with -g and -g0, then compare the "objdump -d" output.)
* How do we set up a regression test to ensure future changes do not break codegen consistency?
Any comments? Thanks, Dehao
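The "-g versus -g0" check described in this message can be sketched roughly as follows. This is only an illustration in Python, not a script from the thread: the helper names (`disassemble`, `normalize`, `codegen_diff`) are made up here, and it assumes a GNU-style `objdump` on PATH whose instruction lines begin with a hexadecimal address followed by a colon. Stripping the address column is a design choice to keep one inserted instruction from marking every later line as changed; branch targets inside operands are left as they are, so some noise can remain.

```python
import difflib
import re
import subprocess

# Leading "  4004d6:<tab>" address column that objdump prints on instruction lines.
_ADDR = re.compile(r"^\s*[0-9a-f]+:\s*")


def disassemble(path):
    """Return the `objdump -d` text for one binary (assumes objdump is on PATH)."""
    result = subprocess.run(
        ["objdump", "-d", path], check=True, capture_output=True, text=True
    )
    return result.stdout


def normalize(disasm):
    """Drop the file-format header and per-instruction address columns."""
    lines = []
    for line in disasm.splitlines():
        if "file format" in line:
            continue  # this header embeds the file name, which always differs
        lines.append(_ADDR.sub("", line).rstrip())
    return lines


def codegen_diff(disasm_g, disasm_g0):
    """Unified diff of two disassemblies; an empty list means no difference found."""
    return list(
        difflib.unified_diff(
            normalize(disasm_g), normalize(disasm_g0),
            "with -g", "with -g0", lineterm="",
        )
    )
```

With two builds of the same benchmark (the paths are hypothetical), `codegen_diff(disassemble("build-g/lbm"), disassemble("build-g0/lbm"))` would return an empty list when the generated code matches and a list of diff lines otherwise.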
2017 Jan 30
0
(RFC) Adjusting default loop fully unroll threshold
.... This seems conservative because unlike dynamic/partial unrolling, fully unrolling will not affect LSD/ICache performance. In https://reviews.llvm.org/D28368, I proposed to double the threshold for loop fully unroller. This will change the codegen of several SPECCPU benchmarks: > > Code size: > 447.dealII 0.50% > 453.povray 0.42% > 433.milc 0.20% > 445.gobmk 0.32% > 403.gcc 0.05% > 464.h264ref 3.62% > > Compile Time: > 447.dealII 0.22% > 453.povray -0.16% > 433.milc 0.09% > 445.gobmk -2.43% > 403.gcc 0.06% > 46...
2017 Jan 30
2
(RFC) Adjusting default loop fully unroll threshold
...and partial unroller. This seems conservative because > unlike dynamic/partial unrolling, fully unrolling will not affect > LSD/ICache performance. In https://reviews.llvm.org/D28368, I proposed to > double the threshold for loop fully unroller. This will change the codegen > of several SPECCPU benchmarks: > > Code size: > 447.dealII 0.50% > 453.povray 0.42% > 433.milc 0.20% > 445.gobmk 0.32% > 403.gcc 0.05% > 464.h264ref 3.62% > > Compile Time: > 447.dealII 0.22% > 453.povray -0.16% > 433.milc 0.09% > 445.gobmk -2.43% > 403.gcc 0.06% > 464....
2015 Jan 16
7
[LLVMdev] proof of concept for a loop fusion pass
Hi, We are proposing a loop fusion pass that tries to proactively fuse loops across function call boundaries and arbitrary control flow. http://reviews.llvm.org/D7008 With this pass, we get 103 loop fusions in SPECCPU INT 2006 462.libquantum, with rate performance improving close to 2.5X on x86 (results from AMD A10-6700). I took some liberties in patching up some of the code in ScalarEvolution/DependenceAnalysis/GlobalsModRef/Clang options/ and also adjusted the IPO/LTO pass managers. I would need to do a bette...
2016 Oct 27
2
(RFC) Encoding code duplication factor in discriminator
The impact on debug_line is actually not small. I only implemented part 1 (encoding duplication factor) for loop unrolling and loop vectorization. The debug_line size overhead for the "-O2 -g1" binary of speccpu C/C++ benchmarks:
433.milc 23.59%
444.namd 6.25%
447.dealII 8.43%
450.soplex 2.41%
453.povray 5.40%
470.lbm 0.00%
482.sphinx3 7.10%
400.perlbench 2.77%
401.bzip2 9.62%
403.gcc 2.67%
429.mcf 9.54%
445.gobmk 7.40%
456.hmmer 9.79%
458.sjeng 9.98%
462.libquantum 10.90%
464.h264ref 30.21%
471.omnetpp 0...
2016 Mar 22
3
Instrumented BB in PGO
Hello, I have a question regarding PGO instrumented BBs (I use IR-level instrumentation). It seems that, in some cases, the instrumented BBs do not match between the two compilations for profile-gen and profile-use. Here is an example from SPECcpu 2006 lbm (a simple case consisting of just two modules). In the first compilation, we have 5 instrumentation points for the main function as follows:
$ opt -pgo-instr-gen -instrprof _all_combined.bc -o _all_combined_inst.bc -debug-only=pgo-instrumentation
Dump Function main Hash: 61483163021 af...
2017 Jan 31
0
(RFC) Adjusting default loop fully unroll threshold
...ller. This seems conservative because >> unlike dynamic/partial unrolling, fully unrolling will not affect >> LSD/ICache performance. In https://reviews.llvm.org/D28368, I proposed >> to double the threshold for loop fully unroller. This will change the >> codegen of several SPECCPU benchmarks: >> >> Code size: >> 447.dealII 0.50% >> 453.povray 0.42% >> 433.milc 0.20% >> 445.gobmk 0.32% >> 403.gcc 0.05% >> 464.h264ref 3.62% >> >> Compile Time: >> 447.dealII 0.22% >> 453.povray -0.16% >> 433.milc 0.09...
2016 Oct 27
0
(RFC) Encoding code duplication factor in discriminator
...ct 27, 2016 at 1:11 PM, Dehao Chen <dehao at google.com> wrote: > The impact to debug_line is actually not small. I only implemented the > part 1 (encoding duplication factor) for loop unrolling and loop > vectorization. The debug_line size overhead for "-O2 -g1" binary of speccpu > C/C++ benchmarks: > > 433.milc 23.59% > 444.namd 6.25% > 447.dealII 8.43% > 450.soplex 2.41% > 453.povray 5.40% > 470.lbm 0.00% > 482.sphinx3 7.10% > 400.perlbench 2.77% > 401.bzip2 9.62% > 403.gcc 2.67% > 429.mcf 9.54% > 445.gobmk 7.40% > 456.hmmer 9....
2011 Feb 10
0
Problem with Memory Throughput Difference between Two Nodes(sockets)
Hi all, I installed xen4.0.1-rc3 & 2.6.18.8 (dom0) on my machine (Intel Xeon X5650, Westmere, 12 cores, 6 cores per socket, 2 sockets, 12MB L3, ...). After running the SPECCPU 2006 Libquantum benchmark, I found that the two nodes have different throughput. I set up 6 VMs on each node and ran the workload in each VM. A VM in node1 got 1500 sec exec time, while a VM in node2 got 1990 sec exec time. Previously, I also installed the same xen version with 2.6.31 (dom0) pvops linux ke...
2015 Jan 17
3
[LLVMdev] proof of concept for a loop fusion pass
...on pass that tries to proactive fuse > loops across function call boundaries and arbitrary control flow. > > > http://reviews.llvm.org/D7008 This link contains the Clang patch. Did you intend to post the LLVM patch as well? > > > With this pass, we get 103 loop fusions in SPECCPU INT 2006 > 462.libquantum with rate performance improving close to 2.5X in x86 > (results from AMD A10-6700). > > > I took some liberties in patching up some of the code in > ScalarEvolution/DependenceAnalysis/GlobalsModRef/Clang options/ and > also adjusted the IPO/LTO pass...
2017 Jan 31
3
(RFC) Adjusting default loop fully unroll threshold
.... This seems conservative because unlike dynamic/partial unrolling, fully unrolling will not affect LSD/ICache performance. In https://reviews.llvm.org/D28368, I proposed to double the threshold for loop fully unroller. This will change the codegen of several SPECCPU benchmarks: >> >> Code size: >> 447.dealII 0.50% >> 453.povray 0.42% >> 433.milc 0.20% >> 445.gobmk 0.32% >> 403.gcc 0.05% >> 464.h264ref 3.62% >> >> Compile Time: >> 447.dealII 0.22% >> 453.povray -0.16% >> 433.milc 0....
2003 Nov 18
0
LLVM Status Update
...Misha and Brian made huge contributions.
2. Several C++ EH related bug fixes went into the C++ frontend.
3. Opaque type resolution has been reimplemented. It's now simpler and fixes problems linking a large number of programs.
4. We fixed the remaining problems preventing the C codes in SPECCPU 2000 from working.
5. Misha improved the JIT to incrementally load bytecode files from the disk as functions are needed. This reduces startup time as well as memory footprint.
6. John updated the QMTest test expectations to match what we expect to fail on either X86 or Sparc. Now al...
2016 Oct 27
0
(RFC) Encoding code duplication factor in discriminator
Do you have an estimate of the debug_line size increase? I guess it will be small. David On Thu, Oct 27, 2016 at 11:39 AM, Dehao Chen <dehao at google.com> wrote: > Motivation: > Many optimizations duplicate code. E.g. loop unroller duplicates the loop > body, GVN duplicates computation, etc. The duplicated code will share the > same debug info with the original code. For
2016 Apr 20
3
RFC: EfficiencySanitizer
On 04/20/2016 02:58 PM, Renato Golin via llvm-dev wrote: > Hi Derek, > > I'm not an expert in any of these topics, but I'm excited that you > guys are doing it. It seems like a missing piece that needs to be > filled. > > Some comments inline... > > > On 17 April 2016 at 22:46, Derek Bruening via llvm-dev > <llvm-dev at lists.llvm.org> wrote: >>
2010 Oct 29
2
[LLVMdev] "multiple definition of .. " in clang 2.8
Hi, I tried to run the SPEC benchmark suite SPECCPU 2006 with llvm and clang 2.8. When building the perlbench sources I get these errors (see below) for all the source files. I used a config file ( http://old.nabble.com/file/p30085184/llvm.cfg llvm.cfg ) where I specify clang as the compiler. I verified the same sources with llvm-gcc and it works f...
2018 Sep 14
1
Re: NUMA issues on virtualized hosts
Hello again, when iozone writes slowly, this is what slabtop looks like:
62476752 62476728 0% 0.10K 1601968 39 6407872K buffer_head
1000678 999168 0% 0.56K 142954 7 571816K radix_tree_node
132184 125911 0% 0.03K 1066 124 4264K kmalloc-32
118496 118224 0% 0.12K 3703 32 14812K kmalloc-node
73206 56467 0% 0.19K 3486 21
2016 Oct 27
8
(RFC) Encoding code duplication factor in discriminator
Motivation: Many optimizations duplicate code. E.g. loop unroller duplicates the loop body, GVN duplicates computation, etc. The duplicated code will share the same debug info with the original code. For SamplePGO, the debug info is used to present the profile. Code duplication will affect profile accuracy. Taking loop unrolling for example:
#1 foo();
#2 for (i = 0; i < N; i++) {
#3 bar();