thr3ads.net - search: "spec2017"

2017 Nov 02

13

[RFC] Enable Partial Inliner by default

...rest of the code in bar }

Here are the numbers on a Power8 PPCLE running Ubuntu 15.04 in ST-mode

----------------------------------------------
Runtime performance (speed)
----------------------------------------------
Workload           Improvement
--------           -----------
SPEC2006(C/C++)    0.06% (geomean)
SPEC2017(C/C++)    0.10% (geomean)
----------------------------------------------
Compile time performance for Bootstrapped LLVM
----------------------------------------------
Workload           Improvement
--------           -----------
SPEC2006(C/C++)    0.41% (cumulative)
SPEC2017(C/C++)    -0.16% (cumulative)
lnt                0.61% (geom...
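The snippet reports some figures as "geomean" and others as "cumulative". As a rough illustration of how those two summaries can differ, here is a minimal Python sketch. The per-benchmark timings are hypothetical (the post gives only the aggregates), and the definition of "improvement" as 1 minus the new/old ratio is an assumption, not something the post states.

```python
import math

def geomean_improvement(old, new):
    """Percent improvement based on the geometric mean of per-benchmark new/old ratios."""
    ratios = [n / o for o, n in zip(old, new)]
    geomean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
    return (1.0 - geomean) * 100.0

def cumulative_improvement(old, new):
    """Percent improvement in the total time summed over all benchmarks."""
    return (1.0 - sum(new) / sum(old)) * 100.0

# Hypothetical timings in seconds; not taken from the post.
old_times = [100.0, 400.0, 20.0]
new_times = [98.0, 404.0, 20.0]

print(f"geomean:    {geomean_improvement(old_times, new_times):+.2f}%")
print(f"cumulative: {cumulative_improvement(old_times, new_times):+.2f}%")
```

With these made-up inputs the geomean figure is positive while the cumulative figure is negative, because the cumulative total is dominated by the longest-running benchmark and the geomean weights every benchmark equally.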

2017 Aug 15

2

Heroic LLVM optimizations

...uld post this in the llvm-dev. HiSilicon (Santa Clara office) is looking for some developer capable of implementing the "heroic optimizations" (http://llvm.org/devmtg/2015-10/slides/Gerolf-PerformanceImprovementsAndHeadroom.pdf) in LLVM. Focus on SPEC2006 but also looking at the new SPEC2017. The goal is to match, or get closer, to the Intel compiler with SPEC2006. ICC has a significant advantage. As the talk shows, there is over 10x diff in libquantum, and other benchmarks have also significant difference between latest gcc/llvm and ICC. Send me an email with your CV or quest...

2017 Aug 16

2

Heroic LLVM optimizations

...ilto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Tobias Grosser via llvm-dev
Sent: Wednesday, August 16, 2017 10:10 AM
To: renau at uncore.io; llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] Heroic LLVM optimizations

Hi Jose,

we have work based on Polly which should get the loop-fusion in SPEC2017. The code is not yet ready to share, but I would be interested to learn if this would be of use to you.

Best,
Tobias

On Wed, Aug 16, 2017, at 00:15, renau at uncore.io via llvm-dev wrote:
>
> I am a professor at UC Santa Cruz, but I also do consulting at Huawei.
> Chris Lattner told m...

2017 Jun 30

2

LoopSimplify pass prevents loop unrolling

...ify canonicalizing incorrectly? Should simplifycfg skip folding the latch block into the loop header if this results in additional backedges and let the empty blocks be folded during CGP? More details in https://bugs.llvm.org/show_bug.cgi?id=33605. FWIW, this prevents unrolling of a hot loop in spec2017/gcc and also prevents loop-interleave of a loop in spec2017/perlbench. Appreciate any suggestions on how to fix this.

Attached testcase:
$ cat test.c
void foo();
bool test(int a, int b, int *c) {
  bool changed = false;
  for (unsigned int i = 2; i--;) {
    int r = a | b;
    if ( r != c...

2017 Aug 16

1

Heroic LLVM optimizations

...ser via llvm-dev
>>Sent: Wednesday, August 16, 2017 10:10 AM
>>To: renau at uncore.io; llvm-dev at lists.llvm.org
>>Subject: Re: [llvm-dev] Heroic LLVM optimizations
>>
>>Hi Jose,
>>
>>we have work based on Polly which should get the loop-fusion in
>>SPEC2017.
>>The code is not yet ready to share, but I would be interested to learn if
>>this would be of use to you.
>>
>>Best,
>>Tobias
>>
>>On Wed, Aug 16, 2017, at 00:15, renau at uncore.io via llvm-dev wrote:
>> >
>> > I am a profess...

2017 Nov 10

0

[RFC] Enable Partial Inliner by default

...5.04 in ST-mode
>
> ----------------------------------------------
> Runtime performance (speed)
> ----------------------------------------------
> Workload           Improvement
> --------           -----------
> SPEC2006(C/C++)    0.06% (geomean)
> SPEC2017(C/C++)    0.10% (geomean)
> ----------------------------------------------
> Compile time performance for Bootstrapped LLVM
> ----------------------------------------------
> Workload           Improvement
> --------           -----------
> SPEC2006(C/C++)    0...

2017 Nov 10

5

[RFC] Enable Partial Inliner by default

...mbers on a Power8 PPCLE running Ubuntu 15.04 in ST-mode
>
> ----------------------------------------------
> Runtime performance (speed)
> ----------------------------------------------
> Workload           Improvement
> --------           -----------
> SPEC2006(C/C++)    0.06% (geomean)
> SPEC2017(C/C++)    0.10% (geomean)
> ----------------------------------------------
> Compile time performance for Bootstrapped LLVM
> ----------------------------------------------
> Workload           Improvement
> --------           -----------
> SPEC2006(C/C++)    0.41% (cumulative)
> SPEC2017(C/C++)...

2017 Nov 13

2

[RFC] Enable Partial Inliner by default

...mbers on a Power8 PPCLE running Ubuntu 15.04 in ST-mode
>
> ----------------------------------------------
> Runtime performance (speed)
> ----------------------------------------------
> Workload           Improvement
> --------           -----------
> SPEC2006(C/C++)    0.06% (geomean)
> SPEC2017(C/C++)    0.10% (geomean)
> ----------------------------------------------
> Compile time performance for Bootstrapped LLVM
> ----------------------------------------------
> Workload           Improvement
> --------           -----------
> SPEC2006(C/C++)    0.41% (cumulative)
> SPEC2017(C/C++)...

2017 Oct 26

3

RFC: Switching to the new pass manager by default

...a debug or release build, if asserts are enabled.

On 10/26/2017 4:05 PM, Chad Rosier via llvm-dev wrote:
>
> Chandler/All,
>
> We've just started testing the new pass manager this week and we ran
> into a 548x slowdown (i.e., 6.28s to 3443.83s) for one of the files
> from SPEC2017/blender. The issue arises only in debug builds due to
> the numerous calls to RefSCC::verify() and SCC::verify() in the
> LazyCallGraph implementation. Would it make sense to start
> predicating these calls with the EXPENSIVE_CHECKS macro, rather than
> NDEBUG?
>
> Chad
>...
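The quoted slowdown factor can be checked directly from the two timings given in the message. A minimal Python sketch, where the helper name `slowdown_factor` is made up for illustration:

```python
def slowdown_factor(baseline_seconds, observed_seconds):
    """Ratio of the observed time to the baseline time."""
    return observed_seconds / baseline_seconds

# Timings quoted in the message: 6.28s before, 3443.83s after.
factor = slowdown_factor(6.28, 3443.83)
print(f"{factor:.1f}x")  # prints "548.4x"
```

This agrees with the "548x" figure in the message once the fraction is dropped.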

2017 Aug 04

4

[RFC][InlineCost] Modeling JumpThreading (or similar) in inline cost model

...IPO copy/constant propagation doesn't get this
> case, but i didn't look if the lattice supports variables.
> In particular, in your example, given no other call sites, it should
> eliminate the dead code.
> (In a real program, it may require cloning).

In the actual program (SPEC2017/gcc, ironically), there are multiple calls to fn2 and only one of them has the property that the 1st and 2nd argument are the same (as is shown in my pseudo code). Internally, we have another developer, Matt Simpson, working on a function specialization patch that might be of value here. Specif...

2017 Aug 07

3

[RFC][InlineCost] Modeling JumpThreading (or similar) in inline cost model

...'m a bit surprised IPO copy/constant propagation doesn't get this case, but i didn't look if the lattice supports variables. In particular, in your example, given no other call sites, it should eliminate the dead code. (In a real program, it may require cloning). In the actual program (SPEC2017/gcc, ironically), there are multiple calls to fn2 and only one of them has the property that the 1st and 2nd argument are the same (as is shown in my pseudo code). Internally, we have another developer, Matt Simpson, working on a function specialization patch that might be of value here. Specific...

2018 Sep 14

3

NUMA issues on virtualized hosts

...pe='pci' index='5' model='pci-bridge'/></devices>
<metadata>
<system_datastore><![CDATA[/opt/opennebula/var/datastores/108/55782]]> </system_datastore>
</metadata>
</domain>

If I run e.g., spec2017 on the virtual, I can see:

  PID USER  PR  NI    VIRT     RES   SHR S  %CPU  %MEM      TIME+ COMMAND
 1350 root  20   0  843136  830068  2524 R  78.1   0.2  513:16.16 bwaves_r_base.m
 2456 root  20   0  804608  791264  2524 R  76.6   0.2  491:39.92 bwaves_r_base.m
 4631 root...

2018 Sep 14

1

Re: NUMA issues on virtualized hosts

...'/></devices>
> > <metadata>
> > <system_datastore><![CDATA[/opt/opennebula/var/datastores/108/55782]]> </system_datastore>
> > </metadata>
> > </domain>
> >
> > If I run e.g., spec2017 on the virtual, I can see:
> >
> >   PID USER  PR  NI    VIRT     RES   SHR S  %CPU  %MEM      TIME+ COMMAND
> >  1350 root  20   0  843136  830068  2524 R  78.1   0.2  513:16.16 bwaves_r_base.m
> >  2456 root  20   0  804608  791264  2524 R  76.6   0.2  491...

2017 Nov 23

3

mischeduler (pre-RA) experiments

Hi, I have been experimenting for a while with tryCandidate() method of the pre-RA mischeduler. I have by chance found some parameters that give quite good results on benchmarks on SystemZ (on average 1% improvement, some improvements of several percent and very little regressions). Basically, I add a "latency heuristic boost" just above processor resources checking:

2018 Sep 14

0

Re: NUMA issues on virtualized hosts

...' model='pci-bridge'/></devices>
> <metadata>
> <system_datastore><![CDATA[/opt/opennebula/var/datastores/108/55782]]> </system_datastore>
> </metadata>
> </domain>
>
> If I run e.g., spec2017 on the virtual, I can see:
>
>   PID USER  PR  NI    VIRT     RES   SHR S  %CPU  %MEM      TIME+ COMMAND
>  1350 root  20   0  843136  830068  2524 R  78.1   0.2  513:16.16 bwaves_r_base.m
>  2456 root  20   0  804608  791264  2524 R  76.6   0.2  491:39.92 bwaves_r_base...

2017 Jul 06

5

GEP with a null pointer base

Hi everyone, I've got a problem that I would like some input on. The problem basically boils down to a program that I am compiling, whose source I don't control, doing something like this: p = (char*)0 + n where 'n' is an intptr_t-sized value that the program knows is actually a valid address for a pointer. clang translates this as %p = getelementptr inbounds i8, i8*

2017 Aug 07

2

[RFC][InlineCost] Modeling JumpThreading (or similar) in inline cost model

...k if the lattice supports
> variables.
>
> In particular, in your example, given no other call sites,
> it should eliminate the dead code.
>
> (In a real program, it may require cloning).
>
>
> In the actual program (SPEC2017/gcc, ironically), there are
> multiple calls to fn2 and only one of them has the property
> that the 1st and 2nd argument are the same (as is shown in my
> pseudo code). Internally, we have another developer, Matt
> Simpson, working on a function special...

2017 Aug 04

3

[RFC][InlineCost] Modeling JumpThreading (or similar) in inline cost model

All, I'm working on an improvement to the inline cost model, but I'm unsure how to proceed. Let me begin by first describing the problem I'm trying to solve. Consider the following pseudo C code:

typedef struct element { unsigned idx; } element_t;

static inline unsigned char fn2 (element_t *dst_ptr, const element_t *a_ptr, const element_t *b_ptr,

2017 Oct 18

18

RFC: Switching to the new pass manager by default

Greetings everyone! The new pass manager is getting extremely close to the point where I'm not aware of any significant outstanding work needed, and I'd like to see what else would be needed to enable it by default. Here are the current functionality I'm aware of outstanding: 1) Does not do non-trivial loop unswitching. Majority of this is in https://reviews.llvm.org/D34200 but will
