thr3ads.net - similar to: "Enable vectorizer-maximize-bandwidth by default?"

Displaying 20 results from an estimated 500 matches similar to: "Enable vectorizer-maximize-bandwidth by default?"

Debug info interacting with optimization and code generation

2016 Oct 07

In theory, the compiler should generate bit-identical code with and without debug info, i.e.:

    # clang -c -O2 -g a.cc -o a.g.o
    # clang -c -O2 -g0 a.cc -o a.g0.o
    # strip a.g.o a.g0.o
    # diff a.g.o a.g0.o

The diff should find the two binaries identical. For brevity, in the rest of the mail, I'll refer to this requirement as "codegen consistency" (any better name?) Unfortunately, LLVM does not
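The check described in this mail can be scripted. The following is a minimal Python sketch, assuming `clang` and `strip` are on PATH; the function names are hypothetical, and a byte-for-byte `filecmp` comparison stands in for the `diff` step. The file names and flags are the ones quoted in the mail.

```python
import filecmp
import subprocess


def build_commands(source, with_debug="a.g.o", without_debug="a.g0.o"):
    """Return the compile and strip commands from the mail as argument lists."""
    return [
        ["clang", "-c", "-O2", "-g", source, "-o", with_debug],
        ["clang", "-c", "-O2", "-g0", source, "-o", without_debug],
        ["strip", with_debug, without_debug],
    ]


def files_identical(path_a, path_b):
    """Byte-for-byte comparison, standing in for the `diff` step."""
    return filecmp.cmp(path_a, path_b, shallow=False)


def check_codegen_consistency(source):
    """Compile `source` with and without debug info, strip, and compare.

    Assumes clang and strip are installed; raises if a tool is missing
    or if any step exits with a non-zero status.
    """
    with_debug, without_debug = "a.g.o", "a.g0.o"
    for command in build_commands(source, with_debug, without_debug):
        subprocess.run(command, check=True)
    return files_identical(with_debug, without_debug)
```

Calling `check_codegen_consistency("a.cc")` would return `True` only when the two stripped objects match exactly, which is the "codegen consistency" property the mail asks for.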

(RFC) Adjusting default loop fully unroll threshold

2017 Jan 30

Currently, loop fully unroller shares the same default threshold as loop dynamic unroller and partial unroller. This seems conservative because unlike dynamic/partial unrolling, fully unrolling will not affect LSD/ICache performance. In https://reviews.llvm.org/D28368, I proposed to double the threshold for loop fully unroller. This will change the codegen of several SPECCPU benchmarks: Code

(RFC) Encoding code duplication factor in discriminator

2016 Oct 27

The impact to debug_line is actually not small. I only implemented the part 1 (encoding duplication factor) for loop unrolling and loop vectorization. The debug_line size overhead for "-O2 -g1" binary of speccpu C/C++ benchmarks:

433.milc        23.59%
444.namd         6.25%
447.dealII       8.43%
450.soplex       2.41%
453.povray       5.40%
470.lbm          0.00%
482.sphinx3      7.10%
400.perlbench    2.77%
401.bzip2        9.62%
403.gcc

(RFC) Adjusting default loop fully unroll threshold

2017 Jan 30

On Mon, Jan 30, 2017 at 3:51 PM Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> On Jan 30, 2017, at 10:49 AM, Dehao Chen via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Currently, loop fully unroller shares the same default threshold as loop
> dynamic unroller and partial unroller. This seems conservative because
> unlike dynamic/partial

(RFC) Encoding code duplication factor in discriminator

2016 Oct 27

The large percentages are from those tiny benchmarks. If you look at omnetpp (0.52%) and xalanc (1.46%), the increase is small. To get a better average increase, you can sum up the total debug_line size before and after and compute the percentage accordingly.

David

On Thu, Oct 27, 2016 at 1:11 PM, Dehao Chen <dehao at google.com> wrote:
> The impact to debug_line is actually not small. I only
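The difference between averaging per-benchmark percentages and computing one percentage from summed sizes, as David suggests, can be illustrated with a short Python sketch. The function names and the byte counts below are hypothetical, chosen only to show how a tiny benchmark skews the plain average; they are not measurements from the thread.

```python
def mean_of_percentages(sizes):
    """Average the per-benchmark percentage increases.

    Every benchmark counts equally, so a tiny one can dominate the result.
    `sizes` maps a benchmark name to (bytes_before, bytes_after).
    """
    increases = [
        (after - before) / before * 100.0 for before, after in sizes.values()
    ]
    return sum(increases) / len(increases)


def aggregate_percentage(sizes):
    """Sum the sizes before and after, then compute a single percentage."""
    total_before = sum(before for before, _ in sizes.values())
    total_after = sum(after for _, after in sizes.values())
    return (total_after - total_before) / total_before * 100.0


# Hypothetical debug_line sizes in bytes (not data from the thread).
sizes = {
    "tiny_benchmark": (1_000, 1_200),           # +20% on a very small section
    "large_benchmark": (1_000_000, 1_010_000),  # +1% on a large section
}

print(f"mean of percentages: {mean_of_percentages(sizes):.2f}%")
print(f"aggregate increase:  {aggregate_percentage(sizes):.2f}%")
```

With these made-up inputs the plain average is pulled toward the tiny benchmark's 20%, while the aggregate figure stays close to the large benchmark's 1%, which is the effect the reply describes.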

(RFC) Adjusting default loop fully unroll threshold

2017 Jan 31

On Mon, Jan 30, 2017 at 3:56 PM, Chandler Carruth <chandlerc at google.com> wrote:
> On Mon, Jan 30, 2017 at 3:51 PM Mehdi Amini via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> On Jan 30, 2017, at 10:49 AM, Dehao Chen via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>> Currently, loop fully unroller shares the same default

(RFC) Adjusting default loop fully unroll threshold

2017 Jan 30

> On Jan 30, 2017, at 10:49 AM, Dehao Chen via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> Currently, loop fully unroller shares the same default threshold as loop dynamic unroller and partial unroller. This seems conservative because unlike dynamic/partial unrolling, fully unrolling will not affect LSD/ICache performance. In https://reviews.llvm.org/D28368

(RFC) Encoding code duplication factor in discriminator

2016 Oct 27

Do you have an estimate of the debug_line size increase? I guess it will be small.

David

On Thu, Oct 27, 2016 at 11:39 AM, Dehao Chen <dehao at google.com> wrote:
> Motivation:
> Many optimizations duplicate code. E.g. loop unroller duplicates the loop
> body, GVN duplicates computation, etc. The duplicated code will share the
> same debug info with the original code. For

(RFC) Adjusting default loop fully unroll threshold

2017 Jan 31

> On Jan 30, 2017, at 4:56 PM, Dehao Chen <dehao at google.com> wrote:
>
>
>
> On Mon, Jan 30, 2017 at 3:56 PM, Chandler Carruth <chandlerc at google.com> wrote:
> On Mon, Jan 30, 2017 at 3:51 PM Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> On Jan 30,

(RFC) Encoding code duplication factor in discriminator

2016 Oct 27

Motivation:
Many optimizations duplicate code. E.g. loop unroller duplicates the loop body, GVN duplicates computation, etc. The duplicated code will share the same debug info with the original code. For SamplePGO, the debug info is used to present the profile. Code duplication will affect profile accuracy. Taking loop unrolling for example:

#1 foo();
#2 for (i = 0; i < N; i++) {
#3   bar();

wireframe advice - with reproducible code

2011 May 16

Dear List,

I am trying to produce a 3D plot with wireframe, using the code:

wireframe(Residuals_FD ~ Elevation * Temperature, data = data2, scales = list(arrows = FALSE), drape = TRUE, colorkey = TRUE)

As you can see, when the code (using the data below) is run, the plot area is set up correctly but the actual surface is missing. Any help would be greatly appreciated.

Chris

#data Elevation

(RFC) Adjusting default loop fully unroll threshold

2017 Feb 13

FWIW, I'm good with the updated data, but I'd really like at least someone from Apple and someone from ARM to chime in here... CC-ing random people in the hope it helps...

On Mon, Feb 13, 2017 at 8:30 AM Dehao Chen via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Thanks for the comment. The performance experiments were performed on
> Intel Sandybridge. Updated this info to

(RFC) Adjusting default loop fully unroll threshold

2017 Feb 10

On 02/10/2017 05:21 PM, Dehao Chen wrote:
> Thanks everyone for the comments.
>
> Do we have a decision here?

You're good to go as far as I'm concerned.

-Hal

>
> Dehao
>
> On Tue, Feb 7, 2017 at 10:24 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>
>
> On 02/07/2017 05:29 PM, Sanjay Patel via llvm-dev wrote:

Enable vectorizer-maximize-bandwidth by default?

2017 May 30

On Tue, May 30, 2017 at 1:40 AM Agabaria, Mohammed via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> We’re seeing nice improvements but also significant degradations on IA,
> which we would like to investigate before the patch is committed.
>
> Major degradations we see:
>
> networking
>
> ip_pktcheckb1m -6.80 %
>

Enable vectorizer-maximize-bandwidth by default?

2017 Jun 12

Guys,

Just to clarify that with the current fix in SLM there is no need to wait for other issues to be fixed (minor issue). So you can move on with your patch.

From: Agabaria, Mohammed
Sent: Wednesday, June 07, 2017 15:24
To: Zaks, Ayal <ayal.zaks at intel.com>; Chandler Carruth <chandlerc at gmail.com>; Flamedoge <code.kchoi at gmail.com>; Dehao Chen <dehao at google.com>

(RFC) Adjusting default loop fully unroll threshold

2017 Feb 08

On 02/07/2017 05:29 PM, Sanjay Patel via llvm-dev wrote:
> Sorry if I missed it, but what machine/CPU are you using to collect
> the perf numbers?
>
> I am concerned that what may be a win on a CPU that keeps a couple of
> hundred instructions in-flight and has many MB of caches will not hold
> for a small core.

In my experience, unrolling tends to help weaker cores even more

(RFC) Adjusting default loop fully unroll threshold

2017 Feb 07

Ping... with the updated code size impact data, any more comments? Any more data that would be interesting to collect?

Thanks,
Dehao

On Thu, Feb 2, 2017 at 2:07 PM, Dehao Chen <dehao at google.com> wrote:
> Here is the code size impact for clang, chrome and 24 google internal
> benchmarks (name omitted, 14 15 16 are encoding/decoding benchmarks similar
> as h264). There are 2

(RFC) Adjusting default loop fully unroll threshold

2017 Feb 02

I had suggested having size metrics from somewhat larger applications such as Chrome, Webkit, or Firefox; clang itself; and maybe some of our internal binaries with rough size brackets?

On Wed, Feb 1, 2017 at 4:33 PM Dehao Chen <dehao at google.com> wrote:
> With the new data points, any comments on whether this can justify setting
> fully inline threshold to 300 (or any other

(RFC) Encoding code duplication factor in discriminator

2016 Nov 04

Discussed with Hal, Adrian and Paul offline at the llvm dev meeting today.

* trip count is not enough for vectorization: there is a runtime check that might go false, which can be reflected in the profile that we may want to preserve.
* simply recording these context-profiles may cause problems for iterative-sample-pgo, i.e. when you find a loop's vectorized version not executed (due to runtime

(RFC) Adjusting default loop fully unroll threshold

2017 Feb 02

> On Feb 1, 2017, at 4:57 PM, Xinliang David Li via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> clang, chrome, and some internal large apps are good candidates for size metrics.

I'd also add the standard LLVM testsuite just because it's the suite everyone in the community can use.

Michael

>
> David
>
> On Wed, Feb 1, 2017 at 4:47 PM, Chandler Carruth via
