thr3ads.net - similar to: "[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info"

Displaying 20 results from an estimated 10000 matches similar to: "[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info"

[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info

2014 Jan 21

On 16/01/2014, 23:47, Andrew Trick wrote:
> On Jan 15, 2014, at 4:13 PM, Diego Novillo <dnovillo at google.com> wrote:
>> Chandler also pointed me at the vectorizer, which has its own unroller. However, the vectorizer only unrolls enough to serve the target, it's not as general as the runtime-triggered

[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info

2014 Jan 28

In r200270 I added support to unroll conditional stores in the loop vectorizer. It is currently off pending further benchmarking and can be enabled with "-mllvm -vectorize-num-stores-pred=1". Furthermore, I added a heuristic to unroll until load/store ports are saturated ("-mllvm enable-loadstore-runtime-unroll") instead of the pure size-based heuristic. Those two together with a patch that

[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info

2014 Jan 16

On Wed, Jan 15, 2014 at 5:30 PM, Nadav Rotem <nrotem at apple.com> wrote:
> Was the vectorizer successful in unrolling the loop in quantum_sigma_x? I wonder if 'size' is typically high or low.
No. The vectorizer stated that it wasn't going to bother with the loop because it wasn't profitable. Specifically:
LV: Checking a loop in "quantum_sigma_x"
LV: Found a

[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info

2014 Jan 16

On Thu, Jan 16, 2014 at 9:26 AM, Nadav Rotem <nrotem at apple.com> wrote:
> Hi Diego,
>
> It looks like the problem is with the code in the vectorizer that tries to estimate the most profitable vectorization factor:
>
>> LV: Found an estimated cost of 6 for VF 2 For instruction: %3 = load i64* %state, align 8, !dbg !58, !tbaa !61
>
> It looks like a

(RFC) Encoding code duplication factor in discriminator

2016 Oct 27

The impact to debug_line is actually not small. I only implemented part 1 (encoding the duplication factor) for loop unrolling and loop vectorization. The debug_line size overhead for the "-O2 -g1" binaries of the speccpu C/C++ benchmarks:
433.milc 23.59%
444.namd 6.25%
447.dealII 8.43%
450.soplex 2.41%
453.povray 5.40%
470.lbm 0.00%
482.sphinx3 7.10%
400.perlbench 2.77%
401.bzip2 9.62%
403.gcc

[LLVMdev] RFC - Adding an optimization report facility?

2014 Mar 06

The context of this is performance analysis of generated code. My interest is to trace, at a high level, the major decisions made by the various optimizers. For instance, when the inliner decides to inline foo into bar, or the loop unroller decides to unroll a loop N times, or the vectorizer decides to vectorize a loop body. Many of these details are usually available via -debug-only. However, this

(RFC) Encoding code duplication factor in discriminator

2016 Oct 27

The large percentages are from those tiny benchmarks. If you look at omnetpp (0.52%) and xalanc (1.46%), the increase is small. To get a better average increase, you can sum up total debug_line size before and after and compute percentage accordingly.
David
On Thu, Oct 27, 2016 at 1:11 PM, Dehao Chen <dehao at google.com> wrote:
> The impact to debug_line is actually not small. I only
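The difference between averaging per-benchmark percentages and computing one percentage from summed sizes, as suggested in the message above, can be sketched as follows. This is a minimal illustration only: the `SizePair` struct and all three function names are hypothetical, and the sizes in the usage note are made up, not measurements from the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record: debug_line section size in bytes for one benchmark,
// before and after the change being evaluated.
struct SizePair {
  std::uint64_t before;
  std::uint64_t after;
};

// Percentage growth from `before` to `after`.
double percentIncrease(std::uint64_t before, std::uint64_t after) {
  assert(before > 0 && "size before the change must be nonzero");
  return 100.0 * (static_cast<double>(after) - static_cast<double>(before)) /
         static_cast<double>(before);
}

// Unweighted mean of the per-benchmark percentages. A tiny benchmark
// counts as much as a large one, so it can dominate the result.
double meanOfPercentages(const std::vector<SizePair> &sizes) {
  assert(!sizes.empty() && "need at least one benchmark");
  double sum = 0.0;
  for (const SizePair &s : sizes)
    sum += percentIncrease(s.before, s.after);
  return sum / static_cast<double>(sizes.size());
}

// Sum the sizes first, then compute a single percentage. Each benchmark
// is implicitly weighted by its size.
double aggregatePercentage(const std::vector<SizePair> &sizes) {
  std::uint64_t totalBefore = 0;
  std::uint64_t totalAfter = 0;
  for (const SizePair &s : sizes) {
    totalBefore += s.before;
    totalAfter += s.after;
  }
  return percentIncrease(totalBefore, totalAfter);
}
```

With the made-up inputs `{100, 120}` and `{10000, 10100}`, the mean of percentages is 10.5% while the aggregate is roughly 1.19%, which shows how one small binary can skew the unweighted average.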

[LLVMdev] Trip count and Loop Vectorizer

2013 Sep 27

Hi, I am trying to get a small loop to *not vectorize* for cases where it doesn't make sense. For instance, this loop:
    void foo(int a[4][8], int n) {
      int b[4][8];
      for (int i = 0; i < 4; i++) {
        for (int j = 0; j < n; j++) {
          a[i][j] = b[i][j];
        }
      }
    }
* Has a maximum of 8 ints to copy. LLVM tries to use Memcpy for the inner loop. It is not helpful to perform

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

2011 Nov 08

On Tue, 2011-11-08 at 12:12 +0100, Tobias Grosser wrote:
> On 11/08/2011 11:45 AM, Hal Finkel wrote:
> > I've attached the latest version of my autovectorization patch.
> >
> > Working through the test suite has proved to be a productive experience ;) -- And almost all of the bugs that it revealed have now been fixed. There are still two programs that

[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass

2011 Nov 08

On 11/08/2011 03:36 PM, Hal Finkel wrote:
> On Tue, 2011-11-08 at 12:12 +0100, Tobias Grosser wrote:
>> On 11/08/2011 11:45 AM, Hal Finkel wrote:
>>> I've attached the latest version of my autovectorization patch.
>>>
>>> Working through the test suite has proved to be a productive experience ;) -- And almost all of the bugs that it revealed

(RFC) Encoding code duplication factor in discriminator

2016 Oct 27

Do you have an estimate of the debug_line size increase? I guess it will be small.
David
On Thu, Oct 27, 2016 at 11:39 AM, Dehao Chen <dehao at google.com> wrote:
> Motivation:
> Many optimizations duplicate code. E.g. loop unroller duplicates the loop body, GVN duplicates computation, etc. The duplicated code will share the same debug info with the original code. For

(RFC) Encoding code duplication factor in discriminator

2016 Oct 27

Motivation: Many optimizations duplicate code. E.g. loop unroller duplicates the loop body, GVN duplicates computation, etc. The duplicated code will share the same debug info with the original code. For SamplePGO, the debug info is used to present the profile. Code duplication will affect profile accuracy. Taking loop unrolling for example:
#1 foo();
#2 for (i = 0; i < N; i++) {
#3 bar();

[LLVMdev] Trip count and Loop Vectorizer

2013 Sep 27

Hi Sriram, Thanks for performing this analysis. The problem here, both for memcpy and the vectorizer, is that we can't predict the size of "n", even though the only use of "n" is for the loop bound for the alloca [4 x [8 x i32]]. If you change the unroll condition to TC >= 0 then you will disable loop unrolling for all loops because getSmallConstantTripCount returns an unsigned number. You

[LLVMdev] Vectorization: Next Steps

2012 Feb 07

On Mon, 2012-02-06 at 14:26 -0800, Chris Lattner wrote:
> On Feb 2, 2012, at 7:56 PM, Hal Finkel wrote:
> > As some of you may know, I committed my basic-block autovectorization pass a few days ago. I encourage anyone interested to try it out (pass -vectorize to opt or -mllvm -vectorize to clang) and provide feedback. Especially in combination with

[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info

2014 Jan 21

Just to add a few notes...
On Tue, Jan 21, 2014 at 1:31 PM, Andrew Trick <atrick at apple.com> wrote:
> Chandler suggested a way around the problem. I'll work on that first.
>
> It is very difficult to deal with the LoopPassManager. The concept doesn't fit with typical loop passes, which may need to rerun function level analyses, and can affect code outside the

[LLVMdev] RFC - Adding an optimization report facility?

2014 Mar 07

----- Original Message -----
> From: "Chris Lattner" <clattner at apple.com>
> To: "Diego Novillo" <dnovillo at google.com>
> Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Thursday, March 6, 2014 5:54:02 PM
> Subject: Re: [LLVMdev] RFC - Adding an optimization report facility?
>
> On Mar 6, 2014, at

[LLVMdev] Vectorization: Next Steps

2012 Feb 03

As some of you may know, I committed my basic-block autovectorization pass a few days ago. I encourage anyone interested to try it out (pass -vectorize to opt or -mllvm -vectorize to clang) and provide feedback. Especially in combination with -unroll-allow-partial, I have observed some significant benchmark speedups, but, I have also observed some significant slowdowns. I would like to share my

[LLVMdev] Trip count and Loop Vectorizer

2013 Sep 27

Hi Nadav, Thanks for the response. I forgot to mention that there is an upper limit of 16 for the trip count check: TinyTripCountVectorThreshold = 16; if (TC > 0u && TC < TinyTripCountVectorThreshold). So right now, for any loop with a trip count of 0, or with a value >= 16, LV will unroll. With the change to the lower bound, it will also include the loop with 0 trip count. SCEV returns 0
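The effect of relaxing the lower bound in the check quoted above can be sketched in isolation. This is an illustration, not the loop vectorizer's actual code: the two function names are hypothetical, and only the threshold name, its value of 16, and the condition itself come from the message. It also assumes the usual LLVM convention that a trip count of 0 from getSmallConstantTripCount means "unknown or not constant".

```cpp
// Threshold name and value as quoted in the message above.
constexpr unsigned TinyTripCountVectorThreshold = 16;

// The check as quoted: true only for a known trip count in the range 1..15.
// (Hypothetical function name.)
constexpr bool isTinyTripCount(unsigned TC) {
  return TC > 0u && TC < TinyTripCountVectorThreshold;
}

// The variant with the lower bound changed to ">=". For an unsigned value,
// "TC >= 0u" is always true, so this reduces to "TC < 16" and therefore
// also matches TC == 0. (Hypothetical function name.)
constexpr bool isTinyTripCountRelaxed(unsigned TC) {
  return TC >= 0u && TC < TinyTripCountVectorThreshold;
}

// An unknown trip count (0) is the only input where the two checks differ.
static_assert(!isTinyTripCount(0u), "original check skips TC == 0");
static_assert(isTinyTripCountRelaxed(0u), "relaxed check matches TC == 0");
static_assert(isTinyTripCount(8u) && isTinyTripCountRelaxed(8u),
              "both match a small known trip count");
static_assert(!isTinyTripCount(16u) && !isTinyTripCountRelaxed(16u),
              "neither matches at or above the threshold");
```

Because the upper bound is still in place, the relaxed form does not match every loop; it only adds the loops whose trip count is reported as 0.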

[LLVMdev] Vectorization: Next Steps

2012 Feb 09

On Feb 7, 2012, at 12:10 PM, Hal Finkel wrote:
>>> 1. "Target Data" for vectorization - I think that in order to improve the vectorization quality, the vectorizer will need more information about the target. This information could be provided in the form of a kind of extended target data. This extended target data might contain:

[LLVMdev] Vectorization: Next Steps

2012 Feb 06

On Feb 2, 2012, at 7:56 PM, Hal Finkel wrote:
> As some of you may know, I committed my basic-block autovectorization pass a few days ago. I encourage anyone interested to try it out (pass -vectorize to opt or -mllvm -vectorize to clang) and provide feedback. Especially in combination with -unroll-allow-partial, I have observed some significant benchmark speedups, but, I
