similar to: [LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info

2014 Jan 21
5
[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info
On 16/01/2014, 23:47 , Andrew Trick wrote: > > On Jan 15, 2014, at 4:13 PM, Diego Novillo <dnovillo at google.com > <mailto:dnovillo at google.com>> wrote: > >> Chandler also pointed me at the vectorizer, which has its own >> unroller. However, the vectorizer only unrolls enough to serve the >> target, it's not as general as the runtime-triggered
2014 Jan 28
2
[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info
In r200270 I added support to unroll conditional stores in the loop vectorizer. It is currently off pending further benchmarking and can be enabled with "-mllvm -vectorize-num-stores-pred=1". Furthermore, I added a heuristic to unroll until load/store ports are saturated ("-mllvm enable-loadstore-runtime-unroll") instead of the pure size-based heuristic. Those two together with a patch that
2014 Jan 16
3
[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info
On Wed, Jan 15, 2014 at 5:30 PM, Nadav Rotem <nrotem at apple.com> wrote: > Was the vectorizer successful in unrolling the loop in quantum_sigma_x? I > wonder if 'size' is typically high or low. No. The vectorizer stated that it wasn't going to bother with the loop because it wasn't profitable. Specifically: LV: Checking a loop in "quantum_sigma_x" LV: Found a
2014 Jan 16
3
[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info
On Thu, Jan 16, 2014 at 9:26 AM, Nadav Rotem <nrotem at apple.com> wrote: > Hi Diego, > > It looks like the problem is with the code in the vectorizer that tries to estimate the most profitable vectorization factor: > >> LV: Found an estimated cost of 6 for VF 2 For instruction: %3 = load >> i64* %state, align 8, !dbg !58, !tbaa !61 > > > It looks like a
2016 Oct 27
2
(RFC) Encoding code duplication factor in discriminator
The impact on debug_line is actually not small. I only implemented part 1 (encoding duplication factor) for loop unrolling and loop vectorization. The debug_line size overhead for the "-O2 -g1" binaries of the speccpu C/C++ benchmarks: 433.milc 23.59%, 444.namd 6.25%, 447.dealII 8.43%, 450.soplex 2.41%, 453.povray 5.40%, 470.lbm 0.00%, 482.sphinx3 7.10%, 400.perlbench 2.77%, 401.bzip2 9.62%, 403.gcc
2014 Mar 06
11
[LLVMdev] RFC - Adding an optimization report facility?
The context of this is performance analysis of generated code. My interest is to trace at a high level the major decisions made by the various optimizers. For instance, when the inliner decides to inline foo into bar, or the loop unroller decides to unroll a loop N times, or the vectorizer decides to vectorize a loop body. Many of these details are usually available via -debug-only. However, this
2016 Oct 27
0
(RFC) Encoding code duplication factor in discriminator
The large percentages are from those tiny benchmarks. If you look at omnetpp (0.52%), and xalanc (1.46%), the increase is small. To get a better average increase, you can sum up total debug_line size before and after and compute percentage accordingly. David On Thu, Oct 27, 2016 at 1:11 PM, Dehao Chen <dehao at google.com> wrote: > The impact to debug_line is actually not small. I only
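The suggestion above (sum the debug_line sizes before and after, then compute one percentage) can be sketched as follows. This is only an illustration: the `SectionSize` struct, both function names, and any sizes passed in are hypothetical and do not come from the thread. The second function shows the per-benchmark average that the suggestion is meant to replace, since tiny benchmarks dominate it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record: debug_line size of one benchmark binary, in bytes.
struct SectionSize {
  std::uint64_t before;  // size without the change
  std::uint64_t after;   // size with the change
};

// Aggregate increase: sum all sizes first, then take one percentage.
// Large binaries carry proportionally more weight.
double aggregateIncreasePercent(const std::vector<SectionSize> &sizes) {
  std::uint64_t totalBefore = 0;
  std::uint64_t totalAfter = 0;
  for (const SectionSize &s : sizes) {
    totalBefore += s.before;
    totalAfter += s.after;
  }
  assert(totalBefore > 0 && "need at least one non-empty section");
  return 100.0 *
         (static_cast<double>(totalAfter) - static_cast<double>(totalBefore)) /
         static_cast<double>(totalBefore);
}

// Unweighted mean of the per-benchmark percentages, for comparison.
// A small binary with a large relative increase skews this number.
double meanOfIncreasesPercent(const std::vector<SectionSize> &sizes) {
  assert(!sizes.empty() && "need at least one benchmark");
  double sum = 0.0;
  for (const SectionSize &s : sizes) {
    assert(s.before > 0 && "empty section has no relative increase");
    sum += 100.0 *
           (static_cast<double>(s.after) - static_cast<double>(s.before)) /
           static_cast<double>(s.before);
  }
  return sum / static_cast<double>(sizes.size());
}
```

With one small and one large made-up binary, the aggregate figure stays close to the large binary's increase, while the unweighted mean is pulled toward the small one.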
2013 Sep 27
2
[LLVMdev] Trip count and Loop Vectorizer
Hi, I am trying to get a small loop to *not vectorize* for cases where it doesn't make sense. For instance, this loop: void foo(int a[4][8], int n) { int b[4][8]; for(int i = 0; i < 4; i++) { for(int j = 0; j < n; j++) { a[i][j] = b[i][j]; } } } * Has a maximum of 8 ints to copy. LLVM tries to use memcpy for the inner loop. It is not helpful to perform
2011 Nov 08
3
[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
On Tue, 2011-11-08 at 12:12 +0100, Tobias Grosser wrote: > On 11/08/2011 11:45 AM, Hal Finkel wrote: > > I've attached the latest version of my autovectorization patch. > > > > Working through the test suite has proved to be a productive > > experience ;) -- And almost all of the bugs that it revealed have now > > been fixed. There are still two programs that
2011 Nov 08
0
[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
On 11/08/2011 03:36 PM, Hal Finkel wrote: > On Tue, 2011-11-08 at 12:12 +0100, Tobias Grosser wrote: >> On 11/08/2011 11:45 AM, Hal Finkel wrote: >>> I've attached the latest version of my autovectorization patch. >>> >>> Working through the test suite has proved to be a productive >>> experience ;) -- And almost all of the bugs that it revealed
2016 Oct 27
0
(RFC) Encoding code duplication factor in discriminator
Do you have an estimate of the debug_line size increase? I guess it will be small. David On Thu, Oct 27, 2016 at 11:39 AM, Dehao Chen <dehao at google.com> wrote: > Motivation: > Many optimizations duplicate code. E.g. loop unroller duplicates the loop > body, GVN duplicates computation, etc. The duplicated code will share the > same debug info with the original code. For
2016 Oct 27
8
(RFC) Encoding code duplication factor in discriminator
Motivation: Many optimizations duplicate code. E.g. loop unroller duplicates the loop body, GVN duplicates computation, etc. The duplicated code will share the same debug info with the original code. For SamplePGO, the debug info is used to present the profile. Code duplication will affect profile accuracy. Taking loop unrolling for example: #1 foo(); #2 for (i = 0; i < N; i++) { #3 bar();
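As a rough illustration of why duplicated code can distort a sample profile, the sketch below models one source line whose instruction has been copied by an unroller. Everything here is an assumption made for illustration: the function names are hypothetical, and the rule that a line's count is the maximum over the instructions mapped to it is a simplification, not a statement about how any particular profile reader works.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed attribution rule: a source line gets the largest sample count
// seen among the instructions that map to it.
std::uint64_t lineCountFromCopies(const std::vector<std::uint64_t> &copyCounts) {
  assert(!copyCounts.empty() && "a line needs at least one instruction");
  return *std::max_element(copyCounts.begin(), copyCounts.end());
}

// If the number of copies (the duplication factor) is recorded next to the
// debug location, the per-copy count can be scaled back up.
std::uint64_t scaledLineCount(const std::vector<std::uint64_t> &copyCounts,
                              std::uint64_t duplicationFactor) {
  return lineCountFromCopies(copyCounts) * duplicationFactor;
}
```

For a loop body that runs 1000 times and is unrolled by 2, each copy is sampled roughly 500 times, so the unscaled count under this rule is 500 and the scaled count is 1000.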
2013 Sep 27
0
[LLVMdev] Trip count and Loop Vectorizer
Hi Sriram, Thanks for performing this analysis. The problem here, both for memcpy and the vectorizer, is that we can’t predict the size of “n”, even though the only use of 'n' is for the loop bound for the alloca [4 x [8 x i32]]. If you change the unroll condition to TC >= 0 then you will disable loop unrolling for all loops because getSmallConstantTripCount returns an unsigned number. You
2012 Feb 07
4
[LLVMdev] Vectorization: Next Steps
On Mon, 2012-02-06 at 14:26 -0800, Chris Lattner wrote: > On Feb 2, 2012, at 7:56 PM, Hal Finkel wrote: > > As some of you may know, I committed my basic-block autovectorization > > pass a few days ago. I encourage anyone interested to try it out (pass > > -vectorize to opt or -mllvm -vectorize to clang) and provide feedback. > > Especially in combination with
2014 Jan 21
2
[LLVMdev] Loop unrolling opportunity in SPEC's libquantum with profile info
Just to add a few notes... On Tue, Jan 21, 2014 at 1:31 PM, Andrew Trick <atrick at apple.com> wrote: > Chandler suggested a way around the problem. I'll work on that first. > > > It is very difficult to deal with the LoopPassManager. The concept doesn’t > fit with typical loop passes, which may need to rerun function level > analyses, and can affect code outside the
2014 Mar 07
3
[LLVMdev] RFC - Adding an optimization report facility?
----- Original Message ----- > From: "Chris Lattner" <clattner at apple.com> > To: "Diego Novillo" <dnovillo at google.com> > Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu> > Sent: Thursday, March 6, 2014 5:54:02 PM > Subject: Re: [LLVMdev] RFC - Adding an optimization report facility? > > > On Mar 6, 2014, at
2012 Feb 03
8
[LLVMdev] Vectorization: Next Steps
As some of you may know, I committed my basic-block autovectorization pass a few days ago. I encourage anyone interested to try it out (pass -vectorize to opt or -mllvm -vectorize to clang) and provide feedback. Especially in combination with -unroll-allow-partial, I have observed some significant benchmark speedups, but, I have also observed some significant slowdowns. I would like to share my
2013 Sep 27
2
[LLVMdev] Trip count and Loop Vectorizer
Hi Nadav, Thanks for the response. I forgot to mention that there is an upper limit of 16 for the trip count check: TinyTripCountVectorThreshold = 16; if (TC > 0u && TC < TinyTripCountVectorThreshold). So right now, for any loop with a trip count of 0, or with a value >= 16, LV will unroll. With the change to the lower bound, it will also include the loop with 0 trip count. SCEV returns 0
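The check quoted above can be written out as a small standalone predicate to see which trip counts it matches. This is a sketch, not the vectorizer's code: the two function names are hypothetical, and the constant simply mirrors the value 16 given in the message. A trip count of 0 is treated here as "not known at compile time", which is what an unsigned constant-trip-count query has to return when it has no answer.

```cpp
#include <cassert>

// Mirrors the threshold value quoted in the message (assumed, not read
// from any LLVM source tree).
constexpr unsigned TinyTripCountVectorThreshold = 16;

// The quoted condition: matches only known trip counts in [1, 15].
constexpr bool matchesTinyTripCountCheck(unsigned TC) {
  return TC > 0u && TC < TinyTripCountVectorThreshold;
}

// The same condition with the lower bound relaxed to TC >= 0. For an
// unsigned value that lower bound is always true, so only the upper bound
// remains and the unknown case (TC == 0) now matches as well.
constexpr bool matchesRelaxedCheck(unsigned TC) {
  return TC < TinyTripCountVectorThreshold;
}

static_assert(!matchesTinyTripCountCheck(0), "unknown trip count is skipped");
static_assert(matchesRelaxedCheck(0), "relaxed bound also catches unknown");
```

The difference between the two predicates is exactly the TC == 0 case, which is why changing the lower bound affects every loop whose trip count cannot be determined.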
2012 Feb 09
0
[LLVMdev] Vectorization: Next Steps
On Feb 7, 2012, at 12:10 PM, Hal Finkel wrote: >>> 1. "Target Data" for vectorization - I think that in order to improve >>> the vectorization quality, the vectorizer will need more information >>> about the target. This information could be provided in the form of a >>> kind of extended target data. This extended target data might contain:
2012 Feb 06
0
[LLVMdev] Vectorization: Next Steps
On Feb 2, 2012, at 7:56 PM, Hal Finkel wrote: > As some of you may know, I committed my basic-block autovectorization > pass a few days ago. I encourage anyone interested to try it out (pass > -vectorize to opt or -mllvm -vectorize to clang) and provide feedback. > Especially in combination with -unroll-allow-partial, I have observed > some significant benchmark speedups, but, I