thr3ads.net - similar to: "[LLVMdev] [RFC] Heuristic for complete loop unrolling"

Displaying 20 results from an estimated 8000 matches similar to: "[LLVMdev] [RFC] Heuristic for complete loop unrolling"

[LLVMdev] Disable loop unroll pass

2012 Nov 23

Hi, Ivan: Sorry for deviating from the topic a bit. As I told you before, I'm an LLVM newbie, so I cannot give you a conclusive answer on whether the proposed interface is OK or not. My personal opinion on these two interfaces is summarized below: - hasZeroCostLoop() pro: it clearly states the HW support. con: Having a zero-cost loop doesn't imply the benefit a HW loop could achieve.

[LLVMdev] RFC: LoopEditor, a high-level loop transform toolkit

2015 Jul 28

Hi Michael, +llvmdev, Hal, Nadav For testing, I was currently thinking of a two-pronged approach. Lit tests as you suggest with a dummy pass, probably with command line options to define what transform to do, and unit tests to test the delegate behaviour and return values. I'll try and produce a mega patch with at least the loop vectoriser moved over, then split it up again after review.

[LLVMdev] Disable loop unroll pass

2012 Nov 22

Hi, Gang: I don't want to discuss Open64 internals on the LLVM mailing list. Let us focus only on the design per se. This mail and your previous mail combined give me the impression that: The only reason you introduce the specific operator for HW loops in Scalar Opt is simply that you have a hard time figuring out the trip count in CodeGen. This might be true for Open64's

[LLVMdev] Disable loop unroll pass

2012 Nov 23

Hi Shuxin, On 23/11/2012 00:17, Shuxin Yang wrote: > Hi, Gang: > > I don't want to discuss Open64 internal in LLVM mailing list. Let us > only focus on the design per se. > As your this mail and your previous mail combined give me a impression > that : > > The only reason you introduce the specific operator for HW loop in > Scalar Opt simply because >

[LLVMdev] Modifying LoopUnrollingPass

2015 May 02

Hi Zhoulai, I am trying to modify "LoopUnrollPass" in llvm which produces multiple copies of loop equal to the loop unroll factor. Currently, using multicore architecture, say 3 for example and the execution goes like: for 3 cores if there are 9 iterations of loop

core  instruction
1     0,3,6
2     1,4,7
3     2,5,8

But I want to
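The schedule quoted in this snippet is a round-robin (cyclic) distribution: iteration i runs on core i mod 3. A minimal sketch of that mapping follows. It only illustrates the quoted table and is not code from LLVM's loop unroller; the function name `cyclicSchedule` and its parameters are hypothetical, and cores are indexed from 0 here rather than from 1 as in the snippet.

```cpp
#include <vector>

// Hypothetical helper (not an LLVM API): returns, for each core, the list of
// loop iterations it executes when iteration I is assigned to core I % NumCores.
std::vector<std::vector<unsigned>> cyclicSchedule(unsigned NumCores,
                                                  unsigned NumIters) {
  std::vector<std::vector<unsigned>> PerCore(NumCores);
  if (NumCores == 0)
    return PerCore; // No cores: nothing to schedule, avoid modulo by zero.
  for (unsigned I = 0; I < NumIters; ++I)
    PerCore[I % NumCores].push_back(I);
  return PerCore;
}
```

Calling `cyclicSchedule(3, 9)` yields the three rows of the table above, with index 0 corresponding to core 1.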

llvm is getting slower, January edition

2017 Jan 18

On 1/18/17 3:55 PM, Davide Italiano via llvm-dev wrote: > On Tue, Jan 17, 2017 at 6:02 PM, Mikhail Zolotukhin > <mzolotukhin at apple.com> wrote: >> Hi, >> >> Continuing recent efforts in understanding compile time slowdowns, I looked at some historical data: I picked one test and tried to pin-point commits that affected its compile-time. The data I have is not 100%

[LLVMdev] Possible typo in LoopUnrollPass.cpp

2012 Apr 03

hi, In "LoopUnrollPass.cpp", when trying to reduce unroll count to meet the unroll threshold requirement in line 200 and line 206, variable "CurrentThreshold" is used in the computation, instead of the variable "Threshold", which is defined by: // Determine the current unrolling threshold. While this is normally set // from UnrollThreshold, it is overridden to

[LLVMdev] aarch64 status for generating SIMD instructions

2015 Feb 09

% clang -S -O3 -mcpu=cortex-a57 -ffast-math -Rpass-analysis=loop-vectorize dot.c
dot.c:15:1: remark: loop not vectorized: value that could not be identified as reduction is used outside the loop [-Rpass-analysis=loop-vectorize]
}
^
dot.c:15:1: note: could not determine the original source location for :0:0

I found “llvm-as < /dev/null | llc -march=aarch64 -mattr=help” which listed a

llvm is getting slower, January edition

2017 Jan 18

Hi, Continuing recent efforts in understanding compile time slowdowns, I looked at some historical data: I picked one test and tried to pin-point commits that affected its compile-time. The data I have is not 100% accurate, but hopefully it helps to provide an overview of what's going on with compile time in LLVM and give a better understanding of what changes usually impact compile time.

[LLVMdev] Trip count and Loop Vectorizer

2013 Sep 27

Hi, I am trying to get a small loop to *not vectorize* for cases where it doesn't make sense. For instance, this loop:

    void foo(int a[4][8], int n) {
      int b[4][8];
      for(int i = 0; i < 4; i++) {
        for(int j = 0; j < n; j++) {
          a[i][j] = b[i][j];
        }
      }
    }

* Has a maximum of 8 ints to copy. LLVM tries to use Memcpy for the inner loop. It is not helpful to perform

[LLVMdev] LLVM loop vectorizer

2015 Jul 08

Hello. I am trying to vectorize a CSR SpMV (sparse matrix vector multiplication) procedure but the LLVM loop vectorizer is not able to handle such code. I am using clang and llvm version 3.4 (on Ubuntu 12.10). I use the -fvectorize option with clang and -loop-vectorize with opt-3.4. The CSR SpMV function is inspired from

SCEV LoopTripCount

2016 Aug 10

Hello, I was doing some experiments with SCEV and especially the loop trip count. Sorry for the dumb question, what is the easiest way to dump SCEV analysis results on a .bc file? On a side note, I wanted to see if we could optimize this function: unsigned long kernel(long w, long h, long d) { unsigned long count = 0; for(int i = 0; i < w; ++i) for(int j = i; j < h; ++j) for(int k = j; k

[LLVMdev] [PATCH] Split LoopUnroll pass into mechanism and policy

2008 May 07

Hello Matthijs, Separating mechanism from policy is a good thing for the LoopUnroll pass. Instead of moving the policy to a subclass though, I think it'd be better to move the mechanism, the unrollLoop function, out to be a standalone utility function, with the LoopInfo object passed in explicitly. FoldBlockIntoPredecessor would also be good to make into a standalone utility function, since

Scaling to many basic blocks

2015 Aug 23

On Sat, Aug 22, 2015 at 11:57 PM, Michael Zolotukhin <mzolotukhin at apple.com> wrote: > Hi, > > Several passes would have troubles with such code (namely, GVN and > JumpThreading). Can you just choose not to run those particular passes? I suppose the big problem would be if there's a problem with the code generation and related stuff like instruction scheduling and

Noisy benchmark results?

2017 Mar 01

On 28 Feb 2017, at 22:50, Michael Zolotukhin <mzolotukhin at apple.com> wrote: I also usually rerun suspiciously improved or regressed tests to verify the performance change. Most of the time, if it was just a noise, the test doesn’t appear on another run. I wish LNT (or any other script) could do that for me :) Michael Doesn't the lnt runtest nt

Scaling to many basic blocks

2015 Aug 22

How well does LLVM scale to many basic blocks? Let's say you have a single function consisting of a million basic blocks each with a few tens of instructions (and assuming the whole thing isn't trivially repetitive so the number of simultaneously live variables and whatever is large) and you feed that through the optimisers into the backend code generator, will this work okay, or will it

[LLVMdev] [PATCH] Split LoopUnroll pass into mechanism and policy

2008 May 09

Hi All, the attached patch performs the splitting in the proposed manner. Before applying the patch, please execute svn cp lib/Transforms/Scalar/LoopUnroll.cpp lib/Transforms/Utils/UnrollLoop.cpp to make the patch apply and preserve proper history. Transforms/Utils/UnrollLoop.cpp contains the unrollLoop function, which is now used by the LoopUnroll pass. I've also moved the

[cfe-dev] Who wants faster LLVM/Clang builds?

2017 Dec 12

On Mon, Dec 11, 2017 at 3:37 PM, Mikhail Zolotukhin via cfe-dev < cfe-dev at lists.llvm.org> wrote: > Hi Kim, > > On Dec 10, 2017, at 7:39 AM, Kim Gräsman <kim.grasman at gmail.com> wrote: > > Hi Michael, > > On Thu, Dec 7, 2017 at 3:16 AM, Michael Zolotukhin > <mzolotukhin at apple.com> wrote: > > > Nice to IWYU developers here:) I wonder how

[cfe-dev] Who wants faster LLVM/Clang builds?

2017 Dec 10

Hi Michael, On Thu, Dec 7, 2017 at 3:16 AM, Michael Zolotukhin <mzolotukhin at apple.com> wrote: > > Nice to IWYU developers here:) I wonder how hard it would be to run IWYU on > LLVM/Clang (or, if it’s supposed to work, I wonder what I did wrong). There are known problems with running IWYU over LLVM/Clang -- Zachary Turner made an attempt a while back to get it up and running.

[LLVMdev] Trip count and Loop Vectorizer

2013 Sep 27

Hi Sriram, Thanks for performing this analysis. The problem here, both for memcpy and the vectorizer, is that we can’t predict the size of “n”, even though the only use of “n” is for the loop bound for the alloca [4 x [8 x i32]]. If you change the unroll condition to TC >= 0 then you will disable loop unrolling for all loops because getSmallConstantTripCount returns an unsigned number. You
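The point about unsigned trip counts can be illustrated in isolation: a `>= 0` test on an unsigned value holds for every input, so a condition written that way cannot separate one loop from another. The sketch below is standalone C++ and does not use any LLVM API. The helper names are hypothetical, and the convention that 0 stands for an unknown trip count is an assumption made for this illustration.

```cpp
// Hypothetical stand-in for a trip-count query that returns an unsigned
// value. Assumption for this sketch: 0 means "unknown or not constant".
unsigned smallConstantTripCount(bool Known, unsigned Count) {
  return Known ? Count : 0u;
}

// For an unsigned TC this comparison is true for every possible value,
// so it filters nothing. Compilers may warn that it is always true.
bool passesNonNegativeCheck(unsigned TC) { return TC >= 0; }

// Comparing against the sentinel does separate known from unknown counts.
bool hasKnownTripCount(unsigned TC) { return TC != 0; }
```

Under these assumptions, `passesNonNegativeCheck` returns true both for a known count of 8 and for the unknown sentinel 0, while `hasKnownTripCount` tells the two apart.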