thr3ads.net - similar to: "[LLVMdev] Trip count and Loop Vectorizer"

Displaying 20 results from an estimated 2000 matches similar to: "[LLVMdev] Trip count and Loop Vectorizer"

[LLVMdev] Trip count and Loop Vectorizer

2013 Sep 27

[LLVMdev] Trip count and Loop Vectorizer

Hi Sriram, Thanks for performing this analysis. The problem here, both for memcpy and the vectorizer, is that we can’t predict the size of “n”, even though the only use of ’n’ is for the loop bound for the alloca [4 x [8 x i32]]. If you change the unroll condition to TC >= 0 then you will disable loop unrolling for all loops because getSmallConstantTripCount returns an unsigned number. You

[LLVMdev] Trip count and Loop Vectorizer

2013 Sep 27

[LLVMdev] Trip count and Loop Vectorizer

Hi Nadav, Thanks for the response. I forgot to mention that there is an upper limit of 16 for the Trip Count check, TinyTripCountVectorThreshold = 16; if (TC > 0u && TC < TinyTripCountVectorThreshold). So right now, any loop with Trip Count as 0, or with value >=16, LV with unroll. With the change to the lower bound, it will also include the loop with 0 trip count. SCEV returns 0

[LLVMdev] Trip count and Loop Vectorizer

2013 Sep 27

[LLVMdev] Trip count and Loop Vectorizer

Sriram, The problem is that you want to unroll/vectorize many loops with non-constant loop count - it is a trade-off of which case you estimate as more likely. int foo(int *ptr, int n) { for ( .. i <n) ptr[i] = ... } The question is: is it more likely to have “n” such that unrolling is beneficial or not. Now, you could probably write an analysis that bounds the loop count (for the

[LLVMdev] Improving the usability of LNT

2013 Apr 30

[LLVMdev] Improving the usability of LNT

Hi Daniel, I made some changes to the LNT perf reporting tool to make it more user friendly by adding some features: 1. Make the sidebar and the navigation bar stationary, so that it is easy to navigate the site 2. Have the pop-down menu for the items in the navigation bar, activate upon hovering the mouse, rather than clicking the item 3. Add a nav-link in the sidebar for the

[LLVMdev] Improving the usability of LNT

2013 May 02

[LLVMdev] Improving the usability of LNT

Wow, that sounds great! Thanks for working on this, and yes, please, send the patches! --renato On 30 April 2013 16:23, Murali, Sriram <sriram.murali at intel.com> wrote: > Hi Daniel,**** > > I made some changes to the LNT perf reporting tool to make it more user > friendly by adding some features:**** > > **1. **Make the sidebar and the navigation bar stationary,

[LLVMdev] State of Loop Unrolling and Vectorization in LLVM

2013 Apr 15

[LLVMdev] State of Loop Unrolling and Vectorization in LLVM

Hi , I have a test case (and a micro benchmark made out of the test case) to check if loop unrolling and loop vectorization is efficiently done on LLVM. Here is the test case (credits: Tyler Nowicki) {code} extern float * array; extern int array_size; float g() { int i; float total = 0; for(i = 0; i < array_size; i++) { total += array[i]; } return total; } {code} When

[LLVMdev] Trip count and Loop Vectorizer

2013 Sep 27

[LLVMdev] Trip count and Loop Vectorizer

On Sep 27, 2013, at 12:47 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote: > so you could infer that n must be smaller than 8 (because you know the range of the other dimension). The question is how often does such an example occur, where this is possible, to make such an effort justifiable? smaller equal, of course ;)

[LLVMdev] Trip count and Loop Vectorizer

2013 Sep 27

[LLVMdev] Trip count and Loop Vectorizer

Hey Arnold, I have run into this situation many times while benchmarking. I think it is best if this is addressed using a simple heuristic. For that, we need to identify the loop cost and decide if it makes sense to completely unroll the loop, or partially unroll. I am unsure of the optimal way to implement this though. I want to run it by the list to get any ideas floating around :) Thanks

[LLVMdev] [Patch][Review Requested][Compilation Time] Avoid frequent copy of elements in LoopStrengthReduce

2013 Jan 29

[LLVMdev] [Patch][Review Requested][Compilation Time] Avoid frequent copy of elements in LoopStrengthReduce

Hello, This patch aims to improve compile time performance by increasing the SCEV vector size in LoopStrengthReduce. It is observed that the BaseRegs vector size is 4 in most cases, and elements are frequently copied when it is initialized as SmallVector<const SCEV *, 2> BaseRegs. Our benchmark results show that the compilation time performance improved by ~0.5%. Patch by Wan Xiaofei.

[LLVMdev] Disabling x87 instructions for a sub-target

2012 Apr 04

[LLVMdev] Disabling x87 instructions for a sub-target

Hello there, I recently started working on the LLVM backend for a target that doesn't support x87 instructions. Currently, I am in the process of completely disabling some x87 instructions such as fcomi, fcompi,... for a specific sub-target. I also do not have SSE enabled for my sub-target, and llvm resorts to fcomi* instructions for FP compare instructions. Is there a way to bypass the

[LLVMdev] [Patch][Review Requested][Compilation Time] Avoid frequent copy of elements in LoopStrengthReduce

2013 Jan 29

[LLVMdev] [Patch][Review Requested][Compilation Time] Avoid frequent copy of elements in LoopStrengthReduce

On Tue, Jan 29, 2013 at 3:59 PM, Murali, Sriram <sriram.murali at intel.com> wrote: > Our benchmark results show that the compilation time performance improved by > ~0.5%. That's fairly small; what was the standard deviation, confidence interval, etc? -- Sean Silva

[LLVMdev] Calling with register indirect reference instead of memory indirect reference.

2013 Feb 28

[LLVMdev] Calling with register indirect reference instead of memory indirect reference.

Hi, I am working on a small optimization feature to replace the calls with indirect reference using a memory with an indirect reference using register. The purpose of this feature is to improve the performance of calls to functions referred to by function pointers. The motivation behind this work is that gcc does this optimization. Here is a small test case, that will generate an indirect call

[LLVMdev] Disabling x87 instructions for a sub-target

2012 Apr 04

[LLVMdev] Disabling x87 instructions for a sub-target

Hi Sriram, I'm not sure if I understand your question correctly: Do you need to generate code that contains no x87 floating-point instructions altogether, but uses calls into a soft-float library instead? That behaviour can be enabled using the "-soft-float" flag, as far as I know. Or is it only about the fcomi* instructions, which are not supported by pre-Pentium Pro chips? Then I

SCEV cannot compute the trip count of Simple loop

2016 Sep 16

SCEV cannot compute the trip count of Simple loop

Hi Deepali, SCEV reports the backedge taken count as "((-1 * (sext i32 (3 + %x) to i64))<nsw> + ((sext i32 (3 + %x) to i64) smax (sext i32 (6 + %x) to i64)))", so symbolically it does have an answer. Ideally SCEV should be able to exploit <nsw> on (3 + %x) and (6 + %x) to fold the expression above to "3", but due to some systemic issues SCEV can't exploit

[LLVMdev] Get the loop trip count variable

2010 Apr 06

[LLVMdev] Get the loop trip count variable

Thanks a lot for your guys' help!!! I guess once I am able to get *V* (which probably is a pointer to a Value object), then, I can instrument some code at the IR level to dump V. As long as I maintain V at this pass stage, I should be able to dump the loop trip count. This is true, isn't it? Basically, what I am going to do is to add a function call before the loop body, such as:

[LLVMdev] Questions about trip count

2010 Aug 12

[LLVMdev] Questions about trip count

Dear guys, I am having problems to obtain good information from the LoopInfo. I am always getting a trip count of 0, even though I am clearly passing a loop with a constant bound. I am using this pass below: void testLoopInfo(const Function& F) const { const LoopInfo *LI = &getAnalysis<LoopInfo>(); Function::const_iterator BB = F.begin(), E = F.end(); for (; BB !=

[LLVMdev] [PATCH] Split LoopUnroll pass into mechanism and policy

2008 May 09

[LLVMdev] [PATCH] Split LoopUnroll pass into mechanism and policy

Hi All, the attached patch performs the splitting in the proposed manner. before applying the patch, please execute svn cp lib/Transforms/Scalar/LoopUnroll.cpp lib/Transforms/Utils/UnrollLoop.cpp to make the patch apply and preserve proper history. Transforms/Utils/UnrollLoop.cpp contains the unrollLoop function, which is now used by the LoopUnroll pass. I've also moved the

SCEV cannot compute the trip count of Simple loop

2016 Sep 16

SCEV cannot compute the trip count of Simple loop

I have modified the example test case for UB error, still it didn’t unroll void foo(int x) { int p, i = 1; int mat[9][9][9]; for (p = (x+1) ; p < (x+3) ;p++) mat[x][p-1][i] = mat[x][p-1][i] + 5; } Regard, Deepali From: Kevin Choi [mailto:code.kchoi at gmail.com] Sent: Friday, September 16, 2016 1:20 PM To: Rai, Deepali Cc: llvm-dev at lists.llvm.org Subject: Re: [llvm-dev] SCEV

[LLVMdev] [PATCH] Split LoopUnroll pass into mechanism and policy

2008 May 07

[LLVMdev] [PATCH] Split LoopUnroll pass into mechanism and policy

Hello Matthijs, Separating mechanism from policy is a good thing for the LoopUnroll pass. Instead of moving the policy to a subclass though, I think it'd be better to move the mechanism, the unrollLoop function, out to be a standalone utility function, with the LoopInfo object passed in explicitly. FoldBlockIntoPredecessor would also be good to make into a standalone utility function, since

[LLVMdev] LLVM Loop Vectorizer puzzle

2013 May 23

[LLVMdev] LLVM Loop Vectorizer puzzle

Hi, I have the llvm loop vectorizer to complie the following sample: //================= int test(int *a, int n) { for(int i = 0; i < n; i++) { a[i] += i; } return 0; } //================ The corresponded .ll file has a loop preheader: //================ for.body.lr.ph: ; preds = %entry

similar to: [LLVMdev] Trip count and Loop Vectorizer