thr3ads.net - similar to: "[LLVMdev] [Patch][Review Requested][Compilation Time] Avoid frequent copy of elements in LoopStrengthReduce"

Displaying 20 results from an estimated 2000 matches similar to: "[LLVMdev] [Patch][Review Requested][Compilation Time] Avoid frequent copy of elements in LoopStrengthReduce"

[LLVMdev] [Patch][Review Requested][Compilation Time] Avoid frequent copy of elements in LoopStrengthReduce

2013 Jan 29

On Tue, Jan 29, 2013 at 3:59 PM, Murali, Sriram <sriram.murali at intel.com> wrote:
> Our benchmark results show that the compilation time performance improved by
> ~0.5%.
That's fairly small; what was the standard deviation, confidence interval, etc?
-- Sean Silva

[LLVMdev] [Patch][Review Requested][Compilation Time] Avoid frequent copy of elements in LoopStrengthReduce

2013 Jan 30

The compilation time is measured for different benchmarks while compiling a .bc file into a shared object. The improvement across the range of benchmarks is listed in the following table. If the reason behind the need for other performance metrics is to identify possible measurement errors, then I think this table would be of some help. However, we do not have the standard deviation and confidence

[LLVMdev] [Patch][Review Requested][Compilation Time] Avoid frequent copy of elements in LoopStrengthReduce

2013 Feb 01

Sriram,
This patch looks good. Please commit. ...and thanks for the data.
-Andy
On Jan 29, 2013, at 12:59 PM, "Murali, Sriram" <sriram.murali at intel.com> wrote:
> Hello,
> This patch aims to improve compile time performance by increasing the SCEV vector size in LoopStrengthReduce. It is observed that the BaseRegs vector size is 4 in most cases, and elements are
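The cost that the patch summary alludes to can be illustrated with a simplified model. The sketch below is not LLVM's SmallVector and does not reproduce its growth policy; it assumes a container that keeps up to `inlineCapacity` elements in place and doubles its capacity when full, relocating every element it already holds. The capacities 2 and 4 and the name `relocationsAfterPushes` are assumptions chosen for illustration.

```cpp
#include <cstddef>

// Simplified model (an assumption, not LLVM's actual growth policy):
// a vector with `inlineCapacity` in-place slots that doubles its capacity
// whenever it is full, relocating every element it already holds.
constexpr std::size_t relocationsAfterPushes(std::size_t inlineCapacity,
                                             std::size_t pushes) {
  std::size_t capacity = inlineCapacity;
  std::size_t size = 0;
  std::size_t relocated = 0;
  for (std::size_t i = 0; i < pushes; ++i) {
    if (size == capacity) {
      // Grow: every existing element is copied or moved to new storage.
      relocated += size;
      capacity = capacity == 0 ? 1 : capacity * 2;
    }
    ++size;
  }
  return relocated;
}

// Four elements fit in four inline slots without any relocation.
static_assert(relocationsAfterPushes(4, 4) == 0, "no growth expected");
// The same four elements overflow a two-slot buffer once, relocating two.
static_assert(relocationsAfterPushes(2, 4) == 2, "one growth expected");
```

Under this model, sizing the inline buffer to the common case removes the relocation entirely, which is the kind of saving the excerpt describes.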

[LLVMdev] Improving the usability of LNT

2013 Apr 30

Hi Daniel,
I made some changes to the LNT perf reporting tool to make it more user friendly by adding some features:
1. Make the sidebar and the navigation bar stationary, so that it is easy to navigate the site
2. Have the pop-down menu for the items in the navigation bar, activate upon hovering the mouse, rather than clicking the item
3. Add a nav-link in the sidebar for the

[LLVMdev] Trip count and Loop Vectorizer

2013 Sep 27

Hi, I am trying to get a small loop to *not vectorize* for cases where it doesn't make sense. For instance, this loop:
    void foo(int a[4][8], int n) {
      int b[4][8];
      for(int i = 0; i < 4; i++) {
        for(int j = 0; j < n; j++) {
          a[i][j] = b[i][j];
        }
      }
    }
* Has a maximum of 8 ints to copy. LLVM tries to use Memcpy for the inner loop. It is not helpful to perform

[LLVMdev] Disabling x87 instructions for a sub-target

2012 Apr 04

Hello there, I recently started working on the LLVM backend for a target that doesn't support x87 instructions. Currently, I am in the process of completely disabling some x87 instructions such as fcomi, fcompi,... for a specific sub-target. I also do not have SSE enabled for my sub-target, and llvm resorts to fcomi* instructions for FP compare instructions. Is there a way to bypass the

[LLVMdev] Trip count and Loop Vectorizer

2013 Sep 27

Hi Nadav, Thanks for the response. I forgot to mention that there is an upper limit of 16 for the Trip Count check, TinyTripCountVectorThreshold = 16; if (TC > 0u && TC < TinyTripCountVectorThreshold). So right now, for any loop with a Trip Count of 0, or with a value >= 16, LV will unroll. With the change to the lower bound, it will also include the loop with 0 trip count. SCEV returns 0

[LLVMdev] Improving the usability of LNT

2013 May 02

Wow, that sounds great! Thanks for working on this, and yes, please, send the patches!
--renato
On 30 April 2013 16:23, Murali, Sriram <sriram.murali at intel.com> wrote:
> Hi Daniel,
>
> I made some changes to the LNT perf reporting tool to make it more user
> friendly by adding some features:
>
> 1. Make the sidebar and the navigation bar stationary,

[LLVMdev] Trip count and Loop Vectorizer

2013 Sep 27

Hi Sriram, Thanks for performing this analysis. The problem here, both for memcpy and the vectorizer, is that we can’t predict the size of “n”, even though the only use of ’n’ is for the loop bound for the alloca [4 x [8 x i32]]. If you change the unroll condition to TC >= 0 then you will disable loop unrolling for all loops because getSmallConstantTripCount returns an unsigned number. You
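A minimal sketch of why the lower bound matters, assuming the check quoted elsewhere in this thread (`TC > 0u && TC < TinyTripCountVectorThreshold`, with a threshold of 16). The helper names are hypothetical. `getSmallConstantTripCount` returns an unsigned value and uses 0 to mean that the trip count is unknown or not constant, so a `TC >= 0` test can never be false.

```cpp
// Threshold value quoted in the thread.
constexpr unsigned TinyTripCountVectorThreshold = 16;

// Check as quoted: 0 means "trip count unknown", so it is excluded.
constexpr bool isTinyTripCount(unsigned TC) {
  return TC > 0u && TC < TinyTripCountVectorThreshold;
}

// Hypothetical relaxed check. For an unsigned TC, `TC >= 0u` is always true,
// so relaxing the lower bound leaves only the upper bound, and the
// "unknown" value 0 now passes as well.
constexpr bool isTinyTripCountRelaxed(unsigned TC) {
  return TC < TinyTripCountVectorThreshold;
}

static_assert(!isTinyTripCount(0), "unknown trip count is not tiny");
static_assert(isTinyTripCountRelaxed(0), "relaxed check accepts unknown");
```

With the relaxed form, every loop whose trip count cannot be computed is classified the same way as a genuinely tiny loop.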

[LLVMdev] Trip count and Loop Vectorizer

2013 Sep 27

On Sep 27, 2013, at 12:47 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> so you could infer that n must be smaller than 8 (because you know the range of the other dimension). The question is how often does such an example occur, where this is possible, to make such an effort justifiable?
smaller equal, of course ;)

[LLVMdev] Disabling x87 instructions for a sub-target

2012 Apr 04

Hi Sriram, I'm not sure if I understand your question correctly: Do you need to generate code that contains no x87 floating-point instructions altogether, but uses calls into a soft-float library instead? That behaviour can be enabled using the "-soft-float" flag, as far as I know. Or is it only about the fcomi* instructions, which are not supported by pre-Pentium Pro chips? Then I

[LLVMdev] Trip count and Loop Vectorizer

2013 Sep 27

Hey Arnold, I have run into this situation many times while benchmarking. I think it is best if this is addressed using a simple heuristic. For that, we need to identify the loop cost and decide if it makes sense to completely unroll the loop, or partially unroll. I am unsure of the optimal way to implement this though. I want to run it by the list to get any ideas floating around :) Thanks

[LLVMdev] State of Loop Unrolling and Vectorization in LLVM

2013 Apr 15

Hi, I have a test case (and a micro benchmark made out of the test case) to check whether loop unrolling and loop vectorization are done efficiently in LLVM. Here is the test case (credits: Tyler Nowicki)
{code}
extern float * array;
extern int array_size;
float g() {
  int i;
  float total = 0;
  for(i = 0; i < array_size; i++) {
    total += array[i];
  }
  return total;
}
{code}
When

RFC for a design change in LoopStrengthReduce / ScalarEvolution

2015 Aug 18

> Of course, and the point is that, for example, on x86_64, the zext here is free. I'm still trying to understand the problem...
>
> In the example you provided in your previous e-mail, we choose the solution:
>
> `GEP @Global, zext(V)` -> `GEP (@Global + zext VStart), {i64 0,+,1}`
> `V` -> `trunc({i64 0,+,1}) + VStart`
>
> instead of the actually-better

[LLVMdev] Compiler error: LoopStrengthReduce.cpp

2009 May 12

The error given:
..\..\..\..\trunk\lib\Transforms\Scalar\LoopStrengthReduce.cpp(1016) : error C2668: 'abs' : ambiguous call to overloaded function
f:\Program Files\Microsoft Visual Studio 8\VC\include\math.h(539): could be 'long double abs(long double)'
f:\Program Files\Microsoft Visual Studio 8\VC\include\math.h(491): or 'float abs(float)'

[LLVMdev] Compiler error: LoopStrengthReduce.cpp

2009 May 13

On Tue, May 12, 2009 at 5:01 PM, Dale Johannesen <dalej at apple.com> wrote:
>
> On May 12, 2009, at 3:09 PM PDT, OvermindDL1 wrote:
>
>> The error given:
>>
>> ..\..\..\..\trunk\lib\Transforms\Scalar\LoopStrengthReduce.cpp(1016) :
>> error C2668: 'abs' : ambiguous call to overloaded function
>>
>> It should be rather obvious from the

[LLVMdev] Compiler error: LoopStrengthReduce.cpp

2009 May 12

On May 12, 2009, at 3:09 PM PDT, OvermindDL1 wrote:
> The error given:
>
> ..\..\..\..\trunk\lib\Transforms\Scalar\LoopStrengthReduce.cpp(1016) :
> error C2668: 'abs' : ambiguous call to overloaded function
>
> It should be rather obvious from the message. The error is in
> LoopStrengthReduce.cpp on line 1016:
> (unsigned(abs(SInt)) < SSInt || (SInt %
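One way to sidestep this kind of ambiguity is to avoid the overloaded `abs` altogether. The sketch below assumes that `SInt` is a 64-bit signed integer, which the excerpt does not show; `absInt64` is a hypothetical helper and not necessarily the fix that was committed.

```cpp
#include <cstdint>

// Hypothetical helper: absolute value of a 64-bit signed integer, returned
// as unsigned so that even INT64_MIN has a representable result. No overload
// resolution on `abs` is involved, so error C2668 cannot arise here.
constexpr std::uint64_t absInt64(std::int64_t V) {
  // Convert first, then negate in unsigned arithmetic, which is well defined.
  return V < 0 ? std::uint64_t(0) - static_cast<std::uint64_t>(V)
               : static_cast<std::uint64_t>(V);
}

static_assert(absInt64(-5) == 5, "negative input");
static_assert(absInt64(7) == 7, "positive input");
```

An explicit cast at the call site to a type that has an exact `abs` overload would be another option; the helper just makes the intended width visible.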

[LLVMdev] Compiler error: LoopStrengthReduce.cpp

2009 May 13

On May 12, 2009, at 5:01 PM PDT, OvermindDL1 wrote:
> On Tue, May 12, 2009 at 5:01 PM, Dale Johannesen <dalej at apple.com>
> wrote:
>>
>> On May 12, 2009, at 3:09 PM PDT, OvermindDL1 wrote:
>>
>>> The error given:
>>>
>>> ..\..\..\..\trunk\lib\Transforms\Scalar
>>> \LoopStrengthReduce.cpp(1016) :
>>> error C2668:

RFC for a design change in LoopStrengthReduce / ScalarEvolution

2015 Aug 17

This is related to an issue in loop strength reduction [1] that I've been trying to fix on and off for a while. [1] has a more detailed description of the issue and an example, but briefly put, I want LSR to consider formulae that have "Zext T" as base and/or scale registers, and to appropriately rate such formulae. My first attempt[2] at fixing this was buggy and had to be

LoopStrengthReduce.cpp

2016 Mar 28

Hi, I am looking for a way to rewrite induction variables to use an addition of -1 whenever possible (and not otherwise unprofitable). This is needed to utilize hardware loop instructions, which are present on SystemZ (branch on count). Later in the backend, an 'add -1; compare w/ 0; jne 0'-sequence can be replaced with a brct instruction. I could not find any way in the LSR pass to
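The rewrite being asked about can be pictured at the source level. This is only a sketch, assuming a loop body that does not depend on the value of the induction variable; the function names are hypothetical, and it says nothing about how LSR itself would perform the transformation.

```cpp
// Counting up: the exit test compares the induction variable against N.
constexpr int iterationsCountingUp(int N) {
  int Iterations = 0;
  for (int I = 0; I < N; ++I)
    ++Iterations;
  return Iterations;
}

// Counting down: add -1 each iteration and compare against zero, the
// 'add -1; compare w/ 0; branch' shape that a branch-on-count instruction
// can absorb. Assumes N >= 0.
constexpr int iterationsCountingDown(int N) {
  int Iterations = 0;
  if (N > 0) {
    int Count = N;
    do {
      ++Iterations;
      Count += -1;
    } while (Count != 0);
  }
  return Iterations;
}

static_assert(iterationsCountingDown(8) == iterationsCountingUp(8),
              "both forms run the body the same number of times");
```

Both functions execute the body N times; only the form of the induction variable and the exit test differ.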
