thr3ads.net - similar to: "[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops"

Displaying 20 results from an estimated 5000 matches similar to: "[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops"

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

2013 Aug 16

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

Hi Sebpop, Thanks for your explanation. I noticed that Polly would finally run the SROA pass to transform these load/store instructions into scalar operations. Is it possible to run such a pass before polly-dependence analysis? Star Tan At 2013-08-15 21:12:53,"Sebastian Pop" <sebpop at gmail.com> wrote: >Codeprepare and independent blocks are introducing these loads and

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

2013 Aug 15

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

Codeprepare and independent blocks are introducing these loads and stores. These are prepasses that polly runs prior to building the dependence graph to transform scalar dependences into data dependences. Ether was working on eliminating the rewrite of scalar dependences. On Thu, Aug 15, 2013 at 5:32 AM, Star Tan <tanmx_star at yeah.net> wrote: > Hi all, > > I have investigated the

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

2013 Aug 16

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

I do not think that running SROA before polly is a good idea: it would defeat the purpose of the code preparation passes that polly intentionally schedules for the data dependence analysis. If you remove the data references before polly runs, you would miss them in the dependence graph: that could lead to incorrect transforms. On Thu, Aug 15, 2013 at 7:28 PM, Star Tan <tanmx_star at

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

2013 Aug 16

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

On 08/15/2013 03:32 AM, Star Tan wrote: > Hi all, Hi, I tried to reproduce your findings, but could not do so. > I have investigated the 6X extra compile-time overhead when Polly compiles the simple nestedloop benchmark in LLVM-testsuite. (http://188.40.87.11:8000/db_default/v4/nts/31?compare_to=28&baseline=28). Preliminary results show that such compile-time overhead is resulted by

[LLVMdev] SIV tests in LoopDependence Analysis, Sanjoy's patch

2012 Apr 23

[LLVMdev] SIV tests in LoopDependence Analysis, Sanjoy's patch

Hi, When I write various test cases and explore how they're handled by the code in LoopDependenceAnalysis::analysePair, I'm surprised. This loop collects pairs of subscripts from the source and destination refs. * // Collect GEP operand pairs (FIXME: use GetGEPOperands from BasicAA), adding* * // trailing zeroes to the smaller GEP, if needed.* * GEPOpdsTy destOpds, srcOpds;* *

[LLVMdev] [Polly] Assert in Scope construction

2013 Jul 03

[LLVMdev] [Polly] Assert in Scope construction

Should have changed the subject line... --- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation > -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Sergei Larin > Sent: Wednesday, July 03, 2013 12:29 PM > To: 'Tobias Grosser' > Cc: 'llvmdev'

[LLVMdev] SIV tests in LoopDependence Analysis, Sanjoy's patch

2012 Apr 12

[LLVMdev] SIV tests in LoopDependence Analysis, Sanjoy's patch

Hi, Here is a preliminary (monolithic) version you can comment on. This is still buggy, however, and I'll be testing for and fixing bugs over the next few days. I've used your version of the strong siv test. Thanks! -- Sanjoy Das. http://playingwithpointers.com -------------- next part -------------- A non-text attachment was scrubbed... Name: patch.diff Type: application/octet-stream

[LLVMdev] loop vectorizer issue

2013 Nov 03

[LLVMdev] loop vectorizer issue

Hello, I was trying to trace the Loop vectorizer of the LLVM, I wrote a simple loop with a clear dependency. But found that the debug shows that 'we can vectorize this loop' Here you are my loop with dependency: for(int k=20;k<50;k++) dataY[k] = dataY[k-1]; And the debug prints: LV: Checking a loop in "main" LV: Found a loop: for.body4 LV: Found an

[LLVMdev] [LNT] Question about results reliability in LNT infrustructure

2013 Jul 02

[LLVMdev] [LNT] Question about results reliability in LNT infrustructure

On 07/01/2013 09:41 AM, Renato Golin wrote: > On 1 July 2013 02:02, Chris Matthews <chris.matthews at apple.com> wrote: > >> One thing that LNT is doing to help “smooth” the results for you is by >> presenting the min of the data at a particular revision, which (hopefully) >> is approximating the actual runtime without noise. >> > > That's an

[LLVMdev] loop vectorizer issue

2013 Nov 03

[LLVMdev] loop vectorizer issue

Actually what I meant in my original loop, that there is a dependency between every two consecutive iterations. So, how the loop vectorizer says 'we can vectorize this loop'? for(int k=20;k<50;k++) dataY[k] = dataY[k-1]; From: Henrique Santos [mailto:henrique.nazare.santos at gmail.com] Sent: Sunday, November 03, 2013 4:28 PM To: Sara Elshobaky Cc: <llvmdev at

[LLVMdev] [LNT] Question about results reliability in LNT infrustructure

2013 Jul 01

[LLVMdev] [LNT] Question about results reliability in LNT infrustructure

On 1 July 2013 02:02, Chris Matthews <chris.matthews at apple.com> wrote: > One thing that LNT is doing to help “smooth” the results for you is by > presenting the min of the data at a particular revision, which (hopefully) > is approximating the actual runtime without noise. > That's an interesting idea, as you said, if you run multiple times on every revision. On ARM,

[LLVMdev] loop vectorizer issue

2013 Nov 03

[LLVMdev] loop vectorizer issue

Notice that the code you provided, for globals and stack allocations, at least, is semantically equivalent to: int a = d[19]; for(int k = 20; k < 50; k++) dataY[k] = a; Like so, the load you see missing was redundant, probably hoisted by GVN/PRE and replaced with "%.pre". H. On Sun, Nov 3, 2013 at 11:26 AM, Sara Elshobaky <sara.elshobaky at gmail.com>wrote: >

[LLVMdev] loop vectorizer issue

2013 Nov 03

[LLVMdev] loop vectorizer issue

Hi Sarah, the loop vectorizer runs not on the C code but on LLVM IR this c code was lowered to. Before the loop vectorizer runs many other optimization change the shape of this IR. You can see in the LLVM IR you referenced below, a preceding LLVM IR transformation has change your loop from: > for(int k=20;k<50;k++) > dataY[k] = dataY[k-1]; to > int a = d[19]; >

[LLVMdev] [Polly] Assert in Scope construction

2013 Jul 05

[LLVMdev] [Polly] Assert in Scope construction

Hi Sergei, On Thu, Jul 4, 2013 at 1:36 AM, Sergei Larin <slarin at codeaurora.org> wrote: > Should have changed the subject line... > > --- > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted > by > The Linux Foundation > > > > -----Original Message----- > > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at

[LLVMdev] loop vectorizer and storing to uniform addresses

2013 Nov 08

[LLVMdev] loop vectorizer and storing to uniform addresses

I changed the input C to using a 64 bit type for the loop index (this eliminates 'sext' instructions in the IR) Here the IR produced with clang -O0 define float @foo(i64 %start, i64 %end, float* %A) #0 { entry: %start.addr = alloca i64, align 8 %end.addr = alloca i64, align 8 %A.addr = alloca float*, align 8 %sum = alloca [4 x float], align 16 %i = alloca i64, align 8

[LLVMdev] loop vectorizer and storing to uniform addresses

2013 Nov 08

[LLVMdev] loop vectorizer and storing to uniform addresses

On 7 November 2013 17:18, Frank Winter <fwinter at jlab.org> wrote: > LV: We don't allow storing to uniform addresses > This is triggering because it didn't recognize as a reduction variable during the canVectorizeInstrs() but did recognize that sum[q] is loop invariant in canVectorizeMemory(). I'm guessing the nested loop was unrolled because of the low trip-count, and

[LLVMdev] loop vectorizer and storing to uniform addresses

2013 Nov 08

[LLVMdev] loop vectorizer and storing to uniform addresses

I am trying my luck on this global reduction kernel: float foo( int start , int end , float * A ) { float sum[4] = {0.,0.,0.,0.}; for (int i = start ; i < end ; ++i ) { for (int q = 0 ; q < 4 ; ++q ) sum[q] += A[i*4+q]; } return sum[0]+sum[1]+sum[2]+sum[3]; } LV: Checking a loop in "foo" LV: Found a loop: for.cond1 LV: Found an induction variable. LV: We

[LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass

2013 Jul 26

[LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass

Hi Sebastian, Recently, I found the "Polly - Calculate dependences" pass would lead to significant compile-time overhead when compiling some loop-intensive source code. Tobias told me you found similar problem as follows: http://llvm.org/bugs/show_bug.cgi?id=14240 My evaluation shows that "Polly - Calculate dependences" pass consumes 96.4% of total compile-time overhead

[LLVMdev] loop vectorizer: this loop is not worth vectorizing

2013 Nov 01

[LLVMdev] loop vectorizer: this loop is not worth vectorizing

I am trying a setup where the one loop is rewritten as two loops. This avoids the 'rem' and 'div' instructions in the index calculation (which give the loop vectorizer a hard time). However, with this setup the loop vectorizer complains about a too small loop. LV: Checking a loop in "main" LV: Found a loop: L3 LV: Found a loop with a very small trip count. This loop

[LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass

2013 Jul 29

[LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass

On 07/29/2013 09:15 AM, Sven Verdoolaege wrote: > On Mon, Jul 29, 2013 at 07:37:14AM -0700, Tobias Grosser wrote: >> On 07/29/2013 03:18 AM, Sven Verdoolaege wrote: >>> On Sun, Jul 28, 2013 at 04:42:25PM -0700, Tobias Grosser wrote: >>>> Sven: In terms of making the behaviour of isl easier to understand, >>>> it may make sense to fail/assert in case

similar to: [LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops