similar to: [LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

Displaying 20 results from an estimated 5000 matches similar to: "[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops"

2013 Aug 16
2
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
Hi Sebpop, Thanks for your explanation. I noticed that Polly would finally run the SROA pass to transform these load/store instructions into scalar operations. Is it possible to run such a pass before polly-dependence analysis? Star Tan At 2013-08-15 21:12:53,"Sebastian Pop" <sebpop at gmail.com> wrote: >Codeprepare and independent blocks are introducing these loads and
2013 Aug 15
0
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
Codeprepare and independent blocks are introducing these loads and stores. These are prepasses that polly runs prior to building the dependence graph to transform scalar dependences into data dependences. Ether was working on eliminating the rewrite of scalar dependences. On Thu, Aug 15, 2013 at 5:32 AM, Star Tan <tanmx_star at yeah.net> wrote: > Hi all, > > I have investigated the
2013 Aug 16
0
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
I do not think that running SROA before polly is a good idea: it would defeat the purpose of the code preparation passes that polly intentionally schedules for the data dependence analysis. If you remove the data references before polly runs, you would miss them in the dependence graph: that could lead to incorrect transforms. On Thu, Aug 15, 2013 at 7:28 PM, Star Tan <tanmx_star at
2013 Aug 16
0
[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops
On 08/15/2013 03:32 AM, Star Tan wrote: > Hi all, Hi, I tried to reproduce your findings, but could not do so. > I have investigated the 6X extra compile-time overhead when Polly compiles the simple nestedloop benchmark in LLVM-testsuite. (http://188.40.87.11:8000/db_default/v4/nts/31?compare_to=28&baseline=28). Preliminary results show that such compile-time overhead is resulted by
2012 Apr 23
0
[LLVMdev] SIV tests in LoopDependence Analysis, Sanjoy's patch
Hi, When I write various test cases and explore how they're handled by the code in LoopDependenceAnalysis::analysePair, I'm surprised. This loop collects pairs of subscripts from the source and destination refs. // Collect GEP operand pairs (FIXME: use GetGEPOperands from BasicAA), adding // trailing zeroes to the smaller GEP, if needed. GEPOpdsTy destOpds, srcOpds;
2013 Jul 03
2
[LLVMdev] [Polly] Assert in Scope construction
Should have changed the subject line... --- Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation > -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Sergei Larin > Sent: Wednesday, July 03, 2013 12:29 PM > To: 'Tobias Grosser' > Cc: 'llvmdev'
2012 Apr 12
6
[LLVMdev] SIV tests in LoopDependence Analysis, Sanjoy's patch
Hi, Here is a preliminary (monolithic) version you can comment on. This is still buggy, however, and I'll be testing for and fixing bugs over the next few days. I've used your version of the strong siv test. Thanks! -- Sanjoy Das. http://playingwithpointers.com -------------- next part -------------- A non-text attachment was scrubbed... Name: patch.diff Type: application/octet-stream
2013 Nov 03
2
[LLVMdev] loop vectorizer issue
Hello, I was trying to trace the Loop vectorizer of the LLVM, I wrote a simple loop with a clear dependency. But found that the debug shows that 'we can vectorize this loop' Here you are my loop with dependency: for(int k=20;k<50;k++) dataY[k] = dataY[k-1]; And the debug prints: LV: Checking a loop in "main" LV: Found a loop: for.body4 LV: Found an
2013 Jul 02
0
[LLVMdev] [LNT] Question about results reliability in LNT infrustructure
On 07/01/2013 09:41 AM, Renato Golin wrote: > On 1 July 2013 02:02, Chris Matthews <chris.matthews at apple.com> wrote: > >> One thing that LNT is doing to help “smooth” the results for you is by >> presenting the min of the data at a particular revision, which (hopefully) >> is approximating the actual runtime without noise. >> > > That's an
2013 Nov 03
3
[LLVMdev] loop vectorizer issue
Actually what I meant in my original loop, that there is a dependency between every two consecutive iterations. So, how the loop vectorizer says 'we can vectorize this loop'? for(int k=20;k<50;k++) dataY[k] = dataY[k-1]; From: Henrique Santos [mailto:henrique.nazare.santos at gmail.com] Sent: Sunday, November 03, 2013 4:28 PM To: Sara Elshobaky Cc: <llvmdev at
2013 Jul 01
2
[LLVMdev] [LNT] Question about results reliability in LNT infrustructure
On 1 July 2013 02:02, Chris Matthews <chris.matthews at apple.com> wrote: > One thing that LNT is doing to help “smooth” the results for you is by > presenting the min of the data at a particular revision, which (hopefully) > is approximating the actual runtime without noise. > That's an interesting idea, as you said, if you run multiple times on every revision. On ARM,
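The "min of the data at a particular revision" idea quoted above can be sketched in a few lines of C. This is only an illustration of the statistic, not LNT's code; the function name `min_sample` and the sample values in the usage note are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the smallest of n timing samples taken at
 * one revision. Timing noise usually only adds time, so the minimum is a
 * common estimate of the undisturbed run time. */
double min_sample(const double *samples, size_t n) {
    assert(samples != NULL && n > 0);
    double best = samples[0];
    for (size_t i = 1; i < n; i++) {
        if (samples[i] < best)
            best = samples[i];
    }
    return best;
}
```

With five made-up samples such as 1.31, 1.27, 1.52, 1.27 and 1.40 seconds, the helper reports 1.27. The estimate is only as good as the number of runs per revision, which is the concern raised in the reply.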
2013 Nov 03
0
[LLVMdev] loop vectorizer issue
Notice that the code you provided, for globals and stack allocations, at least, is semantically equivalent to: int a = d[19]; for(int k = 20; k < 50; k++) dataY[k] = a; Like so, the load you see missing was redundant, probably hoisted by GVN/PRE and replaced with "%.pre". H. On Sun, Nov 3, 2013 at 11:26 AM, Sara Elshobaky <sara.elshobaky at gmail.com>wrote: >
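The equivalence described in the reply above can be checked with a small C sketch. The array size, the initial values, and the function names are assumptions made for illustration; only the two loop bodies come from the thread.

```c
#include <assert.h>
#include <string.h>

#define N 50

/* The loop as posted: iteration k reads the element written by k-1. */
static void copy_chain(int *dataY) {
    assert(dataY != NULL);
    for (int k = 20; k < 50; k++)
        dataY[k] = dataY[k - 1];
}

/* The forwarded form: dataY[19] is loaded once before the loop, so the
 * loop body only stores and no longer carries a load across iterations. */
static void copy_hoisted(int *dataY) {
    assert(dataY != NULL);
    int a = dataY[19];
    for (int k = 20; k < 50; k++)
        dataY[k] = a;
}

/* Returns 1 when both forms leave identical arrays behind. */
int loops_agree(void) {
    int x[N], y[N];
    for (int i = 0; i < N; i++)
        x[i] = y[i] = i * 3 + 1;   /* arbitrary distinct values */
    copy_chain(x);
    copy_hoisted(y);
    return memcmp(x, y, sizeof x) == 0;
}
```

Every element from index 20 to 49 ends up holding the original value of `dataY[19]` in both versions, which is why a loop that looks dependent in C can still be reported as vectorizable once it has been rewritten this way.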
2013 Nov 03
0
[LLVMdev] loop vectorizer issue
Hi Sarah, the loop vectorizer runs not on the C code but on the LLVM IR that this C code was lowered to. Before the loop vectorizer runs, many other optimizations change the shape of this IR. You can see in the LLVM IR you referenced below that a preceding LLVM IR transformation has changed your loop from: > for(int k=20;k<50;k++) > dataY[k] = dataY[k-1]; to > int a = d[19]; >
2013 Jul 05
0
[LLVMdev] [Polly] Assert in Scope construction
Hi Sergei, On Thu, Jul 4, 2013 at 1:36 AM, Sergei Larin <slarin at codeaurora.org> wrote: > Should have changed the subject line... > > --- > Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted > by > The Linux Foundation > > > > -----Original Message----- > > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at
2013 Nov 08
1
[LLVMdev] loop vectorizer and storing to uniform addresses
I changed the input C to use a 64-bit type for the loop index (this eliminates 'sext' instructions in the IR). Here is the IR produced with clang -O0: define float @foo(i64 %start, i64 %end, float* %A) #0 { entry: %start.addr = alloca i64, align 8 %end.addr = alloca i64, align 8 %A.addr = alloca float*, align 8 %sum = alloca [4 x float], align 16 %i = alloca i64, align 8
2013 Nov 08
0
[LLVMdev] loop vectorizer and storing to uniform addresses
On 7 November 2013 17:18, Frank Winter <fwinter at jlab.org> wrote: > LV: We don't allow storing to uniform addresses > This is triggering because it didn't recognize as a reduction variable during the canVectorizeInstrs() but did recognize that sum[q] is loop invariant in canVectorizeMemory(). I'm guessing the nested loop was unrolled because of the low trip-count, and
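To make the guess above concrete, here is a hand-written C sketch of what the kernel from this thread looks like once the four-iteration inner loop is fully unrolled. This is an illustration only, not compiler output; `foo_unrolled` is a hypothetical name, and `foo` reproduces the posted kernel for comparison.

```c
#include <assert.h>
#include <stddef.h>

/* The kernel as posted in the thread. */
float foo(int start, int end, float *A) {
    assert(A != NULL);
    float sum[4] = {0.f, 0.f, 0.f, 0.f};
    for (int i = start; i < end; ++i)
        for (int q = 0; q < 4; ++q)
            sum[q] += A[i * 4 + q];
    return sum[0] + sum[1] + sum[2] + sum[3];
}

/* Inner loop unrolled by hand: every iteration of the remaining loop now
 * stores to the same four addresses sum[0..3], which do not depend on i.
 * Those are the "uniform addresses" named in the vectorizer message. */
float foo_unrolled(int start, int end, float *A) {
    assert(A != NULL);
    float sum[4] = {0.f, 0.f, 0.f, 0.f};
    for (int i = start; i < end; ++i) {
        sum[0] += A[i * 4 + 0];
        sum[1] += A[i * 4 + 1];
        sum[2] += A[i * 4 + 2];
        sum[3] += A[i * 4 + 3];
    }
    return sum[0] + sum[1] + sum[2] + sum[3];
}
```

Both functions perform the same additions in the same order, so they return the same value. The difference is only in how the stores to `sum` appear to a loop-level analysis: as accumulators kept in memory rather than as scalar reduction variables.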
2013 Nov 08
3
[LLVMdev] loop vectorizer and storing to uniform addresses
I am trying my luck on this global reduction kernel: float foo( int start , int end , float * A ) { float sum[4] = {0.,0.,0.,0.}; for (int i = start ; i < end ; ++i ) { for (int q = 0 ; q < 4 ; ++q ) sum[q] += A[i*4+q]; } return sum[0]+sum[1]+sum[2]+sum[3]; } LV: Checking a loop in "foo" LV: Found a loop: for.cond1 LV: Found an induction variable. LV: We
2013 Jul 26
6
[LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass
Hi Sebastian, Recently, I found the "Polly - Calculate dependences" pass would lead to significant compile-time overhead when compiling some loop-intensive source code. Tobias told me you found similar problem as follows: http://llvm.org/bugs/show_bug.cgi?id=14240 My evaluation shows that "Polly - Calculate dependences" pass consumes 96.4% of total compile-time overhead
2013 Nov 01
2
[LLVMdev] loop vectorizer: this loop is not worth vectorizing
I am trying a setup where the one loop is rewritten as two loops. This avoids the 'rem' and 'div' instructions in the index calculation (which give the loop vectorizer a hard time). However, with this setup the loop vectorizer complains about a too small loop. LV: Checking a loop in "main" LV: Found a loop: L3 LV: Found a loop with a very small trip count. This loop
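The rewrite described above, one loop split into two so that the index no longer needs 'div' and 'rem', might look like the following C sketch. The original code is not shown in the snippet, so the kernel, the names `scale_flat` and `scale_nested`, and the inner extent `INNER` of 4 are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <string.h>

#define INNER 4   /* assumed inner extent; not taken from the thread */

/* Single flat loop: the row and column are recovered with '/' and '%'. */
void scale_flat(float *out, const float *in, int rows) {
    assert(rows >= 0);
    for (int i = 0; i < rows * INNER; i++) {
        int r = i / INNER;
        int q = i % INNER;
        out[r * INNER + q] = in[r * INNER + q] * 2.0f;
    }
}

/* Two nested loops: no division or remainder, but the innermost loop
 * now runs only INNER times per entry. */
void scale_nested(float *out, const float *in, int rows) {
    assert(rows >= 0);
    for (int r = 0; r < rows; r++)
        for (int q = 0; q < INNER; q++)
            out[r * INNER + q] = in[r * INNER + q] * 2.0f;
}

/* Returns 1 when both variants produce identical output. */
int same_result(void) {
    float in[3 * INNER], a[3 * INNER], b[3 * INNER];
    for (int i = 0; i < 3 * INNER; i++)
        in[i] = (float)i + 0.5f;
    scale_flat(a, in, 3);
    scale_nested(b, in, 3);
    return memcmp(a, b, sizeof a) == 0;
}
```

The nested form removes the costly index arithmetic, but its innermost loop has a short, fixed trip count, which is the kind of loop that the "very small trip count" message in the snippet refers to.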
2013 Jul 29
0
[LLVMdev] [Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass
On 07/29/2013 09:15 AM, Sven Verdoolaege wrote: > On Mon, Jul 29, 2013 at 07:37:14AM -0700, Tobias Grosser wrote: >> On 07/29/2013 03:18 AM, Sven Verdoolaege wrote: >>> On Sun, Jul 28, 2013 at 04:42:25PM -0700, Tobias Grosser wrote: >>>> Sven: In terms of making the behaviour of isl easier to understand, >>>> it may make sense to fail/assert in case