thr3ads.net - search: "body3"

2015 Feb 26

6

[LLVMdev] RFC: Loop versioning for LICM

...6 var3[i] = var1[j] + var3[i]; 7 } 8 } 9 } At line #6 the store to var3 can be moved out by LICM (promoteLoopAccessesToScalars), but because alias analysis is unsure about the memory accesses, it is unable to move it out. After loop versioning IR: <Versioned Loop> for.body3.loopVersion: ; preds = %for.body3.loopVersion.preheader, %for.body3.loopVersion %indvars.iv.loopVersion = phi i64 [ %indvars.iv.next.loopVersion, %for.body3.loopVersion ], [ %2, %for.body3.loopVersion.preheader ] %arrayidx.loopVersion = getelementptr inbounds i32* %va...
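A rough C sketch of the idea behind loop versioning for this pattern, written by hand rather than taken from the RFC. It assumes the truncated example has an inner loop over j of length n; the names `no_overlap` and `accumulate` and the parameter `n` are hypothetical. A runtime overlap check guards a version of the loop in which the store to var3[i] is promoted to a scalar, and the original loop is kept as the fallback.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when [a, a+n) and [b, b+m) do not overlap. */
static int no_overlap(const int *a, size_t n, const int *b, size_t m) {
    uintptr_t a0 = (uintptr_t)a, a1 = (uintptr_t)(a + n);
    uintptr_t b0 = (uintptr_t)b, b1 = (uintptr_t)(b + m);
    return a1 <= b0 || b1 <= a0;
}

/* Hypothetical stand-in for the inner loop of the RFC's example. */
void accumulate(int *var3, const int *var1, size_t i, size_t n) {
    if (no_overlap(&var3[i], 1, var1, n)) {
        /* Versioned loop: no aliasing, so var3[i] can live in a scalar
           and the store is sunk out of the loop. */
        int tmp = var3[i];
        for (size_t j = 0; j < n; j++)
            tmp = var1[j] + tmp;
        var3[i] = tmp;
    } else {
        /* Original loop: possible aliasing, load and store on every iteration. */
        for (size_t j = 0; j < n; j++)
            var3[i] = var1[j] + var3[i];
    }
}
```

Both branches compute the same result; the check only decides whether the memory traffic on var3[i] can be hoisted.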

2015 Feb 26

1

[LLVMdev] RFC: Loop versioning for LICM

...} > 8 } > 9 } > > At line #6 the store to var3 can be moved out by LICM (promoteLoopAccessesToScalars), > but because alias analysis is unsure about the memory accesses, it is unable to move it out. > > After loop versioning IR: > > <Versioned Loop> > for.body3.loopVersion: ; preds = %for.body3.loopVersion.preheader, %for.body3.loopVersion > %indvars.iv.loopVersion = phi i64 [ %indvars.iv.next.loopVersion, %for.body3.loopVersion ], [ %2, %for.body3.loopVersion.preheader ] > %arrayidx.loopVersion = getelementptr inbound...

2012 Sep 22

0

[LLVMdev] sign extensions, SCEVs, and wrap flags

...today. void miv7(int n, int A[][n], int *B) { for (int i = 0; i < n; i++) for (int j = 0; j < n; j++) { A[2*i][4*j] = i; *B++ = A[8*i][6*j + 1]; } } If I look at the store to A (src) and the load from A (dst), I see the following: src = ((sext i32 {0,+,4}<%for.body3> to i64) + ((zext i32 %n to i64) * (sext i32 {0,+,2}<%for.cond1.preheader> to i64))) dst = ((sext i32 {1,+,6}<%for.body3> to i64) + ((zext i32 %n to i64) * (sext i32 {0,+,8}<%for.cond1.preheader> to i64))) I speculate that some earlier analysis decided th...

2015 Mar 04

2

[LLVMdev] RFC: Loop versioning for LICM

...} > 8 } > 9 } > > At line #6 the store to var3 can be moved out by LICM (promoteLoopAccessesToScalars), > but because alias analysis is unsure about the memory accesses, it is unable to move it out. > > After loop versioning IR: > > <Versioned Loop> > for.body3.loopVersion: ; preds = %for.body3.loopVersion.preheader, %for.body3.loopVersion > %indvars.iv.loopVersion = phi i64 [ %indvars.iv.next.loopVersion, %for.body3.loopVersion ], [ %2, %for.body3.loopVersion.preheader ] > %arrayidx.loopVersion = getelementptr inbound...

2012 Apr 23

0

[LLVMdev] SIV tests in LoopDependence Analysis, Sanjoy's patch

Hi, When I write various test cases and explore how they're handled by the code in LoopDependenceAnalysis::analysePair, I'm surprised. This loop collects pairs of subscripts from the source and destination refs. // Collect GEP operand pairs (FIXME: use GetGEPOperands from BasicAA), adding // trailing zeroes to the smaller GEP, if needed. GEPOpdsTy destOpds, srcOpds;

2015 Apr 24

5

[LLVMdev] Loss of precision with very large branch weights

...ction 'main': block-frequency-info: main - entry: float = 1.0, int = 8 - for.cond: float = 500001.5, int = 4000011 - for.body: float = 500000.5, int = 4000003 - for.inc: float = 500000.5, int = 4000003 - for.end: float = 1.0, int = 8 - for.cond1: float = 250001.5, int = 2000011 - for.body3: float = 250000.5, int = 2000003 - for.inc4: float = 250000.5, int = 2000003 - for.end6: float = 1.0, int = 8 But if I manually modify the frequencies of both to get close to MAX_INT32, the ratios between the frequencies do not reflect reality. For example, if I change branch_weights in both lo...

2018 Jul 07

2

LoopVectorize fails to vectorize more complex loops

...ov[i][j] /= (float_n - 1.0); cov[j][i] = cov[i][j]; } */ } For the first loop I get the following debug info from clang and opt: LV: Checking a loop in "kernel_covariance" from test.c:10:7 LV: Loop hints: force=? width=0 unroll=0 LV: Found a loop: for.body3.us LV: PHI is not a poly recurrence. LV: PHI is not a poly recurrence. LV: Found an unidentified PHI. %2 = phi i16 [ 0, %for.body.us ], [ %add.us, %for.body3.us ], !dbg !48 LV: Can't vectorize the instructions or CFG LV: Not vectorizing: Cannot prove legality. Tha...

2015 Jun 12

4

[LLVMdev] Loop Vectorization and Store-Load Forwarding issue

...lock; i++) { sblock = m + (i << 7); for (j = 16; j < 80; j++) { y[j] = y[j - 2] + y[j - 15]; } } } Part C: <snip> from the debug dump during the LoopAccessAnalysis phase: LAA: Checking memory dependencies LAA: Src Scev: {(8 + %y),+,8}<%for.body3> Sink Scev: {(128 + %y),+,8}<nsw><%for.body3> (Induction step: 1) LAA: Distance for %3 = load i64, i64* %arrayidx6, align 8 to store i64 %add, i64* %arrayidx8, align 8: 120 LAA: Distance 120 that could cause a store-load forwarding conflict -------------- next part ------------...
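The distance of 120 in the dump above can be reproduced with a small C sketch. This is a simplification for illustration, not LoopAccessAnalysis itself: the `AddRec` struct and `const_distance` function are hypothetical, and the start offsets are taken relative to %y. When two affine accesses advance by the same step per iteration, their distance is constant and equals the difference of their start offsets.

```c
#include <stdint.h>

/* Hypothetical model of an add recurrence {start,+,step}, in bytes relative to %y. */
typedef struct {
    int64_t start;
    int64_t step;
} AddRec;

/* Returns 1 and writes the constant byte distance (sink - src) when both
   accesses have the same step; returns 0 when the distance varies per iteration. */
int const_distance(AddRec src, AddRec sink, int64_t *dist) {
    if (src.step != sink.step)
        return 0;
    *dist = sink.start - src.start;
    return 1;
}
```

With src = {8,+,8} (the load of y[j - 15] at j = 16) and sink = {128,+,8} (the store to y[j] at j = 16), the distance is 120 bytes, which is 15 elements of 8 bytes and matches the y[j - 15] subscript.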

2013 Nov 08

1

[LLVMdev] loop vectorizer and storing to uniform addresses

...nd8 for.body: ; preds = %for.cond store i64 0, i64* %q, align 8 br label %for.cond1 for.cond1: ; preds = %for.inc, %for.body %4 = load i64* %q, align 8 %cmp2 = icmp slt i64 %4, 4 br i1 %cmp2, label %for.body3, label %for.end for.body3: ; preds = %for.cond1 %5 = load i64* %i, align 8 %mul = mul nsw i64 %5, 4 %6 = load i64* %q, align 8 %add = add nsw i64 %mul, %6 %7 = load float** %A.addr, align 8 %arrayidx = getelementptr inbounds float* %7, i64 %...

2012 Apr 12

6

[LLVMdev] SIV tests in LoopDependence Analysis, Sanjoy's patch

Hi, Here is a preliminary (monolithic) version you can comment on. This is still buggy, however, and I'll be testing for and fixing bugs over the next few days. I've used your version of the strong siv test. Thanks! -- Sanjoy Das. http://playingwithpointers.com -------------- next part -------------- A non-text attachment was scrubbed... Name: patch.diff Type: application/octet-stream

2010 Apr 04

1

[LLVMdev] Code generators (both llvmc and Jit) get stuck when dealing circular CFG

...; <i32> [#uses=1] %"BrCounter++" = add i32 %12, 1 ; <i32> [#uses=2] store i32 %"BrCounter++", i32* %BranchCounter %13 = icmp ult i32 %"BrCounter++", 10 ; <i1> [#uses=1] br i1 %13, label %Body, label %Exit Body3: ; preds = %Brancher4 %14 = srem i64 40, %1 ; <i64> [#uses=1] %15 = getelementptr inbounds i8* %0, i64 %14 ; <i8*> [#uses=1] %16 = load i8* %15 ; <i8> [#uses=1] %PeekResult...

2012 Dec 10

3

[LLVMdev] [PATCH] Teaching ScalarEvolution to handle IV=add(zext(trunc(IV)), Step)

...define signext i8 @kernel() nounwind readnone { +entry: + br label %for.cond1.preheader + +for.cond1.preheader: ; preds = %entry, %for.inc7 + %storemerge7 = phi i32 [ 0, %entry ], [ %inc8, %for.inc7 ] + %0 = phi i32 [ 0, %entry ], [ %add, %for.inc7 ] + br label %for.body3 + +for.body3: ; preds = %for.cond1.preheader, %for.body3 + %storemerge15 = phi i32 [ 0, %for.cond1.preheader ], [ %inc, %for.body3 ] + %1 = phi i32 [ %0, %for.cond1.preheader ], [ %add, %for.body3 ] + %conv2 = and i32 %1, 255 + %add = add nsw i32 %conv2, %...

2013 Nov 08

0

[LLVMdev] loop vectorizer and storing to uniform addresses

On 7 November 2013 17:18, Frank Winter <fwinter at jlab.org> wrote: > LV: We don't allow storing to uniform addresses > This is triggering because it didn't recognize as a reduction variable during the canVectorizeInstrs() but did recognize that sum[q] is loop invariant in canVectorizeMemory(). I'm guessing the nested loop was unrolled because of the low trip-count, and

2011 Nov 01

0

[LLVMdev] How to make Polly ignore some non-affine memory accesses

...1000) B[j] = i; } printf("Random Value: %d", B[rand() % 1024*1024]); return 0; } running: opt -load ${PATH_TO_POLLY_LIB}/LLVMPolly.dylib -polly-scops -analyze code.preopt.ll I get: Printing analysis 'Polly - Create polyhedral description of Scops' for region: 'for.body3 => for.inc.single_exit' in function 'main': Invalid Scop! 0 libLLVM-3.1svn.dylib 0x0000000103fab905 _ZL15PrintStackTracePv + 53 1 libLLVM-3.1svn.dylib 0x0000000103fabf79 _ZL13SignalHandleri + 361 2 libsystem_c.dylib 0x00007fff94c8acfa _sigtramp + 26 3 libLLVM-3.1svn.dylib 0x00...

2011 Oct 27

2

[LLVMdev] How to make Polly ignore some non-affine memory accesses

Perfect, thank you very much :) 2011/10/26 Tobias Grosser <tobias at grosser.es>: > On 10/24/2011 11:32 PM, Marcello Maggioni wrote: >> >> Strange , with --enable-shared (I use auto tool by the way ...) it gives: >> >> MacBook-Pro-di-Marcello:examples Kariddi$ ./compile_ex.sh >> not_so_simple_loop >> clang (LLVM option parsing): Unknown command line

2013 Nov 08

3

[LLVMdev] loop vectorizer and storing to uniform addresses

I am trying my luck on this global reduction kernel: float foo( int start , int end , float * A ) { float sum[4] = {0.,0.,0.,0.}; for (int i = start ; i < end ; ++i ) { for (int q = 0 ; q < 4 ; ++q ) sum[q] += A[i*4+q]; } return sum[0]+sum[1]+sum[2]+sum[3]; } LV: Checking a loop in "foo" LV: Found a loop: for.cond1 LV: Found an induction variable. LV: We

2016 Apr 20

2

[IndVarSimplify] Narrow IV's are not eliminated resulting in inefficient code

...ine we use: clang++ -mllvm -debug -S -emit-llvm -O3 --target=aarch64-linux-elf indvar_test.cpp -o bad.ir I found that 'WidenIV::widenIVUse' (IndVarSimplify.cpp) may fail to widen narrow IV uses. When the function gets a NarrowUse such as '{(-2 + %inc.lcssa),+,1}<nsw><%for.body3>', it first tries to get a wide recurrence for it via the 'getWideRecurrence' call. 'getWideRecurrence' returns recurrence like this: '{(sext i32 (-2 + %inc.lcssa) to i64),+,1}<nsw><%for.body3>', which is fine by itself. Then a wide use operation is gener...

2014 Dec 11

5

[LLVMdev] dynamic data dependence extraction using llvm

...[j]; a[i][j+2] = x + 1; } } The corresponding simplified llvm-IR is shown below: *Beginning of simplified llvm-IR* entry: ... store i32 0, i32* %j, align 4 br label %for.cond for.cond: ... br ... for.body: store i32 1, i32* %i, align 4 br ... for.cond1: ... for.body3: ... %temp4 = load [10 x i32]** %a.addr, align 8 ... store i32 %add, i32* %arrayidx10, align 4 br ... ... ... *End of simplified llvm-IR* The general idea to obtain the dynamic data dependence is that 1. get and record corresponding load/store addresses; 2. analyze load/store ad...

2014 Dec 11

2

[LLVMdev] dynamic data dependence extraction using llvm

...store i32 0, i32* %j, align4 > > br label %for.cond > > > > for.cond: > > ... > > br ... > > > > for.body: > > store i32 1, i32* %i, align4 > > br ... > > > > for.cond1: > > ... > > > > for.body3: > > ... > > %temp4 = load[10 x i32]** %a.addr, align 8 > > ... > > store i32 %add, i32* %arrayidx10, align4 > > br ... > > > > ... ... > > *End of simplified llvm-IR* > > > > The general idea to obtain the dynamic data...

2014 Dec 12

2

[LLVMdev] dynamic data dependence extraction using llvm

...store i32 0, i32* %j, align4 > > br label %for.cond > > > > for.cond: > > ... > > br ... > > > > for.body: > > store i32 1, i32* %i, align4 > > br ... > > > > for.cond1: > > ... > > > > for.body3: > > ... > > %temp4 = load[10 x i32]** %a.addr, align 8 > > ... > > store i32 %add, i32* %arrayidx10, align4 > > br ... > > > > ... ... > > *End of simplified llvm-IR* > > > > The general idea to obtain the dynamic data...
