thr3ads.net - search: "body4"

Displaying 20 results from an estimated 21 matches for "body4".


[LLVMdev] loop vectorizer: this loop is not worth vectorizing

2013 Nov 01

[LLVMdev] loop vectorizer: this loop is not worth vectorizing

...i * 2 + 1 ) * inner + q; c[ ir0 ] = a[ ir0 ] + b[ ir0 ]; c[ ir1 ] = a[ ir1 ] + b[ ir1 ]; } } } the loop vectorizer complains as well, but the produced code is vectorized: LV: Checking a loop in "_Z3barmmPfS_S_" LV: Found a loop: for.body4 LV: Found an induction variable. LV: Found unvectorizable type. LV: Can't vectorize the instructions or CFG LV: Not vectorizing. ; Function Attrs: nounwind uwtable define void @_Z3barmmPfS_S_(i64 %start, i64 %end, float* noalias %c, float* noalias %a, float* noalias %b) #3 { entry: %div =...

[LLVMdev] loop vectorizer issue

2013 Nov 03

[LLVMdev] loop vectorizer issue

...imple loop with a clear dependency. But I found that the debug output shows 'we can vectorize this loop'. Here is my loop with the dependency: for(int k=20;k<50;k++) dataY[k] = dataY[k-1]; And the debug prints: LV: Checking a loop in "main" LV: Found a loop: for.body4 LV: Found an induction variable. LV: Found a write-only loop! LV: We can vectorize this loop! ... LV: Vectorization is possible but not beneficial. From the LLVM IR, it contains only one 'store' instruction with '%.pre'. Seems that no 'load' instruction prevente...

[LLVMdev] loop vectorizer: this loop is not worth vectorizing

2013 Nov 01

[LLVMdev] loop vectorizer: this loop is not worth vectorizing

...= a[ ir0 ] + b[ ir0 ]; > c[ ir1 ] = a[ ir1 ] + b[ ir1 ]; > } > } > } > > the loop vectorizer complains as well, but the produced code is > vectorized: > > LV: Checking a loop in "_Z3barmmPfS_S_" > LV: Found a loop: for.body4 > LV: Found an induction variable. > LV: Found unvectorizable type. > LV: Can't vectorize the instructions or CFG > LV: Not vectorizing. > > ; Function Attrs: nounwind uwtable > define void @_Z3barmmPfS_S_(i64 %start, i64 %end, float* noalias %c, > float* noalias %a, fl...

[LLVMdev] loop vectorizer issue

2013 Nov 03

[LLVMdev] loop vectorizer issue

2013 Nov 03

[LLVMdev] loop vectorizer issue

...ug shows that ‘we can vectorize this loop’ > > > > Here you are my loop with dependency: > > for(int k=20;k<50;k++) > > dataY[k] = dataY[k-1]; > > > > And the debug prints: > > LV: Checking a loop in "main" > > LV: Found a loop: for.body4 > > LV: Found an induction variable. > > LV: Found a write-only loop! > > LV: We can vectorize this loop! > > ... > > LV: Vectorization is possible but not beneficial. > > > > From the LLVM IR, it contains only one ‘store’ instruction with ‘%.pre’. > See...

[LLVMdev] Cast to SCEVAddRecExpr

2015 Mar 19

[LLVMdev] Cast to SCEVAddRecExpr

Hi Nick, Thanks for looking into it. I have tried that as well but it didn't work. "AddExpr->getOperand(0))" node is: " (4 * (sext i32 {2,+,2}<%for.body4> to i64))<nsw>" When I cast this to "SCEVAddRecExpr" it returns NULL. Regards, Ashutosh -----Original Message----- From: Nick Lewycky [mailto:nicholas at mxc.ca] Sent: Thursday, March 19, 2015 12:19 PM To: Nema, Ashutosh Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev]...

[LLVMdev] Cast to SCEVAddRecExpr

2015 Mar 19

[LLVMdev] Cast to SCEVAddRecExpr

Yes, I can get "SCEVAddRecExpr" from operands of "(sext i32 {2,+,2}<%for.body4> to i64)". So whenever a SCEV cast to "SCEVAddRecExpr" fails, we have to drill down for such patterns? Is that the right way? Regards, Ashutosh -----Original Message----- From: Nick Lewycky [mailto:nicholas at mxc.ca] Sent: Thursday, March 19, 2015 1:02 PM To: Nema, Ashutosh Cc...

[LLVMdev] loop vectorizer issue

2013 Nov 03

[LLVMdev] loop vectorizer issue

...ntically equivalent and beneficial because we save many loads. We can vectorize the latter loop. You can see in the debug output there is no load in the loop once the loop vectorizer gets to see it: > And the debug prints: > LV: Checking a loop in "main" > LV: Found a loop: for.body4 > LV: Found an induction variable. > LV: Found a write-only loop! // <<<< Write-only. > LV: We can vectorize this loop! > ... > LV: Vectorization is possible but not beneficial. This is not a bug but a great example of how one optimization can enable another. Best, Ar...
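The point of this reply can be sketched in plain C++. This is only an illustration under assumptions: the element type `int`, the container, and the names `copyChain`, `storeOnly` and `makeInput` are hypothetical and do not come from the thread. Each iteration of the original loop reads the value the previous iteration just wrote, so every element in [20, 50) ends up equal to `dataY[19]`. Once that single load is moved in front of the loop, what remains is a store-only loop, which matches the "write-only loop" line in the debug output.

```cpp
#include <cassert>
#include <vector>

// Original form from the thread: each iteration reads the element that the
// previous iteration wrote.
inline void copyChain(std::vector<int> &dataY) {
  assert(dataY.size() >= 50);
  for (int k = 20; k < 50; k++)
    dataY[k] = dataY[k - 1];
}

// Equivalent form once the load is forwarded: one load before the loop
// (playing the role of the '%.pre' value mentioned in the thread), then a
// loop that only stores.
inline void storeOnly(std::vector<int> &dataY) {
  assert(dataY.size() >= 50);
  const int pre = dataY[19];
  for (int k = 20; k < 50; k++)
    dataY[k] = pre;
}

// Hypothetical helper: 64 distinct values so the two forms can be compared.
inline std::vector<int> makeInput() {
  std::vector<int> a(64);
  for (int i = 0; i < 64; i++)
    a[i] = 3 * i + 1;
  return a;
}
```

Running both functions on copies of the same input should leave the two vectors identical, with indices 20 to 49 all holding the original value of index 19 and everything else untouched.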

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

2013 Aug 16

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

...cond22, 0 >> br label %for.cond2.preheader >> for.cond2.preheader: >> %x.019 = phi i32 [ 0, %for.cond2.preheader.lr.ph ], [ %x.1.lcssa, >> %for.inc6 ] >> %a.018 = phi i32 [ 0, %for.cond2.preheader.lr.ph ], [ %inc7, %for.inc6 ] >> br i1 %cmp314, label %for.body4, label %for.inc6 >> for.body4: >> %x.116 = phi i32 [ %inc, %for.body4 ], [ %x.019, %for.cond2.preheader ] >> %b.015 = phi i32 [ %inc5, %for.body4 ], [ 0, %for.cond2.preheader ] >> %inc = add nsw i32 %x.116, 1 >> %inc5 = add nsw i32 %b.015, 1 >> %cmp3 =...

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

2013 Aug 15

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

...%cmp314 = icmp sgt i32 %cond22, 0 > br label %for.cond2.preheader > for.cond2.preheader: > %x.019 = phi i32 [ 0, %for.cond2.preheader.lr.ph ], [ %x.1.lcssa, > %for.inc6 ] > %a.018 = phi i32 [ 0, %for.cond2.preheader.lr.ph ], [ %inc7, %for.inc6 ] > br i1 %cmp314, label %for.body4, label %for.inc6 > for.body4: > %x.116 = phi i32 [ %inc, %for.body4 ], [ %x.019, %for.cond2.preheader ] > %b.015 = phi i32 [ %inc5, %for.body4 ], [ 0, %for.cond2.preheader ] > %inc = add nsw i32 %x.116, 1 > %inc5 = add nsw i32 %b.015, 1 > %cmp3 = icmp slt i32 %inc5, %con...

[LLVMdev] Cast to SCEVAddRecExpr

2015 Mar 19

[LLVMdev] Cast to SCEVAddRecExpr

Hi, I'm trying to cast one of the SCEV nodes to "SCEVAddRecExpr". Every time the cast returns NULL, and I'm unable to do this. SCEV Node: ((4 * (sext i32 {2,+,2}<%for.body4> to i64))<nsw> + %var)<nsw> Casting: const SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(SCEVNode); 'var' is of type float pointer (float*). Without 'sext' it works, but I'm wondering why it is not working in the above case. I'm not sure, is such casting al...

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

2013 Aug 16

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

...r.cond2.preheader >>> for.cond2.preheader: >>> %x.019 = phi i32 [ 0, %for.cond2.preheader.lr.ph ], [ %x.1.lcssa, >>> %for.inc6 ] >>> %a.018 = phi i32 [ 0, %for.cond2.preheader.lr.ph ], [ %inc7, %for.inc6 >>> ] >>> br i1 %cmp314, label %for.body4, label %for.inc6 >>> for.body4: >>> %x.116 = phi i32 [ %inc, %for.body4 ], [ %x.019, %for.cond2.preheader ] >>> %b.015 = phi i32 [ %inc5, %for.body4 ], [ 0, %for.cond2.preheader ] >>> %inc = add nsw i32 %x.116, 1 >>> %inc5 = add nsw i32 %b.015, 1...

[LLVMdev] IR optimization pass ideas for backend porting before ISel

2012 Jul 30

[LLVMdev] IR optimization pass ideas for backend porting before ISel

...n>= 1 which r2 stands for arr, r7 stands for the next address, and ElementSize of int type is 4. However, LLVM GEP adopts a rule as N(n) = arr + n * ElementSize, and may produce several instructions to compute the address. Here is the IR code (bubbleSort-O3.ll) generated by Clang -O3: for.body4: %j.018 = phi i32 [ 0, %for.body4.lr.ph ], [ %add.ptr.sum, %for.inc ] %add.ptr.sum = add i32 %j.018, 1 %add.ptr6 = getelementptr inbounds i32* %arr, i32 %add.ptr.sum %1 = load i32* %add.ptr6, align 4, !tbaa !0 and thus the generated assembly code (bubbleSort-mcore-O3.s) by llc -march=mcore a...
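A rough sketch of the two addressing styles this snippet contrasts. The first function is the rule quoted in the snippet, N(n) = arr + n * ElementSize. The second is a pointer-increment form, which is only one reading of the truncated text ("r7 stands for the next address") and should be treated as an assumption. Addresses are modeled as plain integers, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// ElementSize of int, as stated in the snippet.
constexpr std::uint64_t kElementSize = 4;

// Index-based rule quoted in the snippet: N(n) = arr + n * ElementSize.
// Needs a multiply (or shift) and an add for every access.
inline std::uint64_t addressByIndex(std::uint64_t arr, std::uint64_t n) {
  return arr + n * kElementSize;
}

// Pointer-increment form (assumed reading of the elided text): keep the
// current address in a register and add ElementSize to reach the next one.
inline std::uint64_t addressByStepping(std::uint64_t arr, std::uint64_t n) {
  std::uint64_t addr = arr;
  for (std::uint64_t i = 0; i < n; i++)
    addr += kElementSize;
  return addr;
}
```

Both forms name the same address for every n. The difference is only in how many instructions a backend needs per access, which appears to be the concern raised in the thread.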

[LLVMdev] Cast to SCEVAddRecExpr

2015 Apr 01

[LLVMdev] Cast to SCEVAddRecExpr

...'d venture a guess that this lets LLVM transform a > sign-extend of an add-rec to an add-rec sign-extends. I agree here, we want LLVM to transform sign-extend of add-rec to add-rec of sign extend in possible cases. Currently I don’t see LLVM doing this. i.e.: (sext i32 addrec{2,+,2}<%for.body4> to i64) Will convert it to: addrec{(sext i32 '2' to i64), + , (sext i32 '2' to i64)} <%for.body4> If this looks OK, I'll make changes and come up with a patch. With sign-extend (SCEVSignExtendExpr) similarly we have to take care and handle other casts as well (i.e....
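The "in possible cases" qualifier matters, and a scaled-down model shows why. The sketch below is an assumption-laden illustration, not LLVM code: it uses an 8-bit recurrence widened to 32 bits instead of the thread's i32 to i64 so that the wraparound shows up after a few dozen iterations, and all function names are hypothetical. It compares sext({2,+,2}) with {sext(2),+,sext(2)} iteration by iteration.

```cpp
#include <cassert>
#include <cstdint>

// sext({2,+,2}): evaluate the recurrence in 8 bits (wrapping modulo 256),
// then sign-extend the 8-bit result to 32 bits.
inline std::int32_t sextOfNarrowAddRec(std::uint32_t n) {
  const std::uint8_t bits = static_cast<std::uint8_t>(2u + 2u * n);
  return bits >= 128 ? static_cast<std::int32_t>(bits) - 256
                     : static_cast<std::int32_t>(bits);
}

// {sext(2),+,sext(2)}: extend start and step first, then evaluate the
// recurrence in 32 bits.
inline std::int32_t wideAddRec(std::uint32_t n) {
  assert(n < (1u << 30)); // keep 2 + 2 * n inside the int32_t range
  return 2 + 2 * static_cast<std::int32_t>(n);
}

// First iteration below `limit` at which the two forms disagree, or `limit`
// if they agree on every iteration checked.
inline std::uint32_t firstMismatch(std::uint32_t limit) {
  for (std::uint32_t n = 0; n < limit; n++)
    if (sextOfNarrowAddRec(n) != wideAddRec(n))
      return n;
  return limit;
}
```

In this model the two forms agree up to iteration 62 and diverge at iteration 63, where the narrow recurrence wraps from 126 to -128 while the wide one reaches 128. The rewrite proposed in the thread is therefore only sound when the narrow recurrence is known not to wrap in the signed sense over the loop's trip count.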

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

2013 Aug 15

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

...%cmp314 = icmp sgt i32 %cond22, 0 br label %for.cond2.preheader for.cond2.preheader: %x.019 = phi i32 [ 0, %for.cond2.preheader.lr.ph ], [ %x.1.lcssa, %for.inc6 ] %a.018 = phi i32 [ 0, %for.cond2.preheader.lr.ph ], [ %inc7, %for.inc6 ] br i1 %cmp314, label %for.body4, label %for.inc6 for.body4: %x.116 = phi i32 [ %inc, %for.body4 ], [ %x.019, %for.cond2.preheader ] %b.015 = phi i32 [ %inc5, %for.body4 ], [ 0, %for.cond2.preheader ] %inc = add nsw i32 %x.116, 1 %inc5 = add nsw i32 %b.015, 1 %cmp3 = icmp slt i32 %...

[LLVMdev] Cast to SCEVAddRecExpr

2015 Mar 31

[LLVMdev] Cast to SCEVAddRecExpr

...--- From: Nick Lewycky [mailto:nicholas at mxc.ca] Sent: Friday, March 20, 2015 9:51 AM To: Nema, Ashutosh Cc: llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] Cast to SCEVAddRecExpr Nema, Ashutosh wrote: > Yes, I can get "SCEVAddRecExpr" from operands of "(sext i32 {2,+,2}<%for.body4> to i64)". > > So whenever a SCEV cast to "SCEVAddRecExpr" fails, we have to drill down for such patterns? I don't know, what are you planning to do with it? Even if you do drill down and find the addrec inside, is that useful to you? SCEV will already try to hoist out...

[LLVMdev] Help with hazards

2011 Dec 14

[LLVMdev] Help with hazards

...i8*)*)(i8* getelementptr inbounds ([6 x i8]* @.str, i32 0, i32 0)) nounwind %call1 = tail call i32 @clock() nounwind br label %for.cond2.preheader for.cond2.preheader: ; preds = %for.end, %entry %nl.014 = phi i32 [ 0, %entry ], [ %inc8, %for.end ] br label %for.body4 for.body4: ; preds = %for.body4, %for.cond2.preheader %i.013 = phi i32 [ 0, %for.cond2.preheader ], [ %inc.15, %for.body4 ] %arrayidx = getelementptr inbounds [16000 x double]* @Y, i32 0, i32 %i.013 %0 = load double* %arrayidx, align 16, !tbaa !0 %add...

Delinearization validity checks in DependenceAnalysis

2019 May 22

Delinearization validity checks in DependenceAnalysis

...and jam might think it was safe to reorder (unroll and jam) and miscompile the code. Without the delinearization validity checks, the access functions A[i*M + j] and A[i*M + j - 1] would get delinearized as follows: SrcSCEV = {{((4 * %M) + %A)<nsw>,+,(4 * %M)}<%for.body>,+,4}<%for.body4> DstSCEV = {{(-4 + (4 * %M) + %A),+,(4 * %M)}<%for.body>,+,4}<%for.body4> SrcSubscripts: {1,+,1}<%for.body>{0,+,1}<%for.body4> DstSubscripts: {1,+,1}<%for.body>{-1,+,1}<%for.body4> delinearized subscript 0 src = {1,+,1}<%for.body> dst = {1,+,1}<%for....
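Why the delinearized subscripts above cannot be trusted without a range check can be illustrated with a small sketch. It assumes a row-major array with M columns, and the function names are hypothetical. The destination's inner subscript starts at -1, which is outside [0, M), so the pair (i, -1) actually names the last element of the previous row. Comparing subscripts dimension by dimension would miss that.

```cpp
#include <cassert>
#include <cstdint>

// Linear offset of element [row][col] in a row-major array with M columns,
// i.e. the i*M + j form used by the access functions in the thread.
inline std::int64_t linearOffset(std::int64_t row, std::int64_t col,
                                 std::int64_t M) {
  assert(M > 0);
  return row * M + col;
}

// Validity condition for an inner subscript: it must stay inside its
// dimension, 0 <= col < M.
inline bool innerSubscriptInRange(std::int64_t col, std::int64_t M) {
  return col >= 0 && col < M;
}

// True if two different subscript pairs refer to the same memory location.
inline bool aliasesDespiteDifferentSubscripts(std::int64_t r1, std::int64_t c1,
                                              std::int64_t r2, std::int64_t c2,
                                              std::int64_t M) {
  return (r1 != r2 || c1 != c2) &&
         linearOffset(r1, c1, M) == linearOffset(r2, c2, M);
}
```

With M = 8, for example, the pairs (1, -1) and (0, 7) both map to linear offset 7. A dependence test that treats the row and column subscripts independently would conclude that accesses in different rows never overlap, which is the kind of wrong answer the validity checks are meant to rule out.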

Delinearization validity checks in DependenceAnalysis

2019 May 16

Delinearization validity checks in DependenceAnalysis

Hello Under the proviso that it's been a while since I looked into any of these things... On 05/15, Bardia Mahjour via llvm-dev wrote: > I also get correct results for my example (for a 64-bit target) if the upper > bounds are changed to unsigned. The reason is simply because clang zero-extends > `m` for address calculations but sign-extends it for the loop upper bound. This >

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

2013 Aug 16

[LLVMdev] [Polly] Analysis of extra compile-time overhead for simple nested loops

...%cmp314 = icmp sgt i32 %cond22, 0 > br label %for.cond2.preheader > for.cond2.preheader: > %x.019 = phi i32 [ 0, %for.cond2.preheader.lr.ph ], [ %x.1.lcssa, %for.inc6 ] > %a.018 = phi i32 [ 0, %for.cond2.preheader.lr.ph ], [ %inc7, %for.inc6 ] > br i1 %cmp314, label %for.body4, label %for.inc6 > for.body4: > %x.116 = phi i32 [ %inc, %for.body4 ], [ %x.019, %for.cond2.preheader ] > %b.015 = phi i32 [ %inc5, %for.body4 ], [ 0, %for.cond2.preheader ] > %inc = add nsw i32 %x.116, 1 > %inc5 = add nsw i32 %b.015, 1 > %cmp3 = icmp slt i32 %inc5,...
