thr3ads.net - search: "idxprom"

Reassociation is blocking a vectorization

2019 Nov 10

2

Hi Devs, I am looking at the bug https://bugs.llvm.org/show_bug.cgi?id=43953 and found that the following piece of IR %arrayidx = getelementptr inbounds float, float* %Vec0, i64 %idxprom %0 = load float, float* %arrayidx, align 4, !tbaa !2 %arrayidx2 = getelementptr inbounds float, float* %Vec1, i64 %idxprom %1 = load float, float* %arrayidx2, align 4, !tbaa !2 %sub = fsub fast float %0, %1 %add = fadd fast float %sum.0, %sub is transformed into %arrayidx = getelementpt...

[ScalarEvolution][SCEV] no-wrap flags dependent on order of getSCEV() calls

2017 Aug 08

2

...n discussed before, but I just wanted to check if this is indeed the current expected behavior, and if anyone has any plans/ideas for addressing this issue. For reference, below is a reduced loop where this problem occurs. The SCEV for %i.07.i will have <nuw> or not depending on whether %idxprom.i was computed before it: for.body.i: %i.07.i = phi i32 [ %inc.i, %for.body.i ], [ 0, %for.body.i.preheader ] %idxprom.i = zext i32 %i.07.i to i64 %arrayidx.i = getelementptr inbounds i32, i32* %rot, i64 %idxprom.i %0 = load i32, i32* %arrayidx.i %inc.i = add i32 %i.07.i, 1 %cmp...

[LLVMdev] better code for IV

2014 Feb 19

2

...ry L_entry: ; preds = %L_entry, %L_pre_head %L_ind_var = phi i64 [ 0, %L_pre_head ], [ %L_inc_ind_var, %L_entry ] %L_tid = phi i64 [ 0, %L_pre_head ], [ %L_inc_tid, %L_entry ] %trunc = trunc i64 %L_tid to i32 %idxprom = sext i32 %trunc to i64 %arrayidx = getelementptr inbounds float* %a, i64 %idxprom %0 = load float* %arrayidx, align 4 %arrayidx2 = getelementptr inbounds float* %b, i64 %idxprom %1 = load float* %arrayidx2, align 4 %add = fadd float %0, %1...

[ScalarEvolution][SCEV] no-wrap flags dependent on order of getSCEV() calls

2017 Aug 08

2

...ressing this issue. > > The general issue that SCEV nsw is weird is known... see, for example https://bugs.llvm.org/show_bug.cgi?id=23527. > >> For reference, below is a reduced loop where this problem occurs. The SCEV for %i.07.i will have <nuw> or not depending on whether %idxprom.i was computed before it: > > %idxprom.i, the zext? I'm not sure how you're getting that particular effect. ScalarEvolution::getSCEV for a zext immediately calls getSCEV on its operand. Here is an abridged record of the getSCEV results as seen by each pass with/without preserving...

loop unrolling introduces conditional branch

2015 Aug 22

3

...c, %entry %0 = load i32, i32* %i, align 4 %1 = load i32, i32* %n.addr, align 4 %cmp = icmp slt i32 %0, %1 br i1 %cmp, label %for.body, label %for.end for.body: ; preds = %for.cond %2 = load i32, i32* %i, align 4 %3 = load i32, i32* %i, align 4 %idxprom = sext i32 %3 to i64 %4 = load i32*, i32** %array_x.addr, align 8 %arrayidx = getelementptr inbounds i32, i32* %4, i64 %idxprom store i32 %2, i32* %arrayidx, align 4 br label %for.inc for.inc: ; preds = %for.body %5 = load i32, i32* %i, align 4...

loop unrolling introduces conditional branch

2015 Aug 22

2

...gn 4 > %1 = load i32, i32* %n.addr, align 4 > %cmp = icmp slt i32 %0, %1 > br i1 %cmp, label %for.body, label %for.end > > for.body: ; preds = %for.cond > %2 = load i32, i32* %i, align 4 > %3 = load i32, i32* %i, align 4 > %idxprom = sext i32 %3 to i64 > %4 = load i32*, i32** %array_x.addr, align 8 > %arrayidx = getelementptr inbounds i32, i32* %4, i64 %idxprom > store i32 %2, i32* %arrayidx, align 4 > br label %for.inc > > for.inc: ; preds = %for.body >...

[LLVMdev] Why int variable get promoted to i64

2011 Aug 19

3

Hi all, I found that in some cases an int variable gets promoted to i64, although I think it should be i32. I used the online demo (http://llvm.org/demo). Below is the test case. ------------- test case ------------- int test(int x[], int y[], int n) { int i = 0; int sum = 0; for ( ; i < n; i++) { sum += x[i] * y[i]; } return sum; } ------------------------------------- No
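
For context on the question in this excerpt: on a typical 64-bit target, array indexing is pointer arithmetic on 64-bit addresses, so a 32-bit int index is usually sign-extended to 64 bits before the address is computed (Clang commonly names that value %idxprom). The sketch below writes the widening out by hand in C. The function names dot_implicit and dot_explicit are hypothetical and exist only for illustration; they are not part of the original thread.

```c
#include <stdint.h>

/* Index with a plain int, as in the test case from the thread. */
int dot_implicit(const int x[], const int y[], int n) {
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Same loop with the index widening written explicitly.
 * This mirrors what a "sext i32 ... to i64" feeding a getelementptr does. */
int dot_explicit(const int x[], const int y[], int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        int64_t idx = (int64_t)i;          /* sign-extend the 32-bit index */
        sum += *(x + idx) * *(y + idx);    /* address arithmetic in 64 bits */
    }
    return sum;
}
```

Both functions compute the same result; the only difference is that the second makes the 32-bit to 64-bit conversion visible in the source.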

[LLVMdev] Vectorizing global struct pointers

2013 Feb 05

3

...and phi nodes). Does that make sense? cheers, --renato PS: A simplified version of the IR: %struct.anon = type { [256 x i64], [256 x i64], [256 x i64] } @Foo = common global %struct.anon zeroinitializer, align 8 ... %arrayidx = getelementptr inbounds %struct.anon* @Foo, i32 0, i32 1, i32 %idxprom %0 = load i64* %arrayidx, align 8 %arrayidx2 = getelementptr inbounds %struct.anon* @Foo, i32 0, i32 2, i32 %idxprom %1 = load i64* %arrayidx2, align 8 %mul = mul nsw i64 %1, %0 %arrayidx4 = getelementptr inbounds %struct.anon* @Foo, i32 0, i32 0, i32 %idxprom store i64 %mul, i64* %arra...

[AMDGPU] Strange results with different address spaces

2017 Dec 06

2

...vergence -o - as1.ll Printing analysis 'Divergence Analysis' for function '_ZN5pacxx2v213genericKernelIZL12test_barrieriPPcE3$_0EEvT_': DIVERGENT: %6 = tail call i32 @llvm.amdgcn.workitem.id.x() #0, !range !11 DIVERGENT: %add.i.i.i.i.i = add nsw i32 %mul.i.i.i.i.i, %6 DIVERGENT: %idxprom.i.i.i = sext i32 %add.i.i.i.i.i to i64 DIVERGENT: %8 = getelementptr i32, i32 addrspace(1)* %callable.coerce0, i64 %idxprom.i.i.i DIVERGENT: %9 = load i32, i32 addrspace(1)* %8, align 4 DIVERGENT: %10 = getelementptr [16 x i32], [16 x i32] addrspace(3)* @"_ZN5pacxx2v213genericKernelIZL12tes...

If there are some passes in LLVM do the opposite of the SROA(Scalar Replacement of Aggregates) pass

2019 Feb 21

2

Hi LLVM developers, We tried to find out whether any pass in LLVM does the opposite of the SROA (Scalar Replacement of Aggregates) pass, but did not find one. Is there such a pass to bring back the structure type? Or is this done separately in some transformation passes? Thanks, Lin-Ya

Particular type of loop optimization

2016 Feb 02

5

Dear LLVMers, I am trying to implement a particular type of loop optimization, but I am having problems with global variables. To solve this problem, I would like to know if LLVM has some pass that moves loads outside loops. I will illustrate with an example. I want to transform this code below. I am writing in C for readability, but I am analysing LLVM IR: int *vectorE; void foo (int n) {

Information Loss of Array Type in Function Interface in IR Generated by Clang

2019 Jun 30

2

...define dso_local i32 @_Z1fPii(i32* nocapture readonly %A, i32 %x) local_unnamed_addr #0 !dbg !7 { entry: call void @llvm.dbg.value(metadata i32* %A, metadata !13, metadata !DIExpression()), !dbg !15 call void @llvm.dbg.value(metadata i32 %x, metadata !14, metadata !DIExpression()), !dbg !16 %idxprom = sext i32 %x to i64, !dbg !17 %arrayidx = getelementptr inbounds i32, i32* %A, i64 %idxprom, !dbg !17 %0 = load i32, i32* %arrayidx, align 4, !dbg !17, !tbaa !18 ret i32 %0, !dbg !22 } Best regards, ------------------------------------------ Tingyuan LIANG MPhil Student Department of Elect...

[AMDGPU] Strange results with different address spaces

2017 Dec 05

2

...dressing in as1.ll is incorrectly concluded to be uniform: > > %6 = tail call i32 @llvm.amdgcn.workitem.id.x() #0, !range !11 > %7 = tail call i32 @llvm.amdgcn.workgroup.id.x() #0 > %mul.i.i.i.i.i = mul nsw i32 %7, %3 > %add.i.i.i.i.i = add nsw i32 %mul.i.i.i.i.i, %6 > %idxprom.i.i.i = sext i32 %add.i.i.i.i.i to i64 > %8 = getelementptr i32, i32 addrspace(1)* %callable.coerce0, i64 %idxprom.i.i.i, !amdgpu.uniform !12, !amdgpu.noclobber !12 > > However since this depends on workitem.id.x, it certainly is not > > -Matt Actuall...

[LLVMdev] Vectorizing global struct pointers

2013 Feb 05

0

...> > PS: > > A simplified version of the IR: > > %struct.anon = type { [256 x i64], [256 x i64], [256 x i64] } > > @Foo = common global %struct.anon zeroinitializer, align 8 > > ... > > %arrayidx = getelementptr inbounds %struct.anon* @Foo, i32 0, i32 1, i32 %idxprom > %0 = load i64* %arrayidx, align 8 > %arrayidx2 = getelementptr inbounds %struct.anon* @Foo, i32 0, i32 2, i32 %idxprom > %1 = load i64* %arrayidx2, align 8 > %mul = mul nsw i64 %1, %0 > %arrayidx4 = getelementptr inbounds %struct.anon* @Foo, i32 0, i32 0, i32 %idxprom > st...

memcmp code fragment

2017 May 19

4

...ck[i2]; if (c1 != c2) return (c1 > c2); i1++; i2++; .. .. <repeat 12 times> In LLVM IR it will be following: define i8 @mainGtU(i32 %i1, i32 %i2, i8* readonly %block, i16* nocapture readnone %quadrant, i32 %nblock, i32* nocapture readnone %budget) local_unnamed_addr #0 { entry: %idxprom = zext i32 %i1 to i64 %arrayidx = getelementptr inbounds i8, i8* %block, i64 %idxprom %0 = load i8, i8* %arrayidx, align 1 %idxprom1 = zext i32 %i2 to i64 %arrayidx2 = getelementptr inbounds i8, i8* %block, i64 %idxprom1 %1 = load i8, i8* %arrayidx2, align 1 %cmp = icmp eq i8 %0, %1 b...

[LLVMdev] SimplifyIndVar looses nsw flags

2013 Jun 25

2

...sum += a[i]; return sum; } // *** IR Dump Before Induction Variable Simplification *** // for.body: ; preds = %entry, %for.body // %i.05 = phi i32 [ 0, %entry ], [ %inc, %for.body ] // %sum.04 = phi i32 [ 0, %entry ], [ %add, %for.body ] // %idxprom = sext i32 %i.05 to i64 // %arrayidx = getelementptr inbounds i32* %a, i64 %idxprom // %0 = load i32* %arrayidx, align 4, !tbaa !0 // %add = add nsw i32 %0, %sum.04 // %inc = add nsw i32 %i.05, 1 // %cmp = icmp slt i32 %inc, 1000 // br i1 %cmp, label %for.body, label %for.end // *** IR...

[LLVMdev] Vectorizing global struct pointers

2013 Feb 05

1

On 5 February 2013 17:28, Nadav Rotem <nrotem at apple.com> wrote: > We insert runtime overlap checks only for unidentified objects. The > problem here is that the vectorizer thinks that A,B,C are all pointers to > the same array, so it gives up. If A,B,C were different arrays then it > could have used runtime checks. > Yes, that is exactly the code that creates the
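
The runtime overlap check mentioned in this reply can be pictured in C. This is only a sketch of the general idea under stated assumptions, not the LLVM vectorizer's actual implementation: ranges_overlap and mul_arrays are hypothetical names, and the check ignores address wrap-around. Before taking a fast path, the code tests whether the byte ranges it will touch are disjoint, and otherwise keeps the plain element-by-element loop.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if [a, a+bytes) and [b, b+bytes) share at least one byte. */
static int ranges_overlap(const void *a, const void *b, size_t bytes) {
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    return pa < pb + bytes && pb < pa + bytes;
}

/* out[i] = in2[i] * in1[i], guarded by a runtime overlap check. */
void mul_arrays(int64_t *out, const int64_t *in1, const int64_t *in2, size_t n) {
    size_t bytes = n * sizeof(int64_t);
    if (!ranges_overlap(out, in1, bytes) && !ranges_overlap(out, in2, bytes)) {
        /* Disjoint ranges: iterations are independent, so this loop
         * is a candidate for vectorization. */
        for (size_t i = 0; i < n; i++)
            out[i] = in2[i] * in1[i];
    } else {
        /* Possible overlap: keep strict scalar order. The body is the
         * same here; in generated code this would be the scalar fallback. */
        for (size_t i = 0; i < n; i++)
            out[i] = in2[i] * in1[i];
    }
}
```

The two branches are written identically on purpose: the point is the guard, which lets a compiler treat the first loop as free of cross-iteration dependences.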

[LLVMdev] alias analysis on llvm internal globals

2015 Apr 25

3

...oo.init, align 1 br label %if.end if.end: ; preds = %entry.if.end_crit_edge, %if.then %0 = phi i32* [ %.pre, %entry.if.end_crit_edge ], [ getelementptr inbounds ([2048 x i32]* @foo.local_fooBuf, i64 0, i64 0), %if.then ] %div = sdiv i32 %aconst, 2 %idxprom = sext i32 %div to i64 %arrayidx = getelementptr inbounds i32, i32* %0, i64 %idxprom %cmp110 = icmp sgt i32 %aconst, 0 br i1 %cmp110, label %for.body.preheader, label %for.end, !llvm.loop !5 for.body.preheader: ; preds = %if.end br label %for.body for.body:...

Question about a May-alias case

2018 Jun 13

2

... char *a = buf[3 - idx]; char *b = buf[idx]; *a = *b; c++; *a = *b; } We can imagine the second "*a = *b" could be removed. Let's look at the IR snippet with -O3 for above example. 1 define void @test(i32 %idx) { 2 entry: 3 %sub = sub nsw i32 3, %idx 4 %idxprom = sext i32 %sub to i64 5 %arrayidx = getelementptr inbounds [4 x i8*], [4 x i8*]* @buf, i64 0, i64 %idxprom 6 %0 = load i8*, i8** %arrayidx, align 8 7 %idxprom1 = sext i32 %idx to i64 8 %arrayidx2 = getelementptr inbounds [4 x i8*], [4 x i8*]* @buf, i64 0, i64 %idxprom1 9 ...
