Displaying 20 results from an estimated 55 matches for "add8".

2013 Oct 30 (2 replies) [LLVMdev] loop vectorizer
...ter comparisons.
LV: Checking memory dependencies
LV: Bad stride - Not an AddRecExpr pointer %arrayidx11 = getelementptr inbounds float* %c, i64 %add2 SCEV: ((4 * %add2)<nsw> + %c)<nsw>
LV: Bad stride - Not an AddRecExpr pointer %arrayidx15 = getelementptr inbounds float* %c, i64 %add8 SCEV: ((4 * %add8)<nsw> + %c)<nsw>
LV: Src Scev: ((4 * %add2)<nsw> + %c)<nsw>Sink Scev: ((4 * %add8)<nsw> + %c)<nsw>(Induction step: 0)
LV: Distance for store float %add10, float* %arrayidx11, align 4 to store float %add14, float* %arrayidx15, align 4: ((4...

2011 Jul 05 (3 replies) [LLVMdev] optimizer returning wrong variable?
...%Entry
store i32 0, i32* %y
br label %Repeat2
Repeat2: ; preds = %Until2, %Repeat1
%x3 = load i32* %x
%y4 = load i32* %y
%add = add i32 %x3, %y4
%c5 = load i32* %c
%add6 = add i32 %c5, %add
store i32 %add6, i32* %c
%y7 = load i32* %y
%add8 = add i32 %y7, 1
store i32 %add8, i32* %y
br label %Until2
Until2: ; preds = %Repeat2
%y9 = load i32* %y
%b10 = load i32* %b2
%cond = icmp sge i32 %y9, %b10
br i1 %cond, label %Repeat2, label %Untilcmp2
Untilcmp2:...

2013 Oct 30 (3 replies) [LLVMdev] loop vectorizer
...oremerge10 = phi i64 [ %inc, %for.body ], [ %start, %entry ]
> %div = lshr i64 %storemerge10, 2
> %mul1 = shl i64 %div, 3
> %rem = and i64 %storemerge10, 3
> %add2 = or i64 %mul1, %rem
> %0 = lshr i64 %storemerge10, 1
> %add51 = shl i64 %0, 2
> %mul6 = or i64 %rem, %add51
> %add8 = or i64 %mul6, 4
> %arrayidx = getelementptr inbounds float* %a, i64 %add2
> %1 = load float* %arrayidx, align 4
> %arrayidx9 = getelementptr inbounds float* %b, i64 %add2
> %2 = load float* %arrayidx9, align 4
> %add10 = fadd float %1, %2
> %arrayidx11 = getelementptr inbounds f...

2013 Oct 30 (0 replies) [LLVMdev] loop vectorizer
...i1 %cmp14, label %for.body, label %for.end
for.body: ; preds = %entry, %for.body
%i.015 = phi i64 [ %inc, %for.body ], [ %start, %entry ]
%div = lshr i64 %i.015, 2
%mul = shl i64 %div, 3
%rem = and i64 %i.015, 3
%add2 = or i64 %mul, %rem
%add8 = or i64 %add2, 4
%arrayidx = getelementptr inbounds float* %a, i64 %add2
%0 = load float* %arrayidx, align 4
%arrayidx9 = getelementptr inbounds float* %b, i64 %add2
%1 = load float* %arrayidx9, align 4
%add10 = fadd float %0, %1
%arrayidx11 = getelementptr inbounds float* %c, i6...

2011 Jul 05 (0 replies) [LLVMdev] optimizer returning wrong variable?
...il1, %Entry
> store i32 0, i32* %y
> br label %Repeat2
>
> Repeat2: ; preds = %Until2, %Repeat1
> %x3 = load i32* %x
> %y4 = load i32* %y
> %add = add i32 %x3, %y4
> %c5 = load i32* %c
> %add6 = add i32 %c5, %add
> store i32 %add6, i32* %c
> %y7 = load i32* %y
> %add8 = add i32 %y7, 1
> store i32 %add8, i32* %y
> br label %Until2
>
> Until2: ; preds = %Repeat2
> %y9 = load i32* %y
> %b10 = load i32* %b2
> %cond = icmp sge i32 %y9, %b10
> br i1 %cond, label %Repeat2, label %Untilcmp2
>
> Untilcmp2: ; preds = %Until2
> %x11 = load...

2013 Oct 30 (0 replies) [LLVMdev] loop vectorizer
...ns.
> LV: Checking memory dependencies
> LV: Bad stride - Not an AddRecExpr pointer %arrayidx11 = getelementptr inbounds float* %c, i64 %add2 SCEV: ((4 * %add2)<nsw> + %c)<nsw>
> LV: Bad stride - Not an AddRecExpr pointer %arrayidx15 = getelementptr inbounds float* %c, i64 %add8 SCEV: ((4 * %add8)<nsw> + %c)<nsw>
> LV: Src Scev: ((4 * %add2)<nsw> + %c)<nsw>Sink Scev: ((4 * %add8)<nsw> + %c)<nsw>(Induction step: 0)
> LV: Distance for store float %add10, float* %arrayidx11, align 4 to store float %add14, float* %arrayidx15, align...

2013 Oct 30 (2 replies) [LLVMdev] loop vectorizer
..., label %for.end
>
> for.body: ; preds = %entry, %for.body
> %i.015 = phi i64 [ %inc, %for.body ], [ %start, %entry ]
> %div = lshr i64 %i.015, 2
> %mul = shl i64 %div, 3
> %rem = and i64 %i.015, 3
> %add2 = or i64 %mul, %rem
> %add8 = or i64 %add2, 4
> %arrayidx = getelementptr inbounds float* %a, i64 %add2
> %0 = load float* %arrayidx, align 4
> %arrayidx9 = getelementptr inbounds float* %b, i64 %add2
> %1 = load float* %arrayidx9, align 4
> %add10 = fadd float %0, %1
> %arrayidx11 = getelementptr inbo...

2013 Oct 30 (0 replies) [LLVMdev] loop vectorizer
...%for.body
%storemerge10 = phi i64 [ %inc, %for.body ], [ %start, %entry ]
%div = lshr i64 %storemerge10, 2
%mul1 = shl i64 %div, 3
%rem = and i64 %storemerge10, 3
%add2 = or i64 %mul1, %rem
%0 = lshr i64 %storemerge10, 1
%add51 = shl i64 %0, 2
%mul6 = or i64 %rem, %add51
%add8 = or i64 %mul6, 4
%arrayidx = getelementptr inbounds float* %a, i64 %add2
%1 = load float* %arrayidx, align 4
%arrayidx9 = getelementptr inbounds float* %b, i64 %add2
%2 = load float* %arrayidx9, align 4
%add10 = fadd float %1, %2
%arrayidx11 = getelementptr inbounds float* %c, i6...

2013 Oct 30 (3 replies) [LLVMdev] loop vectorizer
On 30 October 2013 09:25, Nadav Rotem <nrotem at apple.com> wrote:
> The access pattern to arrays a and b is non-linear. Unrolled loops are
> usually handled by the SLP-vectorizer. Are ir0 and ir1 consecutive for all
> values of i?
>
Based on his list of values, it seems that the induction stride is linear
within each block of 4 iterations, but it's not a clear
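
To make the discussion concrete, here is a hedged C reconstruction of the loop from the IR snippets above (the function name, types, and bounds are assumptions): ir0 walks 0,1,2,3, 8,9,10,11, ... and ir1 = ir0 + 4, so the stride is 1 inside each block of 4 iterations but jumps by 5 across block boundaries, which is why SCEV cannot express either pointer as a single AddRecExpr.

    /* Hypothetical reconstruction, not the poster's source. */
    void foo(float *a, float *b, float *c, long start, long n) {
        for (long i = start; i < n; i++) {
            long ir0 = (i >> 2) * 8 + (i & 3); /* %div, %mul, %rem, %add2 */
            long ir1 = ir0 | 4;                /* %add8 (== ir0 + 4 here,
                                                  since bit 2 of ir0 is 0) */
            c[ir0] = a[ir0] + b[ir0];          /* %add10 */
            c[ir1] = a[ir1] + b[ir1];          /* %add14 */
        }
    }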

2014 Nov 28 (2 replies) [LLVMdev] ScalarEvolution: Suboptimal handling of globals
...0;
    int newInc = incr + 1;
    for (int i = 0; i < 1000; i++) {
        for (int j = 0; j < 1000; j += incr) {
            x += (Arr[i] + Arr[j]);
        }
    }
    return x;
}
=================================
The SCEV expression computed for the variable "j" is:
%j.0 = phi i32 [ 0, %for.body ], [ %add8, %for.inc ]  -->  %j.0  Exits: <<Unknown>>
As is evident, this isn't a useful computation. Whereas if I use the variable newInc as the increment for "j", i.e., "j += newInc" in the inner loop, the computed SCEV is %j.0 = phi i32 [ 0, %for.body...
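
For reference, a self-contained version of the truncated example; declaring incr as a global (per the subject line) and the Arr definition are my assumptions:

    int incr;        /* assumed global increment (nonzero when run) */
    int Arr[1000];   /* assumed array definition */

    int test(void) {
        int x = 0;
        int newInc = incr + 1;
        for (int i = 0; i < 1000; i++) {
            /* With "j += incr", SCEV reports the <<Unknown>> expression
               quoted above; with "j += newInc" it reportedly computes a
               usable recurrence. */
            for (int j = 0; j < 1000; j += incr) {
                x += (Arr[i] + Arr[j]);
            }
        }
        return x;
    }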

2013 Oct 30 (0 replies) [LLVMdev] loop vectorizer
...r.body: ; preds = %entry, %for.body
>> %i.015 = phi i64 [ %inc, %for.body ], [ %start, %entry ]
>> %div = lshr i64 %i.015, 2
>> %mul = shl i64 %div, 3
>> %rem = and i64 %i.015, 3
>> %add2 = or i64 %mul, %rem
>> %add8 = or i64 %add2, 4
>> %arrayidx = getelementptr inbounds float* %a, i64 %add2
>> %0 = load float* %arrayidx, align 4
>> %arrayidx9 = getelementptr inbounds float* %b, i64 %add2
>> %1 = load float* %arrayidx9, align 4
>> %add10 = fadd float %0, %1
>> %a...

2008 Jul 08 (3 replies) [LLVMdev] Implementing llvm.atomic.cmp.swap.i32 on PowerPC
...> The "let Uses = [R0]" is not needed. The pseudo instruction will be
> expanded like this later:
>
> + BuildMI(BB, TII->get(is64bit ? PPC::LDARX : PPC::LWARX), dest)
> + .addReg(ptrA).addReg(ptrB);
> + BuildMI(BB, TII->get(is64bit ? PPC::ADD4 : PPC::ADD8), PPC::R0)
> + .addReg(incr).addReg(dest);
> + BuildMI(BB, TII->get(is64bit ? PPC::STDCX : PPC::STWCX))
> + .addReg(PPC::R0).addReg(ptrA).addReg(ptrB);
>
> The second instruction defines R0 and the 3rd reads R0 which is
> enough to tell the register allocator what...

2013 Nov 13 (2 replies) [LLVMdev] SCEV getMulExpr() not propagating Wrap flags
...= load i32* %arrayidx, align 4, !tbaa !1
%add = add nsw i32 %1, %I
%arrayidx3 = getelementptr inbounds i32* %a, i64 %0
store i32 %add, i32* %arrayidx3, align 4, !tbaa !1
%2 = or i64 %0, 1
%arrayidx7 = getelementptr inbounds i32* %b, i64 %2
%3 = load i32* %arrayidx7, align 4, !tbaa !1
%add8 = add nsw i32 %3, %I
%arrayidx12 = getelementptr inbounds i32* %a, i64 %2
store i32 %add8, i32* %arrayidx12, align 4, !tbaa !1
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %indvars.iv.next, 512
br i1 %exitcond, label %for.end, label %for.body
And, when going...
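
The snippet reads like a loop unrolled by two over an even/odd index pair; a hedged C equivalent follows (the even-index definition of %0 is an inference, since the snippet cuts off before it):

    void addI(int *a, int *b, int I) {
        for (long iv = 0; iv < 512; iv++) {   /* %indvars.iv, %exitcond */
            long even = 2 * iv;               /* assumed definition of %0 */
            long odd  = even | 1;             /* %2 = or i64 %0, 1 */
            a[even] = b[even] + I;            /* %add */
            a[odd]  = b[odd]  + I;            /* %add8 */
        }
    }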

2010 Jun 05 (0 replies) [LLVMdev] Converting into SSA form
...; preds = %if.else, %if.then
%a.0 = phi i32 [ 6, %if.then ], [ 2, %if.else ] ; <i32> [#uses=1]
%b.0 = phi i32 [ 3, %if.then ], [ 4, %if.else ] ; <i32> [#uses=1]
%c.0 = phi i32 [ %add, %if.then ], [ %mul, %if.else ] ; <i32> [#uses=1]
%add8 = add nsw i32 %c.0, %a.0 ; <i32> [#uses=1]
%add11 = add nsw i32 %add8, %b.0 ; <i32> [#uses=1]
ret i32 %add11
}
In order to preserve the constants' names from the original program,
mem2reg would have to insert operations like
%a.0 = bitcast i32 2...

2010 Jun 05 (2 replies) [LLVMdev] Converting into SSA form
Suppose my input function is like:

myfunc(int x, int y) {
    int a = 2, b = 3, c = 5;
    if (x > y) {
        c = a + b;
        a = 6;
    }
    else {
        c = a * b;
        b = 4;
    }
    a = c + a;
    c = a + b;
}
and the output should be:

myfunc(int x, int y) {
    int a.0 = 2, b.0 = 3, c.0 = 5;
    if (x > y) {
        c.1 = a.0 + b.0;
        a.1 = 6;
    }
    else {
        c.2 = a.0 * b.0;
        b.1 = 4;
    }

2008 Jul 08 (0 replies) [LLVMdev] Implementing llvm.atomic.cmp.swap.i32 on PowerPC
...et Uses = [R0]" is not needed. The pseudo instruction will be
>> expanded like this later:
>>
>> + BuildMI(BB, TII->get(is64bit ? PPC::LDARX : PPC::LWARX), dest)
>> + .addReg(ptrA).addReg(ptrB);
>> + BuildMI(BB, TII->get(is64bit ? PPC::ADD4 : PPC::ADD8), PPC::R0)
>> + .addReg(incr).addReg(dest);
>> + BuildMI(BB, TII->get(is64bit ? PPC::STDCX : PPC::STWCX))
>> + .addReg(PPC::R0).addReg(ptrA).addReg(ptrB);
>>
>> The second instruction defines R0 and the 3rd reads R0 which is
>> enough to tell the...

2013 Oct 28 (2 replies) [LLVMdev] loop vectorizer says Bad stride
...%arrayidx4, align 4
%11 = load i32* %i, align 4
%add5 = add nsw i32 256, %11
%idxprom6 = sext i32 %add5 to i64
%12 = load float** %a.addr, align 8
%arrayidx7 = getelementptr inbounds float* %12, i64 %idxprom6
%13 = load float* %arrayidx7, align 4
%14 = load i32* %i, align 4
%add8 = add nsw i32 256, %14
%idxprom9 = sext i32 %add8 to i64
%15 = load float** %b.addr, align 8
%arrayidx10 = getelementptr inbounds float* %15, i64 %idxprom9
%16 = load float* %arrayidx10, align 4
%add11 = fadd float %13, %16
%17 = load i32* %i, align 4
%add12 = add nsw i32 256,...

2015 Feb 26 (6 replies) [LLVMdev] RFC: Loop versioning for LICM
...; preds = %for.body3.lr.ph, %for.body3
%indvars.iv = phi i64 [ %indvars.iv.next, %for.body3 ], [ %2, %for.body3.lr.ph ]
%arrayidx = getelementptr inbounds i32* %var1, i64 %indvars.iv
store i32 %add, i32* %arrayidx, align 4, !tbaa !1
%8 = load i32* %arrayidx7, align 4, !tbaa !1
%add8 = add nsw i32 %8, %add
store i32 %add8, i32* %arrayidx7, align 4, !tbaa !1
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%lftr.wideiv = trunc i64 %indvars.iv to i32
%exitcond = icmp eq i32 %lftr.wideiv, %0
br i1 %exitcond, label %for.inc11, label %for.body3
In versioned loop differen...
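
As a sketch of the idea in C (my illustration, not code from the RFC): the loop is versioned on a runtime no-alias check, so that LICM can promote the loop-invariant memory access to a register in the checked copy.

    /* Original: *p may alias var1[], so the load/store of *p must stay
       inside the loop. */
    void plain(int *var1, int *p, int n, int add) {
        for (int i = 0; i < n; i++) {
            var1[i] = add;
            *p += add;
        }
    }

    /* Versioned: if p provably does not alias var1[0..n), run a fast
       copy where *p is kept in a register. */
    void versioned(int *var1, int *p, int n, int add) {
        if (p < var1 || p >= var1 + n) {   /* runtime alias check */
            int t = *p;                    /* promoted by LICM */
            for (int i = 0; i < n; i++) {
                var1[i] = add;
                t += add;
            }
            *p = t;                        /* store sunk out of the loop */
        } else {
            plain(var1, p, n, add);        /* conservative fallback */
        }
    }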

2008 Jul 04 (0 replies) [LLVMdev] Implementing llvm.atomic.cmp.swap.i32 on PowerPC
...MIC_LOAD_ADD_I32 : Pseudo<
The "let Uses = [R0]" is not needed. The pseudo instruction will be
expanded like this later:
+ BuildMI(BB, TII->get(is64bit ? PPC::LDARX : PPC::LWARX), dest)
+ .addReg(ptrA).addReg(ptrB);
+ BuildMI(BB, TII->get(is64bit ? PPC::ADD4 : PPC::ADD8), PPC::R0)
+ .addReg(incr).addReg(dest);
+ BuildMI(BB, TII->get(is64bit ? PPC::STDCX : PPC::STWCX))
+ .addReg(PPC::R0).addReg(ptrA).addReg(ptrB);
The second instruction defines R0 and the 3rd reads R0 which is enough
to tell the register allocator what to do.
I do have a questio...
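
For context, the LWARX/ADD4/STWCX (or LDARX/ADD8/STDCX on 64-bit) sequence is the load-reserved/store-conditional retry loop behind an atomic fetch-add. In portable C11 terms the semantics are roughly the following (a sketch of the semantics, not the actual PPC lowering):

    #include <stdatomic.h>

    int atomic_load_add_i32(_Atomic int *p, int incr) {
        int old = atomic_load_explicit(p, memory_order_relaxed);
        /* Like stwcx., the weak compare-exchange may fail and retry;
           on failure, `old` is refreshed with the current value. */
        while (!atomic_compare_exchange_weak(p, &old, old + incr))
            ;
        return old;   /* value before the add, as loaded into dest */
    }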

2013 Oct 28 (0 replies) [LLVMdev] loop vectorizer says Bad stride
...%11 = load i32* %i, align 4
> %add5 = add nsw i32 256, %11
> %idxprom6 = sext i32 %add5 to i64
> %12 = load float** %a.addr, align 8
> %arrayidx7 = getelementptr inbounds float* %12, i64 %idxprom6
> %13 = load float* %arrayidx7, align 4
> %14 = load i32* %i, align 4
> %add8 = add nsw i32 256, %14
> %idxprom9 = sext i32 %add8 to i64
> %15 = load float** %b.addr, align 8
> %arrayidx10 = getelementptr inbounds float* %15, i64 %idxprom9
> %16 = load float* %arrayidx10, align 4
> %add11 = fadd float %13, %16
> %17 = load i32* %i, align 4
> %add1...