search for: extractelement

Displaying 20 results from an estimated 239 matches for "extractelement".

2014 Mar 17
2
[LLVMdev] Improving SLPVectorizer for Julia
...und in the Julia discussion https://github.com/JuliaLang/julia/issues/5857 . Here is an example of the kind of LLVM code I wish to vectorize. ------------------------------------------------------------- define <4 x float> @julia_foo111(<4 x float>, <4 x float>) { top: %2 = extractelement <4 x float> %0, i32 0 %3 = extractelement <4 x float> %1, i32 0 %4 = fadd float %2, %3 %5 = insertelement <4 x float> undef, float %4, i32 0 %6 = extractelement <4 x float> %0, i32 1 %7 = extractelement <4 x float> %1, i32 1 %8 = fadd float %6, %7...
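The snippet shows a <4 x float> addition that has been scalarized into extract / fadd / insert chains, one lane at a time. The following is a minimal semantic sketch, not LLVM code. It assumes the truncated pattern continues for all four lanes and models a vector as a Python tuple (Python floats are doubles, not 32-bit floats). It checks that the lane-by-lane chain computes the same result as the single vector `fadd` an SLP vectorizer would aim to emit.

```python
# Semantic sketch only: an LLVM <4 x float> is modelled as a Python tuple.

def extractelement(vec, idx):
    # Only in-range indices (0 <= idx < len(vec)) are modelled.
    return vec[idx]

def insertelement(vec, val, idx):
    lanes = list(vec)
    lanes[idx] = val
    return tuple(lanes)

def scalarized_fadd(a, b):
    # Mirrors the extract / fadd / insert chain from the post, lane by lane.
    result = (0.0,) * len(a)  # placeholder for `undef`
    for i in range(len(a)):
        s = extractelement(a, i) + extractelement(b, i)
        result = insertelement(result, s, i)
    return result

def vector_fadd(a, b):
    # The single `fadd <4 x float> %0, %1` an SLP vectorizer would aim for.
    return tuple(x + y for x, y in zip(a, b))

a = (1.0, 2.0, 3.0, 4.0)
b = (0.5, 0.25, 0.125, 0.0625)
print(scalarized_fadd(a, b) == vector_fadd(a, b))  # True
```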
2015 Jun 26
3
[LLVMdev] extractelement causes memory access violation - what to do?
Hi, Let's have a simple program:
    define i32 @main(i32 %n, i64 %idx) {
      %idxSafe = trunc i64 %idx to i5
      %r = extractelement <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>, i64 %idx
      ret i32 %r
    }
The assembly of that would be:
    pcmpeqd %xmm0, %xmm0
    movdqa  %xmm0, -24(%rsp)
    movl    -24(%rsp,%rsi,4), %eax
    retq
The language reference states that the extractelement instruction produces an undefined value in case the index...
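To see why the assembly above can fault, here is a toy model of that lowering: the vector is spilled to a stack slot and a 4-byte load is issued at `slot + 4*idx` with no bounds check. The 64-byte "stack", the slot offset, and the neighbouring value are hypothetical stand-ins for the real `-24(%rsp)` frame, and `MemoryError` stands in for a real access violation.

```python
import struct

def lowered_extract(stack, slot, idx):
    """Toy model of the x86 sequence in the post:
    movdqa %xmm0, slot              ; spill the whole vector
    movl   slot(%rsp,%rsi,4), %eax  ; load 4 bytes at slot + 4*idx
    Nothing checks idx against the vector length."""
    addr = slot + 4 * idx
    if addr < 0 or addr + 4 > len(stack):
        # Stands in for a real access violation outside mapped memory.
        raise MemoryError(f"access violation at offset {addr}")
    return struct.unpack_from("<i", stack, addr)[0]

# A hypothetical 64-byte stack; the 16-byte spill slot starts at offset 16.
stack = bytearray(64)
struct.pack_into("<4i", stack, 16, -1, -1, -1, -1)  # pcmpeqd: all-ones lanes
struct.pack_into("<i", stack, 32, 1234)             # unrelated neighbouring data

print(lowered_extract(stack, 16, 2))  # -1: an in-range lane
print(lowered_extract(stack, 16, 4))  # 1234: silently reads past the vector
```

An index just past the end silently returns unrelated stack contents. A large index leaves the mapped region altogether, which is the crash the thread reports.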
2015 Jun 30
2
[LLVMdev] extractelement causes memory access violation - what to do?
...david.majnemer at gmail.com> wrote: > On Fri, Jun 26, 2015 at 7:00 AM, Paweł Bylica <chfast at gmail.com> wrote: > >> Hi, >> >> Let's have a simple program: >> define i32 @main(i32 %n, i64 %idx) { >> %idxSafe = trunc i64 %idx to i5 >> %r = extractelement <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>, i64 %idx >> ret i32 %r >> } >> >> The assembly of that would be: >> pcmpeqd %xmm0, %xmm0 >> movdqa %xmm0, -24(%rsp) >> movl -24(%rsp,%rsi,4), %eax >> retq >> >> The language referen...
2015 Jun 26
2
[LLVMdev] extractelement causes memory access violation - what to do?
... > > On Fri, Jun 26, 2015 at 7:00 AM, Paweł Bylica <chfast at gmail.com> wrote: > > Hi, > > Let's have a simple program: > define i32 @main(i32 %n, i64 %idx) { > %idxSafe = trunc i64 %idx to i5 > %r = extractelement <4 x i32> <i32 -1, i32 -1, i32 -1, i32 -1>, > i64 %idx > ret i32 %r > } > > The assembly of that would be: > pcmpeqd %xmm0, %xmm0 > movdqa %xmm0, -24(%rsp) > movl -24(%rsp,%rsi,4), %eax > retq > > The language reference...
2013 Feb 04
6
[LLVMdev] Vectorizer using Instruction, not opcodes
...dcast.splatinsert = insertelement <4 x i32> undef, i32 %index, i32 0 %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer %induction = add <4 x i32> %broadcast.splat, <i32 0, i32 1, i32 2, i32 3> %0 = extractelement <4 x i32> %induction, i32 0 %1 = getelementptr inbounds [256 x i32]* %b, i32 0, i32 %0 %2 = insertelement <4 x i32*> undef, i32* %1, i32 0 %3 = extractelement <4 x i32> %induction, i32 1 %4 = getelementptr inbounds [256 x i32]* %b, i32 0, i32 %3 %5 = insertelement <4...
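The vectorized loop body above builds an induction vector (a splat of `%index` plus `<0, 1, 2, 3>`) and then turns each lane into an address through an extractelement -> getelementptr -> insertelement chain. This is a minimal semantic sketch of that chain. The base address `1000` and the 4-byte element size are hypothetical values standing in for `%b` and `i32`.

```python
def splat(x, vf=4):
    # insertelement into lane 0, then shufflevector with a zeroinitializer mask
    return [x] * vf

def induction(index, vf=4):
    # %induction = add %broadcast.splat, <i32 0, i32 1, i32 2, i32 3>
    return [s + lane for s, lane in zip(splat(index, vf), range(vf))]

def pointer_vector(base, elem_size, index, vf=4):
    # The extractelement -> getelementptr -> insertelement chain: each lane's
    # index is pulled out, turned into an address, and pushed into a
    # <vf x i32*> vector.
    ptrs = [None] * vf                     # undef
    ind = induction(index, vf)
    for lane in range(vf):
        i = ind[lane]                      # extractelement
        ptrs[lane] = base + elem_size * i  # getelementptr %b, 0, i
    return ptrs

print(induction(8))                # [8, 9, 10, 11]
print(pointer_vector(1000, 4, 8))  # [1032, 1036, 1040, 1044]
```

The four addresses are adjacent, which is why these per-lane instructions are candidates for being replaced by a single wide memory access.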
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
....splatinsert, <4 x i64> undef, <4 x i32> zeroinitializer %induction = add <4 x i64> %broadcast.splat, <i64 0, i64 1, i64 2, i64 3> %20 = mul <4 x i64> %broadcast.splat2, <i64 4, i64 4, i64 4, i64 4> %21 = add nsw <4 x i64> %20, %induction %22 = extractelement <4 x i64> %21, i32 0 %23 = getelementptr float* %arg5, i64 %22 %24 = insertelement <4 x float*> undef, float* %23, i32 0 %25 = extractelement <4 x i64> %21, i32 1 %26 = getelementptr float* %arg5, i64 %25 %27 = insertelement <4 x float*> %24, float* %26, i32 1...
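Here the per-lane index `%21` is `4 * splat2 + induction`. The definition of `%broadcast.splat2` is cut off, so the sketch assumes it is a splat of some loop-invariant scalar (`splat2_value`, a hypothetical name). Under that assumption, the lanes of `%21` always differ by exactly one, so the four getelementptrs address adjacent floats. The sketch only checks this arithmetic; it makes no claim about what LLVM's vectorizer actually proves.

```python
def index_vector(splat2_value, index, vf=4):
    # %induction = splat(index) + <0, 1, 2, 3>
    induction = [index + lane for lane in range(vf)]
    # %20 = mul splat(splat2_value), <4, 4, 4, 4>
    scaled = [splat2_value * 4] * vf
    # %21 = add nsw %20, %induction
    return [s + i for s, i in zip(scaled, induction)]

def is_consecutive(indices):
    # True when lane k+1 holds exactly lane k's index plus one.
    return all(b - a == 1 for a, b in zip(indices, indices[1:]))

idx = index_vector(3, 10)
print(idx, is_consecutive(idx))  # [22, 23, 24, 25] True
```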
2013 Nov 06
0
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
...t, <4 x i64> undef, <4 x i32> zeroinitializer > %induction = add <4 x i64> %broadcast.splat, <i64 0, i64 1, i64 2, i64 3> > %20 = mul <4 x i64> %broadcast.splat2, <i64 4, i64 4, i64 4, i64 4> > %21 = add nsw <4 x i64> %20, %induction > %22 = extractelement <4 x i64> %21, i32 0 > %23 = getelementptr float* %arg5, i64 %22 > %24 = insertelement <4 x float*> undef, float* %23, i32 0 > %25 = extractelement <4 x i64> %21, i32 1 > %26 = getelementptr float* %arg5, i64 %25 > %27 = insertelement <4 x float*> %24, fl...
2015 Jun 30
2
[LLVMdev] extractelement causes memory access violation - what to do?
...----- > > From: "Paweł Bylica" <chfast at gmail.com> > > To: "David Majnemer" <david.majnemer at gmail.com> > > Cc: "LLVMdev" <llvmdev at cs.uiuc.edu> > > Sent: Tuesday, June 30, 2015 5:42:24 AM > > Subject: Re: [LLVMdev] extractelement causes memory access violation - what to do? > > On Fri, Jun 26, 2015 at 5:42 PM David Majnemer <david.majnemer at gmail.com> wrote: > > On Fri, Jun 26, 2015 at 7...
2013 Feb 04
0
[LLVMdev] Vectorizer using Instruction, not opcodes
Hi all, My take on this is that, as you state below, at the IR level we are only roughly estimating cost, at best (or we would have to lower the code and then estimate cost, something we don't want to do). I would propose estimating the "worst-case costs" and seeing how far we get with this. My rationale here is that we don't want vectorization to decrease performance relative...
2013 Nov 06
2
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
....splatinsert, <4 x i64> undef, <4 x i32> zeroinitializer %induction = add <4 x i64> %broadcast.splat, <i64 0, i64 1, i64 2, i64 3> %20 = shl <4 x i64> %broadcast.splat2, <i64 2, i64 2, i64 2, i64 2> %21 = add nsw <4 x i64> %20, %induction %22 = extractelement <4 x i64> %21, i32 0 %23 = getelementptr float* %arg5, i64 %22 %24 = bitcast float* %23 to <4 x float>* %wide.load = load <4 x float>* %24, align 16 %25 = extractelement <4 x i64> %21, i32 0 %26 = getelementptr float* %arg6, i64 %25 %27 = bitcast float* %26...
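This variant differs from the earlier one in two visible ways. First, `%20` uses `shl ... <2, 2, 2, 2>` instead of `mul ... <4, 4, 4, 4>`. Second, only lane 0 is extracted, and its address is bitcast and loaded as a `<4 x float>`. The following semantic sketch checks that the two forms compute the same scaled index. It also checks that, for consecutive indices, one wide load from lane 0's address yields the same values as four per-lane loads. The `memory` list and lane indices are made-up test data.

```python
def shl2_equals_mul4(values):
    # %20 here uses shl by 2; the earlier variant used mul by 4.
    return [v << 2 for v in values] == [v * 4 for v in values]

def per_lane_loads(memory, indices):
    # Scalar view: one load per lane at memory[index].
    return [memory[i] for i in indices]

def wide_load(memory, indices):
    # Vector view: extract only lane 0, then load len(indices) contiguous
    # elements, like the bitcast + `load <4 x float>` in the snippet.
    if any(b - a != 1 for a, b in zip(indices, indices[1:])):
        raise ValueError("a single wide load needs consecutive indices")
    start = indices[0]
    return memory[start:start + len(indices)]

memory = [float(i) for i in range(32)]
lanes = [8, 9, 10, 11]
print(shl2_equals_mul4([0, 1, 5, 100]))                          # True
print(per_lane_loads(memory, lanes) == wide_load(memory, lanes))  # True
```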
2013 Feb 04
0
[LLVMdev] Vectorizer using Instruction, not opcodes
...rtelement <4 x i32> undef, i32 %index, > i32 0 > %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 > x i32> undef, <4 x i32> zeroinitializer > %induction = add <4 x i32> %broadcast.splat, <i32 0, i32 1, i32 2, > i32 3> > %0 = extractelement <4 x i32> %induction, i32 0 > %1 = getelementptr inbounds [256 x i32]* %b, i32 0, i32 %0 > %2 = insertelement <4 x i32*> undef, i32* %1, i32 0 > %3 = extractelement <4 x i32> %induction, i32 1 > %4 = getelementptr inbounds [256 x i32]* %b, i32 0, i32 %3 > %5 = inser...
2013 Feb 04
0
[LLVMdev] Vectorizer using Instruction, not opcodes
...the CostAnalysis pass. It estimates the cost itself before it even performs the vectorization. The way it works is that it looks at all the scalar instructions and asks: what is the cost if I execute the scalar instruction as a vector instruction? Therefore, it will not consider any of the insert/extractelement instructions you see below (note that they will all go away anyway). If you run clang++ -O3 -mllvm -debug-only=loop-vectorize you will see which instructions it considers. E.g.: LV: Found an estimated cost of 0 for VF 1 For instruction: %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next,...
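The reply describes a cost model that prices each scalar instruction at several vectorization factors (the `LV: Found an estimated cost of N for VF M` lines). The sketch below is a simplified, hypothetical version of how such per-instruction estimates could be compared across VFs: sum them, divide by the VF to get a cost per lane, and keep the cheapest. It is not LoopVectorize's actual code, and the estimates are invented for illustration.

```python
def select_vf(costs):
    """costs maps a VF to the per-instruction cost estimates for that VF.
    Returns the VF with the lowest total cost per lane (scalar wins ties)."""
    best_vf = 1
    best = sum(costs[1]) / 1
    for vf in sorted(costs):
        per_lane = sum(costs[vf]) / vf
        if per_lane < best:
            best_vf, best = vf, per_lane
    return best_vf

# Made-up estimates for a four-instruction loop body (the first, a phi, is free).
estimates = {1: [0, 1, 1, 1], 2: [0, 1, 1, 1], 4: [0, 1, 2, 1]}
print(select_vf(estimates))  # 4
```

Because only the scalar instructions are priced, the extract/insert scaffolding seen in the IR dumps never enters this comparison.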
2015 Jul 01
2
[LLVMdev] extractelement causes memory access violation - what to do?
...Pete Cooper" <peter_cooper at apple.com> > To: "Paweł Bylica" <chfast at gmail.com> > Cc: "Hal Finkel" <hfinkel at anl.gov>, "LLVMdev" <llvmdev at cs.uiuc.edu> > Sent: Wednesday, July 1, 2015 12:08:37 PM > Subject: Re: [LLVMdev] extractelement causes memory access violation - what to do? > > Sorry for chiming in so late in this. > > So I agree that negative indices are UB; I don’t think that’s > contentious. > > However, I think the issue here is the DAG expansion. That is the > point at which we go from UB whic...
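The point above is that the out-of-bounds access is introduced when the DAG expansion turns an extract with a variable index into a load from a stack slot. One possible mitigation, assumed here purely for illustration, is to mask the index into range before forming the address. For a power-of-two vector length, `idx & (n - 1)` keeps the load inside the spill slot. The result for a bad index is still meaningless, which the IR semantics allow, but it can no longer touch other memory. `expand_extract_masked` is a hypothetical name.

```python
def expand_extract_masked(vec, idx):
    """Hypothetical expansion that keeps the load inside the spill slot.
    Masking with len-1 maps any index into range, so an out-of-range
    extract returns some lane (a meaningless but harmless value) instead
    of touching memory outside the vector."""
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("masking assumes a power-of-two vector length")
    return vec[idx & (n - 1)]

v = [10, 20, 30, 40]
print(expand_extract_masked(v, 2))   # 30
print(expand_extract_masked(v, 5))   # 20
print(expand_extract_masked(v, -1))  # 40
```

Masking costs a single AND and suits power-of-two lengths. For other lengths, clamping with an unsigned minimum against `n - 1` would be the analogous choice.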
2015 Jul 02
2
[LLVMdev] extractelement causes memory access violation - what to do?
...nemer" <david.majnemer at gmail.com> > To: "Hal Finkel" <hfinkel at anl.gov> > Cc: "Pete Cooper" <peter_cooper at apple.com>, "LLVMdev" <llvmdev at cs.uiuc.edu> > Sent: Wednesday, July 1, 2015 7:17:19 PM > Subject: Re: [LLVMdev] extractelement causes memory access violation - what to do? > > > On Wed, Jul 1, 2015 at 4:48 PM, Hal Finkel < hfinkel at anl.gov > wrote: > > > ----- Original Message ----- > > From: "Pete Cooper" < peter_cooper at apple.com > > > > > To: "Hal...
2013 Feb 04
2
[LLVMdev] Vectorizer using Instruction, not opcodes
Hi folks, I've been thinking about how to implement some of the costs, and there are a lot of instructions whose cost depends on the instructions around them. Casts are one obvious case, since arithmetic and memory instructions can sometimes cast values for free. The cost model receives opcodes, which lose the information about the history of the values being vectorized, and I thought we could pass the whole...
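The argument is that an opcode alone cannot tell the cost model whether a cast will be folded away, while the whole instruction (with its operands) can. Here is a minimal sketch of that difference. The `Inst` class, the unit costs, and the rule that an extending cast fed by a load is free are hypothetical assumptions for illustration, not LLVM's TargetTransformInfo API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Inst:
    # Hypothetical stand-in for an IR instruction: an opcode plus operands.
    opcode: str
    operands: List["Inst"] = field(default_factory=list)

# Made-up unit costs, for illustration only.
BASE_COST = {"load": 1, "zext": 1, "sext": 1, "add": 1, "mul": 1}
EXTENDING_CASTS = {"zext", "sext"}

def cost_from_opcode(opcode):
    # All an opcode-only interface can see.
    return BASE_COST[opcode]

def cost_from_instruction(inst):
    # With the instruction itself, operands are visible: an extending
    # cast fed directly by a load is treated as free here, assuming the
    # target can fold the extension into the load.
    if (inst.opcode in EXTENDING_CASTS and inst.operands
            and inst.operands[0].opcode == "load"):
        return 0
    return BASE_COST[inst.opcode]

load = Inst("load")
ext = Inst("zext", [load])
print(cost_from_opcode(ext.opcode), cost_from_instruction(ext))  # 1 0
```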
2015 Jul 01
3
[LLVMdev] extractelement causes memory access violation - what to do?
..."Pete Cooper" <peter_cooper at apple.com> > To: "Hal Finkel" <hfinkel at anl.gov> > Cc: "LLVMdev" <llvmdev at cs.uiuc.edu>, "Paweł Bylica" <chfast at gmail.com> > Sent: Wednesday, July 1, 2015 6:42:41 PM > Subject: Re: [LLVMdev] extractelement causes memory access violation - what to do? > > > > On Jul 1, 2015, at 3:45 PM, Hal Finkel <hfinkel at anl.gov> wrote: > > > > ----- Original Message ----- > >> From: "Pete Cooper" <peter_cooper at apple.com> > >> To: "Paweł B...
2013 Nov 06
0
[LLVMdev] loop vectorizer: Unexpected extract/insertelement
...t, <4 x i64> undef, <4 x i32> zeroinitializer > %induction = add <4 x i64> %broadcast.splat, <i64 0, i64 1, i64 2, i64 3> > %20 = shl <4 x i64> %broadcast.splat2, <i64 2, i64 2, i64 2, i64 2> > %21 = add nsw <4 x i64> %20, %induction > %22 = extractelement <4 x i64> %21, i32 0 > %23 = getelementptr float* %arg5, i64 %22 > %24 = bitcast float* %23 to <4 x float>* > %wide.load = load <4 x float>* %24, align 16 > %25 = extractelement <4 x i64> %21, i32 0 > %26 = getelementptr float* %arg6, i64 %25 > %27 =...
2012 Sep 02
2
[LLVMdev] branch on vector compare?
...nd a slow way (below) which goes through memory. Is there some idiom I'm missing so that it would use for instance movmsk for SSE or vcmpgt & cr6 for altivec? Or do I need to resort to calling the intrinsic directly? Thanks, Stephen.
    %16 = fcmp ogt <4 x float> %15, %cr
    %17 = extractelement <4 x i1> %16, i32 0
    %18 = extractelement <4 x i1> %16, i32 1
    %19 = extractelement <4 x i1> %16, i32 2
    %20 = extractelement <4 x i1> %16, i32 3
    %21 = or i1 %17, %18
    %22 = or i1 %19, %20
    %23 = or i1 %21, %22
    br i1 %23, label %true1, label %false2
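The IR in the post computes an "any lane true" reduction by extracting each i1 lane and OR-ing them. A movmsk-style instruction packs the comparison lanes into an integer bitmask, so the same test becomes a single compare against zero. This semantic sketch checks the equivalence for all 16 masks. It does not answer the post's question about which IR idiom triggers movmsk; `movemask` is a hypothetical model, not an intrinsic.

```python
import math

def fcmp_ogt(a, b):
    # `fcmp ogt` is false when either operand is NaN; Python's > matches that.
    return [x > y for x, y in zip(a, b)]

def any_true_or_chain(mask):
    # The IR in the post: extract each i1 lane and OR them together.
    e0, e1, e2, e3 = mask
    return (e0 or e1) or (e2 or e3)

def movemask(mask):
    # Model of a movmsk-style instruction: lane i becomes bit i.
    bits = 0
    for i, lane in enumerate(mask):
        if lane:
            bits |= 1 << i
    return bits

def any_true_movemask(mask):
    # One mask extraction plus one test against zero.
    return movemask(mask) != 0

m = fcmp_ogt([1.0, math.nan, 5.0, 0.0], [2.0, 0.0, 4.0, 0.0])
print(m, movemask(m), any_true_movemask(m))  # [False, False, True, False] 4 True
```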
2012 Jan 26
0
[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
...bel %for.inc45, label %for.body10 BBV: fusing: %0 = load i8* %r.063, align 1, !tbaa !0 <-> %1 = load i8* %incdec.ptr11, align 1, !tbaa !0 BBV: fusing: %conv14 = zext i8 %2 to i32 <-> %conv15 = zext i8 %3 to i32 BBV: moving: %mul = mul nsw i32 %5, 123 to after %conv14.v.r2 = extractelement <2 x i32> %conv14, i32 1 BBV: fusing: %mul = mul nsw i32 %conv14.v.r1, 123 <-> %mul16 = mul nsw i32 %conv14.v.r2, 321 BBV: fusing: %mul23 = mul nsw i32 %conv14.v.r1, 234 <-> %mul35 = mul nsw i32 %conv14.v.r2, 543 BBV: moving: %add26 = add i32 %mul25, %5 to after %mul23...
2017 Mar 14
3
llvm-stress crash
...uleID = '/tmp/autogen.bc' source_filename = "/tmp/autogen.bc" define void @autogen_SD29355(i8*, i32*, i64*, i32, i64, i8) { BB: %A4 = alloca double %A3 = alloca float %A2 = alloca i8 %A1 = alloca double %A = alloca i64 %L = load i8, i8* %0 store i8 33, i8* %0 %E = extractelement <8 x i1> zeroinitializer, i32 2 br label %CF261 CF261: ; preds = %BB %Shuff = shufflevector <2 x i16> zeroinitializer, <2 x i16> zeroinitializer, <2 x i32> <i32 undef, i32 3> %I = insertelement <8 x i8> zeroinitia...