thr3ads.net - search: "splatinsert"

Displaying 20 results from an estimated 22 matches for "splatinsert".

2015 Jul 24

[LLVMdev] SIMD for sdiv <2 x i64>

...of the vector elements produced by the divide are ugt the spatInstMap. I can't say for sure that we can do better here - I haven't studied our vector canonicalization rules enough - but this seems like something which could possibly be improved. This is interesting: %splatCallS27_D.splatinsert = insertelement <2 x i8*> undef, i8* %call5.i.i, i32 0 %splatCallS27_D.splat = shufflevector <2 x i8*> %splatCallS27_D.splatinsert, <2 x i8*> undef, <2 x i32> zeroinitializer Can't that shufflevector be replaced with: %splatCallS27_D.splat = insertelement <2...
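
The `splatinsert`/`splat` pair above is the usual way this IR spells a broadcast: `insertelement` places the scalar in lane 0, then a `shufflevector` with a `zeroinitializer` mask copies lane 0 into every lane. The suggested replacement is cut off, so the sketch below does not reconstruct it. It only models the two instructions in Python (treating `undef` as `None`, a simplification) and checks that, for a two-lane vector, a second `insertelement` into lane 1 produces the same value as the shuffle. All function names are illustrative, not LLVM APIs.

```python
# Toy model of LLVM's insertelement / shufflevector on fixed-width vectors.
# `undef` lanes (and undef mask entries) are represented as None; real LLVM
# undef is more permissive than a single sentinel value.
UNDEF = None


def insertelement(vec, value, idx):
    """Return a copy of `vec` with lane `idx` replaced by `value`."""
    out = list(vec)
    out[idx] = value
    return out


def shufflevector(v1, v2, mask):
    """Lane i of the result is lane mask[i] of the concatenation v1 ++ v2."""
    both = list(v1) + list(v2)
    return [UNDEF if m is UNDEF else both[m] for m in mask]


def splat_via_shuffle(x, n):
    # %s.splatinsert = insertelement <n x T> undef, T %x, i32 0
    ins = insertelement([UNDEF] * n, x, 0)
    # %s.splat = shufflevector %s.splatinsert, undef, zeroinitializer
    return shufflevector(ins, [UNDEF] * n, [0] * n)


def splat_via_inserts(x, n):
    # Same value built with one insertelement per lane and no shuffle.
    vec = [UNDEF] * n
    for lane in range(n):
        vec = insertelement(vec, x, lane)
    return vec
```

The two forms agree on the resulting value; which one lowers to cheaper machine code is a per-target question the model does not answer.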

2015 Jul 24

[LLVMdev] SIMD for sdiv <2 x i64>

...2 x i64> %umul_with_overflow.i.iS26_D = shl <2 x i64> %S25_D, <i64 3, i64 3> %extumul_with_overflow.i.iS26_D = extractelement <2 x i64> %umul_with_overflow.i.iS26_D, i32 1 %call5.i.i = tail call noalias i8* @_Znam(i64 %extumul_with_overflow.i.iS26_D) #22 %splatCallS27_D.splatinsert = insertelement <2 x i8*> undef, i8* %call5.i.i, i32 0 %splatCallS27_D.splat = shufflevector <2 x i8*> %splatCallS27_D.splatinsert, <2 x i8*> undef, <2 x i32> zeroinitializer %bitcastS28_D = bitcast <2 x i8*> %splatCallS27_D.splat to <2 x double*> %extractS...
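
The value names in this snippet suggest that an unsigned multiply-with-overflow of an element count by 8 (the size of a `double`, matching the later bitcast to `<2 x double*>`) has become a `shl` by 3, whose lane 1 is then passed to `@_Znam`. A left shift by 3 and a multiplication by 8 agree modulo 2^64, which is what the sketch below checks. The overflow flag itself is not modelled, and `new_array_bytes` is a hypothetical helper.

```python
MASK64 = (1 << 64) - 1  # i64 arithmetic wraps modulo 2**64


def shl_v2i64(vec, amounts):
    """Lane-wise `shl <2 x i64>`; bits shifted past bit 63 are dropped.

    A shift amount >= 64 would be poison in LLVM, so the sketch rejects it.
    """
    assert all(0 <= s < 64 for s in amounts)
    return [(x << s) & MASK64 for x, s in zip(vec, amounts)]


def extractelement(vec, idx):
    """`extractelement <N x T> %vec, i32 idx`: read a single lane."""
    return vec[idx]


def new_array_bytes(counts):
    # %umul_with_overflow.i.iS26_D = shl <2 x i64> %S25_D, <i64 3, i64 3>
    shifted = shl_v2i64(counts, [3, 3])
    # %extumul_with_overflow.i.iS26_D = extractelement ..., i32 1
    return extractelement(shifted, 1)
```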

2015 Jul 24

[LLVMdev] SIMD for sdiv <2 x i64>

On 07/24/2015 03:42 AM, Benjamin Kramer wrote: >> On 24.07.2015, at 08:06, zhi chen <zchenhn at gmail.com> wrote: >> >> It seems that it's hard to vectorize int64 in LLVM. For example, LLVM 3.4 generates very complicated code for the following IR. I am running on a Haswell processor. Is it because there are no alternative AVX/2 instructions for int64? The same thing
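
One likely factor, not necessarily the answer given later in the thread: SSE, AVX and AVX2 have no packed integer divide instruction, so an `sdiv <2 x i64>` is usually scalarized into per-lane extracts, scalar divides and inserts. The Python sketch below models that lowering with LLVM's `sdiv` semantics: the quotient is truncated toward zero, and division by zero or `INT64_MIN / -1` is undefined, so the sketch raises instead of picking a result. Function names are hypothetical.

```python
INT64_MIN = -(1 << 63)


def sdiv_i64(a, b):
    """Scalar `sdiv i64`: signed quotient truncated toward zero."""
    if b == 0 or (a == INT64_MIN and b == -1):
        raise ValueError("undefined behaviour for sdiv i64")
    q = abs(a) // abs(b)          # magnitude of the quotient
    return q if (a < 0) == (b < 0) else -q


def sdiv_v2i64_scalarized(x, y):
    """`sdiv <2 x i64>` without a packed divide: extract each lane,
    divide with a scalar instruction, insert the quotient back."""
    result = [0, 0]
    for lane in range(2):
        result[lane] = sdiv_i64(x[lane], y[lane])
    return result
```

Note that Python's `//` floors, so the sketch divides magnitudes and re-applies the sign to get truncation instead.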

2013 Nov 06

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

...e still shufflevector, insertelement, *and* bitcast (!!) etc. instructions left? The original loop is so clean, a textbook example I'd say. There is no need to shuffle anything. At least I don't see it. Frank vector.ph: ; preds = %L5 %broadcast.splatinsert1 = insertelement <4 x i64> undef, i64 %19, i32 0 %broadcast.splat2 = shufflevector <4 x i64> %broadcast.splatinsert1, <4 x i64> undef, <4 x i32> zeroinitializer br label %vector.body vector.body: ; preds = %vector.body, %vector.p...

2013 Nov 06

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

...mp sge i64 %17, %5 br i1 %18, label %L6, label %L5 L5: ; preds = %L4, %L2 %19 = phi i64 [ %17, %L4 ], [ %4, %L2 ] br i1 false, label %middle.block, label %vector.ph vector.ph: ; preds = %L5 %broadcast.splatinsert1 = insertelement <4 x i64> undef, i64 %19, i32 0 %broadcast.splat2 = shufflevector <4 x i64> %broadcast.splatinsert1, <4 x i64> undef, <4 x i32> zeroinitializer br label %vector.body vector.body: ; preds = %vector.body, %vector.p...
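
A loop-invariant scalar used inside a vectorized loop has to be broadcast into a vector first, and the `insertelement`/`shufflevector` pair shown here sits in `vector.ph`, so it runs once before the loop rather than on every iteration. The sketch below mirrors the `vector.ph` / `vector.body` / scalar-remainder structure with VF = 4. The original loop body is elided in the snippet, so adding `inv` to each element is a hypothetical stand-in.

```python
VF = 4  # vectorization factor, matching the <4 x i64> vectors in the snippet


def vectorized_loop(a, inv):
    """Add the loop-invariant `inv` to every element (hypothetical body),
    following the block structure the loop vectorizer emits."""
    out = list(a)
    n = len(out)
    # vector.ph: broadcast the invariant once
    # (%broadcast.splatinsert1 + %broadcast.splat2).
    splat = [inv] * VF
    # vector.body: VF elements per iteration.
    n_vec = n - n % VF
    for i in range(0, n_vec, VF):
        out[i:i + VF] = [x + s for x, s in zip(out[i:i + VF], splat)]
    # Scalar remainder for the last n % VF elements.
    for i in range(n_vec, n):
        out[i] += inv
    return out
```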

2013 Nov 06

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

...18, label %L6, label %L5 > > L5: ; preds = %L4, %L2 > %19 = phi i64 [ %17, %L4 ], [ %4, %L2 ] > br i1 false, label %middle.block, label %vector.ph > > vector.ph: ; preds = %L5 > %broadcast.splatinsert1 = insertelement <4 x i64> undef, i64 %19, i32 0 > %broadcast.splat2 = shufflevector <4 x i64> %broadcast.splatinsert1, <4 x i64> undef, <4 x i32> zeroinitializer > br label %vector.body > > vector.body: ; preds = %vector.b...

2013 Nov 06

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

...sertelement, *and* bitcast (!!) etc. instructions left? The original loop is so clean, a textbook example I'd say. There is no need to shuffle anything. At least I don't see it. > > Frank > > > vector.ph: ; preds = %L5 > %broadcast.splatinsert1 = insertelement <4 x i64> undef, i64 %19, i32 0 > %broadcast.splat2 = shufflevector <4 x i64> %broadcast.splatinsert1, <4 x i64> undef, <4 x i32> zeroinitializer > br label %vector.body > > vector.body: ; preds = %vector.b...

2016 Aug 02

Instruction selection problems due to SelectionDAGBuilder

...ving problems at instruction selection with my back end with the following basic-block due to a vector add with immediate constant vector (obtained by vectorizing a simple C program doing vector sum map): vector.ph: ; preds = %vector.memcheck50 %.splatinsert = insertelement <8 x i64> undef, i64 %i.07.unr, i32 0 %.splat = shufflevector <8 x i64> %.splatinsert, <8 x i64> undef, <8 x i32> zeroinitializer %induction = add <8 x i64> %.splat, <i64 0, i64 1, i64 2, i64 3, i64 4, i64 5, i64 6, i64 7> %...
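
`%induction` is the vector of induction values for one vector iteration: lane k holds the scalar start value plus k. The sketch below computes it the way the IR does (splat the scalar, then add the constant step vector `<0, 1, ..., 7>`), with i64 wraparound. It models only the value and says nothing about how a backend should select the constant-vector `add` behind the reported problem.

```python
MASK64 = (1 << 64) - 1  # i64 lanes wrap modulo 2**64


def induction_vector(start, vf):
    # %.splatinsert + %.splat: broadcast the scalar start value.
    splat = [start] * vf
    # Constant step vector <i64 0, i64 1, ..., i64 vf-1>.
    step = list(range(vf))
    # %induction = add <vf x i64> %.splat, <step>
    return [(s + k) & MASK64 for s, k in zip(splat, step)]
```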

2013 Feb 04

[LLVMdev] Vectorizer using Instruction, not opcodes

...[r7] vmul.i32 q8, q9, q8 vst1.32 {d16, d17}, [r5] bne .LBB0_2 ** Vectorized IR (just the loop): vector.body: ; preds = %vector.body, %vector.ph %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ] %broadcast.splatinsert = insertelement <4 x i32> undef, i32 %index, i32 0 %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer %induction = add <4 x i32> %broadcast.splat, <i32 0, i32 1, i32 2, i32 3> %0 = extractelement <...

2013 Feb 04

[LLVMdev] Vectorizer using Instruction, not opcodes

Hi all, My take on this is that, as you state below, at the IR level we are only roughly estimating cost, at best (or we would have to lower the code and then estimate cost - something we don't want to do). I would propose estimating the "worst case costs" and seeing how far we get with this. My rationale here is that we don't want vectorization to decrease performance relative
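
A toy version of the "worst case costs" proposal: when the IR-level model cannot tell which lowering an instruction will get, charge it the most expensive plausible one, and vectorize only if the loop still looks cheaper than running the scalar body VF times. All costs are made-up numbers and the functions are hypothetical; this is not LLVM's actual cost model.

```python
def worst_case_cost(candidate_costs):
    """Cost of one vector instruction whose lowering is unknown at IR level:
    assume the most expensive plausible lowering."""
    return max(candidate_costs)


def vector_loop_cost(instructions):
    """`instructions` holds one list of candidate costs per vector instruction."""
    return sum(worst_case_cost(c) for c in instructions)


def should_vectorize(scalar_costs, vector_instructions, vf):
    """Vectorize only if one vector iteration, costed pessimistically, is
    cheaper than vf iterations of the scalar loop body."""
    return vector_loop_cost(vector_instructions) < vf * sum(scalar_costs)
```

Being pessimistic means some profitable loops are left scalar, which matches the stated goal of never making code slower than the scalar version, as far as the model can tell.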

2014 Sep 18

[LLVMdev] [Vectorization] Mis match in code generated

...; preds = %vector.body, %vector.ph %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ] %vec.phi = phi <4 x i32> [ zeroinitializer, %vector.ph ], [ %14, %vector.body ] %broadcast.splatinsert = insertelement <4 x i32> undef, i32 %index, i32 0 %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer %induction = add <4 x i32> %broadcast.splat, <i32 0, i32 1, i32 2, i32 3> %0 = extractelement <4 x...
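
`%vec.phi` starting at `zeroinitializer` and fed back by `%14` from `vector.body` is the shape of a vectorized reduction: each lane keeps its own partial result, and the lanes are combined after the loop. The loop body is elided, so the sketch below assumes an integer sum and ignores i32 wraparound. Integer addition stays associative even with wraparound, which is what lets the vectorizer split one running sum into four partial sums; floating-point reductions need explicit permission to reassociate.

```python
def vectorized_sum(a, vf=4):
    """Integer sum computed in the shape of a vectorized reduction loop."""
    n_vec = len(a) - len(a) % vf
    vec_phi = [0] * vf                    # %vec.phi starts at zeroinitializer
    for i in range(0, n_vec, vf):         # vector.body
        vec_phi = [acc + x for acc, x in zip(vec_phi, a[i:i + vf])]
    total = sum(vec_phi)                  # combine the vf partial sums
    for x in a[n_vec:]:                   # scalar remainder loop
        total += x
    return total
```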

2013 Feb 04

[LLVMdev] Vectorizer using Instruction, not opcodes

...d19}, [r7] > vmul.i32 q8, q9, q8 > vst1.32 {d16, d17}, [r5] > bne .LBB0_2 > > > ** Vectorized IR (just the loop): > > > > > vector.body: ; preds = %vector.body, %vector.ph > %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ] > %broadcast.splatinsert = insertelement <4 x i32> undef, i32 %index, > i32 0 > %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 > x i32> undef, <4 x i32> zeroinitializer > %induction = add <4 x i32> %broadcast.splat, <i32 0, i32 1, i32 2, > i32 3> >...

2013 Feb 04

[LLVMdev] Vectorizer using Instruction, not opcodes

Hi folks, I've been thinking about how to implement some of the costs, and there are a lot of instructions whose cost depends on the instructions around them. Casts are one obvious case, since arithmetic and memory instructions can, sometimes, cast values for free. The cost model receives opcodes, which lose the information about the history of the values being vectorized, and I thought we could pass the whole
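
A toy illustration of why an opcode-only interface loses information: with just the opcode, a `zext` costs the same everywhere, while an interface that sees the instruction (and what defines its operand) can charge nothing for an extension that is assumed to fold into an extending load, as some targets allow. The `Inst` class, the cost table and the folding rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical unit costs; real targets use very different numbers.
OPCODE_COST = {"load": 1, "store": 1, "add": 1, "mul": 1, "zext": 1, "sext": 1}


@dataclass
class Inst:
    """Toy IR instruction: an opcode plus the definition of one operand."""
    opcode: str
    operand: Optional["Inst"] = None


def cost_from_opcode(opcode: str) -> int:
    # Opcode-only view: the surrounding instructions are invisible.
    return OPCODE_COST.get(opcode, 1)


def cost_from_instruction(inst: Inst) -> int:
    # Instruction-aware view: an extension of a loaded value is assumed
    # to fold into an extending load, so it is charged nothing.
    if (inst.opcode in ("zext", "sext") and inst.operand is not None
            and inst.operand.opcode == "load"):
        return 0
    return cost_from_opcode(inst.opcode)
```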

2013 Feb 04

[LLVMdev] Vectorizer using Instruction, not opcodes

...> vst1.32 {d16, d17}, [r5] > bne .LBB0_2 > > ** Vectorized IR (just the loop): > > vector.body: ; preds = %vector.body, %vector.ph > %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ] > %broadcast.splatinsert = insertelement <4 x i32> undef, i32 %index, i32 0 > %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer > %induction = add <4 x i32> %broadcast.splat, <i32 0, i32 1, i32 2, i32 3> > %0 = ex...

2014 Sep 18

[LLVMdev] [Vectorization] Mis match in code generated

...; preds = > %vector.body, %vector.ph %index = phi i32 [ 0, > %vector.ph ], [ %index.next, %vector.body ] %vec.phi = > phi <4 x i32> [ zeroinitializer, %vector.ph ], [ %14, > %vector.body ] %broadcast.splatinsert = insertelement <4 x i32> undef, i32 > %index, i32 0 %broadcast.splat = shufflevector <4 x i32> > %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer > %induction = add <4 x i32> %broadcast.splat, <i32 0, i32 1, i32 2, i32 3> > %0 = ex...

2020 May 21

LV: predication

...j < M; j++) Sum += Input[j] * Input[j+i]; Output[i] = Sum; } We are vectorising the inner-loop and we need to know its BTC. Its loop upperbound M depends on outerloop i, which results in a recursive SCEV expression. %trip.count.minus.1 = sub i32 %1, 1 %broadcast.splatinsert = insertelement <4 x i32> undef, i32 %trip.count.minus.1, i32 0 %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer br label %vector.body vector.body: .. %4 = icmp ule <4 x i32> %induction, %br...
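
The IR above is the tail-folding pattern: the backedge-taken count (trip count minus 1) is splatted once, and each vector iteration compares its induction vector against it with `icmp ule`, so lanes past the end are masked off instead of running a scalar epilogue. Presumably the comparison uses the BTC rather than the trip count because BTC + 1 can wrap. The sketch below computes the per-iteration masks; in the thread the hard part is that `M`, and therefore the BTC, depends on the outer `i`, whereas here the trip count is simply a parameter.

```python
def tail_folded_masks(trip_count, vf=4):
    """Lane masks for a loop whose tail is folded into predicated iterations."""
    btc = trip_count - 1             # %trip.count.minus.1 (backedge-taken count)
    splat_btc = [btc] * vf           # %broadcast.splatinsert + %broadcast.splat
    masks = []
    for index in range(0, trip_count, vf):           # vector.body iterations
        induction = [index + k for k in range(vf)]   # %induction
        # icmp ule <vf x i32> %induction, %broadcast.splat
        masks.append([i <= b for i, b in zip(induction, splat_btc)])
    return masks
```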

2014 Sep 19

[LLVMdev] [Vectorization] Mis match in code generated

...> %vector.body, %vector.ph %index = phi i32 [ 0, >> %vector.ph ], [ %index.next, %vector.body ] %vec.phi = >> phi <4 x i32> [ zeroinitializer, %vector.ph ], [ %14, >> %vector.body ] %broadcast.splatinsert = insertelement <4 x i32> undef, i32 >> %index, i32 0 %broadcast.splat = shufflevector <4 x i32> >> %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer >> %induction = add <4 x i32> %broadcast.splat, <i32 0, i32 1, i32 2, i32 3>...

2014 Nov 10

[LLVMdev] [Vectorization] Mis match in code generated

...; > > %index = phi i32 [ 0, %vector.ph ], [ > %index.next, %vector.body ] > > > > > > > %vec.phi = phi <4 x i32> [ zeroinitializer, %vector.ph ], [ %14, %vector.body ] > > > > > > > %broadcast.splatinsert = insertelement <4 x i32> undef, i32 > %index, i32 0 > > > > > > > %broadcast.splat = shufflevector <4 x i32> > %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer > > > > > > > %induction = add <4 x i32>...

2019 May 24

[EXT] Re: [RFC][SVE] Supporting SIMD instruction sets with variable vector lengths

JinGu: I’m not Graham, but you might find the following link a good starting point. https://community.arm.com/developer/tools-software/hpc/b/hpc-blog/posts/technology-update-the-scalable-vector-extension-sve-for-the-armv8-a-architecture The question you ask doesn’t have a short answer. The compiler and the instruction set design work together to allow programs to be compiled without knowing

2016 Nov 04

[RFC] Supporting ARM's SVE in LLVM

...@SimpleReduction(i32* nocapture readonly %a, i32 %count) #0 { entry: %cmp6 = icmp sgt i32 %count, 0 br i1 %cmp6, label %min.iters.checked, label %for.cond.cleanup min.iters.checked: %0 = add i32 %count, -1 %1 = zext i32 %0 to i64 %2 = add nuw nsw i64 %1, 1 %wide.end.idx.splatinsert = insertelement <n x 4 x i64> undef, i64 %2, i32 0 %wide.end.idx.splat = shufflevector <n x 4 x i64> %wide.end.idx.splatinsert, <n x 4 x i64> undef, <n x 4 x i32> zeroinitializer %3 = icmp ugt <n x 4 x i64> %wide.end.idx.splat, ser...
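
In the RFC's proposed syntax, `<n x 4 x i64>` is a scalable vector whose lane count is 4 times a runtime multiple (upstream LLVM later spelled this `<vscale x 4 x i64>`). The same loop code has to be correct for every hardware vector length, with a predicate switching off lanes past the end. The second operand of `%3` is cut off in the snippet; the sketch below assumes it is a vector of per-lane indices, which is an assumption rather than the RFC's text. It checks that a predicated, vector-length-agnostic sum gives the same answer for several `vscale` values.

```python
def vla_sum(a, vscale):
    """Sum `a` with a vector-length-agnostic loop.

    Each vector has 4 * vscale lanes, like <n x 4 x i64>; vscale is only
    known at run time. Lanes at or past the end are masked off by a
    predicate instead of being handled by a scalar epilogue.
    """
    vl = 4 * vscale
    n = len(a)
    end = [n] * vl                                   # splat of the end index
    acc = [0] * vl
    for base in range(0, n, vl):
        lanes = [base + k for k in range(vl)]        # per-lane indices (assumed)
        pred = [e > i for e, i in zip(end, lanes)]   # icmp ugt end, index
        acc = [x + (a[i] if p else 0) for x, i, p in zip(acc, lanes, pred)]
    return sum(acc)
```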
