thr3ads.net - search: "insertelement"

Displaying 20 results from an estimated 259 matches for "insertelement".

[LLVMdev] Improving SLPVectorizer for Julia

2014 Mar 17

...to vectorize. ------------------------------------------------------------- define <4 x float> @julia_foo111(<4 x float>, <4 x float>) { top: %2 = extractelement <4 x float> %0, i32 0 %3 = extractelement <4 x float> %1, i32 0 %4 = fadd float %2, %3 %5 = insertelement <4 x float> undef, float %4, i32 0 %6 = extractelement <4 x float> %0, i32 1 %7 = extractelement <4 x float> %1, i32 1 %8 = fadd float %6, %7 %9 = insertelement <4 x float> %5, float %8, i32 1 %10 = extractelement <4 x float> %0, i32 2 %11 = extractel...
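The snippet shows a vector add spelled out lane by lane: extract one element from each input, add the scalars, and insert the sum into a result vector that starts as `undef`. The following is a minimal Python sketch of that pattern, assuming it continues unchanged through all four lanes (the snippet is cut off at lane 2). The helper names mirror the IR opcodes but are hypothetical Python models, not an LLVM API.

```python
# Hypothetical pure-Python model of the IR operations; vectors are tuples.
UNDEF4 = (None,) * 4  # stands in for `<4 x float> undef`


def extractelement(vec, idx):
    """Read one lane of a vector."""
    return vec[idx]


def insertelement(vec, value, idx):
    """Return a NEW vector with lane `idx` replaced; the input is unchanged."""
    return vec[:idx] + (value,) + vec[idx + 1:]


def scalarized_add(a, b):
    """Lane-by-lane form, as in the snippet: extract, fadd, insert."""
    result = UNDEF4
    for lane in range(4):
        lane_sum = extractelement(a, lane) + extractelement(b, lane)
        result = insertelement(result, lane_sum, lane)
    return result


def vector_add(a, b):
    """The single vector fadd that a vectorizer would ideally recover."""
    return tuple(x + y for x, y in zip(a, b))
```

Under this model both functions compute the same result for any two four-lane inputs, which is the equivalence the thread wants the SLP vectorizer to recognize.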

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

...%5 br i1 %18, label %L6, label %L5 L5: ; preds = %L4, %L2 %19 = phi i64 [ %17, %L4 ], [ %4, %L2 ] br i1 false, label %middle.block, label %vector.ph vector.ph: ; preds = %L5 %broadcast.splatinsert1 = insertelement <4 x i64> undef, i64 %19, i32 0 %broadcast.splat2 = shufflevector <4 x i64> %broadcast.splatinsert1, <4 x i64> undef, <4 x i32> zeroinitializer br label %vector.body vector.body: ; preds = %vector.body, %vector.ph %index = phi...

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

...label %L5 > > L5: ; preds = %L4, %L2 > %19 = phi i64 [ %17, %L4 ], [ %4, %L2 ] > br i1 false, label %middle.block, label %vector.ph > > vector.ph: ; preds = %L5 > %broadcast.splatinsert1 = insertelement <4 x i64> undef, i64 %19, i32 0 > %broadcast.splat2 = shufflevector <4 x i64> %broadcast.splatinsert1, <4 x i64> undef, <4 x i32> zeroinitializer > br label %vector.body > > vector.body: ; preds = %vector.body, %vector.ph ...

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

The instcombine pass cleans up a lot. Any idea why there are still shufflevector, insertelement, *and* bitcast (!!) etc. instructions left? The original loop is so clean, a textbook example I'd say. There is no need to shuffle anything. At least I don't see it. Frank vector.ph: ; preds = %L5 %broadcast.splatinsert1 = insertelement <4 x...

[LLVMdev] Vectorizer using Instruction, not opcodes

2013 Feb 04

...mul.i32 q8, q9, q8 vst1.32 {d16, d17}, [r5] bne .LBB0_2 ** Vectorized IR (just the loop): vector.body: ; preds = %vector.body, %vector.ph %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ] %broadcast.splatinsert = insertelement <4 x i32> undef, i32 %index, i32 0 %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer %induction = add <4 x i32> %broadcast.splat, <i32 0, i32 1, i32 2, i32 3> %0 = extractelement <4 x i32> %i...
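The `insertelement` plus `shufflevector` pair in this snippet is the usual way a scalar is broadcast to every lane: the scalar goes into lane 0 of an `undef` vector, and an all-zero shuffle mask then copies lane 0 into each output lane. Adding `<0, 1, 2, 3>` turns the broadcast loop index into one index per lane. The sketch below models that in Python; the function names are hypothetical stand-ins for the IR opcodes, and `None` stands in for `undef`.

```python
# Hypothetical pure-Python model of the splat idiom; vectors are tuples.

def insertelement(vec, value, idx):
    """Return a new vector with lane `idx` replaced by `value`."""
    return vec[:idx] + (value,) + vec[idx + 1:]


def shufflevector(v1, v2, mask):
    """Each mask entry indexes into the concatenation of v1 and v2."""
    joined = v1 + v2
    return tuple(joined[i] for i in mask)


def broadcast(value, width=4):
    """insertelement into lane 0, then shuffle with a zeroinitializer mask."""
    undef = (None,) * width
    splatinsert = insertelement(undef, value, 0)
    return shufflevector(splatinsert, undef, (0,) * width)


def induction(index, width=4):
    """Broadcast the scalar index, then add the per-lane step <0, 1, 2, 3>."""
    splat = broadcast(index, width)
    return tuple(s + step for s, step in zip(splat, range(width)))
```

In this model `broadcast(7)` fills all four lanes with 7, and `induction(8)` yields the four consecutive indices starting at 8.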

[LLVMdev] loop vectorizer: Unexpected extract/insertelement

2013 Nov 06

...gn 16 %index.next = add i64 %index, 4 %15 = icmp eq i64 %index, 0 br i1 %15, label %middle.block, label %vector.body On Nov 6, 2013, at 8:39 AM, Frank Winter <fwinter at jlab.org> wrote: > The instcombine pass cleans up a lot. > > Any idea why there are still shufflevector, insertelement, *and* bitcast (!!) etc. instructions left? The original loop is so clean, a textbook example I'd say. There is no need to shuffle anything. At least I don't see it. > > Frank > > > vector.ph: ; preds = %L5 > %broadcast.splatinsert...

[LLVMdev] Vectorizer using Instruction, not opcodes

2013 Feb 04

Hi all, My take on this is that, as you state below, at the IR level we are only roughly estimating cost, at best (or we would have to lower the code and then estimate cost - something we don't want to do). I would propose estimating the "worst case costs" and seeing how far we get with this. My rationale here is that we don't want vectorization to decrease performance relative

[LLVMdev] Bug in InsertElement constant propagation?

2015 Jan 14

Hi, When I run opt on the following LLVM IR: define i32 @foo() { bb0: %0 = bitcast i32 2139171423 to float %1 = insertelement <1 x float> undef, float %0, i32 0 %2 = extractelement <1 x float> %1, i32 0 %3 = bitcast float %2 to i32 ret i32 %3 } -> It generates: define i32 @foo() { bb0: ret i32 2143365727 } While tracking the value I see that the floating point value is changed while folding insertE...
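The two integers in this report differ by exactly one bit. Read as an IEEE 754 binary32 pattern, the input is a NaN whose quiet bit (bit 22) is clear, and the folded output is the same pattern with that bit set. The sketch below only checks that bit arithmetic. Attributing the change to a signaling NaN being quieted during a conversion to a host `float` is an inference from the thread's later mention of APFloat-to-float conversion, not something the snippet states.

```python
# Bit-level check of the two constants quoted in the report.
# Field masks follow the IEEE 754 binary32 layout.
EXP_MASK = 0x7F800000       # 8 exponent bits
MANTISSA_MASK = 0x007FFFFF  # 23 mantissa bits
QUIET_BIT = 0x00400000      # top mantissa bit; set for quiet NaNs


def is_nan_bits(bits):
    """All exponent bits set and a non-zero mantissa means NaN."""
    return (bits & EXP_MASK) == EXP_MASK and (bits & MANTISSA_MASK) != 0


def is_signaling_nan_bits(bits):
    """A NaN whose quiet bit is clear."""
    return is_nan_bits(bits) and (bits & QUIET_BIT) == 0


def quieted(bits):
    """Set the quiet bit and leave every other bit untouched."""
    return bits | QUIET_BIT


before = 2139171423  # constant in the input IR
after = 2143365727   # constant in the folded output
```

With these definitions, `is_signaling_nan_bits(before)` holds and `quieted(before) == after`, so the fold changed only the quiet bit of the NaN payload.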

[LLVMdev] Vectorizer using Instruction, not opcodes

2013 Feb 04

...> vmul.i32 q8, q9, q8 > vst1.32 {d16, d17}, [r5] > bne .LBB0_2 > > > ** Vectorized IR (just the loop): > > > > > vector.body: ; preds = %vector.body, %vector.ph > %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ] > %broadcast.splatinsert = insertelement <4 x i32> undef, i32 %index, > i32 0 > %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 > x i32> undef, <4 x i32> zeroinitializer > %induction = add <4 x i32> %broadcast.splat, <i32 0, i32 1, i32 2, > i32 3> > %0 = extractel...

[LLVMdev] Vectorizer using Instruction, not opcodes

2013 Feb 04

Hi folks, I've been thinking about how to implement some of the costs, and there are a lot of instructions whose cost depends on the instructions around them. Casts are one obvious case, since arithmetic and memory instructions can, sometimes, cast values for free. The cost model receives Opcodes, which lose the info on the history of the values being vectorized, and I thought we could pass the whole

[LLVMdev] Vectorizer using Instruction, not opcodes

2013 Feb 04

...vst1.32 {d16, d17}, [r5] > bne .LBB0_2 > > ** Vectorized IR (just the loop): > > vector.body: ; preds = %vector.body, %vector.ph > %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ] > %broadcast.splatinsert = insertelement <4 x i32> undef, i32 %index, i32 0 > %broadcast.splat = shufflevector <4 x i32> %broadcast.splatinsert, <4 x i32> undef, <4 x i32> zeroinitializer > %induction = add <4 x i32> %broadcast.splat, <i32 0, i32 1, i32 2, i32 3> > %0 = extractelement <...

[LLVMdev] Bug in InsertElement constant propagation?

2015 Jan 14

...assume we shouldn't be converting APFloat to float in order to avoid such problems? -----Original Message----- From: Jonathan Roelofs [mailto:jonathan at codesourcery.com] Sent: Wednesday, January 14, 2015 9:39 AM To: Raoux, Thomas F; LLVM Developers Mailing List Subject: Re: [LLVMdev] Bug in InsertElement constant propagation? On 1/14/15 9:22 AM, Raoux, Thomas F wrote: > Hi, > > When I run opt on the following LLVM IR: > > define i32 @foo() { > > bb0: > > %0 = bitcast i32 2139171423 to float > > %1 = insertelement <1 x float> undef, float %0, i32 0 >...

llvm-stress crash

2017 Mar 14

...ore i8 33, i8* %0 %E = extractelement <8 x i1> zeroinitializer, i32 2 br label %CF261 CF261: ; preds = %BB %Shuff = shufflevector <2 x i16> zeroinitializer, <2 x i16> zeroinitializer, <2 x i32> <i32 undef, i32 3> %I = insertelement <8 x i8> zeroinitializer, i8 69, i32 3 %B = udiv i8 -99, 33 %Tr = trunc i64 -1 to i32 %Sl = select i1 true, i64* %2, i64* %2 %L5 = load i64, i64* %Sl store i64 %L5, i64* %2 %E6 = extractelement <4 x i16> zeroinitializer, i32 3 %Shuff7 = shufflevector <4 x i16> zeroin...

[LLVMdev] Seg faulting on vector ops

2007 Jul 20

...> > define float @vSelect3(float %x) { > > body: > > %pv = alloca <4 x float> ; <<4 x float>*> > [#uses=1] > > %v = load <4 x float>* %pv ; <<4 x float>> > [#uses=1] > %v1 = insertelement <4 x float> %v, float %x, i32 > 0 ; <<4 x You are allocating a chunk of memory on the stack then loading the undefined value back. I suppose this should be legal. So perhaps there is a codegen bug. With tot, I see sub $28, %esp. Maybe that's already fixed. B...

[LLVMdev] InsertElementInst and ExtractElementInst

2014 Jul 22

...3 x i32> vector in LLVM IR. Then I insert 3 instructions and later on I try to load one instruction from the vector. The insertion seems to work; however, when I try to load a specific instruction from the vector, it seems that it does not work. This is the part of my IR: %"ins or1" = insertelement <3 x i32> undef, i32 %38, i32 0 %"ins and2" = insertelement <3 x i32> undef, i32 %41, i32 1 %"ins xor3" = insertelement <3 x i32> undef, i32 %43, i32 2 %extract4 = extractelement <3 x i32> undef, i32 %35 ... store i32 %extract4, i32* %46, align 4 The out...
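In the quoted IR every `insertelement` takes `undef` as its vector operand, and the `extractelement` also reads from `undef`. Because `insertelement` produces a new value rather than modifying its operand, none of the three inserts builds on the previous one. The Python sketch below contrasts that with the chained form. It is one reading of the snippet, since the replies are not shown here, and the helper names are hypothetical models of the IR opcodes with `None` standing in for `undef`.

```python
# Hypothetical pure-Python model; vectors are tuples, None stands for undef.
UNDEF3 = (None,) * 3


def insertelement(vec, value, idx):
    """Return a NEW vector with lane `idx` replaced; `vec` is not modified."""
    return vec[:idx] + (value,) + vec[idx + 1:]


def unchained(a, b, c):
    """As in the quoted IR: each insert starts again from undef."""
    v1 = insertelement(UNDEF3, a, 0)
    v2 = insertelement(UNDEF3, b, 1)  # v1 is ignored
    v3 = insertelement(UNDEF3, c, 2)  # v2 is ignored
    return v3


def chained(a, b, c):
    """Each insert takes the previous result as its vector operand."""
    v1 = insertelement(UNDEF3, a, 0)
    v2 = insertelement(v1, b, 1)
    v3 = insertelement(v2, c, 2)
    return v3
```

In this model `unchained(1, 2, 3)` leaves lanes 0 and 1 undefined, while `chained(1, 2, 3)` holds all three values. A later extract would also need to read from the final chained value rather than from `undef`.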

[LLVMdev] Seg faulting on vector ops

2007 Jul 20

...In LLVM IR: ; ModuleID = 'test vectors' define float @vSelect3(float %x) { body: %pv = alloca <4 x float> ; <<4 x float>*> [#uses=1] %v = load <4 x float>* %pv ; <<4 x float>> [#uses=1] %v1 = insertelement <4 x float> %v, float %x, i32 0 ; <<4 x float>> [#uses=1] %v2 = insertelement <4 x float> %v1, float %x, i32 1 ; <<4 x float>> [#uses=1] %v3 = insertelement <4 x float> %v2, float %x, i32 2 ; <<4 x float>> [#uses=1]...

[LLVMdev] How to vectorize a vector type cast?

2012 Feb 28

..." and then through "opt -O2 -S", produces the following IR: define <4 x float> @to_float4(i32 %in.coerce) nounwind uwtable readnone { entry: %0 = bitcast i32 %in.coerce to <4 x i8> %1 = extractelement <4 x i8> %0, i32 0 %conv = uitofp i8 %1 to float %vecinit = insertelement <4 x float> undef, float %conv, i32 0 %2 = extractelement <4 x i8> %0, i32 1 %conv2 = uitofp i8 %2 to float %vecinit3 = insertelement <4 x float> %vecinit, float %conv2, i32 1 %3 = extractelement <4 x i8> %0, i32 2 %conv4 = uitofp i8 %3 to float %vecinit5 = inser...
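The IR above reinterprets a 32-bit integer as four bytes, converts each byte to a float as an unsigned value, and assembles the results with a chain of `insertelement` instructions. A rough Python model of the whole function follows. It assumes a little-endian target, so that lane 0 is the least significant byte, and it assumes the truncated snippet handles lane 3 the same way as lanes 0 to 2. The Python function name simply mirrors `@to_float4`.

```python
# Hypothetical model of @to_float4, assuming a little-endian target.

def to_float4(packed):
    """Split a 32-bit value into 4 unsigned bytes and widen each to float."""
    # bitcast i32 -> <4 x i8>: on little-endian, lane 0 is the low byte.
    lanes = (packed & 0xFFFFFFFF).to_bytes(4, "little")
    # uitofp i8 -> float, one lane at a time; bytes are already unsigned.
    return tuple(float(b) for b in lanes)
```

For example, `to_float4(0x04030201)` gives `(1.0, 2.0, 3.0, 4.0)` under the little-endian assumption. The thread asks how to get the four per-lane conversions expressed as a single vector cast.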

[LLVMdev] Bug in InsertElement constant propagation?

2015 Jan 15

...ntDataVector to allow that? Any hint on what would be the right fix otherwise? Thomas -----Original Message----- From: Jonathan Roelofs [mailto:jonathan at codesourcery.com] Sent: Wednesday, January 14, 2015 10:30 AM To: Raoux, Thomas F; LLVM Developers Mailing List Subject: Re: [LLVMdev] Bug in InsertElement constant propagation? On 1/14/15 11:12 AM, Raoux, Thomas F wrote: > Ha, here is what I was missing. Thanks Jon. It still seems to me that the transformation of LLVM IR is invalid, is that right? I don't know if IR is required to preserve NaN bit patterns, but ISTM that it would be better if...

[RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors

2020 Jan 11

...on the input vectors themselves. In the PR you linked, there is an example that shows the difference (simplified to <2 x double> for brevity): define dso_local <2 x double> @test(i64 %a, i64 %b) { entry: %conv = uitofp i64 %a to double %conv1 = uitofp i64 %b to double %vecinit = insertelement <2 x double> undef, double %conv, i32 0 %vecinit2 = insertelement <2 x double> %vecinit, double %conv1, i32 1 ret <2 x double> %vecinit2 } The inputs here are scalars so I suppose it is quite possible (perhaps likely) that on some targets, doing the insert with integers and t...

[LLVMdev] RegisterCoalescing Pass seems to ignore part of CFG.

2012 Oct 24

...@llvm.AMDGPU.reserve.reg(i32 0) call void @llvm.AMDGPU.reserve.reg(i32 1) call void @llvm.AMDGPU.reserve.reg(i32 2) call void @llvm.AMDGPU.reserve.reg(i32 3) %1 = call float @llvm.AMDGPU.load.const(i32 0) %2 = bitcast float %1 to i32 %3 = call float @llvm.AMDGPU.load.const(i32 4) %4 = insertelement <4 x float> undef, float %3, i32 0 %5 = call float @llvm.AMDGPU.load.const(i32 5) %6 = insertelement <4 x float> %4, float %5, i32 1 %7 = call float @llvm.AMDGPU.load.const(i32 6) %8 = insertelement <4 x float> %6, float %7, i32 2 %9 = call float @llvm.AMDGPU.load.const(...
