thr3ads.net - search: "blendvps"

Displaying 9 results from an estimated 9 matches for "blendvps".

Did you mean: blendps

[LLVMdev] passing vector of booleans to functions

2013 Feb 26

[LLVMdev] passing vector of booleans to functions

...t> %b) { entry: %cmp = fcmp olt <4 x float> %a, %b %add = fadd <4 x float> %a, %b %sel = select <4 x i1> %cmp, <4 x float> %add, <4 x float> %a ret <4 x float> %sel } I will get (on SSE): movaps %xmm0, %xmm2 cmpltps %xmm1, %xmm0 addps %xmm2, %xmm1 blendvps %xmm1, %xmm2 movaps %xmm2, %xmm0 ret great :) But now, let us try to pass a mask to a function. define <4 x float> @masked_add_1(<4 x i1> %mask, <4 x float> %a, <4 x float> %b) { entry: %add = fadd <4 x float> %a, %b %sel = select <4 x i1> %mask, <4 x...

[LLVMdev] passing vector of booleans to functions

2013 Feb 26

[LLVMdev] passing vector of booleans to functions

...%a, <4 x float> %b) { > entry: > %add = fadd <4 x float> %a, %b > %sel = select <4 x i1> %mask, <4 x float> %add, <4 x float> %a > ret <4 x float> %sel > } > > I will get: > > addps %xmm1, %xmm2 > pslld $31, %xmm0 > blendvps %xmm2, %xmm1 > movaps %xmm1, %xmm0 > ret > > While this is correct and works, I'm unhappy with the pssld. Apparently, > LLVM uses a <4 x i32> to hold the <4 x i1> while the LSB holds the mask > bit. But blendvps expects the MSB as mask bit and therefore the shi...

[LLVMdev] passing vector of booleans to functions

2013 Feb 26

[LLVMdev] passing vector of booleans to functions

...add <4 x float> %a, %b > > %sel = select <4 x i1> %mask, <4 x float> %add, <4 x float> %a > > ret <4 x float> %sel > > > > } > > > > I will get: > > > > addps %xmm1, %xmm2 > > pslld $31, %xmm0 > > blendvps %xmm2, %xmm1 > > movaps %xmm1, %xmm0 > > ret > > > > While this is correct and works, I'm unhappy with the pssld. Apparently, > > LLVM uses a <4 x i32> to hold the <4 x i1> while the LSB holds the mask > > bit. But blendvps expects the MSB as...

[LLVMdev] AVX Status?

2011 Jun 01

[LLVMdev] AVX Status?

...a, <8 x float> %b, i8 1) nounwind readnone %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> %a, <8 x float> %b, <8 x float> %cmp) nounwind readnone ret <8 x float> %res } This works fine and produces the expected assembly (VCMPLTPS + VBLENDVPS). On the other hand, this does not work: define <8 x float> @test2(<8 x float> %a, <8 x float> %b, <8 x i32> %m) nounwind readnone { entry: %cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a, <8 x float> %b, i8 1) nounwind readnone...

[LLVMdev] AVX Status?

2011 Jun 02

[LLVMdev] AVX Status?

...1) nounwind readnone > %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> > %a, <8 x float> %b, <8 x float> %cmp) nounwind readnone > ret <8 x float> %res > } > > This works fine and produces the expected assembly (VCMPLTPS + VBLENDVPS). > > On the other hand, this does not work: > > define <8 x float> @test2(<8 x float> %a, <8 x float> %b, <8 x i32> %m) > nounwind readnone { > entry: > %cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a, > <8 x fl...

[LLVMdev] Vector select/compare support in LLVM

2011 Mar 10

[LLVMdev] Vector select/compare support in LLVM

Hey, I am currently forced to create the BLENDVPS intrinsic as an external call (via Intrinsic::x86_sse41_blendvps) which has the following signature (from IntrinsicsX86.td): def int_x86_sse41_blendvps : GCCBuiltin<"__builtin_ia32_blendvps">, Intrinsic<[llvm_v4f32_ty],[llvm_v4f32_ty, llvm_v4f32_ty, llvm_v4f32_ty],[IntrNoMem]...

[LLVMdev] Vector select/compare support in LLVM

2011 Mar 10

[LLVMdev] Vector select/compare support in LLVM

After I implemented a new type of legalization (the packing of i1 vectors), I found that x86 does not have a way to load packed masks into SSE registers. So, I guess that legalizing of <4 x i1> to <4 x i32> is the way to go. Cheers, Nadav -----Original Message----- From: Rotem, Nadav Sent: Thursday, March 10, 2011 11:04 To: 'David A. Greene' Cc: llvmdev at cs.uiuc.edu

[LLVMdev] AVX Status?

2011 Jun 03

[LLVMdev] AVX Status?

...> %res = tail call<8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float> >> %a,<8 x float> %b,<8 x float> %cmp) nounwind readnone >> ret<8 x float> %res >> } >> >> This works fine and produces the expected assembly (VCMPLTPS + VBLENDVPS). >> >> On the other hand, this does not work: >> >> define<8 x float> @test2(<8 x float> %a,<8 x float> %b,<8 x i32> %m) >> nounwind readnone { >> entry: >> %cmp = tail call<8 x float> @llvm.x86.avx.cmp.ps.256(<8 x...

[LLVMdev] AVX Status?

2011 Jun 03

[LLVMdev] AVX Status?

...t; } > > That would be nice indeed Some lowering code would be needed to convert from i1 masks to i8 masks (the so-called packed vs. sparse mask issue). I don't think I've added anything to do this as our vectorizer doesn't generate code this way. >> -> VCMPPS, VANDPS, BLENDVPS >> >> Nadav Rotem sent around a patch a few weeks ago in which he implemented >> codegen for the select for SSE, unfortunately I did not have time to >> look at it in more depth so far. >> >> Can anybody comment on the current status of AVX? > > No codegen...

search for: blendvps