Displaying 9 results from an estimated 9 matches for "blendvps".
2013 Feb 26
2
[LLVMdev] passing vector of booleans to functions
...t> %b) {
entry:
%cmp = fcmp olt <4 x float> %a, %b
%add = fadd <4 x float> %a, %b
%sel = select <4 x i1> %cmp, <4 x float> %add, <4 x float> %a
ret <4 x float> %sel
}
I will get (on SSE):
movaps %xmm0, %xmm2
cmpltps %xmm1, %xmm0
addps %xmm2, %xmm1
blendvps %xmm1, %xmm2
movaps %xmm2, %xmm0
ret
great :)
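For readers less familiar with the IR above, the per-lane meaning of the fcmp/fadd/select sequence can be sketched in plain C. This is only a scalar model for illustration; the name `select_add4` is hypothetical and does not come from the original post.

```c
/* Scalar model of the IR above, one lane at a time (illustrative sketch). */
static void select_add4(const float a[4], const float b[4], float out[4]) {
    for (int i = 0; i < 4; ++i) {
        /* fcmp olt: ordered less-than, so the result is false if either
           operand is NaN, which matches C's `<` on floats. */
        int cmp = a[i] < b[i];
        /* select: take the sum where the comparison held, else keep a. */
        out[i] = cmp ? a[i] + b[i] : a[i];
    }
}
```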
But now, let us try to pass a mask to a function.
define <4 x float> @masked_add_1(<4 x i1> %mask, <4 x float> %a, <4 x float> %b) {
entry:
%add = fadd <4 x float> %a, %b
%sel = select <4 x i1> %mask, <4 x...
2013 Feb 26
0
[LLVMdev] passing vector of booleans to functions
...%a, <4 x float>
%b) {
> entry:
> %add = fadd <4 x float> %a, %b
> %sel = select <4 x i1> %mask, <4 x float> %add, <4 x float> %a
> ret <4 x float> %sel
> }
>
> I will get:
>
> addps %xmm1, %xmm2
> pslld $31, %xmm0
> blendvps %xmm2, %xmm1
> movaps %xmm1, %xmm0
> ret
>
> While this is correct and works, I'm unhappy with the pslld. Apparently,
> LLVM uses a <4 x i32> to hold the <4 x i1> while the LSB holds the mask
> bit. But blendvps expects the MSB as mask bit and therefore the shi...
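The mismatch described in the quoted text can be modelled in scalar C. BLENDVPS selects on the most significant bit of each 32-bit mask lane, so a mask bit that arrives in bit 0 has to be moved up by 31 positions first. The sketch below assumes, as the post suggests, that the i1 sits in the LSB of a 32-bit lane; the function names are hypothetical.

```c
#include <stdint.h>

/* One lane of BLENDVPS: take `src` when the mask lane's most significant
   bit is set, otherwise keep `dst`. */
static float blendv_lane(float dst, float src, uint32_t mask) {
    return (mask & 0x80000000u) ? src : dst;
}

/* One lane of the generated sequence. The mask bit is assumed to arrive in
   bit 0 (per the post), so it is shifted left by 31 first, which is what
   `pslld $31` does for all four lanes at once. */
static float masked_add_lane(uint32_t mask_lsb, float a, float b) {
    uint32_t msb_mask = mask_lsb << 31;
    return blendv_lane(a, a + b, msb_mask);
}
```

Without the shift, a lane whose mask is 1 would leave the MSB clear and the blend would keep `a`, which is why the extra instruction shows up.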
2013 Feb 26
1
[LLVMdev] passing vector of booleans to functions
...add <4 x float> %a, %b
> > %sel = select <4 x i1> %mask, <4 x float> %add, <4 x float> %a
> > ret <4 x float> %sel
> >
> > }
> >
> > I will get:
> >
> > addps %xmm1, %xmm2
> > pslld $31, %xmm0
> > blendvps %xmm2, %xmm1
> > movaps %xmm1, %xmm0
> > ret
> >
> > While this is correct and works, I'm unhappy with the pslld. Apparently,
> > LLVM uses a <4 x i32> to hold the <4 x i1> while the LSB holds the mask
> > bit. But blendvps expects the MSB as...
2011 Jun 01
4
[LLVMdev] AVX Status?
...a,
<8 x float> %b, i8 1) nounwind readnone
%res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float>
%a, <8 x float> %b, <8 x float> %cmp) nounwind readnone
ret <8 x float> %res
}
This works fine and produces the expected assembly (VCMPLTPS + VBLENDVPS).
On the other hand, this does not work:
define <8 x float> @test2(<8 x float> %a, <8 x float> %b, <8 x i32> %m)
nounwind readnone {
entry:
%cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a,
<8 x float> %b, i8 1) nounwind readnone...
2011 Jun 02
0
[LLVMdev] AVX Status?
...1) nounwind readnone
> %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float>
> %a, <8 x float> %b, <8 x float> %cmp) nounwind readnone
> ret <8 x float> %res
> }
>
> This works fine and produces the expected assembly (VCMPLTPS + VBLENDVPS).
>
> On the other hand, this does not work:
>
> define <8 x float> @test2(<8 x float> %a, <8 x float> %b, <8 x i32> %m)
> nounwind readnone {
> entry:
> %cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x float> %a,
> <8 x fl...
2011 Mar 10
0
[LLVMdev] Vector select/compare support in LLVM
Hey,
I am currently forced to create the BLENDVPS intrinsic as an external
call (via Intrinsic::x86_sse41_blendvps) which has the following
signature (from IntrinsicsX86.td):
def int_x86_sse41_blendvps :
GCCBuiltin<"__builtin_ia32_blendvps">,
Intrinsic<[llvm_v4f32_ty],[llvm_v4f32_ty, llvm_v4f32_ty,
llvm_v4f32_ty],[IntrNoMem]...
2011 Mar 10
2
[LLVMdev] Vector select/compare support in LLVM
After I implemented a new type of legalization (the packing of i1 vectors), I found that x86 does not have a way to load packed masks into SSE registers. So, I guess that legalizing of <4 x i1> to <4 x i32> is the way to go.
Cheers,
Nadav
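As an illustration of the two mask representations discussed above, the following C sketch converts between a packed mask (one bit per lane) and a widened mask (one 32-bit lane per element). It is a scalar model only; the function names are hypothetical, and the all-ones encoding for a true lane is an assumption made for this sketch.

```c
#include <stdint.h>

/* Expand a packed mask (bit i belongs to lane i) into four 32-bit lanes:
   all-ones for true, zero for false. This is the direction for which the
   message says SSE has no direct load. */
static void unpack_mask4(uint8_t packed, uint32_t lanes[4]) {
    for (int i = 0; i < 4; ++i)
        lanes[i] = ((packed >> i) & 1u) ? 0xFFFFFFFFu : 0u;
}

/* Inverse direction: collect the most significant bit of each lane into a
   packed mask, which is the operation MOVMSKPS performs. */
static uint8_t pack_mask4(const uint32_t lanes[4]) {
    uint8_t packed = 0;
    for (int i = 0; i < 4; ++i)
        packed |= (uint8_t)((lanes[i] >> 31) << i);
    return packed;
}
```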
-----Original Message-----
From: Rotem, Nadav
Sent: Thursday, March 10, 2011 11:04
To: 'David A. Greene'
Cc: llvmdev at cs.uiuc.edu
2011 Jun 03
1
[LLVMdev] AVX Status?
...> %res = tail call <8 x float> @llvm.x86.avx.blendv.ps.256(<8 x float>
>> %a, <8 x float> %b, <8 x float> %cmp) nounwind readnone
>> ret <8 x float> %res
>> }
>>
>> This works fine and produces the expected assembly (VCMPLTPS + VBLENDVPS).
>>
>> On the other hand, this does not work:
>>
>> define <8 x float> @test2(<8 x float> %a, <8 x float> %b, <8 x i32> %m)
>> nounwind readnone {
>> entry:
>> %cmp = tail call <8 x float> @llvm.x86.avx.cmp.ps.256(<8 x...
2011 Jun 03
2
[LLVMdev] AVX Status?
...> }
>
> That would be nice indeed
Some lowering code would be needed to convert from i1 masks to i8 masks
(the so-called packed vs. sparse mask issue). I don't think I've added
anything to do this as our vectorizer doesn't generate code this way.
>> -> VCMPPS, VANDPS, BLENDVPS
>>
>> Nadav Rotem sent around a patch a few weeks ago in which he implemented
>> codegen for the select for SSE, unfortunately I did not have time to
>> look at it in more depth so far.
>>
>> Can anybody comment on the current status of AVX?
>
> No codegen...