Hi all,
I'm currently trying to figure out the best way to pass a vector of
booleans to other functions. Take this small example:
define <4 x float> @vcmp_add(<4 x float> %a, <4 x float> %b) {
entry:
%cmp = fcmp olt <4 x float> %a, %b
%add = fadd <4 x float> %a, %b
%sel = select <4 x i1> %cmp, <4 x float> %add, <4 x float> %a
ret <4 x float> %sel
}
I will get (on SSE):
movaps %xmm0, %xmm2
cmpltps %xmm1, %xmm0
addps %xmm2, %xmm1
blendvps %xmm1, %xmm2
movaps %xmm2, %xmm0
ret
great :)
But now, let us try to pass a mask to a function.
define <4 x float> @masked_add_1(<4 x i1> %mask, <4 x float> %a, <4 x float> %b) {
entry:
%add = fadd <4 x float> %a, %b
%sel = select <4 x i1> %mask, <4 x float> %add, <4 x float> %a
ret <4 x float> %sel
}
I will get:
addps %xmm1, %xmm2
pslld $31, %xmm0
blendvps %xmm2, %xmm1
movaps %xmm1, %xmm0
ret
While this is correct and works, I'm unhappy with the pslld. Apparently,
LLVM uses a <4 x i32> to hold the <4 x i1>, with the LSB holding the mask
bit. But blendvps expects the MSB as the mask bit, hence the shift.
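To make the LSB/MSB mismatch concrete, here is a minimal scalar sketch in C of a single 32-bit lane. It is only a model of the two instructions, not LLVM's lowering; the function names are hypothetical, and the assumption that the boolean sits in bit 0 is taken from the observation above.

```c
#include <stdint.h>

/* Scalar model of one 32-bit lane of SSE4.1 blendvps: the lane is taken
   from `src` when bit 31 of the mask lane is set, otherwise `dst` is kept.
   No other mask bit is examined. */
static uint32_t blendv_lane(uint32_t dst, uint32_t src, uint32_t mask) {
    return (mask & 0x80000000u) ? src : dst;
}

/* Scalar model of pslld $31 on one lane: moves bit 0 into bit 31. */
static uint32_t pslld31_lane(uint32_t mask) {
    return mask << 31;
}

/* Hypothetical helper: a select whose boolean lives in bit 0 of the mask
   lane (assumed layout of the <4 x i1> argument). The shift is what makes
   the bit visible to the blend. */
static uint32_t select_lsb_mask(uint32_t mask, uint32_t if_true,
                                uint32_t if_false) {
    return blendv_lane(if_false, if_true, pslld31_lane(mask));
}
```

Calling blendv_lane directly with a mask of 1 keeps the "false" operand, because bit 31 is clear; that is the case the shift exists to handle.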
OK, let's try better. This time, I will directly use <4 x i32>:
define <4 x float> @masked_add_32(<4 x i32> %mask, <4 x float> %a, <4 x float> %b) {
entry:
%add = fadd <4 x float> %a, %b
%trunc = trunc <4 x i32> %mask to <4 x i1>
%sel = select <4 x i1> %trunc, <4 x float> %add, <4 x float> %a
ret <4 x float> %sel
}
But damn, I have to truncate the mask in order to use the select, so in
the end LLVM produces the same code as above. What code do I have to
use in order to get rid of the shift?
If only there were a way to tell LLVM that each element of %mask is
guaranteed to be 0xFFFFFFFF or 0x0...
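A small scalar sketch in C of why that guarantee matters: truncating to i1 keeps only bit 0, while an unshifted blend would look at bit 31, so the two agree only for some lane values. The helper names are hypothetical and this models a single lane, not what the code generator actually does.

```c
#include <stdint.h>

/* Scalar model of trunc i32 -> i1: only bit 0 survives. */
static int trunc_to_i1(uint32_t lane) {
    return (int)(lane & 1u);
}

/* The bit an unshifted blendvps would test in the same lane: bit 31. */
static int msb_of(uint32_t lane) {
    return (int)(lane >> 31);
}

/* The shift can be dropped for a lane exactly when both tests agree. */
static int shift_is_redundant(uint32_t lane) {
    return trunc_to_i1(lane) == msb_of(lane);
}
```

For 0x0 and 0xFFFFFFFF the two tests agree, so the shift would be redundant. For a lane such as 0x00000001 or 0x80000000 they disagree, which is why the shift cannot be dropped without extra knowledge about %mask.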
Thanks,
Roland
Hi Roland,
> define <4 x float> @masked_add_1(<4 x i1> %mask, <4 x float> %a, <4 x float> %b) {
> entry:
> %add = fadd <4 x float> %a, %b
> %sel = select <4 x i1> %mask, <4 x float> %add, <4 x float> %a
> ret <4 x float> %sel
> }
>
> I will get:
>
> addps %xmm1, %xmm2
> pslld $31, %xmm0
> blendvps %xmm2, %xmm1
> movaps %xmm1, %xmm0
> ret
>
> While this is correct and works, I'm unhappy with the pslld. Apparently,
> LLVM uses a <4 x i32> to hold the <4 x i1> while the LSB holds the mask
> bit. But blendvps expects the MSB as mask bit and therefore the shift.
Try plunking a signext attribute on the mask parameter. That's supposed to
tell the code generators that the caller passed in an all-zero or all-one
value.
Ciao, Duncan.
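As an illustration of what the suggestion relies on, here is a minimal scalar sketch in C comparing zero-extension and sign-extension of a one-bit boolean into a 32-bit lane. The helper names are hypothetical, and the sketch says nothing about whether LLVM accepts signext on a vector parameter.

```c
#include <stdint.h>

/* Zero-extension of a 1-bit boolean: true becomes 0x00000001. */
static uint32_t zext_i1(int b) {
    return b ? 1u : 0u;
}

/* Sign-extension of a 1-bit boolean: true becomes 0xFFFFFFFF. */
static uint32_t sext_i1(int b) {
    return b ? 0xFFFFFFFFu : 0u;
}

/* Bit 31 is the only bit blendvps looks at in each mask lane. */
static int msb_set(uint32_t lane) {
    return (lane >> 31) != 0;
}
```

With the sign-extended form, the top bit already carries the boolean, so a blend could read it without a preceding shift; with the zero-extended form it cannot.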
Hi Duncan,
thanks for the hint. I tried both variants:
define <4 x float> @masked_add_1(<4 x i1> signext %mask, <4 x float> %a, <4 x float> %b)
define <4 x float> @masked_add_32(<4 x i32> %mask, <4 x float> %a, <4 x float> %b)
Unfortunately, this will raise an assertion:
Wrong types for attribute: zeroext signext noalias nocapture sret byval nest
Should I file a bug report?
-- Roland
On Tuesday 26 February 2013 10:02:22 Duncan Sands wrote:
> try plunking a signext attribute on the mask parameter. That's supposed to
> tell the code generators that the caller passed in an all-zero or all-one
> value.
>
> Ciao, Duncan.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev