Hi all,
I'm currently trying to figure out the best way to pass a vector of
booleans to other functions. Take this small example:
define <4 x float> @vcmp_add(<4 x float> %a, <4 x float> %b) {
entry:
  %cmp = fcmp olt <4 x float> %a, %b
  %add = fadd <4 x float> %a, %b
  %sel = select <4 x i1> %cmp, <4 x float> %add, <4 x float> %a
  ret <4 x float> %sel
}
I will get (on SSE):
	movaps	%xmm0, %xmm2
	cmpltps	%xmm1, %xmm0
	addps	%xmm2, %xmm1
	blendvps	%xmm1, %xmm2
	movaps	%xmm2, %xmm0
	ret
great :)
But now, let us try to pass a mask to a function.
define <4 x float> @masked_add_1(<4 x i1> %mask, <4 x float> %a, <4 x float> %b) {
entry:
  %add = fadd <4 x float> %a, %b
  %sel = select <4 x i1> %mask, <4 x float> %add, <4 x float> %a
  ret <4 x float> %sel
}
I will get:
addps   %xmm1, %xmm2
pslld   $31, %xmm0
blendvps    %xmm2, %xmm1
movaps  %xmm1, %xmm0
ret
While this is correct and works, I'm unhappy with the pslld. Apparently,
LLVM uses a <4 x i32> to hold the <4 x i1>, with the LSB of each lane
holding the mask bit. But blendvps expects the mask bit in the MSB, hence
the shift.
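To spell out the mismatch, the following IR is a rough sketch of what the generated code effectively computes when the mask bit lives in bit 0 of each 32-bit lane. The function and value names are invented for illustration; this is only an IR-level picture of the pslld/blendvps pair, not what LLVM builds internally.

```llvm
; Hypothetical illustration: selecting on bit 0 of each lane with an
; instruction that only looks at bit 31.
define <4 x float> @blend_by_lsb(<4 x i32> %lanes, <4 x float> %x, <4 x float> %y) {
entry:
  ; Move bit 0 of every lane into bit 31 (the job done by pslld $31).
  %shifted = shl <4 x i32> %lanes, <i32 31, i32 31, i32 31, i32 31>
  ; blendvps picks per lane on bit 31, i.e. on "lane is negative".
  %msb = icmp slt <4 x i32> %shifted, zeroinitializer
  %sel = select <4 x i1> %msb, <4 x float> %x, <4 x float> %y
  ret <4 x float> %sel
}
```

If every lane is already all-ones or all-zeros, bit 0 and bit 31 agree and the shl changes nothing about the selection, which is why the shift looks redundant in that case.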
OK, let's try to do better. This time, I will use <4 x i32> directly:
define <4 x float> @masked_add_32(<4 x i32> %mask, <4 x float> %a, <4 x float> %b) {
entry:
  %add = fadd <4 x float> %a, %b
  %trunc = trunc <4 x i32> %mask to <4 x i1>
  %sel = select <4 x i1> %trunc, <4 x float> %add, <4 x float> %a
  ret <4 x float> %sel
}
But damn, I have to truncate the mask in order to use the select, so in
the end LLVM produces the same code as above. What code do I have to
write to get rid of the shift?
If only there were a way to tell LLVM that each element of %mask
is guaranteed to be either 0xFFFFFFFF or 0x0...
Thanks,
Roland
Hi Roland,
> define <4 x float> @masked_add_1(<4 x i1> %mask, <4 x float> %a, <4 x float> %b) {
> entry:
>    %add = fadd <4 x float> %a, %b
>    %sel = select <4 x i1> %mask, <4 x float> %add, <4 x float> %a
>    ret <4 x float> %sel
> }
>
> I will get:
>
> addps   %xmm1, %xmm2
> pslld   $31, %xmm0
> blendvps    %xmm2, %xmm1
> movaps  %xmm1, %xmm0
> ret
>
> While this is correct and works, I'm unhappy with the pslld. Apparently,
> LLVM uses a <4 x i32> to hold the <4 x i1> while the LSB holds the mask
> bit. But blendvps expects the MSB as mask bit and therefore the shift.
Try plunking a signext attribute on the mask parameter.  That's supposed to tell
the code generators that the caller passed in an all-zero or all-one value.
Ciao, Duncan.
Hi Duncan,
thanks for the hint. I tried both variants:
define <4 x float> @masked_add_1(<4 x i1> signext %mask, <4 x float> %a, <4 x float> %b)
define <4 x float> @masked_add_32(<4 x i32> %mask, <4 x float> %a, <4 x float> %b)
Unfortunately, this will raise an assertion:
Wrong types for attribute: zeroext signext noalias nocapture sret byval nest
Should I file a bug report?
-- Roland
On Tuesday 26 February 2013 10:02:22 Duncan Sands wrote:
> Hi Roland,
>
> > define <4 x float> @masked_add_1(<4 x i1> %mask, <4 x float> %a, <4 x float> %b) {
> > entry:
> >   %add = fadd <4 x float> %a, %b
> >   %sel = select <4 x i1> %mask, <4 x float> %add, <4 x float> %a
> >   ret <4 x float> %sel
> > }
> >
> > I will get:
> >
> > addps %xmm1, %xmm2
> > pslld $31, %xmm0
> > blendvps %xmm2, %xmm1
> > movaps %xmm1, %xmm0
> > ret
> >
> > While this is correct and works, I'm unhappy with the pslld. Apparently,
> > LLVM uses a <4 x i32> to hold the <4 x i1> while the LSB holds the mask
> > bit. But blendvps expects the MSB as mask bit and therefore the shift.
>
> try plunking a signext attribute on the mask parameter. That's supposed to
> tell the code generators that the caller passed in an all-zero or all-one
> value.
>
> Ciao, Duncan.
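Since signext is rejected on vector types, one variation that might be worth trying, offered only as a sketch: keep the mask as <4 x i32> and, instead of truncating, test the sign bit of each lane with an icmp slt against zero. For lanes that are guaranteed to be 0xFFFFFFFF or 0x0 this selects exactly the same lanes as the trunc version, and because blendvps itself only looks at the sign bit, the backend at least has the chance to feed %mask to it directly. The function name @masked_add_signbit is made up here, and whether the pslld really goes away depends on the LLVM version, so the llc output needs to be checked.

```llvm
; Hypothetical variant: assumes the caller passes 0xFFFFFFFF or 0x0 in
; every lane of %mask.
define <4 x float> @masked_add_signbit(<4 x i32> %mask, <4 x float> %a, <4 x float> %b) {
entry:
  %add = fadd <4 x float> %a, %b
  ; True for a lane exactly when bit 31 of that lane is set.
  %cond = icmp slt <4 x i32> %mask, zeroinitializer
  %sel = select <4 x i1> %cond, <4 x float> %add, <4 x float> %a
  ret <4 x float> %sel
}
```

A caller holding a <4 x i1> from fcmp can produce such a mask with sext <4 x i1> %cmp to <4 x i32>, which yields all-ones or all-zeros per lane.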