Hi all,

I'm currently trying to figure out the best way to pass a vector of booleans to other functions. Take this small example:

define <4 x float> @vcmp_add(<4 x float> %a, <4 x float> %b) {
entry:
  %cmp = fcmp olt <4 x float> %a, %b
  %add = fadd <4 x float> %a, %b
  %sel = select <4 x i1> %cmp, <4 x float> %add, <4 x float> %a
  ret <4 x float> %sel
}

I will get (on SSE):

  movaps   %xmm0, %xmm2
  cmpltps  %xmm1, %xmm0
  addps    %xmm2, %xmm1
  blendvps %xmm1, %xmm2
  movaps   %xmm2, %xmm0
  ret

Great :)

But now, let us try to pass a mask to a function:

define <4 x float> @masked_add_1(<4 x i1> %mask, <4 x float> %a, <4 x float> %b) {
entry:
  %add = fadd <4 x float> %a, %b
  %sel = select <4 x i1> %mask, <4 x float> %add, <4 x float> %a
  ret <4 x float> %sel
}

I will get:

  addps    %xmm1, %xmm2
  pslld    $31, %xmm0
  blendvps %xmm2, %xmm1
  movaps   %xmm1, %xmm0
  ret

While this is correct and works, I'm unhappy with the pslld. Apparently, LLVM uses a <4 x i32> to hold the <4 x i1>, with the mask bit in the LSB of each element. But blendvps expects the mask bit in the MSB, hence the shift.

OK, let's try to do better. This time, I will use <4 x i32> directly:

define <4 x float> @masked_add_32(<4 x i32> %mask, <4 x float> %a, <4 x float> %b) {
entry:
  %add = fadd <4 x float> %a, %b
  %trunc = trunc <4 x i32> %mask to <4 x i1>
  %sel = select <4 x i1> %trunc, <4 x float> %add, <4 x float> %a
  ret <4 x float> %sel
}

But now I have to truncate the mask in order to use the select, so in the end LLVM produces the same code as above.

So what code do I have to write in order to get rid of the shift? If only there were a way to tell LLVM that each element of %mask is guaranteed to be 0xFFFFFFFF or 0x0...

Thanks,
Roland
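(A sketch of one possible workaround, not from the original message: keep the mask as <4 x i32> and rebuild the <4 x i1> with an icmp against zero instead of a trunc, so only the sign bit of each lane matters, which is exactly the bit blendvps reads. Whether the shift actually disappears depends on the backend recognizing the sign-bit test; the function name @masked_add_sign is invented for illustration.)

define <4 x float> @masked_add_sign(<4 x i32> %mask, <4 x float> %a, <4 x float> %b) {
entry:
  %add = fadd <4 x float> %a, %b
  ; true iff the MSB of the lane is set, i.e. a sign-bit test;
  ; blendvps only ever looks at this bit
  %cond = icmp slt <4 x i32> %mask, zeroinitializer
  %sel = select <4 x i1> %cond, <4 x float> %add, <4 x float> %a
  ret <4 x float> %sel
}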
Hi Roland,

> define <4 x float> @masked_add_1(<4 x i1> %mask, <4 x float> %a, <4 x float> %b) {
> entry:
>   %add = fadd <4 x float> %a, %b
>   %sel = select <4 x i1> %mask, <4 x float> %add, <4 x float> %a
>   ret <4 x float> %sel
> }
>
> I will get:
>
>   addps    %xmm1, %xmm2
>   pslld    $31, %xmm0
>   blendvps %xmm2, %xmm1
>   movaps   %xmm1, %xmm0
>   ret
>
> While this is correct and works, I'm unhappy with the pslld. Apparently,
> LLVM uses a <4 x i32> to hold the <4 x i1>, with the mask bit in the LSB
> of each element. But blendvps expects the mask bit in the MSB, hence the
> shift.

Try plunking a signext attribute on the mask parameter. That's supposed to tell the code generators that the caller passed in an all-zero or all-one value.

Ciao, Duncan.
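(For illustration of what that guarantee means on the caller side -- a sketch, with the helper name @make_mask invented here: sign-extending an i1 compare result widens each true lane to 0xFFFFFFFF and each false lane to 0x0, which is the invariant signext is meant to convey to the callee.)

define <4 x i32> @make_mask(<4 x float> %a, <4 x float> %b) {
entry:
  %cmp = fcmp olt <4 x float> %a, %b
  ; sext produces all-ones for true lanes, all-zero for false lanes
  %wide = sext <4 x i1> %cmp to <4 x i32>
  ret <4 x i32> %wide
}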
Hi Duncan,

thanks for the hint. I tried both variants:

define <4 x float> @masked_add_1(<4 x i1> signext %mask, <4 x float> %a, <4 x float> %b)
define <4 x float> @masked_add_32(<4 x i32> %mask, <4 x float> %a, <4 x float> %b)

Unfortunately, this raises an assertion:

  Wrong types for attribute: zeroext signext noalias nocapture sret byval nest

Should I file a bug report?

-- Roland

On Tuesday 26 February 2013 10:02:22 Duncan Sands wrote:
> Hi Roland,
>
> > define <4 x float> @masked_add_1(<4 x i1> %mask, <4 x float> %a,
> > <4 x float> %b) {
> > entry:
> >   %add = fadd <4 x float> %a, %b
> >   %sel = select <4 x i1> %mask, <4 x float> %add, <4 x float> %a
> >   ret <4 x float> %sel
> > }
> >
> > I will get:
> >
> >   addps    %xmm1, %xmm2
> >   pslld    $31, %xmm0
> >   blendvps %xmm2, %xmm1
> >   movaps   %xmm1, %xmm0
> >   ret
> >
> > While this is correct and works, I'm unhappy with the pslld. Apparently,
> > LLVM uses a <4 x i32> to hold the <4 x i1>, with the mask bit in the LSB
> > of each element. But blendvps expects the mask bit in the MSB, hence the
> > shift.
>
> Try plunking a signext attribute on the mask parameter. That's supposed to
> tell the code generators that the caller passed in an all-zero or all-one
> value.
>
> Ciao, Duncan.
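(One more hedged sketch, not from the thread: while the attribute is rejected, the all-ones/all-zero invariant can be encoded directly in the IR by writing the blend out as bitwise operations on the <4 x i32> mask. The name @masked_add_bitwise is invented; this assumes the caller really does pass only all-ones or all-zero lanes, otherwise the result mixes bits from both operands.)

define <4 x float> @masked_add_bitwise(<4 x i32> %mask, <4 x float> %a, <4 x float> %b) {
entry:
  %add = fadd <4 x float> %a, %b
  ; manual blend: (add & mask) | (a & ~mask)
  %addi    = bitcast <4 x float> %add to <4 x i32>
  %ai      = bitcast <4 x float> %a to <4 x i32>
  %sel.t   = and <4 x i32> %addi, %mask
  %notmask = xor <4 x i32> %mask, <i32 -1, i32 -1, i32 -1, i32 -1>
  %sel.f   = and <4 x i32> %ai, %notmask
  %ori     = or <4 x i32> %sel.t, %sel.f
  %sel     = bitcast <4 x i32> %ori to <4 x float>
  ret <4 x float> %sel
}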