Hey,
I am currently forced to create the BLENDVPS intrinsic as an external
call (via Intrinsic::x86_sse41_blendvps), which has the following
signature (from IntrinsicsX86.td):
def int_x86_sse41_blendvps :
      GCCBuiltin<"__builtin_ia32_blendvps">,
      Intrinsic<[llvm_v4f32_ty], [llvm_v4f32_ty, llvm_v4f32_ty,
                 llvm_v4f32_ty], [IntrNoMem]>;
Thus, it expects the mask (the first operand, if I recall correctly) to be a
<4 x float>.
It would be great to have this mirrored in the IR, meaning one should be
able to create a SelectInst with 3 <4 x float> operands which would
generate this intrinsic.
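For reference, the per-lane behaviour such a select would have to match can be modelled in scalar C++. This is only a sketch: blendvps_model and sign_bit_set are hypothetical names, and the (a, b, mask) argument order is assumed from the SSE4.1 _mm_blendv_ps intrinsic rather than taken from the .td definition above. BLENDVPS looks only at the sign bit of each mask lane, so any float with its sign bit set (including -0.0f) picks the second source.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

using V4F = std::array<float, 4>;

// True if the IEEE-754 sign bit of f is set (also true for -0.0f and for
// NaN bit patterns with the sign bit set). This is the only bit of each
// mask lane that BLENDVPS inspects.
inline bool sign_bit_set(float f) {
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);
  return (bits >> 31) != 0;
}

// Hypothetical scalar model of BLENDVPS: lane i takes b[i] when the sign
// bit of mask[i] is set, otherwise a[i]. The (a, b, mask) operand order is
// an assumption borrowed from _mm_blendv_ps.
inline V4F blendvps_model(const V4F& a, const V4F& b, const V4F& mask) {
  V4F r{};
  for (int i = 0; i < 4; ++i)
    r[i] = sign_bit_set(mask[i]) ? b[i] : a[i];
  return r;
}
```

Because only the sign bit matters, an all-ones lane produced by a vector compare (a NaN bit pattern with the sign bit set) also selects the second source, so a compare result can feed the mask directly.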
Is there anything that speaks against this?
I think I also recall something similar for ICmp/FCmp instructions...
Best,
Ralf
P.S. I am not up to date on the latest status of "direct" support for
vector instructions; the corresponding part of my system was
written over a year ago.
On 3/10/11 1:44 PM, Rotem, Nadav wrote:
> After I implemented a new type of legalization (the packing of i1 vectors),
> I found that x86 does not have a way to load packed masks into SSE registers.
> So, I guess that legalizing <4 x i1> to <4 x i32> is the way to go.
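A minimal sketch of what legalizing <4 x i1> to <4 x i32> amounts to per lane, assuming the promoted lanes are sign-extended (true becomes all ones), which matches the all-ones/all-zeros results of SSE vector compares. The helper names are hypothetical; this is a scalar model, not LLVM code.

```cpp
#include <array>
#include <cstdint>

using V4I1 = std::array<bool, 4>;
using V4I32 = std::array<std::uint32_t, 4>;

// Hypothetical model of promoting <4 x i1> to <4 x i32>: each lane is
// sign-extended, so true -> 0xFFFFFFFF and false -> 0x00000000.
inline V4I32 promote_v4i1(const V4I1& m) {
  V4I32 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = m[i] ? 0xFFFFFFFFu : 0u;
  return r;
}

// Going back down needs only one bit per lane (a truncate keeps bit 0);
// in a sign-extended lane every bit agrees, so bit 0 is as good as any.
inline V4I1 truncate_v4i32(const V4I32& v) {
  V4I1 m{};
  for (int i = 0; i < 4; ++i)
    m[i] = (v[i] & 1u) != 0;
  return m;
}
```

With every bit of a lane either set or clear, the promoted mask works both as a BLENDVPS mask (sign bit) and as an operand to full-width AND/OR/XOR.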
>
> Cheers,
> Nadav
>
> -----Original Message-----
> From: Rotem, Nadav
> Sent: Thursday, March 10, 2011 11:04
> To: 'David A. Greene'
> Cc: llvmdev at cs.uiuc.edu
> Subject: RE: [LLVMdev] Vector select/compare support in LLVM
>
> Hi David,
>
> The MOVMSKPS instruction is cheap (2 cycles). It should not be confused with
> VMASKMOV, the AVX masked move, which is expensive.
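MOVMSKPS is what turns a sparse mask into a packed one: it copies the sign bit of each of the four float lanes into the low four bits of a general-purpose register and zeroes the remaining bits. A scalar sketch of that behaviour (movmskps_model is a hypothetical name; the cycle count quoted above is not modelled):

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical scalar model of MOVMSKPS: bit i of the result is the sign
// bit of lane i; bits 4 and up are zero.
inline std::uint32_t movmskps_model(const std::array<float, 4>& v) {
  std::uint32_t result = 0;
  for (int i = 0; i < 4; ++i) {
    std::uint32_t bits;
    std::memcpy(&bits, &v[i], sizeof bits);  // reinterpret the float's bits
    result |= (bits >> 31) << i;             // move the sign bit to bit i
  }
  return result;
}
```

Lanes 0 and 2 of {-1.0f, 2.0f, -0.0f, 3.0f} have their sign bits set, so the model returns 0b0101.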
>
> One of the arguments for packing masks is that it reduces vector-register
> pressure. Auto-vectorizing compilers maintain multiple masks for different
> execution paths (for each loop nesting, etc.). Keeping masks in xmm registers
> may increase vector-register pressure and force these registers to be spilled.
> I agree with you that GP registers are also a precious resource.
> I am not sure what the best way to store masks is.
>
> In my private branch, I added the [v4i1 .. v64i1] types. I also implemented
> a new type of target lowering: "PACK". This lowering packs vectors of
> i1s into integer registers. For example, the <4 x i1> type would get
> packed into the i8 type. I modified LegalizeTypes and LegalizeVectorTypes and
> added legalization for SETCC, XOR, OR, AND, and BUILD_VECTOR. I also changed
> the x86 lowering of SELECT to prevent lowering of selects with a vector
> condition operand. Next, I am going to add new patterns for SETCC and SELECT
> which use i8/i16/i32/i64 as a condition value.
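A rough model of the "PACK" idea, assuming lane i of a <4 x i1> maps to bit i of the i8 (the message does not specify the bit order, so that is an assumption, as are all helper names). It also shows why legalizing AND/OR/XOR is cheap in this form: mask logic becomes ordinary integer logic on the packed value.

```cpp
#include <array>
#include <cstdint>

using V4I1 = std::array<bool, 4>;

// Hypothetical: pack lane i of a <4 x i1> into bit i of an i8.
// Bits 4..7 stay zero.
inline std::uint8_t pack_v4i1(const V4I1& m) {
  std::uint8_t p = 0;
  for (int i = 0; i < 4; ++i)
    if (m[i]) p |= static_cast<std::uint8_t>(1u << i);
  return p;
}

// Hypothetical inverse: read bit i back into lane i.
inline V4I1 unpack_v4i1(std::uint8_t p) {
  V4I1 m{};
  for (int i = 0; i < 4; ++i) m[i] = ((p >> i) & 1u) != 0;
  return m;
}

// Hypothetical select with a packed i8 condition: lane i takes t[i]
// when bit i of cond is set, otherwise f[i].
template <typename T>
std::array<T, 4> select_packed(std::uint8_t cond, const std::array<T, 4>& t,
                               const std::array<T, 4>& f) {
  std::array<T, 4> r{};
  for (int i = 0; i < 4; ++i) r[i] = ((cond >> i) & 1u) ? t[i] : f[i];
  return r;
}
```

For example, pack_v4i1(a) & pack_v4i1(b) equals pack_v4i1 of the lane-wise AND of a and b. A lane-wise NOT needs care: it has to be an XOR with 0x0F rather than ~, or the unused high bits of the i8 become set.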
>
> I also plan to experiment with promoting <4 x i1> to <4 x i32>.
> At this point I can't really say what needs to be done. Implementing this
> kind of promotion also requires adding legalization support for strange vector
> types such as <4 x i65>.
>
> -Nadav
>
>
>
> -----Original Message-----
> From: David A. Greene [mailto:greened at obbligato.org]
> Sent: Wednesday, March 09, 2011 21:59
> To: Rotem, Nadav
> Cc: llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] Vector select/compare support in LLVM
>
> "Rotem, Nadav" <nadav.rotem at intel.com> writes:
>
>> I can think of two ways to represent masks in x86: sparse and
>> packed. In the sparse method, the masks are kept in <4 x 32bit>
>> registers, which are mapped to xmm registers. This is the ‘native’ way
>> of using masks.
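To illustrate why the sparse form is the "native" one: when every mask lane is all ones or all zeros, a select is plain bitwise logic on the lanes, the classic AND/ANDN/OR blend that also works on SSE targets without BLENDVPS. A scalar sketch (select_sparse is a hypothetical name):

```cpp
#include <array>
#include <cstdint>

using V4U32 = std::array<std::uint32_t, 4>;

// Hypothetical sparse-mask select. Each mask lane is assumed to be either
// 0xFFFFFFFF or 0, so the bitwise blend picks whole lanes:
// (mask & t) keeps t where the mask is set, (~mask & f) keeps f elsewhere.
inline V4U32 select_sparse(const V4U32& mask, const V4U32& t, const V4U32& f) {
  V4U32 r{};
  for (int i = 0; i < 4; ++i)
    r[i] = (mask[i] & t[i]) | (~mask[i] & f[i]);
  return r;
}
```

If a lane held a partial mask (for example, only the sign bit set), this blend would mix bits from both sources, which is why it relies on the all-or-nothing lanes that vector compares produce.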
>
> This argues for the sparse representation, I think.
>
>> _Sparse_ After my discussion with Duncan last week, I started working
>> on the promotion of type <4 x i1> to <4 x i32>, and I ran into a
>> problem. It looks like the codegen term ‘promote’ is overloaded.
>
> Heavily. :-/
>
>> For scalars, the ‘promote’ operation converts scalars to larger
>> bit-width scalars. For vectors, the ‘promote’ operation widens the
>> vector to the next power of two. This is reasonable for types such as
>> ‘<3 x float>’. Maybe we need to add another legalization operation which
>> will mean widening the vectors?
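The two meanings of "promote" described above can be made concrete with a toy type descriptor. VecTy and both helpers are hypothetical illustrations, not the LLVM codegen API; they only show which dimension of the vector type each operation changes.

```cpp
// Hypothetical vector type descriptor: element count and element width.
struct VecTy {
  unsigned num_elts;
  unsigned elt_bits;
};

// Smallest power of two >= n (returns 1 for n == 0 or 1).
inline unsigned next_pow2(unsigned n) {
  unsigned p = 1;
  while (p < n) p <<= 1;
  return p;
}

// The vector "promote" described in the message: grow the element count
// to the next power of two, e.g. <3 x float> -> <4 x float>.
inline VecTy widen_elements(VecTy t) { return {next_pow2(t.num_elts), t.elt_bits}; }

// The per-element promotion under discussion: keep the count and widen
// each element, e.g. <4 x i1> -> <4 x i32>.
inline VecTy promote_elements(VecTy t, unsigned to_bits) { return {t.num_elts, to_bits}; }
```

Widening <4 x i1> by element count leaves it unchanged, since 4 is already a power of two; only the per-element operation produces the <4 x i32> that maps onto an xmm register.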
>
> You mean widening the element type, correct? Yes, that's definitely a
> useful concept.
>
>> In any case, I estimated that implementing this per-element promotion
>> would require major changes and decided that this is not the way to
>> go.
>
> What major changes? I think this will end up giving much better code in
> the end. The pack/unpack operations could be very expensive.
>
> There is another huge cost in using GPRs to hold masks. There will be
> fewer GPRs to hold addresses, which are a precious resource. We should
> avoid doing anything that uses more of that resource unnecessarily.
>
> -Dave
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev