Hi David,

The MOVMSKPS instruction is cheap (2 cycles). Not to be confused with VMASKMOV, the AVX masked move, which is expensive.

One of the arguments for packing masks is that it reduces vector-registers pressure. Auto-vectorizing compilers maintain multiple masks for different execution paths (for each loop nesting, etc). Saving masks in xmm registers may result in vector-register pressure which will cause spilling of these registers. I agree with you that GP registers are also a precious resource. I am not sure what is the best way to store masks.

In my private branch, I added the [v4i1 .. v64i1] types. I also implemented a new type of target lowering: "PACK". This lowering packs vectors of i1s into integer registers. For example, the <4 x i1> type would get packed into the i8 type. I modified LegalizeTypes and LegalizeVectorTypes and added legalization for SETCC, XOR, OR, AND, and BUILD_VECTOR. I also changed the x86 lowering of SELECT to prevent lowering of selects with vector condition operand. Next, I am going to add new patterns for SETCC and SELECT which use i8/i16/i32/i64 as a condition value.

I also plan to experiment with promoting <4 x i1> to <4 x i32>. At this point I can't really say what needs to be done. Implementing this kind of promotion also requires adding legalization support for strange vector types such as <4 x i65>.

-Nadav

-----Original Message-----
From: David A. Greene [mailto:greened at obbligato.org]
Sent: Wednesday, March 09, 2011 21:59
To: Rotem, Nadav
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Vector select/compare support in LLVM

"Rotem, Nadav" <nadav.rotem at intel.com> writes:

> I can think of two ways to represent masks in x86: sparse and
> packed. In the sparse method, the masks are kept in <4 x 32bit>
> registers, which are mapped to xmm registers. This is the ‘native’ way
> of using masks.

This argues for the sparse representation, I think.

> _Sparse_
> After my discussion with Duncan, last week, I started working
> on the promotion of type <4 x i1> to <4 x i32>, and I ran into a
> problem. It looks like the codegen term ‘promote’ is overloaded.

Heavily. :-/

> For scalars, the ‘promote’ operation converts scalars to larger
> bit-width scalars. For vectors, the ‘promote’ operation widens the
> vector to the next power of two. This is reasonable for types such as
> ‘<3 x float>’. Maybe we need to add another legalization operation which
> will mean widening the vectors?

You mean widening the element type, correct? Yes, that's definitely a useful concept.

> In any case, I estimated that implementing this per-element promotion
> would require major changes and decided that this is not the way to
> go.

What major changes? I think this will end up giving much better code in the end. The pack/unpack operations could be very expensive.

There is another huge cost in using GPRs to hold masks. There will be fewer GPRs to hold addresses, which is a precious resource. We should avoid doing anything that uses more of that resource unnecessarily.

-Dave

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
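Editorial sketch of the packed representation discussed in this message. It is a portable C++ model, not LLVM code and not the contents of the private branch: the names `SparseMask4`, `packMask`, and `unpackMask` are hypothetical. It assumes a sparse mask holds one 32-bit lane per element (all-ones or all-zeros) and that packing takes the top bit of each lane, which is what MOVMSKPS does for four float lanes, so a <4 x i1> value ends up in the low four bits of an i8.

```cpp
#include <array>
#include <cstdint>

// Hypothetical model of a sparse <4 x i32> mask: one 32-bit lane per element.
using SparseMask4 = std::array<std::uint32_t, 4>;

// Pack: collect the top (sign) bit of each lane into the low 4 bits of a byte.
// This mirrors what MOVMSKPS computes for four 32-bit lanes.
inline std::uint8_t packMask(const SparseMask4 &m) {
  std::uint8_t packed = 0;
  for (unsigned i = 0; i < 4; ++i)
    packed |= static_cast<std::uint8_t>((m[i] >> 31) << i);
  return packed;
}

// Unpack: expand bit i of the packed byte back into an all-ones or all-zeros lane.
inline SparseMask4 unpackMask(std::uint8_t packed) {
  SparseMask4 m{};
  for (unsigned i = 0; i < 4; ++i)
    m[i] = ((packed >> i) & 1u) ? 0xFFFFFFFFu : 0u;
  return m;
}
```

In this model the AND, OR, and XOR of two packed masks are single integer operations on the byte, which is the register-pressure argument for packing; the unpack loop stands in for the cost Dave is worried about.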
"Rotem, Nadav" <nadav.rotem at intel.com> writes:> One of the arguments for packing masks is that it reduces > vector-registers pressure. Auto-vectorizing compilers maintain > multiple masks for different execution paths (for each loop nesting, > etc). Saving masks in xmm registers may result in vector-register > pressure which will cause spilling of these registers. I agree with > you that GP registers are also a precious resource.GPRs are more precious than vector registers in my experience. Spilling a vector register isn't that painful. Spilling a GPR holding an address is disastrous.> In my private branch, I added the [v4i1 .. v64i1] types. I also > implemented a new type of target lowering: "PACK". This lowering packsIs PACK in the X86 namespace? It seems a pretty target-specific thing.> I also plan to experiment with promoting <4 x i1> to <4 x i32>. At > this point I can't really say what needs to be done. Implementing > this kind of promotion also requires adding legalization support for > strange vector types such as <4 x i65>.How often do we see something like that? Baby steps, baby steps... :) -Dave
David,

The problem with the sparse representation is that it is word-width dependent. For 32-bit data types, the mask is the 32nd bit, while for 64-bit types the mask is the 64th bit. How would you legalize the mask for the following code?

  %mask = cmp nge <4 x float> %A, %B ; <4 x i1>
  %val = select <4 x i1> %mask, <4 x double> %X, %Y ; <4 x double>

Moreover, in some cases the generator of the mask and the consumer of the mask are in different basic blocks. The legalizer works on one basic block at a time. This makes it impossible for the legalizer to find the 'native' representation.

I wrote down some of the comments which were made in this email thread: http://wiki.llvm.org/Vector_select

Cheers,
Nadav

-----Original Message-----
From: David A. Greene [mailto:greened at obbligato.org]
Sent: Thursday, March 10, 2011 18:57
To: Rotem, Nadav
Cc: David A. Greene; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Vector select/compare support in LLVM

"Rotem, Nadav" <nadav.rotem at intel.com> writes:

> One of the arguments for packing masks is that it reduces
> vector-registers pressure. Auto-vectorizing compilers maintain
> multiple masks for different execution paths (for each loop nesting,
> etc). Saving masks in xmm registers may result in vector-register
> pressure which will cause spilling of these registers. I agree with
> you that GP registers are also a precious resource.

GPRs are more precious than vector registers in my experience. Spilling a vector register isn't that painful. Spilling a GPR holding an address is disastrous.

> In my private branch, I added the [v4i1 .. v64i1] types. I also
> implemented a new type of target lowering: "PACK". This lowering packs

Is PACK in the X86 namespace? It seems a pretty target-specific thing.

> I also plan to experiment with promoting <4 x i1> to <4 x i32>. At
> this point I can't really say what needs to be done. Implementing
> this kind of promotion also requires adding legalization support for
> strange vector types such as <4 x i65>.

How often do we see something like that? Baby steps, baby steps... :)

-Dave
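Editorial sketch of the word-width problem raised in Nadav's message above, where a mask produced by a <4 x float> compare is consumed by a <4 x double> select. This is a portable C++ model, not LLVM code: the names `compareNge`, `widenMask`, and `selectLanes` are hypothetical, and it assumes the consumer reads only the top bit of each lane. Under that assumption the mask bit sits at bit 31 when it is produced and must sit at bit 63 when it is consumed, so some per-lane widening step is needed in between.

```cpp
#include <array>
#include <cstdint>

using Mask4x32 = std::array<std::uint32_t, 4>;  // sparse mask for 32-bit elements
using Mask4x64 = std::array<std::uint64_t, 4>;  // sparse mask for 64-bit elements

// "nge" compare on floats: true where !(a >= b), producing all-ones 32-bit lanes.
inline Mask4x32 compareNge(const std::array<float, 4> &a,
                           const std::array<float, 4> &b) {
  Mask4x32 m{};
  for (unsigned i = 0; i < 4; ++i)
    m[i] = !(a[i] >= b[i]) ? 0xFFFFFFFFu : 0u;
  return m;
}

// Replicate the top bit of each 32-bit lane across a 64-bit lane
// (the effect of a per-lane sign extension), moving the mask bit to bit 63.
inline Mask4x64 widenMask(const Mask4x32 &m) {
  Mask4x64 out{};
  for (unsigned i = 0; i < 4; ++i)
    out[i] = (m[i] >> 31) ? ~std::uint64_t{0} : std::uint64_t{0};
  return out;
}

// Select on doubles, reading only the top bit (bit 63) of each mask lane.
inline std::array<double, 4> selectLanes(const Mask4x64 &mask,
                                         const std::array<double, 4> &x,
                                         const std::array<double, 4> &y) {
  std::array<double, 4> r{};
  for (unsigned i = 0; i < 4; ++i)
    r[i] = (mask[i] >> 63) ? x[i] : y[i];
  return r;
}
```

If the 32-bit lanes were merely zero-extended into 64-bit lanes, bit 63 would be clear in every lane and `selectLanes` would always pick `y`. That is the dependence on element width in this model; the sketch does not address the separate point that the producer and consumer may live in different basic blocks.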