Displaying 20 results from an estimated 10000 matches similar to: "[LLVMdev] Vector select/compare support in LLVM"
2011 Mar 09
0
[LLVMdev] Vector select/compare support in LLVM
"Rotem, Nadav" <nadav.rotem at intel.com> writes:
> I can think of two ways to represent masks in x86: sparse and
> packed. In the sparse method, the masks are kept in <4 x 32bit>
> registers, which are mapped to xmm registers. This is the ‘native’ way
> of using masks.
This argues for the sparse representation, I think.
> _Sparse_ After my discussion with
2011 Mar 10
2
[LLVMdev] Vector select/compare support in LLVM
After I implemented a new type of legalization (the packing of i1 vectors), I found that x86 does not have a way to load packed masks into SSE registers. So, I guess that legalizing of <4 x i1> to <4 x i32> is the way to go.
Cheers,
Nadav
-----Original Message-----
From: Rotem, Nadav
Sent: Thursday, March 10, 2011 11:04
To: 'David A. Greene'
Cc: llvmdev at cs.uiuc.edu
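The two mask representations discussed in this thread can be sketched in portable C++. This is only an illustrative model, not LLVM or intrinsic code: the names SparseMask, compare_lt, pack_mask and unpack_mask are made up for the example. The sparse form keeps one full 32-bit lane per element, the packed form keeps one bit per element, and pack_mask gathers the top bit of each lane in the way MOVMSKPS does. Going back from packed to sparse has to be done lane by lane here, which mirrors the point above that there is no direct way to load a packed mask into an SSE register.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sparse mask (hypothetical name): one 32-bit lane per element,
// all ones for true and all zeros for false.
using SparseMask = std::array<std::uint32_t, 4>;

// Lane-wise "less than" compare that produces a sparse mask.
SparseMask compare_lt(const std::array<float, 4>& a,
                      const std::array<float, 4>& b) {
    SparseMask m{};
    for (std::size_t i = 0; i < 4; ++i)
        m[i] = (a[i] < b[i]) ? 0xFFFFFFFFu : 0u;
    return m;
}

// Packed mask: gather the top bit of every lane into the low 4 bits
// of a scalar, which is the effect of MOVMSKPS on an xmm register.
std::uint8_t pack_mask(const SparseMask& m) {
    std::uint8_t bits = 0;
    for (std::size_t i = 0; i < 4; ++i)
        bits = static_cast<std::uint8_t>(bits | ((m[i] >> 31) << i));
    return bits;
}

// Expanding a packed mask back to the sparse form, one lane at a time.
SparseMask unpack_mask(std::uint8_t bits) {
    SparseMask m{};
    for (std::size_t i = 0; i < 4; ++i)
        m[i] = ((bits >> i) & 1u) ? 0xFFFFFFFFu : 0u;
    return m;
}
```

For example, comparing {1, 5, 2, 8} against {2, 4, 3, 7} sets lanes 0 and 2, so the packed form is the bit pattern 0101.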
2011 Mar 10
2
[LLVMdev] Vector select/compare support in LLVM
Hi David,
The MOVMSKPS instruction is cheap (2 cycles). Not to be confused with VMASKMOV, the AVX masked move, which is expensive.
One of the arguments for packing masks is that it reduces vector-register pressure. Auto-vectorizing compilers maintain multiple masks for different execution paths (for each loop nesting, etc.). Saving masks in xmm registers may result in vector-register
2011 Mar 10
0
[LLVMdev] Vector select/compare support in LLVM
"Rotem, Nadav" <nadav.rotem at intel.com> writes:
> One of the arguments for packing masks is that it reduces
> vector-register pressure. Auto-vectorizing compilers maintain
> multiple masks for different execution paths (for each loop nesting,
> etc). Saving masks in xmm registers may result in vector-register
> pressure which will cause spilling of these
2011 Mar 10
0
[LLVMdev] Vector select/compare support in LLVM
Hey,
I am currently forced to create the BLENDVPS intrinsic as an external
call (via Intrinsic::x86_sse41_blendvps) which has the following
signature (from IntrinsicsX86.td):
def int_x86_sse41_blendvps :
GCCBuiltin<"__builtin_ia32_blendvps">,
Intrinsic<[llvm_v4f32_ty],[llvm_v4f32_ty, llvm_v4f32_ty,
llvm_v4f32_ty],[IntrNoMem]>
Thus, it expects the mask (first operand if
2011 Mar 14
1
[LLVMdev] Vector select/compare support in LLVM
David,
The problem with the sparse representation is that it is word-width dependent. For 32-bit data types, the mask is the 32nd bit, while for 64-bit types the mask is the 64th bit.
How would you legalize the mask for the following code ?
%mask = cmp nge <4 x float> %A, %B ; <4 x i1>
%val = select <4 x i1> %mask, <4 x double> %X, %Y ; <4 x
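The word-width dependence raised here can be modelled in portable C++. The sketch below is an assumption-laden illustration, not how any LLVM legalizer is implemented, and widen_mask and select_f64 are hypothetical names. A compare on <4 x float> yields 32-bit mask lanes whose significant bit is bit 31, while a select on <4 x double> needs 64-bit lanes whose significant bit is bit 63, so each lane has to be sign-extended before the select.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sign-extend each 32-bit sparse mask lane to a 64-bit lane so that
// the mask bit moves from bit 31 to bit 63.
std::array<std::uint64_t, 4> widen_mask(const std::array<std::uint32_t, 4>& m) {
    std::array<std::uint64_t, 4> w{};
    for (std::size_t i = 0; i < 4; ++i)
        w[i] = (m[i] >> 31) ? ~std::uint64_t{0} : std::uint64_t{0};
    return w;
}

// Lane-wise select on doubles, driven by the top bit of each 64-bit lane.
std::array<double, 4> select_f64(const std::array<std::uint64_t, 4>& mask,
                                 const std::array<double, 4>& x,
                                 const std::array<double, 4>& y) {
    std::array<double, 4> r{};
    for (std::size_t i = 0; i < 4; ++i)
        r[i] = (mask[i] >> 63) ? x[i] : y[i];
    return r;
}
```

On a 128-bit register file the widened mask also occupies twice as many registers as the original one, which is part of what makes the legalization question awkward.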
2013 Mar 05
4
[LLVMdev] Vector splitting vs widening
Hello,
Working on my (currently out-of-tree) BG/Q PPC enhancements, I've run into the following problem with vector type legalization. Here's a quick example:
Scalarize node result 0: 0x2348420: v1f32 = extract_subvector 0x23434a0, 0x2348320 [ID=0]
Scalarize node result 0: 0x2348220: v1f32 = extract_subvector 0x23434a0, 0x23466e0 [ID=0]
Split node result: 0x23469e0: v4f32 =
2013 Oct 25
2
[LLVMdev] Bug #16941
Nadav,
The problem appears only for vectors longer than the available hardware
register (in doubleword elements, i.e. more than 4 on SSE4 and more than 8
on AVX). Select does a weird thing. The <8 x i1> mask comes as two XMM registers,
select converts them to a single XMM register (i.e. 8 x 16 bit), and
immediately afterwards it converts back to two XMM registers and does the blend.
Conversion forth and back has
2012 Jul 30
4
[LLVMdev] Vector promotion broken for <2 x [i8|i16]>
Sorry, <4 x i8> should convert to a <1 x i32>. What currently is happening is that it is returning a <2 x i32> because <1 x i32> does not exist.
Micah
> -----Original Message-----
> From: Rotem, Nadav [mailto:nadav.rotem at intel.com]
> Sent: Monday, July 30, 2012 10:51 AM
> To: Villmow, Micah; Developers Mailing List
> Subject: RE: Vector promotion broken
2013 Mar 06
0
[LLVMdev] Vector splitting vs widening
Hi Hal,
> The problem is essentially the following: there are no vector f32 types (yet), so the <v4i1> = setcc <v4f32> node needs to be split and scalarized. The operand splitting seems to start correctly, but because <v4i1> is itself a legal type, after splitting the node into <v2i1> = setcc <v2f32>, the process becomes confused. The operands are again split
2012 Jul 30
2
[LLVMdev] Vector promotion broken for <2 x [i8|i16]>
Hrmm.... PromoteVectorOp doesn't seem to follow this at all.
http://llvm.org/svn/llvm-project/llvm/trunk/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp
SDValue VectorLegalizer::PromoteVectorOp(SDValue Op) {
// Vector "promotion" is basically just bitcasting and doing the operation
// in a different type. For example, x86 promotes ISD::AND on v2i32 to
// v1i64.
EVT VT =
2012 Jul 30
2
[LLVMdev] Vector promotion broken for <2 x [i8|i16]>
v4i8 itself is a legal type, just not on the 'AND' operation.
So there seem to be multiple problems here.
1) PromoteVectorOp doesn't handle the case where the types are not the same size; this occurs because of #2
2) getTypeToPromoteTo doesn't actually check to see if the type it should promote to makes any sense.
3) PromoteVectorOp also doesn't handle the case where
2013 Oct 26
0
[LLVMdev] Bug #16941
Hi Dmitry,
Yes, this is a known problem with legalizing vector masks. The type <8 x i1> is legalized to <8 x i16> on SSE, but your operands are legalized to <4 x i32>. Type-legalization is performed per-node and we don’t have a good way to support instructions that mix the mask and operand type. Why does ISPC generate illegal vector types? Does ISPC rely on the LLVM codegen to
2011 Oct 16
3
[LLVMdev] Enabling Vector-select
Hello everyone,
I wanted to let everybody know that I am going to enable the support for vector-select by default later today.
Details:
Currently the LLVM code-generator only supports 'select' [1] instructions with a boolean condition. Vectorizing compilers, such as the Intel OpenCL Vectorizer and the GCC vectorizer, often use vector-select instructions to implement masks. This change
2012 Jul 30
0
[LLVMdev] Vector promotion broken for <2 x [i8|i16]>
I don't know what your target architecture looks like, but I suspect that <4 x i8> should not be legalized to <1 x i32>. I think that what you are seeing is that <4 x i8> is first split into <2 x i8>, and later promoted to <2 x i32>. At the moment different targets can only affect type-legalization by declaring different legal types. A number of us discussed the
2012 Jul 30
0
[LLVMdev] Vector promotion broken for <2 x [i8|i16]>
Notice that PromoteVectorOp is called after the type legalization has legalized all of the types in the program. It legalizes the *operations*, not the types. So, you should only see legal types (legal types are types that fit into your registers). So, if your target has v2i32, I suspect that v4i8 is illegal because it has a different size.
-----Original Message-----
From: Villmow, Micah
2013 Oct 26
1
[LLVMdev] Bug #16941
Hi Nadav,
ISPC has been generating long vectors (on corresponding ISPC targets) this way
since the very beginning of ISPC as far as I know. There's no such thing
in official LLVM documents as "illegal vectors", so people do expect that
arbitrarily long vectors are supported and generated reasonably well. Note,
not super-optimal, but reasonably well. Keeping it this way allows
considering
2012 Jul 30
0
[LLVMdev] Vector promotion broken for <2 x [i8|i16]>
If v4i8 is a legal type then getTypeToPromoteTo should return the pair v4i8 and 'legal'. This looks like the root of the problem.
-----Original Message-----
From: Villmow, Micah [mailto:Micah.Villmow at amd.com]
Sent: Monday, July 30, 2012 22:10
To: Rotem, Nadav; Developers Mailing List
Subject: RE: Vector promotion broken for <2 x [i8|i16]>
v4i8 itself is a legal type, just not
2012 Jul 30
2
[LLVMdev] Vector promotion broken for <2 x [i8|i16]>
No, that is correct. I am adding the new types so that I can bitcast v2i8 into a v1i16 and then perform the 'and' operation and have legalize types turn the v1i16 into a scalar.
Though I am having trouble in understanding how x86 supports the <1 x i64> type. Based on looking at the code, it should fail because v1i64 is not supported on the x86 platform as far as I can tell.
Micah
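The idea of promoting a vector operation by bitcasting can be illustrated outside LLVM with a small portable C++ model. This is a sketch under assumptions: V2i8, bitcast_to_i16, bitcast_to_v2i8, and_lanewise and and_promoted are hypothetical names, and std::memcpy stands in for a bitcast. Because AND works on each bit independently, doing it on the 16-bit reinterpretation of a two-byte vector gives the same bytes as doing it lane by lane, which is why the promotion is only valid when both types have the same total size.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Model of a v2i8 value: two 8-bit lanes, 16 bits in total.
using V2i8 = std::array<std::uint8_t, 2>;

// Reinterpret the two bytes as one 16-bit scalar (same size, no conversion).
std::uint16_t bitcast_to_i16(const V2i8& v) {
    std::uint16_t s = 0;
    std::memcpy(&s, v.data(), sizeof s);
    return s;
}

// Reinterpret a 16-bit scalar as two 8-bit lanes.
V2i8 bitcast_to_v2i8(std::uint16_t s) {
    V2i8 v{};
    std::memcpy(v.data(), &s, sizeof s);
    return v;
}

// Reference result: AND applied to each lane separately.
V2i8 and_lanewise(const V2i8& a, const V2i8& b) {
    V2i8 r{};
    for (std::size_t i = 0; i < 2; ++i)
        r[i] = static_cast<std::uint8_t>(a[i] & b[i]);
    return r;
}

// "Promoted" result: bitcast, AND in the wider scalar type, bitcast back.
V2i8 and_promoted(const V2i8& a, const V2i8& b) {
    const std::uint16_t s =
        static_cast<std::uint16_t>(bitcast_to_i16(a) & bitcast_to_i16(b));
    return bitcast_to_v2i8(s);
}
```

The same equivalence would not hold for lane-sensitive operations such as addition, where a carry could cross from one byte into the next.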
2013 Mar 09
1
[LLVMdev] Vector splitting vs widening
----- Original Message -----
> From: "Nadav Rotem" <nrotem at apple.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "llvmdev at cs.uiuc.edu Dev" <llvmdev at cs.uiuc.edu>
> Sent: Wednesday, March 6, 2013 3:40:50 PM
> Subject: Re: [LLVMdev] Vector splitting vs widening
>
> Hi Hal,
>
> The