similar to: [LLVMdev] SelectionDAG scalarizes vector operations.

2012 Feb 08
0
[LLVMdev] SelectionDAG scalarizes vector operations.
Hi Nadav, > I had a few thoughts regarding our short discussion yesterday. > > I am not sure how we can lower SEXT into the vpmovsx family of instructions. I propose the following strategy for the ZEXT and ANYEXT family of functions. what I would like to understand first is why there are any vector xEXT nodes at all! As I tried to explain on IRC, I don't think you ever get these
2012 Feb 08
2
[LLVMdev] SelectionDAG scalarizes vector operations.
We generate xEXT nodes in many cases. Unlike GCC which vectorizes inner loops, we vectorize the implicit outermost loop of data-parallel workloads (also called whole function vectorization). We vectorize code even if the user uses xEXT instructions, uses mixed types, etc. We choose a vectorization factor which is likely to generate more legal vector types, but if the user mixes types then we
2012 Feb 08
0
[LLVMdev] SelectionDAG scalarizes vector operations.
"Rotem, Nadav" <nadav.rotem at intel.com> writes: > We generate xEXT nodes in many cases. Unlike GCC which vectorizes > inner loops, we vectorize the implicit outermost loop of data-parallel > workloads (also called whole function vectorization). We vectorize > code even if the user uses xEXT instructions, uses mixed types, etc. > We choose a vectorization factor
2012 Feb 08
5
[LLVMdev] SelectionDAG scalarizes vector operations.
Hi Dave, >> We generate xEXT nodes in many cases. Unlike GCC which vectorizes >> inner loops, we vectorize the implicit outermost loop of data-parallel >> workloads (also called whole function vectorization). We vectorize >> code even if the user uses xEXT instructions, uses mixed types, etc. >> We choose a vectorization factor which is likely to generate more
2012 Feb 08
3
[LLVMdev] SelectionDAG scalarizes vector operations.
Hi David! I'd be interested in hearing about the places that you had to fix. It seems like there is a number of people who are starting to look at the quality of the generated vector code. Maybe we should report our findings in bug reports, so that we could share the work and discuss possible findings. I also plan to file a few bug reports with suboptimal code. Thanks, Nadav
2012 Feb 08
0
[LLVMdev] SelectionDAG scalarizes vector operations.
"Rotem, Nadav" <nadav.rotem at intel.com> writes: > Hi David! > > I'd be interested in hearing about the places that you had to fix. It > seems like there is a number of people who are starting to look at the > quality of the generated vector code. Maybe we should report our > findings in bug reports, so that we could share the work and discuss >
2012 Feb 08
0
[LLVMdev] SelectionDAG scalarizes vector operations.
Hi All, > Hi Dave, > > >> We generate xEXT nodes in many cases. Unlike GCC which vectorizes > >> inner loops, we vectorize the implicit outermost loop of > >> data-parallel workloads (also called whole function vectorization). Just to clarify, GCC vectorizes innermost and next-to-innermost (aka outer) loops, packing instances of the same original scalar
2012 Feb 08
0
[LLVMdev] SelectionDAG scalarizes vector operations.
Duncan Sands <baldrick at free.fr> writes: > I think it is important we produce non-scalarized code for the IR produced by > the GCC vectorizer, since we know it can be done (otherwise GCC wouldn't have > produced it). It is of course important to produce decent code in the most > common cases coming from other vectorizers too. However it seems sensible to > me to start
2011 Mar 08
3
[LLVMdev] Vector select/compare support in LLVM
Hello, I started working on adding vector support for the SELECT and CMP instructions in the codegen (bugs: 3384, 1784, 2314).  Currently, the codegen scalarizes vector CMPs into multiple scalar CMPs.  It is easy to add similar scalarization support to the SELECT instruction.  However, using multiple scalar operations is slower than using vector operations. In LLVM, vector-compare operations
2013 Oct 25
2
[LLVMdev] Bug #16941
Nadav, The problem appears only for vectors longer than the available hardware register (in doubleword elements, i.e. more than 4 on SSE4 and more than 8 on AVX). Select does a weird thing. The <8 x i1> mask comes as two XMM registers, select converts them to a single XMM register (i.e. 8 x 16 bit), then immediately converts back to two XMM registers and does the blend. Conversion forth and back has
2012 Feb 08
2
[LLVMdev] SelectionDAG scalarizes vector operations.
On Feb 8, 2012, at 9:01 AM, David A. Greene wrote: > "Rotem, Nadav" <nadav.rotem at intel.com> writes: > >> Hi David! >> >> I'd be interested in hearing about the places that you had to fix. It >> seems like there is a number of people who are starting to look at the >> quality of the generated vector code. Maybe we should report our
2013 Oct 26
0
[LLVMdev] Bug #16941
Hi Dmitry, Yes, this is a known problem with legalizing vector masks. The type <8 x i1> is legalized to <8 x i16> on SSE, but your operands are legalized to <4 x i32>. Type-legalization is performed per-node and we don’t have a good way to support instructions that mix the mask and operand type. Why does ISPC generate illegal vector types? Does ISPC rely on the LLVM codegen to
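The mask/operand mismatch described in this snippet can be modeled in a few lines. The following is only an illustrative sketch in plain Python, not LLVM code: it assumes a mask held as eight 16-bit lanes (all ones or all zeros) and 8 x i32 operands split into two halves of four 32-bit lanes. All function names are hypothetical.

```python
def sext16_to_32(lane16: int) -> int:
    """Sign-extend one 16-bit lane to 32 bits (kept as an unsigned int)."""
    lane16 &= 0xFFFF
    return (lane16 | 0xFFFF0000) if (lane16 & 0x8000) else lane16


def split_mask_v8i16(mask):
    """Widen one 8 x i16 mask into two 4 x i32 masks (low half, high half)."""
    if len(mask) != 8:
        raise ValueError("expected 8 mask lanes")
    wide = [sext16_to_32(m) for m in mask]
    return wide[:4], wide[4:]


def blend(mask32, a, b):
    """Per-lane bitwise select: bits of a where the mask is set, else bits of b."""
    return [(m & x) | (~m & 0xFFFFFFFF & y) for m, x, y in zip(mask32, a, b)]


def select_v8i32(mask16, a, b):
    """Model a select on 8 x i32 operands that are held as two 4 x i32 halves."""
    lo, hi = split_mask_v8i16(mask16)
    return blend(lo, a[:4], b[:4]) + blend(hi, a[4:], b[4:])
```

The point of the sketch is that the mask has to be reshaped to match the operand halves before the blend, which is the extra conversion work the thread complains about.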
2013 Oct 26
1
[LLVMdev] Bug #16941
Hi Nadav, ISPC has been generating long vectors (on the corresponding ISPC targets) this way since the very beginning of ISPC, as far as I know. There's no such thing in the official LLVM documents as "illegal vectors", so people do expect that arbitrarily long vectors are supported and generated reasonably well. Note, not super-optimal, but reasonably well. Keeping it this way allows considering
2013 Oct 22
0
[LLVMdev] Bug #16941
On Oct 21, 2013, at 12:09 PM, Dmitry Babokin <babokin at gmail.com> wrote: > By the way, I'm curious, is there any reason why you focus on SSE4, not AVX? It seems that the vectorizer should care the most about the latest silicon. > I am interested in looking at the SSE4 code because lowering of AVX code is more complicated, especially for masks. The problem that <8 x i1> can be
2013 Oct 21
2
[LLVMdev] Bug #16941
Nadav, You are right, ISPC may issue intrinsics as a result of AST selection. Though I believe that we should stick to LLVM IR whenever possible. Intrinsics may act as boundaries for optimizations (on both data and control flow) and are generally not optimizable. LLVM may improve over time from a performance standpoint and we would benefit from it (or it may play against us, like in this
2011 Mar 09
0
[LLVMdev] Vector select/compare support in LLVM
"Rotem, Nadav" <nadav.rotem at intel.com> writes: > I can think of two ways to represent masks in x86: sparse and > packed. In the sparse method, the masks are kept in <4 x 32bit> > registers, which are mapped to xmm registers. This is the ‘native’ way > of using masks. This argues for the sparse representation, I think. > _Sparse_ After my discussion with
2014 Dec 02
2
[LLVMdev] Should more vector [zs]extloads be legal for X86 SSE4.1?
Hi Chandler, all, Why aren't the vector [zs]extloads introduced by SSE4.1/AVX2 declared legal? Is it a simple oversight, or did I miss a deeper reason? While cleaning up PMOV*X patterns, I stumbled upon this braindead testcase: %0 = load <8 x i8>* %src, align 1 %1 = zext <8 x i8> %0 to <8 x i16> turning into: pmovzxbw (%rsi), %xmm0
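For readers unfamiliar with the operation in the testcase, here is a minimal Python model of what widening <8 x i8> to <8 x i16> means lane by lane. It is a sketch of the semantics only, with hypothetical function names; it assumes little-endian lane layout and does not describe how LLVM selects the instruction.

```python
import struct


def zext_v8i8_to_v8i16(src: bytes) -> bytes:
    """Zero-extend eight 8-bit lanes to eight 16-bit lanes (little-endian)."""
    if len(src) != 8:
        raise ValueError("expected exactly 8 bytes")
    lanes = list(src)  # each byte is read as an unsigned value 0..255
    return struct.pack("<8H", *lanes)


def sext_v8i8_to_v8i16(src: bytes) -> bytes:
    """Sign-extend eight 8-bit lanes to eight 16-bit lanes (little-endian)."""
    if len(src) != 8:
        raise ValueError("expected exactly 8 bytes")
    lanes = struct.unpack("<8b", src)  # each byte is read as signed -128..127
    return struct.pack("<8h", *lanes)
```

Eight input bytes become sixteen output bytes, which is why the widened result fills one 128-bit register.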
2011 Mar 10
2
[LLVMdev] Vector select/compare support in LLVM
Hi David, The MOVMSKPS instruction is cheap (2 cycles). Not to be confused with VMASKMOV, the AVX masked move, which is expensive. One of the arguments for packing masks is that it reduces vector-registers pressure. Auto-vectorizing compilers maintain multiple masks for different execution paths (for each loop nesting, etc). Saving masks in xmm registers may result in vector-register
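The two mask representations discussed in this thread, and the role of MOVMSKPS in converting between them, can be sketched as follows. This is a plain Python model with hypothetical helper names, assuming four 32-bit lanes; it mirrors the documented behaviour of MOVMSKPS (collect the sign bit of each lane into the low bits of an integer) and is not compiler code.

```python
def movmskps(lanes):
    """Model MOVMSKPS: gather the sign bit of each 32-bit lane into one integer."""
    if len(lanes) != 4:
        raise ValueError("expected 4 lanes")
    return sum(((lane >> 31) & 1) << i for i, lane in enumerate(lanes))


def expand_mask(bits: int, n: int = 4):
    """Inverse direction: turn a packed bitmask into n all-ones/all-zeros lanes."""
    return [0xFFFFFFFF if (bits >> i) & 1 else 0 for i in range(n)]
```

The "sparse" form costs a full vector register per mask, while the "packed" form fits in a few bits of a scalar register but has to be expanded again before it can drive a vector blend.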
2011 Mar 10
2
[LLVMdev] Vector select/compare support in LLVM
After I implemented a new type of legalization (the packing of i1 vectors), I found that x86 does not have a way to load packed masks into SSE registers. So, I guess that legalizing of <4 x i1> to <4 x i32> is the way to go. Cheers, Nadav -----Original Message----- From: Rotem, Nadav Sent: Thursday, March 10, 2011 11:04 To: 'David A. Greene' Cc: llvmdev at cs.uiuc.edu
2011 Mar 10
0
[LLVMdev] Vector select/compare support in LLVM
"Rotem, Nadav" <nadav.rotem at intel.com> writes: > One of the arguments for packing masks is that it reduces > vector-registers pressure. Auto-vectorizing compilers maintain > multiple masks for different execution paths (for each loop nesting, > etc). Saving masks in xmm registers may result in vector-register > pressure which will cause spilling of these