thr3ads.net - similar to: "[LLVMdev] LLVM Loop Vectorizer"


2012 Oct 05

[LLVMdev] LLVM Loop Vectorizer

I think we should try to abstract the costs of instructions of various targets instead of trying to replicate them exactly. The coarser the costing infrastructure, the more robust the vectorization pass will be. This also eliminates or reduces the need to update the costing infrastructure whenever new hardware reduces the cost of existing instructions. - Dibyendu

2012 Oct 05

[LLVMdev] LLVM Loop Vectorizer

----- Original Message ----- > From: "Dibyendu Das" <Dibyendu.Das at amd.com> > To: "Nadav Rotem" <nrotem at apple.com>, "llvmdev at cs.uiuc.edu Mailing List" <llvmdev at cs.uiuc.edu> > Sent: Friday, October 5, 2012 3:59:56 AM > Subject: Re: [LLVMdev] LLVM Loop Vectorizer > > I think we should try to abstract the costs of

2012 Oct 05

[LLVMdev] LLVM Loop Vectorizer

Why not just have a hook into the TargetInstrInfo to query for the cost of an instruction? This is already used in many places throughout the optimizers. > -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Das, Dibyendu > Sent: Friday, October 05, 2012 2:00 AM > To: Nadav Rotem; llvmdev at cs.uiuc.edu Mailing

2012 Oct 05

[LLVMdev] LLVM Loop Vectorizer

----- Original Message ----- > From: "Ramshankar Ramanarayanan" <Ramshankar.Ramanarayanan at amd.com> > To: "Hal Finkel" <hfinkel at anl.gov>, "Dibyendu Das" <Dibyendu.Das at amd.com> > Cc: "llvmdev at cs.uiuc.edu Mailing List" <llvmdev at cs.uiuc.edu> > Sent: Friday, October 5, 2012 11:00:39 AM > Subject: RE: [LLVMdev]

2012 Oct 05

[LLVMdev] LLVM Loop Vectorizer

Perhaps we can parameterize the size of the vector while vectorizing in LLVM and fix up the loop iterators in a target-specific pass. -----Original Message----- From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Hal Finkel Sent: Friday, October 05, 2012 8:30 PM To: Das, Dibyendu Cc: llvmdev at cs.uiuc.edu Mailing List Subject: Re: [LLVMdev] LLVM Loop

2012 Oct 05

[LLVMdev] LLVM Loop Vectorizer

If the -simd option is specified, opt could do validity checks, dependency analysis and such, and recognize that a loop can be executed in parallel. Since the -simd option is specified, it could then convert the data types to vector instructions and add the scaling factor to the loop's iterators. Following this there can be an early machine function pass that sets up a processor-specific value in all of

2012 Oct 05

[LLVMdev] LLVM Loop Vectorizer

----- Original Message ----- > From: "Nadav Rotem" <nrotem at apple.com> > To: "llvmdev at cs.uiuc.edu Mailing List" <llvmdev at cs.uiuc.edu> > Sent: Friday, October 5, 2012 1:14:47 AM > Subject: [LLVMdev] LLVM Loop Vectorizer > > Hi, > > We are starting to work on an LLVM loop vectorizer. There's a number of > different projects that

2012 Oct 05

[LLVMdev] LLVM Loop Vectorizer

> I think that the first step would be to expose Target Lowering Interface (TLI) to OPT's IR-level passes. By "lowering", we assume the bitcode is more abstract than the machine code. However, in some situations, it is just the opposite. For instance, some architectures support vectorization of min/max/saturated-{add,sub}/conditional-assignment/etc. We need to detect such

2012 Oct 05

[LLVMdev] LLVM Loop Vectorizer

Nadav Rotem wrote: > Hi, > > We are starting to work on an LLVM loop vectorizer. There's a number of different projects that already vectorize LLVM IR. For example Hal's BB-Vectorizer, Intel's OpenCL Vectorizer, Polly, ISPC, AnySL, just to name a few. I think that it would be great if we could collaborate on the areas that are shared between the different projects. I think that

2013 Oct 25

[LLVMdev] Bug #16941

Nadav, The problem appears only for vectors longer than the available hardware register (in doubleword elements, i.e. more than 4 on SSE4 and more than 8 on AVX). Select does a weird thing. The <8 x i1> mask comes as two XMM registers; select converts them to a single XMM register (i.e. 8 x 16 bit), then immediately converts back to two XMM registers and does a blend. Converting back and forth has

2013 Oct 21

[LLVMdev] Bug #16941

Nadav, You are absolutely right, it's an ISPC workload. I've checked SSE4 and it's also severely affected. We use intrinsics only for the conversion <N x i32> <=> i32, i.e. movmsk.ps. For the rest we use general LLVM instructions, and I would really like to keep it this way. We rely on LLVM's ability to produce efficient code from general LLVM IR. Relying on

2013 Oct 26

[LLVMdev] Bug #16941

Hi Dmitry, Yes, this is a known problem with legalizing vector masks. The type <8 x i1> is legalized to <8 x i16> on SSE, but your operands are legalized to <4 x i32>. Type legalization is performed per node, and we don’t have a good way to support instructions that mix the mask and operand types. Why does ISPC generate illegal vector types? Does ISPC rely on the LLVM codegen to

2013 Oct 21

[LLVMdev] Bug #16941

Hi Dmitry, ISPC does some instruction selection as part of vectorization (on ASTs!) by placing intrinsics for specific operations. The SEXT to i32 pattern was implemented because LLVM did not support vector-selects when this code was written. Can you submit a small SSE4 test case that demonstrates the problem? Select is the canonical form of this operation, and SEXT is usually more

2013 Oct 26

[LLVMdev] Bug #16941

Hi Nadav, ISPC has been generating long vectors (on the corresponding ISPC targets) this way since the very beginning of ISPC, as far as I know. There's no such thing in the official LLVM documents as "illegal vectors", so people do expect that arbitrarily long vectors are supported and generated reasonably well. Note, not super-optimal, but reasonably well. Keeping it this way allows considering

2013 Oct 21

[LLVMdev] Bug #16941

Nadav, You are right, ISPC may issue intrinsics as a result of AST selection. Though I believe that we should stick to LLVM IR whenever possible. Intrinsics may act as boundaries for optimizations (on both data and control flow) and are generally not optimizable. LLVM may improve over time from a performance standpoint and we would benefit from it (or it may play against us, like in this

2011 Nov 23

[LLVMdev] [llvm-commits] Vectors of Pointers and Vector-GEP

Duncan, Thanks for the quick review! Here is a short description (design) of where I am going with this patch: 1. Motivation: Vectors-of-pointers is the first step in supporting scatter/gather instructions (available in AVX2, for example). I believe that this feature was requested on the mailing list before. As mentioned by Hal Finkel earlier today, this feature is desired by autovectorizers as

2013 Jun 02

[LLVMdev] Polyhedron 2005 results for dragonegg 3.3svn

Hi Jack, thanks for splitting out what the effects of LLVM's / GCC's vectorizers are. On 01/06/13 21:34, Jack Howarth wrote: > On Sat, Jun 01, 2013 at 06:45:48AM +0200, Duncan Sands wrote: >> >> These results are very disappointing, I was hoping to see a big improvement >> somewhere instead of no real improvement anywhere (except for gas_dyn) or a >> regression

2012 Oct 05

[LLVMdev] LLVM Loop Vectorizer

On Oct 5, 2012, at 12:08 AM, Nick Lewycky <nicholas at mxc.ca> wrote: > I absolutely think that we should have something like TargetData (now DataLayout) but for the vector types and operations. However, I'm not familiar with "Target Lowering Interface". Could you explain? I agree. Once we make the codegen accessible to the IR-level passes we need to start talking about

2013 Feb 21

[LLVMdev] Generate scalar SSE instructions instead of packed instructions

On Thu, Feb 21, 2013 at 12:14 PM, Nadav Rotem <nrotem at apple.com> wrote: > You can change the input LLVM-IR. > > On Feb 21, 2013, at 7:16 AM, "Nowicki, Tyler" <tyler.nowicki at intel.com> > wrote: > > Hi, > > I am interested in evaluating the performance of packed vs scalar > double-precision floating point instructions on

2013 Oct 21

[LLVMdev] Bug #16941

Nadav, Could you please have a look at bug #16941 and let us know what you think about it? It's a performance regression after one of your commits. Thanks. Dmitry.
