similar to: [LLVMdev] SPMD Autovectorizer

2015 Jul 07
2
[LLVMdev] SPMD Autovectorizer
On 07/07/2015 01:32 PM, Renato Golin wrote: > Wouldn't OpenMP account for some of that? At least on a single > machine, could you have both parallel and simd optimisations done on > the same loop? The point in SPMD program description (e.g. CUDA or OpenCL C) autovectorization is to produce something like OpenMP parallel loops or SIMD pragmas automatically from the single thread/WI
2013 Jan 25
4
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
On 01/25/2013 09:56 AM, Nadav Rotem wrote: > Thanks for checking the Loop Vectorizer, I am interested in hearing your > feedback. The Loop Vectorizer does not fit here. OpenCL vectorization is > completely different because the language itself is data-parallel. You > don't need all of the legality checks that the loop vectorizer has. I'm aware of this and it was my point in
2015 Jul 07
2
[LLVMdev] SPMD Autovectorizer
On 07/07/2015 09:30 PM, C Bergström wrote: > If you're going to "autopar" (turn a loop into threads which run on > many cores or something) then please don't add a dependency on OMP. I wouldn't, but would simply utilize the parallel loop metadata that was originally designed for this purpose. What is done with that MD is up to other passes. -- --Pekka
2013 Jan 24
3
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
Hi, I started to play with the LoopVectorizer of LLVM trunk on the work-item loops produced by pocl's OpenCL C kernel compiler, in hopes of implementing multi-work-item work group autovectorization in a modular manner. The vectorizer seems to refuse to vectorize the loop if it sees multiple writes to the same memory object within the same iteration. In case of parallel loops such as the
2013 Jan 25
2
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
> I am in favor of adding metadata to control different aspects of > vectorization, mainly for supporting user-level pragmas [1] but also for > DSLs. Before we start adding metadata to the IR we need to define the > semantics of the tags. "Parallel_for" is too general. We also want to control > vectorization factor, unroll factor, cost model, etc. These are used to
2015 Jul 07
2
[LLVMdev] SPMD Autovectorizer
Hi Renato, On 07/07/2015 10:57 PM, Renato Golin wrote: > Now, IIRC, OpenCL had a lot of trouble from getting odd-sized vector > types in IR that the middle end would not understand, especially the > vectorizers. The solution, at least as of 2 years ago, was to > serialise everything and let the CL back-end vectorize it. Perhaps you are referring to the problem of autovectorizing
2013 Jan 25
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
Hi Pekka, > Hi, > > I started to play with the LoopVectorizer of LLVM trunk > on the work-item loops produced by pocl's OpenCL C > kernel compiler, in hopes of implementing multi-work-item > work group autovectorization in a modular manner. > Thanks for checking the Loop Vectorizer, I am interested in hearing your feedback. The Loop Vectorizer does not fit here.
2013 Jan 31
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
Hi Pekka, hi Nadav, I didn't find the time to read this thread until now, sorry for that. I actually think you are both right :). As for the current status, the loop vectorizer is only able to vectorize inner loops and (I think) does not handle function calls and memory operations well. This will prevent it from vectorizing a large group of OpenCL kernels, and certainly all
2013 Jan 31
3
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
Hi Ralf, On 01/31/2013 05:44 PM, Ralf Karrenberg wrote: > As for the current status, the loop vectorizer is only able to vectorize > inner loops and (I think) does not handle function calls and memory > operations well. This will prevent it from vectorizing a large group of > OpenCL kernels, and certainly all "interesting", more complex ones. Agreed -- but not being able to
2013 Jan 25
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
Hi Pekka, > How I see it, the data parallel input simply makes the vectorizer's job > easier (skip some of the legality checks) while reusing most of the > implementation (e.g. cost estimation, unrolling decisions, the > vector instruction formation itself, predication/if-conversion, > speculative execution+blend, etc.). > What you need is outer loop vectorization while
2013 Jan 25
4
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
On 01/25/2013 04:21 PM, Hal Finkel wrote: > My point is that I specifically think that you should try it. I'm curious > to see how what you come up with might apply to other use cases as well. OK, attached is the first quick attempt towards this. I'm not proposing committing this, but would like to get comments to possibly move towards something committable. It simply looks for a
2013 Jan 25
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
----- Original Message ----- > From: "Pekka Jääskeläinen" <pekka.jaaskelainen at tut.fi> > To: "Nadav Rotem" <nrotem at apple.com> > Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu> > Sent: Friday, January 25, 2013 5:35:16 AM > Subject: Re: [LLVMdev] LoopVectorizer in OpenCL C work group autovectorization > > On
2013 Jan 31
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
----- Original Message ----- > From: "Pekka Jääskeläinen" <pekka.jaaskelainen at tut.fi> > To: "Ralf Karrenberg" <Chareos at gmx.de> > Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu> > Sent: Thursday, January 31, 2013 11:15:43 AM > Subject: Re: [LLVMdev] LoopVectorizer in OpenCL C work group autovectorization > > Hi
2015 Jun 10
4
[LLVMdev] The use iterator not working...
Thanks Dan and Jon. I made an incorrect assumption that the "use" iterator was actually giving me the "user" when de-referencing it. Did it always have this behavior in previous LLVM versions? I've seen lots of examples of the "use" iterator being dereferenced and resulting Instruction pointer being treated as the "user"? Thanks, Zack On Tue, Jun 9,
2011 Oct 19
1
[LLVMdev] ANN: libclc (OpenCL C library implementation)
Hi Micah, The numbers from the paper were measured with the ATI Stream SDK v2.1 (it's only mentioned in the references I think). The most recent measurements I have were done with the current v2.5. Best, Ralf On 19.10.2011 23:43, Villmow, Micah wrote: > Ralf, > What version of the SDK were you using for your analysis? I don't see that in the slides/pdf. > > Thanks, >
2011 Oct 19
0
[LLVMdev] ANN: libclc (OpenCL C library implementation)
Ralf, What version of the SDK were you using for your analysis? I don't see that in the slides/pdf. Thanks, Micah > -----Original Message----- > From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] > On Behalf Of Ralf Karrenberg > Sent: Wednesday, October 19, 2011 2:13 PM > To: llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] ANN: libclc (OpenCL C
2011 Oct 19
6
[LLVMdev] ANN: libclc (OpenCL C library implementation)
Hi everybody, the compiler design lab at Saarland University (chair of Sebastian Hack) is also working on an LLVM-based OpenCL driver. The project started as a use-case for our "Whole-Function Vectorization" library, which transforms a function so that it computes the same result as W executions of the original code by using SIMD instructions (W = 4 for SSE/AltiVec, 8 for AVX). The
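As a rough illustration of the idea in the snippet above (one W-wide call computing the same as W executions of the scalar code), here is a toy sketch in plain Python. It is not the Whole-Function Vectorization library or its API: the names `scalar_kernel` and `vectorized_kernel` are hypothetical, Python lists stand in for SIMD registers, and the mask-and-blend handling of the branch is only an assumed, commonly used way to deal with divergent control flow.

```python
# Toy illustration only: plain Python lists stand in for SIMD registers.
W = 4  # vector width; the post cites W = 4 for SSE/AltiVec


def scalar_kernel(x, y):
    """Original scalar function: one execution handles one element."""
    if x > y:
        return x - y
    return x + y


def vectorized_kernel(xs, ys):
    """Hypothetical W-wide version: computes the same as W scalar executions."""
    assert len(xs) == W and len(ys) == W
    # Per-lane branch condition, kept as a mask instead of a jump.
    mask = [x > y for x, y in zip(xs, ys)]
    # Both sides of the branch are computed for all lanes.
    then_vals = [x - y for x, y in zip(xs, ys)]
    else_vals = [x + y for x, y in zip(xs, ys)]
    # Blend: each lane picks the value its own condition selects.
    return [t if m else e for m, t, e in zip(mask, then_vals, else_vals)]


xs = [5, 1, 7, 2]
ys = [3, 4, 7, 9]
expected = [scalar_kernel(x, y) for x, y in zip(xs, ys)]
result = vectorized_kernel(xs, ys)
```

Here `result` equals `expected`, i.e. the single W-wide call matches four independent scalar calls lane by lane.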
2013 Jan 25
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
Pekka, I am in favor of adding metadata to control different aspects of vectorization, mainly for supporting user-level pragmas [1] but also for DSLs. Before we start adding metadata to the IR we need to define the semantics of the tags. "Parallel_for" is too general. We also want to control vectorization factor, unroll factor, cost model, etc. Doug Gregor suggested to add the
2012 Oct 17
0
[LLVMdev] Loop vectorizer
Hi everybody, On 10/17/12 12:32 AM, Hal Finkel wrote: >>> Do you have a plan for xforms to increase the amount of >>> vectorization? >> >> Yes. We will need to implement a predication phase and to design the >> interaction with other loop transformations. Also, this will have to >> work well with the cost model. We also need to think of a good way to
2013 Oct 25
2
[LLVMdev] Is there pass to break down <4 x float> to scalars
Hi, Great to see someone working on this. This will benefit the performance portability goal of pocl's OpenCL kernel compiler. It has been one of the low-hanging fruits in improving the applicability of its implicit WG vectorization. The use case there is that sometimes it makes sense to devectorize the explicitly used vector datatype code of OpenCL kernels in order to make better opportunities