search for: wiloop

Displaying 20 results from an estimated 20 matches for "wiloop".

2013 Jan 25
4
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
...'s kernel compiler detects the "parallel regions" (the regions between work group barriers) and generates a new function suitable for executing multiple work items (WI) in the work group. One method to generate such functions is to generate embarrassingly parallel "for-loops" (wiloops) that produce the multi-WI DLP execution. That is, the loop executes the code in the parallel regions for each work item in the work group. This step is needed to make the multi-WI kernel executable on non-SIMD/SIMT platforms (read: CPUs). On the "SPMD-tailored" processors (many GPUs) t...
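The wiloops idea in the snippet above can be illustrated with a minimal sketch. It is not pocl's generated code: the kernel, the function name run_work_group, and the tmp buffer standing in for local memory are all hypothetical. It only shows how a kernel with one barrier becomes two parallel regions, each wrapped in its own loop over the work items.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical OpenCL C kernel body, per work item with local id `lid`:
//   tmp[lid] = in[lid] * 2;
//   barrier(CLK_LOCAL_MEM_FENCE);
//   out[lid] = tmp[lid] + tmp[(lid + 1) % local_size];
// The barrier splits the body into two parallel regions.

// Work-group function: one embarrassingly parallel loop ("wiloop") per region.
// Assumes out.size() >= in.size().
void run_work_group(const std::vector<int>& in, std::vector<int>& out) {
    const std::size_t local_size = in.size();  // number of work items
    std::vector<int> tmp(local_size);          // stands in for local memory

    // Parallel region 1: iterations do not depend on each other.
    for (std::size_t lid = 0; lid < local_size; ++lid)
        tmp[lid] = in[lid] * 2;

    // The barrier was here. Finishing the first loop before starting the
    // second gives the ordering the barrier asked for.

    // Parallel region 2: reads values written by other work items in region 1.
    for (std::size_t lid = 0; lid < local_size; ++lid)
        out[lid] = tmp[lid] + tmp[(lid + 1) % local_size];
}
```

Each of the two loops is the kind of loop the thread wants the LoopVectorizer to treat as parallel without running its own dependence checks.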
2013 Jan 25
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
...detects the "parallel regions" (the > regions between work group barriers) and generates a new function suitable > for executing multiple work items (WI) in the work group. One method to > generate such functions is to generate embarrassingly parallel "for-loops" > (wiloops) that produce the multi-WI DLP execution. That is, the loop > executes the code in the parallel regions for each work item in the work > group. > > This step is needed to make the multi-WI kernel executable on > non-SIMD/SIMT platforms (read: CPUs). On the "SPMD-tailored"...
2013 Jan 25
4
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
...ing this, but would like to get comments to possibly move towards something committable. It simply looks for a metadata named 'parallel_for' in any of the instructions in the loop's header and assumes the loop is a parallel one if such is found. This metadata is added by the pocl's wiloops generation routine. It passes the pocl test suite when enabled but probably cannot vectorize many kernels (at least) due to the missing intra-kernel vector scalarizer. Some known problems that need addressing: - Metadata can only be attached to Instructions (not Loops or even BasicBlocks), th...
2013 Jan 25
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
..."parallel regions" (the > regions between work group barriers) and generates a new function > suitable > for executing multiple work items (WI) in the work group. One method > to > generate such functions is to generate embarrassingly parallel > "for-loops" > (wiloops) that produce the multi-WI DLP execution. That is, the loop > executes the code in the parallel regions for each work item in the > work > group. > > This step is needed to make the multi-WI kernel executable on > non-SIMD/SIMT platforms (read: CPUs). On the "SPMD-tailored"...
2013 Jan 29
5
[LLVMdev] [PATCH] parallel loop metadata
....cs.uiuc.edu/pipermail/llvmdev/2013-January/058727.html Maybe the safe thing here is to rename it back to the honest "llvm.loop.parallel" or similar and we can add a separate one for the assumed_dep later on. This one would support the truly parallel loops (at least OpenMP for and OpenCL WIloops) where no compiler checking at all can be assumed by the programmer. Any objections? Paul Redmond? -- --Pekka
2013 Jan 25
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
Hi Pekka, > Hi, > > I started to play with the LoopVectorizer of LLVM trunk > on the work-item loops produced by pocl's OpenCL C > kernel compiler, in hopes of implementing multi-work-item > work group autovectorization in a modular manner. > Thanks for checking the Loop Vectorizer, I am interested in hearing your feedback. The Loop Vectorizer does not fit here.
2013 Jan 24
3
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
Hi, I started to play with the LoopVectorizer of LLVM trunk on the work-item loops produced by pocl's OpenCL C kernel compiler, in hopes of implementing multi-work-item work group autovectorization in a modular manner. The vectorizer seems to refuse to vectorize the loop if it sees multiple writes to the same memory object within the same iteration. In case of parallel loops such as the
2013 Jan 29
1
[LLVMdev] [PATCH] parallel loop metadata
...nen wrote: > > > Maybe the safe thing here is to rename it back to the honest > > "llvm.loop.parallel" or similar and we can add a separate one for > > the assumed_dep later on. This one would support the truly parallel > > loops (at least OpenMP for and OpenCL WIloops) where no compiler > > checking at all can be assumed by the programmer. > > > > Any objections? Paul Redmond? > > > > I don't have any objections. I think the only requirement is that the > semantics are clearly defined. > > Personally I think this m...
2013 Jan 25
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
----- Original Message ----- > From: "Pekka Jääskeläinen" <pekka.jaaskelainen at tut.fi> > To: "Hal Finkel" <hfinkel at anl.gov> > Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>, "Nadav Rotem" <nrotem at apple.com> > Sent: Friday, January 25, 2013 8:14:57 AM > Subject: Re: [LLVMdev] LoopVectorizer in OpenCL
2013 Jan 28
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
...CL code. That it would miscompile non-OpenCL code is irrelevant. + for (BasicBlock::iterator ii = header->begin(); + ii != header->end(); ii++) { http://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop Nick This metadata is added by the pocl's wiloops > generation routine. It passes the pocl test suite when enabled but > probably cannot vectorize many kernels (at least) due to the missing > intra-kernel vector scalarizer. > > Some known problems that need addressing: > > - Metadata can only be attached to Instructions (not...
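The review comment in the snippet above links to the LLVM coding standard on not evaluating end() on every loop iteration. The sketch below shows that loop shape on a plain std::vector rather than an LLVM BasicBlock, so that it compiles on its own. The function count_positive and its data are hypothetical, and the switch to pre-increment is an addition here, not something the quoted review asked for.

```cpp
#include <vector>

// Counts the positive entries of `values`.
int count_positive(const std::vector<int>& values) {
    int count = 0;
    // end() is evaluated once, in the loop initializer, instead of on every
    // iteration; the iterator is advanced with pre-increment.
    for (std::vector<int>::const_iterator it = values.begin(),
                                          end = values.end();
         it != end; ++it) {
        if (*it > 0)
            ++count;
    }
    return count;
}
```

Applied to the quoted patch, the same pattern would cache header->end() in a second iterator variable declared next to ii.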
2013 Jan 29
0
[LLVMdev] [PATCH] parallel loop metadata
...2:42 PM, Pekka Jääskeläinen wrote: > Maybe the safe thing here is to rename it back to the honest > "llvm.loop.parallel" or similar and we can add a separate one for > the assumed_dep later on. This one would support the truly parallel > loops (at least OpenMP for and OpenCL WIloops) where no compiler > checking at all can be assumed by the programmer. > > Any objections? Paul Redmond? > I don't have any objections. I think the only requirement is that the semantics are clearly defined. Personally I think this metadata should be used to guide the vectorize...
2013 Jan 25
2
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
On 01/25/2013 04:00 PM, Hal Finkel wrote: > Based on this experience, can you propose some metadata that would allow > this to happen (so that the LoopVectorizer would be generally useful for > POCL)? I suspect this same metadata might be useful in other contexts (such > as implementing iteration-independence pragmas). I cannot yet. In this hack I simply changed LoopVectorizer to
2013 Jan 25
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
...to get comments > to possibly move towards something committable. > > It simply looks for a metadata named 'parallel_for' in any of the > instructions in the loop's header and assumes the loop is a parallel > one if such is found. This metadata is added by the pocl's wiloops > generation routine. It passes the pocl test suite when enabled but > probably cannot vectorize many kernels (at least) due to the missing > intra-kernel vector scalarizer. > > Some known problems that need addressing: > > - Metadata can only be attached to Instructions (no...
2013 Jan 31
3
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
...is the following: > WFV assumes that there is at least one outer loop that has increments of > one, runs a multiple of the SIMD width iterations, and that every > iteration is independent (barriers can be handled by the OpenCL driver > *after* WFV). Yes, this is the case with the "wiloops" work group generation method of pocl. The parallel outer loops are the max 3 dimensions of the local space. The actual wg barrier calls are converted to no-ops (compiler barriers) for the current targets. > On the other hand, LoopVectorizer may not be aimed at covering all kinds > of...
2013 Jan 29
0
[LLVMdev] [PATCH] parallel loop metadata
...vmdev/2013-January/058727.html > > Maybe the safe thing here is to rename it back to the honest > "llvm.loop.parallel" or similar and we can add a separate one for > the assumed_dep later on. This one would support the truly parallel > loops (at least OpenMP for and OpenCL WIloops) where no compiler > checking at all can be assumed by the programmer. Will parallel always be synonymous with no_interiteration_dependencies? I'm slightly worried that 'parallel' seems too much like a directive, and we may want it to mean something else in the future. -Hal >...
2013 Feb 19
0
[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
...ent to llvm.mem.parallel_loop_access? I'll do this. > - Update the loop vectorizer (to update the metadata when it unrolls) > - Update the regular unroller I'll update the pocl's work-item replicator first (of which output is effectively the same as a fully unrolled parallel wiloop) to mark the iterations with the iteration argument and see where it gets the WG vectorization using the upstream BBVectorizer. > - Update the alias analysis (maybe this is sufficient for basic support in BBVectorize) - is your current code close enough for this? The AA should be trivial. I...
2013 Jan 31
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
...is at least one outer loop that has > > increments of > > one, runs a multiple of the SIMD width iterations, and that every > > iteration is independent (barriers can be handled by the OpenCL > > driver > > *after* WFV). > > Yes, this is the case with the "wiloops" work group generation > method of pocl. The parallel outer loops are the max 3 dimensions of > the > local space. The actual wg barrier calls are converted to no-ops > (compiler > barriers) for the current targets. > > > On the other hand, LoopVectorizer may not be a...
2013 Jan 31
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
Hi Pekka, hi Nadav, I didn't find the time to read this thread until now, sorry for that. I actually think you are both right :). As for the current status, the loop vectorizer is only able to vectorize inner loops and (I think) does not handle function calls and memory operations well. This will prevent it from vectorizing a large group of OpenCL kernels, and certainly all
2013 Jan 25
2
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
> I am in favor of adding metadata to control different aspects of > vectorization, mainly for supporting user-level pragmas [1] but also for > DSLs. Before we start adding metadata to the IR we need to define the > semantics of the tags. "Parallel_for" is too general. We also want to control > vectorization factor, unroll factor, cost model, etc. These are used to
2013 Feb 19
4
[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
----- Original Message ----- > From: "Pekka Jääskeläinen" <pekka.jaaskelainen at tut.fi> > To: "Nadav Rotem" <nrotem at apple.com> > Cc: "Hal Finkel" <hfinkel at anl.gov>, "Tobias Grosser" <tobias at grosser.es>, "llvmdev at cs.uiuc.edu Dev" > <llvmdev at cs.uiuc.edu> > Sent: Tuesday, February 19, 2013