Displaying 20 results from an estimated 20 matches for "wiloop".
2013 Jan 25
4
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
...'s kernel compiler detects the "parallel regions" (the
regions between work group barriers) and generates a new function suitable
for executing multiple work items (WI) in the work group. One method to
generate such functions is to generate embarrassingly parallel "for-loops"
(wiloops) that produce the multi-WI DLP execution. That is, the loop
executes the code in the parallel regions for each work item in the work
group.
This step is needed to make the multi-WI kernel executable on
non-SIMD/SIMT platforms (read: CPUs). On the "SPMD-tailored" processors
(many GPUs) t...
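
The work-item loop idea in the snippet above can be sketched in plain C++. This is only an illustration under assumptions, not pocl's generated code: the kernel, the function name `run_work_group`, and the one-dimensional work group are all hypothetical. A barrier splits the kernel into two parallel regions, and each region is wrapped in its own loop over the work items.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical OpenCL C kernel body being modelled (one-dimensional):
//
//   scratch[lid] = in[lid] * 2;
//   barrier(CLK_LOCAL_MEM_FENCE);
//   out[lid] = scratch[(lid + 1) % local_size];
//
// The barrier splits the kernel into two parallel regions. Each region
// becomes its own loop over all work items (WI) of the work group.
std::vector<int> run_work_group(const std::vector<int>& in) {
    const std::size_t local_size = in.size();
    assert(local_size > 0 && "work group must contain at least one work item");

    std::vector<int> scratch(local_size);
    std::vector<int> out(local_size);

    // Work-item loop for parallel region 1 (code before the barrier).
    for (std::size_t lid = 0; lid < local_size; ++lid) {
        scratch[lid] = in[lid] * 2;
    }

    // No code is needed for the barrier in this sketch: region 1 has
    // finished for every work item before region 2 starts.

    // Work-item loop for parallel region 2 (code after the barrier).
    for (std::size_t lid = 0; lid < local_size; ++lid) {
        out[lid] = scratch[(lid + 1) % local_size];
    }
    return out;
}
```

Within each loop the iterations do not depend on one another, which is the property a loop vectorizer would need to know about.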
2013 Jan 25
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
...detects the "parallel regions" (the
> regions between work group barriers) and generates a new function suitable
> for executing multiple work items (WI) in the work group. One method to
> generate such functions is to generate embarrassingly parallel "for-loops"
> (wiloops) that produce the multi-WI DLP execution. That is, the loop
> executes the code in the parallel regions for each work item in the work
> group.
>
> This step is needed to make the multi-WI kernel executable on
> non-SIMD/SIMT platforms (read: CPUs). On the "SPMD-tailored"...
2013 Jan 25
4
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
...ing this, but would like to get comments
to possibly move towards something committable.
It simply looks for a metadata named 'parallel_for' in any of the
instructions in the loop's header and assumes the loop is a parallel
one if such is found. This metadata is added by the pocl's wiloops
generation routine. It passes the pocl test suite when enabled but
probably cannot vectorize many kernels (at least) due to the missing
intra-kernel vector scalarizer.
Some known problems that need addressing:
- Metadata can only be attached to Instructions (not Loops or even
BasicBlocks), th...
2013 Jan 25
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
..."parallel regions" (the
> regions between work group barriers) and generates a new function
> suitable
> for executing multiple work items (WI) in the work group. One method
> to
> generate such functions is to generate embarrassingly parallel
> "for-loops"
> (wiloops) that produce the multi-WI DLP execution. That is, the loop
> executes the code in the parallel regions for each work item in the
> work
> group.
>
> This step is needed to make the multi-WI kernel executable on
> non-SIMD/SIMT platforms (read: CPUs). On the "SPMD-tailored...
2013 Jan 29
5
[LLVMdev] [PATCH] parallel loop metadata
....cs.uiuc.edu/pipermail/llvmdev/2013-January/058727.html
Maybe the safe thing here is to rename it back to the honest
"llvm.loop.parallel" or similar and we can add a separate one for
the assumed_dep later on. This one would support the truly parallel
loops (at least OpenMP for and OpenCL WIloops) where no compiler
checking at all can be assumed by the programmer.
Any objections? Paul Redmond?
--
--Pekka
2013 Jan 25
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
Hi Pekka,
> Hi,
>
> I started to play with the LoopVectorizer of LLVM trunk
> on the work-item loops produced by pocl's OpenCL C
> kernel compiler, in hopes of implementing multi-work-item
> work group autovectorization in a modular manner.
>
Thanks for checking the Loop Vectorizer, I am interested in hearing your feedback. The Loop Vectorizer does not fit here.
2013 Jan 24
3
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
Hi,
I started to play with the LoopVectorizer of LLVM trunk
on the work-item loops produced by pocl's OpenCL C
kernel compiler, in hopes of implementing multi-work-item
work group autovectorization in a modular manner.
The vectorizer seems to refuse to vectorize the loop if it sees
multiple writes to the same memory object within the
same iteration. In case of parallel loops such as
the
2013 Jan 29
1
[LLVMdev] [PATCH] parallel loop metadata
...nen wrote:
>
> > Maybe the safe thing here is to rename it back to the honest
> > "llvm.loop.parallel" or similar and we can add a separate one for
> > the assumed_dep later on. This one would support the truly parallel
> > loops (at least OpenMP for and OpenCL WIloops) where no compiler
> > checking at all can be assumed by the programmer.
> >
> > Any objections? Paul Redmond?
> >
>
> I don't have any objections. I think the only requirement is that the
> semantics are clearly defined.
>
> Personally I think this m...
2013 Jan 25
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
----- Original Message -----
> From: "Pekka Jääskeläinen" <pekka.jaaskelainen at tut.fi>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>, "Nadav Rotem" <nrotem at apple.com>
> Sent: Friday, January 25, 2013 8:14:57 AM
> Subject: Re: [LLVMdev] LoopVectorizer in OpenCL
2013 Jan 28
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
...CL code. That it would miscompile non-OpenCL
code is irrelevant.
+ for (BasicBlock::iterator ii = header->begin();
+ ii != header->end(); ii++) {
http://llvm.org/docs/CodingStandards.html#don-t-evaluate-end-every-time-through-a-loop
Nick
This metadata is added by the pocl's wiloops
> generation routine. It passes the pocl test suite when enabled but
> probably cannot vectorize many kernels (at least) due to the missing
> intra-kernel vector scalarizer.
>
> Some known problems that need addressing:
>
> - Metadata can only be attached to Instructions (not...
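
The header scan described in this result, combined with the review comment about `end()`, can be sketched with a toy model. The types `ToyInstruction` and `ToyBlock` and the function `headerMarkedParallel` are hypothetical stand-ins, not LLVM classes or the patch's actual code. The loop computes `end()` once, as the linked coding standard asks.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stand-ins for IR entities. These are NOT LLVM classes.
struct ToyInstruction {
    std::string opcode;
    std::set<std::string> metadata;  // names of attached metadata kinds
};
using ToyBlock = std::vector<ToyInstruction>;

// Returns true if any instruction in the loop header carries the marker.
bool headerMarkedParallel(const ToyBlock& header,
                          const std::string& marker = "parallel_for") {
    assert(!marker.empty() && "marker name must not be empty");

    // Evaluate end() once instead of on every iteration, and use
    // pre-increment on the iterator.
    for (ToyBlock::const_iterator ii = header.begin(), ie = header.end();
         ii != ie; ++ii) {
        if (ii->metadata.count(marker) != 0)
            return true;
    }
    return false;
}
```

A header in which any one instruction carries the marker is treated as belonging to a parallel loop. An empty or unmarked header is not.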
2013 Jan 29
0
[LLVMdev] [PATCH] parallel loop metadata
...2:42 PM, Pekka Jääskeläinen wrote:
> Maybe the safe thing here is to rename it back to the honest
> "llvm.loop.parallel" or similar and we can add a separate one for
> the assumed_dep later on. This one would support the truly parallel
> loops (at least OpenMP for and OpenCL WIloops) where no compiler
> checking at all can be assumed by the programmer.
>
> Any objections? Paul Redmond?
>
I don't have any objections. I think the only requirement is that the semantics are clearly defined.
Personally I think this metadata should be used to guide the vectorize...
2013 Jan 25
2
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
On 01/25/2013 04:00 PM, Hal Finkel wrote:
> Based on this experience, can you propose some metadata that would allow
> this to happen (so that the LoopVectorizer would be generally useful for
> POCL)? I suspect this same metadata might be useful in other contexts (such
> as implementing iteration-independence pragmas).
I cannot yet. In this hack I simply changed LoopVectorizer to
2013 Jan 25
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
...to get comments
> to possibly move towards something committable.
>
> It simply looks for a metadata named 'parallel_for' in any of the
> instructions in the loop's header and assumes the loop is a parallel
> one if such is found. This metadata is added by the pocl's wiloops
> generation routine. It passes the pocl test suite when enabled but
> probably cannot vectorize many kernels (at least) due to the missing
> intra-kernel vector scalarizer.
>
> Some known problems that need addressing:
>
> - Metadata can only be attached to Instructions (no...
2013 Jan 31
3
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
...is the following:
> WFV assumes that there is at least one outer loop that has increments of
> one, runs a multiple of the SIMD width iterations, and that every
> iteration is independent (barriers can be handled by the OpenCL driver
> *after* WFV).
Yes, this is the case with the "wiloops" work group generation
method of pocl. The parallel outer loops are the max 3 dimensions of the
local space. The actual wg barrier calls are converted to no-ops (compiler
barriers) for the current targets.
> On the other hand, LoopVectorizer may not be aimed at covering all kinds
> of...
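
The loop shape discussed in this result (up to three parallel loops over the local space, unit increments, an iteration count that is a multiple of the SIMD width, independent iterations) can be sketched as follows. The kernel body, the name `run_local_space`, and the width `kSimdWidth = 4` are assumptions for illustration only; this is neither pocl's nor WFV's code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kSimdWidth = 4;  // assumed vector width, illustration only

// Hypothetical kernel body: out[id] = in[id] + 1, with no dependence
// between work items.
std::vector<int> run_local_space(const std::vector<int>& in,
                                 std::size_t size_x, std::size_t size_y,
                                 std::size_t size_z) {
    assert(in.size() == size_x * size_y * size_z);
    assert(size_x % kSimdWidth == 0 &&
           "x extent must be a multiple of the SIMD width");

    std::vector<int> out(in.size());
    // One loop per dimension of the local space (at most three).
    for (std::size_t z = 0; z < size_z; ++z) {
        for (std::size_t y = 0; y < size_y; ++y) {
            // The x loop is walked in chunks of kSimdWidth work items.
            // Each chunk is a candidate for a single vector operation.
            for (std::size_t x = 0; x < size_x; x += kSimdWidth) {
                for (std::size_t lane = 0; lane < kSimdWidth; ++lane) {
                    const std::size_t flat =
                        (z * size_y + y) * size_x + x + lane;
                    out[flat] = in[flat] + 1;
                }
            }
        }
    }
    return out;
}
```

The innermost `lane` loop is written out in scalar form here. A vectorizer would replace it with one SIMD instruction per chunk.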
2013 Jan 29
0
[LLVMdev] [PATCH] parallel loop metadata
...vmdev/2013-January/058727.html
>
> Maybe the safe thing here is to rename it back to the honest
> "llvm.loop.parallel" or similar and we can add a separate one for
> the assumed_dep later on. This one would support the truly parallel
> loops (at least OpenMP for and OpenCL WIloops) where no compiler
> checking at all can be assumed by the programmer.
Will parallel always be synonymous with no_interiteration_dependencies? I'm slightly worried that 'parallel' seems too much like a directive, and we may want it to mean something else in the future.
-Hal
>...
2013 Feb 19
0
[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
...ent to
llvm.mem.parallel_loop_access? I'll do this.
> - Update the loop vectorizer (to update the metadata when it unrolls)
> - Update the regular unroller
I'll update the pocl's work-item replicator first (of which output
is effectively the same as a fully unrolled parallel wiloop) to mark the
iterations with the iteration argument and see where it gets the WG
vectorization using the upstream BBVectorizer.
> - Update the alias analysis (maybe this is sufficient for basic support in BBVectorize) - is your current code close enough for this?
The AA should be trivial. I...
2013 Jan 31
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
...is at least one outer loop that has
> > increments of
> > one, runs a multiple of the SIMD width iterations, and that every
> > iteration is independent (barriers can be handled by the OpenCL
> > driver
> > *after* WFV).
>
> Yes, this is the case with the "wiloops" work group generation
> method of pocl. The parallel outer loops are the max 3 dimensions of
> the
> local space. The actual wg barrier calls are converted to no-ops
> (compiler
> barriers) for the current targets.
>
> > On the other hand, LoopVectorizer may not be a...
2013 Jan 31
0
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
Hi Pekka, hi Nadav,
I didn't find the time to read this thread until now, sorry for that.
I actually think you are both right :).
As for the current status, the loop vectorizer is only able to vectorize
inner loops and (I think) does not handle function calls and memory
operations well. This will prevent it from vectorizing a large group of
OpenCL kernels, and certainly all
2013 Jan 25
2
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
> I am in favor of adding metadata to control different aspects of
> vectorization, mainly for supporting user-level pragmas [1] but also for
> DSLs. Before we start adding metadata to the IR we need to define the
> semantics of the tags. "Parallel_for" is too general. We also want to control
> vectorization factor, unroll factor, cost model, etc.
These are used to
2013 Feb 19
4
[LLVMdev] Pointer Context Metadata (was: Parallel Loop Metadata)
----- Original Message -----
> From: "Pekka Jääskeläinen" <pekka.jaaskelainen at tut.fi>
> To: "Nadav Rotem" <nrotem at apple.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "Tobias Grosser" <tobias at grosser.es>, "llvmdev at cs.uiuc.edu Dev"
> <llvmdev at cs.uiuc.edu>
> Sent: Tuesday, February 19, 2013