Pekka Jääskeläinen
2013-Jan-25 14:14 UTC
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
On 01/25/2013 04:00 PM, Hal Finkel wrote:
> Based on this experience, can you propose some metadata that would allow
> this to happen (so that the LoopVectorizer would be generally useful for
> POCL)? I suspect this same metadata might be useful in other contexts (such
> as implementing iteration-independence pragmas).

I cannot yet. In this hack I simply changed LoopVectorizer to assume all loops the vectorizer sees are parallel (as the kernels I tried didn't have loops inside) to see where the other potential vectorization obstacles are.

I'm planning to try next an approach where I add metadata to the loop header basic block that simply marks that the loop is parallel. The loop vectorizer, when it sees such metadata in the loop, can then skip cross-iteration memory dependency checks. If you think this is a dead end, please let me know. Otherwise, I'll try and see how it works.

BR,
--
Pekka
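To illustrate why treating such loops as parallel is plausible for loop-free kernels, here is a minimal C++ sketch. It is not pocl's generated code: `vec_add_work_item`, `run_forward` and `run_reverse` are hypothetical names, and the sketch assumes a kernel body with no barriers in which each work-item only touches its own element. Under that assumption the work-item loop gives the same result in any iteration order, which is the property a "parallel" marker would assert.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel body for a single work-item: c[id] = a[id] + b[id].
// Each work-item reads and writes only its own element.
inline void vec_add_work_item(const float *a, const float *b, float *c,
                              std::size_t id) {
    c[id] = a[id] + b[id];
}

// Work-item loop in ascending id order.
// Assumes b holds at least a.size() elements.
std::vector<float> run_forward(const std::vector<float> &a,
                               const std::vector<float> &b) {
    std::vector<float> c(a.size());
    for (std::size_t id = 0; id < a.size(); ++id)
        vec_add_work_item(a.data(), b.data(), c.data(), id);
    return c;
}

// The same work-item loop in descending id order. With no cross-iteration
// dependencies, the result must match run_forward.
std::vector<float> run_reverse(const std::vector<float> &a,
                               const std::vector<float> &b) {
    std::vector<float> c(a.size());
    for (std::size_t id = a.size(); id-- > 0;)
        vec_add_work_item(a.data(), b.data(), c.data(), id);
    return c;
}
```

A kernel that read `c[id - 1]` while writing `c[id]` would break this equivalence, which is the kind of case the cross-iteration dependency checks normally have to rule out.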
Hal Finkel
2013-Jan-25 14:21 UTC
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
----- Original Message -----
> From: "Pekka Jääskeläinen" <pekka.jaaskelainen at tut.fi>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>, "Nadav Rotem" <nrotem at apple.com>
> Sent: Friday, January 25, 2013 8:14:57 AM
> Subject: Re: [LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
>
> On 01/25/2013 04:00 PM, Hal Finkel wrote:
> > Based on this experience, can you propose some metadata that would
> > allow this to happen (so that the LoopVectorizer would be generally
> > useful for POCL)? I suspect this same metadata might be useful in
> > other contexts (such as implementing iteration-independence pragmas).
>
> I cannot yet. In this hack I simply changed LoopVectorizer to assume
> all loops the vectorizer sees are parallel (as the kernels I tried
> didn't have loops inside) to see where the other potential
> vectorization obstacles are.

Okay, I understand.

> I'm planning to try next an approach where I add metadata
> to the loop header basic block that simply marks that the loop is
> parallel. The loop vectorizer, when it sees such metadata in the loop
> can then skip cross-iteration memory dependency checks. If you think
> this is a dead-end, please let me know. Otherwise, I'll try and see
> how it works.

My point is that I specifically think that you should try it. I'm curious to see how what you come up with might apply to other use cases as well.

-Hal
Pekka Jääskeläinen
2013-Jan-25 17:16 UTC
[LLVMdev] LoopVectorizer in OpenCL C work group autovectorization
On 01/25/2013 04:21 PM, Hal Finkel wrote:
> My point is that I specifically think that you should try it. I'm curious
> to see how what you come up with might apply to other use cases as well.

OK, attached is the first quick attempt towards this. I'm not proposing committing this, but would like to get comments to possibly move towards something committable.

It simply looks for metadata named 'parallel_for' on any of the instructions in the loop's header and assumes the loop is a parallel one if such metadata is found. This metadata is added by pocl's wiloops generation routine.

It passes the pocl test suite when enabled, but probably cannot vectorize many kernels, due (at least) to the missing intra-kernel vector scalarizer.

Some known problems that need addressing:

- Metadata can only be attached to Instructions (not Loops or even BasicBlocks), hence the brute-force approach of marking all instructions in the header BB in the hope that optimizers retain at least one of them. A special intrinsic call, for example, might be a better solution.

- The loop header can potentially be shared by multilevel loops where the outer or inner levels might not be parallel. This is not a problem in the pocl use case, as the wiloops are fully parallel at all three levels, but it needs to be sorted out in a general solution. Perhaps it would be better to attach the metadata to the iteration count increment/check instruction(s) or similar, to better identify the parallel (for) loop in question.

- Are there optimizations that might push code *illegally* into the parallel loop from outside of it? If there is, e.g., a non-parallel loop inside a parallel loop, loop invariant code motion might move code from the inner loop to the parallel loop's body. That should be a safe optimization, to my understanding (it preserves the ordering semantics), but I wonder if there are others that might cause breakage.

--
Pekka

-------------- next part --------------
A non-text attachment was scrubbed...
Name: llvm-3.3-loopvectorizer-parallel_for-metadata-detection.patch
Type: text/x-patch
Size: 1761 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130125/e4f8f53b/attachment.bin>
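The detection described in the message above can be sketched roughly as follows. This is a toy model in plain C++, not the attached patch and not the LLVM API: `ToyInstruction`, `ToyLoop`, `markHeaderParallel`, `isMarkedParallel` and `needsCrossIterationCheck` are hypothetical names introduced only for illustration. It assumes metadata is modelled as a set of names per instruction, and it mirrors two points from the message: every header instruction is marked, and the loop counts as parallel if at least one mark survives.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an IR instruction: an opcode plus the names of the
// metadata kinds attached to it.
struct ToyInstruction {
    std::string opcode;
    std::set<std::string> metadata;
};

// Toy stand-in for a loop: only the header block's instructions are modelled.
struct ToyLoop {
    std::vector<ToyInstruction> header;
};

// Brute-force marking: tag every header instruction, hoping that at least
// one tagged instruction survives later optimizations.
void markHeaderParallel(ToyLoop &loop) {
    for (auto &inst : loop.header)
        inst.metadata.insert("parallel_for");
}

// The loop is assumed parallel if any header instruction carries the tag.
bool isMarkedParallel(const ToyLoop &loop) {
    for (const auto &inst : loop.header)
        if (inst.metadata.count("parallel_for") != 0)
            return true;
    return false;
}

// A vectorizer-like client would only run its cross-iteration memory
// dependency checks when the loop is not marked parallel.
bool needsCrossIterationCheck(const ToyLoop &loop) {
    return !isMarkedParallel(loop);
}
```

The toy model also makes the shared-header concern visible: because the tag lives on header instructions rather than on a loop, two nested loops sharing one header cannot be distinguished by this check.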