Hi Ether,

On 18 April 2012 19:11, Hongbin Zheng <etherzhhb at gmail.com> wrote:
> Instead of exporting the polyhedral model of the program with
> metadata, another possible solution is designing a generic "Loop
> Parallelism" analysis interface just like the AliasAnalysis group.
> For a particular loop, the interface simply answers how many loop
> iterations can run in parallel. With the information provided by this
> interface we can unroll the loop to expose vectorizable iterations
> and apply vectorization to the unrolled loop with the BBVectorizer.

In the long run, this kind of parallelism detector should be replaced
by Polly, but it could be a starting point. I only fear that the
burden might outweigh the benefits in the short term.

> To allow Polly to export its analysis results to a
> FunctionPass/LoopPass, we need to make the polyhedral loop
> parallelism analysis a FunctionPass and schedule it before all Polly
> passes, but do nothing in its runOnFunction method; after that we
> can let another Polly pass fill the actual analysis results into the
> polyhedral loop parallelism analysis pass. By doing this, other
> FunctionPasses/LoopPasses can query the parallelism information
> calculated by Polly.

That's the idea. It would be good if you could have both
analysis+transform AND analysis-only pre-passes, to allow more
fine-grained control over vectorization (and to ease testing of other
passes that use Polly's info).

> If the parallelism information is available outside Polly, we can
> also find some way to move the code generation support for OpenMP,
> vectorization and CUDA from Polly to the LLVM transformation
> library; after that we can also generate such code based on the
> analysis results of the SCEV-based parallelism analysis.

LLVM already has OpenMP support; maybe we should follow a similar
standard, or common them up.

CUDA would be closer to OpenCL than to OpenMP or Polly; I'm not sure
there is a feasible way to make sure the semantics remain the same
across such drastic changes of paradigm.

-- 
cheers,
--renato

http://systemcall.org/
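The generic "Loop Parallelism" analysis interface Ether proposes might look roughly like the sketch below. This is a standalone illustration, not LLVM code: `Loop`, `LoopParallelismAnalysis`, `NoLoopParallelism`, `PrecomputedLoopParallelism` and `chooseUnrollFactor` are all invented names; in-tree this would hang off `llvm::Loop` and the analysis-group machinery that AliasAnalysis uses.

```cpp
#include <map>

// Hypothetical stand-in for llvm::Loop, just for this sketch.
struct Loop { int id; };

// A generic "Loop Parallelism" analysis interface, analogous to the
// AliasAnalysis group: different implementations (SCEV-based,
// polyhedral/Polly-based, ...) all answer the same query.
class LoopParallelismAnalysis {
public:
  virtual ~LoopParallelismAnalysis() {}
  // How many consecutive iterations of L can safely run in parallel?
  // Returning 1 means "no known parallelism".
  virtual unsigned getParallelIterations(const Loop &L) const = 0;
};

// The conservative fallback implementation: claims nothing.
class NoLoopParallelism : public LoopParallelismAnalysis {
public:
  unsigned getParallelIterations(const Loop &) const override { return 1; }
};

// A table-backed implementation standing in for results that a
// polyhedral analysis like Polly might have computed and recorded.
class PrecomputedLoopParallelism : public LoopParallelismAnalysis {
  std::map<int, unsigned> Results;
public:
  void record(const Loop &L, unsigned N) { Results[L.id] = N; }
  unsigned getParallelIterations(const Loop &L) const override {
    auto It = Results.find(L.id);
    return It == Results.end() ? 1u : It->second;
  }
};

// A consumer (e.g. an unroller feeding the BBVectorizer) would query
// the interface and unroll by the reported factor.
unsigned chooseUnrollFactor(const LoopParallelismAnalysis &LPA,
                            const Loop &L) {
  return LPA.getParallelIterations(L);
}
```

A consumer pass never needs to know which implementation is active, which is exactly what makes the AliasAnalysis-group model attractive here.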
On Wed, 18 Apr 2012 20:17:35 +0100
Renato Golin <rengolin at systemcall.org> wrote:
> Hi Ether,
>
> On 18 April 2012 19:11, Hongbin Zheng <etherzhhb at gmail.com> wrote:
> > Instead of exporting the polyhedral model of the program with
> > metadata, another possible solution is designing a generic "Loop
> > Parallelism" analysis interface just like the AliasAnalysis group.
> > For a particular loop, the interface simply answers how many loop
> > iterations can run in parallel. With the information provided by
> > this interface we can unroll the loop to expose vectorizable
> > iterations and apply vectorization to the unrolled loop with the
> > BBVectorizer.
>
> In the long run, this kind of parallelism detector should be replaced
> by Polly, but it could be a starting point. I only fear that the
> burden might outweigh the benefits in the short term.
>
> > To allow Polly to export its analysis results to a
> > FunctionPass/LoopPass, we need to make the polyhedral loop
> > parallelism analysis a FunctionPass and schedule it before all
> > Polly passes, but do nothing in its runOnFunction method; after
> > that we can let another Polly pass fill the actual analysis results
> > into the polyhedral loop parallelism analysis pass. By doing this,
> > other FunctionPasses/LoopPasses can query the parallelism
> > information calculated by Polly.
>
> That's the idea. It would be good if you could have both
> analysis+transform AND analysis-only pre-passes, to allow more
> fine-grained control over vectorization (and to ease testing of other
> passes that use Polly's info).
>
> > If the parallelism information is available outside Polly, we can
> > also find some way to move the code generation support for OpenMP,
> > vectorization and CUDA from Polly to the LLVM transformation
> > library; after that we can also generate such code based on the
> > analysis results of the SCEV-based parallelism analysis.
>
> LLVM already has OpenMP support; maybe we should follow a similar
> standard, or common them up.

I wish that were true; unless you know something I don't know, there is
no parallelization support at this time. Polly has some ability to
lower directly to the libgomp runtime, but that is not the same as
OpenMP support. This is, however, something I'd like to work on.

 -Hal

> CUDA would be closer to OpenCL than to OpenMP or Polly; I'm not sure
> there is a feasible way to make sure the semantics remain the same
> across such drastic changes of paradigm.

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
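The scheduling trick Ether describes, an analysis pass whose runOnFunction does nothing, acting as a mailbox that a later Polly pass fills and downstream passes read, can be modeled as below. All names here are invented for illustration; real code would use llvm::FunctionPass and the legacy pass manager rather than a toy lambda pipeline.

```cpp
#include <functional>
#include <vector>

// Hypothetical mailbox for the analysis result; in-tree this would be
// state owned by the parallelism-analysis FunctionPass.
struct ParallelismInfo {
  unsigned ParallelIterations = 1;  // conservative default: no parallelism
};

// Runs a toy three-stage "pass pipeline" and returns what the final
// consumer pass observed.
unsigned runToyPipeline() {
  ParallelismInfo PI;  // allocated when the analysis pass is scheduled
  unsigned Seen = 0;

  std::vector<std::function<void()>> Pipeline = {
      // 1. The parallelism analysis pass itself: runOnFunction is a
      //    no-op; results are filled in externally.
      [] {},
      // 2. A Polly-like pass writes the actual result into the analysis.
      [&PI] { PI.ParallelIterations = 4; },
      // 3. A downstream FunctionPass/LoopPass queries the analysis.
      [&PI, &Seen] { Seen = PI.ParallelIterations; },
  };
  for (auto &P : Pipeline)
    P();
  return Seen;
}
```

Consumers scheduled before stage 2 would see the conservative default, which is why the ordering constraint (analysis first, Polly next, consumers last) matters in the real pass pipeline.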
On 19 April 2012 01:51, Hal Finkel <hfinkel at anl.gov> wrote:
> I wish that were true; unless you know something I don't know, there
> is no parallelization support at this time. Polly has some ability
> to lower directly to the libgomp runtime, but that is not the same
> as OpenMP support. This is, however, something I'd like to work on.

Well, a few months ago there were some people (were there?)
implementing support to read the pragmas, but I'm not sure how far
they got. It might be just my imagination, though...

-- 
cheers,
--renato

http://systemcall.org/