Hal,

I'm opening a new discussion on vectorization metadata, since it has little to do with fp-math. ;)

What kind of metadata would you annotate in the instructions? If I remember from your talk, you're not doing any loop or whole-function analysis, possibly leaving it for Polly to help you along the way.

I remember discussing with Tobias that Polly could have three main steps:

1. Early analysis and annotation: a step that wouldn't modify code, but would extensively annotate it (with metadata), so that Polly itself, and other passes like yours, could benefit from the polyhedral model.

2. Full polyhedral code modification: use the annotations of the previous pass to extensively modify code. This is what Polly does today, but the result of the analysis benefits no one except Polly.

   This step can be fused with step 1 for performance reasons, but it would be good to be able to run only the analysis part, for the benefit of the annotations, without the heavy modifications. This will be fundamental for independently testing vectorization passes that depend on Polly's metadata.

3. Code generation steps. As you said in your talk, and as we discussed in the fp-math thread, some code-generation steps could be aware of the optimizations already done, via the metadata left behind.

That will require some guarantees on metadata semantics and persistence that are not available today... Anyway, not sure any metadata-hardening will be very well accepted... ;)

--
cheers,
--renato

http://systemcall.org/
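A minimal C++ sketch of the analysis-only versus analysis+transform split described above. Everything here is hypothetical: `Loop`, `LoopAnnotation`, `PollyMode` and `runPolly` are illustrative names, not Polly or LLVM APIs, and the "analysis" is reduced to checking whether a loop has any loop-carried dependences.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, heavily simplified IR: a loop is a name, an unroll count,
// and its constant loop-carried dependence distances (empty = none).
struct Loop {
  std::string name;
  unsigned unrollCount = 1;
  std::vector<int> depDistances;
};

// Hypothetical annotation that step 1 attaches to each loop.
struct LoopAnnotation {
  bool parallel = false;
};

using AnnotationMap = std::map<std::string, LoopAnnotation>;

enum class PollyMode { AnalysisOnly, AnalysisAndTransform };

// Step 1: analysis + annotation. Reads the loops, never modifies them.
AnnotationMap analyze(const std::vector<Loop>& loops) {
  AnnotationMap notes;
  for (const Loop& l : loops)
    notes[l.name].parallel = l.depDistances.empty();
  return notes;
}

// Step 2: heavy transformation driven by the annotations
// (placeholder: "unroll" every parallel loop by a fixed factor of 4).
void transform(std::vector<Loop>& loops, const AnnotationMap& notes) {
  for (Loop& l : loops)
    if (notes.at(l.name).parallel) l.unrollCount = 4;
}

// Always runs step 1; runs step 2 only in AnalysisAndTransform mode.
// The returned annotations are what later passes (step 3) would read.
AnnotationMap runPolly(std::vector<Loop>& loops, PollyMode mode) {
  AnnotationMap notes = analyze(loops);
  if (mode == PollyMode::AnalysisAndTransform) transform(loops, notes);
  return notes;
}
```

In `AnalysisOnly` mode the loops come back untouched but annotated, which is the property that would let a vectorization pass be tested against Polly's results without Polly's transformations.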
On Wed, 18 Apr 2012 17:30:11 +0100
Renato Golin <rengolin at systemcall.org> wrote:

> Hal,
>
> I'm opening a new discussion on vectorization metadata, since it has
> little to do with fp-math. ;)

Fair enough, but I was actually talking about how fp-math (and other) metadata is updated during vectorization. When vectorization fuses originally-independent instructions, it has the same metadata issues as GVN, etc.

Metadata specifically for vectorization is another interesting topic, but I don't have any specific ideas for this at the moment. That having been said, I think that we do need to think about metadata that will help with vectorization; we might want to tag instructions as safe for speculative execution, for example. We might want to tag loops with a specific unrolling factor. We might want to be able to pass along specific alias independence results. None of these things are really specific to vectorization, but they will generally have an impact on it.

 -Hal

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
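Hal's three example hints (speculation safety, unroll factor, alias independence) can be modelled as below. This is a hypothetical C++ sketch, not an existing LLVM metadata kind or API: `InstHints`, `LoopHints`, `IndependenceHints` and `chooseUnroll` are illustrative names. The design point it tries to capture is that a missing hint always means "unknown", so a pass that drops the metadata (for example when fusing instructions, as with GVN) only loses optimization opportunities, never correctness.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <utility>

// Hypothetical per-instruction hint; NOT an existing LLVM metadata kind.
struct InstHints {
  bool safeToSpeculate = false;  // absent tag => assume unsafe
};

// Hypothetical per-loop hint.
struct LoopHints {
  std::optional<unsigned> unrollFactor;  // absent => let the pass decide
};

// Pairs of instruction ids that an earlier analysis proved independent.
class IndependenceHints {
 public:
  void markIndependent(int a, int b) { pairs_.insert(ordered(a, b)); }
  // A missing entry means "unknown", so the caller must stay conservative.
  bool provablyIndependent(int a, int b) const {
    return pairs_.count(ordered(a, b)) != 0;
  }

 private:
  // Store each pair in a canonical order so (a, b) and (b, a) match.
  static std::pair<int, int> ordered(int a, int b) {
    return a < b ? std::make_pair(a, b) : std::make_pair(b, a);
  }
  std::set<std::pair<int, int>> pairs_;
};

// A vectorizer might honor the unroll hint if present, else use its default.
unsigned chooseUnroll(const LoopHints& h, unsigned defaultFactor) {
  assert(defaultFactor > 0);
  return h.unrollFactor.value_or(defaultFactor);
}
```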
On Thu, Apr 19, 2012 at 12:30 AM, Renato Golin <rengolin at systemcall.org> wrote:
> Hal,
>
> I'm opening a new discussion on vectorization metadata, since it has
> little to do with fp-math. ;)
>
> What kind of metadata would you annotate in the instructions? If I
> remember from your talk, you're not doing any loop or whole-function
> analysis, possibly leaving it for Polly to help you along the way.
>
> I remember discussing it with Tobias that Polly could have three main steps:
>
> 1. Early analysis and annotation: a step that wouldn't modify code,
> but extensively annotate (with metadata), so that itself, and other
> passes like yours, could benefit from the polyhedral model.

hi renato,

Instead of exporting the polyhedral model of the program with metadata, another possible solution is designing a generic "Loop Parallelism" analysis interface, just like the AliasAnalysis group. For a particular loop, the interface simply answers how many loop iterations can run in parallel. With the information provided by this interface, we can unroll the loop to expose vectorizable iterations and apply vectorization to the unrolled loop with BBVectorizer.

Like AliasAnalysis, we can have different implementations of loop parallelism analysis: a lightweight implementation based on SCEV (or the LoopDependency Analysis), and an implementation based on the polyhedral analysis in Polly (call it the polyhedral loop parallelism analysis). However, Polly's analysis results are not visible at the scope of a FunctionPass/LoopPass, as all Polly passes are RegionPasses right now.
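A minimal C++ sketch of such an interface, assuming a loop can be summarised by its constant, positive loop-carried dependence distances. `LoopParallelismAnalysis`, `DistanceParallelism`, `LoopInfoStub` and `unrollFactorFor` are hypothetical names, not LLVM classes; the chaining only mimics how AliasAnalysis implementations defer to the next one when they cannot answer. With a minimum dependence distance d, d consecutive iterations are independent, so unrolling by min(d, vector width) exposes that many iterations to a basic-block vectorizer.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a loop: the constant, positive loop-carried
// dependence distances, if some analysis managed to compute them.
struct LoopInfoStub {
  bool distancesKnown = false;
  std::vector<unsigned> distances;  // empty + known => no carried deps
};

constexpr unsigned kUnbounded = UINT_MAX;

// Hypothetical interface in the spirit of the AliasAnalysis group: each
// implementation either answers or defers (std::nullopt) to the next one.
class LoopParallelismAnalysis {
 public:
  virtual ~LoopParallelismAnalysis() = default;
  void setNext(std::unique_ptr<LoopParallelismAnalysis> n) { next_ = std::move(n); }

  // Number of consecutive iterations that may execute in parallel.
  unsigned parallelIterations(const LoopInfoStub& l) const {
    if (auto r = query(l)) return *r;
    return next_ ? next_->parallelIterations(l) : 1;  // conservative: serial
  }

 protected:
  virtual std::optional<unsigned> query(const LoopInfoStub& l) const = 0;

 private:
  std::unique_ptr<LoopParallelismAnalysis> next_;
};

// Lightweight, distance-based implementation (roughly what a SCEV-based
// analysis could compute): min distance d => d independent iterations.
class DistanceParallelism : public LoopParallelismAnalysis {
 protected:
  std::optional<unsigned> query(const LoopInfoStub& l) const override {
    if (!l.distancesKnown) return std::nullopt;
    if (l.distances.empty()) return kUnbounded;
    return *std::min_element(l.distances.begin(), l.distances.end());
  }
};

// Unroll just enough to fill one vector of the given width.
unsigned unrollFactorFor(const LoopParallelismAnalysis& lpa,
                         const LoopInfoStub& l, unsigned vectorWidth) {
  assert(vectorWidth > 0);
  return std::min(lpa.parallelIterations(l), vectorWidth);
}
```

A polyhedral implementation would be another subclass placed in the same chain, so clients keep asking the same question regardless of which analysis produced the answer.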
To allow Polly to export its analysis results to FunctionPass/LoopPass, we need to make the polyhedral loop parallelism analysis a FunctionPass and schedule it before all Polly passes, but have it do nothing in its runOnFunction method; after that, another Polly pass can fill the actual analysis results into the polyhedral loop parallelism analysis pass. By doing this, other FunctionPasses/LoopPasses can query the parallelism information calculated by Polly.

If the parallelism information is available outside Polly, we can also find some way to move the code generation support for OpenMP, vectorization and CUDA from Polly into the LLVM transformation library; after that, we can also generate such code based on the results of the SCEV-based parallelism analysis.

best regards
ether
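The "empty pass filled in later" scheme can be sketched in plain C++ as follows. This deliberately avoids the real LLVM pass-manager API: `PolyhedralParallelismResults` and `hypotheticalPollyProducer` are illustrative names, and loops are identified by plain strings for brevity.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical result holder. It is scheduled first but performs no
// analysis itself; a later producer (standing in for a Polly RegionPass)
// fills in results, and consumers query it afterwards.
class PolyhedralParallelismResults {
 public:
  // No analysis here; only discard results left over from a previous
  // function so consumers never see stale answers.
  void runOnFunction(const std::string& /*function*/) { results_.clear(); }

  // Called by the producer once per loop it analysed.
  void record(const std::string& loop, bool parallel) { results_[loop] = parallel; }

  // Consumers get std::nullopt for loops the producer never looked at.
  std::optional<bool> isParallel(const std::string& loop) const {
    auto it = results_.find(loop);
    if (it == results_.end()) return std::nullopt;
    return it->second;
  }

 private:
  std::map<std::string, bool> results_;
};

// Hypothetical stand-in for the Polly pass that writes into the holder.
void hypotheticalPollyProducer(PolyhedralParallelismResults& holder) {
  holder.record("for.body", true);
  holder.record("for.body.5", false);
}
```

Returning `std::nullopt` for unseen loops lets a FunctionPass or LoopPass client fall back to a cheaper analysis instead of treating "not analysed" as "not parallel".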
Hi Ether,

On 18 April 2012 19:11, Hongbin Zheng <etherzhhb at gmail.com> wrote:
> Instead of exporting the polyhedral model of the program with
> metadata, another possible solution is designing a generic "Loop
> Parallelism" analysis interface just like the AliasAnalysis group.
> For a particular loop, the interface simply answer how many loop
> iterations can run in parallel. With information provided by this
> interface we can unroll the loop to expose vectorizable iterations and
> apply vectorization to the unrolled loop with BBVectorizer.

In the long run, this kind of parallelism detector should be replaced by Polly, but it could be a starting point. I only fear that the burden might outweigh the benefits in the short term.

> To allow polly export its analysis result to FunctionPass/LoopPass, we
> need to make the polyhedral loop parallelism analysis became a
> FunctionPass, and schedule it before all polly passes but do nothing
> in its runOnFunction method, after that we can let another pass of
> polly to fill the actually analysis results into the polyhedral loop
> parallelism analysis pass. By doing this, other
> FunctionPasses/LoopPasses can query the parallelism information
> calculated by Polly.

That's the idea. It would be good if you could have both analysis+transform AND analysis-only pre-passes, to allow more fine-grained control over vectorization (and to ease testing of other passes that use Polly's info).

> If the parallelism information is available outside polly, we can also
> find some way to move code generation support for OpenMP, Vecorization
> and CUDA from Polly to LLVM transformation library, after that we can
> also generate such code base on the analysis result of the SCEV based
> parallelism analysis.

LLVM already has OpenMP support; maybe we should follow a similar standard, or unify them.
CUDA would be closer to OpenCL than to OpenMP or Polly; I'm not sure there is a feasible way to make sure the semantics remain the same across such drastic changes of paradigm.

--
cheers,
--renato

http://systemcall.org/
On 18 April 2012 17:54, Hal Finkel <hfinkel at anl.gov> wrote:
> Metadata specifically for vectorization is another interesting topic,
> but I don't have any specific ideas for this at the moment. That having
> been said, I think that we do need to think about metadata that will
> help with vectorization; we might want to tag instructions as safe for
> speculative execution, for example. We might want to tag loops with a
> specific unrolling factor. We might want to be able to pass along
> specific alias independence results. None of these things are really
> specific to vectorization, but will generally have an impact on it.

I think this is a very important feature for vectorization. If we start building small passes for small vectorization steps (one for hoisting loop constants, another to simplify the induction range, another to unroll loops), we might not be able to predict the best strategy, since early changes might shadow better strategies later.

Having metadata allows one to infer the best strategy as a whole and apply it, rather than hoping for a good sequence of passes... We can still have separate passes for each task, but not run them all on all code all the time.

So, if an early analysis pass annotates that in one particular loop you should only hoist the loop constants (aggressive inlining is possible, for ex.), while in another you should actually unroll, then each pass can run independently and trust the metadata on each loop/block/instruction.

--
cheers,
--renato

http://systemcall.org/
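A minimal sketch of the "decide once, let independent passes trust the annotation" idea, assuming a per-loop strategy map stands in for metadata. `Strategy`, `LoopState`, `hoistPass` and `unrollPass` are hypothetical names; real passes would read the annotation off the loop itself rather than from a side table.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-loop strategy chosen once by an early analysis pass.
enum class Strategy { None, HoistInvariantsOnly, Unroll };

struct LoopState {
  bool invariantsHoisted = false;
  unsigned unrollCount = 1;
};

using StrategyMap = std::map<std::string, Strategy>;
using Loops = std::map<std::string, LoopState>;

// Unannotated loops are left alone.
Strategy lookup(const StrategyMap& s, const std::string& loop) {
  auto it = s.find(loop);
  return it == s.end() ? Strategy::None : it->second;
}

// Each transformation stays a separate pass, but only acts on the loops
// whose annotation asks for it instead of running on every loop.
void hoistPass(Loops& loops, const StrategyMap& s) {
  for (auto& [name, state] : loops)
    if (lookup(s, name) == Strategy::HoistInvariantsOnly) state.invariantsHoisted = true;
}

void unrollPass(Loops& loops, const StrategyMap& s, unsigned factor) {
  assert(factor > 0);
  for (auto& [name, state] : loops)
    if (lookup(s, name) == Strategy::Unroll) state.unrollCount = factor;
}
```

Because each pass only consults the annotation, running `unrollPass` before or after `hoistPass` gives the same result: the strategy is fixed up front rather than emerging from pass order.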
Hi Hal,

On 04/18/2012 07:54 PM, Hal Finkel wrote:
> We might want to be able to pass along
> specific alias independence results. None of these things are really
> specific to vectorization, but will generally have an impact on it.

For what it's worth, in pocl we do something along these lines now [1]. We annotate the OpenCL C kernel instructions with the OpenCL work-item id and the "parallel region id" (the region between barriers). As you probably know, in OpenCL C the work items are fully independent "threads of execution" within each region between barriers, which is useful information to pass along.

This metadata is used both to guide a (modified) bb-vectorizer to perform work-group auto-vectorization (whole-function vectorization, if you will) more efficiently, and to improve the alias analysis for instruction scheduling (and for other optimizations that might benefit).

The benefit of not just vectorizing the parallel regions directly is that we can choose to wg-vectorize and/or to statically instruction-parallelize using the same input from pocl.

It would be really nice to have a set of "standard independence metadata" in LLVM that would also cover this scenario.

[1] http://bazaar.launchpad.net/~pocl/pocl/trunk/revision/237

BR,
--
--Pekka
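A sketch of how pocl-style annotations could feed a dependence or alias query, following the semantics described above (work items are independent between barriers). `WorkItemTag`, `Inst` and `provablyIndependent` are hypothetical names; this is neither pocl's actual metadata format nor LLVM's AliasAnalysis API.

```cpp
#include <cassert>
#include <optional>

// Hypothetical model of the annotations: the work-item id an instruction
// belongs to and the id of the parallel region (code between two barriers)
// that contains it.
struct WorkItemTag {
  unsigned workItem;
  unsigned region;
};

struct Inst {
  std::optional<WorkItemTag> tag;  // absent if the metadata was dropped
};

// Instructions from *different* work items in the *same* parallel region
// need no ordering, so a query may report them as independent. Different
// regions are ordered by a barrier, and instructions of the same work item
// follow normal dependence rules, so both fall back to "may depend".
bool provablyIndependent(const Inst& a, const Inst& b) {
  if (!a.tag || !b.tag) return false;
  return a.tag->region == b.tag->region && a.tag->workItem != b.tag->workItem;
}
```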