Michael Zolotukhin via llvm-dev
2016-Aug-30 01:34 UTC
[llvm-dev] Questions on LLVM vectorization diagnostics
Hi Hideki,

Thanks for the interesting writeup!

> On Aug 27, 2016, at 7:15 AM, Renato Golin via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> On 25 August 2016 at 05:46, Saito, Hideki via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> Now, I have one question. Suppose we'd like to split the vectorization decision as an
>> Analysis pass and the vectorization transformation as a Transformation pass. Is it
>> acceptable if an Analysis pass creates new Instructions and new BasicBlocks, keeps them
>> unreachable from the underlying IR of the Function/Loop, and passes those to the
>> Transformation pass as part of the Analysis's internal data? We've been operating under
>> the assumption that such Analysis pass behavior is unacceptable.
>
> Hi Saito,
>
> First let me say: impressive work you guys are planning for the vectoriser. Outer loop
> vectorisation is not an easy task, so feel free to share your ideas early and often, as
> that would probably mean a lot less work for you guys, too.
>
> Regarding generation of dead code, I don't remember any pass doing this (though I haven't
> looked at many). Most passes do some kind of clean-up at the end, and DCE ends up getting
> rid of spurious things here and there, so you can't *rely* on it being there. It's even
> worse than metadata, which is normally left alone *unless* it needs to be destroyed; dead
> code is purposely destroyed.
>
> But analysis passes shouldn't be touching code in the first place. Of course, creating
> additional dead code is not strictly changing code, but it could cause code bloat, leaks,
> or make things worse for other analyses. My personal view is that this is a bad move.

While I agree with Renato, it is definitely worth mentioning LCSSA in this context. I still
don't know what we should call it: an analysis or a transformation. It can sometimes be
viewed as an analysis, in the sense that a pass can 'preserve' it (i.e. the IR is still in
LCSSA form after the pass). At the same time, LCSSA obviously can and does transform the IR,
but it does so by generating 'dead' code: phi nodes that can later be folded easily.

So, to answer your question: I think it is OK to do some massaging of the IR before your
pass, and you could use LCSSA as an example of how that can be implemented. However,
creating unreachable blocks sounds a bit hacky - it looks like we would just be using the IR
as a shadow data structure. If that's the case, why not use a shadow data structure :-) ?
ScalarEvolution might be an example of how this can be done - it creates a map from IR
instructions to SCEV objects.

Thanks,
Michael
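[To make the LCSSA point concrete, here is a minimal hand-written LLVM IR sketch
(illustrative only, not taken from any patch). %i.next is defined inside the loop and used
outside it, so LCSSA inserts a single-entry phi in the exit block; that phi is exactly the
kind of trivially foldable 'dead' code Michael mentions:]

    define i64 @count(i64 %n) {
    entry:
      br label %loop

    loop:
      %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
      %i.next = add i64 %i, 1
      %cmp = icmp ult i64 %i.next, %n
      br i1 %cmp, label %loop, label %exit

    exit:
      ; inserted by LCSSA: the only out-of-loop use of %i.next now goes
      ; through this single-entry phi, which instcombine can fold away
      ; once the passes that rely on LCSSA form are done
      %i.next.lcssa = phi i64 [ %i.next, %loop ]
      ret i64 %i.next.lcssa
    }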
>> Please let us know if this is a generally acceptable way for an Analysis pass to work
>> ---- this might make our development move quicker. Why would we want to do this? As
>> mentioned above, we need to "pseudo-massage inner loop control flow" before deciding
>> where/whether to vectorize. Hope someone can give us a clear answer.
>
> We have discussed the split of analysis vs transformation with Polly years ago, and it was
> considered "a good idea". But that relied exclusively on metadata.
>
> So, first, the vectorisers and Polly would pass over the IR as an analysis pass, leaving a
> trail of width/unroll factors, loop dependency trackers, recommended skew factors, etc.
> Then, the transformation passes (Loop/SLP/Polly) would use that information and transform
> the loop the best they can, and clean up the metadata, leaving only a single "width=1",
> which means "don't try to vectorise any more". Clean-ups as required, after the
> transformation pass.
>
> The current loop vectoriser is split into three stages: validity, cost and transformation.
> We only check the cost if we know of a valid transformation, and we only transform if we
> know of a cost better than width=1. Where the cost analysis would live depends on how we
> arrange Polly, the Loop and SLP vectorisers, and their analysis passes. Conservatively,
> I'd leave the cost analysis with the transformation, so we only do it once.
>
> The outer loop proposal, then, suffers from the cost analysis not being done at the same
> time as the validity analysis. It would also complicate things a lot to pass "more than
> one" type of possible vectorisation technique via the same metadata structure, which will
> probably already be complex enough. This is the main reason why we haven't split yet.
>
> Given that scenario of split responsibility, I'm curious as to your opinion on the matter
> of carrying (and sharing) metadata between different vectorisation analysis passes and
> different transformation types.
>
> cheers,
> --renato
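[For concreteness, the "width=1" marker Renato describes corresponds to loop metadata that
already exists: llvm.loop.vectorize.width (and friends such as llvm.loop.interleave.count)
hang off the loop latch's branch terminator. A sketch of such a trail - the values are made
up, and a skew-factor entry would be a new, hypothetical addition:]

    br i1 %exitcond, label %exit, label %loop, !llvm.loop !0

    !0 = distinct !{!0, !1, !2}                   ; self-referential loop ID
    !1 = !{!"llvm.loop.vectorize.width", i32 8}   ; existing metadata
    !2 = !{!"llvm.loop.interleave.count", i32 2}  ; existing metadata
    ; after transformation, the pass would rewrite the trail to the single
    ; entry !{!"llvm.loop.vectorize.width", i32 1}, i.e. "don't try again"

[And the three stages live in LoopVectorize.cpp roughly as follows. This is a compressed
C++ sketch: the class names mirror the real LoopVectorizationLegality,
LoopVectorizationCostModel and InnerLoopVectorizer, but the interfaces here are simplified
stubs, not the actual API:]

    struct Loop {};

    struct LoopVectorizationLegality {
      // Stage 1: validity. Can this loop be vectorized at all?
      bool canVectorize(const Loop &) { return true; /* stub */ }
    };

    struct LoopVectorizationCostModel {
      // Stage 2: cost. Pick the most profitable width; 1 means scalar wins.
      unsigned selectVectorizationFactor(const Loop &) { return 4; /* stub */ }
    };

    struct InnerLoopVectorizer {
      // Stage 3: transformation, reached only with a valid, profitable plan.
      void vectorize(Loop &, unsigned Width) { /* stub */ }
    };

    bool processLoop(Loop &L) {
      LoopVectorizationLegality Legal;
      if (!Legal.canVectorize(L))          // no valid transformation known
        return false;
      LoopVectorizationCostModel CM;
      unsigned Width = CM.selectVectorizationFactor(L);
      if (Width == 1)                      // no cost better than scalar
        return false;
      InnerLoopVectorizer LB;
      LB.vectorize(L, Width);              // transform
      return true;
    }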
Saito, Hideki via llvm-dev
2016-Aug-30 21:55 UTC
[llvm-dev] Questions on LLVM vectorization diagnostics
Renato and Michael, thanks for your replies.

Renato> Outer loop vectorisation is not an easy task, so feel free to share your ideas
Renato> early and often, as that would probably mean a lot less work for you guys, too.

Will definitely do.

Renato> But analysis passes shouldn't be touching code in the first place. Of course,
Renato> creating additional dead code is not strictly changing code, but it could cause
Renato> code bloat, leaks, or make things worse for other analyses. My personal view is
Renato> that this is a bad move.

Michael> However, creating unreachable blocks sounds a bit hacky - it looks like we would
Michael> just be using the IR as a shadow data structure.

I also got another person saying, via private e-mail, that an Analysis pass creating
Instructions/BasicBlocks "is generally frowned upon". I think this shows that enough people
dislike the idea of an Analysis pass creating Instructions/BasicBlocks and using them to
pass (part of) the analysis info to the Transformation pass. That was our original
assumption, and it's good to know that our assumption has support (I don't know how wide,
but it's at least a good start). Now, the next question is how else to make something
similar happen.

Renato> We have discussed the split of analysis vs transformation with Polly years ago, and
Renato> it was considered "a good idea".

Same thinking here.

Renato> But that relied exclusively on metadata.

Snip snip snip

Renato> Given that scenario of split responsibility, I'm curious as to your opinion on the
Renato> matter of carrying (and sharing) metadata between different vectorisation analysis
Renato> passes and different transformation types.

Michael> If that's the case, why not use a shadow data structure :-) ? ScalarEvolution
Michael> might be an example of how this can be done - it creates a map from IR
Michael> instructions to SCEV objects.

Our thinking is that what we'd like to communicate between VecAnalysis and VecTransform is
not simple enough to represent well in metadata form in the long run. As such, we are
currently going after creating an internal data structure (which will eventually become the
data structure of VecAnalysis, to be referenced from VecTransform through member functions).
As I wrote before, since we need to represent new control flow (within the Analysis), the
internal data structure we are introducing is an abstraction of a Basic Block, and the
soon-to-come NFC patch essentially stops there. The next step is to add new control flow for
new optimization/functionality (that's useful enough in LoopVectorize.cpp). At that moment,
we inevitably have to represent newly generated "instructions" in an abstracted way, and the
abstracted Basic Blocks start to diverge from the underlying real Basic Blocks. One might
call this "a shadow data-structure". Once the RFC and the patches come out, I hope enough of
you will like the approach we are taking. We'll find out at that time.

I think enough is said before the real RFC and NFC patch. Let me get back to the RFC/patch
so that you'll see them sooner rather than later. In the meantime, I'll be glad if more
people express their likes/dislikes on our general approach.

Thanks,
Hideki
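[As one entirely hypothetical C++ illustration of the kind of "shadow" structure Hideki
describes - an abstract block that starts as a thin wrapper over a real BasicBlock and can
later grow abstract instructions with no IR counterpart. Every name below is invented for
this sketch; none of it is from the upcoming patch:]

    #include <memory>
    #include <vector>

    struct BasicBlock;   // the real llvm::BasicBlock, opaque here
    struct Instruction;  // the real llvm::Instruction, opaque here

    struct AbstractInst {
      const Instruction *Underlying; // nullptr for newly invented instructions
      // ... opcode, operands over other AbstractInsts, etc.
    };

    struct AbstractBlock {
      const BasicBlock *Underlying;  // nullptr for blocks with no IR twin
      std::vector<std::unique_ptr<AbstractInst>> Insts;
      std::vector<AbstractBlock *> Succs; // the *proposed* control flow,
                                          // which may diverge from the IR's
    };

    // The analysis would hand the transformation a forest of these, much as
    // ScalarEvolution hands out SCEV objects keyed by instruction (getSCEV),
    // instead of materializing unreachable IR blocks.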
-----Original Message-----
From: mzolotukhin at apple.com [mailto:mzolotukhin at apple.com]
Sent: Monday, August 29, 2016 6:35 PM
To: Renato Golin <renato.golin at linaro.org>
Cc: Saito, Hideki <hideki.saito at intel.com>; llvm-dev at lists.llvm.org; Santanu Das <cs15mtech11018 at iith.ac.in>; Dangeti Tharun kumar <cs15mtech11002 at iith.ac.in>
Subject: Re: [llvm-dev] Questions on LLVM vectorization diagnostics
Renato Golin via llvm-dev
2016-Aug-30 22:23 UTC
[llvm-dev] Questions on LLVM vectorization diagnostics
On 30 August 2016 at 22:55, Saito, Hideki <hideki.saito at intel.com> wrote:
> Our thinking is that what we'd like to communicate between VecAnalysis and VecTransform is
> not simple enough to represent well in metadata form in the long run. As such, we are
> currently going after creating an internal data structure (which will eventually become
> the data structure of VecAnalysis, to be referenced from VecTransform through member
> functions). As I wrote before, since we need to represent new control flow (within the
> Analysis), the internal data structure we are introducing is an abstraction of a Basic
> Block, and the soon-to-come NFC patch essentially stops there. The next step is to add new
> control flow for new optimization/functionality (that's useful enough in
> LoopVectorize.cpp). At that moment, we inevitably have to represent newly generated
> "instructions" in an abstracted way, and the abstracted Basic Blocks start to diverge from
> the underlying real Basic Blocks. One might call this "a shadow data-structure". Once the
> RFC and the patches come out, I hope enough of you will like the approach we are taking.
> We'll find out at that time.

Hi Hideki,

Thanks for sharing the roadmap. I'm curious what this shadow basic block will look like. :)

cheers,
--renato