Renato Golin via llvm-dev
2017-Mar-17 21:23 UTC
[llvm-dev] LoopVectorizer with ifconversion
On 17 March 2017 at 16:34, Hal Finkel <hfinkel at anl.gov> wrote:> In general, this is true everywhere. In a large vectorized loop, this cost > may well be worthwhile. The idea is that the cost model should account for > all of these costs. If it doesn't properly, we should fix that.Isn't this only worth when the SIMD instructions can be conditionalised per lane? I believe AVX512 and SVE have such abilities, but not NEON or SSE. Does SystemZ support conditionalisation in SIMD? cheers, --renato
Jonas Paulsson via llvm-dev
2017-Mar-20 08:59 UTC
[llvm-dev] LoopVectorizer with ifconversion
On 2017-03-17 23:23, Renato Golin wrote:> On 17 March 2017 at 16:34, Hal Finkel <hfinkel at anl.gov> wrote: >> In general, this is true everywhere. In a large vectorized loop, this cost >> may well be worthwhile. The idea is that the cost model should account for >> all of these costs. If it doesn't properly, we should fix that. > Isn't this only worth when the SIMD instructions can be > conditionalised per lane? I believe AVX512 and SVE have such > abilities, but not NEON or SSE. > > Does SystemZ support conditionalisation in SIMD? >No, z13 has no masked vector operations. I think this is checked with the isLegalMasked...() hooks. Is this implicitly assumed in LoopVectorizers if-conversion algorithm somewhere? I see the scalar costs are adjusted with 'ScalarCost /= getReciprocalPredBlockProb()'. It seems that this should be done for all blocks that remain after vectorization? If not doing the simple thing and disabling store with predication vectorization, I agree with Hal that this could be worth modeling as an extra cost to not reject some large loop with just one conditional store, or so. I think this extra cost should be added to the compare, but I am not sure if it would be better to instead add it to the branch, because there are also cases of e.g. (AND (COMPARE, COMPARE)). Adding a cost to a vectorized branch instead could be done by assuming that a conditional branch would have to be set up for each branch after the vector compare. Does the loop vectorizer know which blocks that need predication in the scalar loop will remain after vectorization? SystemZ could check such blocks by looking for stores, but that seems like extra work. thanks, Jonas
Matthew Simpson via llvm-dev
2017-Mar-20 20:27 UTC
[llvm-dev] LoopVectorizer with ifconversion
For loops with conditional stores, the EnableCondStoresVectorization feature allows the vectorizer to scalarize and predicate the stores in the vectorized loop. The same thing is done for conditional instructions that may divide by zero. In both cases the cost model does make an attempt to estimate the cost. It accounts for the cost of the extra inserts/extracts due to scalarization and the fact that the predicated blocks may not be executed the same number of times as the header. It currently assumes a block probably of 0.5 for predicated blocks and scales the costs of the predicated instructions accordingly. You can take look at the AArch64/predication_costs.ll test to see some examples of how the cost is computed. This may give you some ideas as to what needs to be extended and/or modeled more accurately. -- Matt On Mon, Mar 20, 2017 at 4:59 AM, Jonas Paulsson via llvm-dev < llvm-dev at lists.llvm.org> wrote:> > On 2017-03-17 23:23, Renato Golin wrote: > >> On 17 March 2017 at 16:34, Hal Finkel <hfinkel at anl.gov> wrote: >> >>> In general, this is true everywhere. In a large vectorized loop, this >>> cost >>> may well be worthwhile. The idea is that the cost model should account >>> for >>> all of these costs. If it doesn't properly, we should fix that. >>> >> Isn't this only worth when the SIMD instructions can be >> conditionalised per lane? I believe AVX512 and SVE have such >> abilities, but not NEON or SSE. >> >> Does SystemZ support conditionalisation in SIMD? >> >> No, z13 has no masked vector operations. I think this is checked with the > isLegalMasked...() hooks. Is this implicitly assumed in LoopVectorizers > if-conversion algorithm somewhere? I see the scalar costs are adjusted with > 'ScalarCost /= getReciprocalPredBlockProb()'. It seems that this should be > done for all blocks that remain after vectorization? > > If not doing the simple thing and disabling store with predication > vectorization, I agree with Hal that this could be worth modeling as an > extra cost to not reject some large loop with just one conditional store, > or so. > > I think this extra cost should be added to the compare, but I am not sure > if it would be better to instead add it to the branch, because there are > also cases of e.g. (AND (COMPARE, COMPARE)). Adding a cost to a vectorized > branch instead could be done by assuming that a conditional branch would > have to be set up for each branch after the vector compare. > > Does the loop vectorizer know which blocks that need predication in the > scalar loop will remain after vectorization? SystemZ could check such > blocks by looking for stores, but that seems like extra work. > > thanks, > > Jonas > > > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170320/c783142c/attachment.html>
On 03/20/2017 03:59 AM, Jonas Paulsson wrote:> > On 2017-03-17 23:23, Renato Golin wrote: >> On 17 March 2017 at 16:34, Hal Finkel <hfinkel at anl.gov> wrote: >>> In general, this is true everywhere. In a large vectorized loop, >>> this cost >>> may well be worthwhile. The idea is that the cost model should >>> account for >>> all of these costs. If it doesn't properly, we should fix that. >> Isn't this only worth when the SIMD instructions can be >> conditionalised per lane? I believe AVX512 and SVE have such >> abilities, but not NEON or SSE. >> >> Does SystemZ support conditionalisation in SIMD? >> > No, z13 has no masked vector operations. I think this is checked with > the isLegalMasked...() hooks. Is this implicitly assumed in > LoopVectorizers if-conversion algorithm somewhere? I see the scalar > costs are adjusted with 'ScalarCost /= getReciprocalPredBlockProb()'. > It seems that this should be done for all blocks that remain after > vectorization? > > If not doing the simple thing and disabling store with predication > vectorization, I agree with Hal that this could be worth modeling as > an extra cost to not reject some large loop with just one conditional > store, or so. > > I think this extra cost should be added to the compare, but I am not > sure if it would be better to instead add it to the branch, because > there are also cases of e.g. (AND (COMPARE, COMPARE)). Adding a cost > to a vectorized branch instead could be done by assuming that a > conditional branch would have to be set up for each branch after the > vector compare.Yes, I'd assume that you'd want to add some relative cost of a compare, extract, and a correctly-predicted branch (etc.).> > Does the loop vectorizer know which blocks that need predication in > the scalar loop will remain after vectorization? SystemZ could check > such blocks by looking for stores, but that seems like extra work.Yes. Legal->blockNeedsPredication (there's also Legal->isScalarWithPredication). -Hal> > thanks, > > Jonas >-- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory