thr3ads.net - llvm dev - [llvm-dev] LoopVectorizer: Should the cost-model be used for legalisation? [Jun 2021]

Sander De Smalen via llvm-dev

2021-Jun-10 20:50 UTC

[llvm-dev] LoopVectorizer: Should the cost-model be used for legalisation?

Hi,

Last year we added the InstructionCost class which adds the ability to
represent that an operation cannot be costed, i.e. operations that cannot
be expanded by the code-generator will have an invalid cost.
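
To make the valid/invalid distinction concrete, here is a minimal Python sketch of a cost value that can be invalid. It is a toy model, not the LLVM InstructionCost API: the names `Cost`, `INVALID` and `loop_cost` are made up for illustration, and it assumes that invalidity propagates through addition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    """Toy stand-in for a cost that may be invalid (not LLVM's class)."""
    value: int = 0
    valid: bool = True

    def __add__(self, other: "Cost") -> "Cost":
        # Assumption: once any operand is invalid, the sum stays invalid.
        return Cost(self.value + other.value, self.valid and other.valid)


# Marker for "this operation cannot be costed".
INVALID = Cost(0, False)


def loop_cost(op_costs):
    """Sum per-operation costs; one invalid entry invalidates the total."""
    total = Cost(0)
    for c in op_costs:
        total = total + c
    return total
```

With this model a single operation that cannot be costed makes the whole loop cost invalid, which is what makes it tempting to treat the cost as a legality answer.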

We started using this information in the Loop Vectorizer for scalable
auto-vectorization. The LV has a legality- and a cost-model stage, which are
conceptually separate concepts with different purposes. But with the
introduction of having valid/invalid costs it's more inviting to use the
cost-model as 'legalisation', which leads us to the following question:

Should we be using the cost-model to do legalisation?

'Legalisation' in this context means asking the question beforehand if the
code-generator can handle the IR emitted from the LV. Examples of
operations that need such legalisation are predicated divides (at least
until we can use the llvm.vp intrinsics), or intrinsic calls that have no
scalable-vector equivalent. For fixed-width vectors this legalisation issue
is mostly moot, since operations on fixed-width vectors can be scalarised.
For scalable vectors this is neither supported nor feasible [1].

This means there's the option to do one of two things:
[Option 1]

Add checks to the LV legalisation to see if scalable-vectorisation is
feasible. If so, assert the cost must be valid. Otherwise discard scalable
VFs as possible candidates.
* This has the benefit that the compiler can avoid
calculating/considering VPlans that we know cannot be costed.
* Legalisation and cost-model keep each other in check. If something
cannot be costed then either the cost-model or legalisation was
incomplete.
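
A rough sketch of how Option 1 could look, written as a toy Python model rather than actual LoopVectorizer code. The tables `SCALABLE_LEGAL` and `SCALABLE_COST`, their entries, and the function name are all hypothetical.

```python
# Hypothetical per-operation tables for one target; all entries are invented.
SCALABLE_LEGAL = {"add": True, "load": True, "predicated_sdiv": False}
SCALABLE_COST = {"add": 1, "load": 2, "predicated_sdiv": None}  # None = Invalid


def candidate_vfs_option1(loop_ops, fixed_vfs, scalable_vfs):
    """Check legality up front; cost validity is asserted, not consulted."""
    if not all(SCALABLE_LEGAL[op] for op in loop_ops):
        # Discard scalable VFs before any plan is built or costed.
        return list(fixed_vfs)
    for op in loop_ops:
        # A failure here means legality or the cost-model is incomplete.
        assert SCALABLE_COST[op] is not None, f"no valid cost for legal op {op}"
    return list(fixed_vfs) + list(scalable_vfs)
```

The assert is what lets the two stages keep each other in check: a legal operation with an Invalid cost is reported as a bug instead of silently steering VF selection.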

[Option 2]

Leave the question about legalisation to the CostModel, i.e. if the
CostModel says that <operation> for `VF=vscale x N` is Invalid, then avoid
selecting that VF.
* This has the benefit that we don't need to do work up-front to
discard scalable VFs, keeping the LV design simpler.
* This makes gaps in the cost-model more difficult to spot.
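
For comparison, a similar toy sketch of Option 2, where an Invalid cost simply removes a VF from consideration. Again this is not LoopVectorizer code: the `COST` table, the per-lane comparison and the lane estimates for scalable VFs are assumptions made for illustration.

```python
# Hypothetical cost table keyed by (operation, is_scalable); None = Invalid.
COST = {
    ("add", False): 1,
    ("add", True): 1,
    ("predicated_sdiv", False): 12,
    ("predicated_sdiv", True): None,
}


def plan_cost(loop_ops, scalable):
    """Total cost of one vector iteration, or None if any op is Invalid."""
    costs = [COST[(op, scalable)] for op in loop_ops]
    return None if any(c is None for c in costs) else sum(costs)


def select_vf_option2(loop_ops, vfs):
    """vfs: list of (name, is_scalable, estimated_lanes). Skips Invalid plans."""
    best = None
    for name, scalable, lanes in vfs:
        cost = plan_cost(loop_ops, scalable)
        if cost is None:
            continue  # Invalid cost: this VF is never selected.
        per_lane = cost / lanes
        if best is None or per_lane < best[1]:
            best = (name, per_lane)
    return best[0] if best else None
```

Note that nothing in this version distinguishes "the target cannot do this" from "nobody wrote the cost yet", which is the gap-spotting concern in the second bullet.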

Note that it's not useful to combine Option 1 and Option 2, because having
two ways to choose from takes away the need to do legalisation beforehand,
and so that's basically a choice for Option 2.

Both approaches lead to the same end-result, but we currently have a few
patches in flight that have taken Option 1, and this led to some questions
about the approach from both Florian and David Green. So we're looking to
reach a consensus and a decision on which way to move forward.

I've tentatively added this as a topic to the agenda of the upcoming LLVM
SVE/Scalable Vector Sync-up meeting next Tuesday (June 15th, [2]) as an
opportunity to discuss this more freely if we can get enough people who
actively work on the LV together in that meeting (like Florian and David,
although please forward to anyone else who might have input on this).

Thanks,

Sander

[1] Expanding the vector operation into a scalarisation loop is currently
not supported. It could be done, but in our extensive experimentation,
loops that handle each element of a scalable vector sequentially
have never proved beneficial, even when
using special instructions to efficiently increment the predicate
vector. I doubt this will be any different for other scalable vector
architectures, because of the loop control overhead. Also the
insertion/extraction of elements from a scalable vector is unlikely to
be as cheap as for fixed-width vectors.

[2]
https://docs.google.com/document/d/1UPH2Hzou5RgGT8XfO39OmVXKEibWPfdYLELSaHr3xzo/edit?usp=sharing

Sjoerd Meijer via llvm-dev

2021-Jun-11 06:35 UTC

Please correct me if I am wrong, but I thought this discussion was brought up by
a temporary workaround in the cost-model, working around current codegen
limitations that need fixing.
I am asking because Option 1 is what we currently have, and I don't see
reasons to depart from this general idea, even if the cost-model can return
Invalid due to a workaround that would hopefully disappear soon. That would mean
the assert that the legalisation and cost-model are in sync would need to be
skipped, and while that is not ideal, I don't see that as a big problem and
I don't see it as a total departure from Option 1, especially if this is all
temporary.

And does this discussion disappear if the codegen issues are fixed? I don't
know the scale of the problem/work, but is it not easier to fix that, avoiding
this cost-model vs. legalisation discussion?

Vineet Kumar via llvm-dev

2021-Jun-11 18:36 UTC

Hi,

IIUC, for option 1, unless the legalizer and the cost-model follow the
same logic to determine the feasibility of scalable-vectorization, it
seems inaccurate to assert that if something is legal it must have a
valid cost. But if that assertion is relaxed, then as you mentioned in
another reply, it would be a de facto choice for option 2. If we don't
relax this assertion, then for it to be accurate the legalizer and the
cost-model will both do redundant work. Also, IIRC, an invalid cost also
models a too-expensive-to-be-useful operation, but that doesn't really
imply an illegal operation.

I am not sure if this makes sense, but with option 2, since VPlans are
not discarded upfront there is a chance that a transformation on VPlan
might actually make it feasible.

I didn't think deeply about this so I might be missing some obvious and 
key facts. Please correct me if I am wrong.


Best,

Vineet



Florian Hahn via llvm-dev

2021-Jun-15 11:48 UTC

Hi,

Thanks for bringing this up!
> On Jun 10, 2021, at 21:50, Sander De Smalen <Sander.DeSmalen at arm.com> wrote:
> 
> Hi,
> 
> Last year we added the InstructionCost class which adds the ability to
> represent that an operation cannot be costed, i.e. operations that cannot
> be expanded by the code-generator will have an invalid cost.
> 
> We started using this information in the Loop Vectorizer for scalable
> auto-vectorization. The LV has a legality- and a cost-model stage, which are
> conceptually separate concepts with different purposes. But with the
> introduction of having valid/invalid costs it's more inviting to use the
> cost-model as 'legalisation', which leads us to the following question:
> 
>   Should we be using the cost-model to do legalisation?
> 
> 'Legalisation' in this context means asking the question beforehand if the
> code-generator can handle the IR emitted from the LV. Examples of
> operations that need such legalisation are predicated divides (at least
> until we can use the llvm.vp intrinsics), or intrinsic calls that have no
> scalable-vector equivalent. For fixed-width vectors this legalisation issue
> is mostly moot, since operations on fixed-width vectors can be scalarised.
> For scalable vectors this is neither supported nor feasible [1].

I think this is one of the key points. LoopVectorLegality at the moment is
mostly concerned with whether the loop is not vectorizable due to conceptual issues
(not sure what the best term would be) preventing vectorization (like
unvectorizable dependences or FP constraints), not target specific instruction
legality constraints like whether there is a vector version of a given
instruction. It is also independent of any given concrete VF. IIUC this is also
not limited to predication, e.g. we also need to check whether intrinsic calls
are supported natively. Are there any others?

As you mentioned, predicated instructions that are not supported by a target can
be scalarized and predicated. Even for fixed vectors, this is quite expensive in
general, but it allows LoopVectorLegality to work mostly independently of other
target constraints and focus on general legality checks. IIUC conceptually
there’s nothing preventing us from scalarizing operations on scalable vectors in
a similar fashion, but it requires an explicit loop dependent on the vector
width at runtime and deciding the cost depends on an unknown (the target vector
width).

I don’t have any insight in hardware-specific costs for scalable vector
operations, so I can’t comment on any specifics there. I am wondering if the
following hypothetical scenario is feasible/realistic: consider a loop where all
operations except one can be widened directly and a single operation needs
scalarizing. I would expect there to be a point where the number of widenable
operations gets large enough to offset the cost of emitting a loop to scalarize
the single operation needing predication. Granted, depending on the maximum
vector width of the hardware this number might be quite large, but I could
imagine a scenario where we want to optimize given a tighter upper bound on the
maximum vector width than allowed by the hardware spec. E.g. considering an
upper bound of 256 for the vector width, we’d only need to execute 2 iterations
of such a loop on AArch64. It would also be interesting to get additional
perspectives on the cost question from people familiar with other hardware
supporting scalable vectors.
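
The break-even point in this scenario can be written down as a small calculation. The following Python sketch uses a deliberately simple cost model; the function name and every number in it (per-lane cost of the scalarisation loop, loop overhead, unit costs) are assumptions for illustration and are not measured on any hardware.

```python
def break_even_ops(lanes, widened_cost=1, scalar_cost=1,
                   per_lane_cost=4, loop_overhead=2, limit=10_000):
    """Smallest number n of widenable ops for which vectorising wins.

    Per-element cost of the scalar loop:  (n + 1) * scalar_cost
    Per-element cost of the vector loop:
        (n * widened_cost + lanes * per_lane_cost + loop_overhead) / lanes
    where the middle term models the loop that scalarises the one op.
    Returns None if no n below `limit` is profitable.
    """
    for n in range(limit):
        scalar = (n + 1) * scalar_cost
        vector = (n * widened_cost + lanes * per_lane_cost + loop_overhead) / lanes
        if vector < scalar:
            return n
    return None
```

The real question raised above is what `lanes` should be when the vector width is unknown at compile time; an assumed upper bound on the width gives this calculation a concrete input.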

As a side-note, there is a use of TTI in LVL, but it is rather problematic. At
the moment, LVL considers loops with non-temporal memory operations as
unvectorizable up-front, if the vector version for an arbitrary VF (in that case
2 was chosen) is illegal on the target. This has the unfortunate side effect
that using non-temporal stores flat out blocks vectorization if there’s no legal
non-temporal load/store for VF = 2, which can be very surprising to our users,
especially on AArch64 where a single element non-temporal memory operation may
not be legal to start with and non-temporal stores may be legal at higher VFs
(https://github.com/llvm/llvm-project/blob/main/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp#L788).
It’s not a perfect example, but it illustrates one of the issues of bailing out too
early.
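
The effect of probing a single arbitrary VF can be shown with a toy Python model. The target rule below (non-temporal stores legal only from 4 lanes upwards) and all function names are hypothetical; this is not the code at the link above.

```python
def is_legal_nt_store(vf):
    """Hypothetical target: non-temporal stores need at least 4 lanes."""
    return vf >= 4


def blocks_vectorisation_fixed_probe(probe_vf=2):
    """Mimics an up-front bail-out that probes one arbitrary VF."""
    return not is_legal_nt_store(probe_vf)


def legal_vfs(candidate_vfs):
    """Alternative: keep every VF for which the operation is legal."""
    return [vf for vf in candidate_vfs if is_legal_nt_store(vf)]
```

On this made-up target the fixed probe rejects the loop outright, while a per-VF check would still leave VF = 4 and VF = 8 available.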
> This means there's the option to do one of two things:
> 
> 
> [Option 1]
> 
> Add checks to the LV legalisation to see if scalable-vectorisation is
> feasible. If so, assert the cost must be valid. Otherwise discard scalable
> VFs as possible candidates.
> * This has the benefit that the compiler can avoid
>   calculating/considering VPlans that we know cannot be costed.
> * Legalisation and cost-model keep each other in check. If something
>   cannot be costed then either the cost-model or legalisation was
>   incomplete.
> 
> 
> [Option 2]
> 
> Leave the question about legalisation to the CostModel, i.e. if the
> CostModel says that <operation> for `VF=vscale x N` is Invalid, then avoid
> selecting that VF.
> * This has the benefit that we don't need to do work up-front to
>   discard scalable VFs, keeping the LV design simpler.
> * This makes gaps in the cost-model more difficult to spot.
> 

I think if we would support scalarization for scalable vectors via a loop, then
the cost of predicating any given instruction would not be invalid, just
possibly quite high (depending on the upper bound for the vector), right? So we
should still be able to assert that each result != Invalid.

> Note that it's not useful to combine Option 1 and Option 2, because having
> two ways to choose from takes away the need to do legalisation beforehand,
> and so that's basically a choice for Option 2.
> 
> Both approaches lead to the same end-result, but we currently have a few
> patches in flight that have taken Option 1, and this led to some questions
> about the approach from both Florian and David Green. So we're looking to
> reach to a consensus and decision on what way to move forward.
I think one concern that came up during the discussion was that option 1 means
that we need to add multiple isLegalToXXX helpers to TTI, which need to be kept
in sync with the already existing cost functions. It also more closely couples
legality checks and cost-modeling. I’m not sure if/how much native predication
support will complicate things and there may be more places that need to be
taught about native predication support. I’m not saying this is a blocker, just
a trade-off to consider. Also, if we missed a case or add a new case that may
require scalarization we would need to update/add additional checks.
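
One way to picture the keep-in-sync burden is a check that compares the two sources of truth. This is a hypothetical Python sketch, with plain dictionaries standing in for the isLegalToXXX helpers and the cost functions; no such checker is claimed to exist in LLVM.

```python
def find_mismatches(is_legal, cost):
    """Return ops whose legality answer disagrees with cost validity.

    is_legal: dict op -> bool (stand-in for legality helpers)
    cost:     dict op -> int or None, where None means an Invalid cost
    """
    mismatches = []
    for op in sorted(set(is_legal) | set(cost)):
        legal = is_legal.get(op)  # None if no helper was ever written
        valid = cost.get(op) is not None
        if legal is None or legal != valid:
            mismatches.append(op)
    return mismatches
```

For example, `find_mismatches({"add": True, "sdiv": False}, {"add": 1, "sdiv": None, "gather": 3})` reports `gather`, the case where a new operation got a cost but nobody added the matching legality check.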

Cheers,
Florian

