thr3ads.net - llvm dev - [llvm-dev] [RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors [Jan 2020]

If this information is useful, please help other people find it:
Share via:

Nemanja Ivanovic via llvm-dev

2020-Jan-11 23:08 UTC

[llvm-dev] [RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors

Thanks so much for your feedback Simon.

I am not sure that what I am proposing here is at odds with what you're
referring to (here and in the PR you linked). The key difference AFAICT is
that the pattern I am referring to is probably more aptly described as
"reducing scalarization" than as "vectorization". The reason
I say that is
that the inputs are vectors and the output is also a vector - we just
perform the operation on extracted elements rather than on the input
vectors themselves.

In the PR you linked, there is an example that shows the difference
(simplified to <2 x double> for brevity):
define dso_local <2 x double> @test(i64 %a, i64 %b) {
entry:
  %conv = uitofp i64 %a to double
  %conv1 = uitofp i64 %b to double
  %vecinit = insertelement <2 x double> undef, double %conv, i32 0
  %vecinit2 = insertelement <2 x double> %vecinit, double %conv1, i32 1
  ret <2 x double> %vecinit2
}

The inputs here are scalars so I suppose it is quite possible (perhaps
likely) that on some targets, doing the insert with integers and then
converting the vector is cheaper (although this is definitely not the case
with PPC).
But what I am proposing here is actually handling something like this:
define dso_local <2 x double> @test(<2 x i64> %a) {
entry:
  %vecext = extractelement <2 x i64> %a, i32 0
  %vecext1 = extractelement <2 x i64> %a, i32 1
  %conv = sitofp i64 %vecext to double
  %conv2 = sitofp i64 %vecext1 to double
  %vecinit = insertelement <2 x double> undef, double %conv, i32 0
  %vecinit3 = insertelement <2 x double> %vecinit, double %conv2, i32 1
  ret <2 x double> %vecinit3
}
With this type conversion, InstCombine will actually simplify this as
expected. And I think that is the right thing to do - I can't see the
scalarized version being cheaper on any target. Since we already do
something quite similar in InstCombine, I would assume it would be rather
uncontroversial to do it on the SDAG.

Now, a reasonable question might be "why do it on the SDAG if we already do
it in InstCombine?" And the short answer is it is quite possible that
legalization will introduce scalarization code and a subsequent DAG combine
creates an opportunity to remove the scalarization. Here is an example of
that:
define dso_local <2 x i64> @testv(<2 x i64> %a, <2 x i64> %b)
{
entry:
  %sexta = sext <2 x i64> %a to <2 x i128>
  %sextb = sext <2 x i64> %b to <2 x i128>
  %mul = mul nsw <2 x i128> %sexta, %sextb
  %shift = lshr <2 x i128> %mul, <i128 64, i128 64>
  %trunc = trunc <2 x i128> %shift to <2 x i64>
  ret <2 x i64> %trunc
}
On PPC, the legalizer will scalarize this since we do not have v2i128. Then
the DAG combiner will produce the pattern I am referring to in this RFC:

(v2i64 build_vector (mulhs (extractelt %a, 0), (extractelt %b, 0)),
                    (mulhs (extractelt %a, 1), (extractelt %b, 1)))

And if the target has mulhs legal for the vector type, this is strictly
worse. So no matter what we do in InstCombine or the SLP vectorizer, we
will end up with non-optimal code.

If we also handle shuffles of input vectors, we can catch things such as
the following:
define dso_local <4 x float> @test(<4 x i32> %a, <4 x i32> %b)
{
entry:
  %vecext = extractelement <4 x i32> %a, i32 0
  %vecext1 = extractelement <4 x i32> %a, i32 1
  %vecext4 = extractelement <4 x i32> %b, i32 2
  %vecext7 = extractelement <4 x i32> %b, i32 3
  %conv = sitofp i32 %vecext to float
  %conv2 = sitofp i32 %vecext1 to float
  %conv5 = sitofp i32 %vecext4 to float
  %conv8 = sitofp i32 %vecext7 to float
  %vecinit = insertelement <4 x float> undef, float %conv, i32 0
  %vecinit3 = insertelement <4 x float> %vecinit, float %conv2, i32 1
  %vecinit6 = insertelement <4 x float> %vecinit3, float %conv5, i32 2
  %vecinit9 = insertelement <4 x float> %vecinit6, float %conv8, i32 3
  ret <4 x float> %vecinit9
}

Is equivalent to:
define dso_local <4 x float> @testv(<4 x i32> %a, <4 x i32>
%b) {
entry:
  %shuffle = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x
i32> <i32 0,
i32 1, i32 6, i32 7>
  %0 = sitofp <4 x i32> %shuffle to <4 x float>
  ret <4 x float> %0
}

Of course, this is something we can handle in InstCombine, but I am
wondering if we may again be missing situations where it is the DAG
legalizer that creates the scalarization code.

On Sat, Jan 11, 2020 at 8:25 AM Simon Pilgrim via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Performing vectorization in the SelectionDAG tends to be fraught with
> problems as we don't have a cost model mechanism to control it; we hit
this
> in https://bugs.llvm.org/show_bug.cgi?id=35732 where conversion ops were
> being vectorized in DAG whether we wanted to or not, purely based on a
> isOperationLegalOrCustom call (just because an op is possible doesn't
mean
> its efficient to do so under all circumstances).
>
> In general I think we should be relying on the SLPVectorizer to handle
> this instead. If SLP is failing, we're probably better off spending dev
> time fixing it there instead of trying to cleanup in DAG (e.g. too often I
> find its a case of tweaking the TTI cost models).
>
> Having said that, we do hit cases where vectorizable patterns do appear
> during legalization - x86 does some very limited bitwise op vectorization
> in lowerBuildVectorToBitOp as this was such a common occurrence. So
there's
> probably scope to provide some common helper functions in SelectionDAG.cpp
> to recognise vectorizable patterns (similar to
> SelectionDAG::matchBinOpReduction) but we leave it up to the target's
> build_vector combiner/lowering to actually decide whether to make use of
> them.
>
> Simon.
> On 10/01/2020 21:49, Nemanja Ivanovic via llvm-dev wrote:
>
> I have added a few PPC-specific DAG combines in the past that follow this
> pattern on specific operations. Now that it appears that this would be
> useful to do on yet another operation, I'm wondering what people think
> about doing this in the target-independent DAG Combiner for any
> legal/custom operation on the target.
>
> TL; DR;
> The generic pattern would look like this:
>
> (build_vector (op (extractelt %a, 0), [(extractelt %b, 0)]...),
>               (op (extractelt %a, 1), [(extractelt %b, 1)]...), ...)
>
> Basically, if the build vector is built from the same operation applied on
elements extracted from another vector (or pair of vectors for binary ops), then
we can check for the legality of the operation on the vector type. If the
operation is legal for the vector type (and the operand and result vector types
are the same), then we can just convert this to
> (op %a [, %b])
>
> Which is likely going to produce better code in all cases. Of course,
things like this have a tendency to not be better in all cases, but I think we
can probably account for any corner cases that arise.
>
> Example:(v2i64 build_vector (mulhs (extractelt %a, 0), (extractelt %b, 0)),
>                     (mulhs (extractelt %a, 1), (extractelt %b, 1)))
>
> Can be converted to the following on a target that has the operation
available on vectors.(v2i64 mulhs %a, %b)
>
> A further improvement might be if the order of extracted elements
doesn't match the order in the build_vector, that we shuffle one or both
input vectors prior to applying the operation.
>
>
> If you think this is a good idea (or a terrible idea), please let me know.
>
>
> _______________________________________________
> LLVM Developers mailing listllvm-dev at
lists.llvm.orghttps://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200111/a30669ab/attachment.html>

Craig Topper via llvm-dev

2020-Jan-11 23:14 UTC

head link

[llvm-dev] [RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors

For the MULHS example couldn’t that be fixed by adding a DAG combine to
create vector MULHS before type legalization?

On Sat, Jan 11, 2020 at 3:08 PM Nemanja Ivanovic via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Thanks so much for your feedback Simon.
>
> I am not sure that what I am proposing here is at odds with what you're
> referring to (here and in the PR you linked). The key difference AFAICT is
> that the pattern I am referring to is probably more aptly described as
> "reducing scalarization" than as "vectorization". The
reason I say that is
> that the inputs are vectors and the output is also a vector - we just
> perform the operation on extracted elements rather than on the input
> vectors themselves.
>
> In the PR you linked, there is an example that shows the difference
> (simplified to <2 x double> for brevity):
> define dso_local <2 x double> @test(i64 %a, i64 %b) {
> entry:
>   %conv = uitofp i64 %a to double
>   %conv1 = uitofp i64 %b to double
>   %vecinit = insertelement <2 x double> undef, double %conv, i32 0
>   %vecinit2 = insertelement <2 x double> %vecinit, double %conv1, i32
1
>   ret <2 x double> %vecinit2
> }
>
> The inputs here are scalars so I suppose it is quite possible (perhaps
> likely) that on some targets, doing the insert with integers and then
> converting the vector is cheaper (although this is definitely not the case
> with PPC).
> But what I am proposing here is actually handling something like this:
> define dso_local <2 x double> @test(<2 x i64> %a) {
> entry:
>   %vecext = extractelement <2 x i64> %a, i32 0
>   %vecext1 = extractelement <2 x i64> %a, i32 1
>   %conv = sitofp i64 %vecext to double
>   %conv2 = sitofp i64 %vecext1 to double
>   %vecinit = insertelement <2 x double> undef, double %conv, i32 0
>   %vecinit3 = insertelement <2 x double> %vecinit, double %conv2, i32
1
>   ret <2 x double> %vecinit3
> }
> With this type conversion, InstCombine will actually simplify this as
> expected. And I think that is the right thing to do - I can't see the
> scalarized version being cheaper on any target. Since we already do
> something quite similar in InstCombine, I would assume it would be rather
> uncontroversial to do it on the SDAG.
>
> Now, a reasonable question might be "why do it on the SDAG if we
already
> do it in InstCombine?" And the short answer is it is quite possible
that
> legalization will introduce scalarization code and a subsequent DAG combine
> creates an opportunity to remove the scalarization. Here is an example of
> that:
> define dso_local <2 x i64> @testv(<2 x i64> %a, <2 x i64>
%b) {
> entry:
>   %sexta = sext <2 x i64> %a to <2 x i128>
>   %sextb = sext <2 x i64> %b to <2 x i128>
>   %mul = mul nsw <2 x i128> %sexta, %sextb
>   %shift = lshr <2 x i128> %mul, <i128 64, i128 64>
>   %trunc = trunc <2 x i128> %shift to <2 x i64>
>   ret <2 x i64> %trunc
> }
> On PPC, the legalizer will scalarize this since we do not have v2i128.
> Then the DAG combiner will produce the pattern I am referring to in this
> RFC:
>
> (v2i64 build_vector (mulhs (extractelt %a, 0), (extractelt %b, 0)),
>                     (mulhs (extractelt %a, 1), (extractelt %b, 1)))
>
> And if the target has mulhs legal for the vector type, this is strictly
> worse. So no matter what we do in InstCombine or the SLP vectorizer, we
> will end up with non-optimal code.
>
> If we also handle shuffles of input vectors, we can catch things such as
> the following:
> define dso_local <4 x float> @test(<4 x i32> %a, <4 x
i32> %b) {
> entry:
>   %vecext = extractelement <4 x i32> %a, i32 0
>   %vecext1 = extractelement <4 x i32> %a, i32 1
>   %vecext4 = extractelement <4 x i32> %b, i32 2
>   %vecext7 = extractelement <4 x i32> %b, i32 3
>   %conv = sitofp i32 %vecext to float
>   %conv2 = sitofp i32 %vecext1 to float
>   %conv5 = sitofp i32 %vecext4 to float
>   %conv8 = sitofp i32 %vecext7 to float
>   %vecinit = insertelement <4 x float> undef, float %conv, i32 0
>   %vecinit3 = insertelement <4 x float> %vecinit, float %conv2, i32 1
>   %vecinit6 = insertelement <4 x float> %vecinit3, float %conv5, i32
2
>   %vecinit9 = insertelement <4 x float> %vecinit6, float %conv8, i32
3
>   ret <4 x float> %vecinit9
> }
>
> Is equivalent to:
> define dso_local <4 x float> @testv(<4 x i32> %a, <4 x
i32> %b) {
> entry:
>   %shuffle = shufflevector <4 x i32> %a, <4 x i32> %b, <4 x
i32> <i32 0,
> i32 1, i32 6, i32 7>
>   %0 = sitofp <4 x i32> %shuffle to <4 x float>
>   ret <4 x float> %0
> }
>
> Of course, this is something we can handle in InstCombine, but I am
> wondering if we may again be missing situations where it is the DAG
> legalizer that creates the scalarization code.
>
> On Sat, Jan 11, 2020 at 8:25 AM Simon Pilgrim via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Performing vectorization in the SelectionDAG tends to be fraught with
>> problems as we don't have a cost model mechanism to control it; we
hit this
>> in https://bugs.llvm.org/show_bug.cgi?id=35732 where conversion ops
were
>> being vectorized in DAG whether we wanted to or not, purely based on a
>> isOperationLegalOrCustom call (just because an op is possible
doesn't mean
>> its efficient to do so under all circumstances).
>>
>> In general I think we should be relying on the SLPVectorizer to handle
>> this instead. If SLP is failing, we're probably better off spending
dev
>> time fixing it there instead of trying to cleanup in DAG (e.g. too
often I
>> find its a case of tweaking the TTI cost models).
>>
>> Having said that, we do hit cases where vectorizable patterns do appear
>> during legalization - x86 does some very limited bitwise op
vectorization
>> in lowerBuildVectorToBitOp as this was such a common occurrence. So
there's
>> probably scope to provide some common helper functions in
SelectionDAG.cpp
>> to recognise vectorizable patterns (similar to
>> SelectionDAG::matchBinOpReduction) but we leave it up to the
target's
>> build_vector combiner/lowering to actually decide whether to make use
of
>> them.
>>
>> Simon.
>> On 10/01/2020 21:49, Nemanja Ivanovic via llvm-dev wrote:
>>
>> I have added a few PPC-specific DAG combines in the past that follow
this
>> pattern on specific operations. Now that it appears that this would be
>> useful to do on yet another operation, I'm wondering what people
think
>> about doing this in the target-independent DAG Combiner for any
>> legal/custom operation on the target.
>>
>> TL; DR;
>> The generic pattern would look like this:
>>
>> (build_vector (op (extractelt %a, 0), [(extractelt %b, 0)]...),
>>               (op (extractelt %a, 1), [(extractelt %b, 1)]...), ...)
>>
>> Basically, if the build vector is built from the same operation applied
on elements extracted from another vector (or pair of vectors for binary ops),
then we can check for the legality of the operation on the vector type. If the
operation is legal for the vector type (and the operand and result vector types
are the same), then we can just convert this to
>> (op %a [, %b])
>>
>> Which is likely going to produce better code in all cases. Of course,
things like this have a tendency to not be better in all cases, but I think we
can probably account for any corner cases that arise.
>>
>> Example:(v2i64 build_vector (mulhs (extractelt %a, 0), (extractelt %b,
0)),
>>                     (mulhs (extractelt %a, 1), (extractelt %b, 1)))
>>
>> Can be converted to the following on a target that has the operation
available on vectors.(v2i64 mulhs %a, %b)
>>
>> A further improvement might be if the order of extracted elements
doesn't match the order in the build_vector, that we shuffle one or both
input vectors prior to applying the operation.
>>
>>
>> If you think this is a good idea (or a terrible idea), please let me
know.
>>
>>
>> _______________________________________________
>> LLVM Developers mailing listllvm-dev at
lists.llvm.orghttps://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-- 
~Craig
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200111/62b4278b/attachment.html>

Nemanja Ivanovic via llvm-dev

2020-Jan-11 23:29 UTC

head link

[llvm-dev] [RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors

Absolutely. We do it for scalars, so it would likely be a matter of just
extending it.

But that is one example. The issue of extracting elements, performing an
operation on each element individually and then rebuilding the vector is
likely more prevalent than that. At least I think that is the case, but
I'll do some analysis to see if it is so or not.

On Sat, Jan 11, 2020 at 6:15 PM Craig Topper <craig.topper at gmail.com>
wrote:
> For the MULHS example couldn’t that be fixed by adding a DAG combine to
> create vector MULHS before type legalization?
>
> On Sat, Jan 11, 2020 at 3:08 PM Nemanja Ivanovic via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Thanks so much for your feedback Simon.
>>
>> I am not sure that what I am proposing here is at odds with what
you're
>> referring to (here and in the PR you linked). The key difference AFAICT
is
>> that the pattern I am referring to is probably more aptly described as
>> "reducing scalarization" than as "vectorization".
The reason I say that is
>> that the inputs are vectors and the output is also a vector - we just
>> perform the operation on extracted elements rather than on the input
>> vectors themselves.
>>
>> In the PR you linked, there is an example that shows the difference
>> (simplified to <2 x double> for brevity):
>> define dso_local <2 x double> @test(i64 %a, i64 %b) {
>> entry:
>>   %conv = uitofp i64 %a to double
>>   %conv1 = uitofp i64 %b to double
>>   %vecinit = insertelement <2 x double> undef, double %conv, i32
0
>>   %vecinit2 = insertelement <2 x double> %vecinit, double %conv1,
i32 1
>>   ret <2 x double> %vecinit2
>> }
>>
>> The inputs here are scalars so I suppose it is quite possible (perhaps
>> likely) that on some targets, doing the insert with integers and then
>> converting the vector is cheaper (although this is definitely not the
case
>> with PPC).
>> But what I am proposing here is actually handling something like this:
>> define dso_local <2 x double> @test(<2 x i64> %a) {
>> entry:
>>   %vecext = extractelement <2 x i64> %a, i32 0
>>   %vecext1 = extractelement <2 x i64> %a, i32 1
>>   %conv = sitofp i64 %vecext to double
>>   %conv2 = sitofp i64 %vecext1 to double
>>   %vecinit = insertelement <2 x double> undef, double %conv, i32
0
>>   %vecinit3 = insertelement <2 x double> %vecinit, double %conv2,
i32 1
>>   ret <2 x double> %vecinit3
>> }
>> With this type conversion, InstCombine will actually simplify this as
>> expected. And I think that is the right thing to do - I can't see
the
>> scalarized version being cheaper on any target. Since we already do
>> something quite similar in InstCombine, I would assume it would be
rather
>> uncontroversial to do it on the SDAG.
>>
>> Now, a reasonable question might be "why do it on the SDAG if we
already
>> do it in InstCombine?" And the short answer is it is quite
possible that
>> legalization will introduce scalarization code and a subsequent DAG
combine
>> creates an opportunity to remove the scalarization. Here is an example
of
>> that:
>> define dso_local <2 x i64> @testv(<2 x i64> %a, <2 x
i64> %b) {
>> entry:
>>   %sexta = sext <2 x i64> %a to <2 x i128>
>>   %sextb = sext <2 x i64> %b to <2 x i128>
>>   %mul = mul nsw <2 x i128> %sexta, %sextb
>>   %shift = lshr <2 x i128> %mul, <i128 64, i128 64>
>>   %trunc = trunc <2 x i128> %shift to <2 x i64>
>>   ret <2 x i64> %trunc
>> }
>> On PPC, the legalizer will scalarize this since we do not have v2i128.
>> Then the DAG combiner will produce the pattern I am referring to in
this
>> RFC:
>>
>> (v2i64 build_vector (mulhs (extractelt %a, 0), (extractelt %b, 0)),
>>                     (mulhs (extractelt %a, 1), (extractelt %b, 1)))
>>
>> And if the target has mulhs legal for the vector type, this is strictly
>> worse. So no matter what we do in InstCombine or the SLP vectorizer, we
>> will end up with non-optimal code.
>>
>> If we also handle shuffles of input vectors, we can catch things such
as
>> the following:
>> define dso_local <4 x float> @test(<4 x i32> %a, <4 x
i32> %b) {
>> entry:
>>   %vecext = extractelement <4 x i32> %a, i32 0
>>   %vecext1 = extractelement <4 x i32> %a, i32 1
>>   %vecext4 = extractelement <4 x i32> %b, i32 2
>>   %vecext7 = extractelement <4 x i32> %b, i32 3
>>   %conv = sitofp i32 %vecext to float
>>   %conv2 = sitofp i32 %vecext1 to float
>>   %conv5 = sitofp i32 %vecext4 to float
>>   %conv8 = sitofp i32 %vecext7 to float
>>   %vecinit = insertelement <4 x float> undef, float %conv, i32 0
>>   %vecinit3 = insertelement <4 x float> %vecinit, float %conv2,
i32 1
>>   %vecinit6 = insertelement <4 x float> %vecinit3, float %conv5,
i32 2
>>   %vecinit9 = insertelement <4 x float> %vecinit6, float %conv8,
i32 3
>>   ret <4 x float> %vecinit9
>> }
>>
>> Is equivalent to:
>> define dso_local <4 x float> @testv(<4 x i32> %a, <4 x
i32> %b) {
>> entry:
>>   %shuffle = shufflevector <4 x i32> %a, <4 x i32> %b,
<4 x i32> <i32 0,
>> i32 1, i32 6, i32 7>
>>   %0 = sitofp <4 x i32> %shuffle to <4 x float>
>>   ret <4 x float> %0
>> }
>>
>> Of course, this is something we can handle in InstCombine, but I am
>> wondering if we may again be missing situations where it is the DAG
>> legalizer that creates the scalarization code.
>>
>> On Sat, Jan 11, 2020 at 8:25 AM Simon Pilgrim via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Performing vectorization in the SelectionDAG tends to be fraught
with
>>> problems as we don't have a cost model mechanism to control it;
we hit this
>>> in https://bugs.llvm.org/show_bug.cgi?id=35732 where conversion ops
>>> were being vectorized in DAG whether we wanted to or not, purely
based on a
>>> isOperationLegalOrCustom call (just because an op is possible
doesn't mean
>>> its efficient to do so under all circumstances).
>>>
>>> In general I think we should be relying on the SLPVectorizer to
handle
>>> this instead. If SLP is failing, we're probably better off
spending dev
>>> time fixing it there instead of trying to cleanup in DAG (e.g. too
often I
>>> find its a case of tweaking the TTI cost models).
>>>
>>> Having said that, we do hit cases where vectorizable patterns do
appear
>>> during legalization - x86 does some very limited bitwise op
vectorization
>>> in lowerBuildVectorToBitOp as this was such a common occurrence. So
there's
>>> probably scope to provide some common helper functions in
SelectionDAG.cpp
>>> to recognise vectorizable patterns (similar to
>>> SelectionDAG::matchBinOpReduction) but we leave it up to the
target's
>>> build_vector combiner/lowering to actually decide whether to make
use of
>>> them.
>>>
>>> Simon.
>>> On 10/01/2020 21:49, Nemanja Ivanovic via llvm-dev wrote:
>>>
>>> I have added a few PPC-specific DAG combines in the past that
follow
>>> this pattern on specific operations. Now that it appears that this
would be
>>> useful to do on yet another operation, I'm wondering what
people think
>>> about doing this in the target-independent DAG Combiner for any
>>> legal/custom operation on the target.
>>>
>>> TL; DR;
>>> The generic pattern would look like this:
>>>
>>> (build_vector (op (extractelt %a, 0), [(extractelt %b, 0)]...),
>>>               (op (extractelt %a, 1), [(extractelt %b, 1)]...),
...)
>>>
>>> Basically, if the build vector is built from the same operation
applied on elements extracted from another vector (or pair of vectors for binary
ops), then we can check for the legality of the operation on the vector type. If
the operation is legal for the vector type (and the operand and result vector
types are the same), then we can just convert this to
>>> (op %a [, %b])
>>>
>>> Which is likely going to produce better code in all cases. Of
course, things like this have a tendency to not be better in all cases, but I
think we can probably account for any corner cases that arise.
>>>
>>> Example:(v2i64 build_vector (mulhs (extractelt %a, 0), (extractelt
%b, 0)),
>>>                     (mulhs (extractelt %a, 1), (extractelt %b, 1)))
>>>
>>> Can be converted to the following on a target that has the
operation available on vectors.(v2i64 mulhs %a, %b)
>>>
>>> A further improvement might be if the order of extracted elements
doesn't match the order in the build_vector, that we shuffle one or both
input vectors prior to applying the operation.
>>>
>>>
>>> If you think this is a good idea (or a terrible idea), please let
me know.
>>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing listllvm-dev at
lists.llvm.orghttps://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>
> --
> ~Craig
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20200111/33a8be7d/attachment.html>

Apparently Analagous Threads

Search for more maybe matching threads

llvm dev - Jan 2020 - [RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors

[llvm-dev] [RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors

[llvm-dev] [RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors

[llvm-dev] [RFC][SDAG] Convert build_vector of ops on extractelts into ops on input vectors

Apparently Analagous Threads