Hi,

I have come across a major regression resulting from SLP vectorization (+18% on SystemZ, just for enabling SLP). It all comes down to one particular very hot loop.

Scalar code:

  %conv252 = zext i16 %110 to i64
  %conv254 = zext i16 %111 to i64
  %sub255 = sub nsw i64 %conv252, %conv254
  ... repeated

SLP output:

  %101 = zext <16 x i16> %100 to <16 x i64>
  %104 = zext <16 x i16> %103 to <16 x i64>
  %105 = sub nsw <16 x i64> %101, %104
  %106 = trunc <16 x i64> %105 to <16 x i32>
  ; for each element e in 0..15:
  %107 = extractelement <16 x i32> %106, i32 e
  %108 = sext i32 %107 to i64

The vectorized code should in this case only need to be:

  %101 = zext <16 x i16> %100 to <16 x i64>
  %104 = zext <16 x i16> %103 to <16 x i64>
  %105 = sub nsw <16 x i64> %101, %104
  ; for each element e in 0..15:
  %107 = extractelement <16 x i64> %105, i32 e

but this is not handled, so for all 16 elements, both extracts *and extends* are emitted.

I see that there is a special function in the SLP vectorizer that does this truncation and extract+extend whenever possible. Is this the place to fix this?

Or would it be better to rely on InstCombiner?

Is this truncation done by SLP on the assumption that it is free to extend an extracted element? On SystemZ, this is not true.

/Jonas
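For context, a minimal C sketch of the kind of source code that lowers to the scalar IR quoted above (this is a hypothetical reconstruction; the actual hot loop is not shown in the thread):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical sketch: widening two 16-bit operands to 64 bits before
 * subtracting, over an array. Each iteration produces the
 * zext/zext/sub-nsw triple from the scalar IR above, and a run of 16
 * such iterations is what the SLP vectorizer packs into <16 x i64>. */
void diff16(const uint16_t *a, const uint16_t *b, int64_t *out, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        /* zext i16 -> i64 on both operands, then sub nsw i64 */
        out[i] = (int64_t)(uint64_t)a[i] - (int64_t)(uint64_t)b[i];
    }
}
```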
Hi Jonas,

The vectorizers do attempt to type-shrink elements if possible to pack more data into vectors. It looks like that's what's happening here. This transformation is cost-modeled, but there are assumptions made about what InstCombine will be able to clean up. Would you mind filing a bug with a test case that we can take a look at?

-- Matt

On Fri, Mar 24, 2017 at 8:25 AM, Jonas Paulsson via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> [snip]
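For reference, the type-shrink in question is semantically lossless in this case, which is what makes the extract+extend foldable in the first place. A minimal C sketch of the reasoning (an illustration, not actual SLP or InstCombine code):

```c
#include <stdint.h>

/* The difference of two zero-extended 16-bit values always fits in 17
 * signed bits, so truncating the 64-bit result to 32 bits and
 * sign-extending back recovers the original value exactly. This is why
 * a sext(extractelement(trunc(x))) chain could be folded to a plain
 * extractelement of the wide vector, as in the "should only need to be"
 * IR above. */
int64_t wide_sub(uint16_t a, uint16_t b) {
    return (int64_t)a - (int64_t)b;           /* the un-shrunk i64 result */
}

int64_t shrunk_roundtrip(uint16_t a, uint16_t b) {
    int32_t narrow = (int32_t)wide_sub(a, b); /* trunc i64 -> i32 */
    return (int64_t)narrow;                   /* sext i32 -> i64 */
}
```

The round trip is exact for every possible input pair, since the result range is [-65535, 65535], well within i32.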
Hi Matt,

thanks for taking a look, please see https://bugs.llvm.org//show_bug.cgi?id=32406.

/Jonas

On 2017-03-24 15:10, Matthew Simpson wrote:
> Hi Jonas,
>
> The vectorizers do attempt to type-shrink elements if possible to pack
> more data into vectors. It looks like that's what's happening here.
> This transformation is cost-modeled, but there are assumptions made
> about what InstCombine will be able to clean up. Would you mind filing
> a bug with a test case that we can take a look at?
>
> -- Matt
>
> [snip]