[llvm-dev] Vectorizer has trouble with vpmovmskb and store [Nov 2018]

Johan Engelen via llvm-dev

2018-Nov-26 22:50 UTC

[llvm-dev] Vectorizer has trouble with vpmovmskb and store

Hi all,
  I've run into a case where the optimizer seems to be having trouble doing
the "obvious" thing.

Consider this code:
```
define i16 @foo(<8 x i16>* dereferenceable(16) %egress, <16 x i8> %a0) {
    %a1 = icmp slt <16 x i8> %a0, zeroinitializer
    %a2 = bitcast <16 x i1> %a1 to i16
    %astore = getelementptr inbounds <8 x i16>, <8 x i16>* %egress, i64 0, i64 7
    ;store i16 %a2, i16* %astore
    ret i16 %a2
}
```
The optimizer recognizes this and llc nicely outputs a vpmovmskb
instruction:
```
foo: # @foo
    vpmovmskb eax, xmm0
    ret
```
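As a reading aid, here is a scalar C++ model of what the two IR instructions in `@foo` compute, which is the same bit layout (v)pmovmskb produces: bit *i* of the result is the sign bit of byte *i*. The function name `movemask_epi8` is a hypothetical choice for this sketch, and it assumes x86's little-endian lane order, where lane 0 of the `<16 x i1>` becomes bit 0 of the `i16`.

```cpp
#include <array>
#include <cstdint>

// Hypothetical scalar model of @foo's two IR instructions:
//   %a1 = icmp slt <16 x i8> %a0, zeroinitializer  ; lane i: is byte i negative?
//   %a2 = bitcast <16 x i1> %a1 to i16             ; lane i -> bit i (little-endian)
constexpr std::uint16_t movemask_epi8(const std::array<std::int8_t, 16>& a0) {
    std::uint16_t mask = 0;
    for (int i = 0; i < 16; ++i) {
        if (a0[i] < 0) {  // "slt 0" on a signed byte is the same as "sign bit set"
            mask = static_cast<std::uint16_t>(mask | (1u << i));
        }
    }
    return mask;
}
```

For example, an input whose only negative bytes are lanes 0 and 15 yields `0x8001`.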

Writing to the output vector also works well:
```
define void @writing(<8 x i16>* dereferenceable(16) %egress, <16 x i8> %a0) {
    %astore = getelementptr inbounds <8 x i16>, <8 x i16>* %egress, i64 0, i64 7
    store i16 123, i16* %astore
    ret void
}
```
outputs:
```
writing: # @writing
    mov word ptr [rdi + 14], 123
    ret
```
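The `[rdi + 14]` displacement follows from the GEP indices: the first index (0) steps over whole `<8 x i16>` vectors of 16 bytes, and the second (7) steps over 2-byte `i16` lanes, so the offset is 0 * 16 + 7 * 2 = 14. A small constexpr sketch of that arithmetic; the helper name `gep_byte_offset` is hypothetical:

```cpp
#include <cstddef>
#include <cstdint>

// Size of one <8 x i16> vector in bytes: 8 lanes of 2 bytes each.
constexpr std::size_t kVecBytes = 8 * sizeof(std::uint16_t);

// Hypothetical helper mirroring
//   getelementptr inbounds <8 x i16>, <8 x i16>* %egress, i64 vec, i64 elem
// The first index scales by a whole vector, the second by one i16 lane.
constexpr std::size_t gep_byte_offset(std::size_t vec, std::size_t elem) {
    return vec * kVecBytes + elem * sizeof(std::uint16_t);
}

static_assert(gep_byte_offset(0, 7) == 14, "matches mov word ptr [rdi + 14]");
```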

Now, combining these two by uncommenting the store in `foo()` suddenly
results in a very large function, instead of just:
```
    vpmovmskb eax, xmm0
    mov word ptr [rdi + 14], ax
    ret
```
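For comparison, a C++ sketch of what the combined function is meant to do: compute the byte sign-bit mask, store it to lane 7 of `egress`, and return it. The name `foo_with_store` is hypothetical, and unlike the IR (which receives `%a0` in `xmm0`) the sketch takes the 16 bytes by pointer so it stays portable. On SSE2 targets it uses `_mm_movemask_epi8`, the intrinsic corresponding to (v)pmovmskb; elsewhere it falls back to a scalar loop. Whether a particular clang version lowers it to the two-instruction sequence above has not been checked here.

```cpp
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // _mm_loadu_si128, _mm_movemask_epi8
#endif

// Hypothetical counterpart of @foo with the store enabled: writes the
// sign-bit mask of a0 into egress[7] (byte offset 14) and returns it.
std::uint16_t foo_with_store(std::uint16_t egress[8], const std::int8_t a0[16]) {
#if defined(__SSE2__)
    // Unaligned 16-byte load, then one movemask: bit i = sign bit of byte i.
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a0));
    const auto mask = static_cast<std::uint16_t>(_mm_movemask_epi8(v));
#else
    // Portable fallback computing the same mask one byte at a time.
    std::uint16_t mask = 0;
    for (int i = 0; i < 16; ++i) {
        if (a0[i] < 0) {
            mask = static_cast<std::uint16_t>(mask | (1u << i));
        }
    }
#endif
    egress[7] = mask;  // the store that was commented out in @foo
    return mask;
}
```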

Is there something wrong with my IR code, or is the optimizer somehow
confused? Can I rewrite the code such that the optimizer does understand?

Godbolt link: https://llvm.godbolt.org/z/OgExDk

Thanks a lot for the help.
Cheers,
  Johan

Craig Topper via llvm-dev

2018-Nov-26 23:13 UTC

Here's a quick patch that fixes this. I don't know how to avoid it in IR. I
haven't checked any other tests, but it does fix your case. I'll try to put
up a real Phabricator review tonight or tomorrow.

```
diff --git a/lib/Target/X86/X86ISelLowering.cpp b/lib/Target/X86/X86ISelLowering.cpp
index e31f2a6..d79c0be 100644
--- a/lib/Target/X86/X86ISelLowering.cpp
+++ b/lib/Target/X86/X86ISelLowering.cpp
@@ -4837,6 +4837,11 @@ bool X86TargetLowering::isCheapToSpeculateCtlz() const {
 
 bool X86TargetLowering::isLoadBitCastBeneficial(EVT LoadVT,
                                                 EVT BitcastVT) const {
+  if (!LoadVT.isVector() && BitcastVT.isVector() &&
+      BitcastVT.getVectorElementType() == MVT::i1 &&
+      !Subtarget.hasAVX512())
+    return false;
+
   if (!Subtarget.hasDQI() && BitcastVT == MVT::v8i1)
     return false;
 
```
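The early return added by the patch fires when a scalar type is reinterpreted as a vector of `i1` on a subtarget without AVX-512 (the x86 extension that provides mask registers for such vectors). Below is a simplified restatement of just that condition. `VT` is a hypothetical stand-in for LLVM's `EVT`/`MVT`, `patch_declines_fold` is an invented name, and the remaining checks in the real function are not modelled.

```cpp
// Hypothetical, simplified stand-in for LLVM's EVT: just enough to express
// "scalar vs. vector" and "vector whose elements are i1".
struct VT {
    bool is_vector;
    unsigned elem_bits;  // element width for vectors, total width for scalars
};

// Mirrors only the condition the patch adds to isLoadBitCastBeneficial:
// returns true exactly when the patched function would return false early.
constexpr bool patch_declines_fold(VT load_vt, VT bitcast_vt, bool has_avx512) {
    return !load_vt.is_vector && bitcast_vt.is_vector &&
           bitcast_vt.elem_bits == 1 && !has_avx512;
}
```

In the thread's example the types involved are presumably `i16` and `v16i1`, which this condition rejects on a target without AVX-512.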

~Craig

Craig Topper via llvm-dev

2018-Nov-27 03:00 UTC

We should handle this a lot better after r34763

~Craig