Johan Engelen via llvm-dev
2018-Nov-26 22:50 UTC
[llvm-dev] Vectorizer has trouble with vpmovmskb and store
Hi all,
I've run into a case where the optimizer seems to be having trouble doing the "obvious" thing.

Consider this code:
```
define i16 @foo(<8 x i16>* dereferenceable(16) %egress, <16 x i8> %a0) {
  %a1 = icmp slt <16 x i8> %a0, zeroinitializer
  %a2 = bitcast <16 x i1> %a1 to i16
  %astore = getelementptr inbounds <8 x i16>, <8 x i16>* %egress, i64 0, i64 7
  ;store i16 %a2, i16* %astore
  ret i16 %a2
}
```
The optimizer recognizes this and llc nicely outputs a vpmovmskb instruction:
```
foo: # @foo
  vpmovmskb eax, xmm0
  ret
```

Writing to the output vector also works well:
```
define void @writing(<8 x i16>* dereferenceable(16) %egress, <16 x i8> %a0) {
  %astore = getelementptr inbounds <8 x i16>, <8 x i16>* %egress, i64 0, i64 7
  store i16 123, i16* %astore
  ret void
}
```
outputs:
```
writing: # @writing
  mov word ptr [rdi + 14], 123
  ret
```

Now, combining these two by uncommenting the store in `foo()` suddenly results in a very large function, instead of just:

    vpmovmskb eax, xmm0
    mov word ptr [rdi + 14], ax
    ret

Is there something wrong with my IR code, or is the optimizer somehow confused? Can I rewrite the code such that the optimizer does understand?

Godbolt link: https://llvm.godbolt.org/z/OgExDk

Thanks a lot for the help.
Cheers,
Johan
Craig Topper via llvm-dev
2018-Nov-26 23:13 UTC
Here's a quick patch that fixes this. I don't know how to avoid it in IR. I haven't checked any other tests, but it does fix your case. I'll try to put up a real Phabricator review tonight or tomorrow.

    diff --git a/lib/Target/X86/X86ISelLowering.cpp b/lib/Target/X86/X86ISelLowering.cpp
    index e31f2a6..d79c0be 100644
    --- a/lib/Target/X86/X86ISelLowering.cpp
    +++ b/lib/Target/X86/X86ISelLowering.cpp
    @@ -4837,6 +4837,11 @@ bool X86TargetLowering::isCheapToSpeculateCtlz() const {
     
     bool X86TargetLowering::isLoadBitCastBeneficial(EVT LoadVT,
                                                     EVT BitcastVT) const {
    +  if (!LoadVT.isVector() && BitcastVT.isVector() &&
    +      BitcastVT.getVectorElementType() == MVT::i1 &&
    +      !Subtarget.hasAVX512())
    +    return false;
    +
       if (!Subtarget.hasDQI() && BitcastVT == MVT::v8i1)
         return false;

~Craig
Craig Topper via llvm-dev
2018-Nov-27 03:00 UTC
We should handle this a lot better after r34763.

~Craig