Craig Topper via llvm-dev
2018-Nov-27 03:00 UTC
[llvm-dev] Vectorizer has trouble with vpmovmskb and store
We should handle this a lot better after r34763 ~Craig On Mon, Nov 26, 2018 at 3:13 PM Craig Topper <craig.topper at gmail.com> wrote:> Here's a quick patch that fixes this. I don't know to avoid it in IR. I > haven't checked any other tests, but it does fix your case. I'll try to put > up a real phabricator tonight or tomorrow. > > diff --git a/lib/Target/X86/X86ISelLowering.cpp > b/lib/Target/X86/X86ISelLowering.cpp > index e31f2a6..d79c0be 100644 > --- a/lib/Target/X86/X86ISelLowering.cpp > +++ b/lib/Target/X86/X86ISelLowering.cpp > @@ -4837,6 +4837,11 @@ bool X86TargetLowering::isCheapToSpeculateCtlz() > const { > > bool X86TargetLowering::isLoadBitCastBeneficial(EVT LoadVT, > EVT BitcastVT) const { > + if (!LoadVT.isVector() && BitcastVT.isVector() && > + BitcastVT.getVectorElementType() == MVT::i1 && > + !Subtarget.hasAVX512()) > + return false; > + > if (!Subtarget.hasDQI() && BitcastVT == MVT::v8i1) > return false; > > > ~Craig > > > On Mon, Nov 26, 2018 at 2:51 PM Johan Engelen via llvm-dev < > llvm-dev at lists.llvm.org> wrote: > >> Hi all, >> I've run into a case where the optimizer seems to be having trouble >> doing the "obvious" thing. >> >> Consider this code: >> ``` >> define i16 @foo(<8 x i16>* dereferenceable(16) %egress, <16 x i8> %a0) { >> %a1 = icmp slt <16 x i8> %a0, zeroinitializer >> %a2 = bitcast <16 x i1> %a1 to i16 >> %astore = getelementptr inbounds <8 x i16>, <8 x i16>* %egress, i64 0, >> i64 7 >> ;store i16 %a2, i16* %astore >> ret i16 %a2 >> } >> ``` >> The optimizer recognizes this and llc nicely outputs a vpmovmskb >> instruction: >> ``` >> foo: # @foo >> vpmovmskb eax, xmm0 >> ret >> ``` >> >> Writing to the output vector also works well: >> ``` >> define void @writing(<8 x i16>* dereferenceable(16) %egress, <16 x i8> >> %a0) { >> %astore = getelementptr inbounds <8 x i16>, <8 x i16>* %egress, i64 0, >> i64 7 >> store i16 123, i16* %astore >> ret void >> } >> ``` >> outputs: >> ``` >> writing: # @writing >> mov word ptr [rdi + 14], 123 >> ret >> ``` >> >> Now, combining these two by uncommenting the store in `foo()` suddenly >> results in a very large function, instead of just: >> vpmovmskb eax, xmm0 >> mov word ptr [rdi + 14], ax >> ret >> >> Is there something wrong with my IR code, or is the optimizer somehow >> confused? Can I rewrite the code such that the optimizer does understand? >> >> Godbolt link: https://llvm.godbolt.org/z/OgExDk >> >> Thanks a lot for the help. >> Cheers, >> Johan >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >> >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20181126/95fc98a1/attachment.html>
Johan Engelen via llvm-dev
2018-Dec-01 12:38 UTC
[llvm-dev] Vectorizer has trouble with vpmovmskb and store
Hello Craig, Thank you for the quick response and fix. However, the improvement turns out to be quite fragile. If I run `opt` on the original testcase, and run the output through `llc` then the previous very long assembly output results. (things work for a bitcast from <16 x i1> to i16, but not for a <16 x i1>* store) Godbolt link: https://llvm.godbolt.org/z/j1ob9w regards, Johan On Tue, Nov 27, 2018 at 4:00 AM Craig Topper <craig.topper at gmail.com> wrote:> We should handle this a lot better after r34763 > > ~Craig > > > On Mon, Nov 26, 2018 at 3:13 PM Craig Topper <craig.topper at gmail.com> > wrote: > >> Here's a quick patch that fixes this. I don't know to avoid it in IR. I >> haven't checked any other tests, but it does fix your case. I'll try to put >> up a real phabricator tonight or tomorrow. >> >> diff --git a/lib/Target/X86/X86ISelLowering.cpp >> b/lib/Target/X86/X86ISelLowering.cpp >> index e31f2a6..d79c0be 100644 >> --- a/lib/Target/X86/X86ISelLowering.cpp >> +++ b/lib/Target/X86/X86ISelLowering.cpp >> @@ -4837,6 +4837,11 @@ bool X86TargetLowering::isCheapToSpeculateCtlz() >> const { >> >> bool X86TargetLowering::isLoadBitCastBeneficial(EVT LoadVT, >> EVT BitcastVT) const { >> + if (!LoadVT.isVector() && BitcastVT.isVector() && >> + BitcastVT.getVectorElementType() == MVT::i1 && >> + !Subtarget.hasAVX512()) >> + return false; >> + >> if (!Subtarget.hasDQI() && BitcastVT == MVT::v8i1) >> return false; >> >> >> ~Craig >> >> >> On Mon, Nov 26, 2018 at 2:51 PM Johan Engelen via llvm-dev < >> llvm-dev at lists.llvm.org> wrote: >> >>> Hi all, >>> I've run into a case where the optimizer seems to be having trouble >>> doing the "obvious" thing. >>> >>> Consider this code: >>> ``` >>> define i16 @foo(<8 x i16>* dereferenceable(16) %egress, <16 x i8> %a0) { >>> %a1 = icmp slt <16 x i8> %a0, zeroinitializer >>> %a2 = bitcast <16 x i1> %a1 to i16 >>> %astore = getelementptr inbounds <8 x i16>, <8 x i16>* %egress, i64 >>> 0, i64 7 >>> ;store i16 %a2, i16* %astore >>> ret i16 %a2 >>> } >>> ``` >>> The optimizer recognizes this and llc nicely outputs a vpmovmskb >>> instruction: >>> ``` >>> foo: # @foo >>> vpmovmskb eax, xmm0 >>> ret >>> ``` >>> >>> Writing to the output vector also works well: >>> ``` >>> define void @writing(<8 x i16>* dereferenceable(16) %egress, <16 x i8> >>> %a0) { >>> %astore = getelementptr inbounds <8 x i16>, <8 x i16>* %egress, i64 >>> 0, i64 7 >>> store i16 123, i16* %astore >>> ret void >>> } >>> ``` >>> outputs: >>> ``` >>> writing: # @writing >>> mov word ptr [rdi + 14], 123 >>> ret >>> ``` >>> >>> Now, combining these two by uncommenting the store in `foo()` suddenly >>> results in a very large function, instead of just: >>> vpmovmskb eax, xmm0 >>> mov word ptr [rdi + 14], ax >>> ret >>> >>> Is there something wrong with my IR code, or is the optimizer somehow >>> confused? Can I rewrite the code such that the optimizer does understand? >>> >>> Godbolt link: https://llvm.godbolt.org/z/OgExDk >>> >>> Thanks a lot for the help. >>> Cheers, >>> Johan >>> >>> _______________________________________________ >>> LLVM Developers mailing list >>> llvm-dev at lists.llvm.org >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>> >>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20181201/5c29155c/attachment.html>
Craig Topper via llvm-dev
2018-Dec-01 19:28 UTC
[llvm-dev] Vectorizer has trouble with vpmovmskb and store
I was afraid of that. I thought I had checked whether InstCombine would remove the bitcast here, but I guess I didn't or didn't do it right. I'll see what I can do to fix this. ~Craig On Sat, Dec 1, 2018 at 4:39 AM Johan Engelen <jbc.engelen at gmail.com> wrote:> Hello Craig, > Thank you for the quick response and fix. > However, the improvement turns out to be quite fragile. If I run `opt` on > the original testcase, and run the output through `llc` then the previous > very long assembly output results. (things work for a bitcast from <16 x > i1> to i16, but not for a <16 x i1>* store) > Godbolt link: https://llvm.godbolt.org/z/j1ob9w > > regards, > Johan > > > > On Tue, Nov 27, 2018 at 4:00 AM Craig Topper <craig.topper at gmail.com> > wrote: > >> We should handle this a lot better after r34763 >> >> ~Craig >> >> >> On Mon, Nov 26, 2018 at 3:13 PM Craig Topper <craig.topper at gmail.com> >> wrote: >> >>> Here's a quick patch that fixes this. I don't know to avoid it in IR. I >>> haven't checked any other tests, but it does fix your case. I'll try to put >>> up a real phabricator tonight or tomorrow. >>> >>> diff --git a/lib/Target/X86/X86ISelLowering.cpp >>> b/lib/Target/X86/X86ISelLowering.cpp >>> index e31f2a6..d79c0be 100644 >>> --- a/lib/Target/X86/X86ISelLowering.cpp >>> +++ b/lib/Target/X86/X86ISelLowering.cpp >>> @@ -4837,6 +4837,11 @@ bool X86TargetLowering::isCheapToSpeculateCtlz() >>> const { >>> >>> bool X86TargetLowering::isLoadBitCastBeneficial(EVT LoadVT, >>> EVT BitcastVT) const { >>> + if (!LoadVT.isVector() && BitcastVT.isVector() && >>> + BitcastVT.getVectorElementType() == MVT::i1 && >>> + !Subtarget.hasAVX512()) >>> + return false; >>> + >>> if (!Subtarget.hasDQI() && BitcastVT == MVT::v8i1) >>> return false; >>> >>> >>> ~Craig >>> >>> >>> On Mon, Nov 26, 2018 at 2:51 PM Johan Engelen via llvm-dev < >>> llvm-dev at lists.llvm.org> wrote: >>> >>>> Hi all, >>>> I've run into a case where the optimizer seems to be having trouble >>>> doing the "obvious" thing. >>>> >>>> Consider this code: >>>> ``` >>>> define i16 @foo(<8 x i16>* dereferenceable(16) %egress, <16 x i8> %a0) >>>> { >>>> %a1 = icmp slt <16 x i8> %a0, zeroinitializer >>>> %a2 = bitcast <16 x i1> %a1 to i16 >>>> %astore = getelementptr inbounds <8 x i16>, <8 x i16>* %egress, i64 >>>> 0, i64 7 >>>> ;store i16 %a2, i16* %astore >>>> ret i16 %a2 >>>> } >>>> ``` >>>> The optimizer recognizes this and llc nicely outputs a vpmovmskb >>>> instruction: >>>> ``` >>>> foo: # @foo >>>> vpmovmskb eax, xmm0 >>>> ret >>>> ``` >>>> >>>> Writing to the output vector also works well: >>>> ``` >>>> define void @writing(<8 x i16>* dereferenceable(16) %egress, <16 x i8> >>>> %a0) { >>>> %astore = getelementptr inbounds <8 x i16>, <8 x i16>* %egress, i64 >>>> 0, i64 7 >>>> store i16 123, i16* %astore >>>> ret void >>>> } >>>> ``` >>>> outputs: >>>> ``` >>>> writing: # @writing >>>> mov word ptr [rdi + 14], 123 >>>> ret >>>> ``` >>>> >>>> Now, combining these two by uncommenting the store in `foo()` suddenly >>>> results in a very large function, instead of just: >>>> vpmovmskb eax, xmm0 >>>> mov word ptr [rdi + 14], ax >>>> ret >>>> >>>> Is there something wrong with my IR code, or is the optimizer somehow >>>> confused? Can I rewrite the code such that the optimizer does understand? >>>> >>>> Godbolt link: https://llvm.godbolt.org/z/OgExDk >>>> >>>> Thanks a lot for the help. >>>> Cheers, >>>> Johan >>>> >>>> _______________________________________________ >>>> LLVM Developers mailing list >>>> llvm-dev at lists.llvm.org >>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >>>> >>>-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20181201/570b5516/attachment.html>