Robert Lougher via llvm-dev
2018-Jul-04 12:50 UTC
[llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
Hi,

On 4 July 2018 at 07:42, Nema, Ashutosh via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> + llvm-dev
>
> -----Original Message-----
> From: Nema, Ashutosh
> Sent: Wednesday, July 4, 2018 12:12 PM
> To: Hal Finkel <hfinkel at anl.gov>; Saito, Hideki <hideki.saito at intel.com>;
> Sanjay Patel <spatel at rotateright.com>; mzolotukhin at apple.com
> Cc: dccitaliano at gmail.com; Masten, Matt <matt.masten at intel.com>
> Subject: RE: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>
> Hi Hal,
>
> > __svml_sin8 (plus whatever shuffles are necessary).
> > The vectorizer should do this.
> > It should not generate calls to functions that don't exist.
>
> I'm not sure how vectorizer will do this, consider the case where
> "-vectorizer-maximize-bandwidth" option is enabled and vectorizer is
> forced to generate the wider VF, and hence it may generate a call to
> __svml_sin_* which may not exist.
>
> Are you expecting the vectorizer to lower the calls i.e. __svml_sin_8 to
> two __svml_sin_4 calls ?
>
> Regards,
> Ashutosh

If an accurate cost model were in place (which there isn't), then an "unsupported" vectorization factor would only be selected if it was forced. However, in this case __svml_sin_8 has the same cost as __svml_sin_4, so the loop vectorizer will select a VF of 8 and generate a call to a function which effectively doesn't exist.

The simplest fix is to populate the SVML vector library table with __svml_sin_8 only when the target is AVX-512. Alternatively, TLI.isFunctionVectorizable() should check that the entry is available on the target (this is more difficult as the type is not encoded).

I'm guessing that the cost model would then make VF=4 cheaper, so we would generate calls to __svml_sin_4 (I'm not in work so can't check). If the vectorization factor was forced to 8, we'll either get a call to the intrinsic llvm.sin.v8f64 (if no-math-errno) or the vectorizer will scalarize the call. The vectorizer would not generate two calls to __svml_sin_4, although this would be cheaper.

While this problem probably doesn't require the loop vectorizer to have knowledge of the target ABI, others may do. I'm thinking specifically of D48193:

https://reviews.llvm.org/D48193

In this case we have poor code generation due to the interleave count selected by the loop vectorizer. I can't see how this can be fixed later, so we will need to expose details of the ABI to the loop vectorizer (see my latest comment D48193#1149705).

Thanks,
Rob.
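
The "only populate the table when the target supports it" suggestion can be sketched roughly as follows. This is a minimal, standalone illustration and not LLVM code: `Target`, `VecDesc`, `buildSVMLTable`, and `lookup` are hypothetical names made up for the example, and the function names simply follow the spelling used in this thread.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a target feature query; not an LLVM API.
enum class Target { AVX2, AVX512 };

// One row of an illustrative vector-library table: the scalar function,
// its vector variant, and the vectorization factor (VF) of that variant.
struct VecDesc {
  std::string ScalarName;
  std::string VectorName;
  unsigned VF;
};

// Build the table, adding the 8-wide entry only when the target can
// actually call it (assumed here to mean AVX-512, as in the thread).
std::vector<VecDesc> buildSVMLTable(Target T) {
  std::vector<VecDesc> Table = {{"sin", "__svml_sin_4", 4}};
  if (T == Target::AVX512)
    Table.push_back({"sin", "__svml_sin_8", 8});
  return Table;
}

// Return the vector function name for (Scalar, VF), or an empty string
// when the table has no such entry.
std::string lookup(const std::vector<VecDesc> &Table,
                   const std::string &Scalar, unsigned VF) {
  for (const VecDesc &D : Table)
    if (D.ScalarName == Scalar && D.VF == VF)
      return D.VectorName;
  return std::string();
}
```

With a table built this way, a lookup for VF=8 on a non-AVX-512 target finds nothing, so a vectorizer consulting it would never see the 8-wide variant as an option.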
Hal Finkel via llvm-dev
2018-Jul-04 16:58 UTC
[llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
On 07/04/2018 07:50 AM, Robert Lougher wrote:

> Hi,
>
> On 4 July 2018 at 07:42, Nema, Ashutosh via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> > + llvm-dev
> >
> > -----Original Message-----
> > From: Nema, Ashutosh
> > Sent: Wednesday, July 4, 2018 12:12 PM
> > To: Hal Finkel <hfinkel at anl.gov>; Saito, Hideki
> > <hideki.saito at intel.com>; Sanjay Patel <spatel at rotateright.com>;
> > mzolotukhin at apple.com
> > Cc: dccitaliano at gmail.com; Masten, Matt <matt.masten at intel.com>
> > Subject: RE: [llvm-dev] [RFC][VECLIB] how should we legalize
> > VECLIB calls?
> >
> > Hi Hal,
> >
> > > __svml_sin8 (plus whatever shuffles are necessary).
> > > The vectorizer should do this.
> > > It should not generate calls to functions that don't exist.
> >
> > I'm not sure how vectorizer will do this, consider the case where
> > "-vectorizer-maximize-bandwidth" option is enabled and vectorizer
> > is forced to generate the wider VF, and hence it may generate a
> > call to __svml_sin_* which may not exist.
> >
> > Are you expecting the vectorizer to lower the calls i.e.
> > __svml_sin_8 to two __svml_sin_4 calls ?
> >
> > Regards,
> > Ashutosh
>
> If an accurate cost model was in place (which there isn't), then an
> "unsupported" vectorization factor should only be selected if it was
> forced. However, in this case __svml_sin_8 is the same cost as
> __svml_sin_4, so the loop vectorizer will select a VF of 8, and
> generate a call to a function which effectively doesn't exist.

Would it actually be the same, or would there be extra shuffle costs associated with the calls to __svml_sin_4?

> The simplest way to fix it, is to simply only populate the SVML vector
> library table with __svml_sin_8 when the target is AVX-512.

I believe that this is exactly what we should do. When not targeting AVX-512, __svml_sin_8 essentially doesn't exist (i.e. there's no usable ABI via which we can call it), and so it should not appear in the vectorizer's list of options at all.

 -Hal

> Alternatively, TLI.isFunctionVectorizable() should check that the
> entry is available on the target (this is more difficult as the type
> is not encoded).
>
> I'm guessing that the cost model would then make VF=4 cheaper, so
> generating calls to __svml_sin_4 (I'm not in work so can't check).
> If the vectorization factor was forced to 8, we'll either get a call
> to the intrinsic llvm.sin.v8f64 (if no-math-errno) or the vectorizer
> will scalarize the call. The vectorizer would not generate two calls
> to __svml_sin_4 although this would be cheaper.
>
> While this problem probably doesn't require the loop vectorizer to
> have knowledge of the target ABI, others may do. I'm thinking
> specifically of D48193:
>
> https://reviews.llvm.org/D48193
>
> In this case we have poor code generation due to the interleave count
> selected by the loop vectorizer. I can't see how this can be fixed
> later, so we will need to expose details of the ABI to the loop
> vectorizer (see my latest comment D48193#1149705).
>
> Thanks,
> Rob.

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
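
The shuffle-cost question can be made concrete with a toy cost comparison. This is only a sketch under stated assumptions: the `Costs` struct and `coverCost` function are hypothetical, the cost units are arbitrary, and the shuffle count (one extract per part plus one concatenate per join) is an assumed model rather than anything measured or taken from LLVM's cost model.

```cpp
#include <cassert>

// Illustrative cost inputs in arbitrary units; real values would have to
// come from the target's cost model, not from this sketch.
struct Costs {
  unsigned Call;     // cost of one vector library call
  unsigned Shuffle;  // cost of one subvector extract or concatenate
};

// Total cost of covering VF lanes with library calls that are LegalVF
// lanes wide. Assumes VF is a multiple of LegalVF.
unsigned coverCost(unsigned VF, unsigned LegalVF, Costs C) {
  unsigned Parts = VF / LegalVF;
  if (Parts <= 1)
    return C.Call;  // a single call needs no splitting or joining
  // Assumed model: Parts extracts to split the input and Parts - 1
  // concatenates to rebuild the wide result.
  unsigned Shuffles = Parts + (Parts - 1);
  return Parts * C.Call + Shuffles * C.Shuffle;
}
```

Under this model, covering VF=8 with two 4-wide calls only costs the same as a single 8-wide call if both the second call and the shuffles are free, which is why the table entry and the cost model need to agree on what the target can actually call.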
Saito, Hideki via llvm-dev
2018-Jul-09 17:36 UTC
[llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
All,

It looks like we are finally converging on 4): the vectorizer emits legalized VECLIB calls. Since it can already emit instructions in scalarized form, adding legalized-call functionality is in some sense similar to that. The vectorizer can't simply pair a type-legal function name with an illegal vector type, since LegalizeVectorType() would still end up using one call instead of two.

I was hoping we could collectively come up with a better solution, but I am not at all surprised to see us settling on this known-to-work, practical approach.

We need a more elaborate VECLIB setting that takes per-target function availability into account. Also, much of the "legalized call" mechanism should work for OpenMP declare simd, and we should make it easy to reuse in case other FE/optimizers want to emit legalized calls.

Simon, is the RV "legalized call emission" code easily reusable outside of RV? If yes, would you be able to restructure it so that it can reside, say, under Transforms/Utils?

I think this RFC is ready to close at the end of this week. Thank you very much for all the lively discussions. If anybody has more input, please speak up soon.

Thanks,
Hideki
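
The idea of emitting a legalized call (two 4-wide calls instead of one unavailable 8-wide call) can be illustrated with a small standalone sketch. It models vectors as `std::vector<double>` and is not the RV or LLVM implementation: `Vec`, `VecFn`, `sin4`, and `emitLegalizedCall` are hypothetical names, and `sin4` is only a stand-in for a 4-wide library routine such as __svml_sin_4.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using VecFn = std::function<Vec(const Vec &)>;

// Stand-in for a 4-wide vector library routine; computes sin per lane.
Vec sin4(const Vec &X) {
  assert(X.size() == 4);
  Vec R(4);
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = std::sin(X[I]);
  return R;
}

// Apply LegalFn to a wide vector by splitting it into LegalVF-lane parts,
// calling LegalFn on each part, and concatenating the results in order.
// Assumes the wide vector's length is a multiple of LegalVF.
Vec emitLegalizedCall(const Vec &Wide, std::size_t LegalVF,
                      const VecFn &LegalFn) {
  assert(LegalVF > 0 && Wide.size() % LegalVF == 0);
  Vec Result;
  Result.reserve(Wide.size());
  for (std::size_t Off = 0; Off < Wide.size(); Off += LegalVF) {
    // Extract one legal-width subvector.
    Vec Part(Wide.begin() + Off, Wide.begin() + Off + LegalVF);
    Vec PartResult = LegalFn(Part);
    // Concatenate this part's result onto the wide result.
    Result.insert(Result.end(), PartResult.begin(), PartResult.end());
  }
  return Result;
}
```

For example, calling `emitLegalizedCall(In, 4, sin4)` on an 8-element input makes two 4-wide calls and returns an 8-element result. The extract and concatenate steps here correspond to the shuffles discussed earlier in the thread.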