thr3ads.net - llvm dev - [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls? [Jul 2018]

Robert Lougher via llvm-dev

2018-Jul-04 12:50 UTC

Hi,

On 4 July 2018 at 07:42, Nema, Ashutosh via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> + llvm-dev
>
> -----Original Message-----
> From: Nema, Ashutosh
> Sent: Wednesday, July 4, 2018 12:12 PM
> To: Hal Finkel <hfinkel at anl.gov>; Saito, Hideki <hideki.saito at intel.com>;
> Sanjay Patel <spatel at rotateright.com>; mzolotukhin at apple.com
> Cc: dccitaliano at gmail.com; Masten, Matt <matt.masten at intel.com>
> Subject: RE: [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>
> Hi Hal,
>
> > __svml_sin8 (plus whatever shuffles are necessary).
> > The vectorizer should do this.
> > It should not generate calls to functions that don't exist.
>
> I'm not sure how the vectorizer will do this. Consider the case where the
> "-vectorizer-maximize-bandwidth" option is enabled and the vectorizer is
> forced to generate the wider VF; it may then generate a call to
> __svml_sin_* which may not exist.
>
> Are you expecting the vectorizer to lower the calls, i.e. __svml_sin_8 to
> two __svml_sin_4 calls?
>
> Regards,
> Ashutosh
>
If an accurate cost model were in place (there isn't one), an
"unsupported" vectorization factor would only be selected if it was
forced.  However, in this case __svml_sin_8 has the same cost as
__svml_sin_4, so the loop vectorizer will select a VF of 8, and generate a
call to a function which effectively doesn't exist.

The simplest fix is to populate the SVML vector library table with
__svml_sin_8 only when the target is AVX-512.  Alternatively,
TLI.isFunctionVectorizable() should check that the entry is available on
the target (this is more difficult as the type is not encoded).
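
As a rough illustration of the first option, here is a minimal Python sketch
of a vector library table that is filtered by target feature before the
vectorizer ever sees it. It is a model, not LLVM code: VecDesc,
required_feature, populate_table and the feature strings are hypothetical
names, and the function names simply follow the spelling used in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VecDesc:
    scalar_name: str       # scalar math function, e.g. "sin"
    vector_name: str       # vector library entry point
    vf: int                # vectorization factor (number of lanes)
    required_feature: str  # hypothetical: ISA feature the entry's ABI needs


# Hypothetical table; the feature assignments are assumptions for illustration.
SVML_TABLE = [
    VecDesc("sin", "__svml_sin_2", 2, "sse2"),
    VecDesc("sin", "__svml_sin_4", 4, "avx"),
    VecDesc("sin", "__svml_sin_8", 8, "avx512f"),
]


def populate_table(target_features):
    """Keep only the entries whose required feature the target provides."""
    return [d for d in SVML_TABLE if d.required_feature in target_features]


def is_function_vectorizable(table, scalar_name, vf):
    """Model of the lookup: is there an entry for this function at this VF?"""
    return any(d.scalar_name == scalar_name and d.vf == vf for d in table)
```

With a feature set that lacks "avx512f", the 8-lane entry never enters the
table, so a lookup for VF=8 fails while VF=4 still succeeds.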

I'm guessing that the cost model would then make VF=4 cheaper, so the
vectorizer would generate calls to __svml_sin_4 (I'm not at work so can't
check).  If the vectorization factor were forced to 8, we would get either a
call to the intrinsic llvm.sin.v8f64 (if no-math-errno) or a scalarized
call.  The vectorizer would not generate two calls to __svml_sin_4, although
this would be cheaper.
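
The effect on VF selection can be sketched with a toy cost model. The sketch
below is in Python and is not the loop vectorizer's actual cost model: the
cost values are made up, scalarization is modelled as one scalar call per
lane with no extract/insert overhead, and the intrinsic path is ignored.

```python
def call_cost(available_vfs, vf, vector_call_cost=10, scalar_call_cost=10):
    """Hypothetical cost of one call at width vf."""
    if vf == 1:
        return scalar_call_cost
    if vf in available_vfs:
        # A library entry exists: same cost regardless of width.
        return vector_call_cost
    # No library entry: scalarize, one scalar call per lane.
    return vf * scalar_call_cost


def select_vf(available_vfs, candidate_vfs=(1, 2, 4, 8)):
    """Pick the candidate VF with the lowest cost per lane."""
    return min(candidate_vfs, key=lambda vf: call_cost(available_vfs, vf) / vf)
```

When the table advertises an 8-lane entry at the same cost as the 4-lane
one, the per-lane cost makes VF=8 win; once the 8-lane entry is removed,
VF=8 is costed as scalarized and VF=4 becomes the cheapest choice.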

While this problem probably doesn't require the loop vectorizer to have
knowledge of the target ABI, others may do.  I'm thinking specifically of
D48193:

https://reviews.llvm.org/D48193

In this case we have poor code generation due to the interleave count
selected by the loop vectorizer.  I can't see how this can be fixed later,
so we will need to expose details of the ABI to the loop vectorizer (see my
latest comment D48193#1149705).

Thanks,
Rob.

Hal Finkel via llvm-dev

2018-Jul-04 16:58 UTC

On 07/04/2018 07:50 AM, Robert Lougher wrote:
> Hi,
>
> On 4 July 2018 at 07:42, Nema, Ashutosh via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>     + llvm-dev
>
>     -----Original Message-----
>     From: Nema, Ashutosh
>     Sent: Wednesday, July 4, 2018 12:12 PM
>     To: Hal Finkel <hfinkel at anl.gov>; Saito, Hideki
>     <hideki.saito at intel.com>; Sanjay Patel <spatel at rotateright.com>;
>     mzolotukhin at apple.com
>     Cc: dccitaliano at gmail.com; Masten, Matt <matt.masten at intel.com>
>     Subject: RE: [llvm-dev] [RFC][VECLIB] how should we legalize
>     VECLIB calls?
>
>     Hi Hal,
>
>     > __svml_sin8 (plus whatever shuffles are necessary).
>     > The vectorizer should do this.
>     > It should not generate calls to functions that don't exist.
>
>     I'm not sure how the vectorizer will do this. Consider the case where
>     the "-vectorizer-maximize-bandwidth" option is enabled and the
>     vectorizer is forced to generate the wider VF; it may then generate a
>     call to __svml_sin_* which may not exist.
>
>     Are you expecting the vectorizer to lower the calls, i.e.
>     __svml_sin_8 to two __svml_sin_4 calls?
>
>     Regards,
>     Ashutosh
>
>
> If an accurate cost model were in place (there isn't one), an
> "unsupported" vectorization factor would only be selected if it was
> forced.  However, in this case __svml_sin_8 has the same cost as
> __svml_sin_4, so the loop vectorizer will select a VF of 8, and
> generate a call to a function which effectively doesn't exist.

Would it actually be the same, or would there be extra shuffle costs
associated with the calls to __svml_sin_4?
>
> The simplest fix is to populate the SVML vector library table with
> __svml_sin_8 only when the target is AVX-512.
I believe that this is exactly what we should do. When not targeting
AVX-512, __svml_sin_8 essentially doesn't exist (i.e. there's no usable
ABI via which we can call it), and so it should not appear in the
vectorizer's list of options at all.

 -Hal
>   Alternatively, TLI.isFunctionVectorizable() should check that the
> entry is available on the target (this is more difficult as the type
> is not encoded).
>
> I'm guessing that the cost model would then make VF=4 cheaper, so the
> vectorizer would generate calls to __svml_sin_4 (I'm not at work so
> can't check).  If the vectorization factor were forced to 8, we would
> get either a call to the intrinsic llvm.sin.v8f64 (if no-math-errno)
> or a scalarized call.  The vectorizer would not generate two calls to
> __svml_sin_4, although this would be cheaper.
>
> While this problem probably doesn't require the loop vectorizer to
> have knowledge of the target ABI, others may do.  I'm thinking
> specifically of D48193:
>
> https://reviews.llvm.org/D48193
>
> In this case we have poor code generation due to the interleave count
> selected by the loop vectorizer.  I can't see how this can be fixed
> later, so we will need to expose details of the ABI to the loop
> vectorizer (see my latest comment D48193#1149705).
>
> Thanks,
> Rob.
>
>
-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory


Saito, Hideki via llvm-dev

2018-Jul-09 17:36 UTC

All,

It looks like we are finally converging on

4)      The vectorizer emits legalized VECLIB calls. Since it can already emit
instructions in scalarized form, adding legalized-call functionality is in some
sense similar. The vectorizer can't simply choose a type-legal function name
with an illegal vector type, since LegalizeVectorType() will still end up using
one call instead of two.
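
To make the idea of a "legalized call" concrete, here is a minimal Python
sketch in which a wide vector is split into legal-width pieces, the narrow
library routine is called once per piece, and the results are joined again.
It only models the data flow: svml_sin_4 is a stand-in written for this
example, emit_legalized_call is a hypothetical name, and Python lists take
the place of IR vectors and shuffles.

```python
import math


def svml_sin_4(v):
    """Stand-in for a 4-lane vector library routine (hypothetical)."""
    assert len(v) == 4
    return [math.sin(x) for x in v]


def emit_legalized_call(narrow_fn, legal_vf, wide_vec):
    """Apply narrow_fn to legal_vf-sized slices of wide_vec and rejoin them."""
    if len(wide_vec) % legal_vf != 0:
        raise ValueError("wide VF must be a multiple of the legal VF")
    result = []
    for start in range(0, len(wide_vec), legal_vf):
        chunk = wide_vec[start:start + legal_vf]  # models a subvector extract
        result.extend(narrow_fn(chunk))           # one legal call per chunk
    return result                                 # models the concatenation
```

For an 8-element input and legal_vf=4 this makes exactly two calls to the
4-lane routine, which is the "two calls instead of one" behaviour discussed
above.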

I was hoping to collectively come up with a better solution, but not at all
surprised to see us settling down to this known-to-work practical approach.

We need a more elaborate VECLIB setting, taking per-target function availability
into account. Also, much of the "legalized call" mechanism should work
for OpenMP declare simd, and we should make it easy to reuse in case
other front ends or optimizers want to emit legalized calls.

Simon, is the RV "legalized call emission" code easily reusable
outside of RV? If yes, would you be able to restructure it so that it can
reside, say, under Transforms/Utils?

I think this RFC is ready to close at the end of this week. Thank you very much
for all the lively discussions. If anybody has more input, please speak up
soon.

Thanks,
Hideki
