thr3ads.net - llvm dev - [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls? [Jul 2018]

Venkataramanan Kumar via llvm-dev

2018-Jul-02 06:38 UTC

[llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Adding to Ashutosh's comments, we are also interested in making LLVM
generate vector math library calls that are available with glibc (version > 2.22).

reference: https://sourceware.org/glibc/wiki/libmvec

Using the example case given in the reference, we found that there are two
vector versions of "sin" (4 x double) with the same VF, namely _ZGVcN4v_sin
(AVX) and _ZGVdN4v_sin (AVX2). Following the SVML path, i.e., adding new
entries to the VecDesc structure in TargetLibraryInfo.cpp, we can generate
the vector version.

However, we are unable to decide in the vectorizer which version to expand,
because that needs TTI information (the ISA). It looks better to legalize or
generate these calls later.
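To make the point concrete: choosing between _ZGVcN4v_sin and _ZGVdN4v_sin requires knowing the target ISA. Below is a minimal C++17 sketch, assuming the x86 vector function ABI naming scheme `_ZGV<isa><mask><vlen><params>_<name>` with ISA letters b/c/d/e for SSE/AVX/AVX2/AVX-512. `X86Isa`, `Variant`, `mangleUnmasked` and `pickVariant` are hypothetical helpers, not LLVM APIs, and any variant list passed in is illustrative.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// x86 ISA levels in increasing order of capability (hypothetical enum).
enum class X86Isa { SSE = 0, AVX = 1, AVX2 = 2, AVX512 = 3 };

// ISA letter used in x86 vector function ABI names:
// _ZGV<isa><mask><vlen><params>_<scalar name>.
char isaLetter(X86Isa isa) {
  switch (isa) {
  case X86Isa::SSE:    return 'b';
  case X86Isa::AVX:    return 'c';
  case X86Isa::AVX2:   return 'd';
  case X86Isa::AVX512: return 'e';
  }
  return '?';
}

// Unmasked ('N') variant whose parameters are all vectors ('v').
std::string mangleUnmasked(X86Isa isa, unsigned vf, const std::string &scalar,
                           unsigned numParams) {
  std::string name = "_ZGV";
  name += isaLetter(isa);
  name += 'N';
  name += std::to_string(vf);
  name.append(numParams, 'v');
  name += '_';
  name += scalar;
  return name;
}

// One (ISA, VF) variant that the vector library provides for a function.
struct Variant {
  X86Isa isa;
  unsigned vf;
};

// Choose the best-ISA variant with the requested VF that the target can run.
std::optional<std::string> pickVariant(const std::vector<Variant> &available,
                                       X86Isa targetIsa, unsigned vf,
                                       const std::string &scalar,
                                       unsigned numParams) {
  assert(vf > 0 && numParams > 0);
  std::optional<X86Isa> best;
  for (const Variant &v : available) {
    if (v.vf != vf || v.isa > targetIsa)
      continue;  // wrong width, or needs an ISA the target lacks
    if (!best || v.isa > *best)
      best = v.isa;
  }
  if (!best)
    return std::nullopt;
  return mangleUnmasked(*best, vf, scalar, numParams);
}
```

For example, with SSE (VF 2), AVX (VF 4) and AVX2 (VF 4) variants of sin listed, an AVX target gets `_ZGVcN4v_sin` and an AVX2 target gets `_ZGVdN4v_sin`. The same input list yields two different answers, which is why the decision fits better where target information is available.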

regards,
Venkat.


On 30 June 2018 at 04:04, Sanjay Patel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Hi Hideki -
>
> I hinted at this problem in the summary text of
> https://reviews.llvm.org/D47610:
> Why are we transforming from LLVM intrinsics to platform-specific
> intrinsics in IR? I don't see the benefit.
>
> I don't know if it solves all of the problems you're seeing, but it should
> be a small change to transform to the platform-specific SVML or other
> intrinsics in the DAG. We already do this for mathlib calls on Linux for
> example when we can use the finite versions of the calls. Have a look in
> SelectionDAGLegalize::ConvertNodeToLibcall():
>
>     if (CanUseFiniteLibCall && DAG.getLibInfo().has(LibFunc_log_finite))
>       Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_FINITE_F32,
>                                         RTLIB::LOG_FINITE_F64,
>                                         RTLIB::LOG_FINITE_F80,
>                                         RTLIB::LOG_FINITE_F128,
>                                         RTLIB::LOG_FINITE_PPCF128));
>     else
>       Results.push_back(ExpandFPLibCall(Node, RTLIB::LOG_F32, RTLIB::LOG_F64,
>                                         RTLIB::LOG_F80, RTLIB::LOG_F128,
>                                         RTLIB::LOG_PPCF128));
>
> On Fri, Jun 29, 2018 at 2:15 PM, Saito, Hideki <hideki.saito at intel.com>
> wrote:
>
>> Ashutosh,
>>
>> Thanks for the reply.
>>
>> A related earlier discussion of this topic appears in the review of the
>> SVML patch (@mmasten). Adding a few names from there.
>>
>> https://reviews.llvm.org/D19544
>>
>> There, I see Hal’s review comment “let’s start only with the
>> directly-legal calls”. Apparently, what we have right now in the trunk is
>> “not legal enough”. I’ll work on a patch to stop the bleeding while we
>> continue to discuss the legalization topic.
>>
>> I suppose:
>>
>> 1)      An LV-only solution (letting LV emit already-legalized VECLIB
>> calls) is certainly not scalable. It won’t help if VECLIB calls are
>> generated elsewhere. Also, keeping VF low enough to avoid the
>> legalization problem is only a workaround, not a solution.
>>
>> 2)      Assuming that we have to go the IR-to-IR pass route, there are 3
>> ways to think about it:
>>
>> a.       Go with a very generic IR-to-IR legalization pass comparable to
>> ISD-level legalization. This is the most general, but I’d think it has
>> the highest development cost.
>>
>> b.      Go with intrinsic-only legalization and then apply VECLIB
>> afterwards. This requires all scalar functions with a VECLIB mapping to
>> be added as intrinsics.
>>
>> c.       Go with generic-enough function call legalization, with the
>> ability to add custom legalization for each VECLIB (and, if needed, for
>> each VECLIB or non-VECLIB entry).
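Option 2.c can be pictured as a generic splitter plus a table of per-callee hooks. The following is a minimal C++ sketch under that reading. `VecCall`, `Legalizer`, `CallLegalizer` and `registerCustom` are hypothetical names. A real IR legalizer would also have to split the argument vectors and reassemble the results, which this sketch omits.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A vector call reduced to what matters here: callee name and VF.
// (Arguments and results are omitted in this sketch.)
struct VecCall {
  std::string callee;
  unsigned vf;
};

// A custom legalizer maps one call and the widest legal VF to replacement calls.
using Legalizer = std::function<std::vector<VecCall>(const VecCall &, unsigned)>;

class CallLegalizer {
  std::map<std::string, Legalizer> custom_;

public:
  void registerCustom(const std::string &callee, Legalizer fn) {
    custom_[callee] = std::move(fn);
  }

  std::vector<VecCall> legalize(const VecCall &call, unsigned maxLegalVF) const {
    assert(maxLegalVF > 0);
    auto it = custom_.find(call.callee);
    if (it != custom_.end())
      return it->second(call, maxLegalVF);  // per-callee hook (e.g. a VECLIB entry)
    // Generic fallback: keep the callee and split the lanes into chunks of at
    // most maxLegalVF. This assumes the callee does not encode its VF in its name.
    std::vector<VecCall> parts;
    for (unsigned done = 0; done < call.vf; done += maxLegalVF)
      parts.push_back({call.callee, std::min(maxLegalVF, call.vf - done)});
    return parts;
  }
};
```

With this design, a VECLIB entry such as `__svml_sin8` registers a hook that renames the call to the 4-wide entry point when only VF 4 is legal. A callee without a hook falls back to plain lane splitting.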
>>
>> I think the costs of 2.b) and 2.c) are similar, and 2.c) seems to be more
>> flexible. So I guess we don’t really have to tie this discussion to
>> “letting LV emit widened math calls instead of VECLIB calls”, even though
>> I strongly favor that over LV emitting VECLIB calls.
>>
>> @Davide, in D19544, @spatel thought LibCallSimplifier has relevance to
>> this legalization topic. Do you know enough about LibCallSimplifier to
>> tell whether it can be extended to deal with 2.b) or 2.c)?
>>
>> If we think 2.b)/2.c) are the right directions, I can clean up what we
>> have and upload it to Phabricator as a starting point to get to
>> 2.b)/2.c).
>>
>> I’ll keep waiting for more feedback. I guess I shouldn’t expect a lot
>> this week and next due to the big holiday in the U.S.
>>
>> Thanks,
>>
>> Hideki
>>
>> *From:* Nema, Ashutosh [mailto:Ashutosh.Nema at amd.com]
>> *Sent:* Thursday, June 28, 2018 11:37 PM
>> *To:* Saito, Hideki <hideki.saito at intel.com>
>> *Cc:* llvm-dev at lists.llvm.org
>> *Subject:* RE: [RFC][VECLIB] how should we legalize VECLIB calls?
>>
>> Hi Saito,
>>
>> At AMD we have our own version of a vector library and faced similar
>> problems. We followed the SVML path and generated the respective vector
>> calls from the vectorizer. When the vectorizer generates the respective
>> calls, i.e. __svml_sin_4 or __amdlibm_sin_4, later passes can only use
>> string matching to identify the vector library call. I’m not sure that’s
>> the proper way; maybe instead of generating the respective calls it’s
>> better to generate some standard call (maybe intrinsics) and lower it
>> later. A late IR pass could be introduced to perform the lowering: it
>> would lower the intrinsic calls to specific library calls (__svml_sin_4,
>> __amdlibm_sin_4, …). This can be table-driven, deciding the action based
>> on the vector library, function name, VF and target information. The
>> action can be full serialization, partial serialization (VF8 into 2 x
>> VF4), or generating the library call with the same VF.
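Ashutosh's table-driven decision could look roughly like the following minimal C++ sketch. The table layout, `Action`, `Decision`, `VariantTable` and `decide` are hypothetical. `maxLegalVF` stands in for target information such as TTI, and the set of VFs per function is whatever the chosen vector library actually provides.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Possible outcomes of lowering one widened math call (names are illustrative).
enum class Action { CallSameVF, PartialSerialize, FullSerialize };

struct Decision {
  Action action;
  unsigned numCalls;  // how many calls are emitted
  unsigned callVF;    // VF of each emitted call (1 means a scalar call)
};

// (vector library, scalar function) -> set of VFs the library provides.
using VariantTable =
    std::map<std::pair<std::string, std::string>, std::set<unsigned>>;

// maxLegalVF stands in for target information such as the widest legal
// vector register for the element type.
Decision decide(const VariantTable &table, const std::string &lib,
                const std::string &fn, unsigned vf, unsigned maxLegalVF) {
  assert(vf > 0 && maxLegalVF > 0);
  auto it = table.find({lib, fn});
  if (it != table.end()) {
    const std::set<unsigned> &vfs = it->second;
    if (vf <= maxLegalVF && vfs.count(vf))
      return {Action::CallSameVF, 1, vf};
    // Widest provided VF that is legal and divides vf evenly: VF8 -> 2 x VF4.
    for (auto r = vfs.rbegin(); r != vfs.rend(); ++r)
      if (*r < vf && *r <= maxLegalVF && vf % *r == 0)
        return {Action::PartialSerialize, vf / *r, *r};
  }
  // No usable vector variant: one scalar call per lane.
  return {Action::FullSerialize, vf, 1};
}
```

Consider a table that lists sin with VFs {2, 4, 8}. A VF8 request on a target where only 4 doubles are legal becomes two VF4 calls. The same request on a target where 8 are legal keeps a single VF8 call, and a function missing from the table is fully serialized.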
>>
>> Thanks,
>>
>> Ashutosh
>>
>> *From:* llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] *On Behalf Of*
>> Saito, Hideki via llvm-dev
>> *Sent:* Friday, June 29, 2018 7:41 AM
>> *To:* 'Saito, Hideki via llvm-dev' <llvm-dev at lists.llvm.org>
>> *Subject:* [llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?
>>
>> Illustrative Example:
>>
>> clang -fveclib=SVML -O3 svml.c -mavx
>>
>> #include <math.h>
>>
>> void foo(double *a, int N){
>>   int i;
>> #pragma clang loop vectorize_width(8)
>>   for (i=0;i<N;i++){
>>     a[i] = sin(i);
>>   }
>> }
>>
>> Currently, this results in a call to <8 x double> __svml_sin8(<8 x double>)
>> after the vectorizer.
>>
>> This is the 8-element SVML sin() called with an 8-element argument. On the
>> surface, this looks very good.
>>
>> Later on, standard vector type legalization kicks in, but only the
>> argument and return data are legalized.
>>
>>         vmovaps %ymm0, %ymm1
>>         vcvtdq2pd       %xmm1, %ymm0
>>         vextractf128    $1, %ymm1, %xmm1
>>         vcvtdq2pd       %xmm1, %ymm1
>>         callq   __svml_sin8
>>         vmovups %ymm1, 32(%r15,%r12,8)
>>         vmovups %ymm0, (%r15,%r12,8)
>>
>> Unfortunately, __svml_sin8() doesn’t use this form of input/output: it
>> takes its argument in zmm0 and returns its result in zmm0, i.e., it is not
>> legal to use for AVX.
>>
>> What we need to see instead is two calls to __svml_sin4(), like below.
>>
>>         vmovaps %ymm0, %ymm1
>>         vcvtdq2pd       %xmm1, %ymm0
>>         vextractf128    $1, %ymm1, %xmm1
>>         vcvtdq2pd       %xmm1, %ymm1
>>         callq   __svml_sin4
>>         vmovups %ymm0, (%r15,%r12,8)
>>         vmovups %ymm1, %ymm0
>>         callq   __svml_sin4
>>         vmovups %ymm0, 32(%r15,%r12,8)
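At the source level, the desired two-call sequence computes the same thing as the portable C++ sketch below. It splits the 8 lanes into low and high halves, runs a 4-wide routine on each half, and concatenates the results. `sin4` is a hypothetical stand-in that applies std::sin lane by lane; it is not the SVML entry point. `sin8Legalized` is likewise illustrative.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

using V4 = std::array<double, 4>;
using V8 = std::array<double, 8>;

static_assert(std::tuple_size<V8>::value == 2 * std::tuple_size<V4>::value,
              "an 8-wide vector splits into exactly two 4-wide halves");

// Hypothetical stand-in for a 4-wide vector routine such as __svml_sin4:
// it applies std::sin lane by lane so the sketch is self-contained.
V4 sin4(const V4 &x) {
  V4 r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = std::sin(x[i]);
  return r;
}

// Legalized form of one 8-wide call on a target whose widest legal double
// vector holds 4 lanes: split, call the 4-wide routine twice, concatenate.
V8 sin8Legalized(const V8 &x) {
  V4 lo{}, hi{};
  for (std::size_t i = 0; i < 4; ++i) {
    lo[i] = x[i];      // lanes 0-3
    hi[i] = x[i + 4];  // lanes 4-7
  }
  V4 rlo = sin4(lo);
  V4 rhi = sin4(hi);
  V8 r{};
  for (std::size_t i = 0; i < 4; ++i) {
    r[i] = rlo[i];
    r[i + 4] = rhi[i];
  }
  return r;
}
```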
>>
>> What would be the most acceptable way to make this happen? Has anybody had
>> a similar need previously?
>>
>> The easiest workaround is to serialize the call above the “type legal”
>> vectorization factor. This can be done with a few lines of code, plus
>> code to recognize that the call is “SVML” (which is currently a string
>> match against the “__svml” prefix in my local workspace). If a higher VF
>> is not forced, the cost model will likely favor a lower VF. This is
>> functionally correct, but obviously not an ideal solution.
>>
>> Here are a few ideas I thought about:
>>
>> 1)      Standard LegalizeVectorType() in CodeGen/SelectionDAG doesn’t
>> seem to work. We could define a generic ISD::VECLIB node and try to split
>> it into two or more VECLIB nodes, but at that point we have lost the
>> information about which function to call. We can’t define an ISD opcode
>> per function; there would be too many libm entries to deal with. We need
>> a scalable solution.
>>
>> 2)      We could write an IR-to-IR pass to perform IR-level legalization.
>> This essentially duplicates the functionality of LegalizeVectorType(),
>> but we could make it available for other similar things that can’t use
>> ISD-level vector type legalization. This looks attractive enough from
>> that perspective.
>>
>> 3)      We have implemented something similar to 2), but the legalization
>> code is specialized for SVML. This was much quicker than trying to
>> generalize the legalization scheme, but I’d imagine the community won’t
>> like it.
>>
>> 4)      The vectorizer emits legalized VECLIB calls. Since it can already
>> emit instructions in scalarized form, adding legalized-call functionality
>> is similar in some sense. The vectorizer can’t simply choose a type-legal
>> function name with an illegal vector type, since LegalizeVectorType()
>> will still end up using one call instead of two.
>>
>> Anything else?
>>
>> Also, doing any of this requires a reverse mapping from the VECLIB name to
>> the scalar function name. What’s the recommended way to do so? Can we
>> use TableGen to create a reverse map?
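Deriving the reverse map from the same forward table keeps the two in sync and avoids matching on prefixes such as “__svml”. Below is a minimal C++ sketch of building the map at startup. It assumes a simple (scalar name, vector name, VF) entry layout, similar in spirit to TargetLibraryInfo's VecDesc. `VecEntry` and `buildReverseMap` are hypothetical names, and generating the same map with TableGen would be the build-time equivalent.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One forward-table entry: scalar function, vector function, VF (illustrative).
struct VecEntry {
  std::string scalar;
  std::string vector;
  unsigned vf;
};

// Reverse map: vector function name -> (scalar function name, VF).
std::map<std::string, std::pair<std::string, unsigned>>
buildReverseMap(const std::vector<VecEntry> &table) {
  std::map<std::string, std::pair<std::string, unsigned>> rev;
  for (const VecEntry &e : table) {
    bool inserted = rev.emplace(e.vector, std::make_pair(e.scalar, e.vf)).second;
    assert(inserted && "vector name mapped from two scalar functions");
    (void)inserted;  // silence unused-variable warnings when NDEBUG is set
  }
  return rev;
}
```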
>>
>> Your input is greatly appreciated. Is there a real need/desire for 2)
>> outside of VECLIB (or outside of SVML)?
>>
>> Thanks,
>>
>> Hideki Saito
>>
>> Intel Corporation
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20180702/2f86a9cd/attachment.html>

Saito, Hideki via llvm-dev

2018-Jul-02 17:52 UTC

[llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

Venkat, we did not invent LLVM’s VecLib functionality. The original version of
D19544 (https://reviews.llvm.org/D19544?id=55036) was indeed a separate pass to
convert widened math lib calls to SVML.
Our preference for a “vectorized sin()” is simply a widened sin(), to be
lowered to a specific library call at a later point (either IR to IR or in
CodeGen). Matt tried to sell that idea and it didn’t go through.
Is anyone else willing to work with us to try it again? In my opinion, however,
this is a related but different topic from the legalization issue.

Sanjay, I think what you are suggesting would work better if we don’t map math
lib calls to VecLib. Otherwise, we’ll have too many RTLIB::VECLIB_ enums: one
for each different math function, multiplied by each vectorization factor, for
each different VecLib. That’s way too many. If it’s one per math function, I’d
guess it’s 100+. Still a lot, but manageable. This requires those functions to
be listed in the intrinsics, right? That’s another reason some people favor
VecLib mapping in the vectorizer: those math functions don’t have to be added
to the intrinsics.
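Here is the same count written out, using symbols introduced only for illustration: $F$ math functions, $V$ vectorization factors and $L$ vector libraries. Per-variant enums grow multiplicatively, while one enum per generic math function does not:

```latex
N_{\text{per-variant}} = F \times V \times L
\qquad\text{vs.}\qquad
N_{\text{per-function}} = F
```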

I don’t insist on IR-to-IR legalization. However, I’m also interested in being
able to legalize OpenMP declare simd function calls (**). These are user
functions, so we have no way to list them as intrinsics or to predefine RTLIB::
enums for them. For each target, the vector function ABI defines how the
parameters need to be passed, and the legalizer should be implemented based on
that ABI, without knowing the details of what the user function does. A
math-lib-only solution doesn’t help legalization of OpenMP declare simd.

Thanks,
Hideki

--------------------------------
(**)
#pragma omp declare simd uniform(a), linear(i)
void foo(float *a, int i);

…

#pragma omp simd
for(i) {   // this loop could be vectorized with a VF that’s wider than
           // the widest available vector function for foo().
    …
    foo(a, i);
    …
}
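Because the vector function ABI fixes how uniform and linear parameters are passed, a legalizer can split such a call without knowing what foo() does. Below is a minimal C++ sketch. It assumes linear(i) uses the default step of 1 and that the loop VF is a multiple of the variant's VF. `SubCall` and `splitDeclareSimdCall` are hypothetical names. Plain vector arguments, which would need lane extraction as in the sin() case, are omitted.

```cpp
#include <cassert>
#include <vector>

// Arguments for one call to a narrower "declare simd uniform(a) linear(i)"
// variant of foo (hypothetical representation).
struct SubCall {
  const float *a;  // uniform: same value for every sub-call
  int iStart;      // linear: value of i in lane 0 of this sub-call
  unsigned vf;     // lanes handled by this sub-call
};

// Splits one loopVF-wide call into loopVF / variantVF calls using only what
// the ABI says about the parameters, never the body of foo.
std::vector<SubCall> splitDeclareSimdCall(const float *a, int i,
                                          unsigned loopVF, unsigned variantVF) {
  assert(variantVF > 0 && loopVF % variantVF == 0);
  std::vector<SubCall> calls;
  for (unsigned k = 0; k < loopVF / variantVF; ++k)
    // linear(i) with the default step 1: skip the lanes already covered.
    calls.push_back({a, i + static_cast<int>(k * variantVF), variantVF});
  return calls;
}
```

For example, a VF16 loop iteration starting at i = 8 with only a 4-wide variant of foo() becomes four calls with i = 8, 12, 16 and 20. Each of the four calls receives the same `a`.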


Sanjay Patel via llvm-dev

2018-Jul-02 18:48 UTC

[llvm-dev] [RFC][VECLIB] how should we legalize VECLIB calls?

It may not be a full solution for the problems you're trying to solve, but
I don't know why adding to include/llvm/CodeGen/RuntimeLibcalls.def is a
problem in itself. Certainly, it's a mess that could be organized,
especially so we're not repeating everything for each data type as we do
right now.
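One way to avoid repeating every entry per data type is a nested X-macro. It lists each math function once and generates the per-type enumerators and names from that list. The following is a hypothetical C++ sketch covering F32/F64 only. It does not reproduce the actual RuntimeLibcalls.def layout, and `MATH_FUNCS`, `Libcall` and `libcallName` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Each math function is listed exactly once: (enum stem, double-precision name).
#define MATH_FUNCS(X) \
  X(SIN, "sin")       \
  X(COS, "cos")       \
  X(LOG, "log")

// Per-type enumerators are generated from the single list.
enum Libcall {
#define PER_TYPE_ENUM(Code, Name) Code##_F32, Code##_F64,
  MATH_FUNCS(PER_TYPE_ENUM)
#undef PER_TYPE_ENUM
  NUM_LIBCALLS
};

// Matching names: the float variant appends "f" via string-literal concatenation.
static const char *const LibcallNames[] = {
#define PER_TYPE_NAME(Code, Name) Name "f", Name,
    MATH_FUNCS(PER_TYPE_NAME)
#undef PER_TYPE_NAME
};

static_assert(sizeof(LibcallNames) / sizeof(LibcallNames[0]) ==
                  static_cast<std::size_t>(NUM_LIBCALLS),
              "enum and name table must stay in sync");

std::string libcallName(Libcall lc) {
  assert(lc < NUM_LIBCALLS);
  return LibcallNames[lc];
}
```

Adding a function then means adding one `X(...)` line rather than one line per data type.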

So yes, I think that would allow us to remove the VecLib mappings because
we are always waiting until codegen to make the translation from generic IR
to target-specific libcall. Or is there some reason that the vectorizer
needs to be aware of those libcalls?

