The potential advantage I was considering would be more accurate cost modeling in the vectorizer, inliner, etc. Like min/max, this is another case where the sum of the IR parts is greater than the actual cost.

Beyond that, it seems odd to me that we'd choose the longer IR expression of something that could be represented in a minimal form. I know we make practical concessions in IR based on backend deficiencies, but in this case I think the fix would be easy - if we're in contract=fast mode, just split all of these intrinsics at DAG creation time and let the DAG or other passes behave exactly like they do today to fuse them back together again?

On Sat, Nov 19, 2016 at 8:29 PM Hal Finkel <hfinkel at anl.gov> wrote:

> ----- Original Message -----
> > From: "Hal J. Finkel via llvm-dev" <llvm-dev at lists.llvm.org>
> > To: "Sanjay Patel" <spatel at rotateright.com>
> > Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
> > Sent: Saturday, November 19, 2016 10:58:27 AM
> > Subject: Re: [llvm-dev] FMA canonicalization in IR
> >
> > Sent from my Verizon Wireless 4G LTE DROID
> > On Nov 19, 2016 10:26 AM, Sanjay Patel <spatel at rotateright.com> wrote:
> > >
> > > If I have my FMA intrinsics story straight now (thanks for the
> > > explanation, Hal!), I think it raises another question about IR
> > > canonicalization (and may affect the proposed revision to IR FMF):
> >
> > No, I think that we specifically don't want to canonicalize to
> > fmuladd at the IR level at all. If the backend has the freedom to
> > form FMAs as it sees fit, then we should delay the decision until
> > whenever the backend finds most appropriate. Some backends, for
> > example, form FMAs using the MachineCombiner pass, which considers
> > critical path, latency, throughput, etc. in order to find the best
> > fusion opportunities. We only use fmuladd when required to restrict
> > the backend to certain choices due to source-language semantics.
>
> I'll also add that, in general, we canonicalize in order to enable other
> transformations (and reduce the number of input forms those transformations
> need to match in order to be effective). Forming @llvm.fmuladd at the IR
> level does not seem to further this goal. Did you have something in mind
> that this canonicalization would help?
>
> Thanks again,
> Hal
>
> > Thanks again,
> > Hal
> >
> > > define float @foo(float %a, float %b, float %c) {
> > >   %mul = fmul fast float %a, %b ; using 'fast' because there is no 'fma' flag
> > >   %add = fadd fast float %mul, %c
> > >   ret float %add
> > > }
> > >
> > > Should this be:
> > >
> > > define float @goo(float %a, float %b, float %c) {
> > >   %maybe.fma = call fast float @llvm.fmuladd.f32(float %a, float %b, float %c)
> > >   ret float %maybe.fma
> > > }
> > > declare float @llvm.fmuladd.f32(float %a, float %b, float %c)
> >
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
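For illustration, the min/max analogy mentioned above can be written out in IR (a sketch; the function name is invented here):

```llvm
; A float 'min' has no single IR instruction; it is a two-instruction
; compare+select idiom, even though many targets lower it to a single
; machine instruction (e.g. minss on x86). A per-instruction cost model
; therefore overcounts it - just as it overcounts an unfused fmul+fadd
; pair that the backend will turn into one FMA.
define float @fmin_idiom(float %x, float %y) {
  %cmp = fcmp olt float %x, %y
  %min = select i1 %cmp, float %x, float %y
  ret float %min
}
```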
Hi Sanjay,

Except for memcpy, are there other examples where going from a first-class sequence of instructions to intrinsics is considered an OK canonicalization?

— 
Mehdi

> On Nov 19, 2016, at 8:40 PM, Sanjay Patel via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
> The potential advantage I was considering would be more accurate cost
> modeling in the vectorizer, inliner, etc. Like min/max, this is another
> case where the sum of the IR parts is greater than the actual cost.
>
> [snip]
----- Original Message -----
> From: "Sanjay Patel" <spatel at rotateright.com>
> To: "Hal Finkel" <hfinkel at anl.gov>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Saturday, November 19, 2016 10:40:27 PM
> Subject: Re: [llvm-dev] FMA canonicalization in IR
>
> The potential advantage I was considering would be more accurate cost
> modeling in the vectorizer, inliner, etc. Like min/max, this is
> another case where the sum of the IR parts is greater than the
> actual cost.

This is indeed a problem, but it is a much larger problem than just FMAs (as you note). Our cost-modeling interfaces should be extended to handle instruction patterns -- I don't see any other way of solving this in general.

> Beyond that, it seems odd to me that we'd choose the longer IR
> expression of something that could be represented in a minimal form.

My fear is that, by forming the FMAs earlier than necessary, you'll just end up limiting opportunities for CSE, reassociation, etc. without any corresponding benefit.

> I know we make practical concessions in IR based on backend
> deficiencies, but in this case I think the fix would be easy - if
> we're in contract=fast mode, just split all of these intrinsics at
> DAG creation time and let the DAG or other passes behave exactly
> like they do today to fuse them back together again?

This is a good point; we could do this in fp-contract=fast mode.

 -Hal

> [snip]

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
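The CSE concern raised above can be made concrete in IR (a hedged sketch; the function names are invented):

```llvm
; Kept as separate instructions, the shared product %mul is computed once:
define float @shared(float %a, float %b, float %c, float %d) {
  %mul  = fmul fast float %a, %b
  %add1 = fadd fast float %mul, %c
  %add2 = fadd fast float %mul, %d
  %r    = fmul fast float %add1, %add2
  ret float %r
}

; Eagerly forming two fmuladd calls duplicates the multiply, and IR-level
; CSE cannot recover it from inside the intrinsic operands:
define float @fused(float %a, float %b, float %c, float %d) {
  %fma1 = call fast float @llvm.fmuladd.f32(float %a, float %b, float %c)
  %fma2 = call fast float @llvm.fmuladd.f32(float %a, float %b, float %d)
  %r    = fmul fast float %fma1, %fma2
  ret float %r
}

declare float @llvm.fmuladd.f32(float, float, float)
```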
Hi Mehdi,

I can't think of any (and I'm away from my dev machine, so I can't check).

If you're concerned about inhibiting transforms by introducing intrinsics (as Hal also mentioned), I agree. However, I see fmuladd as a special case - we already use these intrinsics in contract=on mode, so we should already be required to handle them as "first class" ops in the cost model and other passes. If we're not, I think that would be a bug.

On Sun, Nov 20, 2016 at 1:21 AM Mehdi Amini <mehdi.amini at apple.com> wrote:

> Hi Sanjay,
>
> Except for memcpy, are there other examples where going from a first-class
> sequence of instructions to intrinsics is considered an OK canonicalization?
>
> —
> Mehdi
>
> [snip]
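For context on the contract=on case: when contraction is restricted to a single source expression (e.g. clang with -ffp-contract=on), the front end already emits the intrinsic rather than separate instructions. A sketch of what `return a * b + c;` would look like (value names are illustrative, not exact compiler output):

```llvm
; The fmuladd intrinsic says: the backend may fuse this into one FMA
; or lower it as fmul+fadd, but other passes may not split or
; reassociate across it.
define float @contract_on(float %a, float %b, float %c) {
  %0 = call float @llvm.fmuladd.f32(float %a, float %b, float %c)
  ret float %0
}

declare float @llvm.fmuladd.f32(float, float, float)
```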
On 20.11.2016 08:38, Hal Finkel via llvm-dev wrote:
> [snip]
>> I know we make practical concessions in IR based on backend
>> deficiencies, but in this case I think the fix would be easy - if
>> we're in contract=fast mode, just split all of these intrinsics at
>> DAG creation time and let the DAG or other passes behave exactly
>> like they do today to fuse them back together again?
>
> This is a good point; we could do this in fp-contract=fast mode.

I think there's a good reason to do this at the IR level already when the appropriate flags are set; see the example that I also sent in another mail:

  ((X * Y) * X) + Z

is transformed to

  ((X * X) * Y) + Z

when associative transforms are allowed, but when the original is built as fmuladd(X * Y, X, Z), this optimization may be missed (I didn't actually check what happens today).

Nicolai
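The reassociation example above, spelled out in IR (a sketch; function names are invented, and whether the second form is actually missed today is unverified, as noted):

```llvm
; With reassociation allowed, ((X * Y) * X) + Z can be rewritten
; as ((X * X) * Y) + Z while all the operations are visible:
define float @reassoc(float %x, float %y, float %z) {
  %m1 = fmul fast float %x, %y
  %m2 = fmul fast float %m1, %x
  %r  = fadd fast float %m2, %z
  ret float %r
}

; If the front end instead builds fmuladd(X * Y, X, Z), the outer
; multiply and add are trapped inside the intrinsic, so the
; reassociation above may be missed:
define float @reassoc_blocked(float %x, float %y, float %z) {
  %m1 = fmul fast float %x, %y
  %r  = call fast float @llvm.fmuladd.f32(float %m1, float %x, float %z)
  ret float %r
}

declare float @llvm.fmuladd.f32(float, float, float)
```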
On 11/19/2016 11:38 PM, Hal Finkel via llvm-dev wrote:
> [snip]
>> The potential advantage I was considering would be more accurate cost
>> modeling in the vectorizer, inliner, etc. Like min/max, this is
>> another case where the sum of the IR parts is greater than the
>> actual cost.
>
> This is indeed a problem, but it is a much larger problem than just FMAs
> (as you note). Our cost-modeling interfaces should be extended to handle
> instruction patterns -- I don't see any other way of solving this in
> general.

This proposal - cost-model instruction patterns, not just instructions - keeps coming up in a number of contexts. We've seen a number of proposals recently to add intrinsics at various places in the pipeline to get around this limitation. Investing in infrastructure to solve this problem via the cost model seems like a generally useful path forward which would benefit far more than FMA.

> [snip]