thr3ads.net - llvm dev - [LLVMdev] FPOpFusion = Fast and Multiply-and-add combines [Jul 2014]

Samuel F Antao

2014-Jul-30 21:37 UTC

[LLVMdev] FPOpFusion = Fast and Multiply-and-add combines

Hi all,

The AllowFPOpFusion option passed to a target can currently take three
different values, Fast, Standard or Strict (TargetOptions.h), with
Standard being the default.

In the DAGCombiner, during the combination of mul and add/subtract into
multiply-and-add/subtract, this option is expected to be Fast in order to
enable the combine. This means that by default no multiply-and-add opcodes
are going to be generated. If I understand it correctly, this is
undesirable given that multiply-and-add for targets like PPC (I am not sure
about all the other targets) does not pose any rounding problem and it can
even be more accurate than performing the two operations separately.
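
The accuracy point can be illustrated with a small Python sketch. It is an emulation, not PPC or LLVM code: `fused_mul_add` is a hypothetical helper that computes a*b + c exactly with `fractions.Fraction` and rounds only once, the way a fused multiply-add instruction would, while `separate_mul_add` rounds after each operation. The inputs are chosen so that the exact answer is tiny but nonzero, and finite inputs are assumed.

```python
from fractions import Fraction


def fused_mul_add(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: a*b + c with a single rounding.

    Fraction(float) is exact and float(Fraction) rounds to the nearest
    double, so the only rounding happens at the very end (finite inputs only).
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def separate_mul_add(a: float, b: float, c: float) -> float:
    """Ordinary float arithmetic: the product is rounded, then the sum."""
    return a * b + c


e = 2.0 ** -30
a, b, c = 1.0 + e, 1.0 - e, -1.0  # exact result: a*b + c == -2**-60

print(fused_mul_add(a, b, c) == -2.0 ** -60)  # True: single rounding keeps it
print(separate_mul_add(a, b, c))              # 0.0: a*b already rounded to 1.0
```

The separately rounded version loses the whole answer because 1 - 2**-60 is not representable as a double and the product rounds to exactly 1.0 before the addition.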

Also, in TargetOptions.h I read:

Standard, // Only allow fusion of 'blessed' ops (currently just fmuladd)

which made me suspect that the check against Fast in the DAGCombiner is not
correct.

I was wondering if this is something to be fixed in the DAG combiner or if
the target should set a different option to be checked by the DAGCombiner
saying that mul-add/subtract is okay.

Any comments?

Thanks in advance!
Samuel
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140730/14792424/attachment.html>

Tim Northover

2014-Jul-31 13:54 UTC

[LLVMdev] FPOpFusion = Fast and Multiply-and-add combines

Hi Samuel,

On 30 July 2014 22:37, Samuel F Antao <sfantao at us.ibm.com> wrote:
> In the DAGCombiner, during the combination of mul and add/subtract into
> multiply-and-add/subtract, this option is expected to be Fast in order to
> enable the combine. This means that by default no multiply-and-add opcodes
> are going to be generated. If I understand it correctly, this is undesirable
> given that multiply-and-add for targets like PPC (I am not sure about all
> the other targets) does not pose any rounding problem and it can even be
> more accurate than performing the two operations separately.

That extra precision is actually what we're being very careful to
avoid unless specifically told we're allowed. It can be just as
harmful to carefully written floating-point code as dropping precision
would be.
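
A classic case of the harm: code that relies on x*x - y*y being exactly zero when x == y (for instance, before taking a square root) breaks if only one of the two products is fused into the subtraction. The Python sketch below emulates such a contraction with exact rational arithmetic; the helper names are hypothetical and finite inputs are assumed.

```python
import math
from fractions import Fraction


def fused_mul_add(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add: exact a*b + c, rounded once (finite inputs)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def radicand_as_written(x: float, y: float) -> float:
    # Both products are rounded the same way, so x == y gives exactly 0.0.
    return x * x - y * y


def radicand_contracted(x: float, y: float) -> float:
    # What a contraction could turn it into: fma(x, x, -(y*y)).
    # x*x is now unrounded while y*y is still rounded, so they no longer cancel.
    return fused_mul_add(x, x, -(y * y))


x = y = 0.1
print(math.sqrt(radicand_as_written(x, y)))  # 0.0
r = radicand_contracted(x, y)
print(r < 0.0)  # True: this is the (negative) rounding error of 0.1*0.1
try:
    math.sqrt(r)
except ValueError as err:
    print("contracted version fails:", err)
```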

> Also, in TargetOptions.h I read:
>
> Standard, // Only allow fusion of 'blessed' ops (currently just fmuladd)
>
> which made me suspect that the check against Fast in the DAGCombiner is not
> correct.

I think it's OK. In the IR there are 3 different ways to express mul + add:

1. fmul + fadd. This must not be fused into a single step without
intermediate rounding (unless we're in Fast mode).
2. call @llvm.fmuladd. This *may* be fused or not, depending on
profitability (unless we're in Strict mode, in which case it's
separate).
3. call @llvm.fma. This must not be split into two operations (unless
we're in Fast mode).
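
These three cases amount to a small decision table keyed on the IR form and the AllowFPOpFusion level. The Python sketch below encodes that table exactly as described in this message; the enum and function names are hypothetical, and it is not the actual DAGCombiner logic, which additionally has to decide on profitability when both lowerings are permitted.

```python
from enum import Enum


class FPOpFusion(Enum):
    # Mirrors the three values named in TargetOptions.h.
    FAST = "Fast"
    STANDARD = "Standard"
    STRICT = "Strict"


class MulAddForm(Enum):
    FMUL_FADD = "fmul + fadd"
    FMULADD = "call @llvm.fmuladd"
    FMA = "call @llvm.fma"


FUSED, SEPARATE = "fused", "separate"


def allowed_lowerings(form: MulAddForm, mode: FPOpFusion) -> frozenset:
    """Codegen outcomes permitted by the three rules above (hypothetical model)."""
    if form is MulAddForm.FMUL_FADD:
        # Two roundings are required unless Fast mode allows contraction.
        return frozenset({FUSED, SEPARATE}) if mode is FPOpFusion.FAST else frozenset({SEPARATE})
    if form is MulAddForm.FMULADD:
        # Either lowering is fine (pick the profitable one), except in Strict mode.
        return frozenset({SEPARATE}) if mode is FPOpFusion.STRICT else frozenset({FUSED, SEPARATE})
    # llvm.fma: must stay a single rounding unless Fast mode allows splitting it.
    return frozenset({FUSED, SEPARATE}) if mode is FPOpFusion.FAST else frozenset({FUSED})


for form in MulAddForm:
    for mode in FPOpFusion:
        print(f"{form.value:22} {mode.value:9} {sorted(allowed_lowerings(form, mode))}")
```

Under this model, Standard mode permits fusion only for the 'blessed' fmuladd form, which is consistent with the TargetOptions.h comment quoted above.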

That middle one is there because C actually lets you enable and
disable contraction within a limited region with "#pragma STDC
FP_CONTRACT ON" (or OFF). So we need a way to represent the idea that it's
not usually OK to fuse them (i.e. not Fast mode), but this particular one
actually is OK.
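
To picture how that region-scoped permission could reach the backend, here is a hypothetical Python sketch of a front end lowering the C expression a*b + c on doubles: inside an FP_CONTRACT ON region it emits the llvm.fmuladd.f64 intrinsic, elsewhere plain fmul/fadd. The function name and the textual-IR output are illustrative assumptions, not clang's implementation.

```python
from typing import List


def lower_mul_add(a: str, b: str, c: str, fp_contract_on: bool) -> List[str]:
    """Hypothetical front-end lowering of the C expression a*b + c (doubles).

    With FP_CONTRACT ON, the pair is marked as fusible by emitting the
    'blessed' llvm.fmuladd intrinsic; otherwise plain fmul/fadd is emitted,
    which (per the rules above) may only be fused in Fast mode.
    """
    if fp_contract_on:
        return [f"%r = call double @llvm.fmuladd.f64(double {a}, double {b}, double {c})"]
    return [
        f"%m = fmul double {a}, {b}",
        f"%r = fadd double %m, {c}",
    ]


print("\n".join(lower_mul_add("%a", "%b", "%c", fp_contract_on=True)))
print("\n".join(lower_mul_add("%a", "%b", "%c", fp_contract_on=False)))
```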

Cheers.

Tim.

Samuel F Antao

2014-Jul-31 15:50 UTC

[LLVMdev] FPOpFusion = Fast and Multiply-and-add combines

Hi Tim,

Thanks for the thorough explanation. It makes perfect sense.

I was not aware that, unless fast-math is enabled, the compiler is supposed
to avoid using more precision than what is in the standard.

I came across this issue while looking into the output of different
compilers. The XL and Microsoft compilers seem to have that turned on by
default, but I assume that clang follows what gcc does and has it turned off.

Thanks again,
Samuel

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140731/92c5be11/attachment.html>
