thr3ads.net - llvm dev - [llvm-dev] Fusing contract fadd/fsub with normal fmul [Jun 2017]

Yichao Yu via llvm-dev

2017-Jun-10 03:04 UTC

[llvm-dev] Fusing contract fadd/fsub with normal fmul

Hi,

On LLVM 5.0 (current trunk), an `fadd`/`fsub` and an `fmul` that are both
marked with `contract` or `fast` can be merged into an FMA instruction by
the backend.

I'm wondering about the exact semantics of this new flag, as well as of
`fast`. In particular, would it be valid to do this when only the
`fadd`/`fsub` (and not the `fmul`) is marked with `contract`, or at
least with `fast`? My reasoning is that this has a similar effect to
performing the `fadd`/`fsub` not to IEEE spec, so a single flag on that
instruction should be enough for the transformation.

The particular case I'm interested in is a vectorized loop with a
reduction, like `s += a[i] * b[i]` in pseudo-C. Our front end
recognizes this and marks the `+` as `fast` to enable vectorization.
It would be great if this could also allow the reduction to be done with
`fma` instructions.
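A minimal sketch of what the fusion changes numerically: with a separate `fmul` and `fadd`, the product is rounded to double before the add, while an FMA skips that intermediate rounding. So contracting also removes the `fmul`'s rounding, not only the `fadd`'s. This assumes a correctly rounded FMA, emulated here in Python with exact `fractions.Fraction` arithmetic rather than a hardware instruction; the function names are hypothetical.

```python
from fractions import Fraction


def mul_then_add(a: float, b: float, c: float) -> float:
    """Separate fmul and fadd: the product is rounded to double first."""
    p = a * b      # rounding #1 (the fmul)
    return p + c   # rounding #2 (the fadd)


def fused_mul_add(a: float, b: float, c: float) -> float:
    """Emulated FMA: a*b + c is computed exactly, then rounded once.

    Fraction arithmetic is exact, and converting a Fraction to float
    rounds to the nearest double, as a correctly rounded fma does.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30   # exact product is 1 - 2**-60, which rounds to 1.0
c = -1.0

print(mul_then_add(a, b, c))                  # 0.0
print(fused_mul_add(a, b, c) == -2.0 ** -60)  # True
```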

Yichao Yu

Hal Finkel via llvm-dev

2017-Jun-12 13:36 UTC

[llvm-dev] Fusing contract fadd/fsub with normal fmul

It seems like the contract flag is underspecified in this regard. I'd 
lean, however, toward requiring it on both instructions in order to 
contract them. That way inlining a function where contraction was 
prohibited into a function where contraction was permitted would not be 
able to effectively remove the final-result rounding from the callee.
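The inlining argument can be modeled with a toy legality check. This is a hypothetical sketch: `Inst`, `feeds`, `can_fuse_both`, and `can_fuse_add_only` are invented names, not LLVM APIs, and the model assumes `fast` implies `contract`, as this thread treats them.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # compare instructions by identity, like IR values
class Inst:
    """Toy stand-in for an LLVM FP instruction (hypothetical, not LLVM's API)."""
    opcode: str                                   # "fmul" or "fadd"
    flags: set = field(default_factory=set)       # e.g. {"contract"}, {"fast"}
    operands: list = field(default_factory=list)  # producing instructions


def allows_contract(inst: Inst) -> bool:
    # Assumption: 'fast' implies 'contract'.
    return "contract" in inst.flags or "fast" in inst.flags


def feeds(mul: Inst, add: Inst) -> bool:
    # The fadd must consume the fmul's result for the pair to be fusable.
    return (mul.opcode == "fmul" and add.opcode == "fadd"
            and any(op is mul for op in add.operands))


def can_fuse_both(mul: Inst, add: Inst) -> bool:
    """Conservative rule: both instructions must permit contraction."""
    return feeds(mul, add) and allows_contract(mul) and allows_contract(add)


def can_fuse_add_only(mul: Inst, add: Inst) -> bool:
    """Proposed rule: the flag on the fadd alone is enough."""
    return feeds(mul, add) and allows_contract(add)


# After inlining: the callee was compiled with contraction prohibited, so
# its fmul carries no flags and its result should be rounded to double.
callee_mul = Inst("fmul")
# The caller's fadd permits contraction and consumes the callee's result.
caller_add = Inst("fadd", {"contract"}, [callee_mul])

print(can_fuse_both(callee_mul, caller_add))      # False: callee rounding kept
print(can_fuse_add_only(callee_mul, caller_add))  # True: callee rounding lost
```

The scalar loop later in this thread has the same shape (a flagless `%50 = fmul` feeding `%51 = fadd fast`), so under the conservative rule it would not be fused.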

  -Hal


-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Sanjay Patel via llvm-dev

2017-Jun-12 13:40 UTC

[llvm-dev] Fusing contract fadd/fsub with normal fmul

For reference, the FMF 'contract' patches are listed here:
https://bugs.llvm.org/show_bug.cgi?id=25721#c6

If we can make the documentation better, that would certainly be a welcome
patch.

It would be better to see the IR for your example(s), but I think you'd
need 'contract' on both the fmul and fadd to generate an FMA.
Conservatively, we wouldn't alter the result if either component somehow
required strict FP. To vectorize, you probably need 'fast' on both ops
because vectorization would be changing the order of operations
(reassociation).
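A minimal sketch of why vectorizing a reduction is a reassociation, assuming the vectorizer distributes element `i` to partial accumulator `i % lanes` and combines the partial sums at the end. The vectorized loop in this thread uses four `<4 x double>` accumulators; two scalar lanes are enough to show the effect. The function names are hypothetical.

```python
def sequential_sum(xs):
    """Scalar loop order: ((x0 + x1) + x2) + ..."""
    s = 0.0
    for x in xs:
        s += x
    return s


def lane_sum(xs, lanes):
    """Vectorized order: element i goes to accumulator i % lanes, and the
    partial sums are added together at the end (a reassociation)."""
    acc = [0.0] * lanes
    for i, x in enumerate(xs):
        acc[i % lanes] += x
    s = 0.0
    for part in acc:
        s += part
    return s


xs = [1e16, 1.0, -1e16, 1.0]
print(sequential_sum(xs))  # 1.0  (1e16 + 1.0 rounds back to 1e16)
print(lane_sum(xs, 2))     # 2.0  (the large terms cancel within lane 0)
```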



Yichao Yu via llvm-dev

2017-Jun-12 18:22 UTC

[llvm-dev] Fusing contract fadd/fsub with normal fmul

On Mon, Jun 12, 2017 at 9:40 AM, Sanjay Patel <spatel at rotateright.com> wrote:
> For reference, the FMF 'contract' patches are listed here:
> https://bugs.llvm.org/show_bug.cgi?id=25721#c6
>
> If we can make the documentation better, that would certainly be a welcome
> patch.
>
> It would be better to see the IR for your example(s), but I think you'd need

The IR of the scalar loop is
```
if13:                                             ; preds = %scalar.ph, %if13
 %s.124 = phi double [ %51, %if13 ], [ %bc.merge.rdx, %scalar.ph ]
 %"i#672.023" = phi i64 [ %52, %if13 ], [ %bc.resume.val, %scalar.ph ]
 %46 = getelementptr double, double* %13, i64 %"i#672.023"
 %47 = load double, double* %46, align 8
 %48 = getelementptr double, double* %15, i64 %"i#672.023"
 %49 = load double, double* %48, align 8
 %50 = fmul double %47, %49
 %51 = fadd fast double %s.124, %50
 %52 = add nuw nsw i64 %"i#672.023", 1
 %53 = icmp slt i64 %52, %9
 br i1 %53, label %if13, label %L11.outer.split.L11.outer.split.split_crit_edge.outer.loopexit
```

And it can be vectorized to

```
vector.body:                                      ; preds = %vector.body, %vector.ph
 %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
 %vec.phi = phi <4 x double> [ %19, %vector.ph ], [ %40, %vector.body ]
 %vec.phi94 = phi <4 x double> [ zeroinitializer, %vector.ph ], [ %41, %vector.body ]
 %vec.phi95 = phi <4 x double> [ zeroinitializer, %vector.ph ], [ %42, %vector.body ]
 %vec.phi96 = phi <4 x double> [ zeroinitializer, %vector.ph ], [ %43, %vector.body ]
 %20 = getelementptr double, double* %13, i64 %index
 %21 = bitcast double* %20 to <4 x double>*
 %wide.load = load <4 x double>, <4 x double>* %21, align 8
 %22 = getelementptr double, double* %20, i64 4
 %23 = bitcast double* %22 to <4 x double>*
 %wide.load100 = load <4 x double>, <4 x double>* %23, align 8
 %24 = getelementptr double, double* %20, i64 8
 %25 = bitcast double* %24 to <4 x double>*
 %wide.load101 = load <4 x double>, <4 x double>* %25, align 8
 %26 = getelementptr double, double* %20, i64 12
 %27 = bitcast double* %26 to <4 x double>*
 %wide.load102 = load <4 x double>, <4 x double>* %27, align 8
 %28 = getelementptr double, double* %15, i64 %index
 %29 = bitcast double* %28 to <4 x double>*
 %wide.load103 = load <4 x double>, <4 x double>* %29, align 8
 %30 = getelementptr double, double* %28, i64 4
 %31 = bitcast double* %30 to <4 x double>*
 %wide.load104 = load <4 x double>, <4 x double>* %31, align 8
 %32 = getelementptr double, double* %28, i64 8
 %33 = bitcast double* %32 to <4 x double>*
 %wide.load105 = load <4 x double>, <4 x double>* %33, align 8
 %34 = getelementptr double, double* %28, i64 12
 %35 = bitcast double* %34 to <4 x double>*
 %wide.load106 = load <4 x double>, <4 x double>* %35, align 8
 %36 = fmul <4 x double> %wide.load, %wide.load103
 %37 = fmul <4 x double> %wide.load100, %wide.load104
 %38 = fmul <4 x double> %wide.load101, %wide.load105
 %39 = fmul <4 x double> %wide.load102, %wide.load106
 %40 = fadd fast <4 x double> %vec.phi, %36
 %41 = fadd fast <4 x double> %vec.phi94, %37
 %42 = fadd fast <4 x double> %vec.phi95, %38
 %43 = fadd fast <4 x double> %vec.phi96, %39
 %index.next = add i64 %index, 16
 %44 = icmp eq i64 %index.next, %n.vec
 br i1 %44, label %middle.block, label %vector.body
```

If contracting a normal `fmul` with a `fast` `fadd` is allowed, both the scalar and the vectorized loop can use FMA.
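
A minimal sketch of what contraction would compute for this reduction, reusing the partial-accumulator shape of the vectorized loop. It assumes, hypothetically, that each `fmul`/`fadd` pair becomes one correctly rounded FMA per lane, emulated with exact `fractions.Fraction` arithmetic; `fma` and `dot` are local illustrative helpers, not LLVM or libm APIs.

```python
from fractions import Fraction


def fma(a: float, b: float, c: float) -> float:
    """Emulated correctly rounded fused multiply-add (exact, then one rounding)."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def dot(a, b, lanes=1, fused=False):
    """Reduction s += a[i] * b[i] with `lanes` partial accumulators.

    fused=False: each step is an fmul then an fadd (two roundings).
    fused=True:  each step is one fma (one rounding), as contraction would give.
    """
    acc = [0.0] * lanes
    for i, (x, y) in enumerate(zip(a, b)):
        k = i % lanes
        acc[k] = fma(x, y, acc[k]) if fused else acc[k] + x * y
    s = 0.0
    for part in acc:
        s += part
    return s


a = [1.0, 1.0 + 2.0 ** -30]
b = [-1.0, 1.0 - 2.0 ** -30]
print(dot(a, b))                              # 0.0
print(dot(a, b, fused=True) == -2.0 ** -60)   # True
```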

> 'contract' on both the fmul and fadd to generate an FMA. Conservatively, we
> wouldn't alter the result if either component somehow required strict FP. To
> vectorize, you probably need 'fast' on both ops because vectorization would
> be changing the order of operations (reassociation).