thr3ads.net - llvm dev - [LLVMdev] Commutability of X86 FMA3 instructions. [Dec 2013]

Lang Hames

2013-Dec-20 05:45 UTC

[LLVMdev] Commutability of X86 FMA3 instructions.

Hi all,

The 213 variant of the FMA3 instructions is currently marked
commutable (see X86InstrFMA.td). Is that safe? According to the ISA
the FMA3 instructions aren't commutable for non-numeric results, so
I'd have thought commuting this would only be valid in fast-math mode?

For the curious, the reason that I'm asking is that we currently
always select the 213 variant, but this introduces extra copies in
accumulator-style loops. Something like:

while (...)
  accumulator = x * y + accumulator;

yields:

loop:
  vfmadd.213 y, x, acc
  vmovaps acc, x
  decl count
  jne loop

instead of

loop:
  vfmadd.231 acc, x, y
  decl count
  jne loop

I have started writing a patch to generate the 231 variant by default,
and I want to know whether I need to go to the trouble of adding
custom commute logic. If these things aren't commutable then I don't
need to worry at all. If they are commutable, but only in fast-math
mode, then I can support that too.
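
To make the operand roles concrete, here is a minimal C sketch of the three FMA3 forms as the Intel manual defines them (destination first). It models only which operand each form overwrites; it uses a plain multiply-add rather than a fused one, so the single-rounding behaviour is not reproduced. The helper names (form132, form213, form231, accumulate231, accumulate213) are made up for illustration and do not correspond to anything in LLVM, and the operand order may not match the pseudo-assembly above.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative models of the three FMA3 operand forms. `dst` is both an
 * input and the destination. Plain multiply-add: fused rounding is NOT
 * modeled here, only the data flow. */
static inline void form132(double *dst, double src2, double src3) {
    *dst = *dst * src3 + src2;   /* dst = dst  * src3 + src2 */
}
static inline void form213(double *dst, double src2, double src3) {
    *dst = src2 * *dst + src3;   /* dst = src2 * dst  + src3 */
}
static inline void form231(double *dst, double src2, double src3) {
    *dst = src2 * src3 + *dst;   /* dst = src2 * src3 + dst  */
}

/* Accumulator loop with the 231 form: the accumulator is the
 * destination, so the result is already where the next iteration
 * expects it. */
static inline double accumulate231(const double *x, const double *y, size_t n) {
    assert(x != NULL && y != NULL);
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i)
        form231(&acc, x[i], y[i]);
    return acc;
}

/* Same loop with the 213 form: the result overwrites a multiplicand,
 * so it has to be copied back into the accumulator each iteration. */
static inline double accumulate213(const double *x, const double *y, size_t n) {
    assert(x != NULL && y != NULL);
    double acc = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double tmp = x[i];           /* multiplicand, about to be overwritten */
        form213(&tmp, y[i], acc);    /* tmp = y[i] * tmp + acc */
        acc = tmp;                   /* the extra copy (the vmovaps above) */
    }
    return acc;
}
```

Both loops compute the same sum; the only difference in this model is the extra assignment in the 213 version, which stands in for the register copy.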

Thanks for the help!

- Lang.

Kay Tiong Khoo

2013-Dec-20 16:29 UTC

Hi Lang,

Unfortunately, I don't have an answer on the commutability question, but I
wanted to let you know that I filed a bug on this:
http://llvm.org/bugs/show_bug.cgi?id=17229

This also shows a memory operand variant of the fma that you may want to
consider in your patch and testcases.

Thanks!


Lang Hames

2013-Dec-20 21:03 UTC

Hi Kay,

My patch will partially address your bug. For now I'm just looking to
switch the default FMA from vfmadd213xx to vfmadd231xx. That will
cause the code in PR17229 to compile as desired, but would regress
code like:

double foo(double a, double b, double c) {
  return a * b + c;
}

That function would then require a vmovaps plus a vfmadd231.

If this impacts real benchmarks we could add an optimization to change
the FMA variant based on how it's used.
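
As a rough illustration of that trade-off, the sketch below models foo with a toy register file. It assumes the x86-64 System V convention, where a, b, c arrive in xmm0, xmm1, xmm2 and the result is returned in xmm0. The Regs struct, the instruction counter, and the helper functions are hypothetical and exist only for this example; operands are written destination-first, and the arithmetic is a plain multiply-add rather than a fused one.

```c
#include <assert.h>

/* Toy register file: xmm[0..2] hold a, b, c on entry; the result is
 * expected in xmm[0]. `insns` counts modeled instructions. */
typedef struct {
    double xmm[3];
    int insns;
} Regs;

/* dst = src2 * dst + src3 */
static inline void vfmadd213(Regs *r, int dst, int src2, int src3) {
    r->xmm[dst] = r->xmm[src2] * r->xmm[dst] + r->xmm[src3];
    r->insns++;
}

/* dst = src2 * src3 + dst */
static inline void vfmadd231(Regs *r, int dst, int src2, int src3) {
    r->xmm[dst] = r->xmm[src2] * r->xmm[src3] + r->xmm[dst];
    r->insns++;
}

/* Register-to-register copy. */
static inline void vmovaps(Regs *r, int dst, int src) {
    r->xmm[dst] = r->xmm[src];
    r->insns++;
}

/* foo with the 213 form: the result lands directly in xmm0. */
static inline Regs foo_213(double a, double b, double c) {
    Regs r = {{a, b, c}, 0};
    vfmadd213(&r, 0, 1, 2);      /* xmm0 = xmm1 * xmm0 + xmm2 */
    return r;
}

/* foo with the 231 form: the result lands in xmm2 (where c lived),
 * so a copy into the return register is needed. */
static inline Regs foo_231(double a, double b, double c) {
    Regs r = {{a, b, c}, 0};
    vfmadd231(&r, 2, 0, 1);      /* xmm2 = xmm0 * xmm1 + xmm2 */
    vmovaps(&r, 0, 2);           /* move result to xmm0 */
    return r;
}
```

In this model the 213 version takes one instruction and the 231 version takes two, which is the reverse of the accumulator-loop case and is why choosing the variant per use could pay off.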

- Lang.

