thr3ads.net - llvm dev - [LLVMdev] LLVM ARM VMLA instruction [Dec 2013]

Kay Tiong Khoo

2013-Dec-19 01:02 UTC

[LLVMdev] LLVM ARM VMLA instruction

Thanks for the explanation, Tim!

gcc 4.8.1 *does* generate an fma for your code example for an x86 target
that supports fma. I'd bet that the HW vendors' compilers do the same, but
I don't have any of those installed at the moment to test that theory. So
this is a bug in those compilers? Do you know how they justify it?

I see section 6.5 "Expressions" in the C standard, and I can see that
paragraph 8 of that section would seem to agree with you, assuming that a
"floating expression" is a subset of "expression"... Is there any other
part of the standard that you know of that I can reference?

This is made a little weirder by the fact that gcc and clang have a 'fast'
setting for fp-contract, but the C standard that I'm looking at states that
it is just an "on-off-switch".

On Wed, Dec 18, 2013 at 11:17 AM, Tim Northover <t.p.northover at gmail.com> wrote:
> > http://llvm.org/bugs/show_bug.cgi?id=17188
> > http://llvm.org/bugs/show_bug.cgi?id=17211
>
> Ah, thanks. That makes a lot more sense now.
>
> > Correct - clang is different than gcc, icc, msvc, xlc, etc. on this. Still
> > haven't seen any explanation for how this is better though...
>
> That would be because it follows what C tells us a compiler has to do
> by default but provides overrides in either direction if you know what
> you're doing.
>
> The key point is that LLVM (currently) has no notion of statement
> boundaries, so it would fuse the operations in this function:
>
> float foo(float accum, float lhs, float rhs) {
>   float product = lhs * rhs;
>   return accum + product;
> }
>
> This isn't allowed even under FP_CONTRACT=on (the multiply and add do
> not occur within a single expression), so LLVM can't in good
> conscience enable these optimisations by default.
>
> Cheers.
>
> Tim.

Kay Tiong Khoo

2013-Dec-19 01:11 UTC

Just to clarify: gcc 4.8.1 generates that fma at -O2; no FP relaxation or
other flags specified.


suyog sarda

2013-Dec-19 08:00 UTC

Hi all,

Thanks for the info. A few observations from my side:

LLVM:

cortex-a8 vfpv3: no vmla or vfma instruction emitted

cortex-a8 vfpv4: no vmla or vfma instruction emitted (this combination is
invalid though, as cortex-a8 does not have vfpv4)

cortex-a8 vfpv4 with -ffp-contract=fast: vfma instruction emitted (this
seems like a bug to me! If cortex-a8 doesn't come with vfpv4, then the vfma
instructions generated will be invalid)

cortex-a15 vfpv4: vmla instruction emitted (which is a NEON instruction)

cortex-a15 vfpv4 with -ffp-contract=fast: vfma instruction emitted

GCC:

cortex-a8 vfpv3: vmla instruction emitted

cortex-a15 vfpv4: vfma instruction emitted

I agree with the point that NEON and VFP instructions shouldn't be used
interchangeably.

However, if gcc emits a vmla (NEON) instruction for cortex-a8, then
shouldn't LLVM also emit a vmla (NEON) instruction? Can someone please
clarify this point? The performance gain with the vmla instruction is huge.
Somewhere I read that LLVM prefers precision over performance. Is this
true, and is that why LLVM is not emitting vmla instructions for cortex-a8?



-- 
With regards,
Suyog Sarda