Displaying 19 results from an estimated 19 matches for "vfma".
2012 Nov 09
2
[LLVMdev] fmac generation for cortex-a9
...at gmail.com [mailto:rengolin at gmail.com] On Behalf Of
> Renato Golin
> Sent: Friday, November 09, 2012 6:27 PM
> To: Sebastien DELDON-GNB
> Cc: JF Bastien; llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] fmac generation for cortex-a9
>
> Hi Sebastien,
>
> ARMv7-M has VFMA and LLVM's "triple" is far from perfect.
>
> Wikipedia tells me NovaThor can also be A15, or STE could have crammed a
> VFPv4 in it? ;) Or possibly, your code never branches into the VFMA.
> Many things could be happening, but usually, VFMA shouldn't be generated
>...
2013 Dec 19
3
[LLVMdev] LLVM ARM VMLA instruction
Hi all,
Thanks for the info. Few observations from my side :
LLVM :
cortex-a8 vfpv3 : no vmla or vfma instruction emitted
cortex-a8 vfpv4 : no vmla or vfma instruction emitted (This is invalid
though as cortex-a8 does not have vfpv4)
cortex-a8 vfpv4 with ffp-contract=fast : vfma instruction emitted ( this
seems a bug to me!! If cortex-a8 doesn't come with vfpv4 then vfma
instructions generate...
2012 Nov 09
0
[LLVMdev] fmac generation for cortex-a9
cat /proc/cpuinfo ?
Are you sure it's generating VFMA and not VMLA?
On Fri, Nov 9, 2012 at 9:35 AM, Sebastien DELDON-GNB <
sebastien.deldon at st.com> wrote:
> Hi Renato,
>
> It's definitely not A15. Could it be that the NEON units on Cortex-A9
> support it, but this isn't documented or recommended?
> And as mentioned b...
2012 Nov 09
0
[LLVMdev] fmac generation for cortex-a9
Hi Sebastien,
ARMv7-M has VFMA and LLVM's "triple" is far from perfect.
Wikipedia tells me NovaThor can also be A15, or STE could have crammed
a VFPv4 in it? ;) Or possibly, your code never branches into the VFMA.
Many things could be happening, but usually, VFMA shouldn't be
generated for A9.
A GCC bug, mayb...
2012 Nov 09
2
[LLVMdev] fmac generation for cortex-a9
...er the question ?
Seb
From: JF Bastien [mailto:jfb at google.com]
Sent: Friday, November 09, 2012 5:36 PM
To: Sebastien DELDON-GNB
Cc: Anitha Boyapati; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] fmac generation for cortex-a9
AFAIK A9 doesn't have VFPv4 or AdvSIMDv2, so it doesn't have VFMA. I don't know what LLVM does, but it shouldn't emit VFMA when you target A9. VMLA isn't a fused multiply-add, it's a multiply followed by an add and has different latency as well as precision.
On Thu, Nov 8, 2012 at 4:57 AM, Sebastien DELDON-GNB <sebastien.deldon at st.com<ma...
2013 Dec 18
2
[LLVMdev] LLVM ARM VMLA instruction
On 18 December 2013 12:31, Tim Northover <t.p.northover at gmail.com> wrote:
> That's what I thought! But we do seem to generate vfma on Cortex-A9.
> Wonder if that's a bug, or Cortex-A9 is "VFPv3, but chuck in vfma
> too"?
>
Hi Tim,
I believe that's the NEON VMLA, not the VFP one. There was a discussion in
the past about not using NEON and VFP interchangeably due to IEEE
assurances (which NEON doesn...
2012 Nov 12
1
[LLVMdev] RE : fmac generation for cortex-a9
..._______________________________________
From: JF Bastien [jfb at google.com]
Sent: Friday, November 9, 2012 18:45
To: Sebastien DELDON-GNB
Cc: Renato Golin; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] fmac generation for cortex-a9
cat /proc/cpuinfo ?
Are you sure it's generating VFMA and not VMLA?
On Fri, Nov 9, 2012 at 9:35 AM, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote:
Hi Renato,
It's definitely not A15. Could it be that the NEON units on Cortex-A9 support it, but this isn't documented or recommended?
And...
2013 Dec 19
0
[LLVMdev] LLVM ARM VMLA instruction
Just to clarify: gcc 4.8.1 generates that fma at -O2; no FP relaxation or
other flags specified.
On Wed, Dec 18, 2013 at 6:02 PM, Kay Tiong Khoo <kkhoo at perfwizard.com> wrote:
> Thanks for the explanation, Tim!
>
> gcc 4.8.1 *does* generate an fma for your code example for an x86 target
> that supports fma. I'd bet that the HW vendors' compilers do the same, but
>
2013 Dec 18
2
[LLVMdev] LLVM ARM VMLA instruction
On 18 December 2013 09:42, Tim Northover <t.p.northover at gmail.com> wrote:
> That means one strictly newer than
> cortex-a8: cortex-a7 (don't ask), cortex-a9, cortex-a12, cortex-a15 or
> krait I believe.
>
Hi Tim,
Cortex A8 and A9 use VFPv3. A7, A12 and A15 use VFPv4.
cheers,
--renato
2013 Dec 18
0
[LLVMdev] LLVM ARM VMLA instruction
> Cortex A8 and A9 use VFPv3. A7, A12 and A15 use VFPv4.
That's what I thought! But we do seem to generate vfma on Cortex-A9.
Wonder if that's a bug, or Cortex-A9 is "VFPv3, but chuck in vfma
too"?
Tim.
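The core-to-FPU mapping quoted above (Cortex-A8 and A9 use VFPv3; A7, A12 and A15 use VFPv4) can be written down as a small lookup. This is only a sketch of the statement made in the thread, not LLVM's actual feature tables; the type and function names are hypothetical, and it assumes, as the thread does, that VFMA is available only where VFPv4 is present.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum vfp_version { VFP_UNKNOWN = 0, VFP_V3 = 3, VFP_V4 = 4 };

struct core_fpu {
    const char *core;
    enum vfp_version vfp;
};

/* Entries taken from the message quoted above; nothing else is assumed. */
static const struct core_fpu core_table[] = {
    {"cortex-a8", VFP_V3},  {"cortex-a9", VFP_V3},  {"cortex-a7", VFP_V4},
    {"cortex-a12", VFP_V4}, {"cortex-a15", VFP_V4},
};

/* Return the VFP version for a core name, or VFP_UNKNOWN if not listed. */
static enum vfp_version lookup_vfp(const char *core) {
    for (size_t i = 0; i < sizeof core_table / sizeof core_table[0]; ++i)
        if (strcmp(core_table[i].core, core) == 0)
            return core_table[i].vfp;
    return VFP_UNKNOWN;
}

/* Fused multiply-add (VFMA) is treated here as a VFPv4-only feature. */
static bool core_has_vfma(const char *core) {
    return lookup_vfp(core) == VFP_V4;
}
```

Under this table, `core_has_vfma("cortex-a9")` is false and `core_has_vfma("cortex-a15")` is true, which matches the expectation in the thread that vfma should not be emitted for Cortex-A9.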
2013 Dec 18
0
[LLVMdev] LLVM ARM VMLA instruction
> I believe that's the NEON VMLA, not the VFP one.
Turns out I was misreading the assembly. I wish "vmla" and "vfma"
weren't so similar-looking.
For Suyog that means the option "-ffp-contract=fast" is needed to get
vfma when needed. Sorry about the bad information earlier.
Cheers.
Tim.
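For readers following along, a minimal C sketch of the kind of expression that "-ffp-contract=fast" affects is shown below. The function name `mac` and the command line in the comment are assumptions for illustration, not taken from the thread; whether a vfma actually appears depends on the compiler version and on the selected CPU having VFPv4.

```c
/* A multiply feeding an add: a candidate for contraction into one fused
   multiply-add. One possible way to inspect the generated assembly
   (illustrative command line only):
       clang -O2 --target=armv7a-linux-gnueabihf -mcpu=cortex-a15 \
             -ffp-contract=fast -S mac.c
   Without contraction, expect a separate multiply and add (or a vmla). */
float mac(float acc, float a, float b) {
    return acc + a * b;
}
```

For small integer-valued inputs such as `mac(1.0f, 2.0f, 3.0f)` the result is 7 either way; the fused and unfused forms only differ when the intermediate product needs rounding.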
2013 Dec 19
2
[LLVMdev] LLVM ARM VMLA instruction
Thanks for the explanation, Tim!
gcc 4.8.1 *does* generate an fma for your code example for an x86 target
that supports fma. I'd bet that the HW vendors' compilers do the same, but
I don't have any of those installed at the moment to test that theory. So
this is a bug in those compilers? Do you know how they justify it?
I see section 6.5 "Expressions" in the C standard, and
2012 Nov 09
0
[LLVMdev] fmac generation for cortex-a9
AFAIK A9 doesn't have VFPv4 or AdvSIMDv2, so it doesn't have VFMA. I don't
know what LLVM does, but it shouldn't emit VFMA when you target A9. VMLA
isn't a fused multiply-add, it's a multiply followed by an add and has
different latency as well as precision.
On Thu, Nov 8, 2012 at 4:57 AM, Sebastien DELDON-GNB <
sebastien.deldon at st.com>...
2012 Nov 08
2
[LLVMdev] fmac generation for cortex-a9
Hi Anitha,
Thanks for your answer, but -mcpu=cortex-a9 -mattr=+vfp4 doesn't enable fused mac generation for me.
I would just like to understand why -mtriple=armv7-eabi enables it while -mcpu=cortex-a9 seems to disable it.
Seb
> -----Original Message-----
> From: Anitha Boyapati [mailto:anitha.boyapati at gmail.com]
> Sent: Thursday, November 08, 2012 10:22 AM
> To: Sebastien
2013 Dec 19
0
[LLVMdev] LLVM ARM VMLA instruction
> cortex-a8 vfpv4 with ffp-contract=fast : vfma instruction emitted ( this
> seems a bug to me!! If cortex-a8 doesn't come with vfpv4 then vfma
> instructions generated will be invalid )
If I'm understanding correctly, you've specifically told it this
Cortex-A8 *does* come with vfpv4. Those kinds of odd combinations can
be use...
2013 Dec 18
2
[LLVMdev] LLVM ARM VMLA instruction
..._bug.cgi?id=17188
http://llvm.org/bugs/show_bug.cgi?id=17211
On Wed, Dec 18, 2013 at 6:02 AM, Tim Northover <t.p.northover at gmail.com> wrote:
> > I believe that's the NEON VMLA, not the VFP one.
>
> Turns out I was misreading the assembly. I wish "vmla" and "vfma"
> weren't so similar-looking.
>
> For Suyog that means the option "-ffp-contract=fast" is needed to get
> vfma when needed. Sorry about the bad information earlier.
>
> Cheers.
>
> Tim.
2019 Sep 05
2
ARM vectorized fp16 support
Thanks for reply. I was using LLVM 8.0. Let me try trunk and will let
you know if it works.
On Wed, Sep 4, 2019 at 11:19 PM Sjoerd Meijer <Sjoerd.Meijer at arm.com> wrote:
>
> Hi,
> Which version of Clang are you using? I do get a "vfma.f16" with a recent trunk build. I haven't looked at older versions and when this landed, but we had an effort to plug the remaining fp16 holes not that long ago, so again hopefully a newer version will just work for you.
>
> Cheers,
> Sjoerd.
> ________________________________...
2019 Sep 05
2
ARM vectorized fp16 support
...ultiply-add instructions
for c += a * b. I'm wondering whether I did something wrong, if not,
is it a missing feature that will be supported later? (I know there're
fp16 FMLA intrinsics though)
Test programs and outputs,
$ clang -O3 -march=armv8.2-a+fp16fml -ffast-math -S -o- vfp32.c
test_vfma_lane_f16: // @test_vfma_lane_f16
fmla v2.4s, v1.4s, v0.4s // fp32 is GOOD
mov v0.16b, v2.16b
ret
$ cat vfp32.c
#include <arm_neon.h>
float32x4_t test_vfma_lane_f16(float32x4_t a, float32x4_t b, float32x4_t c) {...
2019 Jul 12
3
[cfe-dev] RFC: change -fp-contract=off to actually disable FMAs
> However, fp-contract is not a knob to control whether or not
> abstract-machine operations generate a single arithmetic instruction
I think that makes sense, but the end result is the same. Wouldn't you
agree that -fp-contract=off still contracts floating point expressions with
the initial example I posted? That is the core of what I'm trying to
resolve here.
I still have some