thr3ads.net - search: "vfma"

Displaying 19 results from an estimated 19 matches for "vfma".

Did you mean: fma

2012 Nov 09

[LLVMdev] fmac generation for cortex-a9

...at gmail.com [mailto:rengolin at gmail.com] On Behalf Of > Renato Golin > Sent: Friday, November 09, 2012 6:27 PM > To: Sebastien DELDON-GNB > Cc: JF Bastien; llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] fmac generation for cortex-a9 > > Hi Sebastien, > > ARMv7-M has VFMA and LLVM's "triple" is far from perfect. > > Wikipedia tells me NovaThor can also be A15, or STE could have cramped a > VFPv4 in it? ;) Or possibly, your code never branches into the VFMA. > Many things could be happening, but usually, VFMA shouldn't be generated >...

[LLVMdev] LLVM ARM VMLA instruction

2013 Dec 19

[LLVMdev] LLVM ARM VMLA instruction

Hi all, Thanks for the info. Few observations from my side : LLVM : cortex-a8 vfpv3 : no vmla or vfma instruction emitted cortex-a8 vfpv4 : no vmla or vfma instruction emitted (This is invalid though as cortex-a8 does not have vfpv4) cortex-a8 vfpv4 with ffp-contract=fast : vfma instruction emitted ( this seems a bug to me!! If cortex-a8 doesn't come with vfpv4 then vfma instructions generate...

[LLVMdev] fmac generation for cortex-a9

2012 Nov 09

[LLVMdev] fmac generation for cortex-a9

cat /proc/cpuinfo ? Are you sure it's generating VFMA and not VMLA? On Fri, Nov 9, 2012 at 9:35 AM, Sebastien DELDON-GNB < sebastien.deldon at st.com> wrote: > Hi Renato, > > It's definitively not A15. Can this be the case that NEON units for > cortex-A9 support it but isn't documented/recommended ? > And as mentioned b...

[LLVMdev] fmac generation for cortex-a9

2012 Nov 09

[LLVMdev] fmac generation for cortex-a9

Hi Sebastien, ARMv7-M has VFMA and LLVM's "triple" is far from perfect. Wikipedia tells me NovaThor can also be A15, or STE could have cramped a VFPv4 in it? ;) Or possibly, your code never branches into the VFMA. Many things could be happening, but usually, VFMA shouldn't be generated for A9. A GCC bug, mayb...

[LLVMdev] fmac generation for cortex-a9

2012 Nov 09

[LLVMdev] fmac generation for cortex-a9

...er the question ? Seb From: JF Bastien [mailto:jfb at google.com] Sent: Friday, November 09, 2012 5:36 PM To: Sebastien DELDON-GNB Cc: Anitha Boyapati; llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] fmac generation for cortex-a9 AFAIK A9 doesn't have VFPv4 or AdvSIMDv2, so it doesn't have VFMA. I don't know what LLVM does, but it shouldn't emit VFMA when you target A9. VMLA isn't a fused multiply-add, it's a multiply followed by an add and has different latency as well as precision. On Thu, Nov 8, 2012 at 4:57 AM, Sebastien DELDON-GNB <sebastien.deldon at st.com<ma...

[LLVMdev] LLVM ARM VMLA instruction

2013 Dec 18

[LLVMdev] LLVM ARM VMLA instruction

On 18 December 2013 12:31, Tim Northover <t.p.northover at gmail.com> wrote: > That's what I thought! But we do seem to generate vfma on Cortex-A9. > Wonder if that's a bug, or Cortex-A9 is "VFPv3, but chuck in vfma > too"? > Hi Tim, I believe that's the NEON VMLA, not the VFP one. There was a discussion in the past about not using NEON and VFP interchangeably due to IEEE assurances (which NEON doesn&...

[LLVMdev] RE : fmac generation for cortex-a9

2012 Nov 12

[LLVMdev] RE : fmac generation for cortex-a9

..._______________________________________ De : JF Bastien [jfb at google.com] Date d'envoi : vendredi 9 novembre 2012 18:45 À : Sebastien DELDON-GNB Cc : Renato Golin; llvmdev at cs.uiuc.edu Objet : Re: [LLVMdev] fmac generation for cortex-a9 cat /proc/cpuinfo ? Are you sure it's generating VFMA and not VMLA? On Fri, Nov 9, 2012 at 9:35 AM, Sebastien DELDON-GNB <sebastien.deldon at st.com<mailto:sebastien.deldon at st.com>> wrote: Hi Renato, It's definitively not A15. Can this be the case that NEON units for cortex-A9 support it but isn't documented/recommended ? And...

[LLVMdev] LLVM ARM VMLA instruction

2013 Dec 19

[LLVMdev] LLVM ARM VMLA instruction

Just to clarify: gcc 4.8.1 generates that fma at -O2; no FP relaxation or other flags specified. On Wed, Dec 18, 2013 at 6:02 PM, Kay Tiong Khoo <kkhoo at perfwizard.com>wrote: > Thanks for the explanation, Tim! > > gcc 4.8.1 *does* generate an fma for your code example for an x86 target > that supports fma. I'd bet that the HW vendors' compilers do the same, but >

[LLVMdev] LLVM ARM VMLA instruction

2013 Dec 18

[LLVMdev] LLVM ARM VMLA instruction

On 18 December 2013 09:42, Tim Northover <t.p.northover at gmail.com> wrote: > That means one strictly newer than > cortex-a8: cortex-a7 (don't ask), cortex-a9, cortex-a12, cortex-a15 or > krait I believe. > Hi Tim, Cortex A8 and A9 use VFPv3. A7, A12 and A15 use VFPv4. cheers, --renato -------------- next part -------------- An HTML attachment was scrubbed... URL:

[LLVMdev] LLVM ARM VMLA instruction

2013 Dec 18

[LLVMdev] LLVM ARM VMLA instruction

> Cortex A8 and A9 use VFPv3. A7, A12 and A15 use VFPv4. That's what I thought! But we do seem to generate vfma on Cortex-A9. Wonder if that's a bug, or Cortex-A9 is "VFPv3, but chuck in vfma too"? Tim.

[LLVMdev] LLVM ARM VMLA instruction

2013 Dec 18

[LLVMdev] LLVM ARM VMLA instruction

> I believe that's the NEON VMLA, not the VFP one. Turns out I was misreading the assembly. I wish "vmla" and "vfma" weren't so similar-looking. For Suyog that means the option "-ffp-contract=fast" is needed to get vfma when needed. Sorry about the bad information earlier. Cheers. Tim.

[LLVMdev] LLVM ARM VMLA instruction

2013 Dec 19

[LLVMdev] LLVM ARM VMLA instruction

Thanks for the explanation, Tim! gcc 4.8.1 *does* generate an fma for your code example for an x86 target that supports fma. I'd bet that the HW vendors' compilers do the same, but I don't have any of those installed at the moment to test that theory. So this is a bug in those compilers? Do you know how they justify it? I see section 6.5 "Expressions" in the C standard, and

[LLVMdev] fmac generation for cortex-a9

2012 Nov 09

[LLVMdev] fmac generation for cortex-a9

AFAIK A9 doesn't have VFPv4 or AdvSIMDv2, so it doesn't have VFMA. I don't know what LLVM does, but it shouldn't emit VFMA when you target A9. VMLA isn't a fused multiply-add, it's a multiply followed by an add and has different latency as well as precision. On Thu, Nov 8, 2012 at 4:57 AM, Sebastien DELDON-GNB < sebastien.deldon at st.com>...

[LLVMdev] fmac generation for cortex-a9

2012 Nov 08

[LLVMdev] fmac generation for cortex-a9

Hi Anitha, Thanks for your answer but -mcpu=cortex-a9 -mattr=+vfp4 doesn' t enable fused mac generation for me. I would like just to understand why -mtriple=armv7-eabi enables it while -mcpu=cortex-a9 seems to disable it ? Seb > -----Original Message----- > From: Anitha Boyapati [mailto:anitha.boyapati at gmail.com] > Sent: Thursday, November 08, 2012 10:22 AM > To: Sebastien

[LLVMdev] LLVM ARM VMLA instruction

2013 Dec 19

[LLVMdev] LLVM ARM VMLA instruction

> cortex-a8 vfpv4 with ffp-contract=fast : vfma instruction emitted ( this > seems a bug to me!! If cortex-a8 doesn't come with vfpv4 then vfma > instructions generated will be invalid ) If I'm understanding correctly, you've specifically told it this Cortex-A8 *does* come with vfpv4. Those kinds of odd combinations can be use...

[LLVMdev] LLVM ARM VMLA instruction

2013 Dec 18

[LLVMdev] LLVM ARM VMLA instruction

..._bug.cgi?id=17188 http://llvm.org/bugs/show_bug.cgi?id=17211 On Wed, Dec 18, 2013 at 6:02 AM, Tim Northover <t.p.northover at gmail.com>wrote: > > I believe that's the NEON VMLA, not the VFP one. > > Turns out I was misreading the assembly. I wish "vmla" and "vfma" > weren't so similar-looking. > > For Suyog that means the option "-ffp-contract=fast" is needed to get > vfma when needed. Sorry about the bad information earlier. > > Cheers. > > Tim. > _______________________________________________ > LLVM Develo...

ARM vectorized fp16 support

2019 Sep 05

ARM vectorized fp16 support

Thanks for reply. I was using LLVM 8.0. Let me try trunk and will let you know if it works. On Wed, Sep 4, 2019 at 11:19 PM Sjoerd Meijer <Sjoerd.Meijer at arm.com> wrote: > > Hi, > Which version of Clang are you using? I do get a "vfma.f16" with a recent trunk build. I haven't looked at older versions and when this landed, but we had an effort to plug the remaining fp16 holes not that long ago, so again hopefully a newer version will just work for you. > > Cheers, > Sjoerd. > ________________________________...

ARM vectorized fp16 support

2019 Sep 05

ARM vectorized fp16 support

...ultiply-add instructions for c += a * b. I'm wondering whether I did something wrong, if not, is it a missing feature that will be supported later? (I know there're fp16 FMLA intrinsics though) Test programs and outputs, $ clang -O3 -march=armv8.2-a+fp16fml -ffast-math -S -o- vfp32.c test_vfma_lane_f16: // @test_vfma_lane_f16 fmla v2.4s, v1.4s, v0.4s // fp32 is GOOD mov v0.16b, v2.16b ret $ cat vfp32.c #include <arm_neon.h> float32x4_t test_vfma_lane_f16(float32x4_t a, float32x4_t b, float32x4_t c) {...

[cfe-dev] RFC: change -fp-contract=off to actually disable FMAs

2019 Jul 12

[cfe-dev] RFC: change -fp-contract=off to actually disable FMAs

> However, fp-contract is not a knob to control whether or not abstract-machine operations generate a single arithmetic instruction I think that makes sense, but the end result is the same. Wouldn't you agree that -fp-contract=off still contracts floating point expressions with the initial example I posted? That is the core of what I'm trying to resolve here. I still have some

search for: vfma