thr3ads.net - similar to: "[LLVMdev] fmac generation for cortex-a9"


2012 Nov 08

[LLVMdev] fmac generation for cortex-a9

Hi Anitha, Thanks for your answer, but -mcpu=cortex-a9 -mattr=+vfp4 doesn't enable fused MAC generation for me. I would just like to understand why -mtriple=armv7-eabi enables it while -mcpu=cortex-a9 seems to disable it. Seb > -----Original Message----- > From: Anitha Boyapati [mailto:anitha.boyapati at gmail.com] > Sent: Thursday, November 08, 2012 10:22 AM > To: Sebastien

2012 Nov 09

[LLVMdev] fmac generation for cortex-a9

Hi Renato, It's definitely not A15. Could it be that the NEON unit on Cortex-A9 supports it but this isn't documented or recommended? And as mentioned before, the code is working! Seb > -----Original Message----- > From: rengolin at gmail.com [mailto:rengolin at gmail.com] On Behalf Of > Renato Golin > Sent: Friday, November 09, 2012 6:27 PM > To: Sebastien DELDON-GNB >

2012 Nov 09

[LLVMdev] fmac generation for cortex-a9

Hi Bastien, Weird: GCC is generating fma for my platform (ST-Ericsson NovaThor with Linaro), and the code works. It also works when I use LLVM to generate fma (using llc -mtriple=armv7-eabi). Maybe someone from ARM can answer the question? Seb From: JF Bastien [mailto:jfb at google.com] Sent: Friday, November 09, 2012 5:36 PM To: Sebastien DELDON-GNB Cc: Anitha Boyapati; llvmdev at cs.uiuc.edu Subject:

2012 Nov 09

[LLVMdev] fmac generation for cortex-a9

AFAIK A9 doesn't have VFPv4 or AdvSIMDv2, so it doesn't have VFMA. I don't know what LLVM does, but it shouldn't emit VFMA when you target A9. VMLA isn't a fused multiply-add, it's a multiply followed by an add and has different latency as well as precision. On Thu, Nov 8, 2012 at 4:57 AM, Sebastien DELDON-GNB < sebastien.deldon at st.com> wrote: > Hi Anitha,

2012 Nov 09

[LLVMdev] fmac generation for cortex-a9

cat /proc/cpuinfo? Are you sure it's generating VFMA and not VMLA? On Fri, Nov 9, 2012 at 9:35 AM, Sebastien DELDON-GNB < sebastien.deldon at st.com> wrote: > Hi Renato, > > It's definitely not A15. Can this be the case that NEON units for > cortex-A9 support it but isn't documented/recommended ? > And as mentioned before code is working ! > > Seb >

2012 Nov 08

[LLVMdev] fmac generation for cortex-a9

On 8 November 2012 13:56, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote: > Hi all, > > I’ve a .ll code that uses double precision fmul/fadd or fmul/fsub. When I > compile it using llc -mcpu=cortex-a9 I couldn’t get vmla/vmls generated even > using -fp-contract=fast, but when I use option -mtriple=armv7-eabi instead > of -mcpu=cortex-a9 fused mac

2012 Nov 09

[LLVMdev] fmac generation for cortex-a9

Hi Sebastien, ARMv7-M has VFMA and LLVM's "triple" is far from perfect. Wikipedia tells me NovaThor can also be A15, or STE could have crammed a VFPv4 in it? ;) Or possibly, your code never branches into the VFMA. Many things could be happening, but usually, VFMA shouldn't be generated for A9. A GCC bug, maybe? On 9 November 2012 16:51, Sebastien DELDON-GNB

2012 Nov 12

[LLVMdev] RE : fmac generation for cortex-a9

Hi Renato, You're right, it's VMLA/VMLS that are generated. I still don't understand what drives their generation for Cortex-A9. I was using fmac to mean floating-point MAC, not fused MAC. Then I realized that we were talking about fma instead of fmac. So back to the original problem: why are VMLA/VMLS not generated when using -mcpu=cortex-a9, while they are when I use -mtriple=armv7-eabi? Best

2013 Feb 08

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

Hi Renato, Thanks for the answer, it confirms what I was suspecting. My problem is that this behavior is controlled by vmlx-forwarding on cortex-a9, and despite asking on this list, I couldn't get a clear understanding of what this option is meant for. So here are my new questions: Why is vmlx-forwarding enabled by default for cortex-a9? Is it to guarantee correctness or for performance

2013 Feb 08

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

On 8 February 2013 12:28, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote: > Why is vmlx-forwarding enabled by default for cortex-a9? Is it to > guarantee correctness or for performance purposes? I’ve made some > experiments and DISABLING vmlx-forwarding for cortex-a9 leads to generation > of more vmla/vmls .f32 and significantly improves some benchmarks. I’ve not >

2013 Feb 08

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

Hi all, Everything is in the title: I would like to force generation of the vmla.f32 instruction for scalar operations on cortex-a9, so is there an LLVM neon intrinsic available for that? Thanks for your answers. Best Regards, Seb

2013 Feb 08

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

On 8 February 2013 10:40, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote: > Hi all, > > Everything is in the title, I would like to enforce generation of vmla.f32 > instruction for scalar operations on cortex-a9, so is there a LLVM neon > intrinsic available for that ? > > Hi Sebastien, LLVM doesn't use intrinsics when there is a

2013 Feb 11

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

Hi Renato, Indeed, the problem is with generation of vmla.f64. The affected benchmark is MILC from the SPEC 2006 suite, and disabling vmlx-forwarding gives a 10% speed-up on complete benchmark execution! So it is worth a try. Now going back to vmla generation through LLVM intrinsic usage. I've looked at the .td file and it seems to me that when there is a "pattern" to generate instruction, no

2012 Dec 20

[LLVMdev] vmlx forwarding an cortex A9 question

Hi all, On the following code, when I use llc targeting ARM Cortex-A9 as follows, 'vmla' instructions are generated if vmlx-forwarding is turned off. It seems that -mcpu=cortex-a9 enables it by default and thus fewer 'vmla' instructions are generated. On this specific example it doesn't make any difference in terms of performance, but on a more complex example disabling

2013 Feb 12

[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

On 12 February 2013 16:56, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote: > If this helps you make your decision, there are at least two benchmarks for > which disabling vmlx-forwarding makes a significant difference. > I think Evan's worry was to base this decision on visible and comprehensible benchmarks, such as the test-suite. If I get lucky I may be able to run

2013 Feb 13

[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

Hi Sebastien, How many extra vmlas did you see in 433.milc due to disabling -vmlx-forwarding? As I mentioned earlier, I saw only two additional integer vmlx instructions when I tested. Could you send me your 433.milc compile setup? (os, flags, compiler version, etc.). I'd like to try to reproduce your results. Cheers, Lang. On Tue, Feb 12, 2013 at 9:05 AM, Renato Golin <renato.golin at

2013 Feb 12

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

I did the initial work on vmla formation. The default settings for cortex-a8 / a9 are due to micro-architecture differences (I believe the a8 TRM talks about vmla hazards) and extensive testing. That said, given the limitations of the current pre-RA scheduling pass, it's likely the use of vmla can cause regressions. I'm not opposed to changing the setting for a9. However, it's not a good idea to

2013 Feb 15

[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

Hi Lang & Renato, I eventually set up a PandaBoard with the latest Linaro delivery (eabi-hf). I did some experiments using my own compiler and LLVM 3.2 as back-end. I use the same flag set for my compiler (front-end) and just invoke llc with and without the vmlx-forwarding attribute. So the base arguments to llc are: llc -march=arm -mcpu=cortex-a9 -mattr=+neon -float-abi=hard to which I added

2013 Mar 12

[LLVMdev] Help needed on debugging llvm

I'm still slightly confused. Is the error now fixed or is there still a bug in LLVM's integrated assembler? On Mon, Mar 11, 2013 at 4:49 AM, Anitha B Gollamudi < anitha.boyapati at gmail.com> wrote: > On 11 March 2013 17:00, Duncan Sands <baldrick at free.fr> wrote: > > Hi Anitha, > > > > > >> Ah, I am taking back my above words w.r.t encoding.

2013 Jan 22

[LLVMdev] Help needed on debugging llvm

Are you still having issues with FMA4? I wonder if PR15040 is related. A fix was just committed. On Wed, Nov 7, 2012 at 3:22 AM, Anitha Boyapati <anitha.boyapati at gmail.com>wrote: > > > On 7 November 2012 15:29, Duncan Sands <baldrick at free.fr> wrote: > >> >> That way the output should be exactly the same as the output dragonegg >> would >>
