similar to: [LLVMdev] vmlx forwarding option for Cortex-A9

Displaying 20 results from an estimated 4000 matches similar to: "[LLVMdev] vmlx forwarding option for Cortex-A9"

2012 Dec 20
1
[LLVMdev] vmlx forwarding an cortex A9 question
Hi all, On following code when I use llc targeting ARM Cortex-A9 as follows, if vmlx-forwarding is turned off then 'vmla' instructions are generated. It seems that -mcpu=cortex-a9 enables it by default and thus less 'vmla' instructions are generated. On this specific example it doesn't make any difference in term of performance, but on a more complex example disabling
2013 Feb 15
2
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Lang & Renato, I eventually set up a panda board with latest linaro delivery (eabi-hf). I did some experiments using my own compiler and LLVM 3.2 as back-end. I use same flagset for my compiler (front-end) and just invoke llc with and without vmlx-forwarding attribute. So base arguments to llc are: llc -march=arm -mcpu=cortex-a9 -mattr=+neon -float-abi=hard to which I added
2013 Feb 13
0
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Sebastien, How many extra vmlas did you see in 433.milc due to disabling -vmlx-forwarding? As I mentioned earlier, I saw only two additional integer vmlx instructions when I tested. Could you send me your 433.milc compile setup? (os, flags, compiler version, etc.). I'd like to try to reproduce your results. Cheers, Lang. On Tue, Feb 12, 2013 at 9:05 AM, Renato Golin <renato.golin at
2013 Feb 08
2
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Renato, Thanks for the answer, it confirms what I was suspecting. My problem is that this behavior is controlled by vmlx forwarding on cortex-a9 for which despite asking on this list, I couldn't get a clear understanding what this option is meant for. So here are my new questions: Why for cortex-a9 vmlx-forwarding is enabled by default ? Is it to guarantee correctness or for performance
2013 Feb 15
0
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
On 15 February 2013 16:00, Sebastien DELDON-GNB <sebastien.deldon at st.com>wrote: > to which I added –mattr=-vmlx-forwarding to disable vmlx forwarding for > cortex-a9. > > When I DISABLE vmlx forwarding I’m observing a 7% speed-up on ref dataset > for MILC. So I’m observing something similar to what I’ve observed on STE > platform available on SNOWBALL board. > Hi
2013 Feb 12
3
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
On 12 February 2013 16:56, Sebastien DELDON-GNB <sebastien.deldon at st.com>wrote: > If this helps taking your decision, there are at least two benchmarks for > which disabling vmlx-forwarding makes a significant difference. > I think Evan's worry was to base this decision on visible and comprehensible benchmarks, such as the test-suite. If I get lucky I may be able to run
2013 Feb 08
0
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
On 8 February 2013 12:28, Sebastien DELDON-GNB <sebastien.deldon at st.com>wrote: > Why for cortex-a9 vmlx-forwarding is enabled by default ? Is it to > guarantee correctness or for performance purpose ? I’ve made some > experiments and DISABLING vmlx-forwarding for cortex-a9 leads to generation > of more vmla/vmls .f32 and significantly improve some benchmarks. I’ve not >
2013 Feb 12
3
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
I did the initial work on vmla formation. The default settings for cortex-a8 / a9 due to micro-architecture difference (i believe a8 TRM talks about vmla hazards) and extensive testing. That said, given the limitation of the current pre-RA scheduling pass, it's likely the use of vmla can caused regressions. Im not opposed to changing the setting for a9. However, it's not a good idea to
2013 Feb 11
0
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Bob, Seb, Renalto, My VMLA performance work was on Swift, rather than Cortex-A9. Sebastian - is vmlx-forwarding really the only variable you changed between your tests? As far as I can see the VMLx forwarding attribute only exists to restrict the application of one DAG combine optimization: PerformVMULCombine in ARMISelLowering.cpp, which turns (A + B) * C into (A * C) + (B * C). This
2012 Nov 09
2
[LLVMdev] fmac generation for cortex-a9
Hi Renato, It's definitively not A15. Can this be the case that NEON units for cortex-A9 support it but isn't documented/recommended ? And as mentioned before code is working ! Seb > -----Original Message----- > From: rengolin at gmail.com [mailto:rengolin at gmail.com] On Behalf Of > Renato Golin > Sent: Friday, November 09, 2012 6:27 PM > To: Sebastien DELDON-GNB >
2012 Nov 09
2
[LLVMdev] fmac generation for cortex-a9
Hi Bastien, Weird gcc is generating fma for my platform STEricsson Novathor with Linaro, code works. It also works when I use LLVM to generate fma (using llc -mtriple=armv7-eabi). Maybe someone from ARM can answer the question ? Seb From: JF Bastien [mailto:jfb at google.com] Sent: Friday, November 09, 2012 5:36 PM To: Sebastien DELDON-GNB Cc: Anitha Boyapati; llvmdev at cs.uiuc.edu Subject:
2013 Feb 12
2
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Understood, Same architecture, different micro-arch (implementation). Could this be the case that vmlx-forwarding make senses for SWIFT and not for ARM Cortex-A9 implementation ? It is enabled by default when -mcpu=cortex-a9 is used but test have made show significant improvements when disabled for cortex-A9 (STEricsson Nova platform). Best Regards Seb From: David Tweed [mailto:david.tweed at
2012 Nov 08
2
[LLVMdev] fmac generation for cortex-a9
Hi Anitha, Thanks for your answer but -mcpu=cortex-a9 -mattr=+vfp4 doesn' t enable fused mac generation for me. I would like just to understand why -mtriple=armv7-eabi enables it while -mcpu=cortex-a9 seems to disable it ? Seb > -----Original Message----- > From: Anitha Boyapati [mailto:anitha.boyapati at gmail.com] > Sent: Thursday, November 08, 2012 10:22 AM > To: Sebastien
2013 Feb 11
3
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Renato, Indeed problem is with generation of vmla.f64. Affected benchmark is MILC from SPEC 2006 suite and disabling vmlx forwarding gives a 10% speed-up on complete benchmark execution ! So it is worth a try. Now going back to vmla generation through LLMV intrinsic usage. I've looked at .td file and it seems to me that when there is a "pattern" to generate instruction, no
2012 Nov 09
0
[LLVMdev] fmac generation for cortex-a9
cat /proc/cpuinfo ? Are you sure it's generating VFMA and not VMLA? On Fri, Nov 9, 2012 at 9:35 AM, Sebastien DELDON-GNB < sebastien.deldon at st.com> wrote: > Hi Renato, > > It's definitively not A15. Can this be the case that NEON units for > cortex-A9 support it but isn't documented/recommended ? > And as mentioned before code is working ! > > Seb >
2012 Nov 09
0
[LLVMdev] fmac generation for cortex-a9
Hi Sebastien, ARMv7-M has VFMA and LLVM's "triple" is far from perfect. Wikipedia tells me NovaThor can also be A15, or STE could have cramped a VFPv4 in it? ;) Or possibly, your code never branches into the VFMA. Many things could be happening, but usually, VFMA shouldn't be generated for A9. A GCC bug, maybe? On 9 November 2012 16:51, Sebastien DELDON-GNB
2013 Feb 11
2
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
In theory, the backend should choose the best instructions for the selected target processor. VMLA is not always the best choice. Lang Hames did some measurements a while back to come up with the current behavior, but I don't remember exactly what he found. CC'ing Lang. On Feb 11, 2013, at 8:12 AM, Renato Golin <renato.golin at linaro.org> wrote: > On 11 February 2013 15:51,
2012 Nov 09
0
[LLVMdev] fmac generation for cortex-a9
AFAIK A9 doesn't have VFPv4 or AdvSIMDv2, so it doesn't have VFMA. I don't know what LLVM does, but it shouldn't emit VFMA when you target A9. VMLA isn't a fused multiply-add, it's a multiply followed by an add and has different latency as well as precision. On Thu, Nov 8, 2012 at 4:57 AM, Sebastien DELDON-GNB < sebastien.deldon at st.com> wrote: > Hi Anitha,
2013 Feb 08
2
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi all, Everything is in the tile, I would like to enforce generation of vmla.f32 instruction for scalar operations on cortex-a9, so is there a LLMV neon intrinsic available for that ? Thanks for your answers Best Regards Seb -------------- next part -------------- An HTML attachment was scrubbed... URL:
2013 Feb 12
0
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
If this helps taking your decision, there are at least two benchmarks for which disabling vmlx-forwarding makes a significant difference. If I get lucky I may be able to run on a panda board by next week and have more info to share Best Regards Seb ________________________________________ De : Evan Cheng [evan.cheng at apple.com] Date d'envoi : mardi 12 février 2013 16:47 À : Renato Golin Cc