search for: vmlx

Displaying 20 results from an estimated 25 matches for "vmlx".

Did you mean: vmla
2012 Dec 20
1
[LLVMdev] vmlx forwarding an cortex A9 question
Hi all, On following code when I use llc targeting ARM Cortex-A9 as follows, if vmlx-forwarding is turned off then 'vmla' instructions are generated. It seems that -mcpu=cortex-a9 enables it by default and thus less 'vmla' instructions are generated. On this specific example it doesn't make any difference in term of performance, but on a more complex example dis...
2013 Feb 15
2
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Lang & Renato, I eventually set up a panda board with latest linaro delivery (eabi-hf). I did some experiments using my own compiler and LLVM 3.2 as back-end. I use same flagset for my compiler (front-end) and just invoke llc with and without vmlx-forwarding attribute. So base arguments to llc are: llc -march=arm -mcpu=cortex-a9 -mattr=+neon -float-abi=hard to which I added -mattr=-vmlx-forwarding to disable vmlx forwarding for cortex-a9. When I DISABLE vmlx forwarding I'm observing a 7% speed-up on ref dataset for MILC. So I'm ob...
2013 Feb 12
3
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
On 12 February 2013 16:56, Sebastien DELDON-GNB <sebastien.deldon at st.com>wrote: > If this helps taking your decision, there are at least two benchmarks for > which disabling vmlx-forwarding makes a significant difference. > I think Evan's worry was to base this decision on visible and comprehensible benchmarks, such as the test-suite. If I get lucky I may be able to run on a panda board by next week and have > more info to share > That'd be great, thank...
2013 Feb 13
0
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Sebastien, How many extra vmlas did you see in 433.milc due to disabling -vmlx-forwarding? As I mentioned earlier, I saw only two additional integer vmlx instructions when I tested. Could you send me your 433.milc compile setup? (os, flags, compiler version, etc.). I'd like to try to reproduce your results. Cheers, Lang. On Tue, Feb 12, 2013 at 9:05 AM, Renato Golin &...
2013 Feb 08
2
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Renato, Thanks for the answer, it confirms what I was suspecting. My problem is that this behavior is controlled by vmlx forwarding on cortex-a9 for which despite asking on this list, I couldn't get a clear understanding what this option is meant for. So here are my new questions: Why for cortex-a9 vmlx-forwarding is enabled by default ? Is it to guarantee correctness or for performance purpose ? I've made so...
2013 Feb 11
0
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Bob, Seb, Renalto, My VMLA performance work was on Swift, rather than Cortex-A9. Sebastian - is vmlx-forwarding really the only variable you changed between your tests? As far as I can see the VMLx forwarding attribute only exists to restrict the application of one DAG combine optimization: PerformVMULCombine in ARMISelLowering.cpp, which turns (A + B) * C into (A * C) + (B * C). This combine onl...
2013 Feb 15
0
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
On 15 February 2013 16:00, Sebastien DELDON-GNB <sebastien.deldon at st.com>wrote: > to which I added –mattr=-vmlx-forwarding to disable vmlx forwarding for > cortex-a9. > > When I DISABLE vmlx forwarding I’m observing a 7% speed-up on ref dataset > for MILC. So I’m observing something similar to what I’ve observed on STE > platform available on SNOWBALL board. > Hi Seb, Thanks for doing thi...
2012 Dec 05
1
[LLVMdev] vmlx forwarding option for Cortex-A9
Hi all, Can someone explain me why is vmlx forwarding option enabled for cortex-a9 ? and what the purpose of it ? Best Regards Seb -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20121205/6d6a7a03/attachment.html>
2013 Feb 11
2
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
...ang. On Feb 11, 2013, at 8:12 AM, Renato Golin <renato.golin at linaro.org> wrote: > On 11 February 2013 15:51, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote: > Indeed problem is with generation of vmla.f64. Affected benchmark is MILC from SPEC 2006 suite and disabling vmlx forwarding gives a 10% speed-up on complete benchmark execution ! So it is worth a try. > > > Hi Sebastien, > > Ineed, worth having a look. Including Bob Wilson (who introduced the code in the first place, and is a connoisseur of NEON in LLVM) to see if he has a better idea of the...
2013 Feb 08
0
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
On 8 February 2013 12:28, Sebastien DELDON-GNB <sebastien.deldon at st.com>wrote: > Why for cortex-a9 vmlx-forwarding is enabled by default ? Is it to > guarantee correctness or for performance purpose ? I’ve made some > experiments and DISABLING vmlx-forwarding for cortex-a9 leads to generation > of more vmla/vmls .f32 and significantly improve some benchmarks. I’ve not > enter into a case...
2013 Feb 12
2
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi all, Sorry for my naïve question but what is Swift ? Yes vmlx-forwarding is the only variable I changed in my tests. I did the experiment on another popular FP benchmark and observe a 14% speed-up only by disabling vmlx-forwarding. Best Regards Seb My VMLA performance work was on Swift, rather than Cortex-A9. Sebastian - is vmlx-forwarding really the onl...
2013 Feb 12
3
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
...Evan Sent from my iPad On Feb 12, 2013, at 3:08 AM, Renato Golin <renato.golin at linaro.org> wrote: > On 12 February 2013 10:25, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote: >> Same architecture, different micro-arch (implementation). Could this be the case that vmlx-forwarding make senses for SWIFT and not for ARM Cortex-A9 implementation ? It is enabled by default when –mcpu=cortex-a9 is used but test have made show significant improvements when disabled for cortex-A9 (STEricsson Nova platform). >> > > Hi Sebastien, > > The optimization d...
2013 Feb 12
2
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Understood, Same architecture, different micro-arch (implementation). Could this be the case that vmlx-forwarding make senses for SWIFT and not for ARM Cortex-A9 implementation ? It is enabled by default when -mcpu=cortex-a9 is used but test have made show significant improvements when disabled for cortex-A9 (STEricsson Nova platform). Best Regards Seb From: David Tweed [mailto:david.tweed at arm....
2013 Feb 11
3
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Renato, Indeed problem is with generation of vmla.f64. Affected benchmark is MILC from SPEC 2006 suite and disabling vmlx forwarding gives a 10% speed-up on complete benchmark execution ! So it is worth a try. Now going back to vmla generation through LLMV intrinsic usage. I've looked at .td file and it seems to me that when there is a "pattern" to generate instruction, no intrinsic is defined to generat...
2013 Feb 15
1
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
...ON-GNB Cc: Lang Hames; llvmdev at cs.uiuc.edu Subject: Re: [LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ? On 15 February 2013 16:00, Sebastien DELDON-GNB <sebastien.deldon at st.com<mailto:sebastien.deldon at st.com>> wrote: to which I added -mattr=-vmlx-forwarding to disable vmlx forwarding for cortex-a9. When I DISABLE vmlx forwarding I'm observing a 7% speed-up on ref dataset for MILC. So I'm observing something similar to what I've observed on STE platform available on SNOWBALL board. Hi Seb, Thanks for doing this, as we expected,...
2013 Feb 12
0
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
If this helps taking your decision, there are at least two benchmarks for which disabling vmlx-forwarding makes a significant difference. If I get lucky I may be able to run on a panda board by next week and have more info to share Best Regards Seb ________________________________________ De : Evan Cheng [evan.cheng at apple.com] Date d'envoi : mardi 12 février 2013 16:47 À : Renato Gol...
2013 Feb 12
0
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
On 12 February 2013 10:25, Sebastien DELDON-GNB <sebastien.deldon at st.com>wrote: > Same architecture, different micro-arch (implementation). Could this be > the case that vmlx-forwarding make senses for SWIFT and not for ARM > Cortex-A9 implementation ? It is enabled by default when –mcpu=cortex-a9 is > used but test have made show significant improvements when disabled for > cortex-A9 (STEricsson Nova platform). > Hi Sebastien, The optimization does make s...
2013 Feb 08
0
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
On 8 February 2013 10:40, Sebastien DELDON-GNB <sebastien.deldon at st.com>wrote: > Hi all,**** > > ** ** > > Everything is in the tile, I would like to enforce generation of vmla.f32 > instruction for scalar operations on cortex-a9, so is there a LLMV neon > intrinsic available for that ?**** > > Hi Sebastien, LLVM doesn't use intrinsics when there is a
2013 Feb 08
2
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi all, Everything is in the tile, I would like to enforce generation of vmla.f32 instruction for scalar operations on cortex-a9, so is there a LLMV neon intrinsic available for that ? Thanks for your answers Best Regards Seb -------------- next part -------------- An HTML attachment was scrubbed... URL:
2013 Dec 18
1
[LLVMdev] LLVM ARM VMLA instruction
Hi, I was going through Code of LLVM instruction code generation for ARM. I came across VMLA instruction hazards (Floating point multiply and accumulate). I was comparing assembly code emitted by LLVM and GCC, where i saw that GCC was happily using VMLA instruction for floating point while LLVM never used it, instead it used a pair of VMUL and VADD instruction. I wanted to know if there is any