thr3ads.net - similar to: "[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?"

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 08

On 8 February 2013 10:40, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote: > Hi all, > Everything is in the title: I would like to force generation of the vmla.f32 instruction for scalar operations on cortex-a9, so is there an LLVM NEON intrinsic available for that? Hi Sebastien, LLVM doesn't use intrinsics when there is a

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 08

Hi Renato, Thanks for the answer; it confirms what I was suspecting. My problem is that this behavior is controlled by vmlx forwarding on cortex-a9, and despite asking on this list, I couldn't get a clear understanding of what this option is meant for. So here are my new questions: Why is vmlx-forwarding enabled by default for cortex-a9? Is it to guarantee correctness or for performance

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 08

On 8 February 2013 12:28, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote: > Why is vmlx-forwarding enabled by default for cortex-a9? Is it to guarantee correctness or for performance purposes? I've made some experiments, and DISABLING vmlx-forwarding for cortex-a9 leads to generation of more vmla/vmls .f32 and significantly improves some benchmarks. I've not

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 11

Hi Renato, Indeed, the problem is with generation of vmla.f64. The affected benchmark is MILC from the SPEC 2006 suite, and disabling vmlx forwarding gives a 10% speed-up on complete benchmark execution! So it is worth a try. Now, going back to vmla generation through LLVM intrinsic usage: I've looked at the .td file and it seems to me that when there is a "pattern" to generate instruction, no

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 11

In theory, the backend should choose the best instructions for the selected target processor. VMLA is not always the best choice. Lang Hames did some measurements a while back to come up with the current behavior, but I don't remember exactly what he found. CC'ing Lang. On Feb 11, 2013, at 8:12 AM, Renato Golin <renato.golin at linaro.org> wrote: > On 11 February 2013 15:51,

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 11

Hi Bob, Seb, Renato, My VMLA performance work was on Swift, rather than Cortex-A9. Sebastien - is vmlx-forwarding really the only variable you changed between your tests? As far as I can see, the VMLx forwarding attribute only exists to restrict the application of one DAG combine optimization: PerformVMULCombine in ARMISelLowering.cpp, which turns (A + B) * C into (A * C) + (B * C). This
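
The rewrite described in this snippet can be modelled on plain scalars. The following is a minimal sketch, not LLVM's implementation: the function names are hypothetical, and the comments mapping the steps to vmul/vmla are an assumption about why the distributed form matters for multiply-accumulate selection. It only shows that (A + B) * C and (A * C) + (B * C) agree on integers. For floating-point values the two forms can round differently.

```python
def mul_of_sum(a, b, c):
    """Original form: one add followed by one multiply."""
    return (a + b) * c


def distributed(a, b, c):
    """Distributed form: a multiply, then a multiply-accumulate.

    Assumption for illustration: the second step has the acc + x * y
    shape that a multiply-accumulate instruction such as vmla computes.
    """
    acc = a * c          # plain multiply
    return acc + b * c   # multiply-accumulate shape


if __name__ == "__main__":
    # Both forms agree on integer inputs.
    print(mul_of_sum(3, 4, 5), distributed(3, 4, 5))
```

The distributed form trades the add for a second multiply that is fused with the accumulation, which is why the choice between the two forms depends on how a given core executes multiply-accumulate instructions.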

[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 12

On 12 February 2013 16:56, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote: > If this helps you make your decision, there are at least two benchmarks for which disabling vmlx-forwarding makes a significant difference. I think Evan's worry was to base this decision on visible and comprehensible benchmarks, such as the test-suite. If I get lucky I may be able to run

[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 13

Hi Sebastien, How many extra vmlas did you see in 433.milc due to disabling -vmlx-forwarding? As I mentioned earlier, I saw only two additional integer vmlx instructions when I tested. Could you send me your 433.milc compile setup? (os, flags, compiler version, etc.). I'd like to try to reproduce your results. Cheers, Lang. On Tue, Feb 12, 2013 at 9:05 AM, Renato Golin <renato.golin at

[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 15

Hi Lang & Renato, I eventually set up a Panda board with the latest Linaro delivery (eabi-hf). I did some experiments using my own compiler with LLVM 3.2 as the back end. I use the same flag set for my compiler (front end) and just invoke llc with and without the vmlx-forwarding attribute. So the base arguments to llc are: llc -march=arm -mcpu=cortex-a9 -mattr=+neon -float-abi=hard to which I added

[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 15

On 15 February 2013 16:00, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote: > to which I added -mattr=-vmlx-forwarding to disable vmlx forwarding for cortex-a9. > When I DISABLE vmlx forwarding I'm observing a 7% speed-up on the ref dataset for MILC. So I'm observing something similar to what I've observed on the STE platform available on the SNOWBALL board. Hi
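
The two llc invocations being compared in this snippet can be assembled as argument lists. This is a sketch under stated assumptions: the flags are exactly those quoted in the thread, while the helper name llc_command and the input file name milc.ll are hypothetical. The extra -mattr option is kept separate, as quoted. The thread does not say how llc combines repeated -mattr options.

```python
# Base arguments as quoted in the thread.
BASE_ARGS = [
    "llc",
    "-march=arm",
    "-mcpu=cortex-a9",
    "-mattr=+neon",
    "-float-abi=hard",
]


def llc_command(input_file, disable_vmlx_forwarding=False):
    """Return the llc argument list for one of the two configurations."""
    args = list(BASE_ARGS)
    if disable_vmlx_forwarding:
        # Added option from the thread; kept separate, as quoted there.
        args.append("-mattr=-vmlx-forwarding")
    args.append(input_file)  # hypothetical input file name
    return args


if __name__ == "__main__":
    print(" ".join(llc_command("milc.ll")))
    print(" ".join(llc_command("milc.ll", disable_vmlx_forwarding=True)))
```

Building both command lines from one base list keeps vmlx-forwarding as the only variable that differs between the two runs.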

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 12

I did the initial work on vmla formation. The default settings for cortex-a8 / a9 are due to micro-architecture differences (I believe the a8 TRM talks about vmla hazards) and extensive testing. That said, given the limitations of the current pre-RA scheduling pass, it's likely the use of vmla can cause regressions. I'm not opposed to changing the setting for a9. However, it's not a good idea to

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 12

Understood. Same architecture, different micro-arch (implementation). Could it be the case that vmlx-forwarding makes sense for Swift and not for the ARM Cortex-A9 implementation? It is enabled by default when -mcpu=cortex-a9 is used, but tests I have made show significant improvements when it is disabled for Cortex-A9 (STEricsson Nova platform). Best Regards Seb From: David Tweed [mailto:david.tweed at

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 11

On 11 February 2013 15:51, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote: > Indeed, the problem is with generation of vmla.f64. The affected benchmark is MILC from the SPEC 2006 suite, and disabling vmlx forwarding gives a 10% speed-up on complete benchmark execution! So it is worth a try. Hi Sebastien, Indeed, worth having a look. Including Bob Wilson (who introduced the

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 12

On 12 February 2013 10:25, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote: > Same architecture, different micro-arch (implementation). Could it be the case that vmlx-forwarding makes sense for Swift and not for the ARM Cortex-A9 implementation? It is enabled by default when -mcpu=cortex-a9 is used, but tests I have made show significant improvements when disabled for

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 12

Hi all, Sorry for my naïve question, but what is Swift? Yes, vmlx-forwarding is the only variable I changed in my tests. I did the experiment on another popular FP benchmark and observed a 14% speed-up only by disabling vmlx-forwarding. Best Regards Seb > My VMLA performance work was on Swift, rather than Cortex-A9. Sebastien - is vmlx-forwarding really the only variable you changed between your

[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 12

If this helps you make your decision, there are at least two benchmarks for which disabling vmlx-forwarding makes a significant difference. If I get lucky I may be able to run on a Panda board by next week and have more info to share. Best Regards Seb ________________________________________ From: Evan Cheng [evan.cheng at apple.com] Sent: Tuesday, 12 February 2013 16:47 To: Renato Golin Cc

[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 15

Hi Renato, No I've used LNT before and it might not be as simple as you think to get it working here. I'll see what I can do, but it's unlikely I'll have much time to spend on this topic in the coming weeks. I'm more interested in coming back to my original question, and would like to know how to proceed if I want to define my own LLVM intrinsic to generate the VMLA instruction. My

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 11

Hi, | If we find that this optimization produces worse code in more cases than not, then we should leave it disabled by default and let the user enable it when necessary. I'll let Bob follow up on that, since I don't know what benchmarks he used. Note that it may well be the case that the most "generally performant" default may vary between different ARM cores as well as

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 12

| Sorry for my naïve question but what is Swift ? It's a complicated area. There's the standard Cortex-a9 design from ARM; Swift is the CPU that Apple uses in its latest products, which is significantly modified from a basic ARM design; and then there's the next-generation Cortex-a15 design from ARM. Each of them handles the same instruction set, but the implementation

[LLVMdev] fmac generation for cortex-a9

2012 Nov 09

Hi Renato, It's definitely not A15. Could it be the case that the NEON units for cortex-A9 support it but it isn't documented/recommended? And as mentioned before, the code is working! Seb > -----Original Message----- > From: rengolin at gmail.com [mailto:rengolin at gmail.com] On Behalf Of > Renato Golin > Sent: Friday, November 09, 2012 6:27 PM > To: Sebastien DELDON-GNB >