thr3ads.net - similar to: "[LLVMdev] Is it possible to define a LLVM intrinsic that expands in more than one instructions ?"

Displaying 20 results from an estimated 2000 matches similar to: "[LLVMdev] Is it possible to define a LLVM intrinsic that expands in more than one instructions ?"

[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 12

[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

On 12 February 2013 16:56, Sebastien DELDON-GNB <sebastien.deldon at st.com>wrote: > If this helps taking your decision, there are at least two benchmarks for > which disabling vmlx-forwarding makes a significant difference. > I think Evan's worry was to base this decision on visible and comprehensible benchmarks, such as the test-suite. If I get lucky I may be able to run

[LLVMdev] fmac generation for cortex-a9

2012 Nov 09

[LLVMdev] fmac generation for cortex-a9

Hi Renato, It's definitively not A15. Can this be the case that NEON units for cortex-A9 support it but isn't documented/recommended ? And as mentioned before code is working ! Seb > -----Original Message----- > From: rengolin at gmail.com [mailto:rengolin at gmail.com] On Behalf Of > Renato Golin > Sent: Friday, November 09, 2012 6:27 PM > To: Sebastien DELDON-GNB >

[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 13

[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

Hi Sebastien, How many extra vmlas did you see in 433.milc due to disabling -vmlx-forwarding? As I mentioned earlier, I saw only two additional integer vmlx instructions when I tested. Could you send me your 433.milc compile setup? (os, flags, compiler version, etc.). I'd like to try to reproduce your results. Cheers, Lang. On Tue, Feb 12, 2013 at 9:05 AM, Renato Golin <renato.golin at

[LLVMdev] fmac generation for cortex-a9

2012 Nov 09

[LLVMdev] fmac generation for cortex-a9

Hi Bastien, Weird gcc is generating fma for my platform STEricsson Novathor with Linaro, code works. It also works when I use LLVM to generate fma (using llc -mtriple=armv7-eabi). Maybe someone from ARM can answer the question ? Seb From: JF Bastien [mailto:jfb at google.com] Sent: Friday, November 09, 2012 5:36 PM To: Sebastien DELDON-GNB Cc: Anitha Boyapati; llvmdev at cs.uiuc.edu Subject:

[LLVMdev] fmac generation for cortex-a9

2012 Nov 09

[LLVMdev] fmac generation for cortex-a9

cat /proc/cpuinfo ? Are you sure it's generating VFMA and not VMLA? On Fri, Nov 9, 2012 at 9:35 AM, Sebastien DELDON-GNB < sebastien.deldon at st.com> wrote: > Hi Renato, > > It's definitively not A15. Can this be the case that NEON units for > cortex-A9 support it but isn't documented/recommended ? > And as mentioned before code is working ! > > Seb >

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 08

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

Hi Renato, Thanks for the answer, it confirms what I was suspecting. My problem is that this behavior is controlled by vmlx forwarding on cortex-a9 for which despite asking on this list, I couldn't get a clear understanding what this option is meant for. So here are my new questions: Why for cortex-a9 vmlx-forwarding is enabled by default ? Is it to guarantee correctness or for performance

[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 15

[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

Hi Lang & Renato, I eventually set up a panda board with latest linaro delivery (eabi-hf). I did some experiments using my own compiler and LLVM 3.2 as back-end. I use same flagset for my compiler (front-end) and just invoke llc with and without vmlx-forwarding attribute. So base arguments to llc are: llc -march=arm -mcpu=cortex-a9 -mattr=+neon -float-abi=hard to which I added

[LLVMdev] fmac generation for cortex-a9

2012 Nov 08

[LLVMdev] fmac generation for cortex-a9

Hi Anitha, Thanks for your answer but -mcpu=cortex-a9 -mattr=+vfp4 doesn' t enable fused mac generation for me. I would like just to understand why -mtriple=armv7-eabi enables it while -mcpu=cortex-a9 seems to disable it ? Seb > -----Original Message----- > From: Anitha Boyapati [mailto:anitha.boyapati at gmail.com] > Sent: Thursday, November 08, 2012 10:22 AM > To: Sebastien

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 08

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

Hi all, Everything is in the tile, I would like to enforce generation of vmla.f32 instruction for scalar operations on cortex-a9, so is there a LLMV neon intrinsic available for that ? Thanks for your answers Best Regards Seb -------------- next part -------------- An HTML attachment was scrubbed... URL:

[LLVMdev] Use of movupd instead of movapd for x86

2011 Feb 28

[LLVMdev] Use of movupd instead of movapd for x86

Understood for the aligned case, I want to measure performance degradation for unaligned case. I mean unaligned case versus aligned. I know this is stupid, but I want to try to pass a <4 x float>* as parameter of a routine and at the call site I want to pass a misaligned pointer. Since LLVM is generating movapd instruction it will raise an exception (SEGFAULT), I just want to know if there

[LLVMdev] fmac generation for cortex-a9

2012 Nov 09

[LLVMdev] fmac generation for cortex-a9

Hi Sebastien, ARMv7-M has VFMA and LLVM's "triple" is far from perfect. Wikipedia tells me NovaThor can also be A15, or STE could have cramped a VFPv4 in it? ;) Or possibly, your code never branches into the VFMA. Many things could be happening, but usually, VFMA shouldn't be generated for A9. A GCC bug, maybe? On 9 November 2012 16:51, Sebastien DELDON-GNB

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 12

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

I did the initial work on vmla formation. The default settings for cortex-a8 / a9 due to micro-architecture difference (i believe a8 TRM talks about vmla hazards) and extensive testing. That said, given the limitation of the current pre-RA scheduling pass, it's likely the use of vmla can caused regressions. Im not opposed to changing the setting for a9. However, it's not a good idea to

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 08

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

On 8 February 2013 12:28, Sebastien DELDON-GNB <sebastien.deldon at st.com>wrote: > Why for cortex-a9 vmlx-forwarding is enabled by default ? Is it to > guarantee correctness or for performance purpose ? I’ve made some > experiments and DISABLING vmlx-forwarding for cortex-a9 leads to generation > of more vmla/vmls .f32 and significantly improve some benchmarks. I’ve not >

[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 15

[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

On 15 February 2013 16:00, Sebastien DELDON-GNB <sebastien.deldon at st.com>wrote: > to which I added –mattr=-vmlx-forwarding to disable vmlx forwarding for > cortex-a9. > > When I DISABLE vmlx forwarding I’m observing a 7% speed-up on ref dataset > for MILC. So I’m observing something similar to what I’ve observed on STE > platform available on SNOWBALL board. > Hi

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 08

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

On 8 February 2013 10:40, Sebastien DELDON-GNB <sebastien.deldon at st.com>wrote: > Hi all,**** > > ** ** > > Everything is in the tile, I would like to enforce generation of vmla.f32 > instruction for scalar operations on cortex-a9, so is there a LLMV neon > intrinsic available for that ?**** > > Hi Sebastien, LLVM doesn't use intrinsics when there is a

[LLVMdev] fmac generation for cortex-a9

2012 Nov 09

[LLVMdev] fmac generation for cortex-a9

AFAIK A9 doesn't have VFPv4 or AdvSIMDv2, so it doesn't have VFMA. I don't know what LLVM does, but it shouldn't emit VFMA when you target A9. VMLA isn't a fused multiply-add, it's a multiply followed by an add and has different latency as well as precision. On Thu, Nov 8, 2012 at 4:57 AM, Sebastien DELDON-GNB < sebastien.deldon at st.com> wrote: > Hi Anitha,

[LLVMdev] Use of movupd instead of movapd for x86

2011 Mar 01

[LLVMdev] Use of movupd instead of movapd for x86

On Feb 28, 2011, at 2:58 AM, Sebastien DELDON-GNB wrote: > Understood for the aligned case, I want to measure performance degradation for unaligned case. > I mean unaligned case versus aligned. I know this is stupid, but I want to try to pass a <4 x float>* as parameter of a routine and at the call site I want to pass a misaligned pointer. Since LLVM is generating movapd instruction

[LLVMdev] RE : Question about LLVM NEON intrinsics

2012 Sep 21

[LLVMdev] RE : Question about LLVM NEON intrinsics

Hi Eli, Thanks for the answer, it clarifies the situation for me. Do you know if there is Pass in LLVM that could be adapted to 'legalize' intrinsics calls ? Or shall I define my own intrinsics for non supported types ? Best Regards Seb ________________________________________ De : Eli Friedman [eli.friedman at gmail.com] Date d'envoi : vendredi 21 septembre 2012 11:54 À : Sebastien

[LLVMdev] Use of movupd instead of movapd for x86

2011 Feb 25

[LLVMdev] Use of movupd instead of movapd for x86

Hi all, Is there a way to force llc to generate movupd instruction instead of movapd for x86 target ? I know that movapd is more performant, but I would like to measure degradation when alignment constraints are not met. Best Regards Seb -------------- next part -------------- An HTML attachment was scrubbed... URL:

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 12

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

Understood, Same architecture, different micro-arch (implementation). Could this be the case that vmlx-forwarding make senses for SWIFT and not for ARM Cortex-A9 implementation ? It is enabled by default when -mcpu=cortex-a9 is used but test have made show significant improvements when disabled for cortex-A9 (STEricsson Nova platform). Best Regards Seb From: David Tweed [mailto:david.tweed at

similar to: [LLVMdev] Is it possible to define a LLVM intrinsic that expands in more than one instructions ?