Displaying 20 results from an estimated 3000 matches similar to: "[LLVMdev] LLVM target specific built-ins"
2012 Jun 13
0
[LLVMdev] LLVM target specific built-ins
The easiest way is probably to look at the source tree and look at the intrinsicsARM.td in llvm/include/llvm directory.
Micah
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Sebastien DELDON-GNB
Sent: Wednesday, June 13, 2012 6:16 AM
To: LLVMdev at cs.uiuc.edu
Subject: [LLVMdev] LLVM target specific built-ins
Hi all,
Does someone knows if there is an
2013 Feb 08
2
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi all,
Everything is in the tile, I would like to enforce generation of vmla.f32 instruction for scalar operations on cortex-a9, so is there a LLMV neon intrinsic available for that ?
Thanks for your answers
Best Regards
Seb
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
2013 Feb 08
2
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Renato,
Thanks for the answer, it confirms what I was suspecting. My problem is that this behavior is controlled by vmlx forwarding on cortex-a9 for which despite asking on this list, I couldn't get a clear understanding what this option is meant for.
So here are my new questions:
Why for cortex-a9 vmlx-forwarding is enabled by default ? Is it to guarantee correctness or for performance
2013 Feb 08
0
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
On 8 February 2013 10:40, Sebastien DELDON-GNB <sebastien.deldon at st.com>wrote:
> Hi all,****
>
> ** **
>
> Everything is in the tile, I would like to enforce generation of vmla.f32
> instruction for scalar operations on cortex-a9, so is there a LLMV neon
> intrinsic available for that ?****
>
>
Hi Sebastien,
LLVM doesn't use intrinsics when there is a
2013 Feb 11
2
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
In theory, the backend should choose the best instructions for the selected target processor. VMLA is not always the best choice. Lang Hames did some measurements a while back to come up with the current behavior, but I don't remember exactly what he found. CC'ing Lang.
On Feb 11, 2013, at 8:12 AM, Renato Golin <renato.golin at linaro.org> wrote:
> On 11 February 2013 15:51,
2013 Feb 11
3
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Renato,
Indeed problem is with generation of vmla.f64. Affected benchmark is MILC from SPEC 2006 suite and disabling vmlx forwarding gives a 10% speed-up on complete benchmark execution ! So it is worth a try. Now going back to vmla generation through LLMV intrinsic usage. I've looked at .td file and it seems to me that when there is a "pattern" to generate instruction, no
2013 Feb 08
0
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
On 8 February 2013 12:28, Sebastien DELDON-GNB <sebastien.deldon at st.com>wrote:
> Why for cortex-a9 vmlx-forwarding is enabled by default ? Is it to
> guarantee correctness or for performance purpose ? I’ve made some
> experiments and DISABLING vmlx-forwarding for cortex-a9 leads to generation
> of more vmla/vmls .f32 and significantly improve some benchmarks. I’ve not
>
2012 Nov 09
2
[LLVMdev] fmac generation for cortex-a9
Hi Renato,
It's definitively not A15. Can this be the case that NEON units for cortex-A9 support it but isn't documented/recommended ?
And as mentioned before code is working !
Seb
> -----Original Message-----
> From: rengolin at gmail.com [mailto:rengolin at gmail.com] On Behalf Of
> Renato Golin
> Sent: Friday, November 09, 2012 6:27 PM
> To: Sebastien DELDON-GNB
>
2013 Feb 11
0
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Bob, Seb, Renalto,
My VMLA performance work was on Swift, rather than Cortex-A9.
Sebastian - is vmlx-forwarding really the only variable you changed between
your tests?
As far as I can see the VMLx forwarding attribute only exists to restrict
the application of one DAG combine optimization: PerformVMULCombine in
ARMISelLowering.cpp, which turns (A + B) * C into (A * C) + (B * C). This
2012 Nov 09
2
[LLVMdev] fmac generation for cortex-a9
Hi Bastien,
Weird gcc is generating fma for my platform STEricsson Novathor with Linaro, code works. It also works when I use LLVM to generate fma (using llc -mtriple=armv7-eabi). Maybe someone from ARM can answer the question ?
Seb
From: JF Bastien [mailto:jfb at google.com]
Sent: Friday, November 09, 2012 5:36 PM
To: Sebastien DELDON-GNB
Cc: Anitha Boyapati; llvmdev at cs.uiuc.edu
Subject:
2012 Nov 09
0
[LLVMdev] fmac generation for cortex-a9
cat /proc/cpuinfo ?
Are you sure it's generating VFMA and not VMLA?
On Fri, Nov 9, 2012 at 9:35 AM, Sebastien DELDON-GNB <
sebastien.deldon at st.com> wrote:
> Hi Renato,
>
> It's definitively not A15. Can this be the case that NEON units for
> cortex-A9 support it but isn't documented/recommended ?
> And as mentioned before code is working !
>
> Seb
>
2012 Nov 08
2
[LLVMdev] fmac generation for cortex-a9
Hi Anitha,
Thanks for your answer but -mcpu=cortex-a9 -mattr=+vfp4 doesn' t enable fused mac generation for me.
I would like just to understand why -mtriple=armv7-eabi enables it while -mcpu=cortex-a9 seems to disable it ?
Seb
> -----Original Message-----
> From: Anitha Boyapati [mailto:anitha.boyapati at gmail.com]
> Sent: Thursday, November 08, 2012 10:22 AM
> To: Sebastien
2012 Nov 09
0
[LLVMdev] fmac generation for cortex-a9
Hi Sebastien,
ARMv7-M has VFMA and LLVM's "triple" is far from perfect.
Wikipedia tells me NovaThor can also be A15, or STE could have cramped
a VFPv4 in it? ;) Or possibly, your code never branches into the VFMA.
Many things could be happening, but usually, VFMA shouldn't be
generated for A9.
A GCC bug, maybe?
On 9 November 2012 16:51, Sebastien DELDON-GNB
2012 Jun 04
1
[LLVMdev] llc support for ARM predication ?
Hi James,
Thanks for the answer, for Cortex-A9 would you recommend to generate thumb2 code or ARM code ? What would be the best performance wise ?
Best Regards
Seb
> -----Original Message-----
> From: James Molloy [mailto:james.molloy at arm.com]
> Sent: Thursday, May 31, 2012 9:57 AM
> To: Sebastien DELDON-GNB
> Cc: llvmdev at cs.uiuc.edu
> Subject: Re: [LLVMdev] llc support
2011 Feb 28
2
[LLVMdev] Use of movupd instead of movapd for x86
Understood for the aligned case, I want to measure performance degradation for unaligned case.
I mean unaligned case versus aligned. I know this is stupid, but I want to try to pass a <4 x float>* as parameter of a routine and at the call site I want to pass a misaligned pointer. Since LLVM is generating movapd instruction it will raise an exception (SEGFAULT), I just want to know if there
2012 May 30
2
[LLVMdev] llc support for ARM predication ?
Hi James,
Thanks for the answer, can you elaborate on difference between thumb, thumb2, ARM, thumbv7.
I'm a bit lost right now. When specifying thumbv7 llc will generate thumb only code, not thumb2 ?
Best Regards
Seb
> -----Original Message-----
> From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu]
> On Behalf Of James Molloy
> Sent: Tuesday, May 29,
2012 Nov 09
0
[LLVMdev] fmac generation for cortex-a9
AFAIK A9 doesn't have VFPv4 or AdvSIMDv2, so it doesn't have VFMA. I don't
know what LLVM does, but it shouldn't emit VFMA when you target A9. VMLA
isn't a fused multiply-add, it's a multiply followed by an add and has
different latency as well as precision.
On Thu, Nov 8, 2012 at 4:57 AM, Sebastien DELDON-GNB <
sebastien.deldon at st.com> wrote:
> Hi Anitha,
2012 Jul 05
3
[LLVMdev] Vector argument passing abi for ARM ?
Hi Rotem,
Thanks for the quick answer, how do I know which type is legal/illegal with respect to calling convention ?
Best Regards
Seb
> -----Original Message-----
> From: Rotem, Nadav [mailto:nadav.rotem at intel.com]
> Sent: Thursday, July 05, 2012 11:21 AM
> To: Sebastien DELDON-GNB; llvmdev at cs.uiuc.edu
> Subject: RE: Vector argument passing abi for ARM ?
>
> The
2012 Sep 21
2
[LLVMdev] RE : Question about LLVM NEON intrinsics
Hi Eli,
Thanks for the answer, it clarifies the situation for me. Do you know if there is Pass in LLVM that could be adapted to 'legalize' intrinsics calls ?
Or shall I define my own intrinsics for non supported types ?
Best Regards
Seb
________________________________________
De : Eli Friedman [eli.friedman at gmail.com]
Date d'envoi : vendredi 21 septembre 2012 11:54
À : Sebastien
2013 Feb 15
2
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Lang & Renato,
I eventually set up a panda board with latest linaro delivery (eabi-hf). I did some experiments using my own compiler and LLVM 3.2 as back-end.
I use same flagset for my compiler (front-end) and just invoke llc with and without vmlx-forwarding attribute. So base arguments to llc are:
llc -march=arm -mcpu=cortex-a9 -mattr=+neon -float-abi=hard
to which I added