Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] fmac generation for cortex-a9"
2012 Nov 08
2
[LLVMdev] fmac generation for cortex-a9
Hi Anitha,
Thanks for your answer but -mcpu=cortex-a9 -mattr=+vfp4 doesn' t enable fused mac generation for me.
I would like just to understand why -mtriple=armv7-eabi enables it while -mcpu=cortex-a9 seems to disable it ?
Seb
> -----Original Message-----
> From: Anitha Boyapati [mailto:anitha.boyapati at gmail.com]
> Sent: Thursday, November 08, 2012 10:22 AM
> To: Sebastien
2012 Nov 09
2
[LLVMdev] fmac generation for cortex-a9
Hi Renato,
It's definitively not A15. Can this be the case that NEON units for cortex-A9 support it but isn't documented/recommended ?
And as mentioned before code is working !
Seb
> -----Original Message-----
> From: rengolin at gmail.com [mailto:rengolin at gmail.com] On Behalf Of
> Renato Golin
> Sent: Friday, November 09, 2012 6:27 PM
> To: Sebastien DELDON-GNB
>
2012 Nov 09
2
[LLVMdev] fmac generation for cortex-a9
Hi Bastien,
Weird gcc is generating fma for my platform STEricsson Novathor with Linaro, code works. It also works when I use LLVM to generate fma (using llc -mtriple=armv7-eabi). Maybe someone from ARM can answer the question ?
Seb
From: JF Bastien [mailto:jfb at google.com]
Sent: Friday, November 09, 2012 5:36 PM
To: Sebastien DELDON-GNB
Cc: Anitha Boyapati; llvmdev at cs.uiuc.edu
Subject:
2012 Nov 09
0
[LLVMdev] fmac generation for cortex-a9
AFAIK A9 doesn't have VFPv4 or AdvSIMDv2, so it doesn't have VFMA. I don't
know what LLVM does, but it shouldn't emit VFMA when you target A9. VMLA
isn't a fused multiply-add, it's a multiply followed by an add and has
different latency as well as precision.
On Thu, Nov 8, 2012 at 4:57 AM, Sebastien DELDON-GNB <
sebastien.deldon at st.com> wrote:
> Hi Anitha,
2012 Nov 09
0
[LLVMdev] fmac generation for cortex-a9
cat /proc/cpuinfo ?
Are you sure it's generating VFMA and not VMLA?
On Fri, Nov 9, 2012 at 9:35 AM, Sebastien DELDON-GNB <
sebastien.deldon at st.com> wrote:
> Hi Renato,
>
> It's definitively not A15. Can this be the case that NEON units for
> cortex-A9 support it but isn't documented/recommended ?
> And as mentioned before code is working !
>
> Seb
>
2012 Nov 08
0
[LLVMdev] fmac generation for cortex-a9
On 8 November 2012 13:56, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote:
> Hi all,
>
>
>
>
>
> I’ve a .ll code that use double precision fmul/fadd or fmul/fsub. When I
> compile it using llc –mcpu=cortex-a9 I couldn’t get vmla/vmls generated even
> using –fp-contract=fast, but when I use option –mtriple=armv7-eabi instead
> of –mcpu=cortex-a9 fused mac
2012 Nov 09
0
[LLVMdev] fmac generation for cortex-a9
Hi Sebastien,
ARMv7-M has VFMA and LLVM's "triple" is far from perfect.
Wikipedia tells me NovaThor can also be A15, or STE could have cramped
a VFPv4 in it? ;) Or possibly, your code never branches into the VFMA.
Many things could be happening, but usually, VFMA shouldn't be
generated for A9.
A GCC bug, maybe?
On 9 November 2012 16:51, Sebastien DELDON-GNB
2012 Nov 12
1
[LLVMdev] RE : fmac generation for cortex-a9
Hi Renato,
You're right it's VMLA/VMLS that are generated. Still don't understand what drives generation for Cortex-A9.
I was using fmac for floating point MAC not for fused MAC. Than I realized that we spoke about fma instead of fmac.
So back to the original problem why when using -mcpu=cortex-a9 VMLA/VMLS are not generated and when I use -mtriple=armv7-eabi they are ?
Best
2013 Feb 08
2
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Renato,
Thanks for the answer, it confirms what I was suspecting. My problem is that this behavior is controlled by vmlx forwarding on cortex-a9 for which despite asking on this list, I couldn't get a clear understanding what this option is meant for.
So here are my new questions:
Why for cortex-a9 vmlx-forwarding is enabled by default ? Is it to guarantee correctness or for performance
2013 Feb 08
0
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
On 8 February 2013 12:28, Sebastien DELDON-GNB <sebastien.deldon at st.com>wrote:
> Why for cortex-a9 vmlx-forwarding is enabled by default ? Is it to
> guarantee correctness or for performance purpose ? I’ve made some
> experiments and DISABLING vmlx-forwarding for cortex-a9 leads to generation
> of more vmla/vmls .f32 and significantly improve some benchmarks. I’ve not
>
2013 Feb 08
2
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi all,
Everything is in the tile, I would like to enforce generation of vmla.f32 instruction for scalar operations on cortex-a9, so is there a LLMV neon intrinsic available for that ?
Thanks for your answers
Best Regards
Seb
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
2013 Feb 08
0
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
On 8 February 2013 10:40, Sebastien DELDON-GNB <sebastien.deldon at st.com>wrote:
> Hi all,****
>
> ** **
>
> Everything is in the tile, I would like to enforce generation of vmla.f32
> instruction for scalar operations on cortex-a9, so is there a LLMV neon
> intrinsic available for that ?****
>
>
Hi Sebastien,
LLVM doesn't use intrinsics when there is a
2013 Feb 11
3
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Renato,
Indeed problem is with generation of vmla.f64. Affected benchmark is MILC from SPEC 2006 suite and disabling vmlx forwarding gives a 10% speed-up on complete benchmark execution ! So it is worth a try. Now going back to vmla generation through LLMV intrinsic usage. I've looked at .td file and it seems to me that when there is a "pattern" to generate instruction, no
2012 Dec 20
1
[LLVMdev] vmlx forwarding an cortex A9 question
Hi all,
On following code when I use llc targeting ARM Cortex-A9 as follows, if vmlx-forwarding is turned off then 'vmla' instructions are generated. It seems that -mcpu=cortex-a9 enables it by default and thus less 'vmla' instructions are generated. On this specific example it doesn't make any difference in term of performance, but on a more complex example disabling
2013 Feb 12
3
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
On 12 February 2013 16:56, Sebastien DELDON-GNB <sebastien.deldon at st.com>wrote:
> If this helps taking your decision, there are at least two benchmarks for
> which disabling vmlx-forwarding makes a significant difference.
>
I think Evan's worry was to base this decision on visible and
comprehensible benchmarks, such as the test-suite.
If I get lucky I may be able to run
2013 Feb 13
0
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Sebastien,
How many extra vmlas did you see in 433.milc due to disabling
-vmlx-forwarding?
As I mentioned earlier, I saw only two additional integer vmlx instructions
when I tested.
Could you send me your 433.milc compile setup? (os, flags, compiler
version, etc.). I'd like to try to reproduce your results.
Cheers,
Lang.
On Tue, Feb 12, 2013 at 9:05 AM, Renato Golin <renato.golin at
2013 Feb 12
3
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
I did the initial work on vmla formation. The default settings for cortex-a8 / a9 due to micro-architecture difference (i believe a8 TRM talks about vmla hazards) and extensive testing. That said, given the limitation of the current pre-RA scheduling pass, it's likely the use of vmla can caused regressions.
Im not opposed to changing the setting for a9. However, it's not a good idea to
2013 Feb 15
2
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Lang & Renato,
I eventually set up a panda board with latest linaro delivery (eabi-hf). I did some experiments using my own compiler and LLVM 3.2 as back-end.
I use same flagset for my compiler (front-end) and just invoke llc with and without vmlx-forwarding attribute. So base arguments to llc are:
llc -march=arm -mcpu=cortex-a9 -mattr=+neon -float-abi=hard
to which I added
2013 Mar 12
2
[LLVMdev] Help needed on debugging llvm
I'm still slightly confused. Is the error now fixed or is there still a bug
in LLVM's integrated assembler?
On Mon, Mar 11, 2013 at 4:49 AM, Anitha B Gollamudi <
anitha.boyapati at gmail.com> wrote:
> On 11 March 2013 17:00, Duncan Sands <baldrick at free.fr> wrote:
> > Hi Anitha,
> >
> >
> >> Ah, I am taking back my above words w.r.t encoding.
2013 Jan 22
1
[LLVMdev] Help needed on debugging llvm
Are you still having issues with FMA4? I wonder if PR15040 is related. A
fix was just committed.
On Wed, Nov 7, 2012 at 3:22 AM, Anitha Boyapati
<anitha.boyapati at gmail.com>wrote:
>
>
> On 7 November 2012 15:29, Duncan Sands <baldrick at free.fr> wrote:
>
>>
>> That way the output should be exactly the same as the output dragonegg
>> would
>>