thr3ads.net - similar to: "[LLVMdev] ACE claims better result than LLVM for ARM 9 ?"

[LLVMdev] ACE claims better result than LLVM for ARM 9 ?

2012 Jul 27

On Jul 27, 2012, at 9:36 AM, Sebastien DELDON-GNB wrote: > ACE issued the following PR: > http://www.ace.nl/news/aces-cosy-compiler-framework-outperforms-llvm-arm9-processor > Weird that they don't give any numbers and use ARM 9; do they mean Cortex-A9? It's impossible to say. This sort of marketing statement is impossible to refute, because there are no details. Who knows whether

[LLVMdev] fmac generation for cortex-a9

2012 Nov 09

Hi Renato, It's definitely not an A15. Could it be that the NEON unit on Cortex-A9 supports it, but that this isn't documented or recommended? And as mentioned before, the code works! Seb > -----Original Message----- > From: rengolin at gmail.com [mailto:rengolin at gmail.com] On Behalf Of > Renato Golin > Sent: Friday, November 09, 2012 6:27 PM > To: Sebastien DELDON-GNB >

[LLVMdev] fmac generation for cortex-a9

2012 Nov 09

Hi Bastien, Weird: gcc is generating fma for my platform (ST-Ericsson NovaThor with Linaro), and the code works. It also works when I use LLVM to generate fma (using llc -mtriple=armv7-eabi). Maybe someone from ARM can answer the question? Seb From: JF Bastien [mailto:jfb at google.com] Sent: Friday, November 09, 2012 5:36 PM To: Sebastien DELDON-GNB Cc: Anitha Boyapati; llvmdev at cs.uiuc.edu Subject:

[LLVMdev] fmac generation for cortex-a9

2012 Nov 09

cat /proc/cpuinfo ? Are you sure it's generating VFMA and not VMLA? On Fri, Nov 9, 2012 at 9:35 AM, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote: > Hi Renato, > > It's definitely not an A15. Could it be that the NEON unit on > Cortex-A9 supports it, but that this isn't documented or recommended? > And as mentioned before, the code works! > > Seb >

[LLVMdev] fmac generation for cortex-a9

2012 Nov 08

Hi Anitha, Thanks for your answer, but -mcpu=cortex-a9 -mattr=+vfp4 doesn't enable fused MAC generation for me. I would just like to understand why -mtriple=armv7-eabi enables it while -mcpu=cortex-a9 seems to disable it. Seb > -----Original Message----- > From: Anitha Boyapati [mailto:anitha.boyapati at gmail.com] > Sent: Thursday, November 08, 2012 10:22 AM > To: Sebastien

[LLVMdev] Use of movupd instead of movapd for x86

2011 Feb 28

Understood for the aligned case; I want to measure the performance degradation for the unaligned case, that is, unaligned versus aligned. I know this is stupid, but I want to pass a <4 x float>* as a parameter of a routine and, at the call site, pass a misaligned pointer. Since LLVM is generating a movapd instruction, it will raise an exception (SEGFAULT). I just want to know if there

[LLVMdev] fmac generation for cortex-a9

2012 Nov 09

Hi Sebastien, ARMv7-M has VFMA and LLVM's "triple" is far from perfect. Wikipedia tells me NovaThor can also be A15, or STE could have crammed a VFPv4 into it? ;) Or possibly, your code never branches into the VFMA. Many things could be happening, but usually, VFMA shouldn't be generated for A9. A GCC bug, maybe? On 9 November 2012 16:51, Sebastien DELDON-GNB

[LLVMdev] fmac generation for cortex-a9

2012 Nov 09

AFAIK A9 doesn't have VFPv4 or AdvSIMDv2, so it doesn't have VFMA. I don't know what LLVM does, but it shouldn't emit VFMA when you target A9. VMLA isn't a fused multiply-add, it's a multiply followed by an add and has different latency as well as precision. On Thu, Nov 8, 2012 at 4:57 AM, Sebastien DELDON-GNB < sebastien.deldon at st.com> wrote: > Hi Anitha,

[LLVMdev] Use of movupd instead of movapd for x86

2011 Mar 01

On Feb 28, 2011, at 2:58 AM, Sebastien DELDON-GNB wrote: > Understood for the aligned case; I want to measure the performance degradation for the unaligned case, > that is, unaligned versus aligned. I know this is stupid, but I want to pass a <4 x float>* as a parameter of a routine and, at the call site, pass a misaligned pointer. Since LLVM is generating a movapd instruction

[LLVMdev] RE : Question about LLVM NEON intrinsics

2012 Sep 21

Hi Eli, Thanks for the answer; it clarifies the situation for me. Do you know if there is a pass in LLVM that could be adapted to 'legalize' intrinsic calls? Or should I define my own intrinsics for unsupported types? Best Regards Seb ________________________________________ From: Eli Friedman [eli.friedman at gmail.com] Sent: Friday, 21 September 2012 11:54 To: Sebastien

[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 12

On 12 February 2013 16:56, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote: > If this helps you make your decision, there are at least two benchmarks for > which disabling vmlx-forwarding makes a significant difference. > I think Evan's worry was to base this decision on visible and comprehensible benchmarks, such as the test-suite. If I get lucky I may be able to run

[LLVMdev] Use of movupd instead of movapd for x86

2011 Feb 25

Hi all, Is there a way to force llc to generate a movupd instruction instead of movapd for the x86 target? I know that movapd performs better, but I would like to measure the degradation when alignment constraints are not met. Best Regards Seb

[LLVMdev] llc support for ARM predication ?

2012 Jun 04

Hi James, Thanks for the answer. For Cortex-A9, would you recommend generating Thumb-2 code or ARM code? Which would be best performance-wise? Best Regards Seb > -----Original Message----- > From: James Molloy [mailto:james.molloy at arm.com] > Sent: Thursday, May 31, 2012 9:57 AM > To: Sebastien DELDON-GNB > Cc: llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] llc support

[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 13

Hi Sebastien, How many extra vmlas did you see in 433.milc due to disabling -vmlx-forwarding? As I mentioned earlier, I saw only two additional integer vmlx instructions when I tested. Could you send me your 433.milc compile setup? (os, flags, compiler version, etc.). I'd like to try to reproduce your results. Cheers, Lang. On Tue, Feb 12, 2013 at 9:05 AM, Renato Golin <renato.golin at

[LLVMdev] Question about arm thumb2 code generation

2012 Jul 27

Hi all, Does llc -march=thumb -mcpu=cortex-a9 enable generation of Thumb-2 code for ARMv7? Best Regards Seb

[LLVMdev] Question about LLVM NEON intrinsics

2012 Sep 21

On Sep 21, 2012, at 2:58 AM, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote: > Hi Eli, > > Thanks for the answer; it clarifies the situation for me. Do you know if there is a pass in LLVM that could be adapted to 'legalize' intrinsic calls? > Or should I define my own intrinsics for unsupported types? You should never generate these sorts of intrinsics with

[LLVMdev] fmac generation for cortex-a9

2012 Nov 08

Hi all, I have .ll code that uses double-precision fmul/fadd or fmul/fsub. When I compile it using llc -mcpu=cortex-a9, I can't get vmla/vmls generated, even with -fp-contract=fast, but when I use the option -mtriple=armv7-eabi instead of -mcpu=cortex-a9, fused MACs are generated. Can someone explain why? Thanks for your answers Seb

[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 08

Hi Renato, Thanks for the answer; it confirms what I suspected. My problem is that this behavior is controlled by vmlx-forwarding on cortex-a9, and despite asking on this list, I couldn't get a clear understanding of what this option is meant for. So here are my new questions: Why is vmlx-forwarding enabled by default for cortex-a9? Is it to guarantee correctness or for performance

[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

2013 Feb 15

Hi Lang & Renato, I eventually set up a PandaBoard with the latest Linaro delivery (eabi-hf). I ran some experiments using my own compiler with LLVM 3.2 as the back end. I use the same flag set for my compiler (front end) and just invoke llc with and without the vmlx-forwarding attribute. So the base arguments to llc are: llc -march=arm -mcpu=cortex-a9 -mattr=+neon -float-abi=hard to which I added

[LLVMdev] Question about LLVM NEON intrinsics

2012 Sep 21

Hi all, I would like to know whether LLVM NEON intrinsics are designed to support only 'legal' types for NEON units. Using llc -march=arm -mcpu=cortex-a9 vmax4.ll -o vmax4.s on the following .ll code: ; ModuleID = 'vmax.ll' target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-n32" target triple =