similar to: [LLVMdev] Use of movupd instead of movapd for x86

Displaying 20 results from an estimated 2000 matches similar to: "[LLVMdev] Use of movupd instead of movapd for x86"

2011 Feb 28
2
[LLVMdev] Use of movupd instead of movapd for x86
Understood for the aligned case, I want to measure performance degradation for unaligned case. I mean unaligned case versus aligned. I know this is stupid, but I want to try to pass a <4 x float>* as parameter of a routine and at the call site I want to pass a misaligned pointer. Since LLVM is generating movapd instruction it will raise an exception (SEGFAULT), I just want to know if there
2011 Mar 01
0
[LLVMdev] Use of movupd instead of movapd for x86
On Feb 28, 2011, at 2:58 AM, Sebastien DELDON-GNB wrote: > Understood for the aligned case, I want to measure performance degradation for unaligned case. > I mean unaligned case versus aligned. I know this is stupid, but I want to try to pass a <4 x float>* as parameter of a routine and at the call site I want to pass a misaligned pointer. Since LLVM is generating movapd instruction
2011 Feb 25
0
[LLVMdev] Use of movupd instead of movapd for x86
Sebastien DELDON-GNB <sebastien.deldon at st.com> writes: > Hi all, > > Is there a way to force llc to generate movupd instruction instead of movapd for x86 target ? > > I know that movapd is more performant, but I would like to measure degradation when alignment constraints are not met. On modern processors a movupd on aligned data is going to be indistinguishable in
2013 Jul 20
1
[LLVMdev] Another memory alignment issue with SSE operations
Unfortunately, I've ran into a second issue where addpd is being performed on memory that isn't 16 byte aligned. Again, this only happens if the createJIT OptLevel is set to Default (vs None). According to
2012 Nov 09
2
[LLVMdev] fmac generation for cortex-a9
Hi Renato, It's definitively not A15. Can this be the case that NEON units for cortex-A9 support it but isn't documented/recommended ? And as mentioned before code is working ! Seb > -----Original Message----- > From: rengolin at gmail.com [mailto:rengolin at gmail.com] On Behalf Of > Renato Golin > Sent: Friday, November 09, 2012 6:27 PM > To: Sebastien DELDON-GNB >
2012 Nov 09
2
[LLVMdev] fmac generation for cortex-a9
Hi Bastien, Weird gcc is generating fma for my platform STEricsson Novathor with Linaro, code works. It also works when I use LLVM to generate fma (using llc -mtriple=armv7-eabi). Maybe someone from ARM can answer the question ? Seb From: JF Bastien [mailto:jfb at google.com] Sent: Friday, November 09, 2012 5:36 PM To: Sebastien DELDON-GNB Cc: Anitha Boyapati; llvmdev at cs.uiuc.edu Subject:
2012 Nov 09
0
[LLVMdev] fmac generation for cortex-a9
cat /proc/cpuinfo ? Are you sure it's generating VFMA and not VMLA? On Fri, Nov 9, 2012 at 9:35 AM, Sebastien DELDON-GNB < sebastien.deldon at st.com> wrote: > Hi Renato, > > It's definitively not A15. Can this be the case that NEON units for > cortex-A9 support it but isn't documented/recommended ? > And as mentioned before code is working ! > > Seb >
2012 Nov 08
2
[LLVMdev] fmac generation for cortex-a9
Hi Anitha, Thanks for your answer but -mcpu=cortex-a9 -mattr=+vfp4 doesn' t enable fused mac generation for me. I would like just to understand why -mtriple=armv7-eabi enables it while -mcpu=cortex-a9 seems to disable it ? Seb > -----Original Message----- > From: Anitha Boyapati [mailto:anitha.boyapati at gmail.com] > Sent: Thursday, November 08, 2012 10:22 AM > To: Sebastien
2012 Nov 09
0
[LLVMdev] fmac generation for cortex-a9
Hi Sebastien, ARMv7-M has VFMA and LLVM's "triple" is far from perfect. Wikipedia tells me NovaThor can also be A15, or STE could have cramped a VFPv4 in it? ;) Or possibly, your code never branches into the VFMA. Many things could be happening, but usually, VFMA shouldn't be generated for A9. A GCC bug, maybe? On 9 November 2012 16:51, Sebastien DELDON-GNB
2012 Nov 09
0
[LLVMdev] fmac generation for cortex-a9
AFAIK A9 doesn't have VFPv4 or AdvSIMDv2, so it doesn't have VFMA. I don't know what LLVM does, but it shouldn't emit VFMA when you target A9. VMLA isn't a fused multiply-add, it's a multiply followed by an add and has different latency as well as precision. On Thu, Nov 8, 2012 at 4:57 AM, Sebastien DELDON-GNB < sebastien.deldon at st.com> wrote: > Hi Anitha,
2013 Feb 12
3
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
On 12 February 2013 16:56, Sebastien DELDON-GNB <sebastien.deldon at st.com>wrote: > If this helps taking your decision, there are at least two benchmarks for > which disabling vmlx-forwarding makes a significant difference. > I think Evan's worry was to base this decision on visible and comprehensible benchmarks, such as the test-suite. If I get lucky I may be able to run
2012 Sep 21
2
[LLVMdev] RE : Question about LLVM NEON intrinsics
Hi Eli, Thanks for the answer, it clarifies the situation for me. Do you know if there is Pass in LLVM that could be adapted to 'legalize' intrinsics calls ? Or shall I define my own intrinsics for non supported types ? Best Regards Seb ________________________________________ De : Eli Friedman [eli.friedman at gmail.com] Date d'envoi : vendredi 21 septembre 2012 11:54 À : Sebastien
2013 Jul 15
3
[LLVMdev] Enabling the SLP vectorizer by default for -O3
On Jul 14, 2013, at 9:52 PM, Chris Lattner <clattner at apple.com> wrote: > > On Jul 13, 2013, at 11:30 PM, Nadav Rotem <nrotem at apple.com> wrote: > >> Hi, >> >> LLVM’s SLP-vectorizer is a new pass that combines similar independent instructions in a straight-line code. It is currently not enabled by default, and people who want to experiment with it
2013 Feb 13
0
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Sebastien, How many extra vmlas did you see in 433.milc due to disabling -vmlx-forwarding? As I mentioned earlier, I saw only two additional integer vmlx instructions when I tested. Could you send me your 433.milc compile setup? (os, flags, compiler version, etc.). I'd like to try to reproduce your results. Cheers, Lang. On Tue, Feb 12, 2013 at 9:05 AM, Renato Golin <renato.golin at
2013 Feb 11
3
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Renato, Indeed problem is with generation of vmla.f64. Affected benchmark is MILC from SPEC 2006 suite and disabling vmlx forwarding gives a 10% speed-up on complete benchmark execution ! So it is worth a try. Now going back to vmla generation through LLMV intrinsic usage. I've looked at .td file and it seems to me that when there is a "pattern" to generate instruction, no
2012 Jun 04
1
[LLVMdev] llc support for ARM predication ?
Hi James, Thanks for the answer, for Cortex-A9 would you recommend to generate thumb2 code or ARM code ? What would be the best performance wise ? Best Regards Seb > -----Original Message----- > From: James Molloy [mailto:james.molloy at arm.com] > Sent: Thursday, May 31, 2012 9:57 AM > To: Sebastien DELDON-GNB > Cc: llvmdev at cs.uiuc.edu > Subject: Re: [LLVMdev] llc support
2012 Jul 27
2
[LLVMdev] Question about arm thumb2 code generation
Hi all, Does llc -march=thumb -mcpu=cortex-a9 enable generation of thumb2 code for armv7 ? Best Regards Seb -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120727/da758ea0/attachment.html>
2012 Nov 08
2
[LLVMdev] fmac generation for cortex-a9
Hi all, I've a .ll code that use double precision fmul/fadd or fmul/fsub. When I compile it using llc -mcpu=cortex-a9 I couldn't get vmla/vmls generated even using -fp-contract=fast, but when I use option -mtriple=armv7-eabi instead of -mcpu=cortex-a9 fused mac are generated. Can someone explain me why ? Thanks for your answers Seb -------------- next part -------------- An HTML
2012 Sep 21
0
[LLVMdev] Question about LLVM NEON intrinsics
On Sep 21, 2012, at 2:58 AM, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote: > Hi Eli, > > Thanks for the answer, it clarifies the situation for me. Do you know if there is Pass in LLVM that could be adapted to 'legalize' intrinsics calls ? > Or shall I define my own intrinsics for non supported types ? You should never generate these sorts of intrinsics with
2013 Feb 08
2
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi all, Everything is in the tile, I would like to enforce generation of vmla.f32 instruction for scalar operations on cortex-a9, so is there a LLMV neon intrinsic available for that ? Thanks for your answers Best Regards Seb -------------- next part -------------- An HTML attachment was scrubbed... URL: