Displaying 20 results from an estimated 1000 matches similar to: "[LLVMdev] vmlx forwarding an cortex A9 question"
2013 Feb 08
2
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Renato,
Thanks for the answer, it confirms what I was suspecting. My problem is that this behavior is controlled by vmlx forwarding on cortex-a9 for which despite asking on this list, I couldn't get a clear understanding what this option is meant for.
So here are my new questions:
Why for cortex-a9 vmlx-forwarding is enabled by default ? Is it to guarantee correctness or for performance
2013 Feb 11
0
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Bob, Seb, Renalto,
My VMLA performance work was on Swift, rather than Cortex-A9.
Sebastian - is vmlx-forwarding really the only variable you changed between
your tests?
As far as I can see the VMLx forwarding attribute only exists to restrict
the application of one DAG combine optimization: PerformVMULCombine in
ARMISelLowering.cpp, which turns (A + B) * C into (A * C) + (B * C). This
2013 Feb 15
2
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Lang & Renato,
I eventually set up a panda board with latest linaro delivery (eabi-hf). I did some experiments using my own compiler and LLVM 3.2 as back-end.
I use same flagset for my compiler (front-end) and just invoke llc with and without vmlx-forwarding attribute. So base arguments to llc are:
llc -march=arm -mcpu=cortex-a9 -mattr=+neon -float-abi=hard
to which I added
2013 Feb 08
0
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
On 8 February 2013 12:28, Sebastien DELDON-GNB <sebastien.deldon at st.com>wrote:
> Why for cortex-a9 vmlx-forwarding is enabled by default ? Is it to
> guarantee correctness or for performance purpose ? I’ve made some
> experiments and DISABLING vmlx-forwarding for cortex-a9 leads to generation
> of more vmla/vmls .f32 and significantly improve some benchmarks. I’ve not
>
2013 Feb 11
3
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Renato,
Indeed problem is with generation of vmla.f64. Affected benchmark is MILC from SPEC 2006 suite and disabling vmlx forwarding gives a 10% speed-up on complete benchmark execution ! So it is worth a try. Now going back to vmla generation through LLMV intrinsic usage. I've looked at .td file and it seems to me that when there is a "pattern" to generate instruction, no
2013 Feb 11
2
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
In theory, the backend should choose the best instructions for the selected target processor. VMLA is not always the best choice. Lang Hames did some measurements a while back to come up with the current behavior, but I don't remember exactly what he found. CC'ing Lang.
On Feb 11, 2013, at 8:12 AM, Renato Golin <renato.golin at linaro.org> wrote:
> On 11 February 2013 15:51,
2012 Dec 05
1
[LLVMdev] vmlx forwarding option for Cortex-A9
Hi all,
Can someone explain me why is vmlx forwarding option enabled for cortex-a9 ? and what the purpose of it ?
Best Regards
Seb
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20121205/6d6a7a03/attachment.html>
2013 Feb 12
3
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
I did the initial work on vmla formation. The default settings for cortex-a8 / a9 due to micro-architecture difference (i believe a8 TRM talks about vmla hazards) and extensive testing. That said, given the limitation of the current pre-RA scheduling pass, it's likely the use of vmla can caused regressions.
Im not opposed to changing the setting for a9. However, it's not a good idea to
2013 Feb 12
2
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi all,
Sorry for my naïve question but what is Swift ?
Yes vmlx-forwarding is the only variable I changed in my tests.
I did the experiment on another popular FP benchmark and observe a 14% speed-up only by disabling vmlx-forwarding.
Best Regards
Seb
My VMLA performance work was on Swift, rather than Cortex-A9.
Sebastian - is vmlx-forwarding really the only variable you changed between your
2013 Feb 13
0
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Sebastien,
How many extra vmlas did you see in 433.milc due to disabling
-vmlx-forwarding?
As I mentioned earlier, I saw only two additional integer vmlx instructions
when I tested.
Could you send me your 433.milc compile setup? (os, flags, compiler
version, etc.). I'd like to try to reproduce your results.
Cheers,
Lang.
On Tue, Feb 12, 2013 at 9:05 AM, Renato Golin <renato.golin at
2013 Feb 15
0
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
On 15 February 2013 16:00, Sebastien DELDON-GNB <sebastien.deldon at st.com>wrote:
> to which I added –mattr=-vmlx-forwarding to disable vmlx forwarding for
> cortex-a9.
>
> When I DISABLE vmlx forwarding I’m observing a 7% speed-up on ref dataset
> for MILC. So I’m observing something similar to what I’ve observed on STE
> platform available on SNOWBALL board.
>
Hi
2013 Feb 12
2
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Understood,
Same architecture, different micro-arch (implementation). Could this be the case that vmlx-forwarding make senses for SWIFT and not for ARM Cortex-A9 implementation ? It is enabled by default when -mcpu=cortex-a9 is used but test have made show significant improvements when disabled for cortex-A9 (STEricsson Nova platform).
Best Regards
Seb
From: David Tweed [mailto:david.tweed at
2013 Feb 12
3
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
On 12 February 2013 16:56, Sebastien DELDON-GNB <sebastien.deldon at st.com>wrote:
> If this helps taking your decision, there are at least two benchmarks for
> which disabling vmlx-forwarding makes a significant difference.
>
I think Evan's worry was to base this decision on visible and
comprehensible benchmarks, such as the test-suite.
If I get lucky I may be able to run
2013 Feb 11
0
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
On 11 February 2013 15:51, Sebastien DELDON-GNB <sebastien.deldon at st.com>wrote:
> Indeed problem is with generation of vmla.f64. Affected benchmark is MILC
> from SPEC 2006 suite and disabling vmlx forwarding gives a 10% speed-up on
> complete benchmark execution ! So it is worth a try.
>
Hi Sebastien,
Ineed, worth having a look. Including Bob Wilson (who introduced the
2013 Dec 18
1
[LLVMdev] LLVM ARM VMLA instruction
Hi,
I was going through Code of LLVM instruction code generation for ARM. I
came across VMLA instruction hazards (Floating point multiply and
accumulate). I was comparing assembly code emitted by LLVM and GCC, where i
saw that GCC was happily using VMLA instruction for floating point while
LLVM never used it, instead it used a pair of VMUL and VADD instruction.
I wanted to know if there is any
2013 Feb 08
0
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
On 8 February 2013 10:40, Sebastien DELDON-GNB <sebastien.deldon at st.com>wrote:
> Hi all,****
>
> ** **
>
> Everything is in the tile, I would like to enforce generation of vmla.f32
> instruction for scalar operations on cortex-a9, so is there a LLMV neon
> intrinsic available for that ?****
>
>
Hi Sebastien,
LLVM doesn't use intrinsics when there is a
2013 Feb 15
1
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Renato,
No I've used LNT before and it might not be as simple as you think to get it working here. I'll see what I can do, but It's unlikely I'll have much time to spend on this topic in the coming weeks.
I'm more interested coming back to my original question, and would like to know how to proceed if I want to define my own LLVM intrinsic to generate VMLA instruction. My
2012 Nov 09
0
[LLVMdev] fmac generation for cortex-a9
cat /proc/cpuinfo ?
Are you sure it's generating VFMA and not VMLA?
On Fri, Nov 9, 2012 at 9:35 AM, Sebastien DELDON-GNB <
sebastien.deldon at st.com> wrote:
> Hi Renato,
>
> It's definitively not A15. Can this be the case that NEON units for
> cortex-A9 support it but isn't documented/recommended ?
> And as mentioned before code is working !
>
> Seb
>
2013 Feb 08
2
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi all,
Everything is in the tile, I would like to enforce generation of vmla.f32 instruction for scalar operations on cortex-a9, so is there a LLMV neon intrinsic available for that ?
Thanks for your answers
Best Regards
Seb
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
2013 Feb 12
0
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
On 12 February 2013 10:25, Sebastien DELDON-GNB <sebastien.deldon at st.com>wrote:
> Same architecture, different micro-arch (implementation). Could this be
> the case that vmlx-forwarding make senses for SWIFT and not for ARM
> Cortex-A9 implementation ? It is enabled by default when –mcpu=cortex-a9 is
> used but test have made show significant improvements when disabled for