Sebastien DELDON-GNB
2013-Feb-12 10:25 UTC
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Understood: same architecture, different micro-architecture (implementation). Could it be that vmlx-forwarding makes sense for Swift but not for the ARM Cortex-A9 implementation? It is enabled by default when -mcpu=cortex-a9 is used, but the tests we have run show significant improvements when it is disabled on Cortex-A9 (ST-Ericsson Nova platform).

Best Regards
Seb

From: David Tweed [mailto:david.tweed at arm.com]
Sent: Tuesday, February 12, 2013 11:11 AM
To: Sebastien DELDON-GNB; Lang Hames; Bob Wilson
Cc: llvmdev at cs.uiuc.edu
Subject: RE: [LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

| Sorry for my naïve question but what is Swift ?

It's a complicated area. There's the standard Cortex-A9 design from ARM; Swift is the CPU that Apple uses in its latest products, which is significantly modified from a basic ARM design; and then there's the next-generation Cortex-A15 design from ARM. Each of them handles the same instruction set, but the implementation details of each mean that different instruction sequences may perform better on each.

Cheers,
Dave
Renato Golin
2013-Feb-12 11:08 UTC
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
On 12 February 2013 10:25, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote:

> Same architecture, different micro-arch (implementation). Could this be
> the case that vmlx-forwarding make senses for SWIFT and not for ARM
> Cortex-A9 implementation ? It is enabled by default when -mcpu=cortex-a9 is
> used but test have made show significant improvements when disabled for
> cortex-A9 (STEricsson Nova platform).

Hi Sebastien,

The optimization does make sense for Cortex-A9. I remember reviewing the patch myself, and the A9 document clearly states the delays involved between VMLAs and that this was a solution.

However, due to micro-architecture differences (as David explained), it may interfere with other non-Swift steps (or the lack of Swift steps) and produce worse code. It's not uncommon to see "if (isSwift())" around the code generation or optimization passes.

I haven't done any benchmarking on that particular issue, but if you can show that the performance regression occurs on more than one Cortex-A9 core (ST, TI), then I'd be inclined to suggest enabling VMLx-forwarding by default only on Swift.

cheers,
--renato
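For readers following the thread, here is a minimal sketch in portable C of the two code shapes under discussion. It is an illustration only, not taken from the LLVM sources or from any benchmark mentioned here; the function names `dot_mla` and `dot_mul_add` are hypothetical. The first loop is a dependent accumulate chain, the kind of code a compiler targeting NEON/VFP may turn into back-to-back vmla.f32 instructions; the second keeps the multiply and the add as separate steps, as when VMLA formation is avoided. Which instructions are actually emitted depends on the compiler, target and flags.

```c
#include <stddef.h>

/* Hypothetical helper: a single accumulator, so every iteration depends on
 * the result of the previous one. This is the multiply-accumulate shape
 * (acc = acc + a*b) that a backend may select as vmla.f32. */
float dot_mla(const float *a, const float *b, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc = acc + a[i] * b[i];
    }
    return acc;
}

/* Hypothetical helper: the same arithmetic with the product computed as its
 * own step and then added, i.e. the vmul.f32 + vadd.f32 shape. */
float dot_mul_add(const float *a, const float *b, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float prod = a[i] * b[i]; /* multiply step */
        acc = acc + prod;         /* separate add step */
    }
    return acc;
}
```

Because vmla.f32 is a non-fused multiply-accumulate (the product is rounded before the add), both shapes describe the same arithmetic; the discussion in this thread is only about which one runs faster on a given core.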
Evan Cheng
2013-Feb-12 15:47 UTC
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
I did the initial work on vmla formation. The default settings for Cortex-A8 / A9 are due to micro-architecture differences (I believe the A8 TRM talks about vmla hazards) and extensive testing.

That said, given the limitations of the current pre-RA scheduling pass, it's likely that the use of vmla can cause regressions. I'm not opposed to changing the setting for A9. However, it's not a good idea to base the decision on one benchmark. At a minimum, I'd like to see performance data for the entire LLVM test suite.

Evan