Evan Cheng
2013-Feb-12  15:47 UTC
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
I did the initial work on vmla formation. The default settings for cortex-a8 / a9 are due to micro-architecture differences (I believe the A8 TRM talks about vmla hazards) and extensive testing. That said, given the limitations of the current pre-RA scheduling pass, it's likely that the use of vmla can cause regressions.

I'm not opposed to changing the setting for a9. However, it's not a good idea to base the decision on one benchmark. At a minimum, I'd like to see performance data for the entire LLVM test suite.

Evan

On Feb 12, 2013, at 3:08 AM, Renato Golin <renato.golin at linaro.org> wrote:

> On 12 February 2013 10:25, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote:
>> Same architecture, different micro-arch (implementation). Could it be that vmlx-forwarding makes sense for Swift but not for the ARM Cortex-A9 implementation? It is enabled by default when -mcpu=cortex-a9 is used, but the tests we have run show significant improvements when it is disabled for Cortex-A9 (ST-Ericsson Nova platform).
>
> Hi Sebastien,
>
> The optimization does make sense for cortex-a9. I remember reviewing the patch myself, and the A9 document clearly states the delays involved between VMLAs and that this was a solution.
>
> However, due to micro-architecture differences (as David explained), it may interfere with other non-Swift steps (or the lack of Swift steps) and produce worse code. It's not uncommon to see "if (isSwift())" around the code generation or optimization passes.
>
> I haven't done any benchmarking on that particular issue, but if you can show that the performance regressions occur on more than one Cortex-A9 core (ST, TI), then I'd be inclined to suggest only enabling VMLx-forwarding by default on Swift.
>
> cheers,
> --renato
Renato Golin
2013-Feb-12  15:53 UTC
[LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
On 12 February 2013 15:47, Evan Cheng <evan.cheng at apple.com> wrote:

> I'm not opposed to changing the setting for a9.

At least until we identify what the problem is and how to fix it, if it's another pass messing up the patterns.

> However, it's not a good idea to base the decision on one benchmark. I'd like to see minimally performance data of the entire llvm test suite.

Absolutely.

--renato
Sebastien DELDON-GNB
2013-Feb-12  16:56 UTC
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
If this helps you decide, there are at least two benchmarks for which disabling vmlx-forwarding makes a significant difference. If I get lucky, I may be able to run on a Panda board by next week and have more info to share.

Best Regards
Seb

________________________________________
From: Evan Cheng [evan.cheng at apple.com]
Sent: Tuesday, 12 February 2013 16:47
To: Renato Golin
Cc: Sebastien DELDON-GNB; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

> I did the initial work on vmla formation. The default settings for cortex-a8 / a9 are due to micro-architecture differences (I believe the A8 TRM talks about vmla hazards) and extensive testing. That said, given the limitations of the current pre-RA scheduling pass, it's likely that the use of vmla can cause regressions.
>
> I'm not opposed to changing the setting for a9. However, it's not a good idea to base the decision on one benchmark. At a minimum, I'd like to see performance data for the entire LLVM test suite.
>
> Evan
Renato Golin
2013-Feb-12  17:05 UTC
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
On 12 February 2013 16:56, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote:

> If this helps taking your decision, there are at least two benchmarks for which disabling vmlx-forwarding makes a significant difference.

I think Evan's point was that this decision should be based on visible and comprehensible benchmarks, such as the test-suite.

> If I get lucky I may be able to run on a panda board by next week and have more info to share

That'd be great, thanks!

--renato
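
For readers following the thread, the source pattern being discussed is a floating-point multiply whose result is immediately accumulated. The sketch below is a minimal illustration in plain C, not code from the thread: the function name `mac_f32` is hypothetical, and whether the ARM backend turns the loop body into vmla.f32 or into a separate vmul.f32 and vadd.f32 depends on the -mcpu setting and the subtarget defaults discussed above. The feature name vmlx-forwarding comes from the thread; passing it to llc as -mattr=-vmlx-forwarding is an assumption about how the feature is exposed, so check it against your LLVM version.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from the thread): acc[i] += a[i] * b[i].
 *
 * The multiply feeding directly into the add is the candidate for
 * multiply-accumulate (vmla.f32) formation on ARM targets with NEON/VFP.
 * Whether the backend fuses it is a per-CPU tuning decision.
 */
void mac_f32(float *acc, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        acc[i] += a[i] * b[i];  /* multiply, then accumulate */
    }
}
```

Compiling this for an ARM target with and without the feature enabled, and comparing the generated assembly, is one way to see which form a given configuration selects. Timing it on more than one Cortex-A9 implementation would be needed before drawing the kind of conclusion asked for in the thread.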