Sebastien DELDON-GNB
2013-Feb-15 16:00 UTC
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Lang & Renato,

I finally set up a PandaBoard with the latest Linaro release (eabi-hf). I ran some experiments using my own compiler with LLVM 3.2 as the back end. I use the same flag set for my compiler (front end) and just invoke llc with and without the vmlx-forwarding attribute. The base arguments to llc are:

llc -march=arm -mcpu=cortex-a9 -mattr=+neon -float-abi=hard

to which I added -mattr=-vmlx-forwarding to disable vmlx forwarding for Cortex-A9.

When I DISABLE vmlx forwarding, I observe a 7% speed-up on the ref dataset for MILC. So I'm seeing something similar to what I observed on the STE platform available on the Snowball board.

Hope this helps.

Best Regards,
Seb

From: Lang Hames (lhames at gmail.com)
Sent: Wednesday, February 13, 2013 8:31 AM
To: Renato Golin
Cc: Sebastien DELDON-GNB; llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?

Hi Sebastien,

How many extra vmlas did you see in 433.milc due to disabling -vmlx-forwarding? As I mentioned earlier, I saw only two additional integer vmlx instructions when I tested.

Could you send me your 433.milc compile setup (OS, flags, compiler version, etc.)? I'd like to try to reproduce your results.

Cheers,
Lang.

On Tue, Feb 12, 2013 at 9:05 AM, Renato Golin <renato.golin at linaro.org> wrote:

On 12 February 2013 16:56, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote:

If it helps you make your decision, there are at least two benchmarks for which disabling vmlx-forwarding makes a significant difference.

I think Evan's concern was to base this decision on visible and comprehensible benchmarks, such as the test-suite.

If I get lucky, I may be able to run on a PandaBoard by next week and have more info to share.

That'd be great, thanks!
--renato
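A minimal sketch of the kind of experiment described above, assuming a clang front end instead of Seb's own compiler. The kernel is a hypothetical single-precision multiply-accumulate loop (not taken from MILC), and the function name `mac_kernel`, the target triple, and the file names in the comment are all assumptions. The idea is to emit IR once, run llc with the same base flags with and without `-vmlx-forwarding`, and diff the two assembly files for `vmla.f32`.

```c
/*
 * Hypothetical reproducer (not from the thread): acc[i] += b[i] * c[i].
 *
 * Assumed commands (triple, paths and file names are illustrative):
 *   clang -target armv7a-linux-gnueabihf -O2 -S -emit-llvm kernel.c -o kernel.ll
 *   llc -march=arm -mcpu=cortex-a9 -mattr=+neon -float-abi=hard kernel.ll -o fwd.s
 *   llc -march=arm -mcpu=cortex-a9 -mattr=+neon,-vmlx-forwarding -float-abi=hard kernel.ll -o nofwd.s
 *   grep -c vmla fwd.s nofwd.s
 */
#include <stddef.h>

void mac_kernel(float *restrict acc, const float *restrict b,
                const float *restrict c, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        /* Appears as fmul + fadd in the IR; the ARM back end decides
           whether to emit a vmla.f32 or a separate multiply and add. */
        acc[i] += b[i] * c[i];
    }
}
```

Comparing instruction counts this way only shows what the back end chose; the runtime effect still has to be measured on the board.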
Renato Golin
2013-Feb-15 16:16 UTC
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
On 15 February 2013 16:00, Sebastien DELDON-GNB <sebastien.deldon at st.com> wrote:
> to which I added -mattr=-vmlx-forwarding to disable vmlx forwarding for
> cortex-a9.
>
> When I DISABLE vmlx forwarding I'm observing a 7% speed-up on ref dataset
> for MILC. So I'm observing something similar to what I've observed on STE
> platform available on SNOWBALL board.

Hi Seb,

Thanks for doing this. As we expected, it's a general A9 issue.

OK, now we need to get LNT going. Have you ever done that? It's quite simple. Just follow this guide:

http://llvm.org/docs/lnt/quickstart.html

Optionally, compile LLVM+Clang and use the attached run.sh file to run LNT (changing the paths accordingly). That should take a few hours and, if you have the Perf server running, will let you run it twice (with and without vmlx forwarding) and compare the results.

cheers,
--renato

Attachment: run.sh (application/x-sh, 281 bytes): http://lists.llvm.org/pipermail/llvm-dev/attachments/20130215/c249aa4e/attachment.sh
Sebastien DELDON-GNB
2013-Feb-15 17:06 UTC
[LLVMdev] RE : Is there any llvm neon intrinsic that maps to vmla.f32 instruction ?
Hi Renato,

No, I've used LNT before, and it might not be as simple as you think to get it working here. I'll see what I can do, but it's unlikely I'll have much time to spend on this topic in the coming weeks.

I'm more interested in coming back to my original question: how should I proceed if I want to define my own LLVM intrinsic to generate the VMLA instruction? My goal is not to get this work committed to LLVM trunk and thus pollute community work with intrinsics that are only useful to me.

Thanks for your answer.

Best Regards,
Seb
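For context on the original question, a hedged sketch: `arm_neon.h` already provides `vmla_f32`/`vmlaq_f32`, which compute `a + b * c` per lane. As far as I know, clang lowers these to an ordinary fmul followed by fadd in IR rather than to a dedicated LLVM intrinsic, so whether a `vmla.f32` survives into the final assembly is still decided by the ARM back end, which is what the vmlx-forwarding attribute influences. The helper name `mla_f32` is hypothetical; it uses `vmlaq_f32` when NEON is available and a scalar loop otherwise. Check the generated assembly rather than assume the instruction is emitted.

```c
#include <stddef.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: acc[i] = acc[i] + b[i] * c[i] for i in [0, n). */
void mla_f32(float *acc, const float *b, const float *c, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Four lanes at a time; vmlaq_f32(a, b, c) returns a + b * c. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(acc + i);
        float32x4_t vb = vld1q_f32(b + i);
        float32x4_t vc = vld1q_f32(c + i);
        vst1q_f32(acc + i, vmlaq_f32(va, vb, vc));
    }
#endif
    /* Scalar tail, or the whole range on targets without NEON. */
    for (; i < n; ++i)
        acc[i] += b[i] * c[i];
}
```

Pinning the exact instruction regardless of back-end heuristics would need a target-specific intrinsic or inline assembly; this sketch only expresses the operation.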