Chandler Carruth
2014-Sep-04 09:45 UTC
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Greetings all,

As you may have noticed, there is a new vector shuffle lowering path in the X86 backend. You can try it out with the '-x86-experimental-vector-shuffle-lowering' flag to llc, or '-mllvm -x86-experimental-vector-shuffle-lowering' to clang. Please test it out!

There may be some correctness bugs; I'm still fuzz testing it to shake them out. But I expect fairly few of those.

I don't have any test cases which regress in performance with the new shuffle lowering. I have several which improve by 1-3%, and a couple which improve by 5-10%. YMMV.

There are still some missing features: AVX2 shuffles, SSE4.1 blends, and handling of all possible uses of the "mov*" style shuffles. However, as indicated, I don't have any test cases on any microarchitectures that are really showing regressions here. It's entirely possible I just don't have access to them, so please help me benchmark!

Provided there aren't really terrible regressions in performance, I'd like to switch the default in a couple of days and start getting bug reports about what doesn't work yet. I've already talked to a couple of the regular contributors to the x86 backend and they seem pretty happy, so I just wanted to send a wider reaching email in case some folks had a chance to benchmark more.

Inevitably, there will be some regressions, but they can be handled and fixed like anything else, provided they don't cause lots of trouble for folks.

Thanks,
-Chandler
Robert Lougher
2014-Sep-05 16:32 UTC
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Hi Chandler,

I've done some informal benchmarking on an AMD Jaguar core (amd16h) with and without the experimental flag. The tests were a mixture of FP and integer tests. I didn't see any significant performance regression, with most of the differences being in the noise (less than 1%). One test, however, did show a performance improvement of ~4%.

Unfortunately, another team, while doing internal testing, has seen the new path generating illegal insertps masks. A sample here:

    vinsertps $256, %xmm0, %xmm13, %xmm4    # xmm4 = xmm0[0],xmm13[1,2,3]
    vinsertps $256, %xmm1, %xmm0, %xmm6     # xmm6 = xmm1[0],xmm0[1,2,3]
    vinsertps $256, %xmm13, %xmm1, %xmm7    # xmm7 = xmm13[0],xmm1[1,2,3]
    vinsertps $416, %xmm1, %xmm4, %xmm14    # xmm14 = xmm4[0,1],xmm1[2],xmm4[3]
    vinsertps $416, %xmm13, %xmm6, %xmm13   # xmm13 = xmm6[0,1],xmm13[2],xmm6[3]
    vinsertps $416, %xmm0, %xmm7, %xmm0     # xmm0 = xmm7[0,1],xmm0[2],xmm7[3]

We'll continue to look into this and do additional testing.

Thanks,
Rob.

--
Robert Lougher
SN Systems - Sony Computer Entertainment Group
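Editorial note on why these masks are illegal: INSERTPS takes an 8-bit immediate, with bits 7:6 selecting the source element, bits 5:4 selecting the destination element, and bits 3:0 forming a zero mask. Both $256 (0x100) and $416 (0x1A0) need a ninth bit, so neither can be encoded. The sketch below is a minimal illustration only: the function names are hypothetical and not part of LLVM, and it says nothing about where in the lowering code the extra bit comes from.

```python
def decode_insertps_imm(imm):
    """Split an INSERTPS immediate into (count_s, count_d, zmask).

    Raises ValueError if the value does not fit in the 8-bit immediate.
    """
    if not 0 <= imm <= 0xFF:
        raise ValueError(f"immediate {imm} (0x{imm:x}) does not fit in 8 bits")
    count_s = (imm >> 6) & 0x3  # bits 7:6: source element to read
    count_d = (imm >> 4) & 0x3  # bits 5:4: destination element to write
    zmask = imm & 0xF           # bits 3:0: destination elements to zero
    return count_s, count_d, zmask


def stray_high_bits(imm):
    """Return the bits of imm that lie above the 8-bit immediate field."""
    return imm & ~0xFF


# The two immediates from the listing above.
for imm in (256, 416):
    low = imm & 0xFF
    print(imm, hex(stray_high_bits(imm)), decode_insertps_imm(low))
```

In both cases the only out-of-range bit is 0x100. The low eight bits decode to (0, 0, 0) and (2, 2, 0), which agrees with the shuffle comments in the listing: element 0 inserted into position 0, and element 2 inserted into position 2, with nothing zeroed.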
Chandler Carruth
2014-Sep-05 16:38 UTC
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com> wrote:

> Unfortunately, another team, while doing internal testing has seen the
> new path generating illegal insertps masks. A sample here:
>
> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 = xmm4[0,1],xmm1[2],xmm4[3]
> vinsertps $416, %xmm13, %xmm6, %xmm13 # xmm13 = xmm6[0,1],xmm13[2],xmm6[3]
> vinsertps $416, %xmm0, %xmm7, %xmm0 # xmm0 = xmm7[0,1],xmm0[2],xmm7[3]
>
> We'll continue to look into this and do additional testing.

Interesting. Let me know if you get a test case. The insertps code path was added recently though and has been much less well tested. I'll start fuzz testing it and should hopefully uncover the bug.