Chandler Carruth
2014-Sep-05 16:38 UTC
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com> wrote:
> Unfortunately, another team, while doing internal testing has seen the
> new path generating illegal insertps masks. A sample here:
>
> vinsertps $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
> vinsertps $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
> vinsertps $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
> vinsertps $416, %xmm1, %xmm4, %xmm14 # xmm14 = xmm4[0,1],xmm1[2],xmm4[3]
> vinsertps $416, %xmm13, %xmm6, %xmm13 # xmm13 = xmm6[0,1],xmm13[2],xmm6[3]
> vinsertps $416, %xmm0, %xmm7, %xmm0 # xmm0 = xmm7[0,1],xmm0[2],xmm7[3]
>
> We'll continue to look into this and do additional testing.

Interesting. Let me know if you get a test case. The insertps code path was added recently though and has been much less well tested. I'll start fuzz testing it and should hopefully uncover the bug.
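For readers wondering why these masks count as illegal: the insertps immediate is an 8-bit field, with bits 7:6 selecting the source lane, bits 5:4 the destination lane, and bits 3:0 acting as a zero mask, so values such as $256 and $416 cannot be encoded. The sketch below is only an illustration of that layout; the helper names are hypothetical and are not taken from LLVM.

```python
# Sketch: check and decode an (V)INSERTPS immediate.
# Field layout per the instruction's documentation:
#   bits 7:6 = source lane, bits 5:4 = destination lane, bits 3:0 = zero mask.
# Helper names are hypothetical, chosen for this illustration only.

def is_legal_insertps_imm(imm: int) -> bool:
    """The immediate is an 8-bit field, so only 0..255 can be encoded."""
    return 0 <= imm <= 0xFF

def decode_insertps_imm(imm: int) -> dict:
    """Split the low 8 bits of an immediate into its three fields."""
    low = imm & 0xFF
    return {
        "src_lane": (low >> 6) & 0x3,
        "dst_lane": (low >> 4) & 0x3,
        "zero_mask": low & 0xF,
    }

# Immediates quoted in the report above.
for imm in (256, 416):
    print(imm, is_legal_insertps_imm(imm), decode_insertps_imm(imm))
```

In this sample, the low 8 bits of each immediate decode to the lanes named in the assembly comments (for example, 416 gives source lane 2 and destination lane 2). That is only an observation from the numbers quoted here, not a diagnosis of the underlying bug.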
Robert Lougher
2014-Sep-05 18:09 UTC
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Hi Chandler,

On 5 September 2014 17:38, Chandler Carruth <chandlerc at gmail.com> wrote:
> Interesting. Let me know if you get a test case. The insertps code path was
> added recently though and has been much less well tested. I'll start fuzz
> testing it and should hopefully uncover the bug.

Here's two small test cases. Hope they are of use.

Thanks,
Rob.

------
define <4 x float> @test(<4 x float> %xyzw, <4 x float> %abcd) {
  %1 = extractelement <4 x float> %xyzw, i32 0
  %2 = insertelement <4 x float> undef, float %1, i32 0
  %3 = insertelement <4 x float> %2, float 0.000000e+00, i32 1
  %4 = shufflevector <4 x float> %3, <4 x float> %xyzw, <4 x i32> <i32 0, i32 1, i32 6, i32 undef>
  %5 = shufflevector <4 x float> %4, <4 x float> %abcd, <4 x i32> <i32 0, i32 1, i32 2, i32 4>
  ret <4 x float> %5
}

define <4 x float> @test2(<4 x float> %xyzw, <4 x float> %abcd) {
  %1 = shufflevector <4 x float> %xyzw, <4 x float> %abcd, <4 x i32> <i32 0, i32 undef, i32 2, i32 4>
  %2 = shufflevector <4 x float> <float undef, float 0.000000e+00, float undef, float undef>, <4 x float> %1, <4 x i32> <i32 4, i32 1, i32 6, i32 7>
  ret <4 x float> %2
}

llc -march=x86-64 -mattr=+avx test.ll -o -

test: # @test
  vxorps %xmm2, %xmm2, %xmm2
  vmovss %xmm0, %xmm2, %xmm2
  vblendps $4, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
  vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
  retl

test2: # @test2
  vinsertps $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
  vxorps %xmm1, %xmm1, %xmm1
  vblendps $13, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0],xmm1[1],xmm0[2,3]
  retl

llc -march=x86-64 -mattr=+avx -x86-experimental-vector-shuffle-lowering test.ll -o -

test: # @test
  vinsertps $270, %xmm0, %xmm0, %xmm2 # xmm2 = xmm0[0],zero,zero,zero
  vinsertps $416, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
  vinsertps $304, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
  retl

test2: # @test2
  vinsertps $304, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
  vxorps %xmm1, %xmm1, %xmm1
  vinsertps $336, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0],xmm1[1],xmm0[2,3]
  retl
Quentin Colombet
2014-Sep-05 23:36 UTC
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Hi Chandler,

While doing the performance measurement on an Ivy Bridge, I ran into compile-time errors. I saw a bunch of "cannot select" errors in the LLVM test suite with -march=core-avx-i. E.g., SingleSource/UnitTests/Vector/SSE/sse.isamax.c is failing at O3 -march=core-avx-i with:

fatal error: error in backend: Cannot select: 0x7f91b99a6420: v4i32 = bitcast 0x7f91b99b0e10 [ORD=3] [ID=27]
  0x7f91b99b0e10: v4i64 = insert_subvector 0x7f91b99a7210, 0x7f91b99a6d68, 0x7f91b99ace70 [ORD=2] [ID=25]
    0x7f91b99a7210: v4i64 = undef [ID=15]
    0x7f91b99a6d68: v2i64 = scalar_to_vector 0x7f91b99ab840 [ORD=2] [ID=23]
      0x7f91b99ab840: i64 = AssertZext 0x7f91b99acc60, 0x7f91b99ac738 [ORD=2] [ID=20]
        0x7f91b99acc60: i64,ch = CopyFromReg 0x7f91b8d52820, 0x7f91b99a3a10 [ORD=2] [ID=16]
          0x7f91b99a3a10: i64 = Register %vreg68 [ID=1]
    0x7f91b99ace70: i64 = Constant<0> [ID=3]
In function: isamax0
clang: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 3.6.0 (215249)
Target: x86_64-apple-darwin14.0.0

For some reason, I cannot reproduce the problem with the test case that clang gives me using -emit-llvm. Since the source is public, I guess you can try to reproduce on your side. Indeed, if you run the test-suite with -march=core-avx-i you'll likely see all those failures. Let me know if you cannot and I'll try harder to produce a test case.

Note: This is the same failure all over the place, i.e., cannot select a bitcast from various types to v4i32 or v4i64.

Thanks,
-Quentin
Chandler Carruth
2014-Sep-06 10:47 UTC
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
FYI, this is all fixed. =] Sorry for the trouble, was a silly goof that should have been caught sooner.

On Fri, Sep 5, 2014 at 11:09 AM, Robert Lougher <rob.lougher at gmail.com> wrote:
> Here's two small test cases. Hope they are of use.