Chandler Carruth
2014-Sep-05 16:38 UTC
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com> wrote:

> Unfortunately, another team, while doing internal testing, has seen the
> new path generating illegal insertps masks. A sample here:
>
>   vinsertps $256, %xmm0, %xmm13, %xmm4   # xmm4  = xmm0[0],xmm13[1,2,3]
>   vinsertps $256, %xmm1, %xmm0, %xmm6    # xmm6  = xmm1[0],xmm0[1,2,3]
>   vinsertps $256, %xmm13, %xmm1, %xmm7   # xmm7  = xmm13[0],xmm1[1,2,3]
>   vinsertps $416, %xmm1, %xmm4, %xmm14   # xmm14 = xmm4[0,1],xmm1[2],xmm4[3]
>   vinsertps $416, %xmm13, %xmm6, %xmm13  # xmm13 = xmm6[0,1],xmm13[2],xmm6[3]
>   vinsertps $416, %xmm0, %xmm7, %xmm0    # xmm0  = xmm7[0,1],xmm0[2],xmm7[3]
>
> We'll continue to look into this and do additional testing.

Interesting. Let me know if you get a test case. The insertps code path was
added recently, though, and has been much less well tested. I'll start fuzz
testing it and should hopefully uncover the bug.
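For context on why those masks are illegal: the insertps immediate is a single
byte, so $256 and $416 cannot be encoded at all. A quick illustrative decode
(using the COUNT_S/COUNT_D/ZMASK layout from the Intel manual; this is only a
sketch, not the lowering code) suggests the low byte of each value actually
matches the shuffle comments above, i.e. the masks look like the intended
value plus a stray 0x100:

#include <cstdio>

// Decode an INSERTPS immediate (register-source form):
//   bits [7:6] COUNT_S - element taken from the source register
//   bits [5:4] COUNT_D - destination element that gets replaced
//   bits [3:0] ZMASK   - destination elements forced to zero
void decodeInsertPS(unsigned Imm) {
  if (Imm > 0xFF)
    std::printf("$%u does not fit in imm8; only the low byte 0x%02X is encodable\n",
                Imm, Imm & 0xFFu);
  unsigned Byte = Imm & 0xFFu;
  std::printf("  count_s=%u count_d=%u zmask=0x%X\n",
              (Byte >> 6) & 3, (Byte >> 4) & 3, Byte & 0xF);
}

int main() {
  decodeInsertPS(256); // low byte 0x00: dst[0] = src[0], no zeroing
  decodeInsertPS(416); // low byte 0xA0: dst[2] = src[2], no zeroing
  return 0;
}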
Robert Lougher
2014-Sep-05 18:09 UTC
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Hi Chandler,

On 5 September 2014 17:38, Chandler Carruth <chandlerc at gmail.com> wrote:
> Interesting. Let me know if you get a test case. The insertps code path was
> added recently, though, and has been much less well tested. I'll start fuzz
> testing it and should hopefully uncover the bug.

Here are two small test cases. Hope they are of use.

Thanks,
Rob.

------
define <4 x float> @test(<4 x float> %xyzw, <4 x float> %abcd) {
  %1 = extractelement <4 x float> %xyzw, i32 0
  %2 = insertelement <4 x float> undef, float %1, i32 0
  %3 = insertelement <4 x float> %2, float 0.000000e+00, i32 1
  %4 = shufflevector <4 x float> %3, <4 x float> %xyzw, <4 x i32> <i32 0, i32 1, i32 6, i32 undef>
  %5 = shufflevector <4 x float> %4, <4 x float> %abcd, <4 x i32> <i32 0, i32 1, i32 2, i32 4>
  ret <4 x float> %5
}

define <4 x float> @test2(<4 x float> %xyzw, <4 x float> %abcd) {
  %1 = shufflevector <4 x float> %xyzw, <4 x float> %abcd, <4 x i32> <i32 0, i32 undef, i32 2, i32 4>
  %2 = shufflevector <4 x float> <float undef, float 0.000000e+00, float undef, float undef>, <4 x float> %1, <4 x i32> <i32 4, i32 1, i32 6, i32 7>
  ret <4 x float> %2
}

llc -march=x86-64 -mattr=+avx test.ll -o -

test:                                   # @test
        vxorps    %xmm2, %xmm2, %xmm2
        vmovss    %xmm0, %xmm2, %xmm2
        vblendps  $4, %xmm0, %xmm2, %xmm0   # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
        vinsertps $48, %xmm1, %xmm0, %xmm0  # xmm0 = xmm0[0,1,2],xmm1[0]
        retl

test2:                                  # @test2
        vinsertps $48, %xmm1, %xmm0, %xmm0  # xmm0 = xmm0[0,1,2],xmm1[0]
        vxorps    %xmm1, %xmm1, %xmm1
        vblendps  $13, %xmm0, %xmm1, %xmm0  # xmm0 = xmm0[0],xmm1[1],xmm0[2,3]
        retl

llc -march=x86-64 -mattr=+avx -x86-experimental-vector-shuffle-lowering test.ll -o -

test:                                   # @test
        vinsertps $270, %xmm0, %xmm0, %xmm2 # xmm2 = xmm0[0],zero,zero,zero
        vinsertps $416, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
        vinsertps $304, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
        retl

test2:                                  # @test2
        vinsertps $304, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
        vxorps    %xmm1, %xmm1, %xmm1
        vinsertps $336, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0],xmm1[1],xmm0[2,3]
        retl
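One observation on the experimental output: each bad immediate is 0x100 plus a
value that does fit in the 8-bit field, and the low byte agrees with the
shuffle comment llc prints, so the chosen shuffles look correct and only the
mask encoding is off. That is just my reading of the output, not of the
lowering code; a trivial check, for illustration only:

#include <cstdio>

int main() {
  // Low byte of each immediate emitted by the experimental lowering above:
  //   $270 -> 0x0E : insert elt 0, zero elts 1-3  (xmm0[0],zero,zero,zero)
  //   $304 -> 0x30 : dst elt 3 <- src elt 0       (xmm0[0,1,2],xmm1[0])
  //   $336 -> 0x50 : dst elt 1 <- src elt 1       (xmm0[0],xmm1[1],xmm0[2,3])
  //   $416 -> 0xA0 : dst elt 2 <- src elt 2       (xmm2[0,1],xmm0[2],xmm2[3])
  for (unsigned Imm : {270u, 304u, 336u, 416u})
    std::printf("$%u -> low byte 0x%02X (extra bits 0x%X)\n",
                Imm, Imm & 0xFFu, Imm & ~0xFFu);
  return 0;
}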
Quentin Colombet
2014-Sep-05 23:36 UTC
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Hi Chandler,
While doing the performance measurements on an Ivy Bridge, I ran into
compile-time errors.
I saw a bunch of “cannot select” failures in the LLVM test suite with
-march=core-avx-i.
E.g., SingleSource/UnitTests/Vector/SSE/sse.isamax.c is failing at -O3
-march=core-avx-i with:
fatal error: error in backend: Cannot select: 0x7f91b99a6420: v4i32 = bitcast 0x7f91b99b0e10 [ORD=3] [ID=27]
  0x7f91b99b0e10: v4i64 = insert_subvector 0x7f91b99a7210, 0x7f91b99a6d68, 0x7f91b99ace70 [ORD=2] [ID=25]
    0x7f91b99a7210: v4i64 = undef [ID=15]
    0x7f91b99a6d68: v2i64 = scalar_to_vector 0x7f91b99ab840 [ORD=2] [ID=23]
      0x7f91b99ab840: i64 = AssertZext 0x7f91b99acc60, 0x7f91b99ac738 [ORD=2] [ID=20]
        0x7f91b99acc60: i64,ch = CopyFromReg 0x7f91b8d52820, 0x7f91b99a3a10 [ORD=2] [ID=16]
          0x7f91b99a3a10: i64 = Register %vreg68 [ID=1]
    0x7f91b99ace70: i64 = Constant<0> [ID=3]
In function: isamax0
clang: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 3.6.0 (215249)
Target: x86_64-apple-darwin14.0.0
For some reason, I cannot reproduce the problem with the test case that clang
gives me using -emit-llvm. Since the source is public, I guess you can try to
reproduce it on your side.
Indeed, if you run the test-suite with -march=core-avx-i you’ll likely see all
those failures.
Let me know if you cannot and I’ll try harder to produce a test case.
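In case it helps, the compile command should be roughly the following
(untested, and assuming the new lowering is enabled via -mllvm, the same way
we enable it for the benchmarks):

clang -O3 -march=core-avx-i -mllvm -x86-experimental-vector-shuffle-lowering -c SingleSource/UnitTests/Vector/SSE/sse.isamax.c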
Note: this is the same failure all over the place, i.e., a failure to select a
bitcast from various types to v4i32 or v4i64.
Thanks,
-Quentin
Chandler Carruth
2014-Sep-06 10:47 UTC
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
FYI, this is all fixed. =] Sorry for the trouble; it was a silly goof that
should have been caught sooner.