thr3ads.net - llvm dev - [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon! [Sep 2014]

Quentin Colombet

2014-Sep-05 23:36 UTC

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

Hi Chandler,

While doing the performance measurements on an Ivy Bridge machine, I ran into
compile-time errors.

I saw a bunch of “cannot select” errors in the LLVM test suite with
-march=core-avx-i.
E.g., SingleSource/UnitTests/Vector/SSE/sse.isamax.c is failing at O3
-march=core-avx-i with:
fatal error: error in backend: Cannot select: 0x7f91b99a6420: v4i32 = bitcast 0x7f91b99b0e10 [ORD=3] [ID=27]
  0x7f91b99b0e10: v4i64 = insert_subvector 0x7f91b99a7210, 0x7f91b99a6d68, 0x7f91b99ace70 [ORD=2] [ID=25]
    0x7f91b99a7210: v4i64 = undef [ID=15]
    0x7f91b99a6d68: v2i64 = scalar_to_vector 0x7f91b99ab840 [ORD=2] [ID=23]
      0x7f91b99ab840: i64 = AssertZext 0x7f91b99acc60, 0x7f91b99ac738 [ORD=2] [ID=20]
        0x7f91b99acc60: i64,ch = CopyFromReg 0x7f91b8d52820, 0x7f91b99a3a10 [ORD=2] [ID=16]
          0x7f91b99a3a10: i64 = Register %vreg68 [ID=1]
    0x7f91b99ace70: i64 = Constant<0> [ID=3]
In function: isamax0
clang: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 3.6.0 (215249)
Target: x86_64-apple-darwin14.0.0

For some reason, I cannot reproduce the problem with the test case that clang
gives me using -emit-llvm. Since the source is public, I guess you can try to
reproduce on your side.
Indeed, if you run the test-suite with -march=core-avx-i you’ll likely see all
those failures.

Let me know if you cannot and I’ll try harder to produce a test case.

Note: This is the same failure all over the place, i.e., cannot select a
bitcast from various types to v4i32 or v4i64.

Thanks,
-Quentin

> On Sep 5, 2014, at 11:09 AM, Robert Lougher <rob.lougher at gmail.com> wrote:
> 
> Hi Chandler,
> 
> On 5 September 2014 17:38, Chandler Carruth <chandlerc at gmail.com> wrote:
>> 
>> On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com> wrote:
>>> 
>>> Unfortunately, another team, while doing internal testing has seen the
>>> new path generating illegal insertps masks.  A sample here:
>>> 
>>>    vinsertps    $256, %xmm0, %xmm13, %xmm4 # xmm4 = xmm0[0],xmm13[1,2,3]
>>>    vinsertps    $256, %xmm1, %xmm0, %xmm6 # xmm6 = xmm1[0],xmm0[1,2,3]
>>>    vinsertps    $256, %xmm13, %xmm1, %xmm7 # xmm7 = xmm13[0],xmm1[1,2,3]
>>>    vinsertps    $416, %xmm1, %xmm4, %xmm14 # xmm14 = xmm4[0,1],xmm1[2],xmm4[3]
>>>    vinsertps    $416, %xmm13, %xmm6, %xmm13 # xmm13 = xmm6[0,1],xmm13[2],xmm6[3]
>>>    vinsertps    $416, %xmm0, %xmm7, %xmm0 # xmm0 = xmm7[0,1],xmm0[2],xmm7[3]
>>> 
>>> We'll continue to look into this and do additional testing.
>> 
>> 
>> Interesting. Let me know if you get a test case. The insertps code path was
>> added recently though and has been much less well tested. I'll start fuzz
>> testing it and should hopefully uncover the bug.
> 
> Here are two small test cases.  Hope they are of use.
> 
> Thanks,
> Rob.
> 
> ------
> define <4 x float> @test(<4 x float> %xyzw, <4 x float> %abcd) {
>  %1 = extractelement <4 x float> %xyzw, i32 0
>  %2 = insertelement <4 x float> undef, float %1, i32 0
>  %3 = insertelement <4 x float> %2, float 0.000000e+00, i32 1
>  %4 = shufflevector <4 x float> %3, <4 x float> %xyzw, <4 x i32> <i32 0, i32 1, i32 6, i32 undef>
>  %5 = shufflevector <4 x float> %4, <4 x float> %abcd, <4 x i32> <i32 0, i32 1, i32 2, i32 4>
>  ret <4 x float> %5
> }
> 
> define <4 x float> @test2(<4 x float> %xyzw, <4 x float> %abcd) {
>  %1 = shufflevector <4 x float> %xyzw, <4 x float> %abcd, <4 x i32> <i32 0, i32 undef, i32 2, i32 4>
>  %2 = shufflevector <4 x float> <float undef, float 0.000000e+00, float undef, float undef>, <4 x float> %1, <4 x i32> <i32 4, i32 1, i32 6, i32 7>
>  ret <4 x float> %2
> }
> 
> 
> llc -march=x86-64 -mattr=+avx test.ll -o -
> 
> test:                                   # @test
>    vxorps    %xmm2, %xmm2, %xmm2
>    vmovss    %xmm0, %xmm2, %xmm2
>    vblendps    $4, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
>    vinsertps    $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
>    retl
> 
> test2:                                  # @test2
>    vinsertps    $48, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
>    vxorps    %xmm1, %xmm1, %xmm1
>    vblendps    $13, %xmm0, %xmm1, %xmm0 # xmm0 = xmm0[0],xmm1[1],xmm0[2,3]
>    retl
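
As a cross-check of the listing above, the following Python sketch steps through the two old-lowering sequences lane by lane. It assumes the documented register-form semantics of vmovss, vblendps and vinsertps (imm bits 7:6 select the source lane, bits 5:4 the destination lane, bits 3:0 the zero mask). The function names and the symbolic lane labels are made up for illustration, and operands are passed in Intel order (src1, src2), which is the reverse of the AT&T order printed by llc.

```python
# Lane-level model of the three AVX instructions used by the old lowering.
# Operands are in Intel order (src1, src2); this is a sketch only.

def vmovss(src1, src2):
    """Lane 0 from src2, lanes 1..3 from src1."""
    return [src2[0]] + list(src1[1:])


def vblendps(imm, src1, src2):
    """Lane i comes from src2 when bit i of imm is set, otherwise from src1."""
    return [src2[i] if (imm >> i) & 1 else src1[i] for i in range(4)]


def vinsertps(imm, src1, src2):
    """Insert src2[imm[7:6]] into lane imm[5:4] of src1, then zero lanes in imm[3:0]."""
    assert 0 <= imm <= 0xFF, "insertps takes an 8-bit immediate"
    out = list(src1)
    out[(imm >> 4) & 3] = src2[(imm >> 6) & 3]
    return [0.0 if (imm >> i) & 1 else out[i] for i in range(4)]


def old_test(xmm0, xmm1):
    xmm2 = [0.0] * 4                  # vxorps  %xmm2, %xmm2, %xmm2
    xmm2 = vmovss(xmm2, xmm0)         # vmovss  %xmm0, %xmm2, %xmm2
    xmm0 = vblendps(4, xmm2, xmm0)    # vblendps $4, %xmm0, %xmm2, %xmm0
    return vinsertps(48, xmm0, xmm1)  # vinsertps $48, %xmm1, %xmm0, %xmm0


def old_test2(xmm0, xmm1):
    xmm0 = vinsertps(48, xmm0, xmm1)  # vinsertps $48, %xmm1, %xmm0, %xmm0
    xmm1 = [0.0] * 4                  # vxorps  %xmm1, %xmm1, %xmm1
    return vblendps(13, xmm1, xmm0)   # vblendps $13, %xmm0, %xmm1, %xmm0


print(old_test(["x", "y", "z", "w"], ["a", "b", "c", "d"]))   # ['x', 0.0, 'z', 'a']
print(old_test2(["x", "y", "z", "w"], ["a", "b", "c", "d"]))  # ['x', 0.0, 'z', 'a']
```

Both sequences end with xmm0 holding x, zero, z, a, which is the value the IR asks for.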
> 
> llc -march=x86-64 -mattr=+avx -x86-experimental-vector-shuffle-lowering test.ll -o -
> 
> test:                                   # @test
>    vinsertps    $270, %xmm0, %xmm0, %xmm2 # xmm2 = xmm0[0],zero,zero,zero
>    vinsertps    $416, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
>    vinsertps    $304, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
>    retl
> 
> test2:                                  # @test2
>    vinsertps    $304, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
>    vxorps    %xmm1, %xmm1, %xmm1
>    vinsertps    $336, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0],xmm1[1],xmm0[2,3]
>    retl
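
The insertps immediate is an 8-bit field (bits 7:6 source lane, bits 5:4 destination lane, bits 3:0 zero mask), so values such as $256, $304 or $416 cannot be encoded. The Python sketch below checks each immediate quoted in this thread against that range and decodes its low 8 bits. The helper name decode_insertps is made up for illustration, and the observation that the low byte matches the printed comment is only a reading of the numbers, not a diagnosis of the underlying bug.

```python
# Decode insertps immediates and flag the ones that do not fit in 8 bits.
# decode_insertps is a hypothetical helper, not part of LLVM.

def decode_insertps(imm):
    low = imm & 0xFF
    return {
        "fits_imm8": 0 <= imm <= 0xFF,  # False means the mask is illegal
        "count_s": low >> 6,            # source lane taken from the second source
        "count_d": (low >> 4) & 3,      # destination lane that gets overwritten
        "zmask": low & 0xF,             # lanes forced to zero
    }


# 48 comes from the old lowering; the rest come from the new lowering.
for imm in (48, 256, 270, 304, 336, 416):
    print(f"${imm} = {imm:#05x}", decode_insertps(imm))
```

For example, $304 is 0x130: it does not fit in 8 bits, and its low byte 0x30 decodes to the same fields as the $48 emitted by the old lowering (source lane 0 into destination lane 3, nothing zeroed).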
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Chandler Carruth

2014-Sep-06 23:07 UTC

I'm having trouble reproducing this. I'm trying to get LNT to actually run,
but manually compiling the given source file didn't reproduce it for me.

It might have been fixed recently (although I'd be surprised if so), but it
would help to get the actual command line for which compiling this file in
the test suite failed.

-Chandler

Chandler Carruth

2014-Sep-06 23:27 UTC

I've run the SingleSource test suite for core-avx-i and have no failures
here, so a preprocessed file plus the command line would be very useful if this
still reproduces for you.
