Chandler Carruth
2014-Sep-05 16:38 UTC
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Fri, Sep 5, 2014 at 9:32 AM, Robert Lougher <rob.lougher at gmail.com> wrote:

> Unfortunately, another team, while doing internal testing, has seen the
> new path generating illegal insertps masks. A sample here:
>
>   vinsertps $256, %xmm0, %xmm13, %xmm4   # xmm4  = xmm0[0],xmm13[1,2,3]
>   vinsertps $256, %xmm1, %xmm0, %xmm6    # xmm6  = xmm1[0],xmm0[1,2,3]
>   vinsertps $256, %xmm13, %xmm1, %xmm7   # xmm7  = xmm13[0],xmm1[1,2,3]
>   vinsertps $416, %xmm1, %xmm4, %xmm14   # xmm14 = xmm4[0,1],xmm1[2],xmm4[3]
>   vinsertps $416, %xmm13, %xmm6, %xmm13  # xmm13 = xmm6[0,1],xmm13[2],xmm6[3]
>   vinsertps $416, %xmm0, %xmm7, %xmm0    # xmm0  = xmm7[0,1],xmm0[2],xmm7[3]
>
> We'll continue to look into this and do additional testing.

Interesting. Let me know if you get a test case. The insertps code path was
added recently, though, and has been much less well tested. I'll start fuzz
testing it and should hopefully uncover the bug.
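For context on why those masks are illegal: the insertps immediate is a single
byte, so $256 and $416 cannot be encoded at all. A quick illustrative decode
(using the COUNT_S/COUNT_D/ZMASK layout from the Intel manual; this is only a
sketch, not the lowering code) suggests the low byte of each value actually
matches the shuffle comments above, i.e. the masks look like the intended
value plus a stray 0x100:

#include <cstdio>

// Decode an INSERTPS immediate (register-source form):
//   bits [7:6] COUNT_S - element taken from the source register
//   bits [5:4] COUNT_D - destination element that gets replaced
//   bits [3:0] ZMASK   - destination elements forced to zero
void decodeInsertPS(unsigned Imm) {
  if (Imm > 0xFF)
    std::printf("$%u does not fit in imm8; only the low byte 0x%02X is encodable\n",
                Imm, Imm & 0xFFu);
  unsigned Byte = Imm & 0xFFu;
  std::printf("  count_s=%u count_d=%u zmask=0x%X\n",
              (Byte >> 6) & 3, (Byte >> 4) & 3, Byte & 0xF);
}

int main() {
  decodeInsertPS(256); // low byte 0x00: dst[0] = src[0], no zeroing
  decodeInsertPS(416); // low byte 0xA0: dst[2] = src[2], no zeroing
  return 0;
}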
Robert Lougher
2014-Sep-05 18:09 UTC
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Hi Chandler,

On 5 September 2014 17:38, Chandler Carruth <chandlerc at gmail.com> wrote:
> Interesting. Let me know if you get a test case. The insertps code path was
> added recently, though, and has been much less well tested. I'll start fuzz
> testing it and should hopefully uncover the bug.

Here are two small test cases. Hope they are of use.

Thanks,
Rob.

------
define <4 x float> @test(<4 x float> %xyzw, <4 x float> %abcd) {
  %1 = extractelement <4 x float> %xyzw, i32 0
  %2 = insertelement <4 x float> undef, float %1, i32 0
  %3 = insertelement <4 x float> %2, float 0.000000e+00, i32 1
  %4 = shufflevector <4 x float> %3, <4 x float> %xyzw, <4 x i32> <i32 0, i32 1, i32 6, i32 undef>
  %5 = shufflevector <4 x float> %4, <4 x float> %abcd, <4 x i32> <i32 0, i32 1, i32 2, i32 4>
  ret <4 x float> %5
}

define <4 x float> @test2(<4 x float> %xyzw, <4 x float> %abcd) {
  %1 = shufflevector <4 x float> %xyzw, <4 x float> %abcd, <4 x i32> <i32 0, i32 undef, i32 2, i32 4>
  %2 = shufflevector <4 x float> <float undef, float 0.000000e+00, float undef, float undef>, <4 x float> %1, <4 x i32> <i32 4, i32 1, i32 6, i32 7>
  ret <4 x float> %2
}

llc -march=x86-64 -mattr=+avx test.ll -o -

test:                                   # @test
        vxorps    %xmm2, %xmm2, %xmm2
        vmovss    %xmm0, %xmm2, %xmm2
        vblendps  $4, %xmm0, %xmm2, %xmm0   # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
        vinsertps $48, %xmm1, %xmm0, %xmm0  # xmm0 = xmm0[0,1,2],xmm1[0]
        retl

test2:                                  # @test2
        vinsertps $48, %xmm1, %xmm0, %xmm0  # xmm0 = xmm0[0,1,2],xmm1[0]
        vxorps    %xmm1, %xmm1, %xmm1
        vblendps  $13, %xmm0, %xmm1, %xmm0  # xmm0 = xmm0[0],xmm1[1],xmm0[2,3]
        retl

llc -march=x86-64 -mattr=+avx -x86-experimental-vector-shuffle-lowering test.ll -o -

test:                                   # @test
        vinsertps $270, %xmm0, %xmm0, %xmm2 # xmm2 = xmm0[0],zero,zero,zero
        vinsertps $416, %xmm0, %xmm2, %xmm0 # xmm0 = xmm2[0,1],xmm0[2],xmm2[3]
        vinsertps $304, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
        retl

test2:                                  # @test2
        vinsertps $304, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0,1,2],xmm1[0]
        vxorps    %xmm1, %xmm1, %xmm1
        vinsertps $336, %xmm1, %xmm0, %xmm0 # xmm0 = xmm0[0],xmm1[1],xmm0[2,3]
        retl
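One observation on the experimental output: each bad immediate is 0x100 plus a
value that does fit in the 8-bit field, and the low byte agrees with the
shuffle comment llc prints, so the chosen shuffles look correct and only the
mask encoding is off. That is just my reading of the output, not of the
lowering code; a trivial check, for illustration only:

#include <cstdio>

int main() {
  // Low byte of each immediate emitted by the experimental lowering above:
  //   $270 -> 0x0E : insert elt 0, zero elts 1-3  (xmm0[0],zero,zero,zero)
  //   $304 -> 0x30 : dst elt 3 <- src elt 0       (xmm0[0,1,2],xmm1[0])
  //   $336 -> 0x50 : dst elt 1 <- src elt 1       (xmm0[0],xmm1[1],xmm0[2,3])
  //   $416 -> 0xA0 : dst elt 2 <- src elt 2       (xmm2[0,1],xmm0[2],xmm2[3])
  for (unsigned Imm : {270u, 304u, 336u, 416u})
    std::printf("$%u -> low byte 0x%02X (extra bits 0x%X)\n",
                Imm, Imm & 0xFFu, Imm & ~0xFFu);
  return 0;
}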
Quentin Colombet
2014-Sep-05 23:36 UTC
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Hi Chandler,
While doing the performance measurements on an Ivy Bridge, I ran into
compile-time errors.
I saw a bunch of “cannot select” failures in the LLVM test suite with
-march=core-avx-i.
E.g., SingleSource/UnitTests/Vector/SSE/sse.isamax.c is failing at -O3
-march=core-avx-i with:
fatal error: error in backend: Cannot select: 0x7f91b99a6420: v4i32 = bitcast 0x7f91b99b0e10 [ORD=3] [ID=27]
  0x7f91b99b0e10: v4i64 = insert_subvector 0x7f91b99a7210, 0x7f91b99a6d68, 0x7f91b99ace70 [ORD=2] [ID=25]
    0x7f91b99a7210: v4i64 = undef [ID=15]
    0x7f91b99a6d68: v2i64 = scalar_to_vector 0x7f91b99ab840 [ORD=2] [ID=23]
      0x7f91b99ab840: i64 = AssertZext 0x7f91b99acc60, 0x7f91b99ac738 [ORD=2] [ID=20]
        0x7f91b99acc60: i64,ch = CopyFromReg 0x7f91b8d52820, 0x7f91b99a3a10 [ORD=2] [ID=16]
          0x7f91b99a3a10: i64 = Register %vreg68 [ID=1]
    0x7f91b99ace70: i64 = Constant<0> [ID=3]
In function: isamax0
clang: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 3.6.0 (215249)
Target: x86_64-apple-darwin14.0.0
For some reason, I cannot reproduce the problem with the test case that clang
gives me using -emit-llvm. Since the source is public, I guess you can try to
reproduce it on your side.
Indeed, if you run the test-suite with -march=core-avx-i you’ll likely see all
those failures.
Let me know if you cannot and I’ll try harder to produce a test case.
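In case it helps, the compile command should be roughly the following
(untested, and assuming the new lowering is enabled via -mllvm, the same way
we enable it for the benchmarks):

clang -O3 -march=core-avx-i -mllvm -x86-experimental-vector-shuffle-lowering -c SingleSource/UnitTests/Vector/SSE/sse.isamax.c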
Note: this is the same failure all over the place, i.e., a failure to select a
bitcast from various types to v4i32 or v4i64.
Thanks,
-Quentin
Chandler Carruth
2014-Sep-06 10:47 UTC
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
FYI, this is all fixed. =] Sorry for the trouble; it was a silly goof that
should have been caught sooner.