Ahmed Bougacha
2015-Jan-30 19:15 UTC
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
I filed a couple more, in case they're actually different issues:
- http://llvm.org/bugs/show_bug.cgi?id=22412
- http://llvm.org/bugs/show_bug.cgi?id=22413

And that's pretty much it for internal changes. I'm fine with flipping
the switch; Quentin, are you?
Also, just to have an idea, do you (or someone else!) plan to tackle
these in the near future?

-Ahmed

On Thu, Jan 29, 2015 at 11:50 AM, Ahmed Bougacha <ahmed.bougacha at gmail.com> wrote:
> On Wed, Jan 28, 2015 at 4:47 PM, Chandler Carruth <chandlerc at gmail.com> wrote:
>> On Wed, Jan 28, 2015 at 4:05 PM, Ahmed Bougacha <ahmed.bougacha at gmail.com> wrote:
>>> Hi Chandler,
>>>
>>> I've been looking at the regressions Quentin mentioned, and filed a PR
>>> for the most egregious one: http://llvm.org/bugs/show_bug.cgi?id=22377
>>>
>>> As for the others, I'm working on reducing them, but for now, here are
>>> some raw observations, in case any of it rings a bell:
>>
>> Very cool, and thanks for the analysis!
>>
>>> Another problem I'm seeing is that in some cases we can't fold memory
>>> anymore:
>>>     vpermilps $-0x6d, -0xXX(%rdx), %xmm2   ## xmm2 = mem[3,0,1,2]
>>>     vblendps  $0x1, %xmm2, %xmm0, %xmm0
>>> becomes:
>>>     vmovaps   -0xXX(%rdx), %xmm2
>>>     vshufps   $0x3, %xmm0, %xmm2, %xmm3    ## xmm3 = xmm2[3,0],xmm0[0,0]
>>>     vshufps   $-0x68, %xmm0, %xmm3, %xmm0  ## xmm0 = xmm3[0,2],xmm0[1,2]
>>>
>>> Also, I see differences when some loads are shuffled, which I'm a bit
>>> conflicted about:
>>>     vmovaps   -0xXX(%rbp), %xmm3
>>>     ...
>>>     vinsertps $0xc0, %xmm4, %xmm3, %xmm5   ## xmm5 = xmm4[3],xmm3[1,2,3]
>>> becomes:
>>>     vpermilps $-0x6d, -0xXX(%rbp), %xmm2   ## xmm2 = mem[3,0,1,2]
>>>     ...
>>>     vinsertps $0xc0, %xmm4, %xmm2, %xmm2   ## xmm2 = xmm4[3],xmm2[1,2,3]
>>>
>>> Note that the second version does the shuffle in-place, in xmm2.
>>>
>>> Some are blends (har har) of those two:
>>>     vpermilps $-0x6d, %xmm_mem_1, %xmm6    ## xmm6 = xmm_mem_1[3,0,1,2]
>>>     vpermilps $-0x6d, -0xXX(%rax), %xmm1   ## xmm1 = mem_2[3,0,1,2]
>>>     vblendps  $0x1, %xmm1, %xmm6, %xmm0    ## xmm0 = xmm1[0],xmm6[1,2,3]
>>> becomes:
>>>     vmovaps   -0xXX(%rax), %xmm0           ## xmm0 = mem_2[0,1,2,3]
>>>     vpermilps $-0x6d, %xmm0, %xmm1         ## xmm1 = xmm0[3,0,1,2]
>>>     vshufps   $0x3, %xmm_mem_1, %xmm0, %xmm0    ## xmm0 = xmm0[3,0],xmm_mem_1[0,0]
>>>     vshufps   $-0x68, %xmm_mem_1, %xmm0, %xmm0  ## xmm0 = xmm0[0,2],xmm_mem_1[1,2]
>>>
>>> I also see a lot of somewhat neutral (focusing on Haswell for now)
>>> domain changes such as (xmm5 and xmm0 are initially integer, and are
>>> dead after the store):
>>>     vpshufd   $-0x5c, %xmm0, %xmm0         ## xmm0 = xmm0[0,1,2,2]
>>>     vpalignr  $0xc, %xmm0, %xmm5, %xmm0    ## xmm0 = xmm0[12,13,14,15],xmm5[0,1,2,3,4,5,6,7,8,9,10,11]
>>>     vmovdqu   %xmm0, 0x20(%rax)
>>> turning into:
>>>     vshufps   $0x2, %xmm5, %xmm0, %xmm0    ## xmm0 = xmm0[2,0],xmm5[0,0]
>>>     vshufps   $-0x68, %xmm5, %xmm0, %xmm0  ## xmm0 = xmm0[0,2],xmm5[1,2]
>>>     vmovups   %xmm0, 0x20(%rax)
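(As a rough illustration, the recurring shapes above reduce to IR shuffles
along the following lines; the function names and masks are reconstructed
here from the asm comments, not taken from the original test cases:)

    ; %v[3] inserted into an otherwise unchanged %a. With %v coming from
    ; memory, this is the vpermilps-from-memory + vblendps (or vinsertps)
    ; shape that now lowers to vmovaps + two vshufps.
    define <4 x float> @blend_rotated(<4 x float> %a, <4 x float> %v) {
      %r = shufflevector <4 x float> %a, <4 x float> %v,
                         <4 x i32> <i32 7, i32 1, i32 2, i32 3>
      ret <4 x float> %r
    }

    ; An integer-domain shuffle: vpshufd + vpalignr stay in the integer
    ; domain, while the two-vshufps lowering crosses into the FP domain.
    define <4 x i32> @int_concat(<4 x i32> %a, <4 x i32> %b) {
      %r = shufflevector <4 x i32> %a, <4 x i32> %b,
                         <4 x i32> <i32 2, i32 4, i32 5, i32 6>
      ret <4 x i32> %r
    }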
>> All of these stem from what I think is the same core weakness of the
>> current algorithm: we prefer the fully general shufps+shufps 4-way
>> shuffle/blend far too often. Here is how I would more precisely classify
>> the two things missing here:
>>
>> - Check if either input is "in place" and we can do a fast single-input
>>   shuffle with a fixed blend.
>
> I believe this would be http://llvm.org/bugs/show_bug.cgi?id=22390
>
>> - Check if we can form a rotation and use palignr to finish a
>>   shuffle/blend.
>
> ... and this would be http://llvm.org/bugs/show_bug.cgi?id=22391
>
> I think this about covers the Haswell regressions I'm seeing. Now for
> some pre-AVX fun!
>
> -Ahmed
>
>> There may be other patterns we're missing, but these two seem to jump
>> out based on your analysis, and may be fairly easy to tackle.
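(Again for illustration only — these masks are constructed for this note,
not taken from the PRs — the two missing checks correspond to shuffle
shapes like the following:)

    ; PR22390 shape: %b's lanes (1 and 3) are already in place, so a
    ; single vpermilps on %a plus one fixed vblendps would suffice,
    ; instead of the generic two-vshufps sequence.
    define <4 x float> @inplace_blend(<4 x float> %a, <4 x float> %b) {
      %r = shufflevector <4 x float> %a, <4 x float> %b,
                         <4 x i32> <i32 3, i32 5, i32 0, i32 7>
      ret <4 x float> %r
    }

    ; PR22391 shape: a contiguous window into the concatenation of the
    ; two inputs, i.e. a rotation, which palignr can finish in one
    ; instruction.
    define <4 x i32> @rotate(<4 x i32> %a, <4 x i32> %b) {
      %r = shufflevector <4 x i32> %a, <4 x i32> %b,
                         <4 x i32> <i32 3, i32 4, i32 5, i32 6>
      ret <4 x i32> %r
    }

(The win in the first case is that on Haswell the fixed blend, unlike a
second shuffle, is not restricted to the shuffle port.)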
Chandler Carruth
2015-Jan-30 19:23 UTC
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
I may get one or two in the next month, but not more than that. I'm
focused on the pass manager for now. If no one gets there first, I'll
eventually circle back though, so they won't rot forever.

On Jan 30, 2015 11:21 AM, "Ahmed Bougacha" <ahmed.bougacha at gmail.com> wrote:
> And that's pretty much it for internal changes. I'm fine with flipping
> the switch; Quentin, are you?
> Also, just to have an idea, do you (or someone else!) plan to tackle
> these in the near future?
Ahmed Bougacha
2015-Jan-30 19:25 UTC
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
On Fri, Jan 30, 2015 at 11:23 AM, Chandler Carruth <chandlerc at gmail.com> wrote:
> I may get one or two in the next month, but not more than that. I'm
> focused on the pass manager for now. If no one gets there first, I'll
> eventually circle back though, so they won't rot forever.

Alright, I'll give it a try in the next few weeks as well.

-Ahmed
Chandler Carruth
2015-Feb-03 20:55 UTC
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
Thanks to everyone doing the benchmarking!!! =D

On Fri, Jan 30, 2015 at 11:15 AM, Ahmed Bougacha <ahmed.bougacha at gmail.com> wrote:
> I'm fine with flipping the switch; Quentin, are you?

I checked quickly and Quentin seems happy. Everyone else seems to have
reported back happy. I'm planning to flip the switch and delete the old
shuffle code "soon". No guarantees (lots of other stuff in flight) but
hoping to rip all of this stuff out.

-Chandler
Chandler Carruth
2015-Feb-20 02:05 UTC
[LLVMdev] RFB: Would like to flip the vector shuffle legality flag
On Tue, Feb 3, 2015 at 12:55 PM, Chandler Carruth <chandlerc at gmail.com> wrote:
> Thanks to everyone doing the benchmarking!!! =D
>
> I checked quickly and Quentin seems happy. Everyone else seems to have
> reported back happy. I'm planning to flip the switch and delete the old
> shuffle code "soon". No guarantees (lots of other stuff in flight) but
> hoping to rip all of this stuff out.

FYI, I've fixed all the regressions filed except for PR22391, along with
a *giant* pile of other improvements to the vector shuffle lowering. I
even have a fix up my sleeve for PR22391, but it needs refactoring that I
don't really want to do while supporting both lowerings.

It is time. I'm going to start submitting the patches now to rip out the
flag and all the code supporting it.

-Chandler