thr3ads.net - llvm dev - [LLVMdev] RFB: Would like to flip the vector shuffle legality flag [Jan 2015]

If this information is useful, please help other people find it:
Share via:

Sanjay Patel

2015-Jan-25 22:15 UTC

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

I ran the benchmarking subset of test-suite on a btver2 machine and
optimizing for btver2 (so enabling AVX codegen).

I don't see anything outside of the noise with
x86-experimental-vector-shuffle-legality=1.

On Fri, Jan 23, 2015 at 5:19 AM, Andrea Di Biagio <andrea.dibiagio at
gmail.com> wrote:
> Hi Chandler,
>
> On Fri, Jan 23, 2015 at 8:15 AM, Chandler Carruth <chandlerc at
gmail.com>
> wrote:
> > Greetings LLVM hackers and x86 vector shufflers!
> >
> > I would like to flip on another chunk of the new vector shuffling,
> > specifically the logic to mark ~all shuffles as "legal".
> >
> > This can be tested today with the flag
> > "-x86-experimental-vector-shuffle-legality". I would
essentially like to
> > make this the default (by removing the "false" path). Doing
this will
> allow
> > me to completely delete the old vector shuffle lowering.
> >
> > I've got the patches prepped and ready to go, but it will likely
have a
> > significant impact on performance. Notably, a bunch of the remaining
> domain
> > crossing bugs I'm seeing are due to this. The key thing to realize
is
> that
> > vector shuffle combining is *much* more powerful when we think all of
> these
> > are legal, and so we combine away bad shuffles that would trigger
domain
> > crosses.
>
> That's good news!
> Also, I really like your idea of making all shuffles legal by default.
> I remember I did some experiments disabling the checks for legal
> shuffles in the DAGCombiner to see how well the new shuffle lowering
> coped with 'overly' aggressive shuffle combining. I was surprised
to
> see that from eyeballing the generated code it looked much cleaner
> (although I didn't test it extensively). Our target is btver2, so I
> also didn't look at what could have been codegen for targets with no
> AVX/SSE4.1 where there might be fewer opportunities to match a shuffle
> with a single target instruction during legalization.
>
> >
> > All of my benchmarks have come back performance neutral overall with a
> few
> > benchmarks improving. However, there may be some regressions that
folks
> want
> > to track down first. I'd really like to get those reported and
prioritize
> > among the vector shuffle work so we can nuke several *thousand* lines
of
> > code from X86ISelLowering.cpp. =D
>
> I'll see if I can get some numbers from our internal codebase and help
> with reporting potential regressions.
>
> Thanks,
> Andrea
>
> >
> > Thanks!
> > -Chandler
> >
> >
> > PS: If you're feeling adventurous, the next big mode flip flag I
want to
> see
> > changed is -x86-experimental-vector-widening-legalization, but this is
a
> > much more deep change to the entire vector legalization strategy, so I
> want
> > to do it second and separately.
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150125/494487d0/attachment.html>

Quentin Colombet

2015-Jan-27 20:58 UTC

head link

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

Hi Chandler,

I ran the new flag through some of our internal benchmarks and overall this is
neutral, so slightly better.
I observed a couple of improvements, but also a couple of regressions.
I am working on tracking those down to produce reduced test cases. No sure I
would have time to do that by the end of this week though.

Cheers,
-Quentin

On Jan 25, 2015, at 2:15 PM, Sanjay Patel <spatel at rotateright.com>
wrote:
> I ran the benchmarking subset of test-suite on a btver2 machine and
optimizing for btver2 (so enabling AVX codegen).
> 
> I don't see anything outside of the noise with
x86-experimental-vector-shuffle-legality=1.
> 
> On Fri, Jan 23, 2015 at 5:19 AM, Andrea Di Biagio <andrea.dibiagio at
gmail.com> wrote:
> Hi Chandler,
> 
> On Fri, Jan 23, 2015 at 8:15 AM, Chandler Carruth <chandlerc at
gmail.com> wrote:
> > Greetings LLVM hackers and x86 vector shufflers!
> >
> > I would like to flip on another chunk of the new vector shuffling,
> > specifically the logic to mark ~all shuffles as "legal".
> >
> > This can be tested today with the flag
> > "-x86-experimental-vector-shuffle-legality". I would
essentially like to
> > make this the default (by removing the "false" path). Doing
this will allow
> > me to completely delete the old vector shuffle lowering.
> >
> > I've got the patches prepped and ready to go, but it will likely
have a
> > significant impact on performance. Notably, a bunch of the remaining
domain
> > crossing bugs I'm seeing are due to this. The key thing to realize
is that
> > vector shuffle combining is *much* more powerful when we think all of
these
> > are legal, and so we combine away bad shuffles that would trigger
domain
> > crosses.
> 
> That's good news!
> Also, I really like your idea of making all shuffles legal by default.
> I remember I did some experiments disabling the checks for legal
> shuffles in the DAGCombiner to see how well the new shuffle lowering
> coped with 'overly' aggressive shuffle combining. I was surprised
to
> see that from eyeballing the generated code it looked much cleaner
> (although I didn't test it extensively). Our target is btver2, so I
> also didn't look at what could have been codegen for targets with no
> AVX/SSE4.1 where there might be fewer opportunities to match a shuffle
> with a single target instruction during legalization.
> 
> >
> > All of my benchmarks have come back performance neutral overall with a
few
> > benchmarks improving. However, there may be some regressions that
folks want
> > to track down first. I'd really like to get those reported and
prioritize
> > among the vector shuffle work so we can nuke several *thousand* lines
of
> > code from X86ISelLowering.cpp. =D
> 
> I'll see if I can get some numbers from our internal codebase and help
> with reporting potential regressions.
> 
> Thanks,
> Andrea
> 
> >
> > Thanks!
> > -Chandler
> >
> >
> > PS: If you're feeling adventurous, the next big mode flip flag I
want to see
> > changed is -x86-experimental-vector-widening-legalization, but this is
a
> > much more deep change to the entire vector legalization strategy, so I
want
> > to do it second and separately.
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150127/9f7fc94a/attachment.html>

Simon Pilgrim

2015-Jan-27 22:42 UTC

head link

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

Hi Chandler,

I don’t see any scary changes in performance outside of noise (1%) when I set
"all shuffles as legal” to true. I haven’t tested the vector widening yet.

A minor concern: in several situations I’m seeing vpermilps has been replaced
with shufps with repeated inputs (see below). I’m also seeing cases where 2 x
shufps are being used where a single shufps/permilps was occurring. I’ll try to
reduce some examples for bugzilla.

old:
        vshufps        $19, %xmm3, %xmm2, %xmm2 # xmm2 = xmm2[3,0],xmm3[1,0]
        vpermilps        $45, %xmm2, %xmm2 # xmm2 = xmm2[1,3,2,0]

new:
        vshufps        $76, %xmm3, %xmm2, %xmm2 # xmm2 = xmm2[0,3],xmm3[0,1]
        vshufps        $120, %xmm2, %xmm2, %xmm2 # xmm2 = xmm2[0,2,3,1]

This may be fixable with improvements to DAGCombiner::visitVECTOR_SHUFFLE - it
uses shuffle mask legality to check whether to commute candidate shuffle masks,
which we never do with always legal shuffles. Better optimisation may be
possible if we can optimise more shuffle(shuffle(A, B, M0), shuffle(C, D, M1),
M2) patterns as well.

Cheers, Simon.

On 25 Jan 2015, at 22:15, Sanjay Patel <spatel at rotateright.com> wrote:
> I ran the benchmarking subset of test-suite on a btver2 machine and
optimizing for btver2 (so enabling AVX codegen).
> 
> I don't see anything outside of the noise with
x86-experimental-vector-shuffle-legality=1.
> 
> On Fri, Jan 23, 2015 at 5:19 AM, Andrea Di Biagio <andrea.dibiagio at
gmail.com> wrote:
> Hi Chandler,
> 
> On Fri, Jan 23, 2015 at 8:15 AM, Chandler Carruth <chandlerc at
gmail.com> wrote:
> > Greetings LLVM hackers and x86 vector shufflers!
> >
> > I would like to flip on another chunk of the new vector shuffling,
> > specifically the logic to mark ~all shuffles as "legal".
> >
> > This can be tested today with the flag
> > "-x86-experimental-vector-shuffle-legality". I would
essentially like to
> > make this the default (by removing the "false" path). Doing
this will allow
> > me to completely delete the old vector shuffle lowering.
> >
> > I've got the patches prepped and ready to go, but it will likely
have a
> > significant impact on performance. Notably, a bunch of the remaining
domain
> > crossing bugs I'm seeing are due to this. The key thing to realize
is that
> > vector shuffle combining is *much* more powerful when we think all of
these
> > are legal, and so we combine away bad shuffles that would trigger
domain
> > crosses.
> 
> That's good news!
> Also, I really like your idea of making all shuffles legal by default.
> I remember I did some experiments disabling the checks for legal
> shuffles in the DAGCombiner to see how well the new shuffle lowering
> coped with 'overly' aggressive shuffle combining. I was surprised
to
> see that from eyeballing the generated code it looked much cleaner
> (although I didn't test it extensively). Our target is btver2, so I
> also didn't look at what could have been codegen for targets with no
> AVX/SSE4.1 where there might be fewer opportunities to match a shuffle
> with a single target instruction during legalization.
> 
> >
> > All of my benchmarks have come back performance neutral overall with a
few
> > benchmarks improving. However, there may be some regressions that
folks want
> > to track down first. I'd really like to get those reported and
prioritize
> > among the vector shuffle work so we can nuke several *thousand* lines
of
> > code from X86ISelLowering.cpp. =D
> 
> I'll see if I can get some numbers from our internal codebase and help
> with reporting potential regressions.
> 
> Thanks,
> Andrea
> 
> >
> > Thanks!
> > -Chandler
> >
> >
> > PS: If you're feeling adventurous, the next big mode flip flag I
want to see
> > changed is -x86-experimental-vector-widening-legalization, but this is
a
> > much more deep change to the entire vector legalization strategy, so I
want
> > to do it second and separately.
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150127/9ec88f90/attachment.html>

Chandler Carruth

2015-Jan-27 22:58 UTC

head link

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

Thanks all!

Simon, Quentin, do you feel comfortable removing the old legality testing
now, or would you like to leave it in until you have more time to reduce
test cases?

(I'm not actually trying to hurry getting test cases, I'm fine to leave
it
in place as long as necessary, just checking so I know how to proceed.)

On Tue, Jan 27, 2015 at 2:42 PM, Simon Pilgrim <llvm-dev at redking.me.uk>
wrote:
> Hi Chandler,
>
> I don’t see any scary changes in performance outside of noise (1%) when I
> set "all shuffles as legal” to true. I haven’t tested the vector
widening
> yet.
>
> A minor concern: in several situations I’m seeing vpermilps has been
> replaced with shufps with repeated inputs (see below). I’m also seeing
> cases where 2 x shufps are being used where a single shufps/permilps was
> occurring. I’ll try to reduce some examples for bugzilla.
>
> old:
>         vshufps        $19, %xmm3, %xmm2, %xmm2 # xmm2 >
xmm2[3,0],xmm3[1,0]
>         vpermilps        $45, %xmm2, %xmm2 # xmm2 = xmm2[1,3,2,0]
>
> new:
>         vshufps        $76, %xmm3, %xmm2, %xmm2 # xmm2 >
xmm2[0,3],xmm3[0,1]
>         vshufps        $120, %xmm2, %xmm2, %xmm2 # xmm2 = xmm2[0,2,3,1]
>
> This may be fixable with improvements to DAGCombiner::visitVECTOR_SHUFFLE
> - it uses shuffle mask legality to check whether to commute candidate
> shuffle masks, which we never do with always legal shuffles. Better
> optimisation may be possible if we can optimise more shuffle(shuffle(A, B,
> M0), shuffle(C, D, M1), M2) patterns as well.
>
> Cheers, Simon.
>
> On 25 Jan 2015, at 22:15, Sanjay Patel <spatel at rotateright.com>
wrote:
>
> I ran the benchmarking subset of test-suite on a btver2 machine and
> optimizing for btver2 (so enabling AVX codegen).
>
> I don't see anything outside of the noise with
> x86-experimental-vector-shuffle-legality=1.
>
> On Fri, Jan 23, 2015 at 5:19 AM, Andrea Di Biagio <
> andrea.dibiagio at gmail.com> wrote:
>
>> Hi Chandler,
>>
>> On Fri, Jan 23, 2015 at 8:15 AM, Chandler Carruth <chandlerc at
gmail.com>
>> wrote:
>> > Greetings LLVM hackers and x86 vector shufflers!
>> >
>> > I would like to flip on another chunk of the new vector shuffling,
>> > specifically the logic to mark ~all shuffles as "legal".
>> >
>> > This can be tested today with the flag
>> > "-x86-experimental-vector-shuffle-legality". I would
essentially like to
>> > make this the default (by removing the "false" path).
Doing this will
>> allow
>> > me to completely delete the old vector shuffle lowering.
>> >
>> > I've got the patches prepped and ready to go, but it will
likely have a
>> > significant impact on performance. Notably, a bunch of the
remaining
>> domain
>> > crossing bugs I'm seeing are due to this. The key thing to
realize is
>> that
>> > vector shuffle combining is *much* more powerful when we think all
of
>> these
>> > are legal, and so we combine away bad shuffles that would trigger
domain
>> > crosses.
>>
>> That's good news!
>> Also, I really like your idea of making all shuffles legal by default.
>> I remember I did some experiments disabling the checks for legal
>> shuffles in the DAGCombiner to see how well the new shuffle lowering
>> coped with 'overly' aggressive shuffle combining. I was
surprised to
>> see that from eyeballing the generated code it looked much cleaner
>> (although I didn't test it extensively). Our target is btver2, so I
>> also didn't look at what could have been codegen for targets with
no
>> AVX/SSE4.1 where there might be fewer opportunities to match a shuffle
>> with a single target instruction during legalization.
>>
>> >
>> > All of my benchmarks have come back performance neutral overall
with a
>> few
>> > benchmarks improving. However, there may be some regressions that
folks
>> want
>> > to track down first. I'd really like to get those reported and
>> prioritize
>> > among the vector shuffle work so we can nuke several *thousand*
lines of
>> > code from X86ISelLowering.cpp. =D
>>
>> I'll see if I can get some numbers from our internal codebase and
help
>> with reporting potential regressions.
>>
>> Thanks,
>> Andrea
>>
>> >
>> > Thanks!
>> > -Chandler
>> >
>> >
>> > PS: If you're feeling adventurous, the next big mode flip flag
I want
>> to see
>> > changed is -x86-experimental-vector-widening-legalization, but
this is a
>> > much more deep change to the entire vector legalization strategy,
so I
>> want
>> > to do it second and separately.
>>
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150127/954888ed/attachment.html>

Kuperstein, Michael M

2015-Jan-28 15:57 UTC

head link

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

Same here - it looks mostly neutral.
I don't think we see anything that's significantly above or below noise.

From: Quentin Colombet [mailto:qcolombet at apple.com]
Sent: Tuesday, January 27, 2015 22:58
To: Chandler Carruth
Cc: Andrea Di Biagio; Simon Pilgrim; Sanjay Patel; LLVM Developers Mailing List;
Kuperstein, Michael M
Subject: Re: RFB: Would like to flip the vector shuffle legality flag

Hi Chandler,

I ran the new flag through some of our internal benchmarks and overall this is
neutral, so slightly better.
I observed a couple of improvements, but also a couple of regressions.
I am working on tracking those down to produce reduced test cases. No sure I
would have time to do that by the end of this week though.

Cheers,
-Quentin

On Jan 25, 2015, at 2:15 PM, Sanjay Patel <spatel at
rotateright.com<mailto:spatel at rotateright.com>> wrote:


I ran the benchmarking subset of test-suite on a btver2 machine and optimizing
for btver2 (so enabling AVX codegen).

I don't see anything outside of the noise with
x86-experimental-vector-shuffle-legality=1.

On Fri, Jan 23, 2015 at 5:19 AM, Andrea Di Biagio <andrea.dibiagio at
gmail.com<mailto:andrea.dibiagio at gmail.com>> wrote:
Hi Chandler,

On Fri, Jan 23, 2015 at 8:15 AM, Chandler Carruth <chandlerc at
gmail.com<mailto:chandlerc at gmail.com>>
wrote:> Greetings LLVM hackers and x86 vector shufflers!
>
> I would like to flip on another chunk of the new vector shuffling,
> specifically the logic to mark ~all shuffles as "legal".
>
> This can be tested today with the flag
> "-x86-experimental-vector-shuffle-legality". I would essentially
like to
> make this the default (by removing the "false" path). Doing this
will allow
> me to completely delete the old vector shuffle lowering.
>
> I've got the patches prepped and ready to go, but it will likely have a
> significant impact on performance. Notably, a bunch of the remaining domain
> crossing bugs I'm seeing are due to this. The key thing to realize is
that
> vector shuffle combining is *much* more powerful when we think all of these
> are legal, and so we combine away bad shuffles that would trigger domain
> crosses.
That's good news!
Also, I really like your idea of making all shuffles legal by default.
I remember I did some experiments disabling the checks for legal
shuffles in the DAGCombiner to see how well the new shuffle lowering
coped with 'overly' aggressive shuffle combining. I was surprised to
see that from eyeballing the generated code it looked much cleaner
(although I didn't test it extensively). Our target is btver2, so I
also didn't look at what could have been codegen for targets with no
AVX/SSE4.1 where there might be fewer opportunities to match a shuffle
with a single target instruction during legalization.
>
> All of my benchmarks have come back performance neutral overall with a few
> benchmarks improving. However, there may be some regressions that folks
want
> to track down first. I'd really like to get those reported and
prioritize
> among the vector shuffle work so we can nuke several *thousand* lines of
> code from X86ISelLowering.cpp. =D
I'll see if I can get some numbers from our internal codebase and help
with reporting potential regressions.

Thanks,
Andrea
>
> Thanks!
> -Chandler
>
>
> PS: If you're feeling adventurous, the next big mode flip flag I want
to see
> changed is -x86-experimental-vector-widening-legalization, but this is a
> much more deep change to the entire vector legalization strategy, so I want
> to do it second and separately.

---------------------------------------------------------------------
Intel Israel (74) Limited

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150128/78e1a51b/attachment.html>

llvm dev - Jan 2015 - [LLVMdev] RFB: Would like to flip the vector shuffle legality flag

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag

[LLVMdev] RFB: Would like to flip the vector shuffle legality flag