thr3ads.net - llvm dev - [LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon! [Sep 2014]


Chandler Carruth

2014-Sep-23 21:53 UTC

[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!

On Tue, Sep 23, 2014 at 2:35 PM, Simon Pilgrim <llvm-dev at redking.me.uk>
wrote:
> If you don’t want to spend time on this, I’d be happy to create a
> candidate patch for review? I’ve been unclear if you were taking patches
> for your shuffle work prior to it becoming the default.

While I'm happy to work on it, I'm even more happy to have patches. =D

Andrea Di Biagio

2014-Sep-26 10:39 UTC


Hi Chandler,

Here is another test.

When looking at the AVX codegen, I noticed that, when using the new
shuffle lowering, we no longer emit a single vbroadcastss in the case
where the shuffle performs a splat of a scalar float loaded from
memory.

For example:
(with -mcpu=corei7-avx -x86-experimental-vector-shuffle-lowering)
   vmovss (%rdi), %xmm0
   vpermilps $0, %xmm0, %xmm0 # xmm0 = xmm0[0,0,0,0]

Instead of:
(with -mcpu=corei7-avx)
  vbroadcastss (%rdi), %xmm0

I have attached a small reproducer for it.
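
For readers without the attachment, a splat of a loaded scalar can be written at the C level roughly as follows. This is a minimal sketch and not the contents of test.ll: the names `splat_load` and `v4f` are made up for illustration, and the code relies on the GCC/Clang `vector_size` extension.

```c
/* GCC/Clang vector extension: four packed floats, the width of an XMM register. */
typedef float v4f __attribute__((vector_size(16)));

/* Hypothetical stand-in for the scrubbed test.ll (not the original file):
   load one float from memory and replicate it into every lane, which is
   a splat of a scalar loaded from memory. */
v4f splat_load(const float *p) {
    float s = *p;
    v4f v = {s, s, s, s};
    return v;
}
```

Compiling this with something like `clang -O2 -march=corei7-avx -S` and reading the assembly should show whether the load was folded into a vbroadcastss; the exact output depends on the compiler version.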

Basically, the old shuffle lowering logic calls the function
'NormalizeVectorShuffle' to handle shuffles that perform a splat
operation.
On AVX, 'NormalizeVectorShuffle' tries to lower a splat whose value
comes from a load into an X86ISD::VBROADCAST DAG node.
Later on, during instruction selection, we emit a single avx_broadcast
for the load+splat sequence (that is, the load is folded into the
memory operand of the vbroadcastss).

The new shuffle lowering doesn't emit a vbroadcast node in this case,
so we end up selecting the vmovss+vpermilps sequence instead.
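
The decision described above can be sketched as a toy model. This is only an illustration of the idea (detect a splat mask, then prefer a broadcast when the splatted value comes from a load); it is not LLVM's code, and every name in it is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lowering outcomes; these names are illustrative only
   and do not correspond to LLVM identifiers. */
enum lowering { LOWER_BROADCAST_FROM_MEM, LOWER_PERMUTE, LOWER_GENERIC };

/* A shuffle mask is a splat when every lane selects the same source element. */
bool is_splat_mask(const int *mask, size_t n) {
    if (n == 0)
        return false;
    for (size_t i = 1; i < n; ++i)
        if (mask[i] != mask[0])
            return false;
    return true;
}

/* Toy decision: a splat whose source is a scalar load can fold the load
   into a single broadcast; any other splat falls back to a register
   permute; non-splat masks take some generic path. */
enum lowering choose_lowering(const int *mask, size_t n, bool source_is_load) {
    if (!is_splat_mask(mask, n))
        return LOWER_GENERIC;
    return source_is_load ? LOWER_BROADCAST_FROM_MEM : LOWER_PERMUTE;
}
```

In this model the mask [0,0,0,0] with a loaded source maps to the broadcast case, which corresponds to the single vbroadcastss, while the same mask with a register source maps to the permute case, which corresponds to the vpermilps.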

I hope this helps.
Andrea

-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.ll
Type: application/octet-stream
Size: 394 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20140926/9821a6f2/attachment.obj>

Chandler Carruth

2014-Sep-30 05:48 UTC


Wow. Somehow, I forgot about vbroadcast and vpbroadcast. =[ Sorry about
that. I'll fix those.

