Chandler Carruth
2014-Sep-23 21:53 UTC
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Tue, Sep 23, 2014 at 2:35 PM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote:
> If you don’t want to spend time on this, I’d be happy to create a
> candidate patch for review? I’ve been unclear if you were taking patches
> for your shuffle work prior to it becoming the default.

While I'm happy to work on it, I'm even more happy to have patches. =D
Andrea Di Biagio
2014-Sep-26 10:39 UTC
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Hi Chandler,

Here is another test.

When looking at the AVX codegen, I noticed that, when using the new shuffle lowering, we no longer emit a single vbroadcastss in the case where the shuffle performs a splat of a scalar float loaded from memory.

For example:
(with -mcpu=corei7-avx -x86-experimental-vector-shuffle-lowering)
  vmovss (%rdi), %xmm0
  vpermilps $0, %xmm0, %xmm0 # xmm0 = xmm0[0,0,0,0]

Instead of:
(with -mcpu=corei7-avx)
  vbroadcastss (%rdi), %xmm0

I have attached a small reproducer for it.

Basically, the old shuffle lowering logic calls the function 'NormalizeVectorShuffle' to handle shuffles that perform a splat operation. On AVX, 'NormalizeVectorShuffle' tries to lower a splat whose value comes from a load into an X86ISD::VBROADCAST DAG node. Later on, during instruction selection, we emit a single avx_broadcast for the load+splat sequence (basically, we end up folding the load into the operand of the vbroadcastss).

What happens is that the new shuffle lowering doesn't emit a vbroadcast node in this case, and eventually we end up selecting the sequence of vmovss+vpermilps.

I hope this helps.
Andrea

Attachment: test.ll (application/octet-stream, 394 bytes) URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140926/9821a6f2/attachment.obj>
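To make the equivalence in the message above concrete, here is a minimal sketch in Python that models what the three instructions do to a 4 x float32 XMM register. It is only a toy illustration of the instruction semantics, not LLVM code and not the contents of test.ll; the helper names, the `memory` dictionary, and the address 0x1000 standing in for (%rdi) are all assumptions made up for this example.

```python
# Toy model of three AVX instructions on a 4 x float32 XMM register.
# Illustrative only: the names and the fake memory below are hypothetical.

def vmovss_load(memory, addr):
    """vmovss (mem), xmm: load one float into lane 0 and zero the other lanes."""
    return [memory[addr], 0.0, 0.0, 0.0]

def vpermilps(src, imm8):
    """vpermilps $imm8, xmm, xmm: result lane i is src[(imm8 >> 2*i) & 3]."""
    return [src[(imm8 >> (2 * i)) & 3] for i in range(4)]

def vbroadcastss_load(memory, addr):
    """vbroadcastss (mem), xmm: load one float and replicate it to every lane."""
    return [memory[addr]] * 4

def is_splat_mask(mask):
    """True if every shuffle mask index selects the same source element."""
    return len(mask) > 0 and all(m == mask[0] for m in mask)

memory = {0x1000: 3.5}  # hypothetical address standing in for (%rdi)

# Two-instruction sequence reported with the new lowering.
two_instr = vpermilps(vmovss_load(memory, 0x1000), 0)

# Single instruction emitted by the old lowering.
one_instr = vbroadcastss_load(memory, 0x1000)
```

Both sequences leave the same value in every lane, so the difference is in instruction count and in whether the load is folded into the broadcast. The `is_splat_mask` helper only shows the kind of mask (for example `[0, 0, 0, 0]`) that makes a shuffle a candidate for this treatment.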
Chandler Carruth
2014-Sep-30 05:48 UTC
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
Wow. Somehow, I forgot about vbroadcast and vpbroadcast. =[ Sorry about that. I'll fix those.