Chandler Carruth
2014-Sep-20 18:44 UTC
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Sat, Sep 20, 2014 at 7:12 AM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote:

> Hi Andrea / Chandler / Quentin,
>
> If AVX is available I would expect the vpermilps/vpermilpd instruction to
> be used for all float/double single vector shuffles, especially as it can
> deal with the folded load case as well - this would avoid the integer/float
> execution domain transfer issue with using vpshufd.

Yes, this is the obvious solution to folding memory loads. It just isn't implemented yet.

Well, actually it is, but I haven't finished writing tests for it. =]
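For readers unfamiliar with the instructions being discussed: the immediate form of vpermilps uses the same 2-bit-per-element selector encoding as vpshufd, applied independently to each 128-bit lane, which is why one can stand in for the other on single-vector shuffles. The following is only a rough Python model of that encoding, not LLVM's lowering code; the names `permil_imm8` and `apply_permil` are made up for illustration.

```python
def permil_imm8(mask):
    """Encode a 4-element single-vector shuffle mask as an 8-bit immediate.

    Result element i takes source element mask[i]. Each selector occupies
    two bits, with the selector for element 0 in the lowest bits.
    """
    if len(mask) != 4 or any(not 0 <= m <= 3 for m in mask):
        raise ValueError("mask must hold four indices in the range 0..3")
    imm = 0
    for i, m in enumerate(mask):
        imm |= m << (2 * i)
    return imm


def apply_permil(vec, imm):
    """Apply the immediate to every group of four elements (one 128-bit lane)."""
    if len(vec) % 4 != 0:
        raise ValueError("vector length must be a multiple of four")
    out = []
    for base in range(0, len(vec), 4):
        for i in range(4):
            selector = (imm >> (2 * i)) & 3
            out.append(vec[base + selector])
    return out


# Reversing a 4 x float vector.
reverse_imm = permil_imm8([3, 2, 1, 0])
reversed_vec = apply_permil([1.0, 2.0, 3.0, 4.0], reverse_imm)
```

With an 8-element input the same immediate is applied to each half separately, which models the in-lane behaviour of the 256-bit form.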
Simon Pilgrim
2014-Sep-21 20:15 UTC
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On 20 Sep 2014, at 19:44, Chandler Carruth <chandlerc at google.com> wrote:

> If AVX is available I would expect the vpermilps/vpermilpd instruction to be used for all float/double single vector shuffles, especially as it can deal with the folded load case as well - this would avoid the integer/float execution domain transfer issue with using vpshufd.
>
> Yes, this is the obvious solution to folding memory loads. It just isn't implemented yet.
>
> Well, actually it is, but I haven't finished writing tests for it. =]

Thanks Chandler - vpermilps/vpermilpd generation looks great now.

I've found another regression - byte shifts on pre-ssse3 targets are failing to make use of the vpslldq/vpsrldq instructions - I've attached some basic test cases.

Could vpslldq/vpsrldq be used on ssse3+ targets for the cases where zeros are being shifted in? It avoids the need for a zero register (although they aren't as good for memory folding).

Cheers, Simon.

[Attachment: byte_shift.ll, application/octet-stream, 4589 bytes. URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20140921/309c3196/attachment.obj>]
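To make the "zeros being shifted in" case concrete, here is a small Python model of how a two-input shuffle whose second operand is an all-zero vector can be equivalent to a whole-register byte shift. It is a sketch only: the attached byte_shift.ll is not reproduced here, the masks below are invented examples, and the helper names are hypothetical. Byte 0 is treated as the least significant byte, so a "left" shift moves bytes toward higher indices.

```python
ZERO = [0] * 16


def shuffle(a, b, mask):
    """Two-input shuffle: index i < len(a) selects a[i], otherwise b[i - len(a)]."""
    both = a + b
    return [both[i] for i in mask]


def pslldq(v, n):
    """Model of a 128-bit byte shift left by n bytes, shifting in zeros."""
    n = min(n, 16)
    return [0] * n + v[:16 - n]


def psrldq(v, n):
    """Model of a 128-bit byte shift right by n bytes, shifting in zeros."""
    n = min(n, 16)
    return v[n:] + [0] * n


v = list(range(1, 17))

# Index 16 refers to the first byte of the zero operand.
shift_left_4 = [16] * 4 + list(range(0, 12))
shift_right_4 = list(range(4, 16)) + [16] * 4

left_matches = shuffle(v, ZERO, shift_left_4) == pslldq(v, 4)
right_matches = shuffle(v, ZERO, shift_right_4) == psrldq(v, 4)
```

Because the zeros come from the shift itself, the byte-shift form needs no register holding the zero vector, which is the saving Simon mentions.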
Chandler Carruth
2014-Sep-23 11:28 UTC
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Sun, Sep 21, 2014 at 1:15 PM, Simon Pilgrim <llvm-dev at redking.me.uk> wrote:

> On 20 Sep 2014, at 19:44, Chandler Carruth <chandlerc at google.com> wrote:
>
> > If AVX is available I would expect the vpermilps/vpermilpd instruction
> > to be used for all float/double single vector shuffles, especially as it
> > can deal with the folded load case as well - this would avoid the
> > integer/float execution domain transfer issue with using vpshufd.
> >
> > Yes, this is the obvious solution to folding memory loads. It just isn't
> > implemented yet.
> >
> > Well, actually it is, but I haven't finished writing tests for it. =]
>
> Thanks Chandler - vpermilps/vpermilpd generation looks great now.
>
> I've found another regression - byte shifts on pre-ssse3 targets are
> failing to make use of the vpslldq/vpsrldq instructions - I've attached
> some basic test cases.
>
> Could vpslldq/vpsrldq be used on ssse3+ targets for the cases where zeros
> are being shifted in? It avoids the need for a zero register (although they
> aren't as good for memory folding).

I'm curious, how important is this? This lowering has always seemed deeply magical and unlikely to be necessary in practice. palignr at least allows blending.
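As a rough illustration of the "palignr at least allows blending" remark: palignr concatenates two 16-byte registers and extracts a 16-byte window at a byte offset, so with a zero vector as one operand it reproduces a byte shift, and with two live operands it mixes bytes from both. The Python below is an assumed model of those semantics for illustration, not code from LLVM; byte 0 is the least significant byte and the sample values are arbitrary.

```python
ZERO = [0] * 16


def palignr(hi, lo, imm):
    """Concatenate hi:lo, shift right by imm bytes, keep the low 16 bytes."""
    # lo supplies the low bytes; the zero padding covers offsets above 16.
    combined = lo + hi + [0] * 16
    imm = min(imm, 32)
    return combined[imm:imm + 16]


a = list(range(1, 17))
b = list(range(101, 117))

# With a zero operand, palignr behaves like a byte shift, at the cost of a
# register holding zeros.
right_shift_4 = palignr(ZERO, a, 4)   # same bytes as a byte shift right by 4
left_shift_4 = palignr(a, ZERO, 12)   # same bytes as a byte shift left by 4

# With two live operands, the result blends the top of `a` with the bottom of `b`.
blended = palignr(b, a, 4)
```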