Chandler Carruth
2014-Sep-20 05:15 UTC
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
After adding some serious ninja-ry to the new shuffle lowering...

On Fri, Sep 19, 2014 at 11:53 AM, Quentin Colombet <qcolombet at apple.com> wrote:
> 2. none_useless_shuflle none
> Instead of using a single move to materialize a zero extended constant
> into a vector register, we explicitly zeroed a vector register and use a
> shuffle.

... this test case is fixed, as is your 'pxor.ll' test case from earlier, and the 'movss' test cases from Andrea earlier in the thread (I suspect). It turns out that there is a trick that we can use in the existing tables to get most of the memory-operand-movss optimizations.

I think this covers all of the non-AVX-specific issues raised thus far.... One of the issues isn't AVX-specific but can only be solved with AVX. Anyways, I'll look into some of the AVX issues next.
Chandler Carruth
2014-Sep-23 11:28 UTC
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
And I've just enhanced the AVX side more. It now leverages the variable VPERMILPS formation that I think LLVM has historically completely failed to utilize when lowering AVX code. =]

AVX2 is still in-flight.

On Fri, Sep 19, 2014 at 10:15 PM, Chandler Carruth <chandlerc at google.com> wrote:
> After adding some serious ninja-ry to the new shuffle lowering...
>
> On Fri, Sep 19, 2014 at 11:53 AM, Quentin Colombet <qcolombet at apple.com>
> wrote:
>
>> 2. none_useless_shuflle none
>> Instead of using a single move to materialize a zero extended constant
>> into a vector register, we explicitly zeroed a vector register and use a
>> shuffle.
>>
>
> ... this test case is fixed, as is your 'pxor.ll' test case from earlier,
> and the 'movss' test cases from Andrea earlier in the thread (I suspect).
> Turns out that there is a trick that we can use in the existing tables to
> get most of the memory-operand-movss optimizations.
>
> I think this is all of the non-avx-specific issues raised thus far.... One
> of the issues isn't avx specific but can only be solved with avx. Anyways,
> I'll look into some of the AVX issues next.
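For readers unfamiliar with the instruction, here is a minimal reference model of what the variable (register-controlled) form of VPERMILPS computes on a 256-bit vector of eight floats. It is only a sketch of the instruction's semantics, not code from the LLVM lowering; the function names `vpermilps_var`, `is_in_lane_mask`, and `mask_to_control` are made up for illustration. The point is that any single-input shuffle whose elements stay within their own 128-bit lane can be expressed this way, even when the two lanes use different patterns (which the immediate form cannot do, since it applies one pattern to both lanes).

```python
LANE_ELTS = 4  # four 32-bit floats per 128-bit lane


def vpermilps_var(src, ctrl):
    """Model of variable VPERMILPS on 8 x f32.

    Each result element i is picked from the same 128-bit lane as i,
    using the low two bits of the matching control element.
    """
    assert len(src) == 8 and len(ctrl) == 8
    out = []
    for i in range(8):
        lane_base = (i // LANE_ELTS) * LANE_ELTS
        out.append(src[lane_base + (ctrl[i] & 3)])
    return out


def is_in_lane_mask(mask):
    """True if a single-input shuffle mask never crosses a 128-bit lane.

    Negative mask entries stand for undef and are always acceptable.
    """
    return all(m < 0 or m // LANE_ELTS == i // LANE_ELTS
               for i, m in enumerate(mask))


def mask_to_control(mask):
    """Turn an in-lane shuffle mask into a per-element control vector.

    Undef entries are given an arbitrary selector of 0.
    """
    assert is_in_lane_mask(mask)
    return [0 if m < 0 else m % LANE_ELTS for m in mask]


if __name__ == "__main__":
    src = list(range(8))
    mask = [1, 0, 3, 2, 6, 7, 4, 5]  # different pattern in each lane
    print(vpermilps_var(src, mask_to_control(mask)))
```

In this model the mask `[1, 0, 3, 2, 6, 7, 4, 5]` stays in-lane but differs between the low and high lanes, so it is a candidate for the variable form rather than the immediate one.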
Chandler Carruth
2014-Sep-29 13:05 UTC
[LLVMdev] Please benchmark new x86 vector shuffle lowering, planning to make it the default very soon!
On Tue, Sep 23, 2014 at 4:28 AM, Chandler Carruth <chandlerc at google.com> wrote:
> AVX2 is still in-flight.

AVX2 is pretty much done. All of the AVX and AVX2 lowering has now been heavily fuzz tested (a few million test cases and counting). I believe it is correct.

I've added the basic framework for AVX-512. Nothing interesting is implemented there, mostly because I think there are still very big unanswered questions about how AVX-512 should work. For example, it would be good to lower with index-destructive vs. table-destructive shuffles based on the number of uses, but that isn't really possible today. Even better would be to actually respect any loop structure or other invariant properties.

There are still plenty of performance gains to be had in AVX or AVX2 (broadcast support, work to combine away intermediate shuffling such as can be seen in the v32i8 test cases with interleaved unpacks, etc.). However, I think essentially all of the test cases (other than broadcast and shift test cases) have been fixed. I'd really like to enable this and let folks submit patches for the few remaining cases that impact them significantly.

As far as I can tell, the new code paths offer very significant advantages for the hardware folks have today, with only a few downsides. While they are less implemented for AVX-512 than the current code, I don't really think that should be the priority.

Are there any remaining objections?

-Chandler
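The message does not describe how the fuzz testing was set up, so the following is only a rough sketch of one way such a test case could be generated, not the tool actually used. It builds a random two-input shuffle mask, emits a small LLVM IR function around a `shufflevector` instruction, and provides a reference model to compare against whatever the compiled code produces. All names here (`random_mask`, `mask_to_ir`, the `@test` function name, the undef probability) are assumptions made for illustration.

```python
import random


def shufflevector(a, b, mask):
    """Reference semantics of a two-input shuffle.

    Indices 0..n-1 select from a, n..2n-1 select from b.
    A negative index means undef, modeled here as None.
    """
    both = list(a) + list(b)
    return [None if m < 0 else both[m] for m in mask]


def random_mask(num_elts, rng, undef_prob=0.1):
    """Pick a random mask, with some lanes left undef (-1)."""
    return [-1 if rng.random() < undef_prob else rng.randrange(2 * num_elts)
            for _ in range(num_elts)]


def mask_to_ir(mask, elt_ty="i8"):
    """Emit a tiny IR function that applies the given shuffle mask."""
    n = len(mask)
    vec_ty = f"<{n} x {elt_ty}>"
    idx = ", ".join("i32 undef" if m < 0 else f"i32 {m}" for m in mask)
    return (
        f"define {vec_ty} @test({vec_ty} %a, {vec_ty} %b) {{\n"
        f"  %s = shufflevector {vec_ty} %a, {vec_ty} %b, <{n} x i32> <{idx}>\n"
        f"  ret {vec_ty} %s\n"
        f"}}\n"
    )


if __name__ == "__main__":
    rng = random.Random(2014)      # fixed seed so a failure can be replayed
    mask = random_mask(32, rng)    # v32i8, as in the AVX2 test cases
    print(mask_to_ir(mask))
```

A harness built on this would compile the emitted IR for the target of interest, run it on known inputs, and check every non-undef lane against `shufflevector`; undef lanes may hold any value and are skipped.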