andrew adams
2011-Oct-28 21:13 UTC
[LLVMdev] instcombine does silly things with vector x+x
Consider the following function which doubles a <16 x i8> vector:

define <16 x i8> @test(<16 x i8> %a) {
  %b = add <16 x i8> %a, %a
  ret <16 x i8> %b
}

If I compile it for x86 with llc like so:

llc paddb.ll -filetype=asm -o=/dev/stdout

I get a two-op function that just does paddb %xmm0, %xmm0 and then
returns. llc does this regardless of the optimization level. Great!

If I let the instcombine pass touch it like so:

opt -instcombine paddb.ll | llc -filetype=asm -o=/dev/stdout

or like so:

opt -O3 paddb.ll | llc -filetype=asm -o=/dev/stdout

then the add gets converted to a vector left shift by 1, which then
lowers to a much slower function with about a hundred ops. No amount
of optimization after the fact will simplify it back to paddb.

I'm actually generating these ops in a JIT context, and I want to use
instcombine, as it seems like a useful pass. Any idea how I can
reliably generate the 128-bit SSE version of paddb? I thought I might
be able to force the issue with an intrinsic, but there only seem to
be intrinsics for the 64-bit version (llvm.x86.mmx.padd.b) and the
saturating 128-bit version (llvm.x86.sse2.padds.b). I would just give
up and use inline assembly, but it seems I can't JIT that.

I'm using the latest llvm 3.1 from svn. I get similar behavior at
llvm.org/demo using the following equivalent C code:

#include <emmintrin.h>
__m128i f(__m128i a) {
  return _mm_add_epi8(a, a);
}

The no-optimization compilation of this is better than the optimized
version.

Any ideas? Should I just not use this pass?

- Andrew
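[Editor's note, not part of the original mail: the poor lowering is plausibly explained by SSE2 having no byte-granularity shift instruction (there is no psllb), so a shl on <16 x i8> has to be emulated with a wider shift plus a mask to clear the bits that leak across byte boundaries, while paddb performs the add on each byte directly. A scalar C sketch of that emulation, under the assumption that the backend uses the shift-and-mask approach:]

#include <assert.h>
#include <stdint.h>

/* Emulate a per-byte left shift by 1 on two packed bytes the way a
 * 16-bit-wide shift plus mask would: shift the whole word, then
 * clear bit 0 of each byte lane, where a bit from the byte below
 * would otherwise leak in. This mirrors what an SSE2 lowering of a
 * <16 x i8> shl must do in the absence of a byte shift instruction. */
static uint16_t shl1_bytes(uint16_t pair) {
    return (uint16_t)((pair << 1) & 0xFEFE);
}

/* The add form needs no masking: paddb simply adds each byte lane. */
static uint8_t add_byte(uint8_t x) {
    return (uint8_t)(x + x);
}

int main(void) {
    for (int x = 0; x < 256; x++) {
        uint16_t pair = (uint16_t)((x << 8) | x); /* two copies of x */
        uint16_t shifted = shl1_bytes(pair);
        /* Both lanes of the masked shift agree with the plain add. */
        assert((shifted & 0xFF) == add_byte((uint8_t)x));
        assert((shifted >> 8) == add_byte((uint8_t)x));
    }
    return 0;
}

The two forms compute the same bytes, but the shift form drags in the extra masking work that the reported hundred-op lowering reflects.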
Chris Lattner
2011-Oct-28 23:04 UTC
[LLVMdev] instcombine does silly things with vector x+x
On Oct 28, 2011, at 2:13 PM, andrew adams wrote:

> Consider the following function which doubles a <16 x i8> vector:
>
> define <16 x i8> @test(<16 x i8> %a) {
>   %b = add <16 x i8> %a, %a
>   ret <16 x i8> %b
> }
>
> If I compile it for x86 with llc like so:
>
> llc paddb.ll -filetype=asm -o=/dev/stdout
>
> I get a two-op function that just does paddb %xmm0, %xmm0 and then
> returns. llc does this regardless of the optimization level. Great!
>
> If I let the instcombine pass touch it like so:
>
> opt -instcombine paddb.ll | llc -filetype=asm -o=/dev/stdout
>
> or like so:
>
> opt -O3 paddb.ll | llc -filetype=asm -o=/dev/stdout
>
> then the add gets converted to a vector left shift by 1, which then
> lowers to a much slower function with about a hundred ops. No amount
> of optimization after the fact will simplify it back to paddb.

This sounds like a really serious X86 backend performance bug.
Canonicalizing "x+x" to a shift is the "right thing to do"; the
backend should match it.

-Chris

_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
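[Editor's note, not part of the original mail: the canonicalization Chris defends is semantics-preserving — for i8 lanes with wrapping arithmetic, x + x and x << 1 agree on every input, so the slowdown is purely a failure of the backend to match the shift form back to an add. A quick exhaustive check in C:]

#include <assert.h>
#include <stdint.h>

int main(void) {
    /* For every 8-bit value, doubling by add and doubling by
     * shift produce the same wrapped result, so instcombine's
     * rewrite changes no observable behavior. */
    for (int v = 0; v < 256; v++) {
        uint8_t x = (uint8_t)v;
        assert((uint8_t)(x + x) == (uint8_t)(x << 1));
    }
    return 0;
}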
Rotem, Nadav
2011-Oct-30 07:12 UTC
[LLVMdev] instcombine does silly things with vector x+x
Opened pr11266. I will try to make time to work on it.

-----Original Message-----
From: llvmdev-bounces at cs.uiuc.edu [mailto:llvmdev-bounces at cs.uiuc.edu] On Behalf Of Chris Lattner
Sent: Saturday, October 29, 2011 01:04
To: andrew adams
Cc: llvmdev at cs.uiuc.edu
Subject: Re: [LLVMdev] instcombine does silly things with vector x+x

On Oct 28, 2011, at 2:13 PM, andrew adams wrote:

> then the add gets converted to a vector left shift by 1, which then
> lowers to a much slower function with about a hundred ops. No amount
> of optimization after the fact will simplify it back to paddb.

This sounds like a really serious X86 backend performance bug.
Canonicalizing "x+x" to a shift is the "right thing to do"; the
backend should match it.

-Chris