Jay Foad via llvm-dev
2019-Dec-09 15:36 UTC
[llvm-dev] [PATCH] D70246: [InstCombine] remove identity shuffle simplification for mask with undefs
Sanjay, I'm looking at some missed optimizations caused by D70246. Here's a test case: define <4 x float> @f(i32 %t32, <4 x float>* %t24) { .entry: %t43 = insertelement <3 x i32> undef, i32 %t32, i32 2 %t44 = bitcast <3 x i32> %t43 to <3 x float> %t45 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef> %t46 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> <i32 undef, i32 1, i32 undef, i32 undef> %t47 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> <i32 undef, i32 undef, i32 2, i32 undef> %t48 = insertelement <4 x float> %t45, float 1.000000e+00, i32 3 %t49 = shufflevector <4 x float> %t48, <4 x float> %t46, <4 x i32> <i32 0, i32 5, i32 undef, i32 3> %t50 = shufflevector <4 x float> %t49, <4 x float> %t47, <4 x i32> <i32 0, i32 1, i32 6, i32 3> %t55 = load <4 x float>, <4 x float>* %t24, align 16 %t58 = getelementptr <4 x float>, <4 x float>* %t24, i64 1 %t59 = load <4 x float>, <4 x float>* %t58, align 16 %t62 = getelementptr <4 x float>, <4 x float>* %t24, i64 2 %t63 = load <4 x float>, <4 x float>* %t62, align 16 %t66 = getelementptr <4 x float>, <4 x float>* %t24, i64 3 %t67 = load <4 x float>, <4 x float>* %t66, align 16 %t69 = shufflevector <4 x float> %t50, <4 x float> undef, <4 x i32> zeroinitializer %t71 = fmul <4 x float> %t55, %t69 %t72 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1> %t74 = fmul <4 x float> %t59, %t72 %t75 = fadd <4 x float> %t71, %t74 %t76 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> <i32 2, i32 2, i32 2, i32 2> %t78 = fmul <4 x float> %t63, %t76 %t79 = fadd <4 x float> %t75, %t78 %t80 = shufflevector <4 x float> %t50, <4 x float> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3> %t82 = fmul <4 x float> %t67, %t80 %t83 = fadd <4 x float> %t79, %t82 ret <4 x float> %t83 } Before D70246, opt -instcombine gives this: define <4 x float> @f(i32 %t32, <4 x float>* %t24) { .entry: %t43 = insertelement <3 x i32> undef, i32 %t32, i32 2 %t44 = bitcast <3 x i32> %t43 to <3 x float> %t55 = load <4 x float>, <4 x float>* %t24, align 16 %t58 = getelementptr <4 x float>, <4 x float>* %t24, i64 1 %t59 = load <4 x float>, <4 x float>* %t58, align 16 %t62 = getelementptr <4 x float>, <4 x float>* %t24, i64 2 %t63 = load <4 x float>, <4 x float>* %t62, align 16 %t66 = getelementptr <4 x float>, <4 x float>* %t24, i64 3 %t67 = load <4 x float>, <4 x float>* %t66, align 16 %t69 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> zeroinitializer %t71 = fmul <4 x float> %t55, %t69 %t72 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1> %t74 = fmul <4 x float> %t59, %t72 %t75 = fadd <4 x float> %t71, %t74 %t76 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> <i32 2, i32 2, i32 2, i32 2> %t78 = fmul <4 x float> %t63, %t76 %t79 = fadd <4 x float> %t75, %t78 %t83 = fadd <4 x float> %t79, %t67 ret <4 x float> %t83 } After D70246, opt -instcombine gives this: define <4 x float> @f(i32 %t32, <4 x float>* %t24) { .entry: %t43 = insertelement <3 x i32> undef, i32 %t32, i32 2 %t44 = bitcast <3 x i32> %t43 to <3 x float> %t45 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef> %t48 = insertelement <4 x float> %t45, float 1.000000e+00, i32 3 %t55 = load <4 x float>, <4 x float>* %t24, align 16 %t58 = getelementptr <4 x float>, <4 x float>* %t24, i64 1 %t59 = load <4 x float>, <4 x float>* %t58, align 16 %t62 = getelementptr <4 x float>, <4 x float>* %t24, i64 2 %t63 = load <4 x float>, <4 x float>* %t62, align 16 %t66 = getelementptr <4 x float>, <4 x float>* %t24, i64 3 %t67 = load <4 x float>, <4 x float>* %t66, align 16 %t69 = shufflevector <4 x float> %t48, <4 x float> undef, <4 x i32> zeroinitializer %t71 = fmul <4 x float> %t55, %t69 %t72 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> <i32 1, i32 1, i32 1, i32 1> %t74 = fmul <4 x float> %t59, %t72 %t75 = fadd <4 x float> %t71, %t74 %t76 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> <i32 2, i32 2, i32 2, i32 2> %t78 = fmul <4 x float> %t63, %t76 %t79 = fadd <4 x float> %t75, %t78 %t80 = shufflevector <4 x float> %t48, <4 x float> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3> %t82 = fmul <4 x float> %t67, %t80 %t83 = fadd <4 x float> %t79, %t82 ret <4 x float> %t83 } Notice that it has failed to simplify %t80, which extracts element 3 from %t48, which is obviously just the constant 1.0. Would you expect the change you committed to have this kind of effect? Do you agree that simplifying %t80 is still valid, even with the new interpretation of the rules about shufflevector and undef and poison? Thanks, Jay. On Thu, 14 Nov 2019 at 16:25, Sanjay Patel via Phabricator via llvm-commits <llvm-commits at lists.llvm.org> wrote:> > spatel created this revision. > spatel added reviewers: efriedma, regehr, aqjune, nlopes, RKSimon, lebedev.ri, liuz. > Herald added subscribers: hiraditya, mcrosier. > Herald added a project: LLVM. > > Given a shuffle that includes undef elements in an otherwise identity mask like: > > define <4 x float> @shuffle(<4 x float> %arg) { > %shuf = shufflevector <4 x float> %arg, <4 x float> undef, <4 x i32> <i32 undef, i32 1, i32 2, i32 3> > ret <4 x float> %shuf > } > > We were simplifying that to the input operand. > > But as discussed in PR43958: > https://bugs.llvm.org/show_bug.cgi?id=43958 > ...that means that per-vector-element poison that would be stopped by the shuffle can now leak to the result. > > For reference, this transform was added long ago with: > rL142671 <https://reviews.llvm.org/rL142671> > > Also note that we still have (and there are tests for) the same transform with no undef elements in the mask (a fully-defined identity mask). I don't think there's any controversy about that case - it's a valid transform under any interpretation of shufflevector/undef/poison. > > Looking at a few of the diffs into codegen, I don't see any difference in final asm. So depending on your perspective, that's good (no real loss of optimization power) or bad (poison exists in the DAG, so we only partially fixed the bug). > > > https://reviews.llvm.org/D70246 > > Files: > llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp > llvm/test/Transforms/InstCombine/X86/clmulqdq.ll > llvm/test/Transforms/InstCombine/X86/x86-avx2.ll > llvm/test/Transforms/InstCombine/X86/x86-avx512.ll > llvm/test/Transforms/InstCombine/X86/x86-f16c.ll > llvm/test/Transforms/InstCombine/X86/x86-pack.ll > llvm/test/Transforms/InstCombine/X86/x86-pshufb.ll > llvm/test/Transforms/InstCombine/X86/x86-sse.ll > llvm/test/Transforms/InstCombine/X86/x86-sse41.ll > llvm/test/Transforms/InstCombine/X86/x86-sse4a.ll > llvm/test/Transforms/InstCombine/X86/x86-vector-shifts.ll > llvm/test/Transforms/InstCombine/X86/x86-vpermil.ll > llvm/test/Transforms/InstCombine/X86/x86-xop.ll > llvm/test/Transforms/InstCombine/shuffle_select.ll > llvm/test/Transforms/InstCombine/vec_demanded_elts.ll > llvm/test/Transforms/InstCombine/vec_shuffle.ll > > _______________________________________________ > llvm-commits mailing list > llvm-commits at lists.llvm.org > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
Sanjay Patel via llvm-dev
2019-Dec-09 16:18 UTC
[llvm-dev] [PATCH] D70246: [InstCombine] remove identity shuffle simplification for mask with undefs
Hi Jay - Yes, this change carried some risk of regressions, and we should still be able to optimize this case. We may need to add or loosen a fold for splat-of-insert: define <4 x float> @f(<4 x float> %x) { %ins3 = insertelement <4 x float> %x, float 3.000000e+00, i32 3 %splat0 = shufflevector <4 x float> %ins3, <4 x float> undef, <4 x i32> zeroinitializer %splat3 = shufflevector <4 x float> %ins3, <4 x float> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3> %r = fadd <4 x float> %splat0, %splat3 ret <4 x float> %r } SimplifyDemandedVectorElts() may not get this because %ins3 has multiple uses, but I didn't step through that yet to confirm. I can work on this if you haven't started already. Let me know. On Mon, Dec 9, 2019 at 10:36 AM Jay Foad <jay.foad at gmail.com> wrote:> Sanjay, > > I'm looking at some missed optimizations caused by D70246. Here's a test > case: > > define <4 x float> @f(i32 %t32, <4 x float>* %t24) { > .entry: > %t43 = insertelement <3 x i32> undef, i32 %t32, i32 2 > %t44 = bitcast <3 x i32> %t43 to <3 x float> > %t45 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> > <i32 0, i32 undef, i32 undef, i32 undef> > %t46 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> > <i32 undef, i32 1, i32 undef, i32 undef> > %t47 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> > <i32 undef, i32 undef, i32 2, i32 undef> > %t48 = insertelement <4 x float> %t45, float 1.000000e+00, i32 3 > %t49 = shufflevector <4 x float> %t48, <4 x float> %t46, <4 x i32> > <i32 0, i32 5, i32 undef, i32 3> > %t50 = shufflevector <4 x float> %t49, <4 x float> %t47, <4 x i32> > <i32 0, i32 1, i32 6, i32 3> > %t55 = load <4 x float>, <4 x float>* %t24, align 16 > %t58 = getelementptr <4 x float>, <4 x float>* %t24, i64 1 > %t59 = load <4 x float>, <4 x float>* %t58, align 16 > %t62 = getelementptr <4 x float>, <4 x float>* %t24, i64 2 > %t63 = load <4 x float>, <4 x float>* %t62, align 16 > %t66 = getelementptr <4 x float>, <4 x float>* %t24, i64 3 > %t67 = load <4 x float>, <4 x float>* %t66, align 16 > %t69 = shufflevector <4 x float> %t50, <4 x float> undef, <4 x i32> > zeroinitializer > %t71 = fmul <4 x float> %t55, %t69 > %t72 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> > <i32 1, i32 1, i32 1, i32 1> > %t74 = fmul <4 x float> %t59, %t72 > %t75 = fadd <4 x float> %t71, %t74 > %t76 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> > <i32 2, i32 2, i32 2, i32 2> > %t78 = fmul <4 x float> %t63, %t76 > %t79 = fadd <4 x float> %t75, %t78 > %t80 = shufflevector <4 x float> %t50, <4 x float> undef, <4 x i32> > <i32 3, i32 3, i32 3, i32 3> > %t82 = fmul <4 x float> %t67, %t80 > %t83 = fadd <4 x float> %t79, %t82 > ret <4 x float> %t83 > } > > Before D70246, opt -instcombine gives this: > > define <4 x float> @f(i32 %t32, <4 x float>* %t24) { > .entry: > %t43 = insertelement <3 x i32> undef, i32 %t32, i32 2 > %t44 = bitcast <3 x i32> %t43 to <3 x float> > %t55 = load <4 x float>, <4 x float>* %t24, align 16 > %t58 = getelementptr <4 x float>, <4 x float>* %t24, i64 1 > %t59 = load <4 x float>, <4 x float>* %t58, align 16 > %t62 = getelementptr <4 x float>, <4 x float>* %t24, i64 2 > %t63 = load <4 x float>, <4 x float>* %t62, align 16 > %t66 = getelementptr <4 x float>, <4 x float>* %t24, i64 3 > %t67 = load <4 x float>, <4 x float>* %t66, align 16 > %t69 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> > zeroinitializer > %t71 = fmul <4 x float> %t55, %t69 > %t72 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> > <i32 1, i32 1, i32 1, i32 1> > %t74 = fmul <4 x float> %t59, %t72 > %t75 = fadd <4 x float> %t71, %t74 > %t76 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> > <i32 2, i32 2, i32 2, i32 2> > %t78 = fmul <4 x float> %t63, %t76 > %t79 = fadd <4 x float> %t75, %t78 > %t83 = fadd <4 x float> %t79, %t67 > ret <4 x float> %t83 > } > > After D70246, opt -instcombine gives this: > > define <4 x float> @f(i32 %t32, <4 x float>* %t24) { > .entry: > %t43 = insertelement <3 x i32> undef, i32 %t32, i32 2 > %t44 = bitcast <3 x i32> %t43 to <3 x float> > %t45 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> > <i32 0, i32 undef, i32 undef, i32 undef> > %t48 = insertelement <4 x float> %t45, float 1.000000e+00, i32 3 > %t55 = load <4 x float>, <4 x float>* %t24, align 16 > %t58 = getelementptr <4 x float>, <4 x float>* %t24, i64 1 > %t59 = load <4 x float>, <4 x float>* %t58, align 16 > %t62 = getelementptr <4 x float>, <4 x float>* %t24, i64 2 > %t63 = load <4 x float>, <4 x float>* %t62, align 16 > %t66 = getelementptr <4 x float>, <4 x float>* %t24, i64 3 > %t67 = load <4 x float>, <4 x float>* %t66, align 16 > %t69 = shufflevector <4 x float> %t48, <4 x float> undef, <4 x i32> > zeroinitializer > %t71 = fmul <4 x float> %t55, %t69 > %t72 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> > <i32 1, i32 1, i32 1, i32 1> > %t74 = fmul <4 x float> %t59, %t72 > %t75 = fadd <4 x float> %t71, %t74 > %t76 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> > <i32 2, i32 2, i32 2, i32 2> > %t78 = fmul <4 x float> %t63, %t76 > %t79 = fadd <4 x float> %t75, %t78 > %t80 = shufflevector <4 x float> %t48, <4 x float> undef, <4 x i32> > <i32 3, i32 3, i32 3, i32 3> > %t82 = fmul <4 x float> %t67, %t80 > %t83 = fadd <4 x float> %t79, %t82 > ret <4 x float> %t83 > } > > Notice that it has failed to simplify %t80, which extracts element 3 > from %t48, which is obviously just the constant 1.0. > > Would you expect the change you committed to have this kind of effect? > Do you agree that simplifying %t80 is still valid, even with the new > interpretation of the rules about shufflevector and undef and poison? > > Thanks, > Jay. > > On Thu, 14 Nov 2019 at 16:25, Sanjay Patel via Phabricator via > llvm-commits <llvm-commits at lists.llvm.org> wrote: > > > > spatel created this revision. > > spatel added reviewers: efriedma, regehr, aqjune, nlopes, RKSimon, > lebedev.ri, liuz. > > Herald added subscribers: hiraditya, mcrosier. > > Herald added a project: LLVM. > > > > Given a shuffle that includes undef elements in an otherwise identity > mask like: > > > > define <4 x float> @shuffle(<4 x float> %arg) { > > %shuf = shufflevector <4 x float> %arg, <4 x float> undef, <4 x i32> > <i32 undef, i32 1, i32 2, i32 3> > > ret <4 x float> %shuf > > } > > > > We were simplifying that to the input operand. > > > > But as discussed in PR43958: > > https://bugs.llvm.org/show_bug.cgi?id=43958 > > ...that means that per-vector-element poison that would be stopped by > the shuffle can now leak to the result. > > > > For reference, this transform was added long ago with: > > rL142671 <https://reviews.llvm.org/rL142671> > > > > Also note that we still have (and there are tests for) the same > transform with no undef elements in the mask (a fully-defined identity > mask). I don't think there's any controversy about that case - it's a valid > transform under any interpretation of shufflevector/undef/poison. > > > > Looking at a few of the diffs into codegen, I don't see any difference > in final asm. So depending on your perspective, that's good (no real loss > of optimization power) or bad (poison exists in the DAG, so we only > partially fixed the bug). > > > > > > https://reviews.llvm.org/D70246 > > > > Files: > > llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp > > llvm/test/Transforms/InstCombine/X86/clmulqdq.ll > > llvm/test/Transforms/InstCombine/X86/x86-avx2.ll > > llvm/test/Transforms/InstCombine/X86/x86-avx512.ll > > llvm/test/Transforms/InstCombine/X86/x86-f16c.ll > > llvm/test/Transforms/InstCombine/X86/x86-pack.ll > > llvm/test/Transforms/InstCombine/X86/x86-pshufb.ll > > llvm/test/Transforms/InstCombine/X86/x86-sse.ll > > llvm/test/Transforms/InstCombine/X86/x86-sse41.ll > > llvm/test/Transforms/InstCombine/X86/x86-sse4a.ll > > llvm/test/Transforms/InstCombine/X86/x86-vector-shifts.ll > > llvm/test/Transforms/InstCombine/X86/x86-vpermil.ll > > llvm/test/Transforms/InstCombine/X86/x86-xop.ll > > llvm/test/Transforms/InstCombine/shuffle_select.ll > > llvm/test/Transforms/InstCombine/vec_demanded_elts.ll > > llvm/test/Transforms/InstCombine/vec_shuffle.ll > > > > _______________________________________________ > > llvm-commits mailing list > > llvm-commits at lists.llvm.org > > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits >-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20191209/02f18d40/attachment.html>
Jay Foad via llvm-dev
2019-Dec-09 16:24 UTC
[llvm-dev] [PATCH] D70246: [InstCombine] remove identity shuffle simplification for mask with undefs
Sanjay, I haven't started looking into it, so please go ahead! Thanks, Jay. On Mon, 9 Dec 2019 at 16:18, Sanjay Patel <spatel at rotateright.com> wrote:> > Hi Jay - > > Yes, this change carried some risk of regressions, and we should still be able to optimize this case. We may need to add or loosen a fold for splat-of-insert: > define <4 x float> @f(<4 x float> %x) { > %ins3 = insertelement <4 x float> %x, float 3.000000e+00, i32 3 > %splat0 = shufflevector <4 x float> %ins3, <4 x float> undef, <4 x i32> zeroinitializer > %splat3 = shufflevector <4 x float> %ins3, <4 x float> undef, <4 x i32> <i32 3, i32 3, i32 3, i32 3> > %r = fadd <4 x float> %splat0, %splat3 > ret <4 x float> %r > } > > SimplifyDemandedVectorElts() may not get this because %ins3 has multiple uses, but I didn't step through that yet to confirm. I can work on this if you haven't started already. Let me know. > > On Mon, Dec 9, 2019 at 10:36 AM Jay Foad <jay.foad at gmail.com> wrote: >> >> Sanjay, >> >> I'm looking at some missed optimizations caused by D70246. Here's a test case: >> >> define <4 x float> @f(i32 %t32, <4 x float>* %t24) { >> .entry: >> %t43 = insertelement <3 x i32> undef, i32 %t32, i32 2 >> %t44 = bitcast <3 x i32> %t43 to <3 x float> >> %t45 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> >> <i32 0, i32 undef, i32 undef, i32 undef> >> %t46 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> >> <i32 undef, i32 1, i32 undef, i32 undef> >> %t47 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> >> <i32 undef, i32 undef, i32 2, i32 undef> >> %t48 = insertelement <4 x float> %t45, float 1.000000e+00, i32 3 >> %t49 = shufflevector <4 x float> %t48, <4 x float> %t46, <4 x i32> >> <i32 0, i32 5, i32 undef, i32 3> >> %t50 = shufflevector <4 x float> %t49, <4 x float> %t47, <4 x i32> >> <i32 0, i32 1, i32 6, i32 3> >> %t55 = load <4 x float>, <4 x float>* %t24, align 16 >> %t58 = getelementptr <4 x float>, <4 x float>* %t24, i64 1 >> %t59 = load <4 x float>, <4 x float>* %t58, align 16 >> %t62 = getelementptr <4 x float>, <4 x float>* %t24, i64 2 >> %t63 = load <4 x float>, <4 x float>* %t62, align 16 >> %t66 = getelementptr <4 x float>, <4 x float>* %t24, i64 3 >> %t67 = load <4 x float>, <4 x float>* %t66, align 16 >> %t69 = shufflevector <4 x float> %t50, <4 x float> undef, <4 x i32> >> zeroinitializer >> %t71 = fmul <4 x float> %t55, %t69 >> %t72 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> >> <i32 1, i32 1, i32 1, i32 1> >> %t74 = fmul <4 x float> %t59, %t72 >> %t75 = fadd <4 x float> %t71, %t74 >> %t76 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> >> <i32 2, i32 2, i32 2, i32 2> >> %t78 = fmul <4 x float> %t63, %t76 >> %t79 = fadd <4 x float> %t75, %t78 >> %t80 = shufflevector <4 x float> %t50, <4 x float> undef, <4 x i32> >> <i32 3, i32 3, i32 3, i32 3> >> %t82 = fmul <4 x float> %t67, %t80 >> %t83 = fadd <4 x float> %t79, %t82 >> ret <4 x float> %t83 >> } >> >> Before D70246, opt -instcombine gives this: >> >> define <4 x float> @f(i32 %t32, <4 x float>* %t24) { >> .entry: >> %t43 = insertelement <3 x i32> undef, i32 %t32, i32 2 >> %t44 = bitcast <3 x i32> %t43 to <3 x float> >> %t55 = load <4 x float>, <4 x float>* %t24, align 16 >> %t58 = getelementptr <4 x float>, <4 x float>* %t24, i64 1 >> %t59 = load <4 x float>, <4 x float>* %t58, align 16 >> %t62 = getelementptr <4 x float>, <4 x float>* %t24, i64 2 >> %t63 = load <4 x float>, <4 x float>* %t62, align 16 >> %t66 = getelementptr <4 x float>, <4 x float>* %t24, i64 3 >> %t67 = load <4 x float>, <4 x float>* %t66, align 16 >> %t69 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> >> zeroinitializer >> %t71 = fmul <4 x float> %t55, %t69 >> %t72 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> >> <i32 1, i32 1, i32 1, i32 1> >> %t74 = fmul <4 x float> %t59, %t72 >> %t75 = fadd <4 x float> %t71, %t74 >> %t76 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> >> <i32 2, i32 2, i32 2, i32 2> >> %t78 = fmul <4 x float> %t63, %t76 >> %t79 = fadd <4 x float> %t75, %t78 >> %t83 = fadd <4 x float> %t79, %t67 >> ret <4 x float> %t83 >> } >> >> After D70246, opt -instcombine gives this: >> >> define <4 x float> @f(i32 %t32, <4 x float>* %t24) { >> .entry: >> %t43 = insertelement <3 x i32> undef, i32 %t32, i32 2 >> %t44 = bitcast <3 x i32> %t43 to <3 x float> >> %t45 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> >> <i32 0, i32 undef, i32 undef, i32 undef> >> %t48 = insertelement <4 x float> %t45, float 1.000000e+00, i32 3 >> %t55 = load <4 x float>, <4 x float>* %t24, align 16 >> %t58 = getelementptr <4 x float>, <4 x float>* %t24, i64 1 >> %t59 = load <4 x float>, <4 x float>* %t58, align 16 >> %t62 = getelementptr <4 x float>, <4 x float>* %t24, i64 2 >> %t63 = load <4 x float>, <4 x float>* %t62, align 16 >> %t66 = getelementptr <4 x float>, <4 x float>* %t24, i64 3 >> %t67 = load <4 x float>, <4 x float>* %t66, align 16 >> %t69 = shufflevector <4 x float> %t48, <4 x float> undef, <4 x i32> >> zeroinitializer >> %t71 = fmul <4 x float> %t55, %t69 >> %t72 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> >> <i32 1, i32 1, i32 1, i32 1> >> %t74 = fmul <4 x float> %t59, %t72 >> %t75 = fadd <4 x float> %t71, %t74 >> %t76 = shufflevector <3 x float> %t44, <3 x float> undef, <4 x i32> >> <i32 2, i32 2, i32 2, i32 2> >> %t78 = fmul <4 x float> %t63, %t76 >> %t79 = fadd <4 x float> %t75, %t78 >> %t80 = shufflevector <4 x float> %t48, <4 x float> undef, <4 x i32> >> <i32 3, i32 3, i32 3, i32 3> >> %t82 = fmul <4 x float> %t67, %t80 >> %t83 = fadd <4 x float> %t79, %t82 >> ret <4 x float> %t83 >> } >> >> Notice that it has failed to simplify %t80, which extracts element 3 >> from %t48, which is obviously just the constant 1.0. >> >> Would you expect the change you committed to have this kind of effect? >> Do you agree that simplifying %t80 is still valid, even with the new >> interpretation of the rules about shufflevector and undef and poison? >> >> Thanks, >> Jay. >> >> On Thu, 14 Nov 2019 at 16:25, Sanjay Patel via Phabricator via >> llvm-commits <llvm-commits at lists.llvm.org> wrote: >> > >> > spatel created this revision. >> > spatel added reviewers: efriedma, regehr, aqjune, nlopes, RKSimon, lebedev.ri, liuz. >> > Herald added subscribers: hiraditya, mcrosier. >> > Herald added a project: LLVM. >> > >> > Given a shuffle that includes undef elements in an otherwise identity mask like: >> > >> > define <4 x float> @shuffle(<4 x float> %arg) { >> > %shuf = shufflevector <4 x float> %arg, <4 x float> undef, <4 x i32> <i32 undef, i32 1, i32 2, i32 3> >> > ret <4 x float> %shuf >> > } >> > >> > We were simplifying that to the input operand. >> > >> > But as discussed in PR43958: >> > https://bugs.llvm.org/show_bug.cgi?id=43958 >> > ...that means that per-vector-element poison that would be stopped by the shuffle can now leak to the result. >> > >> > For reference, this transform was added long ago with: >> > rL142671 <https://reviews.llvm.org/rL142671> >> > >> > Also note that we still have (and there are tests for) the same transform with no undef elements in the mask (a fully-defined identity mask). I don't think there's any controversy about that case - it's a valid transform under any interpretation of shufflevector/undef/poison. >> > >> > Looking at a few of the diffs into codegen, I don't see any difference in final asm. So depending on your perspective, that's good (no real loss of optimization power) or bad (poison exists in the DAG, so we only partially fixed the bug). >> > >> > >> > https://reviews.llvm.org/D70246 >> > >> > Files: >> > llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp >> > llvm/test/Transforms/InstCombine/X86/clmulqdq.ll >> > llvm/test/Transforms/InstCombine/X86/x86-avx2.ll >> > llvm/test/Transforms/InstCombine/X86/x86-avx512.ll >> > llvm/test/Transforms/InstCombine/X86/x86-f16c.ll >> > llvm/test/Transforms/InstCombine/X86/x86-pack.ll >> > llvm/test/Transforms/InstCombine/X86/x86-pshufb.ll >> > llvm/test/Transforms/InstCombine/X86/x86-sse.ll >> > llvm/test/Transforms/InstCombine/X86/x86-sse41.ll >> > llvm/test/Transforms/InstCombine/X86/x86-sse4a.ll >> > llvm/test/Transforms/InstCombine/X86/x86-vector-shifts.ll >> > llvm/test/Transforms/InstCombine/X86/x86-vpermil.ll >> > llvm/test/Transforms/InstCombine/X86/x86-xop.ll >> > llvm/test/Transforms/InstCombine/shuffle_select.ll >> > llvm/test/Transforms/InstCombine/vec_demanded_elts.ll >> > llvm/test/Transforms/InstCombine/vec_shuffle.ll >> > >> > _______________________________________________ >> > llvm-commits mailing list >> > llvm-commits at lists.llvm.org >> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
Possibly Parallel Threads
- Instruction selection problems due to SelectionDAGBuilder
- [SelectionDAG] Assertion due to MachineMemOperand flags difference.
- rL296252 Made large integer operation codegen significantly worse.
- Node deletion during DAG Combination ?
- Filtering on a dataframe- newbie question