Hal Finkel
2012-Jan-26 21:19 UTC
[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
On Thu, 2012-01-26 at 15:12 -0600, Sebastian Pop wrote:
> On Thu, Jan 26, 2012 at 2:49 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> > Thanks! Did you compile with any non-default flags other than -mllvm
> > -vectorize?
>
> I used -O3 and -vectorize, no other non-default flags.

If I run clang -O3 -mllvm -vectorize -S -emit-llvm -o test.ll test.c
then I get no vectorization at all (the output is identical to that
without the -vectorize). What target triple is your clang targeting?

If I include -mllvm -debug-only=bb-vectorize then the relevant output
is:
BBV: fusing loop #1 for entry in main...
BBV: found 0 instructions with candidate pairs
BBV: done!
BBV: fusing loop #1 for for.body in main...
BBV: found 0 instructions with candidate pairs
BBV: done!
BBV: fusing loop #1 for for.end in main...
BBV: found 0 instructions with candidate pairs
BBV: done!
BBV: fusing loop #1 for for.cond7.preheader in main...
BBV: found 0 instructions with candidate pairs
BBV: done!
BBV: fusing loop #1 for for.body10 in main...
BBV: found 16 instructions with candidate pairs
BBV: found 62 pair connections.
BBV: selected 0 pairs.
BBV: done!
BBV: fusing loop #1 for for.inc45 in main...
BBV: found 0 instructions with candidate pairs
BBV: done!
BBV: fusing loop #1 for for.end47 in main...
BBV: found 3 instructions with candidate pairs
BBV: found 0 pair connections.
BBV: done!

 -Hal

>
> Sebastian
> --
> Qualcomm Innovation Center, Inc is a member of Code Aurora Forum

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory
Anton Korobeynikov
2012-Jan-26 21:30 UTC
[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
> If I run clang -O3 -mllvm -vectorize -S -emit-llvm -o test.ll test.c
> then I get no vectorization at all (the output is identical to that
> without the -vectorize). What target triple is your clang targeting?

Probably Sebastian can provide the .ll file. Then no clang will be necessary :)

--
With best regards, Anton Korobeynikov
Faculty of Mathematics and Mechanics, Saint Petersburg State University
Sebastian Pop
2012-Jan-26 21:36 UTC
[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
On Thu, Jan 26, 2012 at 3:19 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> On Thu, 2012-01-26 at 15:12 -0600, Sebastian Pop wrote:
>> On Thu, Jan 26, 2012 at 2:49 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>> > Thanks! Did you compile with any non-default flags other than -mllvm
>> > -vectorize?
>>
>> I used -O3 and -vectorize, no other non-default flags.
>
> If I run clang -O3 -mllvm -vectorize -S -emit-llvm -o test.ll test.c
> then I get no vectorization at all (the output is identical to that
> without the -vectorize). What target triple is your clang targeting?
>

Target: arm-none-linux-gnueabi

> If I include -mllvm -debug-only=bb-vectorize then the relevant output
> is:
> BBV: fusing loop #1 for entry in main...
> BBV: found 0 instructions with candidate pairs
> BBV: done!
> BBV: fusing loop #1 for for.body in main...
> BBV: found 0 instructions with candidate pairs
> BBV: done!
> BBV: fusing loop #1 for for.end in main...
> BBV: found 0 instructions with candidate pairs
> BBV: done!
> BBV: fusing loop #1 for for.cond7.preheader in main...
> BBV: found 0 instructions with candidate pairs
> BBV: done!
> BBV: fusing loop #1 for for.body10 in main...
> BBV: found 16 instructions with candidate pairs
> BBV: found 62 pair connections.
> BBV: selected 0 pairs.
> BBV: done!
> BBV: fusing loop #1 for for.inc45 in main...
> BBV: found 0 instructions with candidate pairs
> BBV: done!
> BBV: fusing loop #1 for for.end47 in main...
> BBV: found 3 instructions with candidate pairs
> BBV: found 0 pair connections.
> BBV: done!
>
> -Hal
>

Here is my output:

clang -O3 -mllvm -vectorize -S -emit-llvm -o test.ll test.c -mllvm -debug-only=bb-vectorize
BBV: fusing loop #1 for entry in main...
BBV: found 0 instructions with candidate pairs
BBV: done!
BBV: fusing loop #1 for for.body in main...
BBV: found 0 instructions with candidate pairs
BBV: done!
BBV: fusing loop #1 for for.end in main...
BBV: found 0 instructions with candidate pairs
BBV: done!
BBV: fusing loop #1 for for.cond7.preheader in main...
BBV: found 0 instructions with candidate pairs
BBV: done!
BBV: fusing loop #1 for for.body10 in main...
BBV: found 22 instructions with candidate pairs
BBV: found 82 pair connections.
BBV: selected pairs in the best tree for: %0 = load i8* %r.063, align 1, !tbaa !0
BBV: selected pair: %mul23 = mul nsw i32 %conv14, 234 <-> %mul35 = mul nsw i32 %conv15, 543
BBV: selected pair: %0 = load i8* %r.063, align 1, !tbaa !0 <-> %1 = load i8* %incdec.ptr11, align 1, !tbaa !0
BBV: selected pair: %conv14 = zext i8 %0 to i32 <-> %conv15 = zext i8 %1 to i32
BBV: selected pair: %add26 = add i32 %mul25, %mul23 <-> %add36 = add i32 %mul35, %mul33
BBV: selected pair: %mul = mul nsw i32 %conv14, 123 <-> %mul16 = mul nsw i32 %conv15, 321
BBV: selected pair: %conv30 = trunc i32 %add29 to i8 <-> %conv40 = trunc i32 %add39 to i8
BBV: selected pair: %mul25 = mul nsw i32 %conv15, 432 <-> %mul33 = mul nsw i32 %conv14, 345
BBV: selected pair: %add29 = add i32 %add26, %mul28 <-> %add39 = add i32 %add36, %mul38
BBV: selected pair: store i8 %conv30, i8* %incdec.ptr21, align 1, !tbaa !0 <-> store i8 %conv40, i8* %incdec.ptr31, align 1, !tbaa !0
BBV: selected pairs in the best tree for: %conv14 = zext i8 %0 to i32
BBV: selected pair: %mul23 = mul nsw i32 %conv14, 234 <-> %mul35 = mul nsw i32 %conv15, 543
BBV: selected pair: %conv14 = zext i8 %0 to i32 <-> %conv15 = zext i8 %1 to i32
BBV: selected pair: %mul = mul nsw i32 %conv14, 123 <-> %mul16 = mul nsw i32 %conv15, 321
BBV: selected pair: %add26 = add i32 %mul25, %mul23 <-> %add36 = add i32 %mul35, %mul33
BBV: selected pair: %conv30 = trunc i32 %add29 to i8 <-> %conv40 = trunc i32 %add39 to i8
BBV: selected pair: %mul25 = mul nsw i32 %conv15, 432 <-> %mul33 = mul nsw i32 %conv14, 345
BBV: selected pair: %add29 = add i32 %add26, %mul28 <-> %add39 = add i32 %add36, %mul38
BBV: selected pair: store i8 %conv30, i8* %incdec.ptr21, align 1, !tbaa !0 <-> store i8 %conv40, i8* %incdec.ptr31, align 1, !tbaa !0
BBV: selected 9 pairs.
BBV: initial:
for.body10:                                       ; preds = %for.body10, %for.cond7.preheader
  %w.065 = phi i8* [ %call1, %for.cond7.preheader ], [ %incdec.ptr41, %for.body10 ]
  %i.164 = phi i32 [ 0, %for.cond7.preheader ], [ %inc43, %for.body10 ]
  %r.063 = phi i8* [ %call, %for.cond7.preheader ], [ %incdec.ptr13, %for.body10 ]
  %incdec.ptr11 = getelementptr inbounds i8* %r.063, i32 1
  %0 = load i8* %r.063, align 1, !tbaa !0
  %incdec.ptr12 = getelementptr inbounds i8* %r.063, i32 2
  %1 = load i8* %incdec.ptr11, align 1, !tbaa !0
  %incdec.ptr13 = getelementptr inbounds i8* %r.063, i32 3
  %2 = load i8* %incdec.ptr12, align 1, !tbaa !0
  %conv14 = zext i8 %0 to i32
  %mul = mul nsw i32 %conv14, 123
  %conv15 = zext i8 %1 to i32
  %mul16 = mul nsw i32 %conv15, 321
  %conv17 = zext i8 %2 to i32
  %mul18 = mul nsw i32 %conv17, 567
  %add = add i32 %mul16, %mul
  %add19 = add i32 %add, %mul18
  %conv20 = trunc i32 %add19 to i8
  %incdec.ptr21 = getelementptr inbounds i8* %w.065, i32 1
  store i8 %conv20, i8* %w.065, align 1, !tbaa !0
  %mul23 = mul nsw i32 %conv14, 234
  %mul25 = mul nsw i32 %conv15, 432
  %mul28 = mul nsw i32 %conv17, 987
  %add26 = add i32 %mul25, %mul23
  %add29 = add i32 %add26, %mul28
  %conv30 = trunc i32 %add29 to i8
  %incdec.ptr31 = getelementptr inbounds i8* %w.065, i32 2
  store i8 %conv30, i8* %incdec.ptr21, align 1, !tbaa !0
  %mul33 = mul nsw i32 %conv14, 345
  %mul35 = mul nsw i32 %conv15, 543
  %mul38 = mul nsw i32 %conv17, 789
  %add36 = add i32 %mul35, %mul33
  %add39 = add i32 %add36, %mul38
  %conv40 = trunc i32 %add39 to i8
  %incdec.ptr41 = getelementptr inbounds i8* %w.065, i32 3
  store i8 %conv40, i8* %incdec.ptr31, align 1, !tbaa !0
  %inc43 = add nsw i32 %i.164, 1
  %exitcond = icmp eq i32 %inc43, 10000
  br i1 %exitcond, label %for.inc45, label %for.body10
BBV: fusing: %0 = load i8* %r.063, align 1, !tbaa !0 <-> %1 = load i8* %incdec.ptr11, align 1, !tbaa !0
BBV: fusing: %conv14 = zext i8 %2 to i32 <-> %conv15 = zext i8 %3 to i32
BBV: moving: %mul = mul nsw i32 %5, 123 to after %conv14.v.r2 = extractelement <2 x i32> %conv14, i32 1
BBV: fusing: %mul = mul nsw i32 %conv14.v.r1, 123 <-> %mul16 = mul nsw i32 %conv14.v.r2, 321
BBV: fusing: %mul23 = mul nsw i32 %conv14.v.r1, 234 <-> %mul35 = mul nsw i32 %conv14.v.r2, 543
BBV: moving: %add26 = add i32 %mul25, %5 to after %mul23.v.r2 = extractelement <2 x i32> %mul23, i32 1
BBV: moving: %add29 = add i32 %add26, %mul28 to after %add26 = add i32 %mul25, %5
BBV: moving: %conv30 = trunc i32 %add29 to i8 to after %add29 = add i32 %add26, %mul28
BBV: moving: store i8 %conv30, i8* %incdec.ptr21, align 1, !tbaa !0 to after %conv30 = trunc i32 %add29 to i8
BBV: fusing: %mul25 = mul nsw i32 %conv14.v.r2, 432 <-> %mul33 = mul nsw i32 %conv14.v.r1, 345
BBV: fusing: %add26 = add i32 %mul25.v.r1, %mul23.v.r1 <-> %add36 = add i32 %mul23.v.r2, %mul25.v.r2
BBV: moving: %add29 = add i32 %5, %mul28 to after %add26.v.r2 = extractelement <2 x i32> %add26, i32 1
BBV: moving: %conv30 = trunc i32 %add29 to i8 to after %add29 = add i32 %5, %mul28
BBV: moving: store i8 %conv30, i8* %incdec.ptr21, align 1, !tbaa !0 to after %conv30 = trunc i32 %add29 to i8
BBV: fusing: %add29 = add i32 %add26.v.r1, %mul28 <-> %add39 = add i32 %add26.v.r2, %mul38
BBV: moving: %conv30 = trunc i32 %5 to i8 to after %add29.v.r2 = extractelement <2 x i32> %add29, i32 1
BBV: moving: store i8 %conv30, i8* %incdec.ptr21, align 1, !tbaa !0 to after %conv30 = trunc i32 %5 to i8
BBV: fusing: %conv30 = trunc i32 %add29.v.r1 to i8 <-> %conv40 = trunc i32 %add29.v.r2 to i8
BBV: moving: store i8 %5, i8* %incdec.ptr21, align 1, !tbaa !0 to after %conv30.v.r2 = extractelement <2 x i8> %conv30, i32 1
BBV: fusing: store i8 %conv30.v.r1, i8* %incdec.ptr21, align 1, !tbaa !0 <-> store i8 %conv30.v.r2, i8* %incdec.ptr31, align 1, !tbaa !0
BBV: final:
for.body10:                                       ; preds = %for.body10, %for.cond7.preheader
  %w.065 = phi i8* [ %call1, %for.cond7.preheader ], [ %incdec.ptr41, %for.body10 ]
  %i.164 = phi i32 [ 0, %for.cond7.preheader ], [ %inc43, %for.body10 ]
  %r.063 = phi i8* [ %call, %for.cond7.preheader ], [ %incdec.ptr13, %for.body10 ]
  %incdec.ptr11 = getelementptr inbounds i8* %r.063, i32 1
  %0 = bitcast i8* %r.063 to <2 x i8>*
  %incdec.ptr12 = getelementptr inbounds i8* %r.063, i32 2
  %1 = load <2 x i8>* %0, align 1, !tbaa !0
  %2 = extractelement <2 x i8> %1, i32 0
  %3 = extractelement <2 x i8> %1, i32 1
  %incdec.ptr13 = getelementptr inbounds i8* %r.063, i32 3
  %4 = load i8* %incdec.ptr12, align 1, !tbaa !0
  %conv14 = zext <2 x i8> %1 to <2 x i32>
  %conv14.v.r1 = extractelement <2 x i32> %conv14, i32 0
  %conv14.v.r2 = extractelement <2 x i32> %conv14, i32 1
  %mul.v.i1.1 = insertelement <2 x i32> undef, i32 123, i32 0
  %mul.v.i1.2 = insertelement <2 x i32> %mul.v.i1.1, i32 321, i32 1
  %mul = mul nsw <2 x i32> %conv14, %mul.v.i1.2
  %mul.v.r1 = extractelement <2 x i32> %mul, i32 0
  %mul.v.r2 = extractelement <2 x i32> %mul, i32 1
  %conv17 = zext i8 %4 to i32
  %mul18 = mul nsw i32 %conv17, 567
  %add = add i32 %mul.v.r2, %mul.v.r1
  %add19 = add i32 %add, %mul18
  %conv20 = trunc i32 %add19 to i8
  %incdec.ptr21 = getelementptr inbounds i8* %w.065, i32 1
  store i8 %conv20, i8* %w.065, align 1, !tbaa !0
  %mul23.v.i1.1 = insertelement <2 x i32> undef, i32 234, i32 0
  %mul25.v.i1.1 = insertelement <2 x i32> undef, i32 432, i32 0
  %mul28 = mul nsw i32 %conv17, 987
  %incdec.ptr31 = getelementptr inbounds i8* %w.065, i32 2
  %mul25.v.i1.2 = insertelement <2 x i32> %mul25.v.i1.1, i32 345, i32 1
  %mul25.v.i0 = shufflevector <2 x i32> %conv14, <2 x i32> undef, <2 x i32> <i32 1, i32 0>
  %mul25 = mul nsw <2 x i32> %mul25.v.i0, %mul25.v.i1.2
  %mul25.v.r1 = extractelement <2 x i32> %mul25, i32 0
  %mul25.v.r2 = extractelement <2 x i32> %mul25, i32 1
  %mul23.v.i1.2 = insertelement <2 x i32> %mul23.v.i1.1, i32 543, i32 1
  %mul23 = mul nsw <2 x i32> %conv14, %mul23.v.i1.2
  %mul23.v.r1 = extractelement <2 x i32> %mul23, i32 0
  %mul23.v.r2 = extractelement <2 x i32> %mul23, i32 1
  %mul38 = mul nsw i32 %conv17, 789
  %add26.v.i1 = shufflevector <2 x i32> %mul23, <2 x i32> %mul25, <2 x i32> <i32 0, i32 3>
  %add26.v.i0 = shufflevector <2 x i32> %mul25, <2 x i32> %mul23, <2 x i32> <i32 0, i32 3>
  %add26 = add <2 x i32> %add26.v.i0, %add26.v.i1
  %add26.v.r1 = extractelement <2 x i32> %add26, i32 0
  %add26.v.r2 = extractelement <2 x i32> %add26, i32 1
  %add29.v.i1.1 = insertelement <2 x i32> undef, i32 %mul28, i32 0
  %add29.v.i1.2 = insertelement <2 x i32> %add29.v.i1.1, i32 %mul38, i32 1
  %add29 = add <2 x i32> %add26, %add29.v.i1.2
  %add29.v.r1 = extractelement <2 x i32> %add29, i32 0
  %add29.v.r2 = extractelement <2 x i32> %add29, i32 1
  %conv30 = trunc <2 x i32> %add29 to <2 x i8>
  %conv30.v.r1 = extractelement <2 x i8> %conv30, i32 0
  %conv30.v.r2 = extractelement <2 x i8> %conv30, i32 1
  %5 = bitcast i8* %incdec.ptr21 to <2 x i8>*
  %incdec.ptr41 = getelementptr inbounds i8* %w.065, i32 3
  store <2 x i8> %conv30, <2 x i8>* %5, align 1, !tbaa !0
  %inc43 = add nsw i32 %i.164, 1
  %exitcond = icmp eq i32 %inc43, 10000
  br i1 %exitcond, label %for.inc45, label %for.body10
BBV: fusing loop #2 for for.body10 in main...
BBV: found 27 instructions with candidate pairs
BBV: found 33 pair connections.
BBV: selected 0 pairs.
BBV: done!
BBV: fusing loop #1 for for.inc45 in main...
BBV: found 0 instructions with candidate pairs
BBV: done!
BBV: fusing loop #1 for for.end47 in main...
BBV: found 5 instructions with candidate pairs
BBV: found 2 pair connections.
BBV: selected 0 pairs.
BBV: done!

See also the attached test.ll (if that helps).

Sebastian
--
Qualcomm Innovation Center, Inc is a member of Code Aurora Forum
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test.ll
Type: application/octet-stream
Size: 5983 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120126/f91d93eb/attachment.obj>
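
For anyone reading the debug output without test.c at hand: the source file is not included in the thread, but the "BBV: initial" block suggests a loop body of roughly the shape below. This is only a sketch reconstructed from the IR, not Sebastian's actual source. The names kernel_scalar, kernel_paired, v2i32, v2_mul, v2_add and kernels_agree are made up for illustration, and the two-lane struct merely models the <2 x i32> operations in the "BBV: final" block, assuming the buffers hold unsigned 8-bit values (the IR uses zext).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical two-lane vector standing in for LLVM's <2 x i32>. */
typedef struct { int32_t lane[2]; } v2i32;

static v2i32 v2_mul(v2i32 x, v2i32 y) {
    v2i32 out = {{ x.lane[0] * y.lane[0], x.lane[1] * y.lane[1] }};
    return out;
}

static v2i32 v2_add(v2i32 x, v2i32 y) {
    v2i32 out = {{ x.lane[0] + y.lane[0], x.lane[1] + y.lane[1] }};
    return out;
}

/* One iteration as read off the "BBV: initial" block (assumed shape). */
static void kernel_scalar(const uint8_t *r, uint8_t *w) {
    assert(r && w);
    int32_t a = r[0], b = r[1], c = r[2];
    w[0] = (uint8_t)(a * 123 + b * 321 + c * 567);
    w[1] = (uint8_t)(a * 234 + b * 432 + c * 987);
    w[2] = (uint8_t)(a * 345 + b * 543 + c * 789);
}

/* The same iteration, following the pairing in the "BBV: final" block. */
static void kernel_paired(const uint8_t *r, uint8_t *w) {
    assert(r && w);
    v2i32 ab = {{ r[0], r[1] }};             /* fused loads + zext (%conv14) */
    v2i32 ba = {{ ab.lane[1], ab.lane[0] }}; /* shufflevector <1, 0>         */
    int32_t c = r[2];                        /* third load stays scalar      */

    /* %mul: lanes hold a*123 and b*321; both feed the scalar w[0] sum. */
    v2i32 mul = v2_mul(ab, (v2i32){{ 123, 321 }});
    w[0] = (uint8_t)(mul.lane[1] + mul.lane[0] + c * 567);

    v2i32 mul23 = v2_mul(ab, (v2i32){{ 234, 543 }}); /* a*234, b*543 */
    v2i32 mul25 = v2_mul(ba, (v2i32){{ 432, 345 }}); /* b*432, a*345 */

    /* The two shufflevectors regroup lanes so each sum belongs to one output. */
    v2i32 lhs = {{ mul25.lane[0], mul23.lane[1] }};  /* %add26.v.i0 */
    v2i32 rhs = {{ mul23.lane[0], mul25.lane[1] }};  /* %add26.v.i1 */
    v2i32 add26 = v2_add(lhs, rhs);
    v2i32 add29 = v2_add(add26, (v2i32){{ c * 987, c * 789 }});

    w[1] = (uint8_t)add29.lane[0];           /* fused trunc + store */
    w[2] = (uint8_t)add29.lane[1];
}

/* Returns 1 if both forms agree for every possible input triple. */
static int kernels_agree(void) {
    for (int a = 0; a < 256; ++a)
        for (int b = 0; b < 256; ++b)
            for (int c = 0; c < 256; ++c) {
                uint8_t r[3] = { (uint8_t)a, (uint8_t)b, (uint8_t)c };
                uint8_t ws[3], wp[3];
                kernel_scalar(r, ws);
                kernel_paired(r, wp);
                if (ws[0] != wp[0] || ws[1] != wp[1] || ws[2] != wp[2])
                    return 0;
            }
    return 1;
}
```

Calling kernel_scalar and kernel_paired on the same three input bytes should give the same three output bytes. The sketch also makes it visible that only the second and third outputs end up sharing vector adds, truncs and a store, while the first output is still summed in scalar form after its two multiplies were paired.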
Hal Finkel
2012-Jan-26 21:41 UTC
[LLVMdev] [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
On Thu, 2012-01-26 at 15:36 -0600, Sebastian Pop wrote:
> arm-none-linux-gnueabi

Indeed, adding -ccc-host-triple arm-none-linux-gnueabi I also get
vectorization (even though I don't get vectorization when targeting
x86_64). I'll let you know what I find.

 -Hal

--
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory