Hal Finkel
2014-Oct-16 18:38 UTC
[LLVMdev] RFC: Should we have (something like) -extra-vectorizer-passes in -O2?
----- Original Message -----
> From: "Chandler Carruth" <chandlerc at google.com>
> To: "Zinovy Nis" <zinovy.nis at gmail.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "James Molloy" <james at jamesmolloy.co.uk>, "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Thursday, October 16, 2014 1:21:19 PM
> Subject: Re: [LLVMdev] RFC: Should we have (something like) -extra-vectorizer-passes in -O2?
>
> On Thu, Oct 16, 2014 at 7:55 AM, Zinovy Nis <zinovy.nis at gmail.com> wrote:
>
> Seems that adding -extra-vectorizer-passes doesn't help to vectorizer
> in my case. LoopRotation re-run does nothing.
>
> Can you reduce this to a test case you can share?

He had posted one earlier here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141006/238660.html
(Arnold had posted some analysis here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141006/239144.html
- which I imagine you saw)

 -Hal

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
Chandler Carruth
2014-Oct-16 18:52 UTC
[LLVMdev] RFC: Should we have (something like) -extra-vectorizer-passes in -O2?
On Thu, Oct 16, 2014 at 11:38 AM, Hal Finkel <hfinkel at anl.gov> wrote:

> He had posted one earlier here:
>
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141006/238660.html
> (Arnold had posted some analysis here:
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141006/239144.html
> - which I imagine you saw)

Doh, sorry. I saw Arnold's analysis but (wrongly) assumed that a complete working test case wasn't available which is why Arnold expected loop-rotate to fix this when it didn't. My bad.
Arnold Schwaighofer
2014-Oct-16 22:04 UTC
[LLVMdev] RFC: Should we have (something like) -extra-vectorizer-passes in -O2?
I quickly took a look again. Instcombine is removing the fast-math flag on the reduction so the vectorizer does not touch it.

bin/clang -O3 -ffast-math -mllvm -extra-vectorizer-passes aobench.cpp -emit-llvm -S -o aobench.2.ll -mllvm -debug-only=loop-vectorize -mllvm -print-before-all -mllvm -print-after-all 2> aobench.2.debug.ll

*** IR Dump Before Combine redundant instructions ***
; Function Attrs: noinline nounwind ssp uwtable
define void @_Z21ambient_occlusion_vecP6_IsectR5vrandILm8EE(%struct._Isect* nocapture %isect, %class.vrand* nocapture readonly dereferenceable(32) %rng) #0 {
entry:
  ...
  br label %for.body

for.body:                                         ; preds = %for.inc.for.body_crit_edge, %entry
  %8 = phi float [ %conv.i, %entry ], [ %.pre, %for.inc.for.body_crit_edge ]
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.inc.for.body_crit_edge ]
  %occlusion.017 = phi float [ 0.000000e+00, %entry ], [ %conv4, %for.inc.for.body_crit_edge ]
  %arrayidx = getelementptr inbounds [8 x float]* %rand1, i64 0, i64 %indvars.iv
  %9 = load float* %arrayidx, align 4, !tbaa !5
  %mul3 = fmul fast float %8, %9
  %mul.i.i = fmul fast float %mul3, %mul3
  %cmp.i = fcmp olt float %mul.i.i, 0x3C670EF540000000
  %conv4 = fadd fast float %occlusion.017, 1.000000e+00 <============= FAST
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp eq i64 %indvars.iv, 63
  br i1 %exitcond, label %for.end, label %for.inc.for.body_crit_edge

for.inc.for.body_crit_edge:                       ; preds = %for.body
  %arrayidx2.phi.trans.insert = getelementptr inbounds [8 x float]* %rand2, i64 0, i64 %indvars.iv.next
  %.pre = load float* %arrayidx2.phi.trans.insert, align 4, !tbaa !5
  br label %for.body

for.end:                                          ; preds = %for.body
  %t5 = getelementptr inbounds %struct._Isect* %isect, i64 0, i32 0
  store float %conv4, float* %t5, align 4, !tbaa !7
  ret void
}

*** IR Dump After Combine redundant instructions ***
; Function Attrs: noinline nounwind ssp uwtable
define void @_Z21ambient_occlusion_vecP6_IsectR5vrandILm8EE(%struct._Isect* nocapture %isect, %class.vrand* nocapture readonly dereferenceable(32) %rng) #0 {
entry:
  br label %for.body

for.body:                                         ; preds = %for.inc.for.body_crit_edge, %entry
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.inc.for.body_crit_edge ]
  %occlusion.017 = phi float [ 1.000000e+00, %entry ], [ %phitmp, %for.inc.for.body_crit_edge ]
  %exitcond = icmp eq i64 %indvars.iv, 63
  br i1 %exitcond, label %for.end, label %for.inc.for.body_crit_edge

for.inc.for.body_crit_edge:                       ; preds = %for.body
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %phitmp = fadd float %occlusion.017, 1.000000e+00 <=============== NOT FAST
  br label %for.body

for.end:                                          ; preds = %for.body
  %t5 = getelementptr inbounds %struct._Isect* %isect, i64 0, i32 0
  store float %occlusion.017, float* %t5, align 4, !tbaa !1
  ret void
}

> On Oct 16, 2014, at 11:52 AM, Chandler Carruth <chandlerc at google.com> wrote:
>
> On Thu, Oct 16, 2014 at 11:38 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> He had posted one earlier here:
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141006/238660.html
> (Arnold had posted some analysis here: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141006/239144.html - which I imagine you saw)
>
> Doh, sorry. I saw Arnold's analysis but (wrongly) assumed that a complete working test case wasn't available which is why Arnold expected loop-rotate to fix this when it didn't. My bad.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
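
For readers wondering why a lost fast-math flag on the reduction matters: vectorizing a floating-point sum reorders the additions, and floating-point addition is not associative, so a reduction is presumably only safe to vectorize when reassociation is explicitly permitted. The following is a minimal C++ sketch of that effect, not code from aobench.cpp. The function names and the 4-lane layout are hypothetical, chosen only to mimic how a 4-wide vectorized reduction would accumulate.

```cpp
#include <cstddef>
#include <vector>

// Strict left-to-right reduction: the order a scalar loop
// without fast-math flags has to preserve.
float sum_in_order(const std::vector<float>& v) {
    float acc = 0.0f;
    for (float x : v) {
        acc += x;
    }
    return acc;
}

// Reassociated reduction: four independent partial sums, combined
// at the end. This mimics a hypothetical 4-wide vector accumulator.
float sum_in_lanes(const std::vector<float>& v) {
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (std::size_t i = 0; i < v.size(); ++i) {
        lane[i % 4] += v[i];  // element i goes to lane i mod 4
    }
    return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}
```

On a target that evaluates float arithmetic in IEEE-754 single precision, the input {1e8f, 1.0f, -1e8f, 1.0f} gives 1 for sum_in_order and 0 for sum_in_lanes, because 1e8f + 1.0f rounds back to 1e8f. The two orders are therefore not interchangeable unless the fadd is allowed to reassociate.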