Hal Finkel
2014-Oct-16 18:38 UTC
[LLVMdev] RFC: Should we have (something like) -extra-vectorizer-passes in -O2?
----- Original Message -----
> From: "Chandler Carruth" <chandlerc at google.com>
> To: "Zinovy Nis" <zinovy.nis at gmail.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "James Molloy" <james at jamesmolloy.co.uk>, "LLVM Developers Mailing List" <llvmdev at cs.uiuc.edu>
> Sent: Thursday, October 16, 2014 1:21:19 PM
> Subject: Re: [LLVMdev] RFC: Should we have (something like) -extra-vectorizer-passes in -O2?
>
> On Thu, Oct 16, 2014 at 7:55 AM, Zinovy Nis <zinovy.nis at gmail.com> wrote:
>
> > Seems that adding -extra-vectorizer-passes doesn't help to vectorizer
> > in my case. LoopRotation re-run does nothing.
>
> Can you reduce this to a test case you can share?

He had posted one earlier here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141006/238660.html
(Arnold had posted some analysis here:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141006/239144.html
- which I imagine you saw)

 -Hal

--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
Chandler Carruth
2014-Oct-16 18:52 UTC
[LLVMdev] RFC: Should we have (something like) -extra-vectorizer-passes in -O2?
On Thu, Oct 16, 2014 at 11:38 AM, Hal Finkel <hfinkel at anl.gov> wrote:

> He had posted one earlier here:
>
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141006/238660.html
> (Arnold had posted some analysis here:
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141006/239144.html
> - which I imagine you saw)
>

Doh, sorry. I saw Arnold's analysis but (wrongly) assumed that a complete working test case wasn't available, which is why Arnold expected loop-rotate to fix this when it didn't. My bad.
Arnold Schwaighofer
2014-Oct-16 22:04 UTC
[LLVMdev] RFC: Should we have (something like) -extra-vectorizer-passes in -O2?
I took another quick look. InstCombine is removing the fast-math flag from the reduction, so the vectorizer does not touch it.
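Why the missing flag matters can be sketched in a few lines. Vectorizing a sum reduction with, say, four lanes means keeping four partial sums and adding them together at the end, which reassociates the floating-point additions. Floating-point addition is not associative, so the result can differ from the scalar loop; the vectorizer therefore only reorders an fadd reduction when it is allowed to reassociate it, which is what the fast flag grants. The sketch below is a hypothetical illustration, not LLVM code: it uses Python doubles rather than floats, and the helper names scalar_sum and interleaved_sum, as well as the 4-lane interleaving (modeling a vectorization factor of 4), are assumptions.

```python
def scalar_sum(xs):
    """Sum strictly left to right, as the original scalar loop does."""
    acc = 0.0
    for x in xs:
        acc = acc + x
    return acc


def interleaved_sum(xs, lanes=4):
    """Sum the way a vectorized reduction would (hypothetical model):
    lane i accumulates xs[i], xs[i + lanes], ..., then the partial sums
    are added together. Assumes len(xs) is a multiple of `lanes`."""
    assert len(xs) % lanes == 0
    partial = [0.0] * lanes
    for i in range(0, len(xs), lanes):
        for lane in range(lanes):
            partial[lane] = partial[lane] + xs[i + lane]
    total = 0.0
    for p in partial:
        total = total + p
    return total


data = [1e16, 1.0, 1.0, 1.0, -1e16, 0.0, 0.0, 0.0]
print(scalar_sum(data))       # 0.0: each 1.0 is rounded away next to 1e16
print(interleaved_sum(data))  # 3.0: the 1.0s sit in lanes that never see 1e16
```

With small, exactly representable values both orders agree; the difference only appears once rounding kicks in, and the fast flag is what tells the vectorizer it may ignore that difference.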
bin/clang -O3 -ffast-math -mllvm -extra-vectorizer-passes aobench.cpp -emit-llvm \
  -S -o aobench.2.ll -mllvm -debug-only=loop-vectorize -mllvm -print-before-all \
  -mllvm -print-after-all 2> aobench.2.debug.ll
*** IR Dump Before Combine redundant instructions ***
; Function Attrs: noinline nounwind ssp uwtable
define void @_Z21ambient_occlusion_vecP6_IsectR5vrandILm8EE(%struct._Isect* nocapture %isect, %class.vrand* nocapture readonly dereferenceable(32) %rng) #0 {
entry:
  ...
  br label %for.body

for.body:                                         ; preds = %for.inc.for.body_crit_edge, %entry
  %8 = phi float [ %conv.i, %entry ], [ %.pre, %for.inc.for.body_crit_edge ]
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.inc.for.body_crit_edge ]
  %occlusion.017 = phi float [ 0.000000e+00, %entry ], [ %conv4, %for.inc.for.body_crit_edge ]
  %arrayidx = getelementptr inbounds [8 x float]* %rand1, i64 0, i64 %indvars.iv
  %9 = load float* %arrayidx, align 4, !tbaa !5
  %mul3 = fmul fast float %8, %9
  %mul.i.i = fmul fast float %mul3, %mul3
  %cmp.i = fcmp olt float %mul.i.i, 0x3C670EF540000000
  %conv4 = fadd fast float %occlusion.017, 1.000000e+00 <============= FAST
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp eq i64 %indvars.iv, 63
  br i1 %exitcond, label %for.end, label %for.inc.for.body_crit_edge

for.inc.for.body_crit_edge:                       ; preds = %for.body
  %arrayidx2.phi.trans.insert = getelementptr inbounds [8 x float]* %rand2, i64 0, i64 %indvars.iv.next
  %.pre = load float* %arrayidx2.phi.trans.insert, align 4, !tbaa !5
  br label %for.body

for.end:                                          ; preds = %for.body
  %t5 = getelementptr inbounds %struct._Isect* %isect, i64 0, i32 0
  store float %conv4, float* %t5, align 4, !tbaa !7
  ret void
}
*** IR Dump After Combine redundant instructions ***
; Function Attrs: noinline nounwind ssp uwtable
define void @_Z21ambient_occlusion_vecP6_IsectR5vrandILm8EE(%struct._Isect* nocapture %isect, %class.vrand* nocapture readonly dereferenceable(32) %rng) #0 {
entry:
  br label %for.body

for.body:                                         ; preds = %for.inc.for.body_crit_edge, %entry
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.inc.for.body_crit_edge ]
  %occlusion.017 = phi float [ 1.000000e+00, %entry ], [ %phitmp, %for.inc.for.body_crit_edge ]
  %exitcond = icmp eq i64 %indvars.iv, 63
  br i1 %exitcond, label %for.end, label %for.inc.for.body_crit_edge

for.inc.for.body_crit_edge:                       ; preds = %for.body
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %phitmp = fadd float %occlusion.017, 1.000000e+00 <=============== NOT FAST
  br label %for.body

for.end:                                          ; preds = %for.body
  %t5 = getelementptr inbounds %struct._Isect* %isect, i64 0, i32 0
  store float %occlusion.017, float* %t5, align 4, !tbaa !1
  ret void
}
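Spotting this kind of flag loss by eye in a large -print-before-all/-print-after-all log (aobench.2.debug.ll in the command above) is tedious. One option is to split the log at the "*** IR Dump" headers and list the fadd instructions whose flags lack fast. The following is a minimal sketch, assuming the header format shown in the dumps above. The helper names split_ir_dumps and fadds_without_fast are hypothetical, and the check is a textual heuristic rather than an IR parser.

```python
import re

# Header lines look like "*** IR Dump After Combine redundant instructions ***".
HEADER = re.compile(r"^\*\*\* IR Dump (Before|After) (.+?) \*\*\*$")
# "%name = fadd <flags...> <type> <operands>"; group 1 is the name, group 2 the rest.
FADD = re.compile(r"^\s*(%[\w.]+) = fadd (.*)$")
# Fast-math flag keywords that may precede the type (older and newer spellings).
FMF_FLAGS = {"fast", "nnan", "ninf", "nsz", "arcp", "contract", "afn", "reassoc"}


def split_ir_dumps(log_text):
    """Split a print-before/after-all log into (when, pass_name, body) tuples,
    one per '*** IR Dump' header. Text before the first header
    (e.g. -debug-only output) is skipped."""
    dumps = []
    for line in log_text.splitlines():
        m = HEADER.match(line.strip())
        if m:
            dumps.append((m.group(1), m.group(2), []))
        elif dumps:
            dumps[-1][2].append(line)
    return [(when, name, "\n".join(body)) for when, name, body in dumps]


def fadds_without_fast(ir_text):
    """Return the names of fadd instructions whose flags do not include 'fast'."""
    missing = []
    for line in ir_text.splitlines():
        m = FADD.match(line)
        if not m:
            continue
        flags = []
        for tok in m.group(2).split():
            if tok not in FMF_FLAGS:
                break  # first non-flag token is the type, e.g. "float"
            flags.append(tok)
        if "fast" not in flags:
            missing.append(m.group(1))
    return missing


# Two instructions taken from the dumps above.
sample_log = """*** IR Dump Before Combine redundant instructions ***
  %conv4 = fadd fast float %occlusion.017, 1.000000e+00
*** IR Dump After Combine redundant instructions ***
  %phitmp = fadd float %occlusion.017, 1.000000e+00
"""

for when, name, body in split_ir_dumps(sample_log):
    print(when, name, fadds_without_fast(body))
# Before Combine redundant instructions []
# After Combine redundant instructions ['%phitmp']
```

On a real log you would compare each Before/After pair for the same pass. The first After dump that reports a reduction fadd without fast, where the matching Before dump had it, points at the pass that dropped the flag. Here that pass is InstCombine ("Combine redundant instructions").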
_______________________________________________
LLVM Developers mailing list
LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev