On Aug 8, 2013, at 9:56 AM, Jim Grosbach <grosbach at apple.com> wrote:

> Hi Chad,
>
> This is a great transform to do, but you’re right that it’s only safe
> under fast-math. This is particularly interesting when the original
> divisor is a constant so you can materialize the reciprocal at
> compile-time. You’re right that in either case, this optimization should
> only kick in when there is more than one divide instruction that will be
> changed to a mul.

It can be worthwhile to do this even in the case where there is only a
single divide, since 1/Y might be loop invariant and could then be hoisted
out later by LICM. You just need to be able to fold it back together when
there is only a single use, and that use is not inside a more deeply nested
loop.

> I don’t have a strong preference for instcombine vs. dagcombine, though I
> lean slightly towards later, when we’ll have more target information
> available if we want to apply a more complicated cost function for some
> targets.
>
> -Jim
>
> On Aug 8, 2013, at 9:25 AM, Chad Rosier <chad.rosier at gmail.com> wrote:
>
>> I would like to transform X/Y -> X*(1/Y). Specifically, I would like to
>> convert:
>>
>> define void @t1a(double %a, double %b, double %d) {
>> entry:
>>   %div = fdiv fast double %a, %d
>>   %div1 = fdiv fast double %b, %d
>>   %call = tail call i32 @foo(double %div, double %div1)
>>   ret void
>> }
>>
>> to:
>>
>> define void @t1b(double %a, double %b, double %d) {
>> entry:
>>   %div = fdiv fast double 1.000000e+00, %d
>>   %mul = fmul fast double %div, %a
>>   %mul1 = fmul fast double %div, %b
>>   %call = tail call i32 @foo(double %mul, double %mul1)
>>   ret void
>> }
>>
>> Is such a transformation best done as a (target-specific) DAG combine?
>>
>> A similar instcombine already exists for the X/C -> X*(1/C) case (see the
>> CvtFDivConstToReciprocal function in InstCombineMulDivRem.cpp), but I
>> don't believe the above can be done as an instcombine as it creates a new
>> instruction (in addition to replacing the original). Also, I only want to
>> perform the transformation if there are multiple uses of 1/Y (like in my
>> test case). Otherwise, the transformation replaces a fdiv with a
>> fdiv+fmul pair, which I doubt would be profitable.
>>
>> FWIW, I'm also pretty sure this combine requires -fast-math.
>>
>> Can someone point me in the right direction?
>>
>> Thanks,
>> Chad
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
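As a small illustration of why the rewrite is only safe under fast-math: X/Y is rounded once, while X*(1/Y) is rounded twice (once for the reciprocal, once for the product), so the two can differ in the last bit. The C++ sketch below is not code from the thread; the function names are hypothetical, and it assumes IEEE 754 double arithmetic compiled without fast-math.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Direct division: a single correctly rounded operation.
double divideDirect(double x, double y) { return x / y; }

// The rewritten form: reciprocal first, then a multiply (two roundings).
double divideViaReciprocal(double x, double y) {
  double r = 1.0 / y;  // the value that could be shared or hoisted
  return x * r;
}

// Count the dividends for which the two forms give different results.
std::size_t countMismatches(const std::vector<double> &xs, double y) {
  assert(y != 0.0 && "sketch assumes a nonzero divisor");
  std::size_t n = 0;
  for (double x : xs)
    if (divideDirect(x, y) != divideViaReciprocal(x, y))
      ++n;
  return n;
}
```

For example, 49.0 / 49.0 is exactly 1.0, but 49.0 * (1.0 / 49.0) is not, which is the kind of difference the fast-math flags give the optimizer permission to ignore.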
On Thu, Aug 8, 2013 at 1:56 PM, Mark Lacey <mark.lacey at apple.com> wrote:

> On Aug 8, 2013, at 9:56 AM, Jim Grosbach <grosbach at apple.com> wrote:
>
>> This is a great transform to do, but you’re right that it’s only safe
>> under fast-math. This is particularly interesting when the original
>> divisor is a constant so you can materialize the reciprocal at
>> compile-time. You’re right that in either case, this optimization should
>> only kick in when there is more than one divide instruction that will be
>> changed to a mul.
>
> It can be worthwhile to do this even in the case where there is only a
> single divide since 1/Y might be loop invariant, and could then be hoisted
> out later by LICM. You just need to be able to fold it back together when
> there is only a single use, and that use is not inside a more deeply nested
> loop.

Ben's patch does exactly this, so perhaps that is the right approach.
On Thu, Aug 8, 2013 at 2:07 PM, Chad Rosier <chad.rosier at gmail.com> wrote:

> On Thu, Aug 8, 2013 at 1:56 PM, Mark Lacey <mark.lacey at apple.com> wrote:
>
>> It can be worthwhile to do this even in the case where there is only a
>> single divide since 1/Y might be loop invariant, and could then be hoisted
>> out later by LICM. You just need to be able to fold it back together when
>> there is only a single use, and that use is not inside a more deeply nested
>> loop.
>
> Ben's patch does exactly this, so perhaps that is the right approach.

Just to be clear about what is being proposed (which I rather like):

1) Canonical form is to use the reciprocal when allowed (by the fast math
   flags, whichever we decide are appropriate).
2) The backend folds a single-use reciprocal into a direct divide.

Did I get it right? If so, I think this is a really nice way to capture all
of the potential benefits of forming reciprocals without pessimizing code
where it isn't helpful.
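Step 2 of the scheme above (folding a single-use reciprocal back into a direct divide) can be sketched on a toy instruction list. This is only an illustration under stated assumptions: the `Op`, `Inst`, and `Block` types and all function names are hypothetical and are not LLVM API, operands are plain indices into the block, and the loop-nesting check mentioned earlier in the thread is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy IR for illustration only; none of this is LLVM API.
enum class Op { Arg, One, FDiv, FMul };

struct Inst {
  Op op;
  int lhs;  // index of the first operand in the block, or -1 if none
  int rhs;  // index of the second operand in the block, or -1 if none
};

using Block = std::vector<Inst>;

// Count how many operand slots in the block refer to value `v`.
int countUses(const Block &b, int v) {
  int n = 0;
  for (const Inst &i : b) {
    if (i.lhs == v) ++n;
    if (i.rhs == v) ++n;
  }
  return n;
}

// True if value `v` has the canonical reciprocal shape 1.0 / Y.
bool isReciprocal(const Block &b, int v) {
  assert(v >= 0 && static_cast<std::size_t>(v) < b.size());
  return b[v].op == Op::FDiv && b[b[v].lhs].op == Op::One;
}

// Rewrite X * (1/Y) back to X / Y when the reciprocal has exactly one use.
// The now-unused reciprocal is left in place for a later dead-code pass.
// Returns the number of multiplies that were folded.
int foldSingleUseReciprocals(Block &b) {
  int folded = 0;
  for (Inst &inst : b) {
    if (inst.op != Op::FMul) continue;
    int recip = -1, other = -1;
    if (isReciprocal(b, inst.rhs)) {
      recip = inst.rhs;
      other = inst.lhs;
    } else if (isReciprocal(b, inst.lhs)) {
      recip = inst.lhs;
      other = inst.rhs;
    }
    if (recip < 0 || countUses(b, recip) != 1) continue;
    inst = Inst{Op::FDiv, other, b[recip].rhs};  // X / Y
    ++folded;
  }
  return folded;
}
```

With a block modelling @t1b, where 1/%d feeds two multiplies, nothing is folded and the shared reciprocal is kept; with a single multiply, the pair collapses back to one divide, so the canonical form does not pessimize that case.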