On Thu, Aug 8, 2013 at 2:07 PM, Chad Rosier <chad.rosier at gmail.com> wrote:

> On Thu, Aug 8, 2013 at 1:56 PM, Mark Lacey <mark.lacey at apple.com> wrote:
>
>> On Aug 8, 2013, at 9:56 AM, Jim Grosbach <grosbach at apple.com> wrote:
>>
>>> Hi Chad,
>>>
>>> This is a great transform to do, but you’re right that it’s only safe
>>> under fast-math. This is particularly interesting when the original
>>> divisor is a constant so you can materialize the reciprocal at
>>> compile-time. You’re right that in either case, this optimization
>>> should only kick in when there is more than one divide instruction
>>> that will be changed to a mul.
>>
>> It can be worthwhile to do this even in the case where there is only a
>> single divide since 1/Y might be loop invariant, and could then be
>> hoisted out later by LICM. You just need to be able to fold it back
>> together when there is only a single use, and that use is not inside a
>> more deeply nested loop.
>
> Ben's patch does exactly this, so perhaps that is the right approach.

Just to be clear of what is being proposed (which I rather like):

1) Canonical form is to use the reciprocal when allowed (by the fast math
   flags, whichever we decide are appropriate).
2) The backend folds a single-use reciprocal into a direct divide.

Did I get it right? If so, I think this is a really nice way to capture
all of the potential benefits of forming reciprocals without pessimizing
code where it isn't helpful.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130808/a0f3c417/attachment.html>
Point #1 makes sense to me.

For point #2, wouldn't that be somewhat orthogonal to the discussion, as
it has/needs no knowledge that an IR-level transformation happened? Also,
reciprocal-multiply will be the preferred option for many (most) backends
if the IR says to do that. But, I suppose some backend might want to be
allowed to do the reverse transformation if allowed by fast-math flags in
IR, or fast-math mode in selection DAG.

On Aug 8, 2013, at 2:23 PM, Chandler Carruth <chandlerc at google.com> wrote:

> Just to be clear of what is being proposed (which I rather like):
>
> 1) Canonical form is to use the reciprocal when allowed (by the fast
>    math flags, whichever we decide are appropriate).
> 2) The backend folds a single-use reciprocal into a direct divide.
>
> Did I get it right? If so, I think this is a really nice way to capture
> all of the potential benefits of forming reciprocals without
> pessimizing code where it isn't helpful.
On Aug 8, 2013, at 2:35 PM, Michael Ilseman <milseman at apple.com> wrote:

> Point #1 makes sense to me.
>
> For point #2, wouldn't that be somewhat orthogonal to the discussion,
> as it has/needs no knowledge that an IR-level transformation happened?
> Also, reciprocal-multiply will be the preferred option for many (most)
> backends if the IR says to do that. But, I suppose some backend might
> want to be allowed to do the reverse transformation if allowed by
> fast-math flags in IR, or fast-math mode in selection DAG.

Oh, I forgot about optimize-for-size, which is one case where the reverse
transformation might be desired.
On Thu, Aug 8, 2013 at 5:23 PM, Chandler Carruth <chandlerc at google.com> wrote:

> Just to be clear of what is being proposed (which I rather like):
>
> 1) Canonical form is to use the reciprocal when allowed (by the fast math
>    flags, whichever we decide are appropriate).
> 2) The backend folds a single-use reciprocal into a direct divide.
>
> Did I get it right? If so, I think this is a really nice way to capture
> all of the potential benefits of forming reciprocals without pessimizing
> code where it isn't helpful.

I believe you're describing Ben's patch perfectly. A few transformations
are pessimized, however.

From test/Transforms/InstCombine/fast-math.ll:

1. Previously x/y + x/z was not transformed. Now it becomes x*(1/y+1/z).
  define float @fact_div1(float %x, float %y, float %z) {
    %t1 = fdiv fast float %x, %y
    %t2 = fdiv fast float %x, %z
    %t3 = fadd fast float %t1, %t2
    ret float %t3
  }

combines to:

  define float @fact_div1(float %x, float %y, float %z) {
    %reciprocal = fdiv fast float 1.000000e+00, %y
    %reciprocal1 = fdiv fast float 1.000000e+00, %z
    %1 = fadd fast float %reciprocal, %reciprocal1
    %2 = fmul fast float %1, %x
    ret float %2
  }

I don't believe the fixup in CodeGenPrepare will undo such a
transformation.

2. Similarly, x/y + z/x was not previously changed, but now we generate
   x*(1/y) + z*(1/x). I believe we can undo this transformation.

3. Previously we would transform y/x + z/x => (y+z)/x. Now y/x + z/x is
   transformed to y*(1/x)+z*(1/x). This might be an ordering problem or
   perhaps we could just transform y*(1/x)+z*(1/x) => (y+z)/x. The same
   holds true for y/x - z/x.

 Chad
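For readers wondering why the thread insists the reciprocal rewrite is only safe under fast-math, here is a minimal sketch in Python rather than LLVM code. The function names are hypothetical and exist only for illustration, and Python floats are IEEE-754 doubles whereas the IR above uses 32-bit float; the rounding argument should carry over, but treat this as an illustration under those assumptions, not as a description of what InstCombine does.

```python
# Illustrative sketch only; names are hypothetical, not LLVM APIs.
# Python floats are IEEE-754 doubles (the IR in the thread uses float).


def divide(x, y):
    """Direct division: one correctly rounded operation."""
    return x / y


def reciprocal_multiply(x, y):
    """Reciprocal form: 1/y is rounded first, then the product is rounded."""
    return x * (1.0 / y)


def count_mismatches(limit):
    """Count integer pairs (x, y) in [1, limit) where the two forms differ."""
    return sum(
        1
        for x in range(1, limit)
        for y in range(1, limit)
        if divide(float(x), float(y)) != reciprocal_multiply(float(x), float(y))
    )


if __name__ == "__main__":
    # The two results differ in the last bit for x = 5, y = 3.
    print(divide(5.0, 3.0), reciprocal_multiply(5.0, 3.0))
    # How many small integer pairs are affected.
    print(count_mismatches(50))
```

Because the two forms can disagree in the last bit, the rewrite changes observable results, which is why it has to be gated on a flag that permits such changes. When the divisor is a power of two the reciprocal is exact and both forms agree.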