thr3ads.net - llvm dev - [llvm-dev] [RFC] Improve iteration of estimating divisions [Aug 2019]


邱超凡 via llvm-dev

2019-Aug-06 05:20 UTC

[llvm-dev] [RFC] Improve iteration of estimating divisions

Hi there, I noticed that our current implementation of the fast division
transformation (turning `a / b` into `a * (1/b)`) is less precise than GCC's.
For example, on ppc64le:

        float fdiv(unsigned int a, unsigned int b) {
                return (float)a / (float)b;
        }

With -Ofast, Clang produces 41A00001 (hex), while GCC produces 41A00000, which
is the same result as with optimizations disabled.
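The two hex values are IEEE-754 binary32 bit patterns. As a quick check (a small sketch; `float_from_bits` and `ulp_distance` are illustrative helpers, not from the thread), 0x41A00000 decodes to exactly 20.0f, and 0x41A00001 is the next representable float, one ULP (2^-19) above it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as an IEEE-754 binary32 value. */
static float float_from_bits(uint32_t bits) {
  float f;
  memcpy(&f, &bits, sizeof f);
  return f;
}

/* For two positive, finite binary32 values, the distance in units in the
 * last place (ULPs) is simply the difference of their bit patterns. */
static uint32_t ulp_distance(uint32_t x, uint32_t y) {
  assert(x < 0x7F800000u && y < 0x7F800000u); /* positive and finite */
  return x > y ? x - y : y - x;
}
```

So `float_from_bits(0x41A00000u)` is 20.0f, and `ulp_distance(0x41A00001u, 0x41A00000u)` is 1: the -Ofast result is off by a single ULP.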

Currently, DAGCombiner uses `BuildReciprocalEstimate` to compute the reciprocal
(`1/b`) first and then multiplies it by `a`. If we instead fold the operand `a`
into the refinement iterations inside the estimate function, the result would be
more accurate.
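A minimal C sketch of the two schemes, under stated assumptions: the hardware reciprocal estimate is emulated by a hypothetical `recip_estimate()` that truncates `1.0f/b` to about 12 significant bits (real estimate instructions differ), and refinement uses the standard Newton-Raphson step x' = x + x(1 - b x). The "current" scheme refines 1/b and then multiplies by `a`; the "folded" scheme replaces the last step with q0 = a x, q = q0 + x (a - b q0). On an FMA target the residual would typically be a fused multiply-add; this portable sketch evaluates it in double instead. All function names are illustrative, not DAGCombiner APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hardware reciprocal estimate: the correctly
 * rounded 1/b with the low 12 mantissa bits cleared (relative error below
 * about 2^-11). Real estimate instructions behave differently. */
static float recip_estimate(float b) {
  float r = 1.0f / b;
  uint32_t bits;
  memcpy(&bits, &r, sizeof bits);
  bits &= 0xFFFFF000u;
  memcpy(&r, &bits, sizeof r);
  return r;
}

/* One Newton-Raphson step toward 1/b: x' = x + x * (1 - b * x). */
static float nr_step(float b, float x) {
  return x + x * (1.0f - b * x);
}

/* Current scheme: refine the estimate of 1/b, then multiply by a. */
static float div_recip_then_mul(float a, float b, int iters) {
  float x = recip_estimate(b);
  for (int i = 0; i < iters; ++i)
    x = nr_step(b, x);
  return a * x;
}

/* Proposed scheme: fold the numerator into the last iteration,
 * q0 = a * x and q = q0 + x * (a - b * q0). The residual is computed in
 * double (b * q0 is exact there), standing in for a fused multiply-add. */
static float div_fold_numerator(float a, float b, int iters) {
  assert(iters >= 1);
  float x = recip_estimate(b);
  for (int i = 0; i < iters - 1; ++i)
    x = nr_step(b, x);
  float q0 = a * x;
  double r = (double)a - (double)b * (double)q0;
  return (float)((double)q0 + (double)x * r);
}

/* Count pairs 1 <= a <= amax, 1 <= b <= bmax where the chosen scheme
 * differs from the correctly rounded float quotient. The reference is
 * computed in double, which rounds correctly for moderate integer ranges
 * such as the ones used here. */
static long count_not_correctly_rounded(int folded, int iters,
                                        uint32_t amax, uint32_t bmax) {
  long bad = 0;
  for (uint32_t b = 1; b <= bmax; ++b)
    for (uint32_t a = 1; a <= amax; ++a) {
      float fa = (float)a, fb = (float)b;
      float q = folded ? div_fold_numerator(fa, fb, iters)
                       : div_recip_then_mul(fa, fb, iters);
      float ref = (float)((double)a / (double)b);
      if (q != ref)
        ++bad;
    }
  return bad;
}
```

Calling `count_not_correctly_rounded(0, 2, 2000, 1000)` and `count_not_correctly_rounded(1, 2, 2000, 1000)` from a small `main` compares how often each scheme misses the correctly rounded result on a grid of integer inputs. Note that the folded final step uses the same number of multiplies and adds as "refine, then multiply by `a`".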

Because this is target-independent code, such a change may break several
existing test cases on different platforms, so any suggestions are welcome.
Thanks.

Regards,
Qiu Chaofan

Neil Nelson via llvm-dev

2019-Aug-06 16:54 UTC


[llvm-dev] [RFC] Improve iteration of estimating divisions

Qiu Chaofan,

Yes, clearly, two floating-point operations instead of one will compound the
rounding error already present in the fixed-length number representations that
are necessarily or commonly used.
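A short way to see this, assuming IEEE-754 round-to-nearest with unit roundoff $u$ ($u = 2^{-24}$ for binary32), no overflow or underflow, and an exactly rounded reciprocal: a single division incurs one rounding, while reciprocal-then-multiply incurs two.

```latex
\begin{aligned}
\mathrm{fl}(a/b) &= \tfrac{a}{b}\,(1+\delta), & |\delta| &\le u,\\
\mathrm{fl}\bigl(a\cdot\mathrm{fl}(1/b)\bigr) &= \tfrac{a}{b}\,(1+\delta_1)(1+\delta_2), & |\delta_1|,\,|\delta_2| &\le u,
\end{aligned}
```

so the relative error bound grows from $u$ to $2u + u^2$. A hardware reciprocal estimate starts from a larger error, which the refinement iterations then reduce.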

The reason for using two operations appears to be that some machines have
reciprocal instructions which, combined with a multiplication, take fewer cycles
than a division.

The trade-off, then, is precision versus speed. There may be additional
computations along this line, and perhaps an additional compile flag, along with
code changes, would allow that choice.

Regards, Neil Nelson

_______________________________________________
LLVM Developers mailing list
llvm-dev at lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Finkel, Hal J. via llvm-dev

2019-Aug-06 20:04 UTC


[llvm-dev] [RFC] Improve iteration of estimating divisions


Test cases can be changed if the result is universally better. Alternatively,
we can introduce a way for the target to control the behavior (e.g., as with
how we choose between buildSqrtNROneConst and buildSqrtNRTwoConst). What's the
effect on performance?

  -Hal

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Qiu Chaofan via llvm-dev

2019-Aug-08 16:47 UTC


[llvm-dev] Re: [RFC] Improve iteration of estimating divisions

Hal,

Yes, speed is an important factor in making the decision. Here I just fold the
numerator into the estimation, so no extra instructions are added. With the
simple benchmark below, the running time is the same for my prototype and for
current master:

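```
float fdiv(unsigned int a, unsigned int b) {
  return (float)a / (float)b;
}

float m;

// Opaque call that overwrites m; intended to keep the compiler from
// discarding the division results stored in the loop.
__attribute__((noinline)) void foo() {
  m = 0.0;
}

int main() {
  // Roughly 3 * 10^10 divisions in total.
  for (int i = 1; i < 1000000; ++i)
    for (int j = 1; j < 30000; ++j) {
      m = fdiv(i, j);
      foo();
    }
}
```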

Regards,
Qiu Chaofan

