邱 超凡 via llvm-dev
2019-Aug-06 05:20 UTC
[llvm-dev] [RFC] Improve iteration of estimating divisions
Hi there, I noticed that our current implementation of the fast division
transformation (turning `a / b` into `a * (1/b)`) is less precise than GCC's.
Take this case on ppc64le:
float fdiv(unsigned int a, unsigned int b) {
  return (float)a / (float)b;
}
The result with Clang -Ofast is 41A00001 (in hex), while GCC produces 41A00000,
which is the same as the result with optimizations disabled.
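For reference, the two hex values are the raw IEEE-754 single-precision bit patterns of the results. A minimal sketch for decoding and comparing them, assuming both patterns encode positive, finite floats (the helper names `float_from_bits` and `ulp_distance` are made up for this illustration):

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as an IEEE-754 single-precision float. */
float float_from_bits(uint32_t bits) {
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Distance in units in the last place (ULP) between two bit patterns.
   Assumption: both encode positive, finite floats, so the integer
   patterns are ordered the same way as the float values. */
uint32_t ulp_distance(uint32_t x, uint32_t y) {
    return x > y ? x - y : y - x;
}
```

`float_from_bits(0x41A00000u)` is exactly 20.0f, and `ulp_distance(0x41A00000u, 0x41A00001u)` is 1, so the -Ofast result is one ULP above the unoptimized one.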
Currently, DAGCombiner uses `BuildReciprocalEstimate` to calculate the
reciprocal (`1/b`) first and then multiplies it by `a`. But if we fold the
operand `a` into the refinement iterations of the estimate function, the result
is more accurate.
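The difference between the two schemes can be sketched in plain C. This is only an illustration and not the DAGCombiner code: `crude_recip` is a hypothetical stand-in for a hardware reciprocal-estimate instruction, `fma_f32` emulates a fused multiply-add through double precision, and a single refinement step is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hardware estimate instruction:
   compute 1/b, then clear the low 8 mantissa bits to lose precision. */
float crude_recip(float b) {
    float r = 1.0f / b;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFFFF00u;
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Emulated fused multiply-add: the product of two floats is exact in
   double, so x*y + z is rounded only when the sum is formed and when
   it is narrowed back to float. */
float fma_f32(float x, float y, float z) {
    return (float)((double)x * (double)y + (double)z);
}

/* Scheme 1: refine 1/b with one Newton-Raphson step, then multiply by a. */
float div_recip_then_mul(float a, float b) {
    float x = crude_recip(b);
    float e = fma_f32(-b, x, 1.0f); /* e = 1 - b*x */
    x = fma_f32(x, e, x);           /* x = x + x*e */
    return a * x;
}

/* Scheme 2: fold the numerator into the step and correct the quotient. */
float div_numerator_in_step(float a, float b) {
    float x = crude_recip(b);
    float q = a * x;                /* rough quotient */
    float r = fma_f32(-b, q, a);    /* residual a - b*q */
    return fma_f32(r, x, q);        /* q + r*x */
}
```

Both variants use one estimate plus three arithmetic operations. In the first, the refined reciprocal is rounded and then the product with `a` is rounded again, which is where a one-ULP error can come from. In the second, the last rounding is applied to the corrected quotient itself.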
Such a patch may break several existing test cases on different platforms,
since this is target-independent code. So any suggestions are welcome.
Thanks.
Regards,
Qiu Chaofan
Neil Nelson via llvm-dev
2019-Aug-06 16:54 UTC
[llvm-dev] [RFC] Improve iteration of estimating divisions
Qiu Chaofan,
Yes, clearly, two floating-point operations instead of one will increase the
rounding error already inherent in fixed-length number representations. The
reason for using two operations appears to be that there may be a machine
instruction for the reciprocal which, combined with a multiplication, takes
fewer machine cycles than a division. The trade-off is then precision vs.
speed. There may be additional computations along this line, and perhaps an
additional compile flag, along with code changes, would allow that choice.
Regards,
Neil Nelson
Finkel, Hal J. via llvm-dev
2019-Aug-06 20:04 UTC
[llvm-dev] [RFC] Improve iteration of estimating divisions
On 8/6/19 12:20 AM, 邱 超凡 via llvm-dev wrote:
> Patching such a change may break several existing test cases in different
> platforms since it’s target-independent code. So any suggestions are welcome.
Test cases can be changed if the result is universally better, and
alternatively, we can introduce a way for the target to control the
behavior (e.g., how we choose between buildSqrtNROneConst and
buildSqrtNRTwoConst). What's the effect on performance?
-Hal
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
Qiu Chaofan via llvm-dev
2019-Aug-08 16:47 UTC
[llvm-dev] Re: [RFC] Improve iteration of estimating divisions
Hal,
Yes, speed is an important factor in making the decision. Here I just fold the
numerator into the estimation, so it won't add any more instructions. With the
simple benchmark below, the running time is the same for the demo and for
current master:
```
/* Division under test. */
float fdiv(unsigned int a, unsigned int b) {
  return (float)a / (float)b;
}

float m;

/* Kept out of line so that a real call stays in the loop body. */
__attribute__((noinline)) void foo() {
  m = 0.0;
}

int main() {
  for (int i = 1; i < 1000000; ++i)
    for (int j = 1; j < 30000; ++j) {
      m = fdiv(i, j);
      foo();
    }
}
```
Regards,
Qiu Chaofan