邱 超凡 via llvm-dev
2019-Aug-06 05:20 UTC
[llvm-dev] [RFC] Improve iteration of estimating divisions
Hi there,

I notice that our current implementation of the fast division transformation (turning `a / b` into `a * (1/b)`) is less precise than GCC's. Take this case on ppc64le:

    float fdiv(unsigned int a, unsigned int b) {
      return (float)a / (float)b;
    }

With Clang -Ofast the result is 41A00001 (in hex), while GCC produces 41A00000, which is the same as the result without optimizations.

Currently, DAGCombiner uses `BuildReciprocalEstimate` to calculate the reciprocal (`1/b`) first and then multiplies it by `a`. But if we fold the operand `a` into the iterations of the estimate function, the result would be better.

Such a patch may break several existing test cases on different platforms, since this is target-independent code. So any suggestions are welcome. Thanks.

Regards,
Qiu Chaofan
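One way to read "put the operand `a` into iterations" is sketched below in C. This is only an illustration under stated assumptions, not the DAGCombiner code: `crude_recip` is a hypothetical stand-in for a target's reciprocal-estimate instruction, the residuals are computed in `double` to stand in for a fused multiply-add, and all function names are made up for this example. The first variant refines `1/b` fully and multiplies by `a` at the end; the second spends its last step on the quotient itself, correcting `q = a * x` by the residual `a - b * q`. The sketch does not try to reproduce the 41A00001 result reported above, since the inputs behind it are not given.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a hardware reciprocal estimate: take 1/b and
   clear the low 15 mantissa bits, leaving roughly 8 good bits.
   Real estimate instructions differ per target. Assumes b is finite, nonzero. */
static float crude_recip(float b) {
    float x = 1.0f / b;
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= 0xFFFF8000u;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* One Newton-Raphson step for the reciprocal: x' = x + x * (1 - b * x).
   The residual is formed in double as a stand-in for a fused multiply-add. */
static float refine_recip(float b, float x) {
    float e = (float)(1.0 - (double)b * (double)x);
    return x + x * e;
}

/* Variant 1: refine 1/b twice, then multiply by a. */
static float div_recip_then_mul(float a, float b) {
    float x = crude_recip(b);
    x = refine_recip(b, x);
    x = refine_recip(b, x);
    return a * x;
}

/* Variant 2: refine 1/b once, then fold a into the last step by
   correcting the quotient guess q with the residual a - b * q. */
static float div_folded(float a, float b) {
    float x = crude_recip(b);
    x = refine_recip(b, x);
    float q = a * x;
    float r = (float)((double)a - (double)b * (double)q);
    return q + r * x;
}

/* Small self-check; call it from main() when trying the sketch. */
void check_folded_division(void) {
    float q_mul = div_recip_then_mul(100.0f, 5.0f);
    float q_fold = div_folded(100.0f, 5.0f);
    assert(q_fold == 20.0f);
    assert(q_mul > 19.9999f && q_mul < 20.0001f);
}
```

In this sketch both variants use the same number of multiply and multiply-add steps. The folded form ends with a correction applied directly to the quotient, so the final rounding happens on `a / b` rather than on `1 / b` followed by a second rounding in the multiply.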
Neil Nelson via llvm-dev
2019-Aug-06 16:54 UTC
[llvm-dev] [RFC] Improve iteration of estimating divisions
Qiu Chaofan,

Yes, clearly, two floating-point operations instead of one will increase the rounding error that is already inherent in fixed-length number representations. The reason for using two operations appears to be that some machines have a reciprocal instruction that, combined with a multiplication, takes fewer cycles than a division. The trade-off is then precision vs. speed.

There may be additional computations along this line, and perhaps an additional compile flag, along with code changes, would allow that choice.

Regards,
Neil Nelson
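The point about two operations can be seen even with a correctly rounded reciprocal. The C sketch below is an illustration only, assuming IEEE 754 binary32 with round-to-nearest and no fast-math flags; the function names are made up for this example, and `volatile` is used just to force each intermediate to be rounded to `float`. With `a = b = 41`, the single division gives exactly 1.0, while `a * (1/b)` rounds twice and lands one unit in the last place below 1.0.

```c
#include <assert.h>

/* a / b with a single rounding. */
static float div_one_rounding(float a, float b) {
    volatile float q = a / b;
    return q;
}

/* a * (1/b): the reciprocal is rounded, then the product is rounded again. */
static float div_two_roundings(float a, float b) {
    volatile float recip = 1.0f / b;
    volatile float prod = a * recip;
    return prod;
}

/* Small self-check; call it from main() when trying the sketch. */
void check_two_roundings(void) {
    /* 1/41 rounds down to 13094412 * 2^-29; times 41 that is 1 - 20 * 2^-29,
       which rounds to 1 - 2^-24, the float just below 1.0 (0x3F7FFFFF). */
    assert(div_one_rounding(41.0f, 41.0f) == 1.0f);
    assert(div_two_roundings(41.0f, 41.0f) < 1.0f);
}
```

A hardware reciprocal estimate starts from far fewer correct bits than `1.0f / b` does, so the refinement steps have to recover that precision before this second rounding even comes into play.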
Finkel, Hal J. via llvm-dev
2019-Aug-06 20:04 UTC
[llvm-dev] [RFC] Improve iteration of estimating divisions
On 8/6/19 12:20 AM, 邱 超凡 via llvm-dev wrote:
> Patching such a change may break several existing test cases in different platforms since it’s target-independent code. So any suggestions are welcome. Thanks.

Test cases can be changed if the result is universally better, and alternatively, we can introduce a way for the target to control the behavior (e.g., how we choose between buildSqrtNROneConst and buildSqrtNRTwoConst).

What's the effect on performance?

-Hal

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
Qiu Chaofan via llvm-dev
2019-Aug-08 16:47 UTC
[llvm-dev] Re: [RFC] Improve iteration of estimating divisions
Hal,

Yes, speed is an important factor in making the decision. Here I just fold the numerator into the estimation, so it won't add any more instructions. With the simple benchmark below, the running time is the same for the demo and current master:

```
// Division under test.
float fdiv(unsigned int a, unsigned int b) {
  return (float)a / (float)b;
}

// Global that receives each result.
float m;

// Non-inlined call that resets the global after every division.
__attribute__((noinline)) void foo() { m = 0.0; }

int main() {
  for (int i = 1; i < 1000000; ++i)
    for (int j = 1; j < 30000; ++j) {
      m = fdiv(i, j);
      foo();
    }
}
```

Regards,
Qiu Chaofan