Qiu Chaofan via llvm-dev
2019-Aug-08 16:47 UTC
[llvm-dev] Re: [RFC] Improve iteration of estimating divisions
Hal,

Yes, speed is an important factor in making the decision. Here I only fold the numerator into the estimate, so it adds no extra instructions. With the simple benchmark below, the demo patch and current master have the same running time:

```
float fdiv(unsigned int a, unsigned int b) {
  return (float)a / (float)b;
}

float m;

__attribute__((noinline)) void foo() { m = 0.0; }

int main() {
  for (int i = 1; i < 1000000; ++i)
    for (int j = 1; j < 30000; ++j) {
      m = fdiv(i, j);
      foo();
    }
}
```

Regards,
Qiu Chaofan

________________________________________
From: Finkel, Hal J. <hfinkel at anl.gov>
Sent: August 7, 2019 4:04
To: 邱 超凡; llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] [RFC] Improve iteration of estimating divisions

On 8/6/19 12:20 AM, 邱 超凡 via llvm-dev wrote:
> Hi there,
>
> I notice that our current implementation of the fast division transformation (turning `a / b` into `a * (1/b)`) is less precise than GCC's. For example, on ppc64le:
>
>   float fdiv(unsigned int a, unsigned int b) {
>     return (float)a / (float)b;
>   }
>
> The result with Clang -Ofast is 41A00001 (hex), while GCC produces 41A00000, which is the same as with optimizations disabled.
>
> Currently, DAGCombiner uses `BuildReciprocalEstimate` to calculate the reciprocal (`1/b`) first and then multiplies it by `a`. If we instead put the operand `a` into the iterations of the estimate function, the result would be more accurate.
>
> Such a change may break several existing test cases on different platforms, since it is target-independent code, so any suggestions are welcome. Thanks.

Test cases can be changed if the result is universally better. Alternatively, we can introduce a way for the target to control the behavior (e.g., how we choose between buildSqrtNROneConst and buildSqrtNRTwoConst). What's the effect on performance?
-Hal

> Regards,
> Qiu Chaofan
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://nam02.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.llvm.org%2Fcgi-bin%2Fmailman%2Flistinfo%2Fllvm-dev&data=02%7C01%7C%7Cdbff2450e5bb4b63e5f108d71aa94e7f%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637007186791161795&sdata=LWVNeuqNP0FRnckeZQk03JwJcuBJgsKZh%2Fb%2BddLrhhU%3D&reserved=0

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
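For reference, 0x41A00000 is the float 20.0f and 0x41A00001 is the next float above it, so the -Ofast result in the RFC is one ulp high. The two strategies discussed in the thread can be sketched in C as below. This is a minimal illustration under stated assumptions, not the DAGCombiner code or the attached patch: `recip_estimate` is a hypothetical stand-in for a hardware estimate such as PowerPC's `fres`, modeled as 1/b with a fixed 2^-12 relative error; `fma_f` is a hypothetical helper that emulates a fused multiply-add by evaluating in double; and reading "put `a` into the iterations" as a final quotient-correction step (q = a*x, r = a - b*q, q + r*x) is an assumption. `div_recip_then_mul` refines the reciprocal and then multiplies by `a`; `div_numerator_in_last_step` spends its last Newton-Raphson step on the quotient instead. Both perform the same number of multiplies and FMAs.

```c
/* Hypothetical stand-in for a hardware reciprocal estimate (e.g. fres):
   the reciprocal perturbed by a 2^-12 relative error, so the refinement
   steps have something to correct. */
static float recip_estimate(float b) {
    return (1.0f / b) * (1.0f + 0x1p-12f);
}

/* Emulated fused multiply-add x*y + z. The product of two floats is exact
   in double, so this matches a true single-precision FMA except in rare
   double-rounding cases. Avoids depending on libm's fmaf. */
static float fma_f(float x, float y, float z) {
    return (float)((double)x * (double)y + (double)z);
}

/* Current scheme: refine x ~= 1/b with `steps` Newton-Raphson iterations,
   then multiply the refined reciprocal by the numerator. */
static float div_recip_then_mul(float a, float b, int steps) {
    float x = recip_estimate(b);
    for (int i = 0; i < steps; ++i) {
        float e = fma_f(-b, x, 1.0f);   /* e = 1 - b*x */
        x = fma_f(x, e, x);             /* x = x + x*e */
    }
    return a * x;
}

/* Numerator folded into the last step: refine the reciprocal steps - 1
   times, then use the final step to correct the quotient q = a*x with the
   residual a - b*q. Same operation count as div_recip_then_mul.
   Requires steps >= 1. */
static float div_numerator_in_last_step(float a, float b, int steps) {
    float x = recip_estimate(b);
    for (int i = 0; i < steps - 1; ++i) {
        float e = fma_f(-b, x, 1.0f);   /* e = 1 - b*x */
        x = fma_f(x, e, x);             /* x = x + x*e */
    }
    float q = a * x;                    /* initial quotient      */
    float r = fma_f(-b, q, a);          /* residual r = a - b*q  */
    return fma_f(r, x, q);              /* refined q = q + r*x   */
}
```

Under these assumptions, `div_numerator_in_last_step(20.0f * b, b, 2)` returns exactly 20.0f for every integer b from 1 to 1000: the residual a - b*q is computed exactly, so the final step cancels the rounding error in q. The refine-then-multiply form stays close to 20.0f as well, but whether it lands exactly on it depends on how the rounding of x and of a*x falls for each b.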
Finkel, Hal J. via llvm-dev
2019-Aug-08 16:58 UTC
[llvm-dev] Re: [RFC] Improve iteration of estimating divisions
I think that it's certainly worth posting a patch and then we can evaluate it.

Thanks again,
Hal

Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
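Hal's first reply points to `buildSqrtNROneConst` and `buildSqrtNRTwoConst` as precedent for letting the target choose a refinement strategy. In DAGCombiner these refine a reciprocal-square-root estimate (x ≈ 1/sqrt(a)) with two algebraically equivalent Newton-Raphson updates. The C sketch below writes out only the two update formulas; the function names are hypothetical and this is not the LLVM implementation. The one-constant form precomputes a/2 and computes X' = X(1.5 - (a/2)X^2); the two-constant form computes X' = (-0.5X)(aX^2 - 3).

```c
/* Hypothetical helper: one-constant Newton-Raphson form for x ~= 1/sqrt(a),
   X' = X * (1.5 - (a/2) * X * X), with a/2 computed once up front. */
static float rsqrt_nr_one_const(float a, float x, int steps) {
    const float half_a = 0.5f * a;
    for (int i = 0; i < steps; ++i)
        x = x * (1.5f - half_a * x * x);
    return x;
}

/* Hypothetical helper: two-constant form of the same update,
   X' = (-0.5 * X) * (a * X * X - 3.0). */
static float rsqrt_nr_two_const(float a, float x, int steps) {
    for (int i = 0; i < steps; ++i)
        x = (-0.5f * x) * (a * x * x - 3.0f);
    return x;
}
```

For example, starting from a rough estimate of 0.49 for a = 4, three steps of either form bring x to within 1e-5 of 0.5. Because the forms differ only in operation order and in which constants must be materialized, a target can pick whichever suits its hardware (in LLVM this appears to be the `UseOneConstNR` flag a target sets in `TargetLowering::getSqrtEstimate`); Hal suggests a similar hook could choose between the two division schemes.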
Qiu Chaofan via llvm-dev
2019-Aug-09 09:34 UTC
[llvm-dev] [RFC] Improve iteration of estimating divisions
Hal,

Here is the patch. Thanks.

Regards,
Qiu Chaofan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: recip-new.patch
Type: application/octet-stream
Size: 3305 bytes
Desc: recip-new.patch
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20190809/d4b73d65/attachment.obj>