Andrew V. Tischenko via llvm-dev
2017-Jan-27 16:30 UTC
[llvm-dev] RFC: Moving DAG heuristic-based transforms to MI passes
All llvm-devs, We're going to introduce the new possible implementation for such optimizations as reciprocal estimation instead of fdiv. In short it's a replacement of fdiv instruction (which is very expensive in most of CPUs) with alternative sequence of instructions which is usually cheaper but has appropriate precision (see genReciprocalDiv in lib/Target/X86/X86InstrInfo.cpp for details). There are other similar optimizations like usage of rsqrt, etc. but at the moment we're dealing with recip estimation only - see https://reviews.llvm.org/D26855 for details. The current version of optimization is done at DAG Combiner level when we don't know the exact target instructions which will be used by CodeGen. As result we don't know the real cost of the alternative sequence and can't compare that cost with the cost of the single fdiv. As result the decision to select an alternative sequence (made on compiler options only) could be wrong because modern CPUs introduce very cheap fdiv and we should use it directly. We suggest to move the implementation from DAG heuristics to MI-scheduler-based transformations (Machine Combiner). At that time we know exact target instructions and are able to use scheduler-based cost model. This knowledge allows as to select proper code sequence for final target code generation. A possible disadvantage of the new implementation is compile time increasing (as discussed in D26855), but we expect to make improvements in that area. For the initial change (reciprocal transform), any difference is limited to fast-math compilations. Any objections, suggestion, comments?
Hal Finkel via llvm-dev
2017-Jan-27 20:56 UTC
[llvm-dev] RFC: Moving DAG heuristic-based transforms to MI passes
On 01/27/2017 10:30 AM, Andrew V. Tischenko via llvm-dev wrote:> All llvm-devs, > > We're going to introduce the new possible implementation for such > optimizations as reciprocal estimation instead of fdiv. In short it's > a replacement of fdiv instruction (which is very expensive in most of > CPUs) with alternative sequence of instructions which is usually > cheaper but has appropriate precision (see genReciprocalDiv in > lib/Target/X86/X86InstrInfo.cpp for details). There are other similar > optimizations like usage of rsqrt, etc. but at the moment we're > dealing with recip estimation only - see > https://reviews.llvm.org/D26855 for details. > > The current version of optimization is done at DAG Combiner level when > we don't know the exact target instructions which will be used by > CodeGen. As result we don't know the real cost of the alternative > sequence and can't compare that cost with the cost of the single fdiv. > As result the decision to select an alternative sequence (made on > compiler options only) could be wrong because modern CPUs introduce > very cheap fdiv and we should use it directly. > > We suggest to move the implementation from DAG heuristics to > MI-scheduler-based transformations (Machine Combiner). At that time we > know exact target instructions and are able to use scheduler-based > cost model. This knowledge allows as to select proper code sequence > for final target code generation. > > A possible disadvantage of the new implementation is compile time > increasing (as discussed in D26855), but we expect to make > improvements in that area. For the initial change (reciprocal > transform), any difference is limited to fast-math compilations. > > Any objections, suggestion, comments? >Are you asking whether is okay to commit the change first and then look at the MachineCombiner's worst-case performance in followup? In general, I think that moving to using the MachineCombiner for these kinds of transformations, where there are complex tradeoffs between latency, throughput, etc., is the right direction. -Hal> > _______________________________________________ > LLVM Developers mailing list > llvm-dev at lists.llvm.org > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev-- Hal Finkel Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory
Andrew V. Tischenko via llvm-dev
2017-Jan-28 10:19 UTC
[llvm-dev] RFC: Moving DAG heuristic-based transforms to MI passes
In fact to commit the change before dealing with worst-case performance is a good idea because here we have 2 different issues. But the main idea of this RFC is an attempt to show the better approach to to these kinds of transformations and to suggest to use this approach in the future. At the same time, I'm trying to explain that this patch is not the performance one because the generated code is almost identical to what we have just now. It is a suggestion to change the strategy in such transformations elaborating. If the community accept this new strategy we're ready to introduce new similar transformations, automate the framework, etc. But of course it will be themes for new RFCs and discussions. On 1/27/2017 11:56 PM, Hal Finkel wrote:> On 01/27/2017 10:30 AM, Andrew V. Tischenko via llvm-dev wrote: > >> All llvm-devs, >> >> We're going to introduce the new possible implementation for such >> optimizations as reciprocal estimation instead of fdiv. In short it's >> a replacement of fdiv instruction (which is very expensive in most of >> CPUs) with alternative sequence of instructions which is usually >> cheaper but has appropriate precision (see genReciprocalDiv in >> lib/Target/X86/X86InstrInfo.cpp for details). There are other similar >> optimizations like usage of rsqrt, etc. but at the moment we're >> dealing with recip estimation only - see >> https://reviews.llvm.org/D26855 for details. >> >> The current version of optimization is done at DAG Combiner level >> when we don't know the exact target instructions which will be used >> by CodeGen. As result we don't know the real cost of the alternative >> sequence and can't compare that cost with the cost of the single >> fdiv. As result the decision to select an alternative sequence (made >> on compiler options only) could be wrong because modern CPUs >> introduce very cheap fdiv and we should use it directly. >> >> We suggest to move the implementation from DAG heuristics to >> MI-scheduler-based transformations (Machine Combiner). At that time >> we know exact target instructions and are able to use scheduler-based >> cost model. This knowledge allows as to select proper code sequence >> for final target code generation. >> >> A possible disadvantage of the new implementation is compile time >> increasing (as discussed in D26855), but we expect to make >> improvements in that area. For the initial change (reciprocal >> transform), any difference is limited to fast-math compilations. >> >> Any objections, suggestion, comments? >> > > Are you asking whether is okay to commit the change first and then > look at the MachineCombiner's worst-case performance in followup? In > general, I think that moving to using the MachineCombiner for these > kinds of transformations, where there are complex tradeoffs between > latency, throughput, etc., is the right direction. > > -Hal > >> >> _______________________________________________ >> LLVM Developers mailing list >> llvm-dev at lists.llvm.org >> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev >