Steve King via llvm-dev
2015-Aug-19 20:45 UTC
[llvm-dev] [RFC] Improving integer divide optimization (related to D12082)
Hello LLVM, A recent commit creates the isIntDivCheap() target query. http://reviews.llvm.org/D12082 The current approach has a couple shortcomings. First, when targets decide divide is cheap, the DAGCombiner ignores obvious power-of-2 optimizations. In the targets I know, shifts are cheaper than divides in both speed and size. The target cannot see the value in the isIntDivCheap() call, so cannot itself check for power-of-2 cases. Second, isIntDivCheap passes a MinSize flag to the target, which is true for -Oz (MinSize function attribute). However, the MinSize flags is false for -Os (OptimizeSize function attribute). Both of these try to optimize for size and targets may like to indicate cheap division in both cases, or perhaps for other function attributes in the future. I think some improvements would be helpful: The DAGCombiner should always optimize for power-of-2 division before considering the general integer case. isIntDivCheap() should pass the MachineFunction to the target rather than just a MinSize flag. If it cares, the target can check function attributes as it sees fit. Does this sound sensible? Thanks, -steve
escha via llvm-dev
2015-Aug-19 22:48 UTC
[llvm-dev] [RFC] Improving integer divide optimization (related to D12082)
> On Aug 19, 2015, at 1:45 PM, Steve King via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > In the targets I know, shifts are > cheaper than divides in both speed and size.From what I remember, udiv by power of 2 already gets turned into a shift in instcombine; the tricky case is sdiv by power of 2, which takes significantly more than one instruction. The generic implementation is this: // Splat the sign bit into the register SDValue SGN DAG.getNode(ISD::SRA, DL, VT, N0, DAG.getConstant(VT.getScalarSizeInBits() - 1, DL, getShiftAmountTy(N0.getValueType()))); AddToWorklist(SGN.getNode()); // Add (N0 < 0) ? abs2 - 1 : 0; SDValue SRL DAG.getNode(ISD::SRL, DL, VT, SGN, DAG.getConstant(VT.getScalarSizeInBits() - lg2, DL, getShiftAmountTy(SGN.getValueType()))); SDValue ADD = DAG.getNode(ISD::ADD, DL, VT, N0, SRL); AddToWorklist(SRL.getNode()); AddToWorklist(ADD.getNode()); // Divide by pow2 SDValue SRA = DAG.getNode(ISD::SRA, DL, VT, ADD, DAG.getConstant(lg2, DL, getShiftAmountTy(ADD.getValueType()))); And there’s already a target-specific hook to add a custom implementation (e.g. on PowerPC): // Target-specific implementation of sdiv x, pow2. if (SDValue Res = BuildSDIVPow2(N)) return Res; -— escha -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150819/8e447e83/attachment.html>
Steve King via llvm-dev
2015-Aug-20 00:00 UTC
[llvm-dev] [RFC] Improving integer divide optimization (related to D12082)
On Wed, Aug 19, 2015 at 3:48 PM, escha <escha at apple.com> wrote:> > On Aug 19, 2015, at 1:45 PM, Steve King via llvm-dev > <llvm-dev at lists.llvm.org> wrote: > > In the targets I know, shifts are > cheaper than divides in both speed and size. > > > From what I remember, udiv by power of 2 already gets turned into a shift in > instcombine; the tricky case is sdiv by power of 2, which takes > significantly more than one instruction.Thanks escha. I fooled myself and all the offending cases in my target were indeed SDIVs.
Mehdi Amini via llvm-dev
2015-Aug-20 05:58 UTC
[llvm-dev] [RFC] Improving integer divide optimization (related to D12082)
> On Aug 19, 2015, at 3:48 PM, escha via llvm-dev <llvm-dev at lists.llvm.org> wrote: > > >> On Aug 19, 2015, at 1:45 PM, Steve King via llvm-dev <llvm-dev at lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote: >> >> In the targets I know, shifts are >> cheaper than divides in both speed and size. > > From what I remember, udiv by power of 2 already gets turned into a shift in instcombine; the tricky case is sdiv by power of 2, which takes significantly more than one instruction. The generic implementation is this: > > // Splat the sign bit into the register > SDValue SGN > DAG.getNode(ISD::SRA, DL, VT, N0, > DAG.getConstant(VT.getScalarSizeInBits() - 1, DL, > getShiftAmountTy(N0.getValueType()))); > AddToWorklist(SGN.getNode()); > > // Add (N0 < 0) ? abs2 - 1 : 0; > SDValue SRL > DAG.getNode(ISD::SRL, DL, VT, SGN, > DAG.getConstant(VT.getScalarSizeInBits() - lg2, DL, > getShiftAmountTy(SGN.getValueType()))); > SDValue ADD = DAG.getNode(ISD::ADD, DL, VT, N0, SRL); > AddToWorklist(SRL.getNode()); > AddToWorklist(ADD.getNode()); // Divide by pow2 > SDValue SRA = DAG.getNode(ISD::SRA, DL, VT, ADD, > DAG.getConstant(lg2, DL, > getShiftAmountTy(ADD.getValueType()))); > > And there’s already a target-specific hook to add a custom implementation (e.g. on PowerPC): > > // Target-specific implementation of sdiv x, pow2. > if (SDValue Res = BuildSDIVPow2(N)) > return Res;Isn’t the problem the fact that the patch makes it harder for a target to get the generic code to reach its custom hook? Now the "cheap pow2 sdiv” is merged with the generic “cheap div” you can’t distinguish anymore. — Mehdi -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150819/aaca66f4/attachment.html>