thr3ads.net - llvm dev - [llvm-dev] [RFC] Improving integer divide optimization (related to D12082) [Aug 2015]

If this information is useful, please help other people find it:
Share via:

Steve King via llvm-dev

2015-Aug-19 20:45 UTC

[llvm-dev] [RFC] Improving integer divide optimization (related to D12082)

Hello LLVM,  A recent commit creates the isIntDivCheap() target query.

http://reviews.llvm.org/D12082

The current approach has a couple shortcomings.

First, when targets decide divide is cheap, the DAGCombiner ignores
obvious power-of-2 optimizations.  In the targets I know, shifts are
cheaper than divides in both speed and size.  The target cannot see
the value in the isIntDivCheap() call, so cannot itself check for
power-of-2 cases.

Second, isIntDivCheap passes a MinSize flag to the target, which is
true for -Oz (MinSize function attribute).  However, the MinSize flags
is false for -Os (OptimizeSize function attribute).  Both of these try
to optimize for size and targets may like to indicate cheap division
in both cases, or perhaps for other function attributes in the future.

I think some improvements would be helpful:
The DAGCombiner should always optimize for power-of-2 division before
considering the general integer case.
isIntDivCheap() should pass the MachineFunction to the target rather
than just a MinSize flag.  If it cares, the target can check function
attributes as it sees fit.

Does this sound sensible?

Thanks,
-steve

escha via llvm-dev

2015-Aug-19 22:48 UTC

head link

[llvm-dev] [RFC] Improving integer divide optimization (related to D12082)

> On Aug 19, 2015, at 1:45 PM, Steve King via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> In the targets I know, shifts are
> cheaper than divides in both speed and size.
From what I remember, udiv by power of 2 already gets turned into a shift in
instcombine; the tricky case is sdiv by power of 2, which takes significantly
more than one instruction. The generic implementation is this:

    // Splat the sign bit into the register
    SDValue SGN         DAG.getNode(ISD::SRA, DL, VT, N0,
                    DAG.getConstant(VT.getScalarSizeInBits() - 1, DL,
                                    getShiftAmountTy(N0.getValueType())));
    AddToWorklist(SGN.getNode());

    // Add (N0 < 0) ? abs2 - 1 : 0;
    SDValue SRL         DAG.getNode(ISD::SRL, DL, VT, SGN,
                    DAG.getConstant(VT.getScalarSizeInBits() - lg2, DL,
                                    getShiftAmountTy(SGN.getValueType())));
    SDValue ADD = DAG.getNode(ISD::ADD, DL, VT, N0, SRL);
    AddToWorklist(SRL.getNode());
    AddToWorklist(ADD.getNode());    // Divide by pow2
    SDValue SRA = DAG.getNode(ISD::SRA, DL, VT, ADD,
                  DAG.getConstant(lg2, DL,
                                  getShiftAmountTy(ADD.getValueType())));

And there’s already a target-specific hook to add a custom implementation (e.g.
on PowerPC):

    // Target-specific implementation of sdiv x, pow2.
    if (SDValue Res = BuildSDIVPow2(N))
      return Res;

-— escha
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150819/8e447e83/attachment.html>

Steve King via llvm-dev

2015-Aug-20 00:00 UTC

head link

[llvm-dev] [RFC] Improving integer divide optimization (related to D12082)

On Wed, Aug 19, 2015 at 3:48 PM, escha <escha at apple.com>
wrote:>
> On Aug 19, 2015, at 1:45 PM, Steve King via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> In the targets I know, shifts are
> cheaper than divides in both speed and size.
>
>
> From what I remember, udiv by power of 2 already gets turned into a shift
in
> instcombine; the tricky case is sdiv by power of 2, which takes
> significantly more than one instruction.
Thanks escha.  I fooled myself and all the offending cases in my
target were indeed SDIVs.

Mehdi Amini via llvm-dev

2015-Aug-20 05:58 UTC

head link

[llvm-dev] [RFC] Improving integer divide optimization (related to D12082)

> On Aug 19, 2015, at 3:48 PM, escha via llvm-dev <llvm-dev at
lists.llvm.org> wrote:
> 
> 
>> On Aug 19, 2015, at 1:45 PM, Steve King via llvm-dev <llvm-dev at
lists.llvm.org <mailto:llvm-dev at lists.llvm.org>> wrote:
>> 
>> In the targets I know, shifts are
>> cheaper than divides in both speed and size.
> 
> From what I remember, udiv by power of 2 already gets turned into a shift
in instcombine; the tricky case is sdiv by power of 2, which takes significantly
more than one instruction. The generic implementation is this:
> 
>     // Splat the sign bit into the register
>     SDValue SGN >         DAG.getNode(ISD::SRA, DL, VT, N0,
>                     DAG.getConstant(VT.getScalarSizeInBits() - 1, DL,
>                                     getShiftAmountTy(N0.getValueType())));
>     AddToWorklist(SGN.getNode());
> 
>     // Add (N0 < 0) ? abs2 - 1 : 0;
>     SDValue SRL >         DAG.getNode(ISD::SRL, DL, VT, SGN,
>                     DAG.getConstant(VT.getScalarSizeInBits() - lg2, DL,
>                                     getShiftAmountTy(SGN.getValueType())));
>     SDValue ADD = DAG.getNode(ISD::ADD, DL, VT, N0, SRL);
>     AddToWorklist(SRL.getNode());
>     AddToWorklist(ADD.getNode());    // Divide by pow2
>     SDValue SRA = DAG.getNode(ISD::SRA, DL, VT, ADD,
>                   DAG.getConstant(lg2, DL,
>                                   getShiftAmountTy(ADD.getValueType())));
> 
> And there’s already a target-specific hook to add a custom implementation
(e.g. on PowerPC):
> 
>     // Target-specific implementation of sdiv x, pow2.
>     if (SDValue Res = BuildSDIVPow2(N))
>       return Res;

Isn’t the problem the fact that the patch makes it harder for a target to get
the generic code to reach its custom hook?
Now the "cheap pow2 sdiv” is merged with the generic “cheap div” you can’t
distinguish anymore.

— 
Mehdi

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20150819/aaca66f4/attachment.html>

Maybe Matching Threads

Search for more reasonably related threads

llvm dev - Aug 2015 - [RFC] Improving integer divide optimization (related to D12082)

[llvm-dev] [RFC] Improving integer divide optimization (related to D12082)

[llvm-dev] [RFC] Improving integer divide optimization (related to D12082)

[llvm-dev] [RFC] Improving integer divide optimization (related to D12082)

[llvm-dev] [RFC] Improving integer divide optimization (related to D12082)

Maybe Matching Threads