thr3ads.net - llvm dev - [llvm-dev] Dealing with information loss for widened integer operations at ISel time [Dec 2018]

If this information is useful, please help other people find it:
Share via:

Alex Bradbury via llvm-dev

2018-Dec-14 11:43 UTC

[llvm-dev] Dealing with information loss for widened integer operations at ISel time

On Thu, 13 Dec 2018 at 21:41, Friedman, Eli <efriedma at codeaurora.org>
wrote:>
> On 12/13/2018 6:25 AM, Alex Bradbury wrote:
> > There's also likely to be cases where you want to calculate the
demanded bits
> > in order to determine if e.g. a W-suffixed instruction can be selected
for
> > `(somoeop (zexti32 GPR:$rs1), (zexti32 GPR:$rs2))`. This is easy to
match if
> > the SelectionDAG contains an explicit `sext_inreg` of the result. But
if not,
> > you'd need to check whether the upper 32 bits are actually
demanded or not.
>
> Could you describe more specifically where this matters?  I would guess
> the W-suffixed instructions aren't actually any cheaper, in general,
> than non-W-suffixed instructions, so there's no point to using W forms
> if you don't need the sign extension. Unless MULW has lower latency
than
> MUL on some CPU?
It only matters when it allows you to avoid an unnecessary sext/zext
of the input operands. Consider the following input:

define i32 @aext_divuw_aext_aext(i32 %a, i32 %b) nounwind {
  %1 = udiv i32 %a, %b
  ret i32 %1
}

This pattern won't match:
def : Pat<(sext_inreg (udiv (zexti32 GPR:$rs1), (zexti32 GPR:$rs2)), i32),
          (DIVUW GPR:$rs1, GPR:$rs2)>;

This pattern would match but is incorrect in the general case (e.g. if
rs1 is 0xffffffff and rs2 is 0x1, the result will be sign-extended).:
def : Pat<(udiv (zexti32 GPR:$rs1), (zexti32 GPR:$rs2)),
          (DIVUW GPR:$rs1, GPR:$rs2)>;

So this function would generate:
slli a1, a1, 32
srli a1, a1, 32
slli a0, a0, 32
srli a0, a0, 32
divu a0, a0, a1
ret

Rather than a simple divuw. Obviously we can argue whether such cases
are likely to occur in real-world code (certainly this specific case
of aext function args/returns isn't going to happen for
clang-generated code), but it makes me uncomfortable that we can't
easily make the best codegen decision for such a straight-forward
input.
> > Any thoughts on how to best handle these cases? For the shifts I could
> > obviously introduce a target-specific SelectionDAG node, but then
I'd lose the
> > benefit of most of the target-independent DAG combines. A target DAG
combine
> > could custom lower the shift amount. e.g. converting (shl val, shamt)
to (shl
> > val, (and shamt, 31)) before it gets widened to i64. But this runs the
risk
> > of introducing an unnecessary AND operation if it turns out that the
> > SelectionDAG node is transformed by the time it reaches instruction
selection.
> > So for this particular issue with shifts, introducing a
target-specific WasI32
> > node and converting to (shl val, (WasI32 shamt)) in a target DAG
combine looks
> > plausible.
>
> You can represent "WasI32" using AssertZext i5.  That seems like
a
> reasonable approach.
I hadn't considered AssertZext, thanks for the suggestion. Don't we
really want AssertAnyExt i5 though (which doesn't currently exist)? We
can't guarantee that the bits other than lower 5 are zero. I'd be
willing to believe that there's no way this infidelity could cause
issues for this particular transformation and the current dag
combiner, but it still seems shaky.

Best,

Alex

Friedman, Eli via llvm-dev

2018-Dec-14 21:08 UTC

head link

[llvm-dev] Dealing with information loss for widened integer operations at ISel time

On 12/14/2018 3:43 AM, Alex Bradbury wrote:> On Thu, 13 Dec 2018 at 21:41, Friedman, Eli <efriedma at
codeaurora.org> wrote:
>> On 12/13/2018 6:25 AM, Alex Bradbury wrote:
>>> There's also likely to be cases where you want to calculate the
demanded bits
>>> in order to determine if e.g. a W-suffixed instruction can be
selected for
>>> `(somoeop (zexti32 GPR:$rs1), (zexti32 GPR:$rs2))`. This is easy to
match if
>>> the SelectionDAG contains an explicit `sext_inreg` of the result.
But if not,
>>> you'd need to check whether the upper 32 bits are actually
demanded or not.
>> Could you describe more specifically where this matters?  I would guess
>> the W-suffixed instructions aren't actually any cheaper, in
general,
>> than non-W-suffixed instructions, so there's no point to using W
forms
>> if you don't need the sign extension. Unless MULW has lower latency
than
>> MUL on some CPU?
> It only matters when it allows you to avoid an unnecessary sext/zext
> of the input operands. Consider the following input:
>
> define i32 @aext_divuw_aext_aext(i32 %a, i32 %b) nounwind {
>    %1 = udiv i32 %a, %b
>    ret i32 %1
> }
>
> This pattern won't match:
> def : Pat<(sext_inreg (udiv (zexti32 GPR:$rs1), (zexti32 GPR:$rs2)),
i32),
>            (DIVUW GPR:$rs1, GPR:$rs2)>;
>
> This pattern would match but is incorrect in the general case (e.g. if
> rs1 is 0xffffffff and rs2 is 0x1, the result will be sign-extended).:
> def : Pat<(udiv (zexti32 GPR:$rs1), (zexti32 GPR:$rs2)),
>            (DIVUW GPR:$rs1, GPR:$rs2)>;
>
> So this function would generate:
> slli a1, a1, 32
> srli a1, a1, 32
> slli a0, a0, 32
> srli a0, a0, 32
> divu a0, a0, a1
> ret
>
> Rather than a simple divuw. Obviously we can argue whether such cases
> are likely to occur in real-world code (certainly this specific case
> of aext function args/returns isn't going to happen for
> clang-generated code), but it makes me uncomfortable that we can't
> easily make the best codegen decision for such a straight-forward
> input.
Okay, that makes sense.

I think you only end up with an explicit "zext" like this for unsigned
division and unsigned right shift with a variable amount?  And for 
signed division, I guess you end up with an analogous problem with sext 
(since -2^31/-1 needs to produce 2^31, not -2^31).

For add/mul/etc. if the high result bits aren't demanded, the high 
operand bits also aren't demanded, so the extensions should be 
eliminated.  For unsigned right shift with a zero-extended operand and a 
constant non-zero shift amount, sign-extension is a no-op, so the 
obvious pattern just works. And for signed right shift, the result is 
always sign-extended, so the obvious pattern also just works.

For the nodes where it does matter, you could define RISCVISD::DIVUW 
etc. if necessary, I guess... and generate it using either custom 
lowering or SimplifyDemandedBitsForTargetNode.  I can't think of any 
other approach.
>>> Any thoughts on how to best handle these cases? For the shifts I
could
>>> obviously introduce a target-specific SelectionDAG node, but then
I'd lose the
>>> benefit of most of the target-independent DAG combines. A target
DAG combine
>>> could custom lower the shift amount. e.g. converting (shl val,
shamt) to (shl
>>> val, (and shamt, 31)) before it gets widened to i64. But this runs
the risk
>>> of introducing an unnecessary AND operation if it turns out that
the
>>> SelectionDAG node is transformed by the time it reaches instruction
selection.
>>> So for this particular issue with shifts, introducing a
target-specific WasI32
>>> node and converting to (shl val, (WasI32 shamt)) in a target DAG
combine looks
>>> plausible.
>> You can represent "WasI32" using AssertZext i5.  That seems
like a
>> reasonable approach.
> I hadn't considered AssertZext, thanks for the suggestion. Don't we
> really want AssertAnyExt i5 though (which doesn't currently exist)? We
> can't guarantee that the bits other than lower 5 are zero. I'd be
> willing to believe that there's no way this infidelity could cause
> issues for this particular transformation and the current dag
> combiner, but it still seems shaky.
The property you want is that the high bits of the shift amount are 
zero, or the result of the shift is undefined.  That matches "(shl x, 
(AssertZext i5 shamt))", as far as I know... maybe we could clarify the 
documentation?

Another alternative is to introduce RISCVISD::SHLW which is the same as 
ISD::SHL except it masks the bottom bits.  You potentially lose (or have 
to duplicate) some late DAGCombines, but we don't really have any 
interesting combines for shifts with a variable shift amount anyway.

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project

Alex Bradbury via llvm-dev

2019-Jan-03 12:01 UTC

head link

[llvm-dev] Dealing with information loss for widened integer operations at ISel time

On Fri, 14 Dec 2018 at 21:08, Friedman, Eli <efriedma at codeaurora.org>
wrote:>
> On 12/14/2018 3:43 AM, Alex Bradbury wrote:
> > It only matters when it allows you to avoid an unnecessary sext/zext
> > of the input operands. Consider the following input:
> >
> > define i32 @aext_divuw_aext_aext(i32 %a, i32 %b) nounwind {
> >    %1 = udiv i32 %a, %b
> >    ret i32 %1
> > }
> >
> > This pattern won't match:
> > def : Pat<(sext_inreg (udiv (zexti32 GPR:$rs1), (zexti32
GPR:$rs2)), i32),
> >            (DIVUW GPR:$rs1, GPR:$rs2)>;
> >
> > This pattern would match but is incorrect in the general case (e.g. if
> > rs1 is 0xffffffff and rs2 is 0x1, the result will be sign-extended).:
> > def : Pat<(udiv (zexti32 GPR:$rs1), (zexti32 GPR:$rs2)),
> >            (DIVUW GPR:$rs1, GPR:$rs2)>;
> >
> > So this function would generate:
> > slli a1, a1, 32
> > srli a1, a1, 32
> > slli a0, a0, 32
> > srli a0, a0, 32
> > divu a0, a0, a1
> > ret
> >
> > Rather than a simple divuw. Obviously we can argue whether such cases
> > are likely to occur in real-world code (certainly this specific case
> > of aext function args/returns isn't going to happen for
> > clang-generated code), but it makes me uncomfortable that we can't
> > easily make the best codegen decision for such a straight-forward
> > input.
>
> Okay, that makes sense.
>
> I think you only end up with an explicit "zext" like this for
unsigned
> division and unsigned right shift with a variable amount?  And for
> signed division, I guess you end up with an analogous problem with sext
> (since -2^31/-1 needs to produce 2^31, not -2^31).
Hi Eli, I've posted https://reviews.llvm.org/D56264 and updated
https://reviews.llvm.org/D53230 to add patterns for SLLW/SRLW/SRAW and
the RV64M instructions. Rather than introducing custom SelectionDAG
nodes, I've added DAG combines for the shifts to add assertzext i5 and
additionally added a DAG combine to convert an ANY_EXTEND to a
SIGN_EXTEND when this is likely to allow a *W instruction to be
selected (can't leave this to ISel time, because the any_extend will
be never reach instruction selection).

I'd really welcome review and any extra input on these patches.

Thanks,

Alex

llvm dev - Dec 2018 - Dealing with information loss for widened integer operations at ISel time

[llvm-dev] Dealing with information loss for widened integer operations at ISel time

[llvm-dev] Dealing with information loss for widened integer operations at ISel time

[llvm-dev] Dealing with information loss for widened integer operations at ISel time