thr3ads.net - llvm dev - [llvm-dev] Dealing with information loss for widened integer operations at ISel time [Dec 2018]

If this information is useful, please help other people find it:
Share via:

Alex Bradbury via llvm-dev

2018-Dec-13 14:25 UTC

[llvm-dev] Dealing with information loss for widened integer operations at ISel time

As previously discussed in an RFC
<http://lists.llvm.org/pipermail/llvm-dev/2018-October/126690.html>, the
RISC-V backend has i64 as the only legal integer type for the RV64 target.
Thanks to variable-sized register class support, this means there is no need
for duplication of either patterns or instruction definitions for RV32 and
RV64. It's worth noting that RV64I is a different base ISA to RV32I. Rather
than adding 64-bit operations, it re-defines the operations as 64-bit and
introduces a small number of 'W' suffixed instructions such as ADDW,
SLLW etc
to operate on 32-bit values.

There are some challenges though. Consider the variable-length shifts
introduced in RV64I. SLLW, SRLW, and SRAW operate on 32-bit values and produce
32-bit sign-extended results. They read only the lower 5 bits from the shift
amount. The following function should trivially resuslt in SRLW being
selected:

    define signext i32 @sllw(i32 signext %a, i32 zeroext %b) {
      %1 = shl i32 %a, %b
      ret i32 %1
    }

Except it's not actually possible to write a correct pattern for this. It is
essential to know the original type of the shift amount. The fact it was i32
means that it's safe to select SLLW in this case (as a shift amount > 31
would
be undefined behaviour). It's tempting to write a pattern like the
following:

    def : Pat<(sext_inreg (shl GPR:$rs1, GPR:$rs2), i32),
              (SLLW GPR:$rs1, GPR:$rs2)>;

But as Eli Friedman kindly pointed out, the `sext_inreg` node can be generated
in cases other than i32 to i64 widening. e.g.

    define i64 @tricky_shl(i64 %a, i64 %b) {
      %1 = shl i64 %a, %b
      %2 = shl i64 %1, 32
      %3 = ashr i64 %2, 32
      ret i64 %3
    }

There's also likely to be cases where you want to calculate the demanded
bits
in order to determine if e.g. a W-suffixed instruction can be selected for
`(somoeop (zexti32 GPR:$rs1), (zexti32 GPR:$rs2))`. This is easy to match if
the SelectionDAG contains an explicit `sext_inreg` of the result. But if not,
you'd need to check whether the upper 32 bits are actually demanded or not.

Any thoughts on how to best handle these cases? For the shifts I could
obviously introduce a target-specific SelectionDAG node, but then I'd lose
the
benefit of most of the target-independent DAG combines. A target DAG combine
could custom lower the shift amount. e.g. converting (shl val, shamt) to (shl
val, (and shamt, 31)) before it gets widened to i64. But this runs the risk
of introducing an unnecessary AND operation if it turns out that the
SelectionDAG node is transformed by the time it reaches instruction selection.
So for this particular issue with shifts, introducing a target-specific WasI32
node and converting to (shl val, (WasI32 shamt)) in a target DAG combine looks
plausible.

Thanks,

Alex

Friedman, Eli via llvm-dev

2018-Dec-13 21:41 UTC

head link

[llvm-dev] Dealing with information loss for widened integer operations at ISel time

On 12/13/2018 6:25 AM, Alex Bradbury wrote:> There's also likely to be cases where you want to calculate the
demanded bits
> in order to determine if e.g. a W-suffixed instruction can be selected for
> `(somoeop (zexti32 GPR:$rs1), (zexti32 GPR:$rs2))`. This is easy to match
if
> the SelectionDAG contains an explicit `sext_inreg` of the result. But if
not,
> you'd need to check whether the upper 32 bits are actually demanded or
not.
Could you describe more specifically where this matters?  I would guess 
the W-suffixed instructions aren't actually any cheaper, in general, 
than non-W-suffixed instructions, so there's no point to using W forms 
if you don't need the sign extension. Unless MULW has lower latency than 
MUL on some CPU?
> Any thoughts on how to best handle these cases? For the shifts I could
> obviously introduce a target-specific SelectionDAG node, but then I'd
lose the
> benefit of most of the target-independent DAG combines. A target DAG
combine
> could custom lower the shift amount. e.g. converting (shl val, shamt) to
(shl
> val, (and shamt, 31)) before it gets widened to i64. But this runs the risk
> of introducing an unnecessary AND operation if it turns out that the
> SelectionDAG node is transformed by the time it reaches instruction
selection.
> So for this particular issue with shifts, introducing a target-specific
WasI32
> node and converting to (shl val, (WasI32 shamt)) in a target DAG combine
looks
> plausible.
You can represent "WasI32" using AssertZext i5.  That seems like a 
reasonable approach.

-Eli

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
Foundation Collaborative Project

Alex Bradbury via llvm-dev

2018-Dec-14 11:43 UTC

head link

[llvm-dev] Dealing with information loss for widened integer operations at ISel time

On Thu, 13 Dec 2018 at 21:41, Friedman, Eli <efriedma at codeaurora.org>
wrote:>
> On 12/13/2018 6:25 AM, Alex Bradbury wrote:
> > There's also likely to be cases where you want to calculate the
demanded bits
> > in order to determine if e.g. a W-suffixed instruction can be selected
for
> > `(somoeop (zexti32 GPR:$rs1), (zexti32 GPR:$rs2))`. This is easy to
match if
> > the SelectionDAG contains an explicit `sext_inreg` of the result. But
if not,
> > you'd need to check whether the upper 32 bits are actually
demanded or not.
>
> Could you describe more specifically where this matters?  I would guess
> the W-suffixed instructions aren't actually any cheaper, in general,
> than non-W-suffixed instructions, so there's no point to using W forms
> if you don't need the sign extension. Unless MULW has lower latency
than
> MUL on some CPU?
It only matters when it allows you to avoid an unnecessary sext/zext
of the input operands. Consider the following input:

define i32 @aext_divuw_aext_aext(i32 %a, i32 %b) nounwind {
  %1 = udiv i32 %a, %b
  ret i32 %1
}

This pattern won't match:
def : Pat<(sext_inreg (udiv (zexti32 GPR:$rs1), (zexti32 GPR:$rs2)), i32),
          (DIVUW GPR:$rs1, GPR:$rs2)>;

This pattern would match but is incorrect in the general case (e.g. if
rs1 is 0xffffffff and rs2 is 0x1, the result will be sign-extended).:
def : Pat<(udiv (zexti32 GPR:$rs1), (zexti32 GPR:$rs2)),
          (DIVUW GPR:$rs1, GPR:$rs2)>;

So this function would generate:
slli a1, a1, 32
srli a1, a1, 32
slli a0, a0, 32
srli a0, a0, 32
divu a0, a0, a1
ret

Rather than a simple divuw. Obviously we can argue whether such cases
are likely to occur in real-world code (certainly this specific case
of aext function args/returns isn't going to happen for
clang-generated code), but it makes me uncomfortable that we can't
easily make the best codegen decision for such a straight-forward
input.
> > Any thoughts on how to best handle these cases? For the shifts I could
> > obviously introduce a target-specific SelectionDAG node, but then
I'd lose the
> > benefit of most of the target-independent DAG combines. A target DAG
combine
> > could custom lower the shift amount. e.g. converting (shl val, shamt)
to (shl
> > val, (and shamt, 31)) before it gets widened to i64. But this runs the
risk
> > of introducing an unnecessary AND operation if it turns out that the
> > SelectionDAG node is transformed by the time it reaches instruction
selection.
> > So for this particular issue with shifts, introducing a
target-specific WasI32
> > node and converting to (shl val, (WasI32 shamt)) in a target DAG
combine looks
> > plausible.
>
> You can represent "WasI32" using AssertZext i5.  That seems like
a
> reasonable approach.
I hadn't considered AssertZext, thanks for the suggestion. Don't we
really want AssertAnyExt i5 though (which doesn't currently exist)? We
can't guarantee that the bits other than lower 5 are zero. I'd be
willing to believe that there's no way this infidelity could cause
issues for this particular transformation and the current dag
combiner, but it still seems shaky.

Best,

Alex

llvm dev - Dec 2018 - Dealing with information loss for widened integer operations at ISel time

[llvm-dev] Dealing with information loss for widened integer operations at ISel time

[llvm-dev] Dealing with information loss for widened integer operations at ISel time

[llvm-dev] Dealing with information loss for widened integer operations at ISel time