thr3ads.net - llvm dev - [llvm-dev] Remove zext-unfolding from InstCombine [Jul 2016]

If this information is useful, please help other people find it:
Share via:

Matthias Reisinger via llvm-dev

2016-Jul-21 12:07 UTC

[llvm-dev] Remove zext-unfolding from InstCombine

Hi all,

I have a question regarding a transformation that is carried out in InstCombine,
which has been introduced by r48715. It unfolds expressions of the form
`zext(or(icmp, (icmp)))` to `or(zext(icmp), zext(icmp)))` to expose pairs of
`zext(icmp)`. In a subsequent iteration these `zext(icmp)` pairs could then
(possibly) be optimized by another optimization (which has already been there
before r48715), to replace them by bitwise or integer operations. The code in
question is located in `visitZExt()` in InstCombineCasts.cpp:

```
if (ICmpInst *ICI = dyn_cast<ICmpInst>(Src))
  return transformZExtICmp(ICI, CI);

BinaryOperator *SrcI = dyn_cast<BinaryOperator>(Src);
if (SrcI && SrcI->getOpcode() == Instruction::Or) {
  // zext (or icmp, icmp) --> or (zext icmp), (zext icmp) if at least one
  // of the (zext icmp) will be transformed.
  ICmpInst *LHS = dyn_cast<ICmpInst>(SrcI->getOperand(0));
  ICmpInst *RHS = dyn_cast<ICmpInst>(SrcI->getOperand(1));
  if (LHS && RHS && LHS->hasOneUse() &&
RHS->hasOneUse() &&
      (transformZExtICmp(LHS, CI, false) ||
       transformZExtICmp(RHS, CI, false))) {
    Value *LCast = Builder->CreateZExt(LHS, CI.getType(), LHS->getName());
    Value *RCast = Builder->CreateZExt(RHS, CI.getType(), RHS->getName());
    return BinaryOperator::Create(Instruction::Or, LCast, RCast);
  }
}
```

The first two lines do the actual `zext(icmp)` optimization. The remaining lines
have been added by r48715 and do the mentioned unfolding. However, in
`foldCastedBitwiseLogic()` there happens a reverse of the above unfolding, which
transforms `logic(cast(A), cast(B))` to `cast(logic(A, B))`:

```
if ((!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src))
&&
    shouldOptimizeCast(Cast0) && shouldOptimizeCast(Cast1)) {
  Value *NewOp = Builder->CreateBinOp(LogicOpc, Cast0Src, Cast1Src,
                                      I.getName());
  return CastInst::Create(CastOpcode, NewOp, DestTy);
}
```

The `(!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src))` part
has been added by r48715 in order to ensure that InstCombine does not enter an
endless loop, switching back and forth between the two transformations. There
was also added an InstCombine test case `zext-or-icmp.ll` that would enter such
an endless loop, if this check would not be present. However, it also generally
permits `foldCastedBitwiseLogic()` from folding operations with two icmps as
operands. Something like

```
%1 = icmp sgt i64 %a, %b
%2 = zext i1 %1 to i8
%3 = icmp slt i64 %a, %c
%4 = zext i1 %3 to i8
%5 = and i8 %2, %4
```

would not be able to be transformed, even though the contained `zext icmp` pairs
will also not be able to be optimized by `transformZExtICmp()` above, and the
casts will survive. Therefore, in r275989 (https://reviews.llvm.org/D22511) I
tried to narrow the check in `foldCastedBitwiseLogic()` to allow the folding of
casts in the presence of icmps. For the above example this would result in:

```
%1 = icmp sgt i64 %a, %b
%2 = icmp slt i64 %a, %c
%3 = and i1 %1, %2
%4 = zext i1 %3 to i8
```

Despite `zext-or-icmp.ll` did not enter an endless loop with my change, it later
turned out the assumptions that I took in r275989 were too optimistic and
endless loops might still occur, which has been pointed out by Benjamin Kramer
in http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160718/374347.html
and it has therefore been reverted.

I apologize in advance for this possibly naive question, but I was wondering
whether it is feasible to remove the unfolding introduced by r48715 in order to
enable a cast-folding like on the example above. I've also already tested
this and run all of LLVM's test cases via the `make check` target and it
seems that the only test case relies on this unfolding is the mentioned
`zext-or-icmp.ll`. I don't know if Evan Chang is still responsible for this
code region, but since he is the original author of the commit, I thought it
would be safe to include him here (I hope it's the right email address).

Thank you in advance for your thoughts on this!

Best regards,
Matthias

Sanjay Patel via llvm-dev

2016-Jul-21 16:51 UTC

head link

[llvm-dev] Remove zext-unfolding from InstCombine

Hi Matthias,

I've been fixing similar problems recently (eg,
https://llvm.org/bugs/show_bug.cgi?id=28476 ). I think your example (the
test in zext-or-icmp.ll) is another missed application of DeMorgan's Law:

char foo(char a, char b) {
  return (((a & 1) == 0) | (b == 0));
}

char goo(char a, char b) {
  return !(((a & 1) != 0) & (b != 0));
}

Make sure I didn't botch that translation: the 1st is the C equivalent of
zext-or-icmp.ll, and the 2nd is what we really want it to be optimized to
instead of what's shown in that test case.

define i8 @foo(i8 %a, i8 %b) {
  %and = and i8 %a, 1
  %cmp = icmp eq i8 %b, 0
  %xor = xor i8 %and, 1
  %zext = zext i1 %cmp to i8
  %or = or i8 %zext, %xor
  ret i8 %or
}

define i8 @goo(i8 %a, i8 %b) {
  %cmp = icmp ne i8 %b, 0
  %zext = zext i1 %cmp to i8
  %and = and i8 %zext, %a
  %xor = xor i8 %and, 1
  ret i8 %xor
}

The 2nd function is the better IR (one less logic op), so you need to add a
transform to produce that and get rid of the existing limitation. Any
InstCombine folds that rely on bypassing an earlier fold are wrong IMO.


On Thu, Jul 21, 2016 at 6:07 AM, Matthias Reisinger via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hi all,
>
> I have a question regarding a transformation that is carried out in
> InstCombine, which has been introduced by r48715. It unfolds expressions of
> the form `zext(or(icmp, (icmp)))` to `or(zext(icmp), zext(icmp)))` to
> expose pairs of `zext(icmp)`. In a subsequent iteration these `zext(icmp)`
> pairs could then (possibly) be optimized by another optimization (which has
> already been there before r48715), to replace them by bitwise or integer
> operations. The code in question is located in `visitZExt()` in
> InstCombineCasts.cpp:
>
> ```
> if (ICmpInst *ICI = dyn_cast<ICmpInst>(Src))
>   return transformZExtICmp(ICI, CI);
>
> BinaryOperator *SrcI = dyn_cast<BinaryOperator>(Src);
> if (SrcI && SrcI->getOpcode() == Instruction::Or) {
>   // zext (or icmp, icmp) --> or (zext icmp), (zext icmp) if at least
one
>   // of the (zext icmp) will be transformed.
>   ICmpInst *LHS = dyn_cast<ICmpInst>(SrcI->getOperand(0));
>   ICmpInst *RHS = dyn_cast<ICmpInst>(SrcI->getOperand(1));
>   if (LHS && RHS && LHS->hasOneUse() &&
RHS->hasOneUse() &&
>       (transformZExtICmp(LHS, CI, false) ||
>        transformZExtICmp(RHS, CI, false))) {
>     Value *LCast = Builder->CreateZExt(LHS, CI.getType(),
LHS->getName());
>     Value *RCast = Builder->CreateZExt(RHS, CI.getType(),
RHS->getName());
>     return BinaryOperator::Create(Instruction::Or, LCast, RCast);
>   }
> }
> ```
>
> The first two lines do the actual `zext(icmp)` optimization. The remaining
> lines have been added by r48715 and do the mentioned unfolding. However, in
> `foldCastedBitwiseLogic()` there happens a reverse of the above unfolding,
> which transforms `logic(cast(A), cast(B))` to `cast(logic(A, B))`:
>
> ```
> if ((!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src))
&&
>     shouldOptimizeCast(Cast0) && shouldOptimizeCast(Cast1)) {
>   Value *NewOp = Builder->CreateBinOp(LogicOpc, Cast0Src, Cast1Src,
>                                       I.getName());
>   return CastInst::Create(CastOpcode, NewOp, DestTy);
> }
> ```
>
> The `(!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src))`
part has been
> added by r48715 in order to ensure that InstCombine does not enter an
> endless loop, switching back and forth between the two transformations.
> There was also added an InstCombine test case `zext-or-icmp.ll` that would
> enter such an endless loop, if this check would not be present. However, it
> also generally permits `foldCastedBitwiseLogic()` from folding operations
> with two icmps as operands. Something like
>
> ```
> %1 = icmp sgt i64 %a, %b
> %2 = zext i1 %1 to i8
> %3 = icmp slt i64 %a, %c
> %4 = zext i1 %3 to i8
> %5 = and i8 %2, %4
> ```
>
> would not be able to be transformed, even though the contained `zext icmp`
> pairs will also not be able to be optimized by `transformZExtICmp()` above,
> and the casts will survive. Therefore, in r275989 (
> https://reviews.llvm.org/D22511) I tried to narrow the check in
> `foldCastedBitwiseLogic()` to allow the folding of casts in the presence of
> icmps. For the above example this would result in:
>
> ```
> %1 = icmp sgt i64 %a, %b
> %2 = icmp slt i64 %a, %c
> %3 = and i1 %1, %2
> %4 = zext i1 %3 to i8
> ```
>
> Despite `zext-or-icmp.ll` did not enter an endless loop with my change, it
> later turned out the assumptions that I took in r275989 were too optimistic
> and endless loops might still occur, which has been pointed out by Benjamin
> Kramer in
>
http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160718/374347.html
> and it has therefore been reverted.
>
> I apologize in advance for this possibly naive question, but I was
> wondering whether it is feasible to remove the unfolding introduced by
> r48715 in order to enable a cast-folding like on the example above.
I've
> also already tested this and run all of LLVM's test cases via the `make
> check` target and it seems that the only test case relies on this unfolding
> is the mentioned `zext-or-icmp.ll`. I don't know if Evan Chang is still
> responsible for this code region, but since he is the original author of
> the commit, I thought it would be safe to include him here (I hope it's
the
> right email address).
>
> Thank you in advance for your thoughts on this!
>
> Best regards,
> Matthias
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>-------------- next part --------------
An HTML attachment was scrubbed...
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20160721/37232269/attachment-0001.html>

Matthias Reisinger via llvm-dev

2016-Jul-27 14:13 UTC

head link

[llvm-dev] Remove zext-unfolding from InstCombine

Hi Sanjay,

thank you a lot for your answer. I understand that in your examples it is
desirable that `foo` and `goo` are canonicalized to the same IR, i.e., something
like `@goo`. However, I still have a few open questions, but please correct me
in case I'm thinking in the wrong direction.
> Am 21.07.2016 um 18:51 schrieb Sanjay Patel <spatel at
rotateright.com>:
> 
> I've been fixing similar problems recently (eg,
https://llvm.org/bugs/show_bug.cgi?id=28476 ). I think your example (the test in
zext-or-icmp.ll) is another missed application of DeMorgan's Law:
> 
> char foo(char a, char b) {
>   return (((a & 1) == 0) | (b == 0));
> }
> 
> char goo(char a, char b) {
>   return !(((a & 1) != 0) & (b != 0));
> }
> 
> Make sure I didn't botch that translation: the 1st is the C equivalent
of zext-or-icmp.ll, and the 2nd is what we really want it to be optimized to
instead of what's shown in that test case.
Just to make sure that I understand this correctly: your goal here would be to
add a transformation that changes `foo` and zext-or-icmp.ll to a form that
`mathDeMorgan()` can recognize and transform to `goo` to eventually get it to
the "optimal" form in `@goo`:

Therefore, in a first step, I examined the IR that clang generates for `foo` and
`goo` just before they are passed to InstCombine:

```
define signext i8 @foo_before_InstCombine(i8 signext %a, i8 signext %b)
local_unnamed_addr #0 {
entry:
  %conv = sext i8 %a to i32
  %and = and i32 %conv, 1
  %cmp = icmp eq i32 %and, 0
  %conv1 = zext i1 %cmp to i32
  %conv2 = sext i8 %b to i32
  %cmp3 = icmp eq i32 %conv2, 0
  %conv4 = zext i1 %cmp3 to i32
  %or = or i32 %conv1, %conv4
  %conv5 = trunc i32 %or to i8
  ret i8 %conv5
}

; Function Attrs: nounwind ssp uwtable
define signext i8 @goo_before_InstCombine(i8 signext %a, i8 signext %b)
local_unnamed_addr #0 {
entry:
  %conv = sext i8 %a to i32
  %and = and i32 %conv, 1
  %cmp = icmp ne i32 %and, 0
  %conv1 = zext i1 %cmp to i32
  %conv2 = sext i8 %b to i32
  %cmp3 = icmp ne i32 %conv2, 0
  %conv4 = zext i1 %cmp3 to i32
  %and5 = and i32 %conv1, %conv4
  %tobool = icmp ne i32 %and5, 0
  %lnot = xor i1 %tobool, true
  %lnot.ext = zext i1 %lnot to i32
  %conv6 = trunc i32 %lnot.ext to i8
  ret i8 %conv6
}
```

For both functions, the `icmp` operations will be immediately followed by `zext`
instructions, which will directly be optimized away by `transformZExtICmp()`,
which is the reason why in the end we will only have one of the `icmp`
instructions left. However, that means that `foo` will not be lowered to the IR
that we have in zext-or-icmp.ll, where the `zext` is placed after the `or`
instruction as opposed to `@foo_before_InstCombine`:

```
define i8 @zext_or_icmp_icmp(i8 %a, i8 %b) {
  %mask = and i8 %a, 1
  %toBool1 = icmp eq i8 %mask, 0
  %toBool2 = icmp eq i8 %b, 0
  %bothCond = or i1 %toBool1, %toBool2
  %zext = zext i1 %bothCond to i8
  ret i8 %zext
}
```

That means as long as the `zext` in zext-or-icmp.ll isn't pushed in front of
the `icmp` operations it will not be possible to remove one of them via
`transformZExtICmp()` to get to something like `@goo`. So if I'm not
mistaken we would still be in need of this unfolding for `zext-or-icmp.ll` if we
want it to be optimized equivalently to `foo`, which we would also have to take
into account when adapting it for DeMorgan.

In general, and independently from my concerns above, I would think that when we
want to get from `(((a & 1) == 0) | (b == 0))` to `!(((a & 1) != 0)
& (b != 0))` we would need to transform it to `(!((a & 1) != 0) | !(b !=
0))` in order to make it amenable to DeMorgan. I also tried to adapt `foo` to
this form, however, the problem is that InstCombine recognizes `!(... != 0)` as
redundant and will simplify it back to `(... == 0)`. So for our case, that means
that `(!((a & 1) != 0) | !(b != 0))` is immediately transformed back to
`(((a & 1) == 0) | (b == 0))` before `matchDeMorgan()` sees it.

I also wanted to let you know, that in the meantime I also prepared another
patch that tries to solve the mentioned problems with the `zext` unfolding
(still without taking DeMorgan into account): https://reviews.llvm.org/D22864.
It might not yet be the optimal way in this case but it enables us to to reduce
the existing limitations when folding `zext` operations. But in case I
misinterpreted your suggestions I'm surely interested in taking another
approach.

Sorry if I misunderstood you or if I overcomplicated the matter, however,
I'm looking forward to your thoughts on this. And also in case there's
anything unclear or if I forgot to mention something that would be important
don't hesitate to demand a more detailed explanation from my side.

Best,
Matthias

> define i8 @foo(i8 %a, i8 %b) {
>   %and = and i8 %a, 1
>   %cmp = icmp eq i8 %b, 0
>   %xor = 
> 
> define i8 @goo(i8 %a, i8 %b) {
>   %cmp = icmp ne i8 %b, 0
>   %zext = zext i1 %cmp to i8
>   %and = and i8 %zext, %a
>   %xor = xor i8 %and, 1
>   ret i8 %xor
> }
> 
> The 2nd function is the better IR (one less logic op), so you need to add a
transform to produce that and get rid of the existing limitation. Any
InstCombine folds that rely on bypassing an earlier fold are wrong IMO.
> 
> 
> On Thu, Jul 21, 2016 at 6:07 AM, Matthias Reisinger via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Hi all,
> 
> I have a question regarding a transformation that is carried out in
InstCombine, which has been introduced by r48715. It unfolds expressions of the
form `zext(or(icmp, (icmp)))` to `or(zext(icmp), zext(icmp)))` to expose pairs
of `zext(icmp)`. In a subsequent iteration these `zext(icmp)` pairs could then
(possibly) be optimized by another optimization (which has already been there
before r48715), to replace them by bitwise or integer operations. The code in
question is located in `visitZExt()` in InstCombineCasts.cpp:
> 
> ```
> if (ICmpInst *ICI = dyn_cast<ICmpInst>(Src))
>   return transformZExtICmp(ICI, CI);
> 
> BinaryOperator *SrcI = dyn_cast<BinaryOperator>(Src);
> if (SrcI && SrcI->getOpcode() == Instruction::Or) {
>   // zext (or icmp, icmp) --> or (zext icmp), (zext icmp) if at least
one
>   // of the (zext icmp) will be transformed.
>   ICmpInst *LHS = dyn_cast<ICmpInst>(SrcI->getOperand(0));
>   ICmpInst *RHS = dyn_cast<ICmpInst>(SrcI->getOperand(1));
>   if (LHS && RHS && LHS->hasOneUse() &&
RHS->hasOneUse() &&
>       (transformZExtICmp(LHS, CI, false) ||
>        transformZExtICmp(RHS, CI, false))) {
>     Value *LCast = Builder->CreateZExt(LHS, CI.getType(),
LHS->getName());
>     Value *RCast = Builder->CreateZExt(RHS, CI.getType(),
RHS->getName());
>     return BinaryOperator::Create(Instruction::Or, LCast, RCast);
>   }
> }
> ```
> 
> The first two lines do the actual `zext(icmp)` optimization. The remaining
lines have been added by r48715 and do the mentioned unfolding. However, in
`foldCastedBitwiseLogic()` there happens a reverse of the above unfolding, which
transforms `logic(cast(A), cast(B))` to `cast(logic(A, B))`:
> 
> ```
> if ((!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src))
&&
>     shouldOptimizeCast(Cast0) && shouldOptimizeCast(Cast1)) {
>   Value *NewOp = Builder->CreateBinOp(LogicOpc, Cast0Src, Cast1Src,
>                                       I.getName());
>   return CastInst::Create(CastOpcode, NewOp, DestTy);
> }
> ```
> 
> The `(!isa<ICmpInst>(Cast0Src) || !isa<ICmpInst>(Cast1Src))`
part has been added by r48715 in order to ensure that InstCombine does not enter
an endless loop, switching back and forth between the two transformations. There
was also added an InstCombine test case `zext-or-icmp.ll` that would enter such
an endless loop, if this check would not be present. However, it also generally
permits `foldCastedBitwiseLogic()` from folding operations with two icmps as
operands. Something like
> 
> ```
> %1 = icmp sgt i64 %a, %b
> %2 = zext i1 %1 to i8
> %3 = icmp slt i64 %a, %c
> %4 = zext i1 %3 to i8
> %5 = and i8 %2, %4
> ```
> 
> would not be able to be transformed, even though the contained `zext icmp`
pairs will also not be able to be optimized by `transformZExtICmp()` above, and
the casts will survive. Therefore, in r275989 (https://reviews.llvm.org/D22511)
I tried to narrow the check in `foldCastedBitwiseLogic()` to allow the folding
of casts in the presence of icmps. For the above example this would result in:
> 
> ```
> %1 = icmp sgt i64 %a, %b
> %2 = icmp slt i64 %a, %c
> %3 = and i1 %1, %2
> %4 = zext i1 %3 to i8
> ```
> 
> Despite `zext-or-icmp.ll` did not enter an endless loop with my change, it
later turned out the assumptions that I took in r275989 were too optimistic and
endless loops might still occur, which has been pointed out by Benjamin Kramer
in http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160718/374347.html
and it has therefore been reverted.
> 
> I apologize in advance for this possibly naive question, but I was
wondering whether it is feasible to remove the unfolding introduced by r48715 in
order to enable a cast-folding like on the example above. I've also already
tested this and run all of LLVM's test cases via the `make check` target and
it seems that the only test case relies on this unfolding is the mentioned
`zext-or-icmp.ll`. I don't know if Evan Chang is still responsible for this
code region, but since he is the original author of the commit, I thought it
would be safe to include him here (I hope it's the right email address).
> 
> Thank you in advance for your thoughts on this!
> 
> Best regards,
> Matthias
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

Maybe Matching Threads

Search for more maybe matching threads

llvm dev - Jul 2016 - Remove zext-unfolding from InstCombine

[llvm-dev] Remove zext-unfolding from InstCombine

[llvm-dev] Remove zext-unfolding from InstCombine

[llvm-dev] Remove zext-unfolding from InstCombine

Maybe Matching Threads