thr3ads.net - llvm dev - [LLVMdev] Illegal optimization in LLVM 2.8 during SelectionDAG? (Re: comparison pattern trouble

If this information is useful, please help other people find it:
Share via:

Heikki Kultala

2010-Sep-30 09:13 UTC

[LLVMdev] Illegal optimization in LLVM 2.8 during SelectionDAG? (Re: comparison pattern trouble - might be a bug in LLVM 2.8?)

Bill Wendling wrote:> On Sep 29, 2010, at 12:36 AM, Heikki Kultala wrote:
> 
>> On 29 Sep 2010, at 06:25, Heikki Kultala wrote:
>>
>>> Our architecture has 1-bit boolean predicate registers.
>>>
>>> I've defined comparison
>>>
>>> def NErrb : InstTCE<(outs I1Regs:$op3), (ins
I32Regs:$op1,I32Regs:$op2), "", [(set I1Regs:$op3, (setne
I32Regs:$op1, I32Regs:$op2))]>;
>>>
>>> But then I end up having the following bug:
>>>
>>> Code
>>>
>>> %0 = zext i8 %data to i32
>>> %1 = zext i16 %crc to i32
>>> %2 = xor i32 %1, %0
>>> %3 = and i32 %2, 1
>>> %4 = icmp eq i32 %3, 0
>>>
>>> which compares the lowest bits of the 2 variables
>>> ends up being compiled as
>>>
>>>       %reg16384<def> = LDWi <fi#-2>, 0;
mem:LD4[FixedStack-2] I32Regs:%reg16384
>>>       %reg16385<def> = LDWi <fi#-1>, 0;
mem:LD4[FixedStack-1] I32Regs:%reg16385
>>>       %reg16386<def> = COPY %reg16384;
I32Regs:%reg16386,16384
>>>       %reg16390<def> = NErrb %reg16384, %reg16385;
I1Regs:%reg16390 I32Regs:%reg16384,16385
>>>
>>> which just compares ALL BITS of the variables.
>> I also have a pattern:
>>
>> def XORrrb : InstTCE<(outs I1Regs:$op3), (ins
I32Regs:$op1,I32Regs:$op2), "", [(set I1Regs:$op3, (trunc (xor
I32Regs:$op1, I32Regs:$op2)))]>;
>>
>> Which can do the whole 3-operation code sequence correctly with one
operation.
>>
>> With LLVM 2.7 this correct operation is selected, with LLVM 2.8 the
wrong operation(which compares all bits) is chosen
>>
>> So this looks like a bug in LLVM 2.8 isel?
>>
> Hi Heikki,
> 
> We need a better example of what's going on. What's the original
code? Also, I don't have access to your back-end's code so it's hard
to tell just from these snippets what's going on. For instance, it's not
clear whether it's the instruction selector that's at fault or if your
.td files have a bug in them somewhere.
The original code is:




#include <stdio.h>
typedef unsigned char e_u8;
typedef unsigned short e_u16;
e_u16 Calc_crc8(e_u8 data, e_u16 crc ) __attribute((__noinline__));

e_u16 Calc_crc8(e_u8 data, e_u16 crc )
{
         e_u8 i,x16,carry;

         for (i = 0; i < 2; i++)
             {
                 x16 = (e_u8)(((data) ^ ((e_u8)crc))&1);
                 if (x16 == 1)
                 {
                    crc ^= 0x4002;
                    carry = 1;
                 }
                 else {
                     carry = 0;
                 }
                 if (carry) {
                    crc |= 0x8000;
                 }
                    else {
                    crc &= 0x7fff;
                    }
     }
         return crc;
}

int main(void) {
     char *foo = "foobar";
     volatile short zero = 0;
     volatile int crc = Calc_crc8(foo[0],zero);
/*
#ifdef __TCE__
     iprintf("Crc8 at middle is %d\n", crc);
#else
     printf("Crc8 at middle is %d\n", crc);
#endif
*/
     crc = Calc_crc8(foo[1],crc);
/*
#ifdef __TCE__
     iprintf("Crc8 is %d\n", crc);
#else
     printf("Crc8 is %d\n", crc);
#endif
*/
     if (crc != 32768) {
         putchar('0');
     } else {
         putchar('1');
     }

     putchar('\n');
}

where the interesting lines are lines 12-13:

                 x16 = (e_u8)(((data) ^ ((e_u8)crc))&1);
                 if (x16 == 1)




The code which goes into isel is:

bb.nph:
   %0 = zext i8 %data to i32
   %1 = zext i16 %crc to i32
   %2 = xor i32 %1, %0
   %3 = and i32 %2, 1
   %4 = icmp eq i32 %3, 0
   br i1 %4, label %bb.nph._crit_edge, label %5

inside selectiondag this becomes:




Legalized selection DAG:
SelectionDAG has 21 nodes:
   0x2536968: ch = EntryToken [ORD=1] [ID=0]

   0x248da80: i32 = undef [ORD=1] [ID=2]

       0x263e968: <multiple use>
       0x248db80: i32 = FrameIndex<-1> [ORD=1] [ID=1]

       0x248da80: <multiple use>
     0x248d580: i32,ch = load 0x263e968, 0x248db80, 
0x248da80<LD4[FixedStack-1]> [ORD=1] [ID=10]

     0x248d480: ch = ValueType:i8 [ORD=1] [ID=4]

   0x248d980: i32 = AssertZext 0x248d580, 0x248d480 [ORD=1] [ID=12]

       0x263e968: <multiple use>
       0x248d680: i32 = FrameIndex<-2> [ORD=2] [ID=3]

       0x248da80: <multiple use>
     0x248dc80: i32,ch = load 0x263e968, 0x248d680, 
0x248da80<LD4[FixedStack-2]> [ORD=2] [ID=11]

     0x248d380: ch = ValueType:i16 [ORD=2] [ID=5]

   0x248d280: i32 = AssertZext 0x248dc80, 0x248d380 [ORD=2] [ID=13]

         0x263e968: <multiple use>
         0x248dd80: i32 = Register %reg16384 [ID=6]

         0x248d280: <multiple use>
       0x248de80: ch = CopyToReg 0x263e968, 0x248dd80, 0x248d280 [ID=17]

         0x263e968: <multiple use>
         0x248e080: i32 = Register %reg16385 [ID=7]

         0x248d980: <multiple use>
       0x25bb3f0: ch = CopyToReg 0x263e968, 0x248e080, 0x248d980 [ID=14]

         0x263e968: <multiple use>
         0x25bb5f0: i32 = Register %reg16386 [ID=8]

         0x248d280: <multiple use>
       0x25bb6f0: ch = CopyToReg 0x263e968, 0x25bb5f0, 0x248d280 [ID=16]

     0x25bc0f0: ch = TokenFactor 0x248de80, 0x25bb3f0, 0x25bb6f0 [ID=19]

         0x248d280: <multiple use>
         0x248d980: <multiple use>
       0x25bb7f0: i32 = xor 0x248d280, 0x248d980 [ORD=3] [ID=15]

     0x25bbbf0: i1 = truncate 0x25bb7f0 [ID=18]

     0x25bbff0: ch = BasicBlock< 0x2685718> [ID=9]

   0x25bc1f0: ch = brcond 0x25bc0f0, 0x25bbbf0, 0x25bbff0 [ID=20]



What is interesting is the 5 last lines..

There is xor, then trunc, taking only the lowest bit, and then brcond, 
jumping on condition based on the lowest bit.



This routine then goes to SelectionDAG:Combine() which runs 
DAGCombiner::visitBRCOND().



DAGCombiner::visitBRCOND() has code:

   SDValue N1 = N->getOperand(1);
   SDValue N2 = N->getOperand(2);

...

   SDNode *Trunc = 0;
   if (N1.getOpcode() == ISD::TRUNCATE && N1.hasOneUse()) {
     // Look past truncate.
     Trunc = N1.getNode();
     N1 = N1.getOperand(0);
   }

which just drops the truncate away..
then there is another optimization afterwards..

   // Transform br(xor(x, y)) -> br(x != y)
   // Transform br(xor(xor(x,y), 1)) -> br (x == y)
   if (N1.hasOneUse() && N1.getOpcode() == ISD::XOR) {
     SDNode *TheXor = N1.getNode();
     SDValue Op0 = TheXor->getOperand(0);
     SDValue Op1 = TheXor->getOperand(1);
     if (Op0.getOpcode() == Op1.getOpcode()) {
       // Avoid missing important xor optimizations.
       SDValue Tmp = visitXOR(TheXor);
       if (Tmp.getNode() && Tmp.getNode() != TheXor) {
         DEBUG(dbgs() << "\nReplacing.8 ";
               TheXor->dump(&DAG);
               dbgs() << "\nWith: ";
               Tmp.getNode()->dump(&DAG);
               dbgs() << '\n');
         WorkListRemover DeadNodes(*this);
         DAG.ReplaceAllUsesOfValueWith(N1, Tmp, &DeadNodes);
         removeFromWorkList(TheXor);
         DAG.DeleteNode(TheXor);
         return DAG.getNode(ISD::BRCOND, N->getDebugLoc(),
                            MVT::Other, Chain, Tmp, N2);
       }
     }

     if (Op0.getOpcode() != ISD::SETCC && Op1.getOpcode() != ISD::SETCC)
{
       bool Equal = false;
       if (ConstantSDNode *RHSCI = dyn_cast<ConstantSDNode>(Op0))
         if (RHSCI->getAPIntValue() == 1 && Op0.hasOneUse()
&&
             Op0.getOpcode() == ISD::XOR) {
           TheXor = Op0.getNode();
           Equal = true;
         }

       SDValue NodeToReplace = Trunc ? SDValue(Trunc, 0) : N1;

       EVT SetCCVT = NodeToReplace.getValueType();
       if (LegalTypes)
         SetCCVT = TLI.getSetCCResultType(SetCCVT);
       SDValue SetCC = DAG.getSetCC(TheXor->getDebugLoc(),
                                    SetCCVT,
                                    Op0, Op1,
                                    Equal ? ISD::SETEQ : ISD::SETNE);
       // Replace the uses of XOR with SETCC
       WorkListRemover DeadNodes(*this);
       DAG.ReplaceAllUsesOfValueWith(NodeToReplace, SetCC, &DeadNodes);
       removeFromWorkList(NodeToReplace.getNode());
       DAG.DeleteNode(NodeToReplace.getNode());
       return DAG.getNode(ISD::BRCOND, N->getDebugLoc(),
                          MVT::Other, Chain, SetCC, N2);
     }
   }
}


Which then optimizes the xor into setcc which looks ALL bits, not just 
the lowest bit.




Optimized legalized selection DAG:
SelectionDAG has 21 nodes:
   0x2536968: ch = EntryToken [ORD=1] [ID=0]

   0x2384a70: i32 = undef [ORD=1] [ID=2]

       0x2536968: <multiple use>
       0x2384b70: i32 = FrameIndex<-1> [ORD=1] [ID=1]

       0x2384a70: <multiple use>
     0x2384570: i32,ch = load 0x2536968, 0x2384b70, 
0x2384a70<LD4[FixedStack-1]> [ORD=1] [ID=10]

     0x2384470: ch = ValueType:i8 [ORD=1] [ID=4]

   0x2384970: i32 = AssertZext 0x2384570, 0x2384470 [ORD=1] [ID=12]

       0x2536968: <multiple use>
       0x2384670: i32 = FrameIndex<-2> [ORD=2] [ID=3]

       0x2384a70: <multiple use>
     0x2384c70: i32,ch = load 0x2536968, 0x2384670, 
0x2384a70<LD4[FixedStack-2]> [ORD=2] [ID=11]

     0x2384370: ch = ValueType:i16 [ORD=2] [ID=5]

   0x2384270: i32 = AssertZext 0x2384c70, 0x2384370 [ORD=2] [ID=13]

         0x2536968: <multiple use>
         0x2384d70: i32 = Register %reg16384 [ID=6]

         0x2384270: <multiple use>
       0x2384e70: ch = CopyToReg 0x2536968, 0x2384d70, 0x2384270 [ID=17]

         0x2536968: <multiple use>
         0x2385070: i32 = Register %reg16385 [ID=7]

         0x2384970: <multiple use>
       0x24b33f0: ch = CopyToReg 0x2536968, 0x2385070, 0x2384970 [ID=14]

         0x2536968: <multiple use>
         0x24b35f0: i32 = Register %reg16386 [ID=8]

         0x2384270: <multiple use>
       0x24b36f0: ch = CopyToReg 0x2536968, 0x24b35f0, 0x2384270 [ID=16]

     0x24b40f0: ch = TokenFactor 0x2384e70, 0x24b33f0, 0x24b36f0 [ID=19]

       0x2384270: <multiple use>
       0x2384970: <multiple use>
       0x24b38f0: ch = setne

     0x24b39f0: i1 = setcc 0x2384270, 0x2384970, 0x24b38f0

     0x24b3ff0: ch = BasicBlock< 0x25c5ff8> [ID=9]

   0x24b41f0: ch = brcond 0x24b40f0, 0x24b39f0, 0x24b3ff0 [ID=20]




Here we have lost the information that we are only comparing the lowest 
bits, as the trunc is gone, and the setcc is done with 32-bit 
comparison, comparing all bits.



The way our backend is dynamically generated and loaded from a dynamic 
library into custom executable which also links to llvm makes it a bit 
hard to send a "easy to test" backend package for this situation.

Bill Wendling

2010-Oct-01 10:35 UTC

head link

[LLVMdev] Illegal optimization in LLVM 2.8 during SelectionDAG? (Re: comparison pattern trouble - might be a bug in LLVM 2.8?)

On Sep 30, 2010, at 2:13 AM, Heikki Kultala wrote:
> Bill Wendling wrote:
>> On Sep 29, 2010, at 12:36 AM, Heikki Kultala wrote:
>> 
>>> On 29 Sep 2010, at 06:25, Heikki Kultala wrote:
>>> 
>>>> Our architecture has 1-bit boolean predicate registers.
>>>> 
>>>> I've defined comparison
>>>> 
>>>> def NErrb : InstTCE<(outs I1Regs:$op3), (ins
I32Regs:$op1,I32Regs:$op2), "", [(set I1Regs:$op3, (setne
I32Regs:$op1, I32Regs:$op2))]>;
>>>> 
>>>> But then I end up having the following bug:
>>>> 
>>>> Code
>>>> 
>>>> %0 = zext i8 %data to i32
>>>> %1 = zext i16 %crc to i32
>>>> %2 = xor i32 %1, %0
>>>> %3 = and i32 %2, 1
>>>> %4 = icmp eq i32 %3, 0
>>>> 
>>>> which compares the lowest bits of the 2 variables
>>>> ends up being compiled as
>>>> 
>>>>      %reg16384<def> = LDWi <fi#-2>, 0;
mem:LD4[FixedStack-2] I32Regs:%reg16384
>>>>      %reg16385<def> = LDWi <fi#-1>, 0;
mem:LD4[FixedStack-1] I32Regs:%reg16385
>>>>      %reg16386<def> = COPY %reg16384;
I32Regs:%reg16386,16384
>>>>      %reg16390<def> = NErrb %reg16384, %reg16385;
I1Regs:%reg16390 I32Regs:%reg16384,16385
>>>> 
>>>> which just compares ALL BITS of the variables.
>>> I also have a pattern:
>>> 
>>> def XORrrb : InstTCE<(outs I1Regs:$op3), (ins
I32Regs:$op1,I32Regs:$op2), "", [(set I1Regs:$op3, (trunc (xor
I32Regs:$op1, I32Regs:$op2)))]>;
>>> 
>>> Which can do the whole 3-operation code sequence correctly with one
operation.
>>> 
>>> With LLVM 2.7 this correct operation is selected, with LLVM 2.8 the
wrong operation(which compares all bits) is chosen
>>> 
>>> So this looks like a bug in LLVM 2.8 isel?
>>> 
>> Hi Heikki,
>> 
>> We need a better example of what's going on. What's the
original code? Also, I don't have access to your back-end's code so
it's hard to tell just from these snippets what's going on. For
instance, it's not clear whether it's the instruction selector
that's at fault or if your .td files have a bug in them somewhere.
> 
> The original code is:
[snip]
> where the interesting lines are lines 12-13:
> 
>                 x16 = (e_u8)(((data) ^ ((e_u8)crc))&1);
>                 if (x16 == 1)
> 
> The code which goes into isel is:
> 
> bb.nph:
>   %0 = zext i8 %data to i32
>   %1 = zext i16 %crc to i32
>   %2 = xor i32 %1, %0
>   %3 = and i32 %2, 1
>   %4 = icmp eq i32 %3, 0
>   br i1 %4, label %bb.nph._crit_edge, label %5
> 
> inside selectiondag this becomes:
> 
> Legalized selection DAG:
[snip]
>         0x248d280: <multiple use>
>         0x248d980: <multiple use>
>       0x25bb7f0: i32 = xor 0x248d280, 0x248d980 [ORD=3] [ID=15]
> 
>     0x25bbbf0: i1 = truncate 0x25bb7f0 [ID=18]
This truncate is weird to me. If anything, it should be an "and"
instruction. I have a feeling that your back-end is telling instruction
selection and the type legalizer that it's okay to replace the normal
"and" with this truncate call, which leads to your troubles later on.

I would suggest running this same code through another back-end to see if
there's anything different going on. For instance, both ARM and X86 keep the
"and" around...

-bw
>     0x25bbff0: ch = BasicBlock< 0x2685718> [ID=9]
> 
>   0x25bc1f0: ch = brcond 0x25bc0f0, 0x25bbbf0, 0x25bbff0 [ID=20]
> 
> 
> 
> What is interesting is the 5 last lines..
> 
> There is xor, then trunc, taking only the lowest bit, and then brcond, 
> jumping on condition based on the lowest bit.
> 
> 
> 
> This routine then goes to SelectionDAG:Combine() which runs 
> DAGCombiner::visitBRCOND().
> 
> 
> 
> DAGCombiner::visitBRCOND() has code:
> 
>   SDValue N1 = N->getOperand(1);
>   SDValue N2 = N->getOperand(2);
> 
> ...
> 
>   SDNode *Trunc = 0;
>   if (N1.getOpcode() == ISD::TRUNCATE && N1.hasOneUse()) {
>     // Look past truncate.
>     Trunc = N1.getNode();
>     N1 = N1.getOperand(0);
>   }
> 
> which just drops the truncate away..
> then there is another optimization afterwards..
> 
>   // Transform br(xor(x, y)) -> br(x != y)
>   // Transform br(xor(xor(x,y), 1)) -> br (x == y)
>   if (N1.hasOneUse() && N1.getOpcode() == ISD::XOR) {
>     SDNode *TheXor = N1.getNode();
>     SDValue Op0 = TheXor->getOperand(0);
>     SDValue Op1 = TheXor->getOperand(1);
>     if (Op0.getOpcode() == Op1.getOpcode()) {
>       // Avoid missing important xor optimizations.
>       SDValue Tmp = visitXOR(TheXor);
>       if (Tmp.getNode() && Tmp.getNode() != TheXor) {
>         DEBUG(dbgs() << "\nReplacing.8 ";
>               TheXor->dump(&DAG);
>               dbgs() << "\nWith: ";
>               Tmp.getNode()->dump(&DAG);
>               dbgs() << '\n');
>         WorkListRemover DeadNodes(*this);
>         DAG.ReplaceAllUsesOfValueWith(N1, Tmp, &DeadNodes);
>         removeFromWorkList(TheXor);
>         DAG.DeleteNode(TheXor);
>         return DAG.getNode(ISD::BRCOND, N->getDebugLoc(),
>                            MVT::Other, Chain, Tmp, N2);
>       }
>     }
> 
>     if (Op0.getOpcode() != ISD::SETCC && Op1.getOpcode() !=
ISD::SETCC) {
>       bool Equal = false;
>       if (ConstantSDNode *RHSCI = dyn_cast<ConstantSDNode>(Op0))
>         if (RHSCI->getAPIntValue() == 1 && Op0.hasOneUse()
&&
>             Op0.getOpcode() == ISD::XOR) {
>           TheXor = Op0.getNode();
>           Equal = true;
>         }
> 
>       SDValue NodeToReplace = Trunc ? SDValue(Trunc, 0) : N1;
> 
>       EVT SetCCVT = NodeToReplace.getValueType();
>       if (LegalTypes)
>         SetCCVT = TLI.getSetCCResultType(SetCCVT);
>       SDValue SetCC = DAG.getSetCC(TheXor->getDebugLoc(),
>                                    SetCCVT,
>                                    Op0, Op1,
>                                    Equal ? ISD::SETEQ : ISD::SETNE);
>       // Replace the uses of XOR with SETCC
>       WorkListRemover DeadNodes(*this);
>       DAG.ReplaceAllUsesOfValueWith(NodeToReplace, SetCC, &DeadNodes);
>       removeFromWorkList(NodeToReplace.getNode());
>       DAG.DeleteNode(NodeToReplace.getNode());
>       return DAG.getNode(ISD::BRCOND, N->getDebugLoc(),
>                          MVT::Other, Chain, SetCC, N2);
>     }
>   }
> }
> 
> 
> Which then optimizes the xor into setcc which looks ALL bits, not just 
> the lowest bit.
> 
> 
> 
> 
> Optimized legalized selection DAG:
> SelectionDAG has 21 nodes:
>   0x2536968: ch = EntryToken [ORD=1] [ID=0]
> 
>   0x2384a70: i32 = undef [ORD=1] [ID=2]
> 
>       0x2536968: <multiple use>
>       0x2384b70: i32 = FrameIndex<-1> [ORD=1] [ID=1]
> 
>       0x2384a70: <multiple use>
>     0x2384570: i32,ch = load 0x2536968, 0x2384b70, 
> 0x2384a70<LD4[FixedStack-1]> [ORD=1] [ID=10]
> 
>     0x2384470: ch = ValueType:i8 [ORD=1] [ID=4]
> 
>   0x2384970: i32 = AssertZext 0x2384570, 0x2384470 [ORD=1] [ID=12]
> 
>       0x2536968: <multiple use>
>       0x2384670: i32 = FrameIndex<-2> [ORD=2] [ID=3]
> 
>       0x2384a70: <multiple use>
>     0x2384c70: i32,ch = load 0x2536968, 0x2384670, 
> 0x2384a70<LD4[FixedStack-2]> [ORD=2] [ID=11]
> 
>     0x2384370: ch = ValueType:i16 [ORD=2] [ID=5]
> 
>   0x2384270: i32 = AssertZext 0x2384c70, 0x2384370 [ORD=2] [ID=13]
> 
>         0x2536968: <multiple use>
>         0x2384d70: i32 = Register %reg16384 [ID=6]
> 
>         0x2384270: <multiple use>
>       0x2384e70: ch = CopyToReg 0x2536968, 0x2384d70, 0x2384270 [ID=17]
> 
>         0x2536968: <multiple use>
>         0x2385070: i32 = Register %reg16385 [ID=7]
> 
>         0x2384970: <multiple use>
>       0x24b33f0: ch = CopyToReg 0x2536968, 0x2385070, 0x2384970 [ID=14]
> 
>         0x2536968: <multiple use>
>         0x24b35f0: i32 = Register %reg16386 [ID=8]
> 
>         0x2384270: <multiple use>
>       0x24b36f0: ch = CopyToReg 0x2536968, 0x24b35f0, 0x2384270 [ID=16]
> 
>     0x24b40f0: ch = TokenFactor 0x2384e70, 0x24b33f0, 0x24b36f0 [ID=19]
> 
>       0x2384270: <multiple use>
>       0x2384970: <multiple use>
>       0x24b38f0: ch = setne
> 
>     0x24b39f0: i1 = setcc 0x2384270, 0x2384970, 0x24b38f0
> 
>     0x24b3ff0: ch = BasicBlock< 0x25c5ff8> [ID=9]
> 
>   0x24b41f0: ch = brcond 0x24b40f0, 0x24b39f0, 0x24b3ff0 [ID=20]
> 
> 
> 
> 
> Here we have lost the information that we are only comparing the lowest 
> bits, as the trunc is gone, and the setcc is done with 32-bit 
> comparison, comparing all bits.
> 
> 
> 
> The way our backend is dynamically generated and loaded from a dynamic 
> library into custom executable which also links to llvm makes it a bit 
> hard to send a "easy to test" backend package for this situation.
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Heikki Kultala

2010-Oct-01 10:50 UTC

head link

[LLVMdev] Illegal optimization in LLVM 2.8 during SelectionDAG? (Re: comparison pattern trouble - might be a bug in LLVM 2.8?)

On 1 Oct 2010, at 13:35, Bill Wendling wrote:
> On Sep 30, 2010, at 2:13 AM, Heikki Kultala wrote:
> 
>> Bill Wendling wrote:
>>> On Sep 29, 2010, at 12:36 AM, Heikki Kultala wrote:
>>> 
>>>> On 29 Sep 2010, at 06:25, Heikki Kultala wrote:
>>>> 
>>>>> Our architecture has 1-bit boolean predicate registers.
>>>>> 
>>>>> I've defined comparison
>>>>> 
>>>>> def NErrb : InstTCE<(outs I1Regs:$op3), (ins
I32Regs:$op1,I32Regs:$op2), "", [(set I1Regs:$op3, (setne
I32Regs:$op1, I32Regs:$op2))]>;
>>>>> 
>>>>> But then I end up having the following bug:
>>>>> 
>>>>> Code
>>>>> 
>>>>> %0 = zext i8 %data to i32
>>>>> %1 = zext i16 %crc to i32
>>>>> %2 = xor i32 %1, %0
>>>>> %3 = and i32 %2, 1
>>>>> %4 = icmp eq i32 %3, 0
>>>>> 
>>>>> which compares the lowest bits of the 2 variables
>>>>> ends up being compiled as
>>>>> 
>>>>>    %reg16384<def> = LDWi <fi#-2>, 0;
mem:LD4[FixedStack-2] I32Regs:%reg16384
>>>>>    %reg16385<def> = LDWi <fi#-1>, 0;
mem:LD4[FixedStack-1] I32Regs:%reg16385
>>>>>    %reg16386<def> = COPY %reg16384;
I32Regs:%reg16386,16384
>>>>>    %reg16390<def> = NErrb %reg16384, %reg16385;
I1Regs:%reg16390 I32Regs:%reg16384,16385
>>>>> 
>>>>> which just compares ALL BITS of the variables.
>>>> I also have a pattern:
>>>> 
>>>> def XORrrb : InstTCE<(outs I1Regs:$op3), (ins
I32Regs:$op1,I32Regs:$op2), "", [(set I1Regs:$op3, (trunc (xor
I32Regs:$op1, I32Regs:$op2)))]>;
>>>> 
>>>> Which can do the whole 3-operation code sequence correctly with
one operation.
>>>> 
>>>> With LLVM 2.7 this correct operation is selected, with LLVM 2.8
the wrong operation(which compares all bits) is chosen
>>>> 
>>>> So this looks like a bug in LLVM 2.8 isel?
>>>> 
>>> Hi Heikki,
>>> 
>>> We need a better example of what's going on. What's the
original code? Also, I don't have access to your back-end's code so
it's hard to tell just from these snippets what's going on. For
instance, it's not clear whether it's the instruction selector
that's at fault or if your .td files have a bug in them somewhere.
>> 
>> The original code is:
> 
> [snip]
> 
>> where the interesting lines are lines 12-13:
>> 
>>               x16 = (e_u8)(((data) ^ ((e_u8)crc))&1);
>>               if (x16 == 1)
>> 
>> The code which goes into isel is:
>> 
>> bb.nph:
>> %0 = zext i8 %data to i32
>> %1 = zext i16 %crc to i32
>> %2 = xor i32 %1, %0
>> %3 = and i32 %2, 1
>> %4 = icmp eq i32 %3, 0
>> br i1 %4, label %bb.nph._crit_edge, label %5
>> 
>> inside selectiondag this becomes:
>> 
>> Legalized selection DAG:
> 
> [snip]
> 
>>       0x248d280: <multiple use>
>>       0x248d980: <multiple use>
>>     0x25bb7f0: i32 = xor 0x248d280, 0x248d980 [ORD=3] [ID=15]
>> 
>>   0x25bbbf0: i1 = truncate 0x25bb7f0 [ID=18]
> 
> This truncate is weird to me. If anything, it should be an "and"
instruction. I have a feeling that your back-end is telling instruction
selection and the type legalizer that it's okay to replace the normal
"and" with this truncate call, which leads to your troubles later on.
> 
> I would suggest running this same code through another back-end to see if
there's anything different going on. For instance, both ARM and X86 keep the
"and" around...
> 
> -bw
neither x86 or ARM have 1-bit registers, so they cannot convert the and to
trunc.

The trunc should be legal when the machine has 1-bit registers.

Duncan Sands

2010-Oct-02 12:25 UTC

head link

[LLVMdev] Illegal optimization in LLVM 2.8 during SelectionDAG? (Re: comparison pattern trouble - might be a bug in LLVM 2.8?)

Hi,
>> DAGCombiner::visitBRCOND() has code:
>>
>>    SDValue N1 = N->getOperand(1);
>>    SDValue N2 = N->getOperand(2);
>>
>> ...
>>
>>    SDNode *Trunc = 0;
>>    if (N1.getOpcode() == ISD::TRUNCATE&&  N1.hasOneUse()) {
>>      // Look past truncate.
>>      Trunc = N1.getNode();
>>      N1 = N1.getOperand(0);
>>    }
>>
>> which just drops the truncate away.
this looks wrong.  According to the documentation of BRCOND,

     // BRCOND - Conditional branch.  The first operand is the chain, the
     // second is the condition, the third is the block to branch to if the
     // condition is true.  If the type of the condition is not i1, then the

which means that on a machine with unsigned boolean values, bit 0 should be
zero or 1 and the rest 0 (on a machine with signed boolean values, all bits
should be 0, or all bits should be one).  So just dropping the truncate seems
bogus, since it results in a value that may not satisfy these conditions.

Ciao,

Duncan.
>> then there is another optimization afterwards..
>>
>>    // Transform br(xor(x, y)) ->  br(x != y)
>>    // Transform br(xor(xor(x,y), 1)) ->  br (x == y)
>>    if (N1.hasOneUse()&&  N1.getOpcode() == ISD::XOR) {
>>      SDNode *TheXor = N1.getNode();
>>      SDValue Op0 = TheXor->getOperand(0);
>>      SDValue Op1 = TheXor->getOperand(1);
>>      if (Op0.getOpcode() == Op1.getOpcode()) {
>>        // Avoid missing important xor optimizations.
>>        SDValue Tmp = visitXOR(TheXor);
>>        if (Tmp.getNode()&&  Tmp.getNode() != TheXor) {
>>          DEBUG(dbgs()<<  "\nReplacing.8 ";
>>                TheXor->dump(&DAG);
>>                dbgs()<<  "\nWith: ";
>>                Tmp.getNode()->dump(&DAG);
>>                dbgs()<<  '\n');
>>          WorkListRemover DeadNodes(*this);
>>          DAG.ReplaceAllUsesOfValueWith(N1, Tmp,&DeadNodes);
>>          removeFromWorkList(TheXor);
>>          DAG.DeleteNode(TheXor);
>>          return DAG.getNode(ISD::BRCOND, N->getDebugLoc(),
>>                             MVT::Other, Chain, Tmp, N2);
>>        }
>>      }
>>
>>      if (Op0.getOpcode() != ISD::SETCC&&  Op1.getOpcode() !=
ISD::SETCC) {
>>        bool Equal = false;
>>        if (ConstantSDNode *RHSCI = dyn_cast<ConstantSDNode>(Op0))
>>          if (RHSCI->getAPIntValue() == 1&& 
Op0.hasOneUse()&&
>>              Op0.getOpcode() == ISD::XOR) {
>>            TheXor = Op0.getNode();
>>            Equal = true;
>>          }
>>
>>        SDValue NodeToReplace = Trunc ? SDValue(Trunc, 0) : N1;
>>
>>        EVT SetCCVT = NodeToReplace.getValueType();
>>        if (LegalTypes)
>>          SetCCVT = TLI.getSetCCResultType(SetCCVT);
>>        SDValue SetCC = DAG.getSetCC(TheXor->getDebugLoc(),
>>                                     SetCCVT,
>>                                     Op0, Op1,
>>                                     Equal ? ISD::SETEQ : ISD::SETNE);
>>        // Replace the uses of XOR with SETCC
>>        WorkListRemover DeadNodes(*this);
>>        DAG.ReplaceAllUsesOfValueWith(NodeToReplace,
SetCC,&DeadNodes);
>>        removeFromWorkList(NodeToReplace.getNode());
>>        DAG.DeleteNode(NodeToReplace.getNode());
>>        return DAG.getNode(ISD::BRCOND, N->getDebugLoc(),
>>                           MVT::Other, Chain, SetCC, N2);
>>      }
>>    }
>> }
>>
>>
>> Which then optimizes the xor into setcc which looks ALL bits, not just
>> the lowest bit.
>>
>>
>>
>>
>> Optimized legalized selection DAG:
>> SelectionDAG has 21 nodes:
>>    0x2536968: ch = EntryToken [ORD=1] [ID=0]
>>
>>    0x2384a70: i32 = undef [ORD=1] [ID=2]
>>
>>        0x2536968:<multiple use>
>>        0x2384b70: i32 = FrameIndex<-1>  [ORD=1] [ID=1]
>>
>>        0x2384a70:<multiple use>
>>      0x2384570: i32,ch = load 0x2536968, 0x2384b70,
>> 0x2384a70<LD4[FixedStack-1]>  [ORD=1] [ID=10]
>>
>>      0x2384470: ch = ValueType:i8 [ORD=1] [ID=4]
>>
>>    0x2384970: i32 = AssertZext 0x2384570, 0x2384470 [ORD=1] [ID=12]
>>
>>        0x2536968:<multiple use>
>>        0x2384670: i32 = FrameIndex<-2>  [ORD=2] [ID=3]
>>
>>        0x2384a70:<multiple use>
>>      0x2384c70: i32,ch = load 0x2536968, 0x2384670,
>> 0x2384a70<LD4[FixedStack-2]>  [ORD=2] [ID=11]
>>
>>      0x2384370: ch = ValueType:i16 [ORD=2] [ID=5]
>>
>>    0x2384270: i32 = AssertZext 0x2384c70, 0x2384370 [ORD=2] [ID=13]
>>
>>          0x2536968:<multiple use>
>>          0x2384d70: i32 = Register %reg16384 [ID=6]
>>
>>          0x2384270:<multiple use>
>>        0x2384e70: ch = CopyToReg 0x2536968, 0x2384d70, 0x2384270
[ID=17]
>>
>>          0x2536968:<multiple use>
>>          0x2385070: i32 = Register %reg16385 [ID=7]
>>
>>          0x2384970:<multiple use>
>>        0x24b33f0: ch = CopyToReg 0x2536968, 0x2385070, 0x2384970
[ID=14]
>>
>>          0x2536968:<multiple use>
>>          0x24b35f0: i32 = Register %reg16386 [ID=8]
>>
>>          0x2384270:<multiple use>
>>        0x24b36f0: ch = CopyToReg 0x2536968, 0x24b35f0, 0x2384270
[ID=16]
>>
>>      0x24b40f0: ch = TokenFactor 0x2384e70, 0x24b33f0, 0x24b36f0
[ID=19]
>>
>>        0x2384270:<multiple use>
>>        0x2384970:<multiple use>
>>        0x24b38f0: ch = setne
>>
>>      0x24b39f0: i1 = setcc 0x2384270, 0x2384970, 0x24b38f0
>>
>>      0x24b3ff0: ch = BasicBlock<  0x25c5ff8>  [ID=9]
>>
>>    0x24b41f0: ch = brcond 0x24b40f0, 0x24b39f0, 0x24b3ff0 [ID=20]
>>
>>
>>
>>
>> Here we have lost the information that we are only comparing the lowest
>> bits, as the trunc is gone, and the setcc is done with 32-bit
>> comparison, comparing all bits.
>>
>>
>>
>> The way our backend is dynamically generated and loaded from a dynamic
>> library into custom executable which also links to llvm makes it a bit
>> hard to send a "easy to test" backend package for this
situation.
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Heikki Kultala

2010-Oct-04 12:00 UTC

head link

[LLVMdev] Illegal optimization in LLVM 2.8 during SelectionDAG

Bill Wendling wrote:> On Sep 30, 2010, at 2:13 AM, Heikki Kultala wrote:
> 
>> Bill Wendling wrote:
>>> On Sep 29, 2010, at 12:36 AM, Heikki Kultala wrote:
>>>
>>>> On 29 Sep 2010, at 06:25, Heikki Kultala wrote:
>>>>
>>>>> Our architecture has 1-bit boolean predicate registers.
>>>>>
>>>>> I've defined comparison
>>>>>
>>>>> def NErrb : InstTCE<(outs I1Regs:$op3), (ins
I32Regs:$op1,I32Regs:$op2), "", [(set I1Regs:$op3, (setne
I32Regs:$op1, I32Regs:$op2))]>;
>>>>>
>>>>> But then I end up having the following bug:
>>>>>
>>>>> Code
>>>>>
>>>>> %0 = zext i8 %data to i32
>>>>> %1 = zext i16 %crc to i32
>>>>> %2 = xor i32 %1, %0
>>>>> %3 = and i32 %2, 1
>>>>> %4 = icmp eq i32 %3, 0
>>>>>
>>>>> which compares the lowest bits of the 2 variables
>>>>> ends up being compiled as
>>>>>
>>>>>      %reg16384<def> = LDWi <fi#-2>, 0;
mem:LD4[FixedStack-2] I32Regs:%reg16384
>>>>>      %reg16385<def> = LDWi <fi#-1>, 0;
mem:LD4[FixedStack-1] I32Regs:%reg16385
>>>>>      %reg16386<def> = COPY %reg16384;
I32Regs:%reg16386,16384
>>>>>      %reg16390<def> = NErrb %reg16384, %reg16385;
I1Regs:%reg16390 I32Regs:%reg16384,16385
>>>>>
>>>>> which just compares ALL BITS of the variables.
>>>> I also have a pattern:
>>>>
>>>> def XORrrb : InstTCE<(outs I1Regs:$op3), (ins
I32Regs:$op1,I32Regs:$op2), "", [(set I1Regs:$op3, (trunc (xor
I32Regs:$op1, I32Regs:$op2)))]>;
>>>>
>>>> Which can do the whole 3-operation code sequence correctly with
one operation.
>>>>
>>>> With LLVM 2.7 this correct operation is selected, with LLVM 2.8
the wrong operation(which compares all bits) is chosen
>>>>
>>>> So this looks like a bug in LLVM 2.8 isel?
>>>>
>>> Hi Heikki,
>>>
>>> We need a better example of what's going on. What's the
original code? Also, I don't have access to your back-end's code so
it's hard to tell just from these snippets what's going on. For
instance, it's not clear whether it's the instruction selector
that's at fault or if your .td files have a bug in them somewhere.
>> The original code is:
> 
> [snip]
> 
>> where the interesting lines are lines 12-13:
>>
>>                 x16 = (e_u8)(((data) ^ ((e_u8)crc))&1);
>>                 if (x16 == 1)
>>
>> The code which goes into isel is:
>>
>> bb.nph:
>>   %0 = zext i8 %data to i32
>>   %1 = zext i16 %crc to i32
>>   %2 = xor i32 %1, %0
>>   %3 = and i32 %2, 1
>>   %4 = icmp eq i32 %3, 0
>>   br i1 %4, label %bb.nph._crit_edge, label %5
>>
>> inside selectiondag this becomes:
>>
>> Legalized selection DAG:
> 
> [snip]
> 
>>         0x248d280: <multiple use>
>>         0x248d980: <multiple use>
>>       0x25bb7f0: i32 = xor 0x248d280, 0x248d980 [ORD=3] [ID=15]
>>
>>     0x25bbbf0: i1 = truncate 0x25bb7f0 [ID=18]
> 
> This truncate is weird to me. If anything, it should be an "and"
instruction. I have a feeling that your back-end is telling instruction
selection and the type legalizer that it's okay to replace the normal
"and" with this truncate call, which leads to your troubles later on.
It would seem that the truncate is created by:

TargetLowering::SimplifySetCC

...


      if (N0.getOpcode() == ISD::SETCC &&
           isTypeLegal(VT) && VT.bitsLE(N0.getValueType())) {
         bool TrueWhenTrue = (Cond == ISD::SETEQ) ^ 
(N1C->getAPIntValue() != 1);
         if (TrueWhenTrue)
           return DAG.getNode(ISD::TRUNCATE, dl, VT, N0);

         // Invert the condition.
         ISD::CondCode CC =
cast<CondCodeSDNode>(N0.getOperand(2))->get();
         CC = ISD::getSetCCInverse(CC,
 
N0.getOperand(0).getValueType().isInteger());
         return DAG.getSetCC(dl, VT, N0.getOperand(0), N0.getOperand(1), 
CC);
       }


and the AND is then dropped by

TargetLowering::SimplifyDemandedBits

...


   switch (Op.getOpcode()) {
...
   case ISD::AND:
     // If the RHS is a constant, check to see if the LHS would be zero 
without
     // using the bits from the RHS.  Below, we use knowledge about the 
RHS to
     // simplify the LHS, here we're using information from the LHS to 
simplify
     // the RHS.
     if (ConstantSDNode *RHSC = 
dyn_cast<ConstantSDNode>(Op.getOperand(1))) {
       APInt LHSZero, LHSOne;
       TLO.DAG.ComputeMaskedBits(Op.getOperand(0), NewMask,
                                 LHSZero, LHSOne, Depth+1);
       // If the LHS already has zeros where RHSC does, this and is dead.
       if ((LHSZero & NewMask) == (~RHSC->getAPIntValue() & NewMask))
	  return TLO.CombineTo(Op, Op.getOperand(0));





As neither of these are virtual functions, we cannot create an 
workaround hack for our backend to easily circumvent this bug.




It would now seem that TCE users cannot use the default LLVM 2.8 but 
we'll have to distribute our own patch to disable the invalid dropping 
of the trunc and make all our users compile LLVM themselves with the 
patch :(

Apparently Analagous Threads

Search for more seemingly similar threads

llvm dev - Oct 2010 - [LLVMdev] Illegal optimization in LLVM 2.8 during SelectionDAG? (Re: comparison pattern trouble - might be a bug in LLVM 2.8?)

[LLVMdev] Illegal optimization in LLVM 2.8 during SelectionDAG? (Re: comparison pattern trouble - might be a bug in LLVM 2.8?)

[LLVMdev] Illegal optimization in LLVM 2.8 during SelectionDAG? (Re: comparison pattern trouble - might be a bug in LLVM 2.8?)

[LLVMdev] Illegal optimization in LLVM 2.8 during SelectionDAG? (Re: comparison pattern trouble - might be a bug in LLVM 2.8?)

[LLVMdev] Illegal optimization in LLVM 2.8 during SelectionDAG? (Re: comparison pattern trouble - might be a bug in LLVM 2.8?)

[LLVMdev] Illegal optimization in LLVM 2.8 during SelectionDAG

Apparently Analagous Threads