thr3ads.net - llvm dev - [LLVMdev] Illegal optimization in LLVM 2.8 during SelectionDAG [Oct 2010]

Heikki Kultala

2010-Oct-04 12:00 UTC

[LLVMdev] Illegal optimization in LLVM 2.8 during SelectionDAG

Bill Wendling wrote:
> On Sep 30, 2010, at 2:13 AM, Heikki Kultala wrote:
> 
>> Bill Wendling wrote:
>>> On Sep 29, 2010, at 12:36 AM, Heikki Kultala wrote:
>>>
>>>> On 29 Sep 2010, at 06:25, Heikki Kultala wrote:
>>>>
>>>>> Our architecture has 1-bit boolean predicate registers.
>>>>>
>>>>> I've defined comparison
>>>>>
>>>>> def NErrb : InstTCE<(outs I1Regs:$op3),
>>>>>     (ins I32Regs:$op1, I32Regs:$op2), "",
>>>>>     [(set I1Regs:$op3, (setne I32Regs:$op1, I32Regs:$op2))]>;
>>>>>
>>>>> But then I end up having the following bug:
>>>>>
>>>>> Code
>>>>>
>>>>> %0 = zext i8 %data to i32
>>>>> %1 = zext i16 %crc to i32
>>>>> %2 = xor i32 %1, %0
>>>>> %3 = and i32 %2, 1
>>>>> %4 = icmp eq i32 %3, 0
>>>>>
>>>>> which compares the lowest bits of the 2 variables
>>>>> ends up being compiled as
>>>>>
>>>>>      %reg16384<def> = LDWi <fi#-2>, 0; mem:LD4[FixedStack-2] I32Regs:%reg16384
>>>>>      %reg16385<def> = LDWi <fi#-1>, 0; mem:LD4[FixedStack-1] I32Regs:%reg16385
>>>>>      %reg16386<def> = COPY %reg16384; I32Regs:%reg16386,16384
>>>>>      %reg16390<def> = NErrb %reg16384, %reg16385; I1Regs:%reg16390 I32Regs:%reg16384,16385
>>>>>
>>>>> which just compares ALL BITS of the variables.
>>>> I also have a pattern:
>>>>
>>>> def XORrrb : InstTCE<(outs I1Regs:$op3),
>>>>     (ins I32Regs:$op1, I32Regs:$op2), "",
>>>>     [(set I1Regs:$op3, (trunc (xor I32Regs:$op1, I32Regs:$op2)))]>;
>>>>
>>>> Which can do the whole 3-operation code sequence correctly with
>>>> one operation.
>>>>
>>>> With LLVM 2.7 this correct operation is selected; with LLVM 2.8
>>>> the wrong operation (which compares all bits) is chosen.
>>>>
>>>> So this looks like a bug in LLVM 2.8 isel?
>>>>
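The gap between the two patterns quoted above can be sketched in plain C++. This is only an illustration under stated assumptions: the function names are hypothetical and are not part of the TCE backend, and 32-bit unsigned integers stand in for I32Regs values. It shows that a full setne (NErrb) and a truncate-to-i1 of the xor (XORrrb) give different answers as soon as the operands differ only above bit 0.

```cpp
#include <cstdint>

// Hypothetical models of the two predicate-producing patterns; the names
// are illustrative and do not come from the TCE backend.

// What the source code asks for: bit 0 of (a ^ b), i.e. the xor truncated
// to one bit (the XORrrb pattern).
constexpr bool lowBitsDiffer(std::uint32_t a, std::uint32_t b) {
  return ((a ^ b) & 1u) != 0;
}

// What setne computes (the NErrb pattern): true if any of the 32 bits differ.
constexpr bool anyBitsDiffer(std::uint32_t a, std::uint32_t b) {
  return a != b;
}

// 2 and 4 have the same low bit but are not equal, so the two predicates
// disagree; substituting one for the other changes which branch is taken.
static_assert(!lowBitsDiffer(2, 4), "low bits of 2 and 4 are both 0");
static_assert(anyBitsDiffer(2, 4), "2 and 4 differ in higher bits");

// They agree, for example, when the operands differ in bit 0 or are equal.
static_assert(lowBitsDiffer(2, 3) && anyBitsDiffer(2, 3), "both true");
static_assert(!lowBitsDiffer(7, 7) && !anyBitsDiffer(7, 7), "both false");
```

The checks are compile-time assertions, so the file only needs to compile (C++11 or later) for the claims to hold.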
>>> Hi Heikki,
>>>
>>> We need a better example of what's going on. What's the original
>>> code? Also, I don't have access to your back-end's code, so it's hard
>>> to tell just from these snippets what's going on. For instance, it's
>>> not clear whether it's the instruction selector that's at fault or if
>>> your .td files have a bug in them somewhere.
>> The original code is:
> 
> [snip]
> 
>> where the interesting lines are lines 12-13:
>>
>>                 x16 = (e_u8)(((data) ^ ((e_u8)crc))&1);
>>                 if (x16 == 1)
>>
>> The code which goes into isel is:
>>
>> bb.nph:
>>   %0 = zext i8 %data to i32
>>   %1 = zext i16 %crc to i32
>>   %2 = xor i32 %1, %0
>>   %3 = and i32 %2, 1
>>   %4 = icmp eq i32 %3, 0
>>   br i1 %4, label %bb.nph._crit_edge, label %5
>>
>> inside selectiondag this becomes:
>>
>> Legalized selection DAG:
> 
> [snip]
> 
>>         0x248d280: <multiple use>
>>         0x248d980: <multiple use>
>>       0x25bb7f0: i32 = xor 0x248d280, 0x248d980 [ORD=3] [ID=15]
>>
>>     0x25bbbf0: i1 = truncate 0x25bb7f0 [ID=18]
> 
> This truncate is weird to me. If anything, it should be an "and"
> instruction. I have a feeling that your back-end is telling instruction
> selection and the type legalizer that it's okay to replace the normal
> "and" with this truncate call, which leads to your troubles later on.

It would seem that the truncate is created by:

TargetLowering::SimplifySetCC

...


      if (N0.getOpcode() == ISD::SETCC &&
          isTypeLegal(VT) && VT.bitsLE(N0.getValueType())) {
        bool TrueWhenTrue = (Cond == ISD::SETEQ) ^ (N1C->getAPIntValue() != 1);
        if (TrueWhenTrue)
          return DAG.getNode(ISD::TRUNCATE, dl, VT, N0);

        // Invert the condition.
        ISD::CondCode CC = cast<CondCodeSDNode>(N0.getOperand(2))->get();
        CC = ISD::getSetCCInverse(CC,
                                  N0.getOperand(0).getValueType().isInteger());
        return DAG.getSetCC(dl, VT, N0.getOperand(0), N0.getOperand(1), CC);
      }
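
To make the TrueWhenTrue expression in the excerpt above easier to follow, here is a small truth-table sketch. It rests on assumptions: the enclosing checks are elided in the excerpt, so the sketch assumes the constant is 0 or 1 and the condition is SETEQ or SETNE, and the enum and function names are hypothetical stand-ins rather than LLVM API. It only checks that comparing a 0/1 value against 0 or 1 either passes the value through (the TRUNCATE case) or inverts it (the inverted-condition case).

```cpp
// Hypothetical stand-ins for ISD::SETEQ and ISD::SETNE.
enum class Cond { EQ, NE };

// Mirrors the quoted expression:
//   TrueWhenTrue = (Cond == ISD::SETEQ) ^ (N1C->getAPIntValue() != 1)
constexpr bool trueWhenTrue(Cond cond, unsigned c) {
  return (cond == Cond::EQ) ^ (c != 1);
}

// The outer comparison applied directly to the inner setcc result (0 or 1).
constexpr bool outer(bool inner, Cond cond, unsigned c) {
  return cond == Cond::EQ ? (unsigned(inner) == c) : (unsigned(inner) != c);
}

// The folded form: the inner result itself, or its inverse.
constexpr bool folded(bool inner, Cond cond, unsigned c) {
  return trueWhenTrue(cond, c) ? inner : !inner;
}

// The fold is sound for a given (cond, c) if both forms agree for both
// possible inner values.
constexpr bool foldMatches(Cond cond, unsigned c) {
  return outer(false, cond, c) == folded(false, cond, c) &&
         outer(true, cond, c) == folded(true, cond, c);
}

static_assert(foldMatches(Cond::EQ, 1), "(x == 1) is x");
static_assert(foldMatches(Cond::NE, 0), "(x != 0) is x");
static_assert(foldMatches(Cond::EQ, 0), "(x == 0) is !x");
static_assert(foldMatches(Cond::NE, 1), "(x != 1) is !x");
```

The sketch depends on the inner value really being 0 or 1; it says nothing about operands that carry other bits.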


and the AND is then dropped by

TargetLowering::SimplifyDemandedBits

...


   switch (Op.getOpcode()) {
   ...
   case ISD::AND:
     // If the RHS is a constant, check to see if the LHS would be zero without
     // using the bits from the RHS.  Below, we use knowledge about the RHS to
     // simplify the LHS, here we're using information from the LHS to simplify
     // the RHS.
     if (ConstantSDNode *RHSC = dyn_cast<ConstantSDNode>(Op.getOperand(1))) {
       APInt LHSZero, LHSOne;
       TLO.DAG.ComputeMaskedBits(Op.getOperand(0), NewMask,
                                 LHSZero, LHSOne, Depth+1);
       // If the LHS already has zeros where RHSC does, this and is dead.
       if ((LHSZero & NewMask) == (~RHSC->getAPIntValue() & NewMask))
         return TLO.CombineTo(Op, Op.getOperand(0));
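
The "this and is dead" condition in the excerpt above can be modelled with plain integers. This is a rough sketch, not LLVM code: 32-bit unsigned values stand in for APInt, the function and constant names are hypothetical, and the known-zero mask assumes the xor of a zero-extended i8 and a zero-extended i16 from the IR earlier in this thread. It shows how the outcome depends on which bits the user demands, and does not claim to reproduce the miscompile itself.

```cpp
#include <cstdint>

// Hypothetical plain-integer model of the quoted check:
//   lhsKnownZero : bits of the AND's LHS known to be 0 (LHSZero)
//   rhsConst     : the constant RHS of the AND (RHSC)
//   demanded     : bits of the result that users actually read (NewMask)
constexpr bool andIsDead(std::uint32_t lhsKnownZero, std::uint32_t rhsConst,
                         std::uint32_t demanded) {
  return (lhsKnownZero & demanded) == (~rhsConst & demanded);
}

// Assumed known-zero bits for xor(zext i8, zext i16): the upper 16 bits.
constexpr std::uint32_t kXorKnownZero = 0xFFFF0000u;

// All 32 bits demanded: 'and x, 1' clears bits 1..15, which are not known
// to be zero, so the AND has to stay.
static_assert(!andIsDead(kXorKnownZero, 1u, 0xFFFFFFFFu), "and is live");

// Only bit 0 demanded (for example, by a truncate to i1): the AND does not
// change anything the user can observe, so the check treats it as dead.
static_assert(andIsDead(kXorKnownZero, 1u, 0x1u), "and is dead");
```

Under this model, removing the AND is only safe as long as every remaining user reads nothing but bit 0.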





As neither of these is a virtual function, we cannot easily work
around this bug from within our backend.




It now seems that TCE users cannot use stock LLVM 2.8. We will have to
distribute our own patch to disable the invalid dropping of the trunc,
and all our users will have to compile LLVM themselves with the
patch :(

Evan Cheng

2010-Oct-04 22:48 UTC

[LLVMdev] Illegal optimization in LLVM 2.8 during SelectionDAG

Please test if r115571 has fixed it.

Evan

Heikki Kultala

2010-Oct-05 06:03 UTC

[LLVMdev] Illegal optimization in LLVM 2.8 during SelectionDAG - 115571 fixes this

On 5 Oct 2010, at 01:48, Evan Cheng wrote:
> Please test if r115571 has fixed it.

Thanks a lot. I tested, and r115571 fixed this.

Can it still be merged into 2.8 before release?
