thr3ads.net - llvm dev - [LLVMdev] Avoiding load narrowing in DAGCombiner [Jul 2011]

If this information is useful, please help other people find it:
Share via:

Matt Johnson

2011-Jul-27 22:50 UTC

[LLVMdev] Avoiding load narrowing in DAGCombiner

Hi Eli,

On 07/27/2011 04:59 PM, Eli Friedman wrote:> On Wed, Jul 27, 2011 at 2:28 PM, Matt Johnson
> <johnso87 at crhc.illinois.edu>  wrote:
>> Hi All,
>>      I'm writing a backend for a target which only supports 4-byte,
>> 4-byte-aligned loads and stores.  I custom-lower all {*EXT}LOAD and
>> STORE nodes in TargetISelLowering.cpp to take advantage of all
alignment
>> information available to the backend, rather than treat each load and
>> store conservatively, which takes O(10) instructions.  My target's
>> allowsUnalignedMemoryOperations() always returns 'false', and
the
>> setOperationAction()s for i8,i16,i32 loads and stores are all
'Custom'.
>>
>>      I'm running into a problem where DAGCombiner is being too
clever
>> for me; it runs LegalizeDAG, which calls my custom LowerLOAD() and
>> LowerSTORE() routines (which emit between 1 and O(10) SDValues,
>> depending on alignment information), and then runs DAGCombine.  To
lower
>> an i16 STORE that is known to be in the high-addressed 2 bytes of a
word
>> on my little-endian target, I emit and LD4 from the word-aligned
address
>> and an SRL 16 to shift the i16 into the LSbits of the register.
>>
>> DAGCombine visit()s an ISD::SRL node and notices that it is
>> right-shifting the result of an LD4 from %arrayidx4 by 16 bits, and
>> replaces it with an LD2 from %arrayidx+2.
>>
>> Replaces
>> --------
>> 0x17f7070: i32,ch = load 0x17faa00, 0x17f7f70,
0x17f6a70<LD4[%arrayidx4]>
>> 0x17f94c0: i32 = Constant<16>  [ORD=9] [ID=10]
>> 0x17f7470: i32 = srl 0x17f7070, 0x17f94c0
>>
>> With
>> ----
>> 0x17fceb0: i32,ch = load 0x17faa00, 0x17fac00,
>> 0x17f6a70<LD2[%arrayidx4+2], zext from i16>
>>
>> That seems like a logical thing to do on a lot of targets, but in my
>> case, the LD4->SRL combo was emitted precisely because I *don't*
want a
>> ZEXTLOAD i16.  I'm wondering if there is:
>>
>> a) A way to turn off this optimization in DAGCombine if your target
>> doesn't support LD2 natively
>
> This.
>
> I think you're talking about DAGCombiner::ReduceLoadWidth... and the
> "if (LegalOperations&&  !TLI.isLoadExtLegal(ExtType,
ExtVT))" is
> supposed to ensure that the transformation in question is safe.  That
> said, IIRC I fixed a bug there recently; are you using trunk?
Perfect!  I'm using 2.8 for now (am hoping to roll forward to trunk and 
stay there in a month or two), and 2.8 only had that check for 
SIGN_EXTEND_INREG, not SRL or others.  The bugfix was in r124587.

I'm now seeing a similar problem with SimplifyDemandedBits() on a 
4-byte-aligned i8 load.  LowerLOAD() emits an aligned LD4 followed by an 
AND with constant 0xFF.  SimplifyDemandedBits() sees this and changes it 
to a zero-extended LD1.  Is this the same situation, where a bugfix was 
made after 2.8?  Any idea where to look?
>
> -EliThanks!
-Matt

Eli Friedman

2011-Jul-27 22:59 UTC

head link

[LLVMdev] Avoiding load narrowing in DAGCombiner

On Wed, Jul 27, 2011 at 3:50 PM, Matt Johnson
<johnso87 at crhc.illinois.edu> wrote:> Hi Eli,
>
> On 07/27/2011 04:59 PM, Eli Friedman wrote:
>>
>> On Wed, Jul 27, 2011 at 2:28 PM, Matt Johnson
>> <johnso87 at crhc.illinois.edu>  wrote:
>>>
>>> Hi All,
>>>     I'm writing a backend for a target which only supports
4-byte,
>>> 4-byte-aligned loads and stores.  I custom-lower all {*EXT}LOAD and
>>> STORE nodes in TargetISelLowering.cpp to take advantage of all
alignment
>>> information available to the backend, rather than treat each load
and
>>> store conservatively, which takes O(10) instructions.  My
target's
>>> allowsUnalignedMemoryOperations() always returns 'false',
and the
>>> setOperationAction()s for i8,i16,i32 loads and stores are all
'Custom'.
>>>
>>>     I'm running into a problem where DAGCombiner is being too
clever
>>> for me; it runs LegalizeDAG, which calls my custom LowerLOAD() and
>>> LowerSTORE() routines (which emit between 1 and O(10) SDValues,
>>> depending on alignment information), and then runs DAGCombine.  To
lower
>>> an i16 STORE that is known to be in the high-addressed 2 bytes of a
word
>>> on my little-endian target, I emit and LD4 from the word-aligned
address
>>> and an SRL 16 to shift the i16 into the LSbits of the register.
>>>
>>> DAGCombine visit()s an ISD::SRL node and notices that it is
>>> right-shifting the result of an LD4 from %arrayidx4 by 16 bits, and
>>> replaces it with an LD2 from %arrayidx+2.
>>>
>>> Replaces
>>> --------
>>> 0x17f7070: i32,ch = load 0x17faa00, 0x17f7f70,
0x17f6a70<LD4[%arrayidx4]>
>>> 0x17f94c0: i32 = Constant<16>  [ORD=9] [ID=10]
>>> 0x17f7470: i32 = srl 0x17f7070, 0x17f94c0
>>>
>>> With
>>> ----
>>> 0x17fceb0: i32,ch = load 0x17faa00, 0x17fac00,
>>> 0x17f6a70<LD2[%arrayidx4+2], zext from i16>
>>>
>>> That seems like a logical thing to do on a lot of targets, but in
my
>>> case, the LD4->SRL combo was emitted precisely because I
*don't* want a
>>> ZEXTLOAD i16.  I'm wondering if there is:
>>>
>>> a) A way to turn off this optimization in DAGCombine if your target
>>> doesn't support LD2 natively
>>
>> This.
>>
>> I think you're talking about DAGCombiner::ReduceLoadWidth... and
the
>> "if (LegalOperations&&  !TLI.isLoadExtLegal(ExtType,
ExtVT))" is
>> supposed to ensure that the transformation in question is safe.  That
>> said, IIRC I fixed a bug there recently; are you using trunk?
>
> Perfect!  I'm using 2.8 for now (am hoping to roll forward to trunk and
stay
> there in a month or two), and 2.8 only had that check for
SIGN_EXTEND_INREG,
> not SRL or others.  The bugfix was in r124587.
>
> I'm now seeing a similar problem with SimplifyDemandedBits() on a
> 4-byte-aligned i8 load.  LowerLOAD() emits an aligned LD4 followed by an
AND
> with constant 0xFF.  SimplifyDemandedBits() sees this and changes it to a
> zero-extended LD1.  Is this the same situation, where a bugfix was made
> after 2.8?  Any idea where to look?
The following?

    // If the LHS is '(and load, const)', the RHS is 0,
    // the test is for equality or unsigned, and all 1 bits of the const are
    // in the same partial word, see if we can shorten the load.
    if (DCI.isBeforeLegalize() &&
        N0.getOpcode() == ISD::AND && C1 == 0 &&
        N0.getNode()->hasOneUse() &&
        isa<LoadSDNode>(N0.getOperand(0)) &&
        N0.getOperand(0).getNode()->hasOneUse() &&
        isa<ConstantSDNode>(N0.getOperand(1))) {
      LoadSDNode *Lod = cast<LoadSDNode>(N0.getOperand(0));
      APInt bestMask;
[...]

This shouldn't ever do the wrong thing because of the
DCI.isBeforeLegalize() check, although the result might be less than
ideal.

Note that in general, this stuff only gets very lightly tested.

-Eli

Matt Johnson

2011-Jul-28 01:07 UTC

head link

[LLVMdev] Avoiding load narrowing in DAGCombiner

On 07/27/2011 05:59 PM, Eli Friedman wrote:> On Wed, Jul 27, 2011 at 3:50 PM, Matt Johnson
> <johnso87 at crhc.illinois.edu>  wrote:
>> Hi Eli,
>>
>> On 07/27/2011 04:59 PM, Eli Friedman wrote:
>>> On Wed, Jul 27, 2011 at 2:28 PM, Matt Johnson
>>> <johnso87 at crhc.illinois.edu>    wrote:
>>>> Hi All,
>>>>      I'm writing a backend for a target which only supports
4-byte,
>>>> 4-byte-aligned loads and stores.  I custom-lower all {*EXT}LOAD
and
>>>> STORE nodes in TargetISelLowering.cpp to take advantage of all
alignment
>>>> information available to the backend, rather than treat each
load and
>>>> store conservatively, which takes O(10) instructions.  My
target's
>>>> allowsUnalignedMemoryOperations() always returns
'false', and the
>>>> setOperationAction()s for i8,i16,i32 loads and stores are all
'Custom'.
>>>>
>>>>      I'm running into a problem where DAGCombiner is being
too clever
>>>> for me; it runs LegalizeDAG, which calls my custom LowerLOAD()
and
>>>> LowerSTORE() routines (which emit between 1 and O(10) SDValues,
>>>> depending on alignment information), and then runs DAGCombine. 
To lower
>>>> an i16 STORE that is known to be in the high-addressed 2 bytes
of a word
>>>> on my little-endian target, I emit and LD4 from the
word-aligned address
>>>> and an SRL 16 to shift the i16 into the LSbits of the register.
>>>>
>>>> DAGCombine visit()s an ISD::SRL node and notices that it is
>>>> right-shifting the result of an LD4 from %arrayidx4 by 16 bits,
and
>>>> replaces it with an LD2 from %arrayidx+2.
>>>>
>>>> Replaces
>>>> --------
>>>> 0x17f7070: i32,ch = load 0x17faa00, 0x17f7f70,
0x17f6a70<LD4[%arrayidx4]>
>>>> 0x17f94c0: i32 = Constant<16>    [ORD=9] [ID=10]
>>>> 0x17f7470: i32 = srl 0x17f7070, 0x17f94c0
>>>>
>>>> With
>>>> ----
>>>> 0x17fceb0: i32,ch = load 0x17faa00, 0x17fac00,
>>>> 0x17f6a70<LD2[%arrayidx4+2], zext from i16>
>>>>
>>>> That seems like a logical thing to do on a lot of targets, but
in my
>>>> case, the LD4->SRL combo was emitted precisely because I
*don't* want a
>>>> ZEXTLOAD i16.  I'm wondering if there is:
>>>>
>>>> a) A way to turn off this optimization in DAGCombine if your
target
>>>> doesn't support LD2 natively
>>> This.
>>>
>>> I think you're talking about DAGCombiner::ReduceLoadWidth...
and the
>>> "if (LegalOperations&&    !TLI.isLoadExtLegal(ExtType,
ExtVT))" is
>>> supposed to ensure that the transformation in question is safe. 
That
>>> said, IIRC I fixed a bug there recently; are you using trunk?
>> Perfect!  I'm using 2.8 for now (am hoping to roll forward to trunk
and stay
>> there in a month or two), and 2.8 only had that check for
SIGN_EXTEND_INREG,
>> not SRL or others.  The bugfix was in r124587.
>>
>> I'm now seeing a similar problem with SimplifyDemandedBits() on a
>> 4-byte-aligned i8 load.  LowerLOAD() emits an aligned LD4 followed by
an AND
>> with constant 0xFF.  SimplifyDemandedBits() sees this and changes it to
a
>> zero-extended LD1.  Is this the same situation, where a bugfix was made
>> after 2.8?  Any idea where to look?
> The following?
>
>      // If the LHS is '(and load, const)', the RHS is 0,
>      // the test is for equality or unsigned, and all 1 bits of the const
are
>      // in the same partial word, see if we can shorten the load.
>      if (DCI.isBeforeLegalize()&&
>          N0.getOpcode() == ISD::AND&&  C1 == 0&&
>          N0.getNode()->hasOneUse()&&
>          isa<LoadSDNode>(N0.getOperand(0))&&
>          N0.getOperand(0).getNode()->hasOneUse()&&
>          isa<ConstantSDNode>(N0.getOperand(1))) {
>        LoadSDNode *Lod = cast<LoadSDNode>(N0.getOperand(0));
>        APInt bestMask;
> [...]
>It turned out that this problem is fixed by (your) r135462.  
isLoadExtLegal() returned true if the extload was marked 'Custom', which
is a problem post-legalize because you can't go back and do your custom 
lowering :)  It sounds like that patch was motivated by CellSPU, which 
isn't surprising since I'm in much the same boat w.r.t. custom 
load/store lowering.

On a side note, are there other consumers of isLoadExt/TruncStoreLegal() 
that run pre-legalize and *do* want 'true' when the extload/truncstore 
is 'Custom'?
> This shouldn't ever do the wrong thing because of the
> DCI.isBeforeLegalize() check, although the result might be less than
> ideal.
>
> Note that in general, this stuff only gets very lightly tested.
>Fair enough.  Hopefully I'll be able to come up with some bugfixes of my 
own once I roll the backend forward :)> -EliThanks again,
-Matt

Reasonably Related Threads

Search for more maybe matching threads

llvm dev - Jul 2011 - [LLVMdev] Avoiding load narrowing in DAGCombiner

[LLVMdev] Avoiding load narrowing in DAGCombiner

[LLVMdev] Avoiding load narrowing in DAGCombiner

[LLVMdev] Avoiding load narrowing in DAGCombiner

Reasonably Related Threads