Hi All,
     I'm writing a backend for a target which only supports 4-byte, 
4-byte-aligned loads and stores.  I custom-lower all {*EXT}LOAD and 
STORE nodes in TargetISelLowering.cpp to take advantage of all alignment 
information available to the backend, rather than treat each load and 
store conservatively, which takes O(10) instructions.  My target's 
allowsUnalignedMemoryOperations() always returns 'false', and the 
setOperationAction()s for i8,i16,i32 loads and stores are all 'Custom'.
     I'm running into a problem where DAGCombiner is being too clever 
for me. The SelectionDAG pipeline runs LegalizeDAG, which calls my custom 
LowerLOAD() and LowerSTORE() routines (which emit between 1 and O(10) 
SDValues, depending on alignment information), and then runs DAGCombine.  
To lower an i16 LOAD that is known to be in the high-addressed 2 bytes of 
a word on my little-endian target, I emit an LD4 from the word-aligned 
address and an SRL 16 to shift the i16 into the LSbits of the register.
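
A minimal sketch of the lowering described above, modeled in plain C++ rather than as SelectionDAG nodes. WordMemory, load4, and loadU16 are hypothetical names chosen for illustration; they are not LLVM APIs and not the poster's actual code. The sketch assumes a little-endian target and a 2-byte-aligned halfword address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model of a memory that only supports 4-byte, 4-byte-aligned loads.
struct WordMemory {
    const uint32_t *words;
    std::size_t count;

    // Models LD4: the address must be word-aligned and in range.
    uint32_t load4(uint32_t byteAddr) const {
        assert(byteAddr % 4 == 0 && byteAddr / 4 < count);
        return words[byteAddr / 4];
    }
};

// Models the custom lowering of a zero-extended i16 load:
// an LD4 from the word-aligned address, then a right shift (SRL)
// by 0 or 16 bits, then a mask to keep the low 16 bits.
uint32_t loadU16(const WordMemory &mem, uint32_t byteAddr) {
    assert(byteAddr % 2 == 0);                  // assumed halfword alignment
    uint32_t word = mem.load4(byteAddr & ~3u);  // word-aligned LD4
    uint32_t shift = (byteAddr & 3u) * 8;       // 0 for low half, 16 for high half
    return (word >> shift) & 0xFFFFu;
}
```

For the high-addressed halfword the shift is 16 and the mask is redundant, which is the LD4 plus SRL 16 pair mentioned above.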
DAGCombine visit()s an ISD::SRL node, notices that it is 
right-shifting the result of an LD4 from %arrayidx4 by 16 bits, and 
replaces it with an LD2 from %arrayidx4+2.
Replaces
--------
0x17f7070: i32,ch = load 0x17faa00, 0x17f7f70, 0x17f6a70<LD4[%arrayidx4]>
0x17f94c0: i32 = Constant<16> [ORD=9] [ID=10]
0x17f7470: i32 = srl 0x17f7070, 0x17f94c0
With
----
0x17fceb0: i32,ch = load 0x17faa00, 0x17fac00, 0x17f6a70<LD2[%arrayidx4+2], zext from i16>
That seems like a logical thing to do on a lot of targets, but in my 
case, the LD4->SRL combo was emitted precisely because I *don't* want a 
ZEXTLOAD i16.  I'm wondering if there is:
a) A way to turn off this optimization in DAGCombine if your target 
doesn't support LD2 natively
b) A way to (repeatedly) call my LowerLOAD/LowerSTORE functions after 
DAGCombine until no LD1's or LD2's remain
c) A completely different way I should be approaching this (e.g., in 
ISelDAGToDAG?)
For now, the workaround is to have a conservative pattern in my 
InstrInfo.td that will do the right thing with any LD2's DAGCombine 
creates, but ideally I can find a way to keep DAGCombine from fighting me :)
Any help would be greatly appreciated.
Best,
Matt
On Wed, Jul 27, 2011 at 2:28 PM, Matt Johnson <johnso87 at crhc.illinois.edu> wrote:
> a) A way to turn off this optimization in DAGCombine if your target
> doesn't support LD2 natively

This.

I think you're talking about DAGCombiner::ReduceLoadWidth... and the
"if (LegalOperations && !TLI.isLoadExtLegal(ExtType, ExtVT))" is
supposed to ensure that the transformation in question is safe. That
said, IIRC I fixed a bug there recently; are you using trunk?

-Eli
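
The effect of the check quoted above can be pictured with a toy model. This is only a sketch of the guard's logic: ExtKind, ToyTarget, and mayNarrowLoad are hypothetical stand-ins invented for illustration, not LLVM's classes or the actual body of ReduceLoadWidth.

```cpp
#include <cassert>

// Hypothetical stand-ins; none of these names are LLVM's.
enum class ExtKind { ZExt, SExt };

struct ToyTarget {
    unsigned nativeLoadBits;  // the only load width this toy target supports

    // Toy version of a "is this extending load legal?" query.
    bool isLoadExtLegal(ExtKind, unsigned memBits) const {
        assert(memBits == 8 || memBits == 16 || memBits == 32);
        return memBits == nativeLoadBits;
    }
};

// Mirrors the shape of the quoted guard: once operations have been
// legalized, refuse to create an extending load the target cannot do.
bool mayNarrowLoad(const ToyTarget &target, bool legalOperations,
                   ExtKind kind, unsigned memBits) {
    if (legalOperations && !target.isLoadExtLegal(kind, memBits))
        return false;  // bail out: keep the wide load and the shift
    return true;       // narrowing is allowed
}
```

In this model a word-only target rejects a zero-extending 16-bit load after legalization, while the same narrowing is still permitted before legalization, when the legalizer can clean it up afterwards.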
Hi Eli,

On 07/27/2011 04:59 PM, Eli Friedman wrote:
> I think you're talking about DAGCombiner::ReduceLoadWidth... and the
> "if (LegalOperations && !TLI.isLoadExtLegal(ExtType, ExtVT))" is
> supposed to ensure that the transformation in question is safe. That
> said, IIRC I fixed a bug there recently; are you using trunk?

Perfect!  I'm using 2.8 for now (I'm hoping to roll forward to trunk and 
stay there in a month or two), and 2.8 only had that check for 
SIGN_EXTEND_INREG, not SRL or others.  The bugfix was in r124587.

I'm now seeing a similar problem with SimplifyDemandedBits() on a 
4-byte-aligned i8 load.  LowerLOAD() emits an aligned LD4 followed by an 
AND with constant 0xFF.  SimplifyDemandedBits() sees this and changes it 
to a zero-extended LD1.  Is this the same situation, where a bugfix was 
made after 2.8?  Any idea where to look?

Thanks!
-Matt
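
To make the i8 case concrete, here is a small sketch in plain C++ of why the combiner treats the two forms as interchangeable on a little-endian target. The function names are hypothetical and only model the behavior; they are not LLVM code.

```cpp
#include <cassert>
#include <cstdint>

// Models an aligned LD4 on a little-endian target by assembling
// the word from four consecutive bytes.
uint32_t load4LE(const uint8_t *bytes) {
    assert(bytes != nullptr);
    return uint32_t(bytes[0]) | uint32_t(bytes[1]) << 8 |
           uint32_t(bytes[2]) << 16 | uint32_t(bytes[3]) << 24;
}

// What the custom lowering emits for a 4-byte-aligned i8 load:
// an LD4 followed by an AND with 0xFF.
uint32_t loadU8ViaWord(const uint8_t *alignedBytes) {
    return load4LE(alignedBytes) & 0xFFu;
}

// What the combiner rewrites it to: a zero-extended one-byte load (LD1).
uint32_t loadU8Narrow(const uint8_t *alignedBytes) {
    assert(alignedBytes != nullptr);
    return uint32_t(alignedBytes[0]);
}
```

Both functions return the same value, so the rewrite is semantically valid; the problem in this thread is only that the target has no native LD1.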