I've developed a working back-end for a custom architecture, based on LLVM 2.6. I'm now trying to cover more of the unique features of this architecture. To make use of one such feature, I'm trying something cunning/crazy with the stack - implementing it in a type of memory that can only be addressed via immediates. I've got this mostly working. However, I came across a problem which I've been unable to work around: lowering the IR (even without any optimisations enabled) often requires the pattern: i32 = FrameIndex <n> For normal memory, I was using the following instruction to match this pattern: // Get the address in memory corresponding to the given frame index, saving the address // in a register. def MOV_FI : PseudoInstr<(outs GPR:$dst), (ins frameIndex:$addr), "// $dst := frame index $addr", [(set GPR:$dst, frameIndex:$addr)]>; Which is later replaced by a MOV (output register = stack pointer + constant offset) in eliminateFrameIndex(). However, it isn't appropriate to do this with the proposed stack memory - it doesn't make sense to move the address into a register (where arithmetic can be performed on it), as it isn't possible to move that back to the domain of an immediate. So I conditionally disabled this instruction. But that leads to most programs failing to select the above pattern. The issue is that this pattern is required even in code that doesn't conceptually seem to need it (see the example below). I couldn't figure out how to avoid this during DAG legalisation. Most often, the resulting machine assembly when the above pattern is enabled, simply stores a particular stack slot in a register, for later use in the same basic block, e.g.: MOV out=r4 in=SP+4 LOAD out=r4 addr=r4 despite patterns existing for LOAD with a constant offset (which is successfully used by other stack slots in the same basic block), e.g.: LOAD out=r3 addr=SP off=8 Am I missing some other patterns that would avoid this? For example, is it possible to write patterns that allow for arithmetic involving only immediates, with the result being another immediate? If all else fails, I was thinking of writing a custom pass to identify and remove these. But that could be a lot of work. Thanks, - Mark Example: int result; int foo(int cond, int a, int b) { return cond? a : b; } int main() { return result = foo(1, 2, 3); // Expected: result = 2. } Resulting IR: @result = common global i32 0, align 4 ; <i32*> [#uses=2] define i32 @foo(i32 %cond, i32 %a, i32 %b) nounwind { entry: %retval = alloca i32 ; <i32*> [#uses=2] %cond.addr = alloca i32 ; <i32*> [#uses=2] %a.addr = alloca i32 ; <i32*> [#uses=2] %b.addr = alloca i32 ; <i32*> [#uses=2] store i32 %cond, i32* %cond.addr store i32 %a, i32* %a.addr store i32 %b, i32* %b.addr %tmp = load i32* %cond.addr ; <i32> [#uses=1] %tobool = icmp ne i32 %tmp, 0 ; <i1> [#uses=1] %tmp1 = load i32* %a.addr ; <i32> [#uses=1] %tmp2 = load i32* %b.addr ; <i32> [#uses=1] %cond3 = select i1 %tobool, i32 %tmp1, i32 %tmp2 ; <i32> [#uses=1] store i32 %cond3, i32* %retval %0 = load i32* %retval ; <i32> [#uses=1] ret i32 %0 } define i32 @main() nounwind { entry: %retval = alloca i32 ; <i32*> [#uses=3] store i32 0, i32* %retval %call = call i32 @foo(i32 1, i32 2, i32 3) ; <i32> [#uses=1] store i32 %call, i32* @result %tmp = load i32* @result ; <i32> [#uses=1] store i32 %tmp, i32* %retval %0 = load i32* %retval ; <i32> [#uses=1] ret i32 %0 } (Note: for simplicity, the calling convention in use here places all arguments on the stack) -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20100117/9cfd12f5/attachment.html>
On Jan 17, 2010, at 2:56 AM, Mark Muir wrote:> I've developed a working back-end for a custom architecture, based on LLVM 2.6. I'm now trying to cover more of the unique features of this architecture. > > To make use of one such feature, I'm trying something cunning/crazy with the stack - implementing it in a type of memory that can only be addressed via immediates. > > I've got this mostly working. However, I came across a problem which I've been unable to work around: lowering the IR (even without any optimisations enabled) often requires the pattern: > > i32 = FrameIndex <n> > > For normal memory, I was using the following instruction to match this pattern: > > // Get the address in memory corresponding to the given frame index, saving the address > // in a register. > def MOV_FI : PseudoInstr<(outs GPR:$dst), (ins frameIndex:$addr), > "// $dst := frame index $addr", > [(set GPR:$dst, frameIndex:$addr)]>; > > Which is later replaced by a MOV (output register = stack pointer + constant offset) in eliminateFrameIndex(). > > However, it isn't appropriate to do this with the proposed stack memory - it doesn't make sense to move the address into a register (where arithmetic can be performed on it), as it isn't possible to move that back to the domain of an immediate. So I conditionally disabled this instruction. But that leads to most programs failing to select the above pattern. > > The issue is that this pattern is required even in code that doesn't conceptually seem to need it (see the example below). I couldn't figure out how to avoid this during DAG legalisation. Most often, the resulting machine assembly when the above pattern is enabled, simply stores a particular stack slot in a register, for later use in the same basic block, e.g.: > > MOV out=r4 in=SP+4 > LOAD out=r4 addr=r4 > > despite patterns existing for LOAD with a constant offset (which is successfully used by other stack slots in the same basic block), e.g.: > > LOAD out=r3 addr=SP off=8 > > Am I missing some other patterns that would avoid this? For example, is it possible to write patterns that allow for arithmetic involving only immediates, with the result being another immediate?Sounds like your load / store address selection routine isn't working like what you expected. Evan> > If all else fails, I was thinking of writing a custom pass to identify and remove these. But that could be a lot of work. > > Thanks, > > - Mark > > > Example: > > int result; > > int foo(int cond, int a, int b) > { > return cond? a : b; > } > > int main() > { > return result = foo(1, 2, 3); > // Expected: result = 2. > } > > Resulting IR: > > @result = common global i32 0, align 4 ; <i32*> [#uses=2] > > define i32 @foo(i32 %cond, i32 %a, i32 %b) nounwind { > entry: > %retval = alloca i32 ; <i32*> [#uses=2] > %cond.addr = alloca i32 ; <i32*> [#uses=2] > %a.addr = alloca i32 ; <i32*> [#uses=2] > %b.addr = alloca i32 ; <i32*> [#uses=2] > store i32 %cond, i32* %cond.addr > store i32 %a, i32* %a.addr > store i32 %b, i32* %b.addr > %tmp = load i32* %cond.addr ; <i32> [#uses=1] > %tobool = icmp ne i32 %tmp, 0 ; <i1> [#uses=1] > %tmp1 = load i32* %a.addr ; <i32> [#uses=1] > %tmp2 = load i32* %b.addr ; <i32> [#uses=1] > %cond3 = select i1 %tobool, i32 %tmp1, i32 %tmp2 ; <i32> [#uses=1] > store i32 %cond3, i32* %retval > %0 = load i32* %retval ; <i32> [#uses=1] > ret i32 %0 > } > > define i32 @main() nounwind { > entry: > %retval = alloca i32 ; <i32*> [#uses=3] > store i32 0, i32* %retval > %call = call i32 @foo(i32 1, i32 2, i32 3) ; <i32> [#uses=1] > store i32 %call, i32* @result > %tmp = load i32* @result ; <i32> [#uses=1] > store i32 %tmp, i32* %retval > %0 = load i32* %retval ; <i32> [#uses=1] > ret i32 %0 > } > > (Note: for simplicity, the calling convention in use here places all arguments on the stack) > > _______________________________________________ > LLVM Developers mailing list > LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev-------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20100118/9b3be6b4/attachment.html>
>> I'm trying something cunning/crazy with the stack - implementing it in a type of memory that can only be addressed via immediates. >> >> I've got this mostly working. However, I came across a problem which I've been unable to work around: lowering the IR (even without any optimisations enabled) often requires the pattern: >> >> i32 = FrameIndex <n> >> >> It isn't appropriate to do this with the proposed stack memory - it doesn't make sense to move the address into a register, as it isn't possible to move that back to the domain of an immediate. So I conditionally disabled this instruction. But that leads to most programs failing to select the above pattern. > > Sounds like your load / store address selection routine isn't working like what you expected. >Thanks for the reply. Unfortunately, this doesn't seem to be the problem. I have the following definition for the frameIndex: def frameIndex : Operand<i32>, ComplexPattern<i32 /*valueType*/, 1 /*numOperands*/, "SelectFrameIndex" /*selectFunction*/, [frameindex] /*rootNodes*/> { let PrintMethod = "printFrameIndexOperand"; let MIOperandInfo = (ops GPR); } And the following selection code: bool MyDAGToDAGISel:: SelectFrameIndex(SDValue Op, SDValue N, SDValue& Address) { if (FrameIndexSDNode* FIN = dyn_cast<FrameIndexSDNode>(N)) { Address = CurDAG->getTargetFrameIndex(FIN->getIndex(), MVT::i32); return true; } return false; } In light of your comment, I tried extending this method to only allow cases where Op is ISD::LOAD or ISD::STORE. I found this made no difference to the behaviour. That was surprising, so I added code to print out each instruction seen by that method. And it turns out that all the operations were loads or stores anyway. So it must be later on that the conversion happens, which turns the operation into some form of indirect addressing. To further explore the example I gave in my original email, I have an instruction matching the pattern: [(set GPR:$dst, (select GPR:$sel, immOrGPR:$a, immOrGPR:$b))] The target-independent IR (as shown in the original message):> define i32 @foo(i32 %cond, i32 %a, i32 %b) nounwind { > entry: > %retval = alloca i32 ; <i32*> [#uses=2] > %cond.addr = alloca i32 ; <i32*> [#uses=2] > %a.addr = alloca i32 ; <i32*> [#uses=2] > %b.addr = alloca i32 ; <i32*> [#uses=2] > store i32 %cond, i32* %cond.addr > store i32 %a, i32* %a.addr > store i32 %b, i32* %b.addr > %tmp = load i32* %cond.addr ; <i32> [#uses=1] > %tobool = icmp ne i32 %tmp, 0 ; <i1> [#uses=1] > %tmp1 = load i32* %a.addr ; <i32> [#uses=1] > %tmp2 = load i32* %b.addr ; <i32> [#uses=1] > %cond3 = select i1 %tobool, i32 %tmp1, i32 %tmp2 ; <i32> [#uses=1]To me seems to be doing what I want - i.e. storing arguments 'a' and 'b' into local stack slots, then selecting between the values stored in those stack slots. This appears to be what is seen during selection - SelectFrameIndex(), as indicated above. The normal debug output from the back-end shows that the target-independent IR gets lowered to a select between the addresses of the two argument stack slots, rather than their values. That's a nice optimisation in general, but isn't allowed here, hence leading to the: LLVM ERROR: Cannot yet select: 0x1811798: i32 = FrameIndex <3> What transforms are performed during selection? I think this is where I should be looking, but I'm a bit lost. Any help would be greatly appreciated. - Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20100119/c3e5d2dd/attachment.html>