Jim/Tim/Renato,
A few days ago (has it been weeks now?) we discussed a codegen problem on
armv4t having to do with lo->lo register copies. I'd like to start that
discussion again, this time with a patch.
A brief summary of the problem for folks who didn't catch the discussion
earlier, and those like me who forget what they ate for breakfast: ;]
The mov instruction on armv4t (specifically, thumb1) is limited in that it
cannot copy from lo register to lo register, but the register allocator assumes
it can do whatever it wants. This means that copies get emitted which don't
work on that architecture, leading to broken code even for trivial cases like:
"void foo(int a, int b) { bar(b, a); }".
There are several different options which we discussed, and various trade-offs
for each (roughly ordered in amount of work to implement):
1) push {$src} ; pop {$dst}
The idea here is that since push & pop are relatively unrestricted
in
which registers they can operate on, we can use the stack to do the
lo -> lo copy. The big drawback here is performance (both code size
and
speed). On the other hand, it looks 'obviously' correct. The
attached
patch implements this.
2) mov hi, $src ; mov $dst, hi
Here, we copy through some high register, but the options on which one
are a bit limited:
r8 - Callee save (in ARM mode, unused in Thumb mode)
r9 - IIRC, this one is a platform defined special register?
r10 - Caller save (in ARM mode, unused in Thumb mode)
r11 - Caller save (in ARM mode, unused in Thumb mode)
r12 - Linker Scratch
Callee save regs are basically a non-starter, and r9 is not available
on all platforms so that leaves r10-12. I tired using r12, but I think
this clashes with assumptions made by the register scavenger that it
can
use that register. I haven't tried using r10 & r11.
At the time when we discussed this, it was suggested that I just let
the register scavenger take care of it. I tried that unsuccessfully,
and I don't remember what the hang-up was.
3) movs $dsdt, $src
This one clobbers the cpsr. This one works for a lot of cases, but
fails for phi's that end up between, say a cmp and a branch at the
bottom of a loop. To use this one, we'd need liveness information
at
the location of the copy in order to ensure that we don't clobber
cpsr
when it's actually going to be used. I went looking for liveness
information, and found a function that looked like it calculated it for
a small window around a MachineBasicBlock::iterator... that seems like
it would work if I told it that the window was the whole basic block.
Am
I missing a better way to get post-RA liveness? I haven't tried
implementing this.
4) Do some combination of the others, and tell the register allocator
that non-cpsr-clobbering lo-lo copies are slower, and that it should
avoid them.
I've got the patch ready that implements (1). Does it make sense to commit
this
(or something close to it) in the near term to fix the bug, and let someone
else worry about performance later? (Possibly me, but it will probably be a
a while before I come back to this. Hopefully this thread serves as a nice
bread-crumb for anyone who is interested)
Cheers,
Jon
--
Jon Roelofs
jonathan at codesourcery.com
CodeSourcery / Mentor Embedded
-------------- next part --------------
Index: lib/Target/ARM/Thumb1InstrInfo.cpp
==================================================================---
lib/Target/ARM/Thumb1InstrInfo.cpp (revision 436473)
+++ lib/Target/ARM/Thumb1InstrInfo.cpp (revision 437773)
@@ -1,67 +1,82 @@
//===-- Thumb1InstrInfo.cpp - Thumb-1 Instruction Information
-------------===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
//
// This file contains the Thumb-1 implementation of the TargetInstrInfo class.
//
//===----------------------------------------------------------------------===//
+#include "ARMSubtarget.h"
#include "Thumb1InstrInfo.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineMemOperand.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/MC/MCInst.h"
using namespace llvm;
Thumb1InstrInfo::Thumb1InstrInfo(const ARMSubtarget &STI)
: ARMBaseInstrInfo(STI), RI(STI) {
}
/// getNoopForMachoTarget - Return the noop instruction to use for a noop.
void Thumb1InstrInfo::getNoopForMachoTarget(MCInst &NopInst) const {
NopInst.setOpcode(ARM::tMOVr);
NopInst.addOperand(MCOperand::CreateReg(ARM::R8));
NopInst.addOperand(MCOperand::CreateReg(ARM::R8));
NopInst.addOperand(MCOperand::CreateImm(ARMCC::AL));
NopInst.addOperand(MCOperand::CreateReg(0));
}
unsigned Thumb1InstrInfo::getUnindexedOpcode(unsigned Opc) const {
return 0;
}
void Thumb1InstrInfo::copyPhysReg(MachineBasicBlock &MBB,
MachineBasicBlock::iterator I, DebugLoc DL,
unsigned DestReg, unsigned SrcReg,
bool KillSrc) const {
- AddDefaultPred(BuildMI(MBB, I, DL, get(ARM::tMOVr), DestReg)
- .addReg(SrcReg, getKillRegState(KillSrc)));
+ // Need to check the arch.
+ MachineFunction &MF = *MBB.getParent();
+ const ARMSubtarget &st =
MF.getTarget().getSubtarget<ARMSubtarget>();
+
assert(ARM::GPRRegClass.contains(DestReg, SrcReg) &&
"Thumb1 can only copy GPR registers");
+
+ if (st.hasV6Ops() || ARM::hGPRRegClass.contains(SrcReg)
+ || ! ARM::tGPRRegClass.contains(DestReg)) {
+ AddDefaultPred(BuildMI(MBB, I, DL, get(ARM::tMOVr), DestReg)
+ .addReg(SrcReg, getKillRegState(KillSrc)));
+ } else {
+ // This is the only way we can move low -> low for < v6.
+ AddDefaultPred(BuildMI(MBB, I, DL, get(ARM::tPUSH)))
+ .addReg(SrcReg, getKillRegState(KillSrc));
+ AddDefaultPred(BuildMI(MBB, I, DL, get(ARM::tPOP)))
+ .addReg(DestReg, getDefRegState(true));
+ }
}
void Thumb1InstrInfo::
storeRegToStackSlot(MachineBasicBlock &MBB, MachineBasicBlock::iterator I,
unsigned SrcReg, bool isKill, int FI,
const TargetRegisterClass *RC,
const TargetRegisterInfo *TRI) const {
assert((RC == &ARM::tGPRRegClass ||
(TargetRegisterInfo::isPhysicalRegister(SrcReg) &&
isARMLowRegister(SrcReg))) && "Unknown
regclass!");
if (RC == &ARM::tGPRRegClass ||
(TargetRegisterInfo::isPhysicalRegister(SrcReg) &&
isARMLowRegister(SrcReg))) {
DebugLoc DL;
if (I != MBB.end()) DL = I->getDebugLoc();
MachineFunction &MF = *MBB.getParent();
MachineFrameInfo &MFI = *MF.getFrameInfo();
MachineMemOperand *MMO =