Does anybody have any tips for generating spills/reloads for large non-vector registers?

I'm working on a back end for a DSP architecture that has accumulator registers that are too large to be spilled or reloaded with a single instruction. All of their bits can be accessed in word-size chunks via three sub-registers (low, high, and ext). So loading or storing one requires three instructions: one for each sub-register.

For quite a while now, my implementations of loadRegFromStackSlot() and storeRegToStackSlot() have assumed that they would only receive physical registers, which makes them fairly straightforward. They generate three memory instructions, calling TargetRegisterInfo::getSubReg() to get the sub-register operand for each of them.

So it was a rude awakening when a test program resulted in a _virtual_ register being passed into loadRegFromStackSlot() (via LiveIntervals::tryFoldMemoryOperand(), if it matters). Obviously I need to make some changes. But what?

A couple of options immediately come to mind:

1. Generate INSERT_SUBREG/EXTRACT_SUBREG machine instructions in loadRegFromStackSlot() and storeRegToStackSlot() to handle virtual registers. Will this work? Is it safe to create additional virtual registers from these methods?

2. Emit a single pseudo-instruction for large loads and stores and use a custom pass to expand it to multiple instructions after register allocation.

Any other suggestions?

-Ken
Jakob Stoklund Olesen
2010-Jul-20 20:04 UTC
[LLVMdev] Spilling multi-word virtual registers
On Jul 20, 2010, at 10:57 AM, Ken Dyck wrote:

> Does anybody have any tips for generating spills/reloads for large
> non-vector registers?
>
> I'm working on a back end for a DSP architecture that has accumulator
> registers that are too large to be spilled or reloaded with a single
> instruction. All of their bits can be accessed in word-size chunks via
> three sub-registers (low, high, and ext). So loading or storing one
> requires three instructions: one for each sub-register.
>
> For quite a while now, my implementation of loadRegFromStackSlot() and
> storeRegToStackSlot() has assumed that it would only receive physical
> registers, which makes it fairly straightforward. They generate three
> memory instructions, calling TargetRegisterInfo::getSubReg() to get the
> sub-register operand for each of them.
>
> So it was a rude awakening when a test program resulted in a _virtual_
> register being passed into loadRegFromStackSlot() (via
> LiveIntervals::tryFoldMemoryOperand() if it matters). Obviously I need
> to make some changes. But what?

This is quite simple to handle. A register MachineOperand has a subreg field for this purpose. It is used to pick out subregisters of a virtual register.

For a physical register:

    MO.setReg(TRI.getSubReg(Reg, SubIdx));

For a virtual register:

    MO.setReg(Reg);
    MO.setSubReg(SubIdx);

If you are using BuildMI, the subreg is passed as the third argument to addReg().

The register allocator (rewriter to be exact) will clear the subreg field when substituting the allocated physical register.

Note that a physical register operand must not have a subreg; its subreg field must be 0.

/jakob
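The physical/virtual distinction described above can be sketched with a small toy model. This is only an illustration under stated assumptions, not LLVM code: `ToyOperand`, `isVirtual`, `toyGetSubReg`, `makeSubRegOperand`, `toyRewrite`, the register numbering, and the sub-register indices are all hypothetical stand-ins for MachineOperand, TargetRegisterInfo::getSubReg(), and the rewriter.

```cpp
#include <cassert>

// Toy stand-in for a register MachineOperand (NOT the LLVM class).
struct ToyOperand {
  unsigned Reg = 0;     // register number
  unsigned SubReg = 0;  // sub-register index; 0 means "whole register"
};

// Hypothetical numbering: registers at or above this value are virtual.
constexpr unsigned FirstVirtualReg = 1000;

inline bool isVirtual(unsigned Reg) { return Reg >= FirstVirtualReg; }

// Hypothetical sub-register indices for the three accumulator pieces.
enum ToySubIdx : unsigned { SubLow = 1, SubHigh = 2, SubExt = 3 };

// Toy counterpart of TargetRegisterInfo::getSubReg(). Assumes, purely for
// illustration, that the pieces of accumulator R are numbered R + SubIdx.
inline unsigned toyGetSubReg(unsigned PhysReg, unsigned SubIdx) {
  return PhysReg + SubIdx;
}

// Build the operand that names one word-sized piece of Reg.
inline ToyOperand makeSubRegOperand(unsigned Reg, unsigned SubIdx) {
  ToyOperand MO;
  if (isVirtual(Reg)) {
    // Virtual register: keep the register, record the piece in SubReg.
    MO.Reg = Reg;
    MO.SubReg = SubIdx;
  } else {
    // Physical register: resolve the piece now; SubReg must stay 0.
    MO.Reg = toyGetSubReg(Reg, SubIdx);
    MO.SubReg = 0;
  }
  return MO;
}

// Toy counterpart of the rewriter step: substitute the allocated physical
// register for a virtual one and clear the subreg field.
inline void toyRewrite(ToyOperand &MO, unsigned AssignedPhysReg) {
  assert(!isVirtual(AssignedPhysReg) && "must assign a physical register");
  if (!isVirtual(MO.Reg))
    return;  // already physical, nothing to substitute
  MO.Reg = MO.SubReg ? toyGetSubReg(AssignedPhysReg, MO.SubReg)
                     : AssignedPhysReg;
  MO.SubReg = 0;
}
```

In a real back end, the same branch would sit where the spill or reload code builds each of its three memory instructions, so one code path can serve both kinds of register.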
On Tuesday, July 20, 2010 4:04 PM, Jakob Stoklund Olesen wrote:
> On Jul 20, 2010, at 10:57 AM, Ken Dyck wrote:
>
> > Does anybody have any tips for generating spills/reloads for large
> > non-vector registers?
> >
> > [snip]
>
> This is quite simple to handle. A register MachineOperand has a subreg
> field for this purpose. It is used to pick out subregisters of a
> virtual register.

Thanks, Jakob. That indeed was a simple fix.

> The register allocator (rewriter to be exact) will clear the subreg
> field when substituting the allocated physical register.

Speaking of the rewriter, I've had some problems recently where the rewriter replaces the last of the three load instructions with a COPY instruction because isLoadFromStackSlot() returns the same frame index for all three loads. For example,

    load a.l, <fi#n>, 0          load a.l, <fi#n>, 0
    load a.h, <fi#n>, 1   ===>   load a.h, <fi#n>, 1
    load a.e, <fi#n>, 3          move a.e, a.l

I quickly hacked around the problem by returning a frame index only for the loads of the low sub-register (returning 0 for the rest), but I'm sure this isn't the best solution. Is there a simple way to avoid the replacement while still reporting the actual frame index for all of the load instructions?

-Ken
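For readers following along, the workaround described above (report a frame index only for the load of the low sub-register) might look roughly like the toy model below. It is a sketch, not the poster's actual code and not the LLVM hook itself: `Piece`, `ToyLoad`, and `toyIsLoadFromStackSlot` are hypothetical names, and the only convention borrowed from the thread is that returning 0 means "not reported as a stack-slot load".

```cpp
#include <cassert>

// Which word-sized piece of the accumulator a reload instruction fills.
enum class Piece { Low, High, Ext };

// Toy description of one of the three reload instructions (hypothetical).
struct ToyLoad {
  unsigned DestReg;  // physical sub-register being loaded (nonzero)
  Piece Which;       // which piece of the accumulator it is
  int FrameIndex;    // stack slot holding the spilled accumulator
  int Offset;        // word offset of this piece within the slot
};

// Toy counterpart of an isLoadFromStackSlot()-style query. If the
// instruction is reported as a stack-slot load, it sets FrameIndex and
// returns the destination register; otherwise it returns 0 and leaves
// FrameIndex untouched.
inline unsigned toyIsLoadFromStackSlot(const ToyLoad &L, int &FrameIndex) {
  // Workaround from the message above: only the low piece is reported, so
  // a client that keys purely on the frame index cannot mistake the high
  // or ext load for a second reload of the same value.
  if (L.Which != Piece::Low)
    return 0;
  FrameIndex = L.FrameIndex;
  return L.DestReg;
}
```

The cost of this approach, as the message notes, is that the high and ext loads no longer report the slot they really read.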