thr3ads.net - llvm dev - [LLVMdev] Simpler subreg ops in machine code IR [Jun 2010]

If this information is useful, please help other people find it:
Share via:

Jakob Stoklund Olesen

2010-Jun-15 21:48 UTC

[LLVMdev] Simpler subreg ops in machine code IR

I am considering adding a new target independent codegen-only COPY instruction
to our MachineInstr representation. It would be used to replace INSERT_SUBREG,
EXTRACT_SUBREG, and virtual register copies after instruction selection.
Selection DAG still needs {INSERT,EXTRACT}_SUBREG, but they would not appear as
MachineInstrs any longer.

The COPY instruction handles subreg operations with less redundancy:

	%reg1045<def> = EXTRACT_SUBREG %reg1044<kill>, 4
	%reg1045<def> = COPY %reg1044:sub_32bit<kill>

	%reg1045<def> = INSERT_SUBREG %reg1045, %reg1044<kill>, 4
	%reg1045:sub_32bit<def> = COPY %reg1044<kill>

	%reg1050:ssub_0<def> = EXTRACT_SUBREG %reg1060:dsub_1<kill>, ssub_0
	%reg1050:ssub_0<def> = COPY %reg1060:ssub_2<kill>

It will also replace the TargetInstrInfo::copyRegToReg hook when copying virtual
registers:

	%reg1050 = COPY %reg1044<kill>

It will be lowered with a TII.copyRegToReg() call in LowerSubregsInstructionPass
(which may need renaming).


Why?


1. The new function CoalescerPair::isMoveInstr() can correctly determine if a
MachineInstr is a (partial) register copy with source and destination registers
and subreg indices. I think that is the only place it is done correctly
currently. Weird stuff like subreg indices on EXTRACT_SUBREG operands is pretty
hard to figure out. The COPY instruction is simpler.

2. Many more copies are created than are eventually output - the coalescer
removes most of the copies inserted by phi elimination and 2-addr pass. The
TII.copyRegToReg() hook is relatively expensive because it has to do register
class comparisons to pick the right copy instruction. Similarly the
TII.isMoveInstr() hook is more expensive than just checking for a COPY
instruction. By calling copyRegToReg() late, and by avoiding isMoveInstr()
entirely, the overhead is avoided.

3. The register class arguments to TII.copyRegToReg() can be eliminated - it
will only ever be called for physical registers. This means that the
implementation can be simpler, and sometimes better code can be generated. A
register may be allocated to a more conveniently than the register class
specifies. It also means that most of the annoying getMinimalPhysRegClass()
calls go away.

4. The COPY instruction does not impose register class constraints on its
operands, native copies do. This is important when we implement live interval
splitting in the register allocator. After splitting an interval the register
class can be recomputed. Without the copy constraints a larger register class
may be valid, and spilling might be avoided.

5. copyRegToReg() can insert multiple instructions, avoiding fun
pseudo-instructions like VMOVQQ and VMOVQQQQ.


Why Not?


1. copyRegToReg() won't be able to use register classes to pick a copy
opcode. For instance, an XMM register will no longer be copied by MOVSS or
MOVSD. Given just the physical register, MOVAPS will be used. Is that a problem?

2. What else?

Rafael Espindola

2010-Jun-16 01:21 UTC

head link

[LLVMdev] Simpler subreg ops in machine code IR

> 1. copyRegToReg() won't be able to use register classes to pick a copy
opcode. For instance, an XMM register will no longer be copied by MOVSS or
MOVSD. Given just the physical register, MOVAPS will be used. Is that a problem?
I haven't had time to really look into it, but have been playing
around with the idea that instead of two register classes copyRegToReg
and some of the load and store callbacks should take a MVT. This
should be enough, no? Anyway, I like anything that makes the backend
interface simpler :-)

Thanks a lot for working on this.

Cheers,
-- 
Rafael Ávila de Espíndola

Jakob Stoklund Olesen

2010-Jun-16 01:45 UTC

head link

[LLVMdev] Simpler subreg ops in machine code IR

On Jun 15, 2010, at 6:21 PM, Rafael Espindola wrote:
>> 1. copyRegToReg() won't be able to use register classes to pick a
copy opcode. For instance, an XMM register will no longer be copied by MOVSS or
MOVSD. Given just the physical register, MOVAPS will be used. Is that a problem?
> 
> I haven't had time to really look into it, but have been playing
> around with the idea that instead of two register classes copyRegToReg
> and some of the load and store callbacks should take a MVT. This
> should be enough, no? Anyway, I like anything that makes the backend
> interface simpler :-)
Well, MVTs are even harder to get at.

MVTs disappear when the SelectionDAG IR is translated to MachineInstr IR.
Register classes disappear with register allocation.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 1929 bytes
Desc: not available
URL:
<http://lists.llvm.org/pipermail/llvm-dev/attachments/20100615/c760f2cc/attachment.bin>

Evan Cheng

2010-Jun-16 06:57 UTC

head link

[LLVMdev] Simpler subreg ops in machine code IR

On Jun 15, 2010, at 2:48 PM, Jakob Stoklund Olesen wrote:
> I am considering adding a new target independent codegen-only COPY
instruction to our MachineInstr representation. It would be used to replace
INSERT_SUBREG, EXTRACT_SUBREG, and virtual register copies after instruction
selection. Selection DAG still needs {INSERT,EXTRACT}_SUBREG, but they would not
appear as MachineInstrs any longer.
> 
> The COPY instruction handles subreg operations with less redundancy:
> 
> 	%reg1045<def> = EXTRACT_SUBREG %reg1044<kill>, 4
> 	%reg1045<def> = COPY %reg1044:sub_32bit<kill>
> 
> 	%reg1045<def> = INSERT_SUBREG %reg1045, %reg1044<kill>, 4
> 	%reg1045:sub_32bit<def> = COPY %reg1044<kill>
> 
> 	%reg1050:ssub_0<def> = EXTRACT_SUBREG %reg1060:dsub_1<kill>,
ssub_0
> 	%reg1050:ssub_0<def> = COPY %reg1060:ssub_2<kill>
> 
> It will also replace the TargetInstrInfo::copyRegToReg hook when copying
virtual registers:
> 
> 	%reg1050 = COPY %reg1044<kill>
> 
> It will be lowered with a TII.copyRegToReg() call in
LowerSubregsInstructionPass (which may need renaming).
Ok. An immediate concern is this will make it harder to understand machine
instructions. For example, it's hard for me to read
%reg1045:sub_32bit<def> = COPY %reg1044<kill>
as updating part of reg1045. It's solvable by pretty printing COPY
instructions that include register class information.
> 
> 
> Why?
> 
> 
> 1. The new function CoalescerPair::isMoveInstr() can correctly determine if
a MachineInstr is a (partial) register copy with source and destination
registers and subreg indices. I think that is the only place it is done
correctly currently. Weird stuff like subreg indices on EXTRACT_SUBREG operands
is pretty hard to figure out. The COPY instruction is simpler.
Agreed.
> 
> 2. Many more copies are created than are eventually output - the coalescer
removes most of the copies inserted by phi elimination and 2-addr pass. The
TII.copyRegToReg() hook is relatively expensive because it has to do register
class comparisons to pick the right copy instruction. Similarly the
TII.isMoveInstr() hook is more expensive than just checking for a COPY
instruction. By calling copyRegToReg() late, and by avoiding isMoveInstr()
entirely, the overhead is avoided.
Ok. It seems like a nice secondary benefit.
> 
> 3. The register class arguments to TII.copyRegToReg() can be eliminated -
it will only ever be called for physical registers. This
I am not sure I follow this. Why would the regclass arguments be eliminated?
> means that the implementation can be simpler, and sometimes better code can
be generated. A register may be allocated to a more conveniently than the
register class specifies. It also means that most of the annoying
getMinimalPhysRegClass() calls go away.
True.
> 
> 4. The COPY instruction does not impose register class constraints on its
operands, native copies do. This is important when we implement live interval
splitting in the register allocator. After splitting an interval the register
class can be recomputed. Without the copy constraints a larger register class
may be valid, and spilling might be avoided.
Ok.
> 
> 5. copyRegToReg() can insert multiple instructions, avoiding fun
pseudo-instructions like VMOVQQ and VMOVQQQQ.
The reason for VMOVQQ etc. is actually to keep scavenger happy by pretending the
whole QQ register is updated / referenced. With sub-register indices on the
operands, it's possible for current copyRegToReg to expand to multiple
instructions. But then the scavenger can't determine a register is partially
defined.

> 
> 
> Why Not?
> 
> 
> 1. copyRegToReg() won't be able to use register classes to pick a copy
opcode. For instance, an XMM register will no longer be copied by MOVSS or
MOVSD. Given just the physical register, MOVAPS will be used. Is that a problem?
I don't see this being a big problem. That said, I am not sure I totally
follow this. Do you propose we eliminate register classes arguments even when
virtual register copies are being created?
> 
> 2. What else?
Every corner case that we can't anticipate right now. :-)

Evan
> 
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

Jakob Stoklund Olesen

2010-Jun-16 19:57 UTC

head link

[LLVMdev] Simpler subreg ops in machine code IR

On Jun 15, 2010, at 11:57 PM, Evan Cheng wrote:
> 
> On Jun 15, 2010, at 2:48 PM, Jakob Stoklund Olesen wrote:
> 
>> I am considering adding a new target independent codegen-only COPY
instruction to our MachineInstr representation. It would be used to replace
INSERT_SUBREG, EXTRACT_SUBREG, and virtual register copies after instruction
selection. Selection DAG still needs {INSERT,EXTRACT}_SUBREG, but they would not
appear as MachineInstrs any longer.
>> 
>> The COPY instruction handles subreg operations with less redundancy:
>> 
>> 	%reg1045<def> = EXTRACT_SUBREG %reg1044<kill>, 4
>> 	%reg1045<def> = COPY %reg1044:sub_32bit<kill>
>> 
>> 	%reg1045<def> = INSERT_SUBREG %reg1045, %reg1044<kill>, 4
>> 	%reg1045:sub_32bit<def> = COPY %reg1044<kill>
>> 
>> 	%reg1050:ssub_0<def> = EXTRACT_SUBREG
%reg1060:dsub_1<kill>, ssub_0
>> 	%reg1050:ssub_0<def> = COPY %reg1060:ssub_2<kill>
>> 
>> It will also replace the TargetInstrInfo::copyRegToReg hook when
copying virtual registers:
>> 
>> 	%reg1050 = COPY %reg1044<kill>
>> 
>> It will be lowered with a TII.copyRegToReg() call in
LowerSubregsInstructionPass (which may need renaming).
> 
> Ok. An immediate concern is this will make it harder to understand machine
instructions. For example, it's hard for me to read
> %reg1045:sub_32bit<def> = COPY %reg1044<kill>
> as updating part of reg1045. It's solvable by pretty printing COPY
instructions that include register class information.
It is safe to assume that a <def> operand with a subreg index represents a
partial update to a subregister on any instruction. You can't have a noop
subreg index, for instance TRI.getSubReg(%EAX, sub_32bit) returns 0. X86 has
some weird idempotent subreg indices, TRI.getSubReg(%XMM0, sub_ss) returns
%XMM0, but that is an exception.

So this is an insert_subreg:

	%reg1045:sub_32bit<def> = COPY %reg1044<kill>

and this is a normal copy:

	%reg1045<def> = COPY %reg1044<kill>

But decorating with register classes is a good idea. The question is how to do
it without too much clutter. I thought about listing them after the operands:

	%reg1045:sub_32bit<def> = COPY %reg1044<kill>	; GR64 GR32_ABCD

Inline is tricky:

	(GR64)%reg1045:sub_32bit<def> = COPY (GR32_ABCD)%reg1044<kill>

[...]
>> 3. The register class arguments to TII.copyRegToReg() can be eliminated
- it will only ever be called for physical registers. This
> 
> I am not sure I follow this. Why would the regclass arguments be
eliminated?
If you look in LowerSubRegs.cpp, copying physical registers goes like this:

    const TargetRegisterClass *TRCS =
TRI->getPhysicalRegisterRegClass(DstReg);
    const TargetRegisterClass *TRCD =
TRI->getPhysicalRegisterRegClass(SrcReg);
    bool Emitted = TII->copyRegToReg(*MBB, MI, DstReg, SrcReg, TRCD, TRCS,
                                     MI->getDebugLoc());
    (void)Emitted;
    assert(Emitted && "Subreg and Dst must be of compatible
register class");

The register class arguments provide no added value in this case. If
copyRegToReg wants register classes, it can call getPhysicalRegisterRegClass()
itself.

When copying virtual registers, the regclass arguments are essential, but
copyRegToReg won't be used for copying virtual registers any longer.
>> 5. copyRegToReg() can insert multiple instructions, avoiding fun
pseudo-instructions like VMOVQQ and VMOVQQQQ.
> 
> The reason for VMOVQQ etc. is actually to keep scavenger happy by
pretending the whole QQ register is updated / referenced. With sub-register
indices on the operands, it's possible for current copyRegToReg to expand to
multiple instructions. But then the scavenger can't determine a register is
partially defined.
Good point.
>> 1. copyRegToReg() won't be able to use register classes to pick a
copy opcode. For instance, an XMM register will no longer be copied by MOVSS or
MOVSD. Given just the physical register, MOVAPS will be used. Is that a problem?
> 
> I don't see this being a big problem. That said, I am not sure I
totally follow this. Do you propose we eliminate register classes arguments even
when virtual register copies are being created?
No, copying of virtual registers is done with the generic COPY instruction. It
would be an error to pass a virtual register to copyRegToReg.

COPY is lowered to native instructions with copyRegToReg after register
allocation where all registers are physical.

A transitional implementation would look like this:

  virtual void copyRegToReg(MachineBasicBlock &MBB,
                            MachineBasicBlock::iterator MI,
                            unsigned DstReg, unsigned SrcReg,
                            DebugLoc DL) const {
    assert(TargetRegisterInfo::isPhysicalRegister(DstReg));
    assert(TargetRegisterInfo::isPhysicalRegister(SrcReg));
    const TargetRegisterInfo *TRI =
MBB.getParent()->getTarget().getRegisterInfo();
    const TargetRegisterClass *TRCS =
TRI->getPhysicalRegisterRegClass(DstReg);
    const TargetRegisterClass *TRCD =
TRI->getPhysicalRegisterRegClass(SrcReg);
    bool Emitted = copyRegToReg(*MBB, MI, DstReg, SrcReg, TRCD, TRCS, DL));
    (void)Emitted;
    assert(Emitted && "Nobody uses the copyRegToReg return
value");
  }
>> 2. What else?
> 
> Every corner case that we can't anticipate right now. :-)
I suspect that our handling of partial redefines in not yet complete enough that
we can do without the explicit use operand on INSERT_SUBREG. We'll see.

Possibly Parallel Threads

Search for more reasonably related threads

llvm dev - Jun 2010 - [LLVMdev] Simpler subreg ops in machine code IR

[LLVMdev] Simpler subreg ops in machine code IR

[LLVMdev] Simpler subreg ops in machine code IR

[LLVMdev] Simpler subreg ops in machine code IR

[LLVMdev] Simpler subreg ops in machine code IR

[LLVMdev] Simpler subreg ops in machine code IR

Possibly Parallel Threads